- Generalize the custody-by-root request to accept a `Vec<Hash256>` of block roots so a whole batch is fetched in one request; `DataColumnsByRootRequestParams { block_roots, indices }` completes at `block_roots.len() * indices.len()`.
- `custody_lookup_request` takes `block_roots: &[Hash256]` + `block_epoch`; `cached_data_column_indexes` takes `block_epoch`.
- `block_sidecar_coupling.rs`: couple a batch's blocks with the columns returned by the single custody-by-root request; drop the per-column request map and retry/faulty-peer machinery.
- Remove the now-unused columns-by-range range-sync path and the serialize-downloads gate.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
This PR fixes two issues:
1. This condition is inverted: dfb259171a/beacon_node/network/src/network_beacon_processor/gossip_methods.rs (L1507-L1508)
We are supposed to filter out incomplete columns when we DON'T have local blobs yet!
2. When the EL returns no blobs, we never store a partial in the assembler, and this code fails to publish our need to the network, as no partials are returned: dfb259171a/beacon_node/network/src/network_beacon_processor/mod.rs (L1038-L1050)
The simple fix for 1 would be to invert the condition, but we can improve the flow here: Instead of not publishing anything, we can publish what we got, but not request anything. This ties into the fix for 2: After get blobs completes, we not only publish anything in the partial assembler, but also for every missing custody column in there, publish an empty column and a request for all cells.
In particular:
- When sending a partial message to `network`, allow specifying a request bitmap instead of hardcoding an all-ones bitmap.
- For clarity and to prepare for Gloas integration, add a `PubsubPartialMessage` enum with a `DataColumnFulu` variant.
- On republishing after merging a gossip column: always publish, but only request cells if local blobs are known or get blobs is disabled. This also prepares us to request only *some* cells, e.g. in cases where we are aware of the blobs that the EL is going to send us, e.g. via `engine_hasBlobs`.
- Move guards in `fetch_engine_blobs_and_publish` to ensure everything works fine if there are no blobs or if get_blobs is disabled.
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
Fix a bug in fork choice whereby the unrealized justified and finalized checkpoints from a parent block would be incorrectly carried over to a child block in the same epoch. The documented optimisation in `fork_choice.rs` was wrong because it failed to account for slashings.
This bug is not considered to be sensitive due to the difficulty of triggering it, and the low payoff for doing so (fleeting divergence).
Keep the optimisation, updating it to correctly skip reusing the parent checkpoints when slashings are present.
A more minimal alternative would be to scrap the optimisation altogether (always compute the checkpoints), however this would come with a minor performance downside. I think the updated optimisation is simple enough to be worth retaining.
There are 3 regression tests added which confirm the correct behaviour. Temporarily setting `has_slashings` to `false` causes all 3 tests to fail.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Handle payload-present attestations before the payload is seen (gloas)
A gloas beacon_attestation with index == 1 claims a past block's payload is already present. If we haven't seen that block's payload envelope yet, we shouldn't reject it the envelope may just be in flight.
So instead we IGNORE it (new AttnError::UnknownPayloadEnvelope), ask sync to fetch the envelope, and park the attestation in the reprocess queue. When the envelope is imported, the parked attestations are released and re-verified.
The envelope lookup itself is stubbed here and wired up in #9155 or a follow up PR
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Use the `LruCache` implementation provided by `hashlink` instead of the current `lru` one.
This is mostly a 1-to-1 swap with only slight API incompatibilities.
I have decided to leave some config files which previously used `NonZeroUsize` but they may not be required anymore and could potentially switch to `usize`.
Co-Authored-By: Mac L <mjladson@pm.me>
Small collection of improved diagnostics for partials.
- In the peer info struct (exposing peer data via `/lighthouse/peers`), add a new field to indicate on which subnets the peer supports partials.
- Fix the description of several metrics.
- Downgrade some noisy logging when sending partials to trace, and downgrade one warning that should not worry the user.
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
N/A
Implement range sync in gloas.
Basically requests blocks and payloads post gloas from the same peer, couples them and sends it for processing.
Does not change sync much at all other than adding the machinery for payloads by range requests.
Main changes are:
`RangeSyncBlock` which used to be a struct is an enum to account for the Gloas case. This allows a clear separation between gloas and pre-gloas code.
`AvailableBlockData` now has a `BlockInEnvelope` variant. This is to clearly indicate the post gloas case. I feel this is simpler to follow compared to `NoData` variant.
Tries to extract post gloas logic into its own functions so that there is minimal logic change in mainnet range sync behaviour.
This is meant as a stable base on which we can iterate further to make range sync cleaner and for unblocking range sync support on devnet. Some ideas for later is removing the retry mechanism in favour of delegating column fetching to lookup sync which can be done post #9155 and batch signature verifying envelopes.
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
The initial partial send after getBlobs only sends incomplete partial columns, causing issues as we do not propagate these columns.
Return complete and incomplete columns from `get_columns_and_mark_as_local_fetched`, and convert any complete columns to partials if necessary. Send all columns retrieved this way after getBlobs
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
N/A
Currently, we have `EnvelopeError` having a `ImportError` wrapping a `BlockError`. I feel this is extremely unintuitive because most of the envelope processing functions can simply return an `EnvelopeError` that makes sense in the function's context. It revealed further ugliness when implementing range sync in #9362
This PR does 2 main things:
1. Removes `ImportError(BlockError)` variant
2. Adds `EnvelopeError(EnvelopeError)` variant to a `BlockError`.
I feel this is more natural as there can be envelope errors when we try importing a Block but envelope errors can be contained to just envelope related errors.
The main blocker to doing this was `PayloadVerificationHandle` returning a `BlockError`. It uses a very small subset of `BlockError` which I extracted to its own error type which can be converted into both a BlockError and EnvelopeError.
This allows us to keep most of the pure envelope processing functions to just return EnvelopeErrors while we convert it to a `BlockError` only in import paths where we need to return a consolidated `BlockError`.
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
- https://github.com/sigp/lighthouse/pull/9155 remove the trait abstraction for processing block / blobs / columns / payloads
As a result we would have to duplicate x3 the big match on `BlockProcessingResult` we currently have in block lookups mod.rs
This PR moves the match of `BlockProcessingResult` to `sync_methods` to reduce the diff of https://github.com/sigp/lighthouse/pull/9155. There are some subtle changes that deserve dedicated attention, and may be drowned in the bigger diff of https://github.com/sigp/lighthouse/pull/9155 otherwise:
| Unstable | This PR / #9115 |
| - | - |
| Some error conditions immediately `Drop` the lookup (no retries). For example for "internal" errors like the BeaconChainError | Retries ALL errors 4 times. I believe assuming some errors are internal is risky as dropping a lookup drops all its children potentially forcing the node to resync a lot of blocks because of an internal timeout
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
On Glamsterdam devnets we started seeing Lighthouse nodes unable to start with errors like:
> May 26 04:34:01.582 CRIT Failed to start beacon node reason: "Unable to load fork choice from disk: ForkChoiceError(ProtoArrayStringError(\"find_head failed: InvalidBestNode(InvalidBestNodeInfo { current_slot: Slot(23550), start_root: 0x2c70b1641c29ec46360c99f9a8512f077862cbbc603e16f4a423007d210b0c5f, justified_checkpoint: Checkpoint { epoch: Epoch(712), root: 0x2c70b1641c29ec46360c99f9a8512f077862cbbc603e16f4a423007d210b0c5f }, finalized_checkpoint: Checkpoint { epoch: Epoch(710), root: 0xede5e0b09b51bdb5445ade3398e685bd193b845e0b0ffb827f0c3fec8277ea51 }, head_root: 0x2c70b1641c29ec46360c99f9a8512f077862cbbc603e16f4a423007d210b0c5f, head_justified_checkpoint: Checkpoint { epoch: Epoch(710), root: 0xede5e0b09b51bdb5445ade3398e685bd193b845e0b0ffb827f0c3fec8277ea51 }, head_finalized_checkpoint: Checkpoint { epoch: Epoch(709), root: 0xbb243eff616ff362c52b83113e7c536d0a68cb9ca3d6a1cb1055e732219d9736 } })\"))"
This error was the result of an overly-strict sanity check, based on assumptions that are not true under extreme network conditions.
Completely remove the `InvalidBestNode` failure path: it is not compliant with the spec, and is actively harmful when triggered (it prevents Lighthouse from starting at all). The error was reachable in any situation where all leaf nodes of fork choice were ineligible to be the head. The payload invalidation tests show some examples of cases where this would happen, and the [newly-added regression test](9a5df1d982) shows a contrived case where it can happen on a Gloas network without _any_ slashings or invalid blocks. There are probably many more cases where it can happen.
We do not lose anything by removing it. The spec's implementation of `get_head` _always_ returns something (unless it crashes), and in these cases it is correct to return the starting node of the traversal: the justified checkpoint block. This is what we now do, and what the new test verifies.
I've also added some facilities to the harness for injecting attestations with fixed `payload_present` fields. @hopinheimer found himself needing something similar when messing with reorg tests, so I think these are probably useful. It might be possible to do without them by juggling the payload reveal timing in just the right way, but I think this approach is just way simpler.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Breakout from:
- https://github.com/sigp/lighthouse/pull/9295
We currently do not handle the verification of payload attestations on non-canonical side chains, we always attempt to use the head. The included regression test demonstrates this, and there is _also_ a fork choice compliance test in #9295 that triggers it.
This PR is a bit opinionated, but I'll explain my judgements:
- We need a way to get the PTC for an arbitrary slot from an arbitrary state. This involves potential state advances, database lookups, etc. There is some fiddly logic required to check that states are in range/etc.
- We _already have_ a cache with the exact same lifecycle as the PTCs, namely the attester shuffling cache. Therefore, we can de-duplicate a lot of the complexity by storing the PTCs for a given epoch (and decision block) in this cache.
The other opinionated change is in the tests. The previous tests were set up kind of nicely to avoid instantiating a `BeaconChainHarness`. However they were not using mocking, which made testing the non-canonical chain case kind of infeasible. To remedy this, I've changed them to just use a beacon chain harness and create two chains using its relatively easy to use methods for doing this. The running time of the tests goes from something like 2.6s for 8 tests to 3.3s for 9 tests, which is only an increase of 0.04s/test. Negligible. Another plus to using the `BeaconChainHarness` is that it avoids a bunch of the cruft to create synthetic non-mocked beacon chain bits.
At the same time, I've made some attempt to improve modularity (and fit with the `GossipVerificationContext`) by pulling out the guts of `with_committee_cache` into a new function (`with_cached_shuffling`) that clearly shows its dependency surface.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
#8314 left a few ugly potentially panicking location behind - all of them believed to be unreachable, but this PR fixes them regardless for good hygiene.
Update to `ethereum_ssz 0.10.4` for two new helpers: `not_inplace` and `clone_zeroed`.
Remove remaining `expect` and `todo!` in favour of these helpers and one new fallible (but practically infallible) method.
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
In Gloas, beacon blocks are imported into fork choice immediately - the payload envelope and data columns arrive
separately. KZG commitments moved from the column sidecar into the execution payload bid, so the existing
`DataAvailabilityChecker` (which assumes block and data are coupled) can't be used for Gloas.
* Introduced `PendingPayloadCache` to keep track of payload and data columns per block root.
* Added gossip column verification
* Added support for Gloas data column reconstruction
* Payload envelope verification simplified: removed `MaybeAvailableEnvelope`, `ExecutedEnvelope`, `EnvelopeImportData`
Not yet implemented (tracked with TODOs):
- Proper lookup sync for Gloas columns arriving before blocks
- Partial column merging for Gloas
- Moving `load_gloas_payload_bid` disk reads off the async runtime
- Backfill/range sync for Gloas
Based on @eserilev's PR and work in progress. See also #9202 for verification.
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
Co-Authored-By: Daniel Knopik <107140945+dknopik@users.noreply.github.com>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Using the `ignore-ws-check` doesn't actually let you start up a node thats outside the weak subjectivity period
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Test helpers `add_attested_block_at_slot` and `add_attested_blocks_at_slot` accepted `state_root` argument which was computed before applying the block.
Co-Authored-By: hopinheimer <knmanas6@gmail.com>
We have a legacy `TestRandom` trait which generates random types for testing and fuzzing.
This function overlaps with `arbitrary` which is used very commonly in the ecosystem.
Remove `TestRandom` and generate random type instances using `Arbitrary`.
Co-Authored-By: Mac L <mjladson@pm.me>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Allow for the vc to submit its proposer preferences to the network
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
We yolo'd to alpha 7. We're just changing the proposer preference to include dependent root, instead of checkpoint root. This way we can actually construct it within the VC without needing a view of fork choice.
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Closes:
- https://github.com/sigp/lighthouse/issues/8689
- Calculate the proposer index on the canonical chain (from canonical head) at `slot` and plumb it through to fork choice so it can be used to determine whether or not to apply the proposer boost. We use the proposer cache to handle state advances and avoid duplicate work.
- Update our FC tests to use `block.message().proposer_index()` (always pass), we are not attempting to test this feature in those tests. The EF tests use the correct canonical proposer idnex via `on_block`, except for invalid blocks which just auto-pass this check (these blocks get rejected by other checks in `on_block` anyway).
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Store gossip-verified `PayloadAttestationMessage`s in the operation pool and pack them into the block body at during block production.
Built on top of #9145.
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
For gloas `attestation.data.index` should be set to 1 if we are attesting to a block whose slot is not the attestation duty slot and slot payload_status is `FULL`
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>