Small collection of improved diagnostics for partials.
- In the peer info struct (exposing peer data via `/lighthouse/peers`), add a new field to indicate on which subnets the peer supports partials.
- Fix the description of several metrics.
- Downgrade some noisy logging when sending partials to trace, and downgrade one warning that should not worry the user.
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
N/A
Implement range sync in gloas.
Basically requests blocks and payloads post gloas from the same peer, couples them and sends it for processing.
Does not change sync much at all other than adding the machinery for payloads by range requests.
Main changes are:
`RangeSyncBlock` which used to be a struct is an enum to account for the Gloas case. This allows a clear separation between gloas and pre-gloas code.
`AvailableBlockData` now has a `BlockInEnvelope` variant. This is to clearly indicate the post gloas case. I feel this is simpler to follow compared to `NoData` variant.
Tries to extract post gloas logic into its own functions so that there is minimal logic change in mainnet range sync behaviour.
This is meant as a stable base on which we can iterate further to make range sync cleaner and for unblocking range sync support on devnet. Some ideas for later is removing the retry mechanism in favour of delegating column fetching to lookup sync which can be done post #9155 and batch signature verifying envelopes.
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
The initial partial send after getBlobs only sends incomplete partial columns, causing issues as we do not propagate these columns.
Return complete and incomplete columns from `get_columns_and_mark_as_local_fetched`, and convert any complete columns to partials if necessary. Send all columns retrieved this way after getBlobs
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
Implementing gloas lookup sync is currently incompatible with the `GossipBlockProcessResult` mechanism.
Today it's implemented such that if we receive a sucessful `GossipBlockProcessResult` we directly mark the lookup as Complete and delete it. In Gloas we can't delete a lookup after block import, as we may still have FULL child awaiting the payload.
IMO this `GossipBlockProcessResult` brings a lot of headache and edge cases that we can just live without. Also the `reset_request` business is nasty and can easily leave the lookup in a bad state.
If we get rid of `GossipBlockProcessResult` we only pay the following performance penalty:
- Lookup is created exactly while the block's payload is being execution validated
- (new degradation) we download the block again
- send the block for processing but the duplicate cache prevents double execution
So in the worst case we spend a few KBs of extra download bandwidth. Remember each block is downloaded 8x times through gossip in the happy case.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
No. But related to #9009 and #8996
- Change the `ForkContext::next_fork_digest()` to return `[u8; 4]` (returning `[0u8; 4]` for "no next fork").
- Update the initialization path and runtime fork transition path accordingly.
Added tests:
- [x] `test_next_fork_digest` — existing test passes with non-Option return type
- [x] `test_next_fork_digest_returns_zero_when_no_next_fork` — init at last BPO fork returns `[0u8; 4]`
- [x] `test_next_fork_digest_zero_after_runtime_transition_to_last_fork` — simulates `update_current_fork` to last fork, then verifies zero
Co-Authored-By: alleysira <1367108378@qq.com>
Co-Authored-By: Alleysira <56925051+Alleysira@users.noreply.github.com>
Co-Authored-By: chonghe <44791194+chong-he@users.noreply.github.com>
N/A
Currently, we have `EnvelopeError` having a `ImportError` wrapping a `BlockError`. I feel this is extremely unintuitive because most of the envelope processing functions can simply return an `EnvelopeError` that makes sense in the function's context. It revealed further ugliness when implementing range sync in #9362
This PR does 2 main things:
1. Removes `ImportError(BlockError)` variant
2. Adds `EnvelopeError(EnvelopeError)` variant to a `BlockError`.
I feel this is more natural as there can be envelope errors when we try importing a Block but envelope errors can be contained to just envelope related errors.
The main blocker to doing this was `PayloadVerificationHandle` returning a `BlockError`. It uses a very small subset of `BlockError` which I extracted to its own error type which can be converted into both a BlockError and EnvelopeError.
This allows us to keep most of the pure envelope processing functions to just return EnvelopeErrors while we convert it to a `BlockError` only in import paths where we need to return a consolidated `BlockError`.
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
When debugging ePBS with columns, we noticed that columns arriving before their block dont pass gossip verification checks and are dropped. This PR ensures that columns arriving before the block are sent to the reprocess queue. Once their block arrives, they are reprocessed.
This isn't an issue pre-gloas because we don't make block root checks for fulu data columns. This allows us to gossip verify the column and send it to the DA cache before the block arrives.
I think we also need to handle this edge case for partial data columns. Theres an existing TODO for that already.
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
- Simplification from https://github.com/sigp/lighthouse/pull/9155
Lookup sync does not cache sidecars, so sending the full network object adds unnecessary complexity. Sync only needs to know: We have received a header that has an unknown parent.
Replace `UnknownParentDataColumn` and `UnknownParentPartialDataColumn` for `UnknownParentSidecarHeader`
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>
- https://github.com/sigp/lighthouse/pull/9155 remove the trait abstraction for processing block / blobs / columns / payloads
As a result we would have to duplicate x3 the big match on `BlockProcessingResult` we currently have in block lookups mod.rs
This PR moves the match of `BlockProcessingResult` to `sync_methods` to reduce the diff of https://github.com/sigp/lighthouse/pull/9155. There are some subtle changes that deserve dedicated attention, and may be drowned in the bigger diff of https://github.com/sigp/lighthouse/pull/9155 otherwise:
| Unstable | This PR / #9115 |
| - | - |
| Some error conditions immediately `Drop` the lookup (no retries). For example for "internal" errors like the BeaconChainError | Retries ALL errors 4 times. I believe assuming some errors are internal is risky as dropping a lookup drops all its children potentially forcing the node to resync a lot of blocks because of an internal timeout
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Addresses #9232 partially. This PR covers two topics only.
* #9232
Wires up networking test vectors for `gossip_proposer_slashing` and `gossip_attester_slashing` topics.
The tests also revealed minor spec non-compliance where invalid slashings were ignored rather than rejected.
- Refactor `process_gossip_proposer_slashing` and `process_gossip_attester_slashing` to return `MessageAcceptance`, so it can be verified in the tests
- Add `GossipValidation` test case, handler, and test entries
- Spec compliance fix: distinguish between internal errors and validation error - return `Reject` when the slashing is invalid and only penalise on invalid messages
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Breakout from:
- https://github.com/sigp/lighthouse/pull/9295
We currently do not handle the verification of payload attestations on non-canonical side chains, we always attempt to use the head. The included regression test demonstrates this, and there is _also_ a fork choice compliance test in #9295 that triggers it.
This PR is a bit opinionated, but I'll explain my judgements:
- We need a way to get the PTC for an arbitrary slot from an arbitrary state. This involves potential state advances, database lookups, etc. There is some fiddly logic required to check that states are in range/etc.
- We _already have_ a cache with the exact same lifecycle as the PTCs, namely the attester shuffling cache. Therefore, we can de-duplicate a lot of the complexity by storing the PTCs for a given epoch (and decision block) in this cache.
The other opinionated change is in the tests. The previous tests were set up kind of nicely to avoid instantiating a `BeaconChainHarness`. However they were not using mocking, which made testing the non-canonical chain case kind of infeasible. To remedy this, I've changed them to just use a beacon chain harness and create two chains using its relatively easy to use methods for doing this. The running time of the tests goes from something like 2.6s for 8 tests to 3.3s for 9 tests, which is only an increase of 0.04s/test. Negligible. Another plus to using the `BeaconChainHarness` is that it avoids a bunch of the cruft to create synthetic non-mocked beacon chain bits.
At the same time, I've made some attempt to improve modularity (and fit with the `GossipVerificationContext`) by pulling out the guts of `with_committee_cache` into a new function (`with_cached_shuffling`) that clearly shows its dependency surface.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Peers that advertise that they have imported a block may not have the columns for that slot available post-Gloas. Ensure that we dont penalize them.
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
During custody backfill sync if a peer fails to serve columns for a batch don't penalize them more than once per batch
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
#8314 left a few ugly potentially panicking location behind - all of them believed to be unreachable, but this PR fixes them regardless for good hygiene.
Update to `ethereum_ssz 0.10.4` for two new helpers: `not_inplace` and `clone_zeroed`.
Remove remaining `expect` and `todo!` in favour of these helpers and one new fallible (but practically infallible) method.
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
In Gloas, beacon blocks are imported into fork choice immediately - the payload envelope and data columns arrive
separately. KZG commitments moved from the column sidecar into the execution payload bid, so the existing
`DataAvailabilityChecker` (which assumes block and data are coupled) can't be used for Gloas.
* Introduced `PendingPayloadCache` to keep track of payload and data columns per block root.
* Added gossip column verification
* Added support for Gloas data column reconstruction
* Payload envelope verification simplified: removed `MaybeAvailableEnvelope`, `ExecutedEnvelope`, `EnvelopeImportData`
Not yet implemented (tracked with TODOs):
- Proper lookup sync for Gloas columns arriving before blocks
- Partial column merging for Gloas
- Moving `load_gloas_payload_bid` disk reads off the async runtime
- Backfill/range sync for Gloas
Based on @eserilev's PR and work in progress. See also #9202 for verification.
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
Co-Authored-By: Daniel Knopik <107140945+dknopik@users.noreply.github.com>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
We got in a little bit of trouble in devnet-3. After gossip verifying an envelope and before importing it, we got the following error
```
May 07 08:04:24.383 WARN Execution payload envelope rejected reason: "EnvelopeProcessingError(WithdrawalsRootMismatch { state: 0x852d38aaecc9f4e2e309919f74020c7bbcf050fea4a20edf3304f171e44ee9d5, payload:
```
The envelope had already passed gossip verification checks and was correctly propagated to the network, we should not penalize peers for doing this. This caused our node to isolate itself from the rest of the network. This PR removes peer penalties for any envelope that passes gossip validation
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
We have a legacy `TestRandom` trait which generates random types for testing and fuzzing.
This function overlaps with `arbitrary` which is used very commonly in the ecosystem.
Remove `TestRandom` and generate random type instances using `Arbitrary`.
Co-Authored-By: Mac L <mjladson@pm.me>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Allow for the vc to submit its proposer preferences to the network
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Payloads from the reprocess queue should be gossiped after import if they are still timely. In devnets this happens frequently since there are many cases where the envelope arrives before the block
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Store gossip-verified `PayloadAttestationMessage`s in the operation pool and pack them into the block body at during block production.
Built on top of #9145.
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
N/A
Adds lints for rust 1.95. Mostly cosmetic.
1. .zip(a.into_iter()) -> .zip(a) . Also a few more places where into_iter is not required
2. replace sort_by with sort_by_key
3. move if statements inside match block.
4. use checked_div instead of if statements. I think this is debatable in terms of being better, happy to remove it if others also feel its unnecessary
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
Closes#8949
Implements peer penalties and REJECT/IGNORE message propagation for `SignedExecutionPayloadEnvelope` gossip handling, completing follow-up work from #8806.
Feedback on the error classification would be appreciated.
### Key Implementation Details
- Maps all 15 `EnvelopeError` variants to REJECT/IGNORE based on [Gloas p2p spec](https://github.com/ethereum/consensus-specs/blob/master/specs/gloas/p2p-interface.md#execution_payload)
- Follows `ExecutionPayloadError` handling pattern from block gossip (`penalize_peer()` method)
- Uses explicit variant matching (rather than catch-all `_`) for type safety
- Applies `LowToleranceError` penalty for protocol violations (invalid signatures, mismatches, etc.)
- Ignores without penalty for spec-defined cases (unknown block root, prior to finalization) and internal errors
Co-Authored-By: 0u-Y <yyw1000@naver.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>
Gossip verify and cache bids and proposer preferences. This PR also ensures we subscribe to new fork topics one epoch early instead of two slots early. This is required for proposer preferences.
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
#9077
Where possible replaces all instances of `validator_monitor::timestamp_now` with `chain.slot_clock.now_duration().unwrap_or_default()`.
Where chain/slot_clock is not available, instead replace it with a convenience function `slot_clock::timestamp_now`.
Remove the `validator_monitor::timestamp_now` function.
Co-Authored-By: Mac L <mjladson@pm.me>