The proposer preferences service was attempting to publish preferences at the start of each epoch. This caused it to race with the validator duties service, it wouldn't calculate validator duties in time for the proposer preference service.
This PR first updates the validator duties service to calculate proposer duties for the current epoch and the next epoch. After Fulu we have the ability to look ahead one epoch for proposer duties, but we never updated the vc to leverage this feature.
This PR also updates the proposer preferences service to fire at every slot. We have an `(Epoch, DependentRoot)` map that prevents us from publishing the same preferences twice.
The changes here should prevent the race condition between the two services and make the proposer preferences service more robust in general.
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
This PR fixes two issues:
1. This condition is inverted: dfb259171a/beacon_node/network/src/network_beacon_processor/gossip_methods.rs (L1507-L1508)
We are supposed to filter out incomplete columns when we DON'T have local blobs yet!
2. When the EL returns no blobs, we never store a partial in the assembler, and this code fails to publish our need to the network, as no partials are returned: dfb259171a/beacon_node/network/src/network_beacon_processor/mod.rs (L1038-L1050)
The simple fix for 1 would be to invert the condition, but we can improve the flow here: Instead of not publishing anything, we can publish what we got, but not request anything. This ties into the fix for 2: After get blobs completes, we not only publish anything in the partial assembler, but also for every missing custody column in there, publish an empty column and a request for all cells.
In particular:
- When sending a partial message to `network`, allow specifying a request bitmap instead of hardcoding an all-ones bitmap.
- For clarity and to prepare for Gloas integration, add a `PubsubPartialMessage` enum with a `DataColumnFulu` variant.
- On republishing after merging a gossip column: always publish, but only request cells if local blobs are known or get blobs is disabled. This also prepares us to request only *some* cells, e.g. in cases where we are aware of the blobs that the EL is going to send us, e.g. via `engine_hasBlobs`.
- Move guards in `fetch_engine_blobs_and_publish` to ensure everything works fine if there are no blobs or if get_blobs is disabled.
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
- Move info about pre-8.0 schema migrations to the historical migrations section.
- Include info about v8.1.x and v8.2.x.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Refactors the payload attestation service
- Returns a `Result`. we need this for #9434 so we can keep track if we've succeeded with producing payload attestation earlier than the deadline
- Separates getting payload attestation data and signing + publishing. This mimics what we do in attestation service and also is needed for #9434 to surface the error while still keeping the same spawning mechanism. In #9434 we want to broadcast payload attestations early if we've already seen an avail envelope. If the SSE event fires, but for some reason getting the payload attestation data from the BN fails, we still want to retry at the deadline. If signing + publishing fails we wont retry at the deadline (similar to the attestation service).
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Fix a bug in fork choice whereby the unrealized justified and finalized checkpoints from a parent block would be incorrectly carried over to a child block in the same epoch. The documented optimisation in `fork_choice.rs` was wrong because it failed to account for slashings.
This bug is not considered to be sensitive due to the difficulty of triggering it, and the low payoff for doing so (fleeting divergence).
Keep the optimisation, updating it to correctly skip reusing the parent checkpoints when slashings are present.
A more minimal alternative would be to scrap the optimisation altogether (always compute the checkpoints), however this would come with a minor performance downside. I think the updated optimisation is simple enough to be worth retaining.
There are 3 regression tests added which confirm the correct behaviour. Temporarily setting `has_slashings` to `false` causes all 3 tests to fail.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Handle payload-present attestations before the payload is seen (gloas)
A gloas beacon_attestation with index == 1 claims a past block's payload is already present. If we haven't seen that block's payload envelope yet, we shouldn't reject it the envelope may just be in flight.
So instead we IGNORE it (new AttnError::UnknownPayloadEnvelope), ask sync to fetch the envelope, and park the attestation in the reprocess queue. When the envelope is imported, the parked attestations are released and re-verified.
The envelope lookup itself is stubbed here and wired up in #9155 or a follow up PR
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Similar to https://github.com/sigp/lighthouse/pull/9476, partial columns is a feature expected to become the default soon, so instead of introducing a CLI option that will be removed again soon, consolidate into `--enable-partial-columns` which now takes a boolean argument.
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
Single block lookups do not respect the `earliest_available_slot` peers sent. This causes us to potentially request columns from peers that do not custody columns yet (but will soon).
Pass down the block's slot and only consider peers where `earliest_available_slot <= block_slot` for custody column requests.
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
Use the `LruCache` implementation provided by `hashlink` instead of the current `lru` one.
This is mostly a 1-to-1 swap with only slight API incompatibilities.
I have decided to leave some config files which previously used `NonZeroUsize` but they may not be required anymore and could potentially switch to `usize`.
Co-Authored-By: Mac L <mjladson@pm.me>
Bump `discv5` to the latest release. This removes the duplicated `hashlink` dependency and also removes `ahash`.
Co-Authored-By: Mac L <mjladson@pm.me>
There's a recent breaking change that deprecated dashboard-based snapshot enablement, and is breaking our CI:
https://www.warpbuild.com/docs/ci/changelog/2026-june#june-8-2026
> Legacy dashboard-based snapshot enablement has been removed. Runners previously configured with snapshots enabled from the dashboard will no longer use snapshots. To continue using snapshot runners, migrate to the label-based configuration by adding snapshot.enabled=true or snapshot.key=<alias> to your runs-on labels. See the [Snapshot Runners docs](https://www.warpbuild.com/docs/ci/features/snapshot-runners) for migration details.
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Small collection of improved diagnostics for partials.
- In the peer info struct (exposing peer data via `/lighthouse/peers`), add a new field to indicate on which subnets the peer supports partials.
- Fix the description of several metrics.
- Downgrade some noisy logging when sending partials to trace, and downgrade one warning that should not worry the user.
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
N/A
Implement range sync in gloas.
Basically requests blocks and payloads post gloas from the same peer, couples them and sends it for processing.
Does not change sync much at all other than adding the machinery for payloads by range requests.
Main changes are:
`RangeSyncBlock` which used to be a struct is an enum to account for the Gloas case. This allows a clear separation between gloas and pre-gloas code.
`AvailableBlockData` now has a `BlockInEnvelope` variant. This is to clearly indicate the post gloas case. I feel this is simpler to follow compared to `NoData` variant.
Tries to extract post gloas logic into its own functions so that there is minimal logic change in mainnet range sync behaviour.
This is meant as a stable base on which we can iterate further to make range sync cleaner and for unblocking range sync support on devnet. Some ideas for later is removing the retry mechanism in favour of delegating column fetching to lookup sync which can be done post #9155 and batch signature verifying envelopes.
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
The initial partial send after getBlobs only sends incomplete partial columns, causing issues as we do not propagate these columns.
Return complete and incomplete columns from `get_columns_and_mark_as_local_fetched`, and convert any complete columns to partials if necessary. Send all columns retrieved this way after getBlobs
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
Implementing gloas lookup sync is currently incompatible with the `GossipBlockProcessResult` mechanism.
Today it's implemented such that if we receive a sucessful `GossipBlockProcessResult` we directly mark the lookup as Complete and delete it. In Gloas we can't delete a lookup after block import, as we may still have FULL child awaiting the payload.
IMO this `GossipBlockProcessResult` brings a lot of headache and edge cases that we can just live without. Also the `reset_request` business is nasty and can easily leave the lookup in a bad state.
If we get rid of `GossipBlockProcessResult` we only pay the following performance penalty:
- Lookup is created exactly while the block's payload is being execution validated
- (new degradation) we download the block again
- send the block for processing but the duplicate cache prevents double execution
So in the worst case we spend a few KBs of extra download bandwidth. Remember each block is downloaded 8x times through gossip in the happy case.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
No. But related to #9009 and #8996
- Change the `ForkContext::next_fork_digest()` to return `[u8; 4]` (returning `[0u8; 4]` for "no next fork").
- Update the initialization path and runtime fork transition path accordingly.
Added tests:
- [x] `test_next_fork_digest` — existing test passes with non-Option return type
- [x] `test_next_fork_digest_returns_zero_when_no_next_fork` — init at last BPO fork returns `[0u8; 4]`
- [x] `test_next_fork_digest_zero_after_runtime_transition_to_last_fork` — simulates `update_current_fork` to last fork, then verifies zero
Co-Authored-By: alleysira <1367108378@qq.com>
Co-Authored-By: Alleysira <56925051+Alleysira@users.noreply.github.com>
Co-Authored-By: chonghe <44791194+chong-he@users.noreply.github.com>
N/A
Currently, we have `EnvelopeError` having a `ImportError` wrapping a `BlockError`. I feel this is extremely unintuitive because most of the envelope processing functions can simply return an `EnvelopeError` that makes sense in the function's context. It revealed further ugliness when implementing range sync in #9362
This PR does 2 main things:
1. Removes `ImportError(BlockError)` variant
2. Adds `EnvelopeError(EnvelopeError)` variant to a `BlockError`.
I feel this is more natural as there can be envelope errors when we try importing a Block but envelope errors can be contained to just envelope related errors.
The main blocker to doing this was `PayloadVerificationHandle` returning a `BlockError`. It uses a very small subset of `BlockError` which I extracted to its own error type which can be converted into both a BlockError and EnvelopeError.
This allows us to keep most of the pure envelope processing functions to just return EnvelopeErrors while we convert it to a `BlockError` only in import paths where we need to return a consolidated `BlockError`.
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
When debugging ePBS with columns, we noticed that columns arriving before their block dont pass gossip verification checks and are dropped. This PR ensures that columns arriving before the block are sent to the reprocess queue. Once their block arrives, they are reprocessed.
This isn't an issue pre-gloas because we don't make block root checks for fulu data columns. This allows us to gossip verify the column and send it to the DA cache before the block arrives.
I think we also need to handle this edge case for partial data columns. Theres an existing TODO for that already.
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
- Simplification from https://github.com/sigp/lighthouse/pull/9155
Lookup sync does not cache sidecars, so sending the full network object adds unnecessary complexity. Sync only needs to know: We have received a header that has an unknown parent.
Replace `UnknownParentDataColumn` and `UnknownParentPartialDataColumn` for `UnknownParentSidecarHeader`
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>
- https://github.com/sigp/lighthouse/pull/9155 remove the trait abstraction for processing block / blobs / columns / payloads
As a result we would have to duplicate x3 the big match on `BlockProcessingResult` we currently have in block lookups mod.rs
This PR moves the match of `BlockProcessingResult` to `sync_methods` to reduce the diff of https://github.com/sigp/lighthouse/pull/9155. There are some subtle changes that deserve dedicated attention, and may be drowned in the bigger diff of https://github.com/sigp/lighthouse/pull/9155 otherwise:
| Unstable | This PR / #9115 |
| - | - |
| Some error conditions immediately `Drop` the lookup (no retries). For example for "internal" errors like the BeaconChainError | Retries ALL errors 4 times. I believe assuming some errors are internal is risky as dropping a lookup drops all its children potentially forcing the node to resync a lot of blocks because of an internal timeout
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
On Glamsterdam devnets we started seeing Lighthouse nodes unable to start with errors like:
> May 26 04:34:01.582 CRIT Failed to start beacon node reason: "Unable to load fork choice from disk: ForkChoiceError(ProtoArrayStringError(\"find_head failed: InvalidBestNode(InvalidBestNodeInfo { current_slot: Slot(23550), start_root: 0x2c70b1641c29ec46360c99f9a8512f077862cbbc603e16f4a423007d210b0c5f, justified_checkpoint: Checkpoint { epoch: Epoch(712), root: 0x2c70b1641c29ec46360c99f9a8512f077862cbbc603e16f4a423007d210b0c5f }, finalized_checkpoint: Checkpoint { epoch: Epoch(710), root: 0xede5e0b09b51bdb5445ade3398e685bd193b845e0b0ffb827f0c3fec8277ea51 }, head_root: 0x2c70b1641c29ec46360c99f9a8512f077862cbbc603e16f4a423007d210b0c5f, head_justified_checkpoint: Checkpoint { epoch: Epoch(710), root: 0xede5e0b09b51bdb5445ade3398e685bd193b845e0b0ffb827f0c3fec8277ea51 }, head_finalized_checkpoint: Checkpoint { epoch: Epoch(709), root: 0xbb243eff616ff362c52b83113e7c536d0a68cb9ca3d6a1cb1055e732219d9736 } })\"))"
This error was the result of an overly-strict sanity check, based on assumptions that are not true under extreme network conditions.
Completely remove the `InvalidBestNode` failure path: it is not compliant with the spec, and is actively harmful when triggered (it prevents Lighthouse from starting at all). The error was reachable in any situation where all leaf nodes of fork choice were ineligible to be the head. The payload invalidation tests show some examples of cases where this would happen, and the [newly-added regression test](9a5df1d982) shows a contrived case where it can happen on a Gloas network without _any_ slashings or invalid blocks. There are probably many more cases where it can happen.
We do not lose anything by removing it. The spec's implementation of `get_head` _always_ returns something (unless it crashes), and in these cases it is correct to return the starting node of the traversal: the justified checkpoint block. This is what we now do, and what the new test verifies.
I've also added some facilities to the harness for injecting attestations with fixed `payload_present` fields. @hopinheimer found himself needing something similar when messing with reorg tests, so I think these are probably useful. It might be possible to do without them by juggling the payload reveal timing in just the right way, but I think this approach is just way simpler.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Addresses #9232 partially. This PR covers two topics only.
* #9232
Wires up networking test vectors for `gossip_proposer_slashing` and `gossip_attester_slashing` topics.
The tests also revealed minor spec non-compliance where invalid slashings were ignored rather than rejected.
- Refactor `process_gossip_proposer_slashing` and `process_gossip_attester_slashing` to return `MessageAcceptance`, so it can be verified in the tests
- Add `GossipValidation` test case, handler, and test entries
- Spec compliance fix: distinguish between internal errors and validation error - return `Reject` when the slashing is invalid and only penalise on invalid messages
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>