This PR fixes two issues:
1. This condition is inverted: dfb259171a/beacon_node/network/src/network_beacon_processor/gossip_methods.rs (L1507-L1508)
We are supposed to filter out incomplete columns when we DON'T have local blobs yet!
2. When the EL returns no blobs, we never store a partial in the assembler, and this code fails to publish our need to the network, as no partials are returned: dfb259171a/beacon_node/network/src/network_beacon_processor/mod.rs (L1038-L1050)
The simple fix for 1 would be to invert the condition, but we can improve the flow here: Instead of not publishing anything, we can publish what we got, but not request anything. This ties into the fix for 2: After get blobs completes, we not only publish anything in the partial assembler, but also for every missing custody column in there, publish an empty column and a request for all cells.
In particular:
- When sending a partial message to `network`, allow specifying a request bitmap instead of hardcoding an all-ones bitmap.
- For clarity and to prepare for Gloas integration, add a `PubsubPartialMessage` enum with a `DataColumnFulu` variant.
- On republishing after merging a gossip column: always publish, but only request cells if local blobs are known or get blobs is disabled. This also prepares us to request only *some* cells, e.g. in cases where we are aware of the blobs that the EL is going to send us, e.g. via `engine_hasBlobs`.
- Move guards in `fetch_engine_blobs_and_publish` to ensure everything works fine if there are no blobs or if get_blobs is disabled.
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
Handle payload-present attestations before the payload is seen (gloas)
A gloas beacon_attestation with index == 1 claims a past block's payload is already present. If we haven't seen that block's payload envelope yet, we shouldn't reject it the envelope may just be in flight.
So instead we IGNORE it (new AttnError::UnknownPayloadEnvelope), ask sync to fetch the envelope, and park the attestation in the reprocess queue. When the envelope is imported, the parked attestations are released and re-verified.
The envelope lookup itself is stubbed here and wired up in #9155 or a follow up PR
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Use the `LruCache` implementation provided by `hashlink` instead of the current `lru` one.
This is mostly a 1-to-1 swap with only slight API incompatibilities.
I have decided to leave some config files which previously used `NonZeroUsize` but they may not be required anymore and could potentially switch to `usize`.
Co-Authored-By: Mac L <mjladson@pm.me>
In Gloas, beacon blocks are imported into fork choice immediately - the payload envelope and data columns arrive
separately. KZG commitments moved from the column sidecar into the execution payload bid, so the existing
`DataAvailabilityChecker` (which assumes block and data are coupled) can't be used for Gloas.
* Introduced `PendingPayloadCache` to keep track of payload and data columns per block root.
* Added gossip column verification
* Added support for Gloas data column reconstruction
* Payload envelope verification simplified: removed `MaybeAvailableEnvelope`, `ExecutedEnvelope`, `EnvelopeImportData`
Not yet implemented (tracked with TODOs):
- Proper lookup sync for Gloas columns arriving before blocks
- Partial column merging for Gloas
- Moving `load_gloas_payload_bid` disk reads off the async runtime
- Backfill/range sync for Gloas
Based on @eserilev's PR and work in progress. See also #9202 for verification.
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
Co-Authored-By: Daniel Knopik <107140945+dknopik@users.noreply.github.com>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
#9077
Where possible replaces all instances of `validator_monitor::timestamp_now` with `chain.slot_clock.now_duration().unwrap_or_default()`.
Where chain/slot_clock is not available, instead replace it with a convenience function `slot_clock::timestamp_now`.
Remove the `validator_monitor::timestamp_now` function.
Co-Authored-By: Mac L <mjladson@pm.me>
https://github.com/sigp/lighthouse/issues/8012
Replace all instances of `VariableList::from` and `FixedVector::from` to their `try_from` variants.
While I tried to use proper error handling in most cases, there were certain situations where adding an `expect` for situations where `try_from` can trivially never fail avoided adding a lot of extra complexity.
Co-Authored-By: Mac L <mjladson@pm.me>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
#7815
- removes all existing spans, so some span fields that appear in logs like `service_name` may be lost.
- instruments a few key code paths in the beacon node, starting from **root spans** named below:
* Gossip block and blobs
* `process_gossip_data_column_sidecar`
* `process_gossip_blob`
* `process_gossip_block`
* Rpc block and blobs
* `process_rpc_block`
* `process_rpc_blobs`
* `process_rpc_custody_columns`
* Rpc blocks (range and backfill)
* `process_chain_segment`
* `PendingComponents` lifecycle
* `pending_components`
To test locally:
* Run Grafana and Tempo with https://github.com/sigp/lighthouse-metrics/pull/57
* Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317`
Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg
Removing the old spans seem to have reduced the memory usage quite a lot - i think we were using them on long running tasks and too excessively:
<img width="910" height="495" alt="image" src="https://github.com/user-attachments/assets/5208bbe4-53b2-4ead-bc71-0b782c788669" />
This PR fixes a bug where wrong columns could get processed immediately after a CGC increase.
Scenario:
- The node's CGC increased due to additional validators attached to it (lets say from 10 to 11)
- The new CGC is advertised and new subnets are subscribed immediately, however the change won't be effective in the data availability check until the next epoch (See [this](ab0e8870b4/beacon_node/beacon_chain/src/validator_custody.rs (L93-L99))). Data availability checker still only require 10 columns for the current epoch.
- During this time, data columns for the additional custody column (lets say column 11) may arrive via gossip as we're already subscribed to the topic, and it may be incorrectly used to satisfy the existing data availability requirement (10 columns), and result in this additional column (instead of a required one) getting persisted, resulting in database inconsistency.
#7700
As described in title, the EL already performs KZG verification on all blobs when they entered the mempool, so it's redundant to perform extra validation on blobs returned from the EL.
This PR removes
- KZG verification for both blobs and data columns during block production
- KZG verification for data columns after fetch engine blobs call. I have not done this for blobs because it requires extra changes to check the observed cache, and doesn't feel like it's a worthy optimisation given the number of blobs per block.
This PR does not remove KZG verification on the block publishing path yet.
Getting this error on a non-PeerDAS network:
```
May 29 13:30:13.484 ERROR Error fetching or processing blobs from EL error: BlobProcessingError(AvailabilityCheck(Unexpected("empty blobs"))), block_root: 0x98aa3927056d453614fefbc79eb1f9865666d1f119d0e8aa9e6f4d02aa9395d9
```
It appears we're passing an empty `Vec` to DA checker, because all blobs were already seen on gossip and filtered out, this causes a `AvailabilityCheckError::Unexpected("empty blobs")`.
I've added equivalent unit tests for `getBlobsV1` to cover all the scenarios we test in `getBlobsV2`. This would have caught the bug if I had added it earlier. It also caught another bug which could trigger duplicate block import.
Thanks Santito for reporting this! 🙏
Addresses a regression recently introduced when we started gossip verifying data columns from EL blobs
```
failures:
network_beacon_processor::tests::accept_processed_gossip_data_columns_without_import
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 90 filtered out; finished in 16.60s
stderr ───
thread 'network_beacon_processor::tests::accept_processed_gossip_data_columns_without_import' panicked at beacon_node/network/src/network_beacon_processor/tests.rs:829:10:
should put data columns into availability cache: Unexpected("empty columns")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
https://github.com/sigp/lighthouse/actions/runs/15309278812/job/43082341868?pr=7521
If an empty `Vec` is passed to the DA checker, it causes an unexpected error.
This PR addresses it by not passing an empty `Vec` for processing, and not spawning a task to publish.
#7461 and partly #6439.
Desired behaviour after receiving `engine_getBlobs` response:
1. Gossip verify the blobs and proofs, but don't mark them as observed yet. This is because not all blobs are published immediately (due to staggered publishing). If we mark them as observed and not publish them, we could end up blocking the gossip propagation.
2. Blobs are marked as observed _either_ when:
* They are received from gossip and forwarded to the network .
* They are published by the node.
Current behaviour:
- ❗ We only gossip verify `engine_getBlobsV1` responses, but not `engine_getBlobsV2` responses (PeerDAS).
- ❗ After importing EL blobs AND before they're published, if the same blobs arrive via gossip, they will get re-processed, which may result in a re-import.
1. Perform gossip verification on data columns computed from EL `getBlobsV2` response. We currently only do this for `getBlobsV1` to prevent importing blobs with invalid proofs into the `DataAvailabilityChecker`, this should be done on V2 responses too.
2. Add additional gossip verification to make sure we don't re-process a ~~blob~~ or data column that was imported via the EL `getBlobs` but not yet "seen" on the gossip network. If an "unobserved" gossip blob is found in the availability cache, then we know it has passed verification so we can immediately propagate the `ACCEPT` result and forward it to the network, but without re-processing it.
**UPDATE:** I've left blobs out for the second change mentioned above, as the likelihood and impact is very slow and we haven't seen it enough, but under PeerDAS this issue is a regular occurrence and we do see the same block getting imported many times.
Update `engine_getBlobsV2` response type to `Option<Vec<BlobsAndProofV2>>`. See recent spec change [here](https://github.com/ethereum/execution-apis/pull/630).
Added some tests to cover basic fetch blob scenarios.