Lighthouse currently requires checkpoint sync to be performed against a supernode in a PeerDAS network, as only supernodes can serve blobs.
This PR lifts that requirement, enabling Lighthouse to checkpoint sync from either a fullnode or a supernode (See https://github.com/sigp/lighthouse/issues/6837#issuecomment-2933094923)
Missing data columns for the checkpoint block aren't a big issue, and we should be able to implement backfill for them easily once we have the logic to backfill data columns.
Getting this error on a non-PeerDAS network:
```
May 29 13:30:13.484 ERROR Error fetching or processing blobs from EL error: BlobProcessingError(AvailabilityCheck(Unexpected("empty blobs"))), block_root: 0x98aa3927056d453614fefbc79eb1f9865666d1f119d0e8aa9e6f4d02aa9395d9
```
It appears we're passing an empty `Vec` to the DA checker because all blobs were already seen on gossip and filtered out; this causes an `AvailabilityCheckError::Unexpected("empty blobs")`.
I've added equivalent unit tests for `getBlobsV1` to cover all the scenarios we test for `getBlobsV2`. These would have caught the bug had they been added earlier; they also caught another bug which could trigger a duplicate block import.
Thanks Santito for reporting this! 🙏
Addresses a regression recently introduced when we started gossip verifying data columns from EL blobs
```
failures:
network_beacon_processor::tests::accept_processed_gossip_data_columns_without_import
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 90 filtered out; finished in 16.60s
stderr ───
thread 'network_beacon_processor::tests::accept_processed_gossip_data_columns_without_import' panicked at beacon_node/network/src/network_beacon_processor/tests.rs:829:10:
should put data columns into availability cache: Unexpected("empty columns")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
https://github.com/sigp/lighthouse/actions/runs/15309278812/job/43082341868?pr=7521
If an empty `Vec` is passed to the DA checker, it causes an unexpected error.
This PR addresses it by not passing an empty `Vec` for processing, and by not spawning a publish task when there is nothing to publish.
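A minimal sketch of the guard, with placeholder types (not Lighthouse's actual API): skip both processing and the publish task when the filtered list ends up empty.

```rust
/// Hypothetical sketch: only hand blobs to the DA checker and spawn the publish
/// task if something is left after filtering out blobs already seen on gossip.
fn process_fetched_blobs(fetched: Vec<u64>, seen_on_gossip: &[u64]) {
    let remaining: Vec<u64> = fetched
        .into_iter()
        .filter(|blob| !seen_on_gossip.contains(blob))
        .collect();

    // Passing an empty Vec downstream is what triggers `Unexpected("empty blobs")`,
    // so bail out early instead.
    if remaining.is_empty() {
        return;
    }

    // ... pass `remaining` to the availability checker and publish it ...
}
```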
#7461 and partly #6439.
Desired behaviour after receiving `engine_getBlobs` response:
1. Gossip verify the blobs and proofs, but don't mark them as observed yet. This is because not all blobs are published immediately (due to staggered publishing); if we mark them as observed but don't publish them, we could end up blocking gossip propagation.
2. Blobs are marked as observed _either_ when:
* They are received from gossip and forwarded to the network.
* They are published by the node.
Current behaviour:
- ❗ We only gossip verify `engine_getBlobsV1` responses, but not `engine_getBlobsV2` responses (PeerDAS).
- ❗ After importing EL blobs AND before they're published, if the same blobs arrive via gossip, they will get re-processed, which may result in a re-import.
1. Perform gossip verification on data columns computed from EL `getBlobsV2` responses. We currently only do this for `getBlobsV1` to prevent importing blobs with invalid proofs into the `DataAvailabilityChecker`; this should be done on V2 responses too.
2. Add additional gossip verification to make sure we don't re-process a ~~blob~~ or data column that was imported via the EL `getBlobs` but not yet "seen" on the gossip network. If an "unobserved" gossip blob is found in the availability cache, then we know it has passed verification so we can immediately propagate the `ACCEPT` result and forward it to the network, but without re-processing it.
**UPDATE:** I've left blobs out for the second change mentioned above, as the likelihood and impact are very low and we haven't seen it enough, but under PeerDAS this issue is a regular occurrence and we do see the same block getting imported many times.
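A minimal sketch of the second change, with placeholder types (`AvailabilityCacheView` and `GossipOutcome` are hypothetical, not Lighthouse's actual API): a column already in the availability cache but not yet observed on gossip has already passed verification, so it can be accepted and forwarded without re-processing.

```rust
use std::collections::HashSet;

/// Hypothetical view of the availability cache, keyed by (block root, column index).
struct AvailabilityCacheView {
    cached_column_ids: HashSet<(u64, u64)>,
    observed_on_gossip: HashSet<(u64, u64)>,
}

enum GossipOutcome {
    /// Propagate ACCEPT and forward to the network, but skip re-processing.
    AcceptAndForwardOnly,
    /// Not seen before: run the normal processing path.
    Process,
}

fn handle_gossip_column(cache: &mut AvailabilityCacheView, id: (u64, u64)) -> GossipOutcome {
    if cache.cached_column_ids.contains(&id) && !cache.observed_on_gossip.contains(&id) {
        // It was imported via `engine_getBlobs` and has now arrived on gossip:
        // mark it observed and forward it without re-importing the block.
        cache.observed_on_gossip.insert(id);
        GossipOutcome::AcceptAndForwardOnly
    } else {
        GossipOutcome::Process
    }
}
```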
Update `engine_getBlobsV2` response type to `Option<Vec<BlobsAndProofV2>>`. See recent spec change [here](https://github.com/ethereum/execution-apis/pull/630).
Added some tests to cover basic fetch blob scenarios.
Use `slice::is_sorted`, which was stabilised in Rust 1.82.0.
I thought there would be more places we could use this, but it seems we often want to check strict monotonicity (i.e. sorted + no duplicates)
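A quick standalone illustration of that distinction:

```rust
fn main() {
    let xs = [1u64, 2, 2, 3];

    // `slice::is_sorted` (stable since Rust 1.82) allows equal neighbours.
    assert!(xs.is_sorted());

    // Strict monotonicity (sorted + no duplicates) still needs an explicit check.
    let strictly_increasing = xs.windows(2).all(|w| w[0] < w[1]);
    assert!(!strictly_increasing);
}
```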
Closes #5016
The op pool was using the wrong denominator when calculating proposer block rewards! This was mostly inconsequential, as our studies of Lighthouse's block profitability already showed that it is very close to optimal. The wrong denominator was a leftover from phase0 code, and wasn't properly updated for Altair.
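The PR doesn't spell out which value was wrong, but for reference, the Altair proposer reward denominator is derived from the spec weights like this (a standalone calculation, not Lighthouse code):

```rust
fn main() {
    // Altair spec constants.
    const WEIGHT_DENOMINATOR: u64 = 64;
    const PROPOSER_WEIGHT: u64 = 8;

    // Altair: proposer_reward = proposer_reward_numerator / proposer_reward_denominator.
    let proposer_reward_denominator =
        (WEIGHT_DENOMINATOR - PROPOSER_WEIGHT) * WEIGHT_DENOMINATOR / PROPOSER_WEIGHT;
    assert_eq!(proposer_reward_denominator, 448);

    // Phase0 instead used PROPOSER_REWARD_QUOTIENT = 8, the kind of leftover
    // value that needed updating for Altair.
}
```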
Fix clippy lints for `rustc` 1.87
clippy complains about `BeaconChainError` being too large. I went on a bit of a boxing spree because of this. We may instead want to `Box` some of the `BeaconChainError` variants?
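A minimal illustration of the boxing approach (`LargePayload` is a stand-in, not a Lighthouse type): boxing the bulky field keeps the enum itself small, which is what clippy's large-error/large-enum-variant lints are flagging.

```rust
// Stand-in for a bulky error payload (e.g. a large nested error type).
struct LargePayload {
    _data: [u8; 1024],
}

enum Error {
    Small(u32),
    // Without the Box, every `Error` value would be at least 1024 bytes.
    Big(Box<LargePayload>),
}

fn main() {
    assert!(std::mem::size_of::<Error>() <= 16);
}
```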
Workaround/fix for:
- https://github.com/sigp/lighthouse/issues/7323
- Remove the `StateSummariesNotContiguousError`. This allows us to continue with DAG construction and pruning, even in the case where the DAG is disjoint. We will treat any disjoint summaries as roots of their own tree, and prune them (as they are not descended from finalized). This should be safe, as canonical summaries should not be disjoint (if they are, then the DB is already corrupt).
This PR adds transitions to Electra ~~and Fulu~~ fork epochs in the simulator tests.
~~It also covers blob inclusion verification and data column syncing on a full node in Fulu.~~
UPDATE: Remove fulu fork from sim tests due to https://github.com/sigp/lighthouse/pull/7199#issuecomment-2852281176
When we perform data column gossip verification, we sometimes see multiple proposer shuffling cache misses simultaneously, which results in multiple threads computing the shuffling and potentially slows down gossip verification.
The proposal here is to use a `OnceCell` for each shuffling key to make sure it's only computed once. I have only implemented this in data column verification as a PoC, but it can also be applied to blob and block verification.
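A minimal sketch of the idea, using std's `OnceLock` (the standard-library equivalent of `OnceCell`) and placeholder key/value types; the real change lives in the shuffling cache used by data column verification:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

type ShufflingKey = (u64, [u8; 32]); // placeholder: (epoch, dependent block root)
type Shuffling = Vec<usize>;         // placeholder for the committee shuffling

#[derive(Default)]
struct ShufflingCache {
    cells: Mutex<HashMap<ShufflingKey, Arc<OnceLock<Shuffling>>>>,
}

impl ShufflingCache {
    /// All callers racing on the same key share one `OnceLock`, so the
    /// expensive shuffling computation runs exactly once per key.
    fn get_or_compute(&self, key: ShufflingKey, compute: impl FnOnce() -> Shuffling) -> Shuffling {
        let cell = {
            let mut cells = self.cells.lock().unwrap();
            cells.entry(key).or_default().clone()
        };
        cell.get_or_init(compute).clone()
    }
}
```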
Related issues:
- https://github.com/sigp/lighthouse/issues/4447
- https://github.com/sigp/lighthouse/issues/7203
Did not find a specific issue besides https://github.com/sigp/lighthouse/issues/6821.
Leverage `whistleblower_reward_quotient_for_state` to have accurate post-electra `proposer_slashings` and `attester_slashings` fields returned by `/eth/v1/beacon/rewards/blocks/<id>`.
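For reference, a sketch of the fork-aware selection (constants per the spec; the boolean argument here is a placeholder for inspecting the state's fork):

```rust
const WHISTLEBLOWER_REWARD_QUOTIENT: u64 = 512;          // pre-Electra (2^9)
const WHISTLEBLOWER_REWARD_QUOTIENT_ELECTRA: u64 = 4096; // Electra (2^12)

/// Placeholder for the fork-aware helper: pick the quotient that matches the
/// state's fork so post-Electra slashing rewards are computed correctly.
fn whistleblower_reward_quotient_for_state(is_post_electra: bool) -> u64 {
    if is_post_electra {
        WHISTLEBLOWER_REWARD_QUOTIENT_ELECTRA
    } else {
        WHISTLEBLOWER_REWARD_QUOTIENT
    }
}
```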
Downgrade light client errors to debug
Error messages are alarming and usually indicate something's wrong with the beacon node. The Light Client service is supposed to minimally impact users, and most will not care if the light client server is erroring. Furthermore, the only errors we've seen in the wild are during hard forks, for the first few epochs before the fork finalizes.
#6296: Deterministic RNG in peer DAS publish block tests
The test functions now call the publish-block APIs with the deterministic RNG boolean parameter set to `true`, while production code passes `false`. This makes the column shuffling deterministic for the unit tests under `broadcast_validation_tests.rs`.
The head tracker is a persisted piece of state that must be kept in sync with fork choice. It has been a source of pruning issues in the past, so we want to remove it.
- see https://github.com/sigp/lighthouse/issues/1785
When implementing tree-states in the hot DB we have to change the pruning routine (more details below), so we want to make those changes first, in isolation.
- see https://github.com/sigp/lighthouse/issues/6580
- If you want to see the full tree-states-hot feature, see https://github.com/dapplion/lighthouse/pull/39
Closes https://github.com/sigp/lighthouse/issues/1785
**Current DB migration routine**
- Locate abandoned heads with head tracker
- Use a roots iterator to collect the ancestors of those heads that can be pruned
- Delete those abandoned blocks / states
- Migrate the newly finalized chain to the freezer
In summary, it computes what it has to delete and keeps the rest. Then it migrates data to the freezer. If the abandoned forks routine has a bug it can break the freezer migration.
**Proposed migration routine (this PR)**
- Migrate the newly finalized chain to the freezer
- Load all state summaries from disk
- From those, knowing just the head and finalized block, compute two sets: (1) descendants of finalized, (2) the newly finalized chain
- Iterate over all summaries; if a summary does not belong to set (1) or (2), delete it
This strategy is more sound as it just checks what's there in the hot DB, computes what it has to keep, and deletes the rest. Because it does not rely on additional pieces of data, we can drop the head tracker and the pruning checkpoint. Since the DB migration happens **first** now, as long as the computation of the sets to keep is correct we won't have pruning issues.
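A condensed sketch of the keep-set computation described above, with types simplified to plain roots (the real code operates on hot-DB state summaries):

```rust
use std::collections::{HashMap, HashSet};

type Root = u64; // placeholder for Hash256

/// Minimal stand-in for a hot-DB state summary: each summary knows its parent.
struct Summary {
    parent: Root,
}

/// Returns true if `finalized` is an ancestor of (or equal to) `root`.
fn descends_from(summaries: &HashMap<Root, Summary>, mut root: Root, finalized: Root) -> bool {
    loop {
        if root == finalized {
            return true;
        }
        match summaries.get(&root) {
            Some(summary) => root = summary.parent,
            None => return false,
        }
    }
}

/// Keep (1) descendants of the finalized root and (2) the newly finalized chain
/// back to the previous finalized root; everything else gets deleted.
fn summaries_to_prune(
    summaries: &HashMap<Root, Summary>,
    finalized: Root,
    previous_finalized: Root,
) -> Vec<Root> {
    // Set (1): descendants of finalized.
    let mut keep: HashSet<Root> = summaries
        .keys()
        .copied()
        .filter(|&root| descends_from(summaries, root, finalized))
        .collect();

    // Set (2): newly finalized chain, walking back from the finalized root.
    let mut current = finalized;
    while let Some(summary) = summaries.get(&current) {
        keep.insert(current);
        if current == previous_finalized {
            break;
        }
        current = summary.parent;
    }

    summaries.keys().copied().filter(|root| !keep.contains(root)).collect()
}
```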
N/A
Return `state.eth1_data()` early if we have passed the transition period post-Electra. Even if we don't return early, the function would still return `state.eth1_data()` under the current conditions; however, doing this explicitly matches the spec. This covers setting the right `eth1_data` in our block.
The other thing we need to ensure is that the deposits returned by the `eth1_chain` are empty post-transition.
The only way we get non-empty deposits after the transition is if `state.eth1_deposit_index` in the code below is less than `min(deposit_requests_start_index, state.eth1_data().deposit_count)`.
0850bcfb89/beacon_node/beacon_chain/src/eth1_chain.rs (L543-L579)
This can never happen, because `state.eth1_deposit_index` will be equal to `state.eth1_data.deposit_count` and cannot exceed that value.
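A hedged restatement of that spec condition as a standalone function (names follow the spec, not Lighthouse's code):

```rust
/// Spec condition (Electra `process_operations`): the expected number of Eth1
/// deposits in a block is zero once `eth1_deposit_index` reaches
/// `min(eth1_data.deposit_count, deposit_requests_start_index)`.
fn expected_eth1_deposit_count(
    eth1_deposit_index: u64,
    eth1_data_deposit_count: u64,
    deposit_requests_start_index: u64,
    max_deposits: u64,
) -> u64 {
    let limit = eth1_data_deposit_count.min(deposit_requests_start_index);
    limit.saturating_sub(eth1_deposit_index).min(max_deposits)
}

fn main() {
    // Post-transition: the deposit index has caught up to the limit, so no
    // pre-Electra deposits are expected in blocks any more.
    assert_eq!(expected_eth1_deposit_count(100, 100, 64, 16), 0);
}
```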
@michaelsproul @ethDreamer please double check the logic for deposits being empty post transition. Following the logic in the spec makes my head hurt.
In testing conducted by Sunnyside Labs, they noticed that the "expected blobs" count is quite low on bandwidth-constrained nodes. This observation revealed that we don't record the `beacon_blobs_from_el_expected_total` metric at all if the EL doesn't return any response: the fetch blobs function returns without recording the metric.
To fix this, I've moved the `BLOBS_FROM_EL_EXPECTED_TOTAL` and `BLOBS_FROM_EL_RECEIVED_TOTAL` recordings as early as possible, to make the metrics more accurate.
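A schematic sketch of the reordering (the helper functions are placeholders, not Lighthouse's actual metrics API):

```rust
/// Placeholders standing in for Lighthouse's metrics plumbing, e.g.
/// incrementing `beacon_blobs_from_el_expected_total`.
fn inc_expected(_n: u64) {}
fn inc_received(_n: u64) {}

fn fetch_blobs_from_el(expected_blob_count: u64, el_response: Option<Vec<u8>>) {
    // Record "expected" before any early return, so an empty EL response
    // (common on bandwidth-constrained nodes) is still counted.
    inc_expected(expected_blob_count);

    let Some(blobs) = el_response else {
        // Previously the function returned here without touching the metric.
        return;
    };

    inc_received(blobs.len() as u64);
    // ... continue processing the blobs ...
}
```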
This is a workaround for #7216
In the case of gaps between the in-memory public key cache and its on-disk representation, use the head state on startup to "top up" the cache/DB with any missing validators.
- Part of https://github.com/sigp/lighthouse/issues/6767
Validator custody makes the CGC (custody group count) and the set of sampling columns dynamic. Right now this information is stored twice:
- in the data availability checker
- in the network globals
If that state becomes dynamic, we must make sure it stays in sync by updating it in both places, or by guarding it behind a mutex. However, I noted that we don't really have to keep the CGC inside the data availability checker. All consumers can actually read it from the network globals, and we can update `make_available` to read the expected count of data columns from the block.
Backport of:
- https://github.com/sigp/lighthouse/pull/7067
For:
- https://github.com/sigp/lighthouse/issues/7039
- Prevent writing to state cache when migrating the database
- Add `state-cache-headroom` flag to control pruning
- Prune old epoch boundary states ahead of mid-epoch states
- Never prune head block's state
- Avoid caching ancestor states unless they are on an epoch boundary
- Log when states enter/exit the cache
Co-authored-by: Eitan Seri-Levi <eserilev@ucsc.edu>
- #6452 (partially)
Remove dependencies on `store` and `lighthouse_network` from `eth2`. This was achieved as follows:
- depend on `enr` and `multiaddr` directly instead of using `lighthouse_network`'s reexports.
- make `lighthouse_network` responsible for converting between API and internal types.
- in two cases, remove complex internal types and use the generic `serde_json::Value` instead - this is not ideal, but should be fine for now, as it affects two internal non-spec endpoints which are meant for debugging, are unstable, and are subject to change without notice anyway. Inspired by #6679. The alternative is to move all relevant types to `eth2` or `types` instead - what do you think?
NA
Bumps the `ethereum_ssz` version, along with other crates that share the dep.
Primarily, this gives us bitfields which can store 128 bytes on the stack before allocating, rather than 32 bytes (https://github.com/sigp/ethereum_ssz/pull/38). The validator count has increased massively since we set it at 32 bytes, so aggregation bitfields (et al) now require a heap allocation. This new value of 128 should get us to ~2m active validators.
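A back-of-the-envelope check of that claim, using mainnet preset values (the 128-byte on-stack capacity itself comes from the linked `ethereum_ssz` change):

```rust
fn main() {
    const SMALLVEC_BYTES: u64 = 128;         // new on-stack capacity
    const SLOTS_PER_EPOCH: u64 = 32;         // mainnet preset
    const MAX_COMMITTEES_PER_SLOT: u64 = 64; // mainnet preset

    // One aggregation bit per committee member (ignoring the SSZ length bit).
    let bits_per_committee = SMALLVEC_BYTES * 8;                          // 1024
    let committees_per_epoch = SLOTS_PER_EPOCH * MAX_COMMITTEES_PER_SLOT; // 2048

    // Validators that fit without a heap allocation: ~2.1M.
    assert_eq!(bits_per_committee * committees_per_epoch, 2_097_152);
}
```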
This change makes the `total_difficulty` field in `ExecutionBlock` an `Option<Uint256>` since newer clients are no longer including the `totalDifficulty` field.
I think this will fix https://github.com/sigp/lighthouse/issues/6937 but I was actually more focused on the builder registration case described below.
In our [builder-playground](https://github.com/flashbots/builder-playground) we set up a local devnet using lighthouse, reth, and mev-boost-relay. After upgrading to reth 1.2.0 and lighthouse v7.0.0.beta.0 for Pectra, we noticed that the validator registration process was _sometimes_ failing with:
```
Feb 25 23:35:25.038 ERRO Unable to publish proposer preparation to all beacon nodes, error: Some endpoints failed, num_failed: 1 http://localhost:3500/ => RequestFailed(ServerMessage(ErrorMessage { code: 400, message: "BAD_REQUEST: error updating proposer preparations: ForkchoiceUpdate(EngineError(Api { error: Json(Error(\"missing field `totalDifficulty`\", line: 0, column: 0)) }))", stacktraces: [] })), service: preparation
Feb 25 23:35:25.099 WARN Unable to publish validator registrations to the builder network, error: Some endpoints failed, num_failed: 1 http://localhost:3500/ => RequestFailed(ServerMessage(ErrorMessage { code: 400, message: "BAD_REQUEST: error updating proposer preparations: ForkchoiceUpdate(EngineError(Api { error: Json(Error(\"missing field `totalDifficulty`\", line: 0, column: 0)) }))", stacktraces: [] })), service: preparation
```
What was even more confusing was that it sometimes worked, which led to a wild goose chase thinking it was a networking issue. However, when tracing through the LH code, I came across this comment:
70194dfc6a/beacon_node/beacon_chain/src/beacon_chain.rs (L6048-L6049)
This explained why it sometimes worked: in our playground we run lighthouse with `--prepare-payload-lookahead 8000`, so there was always a 4-second window where the call wasn't made.
But if the call was made, then this code would always fail with the updated reth:
https://github.com/sigp/lighthouse/blob/unstable/beacon_node/execution_layer/src/lib.rs#L1688-L1692
This would then be mapped to an `Error::ForkchoiceUpdate` in `update_execution_engine_forkchoice`.
Anyway, the fix was to make `total_difficulty` an `Option`, and then update any code paths where it was used. In doing so, I assume that if the EL doesn't include the total difficulty, then the chain is already post-merge.
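The shape of the change, sketched with a trimmed-down struct (the real type is Lighthouse's `ExecutionBlock` with `Uint256`; this stand-in only shows the optional field):

```rust
use serde::Deserialize;

/// Trimmed-down stand-in for `ExecutionBlock`: newer ELs may omit
/// `totalDifficulty`, so deserialisation must not require it.
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct ExecutionBlock {
    block_hash: String,
    // A missing field now deserialises to `None` instead of producing the
    // "missing field `totalDifficulty`" error shown above.
    total_difficulty: Option<u128>, // `Uint256` in Lighthouse
}
```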
PeerDAS has undergone multiple refactors, and blending it with the `get_blobs` optimization has generated technical debt.
A function signature like this
f008b84079/beacon_node/beacon_chain/src/beacon_chain.rs (L7171-L7178)
allows at least the following combinations of states:
- blobs: Some / None
- data_columns: Some / None
- data_column_recv: Some / None
- Block has data? Yes / No
- Block post-PeerDAS? Yes / No
In reality, we don't have that many possible states, only:
- `NoData`: pre-deneb, pre-PeerDAS with 0 blobs or post-PeerDAS with 0 blobs
- `Blobs(BlobSidecarList<E>)`: post-Deneb pre-PeerDAS with > 0 blobs
- `DataColumns(DataColumnSidecarList<E>)`: post-PeerDAS with > 0 blobs
- `DataColumnsRecv(oneshot::Receiver<DataColumnSidecarList<E>>)`: post-PeerDAS with > 0 blobs, but we obtained the columns via reconstruction
^ these are the variants of the new `AvailableBlockData` enum
So we go from 2^5 possible states to 4 well-defined ones. Downstream code benefits nicely from this clarity, and I think it makes the whole feature much more maintainable.
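Spelled out as a Rust enum (sketch only: the list types here are placeholders, and the real variants are generic over `E: EthSpec` as in the bullets above):

```rust
// Placeholder list types standing in for Lighthouse's generic sidecar lists.
type BlobSidecarList = Vec<Vec<u8>>;
type DataColumnSidecarList = Vec<Vec<u8>>;

/// The four well-defined availability states replacing the old
/// `Option` / `Option` / `Option` combination.
enum AvailableBlockData {
    /// Pre-Deneb, or a post-Deneb / post-PeerDAS block with zero blobs.
    NoData,
    /// Post-Deneb, pre-PeerDAS block with > 0 blobs.
    Blobs(BlobSidecarList),
    /// Post-PeerDAS block with > 0 blobs.
    DataColumns(DataColumnSidecarList),
    /// Post-PeerDAS block with > 0 blobs, columns obtained later via reconstruction.
    DataColumnsRecv(tokio::sync::oneshot::Receiver<DataColumnSidecarList>),
}
```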
Currently `is_available` returns a bool, and then we construct the available block in `make_available`. In a way the availability condition is duplicated in both functions. Instead, this PR constructs `AvailableBlockData` in `is_available`, so the availability conditions are written once:
```rust
if let Some(block_data) = is_available(..) {
let available_block = make_available(block_data);
}
```