Commit Graph

1467 Commits

Author SHA1 Message Date
Michael Sproul
c2c7fb87a8 Make DAG construction more permissive (#7460)
Workaround/fix for:

- https://github.com/sigp/lighthouse/issues/7323


  - Remove the `StateSummariesNotContiguousError`. This allows us to continue with DAG construction and pruning, even in the case where the DAG is disjointed. We will treat any disjoint summaries as roots of their own tree, and prune them (as they are not descended from finalized). This should be safe, as canonical summaries should not be disjoint (if they are, then the DB is already corrupt).
2025-05-15 02:15:35 +00:00
Eitan Seri-Levi
807848bc7a Next sync committee branch bug (#7443)
#7441


  Make sure we're correctly caching light client data
2025-05-13 01:13:15 +00:00
SunnysidedJ
593390162f peerdas-devnet-7: update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier (#7399)
Update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier #7377


  As described in https://github.com/ethereum/consensus-specs/pull/4284
2025-05-12 00:20:55 +00:00
Jimmy Chen
4b9c16fc71 Add Electra forks to basic sim tests (#7199)
This PR adds transitions to Electra ~~and Fulu~~ fork epochs in the simulator tests.

~~It also covers blob inclusion verification and data column syncing on a full node in Fulu.~~

UPDATE: Remove fulu fork from sim tests due to https://github.com/sigp/lighthouse/pull/7199#issuecomment-2852281176
2025-05-08 08:43:44 +00:00
Jimmy Chen
93ec9df137 Compute proposer shuffling only once in gossip verification (#7304)
When we perform data column gossip verification, we sometimes see multiple proposer shuffling cache miss simultaneously and this results in multiple threads computing the shuffling cache and potentially slows down the gossip verification.

Proposal here is to use a `OnceCell` for each shuffling key to make sure it's only computed once. I have only implemented this in data column verification as a PoC, but this can also be applied to blob and block verification

Related issues:
- https://github.com/sigp/lighthouse/issues/4447
- https://github.com/sigp/lighthouse/issues/7203
2025-05-01 01:30:42 +00:00
Eitan Seri-Levi
9779b4ba2c Optimize validate_data_columns (#7326) 2025-04-30 04:36:50 +00:00
Michael Sproul
e61e92b926 Merge remote-tracking branch 'origin/stable' into unstable 2025-04-22 18:55:06 +10:00
Jean-Baptiste Pinalie
5352d5f78a Update proposer_slashings and attester_slashings amounts for electra. (#7316)
Did not find a specific issue beside https://github.com/sigp/lighthouse/issues/6821


  Leverage `whistleblower_reward_quotient_for_state` to have accurate post-electra `proposer_slashings` and `attester_slashings` fields returned by `/eth/v1/beacon/rewards/blocks/<id>`.
2025-04-17 00:58:36 +00:00
Lion - dapplion
be68dd24d0 Fix wrong custody column count for lookup blocks (#7281)
Fixes
- https://github.com/sigp/lighthouse/issues/7278


  Don't assume 0 columns for `RpcBlockInner::Block`
2025-04-11 22:00:57 +00:00
Mac L
39eb8145f8 Merge branch 'release-v7.0.0' into unstable 2025-04-11 21:32:24 +10:00
Eitan Seri-Levi
aed562abef Downgrade light client errors (#7300)
Downgrade light client errors to debug

Error messages are alarming and usually indicate somethings wrong with the beacon node. The Light Client service is supposed to minimally impact users, and most will not care if the light client server is erroring. Furthermore, the only errors we've seen in the wild are during hard forks, for the first few epochs before the fork finalizes.
2025-04-10 02:17:07 +00:00
SunnysidedJ
d96b73152e Fix for #6296: Deterministic RNG in peer DAS publish block tests (#7192)
#6296: Deterministic RNG in peer DAS publish block tests


  Made test functions to call publish-block APIs with true for the deterministic RNG boolean parameter while production code with false. This will deterministically shuffle columns for unit tests under broadcast_validation_tests.rs.
2025-04-09 15:35:15 +00:00
Jimmy Chen
759b0612b3 Offloading KZG Proof Computation from the beacon node (#7117)
Addresses #7108

- Add EL integration for `getPayloadV5` and `getBlobsV2`
- Offload proof computation and use proofs from EL RPC APIs
2025-04-08 07:37:16 +00:00
Lion - dapplion
70850fe58d Drop head tracker for summaries DAG (#6744)
The head tracker is a persisted piece of state that must be kept in sync with the fork-choice. It has been a source of pruning issues in the past, so we want to remove it
- see https://github.com/sigp/lighthouse/issues/1785

When implementing tree-states in the hot DB we have to change the pruning routine (more details below) so we want to do those changes first in isolation.
- see https://github.com/sigp/lighthouse/issues/6580
- If you want to see the full feature of tree-states hot https://github.com/dapplion/lighthouse/pull/39

Closes https://github.com/sigp/lighthouse/issues/1785


  **Current DB migration routine**

- Locate abandoned heads with head tracker
- Use a roots iterator to collect the ancestors of those heads can be pruned
- Delete those abandoned blocks / states
- Migrate the newly finalized chain to the freezer

In summary, it computes what it has to delete and keeps the rest. Then it migrates data to the freezer. If the abandoned forks routine has a bug it can break the freezer migration.

**Proposed migration routine (this PR)**

- Migrate the newly finalized chain to the freezer
- Load all state summaries from disk
- From those, just knowing the head and finalized block compute two sets: (1) descendants of finalized (2) newly finalized chain
- Iterate all summaries, if a summary does not belong to set (1) or (2), delete

This strategy is more sound as it just checks what's there in the hot DB, computes what it has to keep and deletes the rest. Because it does not rely and 3rd pieces of data we can drop the head tracker and pruning checkpoint. Since the DB migration happens **first** now, as long as the computation of the sets to keep is correct we won't have pruning issues.
2025-04-07 04:23:52 +00:00
Pawan Dhananjay
091e292c99 Return eth1_data early post transition (#7248)
N/A


  Return state.eth1_data() early if we have passed the transition period post electra. Even if we don't return early, the function would still return state.eth1_data() based on the current conditions. However, doing this explicitly here to match the spec. This covers setting the right eth1_data in our block.

The other thing we need to ensure is that the deposits returned by the eth1_chain is empty post transition.

The only way we get non-empty deposits post the transition is if `state.eth1_deposit_index` in the below code is less than `min(deposit_requests_start_index, state.eth1_data().deposit_count)`.
0850bcfb89/beacon_node/beacon_chain/src/eth1_chain.rs (L543-L579)

This can never happen because state.eth1_deposit_index will be equal to state.eth1_data.deposit count and cannot exceed the value.

@michaelsproul @ethDreamer please double check the logic for deposits being empty post transition. Following the logic in the spec makes my head hurt.
2025-04-07 03:16:48 +00:00
Lion - dapplion
d511ca0494 Compute roots for unfinalized by_range requests with fork-choice (#7098)
Includes PRs

- https://github.com/sigp/lighthouse/pull/7058
- https://github.com/sigp/lighthouse/pull/7066

Cleaner for the `release-v7.0.0` branch
2025-04-07 03:16:41 +00:00
Jimmy Chen
6a75f24ab1 Fix the getBlobs metric and ensure it is recorded promptly to prevent miscounts (#7188)
From testing conducted by Sunnyside Labs, they noticed that the "expected blobs" are quite low on bandwidth constrained nodes. This observation revealed that we don't record the `beacon_blobs_from_el_expected_total` metric at all if the EL doesn't return any response. The fetch blobs function returns without recording the metric.

To fix this, I've moved `BLOBS_FROM_EL_EXPECTED_TOTAL` and `BLOBS_FROM_EL_RECEIVED_TOTAL` to as early as possible, to make the metric more accurate.
2025-04-04 09:01:39 +00:00
Mac L
0e6da0fcaf Merge branch 'release-v7.0.0' into v7-backmerge 2025-04-04 13:32:58 +11:00
Mac L
82d1674455 Rust 1.86.0 lints (#7254)
Implement lints for the new Rust compiler version 1.86.0.
2025-04-04 02:30:22 +00:00
Michael Sproul
bde0f1ef0b Merge remote-tracking branch 'origin/release-v7.0.0' into unstable 2025-03-29 13:01:58 +11:00
Eitan Seri-Levi
a5ea05ce2a Top-up pubkey cache on startup (#7217)
This is a workaround for #7216


  In the case of gaps between the in-memory pub key cache and its on-disk representation, use the head state on startup to "top-up" the cache/db w/ any missing validators
2025-03-28 08:29:19 +00:00
Lion - dapplion
6f31d44343 Remove CGC from data_availability checker (#7033)
- Part of https://github.com/sigp/lighthouse/issues/6767

Validator custody makes the CGC and set of sampling columns dynamic. Right now this information is stored twice:
- in the data availability checker
- in the network globals

If that state becomes dynamic we must make sure it is in sync updating it twice, or guarding it behind a mutex. However, I noted that we don't really have to keep the CGC inside the data availability checker. All consumers can actually read it from the network globals, and we can update `make_available` to read the expected count of data columns from the block.
2025-03-26 05:19:51 +00:00
Eitan Seri-Levi
cbf1c04a14 resolve merge conflicts between untstable and release-v7.0.0 2025-03-23 11:09:02 -06:00
Eitan Seri-Levi
e4c9805438 Reject attestations to blocks prior to the split (#7084) 2025-03-19 13:39:28 +00:00
Eitan Seri-Levi
ed1b7689ae Manual compaction endpoint backport (#7104)
Backports:
- https://github.com/sigp/lighthouse/pull/7072

To:
- https://github.com/sigp/lighthouse/issues/7039

#7103 should be merged first


  This PR introduces an endpoint that allows users to manually trigger background compaction.
2025-03-18 06:29:12 +00:00
Eitan Seri-Levi
27aabe8159 Pseudo finalization endpoint (#7103)
This is a backport of:
-  https://github.com/sigp/lighthouse/pull/7059
- https://github.com/sigp/lighthouse/pull/7071

For:
- https://github.com/sigp/lighthouse/issues/7039


  Introduce a new lighthouse endpoint that allows a user to force a pseudo finalization. This migrates data to the freezer db and prunes sidechains which may help reduce disk space issues on non finalized networks like Holesky

We also ban peers that send us blocks that conflict with the manually finalized checkpoint.

There were some CI fixes in https://github.com/sigp/lighthouse/pull/7071 that I tried including here

Co-authored with: @jimmygchen  @pawanjay176 @michaelsproul
2025-03-18 05:21:05 +00:00
Michael Sproul
4de062626b State cache tweaks (#7095)
Backport of:

- https://github.com/sigp/lighthouse/pull/7067

For:

- https://github.com/sigp/lighthouse/issues/7039


  - Prevent writing to state cache when migrating the database
- Add `state-cache-headroom` flag to control pruning
- Prune old epoch boundary states ahead of mid-epoch states
- Never prune head block's state
- Avoid caching ancestor states unless they are on an epoch boundary
- Log when states enter/exit the cache

Co-authored-by: Eitan Seri-Levi <eserilev@ucsc.edu>
2025-03-18 02:10:21 +00:00
Eitan Seri-Levi
8ce9edc584 Add block ban flag --invalid-block-roots (#7042) 2025-03-17 13:18:22 +00:00
Daniel Knopik
574b204bdb decouple eth2 from store and lighthouse_network (#6680)
- #6452 (partially)


  Remove dependencies on `store` and `lighthouse_network`  from `eth2`. This was achieved as follows:

- depend on `enr` and `multiaddr` directly instead of using `lighthouse_network`'s reexports.
- make `lighthouse_network` responsible for converting between API and internal types.
- in two cases, remove complex internal types and use the generic `serde_json::Value` instead - this is not ideal, but should be fine for now, as this affects two internal non-spec endpoints which are meant for debugging, unstable, and subject to change without notice anyway. Inspired by #6679. The alternative is to move all relevant types to `eth2` or `types` instead - what do you think?
2025-03-14 16:44:48 +00:00
ThreeHrSleep
d60c24ef1c Integrate tracing (#6339)
Tracing Integration
- [reference](5bbf1859e9/projects/project-ideas.md (L297))


  - [x] replace slog & log with tracing throughout the codebase
- [x] implement custom crit log
- [x] make relevant changes in the formatter
- [x] replace sloggers
- [x] re-write SSE logging components

cc: @macladson @eserilev
2025-03-12 22:31:05 +00:00
Paul Hauner
8d1abce26e Bump SSZ version for larger bitfield SmallVec (#6915)
NA


  Bumps the `ethereum_ssz` version, along with other crates that share the dep.

Primarily, this give us bitfields which can store 128 bytes on the stack before allocating, rather than 32 bytes (https://github.com/sigp/ethereum_ssz/pull/38). The validator count has increase massively since we set it at 32 bytes, so aggregation bitfields (et al) now require a heap allocation. This new value of 128 should get us to ~2m active validators.
2025-03-10 08:18:33 +00:00
Michael Sproul
b4e79edf2a Merge remote-tracking branch 'origin/release-v7.0.0' into unstable 2025-03-10 15:21:24 +11:00
Jimmy Chen
09849e841b Use sync_tolerance_epochs flag to control the proposer prep routines (#7044)
Replace the `2 + 2 == 5` hacks from `holesky-rescue` and use the existing `sync_tolerance_epochs` flag to control the proposer prep routines.
2025-03-06 03:50:42 +00:00
Ryan Schneider
efa6ba37bb Make ExecutionBlock::total_difficulty Optional (#7050)
This change makes the `total_difficulty` field in `ExecutionBlock` an `Option<Uint256>` since newer clients are no longer including the `totalDifficulty` field.

I think this will fix https://github.com/sigp/lighthouse/issues/6937 but I was actually more focused on the builder registration case described below.

In our [builder-playground](https://github.com/flashbots/builder-playground) we setup a local devnet using lighthouse, reth, and mev-boost-relay.  After upgrading to reth 1.2.0 and lighthouse v7.0.0.beta.0 for Pectra, we noticed that the validator registration process was _sometimes_ failing with:

```
Feb 25 23:35:25.038 ERRO Unable to publish proposer preparation to all beacon nodes, error: Some endpoints failed, num_failed: 1 http://localhost:3500/ => RequestFailed(ServerMessage(ErrorMessage { code: 400, message: "BAD_REQUEST: error updating proposer preparations: ForkchoiceUpdate(EngineError(Api { error: Json(Error(\"missing field `totalDifficulty`\", line: 0, column: 0)) }))", stacktraces: [] })), service: preparation
Feb 25 23:35:25.099 WARN Unable to publish validator registrations to the builder network, error: Some endpoints failed, num_failed: 1 http://localhost:3500/ => RequestFailed(ServerMessage(ErrorMessage { code: 400, message: "BAD_REQUEST: error updating proposer preparations: ForkchoiceUpdate(EngineError(Api { error: Json(Error(\"missing field `totalDifficulty`\", line: 0, column: 0)) }))", stacktraces: [] })), service: preparation
```

What was even more confusing, was that it was sometimes working, which actually led to a wild goose chase thinking it was a networking issue.  However, when tracing through the LH code, I came across this comment:

70194dfc6a/beacon_node/beacon_chain/src/beacon_chain.rs (L6048-L6049)

This explained why it sometimes worked, in our playground we run lighthouse with `--prepare-payload-lookahead 8000` thus there was always a 4-second window where the call wasn't made.

But, if the call was made, then this code would 100% fail with updated reth:

https://github.com/sigp/lighthouse/blob/unstable/beacon_node/execution_layer/src/lib.rs#L1688-L1692

Which would then mapped to a `Error::ForkchoiceUpdate` in `update_execution_engine_forkchoice`.


  Anyways, the fix was to make `total_difficulty` Optional, and then to update any code paths where it was used.  In doing so, I assume that if the EL doesn't include total difficulty then the chain is already post-merge.
2025-03-05 01:53:00 +00:00
Michael Sproul
cf4104abe5 Merge remote-tracking branch 'origin/release-v7.0.0' into unstable 2025-02-25 08:42:16 +11:00
Krishang Shah
6e11bddd4b feat: adds CLI flags to delay publishing for edge case testing on PeerDAS devnets (#6947)
Closes #6919
2025-02-24 06:03:17 +00:00
Lion - dapplion
3fab6a2c0b Block availability data enum (#6866)
PeerDAS has undergone multiple refactors + the blending with the get_blobs optimization has generated technical debt.

A function signature like this

f008b84079/beacon_node/beacon_chain/src/beacon_chain.rs (L7171-L7178)

Allows at least the following combination of states:
- blobs: Some / None
- data_columns: Some / None
- data_column_recv: Some / None
- Block has data? Yes / No
- Block post-PeerDAS? Yes / No

In reality, we don't have that many possible states, only:

- `NoData`: pre-deneb, pre-PeerDAS with 0 blobs or post-PeerDAS with 0 blobs
- `Blobs(BlobSidecarList<E>)`: post-Deneb pre-PeerDAS with > 0 blobs
- `DataColumns(DataColumnSidecarList<E>)`: post-PeerDAS with > 0 blobs
- `DataColumnsRecv(oneshot::Receiver<DataColumnSidecarList<E>>)`: post-PeerDAS with > 0 blobs, but we obtained the columns via reconstruction

^ this are the variants of the new `AvailableBlockData` enum

So we go from 2^5 states to 4 well-defined. Downstream code benefits nicely from this clarity and I think it makes the whole feature much more maintainable.

Currently `is_available` returns a bool, and then we construct the available block in `make_available`. In a way the availability condition is duplicated in both functions. Instead, this PR constructs `AvailableBlockData` in `is_available` so the availability conditions are written once

```rust
if let Some(block_data) = is_available(..) {
let available_block = make_available(block_data);
}
```
2025-02-24 04:47:09 +00:00
Pawan Dhananjay
522b3cbaab Fix builder API headers (#7009)
Resolves https://github.com/sigp/lighthouse/issues/7000


  Set the accept header on builder to the correct value when requesting ssz.

This PR also adds a flag to disable ssz over the builder api altogether. In the case that builders/relays have an ssz bug, we can react quickly by asking clients to restart their nodes with the `--disable-ssz-builder` flag to force json. I'm not fully convinced if this is useful so open to removing it or opening another PR for it.

Testing this currently.
2025-02-24 03:39:13 +00:00
Michael Sproul
e5e43ecd81 Merge remote-tracking branch 'origin/release-v7.0.0' into unstable 2025-02-24 13:59:40 +11:00
Pawan Dhananjay
b3b6aea1c5 Rust 1.85 lints (#7019)
N/A


  2 changes:
1. Replace Option::map_or(true, ...) with is_none_or(...)
2. Remove unnecessary `Into::into` blocks where the type conversion is apparent from the types
2025-02-24 02:36:13 +00:00
Lion - dapplion
3992d6ba74 Fix misc PeerDAS todos (#6862)
Address misc PeerDAS TODOs that are not too big for a dedicated PR


  I'll justify each TODO on an inlined comment
2025-02-11 06:07:13 +00:00
Eitan Seri-Levi
afdda83798 Enable Light Client server by default (#6950) 2025-02-10 01:27:03 +00:00
Michael Sproul
0344f68cfd Update attestation rewards API for Electra (#6819)
Closes:

- https://github.com/sigp/lighthouse/issues/6818


  Use `MAX_EFFECTIVE_BALANCE_ELECTRA` (2048) for attestation reward calculations involving Electra.

Add a new `InteropGenesisBuilder` that tries to provide a more flexible way to build genesis states. Unfortunately due to lifetime jank, it is quite unergonomic at present. We may want to refactor this builder in future to make it easier to use.
2025-02-09 10:15:33 +00:00
Michael Sproul
2bd5bbdffb Optimise and refine SingleAttestation conversion (#6934)
Closes

- https://github.com/sigp/lighthouse/issues/6805


  - Use a new `WorkEvent::GossipAttestationToConvert` to handle the conversion from `SingleAttestation` to `Attestation` _on_ the beacon processor (prevents a Tokio thread being blocked).
- Improve the error handling for single attestations. I think previously we had no ability to reprocess single attestations for unknown blocks -- we would just error. This seemed to be the case in both gossip processing and processing of `SingleAttestation`s from the HTTP API.
- Move the `SingleAttestation -> Attestation` conversion function into `beacon_chain` so that it can return the `attestation_verification::Error` type, which has well-defined error handling and peer penalties. The now-unused variants of `types::Attestation::Error` have been removed.
2025-02-07 23:18:57 +00:00
Michael Sproul
cb117f859d Fix fetch blobs in all-null case (#6940)
Fix another issue with fetch-blobs, similar to:

- https://github.com/sigp/lighthouse/pull/6911


  Check if the list of blobs returned is all `None`, and if so, do not proceed any further.

This prevents an ugly error like:

> Feb 03 17:32:12.384 ERRO Error fetching or processing blobs from EL, block_root: 0x7326fe2dc1cb9036c9de7a07a662c86a339085597849016eadf061b70b7815ba, error: BlobProcessingError(AvailabilityCheck(Unexpected)), module
: network::network_beacon_processor:1011
2025-02-07 09:19:32 +00:00
chonghe
d6596dbe21 Keep execution payload during historical backfill when prune-payloads set to false (#6766)
- #6510


  - Keep execution payload during historical backfill when `--prune-payloads false` is set
- Add a field in the historical backfill debug log to indicate if execution payload is kept
- Add a test to check historical blocks has execution payload when `--prune-payloads false is set
- Very minor typo correction that I notice when working on this
2025-02-07 09:19:29 +00:00
Akihito Nakano
7408719de8 Remove unused metrics (#6817)
N/A


  Removed metrics that were defined but not used anywhere.
2025-02-07 07:48:52 +00:00
Krishang Shah
a4e3f361bf Update metrics.rs (#6863)
Fixes #5206, a low-hanging fruit.
2025-02-06 05:19:51 +00:00
Lion - dapplion
95cec45c38 Use data column batch verification consistently (#6851)
Resolve a `TODO(das)` to use KZG batch verification in `put_rpc_custody_columns`


  Uses `verify_kzg_for_data_column_list_with_scoring` in all paths that send more than one column. To use batch verification and have attributability of which peer is sending a bad column.

Needs to move `verify_kzg_for_data_column_list_with_scoring` into the type's module to convert to the KZG verified type.
2025-02-03 06:07:45 +00:00
Lion - dapplion
027bb973f8 Compute columns in post-PeerDAS checkpoint sync (#6760)
Addresses #6026.

Post-PeerDAS the DB expects to have data columns for the finalized block.


  Instead of forcing the user to submit the columns, this PR computes the columns from the blobs that we can already fetch from the checkpointz server or with the existing CLI options.

Note 1: (EDIT) Pruning concern addressed

Note 2: I have not tested this feature

Note 3: @michaelsproul an alternative I recall is to not require the blobs / columns at this point and expect backfill to populate the finalized block
2025-01-31 06:00:52 +00:00