Fix the failure of the beacon-chain tests for phase0/altair, which now only run nightly.
Just skip the payload invalidation tests; they don't make any sense prior to Bellatrix anyway.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
The flow for local block building is:
1. Create execution payload and bid
2. Construct beacon block
3. Sign beacon block and publish
4. Sign execution payload and publish
This PR adds the beacon block v4 flow, the GET payload envelope endpoint, and the POST payload envelope endpoint (local block building only). The spec for these endpoints can be found at https://github.com/ethereum/beacon-APIs/pull/552 and is subject to change.
We needed a way to store the unsigned execution payload envelope associated with the execution payload bid that was included in the block. I introduced a new cache that stores these unsigned execution payload envelopes. The GET payload envelope endpoint queries this cache directly so that a proposer, after publishing a block, can fetch the payload envelope, then sign and publish it.
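A minimal sketch of the idea, assuming an LRU keyed by beacon block root and the `ExecutionPayloadEnvelope` type name from the spec (the actual key and eviction policy may differ):

```rust
use lru::LruCache;
use types::{EthSpec, ExecutionPayloadEnvelope, Hash256};

/// Hypothetical sketch: unsigned envelopes produced during local block building are
/// stored here so the proposer can retrieve, sign and publish them after the block.
pub struct PayloadEnvelopeCache<E: EthSpec> {
    envelopes: LruCache<Hash256, ExecutionPayloadEnvelope<E>>,
}

impl<E: EthSpec> PayloadEnvelopeCache<E> {
    /// Store the unsigned envelope at block production time.
    pub fn insert(&mut self, block_root: Hash256, envelope: ExecutionPayloadEnvelope<E>) {
        self.envelopes.put(block_root, envelope);
    }

    /// Backs the GET payload envelope endpoint.
    pub fn get(&mut self, block_root: &Hash256) -> Option<&ExecutionPayloadEnvelope<E>> {
        self.envelopes.get(block_root)
    }
}
```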
I kept payload signing and publishing within the validator client's block service to keep things simple for now. The idea was to build out a block production MVP for devnet 0, avoid affecting any non-Gloas code paths, and build things out in such a way that it will be easy to deprecate pre-Gloas code paths later on (for example block production v2 and v3).
We will eventually need to track which beacon node was queried for the block so that we can later query it for the payload. But that's not needed for the devnet.
Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
- Step 0 of the tree-sync roadmap https://github.com/sigp/lighthouse/issues/7678
Current lookup sync tests are written in an explicit way that assumes how the internals of lookup sync work. For example, a test would do:
- Emit unknown block parent message
- Expect block request for X
- Respond with successful block request
- Expect block processing request for X
- Respond with successful processing request
- etc.
This is unnecessarily verbose, and it requires a complete re-write whenever something changes in the internals of lookup sync (which has happened a few times, mostly for Deneb and Fulu).
What we really want to assert is:
- WHEN: we receive an unknown block parent message
- THEN: Lookup sync can sync that block
- ASSERT: Without penalizing peers, without unnecessary retries
Keep all existing tests and add new cases, written in the new style described above (a hypothetical example is sketched after the list below). The logic to serve and respond to requests is in the function `fn simulate` 2288a3aeb1/beacon_node/network/src/sync/tests/lookups.rs (L301)
- It controls peer behavior based on a `CompleteStrategy`, where you can, for example, set "respond to BlocksByRoot requests with empty".
- It actually runs beacon processor messages by executing their closures. Sync tests now actually import blocks, extending test coverage to the interaction of sync and the da_checker.
- To achieve the above, the tests create real blocks with the test harness. To keep the tests as fast as before, I disabled crypto with `TestConfig`.
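A hypothetical sketch of a test in the new style (apart from `simulate` and `CompleteStrategy`, the helper names are illustrative rather than the real `TestRig` API):

```rust
#[test]
fn unknown_parent_block_is_synced() {
    let mut rig = TestRig::test_setup();
    // WHEN: we receive an unknown block parent message.
    let block = rig.create_block_with_unknown_parent(); // illustrative helper
    rig.send_unknown_parent_block(&block); // illustrative helper
    // THEN: lookup sync syncs the block, with peers answering according to the strategy.
    rig.simulate(CompleteStrategy::default());
    // ASSERT: the block is imported without penalising peers or unnecessary retries.
    rig.assert_block_imported(&block); // illustrative helper
    rig.assert_no_peers_penalized(); // illustrative helper
}
```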
Along the way I found a couple of bugs, which I documented in the diff.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Currently, `consensus/types` cannot build with `--no-default-features` since we use "legacy" standard arithmetic operations.
- Remove the offending arithmetic to fix compilation (a sketch of the kind of change follows below the list).
- Rename `legacy-arith` to `saturating-arith` and disable it by default.
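A hedged illustration of the kind of change (plain `u64` for simplicity): without the renamed, now-optional feature, bare operators are replaced with the checked `safe_arith` equivalents, which compile regardless of the feature set.

```rust
use safe_arith::{ArithError, SafeArith};

// Before (sketch): `index * count + offset`, which relies on std ops and can overflow.
// After: checked arithmetic via the SafeArith trait.
fn flat_index(index: u64, count: u64, offset: u64) -> Result<u64, ArithError> {
    index.safe_mul(count)?.safe_add(offset)
}
```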
Co-Authored-By: Mac L <mjladson@pm.me>
This reverts some of the changes from #8524 by adding back the typed network endpoints with an optional `network` feature. Without the `network` feature, these endpoints (and associated dependencies) will not be built.
This means the `enr`, `multiaddr` and `libp2p-identity` dependencies have returned, but are now optional.
Co-Authored-By: Mac L <mjladson@pm.me>
I accidentally broke `unstable` while merging some missed commits from `release-v8.0`. The merge was clean but semantically broken, and I didn't notice because I pushed without running CI 😬
- Fix the regression test added for #8528, for compatibility with the recent `RpcBlock` changes. I'm passing `is_available = false` which seems correct for this test.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
N/A
In https://github.com/sigp/lighthouse/pull/4801, we added a state LRU cache to avoid having too many states in memory, which was a concern with 200MB+ states pre-tree-states.
With https://github.com/sigp/lighthouse/pull/5891, we made the overflow cache a simpler in-memory LRU cache that can hold at most 32 pending states and doesn't flush anything to disk. As noted in #5891, we can always fetch older blocks which never became available over RPC if they become available later.
Since we merged tree-states, I don't think the state LRU cache is relevant anymore. Instead of having the `DietAvailabilityPendingExecutedBlock` that stores only the state root, we can just store the full state in the `AvailabilityPendingExecutedBlock`.
Given that entries in the cache can span at most one epoch (the cache size is 32), the underlying `BeaconState` objects in the cache share most of their memory. The `state_lru_cache` is a level of indirection that doesn't give us any benefit.
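A simplified sketch of the resulting shape (fields trimmed; the real struct carries additional import data): the pending block holds the post-state directly instead of a state root plus re-hydration logic.

```rust
use std::sync::Arc;
use types::{BeaconState, EthSpec, SignedBeaconBlock};

// Simplified sketch only, not the exact field layout.
pub struct AvailabilityPendingExecutedBlock<E: EthSpec> {
    pub block: Arc<SignedBeaconBlock<E>>,
    // With tree-states, the <= 32 states that can be pending at once share most of
    // their backing memory, so storing them directly is cheap.
    pub state: BeaconState<E>,
}
```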
Please check me on this cc @dapplion
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
Adds support for payload envelopes in the db. This is the minimum we'll need to store and fetch payloads.
Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>
Pulling out consensus type changes from #8677.
This PR covers all type changes for spec 1.7.0-alpha.1 (except for `DataColumnSidecar` changes, which are covered in @eserilev's PR #8682).
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Removes the remaining facade re-exports from `consensus/types`.
I have left `graffiti`, as I think it has some utility, so I am leaning towards keeping it in the final API design.
Co-Authored-By: Mac L <mjladson@pm.me>
Removes some of the temporary re-exports in `consensus/types`.
I am doing this in multiple parts to keep each diff small.
Co-Authored-By: Mac L <mjladson@pm.me>
#8652
- This removes instances of `BeaconStateError` from `eth_spec.rs`, and replaces them directly with `ArithError`, which can be trivially converted back to `BeaconStateError` at the call site (see the sketch below).
- Also moves the state-related methods on `ChainSpec` to be methods on `BeaconState` instead. I think this might be a more natural place for them to exist anyway.
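A hedged sketch of the call-site conversion (the helper below is hypothetical): an `eth_spec.rs`-style method returns `ArithError`, and `?` converts it via `BeaconStateError`'s `From<ArithError>` impl.

```rust
use safe_arith::{ArithError, SafeArith};
use types::BeaconStateError;

// Hypothetical helper in the style of eth_spec.rs, now returning ArithError...
fn committees_per_slot(active_validators: u64, target_committee_size: u64) -> Result<u64, ArithError> {
    active_validators.safe_div(target_committee_size)
}

// ...and a call site that trivially converts back to BeaconStateError with `?`.
fn committees_per_slot_checked(active: u64, target: u64) -> Result<u64, BeaconStateError> {
    Ok(committees_per_slot(active, target)?)
}
```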
Co-Authored-By: Mac L <mjladson@pm.me>
N/A
The `beacon_data_column_sidecar_computation_seconds` metric used to record the full KZG proof generation time, before we changed `getBlobsV2` to just return the full proofs + cells. The measured operation should now take far less than 100ms, which was previously the minimum bucket.
Update the metric to use the default buckets for better granularity.
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
Closes #8426
Added a new regression test: `reproduction_unaligned_checkpoint_sync_pruned_payload`.
This test reproduces the bug where unaligned checkpoint syncs (skipped slots at epoch boundaries) fail to import the anchor block's execution payload when `prune_payloads` is enabled.
The test simulates the failure mode by:
- Skipping if execution payloads are not applicable.
- Creating a harness with an unaligned checkpoint (gap of 3 slots).
- Configuring the client with `prune_payloads = true`.
It asserts that the Beacon Chain builds successfully (previously it panicked with `MissingFullBlockExecutionPayloadPruned`), confirming the fix logic in `try_get_full_block`.
Co-Authored-By: Andrurachi <andruvrch@gmail.com>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
There are certain crates which we re-export within `types`; this creates a fragmented DevEx where there are various ways to import the same crates.
```rust
// consensus/types/src/lib.rs
pub use bls::{
    AggregatePublicKey, AggregateSignature, Error as BlsError, Keypair, PUBLIC_KEY_BYTES_LEN,
    PublicKey, PublicKeyBytes, SIGNATURE_BYTES_LEN, SecretKey, Signature, SignatureBytes,
    get_withdrawal_credentials,
};
pub use context_deserialize::{ContextDeserialize, context_deserialize};
pub use fixed_bytes::FixedBytesExtended;
pub use milhouse::{self, List, Vector};
pub use ssz_types::{BitList, BitVector, FixedVector, VariableList, typenum, typenum::Unsigned};
pub use superstruct::superstruct;
```
This PR removes these re-exports and makes it explicit that these types are imported from a non-`consensus/types` crate.
Co-Authored-By: Mac L <mjladson@pm.me>
Remove certain dependencies from `eth2`, and feature-gate others which are only used by certain endpoints.
| Removed | Optional | Dev only |
| -------- | -------- | -------- |
| `either`, `enr`, `libp2p-identity`, `multiaddr` | `protoarray`, `eth2_keystore`, `eip_3076`, `zeroize`, `reqwest-eventsource`, `futures`, `futures-util` | `rand`, `test_random_derive` |
This is done by adding an `events` feature which enables the events endpoint and its associated dependencies.
The `lighthouse` feature also enables its associated dependencies, making them optional.
The networking-adjacent dependencies were removed by just having certain fields use a `String` instead of an explicit network type. This means the user should handle conversion at the call site instead. This is a bit spicy, but I believe `PeerId`, `Enr` and `Multiaddr` are easily converted to and from `String`s, so I think it's fine and reduces our dependency space by a lot. The alternative is to feature-gate these types behind a `network` feature instead.
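A small hedged example of the call-site conversion now that these fields are plain `String`s (error handling simplified):

```rust
use libp2p_identity::PeerId;

// Parse an API-provided peer id string back into the typed form when you need it.
fn peer_id_from_api(peer_id: &str) -> Result<PeerId, String> {
    peer_id
        .parse::<PeerId>()
        .map_err(|e| format!("invalid peer id: {e:?}"))
}
```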
Co-Authored-By: Mac L <mjladson@pm.me>
None
I noticed that `observed_column_sidecars` is missing its prune call in the finalization handler, which results in a slow memory leak on long-running nodes (roughly **7MB/day**):
13dfa9200f/beacon_node/beacon_chain/src/canonical_head.rs (L940-L959)
Both caches use the same generic type `ObservedDataSidecars<T>`:
22ec4b3271/beacon_node/beacon_chain/src/beacon_chain.rs (L413-L416)
The type's documentation explicitly requires manual pruning:
> "*The cache supports pruning based upon the finalized epoch. It does not automatically prune, you must call Self::prune manually.*"
b4704eab4a/beacon_node/beacon_chain/src/observed_data_sidecars.rs (L66-L74)
Currently:
- `observed_blob_sidecars` => pruned
- `observed_column_sidecars` => **NOT** pruned
Without pruning, the underlying HashMap accumulates entries indefinitely, causing continuous memory growth until the node restarts.
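A hedged sketch of the fix as it would sit in the finalization handler, assuming the column cache's `prune` has the same signature as the blob cache's:

```rust
// Fragment from the finalization handler (sketch): prune both observation caches.
let new_finalized_slot = new_finalized_checkpoint
    .epoch
    .start_slot(T::EthSpec::slots_per_epoch());
self.observed_blob_sidecars.write().prune(new_finalized_slot);
// Previously missing: the column cache must be pruned manually too.
self.observed_column_sidecars.write().prune(new_finalized_slot);
```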
Co-Authored-By: Antoine James <antoine@ethereum.org>
Fix a bug in `verify_header_signature` which tripped up some Lighthouse nodes at the Fusaka fork. The bug was latent in a function that has been present for a long time but was only used by slashers. With Fulu it entered the critical path of blob/column verification. Call stack:
- `FetchBlobsBeaconAdapter::process_engine_blobs`
- `BeaconChain::process_engine_blobs`
- `BeaconChain::check_engine_blobs_availability_and_import`
- `BeaconChain::check_blob_header_signature_and_slashability`
- `verify_header_signature`
Thanks @eserilev for quickly diagnosing the root cause.
Change `verify_header_signature` to use `ChainSpec::fork_at_epoch` to compute the `Fork`, rather than using the head state's fork. At a fork boundary the head state's fork is stale and lacks the data for the new fork. Using `fork_at_epoch` ensures that we use the correct fork data and validate the transition block's signature correctly.
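A hedged sketch of the core of the change (surrounding names approximate):

```rust
// Derive the fork from the spec at the header's epoch, instead of reading the
// possibly-stale fork from the head state. `fork` is then used to compute the
// signing domain for the header signature check.
let header_epoch = header.message.slot.epoch(T::EthSpec::slots_per_epoch());
let fork = chain.spec.fork_at_epoch(header_epoch);
```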
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Organize and categorize `consensus/types` into modules based on their relation to key consensus structures/concepts.
This is a precursor to a sensible public interface.
While this refactor is very opinionated, I am open to suggestions on module names or type groupings if my current ones are inappropriate.
Co-Authored-By: Mac L <mjladson@pm.me>
Instrument beacon node startup and parallelise pubkey cache initialisation.
I instrumented beacon node startup and noticed that pubkey cache takes a long time to initialise, mostly due to decompressing all the validator pubkeys.
This PR uses rayon to parallelize the decompression on initial checkpoint sync. The pubkeys are stored uncompressed, so the decompression time is not a problem on subsequent restarts. On restarts, we still deserialize pubkeys, but the timing is quite minimal on Sepolia so I didn't investigate further.
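A minimal sketch of the parallelisation (cache plumbing omitted):

```rust
use rayon::prelude::*;

// Decompress all validator pubkeys in parallel when the cache is first built from
// the checkpoint state; `collect` surfaces the first error if any decompression fails.
fn decompress_pubkeys(pubkeys: &[bls::PublicKeyBytes]) -> Result<Vec<bls::PublicKey>, bls::Error> {
    pubkeys.par_iter().map(|pk| pk.decompress()).collect()
}
```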
`validator_pubkey_cache_new` timing on Sepolia:
* before: 109.64ms
* with parallelization: 21ms
on Hoodi:
* before: times out with Kurtosis after 120s
* with parallelization: 12.77s to import keys
**UPDATE**: downloading checkpoint state + genesis state takes about 2 minutes on my laptop, so it seems like the BN managed to start the http server just before timing out (after the optimisation).
<img width="1380" height="625" alt="image" src="https://github.com/user-attachments/assets/4c548c14-57dd-4b47-af9a-115b15791940" />
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
This is a `tracing`-driven optimisation. While investigating why Lighthouse is slow to send `newPayload`, I found a suspicious 13ms of computation on the hot path in `gossip_block_into_execution_pending_block_slashable`:
<img width="1998" height="1022" alt="headercalc" src="https://github.com/user-attachments/assets/e4f88c1a-da23-47b4-b533-cf5479a1c55c" />
Looking at the current implementation we can see that the _only_ thing that happens prior to calling into `from_gossip_verified_block` is the calculation of a `header`. We first call `SignatureVerifiedBlock::from_gossip_verified_block_check_slashable`:
261322c3e3/beacon_node/beacon_chain/src/block_verification.rs (L1075-L1076)
Which is where the `header` is calculated prior to calling `from_gossip_verified_block`:
261322c3e3/beacon_node/beacon_chain/src/block_verification.rs (L1224-L1226)
Notice that the `header` is _only_ used in the case of an error, yet we spend time computing it every time!
This PR moves the calculation of the header (which involves hashing the whole beacon block, including the execution payload) into the error case. We take a cheap clone of the `Arc`'d beacon block on the hot path, and use it to calculate the header _only_ in the case an error actually occurs. This shaves 10-20ms off our pre-newPayload delays, and 10-20ms off the processing of every block 🎉
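A hedged sketch of the resulting shape (names approximate):

```rust
// Cheap Arc clone on the hot path; the header (a tree-hash of the whole block,
// payload included) is only computed if an error actually occurs.
let block = from.block_cloned();
Self::from_gossip_verified_block(from, chain)
    .map_err(|e| BlockSlashInfo::from_early_error_block(block.signed_block_header(), e))
```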
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
We don't want to require checkpoint sync starts to include the required custody data columns; instead, we fetch them from P2P.
Closes https://github.com/sigp/lighthouse/issues/6837
The checkpoint sync slot can:
1. Be the first slot in the epoch, such that the epoch of the block == the start checkpoint epoch
2. Be in an epoch prior to the start checkpoint epoch
In both cases, backfill sync already fetches that epoch's worth of blocks with the current code. This PR modifies the backfill import filter function to allow re-importing the oldest block slot in the DB.
I feel this solution is sufficient unless I'm missing something. ~~I have not tested this yet!~~ Michael has tested this and it works.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Take 2 of #8390.
Fixes the race condition properly instead of propagating the error. I think this is a better alternative, and doesn't seem to look that bad.
* Lift node id loading or generation from `NetworkService` startup to the `ClientBuilder`, so that it can be used to compute custody columns for the beacon chain without waiting for Network bootstrap.
I've considered and implemented a few alternatives:
1. Passing `node_id` to the beacon chain builder and computing columns when creating `CustodyContext`. This approach isn't great for separation of concerns or testability.
2. Passing `ordered_custody_groups` to the beacon chain. `CustodyContext` only uses this to compute ordered custody columns, so we might as well lift this logic out, avoiding error handling in `CustodyContext` construction. Fewer tests to update.
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Fix the span on execution payload verification (newPayload) by creating a new span rather than using the parent span. Using the parent span was incorrectly associating the time spent verifying the payload with `from_signature_verified_components`.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Part of a fork-choice tech debt clean-up: https://github.com/sigp/lighthouse/issues/8325. https://github.com/sigp/lighthouse/issues/7089 (non-finalized checkpoint sync) changes the meaning of the checkpoints inside fork-choice. It turns out that we persist the justified and finalized checkpoints **twice** in fork-choice:
1. Inside the fork-choice store
2. Inside the proto-array
There's no reason for 2. except making some function signatures smaller. It's also inconsistent with the rest of the crate: in some functions we pass the external time variable (`current_slot`) via args, but then read the finalized checkpoint from internal state. Passing both variables as args makes fork-choice easier to reason about, at the cost of a few extra lines.
Remove the unnecessary state (`justified_checkpoint`, `finalized_checkpoint`) inside `ProtoArray`, to make it easier to reason about.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
State advances were observed as especially slow on pre-Fulu networks (mainnet).
The reason being: we were doing an extra epoch of state advance because of code that should only have been running after Fulu, when proposer shufflings are determined with lookahead.
Only attempt to cache the _next epoch_ shuffling if the state's slot determines it (this will only be true post-Fulu). Reusing the logic for `proposer_shuffling_decision_slot` avoids having to repeat the fiddly logic about the Fulu fork epoch itself.
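A hedged sketch of the gating, with the helper name taken from the description above and signatures assumed:

```rust
// Only pre-compute the next epoch's proposer shuffling when the advanced state's
// slot is the decision slot for that shuffling, which is only the case post-Fulu,
// where proposer lookahead is part of the state.
let next_epoch = state.next_epoch()?;
if state.slot() == proposer_shuffling_decision_slot(&state, next_epoch) {
    // Build and cache the next epoch's proposer shuffling.
}
```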
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>