lighthouse

mirror of https://github.com/sigp/lighthouse.git synced 2026-06-17 10:48:28 +00:00

Author	SHA1	Message	Date
Michael Sproul	0d0232e8fc	Optimise out block header calculation (#8446) This is a `tracing`-driven optimisation. While investigating why Lighthouse is slow to send `newPayload`, I found a suspicious 13ms of computation on the hot path in `gossip_block_into_execution_pending_block_slashable`: [image: headercalc] Looking at the current implementation we can see that the _only_ thing that happens prior to calling into `from_gossip_verified_block` is the calculation of a `header`. We first call `SignatureVerifiedBlock::from_gossip_verified_block_check_slashable`: `261322c3e3/beacon_node/beacon_chain/src/block_verification.rs (L1075-L1076)` Which is where the `header` is calculated prior to calling `from_gossip_verified_block`: `261322c3e3/beacon_node/beacon_chain/src/block_verification.rs (L1224-L1226)` Notice that the `header` is _only_ used in the case of an error, yet we spend time computing it every time! This PR moves the calculation of the header (which involves hashing the whole beacon block, including the execution payload), into the error case. We take a cheap clone of the `Arc`'d beacon block on the hot path, and use this for calculating the header _only_ in the case an error actually occurs. This shaves 10-20ms off our pre-newPayload delays, and 10-20ms off every block processing 🎉 Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-11-24 05:25:46 +00:00
Michael Sproul	261322c3e3	Merge remote-tracking branch 'origin/stable' into unstable	2025-11-20 13:04:32 +11:00
Jimmy Chen	af1d9b9991	Fix custody context initialization race condition that caused panic (#8391) Take 2 of #8390. Fixes the race condition properly instead of propagating the error. I think this is a better alternative, and doesn't seem to look that bad. * Lift node id loading or generation from `NetworkService` startup to the `ClientBuilder`, so that it can be used to compute custody columns for the beacon chain without waiting for Network bootstrap. I've considered and implemented a few alternatives: 1. passing `node_id` to beacon chain builder and computing columns when creating `CustodyContext`. This approach isn't good for separation of concerns and isn't great for testability. 2. passing `ordered_custody_groups` to beacon chain. `CustodyContext` only uses this to compute ordered custody columns, so we might as well lift this logic out, so we don't have to do error handling in `CustodyContext` construction. Fewer tests to update. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2025-11-17 05:23:12 +00:00
Lion - dapplion	53e73fa376	Remove duplicate state in ProtoArray (#8324) Part of a fork-choice tech debt clean-up https://github.com/sigp/lighthouse/issues/8325 https://github.com/sigp/lighthouse/issues/7089 (non-finalized checkpoint sync) changes the meaning of the checkpoints inside fork-choice. It turns out that we persist the justified and finalized checkpoints twice in fork-choice: 1. Inside the fork-choice store 2. Inside the proto-array There's no reason for 2. except for making the function signature of some methods smaller. It's not consistent with the rest of the crate, because in some functions we pass the external variable of time (current_slot) via args, but then read the finalized checkpoint from the internal state. Passing both variables as args makes fork-choice easier to reason about at the cost of a few extra lines. Remove the unnecessary state (`justified_checkpoint`, `finalized_checkpoint`) inside `ProtoArray`, to make it easier to reason about. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com> Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>	2025-11-12 03:42:17 +00:00
Mac L	93b8f4686d	Remove `ethers-core` from `execution_layer` (#8149 ) #6022 Use `alloy_rpc_types::Transaction` to replace the `ethers_core::Transaction` inside the execution block generator. Co-Authored-By: Mac L <mjladson@pm.me>	2025-11-10 06:25:59 +00:00
Javier Chávarri	2c1f1c1605	Migrate derivative to educe (#8125 ) Fixes #7001. Mostly mechanical replacement of `derivative` attributes with `educe` ones. ### Attribute Syntax Changes ```rust // Bounds: = "..." → (...) #[derivative(Hash(bound = "E: EthSpec"))] #[educe(Hash(bound(E: EthSpec)))] // Ignore: = "ignore" → (ignore) #[derivative(PartialEq = "ignore")] #[educe(PartialEq(ignore))] // Default values: value = "..." → expression = ... #[derivative(Default(value = "ForkName::Base"))] #[educe(Default(expression = ForkName::Base))] // Methods: format_with/compare_with = "..." → method(...) #[derivative(Debug(format_with = "fmt_peer_set_as_len"))] #[educe(Debug(method(fmt_peer_set_as_len)))] // Empty bounds: removed entirely, educe can infer appropriate bounds #[derivative(Default(bound = ""))] #[educe(Default)] // Transparent debug: manual implementation (educe doesn't support it) #[derivative(Debug = "transparent")] // Replaced with manual Debug impl that delegates to inner field ``` Note: Some bounds use strings (`bound("E: EthSpec")`) for superstruct compatibility (`expected ','` errors). Co-Authored-By: Javier Chávarri <javier.chavarri@gmail.com> Co-Authored-By: Mac L <mjladson@pm.me>	2025-11-06 14:13:57 +00:00
Michael Sproul	0507eca7b4	Merge remote-tracking branch 'origin/stable' into unstable-merge-v8	2025-11-04 16:08:34 +11:00
Jimmy Chen	bc86dc09e5	Reduce number of blobs used in tests to speed up CI (#8194) `beacon-chain-tests` is now regularly taking 1h+ on CI since Fulu fork was added. This PR attempts to reduce the test time by bringing down the number of blobs generated in tests - instead of generating 0..max_blobs, the generator now generates 0..1 blobs by default, and this can be modified by setting `harness.execution_block_generator.set_min_blob_count(n)`. Note: The blobs are pre-generated and don't require too much CPU to generate; however, processing a larger number of them on the beacon chain does take a lot of time. This PR also includes a few other small improvements - Our slowest test (`chain_segment_varying_chunk_size`) runs 3x faster in Fulu just by reusing chain segments - Avoid re-running fork specific tests on all forks - Fix a bunch of tests that depend on the harness's existing random blob generation, which is fragile. Beacon chain test time on test machine is ~2x faster: ### `unstable` ``` Summary [ 751.586s] 291 tests run: 291 passed (13 slow), 0 skipped ``` ### this branch ``` Summary [ 373.792s] 291 tests run: 291 passed (2 slow), 0 skipped ``` The next set of tests to optimise is the ones that use [`get_chain_segment`](`77a9af96de/beacon_node/beacon_chain/tests/block_verification.rs (L45)`), as it by default builds 320 blocks with supernode - an easy optimisation would be to build these blocks with cgc = 8 for tests that only require fullnodes. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>	2025-11-04 02:40:44 +00:00
Michael Sproul	4908687e7d	Proposer duties backwards compat (#8335 ) The beacon API spec wasn't updated to use the Fulu definition of `dependent_root` for the proposer duties endpoint. No other client updated their logic, so to retain backwards compatibility the decision has been made to continue using the block root at the end of epoch `N - 1`, and introduce a new v2 endpoint down the track to use the correct dependent root. Eth R&D discussion: https://discord.com/channels/595666850260713488/598292067260825641/1433036715848765562 Change the behaviour of the v1 endpoint back to using the last slot of `N - 1` rather than the last slot of `N - 2`. This introduces the possibility of dependent root false positives (the root can change without changing the shuffling), but causes the least compatibility issues with other clients. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-11-03 08:06:03 +00:00
Eitan Seri-Levi	25832e5862	Add mainnet configs (#8344 ) #8135 mainnet config PR: https://github.com/eth-clients/mainnet/pull/11 Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Michael Sproul <michael@sigmaprime.io> Co-Authored-By: Tan Chee Keong <tanck@sigmaprime.io>	2025-11-03 06:53:13 +00:00
Michael Sproul	3bfdfa5a1a	Merge remote-tracking branch 'origin/release-v8.0' into unstable	2025-10-29 16:20:42 +11:00
hopinheimer	6f0d0dec75	Fix failing CI for `compile-with-beta-compiler` (#8317 ) Co-Authored-By: hopinheimer <knmanas6@gmail.com>	2025-10-29 05:12:57 +00:00
Mac L	f4b1bb46b5	Remove `compare_fields` and import from crates.io (#8189 ) Use the recently published `compare_fields` and remove it from Lighthouse https://crates.io/crates/compare_fields Co-Authored-By: Mac L <mjladson@pm.me>	2025-10-28 05:49:47 +00:00
Mac L	f5809aff87	Bump `ssz_types` to `v0.12.2` (#8032 ) https://github.com/sigp/lighthouse/issues/8012 Replace all instances of `VariableList::from` and `FixedVector::from` to their `try_from` variants. While I tried to use proper error handling in most cases, there were certain situations where adding an `expect` for situations where `try_from` can trivially never fail avoided adding a lot of extra complexity. Co-Authored-By: Mac L <mjladson@pm.me> Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com> Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-10-28 04:01:09 +00:00
kevaundray	6e71fd7c19	chore: fix typo (#8292 ) Co-Authored-By: kevaundray <kevtheappdev@gmail.com>	2025-10-28 01:20:43 +00:00
Michael Sproul	d67ae92112	Implement `/lighthouse/custody/info` API (#8276 ) Closes: - https://github.com/sigp/lighthouse/issues/8249 New `/lighthouse/custody` API including: - [x] Earliest custodied data column slot - [x] Node CGC - [x] Custodied columns Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-10-27 08:48:12 +00:00
Eitan Seri-Levi	33e21634cb	Custody backfill sync (#7907) #7603 #### Custody backfill sync service Similar in many ways to the current backfill service. There may be ways to unify the two services. The difficulty there is that the current backfill service tightly couples blocks and their associated blobs/data columns. Any attempts to unify the two services should be left to a separate PR in my opinion. #### `SyncNetworkContext` `SyncNetworkContext` manages custody sync data columns by range requests separately from other sync RPC requests. I think this is a nice separation considering that custody backfill is its own service. #### Data column import logic The import logic verifies KZG commitments and that the data column's block root matches the block root in the node's store before importing columns. #### New channel to send messages to `SyncManager` Now external services can communicate with the `SyncManager`. In this PR this channel is used to trigger a custody sync. Alternatively we may be able to use the existing `mpsc` channel that the `SyncNetworkContext` uses to communicate with the `SyncManager`. I will spend some time reviewing this. Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com> Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>	2025-10-22 03:51:34 +00:00
Michael Sproul	2f8587301d	More proposer shuffling cleanup (#8130 ) Addressing more review comments from: - https://github.com/sigp/lighthouse/pull/8101 I've also tweaked a few more things that I think are minor bugs. - Instrument `ensure_state_can_determine_proposers_for_epoch` - Fix `block_root` usage in `compute_proposer_duties_from_head`. This was a regression introduced in 8101 😬 . - Update the `state_advance_timer` to prime the next-epoch proposer cache post-Fulu. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-10-20 03:14:14 +00:00
Mac L	f13d0615fd	Add `eip_3076` crate (#8206 ) #7894 Moves the `Interchange` format from `slashing_protection` and thus removes the dependency on `slashing_protection` from `eth2` which can now just depend on the slimmer `eip_3076` crate. Co-Authored-By: Mac L <mjladson@pm.me>	2025-10-16 16:10:42 +00:00
Mac L	345faf52cb	Remove `safe_arith` and import from crates.io (#8191 ) Use the recently published `safe_arith` and remove it from Lighthouse https://crates.io/crates/safe_arith Co-Authored-By: Mac L <mjladson@pm.me>	2025-10-15 06:03:46 +00:00
Michael Sproul	0c9fdea28d	Update `ForkName::latest_stable` to Fulu for tests (#8181 ) Update `ForkName::latest_stable` to Fulu, reflecting our plan to stabilise Fulu in the immediate future! This will lead to some more tests running with Fulu rather than Electra. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-10-09 13:53:51 +00:00
chonghe	3110ca325b	Implement `/eth/v1/beacon/blobs` endpoint (#8103 ) * #8085 Co-Authored-By: Tan Chee Keong <tanck@sigmaprime.io> Co-Authored-By: chonghe <44791194+chong-he@users.noreply.github.com>	2025-10-09 05:01:30 +00:00
Michael Sproul	b5c2a9668e	Quote `BeaconState::proposer_lookahead` in JSON repr (#8167 ) Use quoted integers for `state.proposer_lookahead` when serializing JSON. This is standard for all integer fields, but was missed for the newly added proposer lookahead. I noticed this issue while inspecting the head state on a local devnet. I'm glad we found this before someone reported it :P Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-10-08 00:05:41 +00:00
Eitan Seri-Levi	4eb89604f8	Fulu ASCII art (#8151 ) Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>	2025-10-07 14:32:35 +00:00
Michael Sproul	26575c594c	Improve spec compliance for `/eth/v1/config/spec` API (#8144 ) - [x] Remove the unnecessary `_MILLIS` suffix from `MAXIMUM_GOSSIP_CLOCK_DISPARITY` - [x] Add missing Deneb preset `KZG_COMMITMENT_INCLUSION_PROOF_DEPTH`, not to be confused with `KZG_COMMITMENTS_INCLUSION_PROOF_DEPTH` (plural) from Fulu... Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-10-01 09:29:15 +00:00
Michael Sproul	38fdaf791c	Fix proposer shuffling decision slot at boundary (#8128 ) Follow-up to the bug fixed in: - https://github.com/sigp/lighthouse/pull/8121 This fixes the root cause of that bug, which was introduced by me in: - https://github.com/sigp/lighthouse/pull/8101 Lion identified the issue here: - https://github.com/sigp/lighthouse/pull/8101#discussion_r2382710356 In the methods that compute the proposer shuffling decision root, ensure we don't use lookahead for the Fulu fork epoch itself. This is accomplished by checking if Fulu is enabled at `epoch - 1`, i.e. if `epoch > fulu_fork_epoch`. I haven't updated the methods that _compute_ shufflings to use these new corrected bounds (e.g. `BeaconState::compute_proposer_indices`), although we could make this change in future. The `get_beacon_proposer_indices` method already gracefully handles the Fulu boundary case by using the `proposer_lookahead` field (if initialised). Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-09-29 01:13:33 +00:00
Michael Sproul	c754234b2c	Fix bugs in proposer calculation post-Fulu (#8101 ) As identified by a researcher during the Fusaka security competition, we were computing the proposer index incorrectly in some places by computing without lookahead. - [x] Add "low level" checks to computation functions in `consensus/types` to ensure they error cleanly - [x] Re-work the determination of proposer shuffling decision roots, which are now fork aware. - [x] Re-work and simplify the beacon proposer cache to be fork-aware. - [x] Optimise `with_proposer_cache` to use `OnceCell`. - [x] All tests passing. - [x] Resolve all remaining `FIXME(sproul)`s. - [x] Unit tests for `ProtoBlock::proposer_shuffling_root_for_child_block`. - [x] End-to-end regression test. - [x] Test on pre-Fulu network. - [x] Test on post-Fulu network. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-09-26 14:44:50 +00:00
Lion - dapplion	ffa7b2b2b9	Only mark block lookups as pending if block is importing from gossip (#8112 ) - PR https://github.com/sigp/lighthouse/pull/8045 introduced a regression of how lookup sync interacts with the da_checker. Now in unstable block import from the HTTP API also insert the block in the da_checker while the block is being execution verified. If lookup sync finds the block in the da_checker in `NotValidated` state it expects a `GossipBlockProcessResult` message sometime later. That message is only sent after block import in gossip. I confirmed in our node's logs for 4/4 cases of stuck lookups are caused by this sequence of events: - Receive block through API, insert into da_checker in fn process_block in put_pre_execution_block - Create lookup and leave in AwaitingDownload(block in processing cache) state - Block from HTTP API finishes importing - Lookup is left stuck Closes https://github.com/sigp/lighthouse/issues/8104 - https://github.com/sigp/lighthouse/pull/8110 was my initial solution attempt but we can't send the `GossipBlockProcessResult` event from the `http_api` crate without adding new channels, which seems messy. For a given node it's rare that a lookup is created at the same time that a block is being published. This PR solves https://github.com/sigp/lighthouse/issues/8104 by allowing lookup sync to import the block twice in that case. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>	2025-09-25 03:52:27 +00:00
Eitan Seri-Levi	521be2b757	Prevent silently dropping cell proof chunks (#8023 ) Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>	2025-09-18 01:33:42 +00:00
Eitan Seri-Levi	242bdfcf12	Add instrumentation to `recompute_head_at_slot` (#8049 ) Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>	2025-09-16 05:18:31 +00:00
Jimmy Chen	811eccdf34	Reduce noise in `Debug` impl of `RuntimeVariableList` (#8007 ) The default debug output of these types contains a lot of unnecessary noise making it hard to read. This PR removes the type and extra fields from debug output to make logs easier to read. `len` could be potentially useful in some cases, but this gives us flexibility to only log it separately if we need it. Related PR in `ssz_types`: - https://github.com/sigp/ssz_types/pull/57 Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2025-09-10 04:59:22 +00:00
Jimmy Chen	8a4f6cf0d5	Instrument tracing on block production code path (#8017) Partially #7814. Instrument block production code path. New root spans: * `produce_block_v3` * `produce_block_v2` Example traces: [image] Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2025-09-10 03:30:51 +00:00
Jimmy Chen	ee734d1456	Fix stuck data column lookups by improving peer selection and retry logic (#8005) Fixes the issue described in #7980 where Lighthouse repeatedly sends `DataColumnsByRoot` requests to the same peers that return empty responses, causing sync to get stuck. The root cause was we don't count empty responses as failures, leading to excessive retries to unresponsive peers. - Track per-peer attempts to limit retry attempts per peer (`MAX_CUSTODY_PEER_ATTEMPTS = 3`) - Replaced random peer selection with hashing within each lookup to prevent splitting lookup into too many small requests and improve request batching efficiency. - Added `single_block_lookup` root span to track all lookups created and added more debug logs: [image] Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>	2025-09-09 06:18:05 +00:00
Michael Sproul	76adedff27	Simplify length methods on BeaconBlockBody (#7989 ) Just the low-hanging fruit from: - https://github.com/sigp/lighthouse/pull/7988	2025-09-04 00:08:29 +00:00
chonghe	a93cafee08	Implement `selections` Beacon API endpoints to support DVT middleware (#7016 ) * #6610 - [x] Add `beacon_committee_selections` endpoint - [x] Test beacon committee aggregator and confirmed working - [x] Add `sync_committee_selections` endpoint - [x] Test sync committee aggregator and confirmed working	2025-09-03 03:50:41 +00:00
Paul Etscheit	66edda2690	Impl ForkVersionDecode for beacon state (#7954 )	2025-09-01 02:22:40 +00:00
Michael Sproul	d235f2c697	Delete `RuntimeVariableList::from_vec` (#7930 ) This method is a footgun because it truncates the list. It is the source of a recent bug: - https://github.com/sigp/lighthouse/pull/7927 - Delete uses of `RuntimeVariableList::from_vec` and replace them with `::new` which does validation and can fail. - Propagate errors where possible, unwrap in tests and use `expect` for obviously-safe uses (in `chain_spec.rs`).	2025-08-27 06:52:14 +00:00
Barnabas Busa	2b33fe6620	Update to spec v1.6.0-alpha.5 (#7910 ) - https://github.com/ethereum/consensus-specs/pull/4508	2025-08-27 03:59:21 +00:00
Mac L	e438691683	Add Gloas boilerplate (#7728 ) Adds the required boilerplate code for the Gloas (Glamsterdam) hard fork. This allows PRs testing Gloas-candidate features to test fork transition. This also includes de-duplication of post-Bellatrix readiness notifiers from #6797 (credit to @dapplion)	2025-08-26 02:49:48 +00:00
Jimmy Chen	747d9118ff	Fix `DataColumnsByRoot` request limit validation bug (#7928) Fixes #7926 This was a bug I introduced in #7890 and @pawanjay176 noticed it on some running nodes, and added an RPC test to confirm it. The culprit is this line, where I failed to fill the vec to its max size, so it doesn't calculate the max size properly, resulting in all `DataColumnsByRoot` requests exceeding the max size during validation: `d24a6d2a45/consensus/types/src/chain_spec.rs (L1984)` The PR fixes this and includes new regression tests for this fix.	2025-08-25 04:13:36 +00:00
Jimmy Chen	b4704eab4a	Fulu update to spec v1.6.0-alpha.4 (#7890 ) Fulu update to spec [v1.6.0-alpha.4](https://github.com/ethereum/consensus-specs/releases/tag/v1.6.0-alpha.4). - Make `number_of_columns` a preset - Optimise `get_custody_groups` to avoid computing if cgc = 128 - Add support for additional typenum values in type_dispatch macro	2025-08-20 02:05:04 +00:00
Michael Sproul	836c39efaa	Shrink persisted fork choice data (#7805 ) Closes: - https://github.com/sigp/lighthouse/issues/7760 - [x] Remove `balances_cache` from `PersistedForkChoiceStore` (~65 MB saving on mainnet) - [x] Remove `justified_balances` from `PersistedForkChoiceStore` (~16 MB saving on mainnet) - [x] Remove `balances` from `ProtoArray`/`SszContainer`. - [x] Implement zstd compression for votes - [x] Fix bug in justified state usage - [x] Bump schema version to V28 and implement migration.	2025-08-18 06:03:28 +00:00
Michael Sproul	42f6d7b02d	Yeet env_logger into the sun (#7872 ) - Remove explicit `env_logger` usage from `state_processing` tests and `lcli`. - Set up tracing correctly for `lcli` (I've checked that we can see logs after this change). - I didn't do anything to set up logging for the `state_processing` tests, as these are rarely run manually (they never fail). We could add `test_logger` in there on an as-needed basis.	2025-08-15 03:17:26 +00:00
chonghe	522bd9e9c6	Update Rust Edition to 2024 (#7766 ) * #7749 Thanks @dknopik and @michaelsproul for your help!	2025-08-13 03:04:31 +00:00
Mac L	152f2bb2e4	Re-export `context_deserialize_derive` inside `context_deserialize` (#7852 ) Re-export `context_deserialize_derive` inside of `context_deserialize` so they are both available from the same interface, which matches how popular crates (like `serde`) handle this. This also nests both crates inside a new `context_deserialize` directory which will make it easier to eventually spin out into a different repo (if/when) we decide to do that (plus I prefer it aesthetically).	2025-08-12 05:16:19 +00:00
Michael Sproul	918121e313	Fix bugs in rebasing of states prior to finalization (#7849 ) Attempt to fix this error reported by `beaconcha.in` on their Hoodi archive nodes: > {"code":500,"message":"UNHANDLED_ERROR: DBError(CacheBuildError(BeaconState(MilhouseError(OutOfBoundsIterFrom { index: 1199549, len: 1060000 }))))","stacktraces":[]} There are only a handful of places where we call `iter_from`. This one is safe by construction (the check immediately prior ensures `self.pubkeys.len()` is not out of bounds): `cfb1f73310/beacon_node/beacon_chain/src/validator_pubkey_cache.rs (L84-L90)` This one should also be safe, and the indexes used here would not be as large as the ones in the reported error: `cfb1f73310/consensus/state_processing/src/per_epoch_processing/single_pass.rs (L365-L368)` Which leaves one remaining usage which must be the culprit: `cfb1f73310/consensus/types/src/beacon_state.rs (L2109-L2113)` This indexing relies on the invariant that `self.pubkey_cache().len() <= self.validators.len()`. We mostly maintain that invariant, except for in `rebase_caches_on` (fixed in this PR). The other bug, is that we were calling `rebase_on_finalized` for all "hot" states, which post-v7.1.0 includes states prior to the split which are required by the hdiff grid. This is how we end up calling something like `genesis_state.rebase_on(&split_state)`, which then corrupts the pubkey cache of the genesis state using the newer pubkey cache from the split state.	2025-08-12 02:19:24 +00:00
Jimmy Chen	40c2fd5ff4	Instrument tracing spans for block processing and import (#7816) #7815 - removes all existing spans, so some span fields that appear in logs like `service_name` may be lost. - instruments a few key code paths in the beacon node, starting from root spans named below: * Gossip block and blobs * `process_gossip_data_column_sidecar` * `process_gossip_blob` * `process_gossip_block` * Rpc block and blobs * `process_rpc_block` * `process_rpc_blobs` * `process_rpc_custody_columns` * Rpc blocks (range and backfill) * `process_chain_segment` * `PendingComponents` lifecycle * `pending_components` To test locally: * Run Grafana and Tempo with https://github.com/sigp/lighthouse-metrics/pull/57 * Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317` Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg Removing the old spans seems to have reduced the memory usage quite a lot - I think we were using them on long running tasks and too excessively: [image]	2025-08-08 05:32:22 +00:00
Jimmy Chen	8bc6693dac	Fix wrong columns getting processed on a CGC change (#7792) This PR fixes a bug where wrong columns could get processed immediately after a CGC increase. Scenario: - The node's CGC increased due to additional validators attached to it (let's say from 10 to 11) - The new CGC is advertised and new subnets are subscribed immediately, however the change won't be effective in the data availability check until the next epoch (See [this](`ab0e8870b4/beacon_node/beacon_chain/src/validator_custody.rs (L93-L99)`)). Data availability checker still only requires 10 columns for the current epoch. - During this time, data columns for the additional custody column (let's say column 11) may arrive via gossip as we're already subscribed to the topic, and they may be incorrectly used to satisfy the existing data availability requirement (10 columns), and result in this additional column (instead of a required one) getting persisted, resulting in database inconsistency.	2025-08-07 00:45:04 +00:00
Eric Tu	c06ac81c67	Shuffling for 32 bit platforms (#7725) - In shuffling, the `raw_pivot` (u64) is cast to a usize, which will break on 32-bit systems. Now it is modulo'ed with the `list_size` first and then cast to a usize. - ruint doesn't implement shifting with u64's on 32-bit arch. Since `prefix_bits` is u8 and NODE_ID_BITS = 256, we use them as u32's instead. See: https://docs.rs/ruint/latest/src/ruint/bits.rs.html#711	2025-08-06 02:37:07 +00:00
Michael Sproul	0dcce40ccb	Fix Clippy for Rust 1.90 beta (#7826 ) Fix Clippy for recently released Rust 1.90 beta. There may be more changes required when Rust 1.89 stable is released in a few days, but possibly not 🤞	2025-08-05 13:52:26 +00:00
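
Commit c06ac81c67 ("Shuffling for 32 bit platforms") changes the order of the cast and the modulo when deriving the shuffle pivot. The following is a minimal sketch of why the order matters, not the Lighthouse implementation: the function names are hypothetical, and the `as u32` cast is an assumption used to simulate a 32-bit `usize` on any host.

```rust
/// Cast first, then reduce. The `as u32` step simulates what
/// `raw_pivot as usize` does on a 32-bit target: the upper 32 bits are dropped
/// before the modulo is taken. (Hypothetical helper, for illustration only.)
fn pivot_cast_then_mod(raw_pivot: u64, list_size: usize) -> usize {
    (raw_pivot as u32 as usize) % list_size
}

/// Reduce first, then cast. The remainder is always smaller than `list_size`,
/// so it fits in a `usize` on every platform and the cast is lossless.
fn pivot_mod_then_cast(raw_pivot: u64, list_size: usize) -> usize {
    (raw_pivot % (list_size as u64)) as usize
}

fn main() {
    // A pivot value that does not fit in 32 bits.
    let raw_pivot: u64 = (1u64 << 32) + 5;
    let list_size: usize = 7;

    let truncated = pivot_cast_then_mod(raw_pivot, list_size);
    let correct = pivot_mod_then_cast(raw_pivot, list_size);

    // 2^32 mod 7 == 4, so (2^32 + 5) mod 7 == 2, while the truncated value
    // 5 mod 7 == 5. The two orders disagree for this input.
    println!("cast-then-mod: {truncated}, mod-then-cast: {correct}");
    assert_ne!(truncated, correct);
}
```

For pivots below 2^32 the two functions agree, which is why a bug of this kind only appears for some inputs on 32-bit targets.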

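Commit 0d0232e8fc ("Optimise out block header calculation") moves an expensive header computation off the hot path and into the error branch, keeping only a cheap `Arc` clone on the success path. The sketch below illustrates that pattern under stated assumptions: `Block`, `Header`, `verify`, and `verify_check_slashable` are hypothetical stand-ins rather than Lighthouse types, and the fold over the body bytes is a placeholder for real hashing.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counts how often the (pretend) expensive header calculation runs.
static HEADER_CALCS: AtomicUsize = AtomicUsize::new(0);

struct Block {
    slot: u64,
    body: Vec<u8>,
}

#[derive(Debug, PartialEq)]
struct Header {
    slot: u64,
    body_root: u64,
}

impl Block {
    /// Stand-in for hashing the whole block; real hashing is far more costly.
    fn header(&self) -> Header {
        HEADER_CALCS.fetch_add(1, Ordering::SeqCst);
        let body_root = self
            .body
            .iter()
            .fold(0u64, |acc, b| acc.wrapping_mul(31).wrapping_add(*b as u64));
        Header { slot: self.slot, body_root }
    }
}

#[derive(Debug, PartialEq)]
enum VerifyError {
    Slashable(Header),
}

/// Hypothetical inner verification step that consumes the block.
fn verify(block: Arc<Block>) -> Result<u64, &'static str> {
    if block.body.is_empty() {
        Err("invalid block")
    } else {
        Ok(block.slot)
    }
}

/// Keep a cheap `Arc` clone and compute the header only if `verify` fails.
fn verify_check_slashable(block: Arc<Block>) -> Result<u64, VerifyError> {
    let block_for_error = Arc::clone(&block);
    verify(block).map_err(|_| VerifyError::Slashable(block_for_error.header()))
}

fn main() {
    let good = Arc::new(Block { slot: 1, body: vec![1, 2, 3] });
    assert_eq!(verify_check_slashable(good), Ok(1));
    // The success path never touched the header calculation.
    assert_eq!(HEADER_CALCS.load(Ordering::SeqCst), 0);

    let bad = Arc::new(Block { slot: 2, body: vec![] });
    assert!(verify_check_slashable(bad).is_err());
    // The header was computed exactly once, inside the error branch.
    assert_eq!(HEADER_CALCS.load(Ordering::SeqCst), 1);
}
```

The closure passed to `map_err` runs only on `Err`, so the cost on the success path is one reference-count increment.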

812 Commits