lighthouse

mirror of https://github.com/sigp/lighthouse.git synced 2026-03-18 04:13:00 +00:00

Author	SHA1	Message	Date
Lion - dapplion	53e73fa376	Remove duplicate state in ProtoArray (#8324 ) Part of a fork-choice tech debt clean-up https://github.com/sigp/lighthouse/issues/8325 https://github.com/sigp/lighthouse/issues/7089 (non-finalized checkpoint sync) changes the meaning of the checkpoints inside fork-choice. It turns out that we persist the justified and finalized checkpoints twice in fork-choice 1. Inside the fork-choice store 2. Inside the proto-array There's no reason for 2. except for making the function signature of some methods smaller. It's not consistent with the rest of the crate, because in some functions we pass the external variable of time (current_slot) via args, but then read the finalized checkpoint from the internal state. Passing both variables as args makes fork-choice easier to reason about at the cost of a few extra lines. Remove the unnecessary state (`justified_checkpoint`, `finalized_checkpoint`) inside `ProtoArray`, to make it easier to reason about. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com> Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>	2025-11-12 03:42:17 +00:00
Michael Sproul	a7e89a8761	Optimise `state_root_at_slot` for finalized slot (#8353 ) This is an optimisation targeted at Fulu networks in non-finality. While debugging on Holesky, we found that `state_root_at_slot` was being called from `prepare_beacon_proposer` a lot, for the finalized state: `2c9b670f5d/beacon_node/http_api/src/lib.rs (L3860-L3861)` This was causing `prepare_beacon_proposer` calls to take upwards of 5 seconds, sometimes 10 seconds, because it would trigger _multiple_ beacon state loads in order to iterate back to the finalized slot. Ideally, loading the finalized state should be quick because we keep it cached in the state cache (technically we keep the split state, but they usually coincide). Instead we are computing the finalized state root separately (slow), and then loading the state from the cache (fast). Although it would be possible to make the API faster by removing the `state_root_at_slot` call, I believe it's simpler to change `state_root_at_slot` itself and remove the footgun. Devs rightly expect operations involving the finalized state to be fast. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-11-05 02:08:46 +00:00
Mac L	f5809aff87	Bump `ssz_types` to `v0.12.2` (#8032 ) https://github.com/sigp/lighthouse/issues/8012 Replace all instances of `VariableList::from` and `FixedVector::from` to their `try_from` variants. While I tried to use proper error handling in most cases, there were certain situations where adding an `expect` for situations where `try_from` can trivially never fail avoided adding a lot of extra complexity. Co-Authored-By: Mac L <mjladson@pm.me> Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com> Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-10-28 04:01:09 +00:00
Jimmy Chen	43c5e924d7	Add `--semi-supernode` support (#8254 ) Addresses #8218 A simplified version of #8241 for the initial release. I've tried to minimise the logic change in this PR, although introducing the `NodeCustodyType` enum still results in quite a bit of diff, but the actual logic change in `CustodyContext` is quite small. The main changes are in the `CustodyContext` struct * ~~combining `validator_custody_count` and `current_is_supernode` fields into a single `custody_group_count_at_head` field. We persist the cgc of the initial cli values into the `custody_group_count_at_head` field and only allow for increase (same behaviour as before).~~ * I noticed the above approach caused a backward compatibility issue, I've [made a fix](`15569bc085`) and changed the approach slightly (which was actually what I had originally in mind): * when initialising, only override the `validator_custody_count` value if either flag `--supernode` or `--semi-supernode` is used; otherwise leave it as the existing default `0`. Most other logic remains unchanged. All existing validator custody unit tests are still all passing, and I've added additional tests to cover semi-supernode, and restoring `CustodyContext` from disk. Note: I've added a `WARN` if the user attempts to switch to a `--semi-supernode` or `--supernode` - this currently has no effect, but once @eserilev column backfill is merged, we should be able to support this quite easily. Things to test - [x] cgc in metadata / enr - [x] cgc in metrics - [x] subscribed subnets - [x] getBlobs endpoint Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2025-10-22 05:23:17 +00:00
Eitan Seri-Levi	33e21634cb	Custody backfill sync (#7907 ) #7603 #### Custody backfill sync service Similar in many ways to the current backfill service. There may be ways to unify the two services. The difficulty there is that the current backfill service tightly couples blocks and their associated blobs/data columns. Any attempts to unify the two services should be left to a separate PR in my opinion. #### `SyncNetworkContext` `SyncNetworkContext` manages custody sync data columns by range requests separately from other sync RPC requests. I think this is a nice separation considering that custody backfill is its own service. #### Data column import logic The import logic verifies KZG commitments and that the data column's block root matches the block root in the node's store before importing columns #### New channel to send messages to `SyncManager` Now external services can communicate with the `SyncManager`. In this PR this channel is used to trigger a custody sync. Alternatively we may be able to use the existing `mpsc` channel that the `SyncNetworkContext` uses to communicate with the `SyncManager`. I will spend some time reviewing this. Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com> Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>	2025-10-22 03:51:34 +00:00
Eitan Seri-Levi	46dde9afee	Fix data column rpc request (#8247 ) Fixes an issue mentioned in this comment regarding data column rpc requests: https://github.com/sigp/lighthouse/issues/6572#issuecomment-3400076236 Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Michael Sproul <micsproul@gmail.com>	2025-10-21 23:54:35 +00:00
Michael Sproul	21bab0899a	Improve block header signature handling (#8253 ) Closes: - https://github.com/sigp/lighthouse/issues/7650 Reject blob and data column sidecars from RPC with invalid signatures. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-10-21 13:58:12 +00:00
Michael Sproul	2f8587301d	More proposer shuffling cleanup (#8130 ) Addressing more review comments from: - https://github.com/sigp/lighthouse/pull/8101 I've also tweaked a few more things that I think are minor bugs. - Instrument `ensure_state_can_determine_proposers_for_epoch` - Fix `block_root` usage in `compute_proposer_duties_from_head`. This was a regression introduced in 8101 😬 . - Update the `state_advance_timer` to prime the next-epoch proposer cache post-Fulu. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-10-20 03:14:14 +00:00
Pawan Dhananjay	2c328e32a6	Persist only custody columns in db (#8188 ) * Only persist custody columns * Get claude to write tests * lint * Address review comments and fix tests. * Use supernode only when building chain segments * Clean up * Rewrite tests. * Fix tests * Clippy --------- Co-authored-by: Jimmy Chen <jchen.tc@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2025-10-13 20:32:13 +11:00
Michael Sproul	13dfa9200f	Block proposal optimisations (#8156 ) Closes: - https://github.com/sigp/lighthouse/issues/4412 This should reduce Lighthouse's block proposal times on Holesky and prevent us getting reorged. - [x] Allow the head state to be advanced further than 1 slot. This lets us avoid epoch processing on hot paths including block production, by having new epoch boundaries pre-computed and available in the state cache. - [x] Use the finalized state to prune the op pool. We were previously using the head state and trying to infer slashing/exit relevance based on `exit_epoch`. However some exit epochs are far in the future, despite occurring recently. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-10-08 06:09:12 +00:00
Michael Sproul	c754234b2c	Fix bugs in proposer calculation post-Fulu (#8101 ) As identified by a researcher during the Fusaka security competition, we were computing the proposer index incorrectly in some places by computing without lookahead. - [x] Add "low level" checks to computation functions in `consensus/types` to ensure they error cleanly - [x] Re-work the determination of proposer shuffling decision roots, which are now fork aware. - [x] Re-work and simplify the beacon proposer cache to be fork-aware. - [x] Optimise `with_proposer_cache` to use `OnceCell`. - [x] All tests passing. - [x] Resolve all remaining `FIXME(sproul)`s. - [x] Unit tests for `ProtoBlock::proposer_shuffling_root_for_child_block`. - [x] End-to-end regression test. - [x] Test on pre-Fulu network. - [x] Test on post-Fulu network. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-09-26 14:44:50 +00:00
Lion - dapplion	ffa7b2b2b9	Only mark block lookups as pending if block is importing from gossip (#8112 ) - PR https://github.com/sigp/lighthouse/pull/8045 introduced a regression of how lookup sync interacts with the da_checker. Now in unstable block import from the HTTP API also insert the block in the da_checker while the block is being execution verified. If lookup sync finds the block in the da_checker in `NotValidated` state it expects a `GossipBlockProcessResult` message sometime later. That message is only sent after block import in gossip. I confirmed in our node's logs for 4/4 cases of stuck lookups are caused by this sequence of events: - Receive block through API, insert into da_checker in fn process_block in put_pre_execution_block - Create lookup and leave in AwaitingDownload(block in processing cache) state - Block from HTTP API finishes importing - Lookup is left stuck Closes https://github.com/sigp/lighthouse/issues/8104 - https://github.com/sigp/lighthouse/pull/8110 was my initial solution attempt but we can't send the `GossipBlockProcessResult` event from the `http_api` crate without adding new channels, which seems messy. For a given node it's rare that a lookup is created at the same time that a block is being published. This PR solves https://github.com/sigp/lighthouse/issues/8104 by allowing lookup sync to import the block twice in that case. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>	2025-09-25 03:52:27 +00:00
Eitan Seri-Levi	af274029e8	Run reconstruction inside a scoped rayon pool (#8075 ) Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com> Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>	2025-09-24 06:37:34 +00:00
Jimmy Chen	78d330e4b7	Consolidate `reqresp_pre_import_cache` into `data_availability_checker` (#8045 ) This PR consolidates the `reqresp_pre_import_cache` into the `data_availability_checker` for the following reasons: - the `reqresp_pre_import_cache` suffers from the same TOCTOU bug we had with `data_availability_checker` earlier, and leads to unbounded memory leak, which we have observed over the last 6 months on some nodes. - the `reqresp_pre_import_cache` is no longer necessary, because we now hold blocks in the `data_availability_checker` for longer since (#7961), and recent blocks can be served from the DA checker. This PR also maintains the following functionalities - Serving pre-executed blocks over RPC, and they're now served from the `data_availability_checker` instead. - Using the cache for de-duplicating lookup requests. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>	2025-09-19 07:01:13 +00:00
Jimmy Chen	3de646c8b3	Enable reconstruction for nodes custodying more than 50% of columns and instrument tracing (#8052 ) Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>	2025-09-16 08:17:43 +00:00
Michael Sproul	f04d5ecddd	Another check to prevent duplicate block imports (#8050 ) Attempt to address performance issues caused by importing the same block multiple times. - Check fork choice "after" obtaining the fork choice write lock in `BeaconChain::import_block`. We actually use an upgradable read lock, but this is semantically equivalent (the upgradable read has the advantage of not excluding regular reads). The hope is that this change has several benefits: 1. By preventing duplicate block imports we save time repeating work inside `import_block` that is unnecessary, e.g. writing the state to disk. Although the store itself now takes some measures to avoid re-writing diffs, it is even better if we avoid a disk write entirely. 2. By returning `DuplicateFullyImported`, we reduce some duplicated work downstream. E.g. if multiple threads importing columns trigger `import_block`, now only _one_ of them will get a notification of the block import completing successfully, and only this one will run `recompute_head`. This should help avoid a situation where multiple beacon processor workers are consumed by threads blocking on the `recompute_head_lock`. However, a similar block-fest is still possible with the upgradable fork choice lock (a large number of threads can be blocked waiting for the first thread to complete block import). Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-09-16 04:10:42 +00:00
Jimmy Chen	8a4f6cf0d5	Instrument tracing on block production code path (#8017 ) Partially #7814. Instrument block production code path. New root spans: * `produce_block_v3` * `produce_block_v2` Example traces: <img width="518" height="432" alt="image" src="https://github.com/user-attachments/assets/a9413d25-501c-49dc-95cc-623db5988981" /> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2025-09-10 03:30:51 +00:00
Jimmy Chen	eef02afc93	Fix data availability checker race condition causing partial data columns to be served over RPC (#7961 ) Partially resolves #6439, a simpler alternative to #7931. Race condition occurs when RPC data columns arrive after a block has been imported and removed from the DA checker: 1. Block becomes available via gossip 2. RPC columns arrive and pass fork choice check (block hasn't been imported) 3. Block import completes (removing block from DA checker) 4. RPC data columns finish verification and get imported into DA checker This causes two issues: 1. Partial data serving: Already imported components get re-inserted, potentially causing LH to serve incomplete data 2. State cache misses: Leads to state reconstruction, holding the availability cache write lock longer and increasing race likelihood ### Proposed Changes 1. Never manually remove pending components from DA checker. Components are only removed via LRU eviction as finality advances. This makes sure we don't run into the issue described above. 2. Use `get` instead of `pop` when recovering the executed block, this prevents cache misses in race condition. This should reduce the likelihood of the race condition 3. Refactor DA checker to drop write lock as soon as components are added. This should also reduce the likelihood of the race condition Trade-offs: This solution eliminates a few nasty race conditions while allowing simplicity, with the cost of allowing block re-import (already existing). The increase in memory in DA checker can be partially offset by a reduction in block cache size if this really becomes an issue (as we now serve recent blocks from DA checker).	2025-09-02 07:18:23 +00:00
Jimmy Chen	c13fb2fb46	Instrument `publish_block` code path (#7945 ) Instrument `publish_block` code path and log dropped data columns when publishing. Example spans (running the devnet from my laptop, so the numbers aren't great) <img width="734" height="296" alt="image" src="https://github.com/user-attachments/assets/20620bf7-2b38-4392-aa75-9ba96d3a7f0d" /> <img width="718" height="625" alt="image" src="https://github.com/user-attachments/assets/61e1ff1c-65b5-4ad4-981a-d0fadc9829e1" />	2025-08-28 03:31:29 +00:00
Mac L	e438691683	Add Gloas boilerplate (#7728 ) Adds the required boilerplate code for the Gloas (Glamsterdam) hard fork. This allows PRs testing Gloas-candidate features to test fork transition. This also includes de-duplication of post-Bellatrix readiness notifiers from #6797 (credit to @dapplion)	2025-08-26 02:49:48 +00:00
Jimmy Chen	2d223575d6	Avoid unnecessary database lookups in data column RPC requests (#7897 ) This PR is an optimisation to avoid unnecessary database lookups when a peer requests data columns that the node doesn't custody (advertised via `cgc`). e.g. an extreme but realistic example - a full node only stores 4 custody columns by default, but it may receive a range request of 32 slots with all 128 columns, and this would result in 4096 database lookups but the node is only able to get 128 (4 * 32) of them. - Filter data column RPC requests (`DataColumnsByRoot`, `DataColumnsByRange`) to only lookup columns the node custodies - Prevents unnecessary database queries that would always fail for non-custody columns	2025-08-20 05:08:53 +00:00
Jimmy Chen	b4704eab4a	Fulu update to spec v1.6.0-alpha.4 (#7890 ) Fulu update to spec [v1.6.0-alpha.4](https://github.com/ethereum/consensus-specs/releases/tag/v1.6.0-alpha.4). - Make `number_of_columns` a preset - Optimise `get_custody_groups` to avoid computing if cgc = 128 - Add support for additional typenum values in type_dispatch macro	2025-08-20 02:05:04 +00:00
Michael Sproul	836c39efaa	Shrink persisted fork choice data (#7805 ) Closes: - https://github.com/sigp/lighthouse/issues/7760 - [x] Remove `balances_cache` from `PersistedForkChoiceStore` (~65 MB saving on mainnet) - [x] Remove `justified_balances` from `PersistedForkChoiceStore` (~16 MB saving on mainnet) - [x] Remove `balances` from `ProtoArray`/`SszContainer`. - [x] Implement zstd compression for votes - [x] Fix bug in justified state usage - [x] Bump schema version to V28 and implement migration.	2025-08-18 06:03:28 +00:00
chonghe	522bd9e9c6	Update Rust Edition to 2024 (#7766 ) * #7749 Thanks @dknopik and @michaelsproul for your help!	2025-08-13 03:04:31 +00:00
Jimmy Chen	40c2fd5ff4	Instrument tracing spans for block processing and import (#7816 ) #7815 - removes all existing spans, so some span fields that appear in logs like `service_name` may be lost. - instruments a few key code paths in the beacon node, starting from root spans named below: * Gossip block and blobs * `process_gossip_data_column_sidecar` * `process_gossip_blob` * `process_gossip_block` * Rpc block and blobs * `process_rpc_block` * `process_rpc_blobs` * `process_rpc_custody_columns` * Rpc blocks (range and backfill) * `process_chain_segment` * `PendingComponents` lifecycle * `pending_components` To test locally: * Run Grafana and Tempo with https://github.com/sigp/lighthouse-metrics/pull/57 * Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317` Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg Removing the old spans seems to have reduced the memory usage quite a lot - I think we were using them on long running tasks and too excessively: <img width="910" height="495" alt="image" src="https://github.com/user-attachments/assets/5208bbe4-53b2-4ead-bc71-0b782c788669" />	2025-08-08 05:32:22 +00:00
Jimmy Chen	8bc6693dac	Fix wrong columns getting processed on a CGC change (#7792 ) This PR fixes a bug where wrong columns could get processed immediately after a CGC increase. Scenario: - The node's CGC increased due to additional validators attached to it (let's say from 10 to 11) - The new CGC is advertised and new subnets are subscribed immediately, however the change won't be effective in the data availability check until the next epoch (See [this](`ab0e8870b4/beacon_node/beacon_chain/src/validator_custody.rs (L93-L99)`)). Data availability checker still only requires 10 columns for the current epoch. - During this time, data columns for the additional custody column (let's say column 11) may arrive via gossip as we're already subscribed to the topic, and it may be incorrectly used to satisfy the existing data availability requirement (10 columns), and result in this additional column (instead of a required one) getting persisted, resulting in database inconsistency.	2025-08-07 00:45:04 +00:00
Jimmy Chen	2aae08a8aa	Remove KZG verification on blobs fetched from the EL (#7771 ) Continuation of #7713, addresses comment about skipping KZG verification on EL fetched blobs: https://github.com/sigp/lighthouse/pull/7713#discussion_r2198542501	2025-07-25 06:49:50 +00:00
Jimmy Chen	4daa015971	Remove peer sampling code (#7768 ) Peer sampling has been completely removed from the spec. This PR removes our partial implementation from the codebase. https://github.com/ethereum/consensus-specs/pull/4393	2025-07-23 03:24:45 +00:00
Eitan Seri-Levi	db8b6be9df	Data column custody info (#7648 ) #7647 Introduces a new record in the blobs db `DataColumnCustodyInfo` When `DataColumnCustodyInfo` exists in the db this indicates that a recent cgc change has occurred and/or that a custody backfill sync is currently in progress (custody backfill will be added as a separate PR). When a cgc change has occurred `earliest_available_slot` will be equal to the slot at which the cgc change occurred. During custody backfill sync `earliest_available_slot` should be updated incrementally as it progresses. ~~Note that if `advertise_false_custody_group_count` is enabled we do not add a `DataColumnCustodyInfo` record in the db as that would affect the status v2 response.~~ (See comment https://github.com/sigp/lighthouse/pull/7648#discussion_r2212403389) ~~If `DataColumnCustodyInfo` doesn't exist in the db this indicates that we have fulfilled our custody requirements up to the DA window.~~ (It now always exists, and the slot will be set to `None` once backfill is complete) StatusV2 now uses `DataColumnCustodyInfo` to calculate the `earliest_available_slot` if a `DataColumnCustodyInfo` record exists in the db, if it's `None`, then we return the `oldest_block_slot`.	2025-07-22 13:30:30 +00:00
Jimmy Chen	b48879a566	Remove KZG verification from local block production and blobs fetched from the EL (#7713 ) #7700 As described in title, the EL already performs KZG verification on all blobs when they entered the mempool, so it's redundant to perform extra validation on blobs returned from the EL. This PR removes - KZG verification for both blobs and data columns during block production - KZG verification for data columns after fetch engine blobs call. I have not done this for blobs because it requires extra changes to check the observed cache, and doesn't feel like it's a worthy optimisation given the number of blobs per block. This PR does not remove KZG verification on the block publishing path yet.	2025-07-22 10:48:49 +00:00
ethDreamer	b43e0b446c	Final changes for `fusaka-devnet-2` (#7655 ) Closes #7467. This PR primarily addresses [the P2P changes](https://github.com/ethereum/EIPs/pull/9840) in [fusaka-devnet-2](https://fusaka-devnet-2.ethpandaops.io/). Specifically: * [the new `nfd` parameter added to the `ENR`](https://github.com/ethereum/EIPs/pull/9840) * [the modified `compute_fork_digest()` changes for every BPO fork](https://github.com/ethereum/EIPs/pull/9840) 90% of this PR was absolutely hacked together as fast as possible during the Berlinterop as fast as I could while running between Glamsterdam debates. Luckily, it seems to work. But I was unable to be as careful in avoiding bugs as I usually am. I've cleaned up the things I remember wanting to come back and have a closer look at. But still working on this. Progress: * [x] get it working on `fusaka-devnet-2` * [ ] [optional disconnect from peers with incorrect `nfd` at the fork boundary](https://github.com/ethereum/consensus-specs/pull/4407) - Can be addressed in a future PR if necessary * [x] first pass clean-up * [x] fix up all the broken tests * [x] final self-review * [x] more thorough review from people more familiar with affected code	2025-07-10 21:32:58 +00:00
Michael Sproul	c7bb3b00e4	Fix lookups of the block at `oldest_block_slot` (#7693 ) Closes: - https://github.com/sigp/lighthouse/issues/7690 Another checkpoint sync related fix! See issue for a description of the bug. We fix it by just loading the block root of the `oldest_block_slot`, rather than trying to load the slot prior, which will always fail.	2025-07-02 23:40:04 +00:00
Pawan Dhananjay	e305cb1b92	Custody persist fix (#7661 ) N/A Persist the epoch -> cgc values. This is to ensure that `ValidatorRegistrations::latest_validator_custody_requirement` always returns a `Some` value post restart assuming the `epoch_validator_custody_requirements` map has been updated in the previous runs.	2025-07-01 06:06:37 +00:00
chonghe	8e3c5d1524	Rust 1.89 compiler lint fix (#7644 ) Fix lints for Rust 1.89 beta compiler	2025-06-25 05:33:17 +00:00
Pawan Dhananjay	11bcccb353	Remove all prod eth1 related code (#7133 ) N/A After the electra fork which includes EIP 6110, the beacon node no longer needs the eth1 bridging mechanism to include new deposits as they are provided by the EL as a `deposit_request`. So after electra + a transition period where the finalized bridge deposits pre-fork are included through the old mechanism, we no longer need the elaborate machinery we had to get deposit contract data from the execution layer. Since holesky has already forked to electra and completed the transition period, this PR basically checks to see if removing all the eth1 related logic leads to any surprises.	2025-06-23 03:00:07 +00:00
Lion - dapplion	dd98534158	Hierarchical state diffs in hot DB (#6750 ) This PR implements https://github.com/sigp/lighthouse/pull/5978 (tree-states) but on the hot DB. It allows Lighthouse to massively reduce its disk footprint during non-finality and overall I/O in all cases. Closes https://github.com/sigp/lighthouse/issues/6580 Conga into https://github.com/sigp/lighthouse/pull/6744 ### TODOs - [x] Fix OOM in CI https://github.com/sigp/lighthouse/pull/7176 - [x] optimise store_hot_state to avoid storing a duplicate state if the summary already exists (should be safe from races now that pruning is cleaner) - [x] mispelled: get_ancenstor_state_root - [x] get_ancestor_state_root should use state summaries - [x] Prevent split from changing during ancestor calc - [x] Use same hierarchy for hot and cold ### TODO Good optimization for future PRs - [ ] On the migration, if the latest hot snapshot is aligned with the cold snapshot migrate the diffs instead of the full states. ``` align slot time 10485760 Nov-26-2024 12582912 Sep-14-2025 14680064 Jul-02-2026 ``` ### TODO Maybe things good to have - [ ] Rename anchor_slot https://github.com/sigp/lighthouse/compare/tree-states-hot-rebase-oom...dapplion:lighthouse:tree-states-hot-anchor-slot-rename?expand=1 - [ ] Make anchor fields not public such that they must be mutated through a method. To prevent un-wanted changes of the anchor_slot ### NOTTODO - [ ] Use fork-choice and a new method [`descendants_of_checkpoint`](`ca2388e196 (diff-046fbdb517ca16b80e4464c2c824cf001a74a0a94ac0065e635768ac391062a8)`) to filter only the state summaries that descend of finalized checkpoint]	2025-06-19 02:43:25 +00:00
Eitan Seri-Levi	6786b9d12a	Single attestation "Full" implementation (#7444 ) #6970 This allows for us to receive `SingleAttestation` over gossip and process it without converting. There is still a conversion to `Attestation` as a final step in the attestation verification process, but by then the `SingleAttestation` is fully verified. I've also fully removed the `submitPoolAttestationsV1` endpoint as it's been deprecated. I've also pre-emptively deprecated supporting `Attestation` in `submitPoolAttestationsV2` endpoint. See here for more info: https://github.com/ethereum/beacon-APIs/pull/531 I tried to minimize the diff here by only making the "required" changes. There are some unnecessary complexities with the way we manage the different attestation verification wrapper types. We could probably consolidate this to one wrapper type and refactor this even further. We could leave that to a separate PR if we feel like cleaning things up in the future. Note that I've also updated the test harness to always submit `SingleAttestation` regardless of fork variant. I don't see a problem in that approach and it allows us to delete more code :)	2025-06-17 09:01:26 +00:00
Daniel Knopik	5472cb8500	Batch verify KZG proofs for getBlobsV2 (#7582 )	2025-06-12 14:35:14 +00:00
Pawan Dhananjay	5f208bb858	Implement basic validator custody framework (no backfill) (#7578 ) Resolves #6767 This PR implements a basic version of validator custody. - It introduces a new `CustodyContext` object which contains info regarding number of validators attached to a node and the custody count they contribute to the cgc. - The `CustodyContext` is added in the da_checker and has methods for returning the current cgc and the number of columns to sample at head. Note that the logic for returning the cgc existed previously in the network globals. - To estimate the number of validators attached, we use the `beacon_committee_subscriptions` endpoint. This might overestimate the number of validators actually publishing attestations from the node in the case of multi BN setups. We could also potentially use the `publish_attestations` endpoint to get a more conservative estimate at a later point. - Anytime there's a change in the `custody_group_count` due to addition/removal of validators, the custody context should send an event on a broadcast channel. The only subscriber for the channel exists in the network service which simply subscribes to more subnets. There can be additional subscribers in sync that will start a backfill once the cgc changes. TODO - [ ] NOT REQUIRED: Currently, the logic only handles an increase in validator count and does not handle a decrease. We should ideally unsubscribe from subnets when the cgc has decreased. - [ ] NOT REQUIRED: Add a service in the `CustodyContext` that emits an event once `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS` passes after updating the current cgc. This event should be picked up by a subscriber which updates the enr and metadata. - [x] Add more tests	2025-06-11 18:10:06 +00:00
Pawan Dhananjay	076a1c3fae	Data column sidecar event (#7587 ) N/A Implement events for data column sidecar https://github.com/ethereum/beacon-APIs/pull/535	2025-06-11 16:39:22 +00:00
Jimmy Chen	e6ef644db4	Verify `getBlobsV2` response and avoid reprocessing imported data columns (#7493 ) #7461 and partly #6439. Desired behaviour after receiving `engine_getBlobs` response: 1. Gossip verify the blobs and proofs, but don't mark them as observed yet. This is because not all blobs are published immediately (due to staggered publishing). If we mark them as observed and not publish them, we could end up blocking the gossip propagation. 2. Blobs are marked as observed _either_ when: * They are received from gossip and forwarded to the network. * They are published by the node. Current behaviour: - ❗ We only gossip verify `engine_getBlobsV1` responses, but not `engine_getBlobsV2` responses (PeerDAS). - ❗ After importing EL blobs AND before they're published, if the same blobs arrive via gossip, they will get re-processed, which may result in a re-import. 1. Perform gossip verification on data columns computed from EL `getBlobsV2` response. We currently only do this for `getBlobsV1` to prevent importing blobs with invalid proofs into the `DataAvailabilityChecker`, this should be done on V2 responses too. 2. Add additional gossip verification to make sure we don't re-process a ~~blob~~ or data column that was imported via the EL `getBlobs` but not yet "seen" on the gossip network. If an "unobserved" gossip blob is found in the availability cache, then we know it has passed verification so we can immediately propagate the `ACCEPT` result and forward it to the network, but without re-processing it. UPDATE: I've left blobs out for the second change mentioned above, as the likelihood and impact is very low and we haven't seen it enough, but under PeerDAS this issue is a regular occurrence and we do see the same block getting imported many times.	2025-05-26 19:55:58 +00:00
ethDreamer	7684d1f866	ContextDeserialize and Beacon API Improvements (#7372 ) * #7286 * BeaconAPI is not returning a versioned response when it should for some V1 endpoints * these [strange functions with vX in the name that still accept `endpoint_version` arguments](https://github.com/sigp/lighthouse/blob/stable/beacon_node/http_api/src/produce_block.rs#L192) This refactor is a prerequisite to get the fulu EF tests running.	2025-05-19 05:05:16 +00:00
Eitan Seri-Levi	268809a530	Rust clippy 1.87 lint fixes (#7471 ) Fix clippy lints for `rustc` 1.87. Clippy complains about `BeaconChainError` being too large. I went on a bit of a boxing spree because of this. We may instead want to `Box` some of the `BeaconChainError` variants?	2025-05-16 05:03:00 +00:00
SunnysidedJ	593390162f	`peerdas-devnet-7`: update `DataColumnSidecarsByRoot` request to use `DataColumnsByRootIdentifier` (#7399 ) Update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier #7377 As described in https://github.com/ethereum/consensus-specs/pull/4284	2025-05-12 00:20:55 +00:00
Mac L	39eb8145f8	Merge branch 'release-v7.0.0' into unstable	2025-04-11 21:32:24 +10:00
Eitan Seri-Levi	aed562abef	Downgrade light client errors (#7300 ) Downgrade light client errors to debug. Error messages are alarming and usually indicate something's wrong with the beacon node. The Light Client service is supposed to minimally impact users, and most will not care if the light client server is erroring. Furthermore, the only errors we've seen in the wild are during hard forks, for the first few epochs before the fork finalizes.	2025-04-10 02:17:07 +00:00
SunnysidedJ	d96b73152e	Fix for #6296 : Deterministic RNG in peer DAS publish block tests (#7192 ) #6296: Deterministic RNG in peer DAS publish block tests. Test functions now call the publish-block APIs with true for the deterministic RNG boolean parameter, while production code passes false. This deterministically shuffles columns for unit tests under broadcast_validation_tests.rs.	2025-04-09 15:35:15 +00:00
Jimmy Chen	759b0612b3	Offloading KZG Proof Computation from the beacon node (#7117 ) Addresses #7108 - Add EL integration for `getPayloadV5` and `getBlobsV2` - Offload proof computation and use proofs from EL RPC APIs	2025-04-08 07:37:16 +00:00
Lion - dapplion	70850fe58d	Drop head tracker for summaries DAG (#6744 ) The head tracker is a persisted piece of state that must be kept in sync with the fork-choice. It has been a source of pruning issues in the past, so we want to remove it - see https://github.com/sigp/lighthouse/issues/1785 When implementing tree-states in the hot DB we have to change the pruning routine (more details below) so we want to do those changes first in isolation. - see https://github.com/sigp/lighthouse/issues/6580 - If you want to see the full feature of tree-states hot https://github.com/dapplion/lighthouse/pull/39 Closes https://github.com/sigp/lighthouse/issues/1785 Current DB migration routine - Locate abandoned heads with head tracker - Use a roots iterator to collect the ancestors of those heads that can be pruned - Delete those abandoned blocks / states - Migrate the newly finalized chain to the freezer In summary, it computes what it has to delete and keeps the rest. Then it migrates data to the freezer. If the abandoned forks routine has a bug it can break the freezer migration. Proposed migration routine (this PR) - Migrate the newly finalized chain to the freezer - Load all state summaries from disk - From those, just knowing the head and finalized block compute two sets: (1) descendants of finalized (2) newly finalized chain - Iterate all summaries, if a summary does not belong to set (1) or (2), delete This strategy is more sound as it just checks what's there in the hot DB, computes what it has to keep and deletes the rest. Because it does not rely on additional pieces of data we can drop the head tracker and pruning checkpoint. Since the DB migration happens first now, as long as the computation of the sets to keep is correct we won't have pruning issues.	2025-04-07 04:23:52 +00:00
Lion - dapplion	d511ca0494	Compute roots for unfinalized by_range requests with fork-choice (#7098 ) Includes PRs - https://github.com/sigp/lighthouse/pull/7058 - https://github.com/sigp/lighthouse/pull/7066 Cleaner for the `release-v7.0.0` branch	2025-04-07 03:16:41 +00:00

792 Commits