lighthouse

mirror of https://github.com/sigp/lighthouse.git synced 2026-05-31 13:17:09 +00:00

Author	SHA1	Message	Date
Eitan Seri-Levi	7e502a5e65	Fix	2026-05-01 18:23:33 +02:00
Eitan Seri-Levi	5ce7c59f5e	Rename validate_full_data_columns_with_commitments	2026-05-01 11:10:15 +02:00
dapplion	0ce058835a	Address review comments - data_availability_checker.rs: use !gloas_enabled() instead of < ForkName::Gloas (jimmygchen, dapplion). - beacon_chain.rs: get_data_columns checks data_availability_checker first, then pending_payload_cache (dapplion). - pending_components.rs: merge_data_columns drops the unused Result return (jimmygchen). num_completed_columns uses filter() instead of filter_map (jimmygchen). - pending_column.rs: TODO marker on the hard-coded Gloas variant in try_to_sidecar (jimmygchen). - pending_payload_cache/mod.rs: gloas_spec test helper collapsed to ForkName::Gloas.make_genesis_spec(E::default_spec()) (jimmygchen). - gossip_methods.rs / sync/manager.rs: replace UnknownBlockHashFromAttestation fallback with TODO(gloas) for proper Gloas lookup sync (dapplion).	2026-05-01 10:16:06 +02:00
dapplion	dac8a6ec8d	Gloas: fix test failures (KZG verifier wiring, harness columns, WSS sync) Brings the FORK_NAME=gloas beacon_chain test suite from 31 failures to green: - v1 KZG batch verifier couldn't verify Gloas columns. Added verify_columns_against_block helper that picks commitments per fork (Fulu: inline on column; Gloas: signed_execution_payload_bid). - BeaconChainHarness::process_envelope didn't persist columns. Now mirrors what production does in import_available_execution_payload_envelope. - get_or_reconstruct_blobs returned an error for Gloas. Now short-circuits to Ok(None); WSS test copies columns from source to dest directly. - update_data_column_signed_header (block_verification tests) only handled Fulu shape. Added a Gloas branch that re-keys to canonical_root. - BlockError::EnvelopeBlockRootUnknown changed to tuple variant. - Removed duplicate process_payload_envelope_availability.	2026-05-01 10:06:52 +02:00
Daniel Knopik	ae17107f78	fix test runs	2026-04-30 09:45:59 +02:00
Daniel Knopik	7cf76ac7af	clean up	2026-04-29 17:38:49 +02:00
Daniel Knopik	d7f5e24ede	nuke router	2026-04-29 14:24:25 +02:00
Daniel Knopik	132f94c91c	clean up claude progress	2026-04-28 17:39:43 +02:00
Daniel Knopik	4535753c9b	starting to cell-ize	2026-04-28 17:09:42 +02:00
Daniel Knopik	3a5492fba7	initial straightforward merge changes	2026-04-28 17:09:41 +02:00
Eitan Seri-Levi	82dde267b5	range sync	2026-04-28 15:39:40 +02:00
Daniel Knopik	8a384ff445	Cell Dissemination (Partial messages) (#8314 ) - https://github.com/ethereum/consensus-specs/pull/4558 - https://eips.ethereum.org/EIPS/eip-8136 Co-Authored-By: Daniel Knopik <daniel@dknopik.de> Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2026-04-23 18:52:28 +00:00
Eitan Seri- Levi	2f8e140a9e	Resolve merge conflicts	2026-03-16 02:24:50 -07:00
ethDreamer	6ca610d918	Breakup RPCBlock into LookupBlock & RangeSyncBlock (#8860 ) Co-Authored-By: Mark Mackey <mark@sigmaprime.io>	2026-03-13 19:22:29 +00:00
Eitan Seri- Levi	23a7dc561f	Fix	2026-02-10 21:13:40 -08:00
Eitan Seri- Levi	abf0c33e12	Refactor	2026-02-10 21:08:31 -08:00
Eitan Seri- Levi	0a111f51af	Resolve merge conflicts	2026-02-03 21:27:18 -08:00
Pawan Dhananjay	e50bab098e	Remove state lru cache (#8724 ) N/A In https://github.com/sigp/lighthouse/pull/4801 , we added a state lru cache to avoid having too many states in memory which was a concern with 200mb+ states pre tree-states. With https://github.com/sigp/lighthouse/pull/5891 , we made the overflow cache a simpler in memory lru cache that can only hold 32 pending states at the most and doesn't flush anything to disk. As noted in #5891, we can always fetch older blocks which never became available over rpc if they become available later. Since we merged tree states, I don't think the state lru cache is relevant anymore. Instead of having the `DietAvailabilityPendingExecutedBlock` that stores only the state root, we can just store the full state in the `AvailabilityPendingExecutedBlock`. Given entries in the cache can span max 1 epoch (cache size is 32), the underlying `BeaconState` objects in the cache share most of their memory. The state_lru_cache is one level of indirection that doesn't give us any benefit. Please check me on this cc @dapplion Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>	2026-02-04 04:55:53 +00:00
Jimmy Chen	1dd0f7bcbb	Remove `kzg_commitments` from `DataColumnSidecarGloas` (#8739 ) Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com> Co-Authored-By: Michael Sproul <michael@sigmaprime.io> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>	2026-02-04 03:37:05 +00:00
Eitan Seri-Levi	3ecf964385	Replace `INTERVALS_PER_SLOT` with explicit slot component times (#7944 ) https://github.com/ethereum/consensus-specs/pull/4476 Co-Authored-By: Barnabas Busa <barnabas.busa@ethereum.org> Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com> Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com> Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2026-02-02 05:58:42 +00:00
Eitan Seri- Levi	78c61a0621	DA cache updated	2026-01-30 10:59:43 -08:00
Eitan Seri- Levi	6ea966846c	Some test fixes	2026-01-29 11:02:55 -08:00
Eitan Seri- Levi	e9f9ad6c45	Small rename	2026-01-28 16:46:25 -08:00
Eitan Seri- Levi	aba7d45fd5	Resolve merge conflicts	2026-01-27 22:49:08 -08:00
Eitan Seri- Levi	4a9aaa2b46	Resolve merge conflicts	2026-01-27 22:46:46 -08:00
Eitan Seri-Levi	f7b5c7ee3f	Convert RpcBlock to an enum that indicates availability (#8424 ) Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Mark Mackey <mark@sigmaprime.io> Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2026-01-28 05:59:32 +00:00
Eitan Seri-Levi	9bec8df37a	Add Gloas data column support (#8682 ) Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>	2026-01-28 04:52:12 +00:00
Eitan Seri-Levi	d9c21f5e33	Add da router, and initial logic	2026-01-27 19:32:30 -08:00
Mac L	58b153cac5	Remove remaining facade module re-exports from `consensus/types` (#8672 ) Removes the remaining facade re-exports from `consensus/types`. I have left `graffiti` as I think it has some utility so am leaning towards keeping it in the final API design. Co-Authored-By: Mac L <mjladson@pm.me>	2026-01-16 19:51:29 +00:00
Mac L	3903e1c67f	More `consensus/types` re-export cleanup (#8665 ) Remove more of the temporary re-exports from `consensus/types` Co-Authored-By: Mac L <mjladson@pm.me>	2026-01-16 04:43:05 +00:00
Jimmy Chen	af1d9b9991	Fix custody context initialization race condition that caused panic (#8391) Take 2 of #8390. Fixes the race condition properly instead of propagating the error. I think this is a better alternative, and doesn't seem to look that bad. * Lift node id loading or generation from `NetworkService` startup to the `ClientBuilder`, so that it can be used to compute custody columns for the beacon chain without waiting for Network bootstrap. I've considered and implemented a few alternatives: 1. passing `node_id` to beacon chain builder and computing columns when creating `CustodyContext`. This approach isn't good for separation of concerns and isn't great for testability. 2. passing `ordered_custody_groups` to beacon chain. `CustodyContext` only uses this to compute ordered custody columns, so we might as well lift this logic out, so we don't have to do error handling in `CustodyContext` construction. Fewer tests to update. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2025-11-17 05:23:12 +00:00
kevaundray	613ce3c011	chore!: remove pub visibility on `OVERFLOW_LRU_CAPACITY` and `STATE_LRU_CAPACITY_NON_ZERO` (#8234 ) - Renames `OVERFLOW_LRU_CAPACITY` to `OVERFLOW_LRU_CAPACITY_NON_ZERO` to follow naming convention of `STATE_LRU_CAPACITY_NON_ZERO` - Makes `OVERFLOW_LRU_CAPACITY_NON_ZERO` and `STATE_LRU_CAPACITY_NON_ZERO` private since they are only used in this module - Moves `STATE_LRU_CAPACITY` into test module since it is only used for tests Co-Authored-By: Kevaundray Wedderburn <kevtheappdev@gmail.com>	2025-10-27 11:23:45 +00:00
Pawan Dhananjay	c668cb7d9a	Only publish reconstructed columns that we need to sample (#8269) N/A We were publishing all columns that we didn't already have in the da cache when reconstructing. This is unnecessary outbound bandwidth for a node that is supposed to sample fewer columns. This PR changes the behaviour to publish only columns that we are supposed to sample in the topics that we are subscribed to. Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>	2025-10-23 05:05:08 +00:00
Jimmy Chen	43c5e924d7	Add `--semi-supernode` support (#8254) Addresses #8218 A simplified version of #8241 for the initial release. I've tried to minimise the logic change in this PR, although introducing the `NodeCustodyType` enum still results in quite a bit of diff, but the actual logic change in `CustodyContext` is quite small. The main changes are in the `CustodyContext` struct * ~~combining `validator_custody_count` and `current_is_supernode` fields into a single `custody_group_count_at_head` field. We persist the cgc of the initial cli values into the `custody_group_count_at_head` field and only allow for increase (same behaviour as before).~~ * I noticed the above approach caused a backward compatibility issue, I've [made a fix](`15569bc085`) and changed the approach slightly (which was actually what I had originally in mind): * when initialising, only override the `validator_custody_count` value if either flag `--supernode` or `--semi-supernode` is used; otherwise leave it as the existing default `0`. Most other logic remains unchanged. All existing validator custody unit tests are still passing, and I've added additional tests to cover semi-supernode, and restoring `CustodyContext` from disk. Note: I've added a `WARN` if the user attempts to switch to a `--semi-supernode` or `--supernode` - this currently has no effect, but once @eserilev column backfill is merged, we should be able to support this quite easily. Things to test - [x] cgc in metadata / enr - [x] cgc in metrics - [x] subscribed subnets - [x] getBlobs endpoint Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2025-10-22 05:23:17 +00:00
Pawan Dhananjay	73e75e3e69	Ignore extra columns in da cache (#8201) N/A Found this issue in sepolia. Note: the custody requirement for this node is 100. ``` Oct 14 11:25:40.053 DEBUG Reconstructed columns count: 28, block_root: 0x4d7946dec0ab59f2afd46610d7c54af555cb4c2851d9eea7d83dd17cf6e96aae, slot: 8725628 Oct 14 11:25:45.568 WARN Internal availability check failure block_root: 0x4d7946dec0ab59f2afd46610d7c54af555cb4c2851d9eea7d83dd17cf6e96aae, error: Unexpected("too many columns got 128 expected 100") ``` So if any of the block components arrives late, then we reconstruct all 128 columns and try to add them to the da cache and have more columns than needed for availability in the cache. There are 2 ways I can think of fixing this: 1. pass only the required columns to the da cache after reconstruction here `60df5f4ab6/beacon_node/beacon_chain/src/data_availability_checker.rs (L647-L648)` 2. Ensure that we add only columns that we need to sample in the da cache. I think this is safer since we can add columns to the cache from multiple code paths and this fixes it at the source. ~~This PR implements (2).~~ Thought more about it, I think (1) is cleaner since we filter gossip and rpc columns also before calling `put_kzg_verified_data_columns`. Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>	2025-10-16 09:25:44 +00:00
Lion - dapplion	ffa7b2b2b9	Only mark block lookups as pending if block is importing from gossip (#8112 ) - PR https://github.com/sigp/lighthouse/pull/8045 introduced a regression of how lookup sync interacts with the da_checker. Now in unstable block import from the HTTP API also insert the block in the da_checker while the block is being execution verified. If lookup sync finds the block in the da_checker in `NotValidated` state it expects a `GossipBlockProcessResult` message sometime later. That message is only sent after block import in gossip. I confirmed in our node's logs for 4/4 cases of stuck lookups are caused by this sequence of events: - Receive block through API, insert into da_checker in fn process_block in put_pre_execution_block - Create lookup and leave in AwaitingDownload(block in processing cache) state - Block from HTTP API finishes importing - Lookup is left stuck Closes https://github.com/sigp/lighthouse/issues/8104 - https://github.com/sigp/lighthouse/pull/8110 was my initial solution attempt but we can't send the `GossipBlockProcessResult` event from the `http_api` crate without adding new channels, which seems messy. For a given node it's rare that a lookup is created at the same time that a block is being published. This PR solves https://github.com/sigp/lighthouse/issues/8104 by allowing lookup sync to import the block twice in that case. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>	2025-09-25 03:52:27 +00:00
Jimmy Chen	78d330e4b7	Consolidate `reqresp_pre_import_cache` into `data_availability_checker` (#8045 ) This PR consolidates the `reqresp_pre_import_cache` into the `data_availability_checker` for the following reasons: - the `reqresp_pre_import_cache` suffers from the same TOCTOU bug we had with `data_availability_checker` earlier, and leads to unbounded memory leak, which we have observed over the last 6 months on some nodes. - the `reqresp_pre_import_cache` is no longer necessary, because we now hold blocks in the `data_availability_checker` for longer since (#7961), and recent blocks can be served from the DA checker. This PR also maintains the following functionalities - Serving pre-executed blocks over RPC, and they're now served from the `data_availability_checker` instead. - Using the cache for de-duplicating lookup requests. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>	2025-09-19 07:01:13 +00:00
Michael Sproul	3543a20192	Add experimental complete-blob-backfill flag (#7751 ) A different (and complementary) approach for: - https://github.com/sigp/lighthouse/issues/5391 This PR adds a flag to set the DA boundary to the Deneb fork. The effect of this change is that Lighthouse will try to backfill _all_ blobs. Most peers do not have this data, but I'm thinking that combined with `trusted-peers` this could be quite effective. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-09-18 05:17:03 +00:00
Jimmy Chen	3de646c8b3	Enable reconstruction for nodes custodying more than 50% of columns and instrument tracing (#8052 ) Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>	2025-09-16 08:17:43 +00:00
Jimmy Chen	eef02afc93	Fix data availability checker race condition causing partial data columns to be served over RPC (#7961) Partially resolves #6439, a simpler alternative to #7931. The race condition occurs when RPC data columns arrive after a block has been imported and removed from the DA checker: 1. Block becomes available via gossip 2. RPC columns arrive and pass fork choice check (block hasn't been imported) 3. Block import completes (removing block from DA checker) 4. RPC data columns finish verification and get imported into DA checker This causes two issues: 1. Partial data serving: Already imported components get re-inserted, potentially causing LH to serve incomplete data 2. State cache misses: Leads to state reconstruction, holding the availability cache write lock longer and increasing race likelihood ### Proposed Changes 1. Never manually remove pending components from DA checker. Components are only removed via LRU eviction as finality advances. This makes sure we don't run into the issue described above. 2. Use `get` instead of `pop` when recovering the executed block; this prevents cache misses in the race condition. This should reduce the likelihood of the race condition 3. Refactor DA checker to drop write lock as soon as components are added. This should also reduce the likelihood of the race condition Trade-offs: This solution eliminates a few nasty race conditions while allowing simplicity, with the cost of allowing block re-import (already existing). The increase in memory in DA checker can be partially offset by a reduction in block cache size if this really becomes an issue (as we now serve recent blocks from DA checker).	2025-09-02 07:18:23 +00:00
Jimmy Chen	a134d43446	Use `rayon` to speed up batch KZG verification (#7921 ) Addresses #7866. Use Rayon to speed up batch KZG verification during range / backfill sync. While I was analysing the traces, I also discovered a bug that resulted in only the first 128 columns in a chain segment batch being verified. This PR fixes it, so we might actually observe slower range sync due to more cells being KZG verified. I've also updated the handling of batch KZG failure to only find the first invalid KZG column when verification fails as this gets very expensive during range/backfill sync.	2025-08-29 00:59:40 +00:00
Jimmy Chen	daf1c7c3af	Fix RPC blocks not getting fully KZG verified (#7927 ) Fix RPC blocks not getting fully KZG verified due to incorrect list truncation.	2025-08-25 16:46:16 +00:00
Jimmy Chen	b4704eab4a	Fulu update to spec v1.6.0-alpha.4 (#7890 ) Fulu update to spec [v1.6.0-alpha.4](https://github.com/ethereum/consensus-specs/releases/tag/v1.6.0-alpha.4). - Make `number_of_columns` a preset - Optimise `get_custody_groups` to avoid computing if cgc = 128 - Add support for additional typenum values in type_dispatch macro	2025-08-20 02:05:04 +00:00
chonghe	522bd9e9c6	Update Rust Edition to 2024 (#7766 ) * #7749 Thanks @dknopik and @michaelsproul for your help!	2025-08-13 03:04:31 +00:00
Jimmy Chen	40c2fd5ff4	Instrument tracing spans for block processing and import (#7816) #7815 - removes all existing spans, so some span fields that appear in logs like `service_name` may be lost. - instruments a few key code paths in the beacon node, starting from root spans named below: * Gossip block and blobs * `process_gossip_data_column_sidecar` * `process_gossip_blob` * `process_gossip_block` * Rpc block and blobs * `process_rpc_block` * `process_rpc_blobs` * `process_rpc_custody_columns` * Rpc blocks (range and backfill) * `process_chain_segment` * `PendingComponents` lifecycle * `pending_components` To test locally: * Run Grafana and Tempo with https://github.com/sigp/lighthouse-metrics/pull/57 * Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317` Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg Removing the old spans seems to have reduced the memory usage quite a lot - I think we were using them on long running tasks and too excessively.	2025-08-08 05:32:22 +00:00
Jimmy Chen	3a02bdd94a	Adjust DA checker cache size (#7825 ) The current `OVERFLOW_LRU_CAPACITY` of `1024` seems a bit excessive now we rarely store more than 1 `PendingComponents` (under normal networking components). Additionally given the blob count increases, the max size of `PendingComponents` has also increased and is expected to increase further. This PR brings the max capacity of the cache down to `64`, which should be more than enough headroom but also give us better protection from the network.	2025-08-07 05:11:38 +00:00
Jimmy Chen	8bc6693dac	Fix wrong columns getting processed on a CGC change (#7792) This PR fixes a bug where wrong columns could get processed immediately after a CGC increase. Scenario: - The node's CGC increased due to additional validators attached to it (let's say from 10 to 11) - The new CGC is advertised and new subnets are subscribed immediately, however the change won't be effective in the data availability check until the next epoch (See [this](`ab0e8870b4/beacon_node/beacon_chain/src/validator_custody.rs (L93-L99)`)). Data availability checker still only requires 10 columns for the current epoch. - During this time, data columns for the additional custody column (let's say column 11) may arrive via gossip as we're already subscribed to the topic, and it may be incorrectly used to satisfy the existing data availability requirement (10 columns), and result in this additional column (instead of a required one) getting persisted, resulting in database inconsistency.	2025-08-07 00:45:04 +00:00
Jimmy Chen	2aae08a8aa	Remove KZG verification on blobs fetched from the EL (#7771 ) Continuation of #7713, addresses comment about skipping KZG verification on EL fetched blobs: https://github.com/sigp/lighthouse/pull/7713#discussion_r2198542501	2025-07-25 06:49:50 +00:00
Jimmy Chen	fcc602a787	Update fulu network configs and add `MIN_EPOCHS_FOR_DATA_COLUMN_SIDECARS_REQUESTS` (#7646 ) - #6240 - Bring built-in network configs up to date with latest consensus-spec PeerDAS configs. - Add `MIN_EPOCHS_FOR_DATA_COLUMN_SIDECARS_REQUESTS` and use it to determine data availability window after the Fulu fork.	2025-07-02 02:38:25 +00:00
Daniel Knopik	5472cb8500	Batch verify KZG proofs for getBlobsV2 (#7582 )	2025-06-12 14:35:14 +00:00
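
The fix described in the 73e75e3e69 entry above (pass only the required columns to the da cache after reconstruction) can be sketched roughly as follows. This is an illustrative sketch only, not the code from that commit: `Column`, `filter_to_sampling_columns` and the `sampling_indices` set are hypothetical names, and real Lighthouse types carry far more than an index.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a reconstructed data column.
/// Only the column index matters for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Column {
    pub index: u64,
}

/// Keep only the reconstructed columns whose index is in the set of
/// columns this node is expected to sample (an assumed input here).
/// Everything else is dropped before it would reach the cache.
pub fn filter_to_sampling_columns(
    reconstructed: Vec<Column>,
    sampling_indices: &BTreeSet<u64>,
) -> Vec<Column> {
    reconstructed
        .into_iter()
        .filter(|column| sampling_indices.contains(&column.index))
        .collect()
}
```

With 128 reconstructed columns and a sampling set of 100 indices, as in the logged failure, the filter would hand 100 columns to the cache instead of 128.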

126 Commits