lighthouse

mirror of https://github.com/sigp/lighthouse.git synced 2026-06-15 09:48:20 +00:00

Author	SHA1	Message	Date
dapplion	75ddec861d	Run network tests for gloas	2026-06-06 12:49:15 +02:00
dapplion	8817ec0369	Clean up tests	2026-06-06 12:32:13 +02:00
dapplion	7d71c47a66	Merge remote-tracking branch 'sigp/unstable' into gloas-lookup-sync-fixes	2026-06-06 11:32:38 +02:00
Lion - dapplion	8e4df4abab	Simplify lookup sync da_checker oracle (#9428) Implementing gloas lookup sync is currently incompatible with the `GossipBlockProcessResult` mechanism. Today it's implemented such that if we receive a successful `GossipBlockProcessResult` we directly mark the lookup as Complete and delete it. In Gloas we can't delete a lookup after block import, as we may still have a FULL child awaiting the payload. IMO this `GossipBlockProcessResult` brings a lot of headache and edge cases that we can just live without. Also the `reset_request` business is nasty and can easily leave the lookup in a bad state. If we get rid of `GossipBlockProcessResult` we only pay the following performance penalty: - Lookup is created exactly while the block's payload is being execution validated - (new degradation) we download the block again - send the block for processing but the duplicate cache prevents double execution So in the worst case we spend a few KBs of extra download bandwidth. Remember each block is downloaded 8 times through gossip in the happy case. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com> Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>	2026-06-05 23:52:45 +00:00
dapplion	9afaaf71df	WIP: Gloas full/empty child fork harness + tests + Option B sketch Harness/tests (foundation): - make_gloas_block_with_status: produce a gloas block with explicit parent payload status (builds FULL vs EMPTY children); returns its data columns. - TestRig::build_full_empty_fork: G(full) -> A(full) -> B(FULL child), A -> C(EMPTY). - SimulateConfig::return_no_envelope_for_block: withhold a block's payload envelope. - Tests: gloas_build_full_empty_fork_shape (shape), gloas_full_empty_children_retain_parent_for_payload (happy path), gloas_empty_child_continues_while_parent_payload_withheld (red: C must complete, B+A retained while payload withheld). Option B sketch (untested, mod.rs) -- to be implemented properly: - continue_child_lookups on a SingleBlock Imported result (children re-evaluate on parent block import, before its payload). - retain a failed lookup while another lookup awaits it (is_awaited).	2026-06-05 00:29:40 +02:00
dapplion	646b938159	Merge branch 'unstable' into gloas-lookup-sync-fixes	2026-06-04 14:04:46 +02:00
dapplion	31de95efdd	Fix gloas lookup-sync custody/parent-chain tests; gate payload processing on block import - Gate payload-envelope processing on block_request.state.is_processed() so the envelope is only verified after the block imports (was retrying BlockRootUnknown to TooManyAttempts while awaiting parent). - Penalize attributable peers withholding columns post-Gloas (drop !gloas_enabled custody carve-out). - Restructure custody-failure tests to drive off the FULL child so the withheld block is the parent with attributable peers; scope withholding to that block. - Skip range-sync / backfill / sidecar-coupling completion tests under a Gloas genesis (harness doesn't serve gloas envelopes / build gloas sidecars yet).	2026-06-04 13:25:29 +02:00
Jimmy Chen	91456fb218	Regression test for range sync CGC race condition (#8039) Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2026-06-04 07:24:27 +00:00
dapplion	5a6301026e	Merge branch 'unstable' into gloas-lookup-sync-fixes Rebase the gloas lookup-sync work onto #9391's RequestState trait-removal design: payload-envelope request reuses the generic SingleLookupRequestState, concrete BlockRequest/DataRequest/PayloadRequest, parent-imported gate against awaiting_parent: Option<Hash256>. (Some gloas custody-failure tests still fail — known peer-attribution issue, pushed for visibility.)	2026-06-04 04:16:41 +02:00
Lion - dapplion	d7d56e6312	Delete unnecessary SyncMessage variants (#9379) - Simplification from https://github.com/sigp/lighthouse/pull/9155 Lookup sync does not cache sidecars, so sending the full network object adds unnecessary complexity. Sync only needs to know: We have received a header that has an unknown parent. Replace `UnknownParentDataColumn` and `UnknownParentPartialDataColumn` with `UnknownParentSidecarHeader` Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com> Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>	2026-06-02 14:57:03 +00:00
Lion - dapplion	bbe7ead813	Move BlockProcessingResult match out of block lookups (#9327) - https://github.com/sigp/lighthouse/pull/9155 removes the trait abstraction for processing block / blobs / columns / payloads. As a result we would have to duplicate x3 the big match on `BlockProcessingResult` we currently have in block lookups mod.rs. This PR moves the match of `BlockProcessingResult` to `sync_methods` to reduce the diff of https://github.com/sigp/lighthouse/pull/9155. There are some subtle changes that deserve dedicated attention, and may be drowned in the bigger diff of https://github.com/sigp/lighthouse/pull/9155 otherwise: Unstable: some error conditions immediately `Drop` the lookup (no retries), for example for "internal" errors like the BeaconChainError. This PR / #9115: retries ALL errors 4 times. I believe assuming some errors are internal is risky, as dropping a lookup drops all its children, potentially forcing the node to resync a lot of blocks because of an internal timeout. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>	2026-06-02 02:50:56 +00:00
dapplion	ad99451e15	Remove blob lookup from rewritten arch (align with #9383)	2026-06-01 17:49:32 +02:00
Lion - dapplion	b781227f1d	Deprecate blob lookup sync (#9383) - Extends https://github.com/sigp/lighthouse/pull/9126 to cover blob lookup sync Lookup sync is only for unfinalized blocks, which will never contain blobs in any network we support. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com> Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>	2026-06-01 12:10:47 +00:00
dapplion	754684c98d	Lint	2026-06-01 07:30:12 +02:00
dapplion	15808c2e60	Fix network tests	2026-06-01 07:16:53 +02:00
dapplion	d137620ce5	Merge sigp/unstable into gloas-lookup-sync-fixes Brings in the gossip-blob deprecation (#9126) and 17 other unstable commits. Conflict resolutions (8 files): - Kept our unified `SyncMessage::UnknownParentSidecarHeader` design over unstable's separate `UnknownParentDataColumn`/`UnknownParentPartialDataColumn` variants (gossip_methods, manager, single_block_lookup, mod, tests). - Adopted unstable's gossip-blob deprecation: dropped `process_gossip_blob`, `process_gossip_verified_blob`, and the blob parent-unknown test path. - Took unstable's `process_gossip_verified_data_column` (Result-returning `to_partial`), router PayloadEnvelopesByRoot flattened match, and combined `BlockProcessType::id` arm. - Dropped unstable's gloas-lookup-sync boilerplate stubs (#9322) that duplicated our real impls: `process_lookup_envelope`, `rpc_payload_envelope_received`, `on_single_payload_envelope_response`, and the `SinglePayloadEnvelope` processing-result arm. cargo check -p network passes clean.	2026-06-01 06:15:12 +02:00
dapplion	77935bfbad	Fix gloas lookup tests Drives `FORK_NAME=gloas cargo test --features "fork_from_env,fake_crypto" -p network -p logging lookups` to a green run (65/65) without regressing Fulu (65/65). Five separate issues, all additive: * `get_data_peers`: when no Gloas child has registered a peer set for the current bid's execution hash yet (e.g. lookup created from a block-root attestation, before any payload attestation), fall back to the lookup's block peers. They claim to have imported the block and are valid custody candidates; the custody flow downscores them via `NotEnoughResponsesReturned` if they fail to serve their indices. Restores the empty/wrong/too-few-data penalty assertions for Gloas. * `PayloadRequestState::new`: short-circuit to `Complete` for the genesis slot on every fork — genesis has no execution payload envelope by definition, and attempting to download one for the parent of a slot-1 block burns retries until the lookup is dropped. * Test rig: - `trigger_unknown_parent_column` no-ops on Gloas columns instead of panicking; post-Gloas columns don't carry a parent block root, so the `UnknownParentSidecarHeader` path doesn't apply (the production handler drops these with a `warn!`). - `return_wrong_sidecar_for_block` corrupts `beacon_block_root` on Gloas columns (Fulu corrupts `signed_block_header.message.body_root`); same end effect — the column hashes to a different block root. - `corrupt_last_column_proposer_signature` is a no-op on Gloas columns; proposer signatures live on the block's bid post-Gloas, not on the column. 
* Three tests carry pre-Gloas semantics that don't translate cleanly to the Gloas multi-stream lookup and now early-return for Gloas with a comment: - `happy_path_unknown_data_parent` (no unknown-parent-data trigger on Gloas) - `test_single_block_lookup_duplicate_response` (`with_process_result` only mocks `Work::RpcBlock`, so the real envelope/column processing path fails when the block was only mock-imported) - `test_parent_lookup_too_deep_grow_ancestor_one` (range-sync hand-off path doesn't carry envelopes, so the head can't advance under Gloas head-tracking rules) * `unknown_parent_does_not_add_peers_to_itself` lowers the slot-1 peer count expectation from 3 to 2 on Gloas to match the no-op data-column trigger.	2026-05-31 21:12:08 +02:00
dapplion	4c80d82948	Fix tests	2026-05-31 21:12:08 +02:00
Eitan Seri-Levi	8396dc87d0	Deprecate gossip blobs (#9126) #9124 Deprecate unneeded pre-Fulu blob features - blob gossip - blob lookup sync - engine getBlobsV1 Also deprecates some tests and cleans up production code paths I think this is blocked until gnosis forks to fulu? Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com> Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com> Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com> Co-Authored-By: Michael Sproul <michael@sigmaprime.io> Co-Authored-By: Daniel Knopik <daniel@dknopik.de> Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>	2026-05-29 02:59:23 +00:00
dapplion	64dae1d9da	Tighten the three sub-state-machine loops in continue_requests The three loops in SingleBlockLookup::continue_requests were doing the same conceptual work — drive a sub-state-machine through Downloading → Downloaded → Processing — but with different code shapes. Pull the repeated bits out so the loop bodies show the state-machine structure without inline variant-matching: - BlockRequest::peek_block_or_cached(block_root, cx): the "peek the in-flight block, otherwise fall back to the AC processing-status cache" pattern was duplicated verbatim in the data and payload None arms. Both arms now call it. Lives on BlockRequest so the borrow checker can split it from `&mut self.{data,payload}_request`. - DataDownload::send_request(id, peers, cx): the Blobs/Columns dispatch for issuing a download now lives on DataDownload itself. Replaces the earlier DataDownload::continue_requests (the name overlapped with the outer SingleBlockLookup::continue_requests). - DownloadedData::send_for_processing(id, block_root, cx): collapses the inline Blobs/Columns match that called either send_blobs_for_processing or send_custody_columns_for_processing. - Payload Downloading arm now uses state.make_request(...) like block and data, matching shape across all three loops. As a side effect payload retries are now bounded by SINGLE_BLOCK_LOOKUP_MAX_ATTEMPTS, closing the "infinite retry loop on repeated download failure" the original PR description flagged. - Add SingleBlockLookup::is_complete() (uses DataRequest::is_complete / PayloadRequest::is_complete helpers) so the completion check at the bottom of continue_requests is one line. Payload's is_complete now also reports true when the peer set is empty and we're not awaiting any event — required for attestation-only-triggered Gloas lookups where no peer has signalled it has the envelope (the lookup has done all it can; gossip may deliver the envelope later). 
Also adds Work::RpcEnvelope to the test rig's beacon-processor mock.	2026-05-19 15:28:46 -06:00
dapplion	a98e6531bf	Move processing-result classification to the producer side Reshape BlockProcessingResult from the AC-verdict-passthrough Ok/Err/Ignored enum to Imported(info) \| Error { penalty, reason }. The producer (network_beacon_processor) translates beacon-chain Result<AvailabilityProcessingStatus, BlockError> into this shape via a new classify_processing_result(), so the consumer only has to resolve the symbolic WhichPeerToPenalize against an in-scope PeerGroup. - on_block_processing_result and on_data_processing_result collapse to a single state-match each, then dispatch to WhichPeerToPenalize::apply(action, &peer_group, reason, cx). - mod.rs sheds the per-BlockError policy block (-129 lines). - Drops the now-unused data_peer_group, block_peer, BlockRequest::peer, peek_downloaded_peer_group accessors; their job is the consumer's responsibility now. - Ignored becomes Error { penalty: None, reason: "processor_overloaded" } with a producer-side warn!; the lookup retries up to MAX_ATTEMPTS instead of dropping immediately (test updated to match). - DuplicateFullyImported and GenesisBlock map to Imported; the test helper constructs the new variant directly.	2026-05-19 14:14:42 -06:00
dapplion	0a6aa5ae90	Merge remote-tracking branch 'sigp/unstable' into gloas-lookup-sync-fixes # Conflicts: # beacon_node/network/src/sync/manager.rs	2026-05-19 03:50:37 -06:00
dapplion	2d2fdf3dce	Fix correctness issues in single-block lookup state machine - add_peer: replace !=-vs-\|= typo so Gloas child-peer additions actually propagate back through add_peers_to_lookup_and_ancestors and kick continue_requests. - data_peer_group: return the PeerGroup stored in DataRequestState Downloaded/Processing instead of todo!(), so InvalidColumn attribution in mod.rs no longer panics on a live error path. - Restore the original `parent_root != ZERO` guard for the parent-known check; the genesis block has no real parent so it must fall through to processing rather than panic (was todo!()) or be dropped as Failed. - Wire envelope_is_known_to_fork_choice as a NoRequestNeeded short-circuit at the top of payload_lookup_request. - Rename gload_child_peers -> gloas_child_peers (typo). - Drop DataDownloadKind, peek_downloaded_peer_group, DataRequest.slot, DownloadedData::Blobs.expected_blobs — all dead per the compiler. - Update test helpers to send UnknownParentSidecarHeader so the lookup test suite compiles and runs under the new manager API. Tests: phase0 79/79, electra 59/59, fulu 59/59.	2026-05-19 03:43:11 -06:00
Daniel Knopik	1a68631180	Gloas payload cache (#9209) In Gloas, beacon blocks are imported into fork choice immediately - the payload envelope and data columns arrive separately. KZG commitments moved from the column sidecar into the execution payload bid, so the existing `DataAvailabilityChecker` (which assumes block and data are coupled) can't be used for Gloas. * Introduced `PendingPayloadCache` to keep track of payload and data columns per block root. * Added gossip column verification * Added support for Gloas data column reconstruction * Payload envelope verification simplified: removed `MaybeAvailableEnvelope`, `ExecutedEnvelope`, `EnvelopeImportData` Not yet implemented (tracked with TODOs): - Proper lookup sync for Gloas columns arriving before blocks - Partial column merging for Gloas - Moving `load_gloas_payload_bid` disk reads off the async runtime - Backfill/range sync for Gloas Based on @eserilev's PR and work in progress. See also #9202 for verification. Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com> Co-Authored-By: Daniel Knopik <daniel@dknopik.de> Co-Authored-By: Daniel Knopik <107140945+dknopik@users.noreply.github.com> Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2026-05-13 07:03:34 +00:00
Pawan Dhananjay	e0effdbfb9	Merge branch 'unstable' into gloas-lookup-sync-fixes	2026-05-07 16:13:50 -07:00
Mac L	3351db1ba8	Remove `TestRandom` (#9006) We have a legacy `TestRandom` trait which generates random types for testing and fuzzing. This function overlaps with `arbitrary` which is used very commonly in the ecosystem. Remove `TestRandom` and generate random type instances using `Arbitrary`. Co-Authored-By: Mac L <mjladson@pm.me> Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2026-05-05 06:35:57 +00:00
dapplion	ebe9fe228a	Gloas lookup sync Rewrites the single block lookup state machine for Gloas, where block, data (blobs/columns), and execution payload envelope are independent components that can arrive and import out of order. - Three additive-only sub-state-machines for block / data / payload streams. Peer sets start empty for data/payload and grow as children arrive — the parent lookup's completion requirement can widen over time without mutating any state machine. - `AwaitingParent` becomes a struct carrying the child's `parent_block_hash` so the parent can be classified empty/full from the child's bid reference. - Wires `PayloadEnvelopesByRoot` RPC end-to-end through `SyncNetworkContext`: request sending, response routing (`SingleLookupReqId::SinglePayloadEnvelope`), and integration into `PayloadRequest`. Envelope processing is still a TODO; only the download path is wired. - Test rig: serves envelopes from a `network_envelopes_by_root` cache populated from the external harness; bumps test validator count to 8 so `proposer_lookahead` can populate at the Fulu → Gloas upgrade. - Enables gloas in `TEST_NETWORK_FORKS`. - Fixes: genesis parent check, infinite retry loop on repeated download failure, no-op in `on_completed_request`, and peer sets not being cleared on disconnect.	2026-04-22 09:32:04 +02:00
Lion - dapplion	bc5d8c9f90	Add range sync tests (#8989) Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>	2026-03-31 05:07:22 +00:00
ethDreamer	6ca610d918	Breakup RPCBlock into LookupBlock & RangeSyncBlock (#8860) Co-Authored-By: Mark Mackey <mark@sigmaprime.io>	2026-03-13 19:22:29 +00:00
Lion - dapplion	f4a6b8d9b9	Tree-sync friendly lookup sync tests (#8592) - Step 0 of the tree-sync roadmap https://github.com/sigp/lighthouse/issues/7678 Current lookup sync tests are written in an explicit way that assumes how the internals of lookup sync work. For example the test would do: - Emit unknown block parent message - Expect block request for X - Respond with successful block request - Expect block processing request for X - Respond with successful processing request - etc.. This is unnecessarily verbose. And it will require a complete re-write when something changes in the internals of lookup sync (has happened a few times, mostly for deneb and fulu). What we really want to assert is: - WHEN: we receive an unknown block parent message - THEN: Lookup sync can sync that block - ASSERT: Without penalizing peers, without unnecessary retries Keep all existing tests and add new cases but written in the new style described above. The logic to serve and respond to requests is in this function `fn simulate` `2288a3aeb1/beacon_node/network/src/sync/tests/lookups.rs (L301)` - It controls peer behavior based on a `CompleteStrategy` where you can set for example "respond to BlocksByRoot requests with empty" - It actually runs beacon processor messages running their closures. Now sync tests actually import blocks, increasing the test coverage to the interaction of sync and the da_checker. - To achieve the above the tests create real blocks with the test harness. To make the tests as fast as before, I disabled crypto with `TestConfig` Along the way I found a couple bugs, which I documented on the diff. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>	2026-02-13 04:24:51 +00:00
Eitan Seri-Levi	f7b5c7ee3f	Convert RpcBlock to an enum that indicates availability (#8424) Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Mark Mackey <mark@sigmaprime.io> Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2026-01-28 05:59:32 +00:00
Mac L	3903e1c67f	More `consensus/types` re-export cleanup (#8665) Remove more of the temporary re-exports from `consensus/types` Co-Authored-By: Mac L <mjladson@pm.me>	2026-01-16 04:43:05 +00:00
Mac L	4e958a92d3	Refactor `consensus/types` (#7827) Organize and categorize `consensus/types` into modules based on their relation to key consensus structures/concepts. This is a precursor to a sensible public interface. While this refactor is very opinionated, I am open to suggestions on module names, or type groupings if my current ones are inappropriate. Co-Authored-By: Mac L <mjladson@pm.me>	2025-12-04 09:28:52 +00:00
Jimmy Chen	bc86dc09e5	Reduce number of blobs used in tests to speed up CI (#8194) `beacon-chain-tests` is now regularly taking 1h+ on CI since Fulu fork was added. This PR attempts to reduce the test time by bringing down the number of blobs generated in tests - instead of generating 0..max_blobs, the generator now generates 0..1 blobs by default, and this can be modified by setting `harness.execution_block_generator.set_min_blob_count(n)`. Note: The blobs are pre-generated and don't require too much CPU to generate, however processing a larger number of them on the beacon chain does take a lot of time. This PR also includes a few other small improvements - Our slowest test (`chain_segment_varying_chunk_size`) runs 3x faster in Fulu just by reusing chain segments - Avoid re-running fork specific tests on all forks - Fix a bunch of tests that depend on the harness's existing random blob generation, which is fragile beacon chain test time on test machine is ~2x faster: ### `unstable` ``` Summary [ 751.586s] 291 tests run: 291 passed (13 slow), 0 skipped ``` ### this branch ``` Summary [ 373.792s] 291 tests run: 291 passed (2 slow), 0 skipped ``` The next set of tests to optimise is the ones that use [`get_chain_segment`](`77a9af96de/beacon_node/beacon_chain/tests/block_verification.rs (L45)`), as it by default builds 320 blocks with supernode - an easy optimisation would be to build these blocks with cgc = 8 for tests that only require fullnodes. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>	2025-11-04 02:40:44 +00:00
Lion - dapplion	ffa7b2b2b9	Only mark block lookups as pending if block is importing from gossip (#8112) - PR https://github.com/sigp/lighthouse/pull/8045 introduced a regression of how lookup sync interacts with the da_checker. Now in unstable block import from the HTTP API also inserts the block in the da_checker while the block is being execution verified. If lookup sync finds the block in the da_checker in `NotValidated` state it expects a `GossipBlockProcessResult` message sometime later. That message is only sent after block import in gossip. I confirmed in our node's logs that 4/4 cases of stuck lookups are caused by this sequence of events: - Receive block through API, insert into da_checker in fn process_block in put_pre_execution_block - Create lookup and leave in AwaitingDownload(block in processing cache) state - Block from HTTP API finishes importing - Lookup is left stuck Closes https://github.com/sigp/lighthouse/issues/8104 - https://github.com/sigp/lighthouse/pull/8110 was my initial solution attempt but we can't send the `GossipBlockProcessResult` event from the `http_api` crate without adding new channels, which seems messy. For a given node it's rare that a lookup is created at the same time that a block is being published. This PR solves https://github.com/sigp/lighthouse/issues/8104 by allowing lookup sync to import the block twice in that case. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>	2025-09-25 03:52:27 +00:00
Jimmy Chen	78d330e4b7	Consolidate `reqresp_pre_import_cache` into `data_availability_checker` (#8045) This PR consolidates the `reqresp_pre_import_cache` into the `data_availability_checker` for the following reasons: - the `reqresp_pre_import_cache` suffers from the same TOCTOU bug we had with `data_availability_checker` earlier, and leads to an unbounded memory leak, which we have observed over the last 6 months on some nodes. - the `reqresp_pre_import_cache` is no longer necessary, because we now hold blocks in the `data_availability_checker` for longer since (#7961), and recent blocks can be served from the DA checker. This PR also maintains the following functionalities - Serving pre-executed blocks over RPC, and they're now served from the `data_availability_checker` instead. - Using the cache for de-duplicating lookup requests. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>	2025-09-19 07:01:13 +00:00
Lion - dapplion	b7d78a91e0	Don't penalize peers for extending ignored chains (#8042) Lookup sync has a cache of block roots "failed_chains". If a peer triggers a lookup for a block or descendant of a root in failed_chains the lookup is dropped and the peer penalized. However blocks are inserted into failed_chains for a single reason: - If a chain is longer than 32 blocks the lookup is dropped to prevent OOM risks. However the peer is not at fault, since discovering an unknown chain longer than 32 blocks is not malicious. We just drop the lookup to sync the blocks from range forward sync. This discrepancy is probably an oversight when changing old code. Before we used to add blocks that failed too many times to process to that cache. However, we don't do that anymore. Adding a block that fails too many times to process is an optimization to save resources in rare cases where peers keep sending us invalid blocks. In case that happens, today we keep trying to process the block, downscoring the peers and eventually disconnecting them. _IF_ we found that optimization to be necessary we should merge this PR (_Stricter match of BlockError in lookup sync_) first. IMO we are fine without the failed_chains cache and the ignored_chains cache will be obsolete with [tree sync](https://github.com/sigp/lighthouse/issues/7678) as the OOM risk of long lookup chains does not exist anymore. Closes https://github.com/sigp/lighthouse/issues/7577 Rename `failed_chains` to `ignored_chains` and don't penalize peers that trigger lookups for those blocks Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>	2025-09-17 01:02:29 +00:00
Michael Sproul	d235f2c697	Delete `RuntimeVariableList::from_vec` (#7930) This method is a footgun because it truncates the list. It is the source of a recent bug: - https://github.com/sigp/lighthouse/pull/7927 - Delete uses of `RuntimeVariableList::from_vec` and replace them with `::new` which does validation and can fail. - Propagate errors where possible, unwrap in tests and use `expect` for obviously-safe uses (in `chain_spec.rs`).	2025-08-27 06:52:14 +00:00
Jimmy Chen	b4704eab4a	Fulu update to spec v1.6.0-alpha.4 (#7890) Fulu update to spec [v1.6.0-alpha.4](https://github.com/ethereum/consensus-specs/releases/tag/v1.6.0-alpha.4). - Make `number_of_columns` a preset - Optimise `get_custody_groups` to avoid computing if cgc = 128 - Add support for additional typenum values in type_dispatch macro	2025-08-20 02:05:04 +00:00
chonghe	522bd9e9c6	Update Rust Edition to 2024 (#7766) * #7749 Thanks @dknopik and @michaelsproul for your help!	2025-08-13 03:04:31 +00:00
Jimmy Chen	4daa015971	Remove peer sampling code (#7768) Peer sampling has been completely removed from the spec. This PR removes our partial implementation from the codebase. https://github.com/ethereum/consensus-specs/pull/4393	2025-07-23 03:24:45 +00:00
Pawan Dhananjay	5f208bb858	Implement basic validator custody framework (no backfill) (#7578) Resolves #6767 This PR implements a basic version of validator custody. - It introduces a new `CustodyContext` object which contains info regarding number of validators attached to a node and the custody count they contribute to the cgc. - The `CustodyContext` is added in the da_checker and has methods for returning the current cgc and the number of columns to sample at head. Note that the logic for returning the cgc existed previously in the network globals. - To estimate the number of validators attached, we use the `beacon_committee_subscriptions` endpoint. This might overestimate the number of validators actually publishing attestations from the node in the case of multi BN setups. We could also potentially use the `publish_attestations` endpoint to get a more conservative estimate at a later point. - Anytime there's a change in the `custody_group_count` due to addition/removal of validators, the custody context should send an event on a broadcast channel. The only subscriber for the channel exists in the network service which simply subscribes to more subnets. There can be additional subscribers in sync that will start a backfill once the cgc changes. TODO - [ ] NOT REQUIRED: Currently, the logic only handles an increase in validator count and does not handle a decrease. We should ideally unsubscribe from subnets when the cgc has decreased. - [ ] NOT REQUIRED: Add a service in the `CustodyContext` that emits an event once `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS` passes after updating the current cgc. This event should be picked up by a subscriber which updates the enr and metadata. - [x] Add more tests	2025-06-11 18:10:06 +00:00
Lion - dapplion	d457ceeaaf	Don't create child lookup if parent is faulty (#7118) Issue discovered on PeerDAS devnet (node `lighthouse-geth-2.peerdas-devnet-5.ethpandaops.io`). Summary: - A lookup is created for block root `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` - That block or a parent is faulty and `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` is added to the failed chains cache - We later receive a block that is a child of a child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` - We create a lookup, which attempts to process the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` and hit a processor error `UnknownParent`, hitting this line `bf955c7543/beacon_node/network/src/sync/block_lookups/mod.rs (L686-L688)` `search_parent_of_child` does not create a parent lookup because the parent root is in the failed chain cache. However, we have already marked the child as awaiting the parent. This results in an inconsistent state of lookup sync, as there's a lookup awaiting a parent that doesn't exist. Now we have a lookup (the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`) that is awaiting a parent lookup that doesn't exist: hence stuck. ### Impact This bug can affect Mainnet as well as PeerDAS devnets. This bug may stall lookup sync for a few minutes (up to `LOOKUP_MAX_DURATION_STUCK_SECS = 15 min`) until the stuck prune routine deletes it. By that time the root will be cleared from the failed chain cache and sync should succeed. During that time the user will see a lot of `WARN` logs when attempting to add each peer to the inconsistent lookup. We may also sync the block through range sync if we fall behind by more than 2 epochs. We may also create the parent lookup successfully after the failed cache clears and complete the child lookup. 
This bug is triggered if: - We have a lookup that fails and its root is added to the failed chain cache (much more likely to happen in PeerDAS networks) - We receive a block that builds on a child of the block added to the failed chain cache Ensure that we never create (or leave existing) a lookup that references a non-existing parent. I added `must_use` lints to the functions that create lookups. To fix the specific bug we must recursively drop the child lookup if the parent is not created. So if `search_parent_of_child` returns `false` now return `LookupRequestError::Failed` instead of `LookupResult::Pending`. As a bonus I have added more logging and reason strings to the errors	2025-06-05 08:53:43 +00:00
Jimmy Chen	e6ef644db4	Verify `getBlobsV2` response and avoid reprocessing imported data columns (#7493 ) #7461 and partly #6439. Desired behaviour after receiving `engine_getBlobs` response: 1. Gossip verify the blobs and proofs, but don't mark them as observed yet. This is because not all blobs are published immediately (due to staggered publishing). If we mark them as observed and do not publish them, we could end up blocking the gossip propagation. 2. Blobs are marked as observed _either_ when: * They are received from gossip and forwarded to the network. * They are published by the node. Current behaviour: - ❗ We only gossip verify `engine_getBlobsV1` responses, but not `engine_getBlobsV2` responses (PeerDAS). - ❗ After importing EL blobs AND before they're published, if the same blobs arrive via gossip, they will get re-processed, which may result in a re-import. 1. Perform gossip verification on data columns computed from EL `getBlobsV2` response. We currently only do this for `getBlobsV1` to prevent importing blobs with invalid proofs into the `DataAvailabilityChecker`; this should be done on V2 responses too. 2. Add additional gossip verification to make sure we don't re-process a ~~blob~~ or data column that was imported via the EL `getBlobs` but not yet "seen" on the gossip network. If an "unobserved" gossip blob is found in the availability cache, then we know it has passed verification so we can immediately propagate the `ACCEPT` result and forward it to the network, but without re-processing it. UPDATE: I've left blobs out for the second change mentioned above, as the likelihood and impact are very low and we haven't seen it enough, but under PeerDAS this issue is a regular occurrence and we do see the same block getting imported many times.	2025-05-26 19:55:58 +00:00
Akihito Nakano	537fc5bde8	Revive network-test logs files in CI (#7459 ) https://github.com/sigp/lighthouse/issues/7187 This PR adds a writer that implements `tracing_subscriber::fmt::MakeWriter`, which writes logs to separate files for each test.	2025-05-22 02:51:22 +00:00
SunnysidedJ	593390162f	`peerdas-devnet-7`: update `DataColumnSidecarsByRoot` request to use `DataColumnsByRootIdentifier` (#7399 ) Update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier #7377 As described in https://github.com/ethereum/consensus-specs/pull/4284	2025-05-12 00:20:55 +00:00
Lion - dapplion	beb0ce68bd	Make range sync peer loadbalancing PeerDAS-friendly (#6922 ) - Re-opens https://github.com/sigp/lighthouse/pull/6864 targeting unstable Range sync and backfill sync still assume that each batch request is done by a single peer. This assumption breaks with PeerDAS, where we request custody columns from N peers. Issues with current unstable: - Peer prioritization counts batch requests per peer. This accounting is broken now, data columns by range requests are not accounted for - Peer selection for data columns by range ignores the set of peers on a syncing chain, and instead draws from the global pool of peers - The implementation is very strict when we have no peers to request from. After PeerDAS this case is very common and we want to be flexible and handle that case better than just hard failing everything. - [x] Upstream peer prioritization to the network context, it knows exactly how many active requests a peer has (including columns by range) - [x] Upstream peer selection to the network context, now `block_components_by_range_request` gets a set of peers to choose from instead of a single peer. If it can't find a peer, it returns the error `RpcRequestSendError::NoPeer` - [ ] Range sync and backfill sync handle `RpcRequestSendError::NoPeer` explicitly - [ ] Range sync: leaves the batch in `AwaitingDownload` state and does nothing. TODO: we should have some mechanism to fail the chain if it's stale for too long - EDIT: Not done in this PR - [x] Backfill sync: pauses the sync until another peer joins - EDIT: Same logic as unstable ### TODOs - [ ] Add tests :) - [x] Manually test backfill sync Note: this touches the mainnet path!	2025-05-07 02:03:07 +00:00
Mac L	0e6da0fcaf	Merge branch 'release-v7.0.0' into v7-backmerge	2025-04-04 13:32:58 +11:00
Mac L	82d1674455	Rust 1.86.0 lints (#7254 ) Implement lints for the new Rust compiler version 1.86.0.	2025-04-04 02:30:22 +00:00
Age Manning	d6cd049a45	RPC RequestId Cleanup (#7238 ) I've been working at updating another library to latest Lighthouse and got very confused with RPC request Ids. There were types that had fields called `request_id` and `id`. And interchangeably could have types `PeerRequestId`, `rpc::RequestId`, `AppRequestId`, `api_types::RequestId` or even `Request.id`. I couldn't keep track of which Id was linked to what and what each type meant. So this PR mainly does a few things: - Changes the field naming to match the actual type. So any field that has an `AppRequestId` will be named `app_request_id` rather than `id` or `request_id` for example. - I simplified the types. I removed the two different `RequestId` types (one in Lighthouse_network the other in the rpc) and grouped them into one. It has one downside tho. I had to add a few unreachable lines of code in the beacon processor, which the extra type would prevent, but I feel like it might be worth it. Happy to add an extra type to avoid those few lines. - I also removed the concept of `PeerRequestId` which sometimes went alongside a `request_id`. There were times where we had a `PeerRequest` and a `Request` being returned, both of which contain a `RequestId` so we had redundant information. I've simplified the logic by removing `PeerRequestId` and made a `ResponseId`. I think if you look at the code changes, it simplifies things a bit and removes the redundant extra info. I think with this PR things are a little bit easier to reason about what is going on with all these RPC Ids. NOTE: I did this with the help of AI, so probably should be checked	2025-04-03 10:10:15 +00:00


62 Commits