Commit Graph

1053 Commits

Author SHA1 Message Date
dapplion
4e13b3be0f Fix failed_peers post fulu 2025-06-11 11:49:25 +02:00
dapplion
7a03578795 Remove total_requests_per_peer 2025-06-11 11:21:12 +02:00
dapplion
28d9d8b8e2 lint 2025-06-11 11:02:37 +02:00
Jimmy Chen
6f754bfd8d Merge branch 'peerdas-devnet-7' into peerdas-rangesync 2025-06-05 23:39:03 +10:00
Jimmy Chen
4fadf1fba8 Merge branch 'unstable' into peerdas-devnet-7 2025-06-05 23:38:31 +10:00
Lion - dapplion
d457ceeaaf Don't create child lookup if parent is faulty (#7118)
Issue discovered on PeerDAS devnet (node `lighthouse-geth-2.peerdas-devnet-5.ethpandaops.io`). Summary:

- A lookup is created for block root `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- That block or a parent is faulty and `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` is added to the failed chains cache
- We later receive a block that is a child of a child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- We create a lookup, which attempts to process the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` and hits a processor error `UnknownParent` at this line:

bf955c7543/beacon_node/network/src/sync/block_lookups/mod.rs (L686-L688)

`search_parent_of_child` does not create a parent lookup because the parent root is in the failed chain cache. However, we have **already** marked the child as awaiting the parent. This leaves lookup sync in an inconsistent state: the child lookup is awaiting a parent lookup that doesn't exist, so it is stuck.

### Impact

This bug can affect Mainnet as well as PeerDAS devnets.

This bug may stall lookup sync for several minutes (up to `LOOKUP_MAX_DURATION_STUCK_SECS = 15 min`) until the stuck-lookup prune routine deletes it. By that time the root will have been cleared from the failed chain cache and sync should succeed. In the meantime, the user will see a lot of `WARN` logs as each peer is added to the inconsistent lookup. We may also sync the block through range sync if we fall behind by more than 2 epochs, or successfully create the parent lookup after the failed chain cache clears and complete the child lookup.

This bug is triggered if:
- We have a lookup that fails and its root is added to the failed chain cache (much more likely to happen in PeerDAS networks)
- We receive a block that builds on a child of the block added to the failed chain cache


  Ensure that we never create (or leave in place) a lookup that references a non-existent parent.

I added `#[must_use]` lints to the functions that create lookups. To fix the specific bug, we must recursively drop the child lookup if the parent lookup is not created. So if `search_parent_of_child` returns `false`, we now return `LookupRequestError::Failed` instead of `LookupResult::Pending`.
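The shape of the fix could be sketched roughly as below. This is a minimal model with invented stand-in types (`FailedChainsCache`, `LookupOutcome`, the `u64` roots), not Lighthouse's actual definitions:

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for illustration only.
#[derive(Debug, PartialEq)]
enum LookupOutcome {
    Pending,              // parent lookup exists, child keeps waiting
    Failed(&'static str), // child lookup must be dropped, not left stuck
}

struct FailedChainsCache {
    roots: HashSet<u64>,
}

/// Returns `true` only if a parent lookup was created; `#[must_use]` forces
/// callers to act on the result instead of silently ignoring it.
#[must_use]
fn search_parent_of_child(parent_root: u64, failed_chains: &FailedChainsCache) -> bool {
    !failed_chains.roots.contains(&parent_root)
}

/// On `UnknownParent`, only stay `Pending` if the parent lookup really exists;
/// otherwise fail the child so it is dropped rather than awaiting forever.
fn handle_unknown_parent(parent_root: u64, failed_chains: &FailedChainsCache) -> LookupOutcome {
    if search_parent_of_child(parent_root, failed_chains) {
        LookupOutcome::Pending
    } else {
        LookupOutcome::Failed("parent root is in the failed chain cache")
    }
}
```

The key invariant: returning `Pending` is only legal when the parent lookup actually exists.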

As a bonus, I have also added more logging and reason strings to the errors.
2025-06-05 08:53:43 +00:00
dapplion
ae0ef8f929 Fix finalized_sync_permanent_custody_peer_failure 2025-06-04 23:02:56 -06:00
Jimmy Chen
1b72871ad1 Merge branch 'peerdas-devnet-7' into peerdas-rangesync 2025-06-03 18:20:54 +10:00
Jimmy Chen
42ef88bdb4 Merge branch 'unstable' into peerdas-devnet-7
# Conflicts:
#	beacon_node/beacon_chain/src/data_availability_checker.rs
2025-06-03 18:19:07 +10:00
ethDreamer
ae30480926 Implement EIP-7892 BPO hardforks (#7521)
[EIP-7892: Blob Parameter Only Hardforks](https://eips.ethereum.org/EIPS/eip-7892)

#7467
2025-06-02 06:54:42 +00:00
dapplion
c6b39e9e10 Merge remote-tracking branch 'sigp/peerdas-devnet-7' into peerdas-rangesync 2025-05-27 16:20:34 -05:00
dapplion
02d97377a5 Address review comments 2025-05-27 16:07:45 -05:00
dapplion
144b83e625 Remove BatchStateSummary 2025-05-27 15:52:14 -05:00
dapplion
0ef95dd7f8 Remove stale TODO 2025-05-27 15:33:39 -05:00
dapplion
fc3922f854 Resolve more TODOs 2025-05-27 15:32:29 -05:00
dapplion
52722b7b2e Resolve TODO(das) 2025-05-27 14:28:52 -05:00
dapplion
86ad87eced Lint tests 2025-05-27 12:21:42 -05:00
dapplion
8f74adc66f Use DataColumnSidecarList 2025-05-27 00:43:38 -05:00
dapplion
34b37b97ed Remove unused module 2025-05-27 00:37:12 -05:00
Michael Sproul
7c89b970af Handle attestation validation errors (#7382)
Partly addresses:

- https://github.com/sigp/lighthouse/issues/7379


  Handle attestation validation errors from `get_attesting_indices` so that, instead of emitting an error log, we downscore the peer and reject the message.
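A rough sketch of the intended handling — the error type, verdict enum, and the toy `get_attesting_indices` below are simplified stand-ins, not Lighthouse's real gossip types:

```rust
// Simplified stand-ins for illustration; not Lighthouse's real types.
#[derive(Debug, PartialEq)]
enum GossipVerdict {
    Accept,
    Reject { penalize_peer: bool },
}

#[derive(Debug)]
enum AttestationError {
    InvalidCommitteeIndex,
}

// Stand-in for the real `get_attesting_indices`: fails on a bad committee index.
fn get_attesting_indices(
    committee_index: u64,
    committee_count: u64,
) -> Result<Vec<usize>, AttestationError> {
    if committee_index < committee_count {
        Ok(vec![0, 3])
    } else {
        Err(AttestationError::InvalidCommitteeIndex)
    }
}

/// Treat a validation failure as an invalid message rather than an internal
/// error: no error log, downscore the sending peer, reject the attestation.
fn process_attestation(committee_index: u64, committee_count: u64) -> GossipVerdict {
    match get_attesting_indices(committee_index, committee_count) {
        Ok(_indices) => GossipVerdict::Accept,
        Err(_) => GossipVerdict::Reject { penalize_peer: true },
    }
}
```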
2025-05-27 01:55:17 +00:00
dapplion
01329ab230 Improve RangeBlockComponent type 2025-05-26 19:07:15 -05:00
dapplion
c8a0c9e379 Remove CustodyByRoot and CustodyByRange types 2025-05-26 19:04:50 -05:00
dapplion
7d0fb93274 Reduce conversions 2025-05-26 18:49:45 -05:00
dapplion
b383f7af53 More comments 2025-05-26 18:37:20 -05:00
Jimmy Chen
e6ef644db4 Verify getBlobsV2 response and avoid reprocessing imported data columns (#7493)
#7461 and partly #6439.

Desired behaviour after receiving `engine_getBlobs` response:

1. Gossip verify the blobs and proofs, but don't mark them as observed yet. This is because not all blobs are published immediately (due to staggered publishing). If we mark them as observed but don't publish them, we could end up blocking gossip propagation.
2. Blobs are marked as observed _either_ when:
* They are received from gossip and forwarded to the network.
* They are published by the node.

Current behaviour:
-  We only gossip verify `engine_getBlobsV1` responses, but not `engine_getBlobsV2` responses (PeerDAS).
-  After importing EL blobs AND before they're published, if the same blobs arrive via gossip, they will get re-processed, which may result in a re-import.


  1. Perform gossip verification on data columns computed from the EL `getBlobsV2` response. We currently only do this for `getBlobsV1` to prevent importing blobs with invalid proofs into the `DataAvailabilityChecker`; this should be done for V2 responses too.
2. Add additional gossip verification to make sure we don't re-process a ~~blob~~ or data column that was imported via the EL `getBlobs` but not yet "seen" on the gossip network. If an "unobserved" gossip blob is found in the availability cache, then we know it has passed verification so we can immediately propagate the `ACCEPT` result and forward it to the network, but without re-processing it.

**UPDATE:** I've left blobs out of the second change mentioned above, as the likelihood and impact are very low and we haven't seen it happen often, but under PeerDAS this issue is a regular occurrence and we do see the same block getting imported many times.
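The desired behaviour above could be modelled roughly like this toy sketch. `ColumnTracker` and the action enum are illustrative only — the real logic lives in the `DataAvailabilityChecker` and observation caches:

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum GossipAction {
    AcceptAndForwardOnly, // already verified via getBlobs: propagate, skip re-import
    ProcessAndImport,     // first sighting: verify, import, forward
    IgnoreDuplicate,      // already observed on gossip
}

#[derive(Default)]
struct ColumnTracker {
    observed_on_gossip: HashSet<u64>,
    in_availability_cache: HashSet<u64>,
}

impl ColumnTracker {
    /// Columns from the EL `getBlobsV2` response are verified and cached but
    /// NOT marked as observed, so staggered publishing isn't blocked.
    fn import_from_el(&mut self, id: u64) {
        self.in_availability_cache.insert(id);
    }

    /// A gossip column is marked observed when forwarded; if it's already in
    /// the availability cache it is forwarded without being re-processed.
    fn on_gossip_column(&mut self, id: u64) -> GossipAction {
        if !self.observed_on_gossip.insert(id) {
            GossipAction::IgnoreDuplicate
        } else if self.in_availability_cache.contains(&id) {
            GossipAction::AcceptAndForwardOnly
        } else {
            GossipAction::ProcessAndImport
        }
    }
}
```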
2025-05-26 19:55:58 +00:00
Jimmy Chen
a85d863fb6 Merge branch 'unstable' into peerdas-devnet-7 2025-05-26 14:42:18 +10:00
Akihito Nakano
a2797d4bbd Fix formatting errors from cargo-sort (#7512)
[cargo-sort is currently failing on CI](https://github.com/sigp/lighthouse/actions/runs/15198128212/job/42746931918?pr=7025), likely due to new checks introduced in version [2.0.0](https://github.com/DevinR528/cargo-sort/releases/tag/v2.0.0).


  Fixed the errors by running cargo-sort with formatting enabled.
2025-05-23 05:25:56 +00:00
dapplion
801659d4ae Resolve some TODOs 2025-05-22 01:06:57 -05:00
dapplion
4fb2ae658a Implement reliable range sync for PeerDAS 2025-05-22 00:03:25 -05:00
Akihito Nakano
537fc5bde8 Revive network-test logs files in CI (#7459)
https://github.com/sigp/lighthouse/issues/7187


  This PR adds a writer that implements `tracing_subscriber::fmt::MakeWriter`, which writes logs to separate files for each test.
2025-05-22 02:51:22 +00:00
Lion - dapplion
b014675b7a Fix PeerDAS sync scoring (#7352)
* Remove request tracking inside syncing chains

* Prioritize by range peers in network context

* Prioritize custody peers for columns by range

* Explicit error handling of the no peers error case

* Remove good_peers_on_sampling_subnets

* Count AwaitingDownload towards the buffer limit

* Retry syncing chains in AwaitingDownload state

* Use same peer priorization for lookups

* Review PR

* Address TODOs

* Revert changes to peer erroring in range sync

* Revert metrics changes

* Update comment

* Pass peers_to_deprioritize to select_columns_by_range_peers_to_request

* more idiomatic

* Idiomatic while

* Add note about infinite loop

* Use while let

* Fix wrong custody column count for lookup blocks

* Remove impl

* Remove stale comment

* Fix build errors.

* Or default

* Review PR

* BatchPeerGroup

* Match block and blob signatures

* Explicit match statement to BlockError in range sync

* Remove todo in BatchPeerGroup

* Remove participating peers from backfill sync

* Remove MissingAllCustodyColumns error

* Merge fixes

* Clean up PR

* Consistent naming of batch_peers

* Address multiple review comments

* Better errors for das

* Penalize column peers once

* Restore fn

* Fix error enum

* Removed MismatchedPublicKeyLen

* Revert testing changes

* Change BlockAndCustodyColumns enum variant

* Revert type change in import_historical_block_batch

* Drop pubkey cache

* Don't collect Vec

* Classify errors

* Remove ReconstructColumnsError

* More detailed UnrequestedSlot error

* Lint test

* Fix slot conversion

* Reduce penalty for missing blobs

* Revert changes in peer selection

* Lint tests

* Rename block matching functions

* Reorder block matching in historical blocks

* Fix order of block matching

* Add store tests

* Filter blockchain in assert_correct_historical_block_chain

* Also filter before KZG checks

* Lint tests

* Fix lint

* Fix fulu err assertion

* Check point is not at infinity

* Fix ws sync test

* Revert dropping filter fn

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
2025-05-21 23:06:42 +10:00
Eitan Seri-Levi
268809a530 Rust clippy 1.87 lint fixes (#7471)
Fix clippy lints for `rustc` 1.87


  clippy complains about `BeaconChainError` being too large. I went on a bit of a boxing spree because of this. We may instead want to `Box` some of the `BeaconChainError` variants?
2025-05-16 05:03:00 +00:00
SunnysidedJ
593390162f peerdas-devnet-7: update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier (#7399)
Update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier #7377


  As described in https://github.com/ethereum/consensus-specs/pull/4284
2025-05-12 00:20:55 +00:00
Lion - dapplion
a497ec601c Retry custody requests after peer metadata updates (#6975)
Closes https://github.com/sigp/lighthouse/issues/6895

We need sync to retry custody requests when a peer CGC updates. A higher CGC can result in a data column subnet peer count increasing from 0 to 1, allowing requests to happen.


  Add a new sync event, `SyncMessage::UpdatedPeerCgc`. It's sent by the router when a metadata response updates the known CGC.
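Roughly, the retry trigger looks like this minimal model. Only the `UpdatedPeerCgc` event name comes from the PR; `SyncState` and its fields are invented for illustration:

```rust
use std::collections::HashMap;

#[derive(Default)]
struct SyncState {
    peer_cgc: HashMap<&'static str, u64>,
    custody_retry_queue: Vec<&'static str>,
}

impl SyncState {
    /// Handler for a `SyncMessage::UpdatedPeerCgc`-style event, sent when a
    /// metadata response changes a peer's known custody group count.
    fn on_updated_peer_cgc(&mut self, peer: &'static str, new_cgc: u64) {
        let old_cgc = self.peer_cgc.insert(peer, new_cgc).unwrap_or(0);
        // A higher CGC can take a data column subnet's peer count from 0 to 1,
        // so custody requests that previously had no servers become retryable.
        if new_cgc > old_cgc {
            self.custody_retry_queue.push(peer);
        }
    }
}
```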
2025-05-09 08:27:17 +00:00
Jimmy Chen
0f13029c7d Don't publish data columns reconstructed from RPC columns to the gossip network (#7409)
Don't publish data columns reconstructed from RPC columns to the gossip network, as this may result in peer downscoring if we're sending columns from past slots.
2025-05-07 23:24:48 +00:00
Lion - dapplion
beb0ce68bd Make range sync peer loadbalancing PeerDAS-friendly (#6922)
- Re-opens https://github.com/sigp/lighthouse/pull/6864 targeting unstable

Range sync and backfill sync still assume that each batch request is done by a single peer. This assumption breaks with PeerDAS, where we request custody columns to N peers.

Issues with current unstable:

- Peer prioritization counts batch requests per peer. This accounting is now broken: data-columns-by-range requests are not accounted for
- Peer selection for data columns by range ignores the set of peers on a syncing chain and instead draws from the global pool of peers
- The implementation is very strict when we have no peers to request from. After PeerDAS this case is very common, and we want to handle it more gracefully than hard-failing everything


  - [x] Upstream peer prioritization to the network context, which knows exactly how many active requests each peer has (including columns by range)
- [x] Upstream peer selection to the network context, now `block_components_by_range_request` gets a set of peers to choose from instead of a single peer. If it can't find a peer, it returns the error `RpcRequestSendError::NoPeer`
- [ ] Range sync and backfill sync handle `RpcRequestSendError::NoPeer` explicitly
- [ ] Range sync: leaves the batch in `AwaitingDownload` state and does nothing. **TODO**: we should have some mechanism to fail the chain if it's stale for too long - **EDIT**: Not done in this PR
- [x] Backfill sync: pauses the sync until another peer joins - **EDIT**: Same logic as unstable

### TODOs

- [ ] Add tests :)
- [x] Manually test backfill sync

Note: this touches the mainnet path!
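The upstreamed peer selection could be sketched like so. This is illustrative only: the real `RpcRequestSendError` has more variants, and the real prioritization weighs more signals than active request counts:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum RpcRequestSendError {
    NoPeer,
}

/// Pick the candidate with the fewest active requests. Instead of hard-failing
/// when no peer is available, surface an explicit `NoPeer` error so range sync
/// can leave the batch in `AwaitingDownload` and backfill can pause.
fn select_batch_peer(
    candidates: &[&str],
    active_requests: &HashMap<&str, usize>,
) -> Result<String, RpcRequestSendError> {
    candidates
        .iter()
        .min_by_key(|peer| active_requests.get(*peer).copied().unwrap_or(0))
        .map(|peer| peer.to_string())
        .ok_or(RpcRequestSendError::NoPeer)
}
```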
2025-05-07 02:03:07 +00:00
Lion - dapplion
2aa5d5c25e Make sure to log SyncingChain ID (#7359)
While debugging a sync issue from @pawanjay176, I noticed we're missing some key info: instead of logging the ID of the SyncingChain we just log "Finalized" (the sync type). It looks like a typo, or something was lost in translation when refactoring.

```
Apr 17 12:12:00.707 DEBUG Syncing new finalized chain                   chain: Finalized, component: "range_sync"
```

This log should include more info about the new chain but just logs "Finalized"

```
Apr 17 12:12:00.810 DEBUG New chain added to sync                       peer_id: "16Uiu2HAmHP8QLYQJwZ4cjMUEyRgxzpkJF87qPgNecLTpUdruYbdA", sync_type: Finalized, new_chain: Finalized, component: "range_sync"
```


  - Remove the Display impl and log the ID explicitly for all logs.
- Log more details when creating a new SyncingChain
2025-05-01 19:53:29 +00:00
Jimmy Chen
476f3a593c Add MAX_BLOBS_PER_BLOCK_FULU config (#7161)
Add `MAX_BLOBS_PER_BLOCK_FULU` config.
2025-04-15 00:20:46 +00:00
Lion - dapplion
be68dd24d0 Fix wrong custody column count for lookup blocks (#7281)
Fixes
- https://github.com/sigp/lighthouse/issues/7278


  Don't assume 0 columns for `RpcBlockInner::Block`
2025-04-11 22:00:57 +00:00
Mac L
39eb8145f8 Merge branch 'release-v7.0.0' into unstable 2025-04-11 21:32:24 +10:00
SunnysidedJ
d96b73152e Fix for #6296: Deterministic RNG in peer DAS publish block tests (#7192)
#6296: Deterministic RNG in peer DAS publish block tests


  Test functions now call the publish-block APIs with `true` for the deterministic-RNG boolean parameter, while production code passes `false`. This deterministically shuffles columns for the unit tests under `broadcast_validation_tests.rs`.
2025-04-09 15:35:15 +00:00
Jimmy Chen
759b0612b3 Offloading KZG Proof Computation from the beacon node (#7117)
Addresses #7108

- Add EL integration for `getPayloadV5` and `getBlobsV2`
- Offload proof computation and use proofs from EL RPC APIs
2025-04-08 07:37:16 +00:00
Jimmy Chen
e924264e17 Fullnodes to publish data columns from EL getBlobs (#7258)
Previously, only supernodes contributed to data column publishing in Lighthouse.

Recently we've [updated the spec](https://github.com/ethereum/consensus-specs/pull/4183) to have full nodes publish data columns as well, to ensure all nodes contribute to propagation.

This also prevents already-imported data columns from being imported again (because we don't "observe" them), and ensures columns that are observed in the [gossip seen cache](d60c24ef1c/beacon_node/beacon_chain/src/data_column_verification.rs (L492)) are forwarded to their peers, rather than being ignored.
2025-04-08 03:20:31 +00:00
Lion - dapplion
d511ca0494 Compute roots for unfinalized by_range requests with fork-choice (#7098)
Includes PRs

- https://github.com/sigp/lighthouse/pull/7058
- https://github.com/sigp/lighthouse/pull/7066

Cleaner for the `release-v7.0.0` branch
2025-04-07 03:16:41 +00:00
Jimmy Chen
7cc64cab83 Add missing error log and remove redundant id field from lookup logs (#6990)
Partially #6989.

This PR adds the missing error log when a batch fails due to issues with converting the response into `RpcBlock`. See the above linked issue for more details.

Adding this log reveals that we're completing range requests with missing columns, hence causing the batch to fail. It looks like we've hit the case where we've received enough stream terminations, but not all columns are returned.

```
Feb 12 06:12:16.558 DEBG Failed to convert range block components into RpcBlock, error: No column for block 0xc5b6c7fa02f5ef603d45819c08c6519f1dba661fd5d44a2fc849d3e7028b6007 index 18, id: 3456/RangeSync/116/3432, service: sync, module: network::sync::network_context:488
```

I've also removed some redundant `id` logging, as the `id` debug representation is difficult to read, and is now being logged as part of `req_id` in a more succinct format (relevant PR: #6914)
2025-04-04 09:01:42 +00:00
Mac L
0e6da0fcaf Merge branch 'release-v7.0.0' into v7-backmerge 2025-04-04 13:32:58 +11:00
Mac L
82d1674455 Rust 1.86.0 lints (#7254)
Implement lints for the new Rust compiler version 1.86.0.
2025-04-04 02:30:22 +00:00
Age Manning
d6cd049a45 RPC RequestId Cleanup (#7238)
I've been working on updating another library to the latest Lighthouse and got very confused with RPC request IDs.

There were types that had fields called `request_id` and `id`, which could interchangeably be of type `PeerRequestId`, `rpc::RequestId`, `AppRequestId`, `api_types::RequestId`, or even `Request.id`.

I couldn't keep track of which Id was linked to what and what each type meant.

So this PR mainly does a few things:
- Changes the field naming to match the actual type. So any field that has an  `AppRequestId` will be named `app_request_id` rather than `id` or `request_id` for example.
- I simplified the types. I removed the two different `RequestId` types (one in `lighthouse_network`, the other in the RPC) and grouped them into one. This has one downside, though: I had to add a few unreachable lines of code in the beacon processor, which the extra type would have prevented, but I feel it might be worth it. Happy to add an extra type to avoid those few lines.
- I also removed the concept of `PeerRequestId`, which sometimes went alongside a `request_id`. There were times where we had a `PeerRequest` and a `Request` being returned, both of which contain a `RequestId`, so we had redundant information. I've simplified the logic by removing `PeerRequestId` and adding a `ResponseId`. If you look at the code changes, I think it simplifies things a bit and removes the redundant extra info.

I think with this PR it's a little easier to reason about what is going on with all these RPC IDs.

NOTE: I did this with the help of AI, so probably should be checked
2025-04-03 10:10:15 +00:00
Jimmy Chen
80626e58d2 Attempt to fix flaky network tests (#7244) 2025-04-03 04:01:34 +00:00
Michael Sproul
bde0f1ef0b Merge remote-tracking branch 'origin/release-v7.0.0' into unstable 2025-03-29 13:01:58 +11:00