Issue discovered on PeerDAS devnet (node `lighthouse-geth-2.peerdas-devnet-5.ethpandaops.io`). Summary:
- A lookup is created for block root `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- That block or a parent is faulty and `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` is added to the failed chains cache
- We later receive a block that is a child of a child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- We create a lookup, which attempts to process the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`, hits the processor error `UnknownParent`, and reaches this line:
bf955c7543/beacon_node/network/src/sync/block_lookups/mod.rs (L686-L688)
`search_parent_of_child` does not create a parent lookup because the parent root is in the failed chain cache. However, we have **already** marked the child as awaiting the parent, leaving lookup sync in an inconsistent state: there is a lookup awaiting a parent lookup that doesn't exist.
That lookup (the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`) can never progress: it is stuck.
### Impact
This bug can affect Mainnet as well as PeerDAS devnets.
This bug may stall lookup sync for a few minutes (up to `LOOKUP_MAX_DURATION_STUCK_SECS` = 15 min) until the stuck-lookup prune routine deletes the affected lookup. By that time the root will have been cleared from the failed chain cache and sync should succeed. In the meantime the user will see many `WARN` logs as we attempt to add each peer to the inconsistent lookup. We may also sync the block through range sync if we fall behind by more than 2 epochs, or successfully create the parent lookup once the failed chain cache clears and complete the child lookup.
This bug is triggered if:
- We have a lookup that fails and its root is added to the failed chain cache (much more likely to happen in PeerDAS networks)
- We receive a block that builds on a child of the block added to the failed chain cache
The fix: ensure that we never create (or leave in place) a lookup that references a non-existent parent.
I added `must_use` lints to the functions that create lookups. To fix the specific bug we must recursively drop the child lookup if the parent lookup is not created: if `search_parent_of_child` returns `false`, we now return `LookupRequestError::Failed` instead of `LookupResult::Pending`.
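A minimal sketch of the intended control flow (names are simplified stand-ins, not Lighthouse's exact types or signatures):

```rust
// Sketch only: `LookupResult` and `LookupRequestError` mirror the names in the
// PR description, but the function and its argument are simplified stand-ins.
enum LookupResult {
    Pending,
}

enum LookupRequestError {
    Failed,
}

/// `parent_lookup_created` models the return value of `search_parent_of_child`,
/// which is `false` when the parent root is in the failed chain cache.
fn handle_unknown_parent(
    parent_lookup_created: bool,
) -> Result<LookupResult, LookupRequestError> {
    if parent_lookup_created {
        // A parent lookup exists, so the child legitimately waits on it.
        Ok(LookupResult::Pending)
    } else {
        // No parent lookup was created: fail the child lookup so it (and,
        // recursively, its descendants) are dropped instead of being left stuck.
        Err(LookupRequestError::Failed)
    }
}
```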
As a bonus I have added more logging and reason strings to the errors.
Lighthouse currently requires checkpoint sync to be performed against a supernode in a PeerDAS network, as only supernodes can serve blobs.
This PR lifts that requirement, enabling Lighthouse to checkpoint sync from either a fullnode or a supernode (See https://github.com/sigp/lighthouse/issues/6837#issuecomment-2933094923)
Missing data columns for the checkpoint block aren't a big issue, and we should be able to implement backfill for them easily once we have the logic to backfill data columns.
Getting this error on a non-PeerDAS network:
```
May 29 13:30:13.484 ERROR Error fetching or processing blobs from EL error: BlobProcessingError(AvailabilityCheck(Unexpected("empty blobs"))), block_root: 0x98aa3927056d453614fefbc79eb1f9865666d1f119d0e8aa9e6f4d02aa9395d9
```
It appears we're passing an empty `Vec` to the DA checker because all blobs were already seen on gossip and filtered out, which causes an `AvailabilityCheckError::Unexpected("empty blobs")`.
I've added equivalent unit tests for `getBlobsV1` to cover all the scenarios we test for `getBlobsV2`. These would have caught the bug if I had added them earlier; they also caught another bug which could trigger a duplicate block import.
Thanks Santito for reporting this! 🙏
Addresses a regression recently introduced when we started gossip verifying data columns from EL blobs.
```
failures:
network_beacon_processor::tests::accept_processed_gossip_data_columns_without_import
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 90 filtered out; finished in 16.60s
stderr ───
thread 'network_beacon_processor::tests::accept_processed_gossip_data_columns_without_import' panicked at beacon_node/network/src/network_beacon_processor/tests.rs:829:10:
should put data columns into availability cache: Unexpected("empty columns")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
https://github.com/sigp/lighthouse/actions/runs/15309278812/job/43082341868?pr=7521
If an empty `Vec` is passed to the DA checker, it causes an unexpected error.
This PR addresses it by not passing an empty `Vec` for processing, and not spawning a publish task when there is nothing to publish.
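A rough sketch of the guard, using placeholder generic code rather than the actual change:

```rust
// Illustrative guard, not the exact Lighthouse code: filter out items already
// seen on gossip and signal the caller to bail out when nothing is left, so we
// never hand the DA checker an empty `Vec` or spawn a publish task with
// nothing to publish.
fn filter_unseen<T>(items: Vec<T>, seen_on_gossip: impl Fn(&T) -> bool) -> Option<Vec<T>> {
    let unseen: Vec<T> = items.into_iter().filter(|i| !seen_on_gossip(i)).collect();
    if unseen.is_empty() {
        // Everything already arrived via gossip: nothing to import or publish.
        None
    } else {
        Some(unseen)
    }
}

fn main() {
    // All blobs/columns were already observed on gossip -> skip processing entirely.
    let all_seen = |_: &u8| true;
    assert!(filter_unseen(vec![1u8, 2, 3], all_seen).is_none());
}
```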
https://github.com/sigp/lighthouse/issues/6875
- Enabled the linter in rate-limiter and fixed errors.
- Changed the type of `Quota::max_tokens` from `u64` to `NonZeroU64` because `max_tokens` cannot be zero (see the sketch after this list).
- Added a test to ensure that a large value for `tokens`, which causes an overflow, is handled properly.
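A small sketch of the `NonZeroU64` change; the field names follow the description above, but the constructor is illustrative rather than the real rate-limiter API:

```rust
use std::num::NonZeroU64;
use std::time::Duration;

// Encoding "max_tokens cannot be zero" in the type rejects a zero-sized quota
// at construction time instead of allowing it to cause problems later.
pub struct Quota {
    pub replenish_all_every: Duration,
    pub max_tokens: NonZeroU64,
}

impl Quota {
    pub fn new(max_tokens: u64, replenish_all_every: Duration) -> Option<Self> {
        Some(Self {
            max_tokens: NonZeroU64::new(max_tokens)?, // None when max_tokens == 0
            replenish_all_every,
        })
    }
}

fn main() {
    assert!(Quota::new(0, Duration::from_secs(10)).is_none());
    assert!(Quota::new(100, Duration::from_secs(10)).is_some());
}
```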
Partly addresses:
- https://github.com/sigp/lighthouse/issues/7379
Handle attestation validation errors from `get_attesting_indices` so that, instead of logging an error, we downscore the peer and reject the message.
Addresses #7461 and partly #6439.
Desired behaviour after receiving `engine_getBlobs` response:
1. Gossip verify the blobs and proofs, but don't mark them as observed yet. This is because not all blobs are published immediately (due to staggered publishing); if we mark them as observed but don't publish them, we could end up blocking gossip propagation.
2. Blobs are marked as observed _either_ when:
* They are received from gossip and forwarded to the network.
* They are published by the node.
Current behaviour:
- ❗ We only gossip verify `engine_getBlobsV1` responses, but not `engine_getBlobsV2` responses (PeerDAS).
- ❗ After importing EL blobs AND before they're published, if the same blobs arrive via gossip, they will get re-processed, which may result in a re-import.
1. Perform gossip verification on data columns computed from the EL `getBlobsV2` response. We currently only do this for `getBlobsV1` to prevent importing blobs with invalid proofs into the `DataAvailabilityChecker`; this should be done on V2 responses too.
2. Add additional gossip verification to make sure we don't re-process a ~~blob~~ or data column that was imported via the EL `getBlobs` but not yet "seen" on the gossip network. If an "unobserved" gossip blob is found in the availability cache, then we know it has passed verification so we can immediately propagate the `ACCEPT` result and forward it to the network, but without re-processing it.
**UPDATE:** I've left blobs out of the second change mentioned above, as the likelihood and impact are very low and we haven't seen it much in practice, but under PeerDAS this issue is a regular occurrence and we do see the same block getting imported many times.
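A hypothetical sketch of that second change (placeholder names, not the real Lighthouse API):

```rust
// A data column fetched via engine_getBlobs is already verified and sits in the
// availability cache, but hasn't been "observed" on gossip. When the same column
// arrives over gossip, mark it observed and ACCEPT/forward it, but skip
// re-processing so the block isn't imported a second time.
enum GossipDecision {
    /// Propagate ACCEPT and forward to the network, without re-processing.
    AcceptWithoutReprocessing,
    /// Run the normal gossip verification + import path.
    Process,
}

fn classify_gossip_column(in_availability_cache: bool, observed_on_gossip: bool) -> GossipDecision {
    if in_availability_cache && !observed_on_gossip {
        GossipDecision::AcceptWithoutReprocessing
    } else {
        GossipDecision::Process
    }
}
```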
Update `engine_getBlobsV2` response type to `Option<Vec<BlobsAndProofV2>>`. See recent spec change [here](https://github.com/ethereum/execution-apis/pull/630).
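A sketch of how the new shape might be consumed (assuming the all-or-nothing semantics from that spec change; the names here are placeholders):

```rust
// `BlobsAndProofV2` is a stand-in for the real response type.
struct BlobsAndProofV2;

fn blobs_from_el(response: Option<Vec<BlobsAndProofV2>>) -> Vec<BlobsAndProofV2> {
    match response {
        // The EL had every requested blob and proof.
        Some(blobs_and_proofs) => blobs_and_proofs,
        // At least one blob was missing: nothing usable from the EL, so fall
        // back to fetching over the p2p network.
        None => Vec::new(),
    }
}
```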
Added some tests to cover basic fetch blob scenarios.
Currently `test_delayed_rpc_response` is flaky (possibly specific to Windows?), but I'm not sure why.
Enabled stdout logging in rpc_tests. Note that in nextest, std output is only displayed when a test fails.
Use `slice::is_sorted`, which was stabilised in Rust 1.82.0.
I thought there would be more places we could use this, but it seems we often want to check strict monotonicity (i.e. sorted + no duplicates).
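For illustration:

```rust
// `is_sorted` (stable since Rust 1.82) accepts equal adjacent elements, so
// strict monotonicity (sorted + no duplicates) needs `is_sorted_by` with a
// strict comparison instead.
fn main() {
    let slots = [1u64, 2, 2, 3];
    assert!(slots.is_sorted()); // non-decreasing
    assert!(!slots.is_sorted_by(|a, b| a < b)); // not strictly increasing
}
```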
* Remove request tracking inside syncing chains
* Prioritize by range peers in network context
* Prioritize custody peers for columns by range
* Explicit error handling of the no peers error case
* Remove good_peers_on_sampling_subnets
* Count AwaitingDownload towards the buffer limit
* Retry syncing chains in AwaitingDownload state
* Use same peer priorization for lookups
* Review PR
* Address TODOs
* Revert changes to peer erroring in range sync
* Revert metrics changes
* Update comment
* Pass peers_to_deprioritize to select_columns_by_range_peers_to_request
* more idiomatic
* Idiomatic while
* Add note about infinite loop
* Use while let
* Fix wrong custody column count for lookup blocks
* Remove impl
* Remove stale comment
* Fix build errors.
* Or default
* Review PR
* BatchPeerGroup
* Match block and blob signatures
* Explicit match statement to BlockError in range sync
* Remove todo in BatchPeerGroup
* Remove participating peers from backfill sync
* Remove MissingAllCustodyColumns error
* Merge fixes
* Clean up PR
* Consistent naming of batch_peers
* Address multiple review comments
* Better errors for das
* Penalize column peers once
* Restore fn
* Fix error enum
* Removed MismatchedPublicKeyLen
* Revert testing changes
* Change BlockAndCustodyColumns enum variant
* Revert type change in import_historical_block_batch
* Drop pubkey cache
* Don't collect Vec
* Classify errors
* Remove ReconstructColumnsError
* More detailed UnrequestedSlot error
* Lint test
* Fix slot conversion
* Reduce penalty for missing blobs
* Revert changes in peer selection
* Lint tests
* Rename block matching functions
* Reorder block matching in historical blocks
* Fix order of block matching
* Add store tests
* Filter blockchain in assert_correct_historical_block_chain
* Also filter before KZG checks
* Lint tests
* Fix lint
* Fix fulu err assertion
* Check point is not at infinity
* Fix ws sync test
* Revert dropping filter fn
---------
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
The endpoint `/eth/v1/beacon/states/head/validator_balances` returns empty data when the request body is `[]`. According to the beacon API spec, it should return the balances of all validators:
Reference: https://ethereum.github.io/beacon-APIs/#/Beacon/postStateValidatorBalances
`If the supplied list is empty (i.e. the body is []) or no body is supplied then balances will be returned for all validators.`
This PR changes so that: `curl -X 'POST' 'http://localhost:5052/eth/v1/beacon/states/head/validator_balances' -d '[]' | jq` returns balances of all validators.
Closes #5016
The op pool was using the wrong denominator when calculating proposer block rewards! This was mostly inconsequential, as our studies of Lighthouse's block profitability already showed that it is very close to optimal. The wrong denominator was left over from phase0 code and wasn't properly updated for Altair.
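For reference, a sketch of the Altair arithmetic using consensus-spec constants (not the op pool code itself):

```rust
// The Altair proposer reward is proposer_reward_numerator / proposer_reward_denominator,
// where the numerator accumulates base_reward * participation-flag weight for
// newly attested flags.
const PROPOSER_WEIGHT: u64 = 8;
const WEIGHT_DENOMINATOR: u64 = 64;

fn main() {
    let proposer_reward_denominator =
        (WEIGHT_DENOMINATOR - PROPOSER_WEIGHT) * WEIGHT_DENOMINATOR / PROPOSER_WEIGHT;
    // Not the phase0 PROPOSER_REWARD_QUOTIENT (= 8) that the old code carried over.
    assert_eq!(proposer_reward_denominator, 448);
}
```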
Resolves https://github.com/sigp/lighthouse/issues/7414
The health endpoint returns a 503 if the engine state is offline. The default state for the engine is `Offline`. So until the first request to the EL is made and the state is updated, the health endpoint will keep returning 503s.
This PR changes the default state to `Online` to avoid that. I don't think this causes any issues because, if the EL is actually offline, the first fcu will set the state back to `Offline`.
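A minimal sketch of the idea (illustrative types, not the actual Lighthouse code):

```rust
// Default the engine state to Online so the health endpoint doesn't report 503
// before the first EL request has been made; a failed request still flips it
// to Offline.
#[derive(Debug, Default, PartialEq)]
enum EngineState {
    #[default]
    Online, // previously the default was Offline
    Offline,
}

fn main() {
    assert_eq!(EngineState::default(), EngineState::Online);
}
```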
Pending testing on kurtosis.