Commit Graph

3467 Commits

Author SHA1 Message Date
dapplion
cb5f76f137 Add peers to backfill if FullySynced 2025-06-12 19:41:20 +02:00
dapplion
aa726cc72c lint 2025-06-12 19:29:14 +02:00
dapplion
56fcf289ec lint 2025-06-12 15:45:36 +02:00
dapplion
8c8a8124ee Merge remote-tracking branch 'sigp/peerdas-devnet-7' into peerdas-rangesync 2025-06-12 15:40:00 +02:00
dapplion
a7a3457def Merge branch 'peerdas-devnet-7' into peerdas-rangesync 2025-06-12 15:37:45 +02:00
Jimmy Chen
895ce18343 Merge branch 'unstable' into peerdas-devnet-7
# Conflicts:
#	beacon_node/beacon_chain/src/block_verification_types.rs
#	beacon_node/beacon_chain/src/data_availability_checker.rs
#	beacon_node/beacon_chain/src/test_utils.rs
#	beacon_node/beacon_chain/tests/block_verification.rs
#	beacon_node/network/src/sync/block_sidecar_coupling.rs
#	beacon_node/network/src/sync/network_context.rs
#	beacon_node/network/src/sync/tests/range.rs
2025-06-12 09:46:00 +02:00
Pawan Dhananjay
5f208bb858 Implement basic validator custody framework (no backfill) (#7578)
Resolves #6767


  This PR implements a basic version of validator custody.
- It introduces a new `CustodyContext` object which holds the number of validators attached to a node and the custody count they contribute to the cgc.
- The `CustodyContext` is added to the da_checker and has methods for returning the current cgc and the number of columns to sample at head. Note that the logic for returning the cgc previously lived in the network globals.
- To estimate the number of attached validators, we use the `beacon_committee_subscriptions` endpoint. This might overestimate the number of validators actually publishing attestations from the node in multi-BN setups. We could also potentially use the `publish_attestations` endpoint later to get a more conservative estimate.
- Any time the `custody_group_count` changes due to addition/removal of validators, the custody context should send an event on a broadcast channel. The only subscriber for the channel currently exists in the network service, which simply subscribes to more subnets. There can be additional subscribers in sync that will start a backfill once the cgc changes (see the sketch below).
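A minimal sketch of how such a context and its broadcast channel could be wired together with tokio; the field layout and method names here are illustrative assumptions, not the actual Lighthouse API:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use tokio::sync::broadcast;

/// Illustrative stand-in for the real `CustodyContext`.
pub struct CustodyContext {
    /// Current custody group count (cgc), derived from attached validators.
    custody_group_count: AtomicU64,
    /// Notifies subscribers (e.g. the network service) when the cgc changes.
    cgc_tx: broadcast::Sender<u64>,
}

impl CustodyContext {
    pub fn new(initial_cgc: u64) -> (Self, broadcast::Receiver<u64>) {
        let (cgc_tx, cgc_rx) = broadcast::channel(16);
        let ctx = Self {
            custody_group_count: AtomicU64::new(initial_cgc),
            cgc_tx,
        };
        (ctx, cgc_rx)
    }

    /// Called when the estimated number of attached validators changes
    /// (e.g. via `beacon_committee_subscriptions`). Only broadcasts on an
    /// increase, matching the "increase only" behaviour described above.
    pub fn update_custody_group_count(&self, new_cgc: u64) {
        let old_cgc = self.custody_group_count.swap(new_cgc, Ordering::SeqCst);
        if new_cgc > old_cgc {
            // Subscribers react by joining more subnets or starting a backfill.
            let _ = self.cgc_tx.send(new_cgc);
        }
    }

    pub fn current_cgc(&self) -> u64 {
        self.custody_group_count.load(Ordering::SeqCst)
    }
}
```

The network service would hold the `broadcast::Receiver` and subscribe to additional subnets on each message; a future sync subscriber could trigger backfill from the same channel.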

TODO

- [ ] **NOT REQUIRED:** Currently, the logic only handles an increase in validator count and does not handle a decrease. We should ideally unsubscribe from subnets when the cgc has decreased.
- [ ] **NOT REQUIRED:** Add a service in the `CustodyContext` that emits an event once `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS` passes after updating the current cgc. This event should be picked up by a subscriber which updates the ENR and metadata.
- [x] Add more tests
2025-06-11 18:10:06 +00:00
Pawan Dhananjay
076a1c3fae Data column sidecar event (#7587)
N/A


  Implement events for data column sidecar https://github.com/ethereum/beacon-APIs/pull/535
2025-06-11 16:39:22 +00:00
dapplion
82c8e82fe1 Re-add NoPeers error 2025-06-11 16:46:18 +02:00
dapplion
e426e45455 Don't use failed_peers for download errors, rely on randomness to skip potentially faulty peers 2025-06-11 12:38:55 +02:00
dapplion
4e13b3be0f Fix failed_peers post fulu 2025-06-11 11:49:25 +02:00
dapplion
7a03578795 Remove total_requests_per_peer 2025-06-11 11:21:12 +02:00
dapplion
dbce5f7734 Merge remote-tracking branch 'sigp/unstable' into peerdas-devnet-7 2025-06-11 11:03:38 +02:00
dapplion
28d9d8b8e2 lint 2025-06-11 11:02:37 +02:00
Jimmy Chen
8c6abc0b69 Optimise parallelism in compute cells operations by zipping first (#7574)
We're seeing slow KZG performance on `fusaka-devnet-0` and are looking for optimisations to improve it.

Zipping the list first and then calling `into_par_iter` shows a 10% improvement in the performance benchmark; I suspect this might be even more material when running on a beacon node (a sketch follows the benchmarks below).

Before:
```
blobs_to_data_column_sidecars_20
time:   [11.583 ms 12.041 ms 12.534 ms]
Found 5 outliers among 100 measurements (5.00%)
```

After:
```
blobs_to_data_column_sidecars_20
time:   [10.506 ms 10.724 ms 10.982 ms]
change: [-14.925% -10.941% -6.5452%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
```
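
For illustration, a hedged sketch of the shape of the change with rayon; the element types and the exact "before" form are simplified assumptions, the real code operates on blobs, cells and KZG proofs:

```rust
use rayon::prelude::*;

/// Hypothetical per-element work standing in for cell/proof computation.
fn compute(cell: &u64, proof: &u64) -> u64 {
    cell.wrapping_mul(*proof)
}

/// Before (roughly): parallel-iterate one collection and index into the other.
fn before(cells: &[u64], proofs: &[u64]) -> Vec<u64> {
    cells
        .par_iter()
        .enumerate()
        .map(|(i, cell)| compute(cell, &proofs[i]))
        .collect()
}

/// After (roughly): zip the collections into one list first, then hand rayon a
/// single indexed iterator to split across threads via `into_par_iter`.
fn after(cells: &[u64], proofs: &[u64]) -> Vec<u64> {
    cells
        .iter()
        .zip(proofs.iter())
        .collect::<Vec<_>>()
        .into_par_iter()
        .map(|(cell, proof)| compute(cell, proof))
        .collect()
}
```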
2025-06-09 12:41:14 +00:00
ethDreamer
b08d49c4cb Changes for fusaka-devnet-1 (#7559)
Changes for [fusaka-devnet-1](https://notes.ethereum.org/@ethpandaops/fusaka-devnet-1)


  [Consensus Specs v1.6.0-alpha.1](https://github.com/ethereum/consensus-specs/pull/4346)
* [EIP-7917: Deterministic Proposer Lookahead](https://eips.ethereum.org/EIPS/eip-7917)
* [EIP-7892: Blob Parameter Only Hardforks](https://eips.ethereum.org/EIPS/eip-7892)
2025-06-09 09:10:08 +00:00
Jimmy Chen
6f754bfd8d Merge branch 'peerdas-devnet-7' into peerdas-rangesync 2025-06-05 23:39:03 +10:00
Jimmy Chen
4fadf1fba8 Merge branch 'unstable' into peerdas-devnet-7 2025-06-05 23:38:31 +10:00
Lion - dapplion
d457ceeaaf Don't create child lookup if parent is faulty (#7118)
Issue discovered on PeerDAS devnet (node `lighthouse-geth-2.peerdas-devnet-5.ethpandaops.io`). Summary:

- A lookup is created for block root `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- That block or a parent is faulty and `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` is added to the failed chains cache
- We later receive a block that is a child of a child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- We create a lookup, which attempts to process the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` and hits a processor error `UnknownParent`, reaching this line

bf955c7543/beacon_node/network/src/sync/block_lookups/mod.rs (L686-L688)

`search_parent_of_child` does not create a parent lookup because the parent root is in the failed chain cache. However, we have **already** marked the child as awaiting the parent. This results in an inconsistent state of lookup sync, as there's a lookup awaiting a parent that doesn't exist.

Now we have a lookup (the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`) that is awaiting a parent lookup that doesn't exist: hence stuck.

### Impact

This bug can affect Mainnet as well as PeerDAS devnets.

This bug may stall lookup sync for a few minutes (up to `LOOKUP_MAX_DURATION_STUCK_SECS = 15 min`) until the stuck prune routine deletes it. By that time the root will be cleared from the failed chain cache and sync should succeed. During that time the user will see a lot of `WARN` logs when attempting to add each peer to the inconsistent lookup. We may also sync the block through range sync if we fall behind by more than 2 epochs. We may also create the parent lookup successfully after the failed cache clears and complete the child lookup.

This bug is triggered if:
- We have a lookup that fails and its root is added to the failed chain cache (much more likely to happen in PeerDAS networks)
- We receive a block that builds on a child of the block added to the failed chain cache


  Ensure that we never create (or leave in place) a lookup that references a non-existent parent.

I added `must_use` lints to the functions that create lookups. To fix the specific bug, we must recursively drop the child lookup if the parent is not created. So if `search_parent_of_child` returns `false`, we now return `LookupRequestError::Failed` instead of `LookupResult::Pending`.
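
A hedged sketch of that control-flow change; the type and function names mirror the description above, but the bodies and stand-in types are illustrative, not the actual Lighthouse sync code:

```rust
/// Illustrative stand-in types; the real sync code uses richer variants.
#[derive(Debug)]
enum LookupResult {
    Pending,
}

#[derive(Debug)]
enum LookupRequestError {
    Failed(&'static str),
}

struct BlockLookups {
    /// Stand-in for the failed chains cache.
    failed_chains: Vec<[u8; 32]>,
}

impl BlockLookups {
    /// Returns true only if a parent lookup was (or could be) created.
    fn search_parent_of_child(&mut self, parent_root: [u8; 32]) -> bool {
        !self.failed_chains.contains(&parent_root)
    }

    /// When processing a child hits `UnknownParent`, either wait on a real
    /// parent lookup or fail the child so no lookup ever references a parent
    /// lookup that does not exist.
    #[must_use]
    fn on_unknown_parent(
        &mut self,
        parent_root: [u8; 32],
    ) -> Result<LookupResult, LookupRequestError> {
        if self.search_parent_of_child(parent_root) {
            Ok(LookupResult::Pending)
        } else {
            Err(LookupRequestError::Failed("failed to create parent lookup"))
        }
    }
}
```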

As a bonus, I have added more logging and reason strings to the errors.
2025-06-05 08:53:43 +00:00
dapplion
ae0ef8f929 Fix finalized_sync_permanent_custody_peer_failure 2025-06-04 23:02:56 -06:00
Jimmy Chen
2b4a9bda44 Merge branch 'peerdas-devnet-7' into peerdas-rangesync 2025-06-04 16:06:23 +10:00
Jimmy Chen
b9ce98a3e5 Merge branch 'unstable' into peerdas-devnet-7 2025-06-04 16:05:38 +10:00
Jimmy Chen
357a8ccbb9 Checkpoint sync without the blobs from Fulu (#7549)
Lighthouse currently requires checkpoint sync to be performed against a supernode in a PeerDAS network, as only supernodes can serve blobs.

This PR lifts that requirement, enabling Lighthouse to checkpoint sync from either a fullnode or a supernode (See https://github.com/sigp/lighthouse/issues/6837#issuecomment-2933094923)

Missing data columns for the checkpoint block isn't a big issue, but we should be able to easily implement backfill once we have the logic to backfill data columns.
2025-06-04 00:31:27 +00:00
Jimmy Chen
1b72871ad1 Merge branch 'peerdas-devnet-7' into peerdas-rangesync 2025-06-03 18:20:54 +10:00
Jimmy Chen
42ef88bdb4 Merge branch 'unstable' into peerdas-devnet-7
# Conflicts:
#	beacon_node/beacon_chain/src/data_availability_checker.rs
2025-06-03 18:19:07 +10:00
ethDreamer
ae30480926 Implement EIP-7892 BPO hardforks (#7521)
[EIP-7892: Blob Parameter Only Hardforks](https://eips.ethereum.org/EIPS/eip-7892)

#7467
2025-06-02 06:54:42 +00:00
Jimmy Chen
94a1446ac9 Fix unexpected blob error and duplicate import in fetch blobs (#7541)
Getting this error on a non-PeerDAS network:

```
May 29 13:30:13.484 ERROR Error fetching or processing blobs from EL    error: BlobProcessingError(AvailabilityCheck(Unexpected("empty blobs"))), block_root: 0x98aa3927056d453614fefbc79eb1f9865666d1f119d0e8aa9e6f4d02aa9395d9
```

It appears we're passing an empty `Vec` to the DA checker because all blobs were already seen on gossip and filtered out; this causes an `AvailabilityCheckError::Unexpected("empty blobs")`.

I've added equivalent unit tests for `getBlobsV1` to cover all the scenarios we test in `getBlobsV2`. This would have caught the bug if added earlier. It also caught another bug which could trigger a duplicate block import.

Thanks Santito for reporting this! 🙏
2025-06-02 01:51:09 +00:00
Jimmy Chen
4d21846aba Prevent AvailabilityCheckError when there's no new custody columns to import (#7533)
Addresses a regression recently introduced when we started gossip-verifying data columns computed from EL blobs.

```
failures:
network_beacon_processor::tests::accept_processed_gossip_data_columns_without_import

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 90 filtered out; finished in 16.60s

stderr ───

thread 'network_beacon_processor::tests::accept_processed_gossip_data_columns_without_import' panicked at beacon_node/network/src/network_beacon_processor/tests.rs:829:10:
should put data columns into availability cache: Unexpected("empty columns")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

https://github.com/sigp/lighthouse/actions/runs/15309278812/job/43082341868?pr=7521

If an empty `Vec` is passed to the DA checker, it causes an unexpected error.

This PR addresses it by not passing an empty `Vec` for processing and not spawning a task to publish (a minimal sketch of the guard follows).
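
A minimal sketch of that guard; the function and type names are hypothetical, only the "skip empty input" pattern is the point:

```rust
/// Hypothetical handler for data columns computed from EL blobs. Skips the
/// availability-cache call and the publish task entirely when nothing new
/// remains after filtering already-seen columns.
fn process_new_custody_columns(new_columns: Vec<DataColumn>) {
    if new_columns.is_empty() {
        // Nothing to import or publish; calling the DA checker with an empty
        // Vec would surface as `Unexpected("empty columns")`.
        return;
    }
    import_into_availability_cache(&new_columns);
    spawn_publish_task(new_columns);
}

// Stand-in types/functions so the sketch compiles on its own.
struct DataColumn;
fn import_into_availability_cache(_cols: &[DataColumn]) {}
fn spawn_publish_task(_cols: Vec<DataColumn>) {}
```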
2025-05-29 02:54:34 +00:00
Akihito Nakano
5cda6a6f9e Mitigate flakiness in test_delayed_rpc_response (#7522)
https://github.com/sigp/lighthouse/issues/7466


  Expanded the margin from 100ms to 500ms.
2025-05-29 01:37:04 +00:00
Mac L
0ddf9a99d6 Remove support for database migrations prior to schema version v22 (#7332)
Remove deprecated database migrations prior to v22 along with v22 migration specific code.
2025-05-28 13:47:21 +00:00
dapplion
c6b39e9e10 Merge remote-tracking branch 'sigp/peerdas-devnet-7' into peerdas-rangesync 2025-05-27 16:20:34 -05:00
dapplion
02d97377a5 Address review comments 2025-05-27 16:07:45 -05:00
dapplion
144b83e625 Remove BatchStateSummary 2025-05-27 15:52:14 -05:00
dapplion
0ef95dd7f8 Remove stale TODO 2025-05-27 15:33:39 -05:00
dapplion
fc3922f854 Resolve more TODOs 2025-05-27 15:32:29 -05:00
dapplion
52722b7b2e Resolve TODO(das) 2025-05-27 14:28:52 -05:00
dapplion
86ad87eced Lint tests 2025-05-27 12:21:42 -05:00
Akihito Nakano
8989ef8fb1 Enable arithmetic lint in rate-limiter (#7025)
https://github.com/sigp/lighthouse/issues/6875


  - Enabled the linter in rate-limiter and fixed errors.
- Changed the type of `Quota::max_tokens` from `u64` to `NonZeroU64` because `max_tokens` cannot be zero.
- Added a test to ensure that a large value for `tokens`, which would otherwise cause an overflow, is handled properly (see the sketch below).
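
A hedged sketch of the `NonZeroU64` change and the overflow-safe token arithmetic it enables; the field names and the replenish formula are simplified from the real `Quota`:

```rust
use std::num::NonZeroU64;
use std::time::Duration;

/// Simplified stand-in for the rate-limiter quota.
struct Quota {
    /// Maximum tokens in the bucket; the type encodes that it cannot be zero.
    max_tokens: NonZeroU64,
    replenish_all_every: Duration,
}

impl Quota {
    /// Time to replenish `tokens`, using saturating arithmetic so a very
    /// large `tokens` value cannot overflow.
    fn replenish_time(&self, tokens: u64) -> Duration {
        let nanos = self
            .replenish_all_every
            .as_nanos()
            .saturating_mul(u128::from(tokens))
            / u128::from(self.max_tokens.get()); // never divides by zero
        Duration::from_nanos(nanos.min(u128::from(u64::MAX)) as u64)
    }
}

fn main() {
    let quota = Quota {
        max_tokens: NonZeroU64::new(10).expect("non-zero"),
        replenish_all_every: Duration::from_secs(1),
    };
    // A huge request no longer overflows; it just saturates.
    let _ = quota.replenish_time(u64::MAX);
}
```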
2025-05-27 15:43:22 +00:00
dapplion
8f74adc66f Use DataColumnSidecarList 2025-05-27 00:43:38 -05:00
dapplion
34b37b97ed Remove unused module 2025-05-27 00:37:12 -05:00
Michael Sproul
7c89b970af Handle attestation validation errors (#7382)
Partly addresses:

- https://github.com/sigp/lighthouse/issues/7379


  Handle attestation validation errors from `get_attesting_indices` to prevent an error log, downscore the peer, and reject the message.
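
A hedged sketch of that handling pattern; the error variants, peer-scoring callback and acceptance type are placeholders rather than the actual Lighthouse APIs:

```rust
/// Placeholder for validation errors surfaced by `get_attesting_indices`.
#[derive(Debug)]
enum AttestationValidationError {
    InvalidCommitteeIndex,
}

/// Placeholder gossip acceptance decision.
enum MessageAcceptance {
    Reject,
}

/// Instead of surfacing the failure as an ERROR log, treat it as an invalid
/// message: log at a lower level, penalise the sending peer, and reject the
/// message so it is not propagated further.
fn handle_invalid_attestation(
    peer: &str,
    err: AttestationValidationError,
    downscore_peer: impl Fn(&str),
) -> MessageAcceptance {
    println!("Invalid attestation from {peer}: {err:?}");
    downscore_peer(peer);
    MessageAcceptance::Reject
}

fn main() {
    let acceptance = handle_invalid_attestation(
        "peer-id",
        AttestationValidationError::InvalidCommitteeIndex,
        |p| println!("downscoring {p}"),
    );
    assert!(matches!(acceptance, MessageAcceptance::Reject));
}
```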
2025-05-27 01:55:17 +00:00
dapplion
01329ab230 Improve RangeBlockComponent type 2025-05-26 19:07:15 -05:00
dapplion
c8a0c9e379 Remove CustodyByRoot and CustodyByRange types 2025-05-26 19:04:50 -05:00
dapplion
7d0fb93274 Reduce conversions 2025-05-26 18:49:45 -05:00
dapplion
b383f7af53 More comments 2025-05-26 18:37:20 -05:00
Jimmy Chen
e6ef644db4 Verify getBlobsV2 response and avoid reprocessing imported data columns (#7493)
#7461 and partly #6439.

Desired behaviour after receiving `engine_getBlobs` response:

1. Gossip verify the blobs and proofs, but don't mark them as observed yet. This is because not all blobs are published immediately (due to staggered publishing). If we mark them as observed but do not publish them, we could end up blocking gossip propagation.
2. Blobs are marked as observed _either_ when:
* They are received from gossip and forwarded to the network.
* They are published by the node.

Current behaviour:
-  We only gossip verify `engine_getBlobsV1` responses, but not `engine_getBlobsV2` responses (PeerDAS).
-  After importing EL blobs AND before they're published, if the same blobs arrive via gossip, they will get re-processed, which may result in a re-import.


  1. Perform gossip verification on data columns computed from the EL `getBlobsV2` response. We currently only do this for `getBlobsV1` to prevent importing blobs with invalid proofs into the `DataAvailabilityChecker`; this should be done on V2 responses too.
2. Add additional gossip verification to make sure we don't re-process a ~~blob~~ or data column that was imported via the EL `getBlobs` but not yet "seen" on the gossip network. If an "unobserved" gossip blob is found in the availability cache, then we know it has passed verification so we can immediately propagate the `ACCEPT` result and forward it to the network, but without re-processing it.

**UPDATE:** I've left blobs out of the second change mentioned above, as the likelihood and impact are very low and we haven't seen it happen much, but under PeerDAS this issue is a regular occurrence and we do see the same block getting imported many times (see the data column sketch below).
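
A rough sketch of the gossip-side decision described in point 2; the boolean inputs stand in for the availability-cache and observation checks, which are assumptions about the shape of the real code:

```rust
/// Possible gossip verification outcomes (simplified).
enum GossipOutcome {
    /// Propagate on gossip and import into the DA checker.
    AcceptAndImport,
    /// Propagate on gossip but skip re-processing: the sidecar was already
    /// imported from the EL `getBlobs` response and has passed verification.
    AcceptWithoutReprocessing,
}

/// Hypothetical check run when a data column sidecar arrives on gossip.
fn on_gossip_data_column(
    in_availability_cache: bool,
    observed_on_gossip: bool,
) -> GossipOutcome {
    if in_availability_cache && !observed_on_gossip {
        // Imported via the EL but not yet "seen" on gossip: mark it observed,
        // forward it to the network, and avoid a duplicate import.
        GossipOutcome::AcceptWithoutReprocessing
    } else {
        GossipOutcome::AcceptAndImport
    }
}
```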
2025-05-26 19:55:58 +00:00
Jimmy Chen
a85d863fb6 Merge branch 'unstable' into peerdas-devnet-7 2025-05-26 14:42:18 +10:00
Jimmy Chen
f01dc556d1 Update engine_getBlobsV2 response type and add getBlobsV2 tests (#7505)
Update `engine_getBlobsV2` response type to `Option<Vec<BlobsAndProofV2>>`. See recent spec change [here](https://github.com/ethereum/execution-apis/pull/630).
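For reference, a hedged sketch of the response shape implied by that change; the field names inside `BlobsAndProofV2` and the all-or-nothing semantics are assumptions based on the linked spec PR:

```rust
/// Assumed shape of a single entry in the `engine_getBlobsV2` response.
struct BlobsAndProofV2 {
    blob: Vec<u8>,
    /// One KZG proof per cell of the extended blob.
    proofs: Vec<Vec<u8>>,
}

/// `None` when the EL cannot serve every requested blob; otherwise one entry
/// per requested versioned hash.
type GetBlobsV2Response = Option<Vec<BlobsAndProofV2>>;
```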

Added some tests to cover basic fetch blob scenarios.
2025-05-26 04:33:34 +00:00
Akihito Nakano
a2797d4bbd Fix formatting errors from cargo-sort (#7512)
[cargo-sort is currently failing on CI](https://github.com/sigp/lighthouse/actions/runs/15198128212/job/42746931918?pr=7025), likely due to new checks introduced in version [2.0.0](https://github.com/DevinR528/cargo-sort/releases/tag/v2.0.0).


  Fixed the errors by running cargo-sort with formatting enabled.
2025-05-23 05:25:56 +00:00
dapplion
801659d4ae Resolve some TODOs 2025-05-22 01:06:57 -05:00