Commit Graph

3439 Commits

Author SHA1 Message Date
Eitan Seri-Levi
6786b9d12a Single attestation "Full" implementation (#7444)
#6970


  This allows us to receive `SingleAttestation` over gossip and process it without converting. There is still a conversion to `Attestation` as a final step in the attestation verification process, but by then the `SingleAttestation` is fully verified.

I've also fully removed the `submitPoolAttestationsV1` endpoint as it's been deprecated.

I've also pre-emptively deprecated supporting `Attestation` in `submitPoolAttestationsV2` endpoint. See here for more info: https://github.com/ethereum/beacon-APIs/pull/531

I tried to minimize the diff here by only making the "required" changes. There are some unnecessary complexities in the way we manage the different attestation verification wrapper types. We could probably consolidate these into one wrapper type and refactor further; we could leave that to a separate PR if we feel like cleaning things up in the future.

Note that I've also updated the test harness to always submit `SingleAttestation` regardless of fork variant. I don't see a problem in that approach and it allows us to delete more code :)
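The conversion step described above can be sketched as follows. This is a minimal illustration with hypothetical simplified structs (the real Lighthouse types carry signatures, attestation data, and fork-aware bitfields): a `SingleAttestation` names its attester directly, and only after verification is it expanded into an `Attestation` with committee aggregation bits.

```rust
// Hypothetical simplified types; the real ones live in Lighthouse's `types` crate.
#[derive(Debug, PartialEq)]
struct SingleAttestation {
    committee_index: u64,
    attester_index: u64,
    slot: u64,
}

#[derive(Debug, PartialEq)]
struct Attestation {
    aggregation_bits: Vec<bool>, // one bit per committee member
    committee_index: u64,
    slot: u64,
}

/// Convert a fully verified `SingleAttestation` into an `Attestation`,
/// given the committee it belongs to. Returns `None` if the attester
/// is not a member of the committee.
fn into_attestation(single: &SingleAttestation, committee: &[u64]) -> Option<Attestation> {
    let position = committee.iter().position(|v| *v == single.attester_index)?;
    let mut bits = vec![false; committee.len()];
    bits[position] = true;
    Some(Attestation {
        aggregation_bits: bits,
        committee_index: single.committee_index,
        slot: single.slot,
    })
}

fn main() {
    let single = SingleAttestation { committee_index: 3, attester_index: 42, slot: 100 };
    let committee = [7, 42, 99];
    let att = into_attestation(&single, &committee).expect("attester in committee");
    assert_eq!(att.aggregation_bits, vec![false, true, false]);
}
```

Deferring this conversion to the end of verification is what lets the rest of the pipeline stay conversion-free.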
2025-06-17 09:01:26 +00:00
Jimmy Chen
3d2d65bf8d Advertise --advertise-false-custody-group-count for testing PeerDAS (#7593)
#6973
2025-06-16 11:10:28 +00:00
Jimmy Chen
6135f417a2 Add data columns sidecars debug beacon API (#7591)
Beacon API spec PR: https://github.com/ethereum/beacon-APIs/pull/537
2025-06-15 14:20:16 +00:00
Akihito Nakano
dc5f5af3eb Fix flaky test_rpc_block_reprocessing (#7595)
The test occasionally fails, likely because the 10ms fixed delay after block processing isn't sufficient when the system is under load.

https://github.com/sigp/lighthouse/pull/7522#issuecomment-2914595667


  Replace single assertion with retry loop.
2025-06-14 00:54:19 +00:00
Daniel Knopik
ccd99c138c Wait before column reconstruction (#7588) 2025-06-13 18:19:06 +00:00
Jimmy Chen
a65f78222d Drop stale registrations without reducing CGC (#7594)
Currently the validator effective balance used for computing the PeerDAS custody group count is only updated when the validator subscribes to the BN via `validator/beacon_committee_subscriptions`.

If a validator stops registering with the node, its effective balance becomes outdated and stays in BN memory until the next restart. Stale registrations are not needed for CGC computation (per the spec the CGC never decreases), so they can be dropped.
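The "drop without reducing" rule can be sketched as a monotonic floor on the advertised CGC. This is a hypothetical simplified model (names and the one-group-per-32-ETH rule are illustrative, not Lighthouse's actual computation):

```rust
use std::collections::HashMap;

/// Hypothetical sketch: per-validator effective balances feed the custody
/// group count, but dropping a stale registration must never shrink the
/// advertised CGC, so we keep a monotonic floor.
struct CustodyTracker {
    balances: HashMap<u64, u64>, // validator index -> effective balance (ETH)
    advertised_cgc: u64,         // never decreases
}

impl CustodyTracker {
    fn new() -> Self {
        Self { balances: HashMap::new(), advertised_cgc: 0 }
    }

    fn register(&mut self, validator: u64, balance: u64) {
        self.balances.insert(validator, balance);
        self.advertised_cgc = self.advertised_cgc.max(self.compute_cgc());
    }

    /// Stale registrations can be dropped freely: the CGC floor stays put.
    fn drop_stale(&mut self, validator: u64) {
        self.balances.remove(&validator);
    }

    /// Illustrative rule only: one custody group per 32 ETH of attached balance.
    fn compute_cgc(&self) -> u64 {
        self.balances.values().sum::<u64>() / 32
    }

    fn cgc(&self) -> u64 {
        self.advertised_cgc.max(self.compute_cgc())
    }
}

fn main() {
    let mut tracker = CustodyTracker::new();
    tracker.register(1, 32);
    tracker.register(2, 32);
    assert_eq!(tracker.cgc(), 2);
    tracker.drop_stale(1); // balance gone, but the advertised CGC holds
    assert_eq!(tracker.cgc(), 2);
}
```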
2025-06-13 14:30:43 +00:00
Daniel Knopik
5472cb8500 Batch verify KZG proofs for getBlobsV2 (#7582) 2025-06-12 14:35:14 +00:00
Pawan Dhananjay
9803d69d80 Implement status v2 version (#7590)
N/A


  Implements status v2 as defined in https://github.com/ethereum/consensus-specs/pull/4374/
2025-06-12 07:17:06 +00:00
Pawan Dhananjay
5f208bb858 Implement basic validator custody framework (no backfill) (#7578)
Resolves #6767


  This PR implements a basic version of validator custody.
- It introduces a new `CustodyContext` object which contains info on the number of validators attached to a node and the custody count they contribute to the cgc.
- The `CustodyContext` is added in the da_checker and has methods for returning the current cgc and the number of columns to sample at head. Note that the logic for returning the cgc existed previously in the network globals.
- To estimate the number of validators attached, we use the `beacon_committee_subscriptions` endpoint. This might overestimate the number of validators actually publishing attestations from the node in the case of multi BN setups. We could also potentially use the `publish_attestations` endpoint to get a more conservative estimate at a later point.
- Any time there's a change in the `custody_group_count` due to addition/removal of validators, the custody context sends an event on a broadcast channel. The only subscriber for the channel exists in the network service, which simply subscribes to more subnets. There can be additional subscribers in sync that will start a backfill once the cgc changes.

TODO

- [ ] **NOT REQUIRED:** Currently, the logic only handles an increase in validator count and does not handle a decrease. We should ideally unsubscribe from subnets when the cgc has decreased.
- [ ] **NOT REQUIRED:** Add a service in the `CustodyContext` that emits an event once `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS ` passes after updating the current cgc. This event should be picked up by a subscriber which updates the enr and metadata.
- [x] Add more tests
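The event flow described above (cgc change in the custody context, broadcast to a network-service subscriber) can be sketched like this. All names are hypothetical, and std's `mpsc` stands in for the broadcast channel the real implementation would use:

```rust
use std::sync::mpsc;

// Hypothetical shape: when the custody group count grows, the context
// emits an event; a network-service subscriber reacts by subscribing to
// more subnets. Decreases are ignored, matching the "cgc never reduces"
// behaviour described above.
struct CustodyContext {
    cgc: u64,
    tx: mpsc::Sender<u64>,
}

impl CustodyContext {
    fn on_validator_update(&mut self, new_cgc: u64) {
        if new_cgc > self.cgc {
            self.cgc = new_cgc;
            // Notify subscribers of the increased custody requirement.
            let _ = self.tx.send(new_cgc);
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let mut ctx = CustodyContext { cgc: 4, tx };
    ctx.on_validator_update(8); // increase -> event sent
    ctx.on_validator_update(6); // decrease -> no event
    assert_eq!(rx.try_recv().unwrap(), 8);
    assert!(rx.try_recv().is_err());
}
```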
2025-06-11 18:10:06 +00:00
Pawan Dhananjay
076a1c3fae Data column sidecar event (#7587)
N/A


  Implement events for data column sidecar https://github.com/ethereum/beacon-APIs/pull/535
2025-06-11 16:39:22 +00:00
Jimmy Chen
8c6abc0b69 Optimise parallelism in compute cells operations by zipping first (#7574)
We're seeing slow KZG performance on `fusaka-devnet-0` and looking for optimisations to improve performance.

Zipping the list first then calling `into_par_iter` shows a 10% improvement in the performance benchmark; I suspect this might be even more material when running on a beacon node.

Before:
```
blobs_to_data_column_sidecars_20
time:   [11.583 ms 12.041 ms 12.534 ms]
Found 5 outliers among 100 measurements (5.00%)
```

After:
```
blobs_to_data_column_sidecars_20
time:   [10.506 ms 10.724 ms 10.982 ms]
change: [-14.925% -10.941% -6.5452%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
```
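The "zip first, then parallelise" shape can be sketched as below. The real code uses rayon's `into_par_iter`; here std scoped threads stand in so the example stays dependency-free, and a multiplication stands in for the per-cell KZG work. The point is that zipping up front gives each worker a self-contained `(blob, proof)` pair, with no indexed lookups inside the hot loop:

```rust
use std::thread;

/// Sketch only: zip the inputs into pairs first, then split the pairs
/// across workers. `std::thread::scope` stands in for rayon.
fn process_pairs(blobs: Vec<u64>, proofs: Vec<u64>) -> Vec<u64> {
    let pairs: Vec<(u64, u64)> = blobs.into_iter().zip(proofs).collect();
    let chunk = pairs.len().div_ceil(4).max(1);
    let mut out = vec![0; pairs.len()];
    thread::scope(|s| {
        for (pairs_chunk, out_chunk) in pairs.chunks(chunk).zip(out.chunks_mut(chunk)) {
            s.spawn(move || {
                for ((b, p), o) in pairs_chunk.iter().zip(out_chunk.iter_mut()) {
                    *o = b * p; // stand-in for the per-cell KZG computation
                }
            });
        }
    });
    out
}

fn main() {
    assert_eq!(process_pairs(vec![1, 2, 3], vec![4, 5, 6]), vec![4, 10, 18]);
}
```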
2025-06-09 12:41:14 +00:00
ethDreamer
b08d49c4cb Changes for fusaka-devnet-1 (#7559)
Changes for [fusaka-devnet-1](https://notes.ethereum.org/@ethpandaops/fusaka-devnet-1)


  [Consensus Specs v1.6.0-alpha.1](https://github.com/ethereum/consensus-specs/pull/4346)
* [EIP-7917: Deterministic Proposer Lookahead](https://eips.ethereum.org/EIPS/eip-7917)
* [EIP-7892: Blob Parameter Only Hardforks](https://eips.ethereum.org/EIPS/eip-7892)
2025-06-09 09:10:08 +00:00
Lion - dapplion
d457ceeaaf Don't create child lookup if parent is faulty (#7118)
Issue discovered on PeerDAS devnet (node `lighthouse-geth-2.peerdas-devnet-5.ethpandaops.io`). Summary:

- A lookup is created for block root `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- That block or a parent is faulty and `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` is added to the failed chains cache
- We later receive a block that is a child of a child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- We create a lookup, which attempts to process the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` and hits a processor error `UnknownParent` at this line

bf955c7543/beacon_node/network/src/sync/block_lookups/mod.rs (L686-L688)

`search_parent_of_child` does not create a parent lookup because the parent root is in the failed chain cache. However, we have **already** marked the child as awaiting the parent. This results in an inconsistent state of lookup sync, as there's a lookup awaiting a parent that doesn't exist.

Now we have a lookup (the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`) that is awaiting a parent lookup that doesn't exist: hence stuck.

### Impact

This bug can affect Mainnet as well as PeerDAS devnets.

This bug may stall lookup sync for a few minutes (up to `LOOKUP_MAX_DURATION_STUCK_SECS = 15 min`) until the stuck prune routine deletes it. By that time the root will be cleared from the failed chain cache and sync should succeed. During that time the user will see a lot of `WARN` logs when attempting to add each peer to the inconsistent lookup. We may also sync the block through range sync if we fall behind by more than 2 epochs. We may also create the parent lookup successfully after the failed cache clears and complete the child lookup.

This bug is triggered if:
- We have a lookup that fails and its root is added to the failed chain cache (much more likely to happen in PeerDAS networks)
- We receive a block that builds on a child of the block added to the failed chain cache


  Ensure that we never create (or leave existing) a lookup that references a non-existing parent.

I added `must_use` lints to the functions that create lookups. To fix the specific bug we must recursively drop the child lookup if the parent is not created. So if `search_parent_of_child` returns `false`, we now return `LookupRequestError::Failed` instead of `LookupResult::Pending`.

As a bonus I have added more logging and reason strings to the errors.
2025-06-05 08:53:43 +00:00
Jimmy Chen
357a8ccbb9 Checkpoint sync without the blobs from Fulu (#7549)
Lighthouse currently requires checkpoint sync to be performed against a supernode in a PeerDAS network, as only supernodes can serve blobs.

This PR lifts that requirement, enabling Lighthouse to checkpoint sync from either a fullnode or a supernode (See https://github.com/sigp/lighthouse/issues/6837#issuecomment-2933094923)

Missing data columns for the checkpoint block aren't a big issue, and we should be able to implement backfill easily once we have the logic to backfill data columns.
2025-06-04 00:31:27 +00:00
ethDreamer
ae30480926 Implement EIP-7892 BPO hardforks (#7521)
[EIP-7892: Blob Parameter Only Hardforks](https://eips.ethereum.org/EIPS/eip-7892)

#7467
2025-06-02 06:54:42 +00:00
Jimmy Chen
94a1446ac9 Fix unexpected blob error and duplicate import in fetch blobs (#7541)
Getting this error on a non-PeerDAS network:

```
May 29 13:30:13.484 ERROR Error fetching or processing blobs from EL    error: BlobProcessingError(AvailabilityCheck(Unexpected("empty blobs"))), block_root: 0x98aa3927056d453614fefbc79eb1f9865666d1f119d0e8aa9e6f4d02aa9395d9
```

It appears we're passing an empty `Vec` to the DA checker because all blobs were already seen on gossip and filtered out; this causes an `AvailabilityCheckError::Unexpected("empty blobs")`.

I've added equivalent unit tests for `getBlobsV1` to cover all the scenarios we test in `getBlobsV2`. This would have caught the bug if I had added it earlier. It also caught another bug which could trigger duplicate block import.

Thanks Santito for reporting this! 🙏
2025-06-02 01:51:09 +00:00
Jimmy Chen
4d21846aba Prevent AvailabilityCheckError when there's no new custody columns to import (#7533)
Addresses a regression recently introduced when we started gossip verifying data columns from EL blobs

```
failures:
network_beacon_processor::tests::accept_processed_gossip_data_columns_without_import

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 90 filtered out; finished in 16.60s

stderr ───

thread 'network_beacon_processor::tests::accept_processed_gossip_data_columns_without_import' panicked at beacon_node/network/src/network_beacon_processor/tests.rs:829:10:
should put data columns into availability cache: Unexpected("empty columns")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

https://github.com/sigp/lighthouse/actions/runs/15309278812/job/43082341868?pr=7521

If an empty `Vec` is passed to the DA checker, it causes an unexpected error.

This PR addresses it by not passing an empty `Vec` for processing, and not spawning a task to publish.
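The fix pattern here is a plain early return before calling the checker. A minimal sketch with hypothetical simplified signatures (the real DA checker takes verified column sidecars, not integers):

```rust
#[derive(Debug, PartialEq)]
enum AvailabilityError {
    Unexpected(&'static str),
}

/// Stand-in for the DA checker: it treats an empty input as a logic error.
fn put_columns(columns: &[u64]) -> Result<usize, AvailabilityError> {
    if columns.is_empty() {
        return Err(AvailabilityError::Unexpected("empty columns"));
    }
    Ok(columns.len())
}

/// The fix: the caller returns early instead of handing the checker an
/// empty `Vec` when gossip already consumed every column.
fn import_new_columns(columns: Vec<u64>) -> Result<usize, AvailabilityError> {
    if columns.is_empty() {
        return Ok(0);
    }
    put_columns(&columns)
}

fn main() {
    assert_eq!(import_new_columns(vec![]), Ok(0));
    assert_eq!(import_new_columns(vec![1, 2]), Ok(2));
    assert!(put_columns(&[]).is_err());
}
```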
2025-05-29 02:54:34 +00:00
Akihito Nakano
5cda6a6f9e Mitigate flakiness in test_delayed_rpc_response (#7522)
https://github.com/sigp/lighthouse/issues/7466


  Expanded the margin from 100ms to 500ms.
2025-05-29 01:37:04 +00:00
Mac L
0ddf9a99d6 Remove support for database migrations prior to schema version v22 (#7332)
Remove deprecated database migrations prior to v22 along with v22 migration specific code.
2025-05-28 13:47:21 +00:00
Akihito Nakano
8989ef8fb1 Enable arithmetic lint in rate-limiter (#7025)
https://github.com/sigp/lighthouse/issues/6875


  - Enabled the linter in rate-limiter and fixed errors.
- Changed the type of `Quota::max_tokens` from `u64` to `NonZeroU64` because `max_tokens` cannot be zero.
- Added a test to ensure that a large value for `tokens`, which causes an overflow, is handled properly.
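The type change above can be illustrated with a small sketch (field and method names are hypothetical, not the rate-limiter's actual API): `NonZeroU64` makes "max_tokens cannot be zero" a compile-time guarantee, and saturating/min arithmetic keeps huge inputs from overflowing.

```rust
use std::num::NonZeroU64;

/// Hypothetical simplified quota: the type itself rules out a zero bucket.
struct Quota {
    max_tokens: NonZeroU64,
}

impl Quota {
    /// Tokens charged for a request: capped at `max_tokens`, and safe even
    /// if `requested` is `u64::MAX`.
    fn charge(&self, requested: u64) -> u64 {
        requested.min(self.max_tokens.get())
    }
}

fn main() {
    let quota = Quota { max_tokens: NonZeroU64::new(128).unwrap() };
    assert_eq!(quota.charge(10), 10);
    assert_eq!(quota.charge(u64::MAX), 128); // a huge value saturates at the cap
}
```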
2025-05-27 15:43:22 +00:00
Michael Sproul
7c89b970af Handle attestation validation errors (#7382)
Partly addresses:

- https://github.com/sigp/lighthouse/issues/7379


  Handle attestation validation errors from `get_attesting_indices` to prevent an error log, downscore the peer, and reject the message.
2025-05-27 01:55:17 +00:00
Jimmy Chen
e6ef644db4 Verify getBlobsV2 response and avoid reprocessing imported data columns (#7493)
#7461 and partly #6439.

Desired behaviour after receiving `engine_getBlobs` response:

1. Gossip verify the blobs and proofs, but don't mark them as observed yet. This is because not all blobs are published immediately (due to staggered publishing). If we mark them as observed and not publish them, we could end up blocking the gossip propagation.
2. Blobs are marked as observed _either_ when:
* They are received from gossip and forwarded to the network, or
* They are published by the node.

Current behaviour:
-  We only gossip verify `engine_getBlobsV1` responses, but not `engine_getBlobsV2` responses (PeerDAS).
-  After importing EL blobs AND before they're published, if the same blobs arrive via gossip, they will get re-processed, which may result in a re-import.


  1. Perform gossip verification on data columns computed from the EL `getBlobsV2` response. We currently only do this for `getBlobsV1` to prevent importing blobs with invalid proofs into the `DataAvailabilityChecker`; this should be done on V2 responses too.
2. Add additional gossip verification to make sure we don't re-process a ~~blob~~ or data column that was imported via the EL `getBlobs` but not yet "seen" on the gossip network. If an "unobserved" gossip blob is found in the availability cache, then we know it has passed verification so we can immediately propagate the `ACCEPT` result and forward it to the network, but without re-processing it.

**UPDATE:** I've left blobs out of the second change mentioned above, as the likelihood and impact are very low and we haven't seen it enough, but under PeerDAS this issue is a regular occurrence and we do see the same block getting imported many times.
2025-05-26 19:55:58 +00:00
Jimmy Chen
f01dc556d1 Update engine_getBlobsV2 response type and add getBlobsV2 tests (#7505)
Update `engine_getBlobsV2` response type to `Option<Vec<BlobsAndProofV2>>`. See recent spec change [here](https://github.com/ethereum/execution-apis/pull/630).

Added some tests to cover basic fetch blob scenarios.
2025-05-26 04:33:34 +00:00
Akihito Nakano
a2797d4bbd Fix formatting errors from cargo-sort (#7512)
[cargo-sort is currently failing on CI](https://github.com/sigp/lighthouse/actions/runs/15198128212/job/42746931918?pr=7025), likely due to new checks introduced in version [2.0.0](https://github.com/DevinR528/cargo-sort/releases/tag/v2.0.0).


  Fixed the errors by running cargo-sort with formatting enabled.
2025-05-23 05:25:56 +00:00
ethDreamer
6af8c187e0 Publish EL Info in Metrics (#7052)
Since we now know the EL version, we should publish this to our metrics periodically.
2025-05-22 02:51:30 +00:00
Akihito Nakano
cf0f959855 Improve log readability during rpc_tests (#7180)
It is unclear from the logs during rpc_tests whether the output comes from the sender or the receiver.

```
2025-03-20T11:21:50.038868Z DEBUG rpc_tests: Sending message 2
2025-03-20T11:21:50.041129Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:50.041242Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:51.040837Z DEBUG rpc_tests: Sending message 3
2025-03-20T11:21:51.043635Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:51.043855Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:52.043427Z DEBUG rpc_tests: Sending message 4
2025-03-20T11:21:52.052831Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:52.052953Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:53.045589Z DEBUG rpc_tests: Sending message 5
2025-03-20T11:21:53.052718Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:53.052825Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:54.049157Z DEBUG rpc_tests: Sending message 6
2025-03-20T11:21:54.058072Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:54.058603Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:55.018822Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat
2025-03-20T11:21:55.018953Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat
2025-03-20T11:21:55.027100Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat
2025-03-20T11:21:55.027199Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat
```


  Added `info_span` to both the sender and receiver in each test.

```
2025-03-20T11:20:04.172699Z DEBUG Receiver: rpc_tests: Sending message 2
2025-03-20T11:20:04.179147Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:04.179281Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:05.175300Z DEBUG Receiver: rpc_tests: Sending message 3
2025-03-20T11:20:05.177202Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:05.177292Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:06.176868Z DEBUG Receiver: rpc_tests: Sending message 4
2025-03-20T11:20:06.179379Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:06.179460Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:07.179257Z DEBUG Receiver: rpc_tests: Sending message 5
2025-03-20T11:20:07.181386Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:07.181503Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:08.181428Z DEBUG Receiver: rpc_tests: Sending message 6
2025-03-20T11:20:08.190231Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:08.190358Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:09.151699Z DEBUG Sender:Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat
2025-03-20T11:20:09.151748Z DEBUG Sender:Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat
2025-03-20T11:20:09.160244Z DEBUG Receiver:Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat
2025-03-20T11:20:09.160288Z DEBUG Receiver:Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat
```
2025-05-22 02:51:25 +00:00
Akihito Nakano
537fc5bde8 Revive network-test logs files in CI (#7459)
https://github.com/sigp/lighthouse/issues/7187


  This PR adds a writer that implements `tracing_subscriber::fmt::MakeWriter`, which writes logs to separate files for each test.
2025-05-22 02:51:22 +00:00
Pawan Dhananjay
817f14c349 Send execution_requests in fulu (#7500)
N/A


  Sends execution requests with fulu builder bid.
2025-05-22 02:51:20 +00:00
Akihito Nakano
a8035d7395 Enable stdout logging in rpc_tests (#7506)
Currently `test_delayed_rpc_response` is flaky (possibly specific to Windows?), but I'm not sure why.


  Enabled stdout logging in rpc_tests. Note that in nextest, std output is only displayed when a test fails.
2025-05-22 02:14:48 +00:00
Michael Sproul
2e96e9769b Use slice.is_sorted now that it's stable (#7507)
Use slice.is_sorted which was stabilised in Rust 1.82.0

I thought there would be more places we could use this, but it seems we often want to check strict monotonicity (i.e. sorted + no duplicates)
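The distinction is easy to show: `slice::is_sorted` (stable since Rust 1.82) allows equal neighbours, so strict monotonicity still needs an explicit pairwise check.

```rust
/// Strict monotonicity: sorted AND no duplicates.
fn is_strictly_increasing(xs: &[u64]) -> bool {
    xs.windows(2).all(|w| w[0] < w[1])
}

fn main() {
    let with_dup = [1u64, 2, 2, 3];
    assert!(with_dup.is_sorted());               // duplicates allowed
    assert!(!is_strictly_increasing(&with_dup)); // duplicates rejected
    assert!(is_strictly_increasing(&[1, 2, 3]));
}
```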
2025-05-22 02:14:46 +00:00
chonghe
7e2df6b602 Empty list [] to return all validators balances (#7474)
The endpoint `/eth/v1/beacon/states/head/validator_balances` returns no data when the request body is `[]`. According to the beacon API spec, it should return the balances of all validators:

Reference: https://ethereum.github.io/beacon-APIs/#/Beacon/postStateValidatorBalances
`If the supplied list is empty (i.e. the body is []) or no body is supplied then balances will be returned for all validators.`


  This PR changes so that: `curl -X 'POST' 'http://localhost:5052/eth/v1/beacon/states/head/validator_balances' -d '[]' | jq` returns balances of all validators.
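The spec behaviour ("empty or absent list means all validators") boils down to a filter that special-cases the empty input. A sketch with hypothetical simplified types (real balances are keyed by validator index or pubkey):

```rust
/// Hypothetical sketch: `ids` of `None` (no body) or `Some(&[])` (body `[]`)
/// both mean "return every balance", per the beacon API spec.
fn filter_balances(balances: &[(u64, u64)], ids: Option<&[u64]>) -> Vec<(u64, u64)> {
    match ids {
        None | Some([]) => balances.to_vec(),
        Some(ids) => balances
            .iter()
            .filter(|(index, _)| ids.contains(index))
            .copied()
            .collect(),
    }
}

fn main() {
    let balances = [(0u64, 32_000_000_000u64), (1, 31_500_000_000)];
    assert_eq!(filter_balances(&balances, Some(&[])).len(), 2); // empty list -> all
    assert_eq!(filter_balances(&balances, None).len(), 2);      // no body -> all
    assert_eq!(filter_balances(&balances, Some(&[1])), vec![(1, 31_500_000_000)]);
}
```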
2025-05-20 07:18:29 +00:00
Michael Sproul
805c2dc831 Correct reward denominator in op pool (#5047)
Closes #5016


  The op pool was using the wrong denominator when calculating proposer block rewards! This was mostly inconsequential as our studies of Lighthouse's block profitability already showed that it is very close to optimal. The wrong denominator was leftover from phase0 code, and wasn't properly updated for Altair.
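For context, the Altair relationship this fix restores: the proposer receives `PROPOSER_WEIGHT / (WEIGHT_DENOMINATOR - PROPOSER_WEIGHT)` = 8/56 = 1/7 of the attester rewards, whereas phase0 divided by `PROPOSER_REWARD_QUOTIENT` = 8. The weights below are the real consensus-spec constants; the function is a simplified sketch that ignores the spec's exact numerator/denominator bookkeeping.

```rust
// Altair participation weights from the consensus spec.
const PROPOSER_WEIGHT: u64 = 8;
const WEIGHT_DENOMINATOR: u64 = 64;

/// Simplified Altair proposer reward for including attestations worth
/// `total_attester_reward` Gwei to the attesters: 1/7 of that amount.
fn proposer_reward(total_attester_reward: u64) -> u64 {
    total_attester_reward * PROPOSER_WEIGHT / (WEIGHT_DENOMINATOR - PROPOSER_WEIGHT)
}

fn main() {
    assert_eq!(proposer_reward(56), 8);   // exactly 1/7
    assert_eq!(proposer_reward(112), 16);
}
```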
2025-05-20 01:06:40 +00:00
ethDreamer
7684d1f866 ContextDeserialize and Beacon API Improvements (#7372)
* #7286
* BeaconAPI is not returning a versioned response when it should for some V1 endpoints
* these [strange functions with vX in the name that still accept `endpoint_version` arguments](https://github.com/sigp/lighthouse/blob/stable/beacon_node/http_api/src/produce_block.rs#L192)

This refactor is a prerequisite to get the fulu EF tests running.
2025-05-19 05:05:16 +00:00
Pawan Dhananjay
23ad833747 Change default EngineState to online (#7417)
Resolves https://github.com/sigp/lighthouse/issues/7414


  The health endpoint returns a 503 if the engine state is offline. The default state for the engine is `Offline`. So until the first request to the EL is made and the state is updated, the health endpoint will keep returning 503s.

This PR changes the default state to Online to avoid that. I don't think this causes any issues because in case the EL is actually offline, the first fcu will set the state to offline.

Pending testing on kurtosis.
2025-05-16 19:04:30 +00:00
Eitan Seri-Levi
268809a530 Rust clippy 1.87 lint fixes (#7471)
Fix clippy lints for `rustc` 1.87


  clippy complains about `BeaconChainError` being too large. I went on a bit of a boxing spree because of this. We may instead want to `Box` some of the `BeaconChainError` variants?
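The boxing trade-off clippy is pointing at is easy to demonstrate: a Rust enum is as large as its largest variant, so one bulky variant inflates every `Result` that carries the error type. Boxing moves the payload to the heap and shrinks the enum (the types below are illustrative, not `BeaconChainError` itself):

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum LargeError {
    Big([u8; 256]), // this variant sets the size for the whole enum
    Small(u8),
}

#[allow(dead_code)]
enum BoxedError {
    Big(Box<[u8; 256]>), // payload moved to the heap; variant is pointer-sized
    Small(u8),
}

fn main() {
    assert!(size_of::<LargeError>() > 256);
    assert!(size_of::<BoxedError>() <= 16);
    assert!(size_of::<BoxedError>() < size_of::<LargeError>());
}
```

Boxing individual large variants (rather than the whole error) keeps the common small variants cheap while still shrinking the enum, which is the alternative floated at the end of the message above.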
2025-05-16 05:03:00 +00:00
Odinson
1853d836b7 Added E::slots_per_epoch() to deneb time calculation (#7458)
Closes #7457


  Added `E::slots_per_epoch()`, which ensures epochs are converted to slots when calculating the Deneb time.
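The arithmetic being fixed: a fork is scheduled by *epoch*, so its timestamp needs an epochs-to-slots conversion before multiplying by seconds per slot. A sketch with mainnet-style constants (the function name and signature are hypothetical):

```rust
// Mainnet values; in Lighthouse these come from the spec/EthSpec, e.g.
// `E::slots_per_epoch()`.
const SLOTS_PER_EPOCH: u64 = 32;
const SECONDS_PER_SLOT: u64 = 12;

/// Timestamp at which a fork scheduled for `fork_epoch` activates.
/// The buggy version omitted SLOTS_PER_EPOCH, i.e. treated epochs as slots.
fn fork_timestamp(genesis_time: u64, fork_epoch: u64) -> u64 {
    genesis_time + fork_epoch * SLOTS_PER_EPOCH * SECONDS_PER_SLOT
}

fn main() {
    assert_eq!(fork_timestamp(0, 1), 384);   // one epoch = 32 slots * 12 s
    assert_eq!(fork_timestamp(100, 2), 868);
}
```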
2025-05-15 07:31:31 +00:00
Michael Sproul
c2c7fb87a8 Make DAG construction more permissive (#7460)
Workaround/fix for:

- https://github.com/sigp/lighthouse/issues/7323


  - Remove the `StateSummariesNotContiguousError`. This allows us to continue with DAG construction and pruning, even in the case where the DAG is disjointed. We will treat any disjoint summaries as roots of their own tree, and prune them (as they are not descended from finalized). This should be safe, as canonical summaries should not be disjoint (if they are, then the DB is already corrupt).
2025-05-15 02:15:35 +00:00
Eitan Seri-Levi
807848bc7a Next sync committee branch bug (#7443)
#7441


  Make sure we're correctly caching light client data
2025-05-13 01:13:15 +00:00
SunnysidedJ
593390162f peerdas-devnet-7: update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier (#7399)
Update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier #7377


  As described in https://github.com/ethereum/consensus-specs/pull/4284
2025-05-12 00:20:55 +00:00
Lion - dapplion
a497ec601c Retry custody requests after peer metadata updates (#6975)
Closes https://github.com/sigp/lighthouse/issues/6895

We need sync to retry custody requests when a peer CGC updates. A higher CGC can result in a data column subnet peer count increasing from 0 to 1, allowing requests to happen.


  Add new sync event `SyncMessage::UpdatedPeerCgc`. It's sent by the router when a metadata response updates the known CGC
2025-05-09 08:27:17 +00:00
Jimmy Chen
4b9c16fc71 Add Electra forks to basic sim tests (#7199)
This PR adds transitions to Electra ~~and Fulu~~ fork epochs in the simulator tests.

~~It also covers blob inclusion verification and data column syncing on a full node in Fulu.~~

UPDATE: Remove fulu fork from sim tests due to https://github.com/sigp/lighthouse/pull/7199#issuecomment-2852281176
2025-05-08 08:43:44 +00:00
Jimmy Chen
0f13029c7d Don't publish data columns reconstructed from RPC columns to the gossip network (#7409)
Don't publish data columns reconstructed from RPC columns to the gossip network, as this may result in peer downscoring if we're sending columns from past slots.
2025-05-07 23:24:48 +00:00
Lion - dapplion
beb0ce68bd Make range sync peer loadbalancing PeerDAS-friendly (#6922)
- Re-opens https://github.com/sigp/lighthouse/pull/6864 targeting unstable

Range sync and backfill sync still assume that each batch request is done by a single peer. This assumption breaks with PeerDAS, where we request custody columns to N peers.

Issues with current unstable:

- Peer prioritization counts batch requests per peer. This accounting is now broken: data columns by range requests are not accounted for
- Peer selection for data columns by range ignores the set of peers on a syncing chain, instead drawing from the global pool of peers
- The implementation is very strict when we have no peers to request from. After PeerDAS this case is very common, and we want to be flexible and handle it better than just hard-failing everything.


  - [x] Upstream peer prioritization to the network context, which knows exactly how many active requests each peer has (including columns by range)
- [x] Upstream peer selection to the network context, now `block_components_by_range_request` gets a set of peers to choose from instead of a single peer. If it can't find a peer, it returns the error `RpcRequestSendError::NoPeer`
- [ ] Range sync and backfill sync handle `RpcRequestSendError::NoPeer` explicitly
- [ ] Range sync: leaves the batch in `AwaitingDownload` state and does nothing. **TODO**: we should have some mechanism to fail the chain if it's stale for too long - **EDIT**: Not done in this PR
- [x] Backfill sync: pauses the sync until another peer joins - **EDIT**: Same logic as unstable

### TODOs

- [ ] Add tests :)
- [x] Manually test backfill sync

Note: this touches the mainnet path!
2025-05-07 02:03:07 +00:00
Lion - dapplion
2aa5d5c25e Make sure to log SyncingChain ID (#7359)
While debugging a sync issue from @pawanjay176 I found I'm missing some key info: instead of logging the ID of the SyncingChain we just log "Finalized" (the sync type). This looks like a typo, or something lost in translation when refactoring.

```
Apr 17 12:12:00.707 DEBUG Syncing new finalized chain                   chain: Finalized, component: "range_sync"
```

This log should include more info about the new chain but just logs "Finalized"

```
Apr 17 12:12:00.810 DEBUG New chain added to sync                       peer_id: "16Uiu2HAmHP8QLYQJwZ4cjMUEyRgxzpkJF87qPgNecLTpUdruYbdA", sync_type: Finalized, new_chain: Finalized, component: "range_sync"
```


  - Remove the Display impl and log the ID explicitly for all logs.
- Log more details when creating a new SyncingChain
2025-05-01 19:53:29 +00:00
Jimmy Chen
93ec9df137 Compute proposer shuffling only once in gossip verification (#7304)
When we perform data column gossip verification, we sometimes see multiple proposer shuffling cache misses simultaneously; this results in multiple threads computing the same shuffling and potentially slows down gossip verification.

The proposal here is to use a `OnceCell` for each shuffling key to make sure it's only computed once. I have only implemented this in data column verification as a PoC, but it can also be applied to blob and block verification.

Related issues:
- https://github.com/sigp/lighthouse/issues/4447
- https://github.com/sigp/lighthouse/issues/7203
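The per-key once-cell idea can be sketched with std's `OnceLock` standing in for the crate-provided `OnceCell` (cache shape and names are hypothetical): many threads may race on the same shuffling key, but only one actually computes; the rest block on the cell and reuse the result.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

type Shuffling = Vec<u64>;

/// Fetch the shuffling for `epoch`, computing it at most once even under
/// concurrent callers. `computations` counts how many times the (stand-in)
/// expensive computation actually ran.
fn get_or_compute(
    cache: &Mutex<HashMap<u64, Arc<OnceLock<Shuffling>>>>,
    epoch: u64,
    computations: &Mutex<u64>,
) -> Shuffling {
    // Hold the map lock only long enough to grab (or insert) the per-key cell.
    let cell = cache.lock().unwrap().entry(epoch).or_default().clone();
    // Compute outside the map lock; concurrent callers for the same epoch
    // block here and reuse the single result instead of recomputing.
    cell.get_or_init(|| {
        *computations.lock().unwrap() += 1;
        (0..4).map(|i| epoch * 10 + i).collect() // stand-in for the shuffling
    })
    .clone()
}

fn main() {
    let cache = Mutex::new(HashMap::new());
    let count = Mutex::new(0u64);
    thread::scope(|s| {
        for _ in 0..8 {
            s.spawn(|| {
                get_or_compute(&cache, 7, &count);
            });
        }
    });
    assert_eq!(*count.lock().unwrap(), 1); // computed exactly once
    assert_eq!(get_or_compute(&cache, 7, &count), vec![70, 71, 72, 73]);
}
```

Keeping the expensive work outside the map lock is the key design point: the map lock is held only briefly, so misses on *different* keys still proceed in parallel.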
2025-05-01 01:30:42 +00:00
Eitan Seri-Levi
9779b4ba2c Optimize validate_data_columns (#7326) 2025-04-30 04:36:50 +00:00
Akihito Nakano
1324d3d3c4 Delayed RPC Send Using Tokens (#5923)
closes https://github.com/sigp/lighthouse/issues/5785


  The diagram below shows the differences in how the receiver (responder) behaves before and after this PR. The following sentences will detail the changes.

```mermaid
flowchart TD

subgraph "*** After ***"
Start2([START]) --> AA[Receive request]
AA --> COND1{Is there already an active request <br> with the same protocol?}
COND1 --> |Yes| CC[Send error response]
CC --> End2([END])
%% COND1 --> |No| COND2{Request is too large?}
%% COND2 --> |Yes| CC
COND1 --> |No| DD[Process request]
DD --> EE{Rate limit reached?}
EE --> |Yes| FF[Wait until tokens are regenerated]
FF --> EE
EE --> |No| GG[Send response]
GG --> End2
end

subgraph "*** Before ***"
Start([START]) --> A[Receive request]
A --> B{Rate limit reached <br> or <br> request is too large?}
B -->|Yes| C[Send error response]
C --> End([END])
B -->|No| E[Process request]
E --> F[Send response]
F --> End
end
```

### `Is there already an active request with the same protocol?`

This check is not performed in `Before`. This is taken from the PR in the consensus-spec, which proposes updates regarding rate limiting and response timeout.
https://github.com/ethereum/consensus-specs/pull/3767/files
> The requester MUST NOT make more than two concurrent requests with the same ID.

The PR mentions the requester side. In this PR, I introduced the `ActiveRequestsLimiter` for the `responder` side to restrict more than two requests from running simultaneously on the same protocol per peer. If the limiter disallows a request, the responder sends a rate-limited error and penalizes the requester.



### `Rate limit reached?` and `Wait until tokens are regenerated`

UPDATE: I moved the limiter logic to the behaviour side. https://github.com/sigp/lighthouse/pull/5923#issuecomment-2379535927

~~The rate limiter is shared between the behaviour and the handler.  (`Arc<Mutex<RateLimiter>>>`) The handler checks the rate limit and queues the response if the limit is reached. The behaviour handles pruning.~~

~~I considered not sharing the rate limiter between the behaviour and the handler, and performing all of these either within the behaviour or handler. However, I decided against this for the following reasons:~~

- ~~Regarding performing everything within the behaviour: The behaviour is unable to recognize the response protocol when `RPC::send_response()` is called, especially when the response is `RPCCodedResponse::Error`. Therefore, the behaviour can't rate limit responses based on the response protocol.~~
- ~~Regarding performing everything within the handler: When multiple connections are established with a peer, there could be multiple handlers interacting with that peer. Thus, we cannot enforce rate limiting per peer solely within the handler. (Any ideas? 🤔 )~~
2025-04-24 03:46:16 +00:00
chonghe
c13e069c9c Revise logging when queue is full (#7324) 2025-04-22 22:46:30 +00:00
Michael Sproul
e61e92b926 Merge remote-tracking branch 'origin/stable' into unstable 2025-04-22 18:55:06 +10:00
Michael Sproul
54f7bc5b2c Release v7.0.0 (#7288)
New v7.0.0 release for Electra on mainnet.
2025-04-22 09:21:03 +10:00