Commit Graph

1047 Commits

Author SHA1 Message Date
dapplion
ae0ef8f929 Fix finalized_sync_permanent_custody_peer_failure 2025-06-04 23:02:56 -06:00
Jimmy Chen
1b72871ad1 Merge branch 'peerdas-devnet-7' into peerdas-rangesync 2025-06-03 18:20:54 +10:00
Jimmy Chen
42ef88bdb4 Merge branch 'unstable' into peerdas-devnet-7
# Conflicts:
#	beacon_node/beacon_chain/src/data_availability_checker.rs
2025-06-03 18:19:07 +10:00
ethDreamer
ae30480926 Implement EIP-7892 BPO hardforks (#7521)
[EIP-7892: Blob Parameter Only Hardforks](https://eips.ethereum.org/EIPS/eip-7892)

#7467
2025-06-02 06:54:42 +00:00
dapplion
c6b39e9e10 Merge remote-tracking branch 'sigp/peerdas-devnet-7' into peerdas-rangesync 2025-05-27 16:20:34 -05:00
dapplion
02d97377a5 Address review comments 2025-05-27 16:07:45 -05:00
dapplion
144b83e625 Remove BatchStateSummary 2025-05-27 15:52:14 -05:00
dapplion
0ef95dd7f8 Remove stale TODO 2025-05-27 15:33:39 -05:00
dapplion
fc3922f854 Resolve more TODOs 2025-05-27 15:32:29 -05:00
dapplion
52722b7b2e Resolve TODO(das) 2025-05-27 14:28:52 -05:00
dapplion
86ad87eced Lint tests 2025-05-27 12:21:42 -05:00
dapplion
8f74adc66f Use DataColumnSidecarList 2025-05-27 00:43:38 -05:00
dapplion
34b37b97ed Remove unused module 2025-05-27 00:37:12 -05:00
Michael Sproul
7c89b970af Handle attestation validation errors (#7382)
Partly addresses:

- https://github.com/sigp/lighthouse/issues/7379


  Handle attestation validation errors from `get_attesting_indices` to prevent an error log, downscore the peer, and reject the message.
2025-05-27 01:55:17 +00:00
dapplion
01329ab230 Improve RangeBlockComponent type 2025-05-26 19:07:15 -05:00
dapplion
c8a0c9e379 Remove CustodyByRoot and CustodyByRange types 2025-05-26 19:04:50 -05:00
dapplion
7d0fb93274 Reduce conversions 2025-05-26 18:49:45 -05:00
dapplion
b383f7af53 More comments 2025-05-26 18:37:20 -05:00
Jimmy Chen
e6ef644db4 Verify getBlobsV2 response and avoid reprocessing imported data columns (#7493)
#7461 and partly #6439.

Desired behaviour after receiving `engine_getBlobs` response:

1. Gossip verify the blobs and proofs, but don't mark them as observed yet. This is because not all blobs are published immediately (due to staggered publishing). If we mark them as observed and not publish them, we could end up blocking the gossip propagation.
2. Blobs are marked as observed _either_ when:
* They are received from gossip and forwarded to the network .
* They are published by the node.

Current behaviour:
-  We only gossip verify `engine_getBlobsV1` responses, but not `engine_getBlobsV2` responses (PeerDAS).
-  After importing EL blobs AND before they're published, if the same blobs arrive via gossip, they will get re-processed, which may result in a re-import.


  1. Perform gossip verification on data columns computed from EL `getBlobsV2` response. We currently only do this for `getBlobsV1` to prevent importing blobs with invalid proofs into the `DataAvailabilityChecker`, this should be done on V2 responses too.
2. Add additional gossip verification to make sure we don't re-process a ~~blob~~ or data column that was imported via the EL `getBlobs` but not yet "seen" on the gossip network. If an "unobserved" gossip blob is found in the availability cache, then we know it has passed verification so we can immediately propagate the `ACCEPT` result and forward it to the network, but without re-processing it.

**UPDATE:** I've left blobs out for the second change mentioned above, as the likelihood and impact is very slow and we haven't seen it enough, but under PeerDAS this issue is a regular occurrence and we do see the same block getting imported many times.
2025-05-26 19:55:58 +00:00
Jimmy Chen
a85d863fb6 Merge branch 'unstable' into peerdas-devnet-7 2025-05-26 14:42:18 +10:00
Akihito Nakano
a2797d4bbd Fix formatting errors from cargo-sort (#7512)
[cargo-sort is currently failing on CI](https://github.com/sigp/lighthouse/actions/runs/15198128212/job/42746931918?pr=7025), likely due to new checks introduced in version [2.0.0](https://github.com/DevinR528/cargo-sort/releases/tag/v2.0.0).


  Fixed the errors by running cargo-sort with formatting enabled.
2025-05-23 05:25:56 +00:00
dapplion
801659d4ae Resolve some TODOs 2025-05-22 01:06:57 -05:00
dapplion
4fb2ae658a Implement reliable range sync for PeerDAS 2025-05-22 00:03:25 -05:00
Akihito Nakano
537fc5bde8 Revive network-test logs files in CI (#7459)
https://github.com/sigp/lighthouse/issues/7187


  This PR adds a writer that implements `tracing_subscriber::fmt::MakeWriter`, which writes logs to separate files for each test.
2025-05-22 02:51:22 +00:00
Lion - dapplion
b014675b7a Fix PeerDAS sync scoring (#7352)
* Remove request tracking inside syncing chains

* Prioritize by range peers in network context

* Prioritize custody peers for columns by range

* Explicit error handling of the no peers error case

* Remove good_peers_on_sampling_subnets

* Count AwaitingDownload towards the buffer limit

* Retry syncing chains in AwaitingDownload state

* Use same peer priorization for lookups

* Review PR

* Address TODOs

* Revert changes to peer erroring in range sync

* Revert metrics changes

* Update comment

* Pass peers_to_deprioritize to select_columns_by_range_peers_to_request

* more idiomatic

* Idiomatic while

* Add note about infinite loop

* Use while let

* Fix wrong custody column count for lookup blocks

* Remove impl

* Remove stale comment

* Fix build errors.

* Or default

* Review PR

* BatchPeerGroup

* Match block and blob signatures

* Explicit match statement to BlockError in range sync

* Remove todo in BatchPeerGroup

* Remove participating peers from backfill sync

* Remove MissingAllCustodyColumns error

* Merge fixes

* Clean up PR

* Consistent naming of batch_peers

* Address multiple review comments

* Better errors for das

* Penalize column peers once

* Restore fn

* Fix error enum

* Removed MismatchedPublicKeyLen

* Revert testing changes

* Change BlockAndCustodyColumns enum variant

* Revert type change in import_historical_block_batch

* Drop pubkey cache

* Don't collect Vec

* Classify errors

* Remove ReconstructColumnsError

* More detailed UnrequestedSlot error

* Lint test

* Fix slot conversion

* Reduce penalty for missing blobs

* Revert changes in peer selection

* Lint tests

* Rename block matching functions

* Reorder block matching in historical blocks

* Fix order of block matching

* Add store tests

* Filter blockchain in assert_correct_historical_block_chain

* Also filter before KZG checks

* Lint tests

* Fix lint

* Fix fulu err assertion

* Check point is not at infinity

* Fix ws sync test

* Revert dropping filter fn

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
2025-05-21 23:06:42 +10:00
Eitan Seri-Levi
268809a530 Rust clippy 1.87 lint fixes (#7471)
Fix clippy lints for `rustc` 1.87


  clippy complains about `BeaconChainError` being too large. I went on a bit of a boxing spree because of this. We may instead want to `Box` some of the `BeaconChainError` variants?
2025-05-16 05:03:00 +00:00
SunnysidedJ
593390162f peerdas-devnet-7: update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier (#7399)
Update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier #7377


  As described in https://github.com/ethereum/consensus-specs/pull/4284
2025-05-12 00:20:55 +00:00
Lion - dapplion
a497ec601c Retry custody requests after peer metadata updates (#6975)
Closes https://github.com/sigp/lighthouse/issues/6895

We need sync to retry custody requests when a peer CGC updates. A higher CGC can result in a data column subnet peer count increasing from 0 to 1, allowing requests to happen.


  Add new sync event `SyncMessage::UpdatedPeerCgc`. It's sent by the router when a metadata response updates the known CGC
2025-05-09 08:27:17 +00:00
Jimmy Chen
0f13029c7d Don't publish data columns reconstructed from RPC columns to the gossip network (#7409)
Don't publish data columns reconstructed from RPC columns to the gossip network, as this may result in peer downscoring if we're sending columns from past slots.
2025-05-07 23:24:48 +00:00
Lion - dapplion
beb0ce68bd Make range sync peer loadbalancing PeerDAS-friendly (#6922)
- Re-opens https://github.com/sigp/lighthouse/pull/6864 targeting unstable

Range sync and backfill sync still assume that each batch request is done by a single peer. This assumption breaks with PeerDAS, where we request custody columns to N peers.

Issues with current unstable:

- Peer prioritization counts batch requests per peer. This accounting is broken now, data columns by range request are not accounted
- Peer selection for data columns by range ignores the set of peers on a syncing chain, instead draws from the global pool of peers
- The implementation is very strict when we have no peers to request from. After PeerDAS this case is very common and we want to be flexible or easy and handle that case better than just hard failing everything.


  - [x] Upstream peer prioritization to the network context, it knows exactly how many active requests a peer (including columns by range)
- [x] Upstream peer selection to the network context, now `block_components_by_range_request` gets a set of peers to choose from instead of a single peer. If it can't find a peer, it returns the error `RpcRequestSendError::NoPeer`
- [ ] Range sync and backfill sync handle `RpcRequestSendError::NoPeer` explicitly
- [ ] Range sync: leaves the batch in `AwaitingDownload` state and does nothing. **TODO**: we should have some mechanism to fail the chain if it's stale for too long - **EDIT**: Not done in this PR
- [x] Backfill sync: pauses the sync until another peer joins - **EDIT**: Same logic as unstable

### TODOs

- [ ] Add tests :)
- [x] Manually test backfill sync

Note: this touches the mainnet path!
2025-05-07 02:03:07 +00:00
Lion - dapplion
2aa5d5c25e Make sure to log SyncingChain ID (#7359)
Debugging an sync issue from @pawanjay176 I'm missing some key info where instead of logging the ID of the SyncingChain we just log "Finalized" (the sync type). This looks like some typo or something was lost in translation when refactoring things.

```
Apr 17 12:12:00.707 DEBUG Syncing new finalized chain                   chain: Finalized, component: "range_sync"
```

This log should include more info about the new chain but just logs "Finalized"

```
Apr 17 12:12:00.810 DEBUG New chain added to sync                       peer_id: "16Uiu2HAmHP8QLYQJwZ4cjMUEyRgxzpkJF87qPgNecLTpUdruYbdA", sync_type: Finalized, new_chain: Finalized, component: "range_sync"
```


  - Remove the Display impl and log the ID explicitly for all logs.
- Log more details when creating a new SyncingChain
2025-05-01 19:53:29 +00:00
Jimmy Chen
476f3a593c Add MAX_BLOBS_PER_BLOCK_FULU config (#7161)
Add `MAX_BLOBS_PER_BLOCK_FULU` config.
2025-04-15 00:20:46 +00:00
Lion - dapplion
be68dd24d0 Fix wrong custody column count for lookup blocks (#7281)
Fixes
- https://github.com/sigp/lighthouse/issues/7278


  Don't assume 0 columns for `RpcBlockInner::Block`
2025-04-11 22:00:57 +00:00
Mac L
39eb8145f8 Merge branch 'release-v7.0.0' into unstable 2025-04-11 21:32:24 +10:00
SunnysidedJ
d96b73152e Fix for #6296: Deterministic RNG in peer DAS publish block tests (#7192)
#6296: Deterministic RNG in peer DAS publish block tests


  Made test functions to call publish-block APIs with true for the deterministic RNG boolean parameter while production code with false. This will deterministically shuffle columns for unit tests under broadcast_validation_tests.rs.
2025-04-09 15:35:15 +00:00
Jimmy Chen
759b0612b3 Offloading KZG Proof Computation from the beacon node (#7117)
Addresses #7108

- Add EL integration for `getPayloadV5` and `getBlobsV2`
- Offload proof computation and use proofs from EL RPC APIs
2025-04-08 07:37:16 +00:00
Jimmy Chen
e924264e17 Fullnodes to publish data columns from EL getBlobs (#7258)
Previously only supernode contributes to data column publishing in Lighthouse.

Recently we've [updated the spec](https://github.com/ethereum/consensus-specs/pull/4183) to have full nodes publishing data columns as well, to ensure all nodes contributes to propagation.

This also prevents already imported data columns from being imported again (because we don't "observe" them), and ensures columns that are observed in the [gossip seen cache](d60c24ef1c/beacon_node/beacon_chain/src/data_column_verification.rs (L492)) are forwarded to its peers, rather than being ignored.
2025-04-08 03:20:31 +00:00
Lion - dapplion
d511ca0494 Compute roots for unfinalized by_range requests with fork-choice (#7098)
Includes PRs

- https://github.com/sigp/lighthouse/pull/7058
- https://github.com/sigp/lighthouse/pull/7066

Cleaner for the `release-v7.0.0` branch
2025-04-07 03:16:41 +00:00
Jimmy Chen
7cc64cab83 Add missing error log and remove redundant id field from lookup logs (#6990)
Partially #6989.

This PR adds the missing error log when a batch fails due to issues with converting the response into `RpcBlock`. See the above linked issue for more details.

Adding this log reveals that we're completing range requests with missing columns, hence causing the batch to fail. It looks like we've hit the case where we've received enough stream terminations, but not all columns are returned.

```
Feb 12 06:12:16.558 DEBG Failed to convert range block components into RpcBlock, error: No column for block 0xc5b6c7fa02f5ef603d45819c08c6519f1dba661fd5d44a2fc849d3e7028b6007 index 18, id: 3456/RangeSync/116/3432, service: sync, module: network::sync::network_context:488
```

I've also removed some redundant `id` logging, as the `id` debug representation is difficult to read, and is now being logged as part of `req_id` in a more succinct format (relevant PR: #6914)
2025-04-04 09:01:42 +00:00
Mac L
0e6da0fcaf Merge branch 'release-v7.0.0' into v7-backmerge 2025-04-04 13:32:58 +11:00
Mac L
82d1674455 Rust 1.86.0 lints (#7254)
Implement lints for the new Rust compiler version 1.86.0.
2025-04-04 02:30:22 +00:00
Age Manning
d6cd049a45 RPC RequestId Cleanup (#7238)
I've been working at updating another library to latest Lighthouse and got very confused with RPC request Ids.

There were types that had fields called `request_id` and `id`. And interchangeably could have types `PeerRequestId`, `rpc::RequestId`, `AppRequestId`, `api_types::RequestId` or even `Request.id`.

I couldn't keep track of which Id was linked to what and what each type meant.

So this PR mainly does a few things:
- Changes the field naming to match the actual type. So any field that has an  `AppRequestId` will be named `app_request_id` rather than `id` or `request_id` for example.
- I simplified the types. I removed the two different `RequestId` types (one in Lighthouse_network the other in the rpc) and grouped them into one. It has one downside tho. I had to add a few unreachable lines of code in the beacon processor, which the extra type would prevent, but I feel like it might be worth it. Happy to add an extra type to avoid those few lines.
- I also removed the concept of `PeerRequestId` which sometimes went alongside a `request_id`. There were times were had a `PeerRequest` and a `Request` being returned, both of which contain a `RequestId` so we had redundant information. I've simplified the logic by removing `PeerRequestId` and made a `ResponseId`. I think if you look at the code changes, it simplifies things a bit and removes the redundant extra info.

I think with this PR things are a little bit easier to reasonable about what is going on with all these RPC Ids.

NOTE: I did this with the help of AI, so probably should be checked
2025-04-03 10:10:15 +00:00
Jimmy Chen
80626e58d2 Attempt to fix flaky network tests (#7244) 2025-04-03 04:01:34 +00:00
Michael Sproul
bde0f1ef0b Merge remote-tracking branch 'origin/release-v7.0.0' into unstable 2025-03-29 13:01:58 +11:00
Pawan Dhananjay
54aef2d043 Admin add/remove peer (#7198)
N/A


  Adds endpoints to add and remove trusted peers from the http api. The added peers are trusted peers so they won't be disconnected for bad scores. We try to maintain a connection to the peer in case they disconnect from us by trying to dial it every heartbeat.
2025-03-28 12:59:09 +00:00
Lion - dapplion
6f31d44343 Remove CGC from data_availability checker (#7033)
- Part of https://github.com/sigp/lighthouse/issues/6767

Validator custody makes the CGC and set of sampling columns dynamic. Right now this information is stored twice:
- in the data availability checker
- in the network globals

If that state becomes dynamic we must make sure it is in sync updating it twice, or guarding it behind a mutex. However, I noted that we don't really have to keep the CGC inside the data availability checker. All consumers can actually read it from the network globals, and we can update `make_available` to read the expected count of data columns from the block.
2025-03-26 05:19:51 +00:00
Eitan Seri-Levi
2f37bf4de5 Fix more merge conflicts between unstable and release-v7.0.0 2025-03-24 09:13:28 +11:00
Eitan Seri-Levi
cbf1c04a14 resolve merge conflicts between untstable and release-v7.0.0 2025-03-23 11:09:02 -06:00
Lion - dapplion
ca237652f1 Track request IDs in RangeBlockComponentsRequest (#6998)
Part of
- https://github.com/sigp/lighthouse/issues/6258


  `RangeBlockComponentsRequest` handles a set of by_range requests. It's quite lose on these requests, not tracking them by ID. We want to implement individual request retries, so we must make `RangeBlockComponentsRequest` aware of its requests IDs. We don't want the result of a prior by_range request to affect the state of a future retry. Lookup sync uses this mechanism.

Now `RangeBlockComponentsRequest` tracks:

```rust
pub struct RangeBlockComponentsRequest<E: EthSpec> {
blocks_request: ByRangeRequest<BlocksByRangeRequestId, Vec<Arc<SignedBeaconBlock<E>>>>,
block_data_request: RangeBlockDataRequest<E>,
}

enum RangeBlockDataRequest<E: EthSpec> {
NoData,
Blobs(ByRangeRequest<BlobsByRangeRequestId, Vec<Arc<BlobSidecar<E>>>>),
DataColumns {
requests: HashMap<
DataColumnsByRangeRequestId,
ByRangeRequest<DataColumnsByRangeRequestId, DataColumnSidecarList<E>>,
>,
expected_custody_columns: Vec<ColumnIndex>,
},
}

enum ByRangeRequest<I: PartialEq + std::fmt::Display, T> {
Active(I),
Complete(T),
}
```

I have merged `is_finished` and `Into_responses` into the same function. Otherwise, we need to duplicate the logic to figure out if the requests are done.
2025-03-21 04:07:30 +00:00
Eitan Seri-Levi
27aabe8159 Pseudo finalization endpoint (#7103)
This is a backport of:
-  https://github.com/sigp/lighthouse/pull/7059
- https://github.com/sigp/lighthouse/pull/7071

For:
- https://github.com/sigp/lighthouse/issues/7039


  Introduce a new lighthouse endpoint that allows a user to force a pseudo finalization. This migrates data to the freezer db and prunes sidechains which may help reduce disk space issues on non finalized networks like Holesky

We also ban peers that send us blocks that conflict with the manually finalized checkpoint.

There were some CI fixes in https://github.com/sigp/lighthouse/pull/7071 that I tried including here

Co-authored with: @jimmygchen  @pawanjay176 @michaelsproul
2025-03-18 05:21:05 +00:00