Commit Graph

3434 Commits

Author SHA1 Message Date
dapplion
c6b39e9e10 Merge remote-tracking branch 'sigp/peerdas-devnet-7' into peerdas-rangesync 2025-05-27 16:20:34 -05:00
dapplion
02d97377a5 Address review comments 2025-05-27 16:07:45 -05:00
dapplion
144b83e625 Remove BatchStateSummary 2025-05-27 15:52:14 -05:00
dapplion
0ef95dd7f8 Remove stale TODO 2025-05-27 15:33:39 -05:00
dapplion
fc3922f854 Resolve more TODOs 2025-05-27 15:32:29 -05:00
dapplion
52722b7b2e Resolve TODO(das) 2025-05-27 14:28:52 -05:00
dapplion
86ad87eced Lint tests 2025-05-27 12:21:42 -05:00
dapplion
8f74adc66f Use DataColumnSidecarList 2025-05-27 00:43:38 -05:00
dapplion
34b37b97ed Remove unused module 2025-05-27 00:37:12 -05:00
dapplion
01329ab230 Improve RangeBlockComponent type 2025-05-26 19:07:15 -05:00
dapplion
c8a0c9e379 Remove CustodyByRoot and CustodyByRange types 2025-05-26 19:04:50 -05:00
dapplion
7d0fb93274 Reduce conversions 2025-05-26 18:49:45 -05:00
dapplion
b383f7af53 More comments 2025-05-26 18:37:20 -05:00
Jimmy Chen
a85d863fb6 Merge branch 'unstable' into peerdas-devnet-7 2025-05-26 14:42:18 +10:00
Jimmy Chen
f01dc556d1 Update engine_getBlobsV2 response type and add getBlobsV2 tests (#7505)
Update `engine_getBlobsV2` response type to `Option<Vec<BlobsAndProofV2>>`. See recent spec change [here](https://github.com/ethereum/execution-apis/pull/630).

Added some tests to cover basic fetch blob scenarios.
2025-05-26 04:33:34 +00:00
Akihito Nakano
a2797d4bbd Fix formatting errors from cargo-sort (#7512)
[cargo-sort is currently failing on CI](https://github.com/sigp/lighthouse/actions/runs/15198128212/job/42746931918?pr=7025), likely due to new checks introduced in version [2.0.0](https://github.com/DevinR528/cargo-sort/releases/tag/v2.0.0).


  Fixed the errors by running cargo-sort with formatting enabled.
2025-05-23 05:25:56 +00:00
dapplion
801659d4ae Resolve some TODOs 2025-05-22 01:06:57 -05:00
dapplion
4fb2ae658a Implement reliable range sync for PeerDAS 2025-05-22 00:03:25 -05:00
ethDreamer
6af8c187e0 Publish EL Info in Metrics (#7052)
Since we now know the EL version, we should publish this to our metrics periodically.
2025-05-22 02:51:30 +00:00
Akihito Nakano
cf0f959855 Improve log readability during rpc_tests (#7180)
It is unclear from the logs during rpc_tests whether the output comes from the sender or the receiver.

```
2025-03-20T11:21:50.038868Z DEBUG rpc_tests: Sending message 2
2025-03-20T11:21:50.041129Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:50.041242Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:51.040837Z DEBUG rpc_tests: Sending message 3
2025-03-20T11:21:51.043635Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:51.043855Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:52.043427Z DEBUG rpc_tests: Sending message 4
2025-03-20T11:21:52.052831Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:52.052953Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:53.045589Z DEBUG rpc_tests: Sending message 5
2025-03-20T11:21:53.052718Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:53.052825Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:54.049157Z DEBUG rpc_tests: Sending message 6
2025-03-20T11:21:54.058072Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:54.058603Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:55.018822Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat
2025-03-20T11:21:55.018953Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat
2025-03-20T11:21:55.027100Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat
2025-03-20T11:21:55.027199Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat
```


  Added `info_span` to both the sender and receiver in each test.

```
2025-03-20T11:20:04.172699Z DEBUG Receiver: rpc_tests: Sending message 2
2025-03-20T11:20:04.179147Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:04.179281Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:05.175300Z DEBUG Receiver: rpc_tests: Sending message 3
2025-03-20T11:20:05.177202Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:05.177292Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:06.176868Z DEBUG Receiver: rpc_tests: Sending message 4
2025-03-20T11:20:06.179379Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:06.179460Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:07.179257Z DEBUG Receiver: rpc_tests: Sending message 5
2025-03-20T11:20:07.181386Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:07.181503Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:08.181428Z DEBUG Receiver: rpc_tests: Sending message 6
2025-03-20T11:20:08.190231Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:08.190358Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:09.151699Z DEBUG Sender:Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat
2025-03-20T11:20:09.151748Z DEBUG Sender:Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat
2025-03-20T11:20:09.160244Z DEBUG Receiver:Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat
2025-03-20T11:20:09.160288Z DEBUG Receiver:Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat
```
2025-05-22 02:51:25 +00:00
Akihito Nakano
537fc5bde8 Revive network-test logs files in CI (#7459)
https://github.com/sigp/lighthouse/issues/7187


  This PR adds a writer that implements `tracing_subscriber::fmt::MakeWriter`, which writes logs to separate files for each test.
2025-05-22 02:51:22 +00:00
Pawan Dhananjay
817f14c349 Send execution_requests in fulu (#7500)
N/A


  Sends execution requests with fulu builder bid.
2025-05-22 02:51:20 +00:00
Akihito Nakano
a8035d7395 Enable stdout logging in rpc_tests (#7506)
Currently `test_delayed_rpc_response` is flaky (possibly specific to Windows?), but I'm not sure why.


  Enabled stdout logging in rpc_tests. Note that in nextest, std output is only displayed when a test fails.
2025-05-22 02:14:48 +00:00
Michael Sproul
2e96e9769b Use slice.is_sorted now that it's stable (#7507)
Use slice.is_sorted which was stabilised in Rust 1.82.0

I thought there would be more places we could use this, but it seems we often want to check strict monotonicity (i.e. sorted + no duplicates)
2025-05-22 02:14:46 +00:00
Lion - dapplion
b014675b7a Fix PeerDAS sync scoring (#7352)
* Remove request tracking inside syncing chains

* Prioritize by range peers in network context

* Prioritize custody peers for columns by range

* Explicit error handling of the no peers error case

* Remove good_peers_on_sampling_subnets

* Count AwaitingDownload towards the buffer limit

* Retry syncing chains in AwaitingDownload state

* Use same peer priorization for lookups

* Review PR

* Address TODOs

* Revert changes to peer erroring in range sync

* Revert metrics changes

* Update comment

* Pass peers_to_deprioritize to select_columns_by_range_peers_to_request

* more idiomatic

* Idiomatic while

* Add note about infinite loop

* Use while let

* Fix wrong custody column count for lookup blocks

* Remove impl

* Remove stale comment

* Fix build errors.

* Or default

* Review PR

* BatchPeerGroup

* Match block and blob signatures

* Explicit match statement to BlockError in range sync

* Remove todo in BatchPeerGroup

* Remove participating peers from backfill sync

* Remove MissingAllCustodyColumns error

* Merge fixes

* Clean up PR

* Consistent naming of batch_peers

* Address multiple review comments

* Better errors for das

* Penalize column peers once

* Restore fn

* Fix error enum

* Removed MismatchedPublicKeyLen

* Revert testing changes

* Change BlockAndCustodyColumns enum variant

* Revert type change in import_historical_block_batch

* Drop pubkey cache

* Don't collect Vec

* Classify errors

* Remove ReconstructColumnsError

* More detailed UnrequestedSlot error

* Lint test

* Fix slot conversion

* Reduce penalty for missing blobs

* Revert changes in peer selection

* Lint tests

* Rename block matching functions

* Reorder block matching in historical blocks

* Fix order of block matching

* Add store tests

* Filter blockchain in assert_correct_historical_block_chain

* Also filter before KZG checks

* Lint tests

* Fix lint

* Fix fulu err assertion

* Check point is not at infinity

* Fix ws sync test

* Revert dropping filter fn

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
2025-05-21 23:06:42 +10:00
chonghe
7e2df6b602 Empty list [] to return all validators balances (#7474)
The endpoint `/eth/v1/beacon/states/head/validator_balances` returns an empty data when the data field is `[]`. According to the beacon API spec, it should return the balances of all validators:

Reference: https://ethereum.github.io/beacon-APIs/#/Beacon/postStateValidatorBalances
`If the supplied list is empty (i.e. the body is []) or no body is supplied then balances will be returned for all validators.`


  This PR changes so that: `curl -X 'POST' 'http://localhost:5052/eth/v1/beacon/states/head/validator_balances' -d '[]' | jq` returns balances of all validators.
2025-05-20 07:18:29 +00:00
Michael Sproul
805c2dc831 Correct reward denominator in op pool (#5047)
Closes #5016


  The op pool was using the wrong denominator when calculating proposer block rewards! This was mostly inconsequential as our studies of Lighthouse's block profitability already showed that it is very close to optimal. The wrong denominator was leftover from phase0 code, and wasn't properly updated for Altair.
2025-05-20 01:06:40 +00:00
ethDreamer
7684d1f866 ContextDeserialize and Beacon API Improvements (#7372)
* #7286
* BeaconAPI is not returning a versioned response when it should for some V1 endpoints
* these [strange functions with vX in the name that still accept `endpoint_version` arguments](https://github.com/sigp/lighthouse/blob/stable/beacon_node/http_api/src/produce_block.rs#L192)

This refactor is a prerequisite to get the fulu EF tests running.
2025-05-19 05:05:16 +00:00
Pawan Dhananjay
23ad833747 Change default EngineState to online (#7417)
Resolves https://github.com/sigp/lighthouse/issues/7414


  The health endpoint returns a 503 if the engine state is offline. The default state for the engine is `Offline`. So until the first request to the EL is made and the state is updated, the health endpoint will keep returning 503s.

This PR changes the default state to Online to avoid that. I don't think this causes any issues because in case the EL is actually offline, the first fcu will set the state to offline.

Pending testing on kurtosis.
2025-05-16 19:04:30 +00:00
Eitan Seri-Levi
268809a530 Rust clippy 1.87 lint fixes (#7471)
Fix clippy lints for `rustc` 1.87


  clippy complains about `BeaconChainError` being too large. I went on a bit of a boxing spree because of this. We may instead want to `Box` some of the `BeaconChainError` variants?
2025-05-16 05:03:00 +00:00
Odinson
1853d836b7 Added E::slots_per_epoch() to deneb time calculation (#7458)
Which issue # does this PR address?

Closes #7457


  Added `E::slots_per_epoch()` and now it ensures conversion from epochs to slots while calculating deneb time
2025-05-15 07:31:31 +00:00
Michael Sproul
c2c7fb87a8 Make DAG construction more permissive (#7460)
Workaround/fix for:

- https://github.com/sigp/lighthouse/issues/7323


  - Remove the `StateSummariesNotContiguousError`. This allows us to continue with DAG construction and pruning, even in the case where the DAG is disjointed. We will treat any disjoint summaries as roots of their own tree, and prune them (as they are not descended from finalized). This should be safe, as canonical summaries should not be disjoint (if they are, then the DB is already corrupt).
2025-05-15 02:15:35 +00:00
Eitan Seri-Levi
807848bc7a Next sync committee branch bug (#7443)
#7441


  Make sure we're correctly caching light client data
2025-05-13 01:13:15 +00:00
SunnysidedJ
593390162f peerdas-devnet-7: update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier (#7399)
Update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier #7377


  As described in https://github.com/ethereum/consensus-specs/pull/4284
2025-05-12 00:20:55 +00:00
Lion - dapplion
a497ec601c Retry custody requests after peer metadata updates (#6975)
Closes https://github.com/sigp/lighthouse/issues/6895

We need sync to retry custody requests when a peer CGC updates. A higher CGC can result in a data column subnet peer count increasing from 0 to 1, allowing requests to happen.


  Add new sync event `SyncMessage::UpdatedPeerCgc`. It's sent by the router when a metadata response updates the known CGC
2025-05-09 08:27:17 +00:00
Jimmy Chen
4b9c16fc71 Add Electra forks to basic sim tests (#7199)
This PR adds transitions to Electra ~~and Fulu~~ fork epochs in the simulator tests.

~~It also covers blob inclusion verification and data column syncing on a full node in Fulu.~~

UPDATE: Remove fulu fork from sim tests due to https://github.com/sigp/lighthouse/pull/7199#issuecomment-2852281176
2025-05-08 08:43:44 +00:00
Jimmy Chen
0f13029c7d Don't publish data columns reconstructed from RPC columns to the gossip network (#7409)
Don't publish data columns reconstructed from RPC columns to the gossip network, as this may result in peer downscoring if we're sending columns from past slots.
2025-05-07 23:24:48 +00:00
Lion - dapplion
beb0ce68bd Make range sync peer loadbalancing PeerDAS-friendly (#6922)
- Re-opens https://github.com/sigp/lighthouse/pull/6864 targeting unstable

Range sync and backfill sync still assume that each batch request is done by a single peer. This assumption breaks with PeerDAS, where we request custody columns to N peers.

Issues with current unstable:

- Peer prioritization counts batch requests per peer. This accounting is broken now, data columns by range request are not accounted
- Peer selection for data columns by range ignores the set of peers on a syncing chain, instead draws from the global pool of peers
- The implementation is very strict when we have no peers to request from. After PeerDAS this case is very common and we want to be flexible or easy and handle that case better than just hard failing everything.


  - [x] Upstream peer prioritization to the network context, it knows exactly how many active requests a peer (including columns by range)
- [x] Upstream peer selection to the network context, now `block_components_by_range_request` gets a set of peers to choose from instead of a single peer. If it can't find a peer, it returns the error `RpcRequestSendError::NoPeer`
- [ ] Range sync and backfill sync handle `RpcRequestSendError::NoPeer` explicitly
- [ ] Range sync: leaves the batch in `AwaitingDownload` state and does nothing. **TODO**: we should have some mechanism to fail the chain if it's stale for too long - **EDIT**: Not done in this PR
- [x] Backfill sync: pauses the sync until another peer joins - **EDIT**: Same logic as unstable

### TODOs

- [ ] Add tests :)
- [x] Manually test backfill sync

Note: this touches the mainnet path!
2025-05-07 02:03:07 +00:00
Lion - dapplion
2aa5d5c25e Make sure to log SyncingChain ID (#7359)
Debugging an sync issue from @pawanjay176 I'm missing some key info where instead of logging the ID of the SyncingChain we just log "Finalized" (the sync type). This looks like some typo or something was lost in translation when refactoring things.

```
Apr 17 12:12:00.707 DEBUG Syncing new finalized chain                   chain: Finalized, component: "range_sync"
```

This log should include more info about the new chain but just logs "Finalized"

```
Apr 17 12:12:00.810 DEBUG New chain added to sync                       peer_id: "16Uiu2HAmHP8QLYQJwZ4cjMUEyRgxzpkJF87qPgNecLTpUdruYbdA", sync_type: Finalized, new_chain: Finalized, component: "range_sync"
```


  - Remove the Display impl and log the ID explicitly for all logs.
- Log more details when creating a new SyncingChain
2025-05-01 19:53:29 +00:00
Jimmy Chen
93ec9df137 Compute proposer shuffling only once in gossip verification (#7304)
When we perform data column gossip verification, we sometimes see multiple proposer shuffling cache miss simultaneously and this results in multiple threads computing the shuffling cache and potentially slows down the gossip verification.

Proposal here is to use a `OnceCell` for each shuffling key to make sure it's only computed once. I have only implemented this in data column verification as a PoC, but this can also be applied to blob and block verification

Related issues:
- https://github.com/sigp/lighthouse/issues/4447
- https://github.com/sigp/lighthouse/issues/7203
2025-05-01 01:30:42 +00:00
Eitan Seri-Levi
9779b4ba2c Optimize validate_data_columns (#7326) 2025-04-30 04:36:50 +00:00
Akihito Nakano
1324d3d3c4 Delayed RPC Send Using Tokens (#5923)
closes https://github.com/sigp/lighthouse/issues/5785


  The diagram below shows the differences in how the receiver (responder) behaves before and after this PR. The following sentences will detail the changes.

```mermaid
flowchart TD

subgraph "*** After ***"
Start2([START]) --> AA[Receive request]
AA --> COND1{Is there already an active request <br> with the same protocol?}
COND1 --> |Yes| CC[Send error response]
CC --> End2([END])
%% COND1 --> |No| COND2{Request is too large?}
%% COND2 --> |Yes| CC
COND1 --> |No| DD[Process request]
DD --> EE{Rate limit reached?}
EE --> |Yes| FF[Wait until tokens are regenerated]
FF --> EE
EE --> |No| GG[Send response]
GG --> End2
end

subgraph "*** Before ***"
Start([START]) --> A[Receive request]
A --> B{Rate limit reached <br> or <br> request is too large?}
B -->|Yes| C[Send error response]
C --> End([END])
B -->|No| E[Process request]
E --> F[Send response]
F --> End
end
```

### `Is there already an active request with the same protocol?`

This check is not performed in `Before`. This is taken from the PR in the consensus-spec, which proposes updates regarding rate limiting and response timeout.
https://github.com/ethereum/consensus-specs/pull/3767/files
> The requester MUST NOT make more than two concurrent requests with the same ID.

The PR mentions the requester side. In this PR, I introduced the `ActiveRequestsLimiter` for the `responder` side to restrict more than two requests from running simultaneously on the same protocol per peer. If the limiter disallows a request, the responder sends a rate-limited error and penalizes the requester.



### `Rate limit reached?` and `Wait until tokens are regenerated`

UPDATE: I moved the limiter logic to the behaviour side. https://github.com/sigp/lighthouse/pull/5923#issuecomment-2379535927

~~The rate limiter is shared between the behaviour and the handler.  (`Arc<Mutex<RateLimiter>>>`) The handler checks the rate limit and queues the response if the limit is reached. The behaviour handles pruning.~~

~~I considered not sharing the rate limiter between the behaviour and the handler, and performing all of these either within the behaviour or handler. However, I decided against this for the following reasons:~~

- ~~Regarding performing everything within the behaviour: The behaviour is unable to recognize the response protocol when `RPC::send_response()` is called, especially when the response is `RPCCodedResponse::Error`. Therefore, the behaviour can't rate limit responses based on the response protocol.~~
- ~~Regarding performing everything within the handler: When multiple connections are established with a peer, there could be multiple handlers interacting with that peer. Thus, we cannot enforce rate limiting per peer solely within the handler. (Any ideas? 🤔 )~~
2025-04-24 03:46:16 +00:00
chonghe
c13e069c9c Revise logging when queue is full (#7324) 2025-04-22 22:46:30 +00:00
Michael Sproul
e61e92b926 Merge remote-tracking branch 'origin/stable' into unstable 2025-04-22 18:55:06 +10:00
Michael Sproul
54f7bc5b2c Release v7.0.0 (#7288)
New v7.0.0 release for Electra on mainnet.
2025-04-22 09:21:03 +10:00
chonghe
80fe133d2c Update Lighthouse Book for Electra features (#7280)
* #7227
2025-04-17 09:31:26 +00:00
Mac L
c32569ab83 Restore HTTP API logging and add more metrics (#7225)
#7124


  - Restores previous HTTP logging with tracing compatible syntax
- Adds metrics for certain missing endpoints (and alphabetized the existing ones)
2025-04-17 08:18:45 +00:00
Michael Sproul
fd82ee2f81 Release v7.0.0-beta.7 (#7333) 2025-04-17 14:46:43 +10:00
Jean-Baptiste Pinalie
5352d5f78a Update proposer_slashings and attester_slashings amounts for electra. (#7316)
Did not find a specific issue beside https://github.com/sigp/lighthouse/issues/6821


  Leverage `whistleblower_reward_quotient_for_state` to have accurate post-electra `proposer_slashings` and `attester_slashings` fields returned by `/eth/v1/beacon/rewards/blocks/<id>`.
2025-04-17 00:58:36 +00:00
Michael Sproul
6fad6fba6a Release v7.0.0-beta.6 2025-04-16 08:54:53 +10:00