Commit Graph

502 Commits

Author SHA1 Message Date
Jimmy Chen
b4704eab4a Fulu update to spec v1.6.0-alpha.4 (#7890)
Fulu update to spec [v1.6.0-alpha.4](https://github.com/ethereum/consensus-specs/releases/tag/v1.6.0-alpha.4).
- Make `number_of_columns` a preset
- Optimise `get_custody_groups` to avoid computing if cgc = 128
- Add support for additional typenum values in type_dispatch macro
2025-08-20 02:05:04 +00:00
Jimmy Chen
34dd1b27ae Revise data column rpc limits and queue sizes (#7887)
Revise data column rpc limits and queue sizes. Also removed some outdated TODOs for Fulu / das.
2025-08-19 03:48:08 +00:00
Age Manning
ee1b0ae2ff Allow for sync state where batch is unknown (#7391) 2025-08-13 06:00:49 +00:00
chonghe
522bd9e9c6 Update Rust Edition to 2024 (#7766)
* #7749

Thanks @dknopik and @michaelsproul for your help!
2025-08-13 03:04:31 +00:00
Pawan Dhananjay
80ba0b169b Backfill peer attribution (#7762)
Partly addresses https://github.com/sigp/lighthouse/issues/7744


  Implement similar peer sync attribution like in #7733 for backfill sync.
2025-08-12 02:11:56 +00:00
Jimmy Chen
40c2fd5ff4 Instrument tracing spans for block processing and import (#7816)
#7815

- removes all existing spans, so some span fields that appear in logs like `service_name` may be lost.
- instruments a few key code paths in the beacon node, starting from **root spans** named below:

* Gossip block and blobs
* `process_gossip_data_column_sidecar`
* `process_gossip_blob`
* `process_gossip_block`
* Rpc block and blobs
* `process_rpc_block`
* `process_rpc_blobs`
* `process_rpc_custody_columns`
* Rpc blocks (range and backfill)
* `process_chain_segment`
* `PendingComponents` lifecycle
* `pending_components`

To test locally:
* Run Grafana and Tempo with https://github.com/sigp/lighthouse-metrics/pull/57
* Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317`

Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg

Removing the old spans seem to have reduced the memory usage quite a lot - i think we were using them on long running tasks and too excessively:
<img width="910" height="495" alt="image" src="https://github.com/user-attachments/assets/5208bbe4-53b2-4ead-bc71-0b782c788669" />
2025-08-08 05:32:22 +00:00
Jimmy Chen
8bc6693dac Fix wrong columns getting processed on a CGC change (#7792)
This PR fixes a bug where wrong columns could get processed immediately after a CGC increase.

Scenario:
- The node's CGC increased due to additional validators attached to it (lets say from 10 to 11)
- The new CGC is advertised and new subnets are subscribed immediately, however the change won't be effective in the data availability check until the next epoch (See [this](ab0e8870b4/beacon_node/beacon_chain/src/validator_custody.rs (L93-L99))). Data availability checker still only require 10 columns for the current epoch.
- During this time, data columns for the additional custody column (lets say column 11) may arrive via gossip as we're already subscribed to the topic, and it may be incorrectly used to satisfy the existing data availability requirement (10 columns), and result in this additional column (instead of a required one) getting persisted, resulting in database inconsistency.
2025-08-07 00:45:04 +00:00
Jimmy Chen
4daa015971 Remove peer sampling code (#7768)
Peer sampling has been completely removed from the spec. This PR removes our partial implementation from the codebase.
https://github.com/ethereum/consensus-specs/pull/4393
2025-07-23 03:24:45 +00:00
Pawan Dhananjay
90ff64381e Sync peer attribution (#7733)
Which issue # does this PR address?

Closes #7604


  Improvements to range sync including:

1. Contain column requests only to peers that are part of the SyncingChain
2. Attribute the fault to the correct peer and downscore them if they don't return the data columns for the request
3. Improve sync performance by retrying only the failed columns from other peers instead of failing the entire batch
4. Uses the earliest_available_slot to make requests to peers that claim to have the epoch. Note: if no earliest_available_slot info is available, fallback to using previous logic i.e. assume peer has everything backfilled upto WS checkpoint/da boundary

Tested this on fusaka-devnet-2 with a full node and supernode and the recovering logic seems to works well.
Also tested this a little on mainnet.

Need to do more testing and possibly add some unit tests.
2025-07-12 00:02:30 +00:00
ethDreamer
b43e0b446c Final changes for fusaka-devnet-2 (#7655)
Closes #7467.

This PR primarily addresses [the P2P changes](https://github.com/ethereum/EIPs/pull/9840) in [fusaka-devnet-2](https://fusaka-devnet-2.ethpandaops.io/). Specifically:

* [the new `nfd` parameter added to the `ENR`](https://github.com/ethereum/EIPs/pull/9840)
* [the modified `compute_fork_digest()` changes for every BPO fork](https://github.com/ethereum/EIPs/pull/9840)

90% of this PR was absolutely hacked together as fast as possible during the Berlinterop as fast as I could while running between Glamsterdam debates. Luckily, it seems to work. But I was unable to be as careful in avoiding bugs as I usually am. I've cleaned up the things *I remember* wanting to come back and have a closer look at. But still working on this.

Progress:
* [x] get it working on `fusaka-devnet-2`
* [ ] [*optional* disconnect from peers with incorrect `nfd` at the fork boundary](https://github.com/ethereum/consensus-specs/pull/4407) - Can be addressed in a future PR if necessary
* [x] first pass clean-up
* [x] fix up all the broken tests
* [x] final self-review
* [x] more thorough review from people more familiar with affected code
2025-07-10 21:32:58 +00:00
Jimmy Chen
56485cc986 Remove unneeded spans that caused debug logs to appear when level is set to info (#7707)
Fixes #7155.

It turns out the issue is caused by calling a function that creates an info span (`chain.id()` here), e.g.

```rust
debug!(id = chain.id(), ?sync_type, reason = ?remove_reason, op, "Chain removed");
```

I've remove all unneeded spans, especially getter functions - there's little reasons for span and they often get used in logging. We should also revisit all the spans after the release - i think we could make them more useful than they are today.

I've let it run for a while and no longer seeing any `DEBUG` logs.
2025-07-08 00:37:54 +00:00
Eitan Seri-Levi
56b2d4b525 Remove instrumenting log level (#7636)
https://github.com/sigp/lighthouse/issues/7155


  Theres some additional places we set instrumenting log levels that wasn't covered in #7620
2025-06-24 06:29:10 +00:00
Pawan Dhananjay
11bcccb353 Remove all prod eth1 related code (#7133)
N/A


  After the electra fork which includes EIP 6110, the beacon node no longer needs the eth1 bridging mechanism to include new deposits as they are provided by the EL as a `deposit_request`. So after electra + a transition period where the finalized bridge deposits pre-fork are included through the old mechanism, we no longer need the elaborate machinery we had to get deposit contract data from the execution layer.

Since holesky has already forked to electra and completed the transition period, this PR basically checks to see if removing all the eth1 related logic leads to any surprises.
2025-06-23 03:00:07 +00:00
Age Manning
d50924677a Remove instrumenting log level (#7620)
I think this should resolve #7155


This removes the level field from the instrumenting we were doing across a range of functions. The level will now default to the level of the log.
2025-06-20 07:44:59 +00:00
Pawan Dhananjay
9803d69d80 Implement status v2 version (#7590)
N/A


  Implements status v2 as defined in https://github.com/ethereum/consensus-specs/pull/4374/
2025-06-12 07:17:06 +00:00
Pawan Dhananjay
5f208bb858 Implement basic validator custody framework (no backfill) (#7578)
Resolves #6767


  This PR implements a basic version of validator custody.
- It introduces a new `CustodyContext` object which contains info regarding number of validators attached to a node and  the custody count they contribute to the cgc.
- The `CustodyContext` is added in the da_checker and has methods for returning the current cgc and the number of columns to sample at head. Note that the logic for returning the cgc existed previously in the network globals.
- To estimate the number of validators attached, we use the `beacon_committee_subscriptions` endpoint. This might overestimate the number of validators actually publishing attestations from the node in the case of multi BN setups. We could also potentially use the `publish_attestations` endpoint to get a more conservative estimate at a later point.
- Anytime there's a change in the `custody_group_count` due to addition/removal of validators, the custody context should send an event on a broadcast channnel. The only subscriber for the channel exists in the network service which simply subscribes to more subnets. There can be additional subscribers in sync that will start a backfill once the cgc changes.

TODO

- [ ] **NOT REQUIRED:** Currently, the logic only handles an increase in validator count and does not handle a decrease. We should ideally unsubscribe from subnets when the cgc has decreased.
- [ ] **NOT REQUIRED:** Add a service in the `CustodyContext` that emits an event once `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS ` passes after updating the current cgc. This event should be picked up by a subscriber which updates the enr and metadata.
- [x] Add more tests
2025-06-11 18:10:06 +00:00
Lion - dapplion
d457ceeaaf Don't create child lookup if parent is faulty (#7118)
Issue discovered on PeerDAS devnet (node `lighthouse-geth-2.peerdas-devnet-5.ethpandaops.io`). Summary:

- A lookup is created for block root `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- That block or a parent is faulty and `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` is added to the failed chains cache
- We later receive a block that is a child of a child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- We create a lookup, which attempts to process the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` and hit a processor error `UnknownParent`, hitting this line

bf955c7543/beacon_node/network/src/sync/block_lookups/mod.rs (L686-L688)

`search_parent_of_child` does not create a parent lookup because the parent root is in the failed chain cache. However, we have **already** marked the child as awaiting the parent. This results in an inconsistent state of lookup sync, as there's a lookup awaiting a parent that doesn't exist.

Now we have a lookup (the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`) that is awaiting a parent lookup that doesn't exist: hence stuck.

### Impact

This bug can affect Mainnet as well as PeerDAS devnets.

This bug may stall lookup sync for a few minutes (up to `LOOKUP_MAX_DURATION_STUCK_SECS = 15 min`) until the stuck prune routine deletes it. By that time the root will be cleared from the failed chain cache and sync should succeed. During that time the user will see a lot of `WARN` logs when attempting to add each peer to the inconsistent lookup. We may also sync the block through range sync if we fall behind by more than 2 epochs. We may also create the parent lookup successfully after the failed cache clears and complete the child lookup.

This bug is triggered if:
- We have a lookup that fails and its root is added to the failed chain cache (much more likely to happen in PeerDAS networks)
- We receive a block that builds on a child of the block added to the failed chain cache


  Ensure that we never create (or leave existing) a lookup that references a non-existing parent.

I added `must_use` lints to the functions that create lookups. To fix the specific bug we must recursively drop the child lookup if the parent is not created. So if `search_parent_of_child` returns `false` now return `LookupRequestError::Failed` instead of `LookupResult::Pending`.

As a bonus I have a added more logging and reason strings to the errors
2025-06-05 08:53:43 +00:00
Jimmy Chen
e6ef644db4 Verify getBlobsV2 response and avoid reprocessing imported data columns (#7493)
#7461 and partly #6439.

Desired behaviour after receiving `engine_getBlobs` response:

1. Gossip verify the blobs and proofs, but don't mark them as observed yet. This is because not all blobs are published immediately (due to staggered publishing). If we mark them as observed and not publish them, we could end up blocking the gossip propagation.
2. Blobs are marked as observed _either_ when:
* They are received from gossip and forwarded to the network .
* They are published by the node.

Current behaviour:
-  We only gossip verify `engine_getBlobsV1` responses, but not `engine_getBlobsV2` responses (PeerDAS).
-  After importing EL blobs AND before they're published, if the same blobs arrive via gossip, they will get re-processed, which may result in a re-import.


  1. Perform gossip verification on data columns computed from EL `getBlobsV2` response. We currently only do this for `getBlobsV1` to prevent importing blobs with invalid proofs into the `DataAvailabilityChecker`, this should be done on V2 responses too.
2. Add additional gossip verification to make sure we don't re-process a ~~blob~~ or data column that was imported via the EL `getBlobs` but not yet "seen" on the gossip network. If an "unobserved" gossip blob is found in the availability cache, then we know it has passed verification so we can immediately propagate the `ACCEPT` result and forward it to the network, but without re-processing it.

**UPDATE:** I've left blobs out for the second change mentioned above, as the likelihood and impact is very slow and we haven't seen it enough, but under PeerDAS this issue is a regular occurrence and we do see the same block getting imported many times.
2025-05-26 19:55:58 +00:00
Akihito Nakano
537fc5bde8 Revive network-test logs files in CI (#7459)
https://github.com/sigp/lighthouse/issues/7187


  This PR adds a writer that implements `tracing_subscriber::fmt::MakeWriter`, which writes logs to separate files for each test.
2025-05-22 02:51:22 +00:00
SunnysidedJ
593390162f peerdas-devnet-7: update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier (#7399)
Update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier #7377


  As described in https://github.com/ethereum/consensus-specs/pull/4284
2025-05-12 00:20:55 +00:00
Lion - dapplion
a497ec601c Retry custody requests after peer metadata updates (#6975)
Closes https://github.com/sigp/lighthouse/issues/6895

We need sync to retry custody requests when a peer CGC updates. A higher CGC can result in a data column subnet peer count increasing from 0 to 1, allowing requests to happen.


  Add new sync event `SyncMessage::UpdatedPeerCgc`. It's sent by the router when a metadata response updates the known CGC
2025-05-09 08:27:17 +00:00
Lion - dapplion
beb0ce68bd Make range sync peer loadbalancing PeerDAS-friendly (#6922)
- Re-opens https://github.com/sigp/lighthouse/pull/6864 targeting unstable

Range sync and backfill sync still assume that each batch request is done by a single peer. This assumption breaks with PeerDAS, where we request custody columns to N peers.

Issues with current unstable:

- Peer prioritization counts batch requests per peer. This accounting is broken now, data columns by range request are not accounted
- Peer selection for data columns by range ignores the set of peers on a syncing chain, instead draws from the global pool of peers
- The implementation is very strict when we have no peers to request from. After PeerDAS this case is very common and we want to be flexible or easy and handle that case better than just hard failing everything.


  - [x] Upstream peer prioritization to the network context, it knows exactly how many active requests a peer (including columns by range)
- [x] Upstream peer selection to the network context, now `block_components_by_range_request` gets a set of peers to choose from instead of a single peer. If it can't find a peer, it returns the error `RpcRequestSendError::NoPeer`
- [ ] Range sync and backfill sync handle `RpcRequestSendError::NoPeer` explicitly
- [ ] Range sync: leaves the batch in `AwaitingDownload` state and does nothing. **TODO**: we should have some mechanism to fail the chain if it's stale for too long - **EDIT**: Not done in this PR
- [x] Backfill sync: pauses the sync until another peer joins - **EDIT**: Same logic as unstable

### TODOs

- [ ] Add tests :)
- [x] Manually test backfill sync

Note: this touches the mainnet path!
2025-05-07 02:03:07 +00:00
Lion - dapplion
2aa5d5c25e Make sure to log SyncingChain ID (#7359)
Debugging an sync issue from @pawanjay176 I'm missing some key info where instead of logging the ID of the SyncingChain we just log "Finalized" (the sync type). This looks like some typo or something was lost in translation when refactoring things.

```
Apr 17 12:12:00.707 DEBUG Syncing new finalized chain                   chain: Finalized, component: "range_sync"
```

This log should include more info about the new chain but just logs "Finalized"

```
Apr 17 12:12:00.810 DEBUG New chain added to sync                       peer_id: "16Uiu2HAmHP8QLYQJwZ4cjMUEyRgxzpkJF87qPgNecLTpUdruYbdA", sync_type: Finalized, new_chain: Finalized, component: "range_sync"
```


  - Remove the Display impl and log the ID explicitly for all logs.
- Log more details when creating a new SyncingChain
2025-05-01 19:53:29 +00:00
Lion - dapplion
be68dd24d0 Fix wrong custody column count for lookup blocks (#7281)
Fixes
- https://github.com/sigp/lighthouse/issues/7278


  Don't assume 0 columns for `RpcBlockInner::Block`
2025-04-11 22:00:57 +00:00
Jimmy Chen
7cc64cab83 Add missing error log and remove redundant id field from lookup logs (#6990)
Partially #6989.

This PR adds the missing error log when a batch fails due to issues with converting the response into `RpcBlock`. See the above linked issue for more details.

Adding this log reveals that we're completing range requests with missing columns, hence causing the batch to fail. It looks like we've hit the case where we've received enough stream terminations, but not all columns are returned.

```
Feb 12 06:12:16.558 DEBG Failed to convert range block components into RpcBlock, error: No column for block 0xc5b6c7fa02f5ef603d45819c08c6519f1dba661fd5d44a2fc849d3e7028b6007 index 18, id: 3456/RangeSync/116/3432, service: sync, module: network::sync::network_context:488
```

I've also removed some redundant `id` logging, as the `id` debug representation is difficult to read, and is now being logged as part of `req_id` in a more succinct format (relevant PR: #6914)
2025-04-04 09:01:42 +00:00
Mac L
0e6da0fcaf Merge branch 'release-v7.0.0' into v7-backmerge 2025-04-04 13:32:58 +11:00
Mac L
82d1674455 Rust 1.86.0 lints (#7254)
Implement lints for the new Rust compiler version 1.86.0.
2025-04-04 02:30:22 +00:00
Age Manning
d6cd049a45 RPC RequestId Cleanup (#7238)
I've been working at updating another library to latest Lighthouse and got very confused with RPC request Ids.

There were types that had fields called `request_id` and `id`. And interchangeably could have types `PeerRequestId`, `rpc::RequestId`, `AppRequestId`, `api_types::RequestId` or even `Request.id`.

I couldn't keep track of which Id was linked to what and what each type meant.

So this PR mainly does a few things:
- Changes the field naming to match the actual type. So any field that has an  `AppRequestId` will be named `app_request_id` rather than `id` or `request_id` for example.
- I simplified the types. I removed the two different `RequestId` types (one in Lighthouse_network the other in the rpc) and grouped them into one. It has one downside tho. I had to add a few unreachable lines of code in the beacon processor, which the extra type would prevent, but I feel like it might be worth it. Happy to add an extra type to avoid those few lines.
- I also removed the concept of `PeerRequestId` which sometimes went alongside a `request_id`. There were times were had a `PeerRequest` and a `Request` being returned, both of which contain a `RequestId` so we had redundant information. I've simplified the logic by removing `PeerRequestId` and made a `ResponseId`. I think if you look at the code changes, it simplifies things a bit and removes the redundant extra info.

I think with this PR things are a little bit easier to reasonable about what is going on with all these RPC Ids.

NOTE: I did this with the help of AI, so probably should be checked
2025-04-03 10:10:15 +00:00
Lion - dapplion
6f31d44343 Remove CGC from data_availability checker (#7033)
- Part of https://github.com/sigp/lighthouse/issues/6767

Validator custody makes the CGC and set of sampling columns dynamic. Right now this information is stored twice:
- in the data availability checker
- in the network globals

If that state becomes dynamic we must make sure it is in sync updating it twice, or guarding it behind a mutex. However, I noted that we don't really have to keep the CGC inside the data availability checker. All consumers can actually read it from the network globals, and we can update `make_available` to read the expected count of data columns from the block.
2025-03-26 05:19:51 +00:00
Lion - dapplion
ca237652f1 Track request IDs in RangeBlockComponentsRequest (#6998)
Part of
- https://github.com/sigp/lighthouse/issues/6258


  `RangeBlockComponentsRequest` handles a set of by_range requests. It's quite lose on these requests, not tracking them by ID. We want to implement individual request retries, so we must make `RangeBlockComponentsRequest` aware of its requests IDs. We don't want the result of a prior by_range request to affect the state of a future retry. Lookup sync uses this mechanism.

Now `RangeBlockComponentsRequest` tracks:

```rust
pub struct RangeBlockComponentsRequest<E: EthSpec> {
blocks_request: ByRangeRequest<BlocksByRangeRequestId, Vec<Arc<SignedBeaconBlock<E>>>>,
block_data_request: RangeBlockDataRequest<E>,
}

enum RangeBlockDataRequest<E: EthSpec> {
NoData,
Blobs(ByRangeRequest<BlobsByRangeRequestId, Vec<Arc<BlobSidecar<E>>>>),
DataColumns {
requests: HashMap<
DataColumnsByRangeRequestId,
ByRangeRequest<DataColumnsByRangeRequestId, DataColumnSidecarList<E>>,
>,
expected_custody_columns: Vec<ColumnIndex>,
},
}

enum ByRangeRequest<I: PartialEq + std::fmt::Display, T> {
Active(I),
Complete(T),
}
```

I have merged `is_finished` and `Into_responses` into the same function. Otherwise, we need to duplicate the logic to figure out if the requests are done.
2025-03-21 04:07:30 +00:00
Lion - dapplion
a6bdc474db Log range sync download errors (#6991)
Currently range sync download log errors just say `error: rpc_error` which isn't helpful. The actual error is suppressed unless logged somewhere else.


  Log the actual error that caused the batch download to fail as part of the log that states that the batch download failed.
2025-03-13 13:26:43 +00:00
ThreeHrSleep
d60c24ef1c Integrate tracing (#6339)
Tracing Integration
- [reference](5bbf1859e9/projects/project-ideas.md (L297))


  - [x] replace slog & log with tracing throughout the codebase
- [x] implement custom crit log
- [x] make relevant changes in the formatter
- [x] replace sloggers
- [x] re-write SSE logging components

cc: @macladson @eserilev
2025-03-12 22:31:05 +00:00
Lion - dapplion
d60388134d Add PeerDAS metrics to track subnets without peers (#6928)
Currently we track a key metric `PEERS_PER_COLUMN_SUBNET` in a getter `good_peers_on_sampling_subnets`. Another PR https://github.com/sigp/lighthouse/pull/6922 deletes that function, so we have to move the metric anyway. This PR moves that metric computation to the metrics spawned task which is refreshed every 5 seconds.

I also added a few more useful metrics. The total set and intended usage is:

- `sync_peers_per_column_subnet`: Track health of overall subnets in your node
- `sync_peers_per_custody_column_subnet`: Track health of the subnets your node needs. We should track this metric closely in our dashboards with a heatmap and bar plot
- ~~`sync_column_subnets_with_zero_peers`: Is equivalent to the Grafana query `count(sync_peers_per_column_subnet == 0) by (instance)`. We may prefer to skip it, but I believe it's the most important metric as if `sync_column_subnets_with_zero_peers > 0` your node stalls.~~
- ~~`sync_custody_column_subnets_with_zero_peers`: `count(sync_peers_per_custody_column_subnet == 0) by (instance)`~~
2025-02-11 06:56:38 +00:00
Lion - dapplion
3992d6ba74 Fix misc PeerDAS todos (#6862)
Address misc PeerDAS TODOs that are not too big for a dedicated PR


  I'll justify each TODO on an inlined comment
2025-02-11 06:07:13 +00:00
Lion - dapplion
d5a03c9d86 Add more range sync tests (#6872)
Currently we have very poor coverage of range sync with unit tests. With the event driven test framework we could cover much more ground and be confident when modifying the code.


  Add two basic cases:
- Happy path, complete a finalized sync for 2 epochs
- Post-PeerDAS case where we start without enough custody peers and later we find enough

⚠️  If you have ideas for more test cases, please let me know! I'll write them
2025-02-10 07:55:22 +00:00
Lion - dapplion
f35213ebe7 Sync active request byrange ids logs (#6914)
- Re-opened PR from https://github.com/sigp/lighthouse/pull/6869

Writing and running tests I noted that the sync RPC requests are very verbose now.

`DataColumnsByRootRequestId { id: 123, requester: Custody(CustodyId { requester: CustodyRequester(SingleLookupReqId { req_id: 121, lookup_id: 101 }) }) }`

Since this Id is logged rather often I believe there's value in
1. Making them more succinct for log verbosity
2. Make them a string that's easy to copy and work with elastic


  Write custom `Display` implementations to render Ids in a more DX format

_ DataColumnsByRootRequestId with a block lookup_

```
123/Custody/121/Lookup/101
```

_DataColumnsByRangeRequestId_

```
123/122/RangeSync/0/5492900659401505034
```

- This one will be shorter after https://github.com/sigp/lighthouse/pull/6868

Also made the logs format and text consistent across all methods
2025-02-10 01:27:05 +00:00
Lion - dapplion
921d95217d Remove un-used batch sync error condition (#6917)
- PR https://github.com/sigp/lighthouse/pull/6497 made obsolete some consistency checks inside the batch

I forgot to remove the consumers of those errors


  Remove un-used batch sync error condition, which was a nested `Result<_, Result<_, E>>`
2025-02-07 07:48:58 +00:00
Lion - dapplion
2193f6a4d4 Add individual by_range sync requests (#6497)
Part of
- https://github.com/sigp/lighthouse/issues/6258

To address PeerDAS sync issues we need to make individual by_range requests within a batch retriable. We should adopt the same pattern for lookup sync where each request (block/blobs/columns) is tracked individually within a "meta" request that group them all and handles retry logic.


  - Building on https://github.com/sigp/lighthouse/pull/6398

second step is to add individual request accumulators for `blocks_by_range`, `blobs_by_range`, and `data_columns_by_range`. This will allow each request to progress independently and be retried separately.

Most of the logic is just piping, excuse the large diff. This PR does not change the logic of how requests are handled or retried. This will be done in a future PR changing the logic of `RangeBlockComponentsRequest`.

### Before

- Sync manager receives block with `SyncRequestId::RangeBlockAndBlobs`
- Insert block into `SyncNetworkContext::range_block_components_requests`
- (If received stream terminators of all requests)
- Return `Vec<RpcBlock>`, and insert into `range_sync`

### Now

- Sync manager receives block with `SyncRequestId::RangeBlockAndBlobs`
- Insert block into `SyncNetworkContext:: blocks_by_range_requests`
- (If received stream terminator of this request)
- Return `Vec<SignedBlock>`, and insert into `SyncNetworkContext::components_by_range_requests `
- (If received a result for all requests)
- Return `Vec<RpcBlock>`, and insert into `range_sync`
2025-02-05 07:08:28 +00:00
Lion - dapplion
95cec45c38 Use data column batch verification consistently (#6851)
Resolve a `TODO(das)` to use KZG batch verification in `put_rpc_custody_columns`


  Uses `verify_kzg_for_data_column_list_with_scoring` in all paths that send more than one column. To use batch verification and have attributability of which peer is sending a bad column.

Needs to move `verify_kzg_for_data_column_list_with_scoring` into the type's module to convert to the KZG verified type.
2025-02-03 06:07:45 +00:00
Jimmy Chen
70194dfc6a Implement PeerDAS Fulu fork activation (#6795)
Addresses #6706


  This PR activates PeerDAS at the Fulu fork epoch instead of `EIP_7594_FORK_EPOCH`. This means we no longer support testing PeerDAS with Deneb / Electrs, as it's now part of a hard fork.
2025-01-30 07:01:34 +00:00
Lion - dapplion
7d54a43243 Make range sync chain Id sequential (#6868)
Currently, we set the `chain_id` of range sync chains to `u64(hash(target_root, target_slot))`, which results in a long integer.

```
Jan 27 00:43:27.246 DEBG Batch downloaded, chain: 4223372036854775807, awaiting_batches: 0, batch_state: [p,E,E,E,E], blocks: 0, epoch: 0, service: range_sync
```


  Instead, we can use `network_context.next_id()` as we do for all other sync items and get a unique sequential (not too big) integer as id.

```
Jan 27 00:43:27.246 DEBG Batch downloaded, chain: 4, awaiting_batches: 0, batch_state: [p,E,E,E,E], blocks: 0, epoch: 0,  service: range_sync
```

Also, if a specific chain for the same target is retried later, it won't get the same ID so we can more clearly differentiate the logs associated with each attempt.
2025-01-30 07:01:32 +00:00
Age Manning
1fe0ac72be Underflow and Typo (#6885)
I was looking at sync and noticed a potential underflow and a typo, so just fixed those whilst I was in there.
2025-01-30 06:22:59 +00:00
Lion - dapplion
66c6552e8c Some sync/backfill format nits (#6861)
When working on unrelated changes I noted:

- An unnecessary closure left by a commit of some guy named @dapplion that can be removed
- match statements that can be simplified with the new let else syntax
- instead of mapping a result to ignore the Ok value, return
2025-01-30 03:09:48 +00:00
Pawan Dhananjay
4a07c08c4f Fork aware max values in rpc (#6847)
N/A


  In https://github.com/sigp/lighthouse/pull/6329 we changed `max_blobs_per_block` from a preset to a config value.
We weren't using the right value based on fork in that PR. This is a follow up PR to use the fork dependent values.

In the proces, I also updated other places where we weren't using fork dependent values from the ChainSpec.

Note to reviewer: easier to go through by commit
2025-01-29 19:42:13 +00:00
Lion - dapplion
c6ebaba892 Detect invalid proposer signature on RPC block processing (#6519)
Complements
- https://github.com/sigp/lighthouse/pull/6321

by detecting if the proposer signature is valid or not during RPC block processing. In lookup sync, if the invalid signature signature is the proposer signature, it's not deterministic on the block root. So we should only penalize the sending peer and retry. Otherwise, if it's on the body we should drop the lookup and penalize all peers that claim to have imported the block
2025-01-28 19:01:26 +00:00
Lion - dapplion
7a0388ef2a Fix custodial peer assumption on lookup custody requests (#6815)
* Fix custodial peer assumption on lookup custody requests

* lint
2025-01-20 12:31:18 +00:00
Jimmy Chen
6ce33c4d1d Do not send column requests if there is no blob for the block. (#6814)
* Do not send column requests if there is no blob for the block.

* Address review comments

* Replace fix - the previous solution didnt work.
2025-01-20 09:07:47 +00:00
Jimmy Chen
e98209d118 Implement PeerDAS subnet decoupling (aka custody groups) (#6736)
* Implement PeerDAS subnet decoupling (aka custody groups).

* Merge branch 'unstable' into decouple-subnets

* Refactor feature testing for spec tests (#6737)

Squashed commit of the following:

commit 898d05ee17
Merge: ffbd25e2b 7e0cddef3
Author: Jimmy Chen <jchen.tc@gmail.com>
Date:   Tue Dec 24 14:41:19 2024 +1100

    Merge branch 'unstable' into refactor-ef-tests-features

commit ffbd25e2be
Author: Jimmy Chen <jchen.tc@gmail.com>
Date:   Tue Dec 24 14:40:38 2024 +1100

    Fix `SszStatic` tests for PeerDAS: exclude eip7594 test vectors when testing Electra types.

commit aa593cf35c
Author: Jimmy Chen <jchen.tc@gmail.com>
Date:   Fri Dec 20 12:08:54 2024 +1100

    Refactor spec testing for features and simplify usage.

* Fix build.

* Add input validation and improve arithmetic handling when calculating custody groups.

* Address review comments re code style consistency.

* Merge branch 'unstable' into decouple-subnets

# Conflicts:
#	beacon_node/beacon_chain/src/kzg_utils.rs
#	beacon_node/beacon_chain/src/observed_data_sidecars.rs
#	beacon_node/lighthouse_network/src/discovery/subnet_predicate.rs
#	common/eth2_network_config/built_in_network_configs/chiado/config.yaml
#	common/eth2_network_config/built_in_network_configs/gnosis/config.yaml
#	common/eth2_network_config/built_in_network_configs/holesky/config.yaml
#	common/eth2_network_config/built_in_network_configs/mainnet/config.yaml
#	common/eth2_network_config/built_in_network_configs/sepolia/config.yaml
#	consensus/types/src/chain_spec.rs

* Update consensus/types/src/chain_spec.rs

Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>

* Merge remote-tracking branch 'origin/unstable' into decouple-subnets

* Update error handling.

* Address review comment.

* Merge remote-tracking branch 'origin/unstable' into decouple-subnets

# Conflicts:
#	consensus/types/src/chain_spec.rs

* Update PeerDAS spec tests to `1.5.0-beta.0` and fix failing unit tests.

* Merge remote-tracking branch 'origin/unstable' into decouple-subnets

# Conflicts:
#	beacon_node/lighthouse_network/src/peer_manager/mod.rs
2025-01-15 07:40:26 +00:00
Michael Sproul
06e4d22d49 Electra spec changes for v1.5.0-beta.0 (#6731)
* First pass

* Add restrictions to RuntimeVariableList api

* Use empty_uninitialized and fix warnings

* Fix some todos

* Merge branch 'unstable' into max-blobs-preset

* Fix take impl on RuntimeFixedList

* cleanup

* Fix test compilations

* Fix some more tests

* Fix test from unstable

* Merge branch 'unstable' into max-blobs-preset

* Implement "Bugfix and more withdrawal tests"

* Implement "Add missed exit checks to consolidation processing"

* Implement "Update initial earliest_exit_epoch calculation"

* Implement "Limit consolidating balance by validator.effective_balance"

* Implement "Use 16-bit random value in validator filter"

* Implement "Do not change creds type on consolidation"

* Rename PendingPartialWithdraw index field to validator_index

* Skip slots to get test to pass and add TODO

* Implement "Synchronously check all transactions to have non-zero length"

* Merge remote-tracking branch 'origin/unstable' into max-blobs-preset

* Remove footgun function

* Minor simplifications

* Move from preset to config

* Fix typo

* Revert "Remove footgun function"

This reverts commit de01f923c7.

* Try fixing tests

* Implement "bump minimal preset MAX_BLOB_COMMITMENTS_PER_BLOCK and KZG_COMMITMENT_INCLUSION_PROOF_DEPTH"

* Thread through ChainSpec

* Fix release tests

* Move RuntimeFixedVector into module and rename

* Add test

* Implement "Remove post-altair `initialize_beacon_state_from_eth1` from specs"

* Update preset YAML

* Remove empty RuntimeVarList awefullness

* Make max_blobs_per_block a config parameter (#6329)

Squashed commit of the following:

commit 04b3743ec1
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 6 17:36:58 2025 +1100

    Add test

commit 440e854199
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 6 17:24:50 2025 +1100

    Move RuntimeFixedVector into module and rename

commit f66e179a40
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 6 17:17:17 2025 +1100

    Fix release tests

commit e4bfe71cd1
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 6 17:05:30 2025 +1100

    Thread through ChainSpec

commit 063b79c16a
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 6 15:32:16 2025 +1100

    Try fixing tests

commit 88bedf09bc
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 6 15:04:37 2025 +1100

    Revert "Remove footgun function"

    This reverts commit de01f923c7.

commit 32483d385b
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 6 15:04:32 2025 +1100

    Fix typo

commit 2e86585b47
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 6 15:04:15 2025 +1100

    Move from preset to config

commit 1095d60a40
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 6 14:38:40 2025 +1100

    Minor simplifications

commit de01f923c7
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 6 14:06:57 2025 +1100

    Remove footgun function

commit 0c2c8c4224
Merge: 21ecb58ff f51a292f7
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 6 14:02:50 2025 +1100

    Merge remote-tracking branch 'origin/unstable' into max-blobs-preset

commit f51a292f77
Author: Daniel Knopik <107140945+dknopik@users.noreply.github.com>
Date:   Fri Jan 3 20:27:21 2025 +0100

    fully lint only explicitly to avoid unnecessary rebuilds (#6753)

    * fully lint only explicitly to avoid unnecessary rebuilds

commit 7e0cddef32
Author: Akihito Nakano <sora.akatsuki@gmail.com>
Date:   Tue Dec 24 10:38:56 2024 +0900

    Make sure we have fanout peers when publish (#6738)

    * Ensure that `fanout_peers` is always non-empty if it's `Some`

commit 21ecb58ff8
Merge: 2fcb2935e 9aefb5539
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Mon Oct 21 14:46:00 2024 -0700

    Merge branch 'unstable' into max-blobs-preset

commit 2fcb2935ec
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Fri Sep 6 18:28:31 2024 -0700

    Fix test from unstable

commit 12c6ef118a
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Wed Sep 4 16:16:36 2024 -0700

    Fix some more tests

commit d37733b846
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Wed Sep 4 12:47:36 2024 -0700

    Fix test compilations

commit 52bb581e07
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Tue Sep 3 18:38:19 2024 -0700

    cleanup

commit e71020e3e6
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Tue Sep 3 17:16:10 2024 -0700

    Fix take impl on RuntimeFixedList

commit 13f9bba647
Merge: 60100fc6b 4e675cf5d
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Tue Sep 3 16:08:59 2024 -0700

    Merge branch 'unstable' into max-blobs-preset

commit 60100fc6be
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Fri Aug 30 16:04:11 2024 -0700

    Fix some todos

commit a9cb329a22
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Fri Aug 30 15:54:00 2024 -0700

    Use empty_uninitialized and fix warnings

commit 4dc6e6515e
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Fri Aug 30 15:53:18 2024 -0700

    Add restrictions to RuntimeVariableList api

commit 25feedfde3
Author: Pawan Dhananjay <pawandhananjay@gmail.com>
Date:   Thu Aug 29 16:11:19 2024 -0700

    First pass

* Fix tests

* Implement max_blobs_per_block_electra

* Fix config issues

* Simplify BlobSidecarListFromRoot

* Disable PeerDAS tests

* Merge remote-tracking branch 'origin/unstable' into max-blobs-preset

* Bump quota to account for new target (6)

* Remove clone

* Fix issue from review

* Try to remove ugliness

* Merge branch 'unstable' into max-blobs-preset

* Merge remote-tracking branch 'origin/unstable' into electra-alpha10

* Merge commit '04b3743ec1e0b650269dd8e58b540c02430d1c0d' into electra-alpha10

* Merge remote-tracking branch 'pawan/max-blobs-preset' into electra-alpha10

* Update tests to v1.5.0-beta.0

* Resolve merge conflicts

* Linting

* fmt

* Fix test and add TODO

* Gracefully handle slashed proposers in fork choice tests

* Merge remote-tracking branch 'origin/unstable' into electra-alpha10

* Keep latest changes from max_blobs_per_block PR in codec.rs

* Revert a few more regressions and add a comment

* Disable more DAS tests

* Improve validator monitor test a little

* Make test more robust

* Fix sync test that didn't understand blobs

* Fill out cropped comment
2025-01-13 23:17:00 +00:00
Pawan Dhananjay
05727290fb Make max_blobs_per_block a config parameter (#6329)
* First pass

* Add restrictions to RuntimeVariableList api

* Use empty_uninitialized and fix warnings

* Fix some todos

* Merge branch 'unstable' into max-blobs-preset

* Fix take impl on RuntimeFixedList

* cleanup

* Fix test compilations

* Fix some more tests

* Fix test from unstable

* Merge branch 'unstable' into max-blobs-preset

* Merge remote-tracking branch 'origin/unstable' into max-blobs-preset

* Remove footgun function

* Minor simplifications

* Move from preset to config

* Fix typo

* Revert "Remove footgun function"

This reverts commit de01f923c7.

* Try fixing tests

* Thread through ChainSpec

* Fix release tests

* Move RuntimeFixedVector into module and rename

* Add test

* Remove empty RuntimeVarList awefullness

* Fix tests

* Simplify BlobSidecarListFromRoot

* Merge remote-tracking branch 'origin/unstable' into max-blobs-preset

* Bump quota to account for new target (6)

* Remove clone

* Fix issue from review

* Try to remove ugliness

* Merge branch 'unstable' into max-blobs-preset

* Fix max value

* Fix doctest

* Fix formatting

* Fix max check

* Delete hardcoded max_blobs_per_block in RPC limits

* Merge remote-tracking branch 'origin/unstable' into max-blobs-preset
2025-01-10 06:34:58 +00:00