Addresses #6854.
PeerDAS requires unsubscribing a Gossip topic at a fork boundary. This is not possible with our current topic machinery.
Instead of defining which topics have to be **added** at a given fork, we define the complete set of topics at a given fork. The key function is now:
```rust
pub fn core_topics_to_subscribe<E: EthSpec>(
    fork_name: ForkName,
    opts: &TopicConfig,
    spec: &ChainSpec,
) -> Vec<GossipKind> {
    // ...
    if fork_name.deneb_enabled() && !fork_name.fulu_enabled() {
        // All of deneb blob topics are core topics
        for i in 0..spec.blob_sidecar_subnet_count(fork_name) {
            topics.push(GossipKind::BlobSidecar(i));
        }
    }
    // ...
}
```
`core_topics_to_subscribe` only returns the blob topics if `fork < Fulu`. Then at the fork boundary, we subscribe with the new fork digest to `core_topics_to_subscribe(next_fork)`, which excludes the blob topics.
I added `is_fork_non_core_topic` to carry the aggregator topics for attestations and sync committee messages over to the next fork. This approach is future-proof in case those topics ever become fork-dependent.
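A rough sketch of how the fork-boundary resubscription can use `core_topics_to_subscribe`, assuming a hypothetical `GossipNetwork` handle with `subscribe`/`unsubscribe` methods (the real plumbing lives in the network service, and the non-core aggregator topics are carried over separately via `is_fork_non_core_topic`):

```rust
// Sketch only: `GossipNetwork` and its methods are illustrative stand-ins for the
// real network service plumbing.
fn resubscribe_at_fork_boundary<E: EthSpec>(
    current_fork: ForkName,
    next_fork: ForkName,
    opts: &TopicConfig,
    spec: &ChainSpec,
    network: &mut impl GossipNetwork,
) {
    let old_core_topics = core_topics_to_subscribe::<E>(current_fork, opts, spec);
    let new_core_topics = core_topics_to_subscribe::<E>(next_fork, opts, spec);

    // Subscribe to every core topic of the new fork under the new fork digest.
    // Since `core_topics_to_subscribe(next_fork)` no longer contains the blob
    // topics after Fulu, they are simply never re-subscribed.
    for topic in &new_core_topics {
        network.subscribe(topic.clone(), next_fork);
    }

    // Drop the old-digest subscriptions for the previous fork's core topics.
    for topic in old_core_topics {
        network.unsubscribe(topic, current_fork);
    }
}
```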
Closes https://github.com/sigp/lighthouse/issues/6854
- PR https://github.com/sigp/lighthouse/pull/6497 made some consistency checks inside the batch obsolete, but I forgot to remove the consumers of those errors.

Remove the unused batch sync error condition, which was a nested `Result<_, Result<_, E>>`.
Currently we track a key metric, `PEERS_PER_COLUMN_SUBNET`, inside the getter `good_peers_on_sampling_subnets`. Another PR https://github.com/sigp/lighthouse/pull/6922 deletes that function, so we have to move the metric anyway. This PR moves the metric computation to the spawned metrics task, which is refreshed every 5 seconds.
I also added a few more useful metrics. The total set and intended usage is:
- `sync_peers_per_column_subnet`: Track health of overall subnets in your node
- `sync_peers_per_custody_column_subnet`: Track health of the subnets your node needs. We should track this metric closely in our dashboards with a heatmap and bar plot
- ~~`sync_column_subnets_with_zero_peers`: Is equivalent to the Grafana query `count(sync_peers_per_column_subnet == 0) by (instance)`. We may prefer to skip it, but I believe it's the most important metric as if `sync_column_subnets_with_zero_peers > 0` your node stalls.~~
- ~~`sync_custody_column_subnets_with_zero_peers`: `count(sync_peers_per_custody_column_subnet == 0) by (instance)`~~
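A rough sketch of the per-subnet computation on the periodic metrics task, using the `prometheus` crate directly for illustration; the metric name matches the list above, but the gauge registration and the peer-count input are assumptions rather than Lighthouse's actual metrics plumbing:

```rust
use lazy_static::lazy_static;
use prometheus::{register_int_gauge_vec, IntGaugeVec};

lazy_static! {
    // Mirrors `sync_peers_per_column_subnet` from the list above.
    static ref SYNC_PEERS_PER_COLUMN_SUBNET: IntGaugeVec = register_int_gauge_vec!(
        "sync_peers_per_column_subnet",
        "Number of connected peers on each data column subnet",
        &["subnet"]
    )
    .unwrap();
}

/// Called from the metrics task that runs every 5 seconds, with the peer count
/// per column subnet already computed from the peer manager's view.
fn update_column_subnet_metrics(peer_counts_by_subnet: &[(u64, usize)]) {
    for (subnet_id, peer_count) in peer_counts_by_subnet {
        SYNC_PEERS_PER_COLUMN_SUBNET
            .with_label_values(&[&subnet_id.to_string()])
            .set(*peer_count as i64);
    }
}
```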
Currently we have very poor unit test coverage of range sync. With the event-driven test framework we could cover much more ground and be confident when modifying the code.
Add two basic cases:
- Happy path, complete a finalized sync for 2 epochs
- Post-PeerDAS case where we start without enough custody peers and later we find enough
⚠️ If you have ideas for more test cases, please let me know! I'll write them
Update spec tests for the recent v1.5.0-beta.2 release. There are no substantial changes for Electra and earlier, and the Fulu test updates still to be made are tracked here:
- https://github.com/sigp/lighthouse/issues/6957
- Add `SingleAttestation` SSZ tests
- Add new `deposit_with_reorg` fork choice tests
- Update tag to v1.5.0-beta.2
- Ignore Fulu tests
- Re-opened PR from https://github.com/sigp/lighthouse/pull/6869
While writing and running tests I noticed that the sync RPC request IDs are very verbose now.
`DataColumnsByRootRequestId { id: 123, requester: Custody(CustodyId { requester: CustodyRequester(SingleLookupReqId { req_id: 121, lookup_id: 101 }) }) }`
Since this ID is logged rather often, I believe there's value in:
1. Making it more succinct to reduce log verbosity
2. Making it a string that's easy to copy and to work with in Elastic

Write custom `Display` implementations to render the IDs in a more developer-friendly format.
_DataColumnsByRootRequestId with a block lookup_
```
123/Custody/121/Lookup/101
```
_DataColumnsByRangeRequestId_
```
123/122/RangeSync/0/5492900659401505034
```
- This one will be shorter after https://github.com/sigp/lighthouse/pull/6868
Also made the logs format and text consistent across all methods
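A minimal sketch of the kind of `Display` implementation involved, with simplified stand-in definitions for the real request ID types (the field and variant names here are illustrative, not the exact Lighthouse ones):

```rust
use std::fmt;

// Simplified stand-ins for the real sync request ID types.
struct SingleLookupReqId {
    req_id: u32,
    lookup_id: u32,
}

enum DataColumnsByRootRequester {
    Custody(SingleLookupReqId),
}

struct DataColumnsByRootRequestId {
    id: u32,
    requester: DataColumnsByRootRequester,
}

impl fmt::Display for DataColumnsByRootRequestId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.requester {
            DataColumnsByRootRequester::Custody(lookup) => write!(
                f,
                "{}/Custody/{}/Lookup/{}",
                self.id, lookup.req_id, lookup.lookup_id
            ),
        }
    }
}

fn main() {
    let id = DataColumnsByRootRequestId {
        id: 123,
        requester: DataColumnsByRootRequester::Custody(SingleLookupReqId {
            req_id: 121,
            lookup_id: 101,
        }),
    };
    // Prints "123/Custody/121/Lookup/101", matching the example above.
    println!("{id}");
}
```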
Closes:
- https://github.com/sigp/lighthouse/issues/6818
Use `MAX_EFFECTIVE_BALANCE_ELECTRA` (2048 ETH) for attestation reward calculations involving Electra.
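A rough sketch of the fork-dependent selection; the `ChainSpec` field names and the `electra_enabled()` helper are assumptions consistent with the rest of this document rather than the exact implementation:

```rust
// Sketch: pick the effective-balance cap used in the reward calculation.
fn max_effective_balance_for_fork(spec: &ChainSpec, fork: ForkName) -> u64 {
    if fork.electra_enabled() {
        spec.max_effective_balance_electra // 2048 ETH, in Gwei
    } else {
        spec.max_effective_balance // 32 ETH, in Gwei
    }
}
```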
Add a new `InteropGenesisBuilder` that tries to provide a more flexible way to build genesis states. Unfortunately due to lifetime jank, it is quite unergonomic at present. We may want to refactor this builder in future to make it easier to use.
This PR adds an implementation to get `fork_version` and `fork_epoch` given a `ForkName`. I didn't realize that this is already implemented in `ChainSpec`, sorry!
- https://github.com/sigp/lighthouse/pull/6933
Remove duplicated fork_epoch and fork_version implementation
Closes
- https://github.com/sigp/lighthouse/issues/6805
- Use a new `WorkEvent::GossipAttestationToConvert` to handle the conversion from `SingleAttestation` to `Attestation` _on_ the beacon processor (prevents a Tokio thread being blocked).
- Improve the error handling for single attestations. I think previously we had no ability to reprocess single attestations for unknown blocks -- we would just error. This seemed to be the case in both gossip processing and processing of `SingleAttestation`s from the HTTP API.
- Move the `SingleAttestation -> Attestation` conversion function into `beacon_chain` so that it can return the `attestation_verification::Error` type, which has well-defined error handling and peer penalties. The now-unused variants of `types::Attestation::Error` have been removed.
Fix another issue with fetch-blobs, similar to:
- https://github.com/sigp/lighthouse/pull/6911
Check if the list of blobs returned is all `None`, and if so, do not proceed any further.
This prevents an ugly error like:
> Feb 03 17:32:12.384 ERRO Error fetching or processing blobs from EL, block_root: 0x7326fe2dc1cb9036c9de7a07a662c86a339085597849016eadf061b70b7815ba, error: BlobProcessingError(AvailabilityCheck(Unexpected)), module: network::network_beacon_processor:1011
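A minimal sketch of the guard, assuming the EL response is a `Vec<Option<_>>`-style list of blobs (function and parameter names are illustrative):

```rust
/// Returns true if the EL returned at least one blob.
fn any_blobs_returned<T>(blobs: &[Option<T>]) -> bool {
    blobs.iter().any(Option::is_some)
}

fn process_fetched_blobs<T>(blobs: Vec<Option<T>>) {
    // If every entry is `None` there is nothing to process; bail out early instead
    // of feeding an empty set into the availability checker.
    if !any_blobs_returned(&blobs) {
        return;
    }
    // ... continue with blob processing ...
}
```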
- #6510
- Keep execution payload during historical backfill when `--prune-payloads false` is set
- Add a field in the historical backfill debug log to indicate if execution payload is kept
- Add a test to check that historical blocks have execution payloads when `--prune-payloads false` is set
- Very minor typo correction that I noticed while working on this
Noted that there's a bit of fork boilerplate in the fork context.
If we list a mapping of `ForkName -> fork_version` in the `ForkName` enum we can get rid of it :)
Not much, but it should make the next fork a tiny bit less annoying.
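A rough sketch of what such a mapping could look like on the `ForkName` enum; the variant list and the `ChainSpec` field names are assumptions for illustration:

```rust
impl ForkName {
    /// Sketch: map each fork to the config field holding its fork version.
    pub fn fork_version(&self, spec: &ChainSpec) -> [u8; 4] {
        match self {
            ForkName::Base => spec.genesis_fork_version,
            ForkName::Altair => spec.altair_fork_version,
            ForkName::Bellatrix => spec.bellatrix_fork_version,
            ForkName::Capella => spec.capella_fork_version,
            ForkName::Deneb => spec.deneb_fork_version,
            ForkName::Electra => spec.electra_fork_version,
            ForkName::Fulu => spec.fulu_fork_version,
        }
    }
}
```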
A temporary workaround for the failing execution tests for geth.
https://github.com/sigp/lighthouse/actions/runs/13192297954
The test is broken due to the following breaking changes in geth that require updating our tests:
1. Removal of the `personal` namespace in v1.14.12. See #30704.
2. Removal of the `totalDifficulty` field from the RPC in v1.14.11. See #30386.
Using an older version for now (`1.14.10`) as we need to get things merged for the upcoming release.
Will create a separate issue to fix this.
Part of
- https://github.com/sigp/lighthouse/issues/6258
To address PeerDAS sync issues we need to make individual by_range requests within a batch retriable. We should adopt the same pattern for lookup sync, where each request (block/blobs/columns) is tracked individually within a "meta" request that groups them all and handles retry logic.
- Building on https://github.com/sigp/lighthouse/pull/6398
The second step is to add individual request accumulators for `blocks_by_range`, `blobs_by_range`, and `data_columns_by_range`. This allows each request to progress independently and be retried separately.
Most of the changes are just plumbing, so excuse the large diff. This PR does not change how requests are handled or retried; that will be done in a future PR changing the logic of `RangeBlockComponentsRequest`.
### Before
- Sync manager receives block with `SyncRequestId::RangeBlockAndBlobs`
- Insert block into `SyncNetworkContext::range_block_components_requests`
- (If received stream terminators of all requests)
- Return `Vec<RpcBlock>`, and insert into `range_sync`
### Now
- Sync manager receives block with `SyncRequestId::RangeBlockAndBlobs`
- Insert block into `SyncNetworkContext::blocks_by_range_requests`
- (If received stream terminator of this request)
- Return `Vec<SignedBlock>`, and insert into `SyncNetworkContext::components_by_range_requests`
- (If received a result for all requests)
- Return `Vec<RpcBlock>`, and insert into `range_sync`
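A rough sketch of the shape of one entry in `components_by_range_requests`, with simplified stand-in types (the real accumulator tracks more state, e.g. peers and retry counts):

```rust
// Simplified stand-ins for the real Lighthouse types.
struct SignedBlock;
struct BlobSidecar;
struct DataColumnSidecar;

/// One entry per batch. Each by_range component request (blocks / blobs / columns)
/// resolves independently and can be retried on its own; only once all of them have
/// a result is the batch coupled into `Vec<RpcBlock>` and handed to range sync.
#[derive(Default)]
struct ComponentsByRangeRequest {
    blocks: Option<Vec<SignedBlock>>,
    blobs: Option<Vec<BlobSidecar>>,
    columns: Option<Vec<DataColumnSidecar>>,
}

impl ComponentsByRangeRequest {
    fn is_complete(&self) -> bool {
        self.blocks.is_some() && self.blobs.is_some() && self.columns.is_some()
    }
}
```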