- Step 0 of the tree-sync roadmap https://github.com/sigp/lighthouse/issues/7678
Current lookup sync tests are written in an explicit way that assumes how the internals of lookup sync work. For example, a test would do:
- Emit unknown block parent message
- Expect block request for X
- Respond with successful block request
- Expect block processing request for X
- Response with successful processing request
- etc..
This is unnecessarily verbose, and it requires a complete re-write whenever something changes in the internals of lookup sync (this has happened a few times, mostly for Deneb and Fulu).
What we really want to assert is:
- WHEN: we receive an unknown block parent message
- THEN: Lookup sync can sync that block
- ASSERT: Without penalizing peers, without unnecessary retries
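For illustration, a rough sketch of what a test in this style could look like. The helper names (`TestRig::new`, `trigger_unknown_block_parent`, `complete_lookup_sync`, `assert_no_peer_penalties_or_retries`) are hypothetical placeholders, not the actual harness API:
```rust
#[test]
fn unknown_parent_block_is_synced_without_penalties() {
    let mut rig = TestRig::new();

    // WHEN: we receive an unknown block parent message for some block X
    let block = rig.rand_block_with_parent();
    rig.trigger_unknown_block_parent(&block);

    // THEN: lookup sync drives the block (and its parent) to import; peer
    // behaviour is simulated by `simulate` according to a `CompleteStrategy`
    rig.complete_lookup_sync(CompleteStrategy::default());

    // ASSERT: no peers were penalized and there were no unnecessary retries
    rig.assert_no_peer_penalties_or_retries();
}
```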
Keep all existing tests and add new cases, written in the new style described above. The logic to serve and respond to requests is in the function `fn simulate` 2288a3aeb1/beacon_node/network/src/sync/tests/lookups.rs (L301)
- It controls peer behavior based on a `CompleteStrategy` where you can set for example "respond to BlocksByRoot requests with empty"
- It actually runs beacon processor messages by executing their closures. Sync tests now actually import blocks, extending test coverage to the interaction between sync and the da_checker.
- To achieve the above, the tests create real blocks with the test harness. To keep the tests as fast as before, I disabled crypto with `TestConfig`.
Along the way I found a couple bugs, which I documented on the diff.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Just visual clean-up, making logging statements look uniform. There's no reason to use `tracing::debug` instead of `debug`. If we ever need to migrate our logging lib in the future it would make things easier too.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
Which issue # does this PR address?
None
The `visualize_batch_state` function uses the loop `for mut batch_index in 0..BATCH_BUFFER_SIZE`, so `batch_index` only ever takes values from `0` to `BATCH_BUFFER_SIZE - 1`.
Hence the following condition is always true, and a trailing comma is pushed even after the last entry:
```rust
if batch_index != BATCH_BUFFER_SIZE {
    visualization_string.push(',');
}
```
Replacing `!=` with `<` and `BATCH_BUFFER_SIZE` with `BATCH_BUFFER_SIZE - 1` changes the output from `[A,B,C,D,E,]` to `[A,B,C,D,E]`.
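A standalone sketch of the fixed separator logic (just the loop shape, not the actual `visualize_batch_state` body):
```rust
const BATCH_BUFFER_SIZE: usize = 5;

fn visualize(batch_states: [char; BATCH_BUFFER_SIZE]) -> String {
    let mut visualization_string = String::from("[");
    for batch_index in 0..BATCH_BUFFER_SIZE {
        visualization_string.push(batch_states[batch_index]);
        // `< BATCH_BUFFER_SIZE - 1` instead of `!= BATCH_BUFFER_SIZE`, so no
        // separator is pushed after the final entry.
        if batch_index < BATCH_BUFFER_SIZE - 1 {
            visualization_string.push(',');
        }
    }
    visualization_string.push(']');
    visualization_string
}

fn main() {
    // "[A,B,C,D,E]" rather than "[A,B,C,D,E,]"
    assert_eq!(visualize(['A', 'B', 'C', 'D', 'E']), "[A,B,C,D,E]");
}
```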
Co-Authored-By: Antoine James <antoine@ethereum.org>
#7603
#### Custody backfill sync service
Similar in many ways to the current backfill service. There may be ways to unify the two services. The difficulty there is that the current backfill service tightly couples blocks and their associated blobs/data columns. Any attempts to unify the two services should be left to a separate PR in my opinion.
#### `SyncNetworkContext`
`SyncNetworkContext` manages custody sync data-columns-by-range requests separately from other sync RPC requests. I think this is a nice separation considering that custody backfill is its own service.
#### Data column import logic
The import logic verifies KZG commitments and that each data column's block root matches the block root in the node's store before importing the columns.
#### New channel to send messages to `SyncManager`
Now external services can communicate with the `SyncManager`. In this PR this channel is used to trigger a custody sync. Alternatively we may be able to use the existing `mpsc` channel that the `SyncNetworkContext` uses to communicate with the `SyncManager`. I will spend some time reviewing this.
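Roughly, the wiring looks like this. A minimal sketch using `tokio::sync::mpsc`; the `SyncManagerMessage` type and its variant are hypothetical stand-ins, not the real enum:
```rust
use tokio::sync::mpsc;

// Hypothetical message type for illustration; the real enum and variants differ.
enum SyncManagerMessage {
    StartCustodyBackfill { custody_group_count: u64 },
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel::<SyncManagerMessage>();

    // An external service holds the sender side and can trigger a custody sync.
    tx.send(SyncManagerMessage::StartCustodyBackfill { custody_group_count: 11 })
        .expect("sync manager receiver alive");

    // The `SyncManager` event loop would poll this receiver alongside its
    // existing channels.
    if let Some(SyncManagerMessage::StartCustodyBackfill { custody_group_count }) = rx.recv().await {
        println!("triggering custody backfill for cgc {custody_group_count}");
    }
}
```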
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
N/A
1. In the batch retry logic, we were failing to set the batch state to `AwaitingDownload` before attempting a retry. This PR sets it to `AwaitingDownload` before the retry and sets it back to `Downloading` if the retry succeeds in sending out a request
2. Removes all peer scoring logic from retrying and relies on just de-prioritizing the failed peer. I finally concede the point to @dapplion 😄
3. Changes `block_components_by_range_request` to accept `block_peers` and `column_peers`. This is to ensure that we use the full synced peerset for requesting columns in order to avoid splitting the column peers among multiple head chains. During forward sync, we want the block peers to be the peers from the syncing chain and column peers to be all synced peers from the peerdb.
Also, fixes a typo and calls `attempt_send_awaiting_download_batches` from more places
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
I was looking into some long `PendingComponents` spans and noticed the block event wasn't added to the span, so it wasn't possible to see from the trace view when the block was added. This PR fixes that.
<img width="637" height="430" alt="image" src="https://github.com/user-attachments/assets/65040b1c-11e7-43ac-951b-bdfb34b665fb" />
Additionally, I've noticed a lot of noise and confusion in sync logs because the initial `peer_id` was included as part of the syncing chain span, causing all logs under the span to carry that `peer_id`, which may not be accurate for some sync logs. I've removed `peer_id` from the `SyncingChain` span, and also cleaned up a bunch of spans to use `%` (Display) for slots and epochs to make logs easier to read.
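For reference, this is the kind of span cleanup meant here. A minimal sketch (the span name matches, the field set and `Epoch` stand-in are illustrative) showing `%` (Display) fields and no `peer_id` on the chain-wide span:
```rust
use std::fmt;
use tracing::{debug, debug_span};

// Stand-in for the real `Epoch` newtype.
struct Epoch(u64);

impl fmt::Display for Epoch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

fn advance_chain(chain_id: u64, epoch: Epoch) {
    // `%epoch` records the field via Display ("42") instead of Debug
    // ("Epoch(42)"); `peer_id` is deliberately not a field of this span, so
    // logs emitted under it no longer carry a possibly stale peer id.
    let span = debug_span!("SyncingChain", id = chain_id, epoch = %epoch);
    let _guard = span.enter();

    debug!("Syncing chain advanced");
}
```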
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
N/A
Extracts (3) from https://github.com/sigp/lighthouse/pull/7946.
Prior to PeerDAS, a batch should never have been in the `AwaitingDownload` state because we immediately try to move from `AwaitingDownload` to `Downloading` by sending batches. This was always possible in the pre-PeerDAS world as long as we had peers in the `SyncingChain`.
However, this is no longer the case, as a batch can be stuck waiting in the `AwaitingDownload` state if we have no peers to request the columns from. This PR makes `AwaitingDownload` an allowable in-between state. If a batch is found in this state, we attempt to send the batch instead of erroring like before.
Note to reviewer: we need to make sure that this doesn't lead to a bunch of batches stuck in the `AwaitingDownload` state when the chain could otherwise progress.
Backfill already retries all batches in AwaitingDownload state so we just need to make `AwaitingDownload` a valid state during processing and validation.
This PR explicitly adds the same logic for forward sync to download batches stuck in `AwaitingDownload`.
Apart from that, we also force download of the `processing_target` when sync stops progressing. This is required in cases where `self.batches` has more than `BATCH_BUFFER_SIZE` batches waiting to be processed but the `processing_batch` has repeatedly failed at the download/processing stage. That situation leads to sync getting stuck and never recovering.
Closes:
- #7865
- #7855
Changes extracted from earlier PR #7876
This PR fixes two main things with a few other improvements mentioned below:
- Prevent Lighthouse from repeatedly sending `DataColumnByRoot` requests to an unsynced peer, causing lookup sync to get stuck
- Allows Lighthouse to send discovery requests if there aren't enough **synced** peers in the required sampling subnets - this fixes the stuck sync scenario where there aren't enough usable peers in the sampling subnets but no discovery is attempted.
- Make peer discovery queries if custody subnet peer count drops below the minimum threshold
- Update peer pruning logic to prioritise uniform distribution across all data column subnets and avoid pruning sampling peers if the count is below the target threshold (2)
- Check sync status when making discovery requests, to make sure we don't ignore requests if there aren't enough synced peers in the required sampling subnets
- Optimise some of the `PeerDB` functions checking custody peers
- Only send lookup requests to peers that are synced or advanced
#7815
- removes all existing spans, so some span fields that appear in logs like `service_name` may be lost.
- instruments a few key code paths in the beacon node, starting from **root spans** named below:
* Gossip block and blobs
  * `process_gossip_data_column_sidecar`
  * `process_gossip_blob`
  * `process_gossip_block`
* Rpc block and blobs
  * `process_rpc_block`
  * `process_rpc_blobs`
  * `process_rpc_custody_columns`
* Rpc blocks (range and backfill)
  * `process_chain_segment`
* `PendingComponents` lifecycle
  * `pending_components`
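As an illustration of the pattern (the actual attributes and fields used in the PR may differ), a root span like `process_rpc_block` can be added with `tracing::instrument`:
```rust
use tracing::instrument;

struct RpcBlock; // placeholder for the real block type

// `skip_all` keeps large arguments out of the span fields; `block_root` is
// recorded via Display so the whole import path hangs off one span in the
// trace view.
#[instrument(name = "process_rpc_block", skip_all, fields(block_root = %block_root))]
fn process_rpc_block(block_root: String, _block: RpcBlock) {
    // verification, import and da_checker interaction happen under this span
}

fn main() {
    process_rpc_block("0xabc123".to_string(), RpcBlock);
}
```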
To test locally:
* Run Grafana and Tempo with https://github.com/sigp/lighthouse-metrics/pull/57
* Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317`
Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg
Removing the old spans seems to have reduced memory usage quite a lot - I think we were using them on long-running tasks and too liberally:
<img width="910" height="495" alt="image" src="https://github.com/user-attachments/assets/5208bbe4-53b2-4ead-bc71-0b782c788669" />
This PR fixes a bug where wrong columns could get processed immediately after a CGC increase.
Scenario:
- The node's CGC increased due to additional validators attached to it (let's say from 10 to 11)
- The new CGC is advertised and the new subnets are subscribed to immediately; however, the change won't take effect in the data availability check until the next epoch (see [this](ab0e8870b4/beacon_node/beacon_chain/src/validator_custody.rs (L93-L99))). The data availability checker still only requires 10 columns for the current epoch.
- During this time, data columns for the additional custody column (let's say column 11) may arrive via gossip since we're already subscribed to the topic. Such a column may be incorrectly used to satisfy the existing data availability requirement (10 columns), so this additional column gets persisted instead of a required one, resulting in database inconsistency.
Which issue # does this PR address?
Closes #7604
Improvements to range sync including:
1. Restrict column requests to peers that are part of the `SyncingChain`
2. Attribute the fault to the correct peer and downscore them if they don't return the data columns for the request
3. Improve sync performance by retrying only the failed columns from other peers instead of failing the entire batch
4. Uses the `earliest_available_slot` to make requests only to peers that claim to have the epoch. Note: if no `earliest_available_slot` info is available, fall back to the previous logic, i.e. assume the peer has everything backfilled up to the WS checkpoint/DA boundary (see the sketch below)
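A minimal sketch of the selection rule in (4), with hypothetical field and function names rather than the real peerdb API:
```rust
// Hypothetical, simplified peer record for illustration only.
struct SyncedPeer {
    /// Oldest slot the peer advertises as available, if known.
    earliest_available_slot: Option<u64>,
}

/// A peer is a candidate for a batch starting at `batch_start_slot` if it
/// claims data back to that slot; with no info, fall back to assuming it is
/// backfilled up to the WS checkpoint / DA boundary (`assumed_earliest_slot`).
fn can_serve_batch(peer: &SyncedPeer, batch_start_slot: u64, assumed_earliest_slot: u64) -> bool {
    match peer.earliest_available_slot {
        Some(earliest) => earliest <= batch_start_slot,
        None => assumed_earliest_slot <= batch_start_slot,
    }
}

fn main() {
    let peer = SyncedPeer { earliest_available_slot: Some(1_000) };
    assert!(can_serve_batch(&peer, 5_000, 0));
    assert!(!can_serve_batch(&peer, 500, 0));
}
```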
Tested this on fusaka-devnet-2 with a full node and a supernode, and the recovery logic seems to work well.
Also tested this a little on mainnet.
Need to do more testing and possibly add some unit tests.
Fixes #7155.
It turns out the issue is caused by calling a function that creates an info span (`chain.id()` here), e.g.
```rust
debug!(id = chain.id(), ?sync_type, reason = ?remove_reason, op, "Chain removed");
```
I've removed all unneeded spans, especially on getter functions - there's little reason for spans there and those getters often get used in logging. We should also revisit all the spans after the release - I think we could make them more useful than they are today.
I've let it run for a while and no longer seeing any `DEBUG` logs.
I think this should resolve #7155.
This removes the `level` field from the instrumentation we were doing across a range of functions. The level will now default to the level of the log.
Resolves#6767
This PR implements a basic version of validator custody.
- It introduces a new `CustodyContext` object which contains info regarding number of validators attached to a node and the custody count they contribute to the cgc.
- The `CustodyContext` is added in the da_checker and has methods for returning the current cgc and the number of columns to sample at head. Note that the logic for returning the cgc existed previously in the network globals.
- To estimate the number of validators attached, we use the `beacon_committee_subscriptions` endpoint. This might overestimate the number of validators actually publishing attestations from the node in the case of multi BN setups. We could also potentially use the `publish_attestations` endpoint to get a more conservative estimate at a later point.
- Anytime there's a change in the `custody_group_count` due to addition/removal of validators, the custody context should send an event on a broadcast channel. The only subscriber for the channel currently exists in the network service, which simply subscribes to more subnets. There can be additional subscribers in sync that will start a backfill once the cgc changes (sketched below).
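A minimal sketch of the broadcast-channel wiring described above, using `tokio::sync::broadcast` with a hypothetical event type (the real event and channel plumbing may differ):
```rust
use tokio::sync::broadcast;

// Hypothetical event type for illustration only.
#[derive(Clone, Debug)]
struct CustodyCountChanged {
    new_custody_group_count: u64,
}

#[tokio::main]
async fn main() {
    let (tx, mut network_rx) = broadcast::channel::<CustodyCountChanged>(8);
    // A second subscriber (e.g. sync) can be added later to trigger backfill.
    let mut sync_rx = tx.subscribe();

    // CustodyContext side: validator count changed, cgc recomputed.
    tx.send(CustodyCountChanged { new_custody_group_count: 11 }).unwrap();

    // Network service side: subscribe to the additional subnets.
    let event = network_rx.recv().await.unwrap();
    println!("subscribing to subnets for cgc {}", event.new_custody_group_count);

    // Future sync subscriber: kick off a backfill for the new columns.
    let _ = sync_rx.recv().await.unwrap();
}
```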
TODO
- [ ] **NOT REQUIRED:** Currently, the logic only handles an increase in validator count and does not handle a decrease. We should ideally unsubscribe from subnets when the cgc has decreased.
- [ ] **NOT REQUIRED:** Add a service in the `CustodyContext` that emits an event once `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS` passes after updating the current cgc. This event should be picked up by a subscriber which updates the ENR and metadata.
- [x] Add more tests
- Re-opens https://github.com/sigp/lighthouse/pull/6864 targeting unstable
Range sync and backfill sync still assume that each batch request is served by a single peer. This assumption breaks with PeerDAS, where we request custody columns from N peers.
Issues with current unstable:
- Peer prioritization counts batch requests per peer. This accounting is now broken: data columns by range requests are not accounted for
- Peer selection for data columns by range ignores the set of peers on a syncing chain and instead draws from the global pool of peers
- The implementation is very strict when we have no peers to request from. After PeerDAS this case is very common, and we want to be more flexible and handle it better than just hard failing everything.
- [x] Upstream peer prioritization to the network context, which knows exactly how many active requests a peer has (including columns by range)
- [x] Upstream peer selection to the network context, now `block_components_by_range_request` gets a set of peers to choose from instead of a single peer. If it can't find a peer, it returns the error `RpcRequestSendError::NoPeer`
- [ ] Range sync and backfill sync handle `RpcRequestSendError::NoPeer` explicitly
- [ ] Range sync: leaves the batch in `AwaitingDownload` state and does nothing. **TODO**: we should have some mechanism to fail the chain if it's stale for too long - **EDIT**: Not done in this PR
- [x] Backfill sync: pauses the sync until another peer joins - **EDIT**: Same logic as unstable
### TODOs
- [ ] Add tests :)
- [x] Manually test backfill sync
Note: this touches the mainnet path!
While debugging a sync issue from @pawanjay176, I found I was missing some key info: instead of logging the ID of the `SyncingChain` we just log "Finalized" (the sync type). This looks like a typo, or something was lost in translation when refactoring.
```
Apr 17 12:12:00.707 DEBUG Syncing new finalized chain chain: Finalized, component: "range_sync"
```
This log should include more info about the new chain, but it just logs "Finalized":
```
Apr 17 12:12:00.810 DEBUG New chain added to sync peer_id: "16Uiu2HAmHP8QLYQJwZ4cjMUEyRgxzpkJF87qPgNecLTpUdruYbdA", sync_type: Finalized, new_chain: Finalized, component: "range_sync"
```
- Remove the Display impl and log the ID explicitly for all logs.
- Log more details when creating a new SyncingChain
Currently, range sync batch download errors just say `error: rpc_error`, which isn't helpful. The actual error is suppressed unless logged somewhere else.
Log the actual error that caused the batch download to fail as part of the log that states that the batch download failed.
Currently we track a key metric `PEERS_PER_COLUMN_SUBNET` in a getter `good_peers_on_sampling_subnets`. Another PR https://github.com/sigp/lighthouse/pull/6922 deletes that function, so we have to move the metric anyway. This PR moves that metric computation to the spawned metrics task, which is refreshed every 5 seconds.
I also added a few more useful metrics. The total set and intended usage is:
- `sync_peers_per_column_subnet`: Track health of overall subnets in your node
- `sync_peers_per_custody_column_subnet`: Track health of the subnets your node needs. We should track this metric closely in our dashboards with a heatmap and bar plot
- ~~`sync_column_subnets_with_zero_peers`: Is equivalent to the Grafana query `count(sync_peers_per_column_subnet == 0) by (instance)`. We may prefer to skip it, but I believe it's the most important metric as if `sync_column_subnets_with_zero_peers > 0` your node stalls.~~
- ~~`sync_custody_column_subnets_with_zero_peers`: `count(sync_peers_per_custody_column_subnet == 0) by (instance)`~~
Currently we have very poor unit-test coverage of range sync. With the event-driven test framework we can cover much more ground and be confident when modifying the code.
Add two basic cases:
- Happy path, complete a finalized sync for 2 epochs
- Post-PeerDAS case where we start without enough custody peers and later we find enough
⚠️ If you have ideas for more test cases, please let me know! I'll write them
- PR https://github.com/sigp/lighthouse/pull/6497 made obsolete some consistency checks inside the batch
I forgot to remove the consumers of those errors
Remove the unused batch sync error condition, which was a nested `Result<_, Result<_, E>>`.
Part of
- https://github.com/sigp/lighthouse/issues/6258
To address PeerDAS sync issues we need to make individual by_range requests within a batch retriable. We should adopt the same pattern for lookup sync, where each request (block/blobs/columns) is tracked individually within a "meta" request that groups them all and handles retry logic.
- Building on https://github.com/sigp/lighthouse/pull/6398
The second step is to add individual request accumulators for `blocks_by_range`, `blobs_by_range`, and `data_columns_by_range`. This will allow each request to progress independently and be retried separately.
Most of the logic is just piping, excuse the large diff. This PR does not change the logic of how requests are handled or retried. This will be done in a future PR changing the logic of `RangeBlockComponentsRequest`.
### Before
- Sync manager receives block with `SyncRequestId::RangeBlockAndBlobs`
- Insert block into `SyncNetworkContext::range_block_components_requests`
- (If received stream terminators of all requests)
- Return `Vec<RpcBlock>`, and insert into `range_sync`
### Now
- Sync manager receives block with `SyncRequestId::RangeBlockAndBlobs`
- Insert block into `SyncNetworkContext::blocks_by_range_requests`
- (If received stream terminator of this request)
- Return `Vec<SignedBlock>`, and insert into `SyncNetworkContext::components_by_range_requests`
- (If received a result for all requests)
- Return `Vec<RpcBlock>`, and insert into `range_sync`
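Schematically, the per-request accumulators look something like this (simplified, hypothetical types; the real `SyncNetworkContext` structures differ):
```rust
// Placeholder item types standing in for the real SSZ types.
#[derive(Default)]
struct SignedBlockStub;
#[derive(Default)]
struct BlobStub;
#[derive(Default)]
struct DataColumnStub;

/// Accumulates responses for one by_range request until its stream terminator
/// arrives; it can progress (or later be retried) independently of the others.
#[derive(Default)]
struct ByRangeAccumulator<T> {
    items: Vec<T>,
    finished: bool,
}

/// The "meta" request grouping a batch's independent by_range requests. Only
/// once all parts are finished are they coupled into `Vec<RpcBlock>` and
/// handed to range sync.
#[derive(Default)]
struct ComponentsByRangeRequest {
    blocks: ByRangeAccumulator<SignedBlockStub>,
    blobs: ByRangeAccumulator<BlobStub>,
    data_columns: ByRangeAccumulator<DataColumnStub>,
}

impl ComponentsByRangeRequest {
    fn all_finished(&self) -> bool {
        self.blocks.finished && self.blobs.finished && self.data_columns.finished
    }
}
```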
Currently, we set the `chain_id` of range sync chains to `u64(hash(target_root, target_slot))`, which results in a long integer.
```
Jan 27 00:43:27.246 DEBG Batch downloaded, chain: 4223372036854775807, awaiting_batches: 0, batch_state: [p,E,E,E,E], blocks: 0, epoch: 0, service: range_sync
```
Instead, we can use `network_context.next_id()` as we do for all other sync items and get a unique sequential (not too big) integer as id.
```
Jan 27 00:43:27.246 DEBG Batch downloaded, chain: 4, awaiting_batches: 0, batch_state: [p,E,E,E,E], blocks: 0, epoch: 0, service: range_sync
```
Also, if a specific chain for the same target is retried later, it won't get the same ID so we can more clearly differentiate the logs associated with each attempt.
* Write range sync tests in external event-driven form
* Fix remaining test
* Drop unused generics
* Merge branch 'unstable' into range-sync-tests
* Add reference to test author
* Use async await
* Fix failing test. Not sure how it was passing before without an EL.
* add fork_name_enabled fn to Forkname impl
* refactor codebase to use new fork_enabled fn
* fmt
* Merge branch 'unstable' of https://github.com/sigp/lighthouse into fork-ord-impl
* small code cleanup
* resolve merge conflicts
* fix beacon chain test
* merge conflicts
* fix ef test issue
* resolve merge conflicts
* Improve `get_custody_columns` validation, caching and error handling.
* Merge branch 'unstable' into get-custody-columns-error-handing
* Fix failing test and add more test.
* Fix failing test and add more test.
* Merge branch 'unstable' into get-custody-columns-error-handing
# Conflicts:
# beacon_node/lighthouse_network/src/discovery/subnet_predicate.rs
# beacon_node/lighthouse_network/src/peer_manager/peerdb.rs
# beacon_node/lighthouse_network/src/peer_manager/peerdb/peer_info.rs
# beacon_node/lighthouse_network/src/types/globals.rs
# beacon_node/network/src/service.rs
# consensus/types/src/data_column_subnet_id.rs
* Add unit test to make sure the default specs won't panic on the `compute_custody_requirement_subnets` function.
* Add condition when calling `compute_custody_subnets_from_metadata` and update logs.
* Validate `csc` when returning from enr. Remove `csc` computation on connection since we get them on metadata anyway.
* Add `peers_per_custody_subnet_count` to track peer csc and supernodes.
* Disconnect peers with invalid metadata and find other peers instead.
* Fix sampling tests.
* Merge branch 'unstable' into get-custody-columns-error-handing
* Merge branch 'unstable' into get-custody-columns-error-handing
* 1D PeerDAS prototype: Data format and Distribution (#5050)
* Build and publish column sidecars. Add stubs for gossip.
* Add blob column subnets
* Add `BlobColumnSubnetId` and initial compute subnet logic.
* Subscribe to blob column subnets.
* Introduce `BLOB_COLUMN_SUBNET_COUNT` based on DAS configuration parameter changes.
* Fix column sidecar type to use `VariableList` for data.
* Fix lint errors.
* Update types and naming to latest consensus-spec #3574.
* Fix test and some cleanups.
* Merge branch 'unstable' into das
* Merge branch 'unstable' into das
* Merge branch 'unstable' into das
# Conflicts:
# consensus/types/src/chain_spec.rs
* Add `DataColumnSidecarsByRoot ` req/resp protocol (#5196)
* Add stub for `DataColumnsByRoot`
* Add basic implementation of serving RPC data column from DA checker.
* Store data columns in early attester cache and blobs db.
* Apply suggestions from code review
Co-authored-by: Eitan Seri-Levi <eserilev@gmail.com>
Co-authored-by: Jacob Kaufmann <jacobkaufmann18@gmail.com>
* Fix build.
* Store `DataColumnInfo` in database and various cleanups.
* Update `DataColumnSidecar` ssz max size and remove panic code.
---------
Co-authored-by: Eitan Seri-Levi <eserilev@gmail.com>
Co-authored-by: Jacob Kaufmann <jacobkaufmann18@gmail.com>
* feat: add DAS KZG in data col construction (#5210)
* feat: add DAS KZG in data col construction
* refactor data col sidecar construction
* refactor: add data cols to GossipVerifiedBlockContents
* Disable windows tests for `das` branch. (c-kzg doesn't build on windows)
* Formatting and lint changes only.
* refactor: remove iters in construction of data cols
* Update vec capacity and error handling.
* Add `data_column_sidecar_computation_seconds` metric.
---------
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
* Merge branch 'unstable' into das
# Conflicts:
# .github/workflows/test-suite.yml
# beacon_node/lighthouse_network/src/types/topics.rs
* fix: update data col subnet count from 64 to 32 (#5413)
* feat: add peerdas custody field to ENR (#5409)
* feat: add peerdas custody field to ENR
* add hash prefix step in subnet computation
* refactor test and fix possible u64 overflow
* default to min custody value if not present in ENR
* Merge branch 'unstable' into das
* Merge branch 'unstable' into das-unstable-merge-0415
# Conflicts:
# Cargo.lock
# beacon_node/beacon_chain/src/data_availability_checker.rs
# beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs
# beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
# beacon_node/beacon_chain/src/data_availability_checker/processing_cache.rs
# beacon_node/lighthouse_network/src/rpc/methods.rs
# beacon_node/network/src/network_beacon_processor/mod.rs
# beacon_node/network/src/sync/block_lookups/tests.rs
# crypto/kzg/Cargo.toml
* Merge remote-tracking branch 'sigp/unstable' into das
* Merge remote-tracking branch 'sigp/unstable' into das
* Fix merge conflicts.
* Send custody data column to `DataAvailabilityChecker` for determining block importability (#5570)
* Only import custody data columns after publishing a block.
* Add `subscribe-all-data-column-subnets` and pass custody column count to `availability_cache`.
* Add custody requirement checks to `availability_cache`.
* Fix config not being passed to DAChecker and add more logging.
* Introduce `peer_das_epoch` and make blobs and columns mutually exclusive.
* Add DA filter for PeerDAS.
* Fix data availability check and use test_logger in tests.
* Fix subscribe to all data column subnets not working correctly.
* Fix tests.
* Only publish column sidecars if PeerDAS is activated. Add `PEER_DAS_EPOCH` chain spec serialization.
* Remove unused data column index in `OverflowKey`.
* Fix column sidecars incorrectly produced when there are no blobs.
* Re-instate index to `OverflowKey::DataColumn` and downgrade noisy debug log to `trace`.
* DAS sampling on sync (#5616)
* Data availability sampling on sync
* Address @jimmygchen review
* Trigger sampling
* Address some review comments and only send `SamplingBlock` sync message after PEER_DAS_EPOCH.
---------
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
* Merge branch 'unstable' into das
# Conflicts:
# Cargo.lock
# Cargo.toml
# beacon_node/beacon_chain/src/block_verification.rs
# beacon_node/http_api/src/publish_blocks.rs
# beacon_node/lighthouse_network/src/rpc/codec/ssz_snappy.rs
# beacon_node/lighthouse_network/src/rpc/protocol.rs
# beacon_node/lighthouse_network/src/types/pubsub.rs
# beacon_node/network/src/sync/block_lookups/single_block_lookup.rs
# beacon_node/store/src/hot_cold_store.rs
# consensus/types/src/beacon_state.rs
# consensus/types/src/chain_spec.rs
# consensus/types/src/eth_spec.rs
* Merge branch 'unstable' into das
* Re-process early sampling requests (#5569)
* Re-process early sampling requests
# Conflicts:
# beacon_node/beacon_processor/src/work_reprocessing_queue.rs
# beacon_node/lighthouse_network/src/rpc/methods.rs
# beacon_node/network/src/network_beacon_processor/rpc_methods.rs
* Update beacon_node/beacon_processor/src/work_reprocessing_queue.rs
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
* Add missing var
* Beta compiler fixes and small typo fixes.
* Remove duplicate method.
---------
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
* Merge remote-tracking branch 'sigp/unstable' into das
* Fix merge conflict.
* Add data columns by root to currently supported protocol list (#5678)
* Add data columns by root to currently supported protocol list.
* Add missing data column by roots handling.
* Merge branch 'unstable' into das
# Conflicts:
# Cargo.lock
# Cargo.toml
# beacon_node/network/src/sync/block_lookups/tests.rs
# beacon_node/network/src/sync/manager.rs
* Fix simulator tests on `das` branch (#5731)
* Bump genesis delay in sim tests as KZG setup takes longer for DAS.
* Fix incorrect YAML spacing.
* DataColumnByRange boilerplate (#5353)
* add boilerplate
* fmt
* PeerDAS custody lookup sync (#5684)
* Implement custody sync
* Lint
* Fix tests
* Fix rebase issue
* Add data column kzg verification and update `c-kzg`. (#5701)
* Add data column kzg verification and update `c-kzg`.
* Fix incorrect `Cell` size.
* Add kzg verification on rpc blocks.
* Add kzg verification on rpc data columns.
* Rename `PEER_DAS_EPOCH` to `EIP7594_FORK_EPOCH` for client interop. (#5750)
* Fetch custody columns in range sync (#5747)
* Fetch custody columns in range sync
* Clean up todos
* Remove `BlobSidecar` construction and publish after PeerDAS activated (#5759)
* Avoid building and publishing blob sidecars after PeerDAS.
* Ignore gossip blobs with a slot greater than peer das activation epoch.
* Only attempt to verify blob count and import blobs before PeerDAS.
* #5684 review comments (#5748)
* #5684 review comments.
* Doc and message update only.
* Fix incorrect condition when constructing `RpcBlock` with `DataColumn`s
* Make sampling tests deterministic (#5775)
* PeerDAS spec tests (#5772)
* Add get_custody_columns spec tests.
* Add kzg merkle proof spec tests.
* Add SSZ spec tests.
* Add remaining KZG tests
* Load KZG only once per process, exclude electra tests and add missing SSZ tests.
* Fix lint and missing changes.
* Ignore macOS generated file.
* Merge remote branch 'sigp/unstable' into das
* Merge remote tracking branch 'origin/unstable' into das
* Implement unconditional reconstruction for supernodes (#5781)
* Implement unconditional reconstruction for supernodes
* Move code into KzgVerifiedCustodyDataColumn
* Remove expect
* Add test
* Thanks justin
* Add withhold attack mode for interop (#5788)
* Add withhold attack mode
* Update readme
* Drop added readmes
* Undo styling changes
* Add column gossip verification and handle unknown parent block (#5783)
* Add column gossip verification and handle missing parent for columns.
* Review PR
* Fix rebase issue
* more lint issues :)
---------
Co-authored-by: dapplion <35266934+dapplion@users.noreply.github.com>
* Trigger sampling on sync events (#5776)
* Trigger sampling on sync events
* Update beacon_chain.rs
* Fix tests
* Fix tests
* PeerDAS parameter changes for devnet-0 (#5779)
* Update PeerDAS parameters to latest values.
* Lint fix
* Fix lint.
* Update hardcoded subnet count to 64 (#5791)
* Fix incorrect columns per subnet and config cleanup (#5792)
* Tidy up PeerDAS preset and config values.
* Fix broken config
* Fix DAS branch CI (#5793)
* Fix invalid syntax.
* Update cli doc. Ignore get_custody_columns test temporarily.
* Fix failing test and add verify inclusion test.
* Undo accidentally removed code.
* Only attempt reconstruct columns once. (#5794)
* Re-enable precompute table for peerdas kzg (#5795)
* Merge branch 'unstable' into das
* Update subscription filter. (#5797)
* Remove penalty for duplicate columns (expected due to reconstruction) (#5798)
* Revert DAS config for interop testing. Optimise get_custody_columns function. (#5799)
* Don't perform reconstruction for proposer node as it already has all the columns. (#5806)
* Multithread compute_cells_and_proofs (#5805)
* Multi-thread reconstruct data columns
* Multi-thread path for block production
* Merge branch 'unstable' into das
# Conflicts:
# .github/workflows/test-suite.yml
# beacon_node/network/src/sync/block_lookups/mod.rs
# beacon_node/network/src/sync/block_lookups/single_block_lookup.rs
# beacon_node/network/src/sync/network_context.rs
* Fix CI errors.
* Move PeerDAS type-level config to configurable `ChainSpec` (#5828)
* Move PeerDAS type level config to `ChainSpec`.
* Fix tests
* Misc custody lookup improvements (#5821)
* Improve custody requests
* Type DataColumnsByRootRequestId
* Prioritize peers and load balance
* Update tests
* Address PR review
* Merge branch 'unstable' into das
* Rename deploy_block in network config (`das` branch) (#5852)
* Rename deploy_block.txt to deposit_contract_block.txt
* fmt
---------
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
* Merge branch 'unstable' into das
* Fix CI and merge issues.
* Merge branch 'unstable' into das
# Conflicts:
# beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
# lcli/src/main.rs
* Store data columns individually in store and caches (#5890)
* Store data columns individually in store and caches
* Implement data column pruning
* Merge branch 'unstable' into das
# Conflicts:
# Cargo.lock
* Update reconstruction benches to newer criterion version. (#5949)
* Merge branch 'unstable' into das
# Conflicts:
# .github/workflows/test-suite.yml
* chore: add `recover_cells_and_compute_proofs` method (#5938)
* chore: add recover_cells_and_compute_proofs method
* Introduce type alias `CellsAndKzgProofs` to address type complexity.
---------
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
* Update `csc` format in ENR and spec tests for devnet-1 (#5966)
* Update `csc` format in ENR.
* Add spec tests for `recover_cells_and_kzg_proofs`.
* Add tests for ENR.
* Fix failing tests.
* Add protection against invalid csc value in ENR.
* Fix lint
* Fix csc encoding and decoding (#5997)
* Fix data column rpc request not being sent due to incorrect limits set. (#6000)
* Fix incorrect inbound request count causing rate limiting. (#6025)
* Merge branch 'stable' into das
# Conflicts:
# beacon_node/network/src/sync/block_lookups/tests.rs
# beacon_node/network/src/sync/block_sidecar_coupling.rs
# beacon_node/network/src/sync/manager.rs
# beacon_node/network/src/sync/network_context.rs
# beacon_node/network/src/sync/network_context/requests.rs
* Merge remote-tracking branch 'unstable' into das
* Add kurtosis config for DAS testing (#5968)
* Add kurtosis config for DAS testing.
* Fix invalid yaml file
* Update network parameter files.
* chore: add rust PeerdasKZG crypto library for peerdas functionality and rollback c-kzg dependency to 4844 version (#5941)
* chore: add recover_cells_and_compute_proofs method
* chore: add rust peerdas crypto library
* chore: integrate peerdaskzg rust library into kzg crate
* chore(multi):
- update `ssz_cell_to_crypto_cell`
- update conversion from the crypto cell type to a Vec<u8>. Since the Rust library defines them as references to an array, the conversion is simply `to_vec`
* chore(multi):
- update rest of code to handle the new crypto `Cell` type
- update test case code to no longer use the Box type
* chore: cleanup of superfluous conversions
* chore: revert c-kzg dependency back to v1
* chore: move dependency into correct order
* chore: update rust dependency
- This version includes a new method `PeerDasContext::with_num_threads`
* chore: remove Default initialization of PeerDasContext and explicitly set the parameters in `new_from_trusted_setup`
* chore: cleanup exports
* chore: commit updated cargo.lock
* Update Cargo.toml
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
* chore: rename dependency
* chore: update peerdas lib
- sets the blst version to 0.3 so that it matches whatever lighthouse is using. Although 0.3.12 is latest, lighthouse is pinned to 0.3.3
* chore: fix clippy lifetime
- Rust doesn't allow you to elide the lifetime on type aliases
* chore: cargo clippy fix
* chore: cargo fmt
* chore: update lib to add redundant checks (these will be removed in consensus-specs PR 3819)
* chore: update dependency to ignore proofs
* chore: update peerdas lib to latest
* update lib
* chore: remove empty proof parameter
---------
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
* Update PeerDAS interop testnet config (#6069)
* Update interop testnet config.
* Fix typo and remove target peers
* Avoid retrying same sampling peer that previously failed. (#6084)
* Various fixes to custody range sync (#6004)
* Only start requesting batches when there are good peers across all custody columns to avoid spaming block requests.
* Add custody peer check before mutating `BatchInfo` to avoid inconsistent state.
* Add check to cover a case where batch is not processed while waiting for custody peers to become available.
* Fix lint and logic error
* Fix `good_peers_on_subnet` always returning false for `DataColumnSubnet`.
* Add test for `get_custody_peers_for_column`
* Revert epoch parameter refactor.
* Fall back to default custody requiremnt if peer ENR is not present.
* Add metrics and update code comment.
* Add more debug logs.
* Use subscribed peers on subnet before MetaDataV3 is implemented. Remove peer_id matching when injecting error because multiple peers are used for range requests. Use randomized custodial peer to avoid repeatedly sending requests to failing peers. Batch by range request where possible.
* Remove unused code and update docs.
* Add comment
* chore: update peerdas-kzg library (#6118)
* chore: update peerDAS lib
* chore: update library
* chore: update library to version that include "init context" benchmarks and optional validation checks
* chore: (can remove) -- Add benchmarks for init context
* Prevent continuous searchers for low-peer networks (#6162)
* Merge branch 'unstable' into das
* Fix merge conflicts
* Add cli flag to enable sampling and disable by default. (#6209)
* chore: Use reference to an array representing a blob instead of an owned KzgBlob (#6179)
* add KzgBlobRef type
* modify code to use KzgBlobRef
* clippy
* Remove Deneb blob related changes to maintain compatibility with `c-kzg-4844`.
---------
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
* Store computed custody subnets in PeerDB and fix custody lookup test (#6218)
* Fix failing custody lookup tests.
* Store custody subnets in PeerDB, fix custody lookup test and refactor some methods.
* Merge branch 'unstable' into das
# Conflicts:
# beacon_node/beacon_chain/src/beacon_chain.rs
# beacon_node/beacon_chain/src/block_verification_types.rs
# beacon_node/beacon_chain/src/builder.rs
# beacon_node/beacon_chain/src/data_availability_checker.rs
# beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
# beacon_node/beacon_chain/src/data_column_verification.rs
# beacon_node/beacon_chain/src/early_attester_cache.rs
# beacon_node/beacon_chain/src/historical_blocks.rs
# beacon_node/beacon_chain/tests/store_tests.rs
# beacon_node/lighthouse_network/src/discovery/enr.rs
# beacon_node/network/src/service.rs
# beacon_node/src/cli.rs
# beacon_node/store/src/hot_cold_store.rs
# beacon_node/store/src/lib.rs
# lcli/src/generate_bootnode_enr.rs
* Fix CI failures after merge.
* Batch sampling requests by peer (#6256)
* Batch sampling requests by peer
* Fix clippy errors
* Fix tests
* Add column_index to error message for ease of tracing
* Remove outdated comment
* Fix range sync never evaluating request as finished, causing it to get stuck. (#6276)
* Merge branch 'unstable' into das-0821-merge
# Conflicts:
# Cargo.lock
# Cargo.toml
# beacon_node/beacon_chain/src/beacon_chain.rs
# beacon_node/beacon_chain/src/data_availability_checker.rs
# beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
# beacon_node/beacon_chain/src/data_column_verification.rs
# beacon_node/beacon_chain/src/kzg_utils.rs
# beacon_node/beacon_chain/src/metrics.rs
# beacon_node/beacon_processor/src/lib.rs
# beacon_node/lighthouse_network/src/rpc/codec/ssz_snappy.rs
# beacon_node/lighthouse_network/src/rpc/config.rs
# beacon_node/lighthouse_network/src/rpc/methods.rs
# beacon_node/lighthouse_network/src/rpc/outbound.rs
# beacon_node/lighthouse_network/src/rpc/rate_limiter.rs
# beacon_node/lighthouse_network/src/service/api_types.rs
# beacon_node/lighthouse_network/src/types/globals.rs
# beacon_node/network/src/network_beacon_processor/mod.rs
# beacon_node/network/src/network_beacon_processor/rpc_methods.rs
# beacon_node/network/src/network_beacon_processor/sync_methods.rs
# beacon_node/network/src/sync/block_lookups/common.rs
# beacon_node/network/src/sync/block_lookups/mod.rs
# beacon_node/network/src/sync/block_lookups/single_block_lookup.rs
# beacon_node/network/src/sync/block_lookups/tests.rs
# beacon_node/network/src/sync/manager.rs
# beacon_node/network/src/sync/network_context.rs
# consensus/types/src/data_column_sidecar.rs
# crypto/kzg/Cargo.toml
# crypto/kzg/benches/benchmark.rs
# crypto/kzg/src/lib.rs
* Fix custody tests and load PeerDAS KZG instead.
* Fix ef tests and bench compilation.
* Fix failing sampling test.
* Merge pull request #6287 from jimmygchen/das-0821-merge
Merge `unstable` into `das` 20240821
* Remove get_block_import_status
* Merge branch 'unstable' into das
* Re-enable Windows release tests.
* Address some review comments.
* Address more review comments and cleanups.
* Comment out peer DAS KZG EF tests for now
* Address more review comments and fix build.
* Merge branch 'das' of github.com:sigp/lighthouse into das
* Unignore Electra tests
* Fix metric name
* Address some of Pawan's review comments
* Merge remote-tracking branch 'origin/unstable' into das
* Update PeerDAS network parameters for peerdas-devnet-2 (#6290)
* update subnet count & custody req
* das network params
* update ef tests
---------
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
* Add randomization in sync retry batch peer selection
* Use min
* Apply suggestions from code review
Co-authored-by: João Oliveira <hello@jxs.pt>
* Merge branch 'unstable' into peer-prio