PeerDAS implementation (#5683)

* 1D PeerDAS prototype: Data format and Distribution (#5050)

* Build and publish column sidecars. Add stubs for gossip.

* Add blob column subnets

* Add `BlobColumnSubnetId` and initial compute subnet logic.

* Subscribe to blob column subnets.

* Introduce `BLOB_COLUMN_SUBNET_COUNT` based on DAS configuration parameter changes.

* Fix column sidecar type to use `VariableList` for data.

* Fix lint errors.

* Update types and naming to latest consensus-spec #3574.

* Fix test and some cleanups.
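Column sidecars travel on a fixed number of gossip subnets, set by the `BLOB_COLUMN_SUBNET_COUNT` parameter introduced above. The following is a minimal sketch of the column-to-subnet mapping. It assumes the plain modulo rule of the consensus-specs `compute_subnet_for_data_column_sidecar` helper. `subnet_for_column` and `columns_for_subnet` are hypothetical names, not Lighthouse functions:

```rust
/// Hypothetical helper: map a column index to the gossip subnet that carries it.
/// Assumes the spec's modulo rule (`column_index % subnet_count`).
pub fn subnet_for_column(column_index: u64, subnet_count: u64) -> u64 {
    assert!(subnet_count > 0, "subnet count must be non-zero");
    column_index % subnet_count
}

/// Hypothetical inverse view: every column index carried by `subnet_id`.
pub fn columns_for_subnet(subnet_id: u64, number_of_columns: u64, subnet_count: u64) -> Vec<u64> {
    assert!(subnet_count > 0, "subnet count must be non-zero");
    // Columns on one subnet are spaced `subnet_count` apart, starting at the subnet id.
    (subnet_id..number_of_columns)
        .step_by(subnet_count as usize)
        .collect()
}
```

Because the mapping is a modulo, each subnet carries `number_of_columns / subnet_count` columns. This is why changing the subnet count, as later commits in this log do, also changes the number of columns per subnet.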

* Merge branch 'unstable' into das

* Merge branch 'unstable' into das

* Merge branch 'unstable' into das

# Conflicts:
#	consensus/types/src/chain_spec.rs

* Add `DataColumnSidecarsByRoot` req/resp protocol (#5196)

* Add stub for `DataColumnsByRoot`

* Add basic implementation of serving RPC data column from DA checker.

* Store data columns in early attester cache and blobs db.

* Apply suggestions from code review

Co-authored-by: Eitan Seri-Levi <eserilev@gmail.com>
Co-authored-by: Jacob Kaufmann <jacobkaufmann18@gmail.com>

* Fix build.

* Store `DataColumnInfo` in database and various cleanups.

* Update `DataColumnSidecar` ssz max size and remove panic code.

---------

Co-authored-by: Eitan Seri-Levi <eserilev@gmail.com>
Co-authored-by: Jacob Kaufmann <jacobkaufmann18@gmail.com>

* feat: add DAS KZG in data col construction (#5210)

* feat: add DAS KZG in data col construction

* refactor data col sidecar construction

* refactor: add data cols to GossipVerifiedBlockContents

* Disable Windows tests for `das` branch. (c-kzg doesn't build on Windows)

* Formatting and lint changes only.

* refactor: remove iters in construction of data cols

* Update vec capacity and error handling.

* Add `data_column_sidecar_computation_seconds` metric.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Merge branch 'unstable' into das

# Conflicts:
#	.github/workflows/test-suite.yml
#	beacon_node/lighthouse_network/src/types/topics.rs

* fix: update data col subnet count from 64 to 32 (#5413)

* feat: add peerdas custody field to ENR (#5409)

* feat: add peerdas custody field to ENR

* add hash prefix step in subnet computation

* refactor test and fix possible u64 overflow

* default to min custody value if not present in ENR
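The custody assignment behind the ENR field is derived from the node ID. The node hashes its ID, takes a prefix modulo the subnet count, and increments the ID until it has found enough distinct subnets. It then expands each subnet into its columns. The sketch below works under stated assumptions. Node IDs are `u64` here, whereas the spec uses 256-bit IDs. The hash-prefix step is a caller-supplied closure standing in for the spec's SHA-256 prefix. The function names are hypothetical:

```rust
/// Hypothetical sketch of custody subnet selection (not Lighthouse's API).
/// ASSUMPTIONS: `u64` node IDs (the spec uses 256-bit IDs); `hash_prefix`
/// stands in for the spec's "first 8 bytes of SHA-256" step.
pub fn custody_subnets(
    node_id: u64,
    custody_subnet_count: u64,
    subnet_count: u64,
    hash_prefix: impl Fn(u64) -> u64,
) -> Vec<u64> {
    // Never ask for more distinct subnets than exist, or the loop could not finish.
    let wanted = custody_subnet_count.min(subnet_count) as usize;
    let mut subnets: Vec<u64> = Vec::with_capacity(wanted);
    let mut current_id = node_id;
    // Terminates as long as `hash_prefix` eventually hits every subnet.
    while subnets.len() < wanted {
        let subnet = hash_prefix(current_id) % subnet_count;
        if !subnets.contains(&subnet) {
            subnets.push(subnet);
        }
        // Wrap to 0 at the maximum ID instead of overflowing.
        current_id = current_id.wrapping_add(1);
    }
    subnets
}

/// Expand custody subnets into sorted column indices.
pub fn custody_columns(subnets: &[u64], number_of_columns: u64, subnet_count: u64) -> Vec<u64> {
    let columns_per_subnet = number_of_columns / subnet_count;
    let mut columns: Vec<u64> = (0..columns_per_subnet)
        .flat_map(move |i| subnets.iter().map(move |&s| subnet_count * i + s))
        .collect();
    columns.sort_unstable();
    columns
}
```

`wrapping_add` is one way to rule out the overflow mentioned above. At the maximum ID, the counter returns to 0 instead of panicking in debug builds.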

* Merge branch 'unstable' into das

* Merge branch 'unstable' into das-unstable-merge-0415

# Conflicts:
#	Cargo.lock
#	beacon_node/beacon_chain/src/data_availability_checker.rs
#	beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#	beacon_node/beacon_chain/src/data_availability_checker/processing_cache.rs
#	beacon_node/lighthouse_network/src/rpc/methods.rs
#	beacon_node/network/src/network_beacon_processor/mod.rs
#	beacon_node/network/src/sync/block_lookups/tests.rs
#	crypto/kzg/Cargo.toml

* Merge remote-tracking branch 'sigp/unstable' into das

* Merge remote-tracking branch 'sigp/unstable' into das

* Fix merge conflicts.

* Send custody data column to `DataAvailabilityChecker` for determining block importability (#5570)

* Only import custody data columns after publishing a block.

* Add `subscribe-all-data-column-subnets` and pass custody column count to `availability_cache`.

* Add custody requirement checks to `availability_cache`.

* Fix config not being passed to DAChecker and add more logging.

* Introduce `peer_das_epoch` and make blobs and columns mutually exclusive.

* Add DA filter for PeerDAS.

* Fix data availability check and use test_logger in tests.

* Fix subscribe to all data column subnets not working correctly.

* Fix tests.

* Only publish column sidecars if PeerDAS is activated. Add `PEER_DAS_EPOCH` chain spec serialization.

* Remove unused data column index in `OverflowKey`.

* Fix column sidecars incorrectly produced when there are no blobs.

* Re-instate index to `OverflowKey::DataColumn` and downgrade noisy debug log to `trace`.
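Making blobs and columns mutually exclusive means the data a block needs depends only on two things: its epoch, and whether it commits to any blobs. The following is a hypothetical sketch of that decision, together with the custody check. It ignores the data availability retention window. The names `DataRequirement`, `data_requirement` and `has_all_custody_columns` are made up for illustration:

```rust
use std::collections::HashSet;

/// What a block needs before import (hypothetical; ignores the DA retention window).
#[derive(Debug, PartialEq, Eq)]
pub enum DataRequirement {
    /// Pre-Deneb block, or a block without blob commitments.
    NotRequired,
    /// From Deneb up to (not including) the PeerDAS fork epoch: blob sidecars.
    Blobs,
    /// From the PeerDAS fork epoch onwards: the node's custody columns.
    CustodyColumns,
}

pub fn data_requirement(
    block_epoch: u64,
    blob_commitments: usize,
    deneb_fork_epoch: Option<u64>,
    peer_das_epoch: Option<u64>,
) -> DataRequirement {
    if blob_commitments == 0 {
        // No blobs means no column sidecars either.
        return DataRequirement::NotRequired;
    }
    match (deneb_fork_epoch, peer_das_epoch) {
        // Mutually exclusive: once PeerDAS is active, columns replace blobs.
        (_, Some(das)) if block_epoch >= das => DataRequirement::CustodyColumns,
        (Some(deneb), _) if block_epoch >= deneb => DataRequirement::Blobs,
        _ => DataRequirement::NotRequired,
    }
}

/// Custody check: every column this node custodies has been received.
pub fn has_all_custody_columns(custody: &[u64], received: &HashSet<u64>) -> bool {
    custody.iter().all(|c| received.contains(c))
}
```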

* DAS sampling on sync (#5616)

* Data availability sampling on sync

* Address @jimmygchen review

* Trigger sampling

* Address some review comments and only send `SamplingBlock` sync message after PEER_DAS_EPOCH.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Merge branch 'unstable' into das

# Conflicts:
#	Cargo.lock
#	Cargo.toml
#	beacon_node/beacon_chain/src/block_verification.rs
#	beacon_node/http_api/src/publish_blocks.rs
#	beacon_node/lighthouse_network/src/rpc/codec/ssz_snappy.rs
#	beacon_node/lighthouse_network/src/rpc/protocol.rs
#	beacon_node/lighthouse_network/src/types/pubsub.rs
#	beacon_node/network/src/sync/block_lookups/single_block_lookup.rs
#	beacon_node/store/src/hot_cold_store.rs
#	consensus/types/src/beacon_state.rs
#	consensus/types/src/chain_spec.rs
#	consensus/types/src/eth_spec.rs

* Merge branch 'unstable' into das

* Re-process early sampling requests (#5569)

* Re-process early sampling requests

# Conflicts:
#	beacon_node/beacon_processor/src/work_reprocessing_queue.rs
#	beacon_node/lighthouse_network/src/rpc/methods.rs
#	beacon_node/network/src/network_beacon_processor/rpc_methods.rs

* Update beacon_node/beacon_processor/src/work_reprocessing_queue.rs

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Add missing var

* Beta compiler fixes and small typo fixes.

* Remove duplicate method.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Merge remote-tracking branch 'sigp/unstable' into das

* Fix merge conflict.

* Add data columns by root to currently supported protocol list (#5678)

* Add data columns by root to currently supported protocol list.

* Add missing data column by roots handling.

* Merge branch 'unstable' into das

# Conflicts:
#	Cargo.lock
#	Cargo.toml
#	beacon_node/network/src/sync/block_lookups/tests.rs
#	beacon_node/network/src/sync/manager.rs

* Fix simulator tests on `das` branch (#5731)

* Bump genesis delay in sim tests as KZG setup takes longer for DAS.

* Fix incorrect YAML spacing.

* DataColumnByRange boilerplate (#5353)

* add boilerplate

* fmt

* PeerDAS custody lookup sync (#5684)

* Implement custody sync

* Lint

* Fix tests

* Fix rebase issue

* Add data column kzg verification and update `c-kzg`. (#5701)

* Add data column kzg verification and update `c-kzg`.

* Fix incorrect `Cell` size.

* Add kzg verification on rpc blocks.

* Add kzg verification on rpc data columns.

* Rename `PEER_DAS_EPOCH` to `EIP7594_FORK_EPOCH` for client interop. (#5750)

* Fetch custody columns in range sync (#5747)

* Fetch custody columns in range sync

* Clean up todos

* Remove `BlobSidecar` construction and publish after PeerDAS activated (#5759)

* Avoid building and publishing blob sidecars after PeerDAS.

* Ignore gossip blobs with a slot greater than peer das activation epoch.

* Only attempt to verify blob count and import blobs before PeerDAS.

* #5684 review comments (#5748)

* #5684 review comments.

* Doc and message update only.

* Fix incorrect condition when constructing `RpcBlock` with `DataColumn`s

* Make sampling tests deterministic (#5775)

* PeerDAS spec tests (#5772)

* Add get_custody_columns spec tests.

* Add kzg merkle proof spec tests.

* Add SSZ spec tests.

* Add remaining KZG tests

* Load KZG only once per process, exclude electra tests and add missing SSZ tests.

* Fix lint and missing changes.

* Ignore macOS generated file.

* Merge remote branch 'sigp/unstable' into das

* Merge remote tracking branch 'origin/unstable' into das

* Implement unconditional reconstruction for supernodes (#5781)

* Implement unconditional reconstruction for supernodes

* Move code into KzgVerifiedCustodyDataColumn

* Remove expect

* Add test

* Thanks justin

* Add withhold attack mode for interop (#5788)

* Add withhold attack mode

* Update readme

* Drop added readmes

* Undo styling changes

* Add column gossip verification and handle unknown parent block (#5783)

* Add column gossip verification and handle missing parent for columns.

* Review PR

* Fix rebase issue

* more lint issues :)

---------

Co-authored-by: dapplion <35266934+dapplion@users.noreply.github.com>

* Trigger sampling on sync events (#5776)

* Trigger sampling on sync events

* Update beacon_chain.rs

* Fix tests

* Fix tests

* PeerDAS parameter changes for devnet-0 (#5779)

* Update PeerDAS parameters to latest values.

* Lint fix

* Fix lint.

* Update hardcoded subnet count to 64 (#5791)

* Fix incorrect columns per subnet and config cleanup (#5792)

* Tidy up PeerDAS preset and config values.

* Fix broken config

* Fix DAS branch CI (#5793)

* Fix invalid syntax.

* Update cli doc. Ignore get_custody_columns test temporarily.

* Fix failing test and add verify inclusion test.

* Undo accidentally removed code.

* Only attempt reconstruct columns once. (#5794)

* Re-enable precompute table for peerdas kzg (#5795)

* Merge branch 'unstable' into das

* Update subscription filter. (#5797)

* Remove penalty for duplicate columns (expected due to reconstruction) (#5798)

* Revert DAS config for interop testing. Optimise get_custody_columns function. (#5799)

* Don't perform reconstruction for proposer node as it already has all the columns. (#5806)
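Reconstruction works because the columns are erasure coded with a 2x extension, so any half of them is enough to recover the rest. The consensus-specs suggest reconstructing once a node holds 50% or more of the columns. The conditions from #5781, #5794 and #5806 can be summarised in a small predicate:

* supernodes only;
* at most one attempt;
* nothing to do when every column is already present, as on the proposer.

The struct and function below are hypothetical:

```rust
/// Hypothetical snapshot of what a node knows about one block's columns.
pub struct ReconstructionState {
    pub received_columns: usize,
    pub number_of_columns: usize,
    pub is_supernode: bool,
    pub already_attempted: bool,
}

/// Decide whether to reconstruct the missing columns for a block.
pub fn should_reconstruct(s: &ReconstructionState) -> bool {
    s.is_supernode
        // Only attempt once per block.
        && !s.already_attempted
        // Nothing to recover if every column is present (e.g. on the proposer).
        && s.received_columns < s.number_of_columns
        // Erasure coding needs at least half of the columns.
        && s.received_columns * 2 >= s.number_of_columns
}
```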

* Multithread compute_cells_and_proofs (#5805)

* Multi-thread reconstruct data columns

* Multi-thread path for block production

* Merge branch 'unstable' into das

# Conflicts:
#	.github/workflows/test-suite.yml
#	beacon_node/network/src/sync/block_lookups/mod.rs
#	beacon_node/network/src/sync/block_lookups/single_block_lookup.rs
#	beacon_node/network/src/sync/network_context.rs

* Fix CI errors.

* Move PeerDAS type-level config to configurable `ChainSpec` (#5828)

* Move PeerDAS type level config to `ChainSpec`.

* Fix tests

* Misc custody lookup improvements (#5821)

* Improve custody requests

* Type DataColumnsByRootRequestId

* Prioritize peers and load balance

* Update tests

* Address PR review

* Merge branch 'unstable' into das

* Rename deploy_block in network config (`das` branch) (#5852)

* Rename deploy_block.txt to deposit_contract_block.txt

* fmt

---------

Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>

* Merge branch 'unstable' into das

* Fix CI and merge issues.

* Merge branch 'unstable' into das

# Conflicts:
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#	lcli/src/main.rs

* Store data columns individually in store and caches (#5890)

* Store data columns individually in store and caches

* Implement data column pruning

* Merge branch 'unstable' into das

# Conflicts:
#	Cargo.lock

* Update reconstruction benches to newer criterion version. (#5949)

* Merge branch 'unstable' into das

# Conflicts:
#	.github/workflows/test-suite.yml

* chore: add `recover_cells_and_compute_proofs` method (#5938)

* chore: add recover_cells_and_compute_proofs method

* Introduce type alias `CellsAndKzgProofs` to address type complexity.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Update `csc` format in ENR and spec tests for devnet-1 (#5966)

* Update `csc` format in ENR.

* Add spec tests for `recover_cells_and_kzg_proofs`.

* Add tests for ENR.

* Fix failing tests.

* Add protection against invalid csc value in ENR.

* Fix lint
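When a client reads a peer's `csc` ENR field, a missing value falls back to the minimum custody requirement (#5409), and invalid values need guarding (#5966). The following is a hypothetical sketch of one such policy. It assumes out-of-range values are treated like a missing field; the actual client may instead reject the record:

```rust
/// Hypothetical: resolve a peer's custody subnet count from its ENR `csc` value.
/// ASSUMPTION: out-of-range values fall back to the minimum, like a missing field.
pub fn peer_custody_subnet_count(
    enr_csc: Option<u64>,
    custody_requirement: u64,
    subnet_count: u64,
) -> u64 {
    match enr_csc {
        // Valid: at least the minimum requirement, at most the total subnet count.
        Some(csc) if (custody_requirement..=subnet_count).contains(&csc) => csc,
        // Missing or invalid: assume the peer custodies only the minimum.
        _ => custody_requirement,
    }
}
```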

* Fix csc encoding and decoding (#5997)

* Fix data column rpc request not being sent due to incorrect limits set. (#6000)

* Fix incorrect inbound request count causing rate limiting. (#6025)

* Merge branch 'stable' into das

# Conflicts:
#	beacon_node/network/src/sync/block_lookups/tests.rs
#	beacon_node/network/src/sync/block_sidecar_coupling.rs
#	beacon_node/network/src/sync/manager.rs
#	beacon_node/network/src/sync/network_context.rs
#	beacon_node/network/src/sync/network_context/requests.rs

* Merge remote-tracking branch 'unstable' into das

* Add kurtosis config for DAS testing (#5968)

* Add kurtosis config for DAS testing.

* Fix invalid yaml file

* Update network parameter files.

* chore: add rust PeerdasKZG crypto library for peerdas functionality and rollback c-kzg dependency to 4844 version (#5941)

* chore: add recover_cells_and_compute_proofs method

* chore: add rust peerdas crypto library

* chore: integrate peerdaskzg rust library into kzg crate

* chore(multi):

- update `ssz_cell_to_crypto_cell`
- update conversion from the crypto cell type to a Vec<u8>. Since the Rust library defines them as references to an array, the conversion is simply `to_vec`

* chore(multi):

- update rest of code to handle the new crypto `Cell` type
- update test case code to no longer use the Box type

* chore: cleanup of superfluous conversions

* chore: revert c-kzg dependency back to v1

* chore: move dependency into correct order

* chore: update rust dependency

- This version includes a new method `PeerDasContext::with_num_threads`

* chore: remove Default initialization of PeerDasContext and explicitly set the parameters in `new_from_trusted_setup`

* chore: cleanup exports

* chore: commit updated cargo.lock

* Update Cargo.toml

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* chore: rename dependency

* chore: update peerdas lib

- sets the blst version to 0.3 so that it matches whatever lighthouse is using. Although 0.3.12 is latest, lighthouse is pinned to 0.3.3

* chore: fix clippy lifetime

- Rust doesn't allow you to elide the lifetime on type aliases

* chore: cargo clippy fix

* chore: cargo fmt

* chore: update lib to add redundant checks (these will be removed in consensus-specs PR 3819)

* chore: update dependency to ignore proofs

* chore: update peerdas lib to latest

* update lib

* chore: remove empty proof parameter

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Update PeerDAS interop testnet config (#6069)

* Update interop testnet config.

* Fix typo and remove target peers

* Avoid retrying same sampling peer that previously failed. (#6084)
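Avoiding a previously failed sampling peer amounts to remembering failures per column and skipping those peers on retry. The following is a hypothetical sketch. It picks the first eligible peer so that results are reproducible, whereas a real client may choose randomly. Peers are plain `String`s rather than libp2p `PeerId`s:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical per-column record of peers that failed to serve a sample.
#[derive(Default)]
pub struct ColumnSampling {
    failed_peers: HashMap<u64, HashSet<String>>,
}

impl ColumnSampling {
    /// Remember that `peer` failed to serve `column`.
    pub fn on_failure(&mut self, column: u64, peer: &str) {
        self.failed_peers
            .entry(column)
            .or_default()
            .insert(peer.to_string());
    }

    /// First custodial peer for `column` that has not already failed for it.
    pub fn next_peer<'a>(&self, column: u64, custodians: &'a [String]) -> Option<&'a String> {
        let failed = self.failed_peers.get(&column);
        custodians
            .iter()
            .find(|p| failed.map_or(true, |f| !f.contains(*p)))
    }
}
```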

* Various fixes to custody range sync  (#6004)

* Only start requesting batches when there are good peers across all custody columns to avoid spamming block requests.

* Add custody peer check before mutating `BatchInfo` to avoid inconsistent state.

* Add check to cover a case where batch is not processed while waiting for custody peers to become available.

* Fix lint and logic error

* Fix `good_peers_on_subnet` always returning false for `DataColumnSubnet`.

* Add test for `get_custody_peers_for_column`

* Revert epoch parameter refactor.

* Fall back to default custody requirement if peer ENR is not present.

* Add metrics and update code comment.

* Add more debug logs.

* Use subscribed peers on subnet before MetaDataV3 is implemented. Remove peer_id matching when injecting error because multiple peers are used for range requests. Use randomized custodial peer to avoid repeatedly sending requests to failing peers. Batch by range request where possible.

* Remove unused code and update docs.

* Add comment
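Requesting custody columns in range sync involves three steps:

* pick a custodial peer for every column;
* batch the columns that land on the same peer into one `DataColumnsByRange` request;
* refuse to start if any column has no peer.

Compare `make_columns_by_range_requests` and `RpcRequestSendError::NoCustodyPeers` in the diff below. The following is a hypothetical, std-only sketch. `PeerId` is a `String` alias, and the peer choice is a deterministic stand-in for the random selection the diff uses:

```rust
use std::collections::HashMap;

/// Stand-in for a libp2p `PeerId` (hypothetical).
pub type PeerId = String;

#[derive(Debug, PartialEq, Eq)]
pub struct NoCustodyPeers {
    pub column: u64,
}

/// Group custody columns into one request per peer. Fails before anything is
/// sent if some column has no custodial peer.
pub fn group_columns_by_peer(
    custody_columns: &[u64],
    custodians: &HashMap<u64, Vec<PeerId>>,
) -> Result<HashMap<PeerId, Vec<u64>>, NoCustodyPeers> {
    let mut requests: HashMap<PeerId, Vec<u64>> = HashMap::new();
    for &column in custody_columns {
        let peers = custodians
            .get(&column)
            .filter(|p| !p.is_empty())
            .ok_or(NoCustodyPeers { column })?;
        // Deterministic stand-in for a random choice: spread columns across peers.
        let peer = &peers[(column as usize) % peers.len()];
        requests.entry(peer.clone()).or_default().push(column);
    }
    Ok(requests)
}
```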

* chore: update peerdas-kzg library (#6118)

* chore: update peerDAS lib

* chore: update library

* chore: update library to version that includes "init context" benchmarks and optional validation checks

* chore: (can remove) -- Add benchmarks for init context

* Prevent continuous searches for low-peer networks (#6162)

* Merge branch 'unstable' into das

* Fix merge conflicts

* Add cli flag to enable sampling and disable by default. (#6209)

* chore: Use reference to an array representing a blob instead of an owned KzgBlob (#6179)

* add KzgBlobRef type

* modify code to use KzgBlobRef

* clippy

* Remove Deneb blob related changes to maintain compatibility with `c-kzg-4844`.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Store computed custody subnets in PeerDB and fix custody lookup test (#6218)

* Fix failing custody lookup tests.

* Store custody subnets in PeerDB, fix custody lookup test and refactor some methods.

* Merge branch 'unstable' into das

# Conflicts:
#	beacon_node/beacon_chain/src/beacon_chain.rs
#	beacon_node/beacon_chain/src/block_verification_types.rs
#	beacon_node/beacon_chain/src/builder.rs
#	beacon_node/beacon_chain/src/data_availability_checker.rs
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#	beacon_node/beacon_chain/src/data_column_verification.rs
#	beacon_node/beacon_chain/src/early_attester_cache.rs
#	beacon_node/beacon_chain/src/historical_blocks.rs
#	beacon_node/beacon_chain/tests/store_tests.rs
#	beacon_node/lighthouse_network/src/discovery/enr.rs
#	beacon_node/network/src/service.rs
#	beacon_node/src/cli.rs
#	beacon_node/store/src/hot_cold_store.rs
#	beacon_node/store/src/lib.rs
#	lcli/src/generate_bootnode_enr.rs

* Fix CI failures after merge.

* Batch sampling requests by peer (#6256)

* Batch sampling requests by peer

* Fix clippy errors

* Fix tests

* Add column_index to error message for ease of tracing

* Remove outdated comment

* Fix range sync never evaluating request as finished, causing it to get stuck. (#6276)
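With columns, one by-range request fans out into several requests:

* a `BlocksByRange` request;
* an optional `BlobsByRange` request;
* one `DataColumnsByRange` request per selected peer.

A batch is complete only after every one of those streams has ended. The following is a hypothetical tracker illustrating that rule. It is not Lighthouse's `RangeBlockComponentsRequest`, and it assumes stream ends are reported separately. In the diff, a `None` payload in `BlockOrBlob` appears to play that role:

```rust
/// Hypothetical completion tracker for a coupled by-range request.
pub struct RangeComponents {
    blocks_done: bool,
    expects_blobs: bool,
    blobs_done: bool,
    /// Number of `DataColumnsByRange` requests sent, if columns are expected.
    expected_column_streams: Option<usize>,
    finished_column_streams: usize,
}

impl RangeComponents {
    pub fn new(expects_blobs: bool, expected_column_streams: Option<usize>) -> Self {
        Self {
            blocks_done: false,
            expects_blobs,
            blobs_done: false,
            expected_column_streams,
            finished_column_streams: 0,
        }
    }

    pub fn on_blocks_stream_end(&mut self) {
        self.blocks_done = true;
    }

    pub fn on_blobs_stream_end(&mut self) {
        self.blobs_done = true;
    }

    /// Called once per terminated `DataColumnsByRange` stream.
    pub fn on_column_stream_end(&mut self) {
        self.finished_column_streams += 1;
    }

    /// Finished only when every expected stream has terminated.
    pub fn is_finished(&self) -> bool {
        let blobs_ok = !self.expects_blobs || self.blobs_done;
        let columns_ok = self
            .expected_column_streams
            .map_or(true, |n| self.finished_column_streams >= n);
        self.blocks_done && blobs_ok && columns_ok
    }
}
```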

* Merge branch 'unstable' into das-0821-merge

# Conflicts:
#	Cargo.lock
#	Cargo.toml
#	beacon_node/beacon_chain/src/beacon_chain.rs
#	beacon_node/beacon_chain/src/data_availability_checker.rs
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#	beacon_node/beacon_chain/src/data_column_verification.rs
#	beacon_node/beacon_chain/src/kzg_utils.rs
#	beacon_node/beacon_chain/src/metrics.rs
#	beacon_node/beacon_processor/src/lib.rs
#	beacon_node/lighthouse_network/src/rpc/codec/ssz_snappy.rs
#	beacon_node/lighthouse_network/src/rpc/config.rs
#	beacon_node/lighthouse_network/src/rpc/methods.rs
#	beacon_node/lighthouse_network/src/rpc/outbound.rs
#	beacon_node/lighthouse_network/src/rpc/rate_limiter.rs
#	beacon_node/lighthouse_network/src/service/api_types.rs
#	beacon_node/lighthouse_network/src/types/globals.rs
#	beacon_node/network/src/network_beacon_processor/mod.rs
#	beacon_node/network/src/network_beacon_processor/rpc_methods.rs
#	beacon_node/network/src/network_beacon_processor/sync_methods.rs
#	beacon_node/network/src/sync/block_lookups/common.rs
#	beacon_node/network/src/sync/block_lookups/mod.rs
#	beacon_node/network/src/sync/block_lookups/single_block_lookup.rs
#	beacon_node/network/src/sync/block_lookups/tests.rs
#	beacon_node/network/src/sync/manager.rs
#	beacon_node/network/src/sync/network_context.rs
#	consensus/types/src/data_column_sidecar.rs
#	crypto/kzg/Cargo.toml
#	crypto/kzg/benches/benchmark.rs
#	crypto/kzg/src/lib.rs

* Fix custody tests and load PeerDAS KZG instead.

* Fix ef tests and bench compilation.

* Fix failing sampling test.

* Merge pull request #6287 from jimmygchen/das-0821-merge

Merge `unstable` into `das` 20240821

* Remove get_block_import_status

* Merge branch 'unstable' into das

* Re-enable Windows release tests.

* Address some review comments.

* Address more review comments and cleanups.

* Comment out peer DAS KZG EF tests for now

* Address more review comments and fix build.

* Merge branch 'das' of github.com:sigp/lighthouse into das

* Unignore Electra tests

* Fix metric name

* Address some of Pawan's review comments

* Merge remote-tracking branch 'origin/unstable' into das

* Update PeerDAS network parameters for peerdas-devnet-2  (#6290)

* update subnet count & custody req

* das network params

* update ef tests

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
Lion - dapplion
2024-08-27 06:10:22 +02:00
committed by GitHub
parent e09fe5a372
commit f75a2cf65b
96 changed files with 5006 additions and 613 deletions


@@ -1,43 +1,53 @@
//! Provides network functionality for the Syncing thread. This fundamentally wraps a network
//! channel and stores a global RPC ID to perform requests.
use self::custody::{ActiveCustodyRequest, Error as CustodyRequestError};
use self::requests::{ActiveBlobsByRootRequest, ActiveBlocksByRootRequest};
pub use self::requests::{BlobsByRootSingleBlockRequest, BlocksByRootSingleRequest};
use super::block_sidecar_coupling::BlocksAndBlobsRequestInfo;
pub use self::requests::{BlocksByRootSingleRequest, DataColumnsByRootSingleBlockRequest};
use super::block_sidecar_coupling::RangeBlockComponentsRequest;
use super::manager::BlockProcessType;
use super::range_sync::{BatchId, ByRangeRequestType, ChainId};
use crate::metrics;
use crate::network_beacon_processor::NetworkBeaconProcessor;
use crate::service::NetworkMessage;
use crate::status::ToStatusMessage;
use crate::sync::block_lookups::SingleLookupId;
use crate::sync::manager::BlockProcessType;
use crate::sync::network_context::requests::BlobsByRootSingleBlockRequest;
use beacon_chain::block_verification_types::RpcBlock;
use beacon_chain::{BeaconChain, BeaconChainTypes, BlockProcessStatus, EngineState};
use fnv::FnvHashMap;
use lighthouse_network::rpc::methods::BlobsByRangeRequest;
use lighthouse_network::rpc::methods::{BlobsByRangeRequest, DataColumnsByRangeRequest};
use lighthouse_network::rpc::{BlocksByRangeRequest, GoodbyeReason, RPCError};
use lighthouse_network::service::api_types::{
AppRequestId, DataColumnsByRootRequestId, Id, SingleLookupReqId, SyncRequestId,
AppRequestId, CustodyId, CustodyRequester, DataColumnsByRootRequestId,
DataColumnsByRootRequester, Id, SingleLookupReqId, SyncRequestId,
};
use lighthouse_network::{Client, NetworkGlobals, PeerAction, PeerId, ReportSource, Request};
use rand::seq::SliceRandom;
use rand::thread_rng;
use requests::ActiveDataColumnsByRootRequest;
pub use requests::LookupVerifyError;
use requests::{ActiveDataColumnsByRootRequest, DataColumnsByRootSingleBlockRequest};
use slog::{debug, error, trace, warn};
use slog::{debug, error, warn};
use slot_clock::SlotClock;
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::mpsc;
use types::blob_sidecar::FixedBlobSidecarList;
use types::{
BlobSidecar, DataColumnSidecar, DataColumnSidecarList, EthSpec, Hash256, SignedBeaconBlock,
BlobSidecar, ColumnIndex, DataColumnSidecar, DataColumnSidecarList, EthSpec, Hash256,
SignedBeaconBlock, Slot,
};
pub mod custody;
mod requests;
pub struct BlocksAndBlobsByRangeResponse<E: EthSpec> {
pub sender_id: RangeRequestId,
pub responses: Result<Vec<RpcBlock<E>>, String>,
pub request_type: ByRangeRequestType,
pub expects_blobs: bool,
pub expects_custody_columns: Option<Vec<ColumnIndex>>,
}
#[derive(Debug, Clone, Copy)]
@@ -60,15 +70,20 @@ pub enum RpcEvent<T> {
pub type RpcResponseResult<T> = Result<(T, Duration), RpcResponseError>;
#[derive(Debug)]
pub enum RpcResponseError {
RpcError(RPCError),
VerifyError(LookupVerifyError),
CustodyRequestError(CustodyRequestError),
}
#[derive(Debug, PartialEq, Eq)]
pub enum RpcRequestSendError {
/// Network channel send failed
NetworkSendError,
NoCustodyPeers,
CustodyRequestError(custody::Error),
SlotClockError,
}
#[derive(Debug, PartialEq, Eq)]
@@ -82,6 +97,7 @@ impl std::fmt::Display for RpcResponseError {
match self {
RpcResponseError::RpcError(e) => write!(f, "RPC Error: {:?}", e),
RpcResponseError::VerifyError(e) => write!(f, "Lookup Verify Error: {:?}", e),
RpcResponseError::CustodyRequestError(e) => write!(f, "Custody Request Error: {:?}", e),
}
}
}
@@ -98,6 +114,31 @@ impl From<LookupVerifyError> for RpcResponseError {
}
}
/// Represents a group of peers that served a block component.
#[derive(Clone, Debug)]
pub struct PeerGroup {
/// Peers grouped by which indexed section of the block component they served. For example:
/// - PeerA served = [blob index 0, blob index 2]
/// - PeerB served = [blob index 1]
peers: HashMap<PeerId, Vec<usize>>,
}
impl PeerGroup {
/// Return a peer group where a single peer returned all parts of a block component. For
/// example, a block has a single component (the block = index 0/1).
pub fn from_single(peer: PeerId) -> Self {
Self {
peers: HashMap::from_iter([(peer, vec![0])]),
}
}
pub fn from_set(peers: HashMap<PeerId, Vec<usize>>) -> Self {
Self { peers }
}
pub fn all(&self) -> impl Iterator<Item = &PeerId> + '_ {
self.peers.keys()
}
}
/// Sequential ID that uniquely identifies ReqResp outgoing requests
pub type ReqId = u32;
@@ -128,13 +169,16 @@ pub struct SyncNetworkContext<T: BeaconChainTypes> {
/// A mapping of active BlobsByRoot requests, including both current slot and parent lookups.
blobs_by_root_requests: FnvHashMap<SingleLookupReqId, ActiveBlobsByRootRequest<T::EthSpec>>,
/// Mapping of active custody column requests for a block root
custody_by_root_requests: FnvHashMap<CustodyRequester, ActiveCustodyRequest<T>>,
/// A mapping of active DataColumnsByRoot requests
data_columns_by_root_requests:
FnvHashMap<DataColumnsByRootRequestId, ActiveDataColumnsByRootRequest<T::EthSpec>>,
/// BlocksByRange requests paired with BlobsByRange
range_blocks_and_blobs_requests:
FnvHashMap<Id, (RangeRequestId, BlocksAndBlobsRequestInfo<T::EthSpec>)>,
range_block_components_requests:
FnvHashMap<Id, (RangeRequestId, RangeBlockComponentsRequest<T::EthSpec>)>,
/// Whether the ee is online. If it's not, we don't allow access to the
/// `beacon_processor_send`.
@@ -153,6 +197,7 @@ pub struct SyncNetworkContext<T: BeaconChainTypes> {
pub enum BlockOrBlob<E: EthSpec> {
Block(Option<Arc<SignedBeaconBlock<E>>>),
Blob(Option<Arc<BlobSidecar<E>>>),
CustodyColumns(Option<Arc<DataColumnSidecar<E>>>),
}
impl<E: EthSpec> From<Option<Arc<SignedBeaconBlock<E>>>> for BlockOrBlob<E> {
@@ -181,7 +226,8 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
blocks_by_root_requests: <_>::default(),
blobs_by_root_requests: <_>::default(),
data_columns_by_root_requests: <_>::default(),
range_blocks_and_blobs_requests: FnvHashMap::default(),
custody_by_root_requests: <_>::default(),
range_block_components_requests: FnvHashMap::default(),
network_beacon_processor,
chain,
log,
@@ -191,10 +237,10 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
/// Returns the ids of all the requests made to the given peer_id.
pub fn peer_disconnected(&mut self, peer_id: &PeerId) -> Vec<SyncRequestId> {
let failed_range_ids =
self.range_blocks_and_blobs_requests
self.range_block_components_requests
.iter()
.filter_map(|(id, request)| {
if request.1.peer_id == *peer_id {
if request.1.peer_ids.contains(peer_id) {
Some(SyncRequestId::RangeBlockAndBlobs { id: *id })
} else {
None
@@ -239,6 +285,17 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
.collect()
}
pub fn get_custodial_peers(&self, column_index: ColumnIndex) -> Vec<PeerId> {
self.network_globals()
.custody_peers_for_column(column_index)
}
pub fn get_random_custodial_peer(&self, column_index: ColumnIndex) -> Option<PeerId> {
self.get_custodial_peers(column_index)
.choose(&mut thread_rng())
.cloned()
}
pub fn network_globals(&self) -> &NetworkGlobals<T::EthSpec> {
&self.network_beacon_processor.network_globals
}
@@ -277,19 +334,23 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
}
}
/// A blocks by range request for the range sync algorithm.
pub fn blocks_by_range_request(
/// A blocks by range request sent by the range sync algorithm
pub fn block_components_by_range_request(
&mut self,
peer_id: PeerId,
batch_type: ByRangeRequestType,
request: BlocksByRangeRequest,
sender_id: RangeRequestId,
) -> Result<Id, RpcRequestSendError> {
let epoch = Slot::new(*request.start_slot()).epoch(T::EthSpec::slots_per_epoch());
let id = self.next_id();
trace!(
let mut requested_peers = vec![peer_id];
debug!(
self.log,
"Sending BlocksByRange request";
"method" => "BlocksByRange",
"count" => request.count(),
"epoch" => epoch,
"peer" => %peer_id,
);
self.network_send
@@ -300,12 +361,13 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
})
.map_err(|_| RpcRequestSendError::NetworkSendError)?;
if matches!(batch_type, ByRangeRequestType::BlocksAndBlobs) {
let expected_blobs = if matches!(batch_type, ByRangeRequestType::BlocksAndBlobs) {
debug!(
self.log,
"Sending BlobsByRange requests";
"method" => "BlobsByRange",
"count" => request.count(),
"epoch" => epoch,
"peer" => %peer_id,
);
@@ -320,33 +382,94 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
request_id: AppRequestId::Sync(SyncRequestId::RangeBlockAndBlobs { id }),
})
.map_err(|_| RpcRequestSendError::NetworkSendError)?;
}
true
} else {
false
};
let (expects_custody_columns, num_of_custody_column_req) =
if matches!(batch_type, ByRangeRequestType::BlocksAndColumns) {
let custody_indexes = self.network_globals().custody_columns();
let mut num_of_custody_column_req = 0;
for (peer_id, columns_by_range_request) in
self.make_columns_by_range_requests(request, &custody_indexes)?
{
requested_peers.push(peer_id);
debug!(
self.log,
"Sending DataColumnsByRange requests";
"method" => "DataColumnsByRange",
"count" => columns_by_range_request.count,
"epoch" => epoch,
"columns" => ?columns_by_range_request.columns,
"peer" => %peer_id,
);
self.send_network_msg(NetworkMessage::SendRequest {
peer_id,
request: Request::DataColumnsByRange(columns_by_range_request),
request_id: AppRequestId::Sync(SyncRequestId::RangeBlockAndBlobs { id }),
})
.map_err(|_| RpcRequestSendError::NetworkSendError)?;
num_of_custody_column_req += 1;
}
(Some(custody_indexes), Some(num_of_custody_column_req))
} else {
(None, None)
};
let info = RangeBlockComponentsRequest::new(
expected_blobs,
expects_custody_columns,
num_of_custody_column_req,
requested_peers,
);
self.range_block_components_requests
.insert(id, (sender_id, info));
Ok(id)
}
/// A blocks by range request sent by the range sync algorithm
pub fn blocks_and_blobs_by_range_request(
&mut self,
peer_id: PeerId,
batch_type: ByRangeRequestType,
fn make_columns_by_range_requests(
&self,
request: BlocksByRangeRequest,
sender_id: RangeRequestId,
) -> Result<Id, RpcRequestSendError> {
let id = self.blocks_by_range_request(peer_id, batch_type, request)?;
self.range_blocks_and_blobs_requests.insert(
id,
(
sender_id,
BlocksAndBlobsRequestInfo::new(batch_type, peer_id),
),
);
Ok(id)
custody_indexes: &Vec<ColumnIndex>,
) -> Result<HashMap<PeerId, DataColumnsByRangeRequest>, RpcRequestSendError> {
let mut peer_id_to_request_map = HashMap::new();
for column_index in custody_indexes {
// TODO(das): The peer selection logic here needs to be improved - we should probably
// avoid retrying from failed peers, however `BatchState` currently only tracks the peer
// serving the blocks.
let Some(custody_peer) = self.get_random_custodial_peer(*column_index) else {
// TODO(das): this will be pretty bad UX. To improve we should:
// - Attempt to fetch custody requests first, before requesting blocks
// - Handle the no peers case gracefully, maybe add some timeout and give a few
// minutes / seconds to the peer manager to locate peers on this subnet before
// abandoning progress on the chain completely.
return Err(RpcRequestSendError::NoCustodyPeers);
};
let columns_by_range_request = peer_id_to_request_map
.entry(custody_peer)
.or_insert_with(|| DataColumnsByRangeRequest {
start_slot: *request.start_slot(),
count: *request.count(),
columns: vec![],
});
columns_by_range_request.columns.push(*column_index);
}
Ok(peer_id_to_request_map)
}
pub fn range_request_failed(&mut self, request_id: Id) -> Option<RangeRequestId> {
let sender_id = self
.range_blocks_and_blobs_requests
.range_block_components_requests
.remove(&request_id)
.map(|(sender_id, _info)| sender_id);
if let Some(sender_id) = sender_id {
@@ -370,7 +493,7 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
request_id: Id,
block_or_blob: BlockOrBlob<T::EthSpec>,
) -> Option<BlocksAndBlobsByRangeResponse<T::EthSpec>> {
let Entry::Occupied(mut entry) = self.range_blocks_and_blobs_requests.entry(request_id)
let Entry::Occupied(mut entry) = self.range_block_components_requests.entry(request_id)
else {
metrics::inc_counter_vec(&metrics::SYNC_UNKNOWN_NETWORK_REQUESTS, &["range_blocks"]);
return None;
@@ -380,15 +503,17 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
match block_or_blob {
BlockOrBlob::Block(maybe_block) => info.add_block_response(maybe_block),
BlockOrBlob::Blob(maybe_sidecar) => info.add_sidecar_response(maybe_sidecar),
BlockOrBlob::CustodyColumns(column) => info.add_data_column(column),
}
if info.is_finished() {
// If the request is finished, dequeue everything
let (sender_id, info) = entry.remove();
let request_type = info.get_request_type();
let (expects_blobs, expects_custody_columns) = info.get_requirements();
Some(BlocksAndBlobsByRangeResponse {
sender_id,
request_type,
-responses: info.into_responses(),
+responses: info.into_responses(&self.chain.spec),
expects_blobs,
expects_custody_columns,
})
} else {
None
@@ -470,6 +595,21 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
block_root: Hash256,
downloaded_block: Option<Arc<SignedBeaconBlock<T::EthSpec>>>,
) -> Result<LookupRequestResult, RpcRequestSendError> {
// Only request blobs if the current epoch is after the Deneb fork and before PeerDAS
if !self
.chain
.data_availability_checker
.blobs_required_for_epoch(
// TODO(das): use the block's slot
self.chain
.slot_clock
.now_or_genesis()
.ok_or(RpcRequestSendError::SlotClockError)?
.epoch(T::EthSpec::slots_per_epoch()),
)
{
return Ok(LookupRequestResult::NoRequestNeeded);
}
let Some(block) = downloaded_block.or_else(|| {
// If the block is already being processed or fully validated, retrieve how many blobs
// it expects. Consider any stage of the block. If the block root has been validated, we
@@ -553,7 +693,7 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
/// Request to send a single `data_columns_by_root` request to the network.
pub fn data_column_lookup_request(
&mut self,
-requester: SingleLookupReqId,
+requester: DataColumnsByRootRequester,
peer_id: PeerId,
request: DataColumnsByRootSingleBlockRequest,
) -> Result<LookupRequestResult<DataColumnsByRootRequestId>, &'static str> {
@@ -627,7 +767,7 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
.unwrap_or_default();
// TODO(das): figure out how to pass block.slot if we end up doing rotation
-let custody_indexes_duty = self.network_globals().custody_columns(&self.chain.spec);
+let custody_indexes_duty = self.network_globals().custody_columns();
// Include only the blob indexes not yet imported (received through gossip)
let custody_indexes_to_fetch = custody_indexes_duty
@@ -651,10 +791,28 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
"id" => ?id
);
-// TODO(das): Issue a custody request with `id` for the set of columns
-// `custody_indexes_to_fetch` and block `block_root`.
+let requester = CustodyRequester(id);
+let mut request = ActiveCustodyRequest::new(
+block_root,
+// TODO(das): req_id is duplicated here, also present in id
+CustodyId { requester, req_id },
+&custody_indexes_to_fetch,
+self.log.clone(),
+);
-Ok(LookupRequestResult::RequestSent(req_id))
+// TODO(das): start request
+// Note that you can only send, but not handle a response here
+match request.continue_requests(self) {
+Ok(_) => {
+// Ignoring the result of `continue_requests` is okay. A request that has just been
+// created cannot return data immediately; it must first send at least one request to
+// the network, and such a request must exist because `custody_indexes_to_fetch` is not empty.
+self.custody_by_root_requests.insert(requester, request);
+Ok(LookupRequestResult::RequestSent(req_id))
+}
+// TODO(das): handle this error properly
+Err(e) => Err(RpcRequestSendError::CustodyRequestError(e)),
+}
}
pub fn is_execution_engine_online(&self) -> bool {
@@ -738,12 +896,18 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
"To deal with alignment with deneb boundaries, batches need to be of just one epoch"
);
-if let Some(data_availability_boundary) = self.chain.data_availability_boundary() {
-if epoch >= data_availability_boundary {
-ByRangeRequestType::BlocksAndBlobs
-} else {
-ByRangeRequestType::Blocks
-}
+if self
+.chain
+.data_availability_checker
+.data_columns_required_for_epoch(epoch)
+{
+ByRangeRequestType::BlocksAndColumns
+} else if self
+.chain
+.data_availability_checker
+.blobs_required_for_epoch(epoch)
+{
+ByRangeRequestType::BlocksAndBlobs
} else {
ByRangeRequestType::Blocks
}
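The epoch-based choice above prefers data columns, then blobs, then blocks alone. The sketch below models that precedence with a hypothetical `DaParams` struct. Its two predicates are simplified assumptions (an epoch needs data if it is at or after an assumed data availability boundary, and PeerDAS replaces blobs with columns from an assumed fork epoch onwards); they are not the data availability checker's actual implementation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ByRangeRequestType {
    BlocksAndColumns,
    BlocksAndBlobs,
    Blocks,
}

/// Hypothetical fork/retention parameters (assumption, not Lighthouse's `ChainSpec`).
pub struct DaParams {
    /// First epoch still inside the data availability window, if Deneb is scheduled.
    pub da_boundary: Option<u64>,
    /// First epoch at which PeerDAS data columns replace blobs, if scheduled.
    pub peerdas_epoch: Option<u64>,
}

impl DaParams {
    fn within_da_window(&self, epoch: u64) -> bool {
        self.da_boundary.map_or(false, |b| epoch >= b)
    }

    fn is_peerdas(&self, epoch: u64) -> bool {
        self.peerdas_epoch.map_or(false, |p| epoch >= p)
    }

    pub fn data_columns_required_for_epoch(&self, epoch: u64) -> bool {
        self.within_da_window(epoch) && self.is_peerdas(epoch)
    }

    pub fn blobs_required_for_epoch(&self, epoch: u64) -> bool {
        self.within_da_window(epoch) && !self.is_peerdas(epoch)
    }
}

/// Picks the by-range request type for a one-epoch batch.
pub fn batch_type(params: &DaParams, epoch: u64) -> ByRangeRequestType {
    // Columns are checked first: once PeerDAS is active, blobs are no longer requested.
    if params.data_columns_required_for_epoch(epoch) {
        ByRangeRequestType::BlocksAndColumns
    } else if params.blobs_required_for_epoch(epoch) {
        ByRangeRequestType::BlocksAndBlobs
    } else {
        ByRangeRequestType::Blocks
    }
}
```

Because the two predicates are mutually exclusive by construction in this sketch, the order of the checks only matters for readability. In the real checker the order encodes the intended precedence.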
@@ -753,9 +917,9 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
&mut self,
id: Id,
sender_id: RangeRequestId,
-info: BlocksAndBlobsRequestInfo<T::EthSpec>,
+info: RangeBlockComponentsRequest<T::EthSpec>,
) {
-self.range_blocks_and_blobs_requests
+self.range_block_components_requests
.insert(id, (sender_id, info));
}
@@ -853,7 +1017,7 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
pub fn on_data_columns_by_root_response(
&mut self,
id: DataColumnsByRootRequestId,
-peer_id: PeerId,
+_peer_id: PeerId,
rpc_event: RpcEvent<Arc<DataColumnSidecar<T::EthSpec>>>,
) -> Option<RpcResponseResult<Vec<Arc<DataColumnSidecar<T::EthSpec>>>>> {
let Entry::Occupied(mut request) = self.data_columns_by_root_requests.entry(id) else {
@@ -885,8 +1049,10 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
// catch if a peer is returning more columns than requested or if the excess blobs are
// invalid.
Err((e, resolved)) => {
-if let RpcResponseError::VerifyError(e) = &e {
-self.report_peer(peer_id, PeerAction::LowToleranceError, e.into());
+if let RpcResponseError::VerifyError(_e) = &e {
+// TODO(das): this is a bug, we should not penalise the peer in this case.
+// Confirm this can be removed.
+// self.report_peer(peer_id, PeerAction::LowToleranceError, e.into());
}
if resolved {
None
@@ -897,6 +1063,53 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
}
}
/// Insert a downloaded column into an active custody request. Then make progress on the
/// entire request.
///
/// ### Returns
///
/// - `Some`: Request completed and will make no more progress. The requester is expected to act on the result.
/// - `None`: Request still active; the requester should take no action.
#[allow(clippy::type_complexity)]
pub fn on_custody_by_root_response(
&mut self,
id: CustodyId,
req_id: DataColumnsByRootRequestId,
peer_id: PeerId,
resp: RpcResponseResult<Vec<Arc<DataColumnSidecar<T::EthSpec>>>>,
) -> Option<Result<(DataColumnSidecarList<T::EthSpec>, PeerGroup), RpcResponseError>> {
// Note: the request must be removed from the map so that `self` can be borrowed mutably
// again below; otherwise nested requests could not be issued.
let Some(mut request) = self.custody_by_root_requests.remove(&id.requester) else {
// TODO(das): This log can fire if the request errored early and was dropped
debug!(self.log, "Custody column downloaded event for unknown request"; "id" => ?id);
return None;
};
let result = request
.on_data_column_downloaded(peer_id, req_id, resp, self)
.map_err(RpcResponseError::CustodyRequestError)
.transpose();
// `transpose` converts the internal `Result<Option<_>, _>` format of `ActiveCustodyRequest`
// (error first, so `?` can be used) into an `Option<Result<_, _>>` for the `if let Some(..)` below.
if let Some(result) = result {
match result.as_ref() {
Ok((columns, peer_group)) => {
debug!(self.log, "Custody request success, removing"; "id" => ?id, "count" => columns.len(), "peers" => ?peer_group)
}
Err(e) => {
debug!(self.log, "Custody request failure, removing"; "id" => ?id, "error" => ?e)
}
}
Some(result)
} else {
self.custody_by_root_requests.insert(id.requester, request);
None
}
}
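The function above takes the active request out of its map before driving it (so `self` can be borrowed again), uses `Option::transpose` to turn the request's `Result<Option<_>, _>` into `Option<Result<_, _>>`, and puts the request back only while it is still in progress. A minimal sketch of that pattern follows; `ActiveRequest`, `Context`, `on_column` and the duplicate-column error are hypothetical simplifications, not Lighthouse's `ActiveCustodyRequest` API.

```rust
use std::collections::HashMap;

// Hypothetical request that completes after receiving `needed` distinct columns.
pub struct ActiveRequest {
    needed: usize,
    received: Vec<u64>,
}

impl ActiveRequest {
    /// Internal "error first" form: `Err` on failure, `Ok(None)` while still in progress,
    /// `Ok(Some(..))` once complete.
    fn on_column(&mut self, column: u64) -> Result<Option<Vec<u64>>, String> {
        if self.received.contains(&column) {
            return Err(format!("duplicate column {column}"));
        }
        self.received.push(column);
        if self.received.len() == self.needed {
            Ok(Some(std::mem::take(&mut self.received)))
        } else {
            Ok(None)
        }
    }
}

pub struct Context {
    requests: HashMap<u32, ActiveRequest>,
}

impl Context {
    /// `Some(result)`: the request finished (success or failure) and was dropped.
    /// `None`: the request is still active (re-inserted) or was unknown.
    pub fn on_response(&mut self, id: u32, column: u64) -> Option<Result<Vec<u64>, String>> {
        // Take ownership so `self` stays free for further mutable borrows.
        let mut request = self.requests.remove(&id)?;
        // Result<Option<T>, E> -> Option<Result<T, E>>
        let result = request.on_column(column).transpose();
        if result.is_none() {
            // Still in progress: put the request back for the next response.
            self.requests.insert(id, request);
        }
        result
    }
}
```

Removing and re-inserting costs two map operations per response, but it avoids holding a mutable borrow into the map while the request needs mutable access to the surrounding context.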
pub fn send_block_for_processing(
&self,
id: Id,
@@ -961,22 +1174,28 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
pub fn send_custody_columns_for_processing(
&self,
-id: Id,
+_id: Id,
block_root: Hash256,
-_custody_columns: DataColumnSidecarList<T::EthSpec>,
-_duration: Duration,
+custody_columns: DataColumnSidecarList<T::EthSpec>,
+duration: Duration,
process_type: BlockProcessType,
) -> Result<(), SendErrorProcessor> {
-let _beacon_processor = self
+let beacon_processor = self
.beacon_processor_if_enabled()
.ok_or(SendErrorProcessor::ProcessorNotAvailable)?;
-debug!(self.log, "Sending custody columns for processing"; "block" => ?block_root, "id" => id);
+debug!(self.log, "Sending custody columns for processing"; "block" => ?block_root, "process_type" => ?process_type);
// Lookup sync event safety: If `beacon_processor.send_rpc_custody_columns` returns Ok() sync
// must receive a single `SyncMessage::BlockComponentProcessed` event with this process type
//
-// TODO(das): After merging processor import PR, actually send columns to beacon processor.
-Ok(())
+beacon_processor
+.send_rpc_custody_columns(block_root, custody_columns, duration, process_type)
+.map_err(|e| {
+error!(
+self.log,
+"Failed to send sync custody columns to processor";
+"error" => ?e
+);
+SendErrorProcessor::SendError
+})
}
pub(crate) fn register_metrics(&self) {
@@ -993,7 +1212,7 @@ impl<T: BeaconChainTypes> SyncNetworkContext<T> {
metrics::set_gauge_vec(
&metrics::SYNC_ACTIVE_NETWORK_REQUESTS,
&["range_blocks"],
-self.range_blocks_and_blobs_requests.len() as i64,
+self.range_block_components_requests.len() as i64,
);
}
}