Single blob lookups (#4152)

* some blob reprocessing work

* remove ForceBlockLookup

* reorder enum match arms in sync manager

* a lot more reprocessing work

* impl logic for triggerng blob lookups along with block lookups

* deal with rpc blobs in groups per block in the da checker. don't cache missing blob ids in the da checker.

* make single block lookup generic

* more work

* add delayed processing logic and combine some requests

* start fixing some compile errors

* fix compilation in main block lookup mod

* much work

* get things compiling

* parent blob lookups

* fix compile

* revert red/stevie changes

* fix up sync manager delay message logic

* add peer usefulness enum

* should remove lookup refactor

* consolidate retry error handling

* improve peer scoring during certain failures in parent lookups

* improve retry code

* drop parent lookup if either req has a peer disconnect during download

* refactor single block processed method

* processing peer refactor

* smol bugfix

* fix some todos

* fix lints

* fix lints

* fix compile in lookup tests

* fix lints

* fix lints

* fix existing block lookup tests

* renamings

* fix after merge

* cargo fmt

* compilation fix in beacon chain tests

* fix

* refactor lookup tests to work with multiple forks and response types

* make tests into macros

* wrap availability check error

* fix compile after merge

* add random blobs

* start fixing up lookup verify error handling

* some bug fixes and the start of deneb only tests

* make tests work for all forks

* track information about peer source

* error refactoring

* improve peer scoring

* fix test compilation

* make sure blobs are sent for processing after stream termination, delete copied tests

* add some tests and fix a bug

* smol bugfixes and moar tests

* add tests and fix some things

* compile after merge

* lots of refactoring

* retry on invalid block/blob

* merge unknown parent messages before current slot lookup

* get tests compiling

* penalize blob peer on invalid blobs

* Check disk on in-memory cache miss

* Update beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs

* Update beacon_node/network/src/sync/network_context.rs

Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>

* fix bug in matching blocks and blobs in range sync

* pr feedback

* fix conflicts

* upgrade logs from warn to crit when we receive incorrect response in range

* synced_and_connected_within_tolerance -> should_search_for_block

* remove todo

* Fix Broken Overflow Tests

* fix merge conflicts

* checkpoint sync without alignment

* add import

* query for checkpoint state by slot rather than state root (teku doesn't serve by state root)

* get state first and query by most recent block root

* simplify delay logic

* rename unknown parent sync message variants

* rename parameter, block_slot -> slot

* add some docs to the lookup module

* use interval instead of sleep

* drop request if blocks and blobs requests both return `None` for `Id`

* clean up `find_single_lookup` logic

* add lookup source enum

* clean up `find_single_lookup` logic

* add docs to find_single_lookup_request

* move LookupSource our of param where unnecessary

* remove unnecessary todo

* query for block by `state.latest_block_header.slot`

* fix lint

* fix test

* fix test

* fix observed  blob sidecars test

* PR updates

* use optional params instead of a closure

* create lookup and trigger request in separate method calls

* remove `LookupSource`

* make sure duplicate lookups are not dropped

---------

Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
Co-authored-by: Mark Mackey <mark@sigmaprime.io>
Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
This commit is contained in:
realbigsean
2023-06-15 12:59:10 -04:00
committed by GitHub
parent 5428e68943
commit a62e52f319
47 changed files with 4981 additions and 1309 deletions

View File

@@ -65,6 +65,7 @@ use std::{cmp, collections::HashSet};
use task_executor::TaskExecutor;
use tokio::sync::mpsc;
use tokio::sync::mpsc::error::TrySendError;
use types::blob_sidecar::FixedBlobSidecarList;
use types::{
Attestation, AttesterSlashing, Hash256, LightClientFinalityUpdate, LightClientOptimisticUpdate,
ProposerSlashing, SignedAggregateAndProof, SignedBeaconBlock, SignedBlobSidecar,
@@ -121,9 +122,9 @@ const MAX_AGGREGATED_ATTESTATION_REPROCESS_QUEUE_LEN: usize = 1_024;
/// before we start dropping them.
const MAX_GOSSIP_BLOCK_QUEUE_LEN: usize = 1_024;
/// The maximum number of queued `SignedBeaconBlockAndBlobsSidecar` objects received on gossip that
/// The maximum number of queued `SignedBlobSidecar` objects received on gossip that
/// will be stored before we start dropping them.
const MAX_GOSSIP_BLOCK_AND_BLOB_QUEUE_LEN: usize = 1_024;
const MAX_GOSSIP_BLOB_QUEUE_LEN: usize = 1_024;
/// The maximum number of queued `SignedBeaconBlock` objects received prior to their slot (but
/// within acceptable clock disparity) that will be queued before we start dropping them.
@@ -164,6 +165,7 @@ const MAX_SYNC_CONTRIBUTION_QUEUE_LEN: usize = 1024;
/// The maximum number of queued `SignedBeaconBlock` objects received from the network RPC that
/// will be stored before we start dropping them.
const MAX_RPC_BLOCK_QUEUE_LEN: usize = 1_024;
const MAX_RPC_BLOB_QUEUE_LEN: usize = 1_024 * 4;
/// The maximum number of queued `Vec<SignedBeaconBlock>` objects received during syncing that will
/// be stored before we start dropping them.
@@ -233,6 +235,7 @@ pub const GOSSIP_SYNC_CONTRIBUTION: &str = "gossip_sync_contribution";
pub const GOSSIP_LIGHT_CLIENT_FINALITY_UPDATE: &str = "light_client_finality_update";
pub const GOSSIP_LIGHT_CLIENT_OPTIMISTIC_UPDATE: &str = "light_client_optimistic_update";
pub const RPC_BLOCK: &str = "rpc_block";
pub const RPC_BLOB: &str = "rpc_blob";
pub const CHAIN_SEGMENT: &str = "chain_segment";
pub const CHAIN_SEGMENT_BACKFILL: &str = "chain_segment_backfill";
pub const STATUS_PROCESSING: &str = "status_processing";
@@ -628,6 +631,23 @@ impl<T: BeaconChainTypes> WorkEvent<T> {
}
}
pub fn rpc_blobs(
block_root: Hash256,
blobs: FixedBlobSidecarList<T::EthSpec>,
seen_timestamp: Duration,
process_type: BlockProcessType,
) -> Self {
Self {
drop_during_sync: false,
work: Work::RpcBlobs {
block_root,
blobs,
seen_timestamp,
process_type,
},
}
}
/// Create a new work event to import `blocks` as a beacon chain segment.
pub fn chain_segment(
process_id: ChainSegmentProcessId,
@@ -927,6 +947,12 @@ pub enum Work<T: BeaconChainTypes> {
process_type: BlockProcessType,
should_process: bool,
},
RpcBlobs {
block_root: Hash256,
blobs: FixedBlobSidecarList<T::EthSpec>,
seen_timestamp: Duration,
process_type: BlockProcessType,
},
ChainSegment {
process_id: ChainSegmentProcessId,
blocks: Vec<BlockWrapper<T::EthSpec>>,
@@ -986,6 +1012,7 @@ impl<T: BeaconChainTypes> Work<T> {
Work::GossipLightClientFinalityUpdate { .. } => GOSSIP_LIGHT_CLIENT_FINALITY_UPDATE,
Work::GossipLightClientOptimisticUpdate { .. } => GOSSIP_LIGHT_CLIENT_OPTIMISTIC_UPDATE,
Work::RpcBlock { .. } => RPC_BLOCK,
Work::RpcBlobs { .. } => RPC_BLOB,
Work::ChainSegment {
process_id: ChainSegmentProcessId::BackSyncBatchId { .. },
..
@@ -1148,11 +1175,11 @@ impl<T: BeaconChainTypes> BeaconProcessor<T> {
// Using a FIFO queue since blocks need to be imported sequentially.
let mut rpc_block_queue = FifoQueue::new(MAX_RPC_BLOCK_QUEUE_LEN);
let mut rpc_blob_queue = FifoQueue::new(MAX_RPC_BLOB_QUEUE_LEN);
let mut chain_segment_queue = FifoQueue::new(MAX_CHAIN_SEGMENT_QUEUE_LEN);
let mut backfill_chain_segment = FifoQueue::new(MAX_CHAIN_SEGMENT_QUEUE_LEN);
let mut gossip_block_queue = FifoQueue::new(MAX_GOSSIP_BLOCK_QUEUE_LEN);
let mut gossip_block_and_blobs_sidecar_queue =
FifoQueue::new(MAX_GOSSIP_BLOCK_AND_BLOB_QUEUE_LEN);
let mut gossip_blob_queue = FifoQueue::new(MAX_GOSSIP_BLOB_QUEUE_LEN);
let mut delayed_block_queue = FifoQueue::new(MAX_DELAYED_BLOCK_QUEUE_LEN);
let mut status_queue = FifoQueue::new(MAX_STATUS_QUEUE_LEN);
@@ -1302,6 +1329,8 @@ impl<T: BeaconChainTypes> BeaconProcessor<T> {
// evolves.
} else if let Some(item) = rpc_block_queue.pop() {
self.spawn_worker(item, toolbox);
} else if let Some(item) = rpc_blob_queue.pop() {
self.spawn_worker(item, toolbox);
// Check delayed blocks before gossip blocks, the gossip blocks might rely
// on the delayed ones.
} else if let Some(item) = delayed_block_queue.pop() {
@@ -1310,7 +1339,7 @@ impl<T: BeaconChainTypes> BeaconProcessor<T> {
// required to verify some attestations.
} else if let Some(item) = gossip_block_queue.pop() {
self.spawn_worker(item, toolbox);
} else if let Some(item) = gossip_block_and_blobs_sidecar_queue.pop() {
} else if let Some(item) = gossip_blob_queue.pop() {
self.spawn_worker(item, toolbox);
// Check the aggregates, *then* the unaggregates since we assume that
// aggregates are more valuable to local validators and effectively give us
@@ -1526,7 +1555,7 @@ impl<T: BeaconChainTypes> BeaconProcessor<T> {
gossip_block_queue.push(work, work_id, &self.log)
}
Work::GossipSignedBlobSidecar { .. } => {
gossip_block_and_blobs_sidecar_queue.push(work, work_id, &self.log)
gossip_blob_queue.push(work, work_id, &self.log)
}
Work::DelayedImportBlock { .. } => {
delayed_block_queue.push(work, work_id, &self.log)
@@ -1551,6 +1580,7 @@ impl<T: BeaconChainTypes> BeaconProcessor<T> {
optimistic_update_queue.push(work, work_id, &self.log)
}
Work::RpcBlock { .. } => rpc_block_queue.push(work, work_id, &self.log),
Work::RpcBlobs { .. } => rpc_blob_queue.push(work, work_id, &self.log),
Work::ChainSegment { ref process_id, .. } => match process_id {
ChainSegmentProcessId::RangeBatchId { .. }
| ChainSegmentProcessId::ParentLookup { .. } => {
@@ -1620,6 +1650,10 @@ impl<T: BeaconChainTypes> BeaconProcessor<T> {
&metrics::BEACON_PROCESSOR_RPC_BLOCK_QUEUE_TOTAL,
rpc_block_queue.len() as i64,
);
metrics::set_gauge(
&metrics::BEACON_PROCESSOR_RPC_BLOB_QUEUE_TOTAL,
rpc_blob_queue.len() as i64,
);
metrics::set_gauge(
&metrics::BEACON_PROCESSOR_CHAIN_SEGMENT_QUEUE_TOTAL,
chain_segment_queue.len() as i64,
@@ -1977,6 +2011,17 @@ impl<T: BeaconChainTypes> BeaconProcessor<T> {
duplicate_cache,
should_process,
)),
Work::RpcBlobs {
block_root,
blobs,
seen_timestamp,
process_type,
} => task_spawner.spawn_async(worker.process_rpc_blobs(
block_root,
blobs,
seen_timestamp,
process_type,
)),
/*
* Verification for a chain segment (multiple blocks).
*/

View File

@@ -14,8 +14,7 @@ use super::MAX_SCHEDULED_WORK_QUEUE_LEN;
use crate::beacon_processor::{ChainSegmentProcessId, Work, WorkEvent};
use crate::metrics;
use crate::sync::manager::BlockProcessType;
use beacon_chain::blob_verification::AsBlock;
use beacon_chain::blob_verification::BlockWrapper;
use beacon_chain::blob_verification::{AsBlock, BlockWrapper};
use beacon_chain::{BeaconChainTypes, GossipVerifiedBlock, MAXIMUM_GOSSIP_CLOCK_DISPARITY};
use fnv::FnvHashMap;
use futures::task::Poll;

View File

@@ -682,19 +682,15 @@ impl<T: BeaconChainTypes> Worker<T> {
}
Err(err) => {
match err {
BlobError::BlobParentUnknown {
blob_root,
blob_parent_root,
} => {
BlobError::BlobParentUnknown(blob) => {
debug!(
self.log,
"Unknown parent hash for blob";
"action" => "requesting parent",
"blob_root" => %blob_root,
"parent_root" => %blob_parent_root
"blob_root" => %blob.block_root,
"parent_root" => %blob.block_parent_root
);
// TODO: send blob to reprocessing queue and queue a sync request for the blob.
todo!();
self.send_sync_message(SyncMessage::UnknownParentBlob(peer_id, blob));
}
BlobError::ProposerSignatureInvalid
| BlobError::UnknownValidator(_)
@@ -757,28 +753,42 @@ impl<T: BeaconChainTypes> Worker<T> {
// This value is not used presently, but it might come in handy for debugging.
_seen_duration: Duration,
) {
// TODO
let blob_root = verified_blob.block_root();
let blob_slot = verified_blob.slot();
let blob_clone = verified_blob.clone().to_blob();
match self
.chain
.process_blob(verified_blob, CountUnrealized::True)
.await
{
Ok(AvailabilityProcessingStatus::Imported(_hash)) => {
todo!()
// add to metrics
// logging
//TODO(sean) add metrics and logging
self.chain.recompute_head_at_current_slot().await;
}
Ok(AvailabilityProcessingStatus::PendingBlobs(pending_blobs)) => self
.send_sync_message(SyncMessage::UnknownBlobHash {
Ok(AvailabilityProcessingStatus::MissingComponents(slot, block_hash)) => {
self.send_sync_message(SyncMessage::MissingGossipBlockComponents(
slot, peer_id, block_hash,
));
}
Err(err) => {
debug!(
self.log,
"Invalid gossip blob";
"outcome" => ?err,
"block root" => ?blob_root,
"block slot" => blob_slot,
"blob index" => blob_clone.index,
);
self.gossip_penalize_peer(
peer_id,
pending_blobs,
}),
Ok(AvailabilityProcessingStatus::PendingBlock(block_hash)) => {
self.send_sync_message(SyncMessage::UnknownBlockHash(peer_id, block_hash));
}
Err(_err) => {
// handle errors
todo!()
PeerAction::MidToleranceError,
"bad_gossip_blob_ssz",
);
trace!(
self.log,
"Invalid gossip blob ssz";
"ssz" => format_args!("0x{}", hex::encode(blob_clone.as_ssz_bytes())),
);
}
}
}
@@ -918,16 +928,13 @@ impl<T: BeaconChainTypes> Worker<T> {
verified_block
}
Err(BlockError::AvailabilityCheck(_err)) => {
todo!()
}
Err(BlockError::ParentUnknown(block)) => {
debug!(
self.log,
"Unknown parent for gossip block";
"root" => ?block_root
);
self.send_sync_message(SyncMessage::UnknownBlock(peer_id, block, block_root));
self.send_sync_message(SyncMessage::UnknownParentBlock(peer_id, block, block_root));
return None;
}
Err(e @ BlockError::BeaconChainError(_)) => {
@@ -987,8 +994,8 @@ impl<T: BeaconChainTypes> Worker<T> {
);
return None;
}
Err(e @ BlockError::BlobValidation(_)) => {
warn!(self.log, "Could not verify blob for gossip. Rejecting the block and blob";
Err(e @ BlockError::BlobValidation(_)) | Err(e @ BlockError::AvailabilityCheck(_)) => {
warn!(self.log, "Could not verify block against known blobs in gossip. Rejecting the block";
"error" => %e);
self.propagate_validation_result(message_id, peer_id, MessageAcceptance::Reject);
self.gossip_penalize_peer(
@@ -1132,23 +1139,13 @@ impl<T: BeaconChainTypes> Worker<T> {
self.chain.recompute_head_at_current_slot().await;
}
Ok(AvailabilityProcessingStatus::PendingBlock(block_root)) => {
// This error variant doesn't make any sense in this context
crit!(
self.log,
"Internal error. Cannot get AvailabilityProcessingStatus::PendingBlock on processing block";
"block_root" => %block_root
);
}
Ok(AvailabilityProcessingStatus::PendingBlobs(pending_blobs)) => {
Ok(AvailabilityProcessingStatus::MissingComponents(slot, block_root)) => {
// make rpc request for blob
self.send_sync_message(SyncMessage::UnknownBlobHash {
self.send_sync_message(SyncMessage::MissingGossipBlockComponents(
*slot,
peer_id,
pending_blobs: pending_blobs.to_vec(),
});
}
Err(BlockError::AvailabilityCheck(_)) => {
todo!()
*block_root,
));
}
Err(BlockError::ParentUnknown(block)) => {
// Inform the sync manager to find parents for this block
@@ -1158,7 +1155,7 @@ impl<T: BeaconChainTypes> Worker<T> {
"Block with unknown parent attempted to be processed";
"peer_id" => %peer_id
);
self.send_sync_message(SyncMessage::UnknownBlock(
self.send_sync_message(SyncMessage::UnknownParentBlock(
peer_id,
block.clone(),
block_root,
@@ -1997,7 +1994,10 @@ impl<T: BeaconChainTypes> Worker<T> {
// We don't know the block, get the sync manager to handle the block lookup, and
// send the attestation to be scheduled for re-processing.
self.sync_tx
.send(SyncMessage::UnknownBlockHash(peer_id, *beacon_block_root))
.send(SyncMessage::UnknownBlockHashFromAttestation(
peer_id,
*beacon_block_root,
))
.unwrap_or_else(|_| {
warn!(
self.log,

View File

@@ -5,7 +5,7 @@ use crate::beacon_processor::work_reprocessing_queue::QueuedRpcBlock;
use crate::beacon_processor::worker::FUTURE_SLOT_TOLERANCE;
use crate::beacon_processor::DuplicateCache;
use crate::metrics;
use crate::sync::manager::{BlockProcessType, SyncMessage};
use crate::sync::manager::{BlockProcessType, ResponseType, SyncMessage};
use crate::sync::{BatchProcessResult, ChainId};
use beacon_chain::blob_verification::BlockWrapper;
use beacon_chain::blob_verification::{AsBlock, MaybeAvailableBlock};
@@ -21,6 +21,7 @@ use slog::{debug, error, info, warn};
use slot_clock::SlotClock;
use std::time::{SystemTime, UNIX_EPOCH};
use tokio::sync::mpsc;
use types::blob_sidecar::FixedBlobSidecarList;
use types::{Epoch, Hash256};
/// Id associated to a batch processing request, either a sync batch or a parent lookup.
@@ -57,9 +58,10 @@ impl<T: BeaconChainTypes> Worker<T> {
) {
if !should_process {
// Sync handles these results
self.send_sync_message(SyncMessage::BlockProcessed {
self.send_sync_message(SyncMessage::BlockComponentProcessed {
process_type,
result: crate::sync::manager::BlockProcessResult::Ignored,
result: crate::sync::manager::BlockProcessingResult::Ignored,
response_type: crate::sync::manager::ResponseType::Block,
});
return;
}
@@ -180,7 +182,8 @@ impl<T: BeaconChainTypes> Worker<T> {
metrics::inc_counter(&metrics::BEACON_PROCESSOR_RPC_BLOCK_IMPORTED_TOTAL);
// RPC block imported, regardless of process type
//TODO(sean) handle pending availability variants
//TODO(sean) do we need to do anything here for missing blobs? or is passing the result
// along to sync enough?
if let &Ok(AvailabilityProcessingStatus::Imported(hash)) = &result {
info!(self.log, "New RPC block received"; "slot" => slot, "hash" => %hash);
@@ -205,15 +208,50 @@ impl<T: BeaconChainTypes> Worker<T> {
}
}
// Sync handles these results
self.send_sync_message(SyncMessage::BlockProcessed {
self.send_sync_message(SyncMessage::BlockComponentProcessed {
process_type,
result: result.into(),
response_type: ResponseType::Block,
});
// Drop the handle to remove the entry from the cache
drop(handle);
}
pub async fn process_rpc_blobs(
self,
block_root: Hash256,
blobs: FixedBlobSidecarList<T::EthSpec>,
_seen_timestamp: Duration,
process_type: BlockProcessType,
) {
let Some(slot) = blobs.iter().find_map(|blob|{
blob.as_ref().map(|blob| blob.slot)
}) else {
return;
};
let result = self
.chain
.check_availability_and_maybe_import(
slot,
|chain| {
chain
.data_availability_checker
.put_rpc_blobs(block_root, blobs)
},
CountUnrealized::True,
)
.await;
// Sync handles these results
self.send_sync_message(SyncMessage::BlockComponentProcessed {
process_type,
result: result.into(),
response_type: ResponseType::Blob,
});
}
/// Attempt to import the chain segment (`blocks`) to the beacon chain, informing the sync
/// thread if more blocks are needed to process it.
pub async fn process_chain_segment(