Handle processing results of non faulty batches (#3439)

## Issue Addressed
Solves #3390 

After checking some logs that @pawanjay176 got, we concluded that this happened because we blacklisted a chain after trying it "too much". In every occurrence, "too much" meant too many download failures. These accumulated very slowly, precisely because batches are allowed to stay alive for a very long time now that penalties are not counted while the EE is offline. The error, then, was not that the batch failed because of offline-EE errors, but that we blacklisted the chain because of download errors, which can't be pinned on the chain, only on the peer. This PR fixes that.

## Proposed Changes

Adds a missing piece of logic so that a chain that fails for errors that can't be attributed to objectively bad behavior from a peer is not blacklisted. The issue at hand occurred when new peers arrived claiming a head that had been wrongfully blacklisted, even though the original peers participating in the chain were not penalized.
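The distinction can be sketched as follows. The `BatchProcessResult` variant names match the diff below; `PeerAction` is reduced to a single variant here, and `result_from_error` is a hypothetical stand-in for the worker's error handling, not Lighthouse's actual API:

```rust
/// Stand-in for Lighthouse's `PeerAction`, reduced to one variant for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PeerAction {
    LowToleranceError,
}

/// Result of processing a batch, mirroring the variants introduced in this PR.
#[derive(Debug, PartialEq)]
pub enum BatchProcessResult {
    Success { was_non_empty: bool },
    FaultyFailure { imported_blocks: bool, penalty: PeerAction },
    NonFaultyFailure,
}

/// A failure is attributed to the peers only when it carries a penalty;
/// internal errors (e.g. an offline EE) produce `NonFaultyFailure` and must
/// not count towards blacklisting the chain.
pub fn result_from_error(
    imported_blocks: bool,
    peer_action: Option<PeerAction>,
) -> BatchProcessResult {
    match peer_action {
        Some(penalty) => BatchProcessResult::FaultyFailure { imported_blocks, penalty },
        None => BatchProcessResult::NonFaultyFailure,
    }
}
```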

Another notable change is that we need to consider a batch invalid if it processed correctly but its next non-empty batch fails processing. And since a batch can now fail processing in non-faulty ways, there is no need to mark previous batches as invalid.
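At the chain level, the intended behavior can be illustrated with a toy model. Everything here except the variant names is hypothetical (the real logic lives in Lighthouse's range-sync code): only faulty failures count towards removing the chain.

```rust
// Illustrative only: a toy chain that tracks faulty processing attempts.
// Variant names follow the diff below; `penalty` is a placeholder for `PeerAction`.

#[derive(Debug, PartialEq)]
pub enum BatchProcessResult {
    Success { was_non_empty: bool },
    FaultyFailure { imported_blocks: bool, penalty: u8 },
    NonFaultyFailure,
}

pub struct Chain {
    pub faulty_attempts: usize,
    pub max_faulty_attempts: usize,
}

impl Chain {
    /// Returns `true` if the chain should be removed (blacklisted).
    pub fn on_batch_result(&mut self, result: &BatchProcessResult) -> bool {
        match result {
            // Only failures attributable to peers count towards blacklisting.
            BatchProcessResult::FaultyFailure { .. } => {
                self.faulty_attempts += 1;
                self.faulty_attempts >= self.max_faulty_attempts
            }
            // Internal errors (offline EE, timeouts, ...) retry without penalty.
            BatchProcessResult::NonFaultyFailure => false,
            BatchProcessResult::Success { .. } => false,
        }
    }
}
```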

Improves some logging as well.

## Additional Info

We should do this regardless of pausing sync on EE offline/unsynced state. This is because I think it's almost impossible to ensure a processing result will arrive in a predictable order relative to a synced notification from the EE. Doing this handles what I think are inevitable data races once we actually pause sync.
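The race can be sketched with a toy example (purely illustrative, not Lighthouse code): two concurrent producers mean the consumer cannot assume which event arrives first.

```rust
use std::sync::mpsc;
use std::thread;

#[derive(Debug, PartialEq)]
pub enum SyncEvent {
    BatchResult { ok: bool },
    EeStatus { synced: bool },
}

/// Spawns two producers and collects their events in arrival order.
/// The order is nondeterministic: the result of a batch sent while the EE
/// was offline can land before *or* after the "EE is back" notification,
/// so the consumer must handle both orders.
pub fn collect_events() -> Vec<SyncEvent> {
    let (tx, rx) = mpsc::channel();
    let tx1 = tx.clone();
    let h1 = thread::spawn(move || tx1.send(SyncEvent::BatchResult { ok: false }).unwrap());
    let h2 = thread::spawn(move || tx.send(SyncEvent::EeStatus { synced: true }).unwrap());

    let events: Vec<SyncEvent> = rx.iter().take(2).collect();
    h1.join().unwrap();
    h2.join().unwrap();
    events
}
```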

This also fixes a return value that reports which batch failed, which had caused us some confusion when checking the logs.
Author: Divma
Date: 2022-08-12 00:56:38 +00:00
Parent: a476ae4907
Commit: f4ffa9e0b4
12 changed files with 298 additions and 274 deletions


@@ -76,9 +76,7 @@ mod work_reprocessing_queue;
 mod worker;

 use crate::beacon_processor::work_reprocessing_queue::QueuedGossipBlock;
-pub use worker::{
-    ChainSegmentProcessId, FailureMode, GossipAggregatePackage, GossipAttestationPackage,
-};
+pub use worker::{ChainSegmentProcessId, GossipAggregatePackage, GossipAttestationPackage};

 /// The maximum size of the channel for work events to the `BeaconProcessor`.
 ///


@@ -10,7 +10,7 @@ mod rpc_methods;
 mod sync_methods;

 pub use gossip_methods::{GossipAggregatePackage, GossipAttestationPackage};
-pub use sync_methods::{ChainSegmentProcessId, FailureMode};
+pub use sync_methods::ChainSegmentProcessId;

 pub(crate) const FUTURE_SLOT_TOLERANCE: u64 = 1;


@@ -34,15 +34,6 @@ struct ChainSegmentFailed {
     message: String,
     /// Used to penalize peers.
     peer_action: Option<PeerAction>,
-    /// Failure mode
-    mode: FailureMode,
 }

-/// Represents if a block processing failure was on the consensus or execution side.
-#[derive(Debug)]
-pub enum FailureMode {
-    ExecutionLayer { pause_sync: bool },
-    ConsensusLayer,
-}
-
 impl<T: BeaconChainTypes> Worker<T> {
@@ -150,7 +141,9 @@ impl<T: BeaconChainTypes> Worker<T> {
"last_block_slot" => end_slot,
"processed_blocks" => sent_blocks,
"service"=> "sync");
BatchProcessResult::Success(sent_blocks > 0)
BatchProcessResult::Success {
was_non_empty: sent_blocks > 0,
}
}
(imported_blocks, Err(e)) => {
debug!(self.log, "Batch processing failed";
@@ -161,11 +154,12 @@ impl<T: BeaconChainTypes> Worker<T> {
"imported_blocks" => imported_blocks,
"error" => %e.message,
"service" => "sync");
BatchProcessResult::Failed {
imported_blocks: imported_blocks > 0,
peer_action: e.peer_action,
mode: e.mode,
match e.peer_action {
Some(penalty) => BatchProcessResult::FaultyFailure {
imported_blocks: imported_blocks > 0,
penalty,
},
None => BatchProcessResult::NonFaultyFailure,
}
}
}
@@ -184,7 +178,9 @@ impl<T: BeaconChainTypes> Worker<T> {
"last_block_slot" => end_slot,
"processed_blocks" => sent_blocks,
"service"=> "sync");
BatchProcessResult::Success(sent_blocks > 0)
BatchProcessResult::Success {
was_non_empty: sent_blocks > 0,
}
}
(_, Err(e)) => {
debug!(self.log, "Backfill batch processing failed";
@@ -193,10 +189,12 @@ impl<T: BeaconChainTypes> Worker<T> {
"last_block_slot" => end_slot,
"error" => %e.message,
"service" => "sync");
BatchProcessResult::Failed {
imported_blocks: false,
peer_action: e.peer_action,
mode: e.mode,
match e.peer_action {
Some(penalty) => BatchProcessResult::FaultyFailure {
imported_blocks: false,
penalty,
},
None => BatchProcessResult::NonFaultyFailure,
}
}
}
@@ -216,15 +214,19 @@ impl<T: BeaconChainTypes> Worker<T> {
         {
             (imported_blocks, Err(e)) => {
                 debug!(self.log, "Parent lookup failed"; "error" => %e.message);
-                BatchProcessResult::Failed {
-                    imported_blocks: imported_blocks > 0,
-                    peer_action: e.peer_action,
-                    mode: e.mode,
+                match e.peer_action {
+                    Some(penalty) => BatchProcessResult::FaultyFailure {
+                        imported_blocks: imported_blocks > 0,
+                        penalty,
+                    },
+                    None => BatchProcessResult::NonFaultyFailure,
                 }
             }
             (imported_blocks, Ok(_)) => {
                 debug!(self.log, "Parent lookup processed successfully");
-                BatchProcessResult::Success(imported_blocks > 0)
+                BatchProcessResult::Success {
+                    was_non_empty: imported_blocks > 0,
+                }
             }
         }
     }
@@ -307,7 +309,6 @@ impl<T: BeaconChainTypes> Worker<T> {
message: String::from("mismatched_block_root"),
// The peer is faulty if they send blocks with bad roots.
peer_action: Some(PeerAction::LowToleranceError),
mode: FailureMode::ConsensusLayer,
}
}
HistoricalBlockError::InvalidSignature
@@ -322,7 +323,6 @@ impl<T: BeaconChainTypes> Worker<T> {
message: "invalid_signature".into(),
// The peer is faulty if they bad signatures.
peer_action: Some(PeerAction::LowToleranceError),
mode: FailureMode::ConsensusLayer,
}
}
HistoricalBlockError::ValidatorPubkeyCacheTimeout => {
@@ -336,7 +336,6 @@ impl<T: BeaconChainTypes> Worker<T> {
message: "pubkey_cache_timeout".into(),
// This is an internal error, do not penalize the peer.
peer_action: None,
mode: FailureMode::ConsensusLayer,
}
}
HistoricalBlockError::NoAnchorInfo => {
@@ -347,7 +346,6 @@ impl<T: BeaconChainTypes> Worker<T> {
                     // There is no need to do a historical sync, this is not a fault of
                     // the peer.
                     peer_action: None,
-                    mode: FailureMode::ConsensusLayer,
                 }
             }
             HistoricalBlockError::IndexOutOfBounds => {
@@ -360,7 +358,6 @@ impl<T: BeaconChainTypes> Worker<T> {
message: String::from("logic_error"),
// This should never occur, don't penalize the peer.
peer_action: None,
mode: FailureMode::ConsensusLayer,
}
}
HistoricalBlockError::BlockOutOfRange { .. } => {
@@ -373,7 +370,6 @@ impl<T: BeaconChainTypes> Worker<T> {
message: String::from("unexpected_error"),
// This should never occur, don't penalize the peer.
peer_action: None,
mode: FailureMode::ConsensusLayer,
}
}
},
@@ -383,7 +379,6 @@ impl<T: BeaconChainTypes> Worker<T> {
message: format!("{:?}", other),
// This is an internal error, don't penalize the peer.
peer_action: None,
mode: FailureMode::ConsensusLayer,
}
}
};
@@ -404,7 +399,6 @@ impl<T: BeaconChainTypes> Worker<T> {
message: format!("Block has an unknown parent: {}", block.parent_root()),
// Peers are faulty if they send non-sequential blocks.
peer_action: Some(PeerAction::LowToleranceError),
mode: FailureMode::ConsensusLayer,
})
}
BlockError::BlockIsAlreadyKnown => {
@@ -442,7 +436,6 @@ impl<T: BeaconChainTypes> Worker<T> {
                     ),
                     // Peers are faulty if they send blocks from the future.
                     peer_action: Some(PeerAction::LowToleranceError),
-                    mode: FailureMode::ConsensusLayer,
                 })
             }
             BlockError::WouldRevertFinalizedSlot { .. } => {
@@ -464,7 +457,6 @@ impl<T: BeaconChainTypes> Worker<T> {
message: format!("Internal error whilst processing block: {:?}", e),
// Do not penalize peers for internal errors.
peer_action: None,
mode: FailureMode::ConsensusLayer,
})
}
ref err @ BlockError::ExecutionPayloadError(ref epe) => {
@@ -480,7 +472,6 @@ impl<T: BeaconChainTypes> Worker<T> {
message: format!("Execution layer offline. Reason: {:?}", err),
// Do not penalize peers for internal errors.
peer_action: None,
mode: FailureMode::ExecutionLayer { pause_sync: true },
})
} else {
debug!(self.log,
@@ -493,7 +484,6 @@ impl<T: BeaconChainTypes> Worker<T> {
                         err
                     ),
                     peer_action: Some(PeerAction::LowToleranceError),
-                    mode: FailureMode::ExecutionLayer { pause_sync: false },
                 })
             }
         }
@@ -508,7 +498,6 @@ impl<T: BeaconChainTypes> Worker<T> {
message: format!("Peer sent invalid block. Reason: {:?}", other),
// Do not penalize peers for internal errors.
peer_action: None,
mode: FailureMode::ConsensusLayer,
})
}
}