Batch BLS verification for attestations (#2399)

## Issue Addressed

NA

## Proposed Changes

Adds the ability to verify batches of aggregated/unaggregated attestations from the network.

When the `BeaconProcessor` finds there are messages in the aggregated or unaggregated attestation queues, it will first check the length of the queue:

- `== 1`: verify the attestation individually.
- `>= 2`: take up to 64 of those attestations and verify them in a batch.

Notably, we only perform batch verification if the queue has a backlog. We don't apply any artificial delays to attestations to try to force them into batches.
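The queue-length rule above can be sketched as follows. This is a minimal illustration, not the code from this PR: `Dispatch`, `next_dispatch` and `MAX_BATCH_SIZE` are hypothetical names, and a plain `VecDeque` stands in for the processor's real queue type (whose pop order may differ).

```rust
use std::collections::VecDeque;

/// Hypothetical cap mirroring the "up to 64" rule described above.
const MAX_BATCH_SIZE: usize = 64;

/// What the scheduler decided to do with the current queue contents.
#[derive(Debug, PartialEq)]
enum Dispatch<T> {
    /// The queue was empty.
    Nothing,
    /// Exactly one item was queued: verify it individually.
    Single(T),
    /// Two or more items were queued: verify them together.
    Batch(Vec<T>),
}

/// Decide how to process the queue without waiting for more items to arrive.
fn next_dispatch<T>(queue: &mut VecDeque<T>) -> Dispatch<T> {
    let batch_size = queue.len().min(MAX_BATCH_SIZE);
    match batch_size {
        0 => Dispatch::Nothing,
        1 => match queue.pop_front() {
            Some(item) => Dispatch::Single(item),
            None => Dispatch::Nothing,
        },
        // Take at most `MAX_BATCH_SIZE` items; the rest stay queued.
        _ => Dispatch::Batch(queue.drain(..batch_size).collect()),
    }
}
```

Because the decision only looks at the current queue length, a lightly loaded node keeps verifying attestations one at a time, and batching only kicks in when a backlog has already formed.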

### Batching Details

To assist with implementing batches we modify `beacon_chain::attestation_verification` to have two distinct categories for attestations:

- *Indexed* attestations: those which have passed initial validation and were valid enough for us to derive an `IndexedAttestation`.
- *Verified* attestations: those attestations which were indexed *and also* passed signature verification. These are well-formed, interesting messages which were signed by validators.

The batching functions accept `n` attestations and then return `n` attestation verification `Result`s, where those `Result`s can be any combination of `Ok` or `Err`. In other words, we attempt to verify as many attestations as possible and return specific per-attestation results so peer scores can be updated, if required.

When we batch verify attestations, we first try to map all those attestations to *indexed* attestations. If any of those attestations could be indexed, we then perform batch BLS verification on those indexed attestations. If the batch verification succeeds, we convert them into *verified* attestations with individual signature checking disabled. If the batch fails, we convert them to verified attestations with individual signature checking enabled.

Ultimately, we optimistically try to do a batch verification of attestation signatures and fall back to individual verification if it fails. This opens an attack vector for "poisoning" the attestations and causing us to waste a batch verification. I argue that peer scoring should do a good-enough job of defending against this and that the typical-case gains massively outweigh the worst-case losses.
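The optimistic flow can be illustrated with a toy model. Everything here is an assumption made for illustration: `Indexed`, `VerifyError`, `batch_verify`, `verify_individually` and `verify_batch` are hypothetical names, and a boolean field stands in for a real BLS signature check.

```rust
/// Stand-in for an indexed attestation; `sig_ok` fakes the result of a BLS check.
#[derive(Debug, Clone, PartialEq)]
struct Indexed {
    id: u64,
    sig_ok: bool,
}

#[derive(Debug, PartialEq)]
enum VerifyError {
    /// Failed initial validation, so it could not be indexed.
    NotIndexable,
    /// Was indexed, but its signature did not verify.
    BadSignature,
}

/// Toy batch check: succeeds only if every signature in the batch is valid.
fn batch_verify(items: &[&Indexed]) -> bool {
    items.iter().all(|item| item.sig_ok)
}

/// Toy individual check.
fn verify_individually(item: &Indexed) -> bool {
    item.sig_ok
}

/// Takes `n` indexing results and returns `n` verification results, in order.
fn verify_batch(indexed: Vec<Result<Indexed, VerifyError>>) -> Vec<Result<Indexed, VerifyError>> {
    // Only attempt the batch check if at least one item could be indexed.
    let batch_ok = {
        let candidates: Vec<&Indexed> = indexed.iter().filter_map(|r| r.as_ref().ok()).collect();
        let ok = !candidates.is_empty() && batch_verify(&candidates);
        ok
    };
    indexed
        .into_iter()
        .map(|result| {
            result.and_then(|item| {
                // A successful batch lets us skip the per-item check; a failed
                // batch (possibly "poisoned") falls back to checking each item.
                if batch_ok || verify_individually(&item) {
                    Ok(item)
                } else {
                    Err(VerifyError::BadSignature)
                }
            })
        })
        .collect()
}
```

In this model a single bad signature makes the batch check a wasted effort, but the valid attestations in the same batch are still accepted by the fallback path, and the offending item gets its own `Err` so its sender can be penalised.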

## Additional Info

Before this PR, attestation verification took the attestations by value (instead of by reference). It turns out that this was unnecessary and, in my opinion, resulted in some undesirable ergonomics (e.g., we had to pass the attestation back in the `Err` variant to avoid clones). In this PR I've modified attestation verification so that it now takes a reference.
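The ergonomic difference can be shown with a small, self-contained sketch. The types and functions below (`Attestation`, `Error`, `verify_by_value`, `verify_by_ref`) are hypothetical simplifications, not the signatures used in `beacon_chain`.

```rust
/// Simplified stand-in for an attestation.
#[derive(Debug, Clone, PartialEq)]
struct Attestation {
    slot: u64,
}

#[derive(Debug, PartialEq)]
enum Error {
    FutureSlot,
}

/// By-value style: the caller gives up ownership, so the error path must hand
/// the attestation back if the caller still needs it (otherwise it must clone).
fn verify_by_value(
    att: Attestation,
    current_slot: u64,
) -> Result<Attestation, (Error, Attestation)> {
    if att.slot > current_slot {
        Err((Error::FutureSlot, att))
    } else {
        Ok(att)
    }
}

/// By-reference style: the caller keeps ownership in both outcomes, so the
/// error type carries only the error.
fn verify_by_ref(att: &Attestation, current_slot: u64) -> Result<(), Error> {
    if att.slot > current_slot {
        Err(Error::FutureSlot)
    } else {
        Ok(())
    }
}
```

With the by-reference form, a caller that wants to re-queue or log a rejected attestation can simply keep using its own value after the call returns.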

I refactored the `beacon_chain/tests/attestation_verification.rs` tests so they use a builder-esque "tester" struct instead of a weird macro. It made it easier for me to test individual/batch with the same set of tests and I think it was a nice tidy-up. Notably, I did this last to try and make sure my new refactors to *actual* production code would pass under the existing test suite.
Paul Hauner
2021-09-22 08:49:41 +00:00
parent 9667dc2f03
commit be11437c27
13 changed files with 1962 additions and 1037 deletions


@@ -46,7 +46,8 @@ use eth2_libp2p::{
};
use futures::stream::{Stream, StreamExt};
use futures::task::Poll;
-use slog::{debug, error, trace, warn, Logger};
+use slog::{crit, debug, error, trace, warn, Logger};
+use std::cmp;
use std::collections::VecDeque;
use std::fmt;
use std::pin::Pin;
@@ -70,7 +71,7 @@ mod tests;
mod work_reprocessing_queue;
mod worker;
-pub use worker::ProcessId;
+pub use worker::{GossipAggregatePackage, GossipAttestationPackage, ProcessId};
/// The maximum size of the channel for work events to the `BeaconProcessor`.
///
@@ -159,11 +160,27 @@ const WORKER_TASK_NAME: &str = "beacon_processor_worker";
/// The minimum interval between log messages indicating that a queue is full.
const LOG_DEBOUNCE_INTERVAL: Duration = Duration::from_secs(30);
/// The `MAX_..._BATCH_SIZE` variables define how many attestations can be included in a single
/// batch.
///
/// Choosing these values is difficult since there is a trade-off between:
///
/// - It is faster to verify one large batch than multiple smaller batches.
/// - "Poisoning" attacks have a larger impact as the batch size increases.
///
/// Poisoning occurs when an invalid signature is included in a batch of attestations. A single
/// invalid signature causes the entire batch to fail. When a batch fails, we fall-back to
/// individually verifying each attestation signature.
const MAX_GOSSIP_ATTESTATION_BATCH_SIZE: usize = 64;
const MAX_GOSSIP_AGGREGATE_BATCH_SIZE: usize = 64;
/// Unique IDs used for metrics and testing.
pub const WORKER_FREED: &str = "worker_freed";
pub const NOTHING_TO_DO: &str = "nothing_to_do";
pub const GOSSIP_ATTESTATION: &str = "gossip_attestation";
pub const GOSSIP_ATTESTATION_BATCH: &str = "gossip_attestation_batch";
pub const GOSSIP_AGGREGATE: &str = "gossip_aggregate";
pub const GOSSIP_AGGREGATE_BATCH: &str = "gossip_aggregate_batch";
pub const GOSSIP_BLOCK: &str = "gossip_block";
pub const DELAYED_IMPORT_BLOCK: &str = "delayed_import_block";
pub const GOSSIP_VOLUNTARY_EXIT: &str = "gossip_voluntary_exit";
@@ -564,6 +581,9 @@ pub enum Work<T: BeaconChainTypes> {
should_import: bool,
seen_timestamp: Duration,
},
GossipAttestationBatch {
packages: Vec<GossipAttestationPackage<T::EthSpec>>,
},
GossipAggregate {
message_id: MessageId,
peer_id: PeerId,
@@ -576,6 +596,9 @@ pub enum Work<T: BeaconChainTypes> {
aggregate: Box<SignedAggregateAndProof<T::EthSpec>>,
seen_timestamp: Duration,
},
GossipAggregateBatch {
packages: Vec<GossipAggregatePackage<T::EthSpec>>,
},
GossipBlock {
message_id: MessageId,
peer_id: PeerId,
@@ -644,7 +667,9 @@ impl<T: BeaconChainTypes> Work<T> {
fn str_id(&self) -> &'static str {
match self {
Work::GossipAttestation { .. } => GOSSIP_ATTESTATION,
Work::GossipAttestationBatch { .. } => GOSSIP_ATTESTATION_BATCH,
Work::GossipAggregate { .. } => GOSSIP_AGGREGATE,
Work::GossipAggregateBatch { .. } => GOSSIP_AGGREGATE_BATCH,
Work::GossipBlock { .. } => GOSSIP_BLOCK,
Work::DelayedImportBlock { .. } => DELAYED_IMPORT_BLOCK,
Work::GossipVoluntaryExit { .. } => GOSSIP_VOLUNTARY_EXIT,
@@ -922,10 +947,103 @@ impl<T: BeaconChainTypes> BeaconProcessor<T> {
// Check the aggregates, *then* the unaggregates since we assume that
// aggregates are more valuable to local validators and effectively give us
// more information with less signature verification time.
-} else if let Some(item) = aggregate_queue.pop() {
-self.spawn_worker(item, toolbox);
-} else if let Some(item) = attestation_queue.pop() {
-self.spawn_worker(item, toolbox);
} else if aggregate_queue.len() > 0 {
let batch_size =
cmp::min(aggregate_queue.len(), MAX_GOSSIP_AGGREGATE_BATCH_SIZE);
if batch_size < 2 {
// One single aggregate is in the queue, process it individually.
if let Some(item) = aggregate_queue.pop() {
self.spawn_worker(item, toolbox);
}
} else {
// Collect two or more aggregates into a batch, so they can take
// advantage of batch signature verification.
//
// Note: this will convert the `Work::GossipAggregate` item into a
// `Work::GossipAggregateBatch` item.
let mut packages = Vec::with_capacity(batch_size);
for _ in 0..batch_size {
if let Some(item) = aggregate_queue.pop() {
match item {
Work::GossipAggregate {
message_id,
peer_id,
aggregate,
seen_timestamp,
} => {
packages.push(GossipAggregatePackage::new(
message_id,
peer_id,
aggregate,
seen_timestamp,
));
}
_ => {
error!(self.log, "Invalid item in aggregate queue")
}
}
}
}
// Process all aggregates with a single worker.
self.spawn_worker(Work::GossipAggregateBatch { packages }, toolbox)
}
// Check the unaggregated attestation queue.
//
// Potentially use batching.
} else if attestation_queue.len() > 0 {
let batch_size = cmp::min(
attestation_queue.len(),
MAX_GOSSIP_ATTESTATION_BATCH_SIZE,
);
if batch_size < 2 {
// One single attestation is in the queue, process it individually.
if let Some(item) = attestation_queue.pop() {
self.spawn_worker(item, toolbox);
}
} else {
// Collect two or more attestations into a batch, so they can take
// advantage of batch signature verification.
//
// Note: this will convert the `Work::GossipAttestation` item into a
// `Work::GossipAttestationBatch` item.
let mut packages = Vec::with_capacity(batch_size);
for _ in 0..batch_size {
if let Some(item) = attestation_queue.pop() {
match item {
Work::GossipAttestation {
message_id,
peer_id,
attestation,
subnet_id,
should_import,
seen_timestamp,
} => {
packages.push(GossipAttestationPackage::new(
message_id,
peer_id,
attestation,
subnet_id,
should_import,
seen_timestamp,
));
}
_ => error!(
self.log,
"Invalid item in attestation queue"
),
}
}
}
// Process all attestations with a single worker.
self.spawn_worker(
Work::GossipAttestationBatch { packages },
toolbox,
)
}
// Check sync committee messages after attestations as their rewards are lesser
// and they don't influence fork choice.
} else if let Some(item) = sync_contribution_queue.pop() {
@@ -1009,7 +1127,21 @@ impl<T: BeaconChainTypes> BeaconProcessor<T> {
match work {
_ if can_spawn => self.spawn_worker(work, toolbox),
Work::GossipAttestation { .. } => attestation_queue.push(work),
// Attestation batches are formed internally within the
// `BeaconProcessor`, they are not sent from external services.
Work::GossipAttestationBatch { .. } => crit!(
self.log,
"Unsupported inbound event";
"type" => "GossipAttestationBatch"
),
Work::GossipAggregate { .. } => aggregate_queue.push(work),
// Aggregate batches are formed internally within the `BeaconProcessor`,
// they are not sent from external services.
Work::GossipAggregateBatch { .. } => crit!(
self.log,
"Unsupported inbound event";
"type" => "GossipAggregateBatch"
),
Work::GossipBlock { .. } => {
gossip_block_queue.push(work, work_id, &self.log)
}
@@ -1180,7 +1312,7 @@ impl<T: BeaconChainTypes> BeaconProcessor<T> {
match work {
/*
-* Unaggregated attestation verification.
+* Individual unaggregated attestation verification.
*/
Work::GossipAttestation {
message_id,
@@ -1192,14 +1324,19 @@ impl<T: BeaconChainTypes> BeaconProcessor<T> {
} => worker.process_gossip_attestation(
message_id,
peer_id,
-*attestation,
+attestation,
subnet_id,
should_import,
Some(work_reprocessing_tx),
seen_timestamp,
),
/*
-* Aggregated attestation verification.
+* Batched unaggregated attestation verification.
+*/
+Work::GossipAttestationBatch { packages } => worker
+.process_gossip_attestation_batch(packages, Some(work_reprocessing_tx)),
+/*
+* Individual aggregated attestation verification.
*/
Work::GossipAggregate {
message_id,
@@ -1209,10 +1346,16 @@ impl<T: BeaconChainTypes> BeaconProcessor<T> {
} => worker.process_gossip_aggregate(
message_id,
peer_id,
-*aggregate,
+aggregate,
Some(work_reprocessing_tx),
seen_timestamp,
),
/*
* Batched aggregated attestation verification.
*/
Work::GossipAggregateBatch { packages } => {
worker.process_gossip_aggregate_batch(packages, Some(work_reprocessing_tx))
}
/*
* Verification for beacon blocks received on gossip.
*/
@@ -1345,7 +1488,7 @@ impl<T: BeaconChainTypes> BeaconProcessor<T> {
} => worker.process_gossip_attestation(
message_id,
peer_id,
-*attestation,
+attestation,
subnet_id,
should_import,
None, // Do not allow this attestation to be re-processed beyond this point.
@@ -1359,7 +1502,7 @@ impl<T: BeaconChainTypes> BeaconProcessor<T> {
} => worker.process_gossip_aggregate(
message_id,
peer_id,
-*aggregate,
+aggregate,
None,
seen_timestamp,
),