mirror of
https://github.com/sigp/lighthouse.git
synced 2026-03-17 11:52:42 +00:00
Handle early blocks (#2155)
## Issue Addressed
NA
## Problem this PR addresses
There's an issue where Lighthouse is banning a lot of peers due to the following sequence of events:
1. Gossip block 0xabc arrives ~200ms early
- It is propagated across the network, with respect to [`MAXIMUM_GOSSIP_CLOCK_DISPARITY`](https://github.com/ethereum/eth2.0-specs/blob/v1.0.0/specs/phase0/p2p-interface.md#why-is-there-maximum_gossip_clock_disparity-when-validating-slot-ranges-of-messages-in-gossip-subnets).
- However, it is not imported to our database since the block is early.
2. Attestations for 0xabc arrive, but the block was not imported.
- The peer that sent the attestation is down-voted.
- Each unknown-block attestation causes a score loss of 1, the peer is banned at -100.
- When the peer is on an attestation subnet there can be hundreds of attestations, so the peer is banned quickly (before the missed block can be obtained via rpc).
## Potential solutions
I can think of three solutions to this:
1. Wait for attestation-queuing (#635) to arrive and solve this.
- Easy
- Not immediate fix.
- Whilst this would work, I don't think it's a perfect solution for this particular issue, rather (3) is better.
1. Allow importing blocks with a tolerance of `MAXIMUM_GOSSIP_CLOCK_DISPARITY`.
- Easy
- ~~I have implemented this, for now.~~
1. If a block is verified for gossip propagation (i.e., signature verified) and it's within `MAXIMUM_GOSSIP_CLOCK_DISPARITY`, then queue it to be processed at the start of the appropriate slot.
- More difficult
- Feels like the best solution, I will try to implement this.
**This PR takes approach (3).**
## Changes included
- Implement the `block_delay_queue`, based upon a [`DelayQueue`](https://docs.rs/tokio-util/0.6.3/tokio_util/time/delay_queue/struct.DelayQueue.html) which can store blocks until it's time to import them.
- Add a new `DelayedImportBlock` variant to the `beacon_processor::WorkEvent` enum to handle this new event.
- In the `BeaconProcessor`, refactor a `tokio::select!` to a struct with an explicit `Stream` implementation. I experienced some issues with `tokio::select!` in the block delay queue and I also found it hard to debug. I think this explicit implementation is nicer and functionally equivalent (apart from the fact that `tokio::select!` randomly chooses futures to poll, whereas now we're deterministic).
- Add a testing framework to the `beacon_processor` module that tests this new block delay logic. I also tested a handful of other operations in the beacon processor (attns, slashings, exits) since it was super easy to copy-pasta the code from the `http_api` tester.
- To implement these tests I added the concept of an optional `work_journal_tx` to the `BeaconProcessor` which will spit out a log of events. I used this in the tests to ensure that things were happening as I expect.
- The tests are a little racey, but it's hard to avoid that when testing timing-based code. If we see CI failures I can revise. I haven't observed *any* failures due to races on my machine or on CI yet.
- To assist with testing I allowed for directly setting the time on the `ManualSlotClock`.
- I gave the `beacon_processor::Worker` a `Toolbox` for two reasons; (a) it avoids changing tons of function sigs when you want to pass a new object to the worker and (b) it seemed cute.
This commit is contained in:
499
beacon_node/network/src/beacon_processor/tests.rs
Normal file
499
beacon_node/network/src/beacon_processor/tests.rs
Normal file
@@ -0,0 +1,499 @@
|
||||
#![cfg(not(debug_assertions))] // Tests are too slow in debug.
|
||||
#![cfg(test)]
|
||||
|
||||
use crate::beacon_processor::*;
|
||||
use crate::{service::NetworkMessage, sync::SyncMessage};
|
||||
use beacon_chain::{
|
||||
test_utils::{AttestationStrategy, BeaconChainHarness, BlockStrategy, EphemeralHarnessType},
|
||||
BeaconChain, MAXIMUM_GOSSIP_CLOCK_DISPARITY,
|
||||
};
|
||||
use discv5::enr::{CombinedKey, EnrBuilder};
|
||||
use environment::{null_logger, Environment, EnvironmentBuilder};
|
||||
use eth2_libp2p::{rpc::methods::MetaData, types::EnrBitfield, MessageId, NetworkGlobals, PeerId};
|
||||
use slot_clock::SlotClock;
|
||||
use std::cmp;
|
||||
use std::iter::Iterator;
|
||||
use std::sync::Arc;
|
||||
use std::time::Duration;
|
||||
use tokio::runtime::Runtime;
|
||||
use tokio::sync::mpsc;
|
||||
use types::{
|
||||
test_utils::generate_deterministic_keypairs, Attestation, AttesterSlashing, MainnetEthSpec,
|
||||
ProposerSlashing, SignedBeaconBlock, SignedVoluntaryExit, SubnetId,
|
||||
};
|
||||
|
||||
type E = MainnetEthSpec;
|
||||
type T = EphemeralHarnessType<E>;
|
||||
|
||||
const SLOTS_PER_EPOCH: u64 = 32;
|
||||
const VALIDATOR_COUNT: usize = SLOTS_PER_EPOCH as usize;
|
||||
const SMALL_CHAIN: u64 = 2;
|
||||
const LONG_CHAIN: u64 = SLOTS_PER_EPOCH * 2;
|
||||
|
||||
const TCP_PORT: u16 = 42;
|
||||
const UDP_PORT: u16 = 42;
|
||||
const SEQ_NUMBER: u64 = 0;
|
||||
|
||||
/// The default time to wait for `BeaconProcessor` events.
|
||||
const STANDARD_TIMEOUT: Duration = Duration::from_secs(10);
|
||||
|
||||
/// Provides utilities for testing the `BeaconProcessor`.
|
||||
struct TestRig {
|
||||
chain: Arc<BeaconChain<T>>,
|
||||
next_block: SignedBeaconBlock<E>,
|
||||
attestations: Vec<(Attestation<E>, SubnetId)>,
|
||||
attester_slashing: AttesterSlashing<E>,
|
||||
proposer_slashing: ProposerSlashing,
|
||||
voluntary_exit: SignedVoluntaryExit,
|
||||
beacon_processor_tx: mpsc::Sender<WorkEvent<T>>,
|
||||
work_journal_rx: mpsc::Receiver<String>,
|
||||
_network_rx: mpsc::UnboundedReceiver<NetworkMessage<E>>,
|
||||
_sync_rx: mpsc::UnboundedReceiver<SyncMessage<E>>,
|
||||
environment: Option<Environment<E>>,
|
||||
}
|
||||
|
||||
/// This custom drop implementation ensures that we shut down the tokio runtime gracefully. Without
|
||||
/// it, tests will hang indefinitely.
|
||||
impl Drop for TestRig {
|
||||
fn drop(&mut self) {
|
||||
// Causes the beacon processor to shutdown.
|
||||
self.beacon_processor_tx = mpsc::channel(MAX_WORK_EVENT_QUEUE_LEN).0;
|
||||
self.environment.take().unwrap().shutdown_on_idle();
|
||||
}
|
||||
}
|
||||
|
||||
impl TestRig {
|
||||
pub fn new(chain_length: u64) -> Self {
|
||||
let mut harness = BeaconChainHarness::new(
|
||||
MainnetEthSpec,
|
||||
generate_deterministic_keypairs(VALIDATOR_COUNT),
|
||||
);
|
||||
|
||||
harness.advance_slot();
|
||||
|
||||
for _ in 0..chain_length {
|
||||
harness.extend_chain(
|
||||
1,
|
||||
BlockStrategy::OnCanonicalHead,
|
||||
AttestationStrategy::AllValidators,
|
||||
);
|
||||
|
||||
harness.advance_slot();
|
||||
}
|
||||
|
||||
let head = harness.chain.head().unwrap();
|
||||
|
||||
assert_eq!(
|
||||
harness.chain.slot().unwrap(),
|
||||
head.beacon_block.slot() + 1,
|
||||
"precondition: current slot is one after head"
|
||||
);
|
||||
|
||||
let (next_block, _next_state) =
|
||||
harness.make_block(head.beacon_state.clone(), harness.chain.slot().unwrap());
|
||||
|
||||
let attestations = harness
|
||||
.get_unaggregated_attestations(
|
||||
&AttestationStrategy::AllValidators,
|
||||
&head.beacon_state,
|
||||
head.beacon_block_root,
|
||||
harness.chain.slot().unwrap(),
|
||||
)
|
||||
.into_iter()
|
||||
// .map(|vec| vec.into_iter().map(|(attestation, _subnet_id)| attestation))
|
||||
.flatten()
|
||||
.collect::<Vec<_>>();
|
||||
|
||||
assert!(
|
||||
!attestations.is_empty(),
|
||||
"precondition: attestations for testing"
|
||||
);
|
||||
|
||||
let attester_slashing = harness.make_attester_slashing(vec![0, 1]);
|
||||
let proposer_slashing = harness.make_proposer_slashing(2);
|
||||
let voluntary_exit = harness.make_voluntary_exit(3, harness.chain.epoch().unwrap());
|
||||
|
||||
// Changing this *after* the chain has been initialized is a bit cheeky, but it shouldn't
|
||||
// cause issue.
|
||||
//
|
||||
// This allows for testing voluntary exits without building out a massive chain.
|
||||
harness.chain.spec.shard_committee_period = 2;
|
||||
|
||||
let chain = Arc::new(harness.chain);
|
||||
|
||||
let (network_tx, _network_rx) = mpsc::unbounded_channel();
|
||||
|
||||
let log = null_logger().unwrap();
|
||||
|
||||
let (beacon_processor_tx, beacon_processor_rx) = mpsc::channel(MAX_WORK_EVENT_QUEUE_LEN);
|
||||
let (sync_tx, _sync_rx) = mpsc::unbounded_channel();
|
||||
|
||||
// Default metadata
|
||||
let meta_data = MetaData {
|
||||
seq_number: SEQ_NUMBER,
|
||||
attnets: EnrBitfield::<MainnetEthSpec>::default(),
|
||||
};
|
||||
let enr_key = CombinedKey::generate_secp256k1();
|
||||
let enr = EnrBuilder::new("v4").build(&enr_key).unwrap();
|
||||
let network_globals = Arc::new(NetworkGlobals::new(
|
||||
enr,
|
||||
TCP_PORT,
|
||||
UDP_PORT,
|
||||
meta_data,
|
||||
vec![],
|
||||
&log,
|
||||
));
|
||||
|
||||
let mut environment = EnvironmentBuilder::mainnet()
|
||||
.null_logger()
|
||||
.unwrap()
|
||||
.multi_threaded_tokio_runtime()
|
||||
.unwrap()
|
||||
.build()
|
||||
.unwrap();
|
||||
|
||||
let executor = environment.core_context().executor;
|
||||
|
||||
let (work_journal_tx, work_journal_rx) = mpsc::channel(16_364);
|
||||
|
||||
BeaconProcessor {
|
||||
beacon_chain: Arc::downgrade(&chain),
|
||||
network_tx,
|
||||
sync_tx,
|
||||
network_globals,
|
||||
executor,
|
||||
max_workers: cmp::max(1, num_cpus::get()),
|
||||
current_workers: 0,
|
||||
log: log.clone(),
|
||||
}
|
||||
.spawn_manager(beacon_processor_rx, Some(work_journal_tx));
|
||||
|
||||
Self {
|
||||
chain,
|
||||
next_block,
|
||||
attestations,
|
||||
attester_slashing,
|
||||
proposer_slashing,
|
||||
voluntary_exit,
|
||||
beacon_processor_tx,
|
||||
work_journal_rx,
|
||||
_network_rx,
|
||||
_sync_rx,
|
||||
environment: Some(environment),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn enqueue_gossip_block(&self) {
|
||||
self.beacon_processor_tx
|
||||
.try_send(WorkEvent::gossip_beacon_block(
|
||||
junk_message_id(),
|
||||
junk_peer_id(),
|
||||
Box::new(self.next_block.clone()),
|
||||
Duration::from_secs(0),
|
||||
))
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
pub fn enqueue_unaggregated_attestation(&self) {
|
||||
let (attestation, subnet_id) = self.attestations.first().unwrap().clone();
|
||||
self.beacon_processor_tx
|
||||
.try_send(WorkEvent::unaggregated_attestation(
|
||||
junk_message_id(),
|
||||
junk_peer_id(),
|
||||
attestation,
|
||||
subnet_id,
|
||||
true,
|
||||
Duration::from_secs(0),
|
||||
))
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
pub fn enqueue_gossip_attester_slashing(&self) {
|
||||
self.beacon_processor_tx
|
||||
.try_send(WorkEvent::gossip_attester_slashing(
|
||||
junk_message_id(),
|
||||
junk_peer_id(),
|
||||
Box::new(self.attester_slashing.clone()),
|
||||
))
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
pub fn enqueue_gossip_proposer_slashing(&self) {
|
||||
self.beacon_processor_tx
|
||||
.try_send(WorkEvent::gossip_proposer_slashing(
|
||||
junk_message_id(),
|
||||
junk_peer_id(),
|
||||
Box::new(self.proposer_slashing.clone()),
|
||||
))
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
pub fn enqueue_gossip_voluntary_exit(&self) {
|
||||
self.beacon_processor_tx
|
||||
.try_send(WorkEvent::gossip_voluntary_exit(
|
||||
junk_message_id(),
|
||||
junk_peer_id(),
|
||||
Box::new(self.voluntary_exit.clone()),
|
||||
))
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn runtime(&mut self) -> Arc<Runtime> {
|
||||
self.environment
|
||||
.as_mut()
|
||||
.unwrap()
|
||||
.core_context()
|
||||
.executor
|
||||
.runtime()
|
||||
.upgrade()
|
||||
.unwrap()
|
||||
}
|
||||
|
||||
/// Assert that the `BeaconProcessor` doesn't produce any events in the given `duration`.
|
||||
pub fn assert_no_events_for(&mut self, duration: Duration) {
|
||||
self.runtime().block_on(async {
|
||||
tokio::select! {
|
||||
_ = tokio::time::sleep(duration) => (),
|
||||
event = self.work_journal_rx.recv() => panic!(
|
||||
"received {:?} within {:?} when expecting no events",
|
||||
event,
|
||||
duration
|
||||
),
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
/// Assert that the `BeaconProcessor` event journal is as `expected`.
|
||||
///
|
||||
/// ## Note
|
||||
///
|
||||
/// We won't attempt to listen for any more than `expected.len()` events. As such, it makes sense
|
||||
/// to use the `NOTHING_TO_DO` event to ensure that execution has completed.
|
||||
pub fn assert_event_journal(&mut self, expected: &[&str]) {
|
||||
let events = self.runtime().block_on(async {
|
||||
let mut events = vec![];
|
||||
|
||||
let drain_future = async {
|
||||
loop {
|
||||
match self.work_journal_rx.recv().await {
|
||||
Some(event) => {
|
||||
events.push(event);
|
||||
|
||||
// Break as soon as we collect the desired number of events.
|
||||
if events.len() >= expected.len() {
|
||||
break;
|
||||
}
|
||||
}
|
||||
None => break,
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
// Drain the expected number of events from the channel, or time out and give up.
|
||||
tokio::select! {
|
||||
_ = tokio::time::sleep(STANDARD_TIMEOUT) => panic!(
|
||||
"timeout ({:?}) expired waiting for events. expected {:?} but got {:?}",
|
||||
STANDARD_TIMEOUT,
|
||||
expected,
|
||||
events
|
||||
),
|
||||
_ = drain_future => {},
|
||||
}
|
||||
|
||||
events
|
||||
});
|
||||
|
||||
assert_eq!(
|
||||
events,
|
||||
expected
|
||||
.into_iter()
|
||||
.map(|s| s.to_string())
|
||||
.collect::<Vec<_>>()
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
fn junk_peer_id() -> PeerId {
|
||||
PeerId::random()
|
||||
}
|
||||
|
||||
fn junk_message_id() -> MessageId {
|
||||
MessageId::new(&[])
|
||||
}
|
||||
|
||||
/// Blocks that arrive early should be queued for later processing.
|
||||
#[test]
|
||||
fn import_gossip_block_acceptably_early() {
|
||||
let mut rig = TestRig::new(SMALL_CHAIN);
|
||||
|
||||
let slot_start = rig
|
||||
.chain
|
||||
.slot_clock
|
||||
.start_of(rig.next_block.slot())
|
||||
.unwrap();
|
||||
|
||||
rig.chain
|
||||
.slot_clock
|
||||
.set_current_time(slot_start - MAXIMUM_GOSSIP_CLOCK_DISPARITY);
|
||||
|
||||
assert_eq!(
|
||||
rig.chain.slot().unwrap(),
|
||||
rig.next_block.slot() - 1,
|
||||
"chain should be at the correct slot"
|
||||
);
|
||||
|
||||
rig.enqueue_gossip_block();
|
||||
|
||||
rig.assert_event_journal(&[GOSSIP_BLOCK, WORKER_FREED, NOTHING_TO_DO]);
|
||||
|
||||
// Note: this section of the code is a bit race-y. We're assuming that we can set the slot clock
|
||||
// and check the head in the time between the block arrived early and when its due for
|
||||
// processing.
|
||||
//
|
||||
// If this causes issues we might be able to make the block delay queue add a longer delay for
|
||||
// processing, instead of just MAXIMUM_GOSSIP_CLOCK_DISPARITY. Speak to @paulhauner if this test
|
||||
// starts failing.
|
||||
rig.chain.slot_clock.set_slot(rig.next_block.slot().into());
|
||||
assert!(
|
||||
rig.chain.head().unwrap().beacon_block_root != rig.next_block.canonical_root(),
|
||||
"block not yet imported"
|
||||
);
|
||||
|
||||
rig.assert_event_journal(&[DELAYED_IMPORT_BLOCK, WORKER_FREED, NOTHING_TO_DO]);
|
||||
|
||||
assert_eq!(
|
||||
rig.chain.head().unwrap().beacon_block_root,
|
||||
rig.next_block.canonical_root(),
|
||||
"block should be imported and become head"
|
||||
);
|
||||
}
|
||||
|
||||
/// Blocks that are *too* early shouldn't get into the delay queue.
|
||||
#[test]
|
||||
fn import_gossip_block_unacceptably_early() {
|
||||
let mut rig = TestRig::new(SMALL_CHAIN);
|
||||
|
||||
let slot_start = rig
|
||||
.chain
|
||||
.slot_clock
|
||||
.start_of(rig.next_block.slot())
|
||||
.unwrap();
|
||||
|
||||
rig.chain
|
||||
.slot_clock
|
||||
.set_current_time(slot_start - MAXIMUM_GOSSIP_CLOCK_DISPARITY - Duration::from_millis(1));
|
||||
|
||||
assert_eq!(
|
||||
rig.chain.slot().unwrap(),
|
||||
rig.next_block.slot() - 1,
|
||||
"chain should be at the correct slot"
|
||||
);
|
||||
|
||||
rig.enqueue_gossip_block();
|
||||
|
||||
rig.assert_event_journal(&[GOSSIP_BLOCK, WORKER_FREED, NOTHING_TO_DO]);
|
||||
|
||||
// Waiting for 5 seconds is a bit arbtirary, however it *should* be long enough to ensure the
|
||||
// block isn't imported.
|
||||
rig.assert_no_events_for(Duration::from_secs(5));
|
||||
|
||||
assert!(
|
||||
rig.chain.head().unwrap().beacon_block_root != rig.next_block.canonical_root(),
|
||||
"block should not be imported"
|
||||
);
|
||||
}
|
||||
|
||||
/// Blocks that arrive on-time should be processed normally.
|
||||
#[test]
|
||||
fn import_gossip_block_at_current_slot() {
|
||||
let mut rig = TestRig::new(SMALL_CHAIN);
|
||||
|
||||
assert_eq!(
|
||||
rig.chain.slot().unwrap(),
|
||||
rig.next_block.slot(),
|
||||
"chain should be at the correct slot"
|
||||
);
|
||||
|
||||
rig.enqueue_gossip_block();
|
||||
|
||||
rig.assert_event_journal(&[GOSSIP_BLOCK, WORKER_FREED, NOTHING_TO_DO]);
|
||||
|
||||
assert_eq!(
|
||||
rig.chain.head().unwrap().beacon_block_root,
|
||||
rig.next_block.canonical_root(),
|
||||
"block should be imported and become head"
|
||||
);
|
||||
}
|
||||
|
||||
/// Ensure a valid attestation can be imported.
|
||||
#[test]
|
||||
fn import_gossip_attestation() {
|
||||
let mut rig = TestRig::new(SMALL_CHAIN);
|
||||
|
||||
let initial_attns = rig.chain.naive_aggregation_pool.read().num_attestations();
|
||||
|
||||
rig.enqueue_unaggregated_attestation();
|
||||
|
||||
rig.assert_event_journal(&[GOSSIP_ATTESTATION, WORKER_FREED, NOTHING_TO_DO]);
|
||||
|
||||
assert_eq!(
|
||||
rig.chain.naive_aggregation_pool.read().num_attestations(),
|
||||
initial_attns + 1,
|
||||
"op pool should have one more attestation"
|
||||
);
|
||||
}
|
||||
|
||||
/// Ensure a bunch of valid operations can be imported.
|
||||
#[test]
|
||||
fn import_misc_gossip_ops() {
|
||||
// Exits need the long chain so validators aren't too young to exit.
|
||||
let mut rig = TestRig::new(LONG_CHAIN);
|
||||
|
||||
/*
|
||||
* Attester slashing
|
||||
*/
|
||||
|
||||
let initial_attester_slashings = rig.chain.op_pool.num_attester_slashings();
|
||||
|
||||
rig.enqueue_gossip_attester_slashing();
|
||||
|
||||
rig.assert_event_journal(&[GOSSIP_ATTESTER_SLASHING, WORKER_FREED, NOTHING_TO_DO]);
|
||||
|
||||
assert_eq!(
|
||||
rig.chain.op_pool.num_attester_slashings(),
|
||||
initial_attester_slashings + 1,
|
||||
"op pool should have one more attester slashing"
|
||||
);
|
||||
|
||||
/*
|
||||
* Proposer slashing
|
||||
*/
|
||||
|
||||
let initial_proposer_slashings = rig.chain.op_pool.num_proposer_slashings();
|
||||
|
||||
rig.enqueue_gossip_proposer_slashing();
|
||||
|
||||
rig.assert_event_journal(&[GOSSIP_PROPOSER_SLASHING, WORKER_FREED, NOTHING_TO_DO]);
|
||||
|
||||
assert_eq!(
|
||||
rig.chain.op_pool.num_proposer_slashings(),
|
||||
initial_proposer_slashings + 1,
|
||||
"op pool should have one more proposer slashing"
|
||||
);
|
||||
|
||||
/*
|
||||
* Voluntary exit
|
||||
*/
|
||||
|
||||
let initial_voluntary_exits = rig.chain.op_pool.num_voluntary_exits();
|
||||
|
||||
rig.enqueue_gossip_voluntary_exit();
|
||||
|
||||
rig.assert_event_journal(&[GOSSIP_VOLUNTARY_EXIT, WORKER_FREED, NOTHING_TO_DO]);
|
||||
|
||||
assert_eq!(
|
||||
rig.chain.op_pool.num_voluntary_exits(),
|
||||
initial_voluntary_exits + 1,
|
||||
"op pool should have one more exit"
|
||||
);
|
||||
}
|
||||
Reference in New Issue
Block a user