Super Silky Smooth Syncs, like a Sir (#1628)

## Issue Addressed
In principle.. closes #1551 but in general are improvements for performance, maintainability and readability. The logic for the optimistic sync in actually simple

## Proposed Changes
There are miscellaneous things here:
- Remove unnecessary `BatchProcessResult::Partial` to simplify the batch validation logic
- Make batches a state machine. This is done to ensure batch state transitions respect our logic (this was previously done by moving batches between `Vec`s) and to ease the cognitive load of the `SyncingChain` struct
- Move most batch-related logic to the batch
- Remove `PendingBatches` in favor of a map of peers to their batches. This is to avoid duplicating peers inside the chain (peer_pool and pending_batches)
- Add `must_use` decoration to the `ProcessingResult` so that chains that request to be removed are handled accordingly. This also means that chains are now removed in more places than before to account for unhandled cases
- Store batches in a sorted map (`BTreeMap`) access is not O(1) but since the number of _active_ batches is bounded this should be fast, and saves performing hashing ops. Batches are indexed by the epoch they start. Sorted, to easily handle chain advancements (range logic)
- Produce the chain Id from the identifying fields: target root and target slot. This, to guarantee there can't be duplicated chains and be able to consistently search chains by either Id or checkpoint
- Fix chain_id not being present in all chain loggers
- Handle mega-edge case where the processor's work queue is full and the batch can't be sent. In this case the chain would lose the blocks, remain in a "syncing" state and waiting for a result that won't arrive, effectively stalling sync.
- When a batch imports blocks or the chain starts syncing with a local finalized epoch greater that the chain's start epoch, the chain is advanced instead of reset. This is to avoid losing download progress and validate batches faster. This also means that the old `start_epoch` now means "current first unvalidated batch", so it represents more accurately the progress of the chain.
- Batch status peers from the same chain to reduce Arc access.
- Handle a couple of cases where the retry counters for a batch were not updated/checked are now handled via the batch state machine. Basically now if we forget to do it, we will know.
- Do not send back the blocks from the processor to the batch. Instead register the attempt before sending the blocks (does not count as failed)
- When re-requesting a batch, try to avoid not only the last failed peer, but all previous failed peers.
- Optimize requesting batches ahead in the buffer by shuffling idle peers just once (this is just addressing a couple of old TODOs in the code)
- In chain_collection, store chains by their id in a map
- Include a mapping from request_ids to (chain, batch) that requested the batch to avoid the double O(n) search on block responses
- Other stuff:
  - impl `slog::KV` for batches
  - impl `slog::KV` for syncing chains
  - PSA: when logging, we can use `%thing` if `thing` implements `Display`. Same for `?` and `Debug`

### Optimistic syncing:
Try first the batch that contains the current head, if the batch imports any block, advance the chain. If not, if this optimistic batch is inside the current processing window leave it there for future use, if not drop it. The tolerance for this block is the same for downloading, but just once for processing



Co-authored-by: Age Manning <Age@AgeManning.com>
This commit is contained in:
divma
2020-09-23 06:29:55 +00:00
parent 80e52a0263
commit b8013b7b2c
13 changed files with 1480 additions and 1199 deletions

View File

@@ -29,9 +29,9 @@
//!
//! Block Lookup
//!
//! To keep the logic maintained to the syncing thread (and manage the request_ids), when a block needs to be searched for (i.e
//! if an attestation references an unknown block) this manager can search for the block and
//! subsequently search for parents if needed.
//! To keep the logic maintained to the syncing thread (and manage the request_ids), when a block
//! needs to be searched for (i.e if an attestation references an unknown block) this manager can
//! search for the block and subsequently search for parents if needed.
use super::network_context::SyncNetworkContext;
use super::peer_sync_info::{PeerSyncInfo, PeerSyncType};
@@ -106,7 +106,6 @@ pub enum SyncMessage<T: EthSpec> {
BatchProcessed {
chain_id: ChainId,
epoch: Epoch,
downloaded_blocks: Vec<SignedBeaconBlock<T>>,
result: BatchProcessResult,
},
@@ -123,12 +122,10 @@ pub enum SyncMessage<T: EthSpec> {
// TODO: When correct batch error handling occurs, we will include an error type.
#[derive(Debug)]
pub enum BatchProcessResult {
/// The batch was completed successfully.
Success,
/// The batch processing failed.
Failed,
/// The batch processing failed but managed to import at least one block.
Partial,
/// The batch was completed successfully. It carries whether the sent batch contained blocks.
Success(bool),
/// The batch processing failed. It carries whether the processing imported any block.
Failed(bool),
}
/// Maintains a sequential list of parents to lookup and the lookup's current state.
@@ -275,9 +272,9 @@ impl<T: BeaconChainTypes> SyncManager<T> {
match local_peer_info.peer_sync_type(&remote) {
PeerSyncType::FullySynced => {
trace!(self.log, "Peer synced to our head found";
"peer" => format!("{:?}", peer_id),
"peer_head_slot" => remote.head_slot,
"local_head_slot" => local_peer_info.head_slot,
"peer" => %peer_id,
"peer_head_slot" => remote.head_slot,
"local_head_slot" => local_peer_info.head_slot,
);
self.synced_peer(&peer_id, remote);
// notify the range sync that a peer has been added
@@ -285,11 +282,11 @@ impl<T: BeaconChainTypes> SyncManager<T> {
}
PeerSyncType::Advanced => {
trace!(self.log, "Useful peer for sync found";
"peer" => format!("{:?}", peer_id),
"peer_head_slot" => remote.head_slot,
"local_head_slot" => local_peer_info.head_slot,
"peer_finalized_epoch" => remote.finalized_epoch,
"local_finalized_epoch" => local_peer_info.finalized_epoch,
"peer" => %peer_id,
"peer_head_slot" => remote.head_slot,
"local_head_slot" => local_peer_info.head_slot,
"peer_finalized_epoch" => remote.finalized_epoch,
"local_finalized_epoch" => local_peer_info.finalized_epoch,
);
// There are few cases to handle here:
@@ -908,14 +905,12 @@ impl<T: BeaconChainTypes> SyncManager<T> {
SyncMessage::BatchProcessed {
chain_id,
epoch,
downloaded_blocks,
result,
} => {
self.range_sync.handle_block_process_result(
&mut self.network,
chain_id,
epoch,
downloaded_blocks,
result,
);
}