Optimize validator duties (#2243)

## Issue Addressed

Closes #2052

## Proposed Changes

- Refactor the attester/proposer duties endpoints in the BN
    - Performance improvements
    - Fixes some potential inconsistencies with the dependent root fields.
    - Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead.
    - Move the code for the proposer/attester duties endpoints into separate files, for readability.
- Refactor the `DutiesService` in the VC
    - Required to reduce the delay on broadcasting new blocks.
    - Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API.
    - Separate block/attestation duty tasks so that they don't block each other when one is slow.
- In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes.
    - Unfortunately this has created lots of dust changes.
 - In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont).
 - Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code.
    - This also fixes a bug with some functions which were failing to include a state root as per [this comment](072695284f/consensus/state_processing/src/state_advance.rs (L69-L74)). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root.
 - Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base.
    
~~This PR *reduces* the size of the codebase 🎉~~ It *used* to reduce the size of the code base before I added more comments. 

## Observations on Prymont

- Proposer duties times down from peaks of 450ms to consistent <1ms.
- Current epoch attester duties times down from >1s peaks to a consistent 20-30ms.
- Block production down from +600ms to 100-200ms.

## Additional Info

- ~~Blocked on #2241~~
- ~~Blocked on #2234~~

## TODO

- [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now.
- [x] Address `per_slot_processing` roots.
- [x] Investigate slow next epoch times. Not getting added to cache on block processing?
- [x] Consider [this](072695284f/beacon_node/store/src/hot_cold_store.rs (L811-L812)) in the scenario of replacing the state roots


Co-authored-by: pawan <pawandhananjay@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
This commit is contained in:
Paul Hauner
2021-03-17 05:09:57 +00:00
parent 6a69b20be1
commit 015ab7d0a7
49 changed files with 2201 additions and 1833 deletions

View File

@@ -11,7 +11,6 @@ mod http_metrics;
mod initialized_validators;
mod key_cache;
mod notifier;
mod validator_duty;
mod validator_store;
pub mod http_api;
@@ -26,12 +25,11 @@ use account_utils::validator_definitions::ValidatorDefinitions;
use attestation_service::{AttestationService, AttestationServiceBuilder};
use block_service::{BlockService, BlockServiceBuilder};
use clap::ArgMatches;
use duties_service::{DutiesService, DutiesServiceBuilder};
use duties_service::DutiesService;
use environment::RuntimeContext;
use eth2::types::StateId;
use eth2::{reqwest::ClientBuilder, BeaconNodeHttpClient, StatusCode, Url};
use fork_service::{ForkService, ForkServiceBuilder};
use futures::channel::mpsc;
use http_api::ApiSecret;
use initialized_validators::InitializedValidators;
use notifier::spawn_notifier;
@@ -44,7 +42,10 @@ use std::marker::PhantomData;
use std::net::SocketAddr;
use std::sync::Arc;
use std::time::{SystemTime, UNIX_EPOCH};
use tokio::time::{sleep, Duration};
use tokio::{
sync::mpsc,
time::{sleep, Duration},
};
use types::{EthSpec, Fork, Hash256};
use validator_store::ValidatorStore;
@@ -60,7 +61,7 @@ const HTTP_TIMEOUT: Duration = Duration::from_secs(12);
#[derive(Clone)]
pub struct ProductionValidatorClient<T: EthSpec> {
context: RuntimeContext<T>,
duties_service: DutiesService<SystemTimeSlotClock, T>,
duties_service: Arc<DutiesService<SystemTimeSlotClock, T>>,
fork_service: ForkService<SystemTimeSlotClock, T>,
block_service: BlockService<SystemTimeSlotClock, T>,
attestation_service: AttestationService<SystemTimeSlotClock, T>,
@@ -285,13 +286,22 @@ impl<T: EthSpec> ProductionValidatorClient<T> {
validator_store.prune_slashing_protection_db(slot.epoch(T::slots_per_epoch()), true);
}
let duties_service = DutiesServiceBuilder::new()
.slot_clock(slot_clock.clone())
.validator_store(validator_store.clone())
.beacon_nodes(beacon_nodes.clone())
.allow_unsynced_beacon_node(config.allow_unsynced_beacon_node)
.runtime_context(context.service_context("duties".into()))
.build()?;
let duties_context = context.service_context("duties".into());
let duties_service = Arc::new(DutiesService {
attesters: <_>::default(),
proposers: <_>::default(),
indices: <_>::default(),
slot_clock: slot_clock.clone(),
beacon_nodes: beacon_nodes.clone(),
validator_store: validator_store.clone(),
require_synced: if config.allow_unsynced_beacon_node {
RequireSynced::Yes
} else {
RequireSynced::No
},
spec: context.eth2_config.spec.clone(),
context: duties_context,
});
// Update the metrics server.
if let Some(ctx) = &http_metrics_ctx {
@@ -343,13 +353,7 @@ impl<T: EthSpec> ProductionValidatorClient<T> {
let (block_service_tx, block_service_rx) = mpsc::channel(channel_capacity);
let log = self.context.log();
self.duties_service
.clone()
.start_update_service(
block_service_tx,
Arc::new(self.context.eth2_config.spec.clone()),
)
.map_err(|e| format!("Unable to start duties service: {}", e))?;
duties_service::start_update_service(self.duties_service.clone(), block_service_tx);
self.fork_service
.clone()