mirror of
https://github.com/sigp/lighthouse.git
synced 2026-03-11 04:31:51 +00:00
## Proposed Changes
`@potuz` on the Eth R&D Discord observed that Lighthouse blocks on Pyrmont were always arriving at other nodes after at least 1 second. Part of this could be due to processing and slow propagation, but metrics also revealed that the Lighthouse nodes were usually taking 400-600ms to even just produce a block before broadcasting it.
I tracked the slowness down to the lack of a pre-built tree hash cache (THC) on the states being used for block production. This was due to using the head state for block production, which lacks a THC in order to keep fork choice fast (cloning a THC takes at least 30ms for 100k validators). This PR modifies block production to clone a state from the snapshot cache rather than the head, which speeds things up by 200-400ms by avoiding the tree hash cache rebuild. In practice this seems to have cut block production time down to 300ms or less. Ideally we could _remove_ the snapshot from the cache (and save the 30ms), but it is required for when we re-process the block after signing it with the validator client.
## Alternatives
I experimented with 2 alternatives to this approach, before deciding on it:
* Alternative 1: ensure the `head` has a tree hash cache. This is too slow, as it imposes a +30ms hit on fork choice, which currently takes ~5ms (with occasional spikes).
* Alternative 2: use `Arc<BeaconSnapshot>` in the snapshot cache and share snapshots between the cache and the `head`. This made fork choice blazing fast (1ms), and block production the same as in this PR, but had a negative impact on block processing which I don't think is worth it. It ended up being necessary to clone the full state from the snapshot cache during block production, imposing the +30ms penalty there _as well_ as in block production.
In contract, the approach in this PR should only impact block production, and it improves it! Yay for pareto improvements 🎉
## Additional Info
This commit (ac59dfa) is currently running on all the Lighthouse Pyrmont nodes, and I've added a dashboard to the Pyrmont grafana instance with the metrics.
In future work we should optimise the attestation packing, which consumes around 30-60ms and is now a substantial contributor to the total.
133 lines
4.9 KiB
Rust
133 lines
4.9 KiB
Rust
use crate::beacon_chain::ForkChoiceError;
|
|
use crate::beacon_fork_choice_store::Error as ForkChoiceStoreError;
|
|
use crate::eth1_chain::Error as Eth1ChainError;
|
|
use crate::migrate::PruningError;
|
|
use crate::naive_aggregation_pool::Error as NaiveAggregationError;
|
|
use crate::observed_attestations::Error as ObservedAttestationsError;
|
|
use crate::observed_attesters::Error as ObservedAttestersError;
|
|
use crate::observed_block_producers::Error as ObservedBlockProducersError;
|
|
use futures::channel::mpsc::TrySendError;
|
|
use operation_pool::OpPoolError;
|
|
use safe_arith::ArithError;
|
|
use ssz_types::Error as SszTypesError;
|
|
use state_processing::{
|
|
block_signature_verifier::Error as BlockSignatureVerifierError,
|
|
per_block_processing::errors::{
|
|
AttestationValidationError, AttesterSlashingValidationError, ExitValidationError,
|
|
ProposerSlashingValidationError,
|
|
},
|
|
signature_sets::Error as SignatureSetError,
|
|
BlockProcessingError, SlotProcessingError,
|
|
};
|
|
use std::time::Duration;
|
|
use types::*;
|
|
|
|
macro_rules! easy_from_to {
|
|
($from: ident, $to: ident) => {
|
|
impl From<$from> for $to {
|
|
fn from(e: $from) -> $to {
|
|
$to::$from(e)
|
|
}
|
|
}
|
|
};
|
|
}
|
|
|
|
#[derive(Debug)]
|
|
pub enum BeaconChainError {
|
|
InsufficientValidators,
|
|
UnableToReadSlot,
|
|
RevertedFinalizedEpoch {
|
|
previous_epoch: Epoch,
|
|
new_epoch: Epoch,
|
|
},
|
|
SlotClockDidNotStart,
|
|
NoStateForSlot(Slot),
|
|
UnableToFindTargetRoot(Slot),
|
|
BeaconStateError(BeaconStateError),
|
|
DBInconsistent(String),
|
|
DBError(store::Error),
|
|
ForkChoiceError(ForkChoiceError),
|
|
ForkChoiceStoreError(ForkChoiceStoreError),
|
|
MissingBeaconBlock(Hash256),
|
|
MissingBeaconState(Hash256),
|
|
SlotProcessingError(SlotProcessingError),
|
|
UnableToAdvanceState(String),
|
|
NoStateForAttestation {
|
|
beacon_block_root: Hash256,
|
|
},
|
|
CannotAttestToFutureState,
|
|
AttestationValidationError(AttestationValidationError),
|
|
ExitValidationError(ExitValidationError),
|
|
ProposerSlashingValidationError(ProposerSlashingValidationError),
|
|
AttesterSlashingValidationError(AttesterSlashingValidationError),
|
|
StateSkipTooLarge {
|
|
start_slot: Slot,
|
|
requested_slot: Slot,
|
|
max_task_runtime: Duration,
|
|
},
|
|
MissingFinalizedStateRoot(Slot),
|
|
/// Returned when an internal check fails, indicating corrupt data.
|
|
InvariantViolated(String),
|
|
SszTypesError(SszTypesError),
|
|
CanonicalHeadLockTimeout,
|
|
AttestationCacheLockTimeout,
|
|
ValidatorPubkeyCacheLockTimeout,
|
|
IncorrectStateForAttestation(RelativeEpochError),
|
|
InvalidValidatorPubkeyBytes(bls::Error),
|
|
ValidatorPubkeyCacheIncomplete(usize),
|
|
SignatureSetError(SignatureSetError),
|
|
BlockSignatureVerifierError(state_processing::block_signature_verifier::Error),
|
|
DuplicateValidatorPublicKey,
|
|
ValidatorPubkeyCacheFileError(String),
|
|
OpPoolError(OpPoolError),
|
|
NaiveAggregationError(NaiveAggregationError),
|
|
ObservedAttestationsError(ObservedAttestationsError),
|
|
ObservedAttestersError(ObservedAttestersError),
|
|
ObservedBlockProducersError(ObservedBlockProducersError),
|
|
PruningError(PruningError),
|
|
ArithError(ArithError),
|
|
InvalidShufflingId {
|
|
shuffling_epoch: Epoch,
|
|
head_block_epoch: Epoch,
|
|
},
|
|
WeakSubjectivtyVerificationFailure,
|
|
WeakSubjectivtyShutdownError(TrySendError<&'static str>),
|
|
}
|
|
|
|
easy_from_to!(SlotProcessingError, BeaconChainError);
|
|
easy_from_to!(AttestationValidationError, BeaconChainError);
|
|
easy_from_to!(ExitValidationError, BeaconChainError);
|
|
easy_from_to!(ProposerSlashingValidationError, BeaconChainError);
|
|
easy_from_to!(AttesterSlashingValidationError, BeaconChainError);
|
|
easy_from_to!(SszTypesError, BeaconChainError);
|
|
easy_from_to!(OpPoolError, BeaconChainError);
|
|
easy_from_to!(NaiveAggregationError, BeaconChainError);
|
|
easy_from_to!(ObservedAttestationsError, BeaconChainError);
|
|
easy_from_to!(ObservedAttestersError, BeaconChainError);
|
|
easy_from_to!(ObservedBlockProducersError, BeaconChainError);
|
|
easy_from_to!(BlockSignatureVerifierError, BeaconChainError);
|
|
easy_from_to!(PruningError, BeaconChainError);
|
|
easy_from_to!(ArithError, BeaconChainError);
|
|
easy_from_to!(ForkChoiceStoreError, BeaconChainError);
|
|
|
|
#[derive(Debug)]
|
|
pub enum BlockProductionError {
|
|
UnableToGetHeadInfo(BeaconChainError),
|
|
UnableToGetBlockRootFromState,
|
|
UnableToReadSlot,
|
|
UnableToProduceAtSlot(Slot),
|
|
SlotProcessingError(SlotProcessingError),
|
|
BlockProcessingError(BlockProcessingError),
|
|
Eth1ChainError(Eth1ChainError),
|
|
BeaconStateError(BeaconStateError),
|
|
OpPoolError(OpPoolError),
|
|
/// The `BeaconChain` was explicitly configured _without_ a connection to eth1, therefore it
|
|
/// cannot produce blocks.
|
|
NoEth1ChainConnection,
|
|
}
|
|
|
|
easy_from_to!(BlockProcessingError, BlockProductionError);
|
|
easy_from_to!(BeaconStateError, BlockProductionError);
|
|
easy_from_to!(SlotProcessingError, BlockProductionError);
|
|
easy_from_to!(Eth1ChainError, BlockProductionError);
|