mirror of
https://github.com/sigp/lighthouse.git
synced 2026-03-11 18:04:18 +00:00
Optimise tree hash caching for block production (#2106)
## Proposed Changes
`@potuz` on the Eth R&D Discord observed that Lighthouse blocks on Pyrmont were always arriving at other nodes after at least 1 second. Part of this could be due to processing and slow propagation, but metrics also revealed that the Lighthouse nodes were usually taking 400-600ms to even just produce a block before broadcasting it.
I tracked the slowness down to the lack of a pre-built tree hash cache (THC) on the states being used for block production. This was due to using the head state for block production, which lacks a THC in order to keep fork choice fast (cloning a THC takes at least 30ms for 100k validators). This PR modifies block production to clone a state from the snapshot cache rather than the head, which speeds things up by 200-400ms by avoiding the tree hash cache rebuild. In practice this seems to have cut block production time down to 300ms or less. Ideally we could _remove_ the snapshot from the cache (and save the 30ms), but it is required for when we re-process the block after signing it with the validator client.
## Alternatives
I experimented with 2 alternatives to this approach, before deciding on it:
* Alternative 1: ensure the `head` has a tree hash cache. This is too slow, as it imposes a +30ms hit on fork choice, which currently takes ~5ms (with occasional spikes).
* Alternative 2: use `Arc<BeaconSnapshot>` in the snapshot cache and share snapshots between the cache and the `head`. This made fork choice blazing fast (1ms), and block production the same as in this PR, but had a negative impact on block processing which I don't think is worth it. It ended up being necessary to clone the full state from the snapshot cache during block production, imposing the +30ms penalty there _as well_ as in block production.
In contract, the approach in this PR should only impact block production, and it improves it! Yay for pareto improvements 🎉
## Additional Info
This commit (ac59dfa) is currently running on all the Lighthouse Pyrmont nodes, and I've added a dashboard to the Pyrmont grafana instance with the metrics.
In future work we should optimise the attestation packing, which consumes around 30-60ms and is now a substantial contributor to the total.
This commit is contained in:
@@ -1,6 +1,6 @@
|
||||
use crate::BeaconSnapshot;
|
||||
use std::cmp;
|
||||
use types::{Epoch, EthSpec, Hash256};
|
||||
use types::{beacon_state::CloneConfig, Epoch, EthSpec, Hash256};
|
||||
|
||||
/// The default size of the cache.
|
||||
pub const DEFAULT_SNAPSHOT_CACHE_SIZE: usize = 4;
|
||||
@@ -69,13 +69,16 @@ impl<T: EthSpec> SnapshotCache<T> {
|
||||
.map(|i| self.snapshots.remove(i))
|
||||
}
|
||||
|
||||
/// If there is a snapshot with `block_root`, clone it (with only the committee caches) and
|
||||
/// return the clone.
|
||||
pub fn get_cloned(&self, block_root: Hash256) -> Option<BeaconSnapshot<T>> {
|
||||
/// If there is a snapshot with `block_root`, clone it and return the clone.
|
||||
pub fn get_cloned(
|
||||
&self,
|
||||
block_root: Hash256,
|
||||
clone_config: CloneConfig,
|
||||
) -> Option<BeaconSnapshot<T>> {
|
||||
self.snapshots
|
||||
.iter()
|
||||
.find(|snapshot| snapshot.beacon_block_root == block_root)
|
||||
.map(|snapshot| snapshot.clone_with_only_committee_caches())
|
||||
.map(|snapshot| snapshot.clone_with(clone_config))
|
||||
}
|
||||
|
||||
/// Removes all snapshots from the queue that are less than or equal to the finalized epoch.
|
||||
@@ -165,11 +168,13 @@ mod test {
|
||||
cache.try_remove(Hash256::from_low_u64_be(1)).is_none(),
|
||||
"the snapshot with the lowest slot should have been removed during the insert function"
|
||||
);
|
||||
assert!(cache.get_cloned(Hash256::from_low_u64_be(1)).is_none());
|
||||
assert!(cache
|
||||
.get_cloned(Hash256::from_low_u64_be(1), CloneConfig::none())
|
||||
.is_none());
|
||||
|
||||
assert!(
|
||||
cache
|
||||
.get_cloned(Hash256::from_low_u64_be(0))
|
||||
.get_cloned(Hash256::from_low_u64_be(0), CloneConfig::none())
|
||||
.expect("the head should still be in the cache")
|
||||
.beacon_block_root
|
||||
== Hash256::from_low_u64_be(0),
|
||||
|
||||
Reference in New Issue
Block a user