mirror of
https://github.com/sigp/lighthouse.git
synced 2026-03-23 14:54:45 +00:00
Optimise tree hash caching for block production (#2106)
## Proposed Changes
`@potuz` on the Eth R&D Discord observed that Lighthouse blocks on Pyrmont were always arriving at other nodes after at least 1 second. Part of this could be due to processing and slow propagation, but metrics also revealed that the Lighthouse nodes were usually taking 400-600ms to even just produce a block before broadcasting it.
I tracked the slowness down to the lack of a pre-built tree hash cache (THC) on the states being used for block production. This was due to using the head state for block production, which lacks a THC in order to keep fork choice fast (cloning a THC takes at least 30ms for 100k validators). This PR modifies block production to clone a state from the snapshot cache rather than the head, which speeds things up by 200-400ms by avoiding the tree hash cache rebuild. In practice this seems to have cut block production time down to 300ms or less. Ideally we could _remove_ the snapshot from the cache (and save the 30ms), but it is required for when we re-process the block after signing it with the validator client.
## Alternatives
I experimented with 2 alternatives to this approach, before deciding on it:
* Alternative 1: ensure the `head` has a tree hash cache. This is too slow, as it imposes a +30ms hit on fork choice, which currently takes ~5ms (with occasional spikes).
* Alternative 2: use `Arc<BeaconSnapshot>` in the snapshot cache and share snapshots between the cache and the `head`. This made fork choice blazing fast (1ms), and block production the same as in this PR, but had a negative impact on block processing which I don't think is worth it. It ended up being necessary to clone the full state from the snapshot cache during block production, imposing the +30ms penalty there _as well_ as in block production.
In contract, the approach in this PR should only impact block production, and it improves it! Yay for pareto improvements 🎉
## Additional Info
This commit (ac59dfa) is currently running on all the Lighthouse Pyrmont nodes, and I've added a dashboard to the Pyrmont grafana instance with the metrics.
In future work we should optimise the attestation packing, which consumes around 30-60ms and is now a substantial contributor to the total.
This commit is contained in:
@@ -68,6 +68,30 @@ lazy_static! {
|
||||
);
|
||||
pub static ref BLOCK_PRODUCTION_TIMES: Result<Histogram> =
|
||||
try_create_histogram("beacon_block_production_seconds", "Full runtime of block production");
|
||||
pub static ref BLOCK_PRODUCTION_STATE_LOAD_TIMES: Result<Histogram> = try_create_histogram(
|
||||
"beacon_block_production_state_load_seconds",
|
||||
"Time taken to load the base state for block production"
|
||||
);
|
||||
pub static ref BLOCK_PRODUCTION_SLOT_PROCESS_TIMES: Result<Histogram> = try_create_histogram(
|
||||
"beacon_block_production_slot_process_seconds",
|
||||
"Time taken to advance the state to the block production slot"
|
||||
);
|
||||
pub static ref BLOCK_PRODUCTION_UNAGGREGATED_TIMES: Result<Histogram> = try_create_histogram(
|
||||
"beacon_block_production_unaggregated_seconds",
|
||||
"Time taken to import the naive aggregation pool for block production"
|
||||
);
|
||||
pub static ref BLOCK_PRODUCTION_ATTESTATION_TIMES: Result<Histogram> = try_create_histogram(
|
||||
"beacon_block_production_attestation_seconds",
|
||||
"Time taken to pack attestations into a block"
|
||||
);
|
||||
pub static ref BLOCK_PRODUCTION_PROCESS_TIMES: Result<Histogram> = try_create_histogram(
|
||||
"beacon_block_production_process_seconds",
|
||||
"Time taken to process the block produced"
|
||||
);
|
||||
pub static ref BLOCK_PRODUCTION_STATE_ROOT_TIMES: Result<Histogram> = try_create_histogram(
|
||||
"beacon_block_production_state_root_seconds",
|
||||
"Time taken to calculate the block's state root"
|
||||
);
|
||||
|
||||
/*
|
||||
* Block Statistics
|
||||
|
||||
Reference in New Issue
Block a user