Hierarchical state diffs (#5978)

* Start extracting freezer changes for tree-states

* Remove unused config args

* Add comments

* Remove unwraps

* Subjective more clear implementation

* Clean up hdiff

* Update xdelta3

* Tree states archive metrics (#6040)

* Add store cache size metrics

* Add compress timer metrics

* Add diff apply compute timer metrics

* Add diff buffer cache hit metrics

* Add hdiff buffer load times

* Add blocks replayed metric

* Move metrics to store

* Future proof some metrics

---------

Co-authored-by: Michael Sproul <michael@sigmaprime.io>

* Port and clean up forwards iterator changes

* Add and polish hierarchy-config flag

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Cleaner errors

* Fix beacon_chain test compilation

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Patch a few more freezer block roots

* Fix genesis block root bug

* Fix test failing due to pending updates

* Beacon chain tests passing

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Fix doc lint

* Implement DB schema upgrade for hierarchical state diffs (#6193)

* DB upgrade

* Add flag

* Delete RestorePointHash

* Update docs

* Update docs

* Implement hierarchical state diffs config migration (#6245)

* Implement hierarchical state diffs config migration

* Review PR

* Remove TODO

* Set CURRENT_SCHEMA_VERSION correctly

* Fix genesis state loading

* Re-delete some PartialBeaconState stuff

---------

Co-authored-by: Michael Sproul <michael@sigmaprime.io>

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Fix test compilation

* Update schema downgrade test

* Fix tests

* Fix null anchor migration

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Fix tree states upgrade migration (#6328)

* Towards crash safety

* Fix compilation

* Move cold summaries and state roots to new columns

* Rename StateRoots chunked field

* Update prune states

* Clean hdiff CLI flag and metrics

* Fix "staged reconstruction"

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Fix alloy issues

* Fix staged reconstruction logic

* Prevent weird slot drift

* Remove "allow" flag

* Update CLI help

* Remove FIXME about downgrade

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Remove some unnecessary error variants

* Fix new test

* Tree states archive - review comments and metrics (#6386)

* Review PR comments and metrics

* Comments

* Add anchor metrics

* drop prev comment

* Update metadata.rs

* Apply suggestions from code review

---------

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Update beacon_node/store/src/hot_cold_store.rs

Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Clarify comment and remove anchor_slot garbage

* Simplify database anchor (#6397)

* Simplify database anchor

* Update beacon_node/store/src/reconstruct.rs

* Add migration for anchor

* Fix and simplify light_client store tests

* Fix incompatible config test

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* More metrics

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* New historic state cache (#6475)

* New historic state cache

* Add more metrics

* State cache hit rate metrics

* Fix store metrics

* More logs and metrics

* Fix logger

* Ensure cached states have built caches :O

* Replay blocks in preference to diffing

* Two separate caches

* Distribute cache build time to next slot

* Re-plumb historic-state-cache flag

* Clean up metrics

* Update book

* Update beacon_node/store/src/hdiff.rs

Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>

* Update beacon_node/store/src/historic_state_cache.rs

Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>

---------

Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>

* Update database docs

* Update diagram

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Update lockbud to work with bindgen/etc

* Correct pkg name for Debian

* Remove vestigial epochs_per_state_diff

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Markdown lint

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Address Jimmy's review comments

* Simplify ReplayFrom case

* Fix and document genesis_state_root

* Typo

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Merge branch 'unstable' into tree-states-archive

* Compute diff of validators list manually (#6556)

* Split hdiff computation

* Dedicated logic for historical roots and summaries

* Benchmark against real states

* Mutated source?

* Version the hdiff

* Add lighthouse DB config for hierarchy exponents

* Tidy up hierarchy exponents flag

* Apply suggestions from code review

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Address PR review

* Remove hardcoded paths in benchmarks

* Delete unused function in benches

* lint

---------

Co-authored-by: Michael Sproul <michael@sigmaprime.io>

* Test hdiff binary format stability (#6585)

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Add deprecation warning for SPRP

* Update xdelta to get rid of duplicate deps

* Document test
This commit is contained in:
Michael Sproul
2024-11-18 12:51:44 +11:00
committed by GitHub
parent 654fc6acdc
commit 9fdd53df56
57 changed files with 3360 additions and 1691 deletions

View File

@@ -1,14 +1,16 @@
//! Implementation of historic state reconstruction (given complete block history).
use crate::hot_cold_store::{HotColdDB, HotColdDBError};
use crate::metadata::ANCHOR_FOR_ARCHIVE_NODE;
use crate::metrics;
use crate::{Error, ItemStore};
use itertools::{process_results, Itertools};
use slog::info;
use slog::{debug, info};
use state_processing::{
per_block_processing, per_slot_processing, BlockSignatureStrategy, ConsensusContext,
VerifyBlockRoot,
};
use std::sync::Arc;
use types::{EthSpec, Hash256};
use types::EthSpec;
impl<E, Hot, Cold> HotColdDB<E, Hot, Cold>
where
@@ -16,11 +18,16 @@ where
Hot: ItemStore<E>,
Cold: ItemStore<E>,
{
pub fn reconstruct_historic_states(self: &Arc<Self>) -> Result<(), Error> {
let Some(mut anchor) = self.get_anchor_info() else {
// Nothing to do, history is complete.
pub fn reconstruct_historic_states(
self: &Arc<Self>,
num_blocks: Option<usize>,
) -> Result<(), Error> {
let mut anchor = self.get_anchor_info();
// Nothing to do, history is complete.
if anchor.all_historic_states_stored() {
return Ok(());
};
}
// Check that all historic blocks are known.
if anchor.oldest_block_slot != 0 {
@@ -29,37 +36,30 @@ where
});
}
info!(
debug!(
self.log,
"Beginning historic state reconstruction";
"Starting state reconstruction batch";
"start_slot" => anchor.state_lower_limit,
);
let slots_per_restore_point = self.config.slots_per_restore_point;
let _t = metrics::start_timer(&metrics::STORE_BEACON_RECONSTRUCTION_TIME);
// Iterate blocks from the state lower limit to the upper limit.
let lower_limit_slot = anchor.state_lower_limit;
let split = self.get_split_info();
let upper_limit_state = self.get_restore_point(
anchor.state_upper_limit.as_u64() / slots_per_restore_point,
&split,
)?;
let upper_limit_slot = upper_limit_state.slot();
let lower_limit_slot = anchor.state_lower_limit;
let upper_limit_slot = std::cmp::min(split.slot, anchor.state_upper_limit);
// Use a dummy root, as we never read the block for the upper limit state.
let upper_limit_block_root = Hash256::repeat_byte(0xff);
let block_root_iter = self.forwards_block_roots_iterator(
lower_limit_slot,
upper_limit_state,
upper_limit_block_root,
&self.spec,
)?;
// If `num_blocks` is not specified iterate all blocks. Add 1 so that we end on an epoch
// boundary when `num_blocks` is a multiple of an epoch boundary. We want to be *inclusive*
// of the state at slot `lower_limit_slot + num_blocks`.
let block_root_iter = self
.forwards_block_roots_iterator_until(lower_limit_slot, upper_limit_slot - 1, || {
Err(Error::StateShouldNotBeRequired(upper_limit_slot - 1))
})?
.take(num_blocks.map_or(usize::MAX, |n| n + 1));
// The state to be advanced.
let mut state = self
.load_cold_state_by_slot(lower_limit_slot)?
.ok_or(HotColdDBError::MissingLowerLimitState(lower_limit_slot))?;
let mut state = self.load_cold_state_by_slot(lower_limit_slot)?;
state.build_caches(&self.spec)?;
@@ -110,8 +110,19 @@ where
// Stage state for storage in freezer DB.
self.store_cold_state(&state_root, &state, &mut io_batch)?;
// If the slot lies on an epoch boundary, commit the batch and update the anchor.
if slot % slots_per_restore_point == 0 || slot + 1 == upper_limit_slot {
let batch_complete =
num_blocks.map_or(false, |n_blocks| slot == lower_limit_slot + n_blocks as u64);
let reconstruction_complete = slot + 1 == upper_limit_slot;
// Commit the I/O batch if:
//
// - The diff/snapshot for this slot is required for future slots, or
// - The reconstruction batch is complete (we are about to return), or
// - Reconstruction is complete.
if self.hierarchy.should_commit_immediately(slot)?
|| batch_complete
|| reconstruction_complete
{
info!(
self.log,
"State reconstruction in progress";
@@ -122,9 +133,9 @@ where
self.cold_db.do_atomically(std::mem::take(&mut io_batch))?;
// Update anchor.
let old_anchor = Some(anchor.clone());
let old_anchor = anchor.clone();
if slot + 1 == upper_limit_slot {
if reconstruction_complete {
// The two limits have met in the middle! We're done!
// Perform one last integrity check on the state reached.
let computed_state_root = state.update_tree_hash_cache()?;
@@ -136,23 +147,36 @@ where
});
}
self.compare_and_set_anchor_info_with_write(old_anchor, None)?;
self.compare_and_set_anchor_info_with_write(
old_anchor,
ANCHOR_FOR_ARCHIVE_NODE,
)?;
return Ok(());
} else {
// The lower limit has been raised, store it.
anchor.state_lower_limit = slot;
self.compare_and_set_anchor_info_with_write(
old_anchor,
Some(anchor.clone()),
)?;
self.compare_and_set_anchor_info_with_write(old_anchor, anchor.clone())?;
}
// If this is the end of the batch, return Ok. The caller will run another
// batch when there is idle capacity.
if batch_complete {
debug!(
self.log,
"Finished state reconstruction batch";
"start_slot" => lower_limit_slot,
"end_slot" => slot,
);
return Ok(());
}
}
}
// Should always reach the `upper_limit_slot` and return early above.
Err(Error::StateReconstructionDidNotComplete)
// Should always reach the `upper_limit_slot` or the end of the batch and return early
// above.
Err(Error::StateReconstructionLogicError)
})??;
// Check that the split point wasn't mutated during the state reconstruction process.