Hierarchical state diffs (#5978)

* Start extracting freezer changes for tree-states

* Remove unused config args

* Add comments

* Remove unwraps

* Subjectively clearer implementation

* Clean up hdiff

* Update xdelta3

* Tree states archive metrics (#6040)

* Add store cache size metrics

* Add compress timer metrics

* Add diff apply compute timer metrics

* Add diff buffer cache hit metrics

* Add hdiff buffer load times

* Add blocks replayed metric

* Move metrics to store

* Future proof some metrics

---------

Co-authored-by: Michael Sproul <michael@sigmaprime.io>

* Port and clean up forwards iterator changes

* Add and polish hierarchy-config flag

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Cleaner errors

* Fix beacon_chain test compilation

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Patch a few more freezer block roots

* Fix genesis block root bug

* Fix test failing due to pending updates

* Beacon chain tests passing

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Fix doc lint

* Implement DB schema upgrade for hierarchical state diffs (#6193)

* DB upgrade

* Add flag

* Delete RestorePointHash

* Update docs

* Update docs

* Implement hierarchical state diffs config migration (#6245)

* Implement hierarchical state diffs config migration

* Review PR

* Remove TODO

* Set CURRENT_SCHEMA_VERSION correctly

* Fix genesis state loading

* Re-delete some PartialBeaconState stuff

---------

Co-authored-by: Michael Sproul <michael@sigmaprime.io>

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Fix test compilation

* Update schema downgrade test

* Fix tests

* Fix null anchor migration

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Fix tree states upgrade migration (#6328)

* Towards crash safety

* Fix compilation

* Move cold summaries and state roots to new columns

* Rename StateRoots chunked field

* Update prune states

* Clean hdiff CLI flag and metrics

* Fix "staged reconstruction"

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Fix alloy issues

* Fix staged reconstruction logic

* Prevent weird slot drift

* Remove "allow" flag

* Update CLI help

* Remove FIXME about downgrade

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Remove some unnecessary error variants

* Fix new test

* Tree states archive - review comments and metrics (#6386)

* Review PR comments and metrics

* Comments

* Add anchor metrics

* drop prev comment

* Update metadata.rs

* Apply suggestions from code review

---------

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Update beacon_node/store/src/hot_cold_store.rs

Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Clarify comment and remove anchor_slot garbage

* Simplify database anchor (#6397)

* Simplify database anchor

* Update beacon_node/store/src/reconstruct.rs

* Add migration for anchor

* Fix and simplify light_client store tests

* Fix incompatible config test

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* More metrics

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* New historic state cache (#6475)

* New historic state cache

* Add more metrics

* State cache hit rate metrics

* Fix store metrics

* More logs and metrics

* Fix logger

* Ensure cached states have built caches :O

* Replay blocks in preference to diffing

* Two separate caches

* Distribute cache build time to next slot

* Re-plumb historic-state-cache flag

* Clean up metrics

* Update book

* Update beacon_node/store/src/hdiff.rs

Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>

* Update beacon_node/store/src/historic_state_cache.rs

Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>

---------

Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>

* Update database docs

* Update diagram

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Update lockbud to work with bindgen/etc

* Correct pkg name for Debian

* Remove vestigial epochs_per_state_diff

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Markdown lint

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Address Jimmy's review comments

* Simplify ReplayFrom case

* Fix and document genesis_state_root

* Typo

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Merge branch 'unstable' into tree-states-archive

* Compute diff of validators list manually (#6556)

* Split hdiff computation

* Dedicated logic for historical roots and summaries

* Benchmark against real states

* Mutated source?

* Version the hdiff

* Add lighthouse DB config for hierarchy exponents

* Tidy up hierarchy exponents flag

* Apply suggestions from code review

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Address PR review

* Remove hardcoded paths in benchmarks

* Delete unused function in benches

* lint

---------

Co-authored-by: Michael Sproul <michael@sigmaprime.io>

* Test hdiff binary format stability (#6585)

* Merge remote-tracking branch 'origin/unstable' into tree-states-archive

* Add deprecation warning for SPRP

* Update xdelta to get rid of duplicate deps

* Document test
Michael Sproul
2024-11-18 12:51:44 +11:00
committed by GitHub
parent 654fc6acdc
commit 9fdd53df56
57 changed files with 3360 additions and 1691 deletions

View File

@@ -3,6 +3,7 @@ use clap_utils::get_color_style;
use clap_utils::FLAG_HEADER;
use serde::{Deserialize, Serialize};
use std::path::PathBuf;
use store::hdiff::HierarchyConfig;
use crate::InspectTarget;
@@ -21,13 +22,14 @@ use crate::InspectTarget;
pub struct DatabaseManager {
#[clap(
long,
value_name = "SLOT_COUNT",
help = "Specifies how often a freezer DB restore point should be stored. \
Cannot be changed after initialization. \
[default: 2048 (mainnet) or 64 (minimal)]",
global = true,
value_name = "N0,N1,N2,...",
help = "Specifies the frequency for storing full state snapshots and hierarchical \
diffs in the freezer DB.",
default_value_t = HierarchyConfig::default(),
display_order = 0
)]
pub slots_per_restore_point: Option<u64>,
pub hierarchy_exponents: HierarchyConfig,
#[clap(
long,

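The new flag takes a comma-separated list (`N0,N1,N2,...`) in place of a single restore-point interval. The following is a minimal sketch of how such a list might be parsed and interpreted, not the `store::hdiff` implementation. It assumes that each value is a power-of-two exponent, that the largest exponent sets the full-snapshot interval, that smaller exponents set diff layers, and that all other slots are rebuilt by block replay. `HierarchyExponents`, `Storage` and `storage_for_slot` are hypothetical names used for illustration only.

```rust
/// Hypothetical stand-in for `store::hdiff::HierarchyConfig`.
/// The real type's fields and validation rules may differ.
#[derive(Debug, Clone, PartialEq)]
struct HierarchyExponents {
    exponents: Vec<u8>,
}

/// How a slot's state is assumed to be stored in this sketch.
#[derive(Debug, PartialEq)]
enum Storage {
    /// Full state snapshot (coarsest layer).
    Snapshot,
    /// Diff stored at the given layer index (0 = finest layer).
    DiffLayer(usize),
    /// Nothing stored; the state would be rebuilt by replaying blocks.
    Replay,
}

impl std::str::FromStr for HierarchyExponents {
    type Err = String;

    /// Parses a list such as "5,9,11" into strictly increasing exponents.
    fn from_str(s: &str) -> Result<Self, String> {
        let exponents = s
            .split(',')
            .map(|part| {
                part.trim()
                    .parse::<u8>()
                    .map_err(|e| format!("invalid exponent {part:?}: {e}"))
            })
            .collect::<Result<Vec<u8>, String>>()?;
        if exponents.windows(2).any(|w| w[0] >= w[1]) {
            return Err("exponents must be strictly increasing".to_string());
        }
        // Keep `1u64 << exp` from overflowing.
        if exponents.iter().any(|&e| e >= 64) {
            return Err("exponents must be less than 64".to_string());
        }
        Ok(Self { exponents })
    }
}

impl HierarchyExponents {
    /// Classifies a slot by the coarsest layer whose period (2^exponent) divides it.
    fn storage_for_slot(&self, slot: u64) -> Storage {
        for (layer, &exp) in self.exponents.iter().enumerate().rev() {
            if slot % (1u64 << exp) == 0 {
                return if layer + 1 == self.exponents.len() {
                    Storage::Snapshot
                } else {
                    Storage::DiffLayer(layer)
                };
            }
        }
        Storage::Replay
    }
}
```

Under these assumptions, `"5,9,11"` would give a snapshot every 2048 slots, a layer-1 diff every 512 slots, a layer-0 diff every 32 slots, and block replay for everything in between.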
View File

@@ -6,7 +6,7 @@ use beacon_chain::{
builder::Witness, eth1_chain::CachingEth1Backend, schema_change::migrate_schema,
slot_clock::SystemTimeSlotClock,
};
use beacon_node::{get_data_dir, get_slots_per_restore_point, ClientConfig};
use beacon_node::{get_data_dir, ClientConfig};
use clap::ArgMatches;
use clap::ValueEnum;
use cli::{Compact, Inspect};
@@ -16,7 +16,6 @@ use slog::{info, warn, Logger};
use std::fs;
use std::io::Write;
use std::path::PathBuf;
use store::metadata::STATE_UPPER_LIMIT_NO_RETAIN;
use store::{
errors::Error,
metadata::{SchemaVersion, CURRENT_SCHEMA_VERSION},
@@ -39,13 +38,8 @@ fn parse_client_config<E: EthSpec>(
client_config
.blobs_db_path
.clone_from(&database_manager_config.blobs_dir);
let (sprp, sprp_explicit) =
get_slots_per_restore_point::<E>(database_manager_config.slots_per_restore_point)?;
client_config.store.slots_per_restore_point = sprp;
client_config.store.slots_per_restore_point_set_explicitly = sprp_explicit;
client_config.store.blob_prune_margin_epochs = database_manager_config.blob_prune_margin_epochs;
client_config.store.hierarchy_config = database_manager_config.hierarchy_exponents.clone();
Ok(client_config)
}
@@ -298,6 +292,7 @@ fn parse_migrate_config(migrate_config: &Migrate) -> Result<MigrateConfig, Strin
pub fn migrate_db<E: EthSpec>(
migrate_config: MigrateConfig,
client_config: ClientConfig,
mut genesis_state: BeaconState<E>,
runtime_context: &RuntimeContext<E>,
log: Logger,
) -> Result<(), Error> {
@@ -328,13 +323,13 @@ pub fn migrate_db<E: EthSpec>(
"to" => to.as_u64(),
);
let genesis_state_root = genesis_state.canonical_root()?;
migrate_schema::<Witness<SystemTimeSlotClock, CachingEth1Backend<E>, _, _, _>>(
db,
client_config.eth1.deposit_contract_deploy_block,
Some(genesis_state_root),
from,
to,
log,
&spec,
)
}
@@ -426,8 +421,7 @@ pub fn prune_states<E: EthSpec>(
// correct network, and that we don't end up storing the wrong genesis state.
let genesis_from_db = db
.load_cold_state_by_slot(Slot::new(0))
.map_err(|e| format!("Error reading genesis state: {e:?}"))?
.ok_or("Error: genesis state missing from database. Check schema version.")?;
.map_err(|e| format!("Error reading genesis state: {e:?}"))?;
if genesis_from_db.genesis_validators_root() != genesis_state.genesis_validators_root() {
return Err(format!(
@@ -438,18 +432,12 @@ pub fn prune_states<E: EthSpec>(
// Check that the user has confirmed they want to proceed.
if !prune_config.confirm {
match db.get_anchor_info() {
Some(anchor_info)
if anchor_info.state_lower_limit == 0
&& anchor_info.state_upper_limit == STATE_UPPER_LIMIT_NO_RETAIN =>
{
info!(log, "States have already been pruned");
return Ok(());
}
_ => {
info!(log, "Ready to prune states");
}
if db.get_anchor_info().full_state_pruning_enabled() {
info!(log, "States have already been pruned");
return Ok(());
}
info!(log, "Ready to prune states");
warn!(
log,
"Pruning states is irreversible";
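The hunk above swaps an inline `match` on the anchor fields for a single `full_state_pruning_enabled()` call. A minimal sketch of what such a helper could look like is given here, built only from the condition visible in the removed lines. The `AnchorInfo` struct below is a simplified stand-in with plain `u64` slots, the sentinel value of `STATE_UPPER_LIMIT_NO_RETAIN` is assumed, and `prune_status` is a hypothetical function that returns the message the real code logs.

```rust
/// Assumed sentinel value; the real constant is defined in `store::metadata`.
const STATE_UPPER_LIMIT_NO_RETAIN: u64 = u64::MAX;

/// Simplified stand-in for the store's anchor info (slots as plain `u64`).
#[derive(Debug, Clone, PartialEq)]
struct AnchorInfo {
    state_lower_limit: u64,
    state_upper_limit: u64,
}

impl AnchorInfo {
    /// True when no historic states are retained. This is the same condition
    /// that the removed `match` guard checked inline.
    fn full_state_pruning_enabled(&self) -> bool {
        self.state_lower_limit == 0 && self.state_upper_limit == STATE_UPPER_LIMIT_NO_RETAIN
    }
}

/// Mirrors the control flow of the confirmation check and returns the
/// message that would be logged.
fn prune_status(anchor: &AnchorInfo) -> &'static str {
    if anchor.full_state_pruning_enabled() {
        "States have already been pruned"
    } else {
        "Ready to prune states"
    }
}
```

Naming the condition keeps the call site to a plain `if`. The method is called directly on the result of `get_anchor_info()`, which suggests the anchor is no longer wrapped in an `Option`.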
@@ -484,10 +472,33 @@ pub fn run<E: EthSpec>(
let log = context.log().clone();
let format_err = |e| format!("Fatal error: {:?}", e);
let get_genesis_state = || {
let executor = env.core_context().executor;
let network_config = context
.eth2_network_config
.clone()
.ok_or("Missing network config")?;
executor
.block_on_dangerous(
network_config.genesis_state::<E>(
client_config.genesis_state_url.as_deref(),
client_config.genesis_state_url_timeout,
&log,
),
"get_genesis_state",
)
.ok_or("Shutting down")?
.map_err(|e| format!("Error getting genesis state: {e}"))?
.ok_or("Genesis state missing".to_string())
};
match &db_manager_config.subcommand {
cli::DatabaseManagerSubcommand::Migrate(migrate_config) => {
let migrate_config = parse_migrate_config(migrate_config)?;
migrate_db(migrate_config, client_config, &context, log).map_err(format_err)
let genesis_state = get_genesis_state()?;
migrate_db(migrate_config, client_config, genesis_state, &context, log)
.map_err(format_err)
}
cli::DatabaseManagerSubcommand::Inspect(inspect_config) => {
let inspect_config = parse_inspect_config(inspect_config)?;
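The `get_genesis_state` closure introduced above is defined once before the `match` and called only in the arms that need it, so subcommands such as `Inspect` would not pay for loading the genesis state. The pattern can be sketched in isolation as follows. All names here are hypothetical: `Subcommand` stands in for the CLI enum, a `u64` stands in for the genesis state, and the `loads` counter exists only to make the laziness observable.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the database manager subcommands.
enum Subcommand {
    Migrate,
    Inspect,
    PruneStates,
}

/// Dispatches a subcommand, loading the (expensive) genesis value only in
/// the arms that need it. `loads` counts how often the loader actually ran.
fn run(subcommand: &Subcommand, genesis: Option<u64>, loads: &Cell<u32>) -> Result<String, String> {
    // Defined once, before the match, and invoked lazily by individual arms.
    let get_genesis_state = || -> Result<u64, String> {
        loads.set(loads.get() + 1);
        genesis.ok_or_else(|| "Genesis state missing".to_string())
    };

    match subcommand {
        Subcommand::Migrate => {
            let genesis_state = get_genesis_state()?;
            Ok(format!("migrate with genesis {genesis_state}"))
        }
        // No genesis state needed, so the loader is never called.
        Subcommand::Inspect => Ok("inspect".to_string()),
        Subcommand::PruneStates => {
            let genesis_state = get_genesis_state()?;
            Ok(format!("prune with genesis {genesis_state}"))
        }
    }
}
```

Giving the closure an explicit `Result<_, String>` return type lets each arm use `?` in the same way, much as the closure in the diff ends with `.ok_or("Genesis state missing".to_string())` so that its error type is `String`.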
@@ -503,27 +514,8 @@ pub fn run<E: EthSpec>(
prune_blobs(client_config, &context, log).map_err(format_err)
}
cli::DatabaseManagerSubcommand::PruneStates(prune_states_config) => {
let executor = env.core_context().executor;
let network_config = context
.eth2_network_config
.clone()
.ok_or("Missing network config")?;
let genesis_state = executor
.block_on_dangerous(
network_config.genesis_state::<E>(
client_config.genesis_state_url.as_deref(),
client_config.genesis_state_url_timeout,
&log,
),
"get_genesis_state",
)
.ok_or("Shutting down")?
.map_err(|e| format!("Error getting genesis state: {e}"))?
.ok_or("Genesis state missing")?;
let prune_config = parse_prune_states_config(prune_states_config)?;
let genesis_state = get_genesis_state()?;
prune_states(client_config, prune_config, genesis_state, &context, log)
}
cli::DatabaseManagerSubcommand::Compact(compact_config) => {