Hierarchical state diffs (#5978)

* Start extracting freezer changes for tree-states
* Remove unused config args
* Add comments
* Remove unwraps
* Subjective more clear implementation
* Clean up hdiff
* Update xdelta3
* Tree states archive metrics (#6040)
* Add store cache size metrics
* Add compress timer metrics
* Add diff apply compute timer metrics
* Add diff buffer cache hit metrics
* Add hdiff buffer load times
* Add blocks replayed metric
* Move metrics to store
* Future proof some metrics
---------
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
* Port and clean up forwards iterator changes
* Add and polish hierarchy-config flag
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* Cleaner errors
* Fix beacon_chain test compilation
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* Patch a few more freezer block roots
* Fix genesis block root bug
* Fix test failing due to pending updates
* Beacon chain tests passing
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* Fix doc lint
* Implement DB schema upgrade for hierarchical state diffs (#6193)
* DB upgrade
* Add flag
* Delete RestorePointHash
* Update docs
* Update docs
* Implement hierarchical state diffs config migration (#6245)
* Implement hierarchical state diffs config migration
* Review PR
* Remove TODO
* Set CURRENT_SCHEMA_VERSION correctly
* Fix genesis state loading
* Re-delete some PartialBeaconState stuff
---------
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* Fix test compilation
* Update schema downgrade test
* Fix tests
* Fix null anchor migration
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* Fix tree states upgrade migration (#6328)
* Towards crash safety
* Fix compilation
* Move cold summaries and state roots to new columns
* Rename StateRoots chunked field
* Update prune states
* Clean hdiff CLI flag and metrics
* Fix "staged reconstruction"
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* Fix alloy issues
* Fix staged reconstruction logic
* Prevent weird slot drift
* Remove "allow" flag
* Update CLI help
* Remove FIXME about downgrade
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* Remove some unnecessary error variants
* Fix new test
* Tree states archive - review comments and metrics (#6386)
* Review PR comments and metrics
* Comments
* Add anchor metrics
* drop prev comment
* Update metadata.rs
* Apply suggestions from code review
---------
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Update beacon_node/store/src/hot_cold_store.rs
Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* Clarify comment and remove anchor_slot garbage
* Simplify database anchor (#6397)
* Simplify database anchor
* Update beacon_node/store/src/reconstruct.rs
* Add migration for anchor
* Fix and simplify light_client store tests
* Fix incompatible config test
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* More metrics
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* New historic state cache (#6475)
* New historic state cache
* Add more metrics
* State cache hit rate metrics
* Fix store metrics
* More logs and metrics
* Fix logger
* Ensure cached states have built caches :O
* Replay blocks in preference to diffing
* Two separate caches
* Distribute cache build time to next slot
* Re-plumb historic-state-cache flag
* Clean up metrics
* Update book
* Update beacon_node/store/src/hdiff.rs
Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>
* Update beacon_node/store/src/historic_state_cache.rs
Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>
---------
Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>
* Update database docs
* Update diagram
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* Update lockbud to work with bindgen/etc
* Correct pkg name for Debian
* Remove vestigial epochs_per_state_diff
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* Markdown lint
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* Address Jimmy's review comments
* Simplify ReplayFrom case
* Fix and document genesis_state_root
* Typo
Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
* Merge branch 'unstable' into tree-states-archive
* Compute diff of validators list manually (#6556)
* Split hdiff computation
* Dedicated logic for historical roots and summaries
* Benchmark against real states
* Mutated source?
* Version the hdiff
* Add lighthouse DB config for hierarchy exponents
* Tidy up hierarchy exponents flag
* Apply suggestions from code review
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Address PR review
* Remove hardcoded paths in benchmarks
* Delete unused function in benches
* lint
---------
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
* Test hdiff binary format stability (#6585)
* Merge remote-tracking branch 'origin/unstable' into tree-states-archive
* Add deprecation warning for SPRP
* Update xdelta to get rid of duplicate deps
* Document test
2026-05-30 20:57:10 +00:00 · 2024-11-18 12:51:44 +11:00
parent 654fc6acdc
commit 9fdd53df56
57 changed files with 3360 additions and 1691 deletions
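
The batching added to `reconstruct_historic_states` in the `reconstruct.rs` diff can be modelled in isolation. The following is a minimal sketch, not the Lighthouse implementation: it keeps only the slot arithmetic (`take(n + 1)`, `batch_complete`, `reconstruction_complete`) and drops block replay, I/O batching, anchor writes and the `should_commit_immediately` check. The names `BatchOutcome`, `run_batch` and `count_batches` are hypothetical, and a plain slot range stands in for the forwards block roots iterator, which is assumed here to yield slots up to and including `upper_limit - 1`.

```rust
/// Hypothetical result of one simulated reconstruction batch.
#[derive(Debug, PartialEq)]
enum BatchOutcome {
    /// The batch ended early; the lower limit was raised to this slot.
    BatchComplete { new_lower_limit: u64 },
    /// The lower limit met the upper limit; reconstruction is finished.
    ReconstructionComplete,
}

/// Simulate one batch over slots `lower_limit..upper_limit`.
/// Returns `None` if the loop ends without reaching either exit condition,
/// which corresponds to the logic-error branch in the diff.
fn run_batch(lower_limit: u64, upper_limit: u64, num_blocks: Option<usize>) -> Option<BatchOutcome> {
    // Take one extra slot so the state at `lower_limit + num_blocks` is included.
    // (The diff uses `n + 1`; `saturating_add` is used here only to avoid overflow.)
    let max_slots = num_blocks.map_or(usize::MAX, |n| n.saturating_add(1));

    for slot in (lower_limit..upper_limit).take(max_slots) {
        let batch_complete = num_blocks.map_or(false, |n| slot == lower_limit + n as u64);
        let reconstruction_complete = slot + 1 == upper_limit;

        // As in the diff, completion of the whole reconstruction is checked first.
        if reconstruction_complete {
            return Some(BatchOutcome::ReconstructionComplete);
        }
        if batch_complete {
            return Some(BatchOutcome::BatchComplete { new_lower_limit: slot });
        }
    }
    None
}

/// Run batches back to back, as a caller with idle capacity might, and count them.
/// Returns `None` for a zero batch size (no progress possible) or an empty range.
fn count_batches(mut lower_limit: u64, upper_limit: u64, num_blocks: usize) -> Option<usize> {
    if num_blocks == 0 {
        return None;
    }
    let mut batches = 0;
    loop {
        batches += 1;
        match run_batch(lower_limit, upper_limit, Some(num_blocks))? {
            BatchOutcome::ReconstructionComplete => return Some(batches),
            BatchOutcome::BatchComplete { new_lower_limit } => lower_limit = new_lower_limit,
        }
    }
}
```

With `num_blocks` set to 32, each batch starting at slot `s` ends at slot `s + 32`, so successive batches stay aligned to 32-slot boundaries until the final, shorter batch reaches `upper_limit - 1`.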
--- a/beacon_node/store/src/reconstruct.rs
+++ b/beacon_node/store/src/reconstruct.rs
@@ -1,14 +1,16 @@
 //! Implementation of historic state reconstruction (given complete block history).
 use crate::hot_cold_store::{HotColdDB, HotColdDBError};
+use crate::metadata::ANCHOR_FOR_ARCHIVE_NODE;
+use crate::metrics;
 use crate::{Error, ItemStore};
 use itertools::{process_results, Itertools};
-use slog::info;
+use slog::{debug, info};
 use state_processing::{
    per_block_processing, per_slot_processing, BlockSignatureStrategy, ConsensusContext,
    VerifyBlockRoot,
 };
 use std::sync::Arc;
-use types::{EthSpec, Hash256};
+use types::EthSpec;

 impl<E, Hot, Cold> HotColdDB<E, Hot, Cold>
 where
@@ -16,11 +18,16 @@ where
    Hot: ItemStore<E>,
    Cold: ItemStore<E>,
 {
-    pub fn reconstruct_historic_states(self: &Arc<Self>) -> Result<(), Error> {
-        let Some(mut anchor) = self.get_anchor_info() else {
-            // Nothing to do, history is complete.
+    pub fn reconstruct_historic_states(
+        self: &Arc<Self>,
+        num_blocks: Option<usize>,
+    ) -> Result<(), Error> {
+        let mut anchor = self.get_anchor_info();
+
+        // Nothing to do, history is complete.
+        if anchor.all_historic_states_stored() {
            return Ok(());
-        };
+        }

        // Check that all historic blocks are known.
        if anchor.oldest_block_slot != 0 {
@@ -29,37 +36,30 @@ where
            });
        }

-        info!(
+        debug!(
            self.log,
-            "Beginning historic state reconstruction";
+            "Starting state reconstruction batch";
            "start_slot" => anchor.state_lower_limit,
        );

-        let slots_per_restore_point = self.config.slots_per_restore_point;
+        let _t = metrics::start_timer(&metrics::STORE_BEACON_RECONSTRUCTION_TIME);

        // Iterate blocks from the state lower limit to the upper limit.
-        let lower_limit_slot = anchor.state_lower_limit;
        let split = self.get_split_info();
-        let upper_limit_state = self.get_restore_point(
-            anchor.state_upper_limit.as_u64() / slots_per_restore_point,
-            &split,
-        )?;
-        let upper_limit_slot = upper_limit_state.slot();
+        let lower_limit_slot = anchor.state_lower_limit;
+        let upper_limit_slot = std::cmp::min(split.slot, anchor.state_upper_limit);

-        // Use a dummy root, as we never read the block for the upper limit state.
-        let upper_limit_block_root = Hash256::repeat_byte(0xff);
-
-        let block_root_iter = self.forwards_block_roots_iterator(
-            lower_limit_slot,
-            upper_limit_state,
-            upper_limit_block_root,
-            &self.spec,
-        )?;
+        // If `num_blocks` is not specified iterate all blocks. Add 1 so that we end on an epoch
+        // boundary when `num_blocks` is a multiple of an epoch boundary. We want to be *inclusive*
+        // of the state at slot `lower_limit_slot + num_blocks`.
+        let block_root_iter = self
+            .forwards_block_roots_iterator_until(lower_limit_slot, upper_limit_slot - 1, || {
+                Err(Error::StateShouldNotBeRequired(upper_limit_slot - 1))
+            })?
+            .take(num_blocks.map_or(usize::MAX, |n| n + 1));

        // The state to be advanced.
-        let mut state = self
-            .load_cold_state_by_slot(lower_limit_slot)?
-            .ok_or(HotColdDBError::MissingLowerLimitState(lower_limit_slot))?;
+        let mut state = self.load_cold_state_by_slot(lower_limit_slot)?;

        state.build_caches(&self.spec)?;

@@ -110,8 +110,19 @@ where
                // Stage state for storage in freezer DB.
                self.store_cold_state(&state_root, &state, &mut io_batch)?;

-                // If the slot lies on an epoch boundary, commit the batch and update the anchor.
-                if slot % slots_per_restore_point == 0 || slot + 1 == upper_limit_slot {
+                let batch_complete =
+                    num_blocks.map_or(false, |n_blocks| slot == lower_limit_slot + n_blocks as u64);
+                let reconstruction_complete = slot + 1 == upper_limit_slot;
+
+                // Commit the I/O batch if:
+                //
+                // - The diff/snapshot for this slot is required for future slots, or
+                // - The reconstruction batch is complete (we are about to return), or
+                // - Reconstruction is complete.
+                if self.hierarchy.should_commit_immediately(slot)?
+                    || batch_complete
+                    || reconstruction_complete
+                {
                    info!(
                        self.log,
                        "State reconstruction in progress";
@@ -122,9 +133,9 @@ where
                    self.cold_db.do_atomically(std::mem::take(&mut io_batch))?;

                    // Update anchor.
-                    let old_anchor = Some(anchor.clone());
+                    let old_anchor = anchor.clone();

-                    if slot + 1 == upper_limit_slot {
+                    if reconstruction_complete {
                        // The two limits have met in the middle! We're done!
                        // Perform one last integrity check on the state reached.
                        let computed_state_root = state.update_tree_hash_cache()?;
@@ -136,23 +147,36 @@ where
                            });
                        }

-                        self.compare_and_set_anchor_info_with_write(old_anchor, None)?;
+                        self.compare_and_set_anchor_info_with_write(
+                            old_anchor,
+                            ANCHOR_FOR_ARCHIVE_NODE,
+                        )?;

                        return Ok(());
                    } else {
                        // The lower limit has been raised, store it.
                        anchor.state_lower_limit = slot;

-                        self.compare_and_set_anchor_info_with_write(
-                            old_anchor,
-                            Some(anchor.clone()),
-                        )?;
+                        self.compare_and_set_anchor_info_with_write(old_anchor, anchor.clone())?;
+                    }
+
+                    // If this is the end of the batch, return Ok. The caller will run another
+                    // batch when there is idle capacity.
+                    if batch_complete {
+                        debug!(
+                            self.log,
+                            "Finished state reconstruction batch";
+                            "start_slot" => lower_limit_slot,
+                            "end_slot" => slot,
+                        );
+                        return Ok(());
                    }
                }
            }

-            // Should always reach the `upper_limit_slot` and return early above.
-            Err(Error::StateReconstructionDidNotComplete)
+            // Should always reach the `upper_limit_slot` or the end of the batch and return early
+            // above.
+            Err(Error::StateReconstructionLogicError)
        })??;

        // Check that the split point wasn't mutated during the state reconstruction process.