Implement database temp states to reduce memory usage (#1798)

## Issue Addressed

Closes #800
Closes #1713

## Proposed Changes

Implement the temporary state storage algorithm described in #800. Specifically:

* Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values.
* Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully.
* Add a garbage collection process to delete leftover temporary states on start-up.
* Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784).

## Additional Info

There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant.

### Race 1: Permanent state marked temporary

EDIT: this has been fixed by the addition of a lock around the relevant critical section.

There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events:

1. Thread 1 checks whether state `s` already exists and, seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`.
2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag.
3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction.
4. Then one of the following happens:
   * a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain!
   * b) Thread 1 finishes, and re-deletes the temporary flag. In this case the failure is transient: state `s` will disappear temporarily, but will come back once thread 1 finishes running.
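To make the temporary-state lifecycle and the Race 1 fix concrete, here is a minimal, self-contained sketch. It is not Lighthouse's code: `ToyStore`, `Op`, `store_intermediate`, `confirm`, and `garbage_collect_temporary` are hypothetical names, states are plain strings keyed by a `u64` root, one `Mutex` stands in for the DB's atomic write batches, and a second `Mutex` stands in for `hot_db.begin_rw_transaction()`. The key point is that the existence check, the temporary write, and the later flag deletion all happen under the same lock, so steps 1-3 above can no longer interleave.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

/// Hypothetical stand-ins for the `StoreOp` variants used in this PR.
#[derive(Debug)]
enum Op {
    PutState(u64, String),
    PutTemporaryFlag(u64),
    DeleteTemporaryFlag(u64),
}

#[derive(Default)]
struct Db {
    states: HashMap<u64, String>,
    temporary: HashSet<u64>,
}

#[derive(Default)]
struct ToyStore {
    // Locking `db` for a whole batch makes each batch atomic, like a DB write batch.
    db: Mutex<Db>,
    // Plays the role of `begin_rw_transaction`: serialises check-then-write sections.
    txn_lock: Mutex<()>,
}

impl ToyStore {
    fn do_atomically(&self, ops: Vec<Op>) {
        let mut db = self.db.lock().unwrap();
        for op in ops {
            match op {
                Op::PutState(root, state) => {
                    db.states.insert(root, state);
                }
                Op::PutTemporaryFlag(root) => {
                    db.temporary.insert(root);
                }
                // Removing an absent flag is a no-op, so double-deletes are harmless.
                Op::DeleteTemporaryFlag(root) => {
                    db.temporary.remove(&root);
                }
            }
        }
    }

    fn state_exists(&self, root: u64) -> bool {
        self.db.lock().unwrap().states.contains_key(&root)
    }

    /// Store an intermediate state immediately, marked temporary, and return the
    /// op that confirms it once the block is imported.
    fn store_intermediate(&self, root: u64, state: &str) -> Op {
        // Holding the lock across the check and the write closes Race 1.
        let _guard = self.txn_lock.lock().unwrap();
        if !self.state_exists(root) {
            self.do_atomically(vec![
                Op::PutState(root, state.to_string()),
                Op::PutTemporaryFlag(root),
            ]);
        }
        // Always stage the deletion, even if another thread wrote the state.
        Op::DeleteTemporaryFlag(root)
    }

    /// Run a block's confirmation batch under the same lock.
    fn confirm(&self, batch: Vec<Op>) {
        let _guard = self.txn_lock.lock().unwrap();
        self.do_atomically(batch);
    }

    /// Reads hide states that are still flagged temporary.
    fn load_state(&self, root: u64) -> Option<String> {
        let db = self.db.lock().unwrap();
        if db.temporary.contains(&root) {
            return None;
        }
        db.states.get(&root).cloned()
    }

    /// Start-up garbage collection: delete every state still flagged temporary.
    fn garbage_collect_temporary(&self) {
        let mut db = self.db.lock().unwrap();
        let leftovers: Vec<u64> = db.temporary.drain().collect();
        for root in leftovers {
            db.states.remove(&root);
        }
    }
}
```

In this sketch, a block that is never confirmed (because verification failed or the process died) leaves its states flagged, and `garbage_collect_temporary` removes them on the next start-up, which is the role the start-up GC plays in the proposed changes.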
I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely, but it's hard to know.

This once again raises the question of why we're using LevelDB (#483), when it offers atomic write batches but no read-write transactions, so a check-then-write like the one in step 1 can't be made atomic! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. For example, [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn).

### Race 2: Temporary state returned from `get_state`

I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect that if it did come up, it would be safe/recoverable (having data is safer than _not_ having data).

This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.
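The read-snapshot fix for Race 2 can be illustrated with a toy copy-on-write store. This is a sketch under stated assumptions, not LevelDB's snapshot API: `SnapshotStore`, `put_temporary`, and the `Arc<Db>` snapshot are hypothetical, and `load_hot_state` here only mirrors the shape of the real function. Each write installs a new immutable `Db`, so an `Arc<Db>` taken earlier behaves like a snapshot, and the temporary-flag check and the summary lookup both read that one snapshot.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};

#[derive(Clone, Default)]
struct Db {
    summaries: HashMap<u64, String>,
    temporary: HashSet<u64>,
}

/// Copy-on-write store: every write swaps in a fresh immutable `Db`.
#[derive(Default)]
struct SnapshotStore {
    current: Mutex<Arc<Db>>,
}

impl SnapshotStore {
    /// A cheap, point-in-time view of the whole database.
    fn snapshot(&self) -> Arc<Db> {
        Arc::clone(&*self.current.lock().unwrap())
    }

    /// Atomically store a summary together with its temporary flag.
    fn put_temporary(&self, root: u64, summary: &str) {
        let mut current = self.current.lock().unwrap();
        let mut next = (**current).clone();
        next.summaries.insert(root, summary.to_string());
        next.temporary.insert(root);
        *current = Arc::new(next);
    }

    /// Both lookups use the same snapshot, so a concurrent `put_temporary`
    /// cannot land between the flag check and the summary load.
    fn load_hot_state(&self, root: u64) -> Option<String> {
        let snap = self.snapshot();
        if snap.temporary.contains(&root) {
            return None;
        }
        snap.summaries.get(&root).cloned()
    }
}
```

Cloning the whole map on every write is only acceptable in a toy; a real KV store provides snapshots without copying, which is why retrofitting them onto every read path is the substantial change mentioned above.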
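For the schema version bump listed under Proposed Changes, the guard against older software could look roughly like the following sketch. `SchemaVersion`, `CURRENT_SCHEMA_VERSION`, `MigrationError`, and `migrate_schema` are hypothetical names, not the PR's actual types. It encodes two facts from the description: migrating from version 1 to 2 is a no-op, and a database at a newer version must be rejected rather than opened by software that doesn't understand temporary states.

```rust
/// Hypothetical schema version; this PR bumps it to 2.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct SchemaVersion(u64);

const CURRENT_SCHEMA_VERSION: SchemaVersion = SchemaVersion(2);

#[derive(Debug, PartialEq)]
enum MigrationError {
    /// The on-disk DB is newer than this software (e.g. old binary, v2 DB).
    Downgrade { from: SchemaVersion, to: SchemaVersion },
    /// No migration path is known between these versions.
    Unsupported { from: SchemaVersion, to: SchemaVersion },
}

/// Migrate the on-disk schema from `from` to `to`, or refuse.
fn migrate_schema(from: SchemaVersion, to: SchemaVersion) -> Result<(), MigrationError> {
    match (from, to) {
        // Already up to date.
        (a, b) if a == b => Ok(()),
        // v1 -> v2 is a no-op: existing data needs no rewriting.
        (SchemaVersion(1), SchemaVersion(2)) => Ok(()),
        // Older software must not open a DB that may contain temporary states.
        (a, b) if a > b => Err(MigrationError::Downgrade { from: a, to: b }),
        (a, b) => Err(MigrationError::Unsupported { from: a, to: b }),
    }
}
```

Future migrations (such as the one anticipated in #1784) would then be added as extra match arms between specific versions.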
2026-06-18 03:08:28 +00:00 · 2020-10-23 01:27:51 +00:00
parent 66f0cf4430
commit acd49d988d
14 changed files with 343 additions and 96 deletions
--- a/beacon_node/beacon_chain/src/beacon_chain.rs
+++ b/beacon_node/beacon_chain/src/beacon_chain.rs
@@ -1498,7 +1498,7 @@ impl<T: BeaconChainTypes> BeaconChain<T> {
        let block_root = fully_verified_block.block_root;
        let mut state = fully_verified_block.state;
        let current_slot = self.slot()?;
-        let mut ops = fully_verified_block.intermediate_states;
+        let mut ops = fully_verified_block.confirmation_db_batch;

        let attestation_observation_timer =
            metrics::start_timer(&metrics::BLOCK_PROCESSING_ATTESTATION_OBSERVATION);
@@ -1623,13 +1623,16 @@ impl<T: BeaconChainTypes> BeaconChain<T> {

        let db_write_timer = metrics::start_timer(&metrics::BLOCK_PROCESSING_DB_WRITE);

-        // Store all the states between the parent block state and this block's slot, the block and state.
-        ops.push(StoreOp::PutBlock(block_root.into(), signed_block.clone()));
-        ops.push(StoreOp::PutState(
-            block.state_root.into(),
-            Cow::Borrowed(&state),
+        // Store the block and its state, and execute the confirmation batch for the intermediate
+        // states, which will delete their temporary flags.
+        ops.push(StoreOp::PutBlock(
+            block_root,
+            Box::new(signed_block.clone()),
        ));
+        ops.push(StoreOp::PutState(block.state_root, &state));
+        let txn_lock = self.store.hot_db.begin_rw_transaction();
        self.store.do_atomically(ops)?;
+        drop(txn_lock);

        // The fork choice write-lock is dropped *after* the on-disk database has been updated.
        // This prevents inconsistency between the two at the expense of concurrency.
--- a/beacon_node/beacon_chain/src/block_verification.rs
+++ b/beacon_node/beacon_chain/src/block_verification.rs
@@ -63,7 +63,7 @@ use std::borrow::Cow;
 use std::convert::TryFrom;
 use std::fs;
 use std::io::Write;
-use store::{Error as DBError, HotColdDB, HotStateSummary, StoreOp};
+use store::{Error as DBError, HotColdDB, HotStateSummary, KeyValueStore, StoreOp};
 use tree_hash::TreeHash;
 use types::{
    BeaconBlock, BeaconState, BeaconStateError, ChainSpec, CloneConfig, EthSpec, Hash256,
@@ -363,7 +363,7 @@ pub struct FullyVerifiedBlock<'a, T: BeaconChainTypes> {
    pub block_root: Hash256,
    pub state: BeaconState<T::EthSpec>,
    pub parent_block: SignedBeaconBlock<T::EthSpec>,
-    pub intermediate_states: Vec<StoreOp<'a, T::EthSpec>>,
+    pub confirmation_db_batch: Vec<StoreOp<'a, T::EthSpec>>,
 }

 /// Implemented on types that can be converted into a `FullyVerifiedBlock`.
@@ -676,9 +676,9 @@ impl<'a, T: BeaconChainTypes> FullyVerifiedBlock<'a, T> {

        let catchup_timer = metrics::start_timer(&metrics::BLOCK_PROCESSING_CATCHUP_STATE);

-        // Keep a batch of any states that were "skipped" (block-less) in between the parent state
-        // slot and the block slot. These will be stored in the database.
-        let mut intermediate_states: Vec<StoreOp<T::EthSpec>> = Vec::new();
+        // Stage a batch of operations to be completed atomically if this block is imported
+        // successfully.
+        let mut confirmation_db_batch = vec![];

        // The block must have a higher slot than its parent.
        if block.slot() <= parent.beacon_state.slot {
@@ -702,18 +702,36 @@ impl<'a, T: BeaconChainTypes> FullyVerifiedBlock<'a, T> {
                // processing, but we get early access to it.
                let state_root = state.update_tree_hash_cache()?;

-                let op = if state.slot % T::EthSpec::slots_per_epoch() == 0 {
-                    StoreOp::PutState(
-                        state_root.into(),
-                        Cow::Owned(state.clone_with(CloneConfig::committee_caches_only())),
-                    )
+                // Store the state immediately, marking it as temporary, and staging the deletion
+                // of its temporary status as part of the larger atomic operation.
+                let txn_lock = chain.store.hot_db.begin_rw_transaction();
+                let state_already_exists =
+                    chain.store.load_hot_state_summary(&state_root)?.is_some();
+
+                let state_batch = if state_already_exists {
+                    // If the state exists, it could be temporary or permanent, but in neither case
+                    // should we rewrite it or store a new temporary flag for it. We *will* stage
+                    // the temporary flag for deletion because it's OK to double-delete the flag,
+                    // and we don't mind if another thread gets there first.
+                    vec![]
                } else {
-                    StoreOp::PutStateSummary(
-                        state_root.into(),
-                        HotStateSummary::new(&state_root, &state)?,
-                    )
+                    vec![
+                        if state.slot % T::EthSpec::slots_per_epoch() == 0 {
+                            StoreOp::PutState(state_root, &state)
+                        } else {
+                            StoreOp::PutStateSummary(
+                                state_root,
+                                HotStateSummary::new(&state_root, &state)?,
+                            )
+                        },
+                        StoreOp::PutStateTemporaryFlag(state_root),
+                    ]
                };
-                intermediate_states.push(op);
+                chain.store.do_atomically(state_batch)?;
+                drop(txn_lock);
+
+                confirmation_db_batch.push(StoreOp::DeleteStateTemporaryFlag(state_root));
+
                state_root
            };

@@ -801,7 +819,7 @@ impl<'a, T: BeaconChainTypes> FullyVerifiedBlock<'a, T> {
            block_root,
            state,
            parent_block: parent.beacon_block,
-            intermediate_states,
+            confirmation_db_batch,
        })
    }
 }
--- a/beacon_node/beacon_chain/src/chain_config.rs
+++ b/beacon_node/beacon_chain/src/chain_config.rs
@@ -1,9 +1,6 @@
 use serde_derive::{Deserialize, Serialize};
 use types::Checkpoint;

-/// There is a 693 block skip in the current canonical Medalla chain, we use 700 to be safe.
-pub const DEFAULT_IMPORT_BLOCK_MAX_SKIP_SLOTS: u64 = 700;
-
 #[derive(Debug, PartialEq, Eq, Clone, Deserialize, Serialize)]
 pub struct ChainConfig {
    /// Maximum number of slots to skip when importing a consensus message (e.g., block,
@@ -20,7 +17,7 @@ pub struct ChainConfig {
 impl Default for ChainConfig {
    fn default() -> Self {
        Self {
-            import_max_skip_slots: Some(DEFAULT_IMPORT_BLOCK_MAX_SKIP_SLOTS),
+            import_max_skip_slots: None,
            weak_subjectivity_checkpoint: None,
        }
    }
--- a/beacon_node/beacon_chain/src/migrate.rs
+++ b/beacon_node/beacon_chain/src/migrate.rs
@@ -436,11 +436,12 @@ impl<E: EthSpec, Hot: ItemStore<E>, Cold: ItemStore<E>> BackgroundMigrator<E, Ho

        let batch: Vec<StoreOp<E>> = abandoned_blocks
            .into_iter()
+            .map(Into::into)
            .map(StoreOp::DeleteBlock)
            .chain(
                abandoned_states
                    .into_iter()
-                    .map(|(slot, state_hash)| StoreOp::DeleteState(state_hash, slot)),
+                    .map(|(slot, state_hash)| StoreOp::DeleteState(state_hash.into(), Some(slot))),
            )
            .collect();

--- a/beacon_node/beacon_chain/src/test_utils.rs
+++ b/beacon_node/beacon_chain/src/test_utils.rs
@@ -730,7 +730,7 @@ where
        }
    }

-    fn set_current_slot(&self, slot: Slot) {
+    pub fn set_current_slot(&self, slot: Slot) {
        let current_slot = self.chain.slot().unwrap();
        let current_epoch = current_slot.epoch(E::slots_per_epoch());
        let epoch = slot.epoch(E::slots_per_epoch());