Hierarchical state diffs in hot DB (#6750)

This PR implements https://github.com/sigp/lighthouse/pull/5978 (tree-states) but on the hot DB. It lets Lighthouse massively reduce its disk footprint during periods of non-finality, and it reduces overall I/O in all cases.
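
For readers new to hierarchical state diffs, the idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Strategy` enum, the `storage_strategy` function, and the exponent list are all hypothetical names and values chosen for the example. Each exponent `e` defines a layer of slots aligned to `2^e`. The coarsest layer stores full snapshots, every finer layer stores a diff against the enclosing boundary of the next coarser layer, and unaligned slots are rebuilt by replaying blocks.

```rust
/// Hypothetical description of how a state at a given slot is persisted.
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Store the full state.
    Snapshot,
    /// Store a diff against the state at the given base slot.
    DiffFrom(u64),
    /// Store nothing; rebuild by replaying blocks from the given slot.
    ReplayFrom(u64),
}

/// `exponents` must be non-empty and ascending; the last entry is the snapshot layer.
fn storage_strategy(slot: u64, exponents: &[u8]) -> Strategy {
    let snapshot_exp = *exponents.last().expect("at least one layer");
    if slot % (1u64 << snapshot_exp) == 0 {
        return Strategy::Snapshot;
    }
    // Walk from the coarsest diff layer to the finest one.
    for pair in exponents.windows(2).rev() {
        let (exp, parent_exp) = (pair[0], pair[1]);
        if slot % (1u64 << exp) == 0 {
            // The base is the enclosing boundary of the next coarser layer.
            let parent_period = 1u64 << parent_exp;
            return Strategy::DiffFrom(slot / parent_period * parent_period);
        }
    }
    // Not aligned to any layer: replay from the nearest finest-layer boundary.
    let finest = 1u64 << exponents[0];
    Strategy::ReplayFrom(slot / finest * finest)
}

fn main() {
    // Illustrative exponents only (an assumption, not a statement of the defaults).
    let exponents = [5, 9, 11, 13, 16, 18, 21];
    for slot in [0u64, 32, 33, 512, 544, 2560, 2_097_152] {
        println!("{slot:>8} -> {:?}", storage_strategy(slot, &exponents));
    }
}
```

With a layout like this, reading any state touches at most one snapshot plus one diff per layer, which is where the disk and I/O savings come from.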

Closes https://github.com/sigp/lighthouse/issues/6580

Conga into https://github.com/sigp/lighthouse/pull/6744

### TODOs

- [x] Fix OOM in CI https://github.com/sigp/lighthouse/pull/7176
- [x] Optimise store_hot_state to avoid storing a duplicate state if the summary already exists (should be safe from races now that pruning is cleaner)
- [x] Fix misspelled name: get_ancenstor_state_root
- [x] get_ancestor_state_root should use state summaries
- [x] Prevent split from changing during ancestor calc
- [x] Use same hierarchy for hot and cold

### TODO Good optimization for future PRs

- [ ] During migration, if the latest hot snapshot is aligned with the cold snapshot, migrate the diffs instead of the full states. Upcoming aligned slots:
```
align slot  time
10485760    Nov-26-2024
12582912    Sep-14-2025
14680064    Jul-02-2026
```
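
The table above can be reproduced with a few lines of arithmetic. The sketch below assumes the slots refer to Ethereum mainnet (genesis at Unix time 1606824023, 12-second slots) and that alignment means multiples of `2^21` slots, which is inferred from the listed values (5, 6 and 7 times `2^21`). The helper names are hypothetical, and the date conversion is a standard days-to-civil-date algorithm written out by hand to keep the example dependency-free.

```rust
// Assumptions: Ethereum mainnet genesis time and slot duration.
const MAINNET_GENESIS_TIME: u64 = 1_606_824_023;
const SECONDS_PER_SLOT: u64 = 12;
// Alignment period inferred from the table: 2^21 slots.
const ALIGNMENT: u64 = 1 << 21;

/// Convert days since 1970-01-01 into a (year, month, day) Gregorian date.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year eras
    let doe = z.rem_euclid(146_097); // day of era, in [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + (if month <= 2 { 1 } else { 0 });
    (year, month, day)
}

/// UTC calendar date on which the given slot starts.
fn slot_to_date(slot: u64) -> (i64, u32, u32) {
    let unix = MAINNET_GENESIS_TIME + slot * SECONDS_PER_SLOT;
    civil_from_days((unix / 86_400) as i64)
}

fn main() {
    for k in 5..=7u64 {
        let slot = k * ALIGNMENT;
        let (y, m, d) = slot_to_date(slot);
        println!("{slot:>10}  {y:04}-{m:02}-{d:02}");
    }
}
```

Under these assumptions the three rows come out as 2024-11-26, 2025-09-14 and 2026-07-02, matching the table.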

### TODO Maybe things good to have

- [ ] Rename anchor_slot https://github.com/sigp/lighthouse/compare/tree-states-hot-rebase-oom...dapplion:lighthouse:tree-states-hot-anchor-slot-rename?expand=1
- [ ] Make the anchor fields non-public so that they can only be mutated through a method, preventing unwanted changes to anchor_slot

### NOTTODO

- [ ] Use fork-choice and a new method [`descendants_of_checkpoint`](ca2388e196 (diff-046fbdb517ca16b80e4464c2c824cf001a74a0a94ac0065e635768ac391062a8)) to keep only the state summaries that descend from the finalized checkpoint
Lion - dapplion
2025-06-19 04:43:25 +02:00
committed by GitHub
parent 6786b9d12a
commit dd98534158
33 changed files with 2695 additions and 812 deletions


@@ -90,7 +90,7 @@ use std::fmt::Debug;
 use std::fs;
 use std::io::Write;
 use std::sync::Arc;
-use store::{Error as DBError, HotStateSummary, KeyValueStore, StoreOp};
+use store::{Error as DBError, KeyValueStore};
 use strum::AsRefStr;
 use task_executor::JoinHandle;
 use tracing::{debug, error};
@@ -1467,28 +1467,19 @@ impl<T: BeaconChainTypes> ExecutionPendingBlock<T> {
     // processing, but we get early access to it.
     let state_root = state.update_tree_hash_cache()?;
-    // Store the state immediately.
-    let txn_lock = chain.store.hot_db.begin_rw_transaction();
+    // Store the state immediately. States are ONLY deleted on finalization pruning, so
+    // we won't have race conditions where we should have written a state and didn't.
     let state_already_exists =
         chain.store.load_hot_state_summary(&state_root)?.is_some();
-    let state_batch = if state_already_exists {
+    if state_already_exists {
         // If the state exists, we do not need to re-write it.
-        vec![]
     } else {
-        vec![if state.slot() % T::EthSpec::slots_per_epoch() == 0 {
-            StoreOp::PutState(state_root, &state)
-        } else {
-            StoreOp::PutStateSummary(
-                state_root,
-                HotStateSummary::new(&state_root, &state)?,
-            )
-        }]
+        // Recycle store codepath to create a state summary and store the state / diff
+        let mut ops = vec![];
+        chain.store.store_hot_state(&state_root, &state, &mut ops)?;
+        chain.store.hot_db.do_atomically(ops)?;
     };
-    chain
-        .store
-        .do_atomically_with_block_and_blobs_cache(state_batch)?;
-    drop(txn_lock);
     state_root
 };