Hierarchical state diffs in hot DB (#6750)

This PR implements https://github.com/sigp/lighthouse/pull/5978 (tree-states), but for the hot DB. It lets Lighthouse massively reduce its disk footprint during periods of non-finality, and reduces overall I/O in all cases.

Closes https://github.com/sigp/lighthouse/issues/6580

Conga into https://github.com/sigp/lighthouse/pull/6744

### TODOs

- [x] Fix OOM in CI https://github.com/sigp/lighthouse/pull/7176
- [x] Optimise store_hot_state to avoid storing a duplicate state if the summary already exists (should be safe from races now that pruning is cleaner)
- [x] Fix misspelled name: get_ancenstor_state_root
- [x] get_ancestor_state_root should use state summaries
- [x] Prevent split from changing during ancestor calc
- [x] Use same hierarchy for hot and cold

### TODO Good optimization for future PRs

- [ ] During migration, if the latest hot snapshot is aligned with the cold snapshot, migrate the diffs instead of the full states. The next aligned slots and their approximate dates are:
```
align slot  time
10485760    Nov-26-2024
12582912    Sep-14-2025
14680064    Jul-02-2026
```
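
As a rough cross-check of the table above, the following sketch recomputes the dates of the aligned slots. It is not part of this PR. It assumes mainnet parameters (genesis time `1606824023`, 12-second slots) and an alignment period of `2^21` slots, which is inferred from the spacing of the table rows rather than stated here. The names `slot_date`, `is_aligned` and `civil_from_days` are hypothetical helpers introduced for illustration only.

```rust
// Sketch only: estimates the UTC calendar date of "aligned" slots.
// Assumptions (not stated in the PR): mainnet genesis time, 12-second slots,
// and an alignment period of 2^21 slots inferred from the table spacing.

const GENESIS_TIME: u64 = 1_606_824_023; // assumed: mainnet genesis (Unix seconds)
const SECONDS_PER_SLOT: u64 = 12; // assumed: mainnet slot duration
const ALIGN_PERIOD: u64 = 1 << 21; // assumed: 2_097_152 slots between aligned snapshots

/// Converts days since 1970-01-01 into (year, month, day), Gregorian calendar.
/// Standard days-to-civil conversion, restricted to dates on or after the Unix epoch.
fn civil_from_days(days: u64) -> (u64, u64, u64) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z / 146_097; // 400-year cycles
    let doe = z % 146_097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// UTC date on which `slot` starts, under the assumed mainnet parameters.
fn slot_date(slot: u64) -> (u64, u64, u64) {
    let unix_seconds = GENESIS_TIME + slot * SECONDS_PER_SLOT;
    civil_from_days(unix_seconds / 86_400)
}

/// True if `slot` falls on the assumed snapshot alignment boundary.
fn is_aligned(slot: u64) -> bool {
    slot % ALIGN_PERIOD == 0
}

fn main() {
    for slot in [10_485_760u64, 12_582_912, 14_680_064] {
        let (y, m, d) = slot_date(slot);
        println!("{slot:>10}  aligned={}  {y:04}-{m:02}-{d:02}", is_aligned(slot));
    }
}
```

Under these assumptions the three slots land on the same dates as the table, and each is an exact multiple of `2^21`.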

### TODO Things that may be good to have

- [ ] Rename anchor_slot https://github.com/sigp/lighthouse/compare/tree-states-hot-rebase-oom...dapplion:lighthouse:tree-states-hot-anchor-slot-rename?expand=1
- [ ] Make the anchor fields non-public so that they can only be mutated through a method, to prevent unwanted changes to anchor_slot

### NOTTODO

- [ ] Use fork-choice and a new method [`descendants_of_checkpoint`](ca2388e196 (diff-046fbdb517ca16b80e4464c2c824cf001a74a0a94ac0065e635768ac391062a8)) to keep only the state summaries that descend from the finalized checkpoint
This commit is contained in:
Lion - dapplion
2025-06-19 04:43:25 +02:00
committed by GitHub
parent 6786b9d12a
commit dd98534158
33 changed files with 2695 additions and 812 deletions

@@ -818,14 +818,26 @@ pub fn cli_app() -> Command {
  Arg::new("hdiff-buffer-cache-size")
  .long("hdiff-buffer-cache-size")
  .value_name("SIZE")
- .help("Number of hierarchical diff (hdiff) buffers to cache in memory. Each buffer \
- is around the size of a BeaconState so you should be cautious about setting \
- this value too high. This flag is irrelevant for most nodes, which run with \
- state pruning enabled.")
+ .help("Number of cold hierarchical diff (hdiff) buffers to cache in memory. Each \
+ buffer is around the size of a BeaconState so you should be cautious about \
+ setting this value too high. This flag is irrelevant for most nodes, which \
+ run with state pruning enabled.")
  .default_value("16")
  .action(ArgAction::Set)
  .display_order(0)
  )
+ .arg(
+ Arg::new("hot-hdiff-buffer-cache-size")
+ .long("hot-hdiff-buffer-cache-size")
+ .value_name("SIZE")
+ .help("Number of hot hierarchical diff (hdiff) buffers to cache in memory. Each \
+ buffer is around the size of a BeaconState so you should be cautious about \
+ setting this value too high. Setting this value higher can reduce the time \
+ taken to store new states on disk at the cost of higher memory usage.")
+ .default_value("1")
+ .action(ArgAction::Set)
+ .display_order(0)
+ )
  .arg(
  Arg::new("state-cache-size")
  .long("state-cache-size")
@@ -1655,7 +1667,7 @@ pub fn cli_app() -> Command {
  .arg(
  Arg::new("delay-data-column-publishing")
  .long("delay-data-column-publishing")
- .value_name("SECONDS")
+ .value_name("SECONDS")
  .action(ArgAction::Set)
  .help_heading(FLAG_HEADER)
  .help("TESTING ONLY: Artificially delay data column publishing by the specified number of seconds. \

@@ -418,7 +418,13 @@ pub fn get_config<E: EthSpec>(
  if let Some(hdiff_buffer_cache_size) =
  clap_utils::parse_optional(cli_args, "hdiff-buffer-cache-size")?
  {
- client_config.store.hdiff_buffer_cache_size = hdiff_buffer_cache_size;
+ client_config.store.cold_hdiff_buffer_cache_size = hdiff_buffer_cache_size;
  }
+ if let Some(hdiff_buffer_cache_size) =
+ clap_utils::parse_optional(cli_args, "hot-hdiff-buffer-cache-size")?
+ {
+ client_config.store.hot_hdiff_buffer_cache_size = hdiff_buffer_cache_size;
+ }
  client_config.store.compact_on_init = cli_args.get_flag("compact-db");