Commit Graph

9 Commits

Author SHA1 Message Date
chonghe
522bd9e9c6 Update Rust Edition to 2024 (#7766)
* #7749

Thanks @dknopik and @michaelsproul for your help!
2025-08-13 03:04:31 +00:00
Michael Sproul
918121e313 Fix bugs in rebasing of states prior to finalization (#7849)
Attempt to fix this error reported by `beaconcha.in` on their Hoodi archive nodes:

> {"code":500,"message":"UNHANDLED_ERROR: DBError(CacheBuildError(BeaconState(MilhouseError(OutOfBoundsIterFrom { index: 1199549, len: 1060000 }))))","stacktraces":[]}


  There are only a handful of places where we call `iter_from`.

This one is safe by construction (the check immediately prior ensures `self.pubkeys.len()` is not out of bounds):

cfb1f73310/beacon_node/beacon_chain/src/validator_pubkey_cache.rs (L84-L90)

This one should also be safe, and the indexes used here would not be as large as the ones in the reported error:

cfb1f73310/consensus/state_processing/src/per_epoch_processing/single_pass.rs (L365-L368)

Which leaves one remaining usage which must be the culprit:

cfb1f73310/consensus/types/src/beacon_state.rs (L2109-L2113)

This indexing relies on the invariant that `self.pubkey_cache().len() <= self.validators.len()`. We mostly maintain that invariant, except for in `rebase_caches_on` (fixed in this PR).

The other bug, is that we were calling `rebase_on_finalized` for all "hot" states, which post-v7.1.0 includes states prior to the split which are required by the hdiff grid. This is how we end up calling something like `genesis_state.rebase_on(&split_state)`, which then corrupts the pubkey cache of the genesis state using the newer pubkey cache from the split state.
2025-08-12 02:19:24 +00:00
Jimmy Chen
40c2fd5ff4 Instrument tracing spans for block processing and import (#7816)
#7815

- removes all existing spans, so some span fields that appear in logs like `service_name` may be lost.
- instruments a few key code paths in the beacon node, starting from **root spans** named below:

* Gossip block and blobs
* `process_gossip_data_column_sidecar`
* `process_gossip_blob`
* `process_gossip_block`
* Rpc block and blobs
* `process_rpc_block`
* `process_rpc_blobs`
* `process_rpc_custody_columns`
* Rpc blocks (range and backfill)
* `process_chain_segment`
* `PendingComponents` lifecycle
* `pending_components`

To test locally:
* Run Grafana and Tempo with https://github.com/sigp/lighthouse-metrics/pull/57
* Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317`

Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg

Removing the old spans seem to have reduced the memory usage quite a lot - i think we were using them on long running tasks and too excessively:
<img width="910" height="495" alt="image" src="https://github.com/user-attachments/assets/5208bbe4-53b2-4ead-bc71-0b782c788669" />
2025-08-08 05:32:22 +00:00
Michael Sproul
7b2f138ca7 Merge remote-tracking branch 'origin/stable' into release-v7.1.0 2025-07-09 11:19:16 +10:00
Lion - dapplion
dd98534158 Hierarchical state diffs in hot DB (#6750)
This PR implements https://github.com/sigp/lighthouse/pull/5978 (tree-states) but on the hot DB. It allows Lighthouse to massively reduce its disk footprint during non-finality and overall I/O in all cases.

Closes https://github.com/sigp/lighthouse/issues/6580

Conga into https://github.com/sigp/lighthouse/pull/6744

### TODOs

- [x] Fix OOM in CI https://github.com/sigp/lighthouse/pull/7176
- [x] optimise store_hot_state to avoid storing a duplicate state if the summary already exists (should be safe from races now that pruning is cleaner)
- [x] mispelled: get_ancenstor_state_root
- [x] get_ancestor_state_root should use state summaries
- [x] Prevent split from changing during ancestor calc
- [x] Use same hierarchy for hot and cold

### TODO Good optimization for future PRs

- [ ] On the migration, if the latest hot snapshot is aligned with the cold snapshot migrate the diffs instead of the full states.
```
align slot  time
10485760    Nov-26-2024
12582912    Sep-14-2025
14680064    Jul-02-2026
```

### TODO Maybe things good to have

- [ ] Rename anchor_slot https://github.com/sigp/lighthouse/compare/tree-states-hot-rebase-oom...dapplion:lighthouse:tree-states-hot-anchor-slot-rename?expand=1
- [ ] Make anchor fields not public such that they must be mutated through a method. To prevent un-wanted changes of the anchor_slot

### NOTTODO

- [ ] Use fork-choice and a new method [`descendants_of_checkpoint`](ca2388e196 (diff-046fbdb517ca16b80e4464c2c824cf001a74a0a94ac0065e635768ac391062a8)) to filter only the state summaries that descend of finalized checkpoint]
2025-06-19 02:43:25 +00:00
Michael Sproul
6c8770e80d Change default state cache size back to 128 (#7364)
Closes:

- https://github.com/sigp/lighthouse/issues/7363


  - Change default state cache size back to 128.
- Make state pruning properly LRU rather than MSU after skipping the cull-exempt states.
2025-04-29 01:43:25 +00:00
Michael Sproul
4de062626b State cache tweaks (#7095)
Backport of:

- https://github.com/sigp/lighthouse/pull/7067

For:

- https://github.com/sigp/lighthouse/issues/7039


  - Prevent writing to state cache when migrating the database
- Add `state-cache-headroom` flag to control pruning
- Prune old epoch boundary states ahead of mid-epoch states
- Never prune head block's state
- Avoid caching ancestor states unless they are on an epoch boundary
- Log when states enter/exit the cache

Co-authored-by: Eitan Seri-Levi <eserilev@ucsc.edu>
2025-03-18 02:10:21 +00:00
Pawan Dhananjay
1f6850fae2 Rust 1.84 lints (#6781)
* Fix few lints

* Fix remaining lints

* Use fully qualified syntax
2025-01-10 01:13:29 +00:00
Michael Sproul
61962898e2 In-memory tree states (#5533)
* Consensus changes

* EF tests

* lcli

* common and watch

* account manager

* cargo

* fork choice

* promise cache

* beacon chain

* interop genesis

* http api

* lighthouse

* op pool

* beacon chain misc

* parallel state cache

* store

* fix issues in store

* IT COMPILES

* Remove some unnecessary module qualification

* Revert Arced pubkey optimization (#5536)

* Merge remote-tracking branch 'origin/unstable' into tree-states-memory

* Fix caching, rebasing and some tests

* Remove unused deps

* Merge remote-tracking branch 'origin/unstable' into tree-states-memory

* Small cleanups

* Revert shuffling cache/promise cache changes

* Fix state advance bugs

* Fix shuffling tests

* Remove some resolved FIXMEs

* Remove StateProcessingStrategy

* Optimise withdrawals calculation

* Don't reorg if state cache is missed

* Remove inconsistent state func

* Fix beta compiler

* Rebase early, rebase often

* Fix state caching behaviour

* Update to milhouse release

* Fix on-disk consensus context format

* Merge remote-tracking branch 'origin/unstable' into tree-states-memory

* Squashed commit of the following:

commit 3a16649023
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Thu Apr 18 14:26:09 2024 +1000

    Fix on-disk consensus context format

* Keep indexed attestations, thanks Sean

* Merge branch 'on-disk-consensus-context' into tree-states-memory

* Merge branch 'unstable' into tree-states-memory

* Address half of Sean's review

* More simplifications from Sean's review

* Cache state after get_advanced_hot_state
2024-04-24 01:22:36 +00:00