412 Commits

Author SHA1 Message Date
Eitan Seri- Levi
c48594fe18 YAY 2026-02-26 18:19:19 -08:00
Michael Sproul
e44f37895d Simplify diff strat and expand tests (they mostly pass!) 2026-02-26 17:15:32 +11:00
Michael Sproul
984f0d70e0 Make state cache payload status aware 2026-02-25 13:21:48 +11:00
Michael Sproul
5f3faced1a Small fixes for the genesis state 2026-02-25 10:15:31 +11:00
Jimmy Chen
e59f1f03ef Add debug spans to DB write paths (#8895)
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
2026-02-24 20:53:33 +00:00
Michael Sproul
28eb5adf0a Update HotStateSummary construction 2026-02-24 18:16:53 +11:00
Michael Sproul
e2b3971cbd Add StatePayloadStatus to storage_strategy 2026-02-24 17:48:28 +11:00
Michael Sproul
886d31fe7e Delete dysfunctional fork_revert feature (#8891)
I found myself having to update this code for Gloas, and figured we may as well delete it seeing as it doesn't work.

See:

- https://github.com/sigp/lighthouse/issues/4198


  Delete all `fork_revert` logic and the accompanying test.


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2026-02-24 06:27:16 +00:00
Michael Sproul
99e6ad5ca3 Merge remote-tracking branch 'michael/delete-fork-revert' into gloas-replay-blocks 2026-02-24 16:51:05 +11:00
Michael Sproul
b29c6c0e48 Address review comments 2026-02-24 16:45:41 +11:00
Michael Sproul
295aaf982c Thread more payload status 2026-02-24 15:33:43 +11:00
Michael Sproul
b3d2e85e55 Avoid Result::flatten (would require MSRV bump) 2026-02-23 17:28:46 +11:00
Michael Sproul
a2e0068b85 Payloads for cold blocks 2026-02-23 16:09:10 +11:00
Michael Sproul
afc6fb137c Connect up DB replay_blocks/load_blocks 2026-02-23 15:43:19 +11:00
Michael Sproul
a959c5f640 Add payload support to BlockReplayer 2026-02-23 12:55:50 +11:00
Michael Sproul
48a2b2802d Delete OnDiskConsensusContext (#8824)
Remove the `OnDiskConsensusContext` type, which is no longer used at all after the merge of:

- https://github.com/sigp/lighthouse/pull/8724

This type has not been necessary since the merge of Lion's change, which removed the on-disk storage:

- https://github.com/sigp/lighthouse/pull/5891


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2026-02-16 02:49:46 +00:00
Eitan Seri-Levi
ed7354d460 Payload envelope db operations (#8717)
Adds support for payload envelopes in the db. This is the minimum we'll need to store and fetch payloads.

Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
2026-02-03 05:46:10 +00:00
Eitan Seri-Levi
9bec8df37a Add Gloas data column support (#8682)
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>

Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
2026-01-28 04:52:12 +00:00
Mac L
58b153cac5 Remove remaining facade module re-exports from consensus/types (#8672)
Removes the remaining facade re-exports from `consensus/types`.
I have left `graffiti`, as I think it has some utility, so I am leaning towards keeping it in the final API design.


Co-Authored-By: Mac L <mjladson@pm.me>
2026-01-16 19:51:29 +00:00
Mac L
3903e1c67f More consensus/types re-export cleanup (#8665)
Remove more of the temporary re-exports from `consensus/types`


Co-Authored-By: Mac L <mjladson@pm.me>
2026-01-16 04:43:05 +00:00
Michael Sproul
4c268bc0d5 Delete PartialBeaconState (#8591)
While reviewing Gloas, I noticed we were updating `PartialBeaconState`. This code hasn't been used since v7.1.0 introduced hdiffs, so we can delete it and stop maintaining it 🎉

Similarly the `chunked_vector`/`chunked_iter` code can also go!


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>

Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
2025-12-16 09:02:31 +00:00
ethDreamer
a39e991557 Gloas(EIP-7732): Containers / Constants (#7923)
* #7850

This is the first round of the conga line! 🎉

Just spec constants and container changes so far.

Co-Authored-By: shane-moore <skm1790@gmail.com>

Co-Authored-By: Mark Mackey <mark@sigmaprime.io>

Co-Authored-By: Shane K Moore <41407272+shane-moore@users.noreply.github.com>

Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>

Co-Authored-By: ethDreamer <37123614+ethDreamer@users.noreply.github.com>

Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>

Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>

Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2025-12-16 06:45:45 +00:00
Mac L
6a3a32515f Update strum to 0.27 (#8564)
#8547


  Update our `strum` dependency to `0.27`. This unifies our strum dependencies and removes our duplication of `strum` (and by extension, `strum_macros`).


Co-Authored-By: Mac L <mjladson@pm.me>

Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
2025-12-15 03:20:10 +00:00
Mac L
f3fd1f210b Remove consensus/types re-exports (#8540)
We re-export certain crates within `types`, which creates a fragmented DevEx: there are several ways to import the same items.

```rust
// consensus/types/src/lib.rs
pub use bls::{
    AggregatePublicKey, AggregateSignature, Error as BlsError, Keypair, PUBLIC_KEY_BYTES_LEN,
    PublicKey, PublicKeyBytes, SIGNATURE_BYTES_LEN, SecretKey, Signature, SignatureBytes,
    get_withdrawal_credentials,
};
pub use context_deserialize::{ContextDeserialize, context_deserialize};
pub use fixed_bytes::FixedBytesExtended;
pub use milhouse::{self, List, Vector};
pub use ssz_types::{BitList, BitVector, FixedVector, VariableList, typenum, typenum::Unsigned};
pub use superstruct::superstruct;
```

This PR removes these re-exports and makes it explicit that these types are imported from a non-`consensus/types` crate.


Co-Authored-By: Mac L <mjladson@pm.me>
2025-12-09 07:13:41 +00:00
Michael Sproul
261322c3e3 Merge remote-tracking branch 'origin/stable' into unstable 2025-11-20 13:04:32 +11:00
Lion - dapplion
74b8c02630 Reimport the checkpoint sync block (#8417)
We don't want to require checkpoint sync starts to include the required custody data columns. Instead, we want to fetch them from p2p.


Closes https://github.com/sigp/lighthouse/issues/6837


  The checkpoint sync slot can:
1. Be the first slot in the epoch, such that the epoch of the block == the start checkpoint epoch
2. Be in an epoch prior to the start checkpoint epoch

In both cases, backfill sync already fetches that epoch's worth of blocks with the current code. This PR modifies the backfill import filter function to allow re-importing the oldest block slot in the DB.

I feel this solution is sufficient unless I'm missing something. ~~I have not tested this yet!~~ Michael has tested this and it works.
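
The following is a minimal sketch of the filter change. It assumes the filter compares each incoming block's slot against the oldest block slot in the DB. `Slot`, `should_import_before` and `should_import_after` are hypothetical names used for illustration, not Lighthouse's actual functions.

```rust
/// Hypothetical stand-in for a slot number.
type Slot = u64;

/// Assumed previous behaviour: only blocks strictly older than the oldest
/// block in the DB pass, so the checkpoint block itself is skipped.
fn should_import_before(block_slot: Slot, oldest_block_slot: Slot) -> bool {
    block_slot < oldest_block_slot
}

/// Assumed new behaviour: the oldest block slot also passes, so the
/// checkpoint sync block can be re-imported by backfill.
fn should_import_after(block_slot: Slot, oldest_block_slot: Slot) -> bool {
    block_slot <= oldest_block_slot
}
```

With `<=`, the block at the checkpoint slot passes the filter again during backfill. Presumably this is what lets its custody columns be fetched over p2p instead of being required at checkpoint sync start.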


Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>

Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2025-11-19 11:00:38 +00:00
Michael Sproul
e282363669 Gracefully handle deleting states prior to anchor_slot (#8409)
Fix an issue detected by @jimmygchen that occurs when checkpoint sync is aborted midway and then later restarted.

The characteristic error is something like:

> Nov 13 00:51:35.832 ERROR Database write failed                         error: Hdiff(LessThanStart(Slot(1728288), Slot(1728320))), action: "reverting blob DB changes"
> Nov 13 00:51:35.833 WARN  Hot DB pruning failed                         error: DBError(HotColdDBError(Rollback))

This issue has existed since v7.1.0.


  Delete snapshot/diff in the case where `hot_storage_strategy` fails.


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2025-11-17 01:27:57 +00:00
Michael Sproul
2e55a0a9c8 New design for blob/column pruning (#8266)
We are seeing some crazy IO utilisation on Holesky now that data columns have started to expire. Our previous approach of _iterating the entire blobs DB_ doesn't seem to be scaling.


  New blob pruning algorithm that uses a backwards block iterator from the epoch we want to prune, stopping early if an already-pruned slot is encountered.
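
The following is a minimal sketch of the backwards walk. It assumes blocks are visited newest-first from the prune boundary. It also assumes that a block which committed to blobs but has no blob data left is treated as already pruned. `BlockSummary`, `prune_blobs_backwards` and the `BTreeMap` standing in for the blobs DB are all hypothetical. The idea it illustrates is that the work scales with the newly expired range rather than with the size of the whole blobs DB.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified view of a stored block.
struct BlockSummary {
    slot: u64,
    /// Whether the block committed to any blobs/columns.
    has_blob_commitments: bool,
}

/// Walk blocks backwards from `prune_before_slot`, deleting their blob data,
/// and stop at the first block whose blob data is already gone.
/// `blocks` must be sorted by ascending slot. Returns the number of slots pruned.
fn prune_blobs_backwards(
    blocks: &[BlockSummary],
    prune_before_slot: u64,
    blob_db: &mut BTreeMap<u64, Vec<u8>>,
) -> usize {
    let mut pruned = 0;
    for block in blocks.iter().rev().filter(|b| b.slot < prune_before_slot) {
        if !block.has_blob_commitments {
            // Nothing was ever stored for this block; keep walking.
            continue;
        }
        if blob_db.remove(&block.slot).is_some() {
            pruned += 1;
        } else {
            // Blob data expected but missing: an earlier run already pruned
            // from here back, so everything older is done too.
            break;
        }
    }
    pruned
}
```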


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2025-10-23 05:54:24 +00:00
Eitan Seri-Levi
33e21634cb Custody backfill sync (#7907)
#7603


  #### Custody backfill sync service
Similar in many ways to the current backfill service. There may be ways to unify the two services. The difficulty there is that the current backfill service tightly couples blocks and their associated blobs/data columns. Any attempts to unify the two services should be left to a separate PR in my opinion.

#### `SyncNetworkContext`
`SyncNetworkContext` manages custody sync data columns by range requests separately from other sync RPC requests. I think this is a nice separation, considering that custody backfill is its own service.

#### Data column import logic
The import logic verifies KZG commitments and checks that each data column's block root matches the block root in the node's store before importing the columns.
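
The following is a minimal sketch of those two checks. It assumes a map from slot to the block root in the node's store, plus a caller-supplied KZG verifier. `DataColumn`, `ImportError` and `import_column` are hypothetical, and real KZG verification is replaced by a closure.

```rust
use std::collections::HashMap;

type Hash256 = [u8; 32];

/// Hypothetical, minimal data column sidecar.
struct DataColumn {
    slot: u64,
    block_root: Hash256,
}

#[derive(Debug, PartialEq)]
enum ImportError {
    InvalidKzg,
    UnknownBlock,
    BlockRootMismatch,
}

/// Verify KZG, then compare the column's block root with the block root in
/// the node's store, and only then persist the column.
fn import_column(
    column: DataColumn,
    stored_block_roots: &HashMap<u64, Hash256>,
    verify_kzg: impl Fn(&DataColumn) -> bool,
    column_store: &mut Vec<DataColumn>,
) -> Result<(), ImportError> {
    if !verify_kzg(&column) {
        return Err(ImportError::InvalidKzg);
    }
    let stored_root = stored_block_roots
        .get(&column.slot)
        .ok_or(ImportError::UnknownBlock)?;
    if *stored_root != column.block_root {
        return Err(ImportError::BlockRootMismatch);
    }
    column_store.push(column);
    Ok(())
}
```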

#### New channel to send messages to `SyncManager`
Now external services can communicate with the `SyncManager`. In this PR this channel is used to trigger a custody sync. Alternatively we may be able to use the existing `mpsc` channel that the `SyncNetworkContext` uses to communicate with the `SyncManager`. I will spend some time reviewing this.


Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>

Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>

Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
2025-10-22 03:51:34 +00:00
SunnysidedJ
d1e06dc40d #6853 Adding store tests for data column pruning (#7228)
#6853 Update store tests to cover data column pruning


  Created a helper function `check_data_column_existence` which is a copy of `check_blob_existence` but checking data columns instead.
The helper function is then used to check whether data columns are also pruned when blobs are pruned if PeerDAS is enabled.


Co-Authored-By: SunnysidedJ <j@testinprod.io>

Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>

Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2025-10-16 15:20:26 +00:00
Michael Sproul
51321daabb Make the block cache optional (#8066)
Address contention on the store's `block_cache` by allowing it to be disabled when `--block-cache-size 0` is provided, and also making this the default.
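
The following is a minimal sketch of an optional cache. It assumes the cache sits behind a `Mutex`. When the configured size is 0, `NonZeroUsize::new` returns `None`, so no cache is built and its lock is never taken. `Store`, `BlockCache` and their methods are hypothetical. For brevity, this cache simply stops inserting when full rather than evicting.

```rust
use std::collections::HashMap;
use std::num::NonZeroUsize;
use std::sync::Mutex;

/// Hypothetical block cache; a real one would evict (e.g. LRU) when full.
struct BlockCache {
    capacity: NonZeroUsize,
    blocks: HashMap<u64, Vec<u8>>,
}

/// Hypothetical store holding an optional, lock-protected cache.
struct Store {
    block_cache: Option<Mutex<BlockCache>>,
}

impl Store {
    /// A size of 0 disables the cache entirely.
    fn new(block_cache_size: usize) -> Self {
        let block_cache = NonZeroUsize::new(block_cache_size).map(|capacity| {
            Mutex::new(BlockCache { capacity, blocks: HashMap::new() })
        });
        Store { block_cache }
    }

    fn cache_block(&self, slot: u64, block: Vec<u8>) {
        // With the cache disabled there is no lock to contend on.
        if let Some(cache) = &self.block_cache {
            let mut cache = cache.lock().unwrap();
            if cache.blocks.len() < cache.capacity.get() {
                cache.blocks.insert(slot, block);
            }
        }
    }

    fn cached_block(&self, slot: u64) -> Option<Vec<u8>> {
        let cache = self.block_cache.as_ref()?;
        cache.lock().unwrap().blocks.get(&slot).cloned()
    }
}
```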


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2025-09-18 07:10:18 +00:00
Eitan Seri-Levi
242bdfcf12 Add instrumentation to recompute_head_at_slot (#8049)
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
2025-09-16 05:18:31 +00:00
Jimmy Chen
fb77ce9e19 Add missing event in PendingComponent span and clean up sync logs (#8033)
I was looking into some long `PendingComponents` spans and noticed that the block event wasn't added to the span, so it wasn't possible to see from the trace view when the block was added. This PR fixes that.

Additionally, I've noticed a lot of noise and confusion in sync logs. This is caused by the initial `peer_id` being included as part of the syncing chain span, so every log under the span carries that `peer_id`, which may not be accurate for some sync logs. I've removed `peer_id` from the `SyncingChain` span. I've also cleaned up a bunch of spans to use `%` (display) for slots and epochs to make logs easier to read.

Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
2025-09-12 05:11:30 +00:00
Michael Sproul
d235f2c697 Delete RuntimeVariableList::from_vec (#7930)
This method is a footgun because it truncates the list. It is the source of a recent bug:

- https://github.com/sigp/lighthouse/pull/7927


- Delete uses of `RuntimeVariableList::from_vec` and replace them with `::new`, which does validation and can fail.
- Propagate errors where possible, unwrap in tests, and use `expect` for obviously-safe uses (in `chain_spec.rs`).
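
The following is a minimal sketch contrasting the two constructor styles. `BoundedList` is a hypothetical stand-in, not the real `RuntimeVariableList` API. It only illustrates why silent truncation is a footgun compared with a constructor that returns an error.

```rust
/// Hypothetical list whose length limit is only known at runtime.
#[derive(Debug, PartialEq)]
struct BoundedList<T> {
    items: Vec<T>,
    max_len: usize,
}

#[derive(Debug, PartialEq)]
enum Error {
    TooLong { len: usize, max_len: usize },
}

impl<T> BoundedList<T> {
    /// Footgun: silently drops every element past `max_len`.
    fn from_vec_truncating(mut items: Vec<T>, max_len: usize) -> Self {
        items.truncate(max_len);
        BoundedList { items, max_len }
    }

    /// Validating constructor: rejects input that exceeds the limit.
    fn new(items: Vec<T>, max_len: usize) -> Result<Self, Error> {
        if items.len() > max_len {
            return Err(Error::TooLong { len: items.len(), max_len });
        }
        Ok(BoundedList { items, max_len })
    }
}
```

Callers then propagate the `Err`, `unwrap` in tests, or use `expect` where the length is known to be within bounds.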
2025-08-27 06:52:14 +00:00
Mac L
e438691683 Add Gloas boilerplate (#7728)
Adds the required boilerplate code for the Gloas (Glamsterdam) hard fork. This allows PRs testing Gloas-candidate features to test fork transition.

This also includes de-duplication of post-Bellatrix readiness notifiers from #6797 (credit to @dapplion)
2025-08-26 02:49:48 +00:00
Michael Sproul
836c39efaa Shrink persisted fork choice data (#7805)
Closes:

- https://github.com/sigp/lighthouse/issues/7760


- [x] Remove `balances_cache` from `PersistedForkChoiceStore` (~65 MB saving on mainnet)
- [x] Remove `justified_balances` from `PersistedForkChoiceStore` (~16 MB saving on mainnet)
- [x] Remove `balances` from `ProtoArray`/`SszContainer`.
- [x] Implement zstd compression for votes
- [x] Fix bug in justified state usage
- [x] Bump schema version to V28 and implement migration.
2025-08-18 06:03:28 +00:00
chonghe
522bd9e9c6 Update Rust Edition to 2024 (#7766)
* #7749

Thanks @dknopik and @michaelsproul for your help!
2025-08-13 03:04:31 +00:00
Michael Sproul
918121e313 Fix bugs in rebasing of states prior to finalization (#7849)
Attempt to fix this error reported by `beaconcha.in` on their Hoodi archive nodes:

> {"code":500,"message":"UNHANDLED_ERROR: DBError(CacheBuildError(BeaconState(MilhouseError(OutOfBoundsIterFrom { index: 1199549, len: 1060000 }))))","stacktraces":[]}


  There are only a handful of places where we call `iter_from`.

This one is safe by construction (the check immediately prior ensures `self.pubkeys.len()` is not out of bounds):

cfb1f73310/beacon_node/beacon_chain/src/validator_pubkey_cache.rs (L84-L90)

This one should also be safe, and the indexes used here would not be as large as the ones in the reported error:

cfb1f73310/consensus/state_processing/src/per_epoch_processing/single_pass.rs (L365-L368)

Which leaves one remaining usage which must be the culprit:

cfb1f73310/consensus/types/src/beacon_state.rs (L2109-L2113)

This indexing relies on the invariant that `self.pubkey_cache().len() <= self.validators.len()`. We mostly maintain that invariant, except for in `rebase_caches_on` (fixed in this PR).
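
The following is a minimal sketch of the invariant. It assumes a state that stores its validators plus the length of a pubkey cache covering a prefix of them. `State` and `uncached_validators` are hypothetical. The error mirrors the shape of the reported `OutOfBoundsIterFrom { index, len }`, which is what results when a state ends up with a pubkey cache longer than its validator list.

```rust
/// Hypothetical, simplified state: validators plus a pubkey cache that is
/// expected to cover a prefix of the validator list.
struct State {
    validators: Vec<[u8; 48]>,
    pubkey_cache_len: usize,
}

#[derive(Debug, PartialEq)]
enum Error {
    OutOfBoundsIterFrom { index: usize, len: usize },
}

impl State {
    /// Validators not yet in the pubkey cache. This only works while
    /// `pubkey_cache_len <= validators.len()` holds.
    fn uncached_validators(&self) -> Result<&[[u8; 48]], Error> {
        self.validators
            .get(self.pubkey_cache_len..)
            .ok_or(Error::OutOfBoundsIterFrom {
                index: self.pubkey_cache_len,
                len: self.validators.len(),
            })
    }
}
```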

The other bug is that we were calling `rebase_on_finalized` for all "hot" states. Post-v7.1.0, these include states prior to the split that are required by the hdiff grid. This is how we end up calling something like `genesis_state.rebase_on(&split_state)`, which then corrupts the pubkey cache of the genesis state using the newer pubkey cache from the split state.
2025-08-12 02:19:24 +00:00
Jimmy Chen
40c2fd5ff4 Instrument tracing spans for block processing and import (#7816)
#7815

- removes all existing spans, so some span fields that appear in logs like `service_name` may be lost.
- instruments a few key code paths in the beacon node, starting from **root spans** named below:

* Gossip block and blobs
  * `process_gossip_data_column_sidecar`
  * `process_gossip_blob`
  * `process_gossip_block`
* RPC block and blobs
  * `process_rpc_block`
  * `process_rpc_blobs`
  * `process_rpc_custody_columns`
* RPC blocks (range and backfill)
  * `process_chain_segment`
* `PendingComponents` lifecycle
  * `pending_components`

To test locally:
* Run Grafana and Tempo with https://github.com/sigp/lighthouse-metrics/pull/57
* Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317`

Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg

Removing the old spans seems to have reduced memory usage quite a lot. I think we were using them too excessively, including on long-running tasks.
2025-08-08 05:32:22 +00:00
Michael Sproul
0dcce40ccb Fix Clippy for Rust 1.90 beta (#7826)
Fix Clippy for the recently released Rust 1.90 beta. There may be more changes required when Rust 1.89 stable is released in a few days, but possibly not 🤞
2025-08-05 13:52:26 +00:00
Eitan Seri-Levi
db8b6be9df Data column custody info (#7648)
#7647


Introduces a new record in the blobs db: `DataColumnCustodyInfo`.

When `DataColumnCustodyInfo` exists in the db, this indicates that a recent cgc change has occurred and/or that a custody backfill sync is currently in progress (custody backfill will be added as a separate PR). When a cgc change has occurred, `earliest_available_slot` will be equal to the slot at which the cgc change occurred. During custody backfill sync, `earliest_available_slot` should be updated incrementally as the sync progresses.

~~Note that if `advertise_false_custody_group_count` is enabled we do not add a `DataColumnCustodyInfo` record in the db as that would affect the status v2 response.~~
(See comment https://github.com/sigp/lighthouse/pull/7648#discussion_r2212403389)

~~If `DataColumnCustodyInfo` doesn't exist in the db this indicates that we have fulfilled our custody requirements up to the DA window.~~
(It now always exists, and the slot will be set to `None` once backfill is complete.)

StatusV2 now uses `DataColumnCustodyInfo` to calculate the `earliest_available_slot` if a `DataColumnCustodyInfo` record exists in the db. If its slot is `None`, we return the `oldest_block_slot` instead.
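
The following is a minimal sketch of that fallback. It assumes the record holds an optional slot. The field name `earliest_data_column_slot` and the function `earliest_available_slot` are hypothetical.

```rust
type Slot = u64;

/// Hypothetical mirror of the `DataColumnCustodyInfo` record.
struct DataColumnCustodyInfo {
    /// `Some(slot)` while a cgc change or custody backfill is outstanding,
    /// `None` once custody backfill is complete.
    earliest_data_column_slot: Option<Slot>,
}

/// Slot advertised in the status response: the custody slot if one is
/// recorded, otherwise the oldest block slot.
fn earliest_available_slot(info: &DataColumnCustodyInfo, oldest_block_slot: Slot) -> Slot {
    info.earliest_data_column_slot.unwrap_or(oldest_block_slot)
}
```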
2025-07-22 13:30:30 +00:00
Michael Sproul
538067f1ff Merge remote-tracking branch 'origin/stable' into unstable 2025-07-10 15:53:45 +10:00
Michael Sproul
7b2f138ca7 Merge remote-tracking branch 'origin/stable' into release-v7.1.0 2025-07-09 11:19:16 +10:00
Michael Sproul
b9c1a2b0c0 Fix description of DB read bytes metric (#7716)
Fix a trivial typo that mixed up reads and writes.
2025-07-08 08:50:15 +00:00
Michael Sproul
a459a9af98 Fix and test checkpoint sync from genesis (#7689)
Fix a bug involving checkpoint sync from genesis reported by Sunnyside labs.


  Ensure that the store's `anchor` is initialised prior to storing the genesis state. In the case of checkpoint sync from genesis, the genesis state will be in the _hot DB_, so we need the hot DB metadata to be initialised in order to store it.

I've extended the existing checkpoint sync tests to cover this case as well. There are some subtleties around what the `state_upper_limit` should be set to in this case. I've opted to just enable state reconstruction from the start in the test so it gets set to 0, which results in an end state more consistent with the other test cases (full state reconstruction). This is required because we can't meaningfully do any state reconstruction when the split slot is 0 (there is no range of frozen slots to reconstruct).
2025-07-02 04:50:33 +00:00
Jimmy Chen
fcc602a787 Update fulu network configs and add MIN_EPOCHS_FOR_DATA_COLUMN_SIDECARS_REQUESTS (#7646)
- #6240
- Bring built-in network configs up to date with latest consensus-spec PeerDAS configs.
- Add `MIN_EPOCHS_FOR_DATA_COLUMN_SIDECARS_REQUESTS` and use it to determine data availability window after the Fulu fork.
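
The following is a minimal sketch of the window calculation. It assumes the Fulu rule mirrors the Deneb blob-sidecar rule in the consensus specs, which retains data from `max(current_epoch - MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS, DENEB_FORK_EPOCH)`. The function name and parameters are hypothetical. The constants are passed in rather than hard-coded because their values come from the network config.

```rust
type Epoch = u64;

/// Earliest epoch inside the data column availability window, assumed to be
/// max(current_epoch - min_epochs_for_data_column_sidecars_requests, fulu_fork_epoch).
fn data_column_window_start(
    current_epoch: Epoch,
    min_epochs_for_data_column_sidecars_requests: u64,
    fulu_fork_epoch: Epoch,
) -> Epoch {
    // saturating_sub avoids underflow early in the chain's history.
    current_epoch
        .saturating_sub(min_epochs_for_data_column_sidecars_requests)
        .max(fulu_fork_epoch)
}
```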
2025-07-02 02:38:25 +00:00
Pawan Dhananjay
e305cb1b92 Custody persist fix (#7661)
N/A


  Persist the epoch -> cgc values. This is to ensure that `ValidatorRegistrations::latest_validator_custody_requirement` always returns a `Some` value post restart assuming the `epoch_validator_custody_requirements` map has been updated in the previous runs.
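
The following is a minimal sketch of why persistence matters. It assumes the epoch -> cgc map is written to disk and read back on startup. `CustodyRequirements` and its ad-hoc little-endian encoding are hypothetical, and the "latest requirement" is assumed to be the value recorded at the highest epoch.

```rust
use std::collections::BTreeMap;

type Epoch = u64;

/// Hypothetical stand-in for the epoch -> cgc map.
#[derive(Default)]
struct CustodyRequirements {
    epoch_validator_custody_requirements: BTreeMap<Epoch, u64>,
}

impl CustodyRequirements {
    /// Assumed semantics: the cgc recorded at the highest epoch.
    fn latest_validator_custody_requirement(&self) -> Option<u64> {
        self.epoch_validator_custody_requirements
            .values()
            .next_back()
            .copied()
    }

    /// Ad-hoc encoding for this sketch: (epoch, cgc) as little-endian u64 pairs.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.epoch_validator_custody_requirements.len() * 16);
        for (epoch, cgc) in &self.epoch_validator_custody_requirements {
            out.extend_from_slice(&epoch.to_le_bytes());
            out.extend_from_slice(&cgc.to_le_bytes());
        }
        out
    }

    /// Inverse of `to_bytes`; a trailing partial record is ignored.
    fn from_bytes(bytes: &[u8]) -> Self {
        let mut epoch_validator_custody_requirements = BTreeMap::new();
        for record in bytes.chunks_exact(16) {
            let mut epoch = [0u8; 8];
            let mut cgc = [0u8; 8];
            epoch.copy_from_slice(&record[..8]);
            cgc.copy_from_slice(&record[8..]);
            epoch_validator_custody_requirements
                .insert(u64::from_le_bytes(epoch), u64::from_le_bytes(cgc));
        }
        CustodyRequirements { epoch_validator_custody_requirements }
    }
}
```

Without the persisted bytes, a restarted node would start from an empty map, and the latest requirement would be `None` until the map is updated again.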
2025-07-01 06:06:37 +00:00
Michael Sproul
6be646ca11 Bump DB schema to v25 (#7666)
When we removed the eth1 data, I wrote a v25 schema upgrade to delete the data on disk:

- https://github.com/sigp/lighthouse/pull/7133

However, I forgot to update the current schema version, so this change was never actioned.


  This PR updates the current schema version to v25 so that the migration runs.
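
The following is a minimal sketch of why the version constant matters. It assumes migrations run one step at a time from the on-disk version up to the compiled-in current version. `migrations_to_run` is hypothetical. If the current version stays at 24, the v25 step is never reached.

```rust
/// Hypothetical helper: the schema steps that would run on startup, moving
/// one version at a time from the on-disk version to the current version.
fn migrations_to_run(on_disk_version: u64, current_schema_version: u64) -> Vec<u64> {
    ((on_disk_version + 1)..=current_schema_version).collect()
}
```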
2025-06-30 05:52:28 +00:00
Pawan Dhananjay
9b1f3ed9d1 Add gossip check (#7652)
N/A


  Add an additional gossip condition.
2025-06-27 00:26:38 +00:00
chonghe
8e3c5d1524 Rust 1.89 compiler lint fix (#7644)
Fix lints for Rust 1.89 beta compiler
2025-06-25 05:33:17 +00:00