Commit Graph

823 Commits

Author SHA1 Message Date
Eitan Seri- Levi
9f46ec6083 Merge branch 'gloas-block-and-bid-production' into gloas-devnet-0 2026-02-13 23:09:45 -08:00
Eitan Seri- Levi
75014fe3ad SseExtendPayloadAttributes hack 2026-02-13 23:09:33 -08:00
Eitan Seri- Levi
4795e1f341 Merge branch 'unstable' of https://github.com/sigp/lighthouse into gloas-block-and-bid-production 2026-02-13 21:33:06 -08:00
Eitan Seri- Levi
e5598d529c block verification changes 2026-02-13 15:00:31 -08:00
Eitan Seri- Levi
ebaca3144c merge block production 2026-02-13 00:05:59 -08:00
Eitan Seri- Levi
47782a68c3 delay cache, and remove some todos 2026-02-12 21:27:39 -08:00
Eitan Seri- Levi
5796864201 Merge branch 'unstable' of https://github.com/sigp/lighthouse into gloas-payload-processing 2026-02-12 14:18:46 -08:00
Mac L
c59e4a0cee Disable legacy-arith by default in consensus/types (#8695)
Currently, `consensus/types` cannot build with `no-default-features` since we use "legacy" standard arithmetic operations.


  - Remove the offending arithmetic to fix compilation.
- Rename `legacy-arith` to `saturating-arith` and disable it by default.


Co-Authored-By: Mac L <mjladson@pm.me>
2026-02-12 20:51:39 +00:00
Eitan Seri- Levi
f637a68e04 Import payload flow 2026-02-11 23:34:53 -08:00
Eitan Seri- Levi
9f972d1743 progress 2026-02-11 14:53:24 -08:00
Eitan Seri- Levi
8204241b45 Progress 2026-02-10 19:57:53 -08:00
Eitan Seri- Levi
846b1ba023 pub crate 2026-02-10 12:16:59 -08:00
Eitan Seri- Levi
43c24d3ee2 init payload processing 2026-02-10 09:15:50 -08:00
Eitan Seri- Levi
8eb409a73a linting 2026-02-09 21:34:00 -08:00
Eitan Seri- Levi
fea43fb0c8 Move block production specific stuff to block_production module 2026-02-09 21:05:25 -08:00
Eitan Seri- Levi
50dde1585c Add payload to a cache for later signing 2026-02-03 17:27:58 -08:00
Eitan Seri- Levi
7cf4eb0396 Add new block production endpoint 2026-02-03 16:13:07 -08:00
Michael Sproul
d42327bb86 Implement Gloas withdrawals and refactor (#8692)
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>

Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
2026-02-03 07:36:20 +00:00
Eitan Seri-Levi
ed7354d460 Payload envelope db operations (#8717)
Adds support for payload envelopes in the db. This is the minimum we'll need to store and fetch payloads.


  


Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
2026-02-03 05:46:10 +00:00
Eitan Seri-Levi
3ecf964385 Replace INTERVALS_PER_SLOT with explicit slot component times (#7944)
https://github.com/ethereum/consensus-specs/pull/4476


  


Co-Authored-By: Barnabas Busa <barnabas.busa@ethereum.org>

Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>

Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>

Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>

Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2026-02-02 05:58:42 +00:00
Jimmy Chen
cd8049a696 Emit NewHead SSE event earlier in block import (#8718)
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
2026-01-29 07:39:05 +00:00
Eitan Seri-Levi
9bec8df37a Add Gloas data column support (#8682)
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>

Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
2026-01-28 04:52:12 +00:00
Jimmy Chen
7f065009a7 Implement custom OpenTelemetry sampler to filter uninstrumented traces (#8647)
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
2026-01-22 05:11:26 +00:00
Mac L
3903e1c67f More consensus/types re-export cleanup (#8665)
Remove more of the temporary re-exports from `consensus/types`


Co-Authored-By: Mac L <mjladson@pm.me>
2026-01-16 04:43:05 +00:00
Mac L
1abc41e337 Cleanup consensus/types re-exports (#8643)
Removes some of the temporary re-exports in `consensus/types`.

I am doing this in multiple parts to keep each diff small.


Co-Authored-By: Mac L <mjladson@pm.me>
2026-01-15 02:23:55 +00:00
Jimmy Chen
dbe474e132 Delete attester cache (#8469)
Fixes attester cache write lock contention. Alternative to #8463.


  


Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
2026-01-06 03:08:02 +00:00
ethDreamer
a39e991557 Gloas(EIP-7732): Containers / Constants (#7923)
* #7850

This is the first round of the conga line! 🎉

Just spec constants and container changes so far.


  


Co-Authored-By: shane-moore <skm1790@gmail.com>

Co-Authored-By: Mark Mackey <mark@sigmaprime.io>

Co-Authored-By: Shane K Moore <41407272+shane-moore@users.noreply.github.com>

Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>

Co-Authored-By: ethDreamer <37123614+ethDreamer@users.noreply.github.com>

Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>

Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>

Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2025-12-16 06:45:45 +00:00
chonghe
86c2b7cfbe Append client version info to graffiti (#7558)
* #7201


  


Co-Authored-By: Tan Chee Keong <tanck@sigmaprime.io>

Co-Authored-By: chonghe <44791194+chong-he@users.noreply.github.com>

Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>

Co-Authored-By: Tan Chee Keong <tanck2005@gmail.com>
2025-12-16 03:19:28 +00:00
Mac L
f3fd1f210b Remove consensus/types re-exports (#8540)
There are certain crates which we re-export within `types` which creates a fragmented DevEx, where there are various ways to import the same crates.

```rust
// consensus/types/src/lib.rs
pub use bls::{
AggregatePublicKey, AggregateSignature, Error as BlsError, Keypair, PUBLIC_KEY_BYTES_LEN,
PublicKey, PublicKeyBytes, SIGNATURE_BYTES_LEN, SecretKey, Signature, SignatureBytes,
get_withdrawal_credentials,
};
pub use context_deserialize::{ContextDeserialize, context_deserialize};
pub use fixed_bytes::FixedBytesExtended;
pub use milhouse::{self, List, Vector};
pub use ssz_types::{BitList, BitVector, FixedVector, VariableList, typenum, typenum::Unsigned};
pub use superstruct::superstruct;
```

This PR removes these re-exports and makes it explicit that these types are imported from a non-`consensus/types` crate.


Co-Authored-By: Mac L <mjladson@pm.me>
2025-12-09 07:13:41 +00:00
Mac L
4e958a92d3 Refactor consensus/types (#7827)
Organize and categorize `consensus/types` into modules based on their relation to key consensus structures/concepts.
This is a precursor to a sensible public interface.

While this refactor is very opinionated, I am open to suggestions on module names, or type groupings if my current ones are inappropriate.


Co-Authored-By: Mac L <mjladson@pm.me>
2025-12-04 09:28:52 +00:00
0xMushow
4fbe517491 Fix data columns sorting when reconstructing blobs (#8510)
Closes https://github.com/sigp/lighthouse/issues/8509


  


Co-Authored-By: Antoine James <antoine@ethereum.org>
2025-12-02 03:06:29 +00:00
Lion - dapplion
53e73fa376 Remove duplicate state in ProtoArray (#8324)
Part of a fork-choice tech debt clean-up https://github.com/sigp/lighthouse/issues/8325

https://github.com/sigp/lighthouse/issues/7089 (non-finalized checkpoint sync) changes the meaning of the checkpoints inside fork-choice. It turns out that we persist the justified and finalized checkpoints **twice** in fork-choice
1. Inside the fork-choice store
2. Inside the proto-array

There's no reason for 2. except for making the function signature of some methods smallers. It's not consistent with the rest of the crate, because in some functions we pass the external variable of time (current_slot) via args, but then read the finalized checkpoint from the internal state. Passing both variables as args makes fork-choice easier to reason about at the cost of a few extra lines.


  Remove the unnecessary state (`justified_checkpoint`, `finalized_checkpoint`) inside `ProtoArray`, to make it easier to reason about.


Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>

Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
2025-11-12 03:42:17 +00:00
Michael Sproul
a7e89a8761 Optimise state_root_at_slot for finalized slot (#8353)
This is an optimisation targeted at Fulu networks in non-finality.

While debugging on Holesky, we found that `state_root_at_slot` was being called from `prepare_beacon_proposer` a lot, for the finalized state:

2c9b670f5d/beacon_node/http_api/src/lib.rs (L3860-L3861)

This was causing `prepare_beacon_proposer` calls to take upwards of 5 seconds, sometimes 10 seconds, because it would trigger _multiple_ beacon state loads in order to iterate back to the finalized slot. Ideally, loading the finalized state should be quick because we keep it cached in the state cache (technically we keep the split state, but they usually coincide). Instead we are computing the finalized state root separately (slow), and then loading the state from the cache (fast).

Although it would be possible to make the API faster by removing the `state_root_at_slot` call, I believe it's simpler to change `state_root_at_slot` itself and remove the footgun. Devs rightly expect operations involving the finalized state to be fast.


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2025-11-05 02:08:46 +00:00
Mac L
f5809aff87 Bump ssz_types to v0.12.2 (#8032)
https://github.com/sigp/lighthouse/issues/8012


  Replace all instances of `VariableList::from` and `FixedVector::from` to their `try_from` variants.

While I tried to use proper error handling in most cases, there were certain situations where adding an `expect` for situations where `try_from` can trivially never fail avoided adding a lot of extra complexity.


Co-Authored-By: Mac L <mjladson@pm.me>

Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>

Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2025-10-28 04:01:09 +00:00
Jimmy Chen
43c5e924d7 Add --semi-supernode support (#8254)
Addresses #8218

A simplified version of #8241 for the initial release.

I've tried to minimise the logic change in this PR, although introducing the `NodeCustodyType` enum still result in quite a bit a of diff, but the actual logic change in `CustodyContext` is quite small.

The main changes are in the `CustdoyContext` struct
* ~~combining `validator_custody_count` and `current_is_supernode` fields into a single `custody_group_count_at_head` field. We persist the cgc of the initial cli values into the `custody_group_count_at_head` field and only allow for increase (same behaviour as before).~~
* I noticed the above approach caused a backward compatibility issue, I've [made a fix](15569bc085) and changed the approach slightly (which was actually what I had originally in mind):
* when initialising, only override the  `validator_custody_count` value if either flag `--supernode` or `--semi-supernode` is used; otherwise leave it as the existing default `0`. Most other logic remains unchanged.

All existing validator custody unit tests are still all passing, and I've added additional tests to cover semi-supernode, and restoring `CustodyContext` from disk.

Note: I've added a `WARN` if the user attempts to switch to a `--semi-supernode` or `--supernode` - this currently has no effect, but once @eserilev column backfill is merged, we should be able to support this quite easily.

Things to test
- [x] cgc in metadata / enr
- [x] cgc in metrics
- [x] subscribed subnets
- [x] getBlobs endpoint


  


Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
2025-10-22 05:23:17 +00:00
Eitan Seri-Levi
33e21634cb Custody backfill sync (#7907)
#7603


  #### Custody backfill sync service
Similar in many ways to the current backfill service. There may be ways to unify the two services. The difficulty there is that the current backfill service tightly couples blocks and their associated blobs/data columns. Any attempts to unify the two services should be left to a separate PR in my opinion.

#### `SyncNeworkContext`
`SyncNetworkContext` manages custody sync data columns by range requests separetly from other sync RPC requests. I think this is a nice separation considering that custody backfill is its own service.

#### Data column import logic
The import logic verifies KZG committments and that the data columns block root matches the block root in the nodes store before importing columns

#### New channel to send messages to `SyncManager`
Now external services can communicate with the `SyncManager`. In this PR this channel is used to trigger a custody sync. Alternatively we may be able to use the existing `mpsc` channel that the `SyncNetworkContext` uses to communicate with the `SyncManager`. I will spend some time reviewing this.


Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>

Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>

Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
2025-10-22 03:51:34 +00:00
Eitan Seri-Levi
46dde9afee Fix data column rpc request (#8247)
Fixes an issue mentioned in this comment regarding data column rpc requests:
https://github.com/sigp/lighthouse/issues/6572#issuecomment-3400076236


  


Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>

Co-Authored-By: Michael Sproul <micsproul@gmail.com>
2025-10-21 23:54:35 +00:00
Michael Sproul
21bab0899a Improve block header signature handling (#8253)
Closes:

- https://github.com/sigp/lighthouse/issues/7650


  Reject blob and data column sidecars from RPC with invalid signatures.


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2025-10-21 13:58:12 +00:00
Michael Sproul
2f8587301d More proposer shuffling cleanup (#8130)
Addressing more review comments from:

- https://github.com/sigp/lighthouse/pull/8101

I've also tweaked a few more things that I think are minor bugs.


  - Instrument `ensure_state_can_determine_proposers_for_epoch`
- Fix `block_root` usage in `compute_proposer_duties_from_head`. This was a regression introduced in 8101 😬 .
- Update the `state_advance_timer` to prime the next-epoch proposer cache post-Fulu.


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2025-10-20 03:14:14 +00:00
Pawan Dhananjay
2c328e32a6 Persist only custody columns in db (#8188)
* Only persist custody columns

* Get claude to write tests

* lint

* Address review comments and fix tests.

* Use supernode only when building chain segments

* Clean up

* Rewrite tests.

* Fix tests

* Clippy

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2025-10-13 20:32:13 +11:00
Michael Sproul
13dfa9200f Block proposal optimisations (#8156)
Closes:

- https://github.com/sigp/lighthouse/issues/4412

This should reduce Lighthouse's block proposal times on Holesky and prevent us getting reorged.


  - [x] Allow the head state to be advanced further than 1 slot. This lets us avoid epoch processing on hot paths including block production, by having new epoch boundaries pre-computed and available in the state cache.
- [x] Use the finalized state to prune the op pool. We were previously using the head state and trying to infer slashing/exit relevance based on `exit_epoch`. However some exit epochs are far in the future, despite occurring recently.


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2025-10-08 06:09:12 +00:00
Michael Sproul
c754234b2c Fix bugs in proposer calculation post-Fulu (#8101)
As identified by a researcher during the Fusaka security competition, we were computing the proposer index incorrectly in some places by computing without lookahead.


  - [x] Add "low level" checks to computation functions in `consensus/types` to ensure they error cleanly
- [x] Re-work the determination of proposer shuffling decision roots, which are now fork aware.
- [x] Re-work and simplify the beacon proposer cache to be fork-aware.
- [x] Optimise `with_proposer_cache` to use `OnceCell`.
- [x] All tests passing.
- [x] Resolve all remaining `FIXME(sproul)`s.
- [x] Unit tests for `ProtoBlock::proposer_shuffling_root_for_child_block`.
- [x] End-to-end regression test.
- [x] Test on pre-Fulu network.
- [x] Test on post-Fulu network.


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2025-09-26 14:44:50 +00:00
Lion - dapplion
ffa7b2b2b9 Only mark block lookups as pending if block is importing from gossip (#8112)
- PR https://github.com/sigp/lighthouse/pull/8045 introduced a regression of how lookup sync interacts with the da_checker.

Now in unstable block import from the HTTP API also insert the block in the da_checker while the block is being execution verified. If lookup sync finds the block in the da_checker in `NotValidated` state it expects a `GossipBlockProcessResult` message sometime later. That message is only sent after block import in gossip.

I confirmed in our node's logs for 4/4 cases of stuck lookups are caused by this sequence of events:
- Receive block through API, insert into da_checker in fn process_block in put_pre_execution_block
- Create lookup and leave in AwaitingDownload(block in processing cache) state
- Block from HTTP API finishes importing
- Lookup is left stuck

Closes https://github.com/sigp/lighthouse/issues/8104


  - https://github.com/sigp/lighthouse/pull/8110 was my initial solution attempt but we can't send the `GossipBlockProcessResult` event from the `http_api` crate without adding new channels, which seems messy.

For a given node it's rare that a lookup is created at the same time that a block is being published. This PR solves https://github.com/sigp/lighthouse/issues/8104 by allowing lookup sync to import the block twice in that case.


Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
2025-09-25 03:52:27 +00:00
Eitan Seri-Levi
af274029e8 Run reconstruction inside a scoped rayon pool (#8075)
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>

Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>

Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
2025-09-24 06:37:34 +00:00
Jimmy Chen
78d330e4b7 Consolidate reqresp_pre_import_cache into data_availability_checker (#8045)
This PR consolidates the `reqresp_pre_import_cache` into the `data_availability_checker` for the following reasons:
- the `reqresp_pre_import_cache` suffers from the same TOCTOU bug we had with `data_availability_checker` earlier, and leads to unbounded memory leak, which we have observed over the last 6 months on some nodes.
- the `reqresp_pre_import_cache` is no longer necessary, because we now hold blocks in the `data_availability_checker` for longer since (#7961), and recent blocks can be served from the DA checker.

This PR also maintains the following functionalities
- Serving pre-executed blocks over RPC, and they're now served from the `data_availability_checker` instead.
- Using the cache for de-duplicating lookup requests.


  


Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>

Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>
2025-09-19 07:01:13 +00:00
Jimmy Chen
3de646c8b3 Enable reconstruction for nodes custodying more than 50% of columns and instrument tracing (#8052)
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>

Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>
2025-09-16 08:17:43 +00:00
Michael Sproul
f04d5ecddd Another check to prevent duplicate block imports (#8050)
Attempt to address performance issues caused by importing the same block multiple times.


  - Check fork choice "after" obtaining the fork choice write lock in `BeaconChain::import_block`. We actually use an upgradable read lock, but this is semantically equivalent (the upgradable read has the advantage of not excluding regular reads).

The hope is that this change has several benefits:

1. By preventing duplicate block imports we save time repeating work inside `import_block` that is unnecessary, e.g. writing the state to disk. Although the store itself now takes some measures to avoid re-writing diffs, it is even better if we avoid a disk write entirely.
2. By returning `DuplicateFullyImported`, we reduce some duplicated work downstream. E.g. if multiple threads importing columns trigger `import_block`, now only _one_ of them will get a notification of the block import completing successfully, and only this one will run `recompute_head`. This should help avoid a situation where multiple beacon processor workers are consumed by threads blocking on the `recompute_head_lock`. However, a similar block-fest is still possible with the upgradable fork choice lock (a large number of threads can be blocked waiting for the first thread to complete block import).


Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
2025-09-16 04:10:42 +00:00
Jimmy Chen
8a4f6cf0d5 Instrument tracing on block production code path (#8017)
Partially #7814. Instrument block production code path.

New root spans:
* `produce_block_v3`
* `produce_block_v2`

Example traces:

<img width="518" height="432" alt="image" src="https://github.com/user-attachments/assets/a9413d25-501c-49dc-95cc-623db5988981" />


  


Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
2025-09-10 03:30:51 +00:00
Jimmy Chen
eef02afc93 Fix data availability checker race condition causing partial data columns to be served over RPC (#7961)
Partially resolves #6439, an simpler alternative to #7931.

Race condition occurs when RPC data columns arrives after a block has been imported and removed from the DA checker:
1. Block becomes available via gossip
2. RPC columns arrive and pass fork choice check (block hasn't been imported)
3. Block import completes (removing block from DA checker)
4. RPC data columns finish verification and get imported into DA checker

This causes two issues:
1. **Partial data serving**: Already imported components get re-inserted, potentially causing LH to serve incomplete data
2. **State cache misses**: Leads to state reconstruction, holding the availability cache write lock longer and increasing race likelihood

### Proposed Changes

1. Never manually remove pending components from DA checker. Components are only removed via LRU eviction as finality advances. This makes sure we don't run into the issue described above.
2. Use `get` instead of `pop` when recovering the executed block, this prevents cache misses in race condition. This should reduce the likelihood of the race condition
3. Refactor DA checker to drop write lock as soon as components are added. This should also reduce the likelihood of the race condition

**Trade-offs:**

This solution eliminates a few nasty race conditions while allowing simplicity, with the cost of allowing block re-import (already existing).

The increase in memory in DA checker can be partially offset by a reduction in block cache size if this really comes an issue (as we now serve recent blocks from DA checker).
2025-09-02 07:18:23 +00:00
Jimmy Chen
c13fb2fb46 Instrument publish_block code path (#7945)
Instrument `publish_block` code path and log dropped data columns when publishing.

Example spans (running the devnet from my laptop, so the numbers aren't great)

<img width="734" height="296" alt="image" src="https://github.com/user-attachments/assets/20620bf7-2b38-4392-aa75-9ba96d3a7f0d" />

<img width="718" height="625" alt="image" src="https://github.com/user-attachments/assets/61e1ff1c-65b5-4ad4-981a-d0fadc9829e1" />
2025-08-28 03:31:29 +00:00