head_payload_status is internal fork choice state, not an EL
forkchoiceUpdated parameter. It already lives on CachedHead, so source
it directly from the get_head() return value in recompute_head_at_slot
instead of threading it through ForkchoiceUpdateParameters.
Also add TODO(gloas) for parent_head_hash in re-org path (V29 nodes
don't carry execution_status).
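A rough sketch of the intended shape, using simplified stand-in types (only the names mentioned above come from this commit; the fields and signatures here are illustrative):

```rust
/// Simplified stand-ins; the real Lighthouse types differ.
#[derive(Clone, Copy)]
enum PayloadStatus {
    Full,
    Empty,
}

struct CachedHead {
    head_payload_status: Option<PayloadStatus>,
}

struct ForkChoice {
    cached_head: CachedHead,
}

impl ForkChoice {
    /// The cached head already carries the payload status.
    fn get_head(&self) -> &CachedHead {
        &self.cached_head
    }

    fn recompute_head_at_slot(&self) {
        // Read head_payload_status straight from the fork choice head rather
        // than threading it through ForkchoiceUpdateParameters.
        let _head_payload_status = self.get_head().head_payload_status;
    }
}
```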
Remove V17/V29 branching in beacon_chain reorg weight computation.
Use total weight for both pre and post-GLOAS, which is correct for
pre-GLOAS and conservative for post-GLOAS. The payload-aware version
will be needed when reorg logic is enabled for GLOAS.
Serves envelopes-by-range and envelopes-by-root requests. Added PayloadEnvelopeStreamer so that we don't need to alter upstream code when we introduce blinded payload envelopes.
Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Closes:
- https://github.com/sigp/lighthouse/issues/8958
- Update the `HotColdStore` to handle storage of cold states.
- Update `BeaconSnapshot` to hold the execution envelope. This is required to make `chain_dump`-related checks sane, and will be generally useful (see: https://github.com/sigp/lighthouse/issues/8956).
- Bug fix in the `BlockReplayer` for the case where the starting state is already `Full` (we should not try to apply another payload). This happens on the cold DB path because we try to replay from the closest cached state (which is often full); see the sketch after this list.
- Update `test_gloas_hot_state_hierarchy` to cover the cold DB migration.
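The `BlockReplayer` fix above, as a hedged sketch (the state/envelope types and the `is_full`/`apply_payload` helpers are simplified stand-ins, not the real replayer API):

```rust
/// Simplified stand-ins for illustration only.
struct BeaconState {
    payload_applied: bool,
}

struct PayloadEnvelope;

impl BeaconState {
    fn is_full(&self) -> bool {
        self.payload_applied
    }

    fn apply_payload(&mut self, _envelope: &PayloadEnvelope) {
        self.payload_applied = true;
    }
}

/// Only apply the envelope if the starting state is not already `Full`.
/// On the cold DB path we often replay from a cached state that is full,
/// so blindly applying another payload would be incorrect.
fn replay_payload(state: &mut BeaconState, envelope: &PayloadEnvelope) {
    if !state.is_full() {
        state.apply_payload(envelope);
    }
}
```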
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
Closes:
- https://github.com/sigp/lighthouse/issues/8869
- Update `BlockReplayer` to support replay of execution payload envelopes.
- Update `HotColdDB` to load payload envelopes and feed them to the `BlockReplayer` for both hot + cold states. However the cold DB code is not fully working yet (see: https://github.com/sigp/lighthouse/issues/8958).
- Add `StatePayloadStatus` to allow callers to specify whether they want a state with a payload applied, or not.
- Fix the state cache to key by `StatePayloadStatus` (see the sketch after this list).
- Lots of fixes to block production and block processing regarding state management.
- Initial test harness support for producing+processing Gloas blocks+envelopes
- A few new tests to cover Gloas DB operations
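A rough sketch of the `StatePayloadStatus` idea and the cache-key change referenced above (the enum name comes from this PR; the variant names, cache shape, and `Hash256` alias here are simplified assumptions):

```rust
use std::collections::HashMap;

/// Whether the caller wants a state with the execution payload applied.
/// Variant names are illustrative.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum StatePayloadStatus {
    PayloadApplied,
    PayloadNotApplied,
}

type Hash256 = [u8; 32];
struct BeaconState;

/// The state cache keys entries by (state root, payload status) so that the
/// "full" and "not yet full" variants of the same state don't collide.
struct StateCache {
    states: HashMap<(Hash256, StatePayloadStatus), BeaconState>,
}

impl StateCache {
    fn get(&self, state_root: Hash256, status: StatePayloadStatus) -> Option<&BeaconState> {
        self.states.get(&(state_root, status))
    }
}
```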
Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
We received a bug report of a node restarting custody backfill unnecessarily after upgrading to Lighthouse v8.1.1. What happened is:
- User started LH v8.0.1 many months ago, CGC updated 0 -> N but the CGC was not eagerly persisted.
- LH experienced an unclean shutdown (not sure of what type).
- Upon restarting (still running v8.0.1), the custody context read from disk contains CGC=0: `DEBUG Loaded persisted custody context custody_context: CustodyContext { validator_custody_count: 0, ...`.
- CGC updates again to N, retriggering custody backfill: `DEBUG Validator count at head updated old_count: 0, new_count: N`.
- Custody backfill does a bunch of downloading for no gain: `DEBUG Imported historical data columns epoch: Epoch(428433), total_imported: 0`
- While custody backfill is running, the user updated to v8.1.1, and we see logs for the CGC=N being persisted upon clean shutdown, and then correctly read on startup with v8.1.1.
- Custody backfill keeps running and downloading due to the CGC change still being considered in progress.
- Call `persist_custody_context` inside the `register_validators` handler so that it is written to disk eagerly whenever it changes. The performance impact of this should be minimal as the amount of data is very small and this call can only happen at most ~128 times (once for each change) in the entire life of a beacon node.
- Call `persist_custody_context` inside `BeaconChainBuilder::build` so that changes caused by CLI flags are persisted (otherwise starting a node with `--semi-supernode` and no validators, then shutting it down uncleanly, would cause us to forget the CGC).
These changes greatly reduce the timespan during which an unclean shutdown can create inconsistency. In the worst case, we only lose backfill progress that runs concurrently with the `register_validators` handler (should be extremely minimal, nigh impossible).
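A sketch of the eager-persistence change, assuming simplified types (`persist_custody_context` and `register_validators` are named above; the surrounding struct and the `on_validator_registration` helper are hypothetical):

```rust
struct CustodyContext {
    validator_custody_count: u64,
}

struct BeaconChain {
    custody_context: CustodyContext,
}

impl BeaconChain {
    /// Hypothetical persistence hook; in practice this writes the custody
    /// context to the database.
    fn persist_custody_context(&self) {
        // write self.custody_context to disk
    }

    /// Called from the register_validators handler: persist eagerly whenever
    /// the CGC changes, so an unclean shutdown cannot lose the update. The
    /// value is tiny and can change at most ~128 times over the node's life.
    fn on_validator_registration(&mut self, new_count: u64) {
        if new_count > self.custody_context.validator_custody_count {
            self.custody_context.validator_custody_count = new_count;
            self.persist_custody_context();
        }
    }
}
```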
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
The flow for local block building is
1. Create execution payload and bid
2. Construct beacon block
3. Sign beacon block and publish
4. Sign execution payload and publish
This PR adds the beacon block v4 flow, GET payload envelope, and POST payload envelope (local block building only). The spec for these endpoints can be found here: https://github.com/ethereum/beacon-APIs/pull/552 and is subject to change.
We needed a way to store the unsigned execution payload envelope associated with the execution payload bid that was included in the block. I introduced a new cache that stores these unsigned execution payload envelopes. The GET payload envelope endpoint queries this cache directly so that a proposer, after publishing a block, can fetch the payload envelope, then sign and publish it.
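A minimal sketch of that cache, assuming it is keyed by block root (the cache name, key choice, and envelope type here are illustrative, not the actual implementation):

```rust
use std::collections::HashMap;

type Hash256 = [u8; 32];

/// Placeholder for the unsigned execution payload envelope built alongside
/// the bid that was included in the block.
struct ExecutionPayloadEnvelope;

/// Cache of unsigned envelopes, keyed by the block root they belong to.
/// The GET payload envelope handler reads from this cache so the proposer
/// can fetch, sign, and publish the envelope after publishing the block.
#[derive(Default)]
struct PayloadEnvelopeCache {
    envelopes: HashMap<Hash256, ExecutionPayloadEnvelope>,
}

impl PayloadEnvelopeCache {
    fn insert(&mut self, block_root: Hash256, envelope: ExecutionPayloadEnvelope) {
        self.envelopes.insert(block_root, envelope);
    }

    fn get(&self, block_root: &Hash256) -> Option<&ExecutionPayloadEnvelope> {
        self.envelopes.get(block_root)
    }
}
```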
I kept payload signing and publishing within the validator client's block service to keep things simple for now. The idea was to build out a block production MVP for devnet 0, avoid affecting any non-Gloas code paths, and build things out in such a way that it will be easy to deprecate pre-Gloas code paths later on (for example, block production v2 and v3).
We will eventually need to track which beacon node was queried for the block so that we can later query it for the payload, but that's not needed for the devnet.
Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Currently, `consensus/types` cannot build with `no-default-features` since we use "legacy" standard arithmetic operations.
- Remove the offending arithmetic to fix compilation.
- Rename `legacy-arith` to `saturating-arith` and disable it by default (see the sketch below).
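A sketch of how the feature gate might be used, assuming it guards std-style operator impls on the consensus newtypes (the `Slot` shown here is a stand-in, not the real type):

```rust
use core::ops::Add;

/// Illustrative newtype; the real primitives live in consensus/types.
#[derive(Clone, Copy)]
pub struct Slot(u64);

// With `saturating-arith` disabled (the new default), `slot_a + slot_b`
// does not compile and callers must use explicit safe arithmetic instead.
#[cfg(feature = "saturating-arith")]
impl Add for Slot {
    type Output = Slot;

    fn add(self, other: Slot) -> Slot {
        Slot(self.0.saturating_add(other.0))
    }
}
```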
Co-Authored-By: Mac L <mjladson@pm.me>
Adds support for payload envelopes in the db. This is the minimum we'll need to store and fetch payloads.
Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>
Removes some of the temporary re-exports in `consensus/types`.
I am doing this in multiple parts to keep each diff small.
Co-Authored-By: Mac L <mjladson@pm.me>
There are certain crates which we re-export within `types` which creates a fragmented DevEx, where there are various ways to import the same crates.
```rust
// consensus/types/src/lib.rs
pub use bls::{
    AggregatePublicKey, AggregateSignature, Error as BlsError, Keypair, PUBLIC_KEY_BYTES_LEN,
    PublicKey, PublicKeyBytes, SIGNATURE_BYTES_LEN, SecretKey, Signature, SignatureBytes,
    get_withdrawal_credentials,
};
pub use context_deserialize::{ContextDeserialize, context_deserialize};
pub use fixed_bytes::FixedBytesExtended;
pub use milhouse::{self, List, Vector};
pub use ssz_types::{BitList, BitVector, FixedVector, VariableList, typenum, typenum::Unsigned};
pub use superstruct::superstruct;
```
This PR removes these re-exports and makes it explicit that these types are imported from a non-`consensus/types` crate.
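For example, call sites that previously relied on the re-exports would now import from the owning crates directly (a representative before/after; exact import lists vary per crate):

```rust
// Before: pulled through the `types` re-exports.
// use types::{BitList, List, VariableList};

// After: explicit imports from the source crates.
use milhouse::List;
use ssz_types::{BitList, VariableList};
```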
Co-Authored-By: Mac L <mjladson@pm.me>
Organize and categorize `consensus/types` into modules based on their relation to key consensus structures/concepts.
This is a precursor to a sensible public interface.
While this refactor is very opinionated, I am open to suggestions on module names, or type groupings if my current ones are inappropriate.
Co-Authored-By: Mac L <mjladson@pm.me>
Part of a fork-choice tech debt clean-up: https://github.com/sigp/lighthouse/issues/8325. https://github.com/sigp/lighthouse/issues/7089 (non-finalized checkpoint sync) changes the meaning of the checkpoints inside fork-choice. It turns out that we persist the justified and finalized checkpoints **twice** in fork-choice:
1. Inside the fork-choice store
2. Inside the proto-array
There's no reason for 2. except making the function signatures of some methods smaller. It's not consistent with the rest of the crate: in some functions we pass the external variable of time (current_slot) via args, but then read the finalized checkpoint from internal state. Passing both variables as args makes fork-choice easier to reason about, at the cost of a few extra lines.
Remove the unnecessary state (`justified_checkpoint`, `finalized_checkpoint`) inside `ProtoArray`, to make it easier to reason about.
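A sketch of the resulting shape, with simplified stand-in types: the checkpoints arrive as arguments alongside `current_slot` instead of being read from internal copies (the method and field details shown here are illustrative, not the real signatures):

```rust
/// Simplified stand-ins; the real fork-choice types carry more data.
#[derive(Clone, Copy)]
struct Checkpoint {
    epoch: u64,
}

struct ProtoArray {
    // No `justified_checkpoint` / `finalized_checkpoint` fields: the
    // checkpoints are supplied by the caller, just like `current_slot`.
}

impl ProtoArray {
    /// Both checkpoints are passed in by the caller; nothing is read
    /// from duplicated internal state.
    fn find_head(
        &self,
        _current_slot: u64,
        _justified_checkpoint: Checkpoint,
        _finalized_checkpoint: Checkpoint,
    ) {
        // head selection elided
    }
}
```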
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
This is an optimisation targeted at Fulu networks in non-finality.
While debugging on Holesky, we found that `state_root_at_slot` was being called from `prepare_beacon_proposer` a lot, for the finalized state:
2c9b670f5d/beacon_node/http_api/src/lib.rs (L3860-L3861)
This was causing `prepare_beacon_proposer` calls to take upwards of 5 seconds, sometimes 10 seconds, because it would trigger _multiple_ beacon state loads in order to iterate back to the finalized slot. Ideally, loading the finalized state should be quick because we keep it cached in the state cache (technically we keep the split state, but they usually coincide). Instead we are computing the finalized state root separately (slow), and then loading the state from the cache (fast).
Although it would be possible to make the API faster by removing the `state_root_at_slot` call, I believe it's simpler to change `state_root_at_slot` itself and remove the footgun. Devs rightly expect operations involving the finalized state to be fast.
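A sketch of the kind of short-circuit being added, under the assumption of a hypothetical slow fallback and a directly-known finalized slot and state root (in reality these come from the cached split/finalized state):

```rust
type Hash256 = [u8; 32];
type Slot = u64;

/// Illustrative chain handle; the real code keeps the split/finalized state
/// in the state cache.
struct BeaconChain {
    finalized_slot: Slot,
    finalized_state_root: Hash256,
}

impl BeaconChain {
    /// Hypothetical slow path that iterates back through state roots and may
    /// load several full states when the chain is not finalizing.
    fn state_root_by_iteration(&self, _slot: Slot) -> Option<Hash256> {
        None
    }

    /// Short-circuit the finalized slot so callers such as
    /// prepare_beacon_proposer don't trigger multiple beacon state loads.
    fn state_root_at_slot(&self, slot: Slot) -> Option<Hash256> {
        if slot == self.finalized_slot {
            return Some(self.finalized_state_root);
        }
        self.state_root_by_iteration(slot)
    }
}
```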
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
https://github.com/sigp/lighthouse/issues/8012
Replace all instances of `VariableList::from` and `FixedVector::from` with their `try_from` variants.
While I tried to use proper error handling in most cases, there were certain situations where `try_from` can trivially never fail, and adding an `expect` there avoided a lot of extra complexity.
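A representative call-site change, assuming a trivially in-bounds list where an `expect` is appropriate (the function here is hypothetical; the exact error handling depends on the call site):

```rust
use ssz_types::{VariableList, typenum::U4};

fn example_list() -> VariableList<u64, U4> {
    // Previously: VariableList::from(vec![0u64, 1, 2, 3])
    // The fallible conversion makes the length check explicit; here it can
    // trivially never fail, so an `expect` keeps the call site simple.
    VariableList::try_from(vec![0u64, 1, 2, 3])
        .expect("4 elements always fit within the U4 bound")
}
```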
Co-Authored-By: Mac L <mjladson@pm.me>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Addresses #8218
A simplified version of #8241 for the initial release.
I've tried to minimise the logic change in this PR; although introducing the `NodeCustodyType` enum still results in quite a bit of diff, the actual logic change in `CustodyContext` is quite small.
The main changes are in the `CustodyContext` struct:
* ~~combining `validator_custody_count` and `current_is_supernode` fields into a single `custody_group_count_at_head` field. We persist the cgc of the initial cli values into the `custody_group_count_at_head` field and only allow for increase (same behaviour as before).~~
* I noticed the above approach caused a backward compatibility issue, I've [made a fix](15569bc085) and changed the approach slightly (which was actually what I had originally in mind):
* when initialising, only override the `validator_custody_count` value if either flag `--supernode` or `--semi-supernode` is used; otherwise leave it as the existing default `0`. Most other logic remains unchanged.
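A rough sketch of the initialisation rule in the last bullet (the `NodeCustodyType` and `CustodyContext` names come from this PR; the variant names, method, and fields shown are assumptions):

```rust
/// Illustrative variants; the real enum may differ.
enum NodeCustodyType {
    Fullnode,
    SemiSupernode,
    Supernode,
}

struct CustodyContext {
    validator_custody_count: u64,
}

impl CustodyContext {
    /// Hypothetical init hook: only override the persisted value when an
    /// explicit supernode flag was passed; otherwise keep the default of 0
    /// and let validator registrations drive increases as before.
    fn apply_node_type(&mut self, node_type: &NodeCustodyType, flag_cgc: u64) {
        match node_type {
            NodeCustodyType::Supernode | NodeCustodyType::SemiSupernode => {
                self.validator_custody_count = self.validator_custody_count.max(flag_cgc);
            }
            NodeCustodyType::Fullnode => {}
        }
    }
}
```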
All existing validator custody unit tests are still passing, and I've added additional tests to cover semi-supernode and restoring `CustodyContext` from disk.
Note: I've added a `WARN` if the user attempts to switch to `--semi-supernode` or `--supernode` - this currently has no effect, but once @eserilev's column backfill is merged, we should be able to support this quite easily.
Things to test
- [x] cgc in metadata / enr
- [x] cgc in metrics
- [x] subscribed subnets
- [x] getBlobs endpoint
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
#7603
#### Custody backfill sync service
Similar in many ways to the current backfill service. There may be ways to unify the two services. The difficulty there is that the current backfill service tightly couples blocks and their associated blobs/data columns. Any attempts to unify the two services should be left to a separate PR in my opinion.
#### `SyncNetworkContext`
`SyncNetworkContext` manages custody sync data-columns-by-range requests separately from other sync RPC requests. I think this is a nice separation considering that custody backfill is its own service.
#### Data column import logic
The import logic verifies KZG commitments and checks that each data column's block root matches the block root in the node's store before importing columns.
#### New channel to send messages to `SyncManager`
Now external services can communicate with the `SyncManager`. In this PR this channel is used to trigger a custody sync. Alternatively we may be able to use the existing `mpsc` channel that the `SyncNetworkContext` uses to communicate with the `SyncManager`. I will spend some time reviewing this.
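A minimal sketch of the channel, assuming a tokio `mpsc` sender handed out to external services (the message variant and struct layout are illustrative):

```rust
use tokio::sync::mpsc;

/// Illustrative message type; the real enum carries more variants.
enum SyncMessage {
    StartCustodyBackfill { new_cgc: u64 },
}

struct SyncManager {
    /// Receiving half owned by the sync manager's event loop.
    external_rx: mpsc::UnboundedReceiver<SyncMessage>,
}

/// External services hold the sending half and can trigger a custody sync
/// without going through the network layer.
fn trigger_custody_sync(tx: &mpsc::UnboundedSender<SyncMessage>, new_cgc: u64) {
    let _ = tx.send(SyncMessage::StartCustodyBackfill { new_cgc });
}
```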
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>