Removes the remaining facade re-exports from `consensus/types`.
I have left `graffiti` as I think it has some utility so am leaning towards keeping it in the final API design.
Co-Authored-By: Mac L <mjladson@pm.me>
Removes some of the temporary re-exports in `consensus/types`.
I am doing this in multiple parts to keep each diff small.
Co-Authored-By: Mac L <mjladson@pm.me>
Closes#8569
Updates the HTTP API error when the node cannot reconstruct blobs due to "Insufficient data columns".
Changes the response from 500 Internal Server Error to 400 Bad Request and adds a hint to run with --supernode or --semi-supernode.
Co-Authored-By: Andrurachi <andruvrch@gmail.com>
There are certain crates which we re-export within `types` which creates a fragmented DevEx, where there are various ways to import the same crates.
```rust
// consensus/types/src/lib.rs
pub use bls::{
AggregatePublicKey, AggregateSignature, Error as BlsError, Keypair, PUBLIC_KEY_BYTES_LEN,
PublicKey, PublicKeyBytes, SIGNATURE_BYTES_LEN, SecretKey, Signature, SignatureBytes,
get_withdrawal_credentials,
};
pub use context_deserialize::{ContextDeserialize, context_deserialize};
pub use fixed_bytes::FixedBytesExtended;
pub use milhouse::{self, List, Vector};
pub use ssz_types::{BitList, BitVector, FixedVector, VariableList, typenum, typenum::Unsigned};
pub use superstruct::superstruct;
```
This PR removes these re-exports and makes it explicit that these types are imported from a non-`consensus/types` crate.
Co-Authored-By: Mac L <mjladson@pm.me>
Remove certain dependencies from `eth2`, and feature-gate others which are only used by certain endpoints.
| Removed | Optional | Dev only |
| -------- | -------- | -------- |
| `either` `enr` `libp2p-identity` `multiaddr` | `protoarray` `eth2_keystore` `eip_3076` `zeroize` `reqwest-eventsource` `futures` `futures-util` | `rand` `test_random_derive` |
This is done by adding an `events` feature which enables the events endpoint and its associated dependencies.
The `lighthouse` feature also enables its associated dependencies making them optional.
The networking-adjacent dependencies were removed by just having certain fields use a `String` instead of an explicit network type. This means the user should handle conversion at the call site instead. This is a bit spicy, but I believe `PeerId`, `Enr` and `Multiaddr` are easily converted to and from `String`s so I think it's fine and reduces our dependency space by a lot. The alternative is to feature gate these types behind a `network` feature instead.
Co-Authored-By: Mac L <mjladson@pm.me>
Continuation of:
* #8536
Moving `/beacon/pool` endpoints out of `http_api` to a separation module. This should improve code maintainability, incremental compilation time and rust analyzer performance.
This is a tedious but straight forward change, so we're going with a pair & insta-merge approach to avoid painful & slow async review
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Continuation of:
* #8529
Moving `/validator` endpoints out of `http_api` to a separation module. This should improve code maintainability, incremental compilation time and rust analyzer performance.
This is a tedious but straight forward change, so we're going with a pair & insta-merge approach to avoid painful & slow async review. @michaelsproul and I paired on the first commit - I believe we are almost done, will pair with @pawanjay176 tomorrow to wrap it up and merge tomorrow. (cc @macladson )
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Organize and categorize `consensus/types` into modules based on their relation to key consensus structures/concepts.
This is a precursor to a sensible public interface.
While this refactor is very opinionated, I am open to suggestions on module names, or type groupings if my current ones are inappropriate.
Co-Authored-By: Mac L <mjladson@pm.me>
Part of the http api refactor to move endpoint handlers to separate modules.
This should improve code maintainability, incremental compilation time and rust analyzer performance.
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Update `reqwest` to 0.12 so we only depend on a single version. This should slightly improve compile times and reduce binary bloat.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Part of a fork-choice tech debt clean-up https://github.com/sigp/lighthouse/issues/8325https://github.com/sigp/lighthouse/issues/7089 (non-finalized checkpoint sync) changes the meaning of the checkpoints inside fork-choice. It turns out that we persist the justified and finalized checkpoints **twice** in fork-choice
1. Inside the fork-choice store
2. Inside the proto-array
There's no reason for 2. except for making the function signature of some methods smallers. It's not consistent with the rest of the crate, because in some functions we pass the external variable of time (current_slot) via args, but then read the finalized checkpoint from the internal state. Passing both variables as args makes fork-choice easier to reason about at the cost of a few extra lines.
Remove the unnecessary state (`justified_checkpoint`, `finalized_checkpoint`) inside `ProtoArray`, to make it easier to reason about.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
Debugging https://github.com/sigp/lighthouse/issues/8104 it would have been helpful to quickly see in the logs that a specific block was submitted into the HTTP API.
Because we want to optimize the block root computation we don't include it in the logs, and just log the block slot. I believe we can take a minute performance hit to have the block root in all the logs during block publishing.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
`beacon-chain-tests` is now regularly taking 1h+ on CI since Fulu fork was added.
This PR attemtpts to reduce the test time by bringing down the number of blobs generated in tests - instead of generating 0..max_blobs, the generator now generates 0..1 blobs by default, and this can be modified by setting `harness.execution_block_generator.set_min_blob_count(n)`.
Note: The blobs are pre-generated and doesn't require too much CPU to generate however processing a larger number of them on the beacon chain does take a lot of time.
This PR also include a few other small improvements
- Our slowest test (`chain_segment_varying_chunk_size`) runs 3x faster in Fulu just by reusing chain segments
- Avoid re-running fork specific tests on all forks
- Fix a bunch of tests that depends on the harness's existing random blob generation, which is fragile
beacon chain test time on test machine is **~2x** faster:
### `unstable`
```
Summary [ 751.586s] 291 tests run: 291 passed (13 slow), 0 skipped
```
### this branch
```
Summary [ 373.792s] 291 tests run: 291 passed (2 slow), 0 skipped
```
The next set of tests to optimise is the ones that use [`get_chain_segment`](77a9af96de/beacon_node/beacon_chain/tests/block_verification.rs (L45)), as it by default build 320 blocks with supernode - an easy optimisation would be to build these blocks with cgc = 8 for tests that only require fullnodes.
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>
Custody backfill sync has a bug when we request columns from more than one peer per batch. The fix here ensures we wait for all requests to be completed before performing verification and importing the responses.
I've also added an endpoint `lighthouse/custody/backfill` that resets a nodes earliest available data column to the current epoch so that custody backfill can be triggered. This endpoint is needed to rescue any nodes that may have missing columns due to the custody backfill sync bug without requiring a full re-sync.
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
The beacon API spec wasn't updated to use the Fulu definition of `dependent_root` for the proposer duties endpoint. No other client updated their logic, so to retain backwards compatibility the decision has been made to continue using the block root at the end of epoch `N - 1`, and introduce a new v2 endpoint down the track to use the correct dependent root.
Eth R&D discussion: https://discord.com/channels/595666850260713488/598292067260825641/1433036715848765562
Change the behaviour of the v1 endpoint back to using the last slot of `N - 1` rather than the last slot of `N - 2`. This introduces the possibility of dependent root false positives (the root can change without changing the shuffling), but causes the least compatibility issues with other clients.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Addresses #8218
A simplified version of #8241 for the initial release.
I've tried to minimise the logic change in this PR, although introducing the `NodeCustodyType` enum still result in quite a bit a of diff, but the actual logic change in `CustodyContext` is quite small.
The main changes are in the `CustdoyContext` struct
* ~~combining `validator_custody_count` and `current_is_supernode` fields into a single `custody_group_count_at_head` field. We persist the cgc of the initial cli values into the `custody_group_count_at_head` field and only allow for increase (same behaviour as before).~~
* I noticed the above approach caused a backward compatibility issue, I've [made a fix](15569bc085) and changed the approach slightly (which was actually what I had originally in mind):
* when initialising, only override the `validator_custody_count` value if either flag `--supernode` or `--semi-supernode` is used; otherwise leave it as the existing default `0`. Most other logic remains unchanged.
All existing validator custody unit tests are still all passing, and I've added additional tests to cover semi-supernode, and restoring `CustodyContext` from disk.
Note: I've added a `WARN` if the user attempts to switch to a `--semi-supernode` or `--supernode` - this currently has no effect, but once @eserilev column backfill is merged, we should be able to support this quite easily.
Things to test
- [x] cgc in metadata / enr
- [x] cgc in metrics
- [x] subscribed subnets
- [x] getBlobs endpoint
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
#7603
#### Custody backfill sync service
Similar in many ways to the current backfill service. There may be ways to unify the two services. The difficulty there is that the current backfill service tightly couples blocks and their associated blobs/data columns. Any attempts to unify the two services should be left to a separate PR in my opinion.
#### `SyncNeworkContext`
`SyncNetworkContext` manages custody sync data columns by range requests separetly from other sync RPC requests. I think this is a nice separation considering that custody backfill is its own service.
#### Data column import logic
The import logic verifies KZG committments and that the data columns block root matches the block root in the nodes store before importing columns
#### New channel to send messages to `SyncManager`
Now external services can communicate with the `SyncManager`. In this PR this channel is used to trigger a custody sync. Alternatively we may be able to use the existing `mpsc` channel that the `SyncNetworkContext` uses to communicate with the `SyncManager`. I will spend some time reviewing this.
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Addressing more review comments from:
- https://github.com/sigp/lighthouse/pull/8101
I've also tweaked a few more things that I think are minor bugs.
- Instrument `ensure_state_can_determine_proposers_for_epoch`
- Fix `block_root` usage in `compute_proposer_duties_from_head`. This was a regression introduced in 8101 😬 .
- Update the `state_advance_timer` to prime the next-epoch proposer cache post-Fulu.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
As identified by a researcher during the Fusaka security competition, we were computing the proposer index incorrectly in some places by computing without lookahead.
- [x] Add "low level" checks to computation functions in `consensus/types` to ensure they error cleanly
- [x] Re-work the determination of proposer shuffling decision roots, which are now fork aware.
- [x] Re-work and simplify the beacon proposer cache to be fork-aware.
- [x] Optimise `with_proposer_cache` to use `OnceCell`.
- [x] All tests passing.
- [x] Resolve all remaining `FIXME(sproul)`s.
- [x] Unit tests for `ProtoBlock::proposer_shuffling_root_for_child_block`.
- [x] End-to-end regression test.
- [x] Test on pre-Fulu network.
- [x] Test on post-Fulu network.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Ensure that we don't log a warning for HTTP 202s, which are expected on the blinded block endpoints after Fulu.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Part of #7866
- Continuation of #7921
In the above PR, we enabled rayon for batch KZG verification in chain segment processing. However, using the global rayon thread pool for backfill is likely to create resource contention with higher-priority beacon processor work.
This PR introduces a dedicated low-priority rayon thread pool `LOW_PRIORITY_RAYON_POOL` and uses it for processing backfill chain segments.
This prevents backfill KZG verification from using the global rayon thread pool and competing with high-priority beacon processor tasks for CPU resources.
However, this PR by itself doesn't prevent CPU oversubscription because other tasks could still fill up the global rayon thread pool, and having an extra thread pool could make things worse. To address this we need the beacon
processor to coordinate total CPU allocation across all tasks, which is covered in:
- #7789
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Since we re-enabled HTTP API tests on CI (https://github.com/sigp/lighthouse/pull/7943) there have been a few spurious failures:
- https://github.com/sigp/lighthouse/actions/runs/17608432465/job/50024519938?pr=7783
That error is awkward, but running locally with a short timeout confirms it to be a timeout.
Change the request timeout to 5s everywhere. We had kept it shorter to try to detect performance regressions, but I think this is better suited to being done with metrics & traces. On CI we really just want things to pass reliably without flakiness, so I think a longer timeout to handle slower test code (like mock-builder) and overworked CI boxes makes sense.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Anchor currently depends on `lighthouse_network` for a few types and utilities that live within. As we use our own libp2p behaviours, we actually do not use the core logic in that crate. This makes us transitively depend on a bunch of unneeded crates (even a whole separate libp2p if the versions mismatch!)
Move things we require into it's own lightweight crate.
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
#7950
Skip column gossip verification logic during block production as its redundant and potentially computationally expensive.
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
A performance issue was discovered when devnet-3 was under non-finality - some of the lighthouse nodes are "stuck" with syncing because of handling proposer duties HTTP requests.
These validator requests are higher priority than Status processing, and if they are taking a long time to process, the node won't be able to progress. What's worse is - under long period of non finality, the proposer duties calculation function tries to do state advance for a large number of [slots](d545ddcbc7/beacon_node/beacon_chain/src/beacon_proposer_cache.rs (L183)) here, causing the node to spend all its CPU time on a task that doesn't really help, e.g. the computed duties aren't useful if the node is 20000 slots behind.
To solve this issue, we use the `not_while_syncing` filter to prevent serving these requests, until the node is synced. This should allow the node to focus on sync under non-finality situations.
Adds the required boilerplate code for the Gloas (Glamsterdam) hard fork. This allows PRs testing Gloas-candidate features to test fork transition.
This also includes de-duplication of post-Bellatrix readiness notifiers from #6797 (credit to @dapplion)