Create a `testing` feature which we can use to gate off `test_utils.rs` and its associated dependencies from the rest of the crate.
Co-Authored-By: Mac L <mjladson@pm.me>
We received a bug report of a node restarting custody backfill unnecessarily after upgrading to Lighthouse v8.1.1. What happened is:
- User started LH v8.0.1 many months ago, CGC updated 0 -> N but the CGC was not eagerly persisted.
- LH experienced an unclean shutdown (not sure of what type).
- Upon restarting (still running v8.0.1), the custody context read from disk contains CGC=0: `DEBUG Loaded persisted custody context custody_context: CustodyContext { validator_custody_count: 0, ...`).
- CGC updates again to N, retriggering custody backfill: `DEBUG Validator count at head updated old_count: 0, new_count: N`.
- Custody backfill does a bunch of downloading for no gain: `DEBUG Imported historical data columns epoch: Epoch(428433), total_imported: 0`
- While custody backfill is running user updated to v8.1.1, and we see logs for the CGC=N being peristed upon clean shutdown, and then correctly read on startup with v8.1.1.
- Custody backfill keeps running and downloading due to the CGC change still being considered in progress.
- Call `persist_custody_context` inside the `register_validators` handler so that it is written to disk eagerly whenever it changes. The performance impact of this should be minimal as the amount of data is very small and this call can only happen at most ~128 times (once for each change) in the entire life of a beacon node.
- Call `persist_custody_context` inside `BeaconChainBuilder::build` so that changes caused by CLI flags are persisted (otherwise starting a node with `--semi-supernode` and no validators, then shutting it down uncleanly would cause use to forget the CGC).
These changes greatly reduce the timespan during which an unclean shutdown can create inconsistency. In the worst case, we only lose backfill progress that runs concurrently with the `register_validators` handler (should be extremely minimal, nigh impossible).
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
I found myself having to update this code for Gloas, and figured we may as well delete it seeing as it doesn't work.
See:
- https://github.com/sigp/lighthouse/issues/4198
Delete all `fork_revert` logic and the accompanying test.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Which issue # does this PR address?
None
All of these are performing a check, and adding a batch, or creating a new lookup, or a new query, etc..
Hence all of these limits would be off by one.
Example:
```rust
// BACKFILL_BATCH_BUFFER_SIZE = 5
if self.batches.iter().filter(...).count() >= BACKFILL_BATCH_BUFFER_SIZE {
return None; // ← REJECT
}
// ... later adds batch via Entry::Vacant(entry).insert(...)
```
Without the `>` being changed to a `>=` , we would allow 6. The same idea applies to all changes proposed.
Co-Authored-By: Antoine James <antoine@ethereum.org>
Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
#8547
Bump the version of `uuid` in our Cargo.toml to version `1` which removes `uuid 0.8` and unifies it across the workspace to version `1.19.0`.
Co-Authored-By: Mac L <mjladson@pm.me>
N/A
Another find by @gitToki. Sort the preferred_ids in descending order as originally intended from the comment in the function.
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
Fix the failure of the beacon-chain tests for phase0/altair, which now only runs nightly.
Just skip the payload invalidation tests, they don't make any sense prior to Bellatrix anyway.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Fix a bug in v8.1.0 whereby the VC times out continuously with:
> Feb 18 02:03:48.030 WARN Head service failed retrying starting next slot error: "Head monitoring stream error, node: 0, error: SseClient(Transport(reqwest::Error { kind: Decode, source: reqwest::Error { kind: Body, source: TimedOut } }))"
- Remove the existing timeout for the events API by using `Duration::MAX`. This is necessary as the client is configured with a default timeout. This is the only way to override/remove it.
- DO NOT add a `read_timeout` (yet), as this would need to be configured on a per-client basis. We do not want to create a new Client for every call as the early commits on this branch were doing, as this would bypass the TLS cert config, and is also wasteful.
Co-Authored-By: hopinheimer <knmanas6@gmail.com>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
Since https://github.com/sigp/lighthouse/pull/6847, invalid `BlocksByRange`/`BlobsByRange` requests, which do not comply with the spec, are [handled in the Handler](3d16d1080f/beacon_node/lighthouse_network/src/rpc/handler.rs (L880-L911)). Any peer that sends an invalid request is penalized and disconnected.
However, other kinds of invalid rpc request, which result in decoding errors, are just dropped. No penalty is applied and the connection with the peer remains.
I have added handling for the `ListenUpgradeError` event to notify the application of an `RPCError:InvalidData` error and disconnect to the peer that sent the invalid rpc request.
I also added tests for handling invalid rpc requests.
Co-Authored-By: ackintosh <sora.akatsuki@gmail.com>
None
I noticed that `observed_column_sidecars` is missing its prune call in the finalization handler, which results in a memory leak on long-running nodes (very slow (**7MB/day**)) :
13dfa9200f/beacon_node/beacon_chain/src/canonical_head.rs (L940-L959)
Both caches use the same generic type `ObservedDataSidecars<T>:`
22ec4b3271/beacon_node/beacon_chain/src/beacon_chain.rs (L413-L416)
The type's documentation explicitly requires manual pruning:
> "*The cache supports pruning based upon the finalized epoch. It does not automatically prune, you must call Self::prune manually.*"
b4704eab4a/beacon_node/beacon_chain/src/observed_data_sidecars.rs (L66-L74)
Currently:
- `observed_blob_sidecars` => pruned
- `observed_column_sidecars` => **NOT** pruned
Without pruning, the underlying HashMap accumulates entries indefinitely, causing continuous memory growth until the node restarts.
Co-Authored-By: Antoine James <antoine@ethereum.org>
- Implement `process_payload_attestation`
- Implement EF tests for payload attestations (allows simplification of handler now that we support all `operations` tests).
- Update the `BlockSignatureVerifier` to signature-verify payload attestations
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
The flow for local block building is
1. Create execution payload and bid
2. Construct beacon block
3. Sign beacon block and publish
4. Sign execution payload and publish
This PR adds the beacon block v4 flow , GET payload envelope and POST payload envelope (local block building only). The spec for these endpoints can be found here: https://github.com/ethereum/beacon-APIs/pull/552 and is subject to change.
We needed a way to store the unsigned execution payload envelope associated to the execution payload bid that was included in the block. I introduced a new cache that stores these unsigned execution payload envelopes. the GET payload envelope queries this cache directly so that a proposer, after publishing a block, can fetch the payload envelope + sign and publish it.
I kept payload signing and publishing within the validators block service to keep things simple for now. The idea was to build out a block production MVP for devnet 0, try not to affect any non gloas code paths and build things out in such a way that it will be easy to deprecate pre-gloas code paths later on (for example block production v2 and v3).
We will eventually need to track which beacon node was queried for the block so that we can later query it for the payload. But thats not needed for the devnet.
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
`eth2` is currently unable to be built without the `events` feature.
Feature gates the `SseEventSource` match arm in the `status` function.
Co-Authored-By: Mac L <mjladson@pm.me>
fixes issue #8750
This PR enables the inactivity_scores reward EF tests from v1.7.0-alpha.2.
- Enabled Tests: Added the inactivity_scores handler to the rewards test suite.
- Fork Filtering: Updated the runner to execute these tests only on supported forks (Altair onwards), preventing directory-not-found errors on earlier forks.
- CI Coverage: Removed exclusions in the file access check script to ensures all new test vectors are fully tracked.
Co-Authored-By: romeoscript <romeobourne211@gmail.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
I believe I found a bug where during computation of `pid_process_seconds_total` we add `children_system` twice.
I assume that it was originally intended to add `children_system` and `children_user` once each
Co-Authored-By: Mac L <mjladson@pm.me>
- [x] Implement `process_builder_pending_payments` in epoch processing for Gloas. Enable the new EF tests for this sub-component as well.
- [x] Update `include_all_signatures_except_proposal` for Gloas to safely include the execution payload bid signature (this was an omission in the previous bid PR).
- [x] Enable Gloas for _all_ remaining EF tests by default. They all pass with the exception of the finality tests (see below).
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
- Step 0 of the tree-sync roadmap https://github.com/sigp/lighthouse/issues/7678
Current lookup sync tests are written in an explicit way that assume how the internals of lookup sync work. For example the test would do:
- Emit unknown block parent message
- Expect block request for X
- Respond with successful block request
- Expect block processing request for X
- Response with successful processing request
- etc..
This is unnecessarily verbose. And it will requires a complete re-write when something changes in the internals of lookup sync (has happened a few times, mostly for deneb and fulu).
What we really want to assert is:
- WHEN: we receive an unknown block parent message
- THEN: Lookup sync can sync that block
- ASSERT: Without penalizing peers, without unnecessary retries
Keep all existing tests and add new cases but written in the new style described above. The logic to serve and respond to request is in this function `fn simulate` 2288a3aeb1/beacon_node/network/src/sync/tests/lookups.rs (L301)
- It controls peer behavior based on a `CompleteStrategy` where you can set for example "respond to BlocksByRoot requests with empty"
- It actually runs beacon processor messages running their clousures. Now sync tests actually import blocks, increasing the test coverage to the interaction of sync and the da_checker.
- To achieve the above the tests create real blocks with the test harness. To make the tests as fast as before, I disabled crypto with `TestConfig`
Along the way I found a couple bugs, which I documented on the diff.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Currently, `consensus/types` cannot build with `no-default-features` since we use "legacy" standard arithmetic operations.
- Remove the offending arithmetic to fix compilation.
- Rename `legacy-arith` to `saturating-arith` and disable it by default.
Co-Authored-By: Mac L <mjladson@pm.me>
This reverts some of the changes from #8524 by adding back the typed network endpoints with an optional `network` feature. Without the `network` feature, these endpoints (and associated dependencies) will not be built.
This means the `enr`, `multiaddr` and `libp2p-identity` dependencies have returned but are now optional
Co-Authored-By: Mac L <mjladson@pm.me>
- [x] Consensus changes for execution payload bids
- [x] EF tests for bids (and `block_header` -- no changes required).
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>