Update `SAMPLES_PER_SLOT` to be the number of custody groups instead of data columns. This should have no impact on the current implementation, as the config currently maintains a `group:subnet:column` ratio of `1:1:1`. **In short, this PR doesn't change anything for Fusaka, but ensures compliance with the spec and potential future changes.**
I've added separate methods to compute sampling columns and custody groups for clarity: `spec.sampling_size_columns` and `spec.sampling_size_custody_groups` (sketched below).
See the clarifications in this PR for more details: https://github.com/ethereum/consensus-specs/pull/4251
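As a rough illustration of the split (a minimal sketch using hypothetical free functions, assuming the spec's rule that the sampling size is computed in custody groups and converted to columns afterwards):
```rust
/// Hypothetical sketch: the sampling size is defined in custody groups.
fn sampling_size_custody_groups(samples_per_slot: u64, custody_group_count: u64) -> u64 {
    // Sample at least `SAMPLES_PER_SLOT` custody groups.
    std::cmp::max(samples_per_slot, custody_group_count)
}

fn sampling_size_columns(
    samples_per_slot: u64,
    custody_group_count: u64,
    columns_per_group: u64,
) -> u64 {
    // With Fusaka's 1:1:1 group:subnet:column ratio, `columns_per_group == 1`,
    // so both functions currently return the same number.
    sampling_size_custody_groups(samples_per_slot, custody_group_count) * columns_per_group
}
```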
N/A
Persist the epoch -> cgc values. This ensures that `ValidatorRegistrations::latest_validator_custody_requirement` always returns a `Some` value post-restart, assuming the `epoch_validator_custody_requirements` map has been updated in previous runs.
This PR implements some heuristics to check for breaking database changes. The goal is to prevent accidental changes to the database schema from occurring without a version bump.
The `sync_contributions_across_fork_with_skip_slots` test has been quite flaky on CI recently. We suspect it might be caused by the `default` timeout recently introduced in #7400, with CI failing to consistently process those HTTP requests within 1s, likely due to limited resources.
This PR increases the `default` timeout to 2s in tests to avoid flaky tests, but keeps the remaining timeout the same (1s).
https://github.com/sigp/lighthouse/actions/runs/15965113170/job/45023976021
```
FAIL [ 8.945s] http_api::bn_http_api_tests fork_tests::sync_contributions_across_fork_with_skip_slots
stdout ───
running 1 test
test fork_tests::sync_contributions_across_fork_with_skip_slots ... FAILED
failures:
failures:
fork_tests::sync_contributions_across_fork_with_skip_slots
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 175 filtered out; finished in 8.91s
stderr ───
thread 'fork_tests::sync_contributions_across_fork_with_skip_slots' panicked at beacon_node/http_api/tests/fork_tests.rs:239:10:
called `Result::unwrap()` on an `Err` value: HttpClient(url: http://127.0.0.1:41793/, kind: timeout, detail: operation timed out)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
When we removed the eth1 data, I wrote a v25 schema upgrade to delete the data on disk:
- https://github.com/sigp/lighthouse/pull/7133
However, I forgot to update the current schema version, so this change was never actioned.
This PR updates the current schema version to v25 so that the migration runs.
This bug was first found and partially fixed by @VolodymyrBg in #7317 - this PR applies the same fix everywhere else.
The old logic updated the waker when it already matched the context, and did nothing when it was stale:
```rust
if waker.will_wake(cx.waker()) {
self.waker = Some(cx.waker().clone());
}
```
This is the wrong way around. We only want to update the waker if it doesn't match the current context:
```rust
if !waker.will_wake(cx.waker()) {
self.waker = Some(cx.waker().clone());
}
```
I don't think we've ever noticed any issues, but it’s a subtle bug that could lead to missed wakeups.
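For context, the usual pattern in a hand-rolled future looks roughly like this (a minimal self-contained sketch, not Lighthouse's actual code):
```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, Waker};

struct NotReady {
    ready: bool,
    waker: Option<Waker>,
}

impl Future for NotReady {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.ready {
            return Poll::Ready(());
        }
        // Refresh the stored waker only if it would NOT wake the current task.
        // `will_wake` is an optimisation to avoid a clone, not a correctness check.
        let stale = self
            .waker
            .as_ref()
            .map_or(true, |waker| !waker.will_wake(cx.waker()));
        if stale {
            self.waker = Some(cx.waker().clone());
        }
        Poll::Pending
    }
}
```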
N/A
When responding to by_range requests, we should ideally only respond with items in the range `req.start_slot()..req.start_slot() + req.count`.
We were not filtering the generated response for blobs and data columns, only for blocks. This PR adds the filtering for the sidecars as well.
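The added filtering is roughly of this shape (generic sketch; the helper and its names are hypothetical):
```rust
/// Keep only items whose slot falls within `start_slot..start_slot + count`.
fn filter_by_range<T>(
    items: Vec<T>,
    slot_of: impl Fn(&T) -> u64,
    start_slot: u64,
    count: u64,
) -> Vec<T> {
    let range = start_slot..start_slot.saturating_add(count);
    items
        .into_iter()
        .filter(|item| range.contains(&slot_of(item)))
        .collect()
}
```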
N/A
After the electra fork, which includes EIP-6110, the beacon node no longer needs the eth1 bridging mechanism to include new deposits, as they are provided by the EL as a `deposit_request`. So after electra, plus a transition period during which pre-fork finalized bridge deposits are included through the old mechanism, we no longer need the elaborate machinery we had for getting deposit contract data from the execution layer.
Since holesky has already forked to electra and completed the transition period, this PR essentially checks whether removing all the eth1-related logic leads to any surprises.
I think this should resolve #7155
This removes the `level` field from the instrumentation we were applying across a range of functions. The level will now default to the level of the log.
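The change is roughly of this shape, assuming the `tracing` crate's `#[instrument]` attribute (the function below is illustrative):
```rust
use tracing::instrument;

// Before: #[instrument(level = "trace", skip_all)]
// After: no explicit `level` argument on the attribute.
#[instrument(skip_all)]
fn handle_message(message_id: u64) {
    tracing::debug!(message_id, "handling message");
}
```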
Partially https://github.com/sigp/lighthouse/issues/6291
This PR removes the reprocess event channel from being externally exposed. All work events are now sent through the single `BeaconProcessorSend` channel. I've introduced a new `Work::Reprocess` enum variant which we use to schedule jobs for reprocessing. I've also created a new scheduler module which will eventually house the different scheduler impls.
This is all needed as an initial step to generalize the beacon processor
A "full" implementation for the generalized beacon processor can be found here
https://github.com/sigp/lighthouse/pull/6448
I'm going to try to break up the full implementation into smaller PRs so it can actually be reviewed
#6970
This allows us to receive `SingleAttestation` over gossip and process it without converting. There is still a conversion to `Attestation` as a final step in the attestation verification process, but by then the `SingleAttestation` is fully verified.
I've also fully removed the `submitPoolAttestationsV1` endpoint as it's been deprecated
I've also pre-emptively deprecated supporting `Attestation` in `submitPoolAttestationsV2` endpoint. See here for more info: https://github.com/ethereum/beacon-APIs/pull/531
I tried to minimize the diff here by only making the "required" changes. There are some unnecessary complexities in the way we manage the different attestation verification wrapper types. We could probably consolidate these into one wrapper type and refactor even further, but we could leave that to a separate PR if we feel like cleaning things up in the future.
Note that I've also updated the test harness to always submit `SingleAttestation` regardless of fork variant. I don't see a problem in that approach and it allows us to delete more code :)
Currently the validator effective balance used for computing the PeerDAS custody group count is only updated when the validator subscribes to the BN via `validator/beacon_committee_subscriptions`.
If a validator stops registering with the node, its effective balance becomes outdated and stays in BN memory until the next restart. Since the CGC never reduces as per the spec, these stale balances are no longer required for CGC computation and can therefore be dropped.
Resolves #6767
This PR implements a basic version of validator custody.
- It introduces a new `CustodyContext` object which contains info regarding the number of validators attached to a node and the custody count they contribute to the cgc.
- The `CustodyContext` is added in the da_checker and has methods for returning the current cgc and the number of columns to sample at head. Note that the logic for returning the cgc existed previously in the network globals.
- To estimate the number of validators attached, we use the `beacon_committee_subscriptions` endpoint. This might overestimate the number of validators actually publishing attestations from the node in the case of multi BN setups. We could also potentially use the `publish_attestations` endpoint to get a more conservative estimate at a later point.
- Any time there's a change in the `custody_group_count` due to the addition/removal of validators, the custody context sends an event on a broadcast channel (see the sketch after this list). The only subscriber for the channel exists in the network service, which simply subscribes to more subnets. There can be additional subscribers in sync that will start a backfill once the cgc changes.
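A minimal sketch of the broadcast wiring (hypothetical names, assuming a `tokio::sync::broadcast` channel):
```rust
use tokio::sync::broadcast;

/// Hypothetical event sent whenever the custody group count changes.
#[derive(Clone, Debug)]
pub struct CustodyCountChanged {
    pub new_cgc: u64,
}

pub struct CustodyContext {
    cgc_tx: broadcast::Sender<CustodyCountChanged>,
}

impl CustodyContext {
    pub fn new() -> (Self, broadcast::Receiver<CustodyCountChanged>) {
        let (cgc_tx, rx) = broadcast::channel(16);
        (Self { cgc_tx }, rx)
    }

    /// Called when the addition/removal of validators changes the cgc.
    pub fn notify_cgc_change(&self, new_cgc: u64) {
        // A send error only means there are no subscribers right now.
        let _ = self.cgc_tx.send(CustodyCountChanged { new_cgc });
    }
}
```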
TODO
- [ ] **NOT REQUIRED:** Currently, the logic only handles an increase in validator count and does not handle a decrease. We should ideally unsubscribe from subnets when the cgc has decreased.
- [ ] **NOT REQUIRED:** Add a service in the `CustodyContext` that emits an event once `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS` passes after updating the current cgc. This event should be picked up by a subscriber which updates the enr and metadata.
- [x] Add more tests
We're seeing slow KZG performance on `fusaka-devnet-0` and are looking for optimisations to improve performance.
Zipping the list first and then calling `into_par_iter` shows a ~10% improvement in the performance benchmark (roughly the shape sketched after the results below); I suspect the gain might be even more material when running on a beacon node.
Before:
```
blobs_to_data_column_sidecars_20
time: [11.583 ms 12.041 ms 12.534 ms]
Found 5 outliers among 100 measurements (5.00%)
```
After:
```
blobs_to_data_column_sidecars_20
time: [10.506 ms 10.724 ms 10.982 ms]
change: [-14.925% -10.941% -6.5452%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
```
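The change is roughly of this shape (hypothetical types and body; the real code operates on blobs, cells, and proofs):
```rust
use rayon::prelude::*;

/// Pair the inputs up front, then hand the already-zipped list to rayon.
fn process(blobs: Vec<Vec<u8>>, proofs: Vec<Vec<u8>>) -> Vec<usize> {
    let zipped: Vec<(Vec<u8>, Vec<u8>)> = blobs.into_iter().zip(proofs).collect();
    zipped
        .into_par_iter()
        .map(|(blob, proof)| blob.len() + proof.len()) // stand-in for the KZG work
        .collect()
}
```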
Issue discovered on PeerDAS devnet (node `lighthouse-geth-2.peerdas-devnet-5.ethpandaops.io`). Summary:
- A lookup is created for block root `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- That block or a parent is faulty and `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` is added to the failed chains cache
- We later receive a block that is a child of a child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- We create a lookup, which attempts to process the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` and hits an `UnknownParent` processor error at this line:
bf955c7543/beacon_node/network/src/sync/block_lookups/mod.rs (L686-L688)
`search_parent_of_child` does not create a parent lookup because the parent root is in the failed chain cache. However, we have **already** marked the child as awaiting the parent. This leaves lookup sync in an inconsistent state: the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` is awaiting a parent lookup that doesn't exist, hence it is stuck.
### Impact
This bug can affect Mainnet as well as PeerDAS devnets.
This bug may stall lookup sync for a few minutes (up to `LOOKUP_MAX_DURATION_STUCK_SECS = 15 min`) until the stuck-prune routine deletes the lookup. By that time the root will have been cleared from the failed chain cache and sync should succeed. During that time the user will see a lot of `WARN` logs as each peer is added to the inconsistent lookup. We may also sync the block through range sync if we fall behind by more than 2 epochs, or create the parent lookup successfully after the failed chain cache clears and complete the child lookup.
This bug is triggered if:
- We have a lookup that fails and its root is added to the failed chain cache (much more likely to happen in PeerDAS networks)
- We receive a block that builds on a child of the block added to the failed chain cache
Ensure that we never create (or leave in place) a lookup that references a non-existent parent.
I added `must_use` lints to the functions that create lookups. To fix the specific bug we must recursively drop the child lookup if the parent is not created: if `search_parent_of_child` returns `false`, we now return `LookupRequestError::Failed` instead of `LookupResult::Pending`.
As a bonus, I have added more logging and reason strings to the errors
Lighthouse currently requires checkpoint sync to be performed against a supernode in a PeerDAS network, as only supernodes can serve blobs.
This PR lifts that requirement, enabling Lighthouse to checkpoint sync from either a fullnode or a supernode (See https://github.com/sigp/lighthouse/issues/6837#issuecomment-2933094923)
Missing data columns for the checkpoint block isn't a big issue, but we should be able to easily implement backfill once we have the logic to backfill data columns.
Getting this error on a non-PeerDAS network:
```
May 29 13:30:13.484 ERROR Error fetching or processing blobs from EL error: BlobProcessingError(AvailabilityCheck(Unexpected("empty blobs"))), block_root: 0x98aa3927056d453614fefbc79eb1f9865666d1f119d0e8aa9e6f4d02aa9395d9
```
It appears we're passing an empty `Vec` to the DA checker because all blobs were already seen on gossip and filtered out; this causes an `AvailabilityCheckError::Unexpected("empty blobs")`.
I've added equivalent unit tests for `getBlobsV1` to cover all the scenarios we test in `getBlobsV2`. These would have caught the bug if I had added them earlier, and they also caught another bug which could trigger a duplicate block import.
Thanks Santito for reporting this! 🙏
Addresses a regression recently introduced when we started gossip verifying data columns computed from EL blobs.
```
failures:
network_beacon_processor::tests::accept_processed_gossip_data_columns_without_import
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 90 filtered out; finished in 16.60s
stderr ───
thread 'network_beacon_processor::tests::accept_processed_gossip_data_columns_without_import' panicked at beacon_node/network/src/network_beacon_processor/tests.rs:829:10:
should put data columns into availability cache: Unexpected("empty columns")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
https://github.com/sigp/lighthouse/actions/runs/15309278812/job/43082341868?pr=7521
If an empty `Vec` is passed to the DA checker, it causes an unexpected error.
This PR addresses it by not passing an empty `Vec` for processing, and not spawning a task to publish.
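The fix amounts to an early return of this shape (hypothetical types and names):
```rust
struct Blob;

/// Skip DA-checker processing and publishing when gossip has already
/// accounted for every blob fetched from the EL.
fn import_el_blobs(blobs_to_import: Vec<Blob>) -> Result<(), String> {
    if blobs_to_import.is_empty() {
        // Nothing left to process or publish; avoids Unexpected("empty blobs").
        return Ok(());
    }
    // ... otherwise hand `blobs_to_import` to the availability checker
    // and spawn the publish task ...
    Ok(())
}
```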
https://github.com/sigp/lighthouse/issues/6875
- Enabled the linter in the rate limiter and fixed the resulting errors.
- Changed the type of `Quota::max_tokens` from `u64` to `NonZeroU64`, because `max_tokens` cannot be zero (see the sketch after this list).
- Added a test to ensure that a large value for `tokens`, which causes an overflow, is handled properly.
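A sketch of the type change (field set abbreviated and constructor illustrative):
```rust
use std::num::NonZeroU64;
use std::time::Duration;

pub struct Quota {
    /// Encoding "cannot be zero" in the type removes the invalid state
    /// entirely, instead of checking for it at runtime.
    pub max_tokens: NonZeroU64,
    pub replenish_all_every: Duration,
}

impl Quota {
    /// Allow one token, replenished every `period`.
    pub fn one_every(period: Duration) -> Self {
        Self {
            max_tokens: NonZeroU64::new(1).expect("1 is non-zero"),
            replenish_all_every: period,
        }
    }
}
```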
Partly addresses:
- https://github.com/sigp/lighthouse/issues/7379
Handle attestation validation errors from `get_attesting_indices` so that, instead of producing an error log, we downscore the peer and reject the message.
#7461 and partly #6439.
Desired behaviour after receiving an `engine_getBlobs` response:
1. Gossip verify the blobs and proofs, but don't mark them as observed yet. This is because not all blobs are published immediately (due to staggered publishing); if we marked them as observed but didn't publish them, we could end up blocking gossip propagation.
2. Blobs are marked as observed _either_ when:
* They are received from gossip and forwarded to the network.
* They are published by the node.
Current behaviour:
- ❗ We only gossip verify `engine_getBlobsV1` responses, but not `engine_getBlobsV2` responses (PeerDAS).
- ❗ After importing EL blobs AND before they're published, if the same blobs arrive via gossip, they will get re-processed, which may result in a re-import.
1. Perform gossip verification on data columns computed from the EL `getBlobsV2` response. We currently only do this for `getBlobsV1` to prevent importing blobs with invalid proofs into the `DataAvailabilityChecker`; this should be done on V2 responses too.
2. Add additional gossip verification to make sure we don't re-process a ~~blob~~ or data column that was imported via the EL `getBlobs` but not yet "seen" on the gossip network. If an "unobserved" gossip blob is found in the availability cache, then we know it has passed verification so we can immediately propagate the `ACCEPT` result and forward it to the network, but without re-processing it.
**UPDATE:** I've left blobs out of the second change mentioned above, as the likelihood and impact are very low and we haven't seen it happen enough; under PeerDAS, however, this issue is a regular occurrence and we do see the same block getting imported many times.
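The second change can be sketched as an observation cache keyed by column identity (hypothetical types, not Lighthouse's actual structures):
```rust
use std::collections::HashSet;

#[derive(Clone, PartialEq, Eq, Hash)]
struct ColumnId {
    block_root: [u8; 32],
    index: u64,
}

#[derive(Default)]
struct ObservedColumns {
    seen: HashSet<ColumnId>,
}

impl ObservedColumns {
    /// Mark as observed when we publish the column ourselves or forward it
    /// from gossip; EL-fetched columns stay unobserved until then.
    fn mark_observed(&mut self, id: ColumnId) {
        self.seen.insert(id);
    }

    /// On gossip arrival: if the column is already in the availability cache
    /// but unobserved, we can ACCEPT and forward it without re-processing.
    fn is_observed(&self, id: &ColumnId) -> bool {
        self.seen.contains(id)
    }
}
```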
Update `engine_getBlobsV2` response type to `Option<Vec<BlobsAndProofV2>>`. See recent spec change [here](https://github.com/ethereum/execution-apis/pull/630).
Added some tests to cover basic fetch blob scenarios.
Currently `test_delayed_rpc_response` is flaky (possibly specific to Windows?), but I'm not sure why.
Enabled stdout logging in rpc_tests. Note that in nextest, std output is only displayed when a test fails.
Use `slice::is_sorted`, which was stabilised in Rust 1.82.0.
I thought there would be more places we could use this, but it seems we often want to check strict monotonicity (i.e. sorted + no duplicates), as shown below.
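For example:
```rust
fn main() {
    // `is_sorted` (stable since Rust 1.82) permits equal adjacent elements.
    assert!([1, 2, 2, 3].is_sorted());
    // Strict monotonicity (sorted + no duplicates) still needs a pairwise check.
    assert!([1, 2, 3].windows(2).all(|w| w[0] < w[1]));
}
```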
The endpoint `/eth/v1/beacon/states/head/validator_balances` returns empty data when the request body is `[]`. According to the beacon API spec, it should return the balances of all validators:
Reference: https://ethereum.github.io/beacon-APIs/#/Beacon/postStateValidatorBalances
`If the supplied list is empty (i.e. the body is []) or no body is supplied then balances will be returned for all validators.`
This PR changes so that: `curl -X 'POST' 'http://localhost:5052/eth/v1/beacon/states/head/validator_balances' -d '[]' | jq` returns balances of all validators.
Closes #5016
The op pool was using the wrong denominator when calculating proposer block rewards! This was mostly inconsequential as our studies of Lighthouse's block profitability already showed that it is very close to optimal. The wrong denominator was leftover from phase0 code, and wasn't properly updated for Altair.
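For reference, Altair computes the proposer reward denominator from the spec's weight constants, which differs from phase0's `PROPOSER_REWARD_QUOTIENT = 8`:
```rust
const PROPOSER_WEIGHT: u64 = 8;
const WEIGHT_DENOMINATOR: u64 = 64;

fn main() {
    // Altair: proposer_reward = proposer_reward_numerator / proposer_reward_denominator
    let proposer_reward_denominator =
        (WEIGHT_DENOMINATOR - PROPOSER_WEIGHT) * WEIGHT_DENOMINATOR / PROPOSER_WEIGHT;
    assert_eq!(proposer_reward_denominator, 448); // vs phase0's quotient of 8
}
```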