Resolves #6767
This PR implements a basic version of validator custody.
- It introduces a new `CustodyContext` object which holds information about the number of validators attached to a node and the custody count they contribute to the CGC.
- The `CustodyContext` is added to the da_checker and has methods for returning the current CGC and the number of columns to sample at head. Note that the logic for returning the CGC previously lived in the network globals.
- To estimate the number of attached validators, we use the `beacon_committee_subscriptions` endpoint. This might overestimate the number of validators actually publishing attestations from the node in multi-BN setups. We could later use the `publish_attestations` endpoint to get a more conservative estimate.
- Whenever the `custody_group_count` changes due to the addition/removal of validators, the custody context sends an event on a broadcast channel (see the sketch below). The only subscriber for the channel currently lives in the network service, which simply subscribes to more subnets. Additional subscribers can be added in sync to start a backfill once the CGC changes.
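A minimal sketch of the broadcast pattern described above, with hypothetical names (the real `CustodyContext` in the da_checker differs):

```rust
use tokio::sync::broadcast;

/// Illustrative stand-in for the real `CustodyContext`; field and method
/// names here are assumptions, not Lighthouse's actual API.
pub struct CustodyContext {
    /// Current custody group count, updated as validators attach/detach.
    custody_group_count: u64,
    /// Sender half of the channel used to notify subscribers of CGC changes.
    cgc_tx: broadcast::Sender<u64>,
}

impl CustodyContext {
    pub fn new(initial_cgc: u64) -> Self {
        let (cgc_tx, _) = broadcast::channel(16);
        Self {
            custody_group_count: initial_cgc,
            cgc_tx,
        }
    }

    /// Called when the attached validator count changes. Only notifies
    /// subscribers (e.g. the network service) if the CGC actually increased,
    /// since decreases are not handled yet (see the TODO below).
    pub fn on_validator_count_change(&mut self, new_cgc: u64) {
        if new_cgc > self.custody_group_count {
            self.custody_group_count = new_cgc;
            let _ = self.cgc_tx.send(new_cgc);
        }
    }

    /// Subscribers (network service, and later sync for backfill) receive the
    /// new CGC whenever it changes.
    pub fn subscribe(&self) -> broadcast::Receiver<u64> {
        self.cgc_tx.subscribe()
    }
}
```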
TODO
- [ ] **NOT REQUIRED:** Currently, the logic only handles an increase in validator count and does not handle a decrease. We should ideally unsubscribe from subnets when the CGC has decreased.
- [ ] **NOT REQUIRED:** Add a service in the `CustodyContext` that emits an event once `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS` passes after updating the current CGC. This event should be picked up by a subscriber which updates the ENR and metadata.
- [x] Add more tests
#7461 and partly #6439.
Desired behaviour after receiving `engine_getBlobs` response:
1. Gossip verify the blobs and proofs, but don't mark them as observed yet. This is because not all blobs are published immediately (due to staggered publishing). If we mark them as observed and not publish them, we could end up blocking the gossip propagation.
2. Blobs are marked as observed _either_ when:
* They are received from gossip and forwarded to the network.
* They are published by the node.
Current behaviour:
- ❗ We only gossip verify `engine_getBlobsV1` responses, but not `engine_getBlobsV2` responses (PeerDAS).
- ❗ After importing EL blobs AND before they're published, if the same blobs arrive via gossip, they will get re-processed, which may result in a re-import.
1. Perform gossip verification on data columns computed from the EL `getBlobsV2` response. We currently only do this for `getBlobsV1` to prevent importing blobs with invalid proofs into the `DataAvailabilityChecker`; this should be done on V2 responses too.
2. Add additional gossip verification to make sure we don't re-process a ~~blob~~ or data column that was imported via the EL `getBlobs` but not yet "seen" on the gossip network. If an "unobserved" gossip blob is found in the availability cache, then we know it has passed verification so we can immediately propagate the `ACCEPT` result and forward it to the network, but without re-processing it.
**UPDATE:** I've left blobs out of the second change mentioned above, as the likelihood and impact are very low and we haven't observed it much, but under PeerDAS this issue is a regular occurrence and we do see the same block getting imported many times.
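For illustration, a hedged sketch of the fast path from point 2 above; the enum and function are made-up stand-ins, not Lighthouse's actual gossip-processing types:

```rust
// Made-up stand-ins for illustration; not Lighthouse's actual types.
enum GossipOutcome {
    /// Already verified via the EL `getBlobs` path: propagate and forward to
    /// the network without re-processing or re-importing.
    AcceptAndForward,
    /// Not seen before: run full gossip verification and processing.
    ProcessNormally,
    /// Already observed on gossip: ignore as a duplicate.
    IgnoreDuplicate,
}

fn handle_gossip_data_column(
    already_observed_on_gossip: bool,
    present_in_availability_cache: bool,
) -> GossipOutcome {
    if already_observed_on_gossip {
        GossipOutcome::IgnoreDuplicate
    } else if present_in_availability_cache {
        // Imported from the EL but not yet "seen" on gossip: it has already
        // passed verification, so mark it observed and forward it to peers.
        GossipOutcome::AcceptAndForward
    } else {
        GossipOutcome::ProcessNormally
    }
}
```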
The endpoint `/eth/v1/beacon/states/head/validator_balances` returns empty data when the request body is `[]`. According to the beacon API spec, it should return the balances of all validators:
Reference: https://ethereum.github.io/beacon-APIs/#/Beacon/postStateValidatorBalances
`If the supplied list is empty (i.e. the body is []) or no body is supplied then balances will be returned for all validators.`
This PR changes so that: `curl -X 'POST' 'http://localhost:5052/eth/v1/beacon/states/head/validator_balances' -d '[]' | jq` returns balances of all validators.
Resolves https://github.com/sigp/lighthouse/issues/7414
The health endpoint returns a 503 if the engine state is offline. The default state for the engine is `Offline`. So until the first request to the EL is made and the state is updated, the health endpoint will keep returning 503s.
This PR changes the default state to `Online` to avoid that. I don't think this causes any issues, because if the EL is actually offline, the first fcu will set the state back to `Offline`.
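A minimal sketch of the intended behaviour, using made-up names (the real `EngineState` enum and health handler differ):

```rust
// Made-up names for illustration; the real `EngineState` and health handler differ.
#[derive(Default)]
enum EngineState {
    // Defaulting to `Online` stops the health endpoint from returning 503
    // before the first EL request has been made.
    #[default]
    Online,
    Offline,
}

fn health_status_code(state: &EngineState) -> u16 {
    match state {
        EngineState::Online => 200,
        EngineState::Offline => 503,
    }
}

fn main() {
    // A fresh node now reports healthy until an EL request proves otherwise.
    assert_eq!(health_status_code(&EngineState::default()), 200);
}
```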
Pending testing on kurtosis.
Fix clippy lints for `rustc` 1.87
clippy complains about `BeaconChainError` being too large. I went on a bit of a boxing spree because of this. We may instead want to `Box` some of the `BeaconChainError` variants?
Did not find a specific issue besides https://github.com/sigp/lighthouse/issues/6821
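For illustration only, the boxing pattern the lint pushes towards, shown on a toy enum rather than the real `BeaconChainError`:

```rust
// A toy enum, not the real `BeaconChainError`, showing the boxing pattern that
// clippy's `large_enum_variant` / `result_large_err` lints suggest.
struct BigStateError {
    payload: [u8; 1024],
}

enum ErrorSketch {
    // Boxed so every `Result<_, ErrorSketch>` stays small instead of carrying ~1 KiB.
    State(Box<BigStateError>),
    Other,
}

fn fail() -> Result<(), ErrorSketch> {
    Err(ErrorSketch::State(Box::new(BigStateError { payload: [0; 1024] })))
}

fn main() {
    // With the large variant boxed, the error enum is pointer-sized.
    assert!(std::mem::size_of::<ErrorSketch>() <= 16);
    let _ = fail();
}
```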
Leverage `whistleblower_reward_quotient_for_state` to have accurate post-electra `proposer_slashings` and `attester_slashings` fields returned by `/eth/v1/beacon/rewards/blocks/<id>`.
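For context, a rough sketch of the fork-dependent quotient (the helper and enum names are assumptions; the spec constants are `WHISTLEBLOWER_REWARD_QUOTIENT = 512` pre-Electra and `WHISTLEBLOWER_REWARD_QUOTIENT_ELECTRA = 4096`):

```rust
// Illustrative fork selection; the enum and helper names are assumptions.
// Spec constants: WHISTLEBLOWER_REWARD_QUOTIENT = 512 (pre-Electra),
// WHISTLEBLOWER_REWARD_QUOTIENT_ELECTRA = 4096.
#[derive(PartialEq, PartialOrd)]
enum ForkName {
    Base,
    Altair,
    Bellatrix,
    Capella,
    Deneb,
    Electra,
}

fn whistleblower_reward_quotient(fork: ForkName) -> u64 {
    if fork >= ForkName::Electra {
        4096
    } else {
        512
    }
}

fn whistleblower_reward(effective_balance_gwei: u64, fork: ForkName) -> u64 {
    // The whistleblower reward that feeds the slashing entries of the block
    // rewards response is the slashed validator's effective balance divided by
    // the fork-appropriate quotient.
    effective_balance_gwei / whistleblower_reward_quotient(fork)
}
```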
#7294
Fix the filtering logic so that we actually filter by committee index for both `Base` and `Electra` attestations.
Added a tiny optimization when calculating `committee_index` to prevent unneeded memory allocations.
Added a regression test.
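For reference, a simplified sketch of the committee-index handling; the field names follow the consensus spec, but the types are stand-ins for Lighthouse's:

```rust
// Hypothetical sketch of the committee-index filter; `data.index` (pre-Electra)
// and `committee_bits` (Electra) follow the consensus spec, but these types are
// simplified stand-ins for Lighthouse's.
fn committee_index_base(data_index: u64) -> u64 {
    // Pre-Electra attestations carry the committee index directly in `data.index`.
    data_index
}

fn committee_index_electra(committee_bits: &[bool]) -> Option<u64> {
    // Electra attestations set `data.index` to 0 and encode the committee in
    // `committee_bits`; the index is the position of the single set bit.
    committee_bits.iter().position(|b| *b).map(|i| i as u64)
}

fn matches_filter(att_committee_index: Option<u64>, requested: Option<u64>) -> bool {
    match requested {
        // No committee index filter supplied: keep every attestation.
        None => true,
        Some(req) => att_committee_index == Some(req),
    }
}
```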
Closes #7167
- Ensure the fork digest is generated from the light client update's attested header and not the signature slot
- Ensure the format of the SSZ response is spec compliant
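A rough sketch of the intent behind the first fix, with a hypothetical `fork_digest_at_slot` helper standing in for Lighthouse's fork-context lookup:

```rust
// Hypothetical helper standing in for Lighthouse's fork-context lookup; the
// real code derives the digest from the fork version active at the slot plus
// the genesis validators root.
fn fork_digest_at_slot(_slot: u64) -> [u8; 4] {
    [0u8; 4]
}

struct LightClientUpdateSketch {
    attested_header_slot: u64,
    signature_slot: u64,
}

fn response_fork_digest(update: &LightClientUpdateSketch) -> [u8; 4] {
    // Use the attested header's slot, not the signature slot, since the
    // signature slot can fall on the other side of a fork boundary.
    let _ = update.signature_slot;
    fork_digest_at_slot(update.attested_header_slot)
}
```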
#6296: Deterministic RNG in peer DAS publish block tests
Test functions now call the publish-block APIs with `true` for the deterministic RNG boolean parameter, while production code passes `false`. This deterministically shuffles columns in the unit tests under `broadcast_validation_tests.rs`.
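A minimal sketch of the pattern, assuming hypothetical names (the actual shuffling code differs):

```rust
// Illustrative only: production code shuffles columns with a thread-local RNG,
// while tests pass `true` to get a seeded RNG and a reproducible column order.
use rand::{rngs::StdRng, seq::SliceRandom, SeedableRng};

fn shuffle_columns(columns: &mut Vec<u64>, deterministic: bool) {
    if deterministic {
        // Fixed seed => reproducible order for unit tests.
        let mut rng = StdRng::seed_from_u64(42);
        columns.shuffle(&mut rng);
    } else {
        columns.shuffle(&mut rand::thread_rng());
    }
}
```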
N/A
Adds endpoints to the HTTP API for adding and removing trusted peers. Added peers are treated as trusted, so they won't be disconnected for bad scores. If a trusted peer disconnects from us, we try to maintain the connection by dialing it every heartbeat.
- Part of https://github.com/sigp/lighthouse/issues/6767
Validator custody makes the CGC and set of sampling columns dynamic. Right now this information is stored twice:
- in the data availability checker
- in the network globals
If that state becomes dynamic we must make sure the two copies stay in sync, either by updating both together or by guarding them behind a mutex. However, I noted that we don't really need to keep the CGC inside the data availability checker: all consumers can read it from the network globals, and we can update `make_available` to read the expected count of data columns from the block.
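A hedged sketch of the single-source-of-truth arrangement; all names here are illustrative, not Lighthouse's actual APIs:

```rust
// Illustrative only: consumers read the CGC from the shared network globals
// instead of a second copy inside the availability checker.
use std::sync::{Arc, RwLock};

struct NetworkGlobals {
    custody_group_count: RwLock<u64>,
}

struct DataAvailabilityChecker {
    // No CGC copy of its own; it only holds a handle to the shared globals.
    globals: Arc<NetworkGlobals>,
}

impl DataAvailabilityChecker {
    fn expected_column_count(&self, columns_per_group: u64) -> u64 {
        // Read the dynamic CGC from the single shared location.
        *self.globals.custody_group_count.read().unwrap() * columns_per_group
    }
}
```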
Backport of:
- https://github.com/sigp/lighthouse/pull/7067
For:
- https://github.com/sigp/lighthouse/issues/7039
- Prevent writing to state cache when migrating the database
- Add `state-cache-headroom` flag to control pruning
- Prune old epoch boundary states ahead of mid-epoch states
- Never prune head block's state
- Avoid caching ancestor states unless they are on an epoch boundary
- Log when states enter/exit the cache
Co-authored-by: Eitan Seri-Levi <eserilev@ucsc.edu>
- #6452 (partially)
Remove dependencies on `store` and `lighthouse_network` from `eth2`. This was achieved as follows:
- depend on `enr` and `multiaddr` directly instead of using `lighthouse_network`'s reexports.
- make `lighthouse_network` responsible for converting between API and internal types.
- in two cases, remove complex internal types and use the generic `serde_json::Value` instead. This is not ideal, but should be fine for now, as it affects two internal non-spec endpoints which are meant for debugging, are unstable, and are subject to change without notice anyway. Inspired by #6679. The alternative is to move all relevant types to `eth2` or `types` instead - what do you think?
Delete duplicate sync tolerance epoch config in the HTTP API which is unused.
We introduced the `sync-tolerance-epoch` flag in this PR:
- https://github.com/sigp/lighthouse/pull/7030
Then refined it in this PR:
- https://github.com/sigp/lighthouse/pull/7044
Somewhere in the merge of `release-v7.0.0` into `unstable`, the config from the original PR, which had been deleted, came back. I think I resolved those conflicts, so my bad.
Related to #6880, an issue that's usually observed on local devnets with a small number of nodes.
When testing range sync, I usually shut down a node for some period of time and restart it. However, if the downtime is within `SYNC_TOLERANCE_EPOCHS` (8), Lighthouse considers the node synced and it may attempt to produce a block if requested by a validator. On a local devnet, nodes frequently produce blocks, and when this happens the node ends up producing a block that would revert finality and gets disconnected from peers immediately.
NOTE: This is PR#7030 cherry-picked from `unstable` to `release-v7.0.0`.
Run Lighthouse BN with this flag to override:
```
--sync-tolerance-epoch 0
```
Related to #6880, an issue that's usually observed on local devnets with a small number of nodes.
When testing range sync, I usually shut down a node for some period of time and restart it. However, if the downtime is within `SYNC_TOLERANCE_EPOCHS` (8), Lighthouse considers the node synced and it may attempt to produce a block if requested by a validator. On a local devnet, nodes frequently produce blocks, and when this happens the node ends up producing a block that would revert finality and gets disconnected from peers immediately.
### Usage
Run Lighthouse BN with this flag to override:
```
--sync-tolerance-epoch 0
```
Partly addresses
- https://github.com/sigp/lighthouse/issues/6959
Use the `enable_light_client_server` field from the beacon chain config in the HTTP API. I think we can make this the single source of truth, as the network crate also has access to the beacon chain config.
N/A
2 changes:
1. Replace `Option::map_or(true, ...)` with `is_none_or(...)` (see the example below)
2. Remove unnecessary `Into::into` calls where the type conversion is apparent from the types
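For illustration, the first change in a tiny example (`Option::is_none_or` was stabilised in Rust 1.82):

```rust
// Tiny illustration of the lint fix: `is_none_or` expresses "None, or the
// value satisfies the predicate" more directly than `map_or(true, ...)`.
fn under_limit(limit: Option<u64>, value: u64) -> bool {
    // Before: limit.map_or(true, |l| value <= l)
    limit.is_none_or(|l| value <= l)
}

fn main() {
    assert!(under_limit(None, 10));
    assert!(under_limit(Some(20), 10));
    assert!(!under_limit(Some(5), 10));
}
```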
Closes #6983
`GET v2/validator/aggregate_attestation` is not backwards compatible: it only works for post-Electra attestations. This PR adds backwards compatibility and additional test coverage. We should include this in the upcoming 7.0 beta release if possible.
Closes:
- https://github.com/sigp/lighthouse/issues/6818
Use `MAX_EFFECTIVE_BALANCE_ELECTRA` (2048 ETH) for attestation reward calculations involving Electra.
Add a new `InteropGenesisBuilder` that tries to provide a more flexible way to build genesis states. Unfortunately due to lifetime jank, it is quite unergonomic at present. We may want to refactor this builder in future to make it easier to use.
Closes
- https://github.com/sigp/lighthouse/issues/6805
- Use a new `WorkEvent::GossipAttestationToConvert` to handle the conversion from `SingleAttestation` to `Attestation` _on_ the beacon processor (prevents a Tokio thread being blocked).
- Improve the error handling for single attestations. I think previously we had no ability to reprocess single attestations for unknown blocks -- we would just error. This seemed to be the case in both gossip processing and processing of `SingleAttestation`s from the HTTP API.
- Move the `SingleAttestation -> Attestation` conversion function into `beacon_chain` so that it can return the `attestation_verification::Error` type, which has well-defined error handling and peer penalties. The now-unused variants of `types::Attestation::Error` have been removed.
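A rough sketch of the first change, using simplified stand-in types (the real `WorkEvent` and beacon processor plumbing differ):

```rust
use tokio::sync::mpsc;

// Simplified stand-ins; the real `SingleAttestation` and `WorkEvent` differ.
struct SingleAttestation;

enum WorkEvent {
    GossipAttestationToConvert(SingleAttestation),
}

async fn handle_single_attestation(
    att: SingleAttestation,
    beacon_processor_tx: mpsc::Sender<WorkEvent>,
) {
    // Cheap hand-off: the (potentially expensive) SingleAttestation ->
    // Attestation conversion runs on the beacon processor's workers instead of
    // blocking this Tokio task.
    let _ = beacon_processor_tx
        .send(WorkEvent::GossipAttestationToConvert(att))
        .await;
}
```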
Addresses #6706
This PR activates PeerDAS at the Fulu fork epoch instead of `EIP_7594_FORK_EPOCH`. This means we no longer support testing PeerDAS with Deneb / Electra, as it's now part of a hard fork.