I've been working at updating another library to latest Lighthouse and got very confused with RPC request Ids.
There were types that had fields called `request_id` and `id`. And interchangeably could have types `PeerRequestId`, `rpc::RequestId`, `AppRequestId`, `api_types::RequestId` or even `Request.id`.
I couldn't keep track of which Id was linked to what and what each type meant.
So this PR mainly does a few things:
- Changes the field naming to match the actual type. So any field that has an `AppRequestId` will be named `app_request_id` rather than `id` or `request_id` for example.
- I simplified the types. I removed the two different `RequestId` types (one in Lighthouse_network the other in the rpc) and grouped them into one. It has one downside tho. I had to add a few unreachable lines of code in the beacon processor, which the extra type would prevent, but I feel like it might be worth it. Happy to add an extra type to avoid those few lines.
- I also removed the concept of `PeerRequestId` which sometimes went alongside a `request_id`. There were times were had a `PeerRequest` and a `Request` being returned, both of which contain a `RequestId` so we had redundant information. I've simplified the logic by removing `PeerRequestId` and made a `ResponseId`. I think if you look at the code changes, it simplifies things a bit and removes the redundant extra info.
I think with this PR things are a little bit easier to reasonable about what is going on with all these RPC Ids.
NOTE: I did this with the help of AI, so probably should be checked
N/A
Adds endpoints to add and remove trusted peers from the http api. The added peers are trusted peers so they won't be disconnected for bad scores. We try to maintain a connection to the peer in case they disconnect from us by trying to dial it every heartbeat.
This is a workaround for #7216
In the case of gaps between the in-memory pub key cache and its on-disk representation, use the head state on startup to "top-up" the cache/db w/ any missing validators
- Part of https://github.com/sigp/lighthouse/issues/6767
Validator custody makes the CGC and set of sampling columns dynamic. Right now this information is stored twice:
- in the data availability checker
- in the network globals
If that state becomes dynamic we must make sure it is in sync updating it twice, or guarding it behind a mutex. However, I noted that we don't really have to keep the CGC inside the data availability checker. All consumers can actually read it from the network globals, and we can update `make_available` to read the expected count of data columns from the block.
Part of
- https://github.com/sigp/lighthouse/issues/6258
`RangeBlockComponentsRequest` handles a set of by_range requests. It's quite lose on these requests, not tracking them by ID. We want to implement individual request retries, so we must make `RangeBlockComponentsRequest` aware of its requests IDs. We don't want the result of a prior by_range request to affect the state of a future retry. Lookup sync uses this mechanism.
Now `RangeBlockComponentsRequest` tracks:
```rust
pub struct RangeBlockComponentsRequest<E: EthSpec> {
blocks_request: ByRangeRequest<BlocksByRangeRequestId, Vec<Arc<SignedBeaconBlock<E>>>>,
block_data_request: RangeBlockDataRequest<E>,
}
enum RangeBlockDataRequest<E: EthSpec> {
NoData,
Blobs(ByRangeRequest<BlobsByRangeRequestId, Vec<Arc<BlobSidecar<E>>>>),
DataColumns {
requests: HashMap<
DataColumnsByRangeRequestId,
ByRangeRequest<DataColumnsByRangeRequestId, DataColumnSidecarList<E>>,
>,
expected_custody_columns: Vec<ColumnIndex>,
},
}
enum ByRangeRequest<I: PartialEq + std::fmt::Display, T> {
Active(I),
Complete(T),
}
```
I have merged `is_finished` and `Into_responses` into the same function. Otherwise, we need to duplicate the logic to figure out if the requests are done.
Backport of:
- https://github.com/sigp/lighthouse/pull/7067
For:
- https://github.com/sigp/lighthouse/issues/7039
- Prevent writing to state cache when migrating the database
- Add `state-cache-headroom` flag to control pruning
- Prune old epoch boundary states ahead of mid-epoch states
- Never prune head block's state
- Avoid caching ancestor states unless they are on an epoch boundary
- Log when states enter/exit the cache
Co-authored-by: Eitan Seri-Levi <eserilev@ucsc.edu>
I feel it's preferable to do this explicitly by updating the revision on `Cargo.toml` rather than implicitly by letting `Cargo.lock` control the revision of the branch.
- #6452 (partially)
Remove dependencies on `store` and `lighthouse_network` from `eth2`. This was achieved as follows:
- depend on `enr` and `multiaddr` directly instead of using `lighthouse_network`'s reexports.
- make `lighthouse_network` responsible for converting between API and internal types.
- in two cases, remove complex internal types and use the generic `serde_json::Value` instead - this is not ideal, but should be fine for now, as this affects two internal non-spec endpoints which are meant for debugging, unstable, and subject to change without notice anyway. Inspired by #6679. The alternative is to move all relevant types to `eth2` or `types` instead - what do you think?
Currently range sync download log errors just say `error: rpc_error` which isn't helpful. The actual error is suppressed unless logged somewhere else.
Log the actual error that caused the batch download to fail as part of the log that states that the batch download failed.
We forked `gossipsub` into the lighthouse repo sometime ago so that we could iterate quicker on implementing back pressure and IDONTWANT.
Meanwhile we have pushed all our changes upstream and we are now the main maintainers of `rust-libp2p` this allows us to use upstream `gossipsub` again.
Nonetheless we still use our forked repo to give us freedom to experiment with features before submitting them upstream
Delete duplicate sync tolerance epoch config in the HTTP API which is unused.
We introduced the `sync-tolerance-epoch` flag in this PR:
- https://github.com/sigp/lighthouse/pull/7030
Then refined it in this PR:
- https://github.com/sigp/lighthouse/pull/7044
Somewhere in the merge of `release-v7.0.0` into `unstable`, the config from the original PR which had been deleted came back. I think I resolved these conflicts, so my bad.
NA
Bumps the `ethereum_ssz` version, along with other crates that share the dep.
Primarily, this give us bitfields which can store 128 bytes on the stack before allocating, rather than 32 bytes (https://github.com/sigp/ethereum_ssz/pull/38). The validator count has increase massively since we set it at 32 bytes, so aggregation bitfields (et al) now require a heap allocation. This new value of 128 should get us to ~2m active validators.
This is a backport from `holesky-rescue`.
Part of:
- https://github.com/sigp/lighthouse/issues/7039
Original PR to `holesky-rescue`:
- https://github.com/sigp/lighthouse/pull/7054
Avoid doing database lookups for slots that lie in the hot database when processing status messages. This avoids a DoS vector during non-finality, as loading hot states to iterate block roots is very expensive.
This change makes the `total_difficulty` field in `ExecutionBlock` an `Option<Uint256>` since newer clients are no longer including the `totalDifficulty` field.
I think this will fix https://github.com/sigp/lighthouse/issues/6937 but I was actually more focused on the builder registration case described below.
In our [builder-playground](https://github.com/flashbots/builder-playground) we setup a local devnet using lighthouse, reth, and mev-boost-relay. After upgrading to reth 1.2.0 and lighthouse v7.0.0.beta.0 for Pectra, we noticed that the validator registration process was _sometimes_ failing with:
```
Feb 25 23:35:25.038 ERRO Unable to publish proposer preparation to all beacon nodes, error: Some endpoints failed, num_failed: 1 http://localhost:3500/ => RequestFailed(ServerMessage(ErrorMessage { code: 400, message: "BAD_REQUEST: error updating proposer preparations: ForkchoiceUpdate(EngineError(Api { error: Json(Error(\"missing field `totalDifficulty`\", line: 0, column: 0)) }))", stacktraces: [] })), service: preparation
Feb 25 23:35:25.099 WARN Unable to publish validator registrations to the builder network, error: Some endpoints failed, num_failed: 1 http://localhost:3500/ => RequestFailed(ServerMessage(ErrorMessage { code: 400, message: "BAD_REQUEST: error updating proposer preparations: ForkchoiceUpdate(EngineError(Api { error: Json(Error(\"missing field `totalDifficulty`\", line: 0, column: 0)) }))", stacktraces: [] })), service: preparation
```
What was even more confusing, was that it was sometimes working, which actually led to a wild goose chase thinking it was a networking issue. However, when tracing through the LH code, I came across this comment:
70194dfc6a/beacon_node/beacon_chain/src/beacon_chain.rs (L6048-L6049)
This explained why it sometimes worked, in our playground we run lighthouse with `--prepare-payload-lookahead 8000` thus there was always a 4-second window where the call wasn't made.
But, if the call was made, then this code would 100% fail with updated reth:
https://github.com/sigp/lighthouse/blob/unstable/beacon_node/execution_layer/src/lib.rs#L1688-L1692
Which would then mapped to a `Error::ForkchoiceUpdate` in `update_execution_engine_forkchoice`.
Anyways, the fix was to make `total_difficulty` Optional, and then to update any code paths where it was used. In doing so, I assume that if the EL doesn't include total difficulty then the chain is already post-merge.
Related to #6880, an issue that's usually observed on local devnets with small number of nodes.
When testing range sync, I usually shutdown a node for some period of time and restart it again. However, if it's within `SYNC_TOLERANCE_EPOCHS` (8), Lighthouse would consider the node as synced, and if it may attempt to produce a block if requested by a validator - on a local devnet, nodes frequently produce blocks - when this happens, the node ends up producing a block that would revert finality and would get disconnected from peers immediately.
NOTE: This is PR#7030 cherry-picked from `unstable` to `release-v7.0.0`.
Run Lighthouse BN with this flag to override:
```
--sync-tolerance--epoch 0
```
Related to #6880, an issue that's usually observed on local devnets with small number of nodes.
When testing range sync, I usually shutdown a node for some period of time and restart it again. However, if it's within `SYNC_TOLERANCE_EPOCHS` (8), Lighthouse would consider the node as synced, and if it may attempt to produce a block if requested by a validator - on a local devnet, nodes frequently produce blocks - when this happens, the node ends up producing a block that would revert finality and would get disconnected from peers immediately.
### Usage
Run Lighthouse BN with this flag to override:
```
--sync-tolerance--epoch 0
```
Partly addresses
- https://github.com/sigp/lighthouse/issues/6959
Use the `enable_light_client_server` field from the beacon chain config in the HTTP API. I think we can make this the single source of truth, as I think the network crate also has access to the beacon chain config.
PeerDAS has undergone multiple refactors + the blending with the get_blobs optimization has generated technical debt.
A function signature like this
f008b84079/beacon_node/beacon_chain/src/beacon_chain.rs (L7171-L7178)
Allows at least the following combination of states:
- blobs: Some / None
- data_columns: Some / None
- data_column_recv: Some / None
- Block has data? Yes / No
- Block post-PeerDAS? Yes / No
In reality, we don't have that many possible states, only:
- `NoData`: pre-deneb, pre-PeerDAS with 0 blobs or post-PeerDAS with 0 blobs
- `Blobs(BlobSidecarList<E>)`: post-Deneb pre-PeerDAS with > 0 blobs
- `DataColumns(DataColumnSidecarList<E>)`: post-PeerDAS with > 0 blobs
- `DataColumnsRecv(oneshot::Receiver<DataColumnSidecarList<E>>)`: post-PeerDAS with > 0 blobs, but we obtained the columns via reconstruction
^ this are the variants of the new `AvailableBlockData` enum
So we go from 2^5 states to 4 well-defined. Downstream code benefits nicely from this clarity and I think it makes the whole feature much more maintainable.
Currently `is_available` returns a bool, and then we construct the available block in `make_available`. In a way the availability condition is duplicated in both functions. Instead, this PR constructs `AvailableBlockData` in `is_available` so the availability conditions are written once
```rust
if let Some(block_data) = is_available(..) {
let available_block = make_available(block_data);
}
```
Resolves https://github.com/sigp/lighthouse/issues/7000
Set the accept header on builder to the correct value when requesting ssz.
This PR also adds a flag to disable ssz over the builder api altogether. In the case that builders/relays have an ssz bug, we can react quickly by asking clients to restart their nodes with the `--disable-ssz-builder` flag to force json. I'm not fully convinced if this is useful so open to removing it or opening another PR for it.
Testing this currently.
N/A
2 changes:
1. Replace Option::map_or(true, ...) with is_none_or(...)
2. Remove unnecessary `Into::into` blocks where the type conversion is apparent from the types
We don't need to store `BehaviourAction` for `ready_requests` and therefore avoid having an `unreachable!` on #6625.
Therefore this PR should be merged before it
Our Holesky nodes running with the light client enabled were logging messages about full queues:
> Feb 12 22:09:28.949 ERRO Work queue is full queue: unknown_light_client_optimistic_update, queue_len: 128, msg: the system has insufficient resources for load, service: bproc
I thought this might be genuine overload, but it turns out this queue was never being read from!
- [x] Rename light-client related queues in the beacon processor for clarity.
- [x] Ensure all light-client related queues are being popped from.
Closes#6983
`GET v2/validator/aggregate_attestation` is not backwards compatible. It only works for post electra attestations. This PR adds backwards compatibility and additional test coverage. We should include this in the upcoming 7.0 beta release if possible
Addresses #6854.
PeerDAS requires unsubscribing a Gossip topic at a fork boundary. This is not possible with our current topic machinery.
Instead of defining which topics have to be **added** at a given fork, we define the complete set of topics at a given fork. The new start of the show and key function is:
```rust
pub fn core_topics_to_subscribe<E: EthSpec>(
fork_name: ForkName,
opts: &TopicConfig,
spec: &ChainSpec,
) -> Vec<GossipKind> {
// ...
if fork_name.deneb_enabled() && !fork_name.fulu_enabled() {
// All of deneb blob topics are core topics
for i in 0..spec.blob_sidecar_subnet_count(fork_name) {
topics.push(GossipKind::BlobSidecar(i));
}
}
// ...
}
```
`core_topics_to_subscribe` only returns the blob topics if `fork < Fulu`. Then at the fork boundary, we subscribe with the new fork digest to `core_topics_to_subscribe(next_fork)`, which excludes the blob topics.
I added `is_fork_non_core_topic` to carry on to the next fork the aggregator topics for attestations and sync committee messages. This approach is future-proof if those topics ever become fork-dependent.
Closes https://github.com/sigp/lighthouse/issues/6854