N/A
Adds endpoints to add and remove trusted peers from the http api. The added peers are trusted peers so they won't be disconnected for bad scores. We try to maintain a connection to the peer in case they disconnect from us by trying to dial it every heartbeat.
This is a workaround for #7216
In the case of gaps between the in-memory pub key cache and its on-disk representation, use the head state on startup to "top-up" the cache/db w/ any missing validators
Backport of:
- https://github.com/sigp/lighthouse/pull/7067
For:
- https://github.com/sigp/lighthouse/issues/7039
- Prevent writing to state cache when migrating the database
- Add `state-cache-headroom` flag to control pruning
- Prune old epoch boundary states ahead of mid-epoch states
- Never prune head block's state
- Avoid caching ancestor states unless they are on an epoch boundary
- Log when states enter/exit the cache
Co-authored-by: Eitan Seri-Levi <eserilev@ucsc.edu>
This is a backport from `holesky-rescue`.
Part of:
- https://github.com/sigp/lighthouse/issues/7039
Original PR to `holesky-rescue`:
- https://github.com/sigp/lighthouse/pull/7054
Avoid doing database lookups for slots that lie in the hot database when processing status messages. This avoids a DoS vector during non-finality, as loading hot states to iterate block roots is very expensive.
This change makes the `total_difficulty` field in `ExecutionBlock` an `Option<Uint256>` since newer clients are no longer including the `totalDifficulty` field.
I think this will fix https://github.com/sigp/lighthouse/issues/6937 but I was actually more focused on the builder registration case described below.
In our [builder-playground](https://github.com/flashbots/builder-playground) we setup a local devnet using lighthouse, reth, and mev-boost-relay. After upgrading to reth 1.2.0 and lighthouse v7.0.0.beta.0 for Pectra, we noticed that the validator registration process was _sometimes_ failing with:
```
Feb 25 23:35:25.038 ERRO Unable to publish proposer preparation to all beacon nodes, error: Some endpoints failed, num_failed: 1 http://localhost:3500/ => RequestFailed(ServerMessage(ErrorMessage { code: 400, message: "BAD_REQUEST: error updating proposer preparations: ForkchoiceUpdate(EngineError(Api { error: Json(Error(\"missing field `totalDifficulty`\", line: 0, column: 0)) }))", stacktraces: [] })), service: preparation
Feb 25 23:35:25.099 WARN Unable to publish validator registrations to the builder network, error: Some endpoints failed, num_failed: 1 http://localhost:3500/ => RequestFailed(ServerMessage(ErrorMessage { code: 400, message: "BAD_REQUEST: error updating proposer preparations: ForkchoiceUpdate(EngineError(Api { error: Json(Error(\"missing field `totalDifficulty`\", line: 0, column: 0)) }))", stacktraces: [] })), service: preparation
```
What was even more confusing, was that it was sometimes working, which actually led to a wild goose chase thinking it was a networking issue. However, when tracing through the LH code, I came across this comment:
70194dfc6a/beacon_node/beacon_chain/src/beacon_chain.rs (L6048-L6049)
This explained why it sometimes worked, in our playground we run lighthouse with `--prepare-payload-lookahead 8000` thus there was always a 4-second window where the call wasn't made.
But, if the call was made, then this code would 100% fail with updated reth:
https://github.com/sigp/lighthouse/blob/unstable/beacon_node/execution_layer/src/lib.rs#L1688-L1692
Which would then mapped to a `Error::ForkchoiceUpdate` in `update_execution_engine_forkchoice`.
Anyways, the fix was to make `total_difficulty` Optional, and then to update any code paths where it was used. In doing so, I assume that if the EL doesn't include total difficulty then the chain is already post-merge.
Related to #6880, an issue that's usually observed on local devnets with small number of nodes.
When testing range sync, I usually shutdown a node for some period of time and restart it again. However, if it's within `SYNC_TOLERANCE_EPOCHS` (8), Lighthouse would consider the node as synced, and if it may attempt to produce a block if requested by a validator - on a local devnet, nodes frequently produce blocks - when this happens, the node ends up producing a block that would revert finality and would get disconnected from peers immediately.
NOTE: This is PR#7030 cherry-picked from `unstable` to `release-v7.0.0`.
Run Lighthouse BN with this flag to override:
```
--sync-tolerance--epoch 0
```
Resolves https://github.com/sigp/lighthouse/issues/7000
Set the accept header on builder to the correct value when requesting ssz.
This PR also adds a flag to disable ssz over the builder api altogether. In the case that builders/relays have an ssz bug, we can react quickly by asking clients to restart their nodes with the `--disable-ssz-builder` flag to force json. I'm not fully convinced if this is useful so open to removing it or opening another PR for it.
Testing this currently.
N/A
2 changes:
1. Replace Option::map_or(true, ...) with is_none_or(...)
2. Remove unnecessary `Into::into` blocks where the type conversion is apparent from the types
Our Holesky nodes running with the light client enabled were logging messages about full queues:
> Feb 12 22:09:28.949 ERRO Work queue is full queue: unknown_light_client_optimistic_update, queue_len: 128, msg: the system has insufficient resources for load, service: bproc
I thought this might be genuine overload, but it turns out this queue was never being read from!
- [x] Rename light-client related queues in the beacon processor for clarity.
- [x] Ensure all light-client related queues are being popped from.
Closes#6983
`GET v2/validator/aggregate_attestation` is not backwards compatible. It only works for post electra attestations. This PR adds backwards compatibility and additional test coverage. We should include this in the upcoming 7.0 beta release if possible
- Re-opened PR from https://github.com/sigp/lighthouse/pull/6869
Writing and running tests I noted that the sync RPC requests are very verbose now.
`DataColumnsByRootRequestId { id: 123, requester: Custody(CustodyId { requester: CustodyRequester(SingleLookupReqId { req_id: 121, lookup_id: 101 }) }) }`
Since this Id is logged rather often I believe there's value in
1. Making them more succinct for log verbosity
2. Make them a string that's easy to copy and work with elastic
Write custom `Display` implementations to render Ids in a more DX format
_ DataColumnsByRootRequestId with a block lookup_
```
123/Custody/121/Lookup/101
```
_DataColumnsByRangeRequestId_
```
123/122/RangeSync/0/5492900659401505034
```
- This one will be shorter after https://github.com/sigp/lighthouse/pull/6868
Also made the logs format and text consistent across all methods
Closes:
- https://github.com/sigp/lighthouse/issues/6818
Use `MAX_EFFECTIVE_BALANCE_ELECTRA` (2048) for attestation reward calculations involving Electra.
Add a new `InteropGenesisBuilder` that tries to provide a more flexible way to build genesis states. Unfortunately due to lifetime jank, it is quite unergonomic at present. We may want to refactor this builder in future to make it easier to use.
Closes
- https://github.com/sigp/lighthouse/issues/6805
- Use a new `WorkEvent::GossipAttestationToConvert` to handle the conversion from `SingleAttestation` to `Attestation` _on_ the beacon processor (prevents a Tokio thread being blocked).
- Improve the error handling for single attestations. I think previously we had no ability to reprocess single attestations for unknown blocks -- we would just error. This seemed to be the case in both gossip processing and processing of `SingleAttestation`s from the HTTP API.
- Move the `SingleAttestation -> Attestation` conversion function into `beacon_chain` so that it can return the `attestation_verification::Error` type, which has well-defined error handling and peer penalties. The now-unused variants of `types::Attestation::Error` have been removed.
Fix another issue with fetch-blobs, similar to:
- https://github.com/sigp/lighthouse/pull/6911
Check if the list of blobs returned is all `None`, and if so, do not proceed any further.
This prevents an ugly error like:
> Feb 03 17:32:12.384 ERRO Error fetching or processing blobs from EL, block_root: 0x7326fe2dc1cb9036c9de7a07a662c86a339085597849016eadf061b70b7815ba, error: BlobProcessingError(AvailabilityCheck(Unexpected)), module
: network::network_beacon_processor:1011
- #6510
- Keep execution payload during historical backfill when `--prune-payloads false` is set
- Add a field in the historical backfill debug log to indicate if execution payload is kept
- Add a test to check historical blocks has execution payload when `--prune-payloads false is set
- Very minor typo correction that I notice when working on this
- PR https://github.com/sigp/lighthouse/pull/6497 made obsolete some consistency checks inside the batch
I forgot to remove the consumers of those errors
Remove un-used batch sync error condition, which was a nested `Result<_, Result<_, E>>`
Part of
- https://github.com/sigp/lighthouse/issues/6258
To address PeerDAS sync issues we need to make individual by_range requests within a batch retriable. We should adopt the same pattern for lookup sync where each request (block/blobs/columns) is tracked individually within a "meta" request that group them all and handles retry logic.
- Building on https://github.com/sigp/lighthouse/pull/6398
second step is to add individual request accumulators for `blocks_by_range`, `blobs_by_range`, and `data_columns_by_range`. This will allow each request to progress independently and be retried separately.
Most of the logic is just piping, excuse the large diff. This PR does not change the logic of how requests are handled or retried. This will be done in a future PR changing the logic of `RangeBlockComponentsRequest`.
### Before
- Sync manager receives block with `SyncRequestId::RangeBlockAndBlobs`
- Insert block into `SyncNetworkContext::range_block_components_requests`
- (If received stream terminators of all requests)
- Return `Vec<RpcBlock>`, and insert into `range_sync`
### Now
- Sync manager receives block with `SyncRequestId::RangeBlockAndBlobs`
- Insert block into `SyncNetworkContext:: blocks_by_range_requests`
- (If received stream terminator of this request)
- Return `Vec<SignedBlock>`, and insert into `SyncNetworkContext::components_by_range_requests `
- (If received a result for all requests)
- Return `Vec<RpcBlock>`, and insert into `range_sync`
N/A
Previously, we were returning an empty vec of Nones if get_blobs was not supported in the EL. This results in confusing logging where we try to process the empty list of blobs and log a bunch of Unexpected errors. See
```
Feb 03 17:32:12.383 DEBG Fetching blobs from the EL, num_expected_blobs: 6, block_root: 0x7326fe2dc1cb9036c9de7a07a662c86a339085597849016eadf061b70b7815ba, service: fetch_engine_blobs, service: beacon, module: beac
on_chain::fetch_blobs:84
Feb 03 17:32:12.384 DEBG Processing engine blobs, num_fetched_blobs: 0, block_root: 0x7326fe2dc1cb9036c9de7a07a662c86a339085597849016eadf061b70b7815ba, service: fetch_engine_blobs, service: beacon, module: beacon_c
hain::fetch_blobs:197
Feb 03 17:32:12.384 ERRO Error fetching or processing blobs from EL, block_root: 0x7326fe2dc1cb9036c9de7a07a662c86a339085597849016eadf061b70b7815ba, error: BlobProcessingError(AvailabilityCheck(Unexpected)), module
: network::network_beacon_processor:1011
```
The error we should be getting is that getBlobs is not supported, this PR adds a new error variant and returns that.
There were two things I came across during some recent testing, that this PR addresses.
1 - The default port for IPv6 was set to 9090, which is confusing. I've set this to match its ipv4 counterpart (i.e 9000 and 9001). This makes more sense and is easier to firewall, for those firewalls that support both versions for a single rule.
2 - Watching the NAT status of lighthouse, I notice we only set the field to 1 once the NAT is passed. We don't give it a default 0 (false). So we only see results when its successful. On peer disconnects, i've piggy-backed a loop of the connected peers to also watch and check for NAT status updates.
Resolve a `TODO(das)` to use KZG batch verification in `put_rpc_custody_columns`
Uses `verify_kzg_for_data_column_list_with_scoring` in all paths that send more than one column. To use batch verification and have attributability of which peer is sending a bad column.
Needs to move `verify_kzg_for_data_column_list_with_scoring` into the type's module to convert to the KZG verified type.
`TODO(das)` now that PeerDAS is scheduled in a hard fork we can subscribe to its topics on the fork activation. In current stable we subscribe to PeerDAS topics as soon as the node starts if PeerDAS is scheduled.
This PR adds another todo to unsubscribe to blob topics at the fork. This other PR included solution for that, but I can include it in a separate PR
- https://github.com/sigp/lighthouse/pull/5899/files
Include PeerDAS topics as part of Fulu fork in `fork_core_topics`.
Hopefully fixes https://github.com/sigp/lighthouse/issues/6732
In our `scheduled_subscriptions`, we were setting unsubscription slot to be `current_slot + 1`. Given that we were subscribing to the subnet at `duty.slot - 1`, the unsubscription slot ended up being `duty.slot`. So we were unsubscribing to the subnet at the beginning of the duty slot which is insane.
Fixes the `scheduled_subscriptions` to unsubscribe at `duty.slot + 1`.
Addresses #6026.
Post-PeerDAS the DB expects to have data columns for the finalized block.
Instead of forcing the user to submit the columns, this PR computes the columns from the blobs that we can already fetch from the checkpointz server or with the existing CLI options.
Note 1: (EDIT) Pruning concern addressed
Note 2: I have not tested this feature
Note 3: @michaelsproul an alternative I recall is to not require the blobs / columns at this point and expect backfill to populate the finalized block
Addresses #6706
This PR activates PeerDAS at the Fulu fork epoch instead of `EIP_7594_FORK_EPOCH`. This means we no longer support testing PeerDAS with Deneb / Electrs, as it's now part of a hard fork.
Currently, we set the `chain_id` of range sync chains to `u64(hash(target_root, target_slot))`, which results in a long integer.
```
Jan 27 00:43:27.246 DEBG Batch downloaded, chain: 4223372036854775807, awaiting_batches: 0, batch_state: [p,E,E,E,E], blocks: 0, epoch: 0, service: range_sync
```
Instead, we can use `network_context.next_id()` as we do for all other sync items and get a unique sequential (not too big) integer as id.
```
Jan 27 00:43:27.246 DEBG Batch downloaded, chain: 4, awaiting_batches: 0, batch_state: [p,E,E,E,E], blocks: 0, epoch: 0, service: range_sync
```
Also, if a specific chain for the same target is retried later, it won't get the same ID so we can more clearly differentiate the logs associated with each attempt.
When working on unrelated changes I noted:
- An unnecessary closure left by a commit of some guy named @dapplion that can be removed
- match statements that can be simplified with the new let else syntax
- instead of mapping a result to ignore the Ok value, return
N/A
In https://github.com/sigp/lighthouse/pull/6329 we changed `max_blobs_per_block` from a preset to a config value.
We weren't using the right value based on fork in that PR. This is a follow up PR to use the fork dependent values.
In the proces, I also updated other places where we weren't using fork dependent values from the ChainSpec.
Note to reviewer: easier to go through by commit