Resolves https://github.com/sigp/lighthouse/issues/7414
The health endpoint returns a 503 if the engine state is offline. The default state for the engine is `Offline`. So until the first request to the EL is made and the state is updated, the health endpoint will keep returning 503s.
This PR changes the default state to Online to avoid that. I don't think this causes any issues because in case the EL is actually offline, the first fcu will set the state to offline.
Pending testing on kurtosis.
Fix clippy lints for `rustc` 1.87
clippy complains about `BeaconChainError` being too large. I went on a bit of a boxing spree because of this. We may instead want to `Box` some of the `BeaconChainError` variants?
This PR relates to:
- https://github.com/sigp/lighthouse/pull/7199
- -> workspace_filter has been enabled (dependency logging has been disabled)
- https://github.com/sigp/lighthouse/pull/7394
- -> file logging has been optionally enabled
Building on these, this PR enables dependency logging for the simulators. The logs are written to separate files.
The libp2p/discv5 logs:
- are saved to the directory specified with `--log-dir`
- respects the `RUST_LOG` environment variable for log level configuration
Which issue # does this PR address?
Closes#7457
Added `E::slots_per_epoch()` and now it ensures conversion from epochs to slots while calculating deneb time
Workaround/fix for:
- https://github.com/sigp/lighthouse/issues/7323
- Remove the `StateSummariesNotContiguousError`. This allows us to continue with DAG construction and pruning, even in the case where the DAG is disjointed. We will treat any disjoint summaries as roots of their own tree, and prune them (as they are not descended from finalized). This should be safe, as canonical summaries should not be disjoint (if they are, then the DB is already corrupt).
Beacon logs in the simulator are printed only to stdout. The logs are usually large, so persisting them would be helpful for debugging.
Added `--log-dir` parameter to the simulators and a step to upload the logs to Artifacts.
(Update)
Added `--disable-stdout-logging` to disable stdout logging, making the CI page cleaner.
Closes https://github.com/sigp/lighthouse/issues/6895
We need sync to retry custody requests when a peer CGC updates. A higher CGC can result in a data column subnet peer count increasing from 0 to 1, allowing requests to happen.
Add new sync event `SyncMessage::UpdatedPeerCgc`. It's sent by the router when a metadata response updates the known CGC
This PR adds transitions to Electra ~~and Fulu~~ fork epochs in the simulator tests.
~~It also covers blob inclusion verification and data column syncing on a full node in Fulu.~~
UPDATE: Remove fulu fork from sim tests due to https://github.com/sigp/lighthouse/pull/7199#issuecomment-2852281176
Don't publish data columns reconstructed from RPC columns to the gossip network, as this may result in peer downscoring if we're sending columns from past slots.
Prevent running `lighthouse vc --http-port <PORT>` without `--http`.
Issue: https://github.com/sigp/lighthouse/issues/7402
Added requires `--http` when using `lighthouse vc --http-port <PORT>`.
Implemented a test code for this issue.
- Create trait `ValidatorStore` with all functions used by the `validator_services`
- Make `validator_services` generic on `S: ValidatorStore`
- Introduce `LighthouseValidatorStore`, which has identical functionality to the old `ValidatorStore`
- Remove dependencies (especially `environment`) from `validator_services` and `beacon_node_fallback` in order to be able to cleanly use them in Anchor
- Re-opens https://github.com/sigp/lighthouse/pull/6864 targeting unstable
Range sync and backfill sync still assume that each batch request is done by a single peer. This assumption breaks with PeerDAS, where we request custody columns to N peers.
Issues with current unstable:
- Peer prioritization counts batch requests per peer. This accounting is broken now, data columns by range request are not accounted
- Peer selection for data columns by range ignores the set of peers on a syncing chain, instead draws from the global pool of peers
- The implementation is very strict when we have no peers to request from. After PeerDAS this case is very common and we want to be flexible or easy and handle that case better than just hard failing everything.
- [x] Upstream peer prioritization to the network context, it knows exactly how many active requests a peer (including columns by range)
- [x] Upstream peer selection to the network context, now `block_components_by_range_request` gets a set of peers to choose from instead of a single peer. If it can't find a peer, it returns the error `RpcRequestSendError::NoPeer`
- [ ] Range sync and backfill sync handle `RpcRequestSendError::NoPeer` explicitly
- [ ] Range sync: leaves the batch in `AwaitingDownload` state and does nothing. **TODO**: we should have some mechanism to fail the chain if it's stale for too long - **EDIT**: Not done in this PR
- [x] Backfill sync: pauses the sync until another peer joins - **EDIT**: Same logic as unstable
### TODOs
- [ ] Add tests :)
- [x] Manually test backfill sync
Note: this touches the mainnet path!
Debugging an sync issue from @pawanjay176 I'm missing some key info where instead of logging the ID of the SyncingChain we just log "Finalized" (the sync type). This looks like some typo or something was lost in translation when refactoring things.
```
Apr 17 12:12:00.707 DEBUG Syncing new finalized chain chain: Finalized, component: "range_sync"
```
This log should include more info about the new chain but just logs "Finalized"
```
Apr 17 12:12:00.810 DEBUG New chain added to sync peer_id: "16Uiu2HAmHP8QLYQJwZ4cjMUEyRgxzpkJF87qPgNecLTpUdruYbdA", sync_type: Finalized, new_chain: Finalized, component: "range_sync"
```
- Remove the Display impl and log the ID explicitly for all logs.
- Log more details when creating a new SyncingChain
When we perform data column gossip verification, we sometimes see multiple proposer shuffling cache miss simultaneously and this results in multiple threads computing the shuffling cache and potentially slows down the gossip verification.
Proposal here is to use a `OnceCell` for each shuffling key to make sure it's only computed once. I have only implemented this in data column verification as a PoC, but this can also be applied to blob and block verification
Related issues:
- https://github.com/sigp/lighthouse/issues/4447
- https://github.com/sigp/lighthouse/issues/7203
Changes the endpoint to get fallback health information from `/lighthouse/ui/fallback_health` to `/lighthouse/beacon/health`. This more accurately describes that the endpoint is related to the connected beacon nodes and also matched the `/lighthouse/beacon/update` endpoint being added in #6551.
Adds documentation for both fallback health and the endpoint to the Lighthouse book.
One of the information in the consolidation section in Lighthouse book is wrong. I realise this after reading https://ethereum.org/en/roadmap/pectra/maxeb/ and a further look at [EIP 7251](https://eips.ethereum.org/EIPS/eip-7251) which states:
`
Note: the system contract uses the EVM CALLER operation (Solidity: msg.sender) to get the address used in the consolidation request, i.e. the address that calls the system contract must match the 0x01 withdrawal credential recorded in the beacon state.
`
So the withdrawal credentials of both source and target validators need not be the same.
closes https://github.com/sigp/lighthouse/issues/5785
The diagram below shows the differences in how the receiver (responder) behaves before and after this PR. The following sentences will detail the changes.
```mermaid
flowchart TD
subgraph "*** After ***"
Start2([START]) --> AA[Receive request]
AA --> COND1{Is there already an active request <br> with the same protocol?}
COND1 --> |Yes| CC[Send error response]
CC --> End2([END])
%% COND1 --> |No| COND2{Request is too large?}
%% COND2 --> |Yes| CC
COND1 --> |No| DD[Process request]
DD --> EE{Rate limit reached?}
EE --> |Yes| FF[Wait until tokens are regenerated]
FF --> EE
EE --> |No| GG[Send response]
GG --> End2
end
subgraph "*** Before ***"
Start([START]) --> A[Receive request]
A --> B{Rate limit reached <br> or <br> request is too large?}
B -->|Yes| C[Send error response]
C --> End([END])
B -->|No| E[Process request]
E --> F[Send response]
F --> End
end
```
### `Is there already an active request with the same protocol?`
This check is not performed in `Before`. This is taken from the PR in the consensus-spec, which proposes updates regarding rate limiting and response timeout.
https://github.com/ethereum/consensus-specs/pull/3767/files
> The requester MUST NOT make more than two concurrent requests with the same ID.
The PR mentions the requester side. In this PR, I introduced the `ActiveRequestsLimiter` for the `responder` side to restrict more than two requests from running simultaneously on the same protocol per peer. If the limiter disallows a request, the responder sends a rate-limited error and penalizes the requester.
### `Rate limit reached?` and `Wait until tokens are regenerated`
UPDATE: I moved the limiter logic to the behaviour side. https://github.com/sigp/lighthouse/pull/5923#issuecomment-2379535927
~~The rate limiter is shared between the behaviour and the handler. (`Arc<Mutex<RateLimiter>>>`) The handler checks the rate limit and queues the response if the limit is reached. The behaviour handles pruning.~~
~~I considered not sharing the rate limiter between the behaviour and the handler, and performing all of these either within the behaviour or handler. However, I decided against this for the following reasons:~~
- ~~Regarding performing everything within the behaviour: The behaviour is unable to recognize the response protocol when `RPC::send_response()` is called, especially when the response is `RPCCodedResponse::Error`. Therefore, the behaviour can't rate limit responses based on the response protocol.~~
- ~~Regarding performing everything within the handler: When multiple connections are established with a peer, there could be multiple handlers interacting with that peer. Thus, we cannot enforce rate limiting per peer solely within the handler. (Any ideas? 🤔 )~~
Lighthouse currently lacks support for cross-compilation targeting the `riscv64` architecture.
This PR introduces initial support for cross-compiling Lighthouse to `riscv64`. The following changes were made:
- **Makefile**: Updated to support `cross` with `riscv64` as a target.
- **Cross.toml**: Added configuration specific to `riscv64`.
- **Documentation**: List 'build-riscv64' in `book/src/installation_cross_compiling.md`.
Did not find a specific issue beside https://github.com/sigp/lighthouse/issues/6821
Leverage `whistleblower_reward_quotient_for_state` to have accurate post-electra `proposer_slashings` and `attester_slashings` fields returned by `/eth/v1/beacon/rewards/blocks/<id>`.