mirror of
https://github.com/sigp/lighthouse.git
synced 2026-03-09 19:51:47 +00:00
Implement el_offline and use it in the VC (#4295)
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/4291, part of #3613.
## Proposed Changes
- Implement the `el_offline` field on `/eth/v1/node/syncing`. We set `el_offline=true` if:
- The EL's internal status is `Offline` or `AuthFailed`, _or_
- The most recent call to `newPayload` resulted in an error (more on this in a moment).
- Use the `el_offline` field in the VC to mark nodes with offline ELs as _unsynced_. These nodes will still be used, but only after synced nodes.
- Overhaul the usage of `RequireSynced` so that `::No` is used almost everywhere. The `--allow-unsynced` flag was broken and had the opposite effect to intended, so it has been deprecated.
- Add tests for the EL being offline on the upcheck call, and being offline due to the newPayload check.
## Why track `newPayload` errors?
Tracking the EL's online/offline status is too coarse-grained to be useful in practice, because:
- If the EL is timing out to some calls, it's unlikely to timeout on the `upcheck` call, which is _just_ `eth_syncing`. Every failed call is followed by an upcheck [here](693886b941/beacon_node/execution_layer/src/engines.rs (L372-L380)), which would have the effect of masking the failure and keeping the status _online_.
- The `newPayload` call is the most likely to time out. It's the call in which ELs tend to do most of their work (often 1-2 seconds), with `forkchoiceUpdated` usually returning much faster (<50ms).
- If `newPayload` is failing consistently (e.g. timing out) then this is a good indication that either the node's EL is in trouble, or the network as a whole is. In the first case validator clients _should_ prefer other BNs if they have one available. In the second case, all of their BNs will likely report `el_offline` and they'll just have to proceed with trying to use them.
## Additional Changes
- Add utility method `ForkName::latest` which is quite convenient for test writing, but probably other things too.
- Delete some stale comments from when we used to support multiple execution nodes.
This commit is contained in:
@@ -2285,28 +2285,40 @@ pub fn serve<T: BeaconChainTypes>(
|
||||
.and(chain_filter.clone())
|
||||
.and_then(
|
||||
|network_globals: Arc<NetworkGlobals<T::EthSpec>>, chain: Arc<BeaconChain<T>>| {
|
||||
blocking_json_task(move || {
|
||||
let head_slot = chain.canonical_head.cached_head().head_slot();
|
||||
let current_slot = chain.slot_clock.now_or_genesis().ok_or_else(|| {
|
||||
warp_utils::reject::custom_server_error("Unable to read slot clock".into())
|
||||
})?;
|
||||
|
||||
// Taking advantage of saturating subtraction on slot.
|
||||
let sync_distance = current_slot - head_slot;
|
||||
|
||||
let is_optimistic = chain
|
||||
.is_optimistic_or_invalid_head()
|
||||
.map_err(warp_utils::reject::beacon_chain_error)?;
|
||||
|
||||
let syncing_data = api_types::SyncingData {
|
||||
is_syncing: network_globals.sync_state.read().is_syncing(),
|
||||
is_optimistic: Some(is_optimistic),
|
||||
head_slot,
|
||||
sync_distance,
|
||||
async move {
|
||||
let el_offline = if let Some(el) = &chain.execution_layer {
|
||||
el.is_offline_or_erroring().await
|
||||
} else {
|
||||
true
|
||||
};
|
||||
|
||||
Ok(api_types::GenericResponse::from(syncing_data))
|
||||
})
|
||||
blocking_json_task(move || {
|
||||
let head_slot = chain.canonical_head.cached_head().head_slot();
|
||||
let current_slot = chain.slot_clock.now_or_genesis().ok_or_else(|| {
|
||||
warp_utils::reject::custom_server_error(
|
||||
"Unable to read slot clock".into(),
|
||||
)
|
||||
})?;
|
||||
|
||||
// Taking advantage of saturating subtraction on slot.
|
||||
let sync_distance = current_slot - head_slot;
|
||||
|
||||
let is_optimistic = chain
|
||||
.is_optimistic_or_invalid_head()
|
||||
.map_err(warp_utils::reject::beacon_chain_error)?;
|
||||
|
||||
let syncing_data = api_types::SyncingData {
|
||||
is_syncing: network_globals.sync_state.read().is_syncing(),
|
||||
is_optimistic: Some(is_optimistic),
|
||||
el_offline: Some(el_offline),
|
||||
head_slot,
|
||||
sync_distance,
|
||||
};
|
||||
|
||||
Ok(api_types::GenericResponse::from(syncing_data))
|
||||
})
|
||||
.await
|
||||
}
|
||||
},
|
||||
);
|
||||
|
||||
|
||||
Reference in New Issue
Block a user