mirror of
https://github.com/sigp/lighthouse.git
synced 2026-03-06 18:21:45 +00:00
Implement el_offline and use it in the VC (#4295)
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/4291, part of #3613.
## Proposed Changes
- Implement the `el_offline` field on `/eth/v1/node/syncing`. We set `el_offline=true` if:
  - The EL's internal status is `Offline` or `AuthFailed`, _or_
  - The most recent call to `newPayload` resulted in an error (more on this in a moment).
- Use the `el_offline` field in the VC to mark nodes with offline ELs as _unsynced_. These nodes will still be used, but only after synced nodes.
- Overhaul the usage of `RequireSynced` so that `::No` is used almost everywhere. The `--allow-unsynced` flag was broken and had the opposite of its intended effect, so it has been deprecated.
- Add tests for the EL being offline on the upcheck call, and for the EL being marked offline by the `newPayload` check.
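
The rules above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `EngineStatus`, `Candidate`, `el_offline` (as a free function) and `order_candidates` are hypothetical names, and the only details taken from the description are the two conditions for `el_offline=true` and the "synced nodes first, unsynced nodes as fallback" ordering.

```rust
/// Hypothetical stand-in for the EL's internal status. Only `Offline` and
/// `AuthFailed` are named in the description; the other variants are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EngineStatus {
    Synced,
    Syncing,
    Offline,
    AuthFailed,
}

/// Computes the value reported as `el_offline`: true if the status is
/// `Offline` or `AuthFailed`, or if the most recent `newPayload` call errored.
pub fn el_offline(status: EngineStatus, last_new_payload_errored: bool) -> bool {
    matches!(status, EngineStatus::Offline | EngineStatus::AuthFailed)
        || last_new_payload_errored
}

/// Hypothetical view of a beacon node from the validator client's side.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Candidate {
    pub name: String,
    pub is_syncing: bool,
    pub el_offline: bool,
}

impl Candidate {
    /// A node whose EL is offline is treated as unsynced.
    pub fn is_synced(&self) -> bool {
        !self.is_syncing && !self.el_offline
    }
}

/// Puts synced nodes first and keeps unsynced nodes as fallbacks.
/// The sort is stable, so the original order is preserved within each group.
pub fn order_candidates(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    candidates.sort_by_key(|c| !c.is_synced());
    candidates
}
```

Sorting on a boolean key keeps every node in the list, which matches the stated intent that nodes with offline ELs are still used, just later.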
## Why track `newPayload` errors?
Tracking the EL's online/offline status is too coarse-grained to be useful in practice, because:
- If the EL is timing out on some calls, it's unlikely to time out on the `upcheck` call, which is _just_ `eth_syncing`. Every failed call is followed by an upcheck [here](693886b941/beacon_node/execution_layer/src/engines.rs (L372-L380)), which would have the effect of masking the failure and keeping the status _online_.
- The `newPayload` call is the most likely to time out. It's the call in which ELs tend to do most of their work (often 1-2 seconds), with `forkchoiceUpdated` usually returning much faster (<50ms).
- If `newPayload` is failing consistently (e.g. timing out) then this is a good indication that either the node's EL is in trouble, or the network as a whole is. In the first case validator clients _should_ prefer other BNs if they have one available. In the second case, all of their BNs will likely report `el_offline` and they'll just have to proceed with trying to use them.
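
A small sketch of the masking problem described above, under stated assumptions: `EngineState`, `Status` and `record_new_payload` are hypothetical names, and the model is reduced to a single status flag plus a "last `newPayload` errored" flag. It is meant only to show why the extra flag is needed, not how the beacon node actually stores this state.

```rust
/// Simplified, hypothetical online/offline status for the EL.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Online,
    Offline,
}

/// Hypothetical state tracked for one EL.
#[derive(Debug)]
pub struct EngineState {
    pub status: Status,
    pub last_new_payload_errored: bool,
}

impl EngineState {
    pub fn new() -> Self {
        EngineState {
            status: Status::Online,
            last_new_payload_errored: false,
        }
    }

    /// Records the outcome of a `newPayload` call. On failure an upcheck
    /// follows, and the status is derived from the upcheck result alone,
    /// which is what masks the original failure.
    pub fn record_new_payload(&mut self, result: Result<(), String>, upcheck_ok: bool) {
        self.last_new_payload_errored = result.is_err();
        if self.last_new_payload_errored {
            self.status = if upcheck_ok {
                Status::Online
            } else {
                Status::Offline
            };
        }
    }

    /// The status alone would report the EL as healthy after a masked
    /// failure; the extra flag keeps the failure visible.
    pub fn el_offline(&self) -> bool {
        self.status == Status::Offline || self.last_new_payload_errored
    }
}
```

In this model a `newPayload` timeout followed by a successful upcheck leaves `status` at `Online` while `el_offline()` still returns `true`, and a later successful `newPayload` clears the flag.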
## Additional Changes
- Add utility method `ForkName::latest`, which is convenient for writing tests and probably useful elsewhere too.
- Delete some stale comments from when we used to support multiple execution nodes.
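
For illustration, a helper like `ForkName::latest` could look roughly like the sketch below. The enum variants and the `list_all` helper are assumptions made for this example; only the name `ForkName::latest` comes from the description.

```rust
/// Hypothetical fork enum; the variant list is an assumption for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ForkName {
    Base,
    Altair,
    Merge,
    Capella,
}

impl ForkName {
    /// All forks in activation order (assumed helper).
    pub fn list_all() -> Vec<ForkName> {
        vec![
            ForkName::Base,
            ForkName::Altair,
            ForkName::Merge,
            ForkName::Capella,
        ]
    }

    /// The most recent fork, so tests need not hard-code a fork name.
    pub fn latest() -> ForkName {
        *ForkName::list_all()
            .last()
            .expect("the fork list is never empty")
    }
}
```

A test written against `ForkName::latest()` keeps targeting the newest fork when another variant is appended to the list.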
@@ -126,6 +126,7 @@ impl<T: EthSpec> MockServer<T> {
             hook: <_>::default(),
             new_payload_statuses: <_>::default(),
             fcu_payload_statuses: <_>::default(),
+            syncing_response: Arc::new(Mutex::new(Ok(false))),
             engine_capabilities: Arc::new(RwLock::new(DEFAULT_ENGINE_CAPABILITIES)),
             _phantom: PhantomData,
         });
@@ -414,14 +415,25 @@ impl<T: EthSpec> MockServer<T> {
         self.ctx
             .new_payload_statuses
             .lock()
-            .insert(block_hash, status);
+            .insert(block_hash, Ok(status));
     }
 
     pub fn set_fcu_payload_status(&self, block_hash: ExecutionBlockHash, status: PayloadStatusV1) {
         self.ctx
             .fcu_payload_statuses
             .lock()
-            .insert(block_hash, status);
+            .insert(block_hash, Ok(status));
     }
+
+    pub fn set_new_payload_error(&self, block_hash: ExecutionBlockHash, error: String) {
+        self.ctx
+            .new_payload_statuses
+            .lock()
+            .insert(block_hash, Err(error));
+    }
+
+    pub fn set_syncing_response(&self, res: Result<bool, String>) {
+        *self.ctx.syncing_response.lock() = res;
+    }
 }
 
@@ -478,8 +490,11 @@ pub struct Context<T: EthSpec> {
     //
     // This is a more flexible and less stateful alternative to `static_new_payload_response`
     // and `preloaded_responses`.
-    pub new_payload_statuses: Arc<Mutex<HashMap<ExecutionBlockHash, PayloadStatusV1>>>,
-    pub fcu_payload_statuses: Arc<Mutex<HashMap<ExecutionBlockHash, PayloadStatusV1>>>,
+    pub new_payload_statuses:
+        Arc<Mutex<HashMap<ExecutionBlockHash, Result<PayloadStatusV1, String>>>>,
+    pub fcu_payload_statuses:
+        Arc<Mutex<HashMap<ExecutionBlockHash, Result<PayloadStatusV1, String>>>>,
+    pub syncing_response: Arc<Mutex<Result<bool, String>>>,
 
     pub engine_capabilities: Arc<RwLock<EngineCapabilities>>,
     pub _phantom: PhantomData<T>,
@@ -489,14 +504,14 @@ impl<T: EthSpec> Context<T> {
     pub fn get_new_payload_status(
         &self,
         block_hash: &ExecutionBlockHash,
-    ) -> Option<PayloadStatusV1> {
+    ) -> Option<Result<PayloadStatusV1, String>> {
         self.new_payload_statuses.lock().get(block_hash).cloned()
     }
 
     pub fn get_fcu_payload_status(
         &self,
         block_hash: &ExecutionBlockHash,
-    ) -> Option<PayloadStatusV1> {
+    ) -> Option<Result<PayloadStatusV1, String>> {
         self.fcu_payload_statuses.lock().get(block_hash).cloned()
     }
 }
 
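
The change to the mock server boils down to storing a `Result` per block hash, so that a test can make the mock return an error for a specific payload. A self-contained sketch of that pattern is shown below. `MockContext`, `PayloadStatus` and the `[u8; 32]` hash type are simplified stand-ins invented for this example, and it uses `std::sync::Mutex` (hence the `unwrap()` after `lock()`), which may differ from the mutex type used in the diff.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Simplified stand-in for `ExecutionBlockHash` (assumption).
pub type BlockHash = [u8; 32];

/// Simplified stand-in for `PayloadStatusV1` (assumption).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum PayloadStatus {
    Valid,
    Invalid,
    Syncing,
}

/// Hypothetical reduced version of the mock server's context.
#[derive(Clone, Default)]
pub struct MockContext {
    new_payload_statuses: Arc<Mutex<HashMap<BlockHash, Result<PayloadStatus, String>>>>,
}

impl MockContext {
    /// Registers a successful status for `block_hash`.
    pub fn set_new_payload_status(&self, block_hash: BlockHash, status: PayloadStatus) {
        self.new_payload_statuses
            .lock()
            .unwrap()
            .insert(block_hash, Ok(status));
    }

    /// Registers an error to be returned for `block_hash`.
    pub fn set_new_payload_error(&self, block_hash: BlockHash, error: String) {
        self.new_payload_statuses
            .lock()
            .unwrap()
            .insert(block_hash, Err(error));
    }

    /// Returns `None` if nothing was registered for `block_hash`.
    pub fn get_new_payload_status(
        &self,
        block_hash: &BlockHash,
    ) -> Option<Result<PayloadStatus, String>> {
        self.new_payload_statuses
            .lock()
            .unwrap()
            .get(block_hash)
            .cloned()
    }
}
```

Keeping the outer `Option` separate from the inner `Result` lets a caller distinguish "no response configured for this hash" from "respond to this hash with an error".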