mirror of
https://github.com/sigp/lighthouse.git
synced 2026-03-10 12:11:59 +00:00
Validator registration request failures do not cause us to mark BNs offline (#3488)
## Issue Addressed Relates to https://github.com/sigp/lighthouse/issues/3416 ## Proposed Changes - Add an `OfflineOnFailure` enum to the `first_success` method for querying beacon nodes so that a val registration request failure from the BN -> builder does not result in the BN being marked offline. This seems important because these failures could be coming directly from a connected relay and actually have no bearing on BN health. Other messages that are sent to a relay have a local fallback so shouldn't result in errors - Downgrade the following log to a `WARN` ``` ERRO Unable to publish validator registrations to the builder network, error: All endpoints failed https://BN_B => RequestFailed(ServerMessage(ErrorMessage { code: 500, message: "UNHANDLED_ERROR: BuilderMissing", stacktraces: [] })), https://XXXX/ => Unavailable(Offline), [omitted] ``` ## Additional Info I think this change at least improves the UX of having a VC connected to some builder and some non-builder beacon nodes. I think we need to balance potentially alerting users that there is a BN <> VC misconfiguration and also allowing this type of fallback to work. If we want to fully support this type of configuration we may want to consider adding a flag `--builder-beacon-nodes` and track whether a VC should be making builder queries on a per-beacon node basis. But I think the changes in this PR are independent of that type of extension. PS: Sorry for the big diff here, it's mostly formatting changes after I added a new arg to a bunch of methods calls. Co-authored-by: realbigsean <sean@sigmaprime.io>
This commit is contained in:
@@ -70,6 +70,13 @@ pub enum RequireSynced {
|
||||
No,
|
||||
}
|
||||
|
||||
/// Indicates if a beacon node should be set to `Offline` if a request fails.
|
||||
#[derive(PartialEq, Clone, Copy)]
|
||||
pub enum OfflineOnFailure {
|
||||
Yes,
|
||||
No,
|
||||
}
|
||||
|
||||
impl PartialEq<bool> for RequireSynced {
|
||||
fn eq(&self, other: &bool) -> bool {
|
||||
if *other {
|
||||
@@ -387,6 +394,7 @@ impl<T: SlotClock, E: EthSpec> BeaconNodeFallback<T, E> {
|
||||
pub async fn first_success<'a, F, O, Err, R>(
|
||||
&'a self,
|
||||
require_synced: RequireSynced,
|
||||
offline_on_failure: OfflineOnFailure,
|
||||
func: F,
|
||||
) -> Result<O, AllErrored<Err>>
|
||||
where
|
||||
@@ -415,7 +423,9 @@ impl<T: SlotClock, E: EthSpec> BeaconNodeFallback<T, E> {
|
||||
// There exists a race condition where the candidate may have been marked
|
||||
// as ready between the `func` call and now. We deem this an acceptable
|
||||
// inefficiency.
|
||||
$candidate.set_offline().await;
|
||||
if matches!(offline_on_failure, OfflineOnFailure::Yes) {
|
||||
$candidate.set_offline().await;
|
||||
}
|
||||
errors.push(($candidate.beacon_node.to_string(), Error::RequestFailed(e)));
|
||||
inc_counter_vec(&ENDPOINT_ERRORS, &[$candidate.beacon_node.as_ref()]);
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user