status lock in the VC (#3022)
## Issue Addressed Addresses https://github.com/sigp/lighthouse/issues/2926 ## Proposed Changes Appropriated from https://github.com/sigp/lighthouse/issues/2926#issuecomment-1039676768: When a node returns *any* error we call [`CandidateBeaconNode::set_offline`](c3a793fd73/validator_client/src/beacon_node_fallback.rs (L424)) which sets it's `status` to `CandidateError::Offline`. That node will then be ignored until the routine [`fallback_updater_service`](c3a793fd73/validator_client/src/beacon_node_fallback.rs (L44)) manages to reconnect to it. However, I believe there was an issue in the [`CanidateBeaconNode::refesh_status`](c3a793fd73/validator_client/src/beacon_node_fallback.rs (L157-L178)) method, which is used by the updater service to see if the node has come good again. It was holding a [write lock on the `status` field](c3a793fd73/validator_client/src/beacon_node_fallback.rs (L165)) whilst it polled the node status. This means a long timeout would hog the write lock and starve other processes. When a VC is trying to access a beacon node for whatever purpose (getting duties, posting blocks, etc), it performs [three passes](c3a793fd73/validator_client/src/beacon_node_fallback.rs (L432-L482)) through the lists of nodes, trying to run some generic `function` (closure, lambda, etc) on each node: - 1st pass: only try running `function` on all nodes which are both synced and online. - 2nd pass: try running `function` on all nodes that are online, but not necessarily synced. - 3rd pass: for each offline node, try refreshing its status and then running `function` on it. So, it turns out that if the `CanidateBeaconNode::refesh_status` function from the routine update service is hogging the write-lock, the 1st pass gets blocked whilst trying to read the status of the first node. So, nodes that should be left until the 3rd pass are blocking the process of the 1st and 2nd passes, hence the behaviour described in #2926. ## Additional Info NA
Lighthouse: Ethereum 2.0
An open-source Ethereum 2.0 client, written in Rust and maintained by Sigma Prime.
Overview
Lighthouse is:
- Ready for use on Eth2 mainnet.
- Fully open-source, licensed under Apache 2.0.
- Security-focused. Fuzzing techniques have been continuously applied and several external security reviews have been performed.
- Built in Rust, a modern language providing unique safety guarantees and excellent performance (comparable to C++).
- Funded by various organisations, including Sigma Prime, the Ethereum Foundation, ConsenSys, the Decentralization Foundation and private individuals.
- Actively involved in the specification and security analysis of the Ethereum 2.0 specification.
Eth2 Deposit Contract
The Lighthouse team acknowledges
0x00000000219ab540356cBB839Cbe05303d7705Fa
as the canonical Eth2 deposit contract address.
Documentation
The Lighthouse Book contains information for users and developers.
The Lighthouse team maintains a blog at lighthouse.sigmaprime.io which contains periodical progress updates, roadmap insights and interesting findings.
Branches
Lighthouse maintains two permanent branches:
stable: Always points to the latest stable release.- This is ideal for most users.
unstable: Used for development, contains the latest PRs.- Developers should base their PRs on this branch.
Contributing
Lighthouse welcomes contributors.
If you are looking to contribute, please head to the Contributing section of the Lighthouse book.
Contact
The best place for discussion is the Lighthouse Discord server.
Sign up to the Lighthouse Development Updates mailing list for email notifications about releases, network status and other important information.
Encrypt sensitive messages using our PGP key.
Donations
Lighthouse is an open-source project and a public good. Funding public goods is hard and we're grateful for the donations we receive from the community via:
- Gitcoin Grants.
- Ethereum address:
0x25c4a76E7d118705e7Ea2e9b7d8C59930d8aCD3b(donation.sigmaprime.eth).
