## Proposed Changes
* Implement the validator client and HTTP API changes necessary to support Altair
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed
Resolves#2069
## Proposed Changes
- Adds a `--doppelganger-detection` flag
- Adds a `lighthouse/seen_validators` endpoint, which will make it so the lighthouse VC is not interopable with other client beacon nodes if the `--doppelganger-detection` flag is used, but hopefully this will become standardized. Relevant Eth2 API repo issue: https://github.com/ethereum/eth2.0-APIs/issues/64
- If the `--doppelganger-detection` flag is used, the VC will wait until the beacon node is synced, and then wait an additional 2 epochs. The reason for this is to make sure the beacon node is able to subscribe to the subnets our validators should be attesting on. I think an alternative would be to have the beacon node subscribe to all subnets for 2+ epochs on startup by default.
## Additional Info
I'd like to add tests and would appreciate feedback.
TODO: handle validators started via the API, potentially make this default behavior
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
Resolves#2313
## Proposed Changes
Provide `BeaconNodeHttpClient` with a dedicated `Timeouts` struct.
This will allow granular adjustment of the timeout duration for different calls made from the VC to the BN. These can either be a constant value, or as a ratio of the slot duration.
Improve timeout performance by using these adjusted timeout duration's only whenever a fallback endpoint is available.
Add a CLI flag called `use-long-timeouts` to revert to the old behavior.
## Additional Info
Additionally set the default `BeaconNodeHttpClient` timeouts to the be the slot duration of the network, rather than a constant 12 seconds. This will allow it to adjust to different network specifications.
Co-authored-by: Paul Hauner <paul@paulhauner.com>
## Issue Addressed
Closes#2052
## Proposed Changes
- Refactor the attester/proposer duties endpoints in the BN
- Performance improvements
- Fixes some potential inconsistencies with the dependent root fields.
- Removes `http_api::beacon_proposer_cache` and just uses the one on the `BeaconChain` instead.
- Move the code for the proposer/attester duties endpoints into separate files, for readability.
- Refactor the `DutiesService` in the VC
- Required to reduce the delay on broadcasting new blocks.
- Gets rid of the `ValidatorDuty` shim struct that came about when we adopted the standard API.
- Separate block/attestation duty tasks so that they don't block each other when one is slow.
- In the VC, use `PublicKeyBytes` to represent validators instead of `PublicKey`. `PublicKey` is a legit crypto object whilst `PublicKeyBytes` is just a byte-array, it's much faster to clone/hash `PublicKeyBytes` and this change has had a significant impact on runtimes.
- Unfortunately this has created lots of dust changes.
- In the BN, store `PublicKeyBytes` in the `beacon_proposer_cache` and allow access to them. The HTTP API always sends `PublicKeyBytes` over the wire and the conversion from `PublicKey` -> `PublickeyBytes` is non-trivial, especially when queries have 100s/1000s of validators (like Pyrmont).
- Add the `state_processing::state_advance` mod which dedups a lot of the "apply `n` skip slots to the state" code.
- This also fixes a bug with some functions which were failing to include a state root as per [this comment](072695284f/consensus/state_processing/src/state_advance.rs (L69-L74)). I couldn't find any instance of this bug that resulted in anything more severe than keying a shuffling cache by the wrong block root.
- Swap the VC block service to use `mpsc` from `tokio` instead of `futures`. This is consistent with the rest of the code base.
~~This PR *reduces* the size of the codebase 🎉~~ It *used* to reduce the size of the code base before I added more comments.
## Observations on Prymont
- Proposer duties times down from peaks of 450ms to consistent <1ms.
- Current epoch attester duties times down from >1s peaks to a consistent 20-30ms.
- Block production down from +600ms to 100-200ms.
## Additional Info
- ~~Blocked on #2241~~
- ~~Blocked on #2234~~
## TODO
- [x] ~~Refactor this into some smaller PRs?~~ Leaving this as-is for now.
- [x] Address `per_slot_processing` roots.
- [x] Investigate slow next epoch times. Not getting added to cache on block processing?
- [x] Consider [this](072695284f/beacon_node/store/src/hot_cold_store.rs (L811-L812)) in the scenario of replacing the state roots
Co-authored-by: pawan <pawandhananjay@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Proposed Changes
Prune the slashing protection database so that it doesn't exhibit unbounded growth. Prune by dropping attestations and blocks from more than 512 epochs ago, relying on the guards that prevent signing messages with slots or epochs less than the minimum recorded in the DB.
The pruning process is potentially time consuming, so it's scheduled to run only every 512 epochs, in the last 2/3rds of a slot. This gives it at least 4 seconds to run without impacting other signing, which I think should be sufficient. I've seen it run for several minutes (yikes!) on our Pyrmont nodes, but I suspect that 1) this will only occur on the first run when the database is still huge 2) no other production users will be impacted because they don't have enough validators per node.
Pruning also happens at start-up, as I figured this is a fairly infrequent event, and if a user is experiencing problems with the VC related to pruning, it's nice to be able to trigger it with a quick restart. Users are also conditioned to not mind missing a few attestations during a restart.
We need to include a note in the release notes that users may see the message `timed out waiting for connection` the first time they prune a huge database, but that this is totally fine and to be expected (the VC will miss those attestations in the meantime).
I'm also open to making this opt-in for now, although the sooner we get users doing it, the less painful it will be: prune early, prune often!
## Issue Addressed
- Resolves#1883
## Proposed Changes
This follows on from @blacktemplar's work in #2018.
- Allows the VC to connect to multiple BN for redundancy.
- Update the simulator so some nodes always need to rely on their fallback.
- Adds some extra deprecation warnings for `--eth1-endpoint`
- Pass `SignatureBytes` as a reference instead of by value.
## Additional Info
NA
Co-authored-by: blacktemplar <blacktemplar@a1.net>
## Issue Addressed
NA
## Proposed Changes
- Adds a HTTP server to the VC which provides Prometheus metrics.
- Moves the health metrics into the `lighthouse_metrics` crate so it can be shared between BN/VC.
- Sprinkle some metrics around the VC.
- Update the book to indicate that we now have VC metrics.
- Shifts the "waiting for genesis" logic later in the `ProductionValidatorClient::new_from_cli`
- This is worth attention during the review.
## Additional Info
- ~~`clippy` has some new lints that are failing. I'll deal with that in another PR.~~