## Proposed Changes
Clippy 1.67.0 put us on blast for the size of some of our errors, most of them written by me ( 👀 ). This PR shrinks the size of `BeaconChainError` by dropping some extraneous info and boxing an inner error which should only occur infrequently anyway.
For the `AttestationSlashInfo` and `BlockSlashInfo` I opted to ignore the lint as they are always used in a `Result<A, Info>` where `A` is a similar size. This means they don't bloat the size of the `Result`, so it's a bit annoying for Clippy to report this as an issue.
I also chose to ignore `clippy::uninlined-format-args` because I think the benefit-to-churn ratio is too low. E.g. sometimes we have long identifiers in `format!` args and IMO the non-inlined form is easier to read:
```rust
// I prefer this...
format!(
"{} did {} to {}",
REALLY_LONG_CONSTANT_NAME,
ANOTHER_REALLY_LONG_CONSTANT_NAME,
regular_long_identifier_name
);
// To this
format!("{REALLY_LONG_CONSTANT_NAME} did {ANOTHER_REALLY_LONG_CONSTANT_NAME} to {regular_long_identifier_name}");
```
I tried generating an automatic diff with `cargo clippy --fix` but it came out at:
```
250 files changed, 1209 insertions(+), 1469 deletions(-)
```
Which seems like a bad idea when we'd have to back-merge it to `capella` and `eip4844` 😱
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/2327
## Proposed Changes
This is an extension of some ideas I implemented while working on `tree-states`:
- Cache the indexed attestations from blocks in the `ConsensusContext`. Previously we were re-computing them 3-4 times over.
- Clean up `import_block` by splitting each part into `import_block_XXX`.
- Move some stuff off hot paths, specifically:
- Relocate non-essential tasks that were running between receiving the payload verification status and priming the early attester cache. These tasks are moved after the cache priming:
- Attestation observation
- Validator monitor updates
- Slasher updates
- Updating the shuffling cache
- Fork choice attestation observation now happens at the end of block verification in parallel with payload verification (this seems to save 5-10ms).
- Payload verification now happens _before_ advancing the pre-state and writing it to disk! States were previously being written eagerly and adding ~20-30ms in front of verifying the execution payload. State catchup also sometimes takes ~500ms if we get a cache miss and need to rebuild the tree hash cache.
The remaining task that's taking substantial time (~20ms) is importing the block to fork choice. I _think_ this is because of pull-tips, and we should be able to optimise it out with a clever total active balance cache in the state (which would be computed in parallel with payload verification). I've decided to leave that for future work though. For now it can be observed via the new `beacon_block_processing_post_exec_pre_attestable_seconds` metric.
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/2371
## Proposed Changes
Backport some changes from `tree-states` that remove duplicated calculations of the `proposer_index`.
With this change the proposer index should be calculated only once for each block, and then plumbed through to every place it is required.
## Additional Info
In future I hope to add more data to the consensus context that is cached on a per-epoch basis, like the effective balances of validators and the base rewards.
There are some other changes to remove indexing in tests that were also useful for `tree-states` (the `tree-states` types don't implement `Index`).
## Issue Addressed
While digging around in some logs I noticed that queries for validators by pubkey were taking 10ms+, which seemed too long. This was due to a loop through the entire validator registry for each lookup.
## Proposed Changes
Rather than using a loop through the register, this PR utilises the pubkey cache which is usually initialised at the head*. In case the cache isn't built, we fall back to the previous loop logic. In the vast majority of cases I expect the cache will be built, as the validator client queries at the `head` where all caches should be built.
## Additional Info
*I had to modify the cache build that runs after fork choice to build the pubkey cache. I think it had been optimised out, perhaps accidentally. I think it's preferable to have the exit cache and the pubkey cache built on the head state, as they are required for verifying deposits and exits respectively, and we may as well build them off the hot path of block processing. Previously they'd get built the first time a deposit or exit needed to be verified.
I've deleted the unused `map_state` function which was obsoleted by `map_state_and_execution_optimistic`.
## Issue Addressed
NA
## Proposed Changes
Adds a test that was written whilst doing some testing. This PR does not make changes to production code, it just adds a test for already existing functionality.
## Additional Info
NA
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)) it uses [`getStateRootFromBlockRoot`](de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
## Issue Addressed
NA
## Proposed Changes
- Uses a `Vec` in `SingleEpochParticipationCache` rather than `HashMap` to speed up processing times at the cost of memory usage.
- Cache the result of `integer_sqrt` rather than recomputing for each validator.
- Cache `state.previous_epoch` rather than recomputing it for each validator.
### Benchmarks
Benchmarks on a recent mainnet state using #3252 to get timing.
#### Without this PR
```
lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10
[2022-06-09T08:21:02Z INFO lcli::skip_slots] Using mainnet spec
[2022-06-09T08:21:02Z INFO lcli::skip_slots] Advancing 32 slots
[2022-06-09T08:21:02Z INFO lcli::skip_slots] Doing 10 runs
[2022-06-09T08:21:02Z INFO lcli::skip_slots] State path: "/tmp/state-0x3cdc.ssz"
SSZ decoding /tmp/state-0x3cdc.ssz: 43ms
[2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 0: 245.718794ms
[2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 1: 245.364782ms
[2022-06-09T08:21:03Z INFO lcli::skip_slots] Run 2: 255.866179ms
[2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 3: 243.838909ms
[2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 4: 250.431425ms
[2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 5: 248.68765ms
[2022-06-09T08:21:04Z INFO lcli::skip_slots] Run 6: 262.051113ms
[2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 7: 264.293967ms
[2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 8: 293.202007ms
[2022-06-09T08:21:05Z INFO lcli::skip_slots] Run 9: 264.552017ms
```
#### With this PR:
```
lcli skip-slots --state-path /tmp/state-0x3cdc.ssz --partial-state-advance --slots 32 --state-root 0x3cdc33cd02713d8d6cc33a6dbe2d3a5bf9af1d357de0d175a403496486ff845e --runs 10
[2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 0: 73.898678ms
[2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 1: 75.536978ms
[2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 2: 75.176104ms
[2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 3: 76.460828ms
[2022-06-09T08:57:59Z INFO lcli::skip_slots] Run 4: 75.904195ms
[2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 5: 75.53077ms
[2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 6: 74.745572ms
[2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 7: 75.823489ms
[2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 8: 74.892055ms
[2022-06-09T08:58:00Z INFO lcli::skip_slots] Run 9: 76.333569ms
```
## Additional Info
NA
## Issue Addressed
MEV boost compatibility
## Proposed Changes
See #2987
## Additional Info
This is blocked on the stabilization of a couple specs, [here](https://github.com/ethereum/beacon-APIs/pull/194) and [here](https://github.com/flashbots/mev-boost/pull/20).
Additional TODO's and outstanding questions
- [ ] MEV boost JWT Auth
- [ ] Will `builder_proposeBlindedBlock` return the revealed payload for the BN to propogate
- [ ] Should we remove `private-tx-proposals` flag and communicate BN <> VC with blinded blocks by default once these endpoints enter the beacon-API's repo? This simplifies merge transition logic.
Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
NA
## Proposed Changes
Adds the functionality to allow blocks to be validated/invalidated after their import as per the [optimistic sync spec](https://github.com/ethereum/consensus-specs/blob/dev/sync/optimistic.md#how-to-optimistically-import-blocks). This means:
- Updating `ProtoArray` to allow flipping the `execution_status` of ancestors/descendants based on payload validity updates.
- Creating separation between `execution_layer` and the `beacon_chain` by creating a `PayloadStatus` struct.
- Refactoring how the `execution_layer` selects a `PayloadStatus` from the multiple statuses returned from multiple EEs.
- Adding testing framework for optimistic imports.
- Add `ExecutionBlockHash(Hash256)` new-type struct to avoid confusion between *beacon block roots* and *execution payload hashes*.
- Add `merge` to [`FORKS`](c3a793fd73/Makefile (L17)) in the `Makefile` to ensure we test the beacon chain with merge settings.
- Fix some tests here that were failing due to a missing execution layer.
## TODO
- [ ] Balance tests
Co-authored-by: Mark Mackey <mark@sigmaprime.io>
## Proposed Changes
Lots of lint updates related to `flat_map`, `unwrap_or_else` and string patterns. I did a little more creative refactoring in the op pool, but otherwise followed Clippy's suggestions.
## Additional Info
We need this PR to unblock CI.