This PR fixes a bug where the wrong columns could be processed and persisted immediately after a CGC (custody group count) increase.
Scenario:
- The node's CGC increases because additional validators are attached to it (let's say from 10 to 11).
- The new CGC is advertised and the new subnets are subscribed to immediately; however, the change does not take effect in the data availability check until the next epoch (see [this](ab0e8870b4/beacon_node/beacon_chain/src/validator_custody.rs (L93-L99))). The data availability checker therefore still requires only 10 columns for the current epoch.
- During this window, data for the additional custody column (let's say column 11) may arrive via gossip, since we are already subscribed to the topic. It may be incorrectly used to satisfy the existing data availability requirement (10 columns), so this additional column gets persisted instead of a required one, leaving the database inconsistent.
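A minimal sketch of the guard this fix implies, using illustrative types rather than the actual Lighthouse data availability checker API: a gossiped column only counts towards (and is only persisted for) the availability requirement if its index is among the columns required by the *effective* CGC for that epoch.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the per-epoch requirement derived from the effective CGC.
struct ColumnRequirement {
    required_columns: HashSet<u64>,
}

impl ColumnRequirement {
    /// Only columns the current requirement actually needs may satisfy availability.
    fn counts_towards_availability(&self, column_index: u64) -> bool {
        self.required_columns.contains(&column_index)
    }
}

fn main() {
    // Effective CGC is still 10 (columns 0..=9), even though we already subscribed
    // to the topic for the extra column after the CGC increase.
    let requirement = ColumnRequirement {
        required_columns: (0..10).collect(),
    };
    assert!(requirement.counts_towards_availability(3));
    // The newly-custodied column must not count towards the old requirement.
    assert!(!requirement.counts_towards_availability(10));
}
```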
Fix Clippy for the recently released Rust 1.90 beta. There may be more changes required when Rust 1.89 stable is released in a few days, but possibly not 🤞
Resolves #6767
This PR implements a basic version of validator custody.
- It introduces a new `CustodyContext` object which holds information about the number of validators attached to the node and the custody count they contribute to the CGC.
- The `CustodyContext` is added to the `da_checker` and has methods for returning the current CGC and the number of columns to sample at head. Note that the logic for returning the CGC previously lived in the network globals.
- To estimate the number of attached validators, we use the `beacon_committee_subscriptions` endpoint. This might overestimate the number of validators actually publishing attestations from the node in multi-BN setups. We could also potentially use the `publish_attestations` endpoint at a later point to get a more conservative estimate.
- Any time the `custody_group_count` changes due to the addition/removal of validators, the custody context sends an event on a broadcast channel. The only subscriber for the channel currently lives in the network service, which simply subscribes to more subnets. Additional subscribers can be added in sync to start a backfill once the CGC changes.
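A rough sketch of the `CustodyContext` flow described above; the field names, channel payload, and `register_validators` method are assumptions for illustration, not the actual API:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use tokio::sync::broadcast;

pub struct CustodyContext {
    /// Custody group count driven by attached validators (illustrative field name).
    validator_custody_count: AtomicU64,
    /// Broadcasts the new CGC whenever it increases.
    cgc_tx: broadcast::Sender<u64>,
}

impl CustodyContext {
    pub fn new(initial_cgc: u64) -> Self {
        let (cgc_tx, _) = broadcast::channel(16);
        Self {
            validator_custody_count: AtomicU64::new(initial_cgc),
            cgc_tx,
        }
    }

    /// Called when the set of attached validators changes; notifies only on an increase.
    pub fn register_validators(&self, new_cgc: u64) {
        let old = self.validator_custody_count.swap(new_cgc, Ordering::SeqCst);
        if new_cgc > old {
            // The network service subscriber picks this up and subscribes to more subnets.
            let _ = self.cgc_tx.send(new_cgc);
        }
    }

    pub fn subscribe(&self) -> broadcast::Receiver<u64> {
        self.cgc_tx.subscribe()
    }
}
```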
TODO
- [ ] **NOT REQUIRED:** Currently, the logic only handles an increase in validator count and does not handle a decrease. We should ideally unsubscribe from subnets when the CGC has decreased.
- [ ] **NOT REQUIRED:** Add a service in the `CustodyContext` that emits an event once `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS` passes after updating the current CGC. This event should be picked up by a subscriber which updates the ENR and metadata.
- [x] Add more tests
Backport of:
- https://github.com/sigp/lighthouse/pull/7067
For:
- https://github.com/sigp/lighthouse/issues/7039
- Prevent writing to state cache when migrating the database
- Add `state-cache-headroom` flag to control pruning
- Prune old epoch boundary states ahead of mid-epoch states
- Never prune head block's state
- Avoid caching ancestor states unless they are on an epoch boundary
- Log when states enter/exit the cache
Co-authored-by: Eitan Seri-Levi <eserilev@ucsc.edu>
* Get blobs from EL.
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
* Avoid cloning blobs after fetching blobs.
* Address review comments and refactor code.
* Fix lint.
* Move blob computation metric to the right spot.
* Merge branch 'unstable' into das-fetch-blobs
* Merge branch 'unstable' into das-fetch-blobs
# Conflicts:
# beacon_node/beacon_chain/src/beacon_chain.rs
# beacon_node/beacon_chain/src/block_verification.rs
# beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
* Merge branch 'unstable' into das-fetch-blobs
# Conflicts:
# beacon_node/beacon_chain/src/beacon_chain.rs
* Gradual publication of data columns for supernodes.
* Recompute head after importing block with blobs from the EL.
* Fix lint
* Merge branch 'unstable' into das-fetch-blobs
* Use blocking task instead of async when computing cells.
* Merge branch 'das-fetch-blobs' of github.com:jimmygchen/lighthouse into das-fetch-blobs
* Merge remote-tracking branch 'origin/unstable' into das-fetch-blobs
* Fix semantic conflicts
* Downgrade error log.
* Merge branch 'unstable' into das-fetch-blobs
# Conflicts:
# beacon_node/beacon_chain/src/data_availability_checker.rs
# beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
# beacon_node/execution_layer/src/engine_api.rs
# beacon_node/execution_layer/src/engine_api/json_structures.rs
# beacon_node/network/src/network_beacon_processor/gossip_methods.rs
# beacon_node/network/src/network_beacon_processor/mod.rs
# beacon_node/network/src/network_beacon_processor/sync_methods.rs
* Merge branch 'unstable' into das-fetch-blobs
* Publish block without waiting for blob and column proof computation.
* Address review comments and refactor.
* Merge branch 'unstable' into das-fetch-blobs
* Fix test and docs.
* Comment cleanups.
* Merge branch 'unstable' into das-fetch-blobs
* Address review comments and cleanup
* Address review comments and cleanup
* Refactor to de-duplicate gradual publication logic.
* Add more logging.
* Merge remote-tracking branch 'origin/unstable' into das-fetch-blobs
# Conflicts:
# Cargo.lock
* Fix incorrect comparison on `num_fetched_blobs`.
* Implement gradual blob publication.
* Merge branch 'unstable' into das-fetch-blobs
* Inline `publish_fn`.
* Merge branch 'das-fetch-blobs' of github.com:jimmygchen/lighthouse into das-fetch-blobs
* Gossip verify blobs before publishing
* Avoid queries for 0 blobs and error for duplicates
* Gossip verified engine blob before processing them, and use observe cache to detect duplicates before publishing.
* Merge branch 'das-fetch-blobs' of github.com:jimmygchen/lighthouse into das-fetch-blobs
# Conflicts:
# beacon_node/network/src/network_beacon_processor/mod.rs
* Merge branch 'unstable' into das-fetch-blobs
* Fix invalid commitment inclusion proofs in blob sidecars created from EL blobs.
* Only publish EL blobs triggered from gossip block, and not RPC block.
* Downgrade gossip blob log to `debug`.
* Merge branch 'unstable' into das-fetch-blobs
* Merge branch 'unstable' into das-fetch-blobs
* Grammar
* start splitting gossip verification
* WIP
* Gossip verify separate (#7)
* save
* save
* make ProvenancedBlock concrete
* delete into gossip verified block contents
* get rid of IntoBlobSidecar trait
* remove IntoGossipVerified trait
* get tests compiling
* don't check sidecar slashability in publish
* remove second publish closure
* drop blob bool. also prefer using message index over index of position in list
* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate
* Fix low-hanging tests
* Fix tests and clean up
* Clean up imports
* more cleanup
* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate
* Further refine behaviour and add tests
* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate
* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate
* Remove empty line
* Fix test (block is not fully imported just gossip verified)
* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate
* Update for unstable & use empty blob list
* Update comment
* Add test for duplicate block case
* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate
* Clarify unreachable case
* Fix another publish_block case
* Remove unreachable case in filter chain segment
* Revert unrelated blob optimisation
* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate
* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate
* Fix merge conflicts
* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate
* Fix some compilation issues. Impl is fucked though
* Support peerDAS
* Fix tests
* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate
* Fix conflict
* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate
* Address review comments
* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate
* Attestation superstruct changes for EIP 7549 (#5644)
* update
* experiment
* superstruct changes
* revert
* superstruct changes
* fix tests
* indexed attestation
* indexed attestation superstruct
* updated TODOs
* `superstruct` the `AttesterSlashing` (#5636)
* `superstruct` Attester Fork Variants
* Push a little further
* Deal with Encode / Decode of AttesterSlashing
* not so sure about this..
* Stop Encode/Decode Bounds from Propagating Out
* Tons of Changes..
* More Conversions to AttestationRef
* Add AsReference trait (#15)
* Add AsReference trait
* Fix some snafus
* Got it Compiling! :D
* Got Tests Building
* Get beacon chain tests compiling
---------
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Merge remote-tracking branch 'upstream/unstable' into electra_attestation_changes
* Make EF Tests Fork-Agnostic (#5713)
* Finish EF Test Fork Agnostic (#5714)
* Superstruct `AggregateAndProof` (#5715)
* Upgrade `superstruct` to `0.8.0`
* superstruct `AggregateAndProof`
* Merge remote-tracking branch 'sigp/unstable' into electra_attestation_changes
* cargo fmt
* Merge pull request #5726 from realbigsean/electra_attestation_changes
Merge unstable into Electra attestation changes
* EIP7549 `get_attestation_indices` (#5657)
* get attesting indices electra impl
* fmt
* get tests to pass
* fmt
* fix some beacon chain tests
* fmt
* fix slasher test
* fmt got me again
* fix more tests
* fix tests
* Some small changes (#5739)
* cargo fmt (#5740)
* Sketch op pool changes
* fix get attesting indices (#5742)
* fix get attesting indices
* better errors
* fix compile
* only get committee index once
* Ef test fixes (#5753)
* attestation related ef test fixes
* delete commented out stuff
* Fix Aggregation Pool for Electra (#5754)
* Fix Aggregation Pool for Electra
* Remove Outdated Interface
* fix ssz (#5755)
* Get `electra_op_pool` up to date (#5756)
* fix get attesting indices (#5742)
* fix get attesting indices
* better errors
* fix compile
* only get committee index once
* Ef test fixes (#5753)
* attestation related ef test fixes
* delete commented out stuff
* Fix Aggregation Pool for Electra (#5754)
* Fix Aggregation Pool for Electra
* Remove Outdated Interface
* fix ssz (#5755)
---------
Co-authored-by: realbigsean <sean@sigmaprime.io>
* Revert "Get `electra_op_pool` up to date (#5756)" (#5757)
This reverts commit ab9e58aa3d.
* Merge branch 'electra_attestation_changes' of https://github.com/sigp/lighthouse into electra_op_pool
* Compute on chain aggregate impl (#5752)
* add compute_on_chain_agg impl to op pool changes
* fmt
* get op pool tests to pass
* update the naive agg pool interface (#5760)
* Fix bugs in cross-committee aggregation
* Add comment to max cover optimisation
* Fix assert
* Merge pull request #5749 from sigp/electra_op_pool
Optimise Electra op pool aggregation
* update committee offset
* Fix Electra Fork Choice Tests (#5764)
* Subscribe to the correct subnets for electra attestations (#5782)
* subscribe to the correct att subnets for electra
* subscribe to the correct att subnets for electra
* cargo fmt
* fix slashing handling
* Merge remote-tracking branch 'upstream/unstable'
* Send unagg attestation based on fork
* Publish all aggregates
* just one more check bro plz..
* Merge pull request #5832 from ethDreamer/electra_attestation_changes_merge_unstable
Merge `unstable` into `electra_attestation_changes`
* Merge pull request #5835 from realbigsean/fix-validator-logic
Fix validator logic
* Merge pull request #5816 from realbigsean/electra-attestation-slashing-handling
Electra slashing handling
* Electra attestation changes rm decode impl (#5856)
* Remove Crappy Decode impl for Attestation
* Remove Inefficient Attestation Decode impl
* Implement Schema Upgrade / Downgrade
* Update beacon_node/beacon_chain/src/schema_change/migration_schema_v20.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
---------
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Fix failing attestation tests and misc electra attestation cleanup (#5810)
* - get attestation related beacon chain tests to pass
- observed attestations are now keyed off of data + committee index
- rename op pool attestationref to compactattestationref
- remove unwraps in agg pool and use options instead
- cherry pick some changes from ef-tests-electra
* cargo fmt
* fix failing test
* Revert dockerfile changes
* make committee_index return option
* function args shouldnt be a ref to attestation ref
* fmt
* fix dup imports
---------
Co-authored-by: realbigsean <seananderson33@GMAIL.com>
* fix some todos (#5817)
* Merge branch 'unstable' of https://github.com/sigp/lighthouse into electra_attestation_changes
* add consolidations to merkle calc for inclusion proof
* Remove Duplicate KZG Commitment Merkle Proof Code (#5874)
* Remove Duplicate KZG Commitment Merkle Proof Code
* s/tree_lists/fields/
* Merge branch 'unstable' of https://github.com/sigp/lighthouse into electra_attestation_changes
* fix compile
* Fix slasher tests (#5906)
* Fix electra tests
* Add electra attestations to double vote tests
* Update superstruct to 0.8
* Merge remote-tracking branch 'origin/unstable' into electra_attestation_changes
* Small cleanup in slasher tests
* Clean up Electra observed aggregates (#5929)
* Use consistent key in observed_attestations
* Remove unwraps from observed aggregates
* Merge branch 'unstable' of https://github.com/sigp/lighthouse into electra_attestation_changes
* De-dup attestation constructor logic
* Remove unwraps in Attestation construction
* Dedup match_attestation_data
* Remove outdated TODO
* Use ForkName Ord in fork-choice tests
* Use ForkName Ord in BeaconBlockBody
* Make to_electra not fallible
* Remove TestRandom impl for IndexedAttestation
* Remove IndexedAttestation faulty Decode impl
* Drop TestRandom impl
* Add PendingAttestationInElectra
* Indexed att on disk (#35)
* indexed att on disk
* fix lints
* Update slasher/src/migrate.rs
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
---------
Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com>
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
* add electra fork enabled fn to ForkName impl (#36)
* add electra fork enabled fn to ForkName impl
* remove inadvertent file
* Update common/eth2/src/types.rs
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
* Dedup attestation constructor logic in attester cache
* Use if let Ok for committee_bits
* Dedup Attestation constructor code
* Diff reduction in tests
* Fix beacon_chain tests
* Diff reduction
* Use Ord for ForkName in pubsub
* Resolve into_attestation_and_indices todo
* Remove stale TODO
* Fix beacon_chain tests
* Test spec invariant
* Use electra_enabled in pubsub
* Remove get_indexed_attestation_from_signed_aggregate
* Use ok_or instead of if let else
* committees are sorted
* remove dup method `get_indexed_attestation_from_committees`
* Merge pull request #5940 from dapplion/electra_attestation_changes_lionreview
Electra attestations #5712 review
* update default persisted op pool deserialization
* ensure aggregate and proof uses serde untagged on ref
* Fork aware ssz static attestation tests
* Electra attestation changes from Lions review (#5971)
* dedup/cleanup and remove unneeded hashset use
* remove irrelevant TODOs
* Merge branch 'unstable' of https://github.com/sigp/lighthouse into electra_attestation_changes
* Electra attestation changes sean review (#5972)
* instantiate empty bitlist in unreachable code
* clean up error conversion
* fork enabled bool cleanup
* remove a couple todos
* return bools instead of options in `aggregate` and use the result
* delete commented out code
* use map macros in simple transformations
* remove signers_disjoint_from
* get ef tests compiling
* get ef tests compiling
* update intentionally excluded files
* Avoid changing slasher schema for Electra
* Delete slasher schema v4
* Fix clippy
* Fix compilation of beacon_chain tests
* Update database.rs
* Add electra lightclient types
* Update slasher/src/database.rs
* fix imports
* Merge pull request #5980 from dapplion/electra-lightclient
Add electra lightclient types
* Merge pull request #5975 from michaelsproul/electra-slasher-no-migration
Avoid changing slasher schema for Electra
* Update beacon_node/beacon_chain/src/attestation_verification.rs
* Update beacon_node/beacon_chain/src/attestation_verification.rs
* add processing and processed caching to the DA checker
* move processing cache out of critical cache
* get it compiling
* fix lints
* add docs to `AvailabilityView`
* some self review
* fix lints
* fix beacon chain tests
* cargo fmt
* make availability view easier to implement, start on testing
* move child component cache and finish test
* cargo fix
* cargo fix
* cargo fix
* fmt and lint
* make blob commitments not optional, rename some caches, add missing blobs struct
* Update beacon_node/beacon_chain/src/data_availability_checker/processing_cache.rs
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
* marks review feedback and other general cleanup
* cargo fix
* improve availability view docs
* some renames
* some renames and docs
* fix should delay lookup logic
* get rid of some wrapper methods
* fix up single lookup changes
* add a couple docs
* add single blob merge method and improve process_... docs
* update some names
* lints
* fix merge
* remove blob indices from lookup creation log
* remove blob indices from lookup creation log
* delayed lookup logging improvement
* check fork choice before doing any blob processing
* remove unused dep
* Update beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Update beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Update beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Update beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Update beacon_node/network/src/sync/block_lookups/delayed_lookup.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* remove duplicate deps
* use gen range in random blobs generator
* rename processing cache fields
* require block root in rpc block construction and check block root consistency
* send peers as vec in single message
* spawn delayed lookup service from network beacon processor
* fix tests
---------
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
Closes #3210
Closes #3211
## Proposed Changes
- Checkpoint sync from the latest finalized state regardless of its alignment.
- Add the `block_root` to the database's split point. This is _only_ added to the in-memory split in order to avoid a schema migration. See `load_split`.
- Add a new method to the DB called `get_advanced_state`, which looks up a state _by block root_, with a `state_root` as fallback. Using this method prevents accidental accesses of the split's unadvanced state, which does not exist in the hot DB and is not guaranteed to exist in the freezer DB at all. Previously Lighthouse would look up this state _from the freezer DB_, even if it was required for block/attestation processing, which was suboptimal.
- Replace several state look-ups in block and attestation processing with `get_advanced_state` so that they can't hit the split block's unadvanced state.
- Do not store any states in the freezer database by default. All states will be deleted upon being evicted from the hot database unless `--reconstruct-historic-states` is set. The anchor info which was previously used for checkpoint sync is used to implement this, including when syncing from genesis.
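A hedged sketch of the `get_advanced_state` lookup described above, with the store reduced to two in-memory maps (the real method operates on the hot and freezer databases):

```rust
use std::collections::HashMap;

type Hash256 = [u8; 32];

#[derive(Clone)]
struct BeaconState;

struct Store {
    /// Advanced states keyed by the block root they descend from (hot DB).
    advanced_states: HashMap<Hash256, BeaconState>,
    /// States keyed by state root (may only exist in the freezer DB).
    states_by_root: HashMap<Hash256, BeaconState>,
}

impl Store {
    /// Look up a state by block root, falling back to a state-root lookup, so callers
    /// never accidentally load the split block's unadvanced state.
    fn get_advanced_state(&self, block_root: Hash256, state_root: Hash256) -> Option<BeaconState> {
        self.advanced_states
            .get(&block_root)
            .or_else(|| self.states_by_root.get(&state_root))
            .cloned()
    }
}
```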
## Additional Info
Needs further testing. I want to stress-test the pruned database under Hydra.
The `get_advanced_state` method is intended to become more relevant over time: `tree-states` includes an identically named method that returns advanced states from its in-memory cache.
Co-authored-by: realbigsean <seananderson33@gmail.com>
## Issue Addressed
#4118
## Proposed Changes
This PR introduces a "progressive balances" cache on the `BeaconState`, which keeps track of the accumulated target attestation balance for the current & previous epochs. The cached values are utilised by fork choice to calculate unrealized justification and finalization (instead of converting epoch participation arrays to balances for each block we receive).
This optimization will be rolled out gradually to allow for more testing. A new `--progressive-balances disabled|checked|strict|fast` flag is introduced to support this:
- `checked`: enabled with checks against participation cache, and falls back to the existing epoch processing calculation if there is a total target attester balance mismatch. There is no performance gain from this as the participation cache still needs to be computed. **This is the default mode for now.**
- `strict`: enabled with checks against participation cache, returns error if there is a mismatch. **Used for testing only**.
- `fast`: enabled with no comparative checks and without computing the participation cache. This mode gives us the performance gains from the optimization. This is still experimental and not currently recommended for production usage, but will become the default mode in a future release.
- `disabled`: disable the usage of progressive cache, and use the existing method for FFG progression calculation. This mode may be useful if we find a bug and want to stop the frequent error logs.
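A rough sketch of what the progressive balances cache tracks (field and method names are assumptions; the real cache also handles slashed validators, effective balance changes, etc.):

```rust
#[derive(Default)]
struct ProgressiveBalancesCache {
    previous_epoch_target_attesting_balance: u64,
    current_epoch_target_attesting_balance: u64,
}

impl ProgressiveBalancesCache {
    /// Called when a validator's target participation flag is newly set,
    /// instead of re-deriving balances from the participation arrays per block.
    fn on_new_target_attestation(&mut self, is_current_epoch: bool, effective_balance: u64) {
        if is_current_epoch {
            self.current_epoch_target_attesting_balance += effective_balance;
        } else {
            self.previous_epoch_target_attesting_balance += effective_balance;
        }
    }

    /// Called at the epoch transition when the current epoch becomes the previous one.
    fn on_epoch_transition(&mut self) {
        self.previous_epoch_target_attesting_balance =
            self.current_epoch_target_attesting_balance;
        self.current_epoch_target_attesting_balance = 0;
    }
}
```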
### Tasks
- [x] Initial cache implementation in `BeaconState`
- [x] Perform checks in fork choice to compare the progressive balances cache against results from `ParticipationCache`
- [x] Add CLI flag, and disable the optimization by default
- [x] Testing on Goerli & Benchmarking
- [x] Move caching logic from state processing to the `ProgressiveBalancesCache` (see [this comment](https://github.com/sigp/lighthouse/pull/4362#discussion_r1230877001))
- [x] Add attesting balance metrics
Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
## Issue Addressed
- #4293
- #4264
## Proposed Changes
*Changes largely follow those suggested in the main issue*.
- Add new routes to HTTP API
- `post_beacon_blocks_v2`
- `post_blinded_beacon_blocks_v2`
- Add new routes to `BeaconNodeHttpClient`
- `post_beacon_blocks_v2`
- `post_blinded_beacon_blocks_v2`
- Define new Eth2 common types
- `BroadcastValidation`, enum representing the level of validation to apply to blocks prior to broadcast
- `BroadcastValidationQuery`, the corresponding HTTP query string type for the above type
- ~~Define `_checked` variants of both `publish_block` and `publish_blinded_block` that enforce a validation level at a type level~~
- Add interactive tests to the `bn_http_api_tests` test target covering each validation level (to their own test module, `broadcast_validation_tests`)
- `beacon/blocks`
- `broadcast_validation=gossip`
- Invalid (400)
- Full Pass (200)
- Partial Pass (202)
- `broadcast_validation=consensus`
- Invalid (400)
- Only gossip (400)
- Only consensus pass (i.e., equivocates) (200)
- Full pass (200)
- `broadcast_validation=consensus_and_equivocation`
- Invalid (400)
- Invalid due to early equivocation (400)
- Only gossip (400)
- Only consensus (400)
- Pass (200)
- `beacon/blinded_blocks`
- `broadcast_validation=gossip`
- Invalid (400)
- Full Pass (200)
- Partial Pass (202)
- `broadcast_validation=consensus`
- Invalid (400)
- Only gossip (400)
- ~~Only consensus pass (i.e., equivocates) (200)~~
- Full pass (200)
- `broadcast_validation=consensus_and_equivocation`
- Invalid (400)
- Invalid due to early equivocation (400)
- Only gossip (400)
- Only consensus (400)
- Pass (200)
- Add a new trait, `IntoGossipVerifiedBlock`, which allows type-level guarantees to be made as to gossip validity
- Modify the structure of the `ObservedBlockProducers` cache from a `(slot, validator_index)` mapping to a `((slot, validator_index), block_root)` mapping
- Modify `ObservedBlockProducers::proposer_has_been_observed` to return a `SeenBlock` rather than a boolean on success
- Punish gossip peer (low) for submitting equivocating blocks
- Rename `BlockError::SlashablePublish` to `BlockError::SlashableProposal`
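For orientation, a minimal sketch of the new query types; the `serde` derives and exact layout here are illustrative, not the definitions added in `common/eth2`:

```rust
use serde::Deserialize;

/// Level of validation applied to a block before it is broadcast.
#[derive(Debug, Default, Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum BroadcastValidation {
    #[default]
    Gossip,
    Consensus,
    ConsensusAndEquivocation,
}

/// Query string type, e.g. `?broadcast_validation=consensus_and_equivocation`.
#[derive(Debug, Deserialize)]
pub struct BroadcastValidationQuery {
    #[serde(default)]
    pub broadcast_validation: BroadcastValidation,
}
```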
## Additional Info
This PR contains changes that directly modify how blocks are verified within the client. For more context, consult [comments in-thread](https://github.com/sigp/lighthouse/pull/4316#discussion_r1234724202).
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
## Issue Addressed
Closes #4332
## Proposed Changes
Remove the `CountUnrealized` type, defaulting unrealized justification to _on_. This fixes the #4332 issue by ensuring that importing the same block to fork choice always results in the same outcome.
Finalized sync speed may be slightly impacted by this change, but that is deemed an acceptable trade-off until the optimisation from #4118 is implemented.
TODO:
- [x] Also check that the block isn't a duplicate before importing
## Issue Addressed
#3212
## Proposed Changes
- Introduce a new `rate_limiting_backfill_queue` - any new inbound backfill work event is immediately sent to this FIFO queue **without any processing**
- Spawn a `backfill_scheduler` routine that pops a backfill event from the FIFO queue at specified intervals (currently halfway through a slot, or at 6s after slot start for 12s slots) and sends the event to `BeaconProcessor` via a `scheduled_backfill_work_tx` channel
- This channel gets polled last in the `InboundEvents`, and each work event received is wrapped in an `InboundEvent::ScheduledBackfillWork` enum variant, which gets processed immediately or queued by the `BeaconProcessor` (existing logic applies from here)
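A hedged sketch of the scheduler loop (the channel and event types are simplified stand-ins for the actual beacon processor types):

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Placeholder for the real backfill work event.
struct BackfillWorkEvent;

async fn backfill_scheduler(
    queue: Arc<Mutex<VecDeque<BackfillWorkEvent>>>,
    scheduled_backfill_work_tx: tokio::sync::mpsc::Sender<BackfillWorkEvent>,
    release_interval: Duration, // e.g. half a slot
) {
    let mut ticker = tokio::time::interval(release_interval);
    loop {
        ticker.tick().await;
        // Pop at most one queued backfill batch per interval.
        let event = queue.lock().unwrap().pop_front();
        if let Some(event) = event {
            // The beacon processor handles this like any other inbound work event.
            if scheduled_backfill_work_tx.send(event).await.is_err() {
                break; // processor shut down
            }
        }
    }
}
```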
Diagram comparing backfill processing with / without rate-limiting:
https://github.com/sigp/lighthouse/issues/3212#issuecomment-1386249922
See this comment for @paulhauner's explanation and solution: https://github.com/sigp/lighthouse/issues/3212#issuecomment-1384674956
## Additional Info
I've compared this branch (with backfill processing rate-limited to 1 and 3 batches per slot) against the latest stable version. The CPU usage during backfill sync is reduced by ~5% - 20%, more details on this page:
https://hackmd.io/@jimmygchen/SJuVpJL3j
The above testing was done on Goerli (as I don't currently have hardware for Mainnet); I'm guessing the differences are likely to be bigger on mainnet due to block size.
### TODOs
- [x] Experiment with processing multiple batches per slot. (need to think about how to do this for different slot durations)
- [x] Add option to disable rate-limiting, enabled by default.
- [x] (No longer required now we're reusing the reprocessing queue) Complete the `backfill_scheduler` task when backfill sync is completed or not required
## Issue Addressed
NA
## Proposed Changes
- Implements https://github.com/ethereum/consensus-specs/pull/3290/
- Bumps `ef-tests` to [v1.3.0-rc.4](https://github.com/ethereum/consensus-spec-tests/releases/tag/v1.3.0-rc.4).
The `CountRealizedFull` concept has been removed and the `--count-unrealized-full` and `--count-unrealized` BN flags now do nothing but log a `WARN` when used.
## Database Migration Debt
This PR removes the `best_justified_checkpoint` from fork choice. This field is persisted on-disk and the correct way to go about this would be to make a DB migration to remove the field. However, in this PR I've simply stubbed out the value with a junk value. I've taken this approach because if we're going to do a DB migration I'd love to remove the `Option`s around the justified and finalized checkpoints on `ProtoNode` whilst we're at it. Those options were added in #2822 which was included in Lighthouse v2.1.0. The options were only put there to handle the migration and they've been set to `Some` ever since v2.1.0. There's no reason to keep them as options anymore.
I started adding the DB migration to this branch but I started to feel like I was bloating this rather critical PR with nice-to-haves. I've kept the partially-complete migration [over in my repo](https://github.com/paulhauner/lighthouse/tree/fc-pr-18-migration) so we can pick it up after this PR is merged.
## Issue Addressed
#3704
## Proposed Changes
Adds an `is_syncing_finalized: bool` parameter to the block verification functions. Sets the `payload_verification_status` to `Optimistic` if `is_syncing_finalized` is true. Uses the `SyncState` in `NetworkGlobals` in the `BeaconProcessor` to retrieve the syncing status.
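A trivialized sketch of the rule (the `Verified` arm stands in for the normal execution-engine verification path; the real code threads this through the block verification types):

```rust
#[derive(Debug, PartialEq)]
enum PayloadVerificationStatus {
    Verified,
    Optimistic,
}

fn payload_verification_status(is_syncing_finalized: bool) -> PayloadVerificationStatus {
    if is_syncing_finalized {
        // Blocks within the finalized portion of the chain are imported optimistically
        // rather than being verified against the execution engine right away.
        PayloadVerificationStatus::Optimistic
    } else {
        PayloadVerificationStatus::Verified
    }
}
```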
## Additional Info
I could implement FinalizedSignatureVerifiedBlock if you think it would be nicer.
## Issue Addressed
Implements new optimistic sync test format from https://github.com/ethereum/consensus-specs/pull/2982.
## Proposed Changes
- Add parsing and runner support for the new test format.
- Extend the mock EL with a set of canned responses keyed by block hash. Although this doubles up on some of the existing functionality I think it's really nice to use compared to the `preloaded_responses` or static responses. I think we could write novel new opt sync tests using these primitives much more easily than the previous ones. Forks are natively supported, and different responses to `forkchoiceUpdated` and `newPayload` are also straightforward.
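A small sketch of the canned-response idea (types and statuses simplified; the real mock EL keys full engine API responses by block hash):

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
enum PayloadStatus {
    Valid,
    Invalid,
    Syncing,
}

#[derive(Default)]
struct MockExecutionLayer {
    new_payload_responses: HashMap<[u8; 32], PayloadStatus>,
}

impl MockExecutionLayer {
    /// A test scripts the outcome for a particular block hash ahead of time.
    fn set_new_payload_response(&mut self, block_hash: [u8; 32], status: PayloadStatus) {
        self.new_payload_responses.insert(block_hash, status);
    }

    fn new_payload(&self, block_hash: [u8; 32]) -> PayloadStatus {
        // Unknown hashes default to SYNCING, matching what a real EL returns
        // for blocks it has not seen.
        *self
            .new_payload_responses
            .get(&block_hash)
            .unwrap_or(&PayloadStatus::Syncing)
    }
}
```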
## Additional Info
Blocked on merge of the spec PR and release of new test vectors.
## Issue Addressed
NA
## Proposed Changes
This PR removes duplicated block root computation.
Computing the `SignedBeaconBlock::canonical_root` has become more expensive since the merge as we need to compute the merkle root of each transaction inside an `ExecutionPayload`.
Computing the root for [a mainnet block](https://beaconcha.in/slot/4704236) is taking ~10ms on my i7-8700K CPU @ 3.70GHz (no sha extensions). Given that our median seen-to-imported time for blocks is presently 300-400ms, removing a few duplicated block roots (~30ms) could represent an easy 10% improvement. When we consider that the seen-to-imported times include operations *after* the block has been placed in the early attester cache, we could expect the 30ms to be more significant WRT our seen-to-attestable times.
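The change amounts to computing the root once at the edge and threading it through; roughly, with illustrative names:

```rust
type Hash256 = [u8; 32];

struct SignedBeaconBlock;

impl SignedBeaconBlock {
    /// Expensive post-merge: tree-hashes every transaction in the execution payload.
    fn canonical_root(&self) -> Hash256 {
        [0u8; 32] // placeholder for the real tree-hash
    }
}

fn verify_block(block: &SignedBeaconBlock) -> Hash256 {
    // Computed once, at the first place the root is needed...
    block.canonical_root()
}

fn import_block(_block: &SignedBeaconBlock, _block_root: Hash256) {
    // ...and passed along instead of calling `canonical_root()` again here
    // (and again in later processing stages).
}

fn main() {
    let block = SignedBeaconBlock;
    let root = verify_block(&block);
    import_block(&block, root);
}
```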
## Additional Info
NA
## Issue Addressed
Resolves #3448
## Proposed Changes
Removes a known failure that wasn't actually a known failure. The tests declare this block invalid and we refuse to import it due to `ExecutionPayloadError(UnverifiedNonOptimisticCandidate)`.
This is correct since there is only one "eth1" block included in this test and two are required to trigger the merge (pre- and post-TTD blocks). It is slot 1 (tick = 12s) when this block is imported so the import must be prevented by `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY`.
I'm not sure where I got the idea in #3448 that this test needed retrospective checking, that seems like a false assumption in hindsight.
## Additional Info
- Blocked on #3464
## Proposed Changes
Update the invalid head tests so that they work with the current default fork choice configuration.
Thanks @realbigsean for fixing the persistence test and the EF tests.
Co-authored-by: realbigsean <sean@sigmaprime.io>
## Issue Addressed
NA
## Proposed Changes
There are scenarios where the only viable head will have an invalid execution payload; in this scenario, the `get_head` function on `proto_array` will return an error. We must recover from this scenario by importing blocks from the network.
This PR stops `BeaconChain::recompute_head` from returning an error so that we can't accidentally start down-scoring peers or aborting block import just because the current head has an invalid payload.
## Reviewer Notes
The following changes are included:
1. Allow `fork_choice.get_head` to fail gracefully in `BeaconChain::process_block` when trying to update the `early_attester_cache`; simply don't add the block to the cache rather than aborting the entire process.
1. Don't return an error from `BeaconChain::recompute_head_at_current_slot` and `BeaconChain::recompute_head` to defensively prevent calling functions from aborting any process just because the fork choice function failed to run.
- This should have practically no effect, since most callers were still continuing if recomputing the head failed.
- The outlier is that the API will return 200 rather than a 500 when fork choice fails.
1. Add the `ProtoArrayForkChoice::set_all_blocks_to_optimistic` function to recover from the scenario where we've rebooted and the persisted fork choice has an invalid head.
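A minimal sketch of the new failure handling (types heavily simplified): head recomputation logs the error and keeps the previous head instead of returning an error to callers.

```rust
#[derive(Debug)]
enum ForkChoiceError {
    InvalidHead,
}

struct Chain {
    cached_head_slot: u64,
}

impl Chain {
    /// Stand-in for `fork_choice.get_head`, which can fail when e.g. the only
    /// viable head has an invalid execution payload.
    fn get_head(&self) -> Result<u64, ForkChoiceError> {
        Err(ForkChoiceError::InvalidHead)
    }

    /// No longer returns an error, so callers can't abort block import or
    /// down-score peers just because fork choice failed to run.
    fn recompute_head(&mut self) {
        match self.get_head() {
            Ok(slot) => self.cached_head_slot = slot,
            Err(e) => eprintln!("Error recomputing head, retaining previous head: {e:?}"),
        }
    }
}
```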
## Issue Addressed
Closes https://github.com/sigp/lighthouse/issues/3241
Closes https://github.com/sigp/lighthouse/issues/3242
## Proposed Changes
* [x] Implement logic to remove equivocating validators from fork choice per https://github.com/ethereum/consensus-specs/pull/2845
* [x] Update tests to v1.2.0-rc.1. The new test which exercises `equivocating_indices` is passing.
* [x] Pull in some SSZ abstractions from the `tree-states` branch that make implementing Vec-compatible encoding for types like `BTreeSet` and `BTreeMap`.
* [x] Implement schema upgrades and downgrades for the database (new schema version is V11).
* [x] Apply attester slashings from blocks to fork choice
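A small sketch of the fork choice change (following the spec's `equivocating_indices` naming rather than Lighthouse's exact internals):

```rust
use std::collections::BTreeSet;

#[derive(Default)]
struct ForkChoiceStore {
    equivocating_indices: BTreeSet<u64>,
}

impl ForkChoiceStore {
    /// Validators appearing in both conflicting attestations of a slashing are equivocators.
    fn on_attester_slashing(&mut self, attestation_1_indices: &[u64], attestation_2_indices: &[u64]) {
        let set_1: BTreeSet<u64> = attestation_1_indices.iter().copied().collect();
        for index in attestation_2_indices {
            if set_1.contains(index) {
                self.equivocating_indices.insert(*index);
            }
        }
    }

    /// Equivocating validators' latest messages no longer contribute weight.
    fn counts_towards_weight(&self, validator_index: u64) -> bool {
        !self.equivocating_indices.contains(&validator_index)
    }
}
```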
## Additional Info
* This PR doesn't need the `BTreeMap` impl, but `tree-states` does, and I don't think there's any harm in keeping it. But I could also be convinced to drop it.
Blocked on #3322.
## Issue Addressed
Add a flag that optionally enables unrealized vote tracking. Would like to test out on testnets and benchmark differences in methods of vote tracking. This PR includes a DB schema upgrade to enable the new vote tracking style.
Co-authored-by: realbigsean <sean@sigmaprime.io>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: sean <seananderson33@gmail.com>
Co-authored-by: Mac L <mjladson@pm.me>
## Overview
This rather extensive PR achieves two primary goals:
1. Uses the finalized/justified checkpoints of fork choice (FC), rather than those of the head state.
2. Refactors fork choice, block production and block processing to `async` functions.
Additionally, it achieves:
- Concurrent forkchoice updates to the EL and cache pruning after a new head is selected.
- Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production.
- Concurrent per-block-processing and execution payload verification during block processing.
- The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?):
- I had to do this to deal with sending blocks into spawned tasks.
- Previously we were cloning the beacon block at least 2 times during each block processing; these clones are either removed or turned into cheaper `Arc` clones.
- We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap.
- Avoids cloning *all the blocks* in *every chain segment* during sync.
- It also has the potential to clean up our code where we need to pass an *owned* block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅)
- The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs.
For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273
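As a rough illustration of the concurrency now possible once these functions are `async` (the functions below are stand-ins, not the real block production code):

```rust
/// Hypothetical stand-ins for the two halves of block production.
async fn pack_block_contents() -> &'static str {
    "attestations, slashings, ..."
}

async fn get_execution_payload() -> &'static str {
    "payload from the EL"
}

async fn produce_block() -> (&'static str, &'static str) {
    // Both futures progress at the same time instead of back-to-back.
    tokio::join!(pack_block_contents(), get_execution_payload())
}
```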
## Changes to `canonical_head` and `fork_choice`
Previously, the `BeaconChain` had two separate fields:
```
canonical_head: RwLock<Snapshot>,
fork_choice: RwLock<BeaconForkChoice>
```
Now, we have grouped these values under a single struct:
```
canonical_head: CanonicalHead {
cached_head: RwLock<Arc<Snapshot>>,
fork_choice: RwLock<BeaconForkChoice>
}
```
Apart from ergonomics, the only *actual* change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously.
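The practical effect, sketched with simplified types: readers clone the `Arc` and drop the lock immediately, so no code path needs to hold `cached_head` and `fork_choice` at the same time.

```rust
use std::sync::{Arc, RwLock};

struct Snapshot {
    head_slot: u64,
}

struct CanonicalHead {
    cached_head: RwLock<Arc<Snapshot>>,
}

impl CanonicalHead {
    fn cached_head(&self) -> Arc<Snapshot> {
        // Lock is held only long enough to clone the `Arc`.
        Arc::clone(&self.cached_head.read().unwrap())
    }
}

fn main() {
    let head = CanonicalHead {
        cached_head: RwLock::new(Arc::new(Snapshot { head_slot: 42 })),
    };
    let snapshot = head.cached_head();
    // No lock is held here; other tasks may update the head concurrently.
    assert_eq!(snapshot.head_slot, 42);
}
```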
## Breaking Changes
### The `state` (root) field in the `finalized_checkpoint` SSE event
Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might use in the `finalized_checkpoint` SSE event:
1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`.
2. The state root at `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots.
Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)) it uses [`getStateRootFromBlockRoot`](de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)) which uses (1).
I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku.
## Notes for Reviewers
I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct.
I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking".
I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run *at least once* means that the cached value doesn't need to be wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows *when* it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it.
I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around.
Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2.
You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests:
- Changing tests to be `tokio::async` tests.
- Adding `.await` to fork choice, block processing and block production functions.
- Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`.
- Wrapping `SignedBeaconBlock` in an `Arc`.
- In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant.
I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic.
Co-authored-by: Mac L <mjladson@pm.me>
## Proposed Changes
Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database.
⚠️ **This is achieved in a backwards-incompatible way for networks that have already merged** ⚠️. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins.
The main changes are:
- New column in the database called `ExecPayload`, keyed by beacon block root.
- The `BeaconBlock` column now stores blinded blocks only.
- Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc.
- On finalization:
- `prune_abandoned_forks` deletes non-canonical payloads whilst deleting non-canonical blocks.
- `migrate_db` deletes finalized canonical payloads whilst deleting finalized states.
- Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134.
- The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call.
- I've tested manually that it works on Kiln, using Geth and Nethermind.
- This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146.
- We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134.
- Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed.
- Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated).
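A hedged sketch of the storage split (the store is reduced to two in-memory maps and the EL fallback is only indicated in a comment; the real columns are `BeaconBlock` and `ExecPayload` as described above):

```rust
use std::collections::HashMap;

type Hash256 = [u8; 32];

#[derive(Clone)]
struct ExecutionPayload;
#[derive(Clone)]
struct BlindedBlock;
#[derive(Clone)]
struct FullBlock {
    blinded: BlindedBlock,
    payload: ExecutionPayload,
}

#[derive(Default)]
struct Store {
    blinded_blocks: HashMap<Hash256, BlindedBlock>, // `BeaconBlock` column (blinded only)
    payloads: HashMap<Hash256, ExecutionPayload>,   // new `ExecPayload` column, keyed by block root
}

impl Store {
    /// Split a full block into a blinded block plus its payload on write.
    fn put_block(&mut self, block_root: Hash256, block: FullBlock) {
        self.blinded_blocks.insert(block_root, block.blinded);
        self.payloads.insert(block_root, block.payload);
    }

    /// Re-assemble the full block on read.
    fn get_full_block(&self, block_root: &Hash256) -> Option<FullBlock> {
        let blinded = self.blinded_blocks.get(block_root)?.clone();
        // Finalized payloads may have been pruned; in that case the EL's
        // `eth_getBlockByHash` is used to reconstruct the payload instead.
        let payload = self.payloads.get(block_root)?.clone();
        Some(FullBlock { blinded, payload })
    }
}
```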
## Additional Info
- [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller.
- [x] We should measure the latency of blocks-by-root and blocks-by-range responses.
- [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159)
- [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks.
Co-authored-by: Paul Hauner <paul@paulhauner.com>