There are certain crates which we re-export within `types` which creates a fragmented DevEx, where there are various ways to import the same crates.
```rust
// consensus/types/src/lib.rs
pub use bls::{
AggregatePublicKey, AggregateSignature, Error as BlsError, Keypair, PUBLIC_KEY_BYTES_LEN,
PublicKey, PublicKeyBytes, SIGNATURE_BYTES_LEN, SecretKey, Signature, SignatureBytes,
get_withdrawal_credentials,
};
pub use context_deserialize::{ContextDeserialize, context_deserialize};
pub use fixed_bytes::FixedBytesExtended;
pub use milhouse::{self, List, Vector};
pub use ssz_types::{BitList, BitVector, FixedVector, VariableList, typenum, typenum::Unsigned};
pub use superstruct::superstruct;
```
This PR removes these re-exports and makes it explicit that these types are imported from a non-`consensus/types` crate.
Co-Authored-By: Mac L <mjladson@pm.me>
Part of a fork-choice tech debt clean-up https://github.com/sigp/lighthouse/issues/8325https://github.com/sigp/lighthouse/issues/7089 (non-finalized checkpoint sync) changes the meaning of the checkpoints inside fork-choice. It turns out that we persist the justified and finalized checkpoints **twice** in fork-choice
1. Inside the fork-choice store
2. Inside the proto-array
There's no reason for 2. except for making the function signature of some methods smallers. It's not consistent with the rest of the crate, because in some functions we pass the external variable of time (current_slot) via args, but then read the finalized checkpoint from the internal state. Passing both variables as args makes fork-choice easier to reason about at the cost of a few extra lines.
Remove the unnecessary state (`justified_checkpoint`, `finalized_checkpoint`) inside `ProtoArray`, to make it easier to reason about.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
Follow-up to the bug fixed in:
- https://github.com/sigp/lighthouse/pull/8121
This fixes the root cause of that bug, which was introduced by me in:
- https://github.com/sigp/lighthouse/pull/8101
Lion identified the issue here:
- https://github.com/sigp/lighthouse/pull/8101#discussion_r2382710356
In the methods that compute the proposer shuffling decision root, ensure we don't use lookahead for the Fulu fork epoch itself. This is accomplished by checking if Fulu is enabled at `epoch - 1`, i.e. if `epoch > fulu_fork_epoch`.
I haven't updated the methods that _compute_ shufflings to use these new corrected bounds (e.g. `BeaconState::compute_proposer_indices`), although we could make this change in future. The `get_beacon_proposer_indices` method already gracefully handles the Fulu boundary case by using the `proposer_lookahead` field (if initialised).
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
As identified by a researcher during the Fusaka security competition, we were computing the proposer index incorrectly in some places by computing without lookahead.
- [x] Add "low level" checks to computation functions in `consensus/types` to ensure they error cleanly
- [x] Re-work the determination of proposer shuffling decision roots, which are now fork aware.
- [x] Re-work and simplify the beacon proposer cache to be fork-aware.
- [x] Optimise `with_proposer_cache` to use `OnceCell`.
- [x] All tests passing.
- [x] Resolve all remaining `FIXME(sproul)`s.
- [x] Unit tests for `ProtoBlock::proposer_shuffling_root_for_child_block`.
- [x] End-to-end regression test.
- [x] Test on pre-Fulu network.
- [x] Test on post-Fulu network.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
The head tracker is a persisted piece of state that must be kept in sync with the fork-choice. It has been a source of pruning issues in the past, so we want to remove it
- see https://github.com/sigp/lighthouse/issues/1785
When implementing tree-states in the hot DB we have to change the pruning routine (more details below) so we want to do those changes first in isolation.
- see https://github.com/sigp/lighthouse/issues/6580
- If you want to see the full feature of tree-states hot https://github.com/dapplion/lighthouse/pull/39
Closes https://github.com/sigp/lighthouse/issues/1785
**Current DB migration routine**
- Locate abandoned heads with head tracker
- Use a roots iterator to collect the ancestors of those heads can be pruned
- Delete those abandoned blocks / states
- Migrate the newly finalized chain to the freezer
In summary, it computes what it has to delete and keeps the rest. Then it migrates data to the freezer. If the abandoned forks routine has a bug it can break the freezer migration.
**Proposed migration routine (this PR)**
- Migrate the newly finalized chain to the freezer
- Load all state summaries from disk
- From those, just knowing the head and finalized block compute two sets: (1) descendants of finalized (2) newly finalized chain
- Iterate all summaries, if a summary does not belong to set (1) or (2), delete
This strategy is more sound as it just checks what's there in the hot DB, computes what it has to keep and deletes the rest. Because it does not rely and 3rd pieces of data we can drop the head tracker and pruning checkpoint. Since the DB migration happens **first** now, as long as the computation of the sets to keep is correct we won't have pruning issues.
* chore: Change Into trait impl for KzgProof to From trait impl
* chore: change `impl Into <T> for U` to `impl From<U> for T`
* chore: remove `from-over-into` clippy lint exception
* max_value -> MAX
* remove unnecesary closures
* a couple more max_value -> MAX
* a couple more max_value -> MAX
* Revert "a couple more max_value -> MAX"
This reverts commit 807fe7cae9.
* unused spec field -> phantom data
* ignore some dead code warnings
* update kurtosis repo location
* some blob reprocessing work
* remove ForceBlockLookup
* reorder enum match arms in sync manager
* a lot more reprocessing work
* impl logic for triggerng blob lookups along with block lookups
* deal with rpc blobs in groups per block in the da checker. don't cache missing blob ids in the da checker.
* make single block lookup generic
* more work
* add delayed processing logic and combine some requests
* start fixing some compile errors
* fix compilation in main block lookup mod
* much work
* get things compiling
* parent blob lookups
* fix compile
* revert red/stevie changes
* fix up sync manager delay message logic
* add peer usefulness enum
* should remove lookup refactor
* consolidate retry error handling
* improve peer scoring during certain failures in parent lookups
* improve retry code
* drop parent lookup if either req has a peer disconnect during download
* refactor single block processed method
* processing peer refactor
* smol bugfix
* fix some todos
* fix lints
* fix lints
* fix compile in lookup tests
* fix lints
* fix lints
* fix existing block lookup tests
* renamings
* fix after merge
* cargo fmt
* compilation fix in beacon chain tests
* fix
* refactor lookup tests to work with multiple forks and response types
* make tests into macros
* wrap availability check error
* fix compile after merge
* add random blobs
* start fixing up lookup verify error handling
* some bug fixes and the start of deneb only tests
* make tests work for all forks
* track information about peer source
* error refactoring
* improve peer scoring
* fix test compilation
* make sure blobs are sent for processing after stream termination, delete copied tests
* add some tests and fix a bug
* smol bugfixes and moar tests
* add tests and fix some things
* compile after merge
* lots of refactoring
* retry on invalid block/blob
* merge unknown parent messages before current slot lookup
* get tests compiling
* penalize blob peer on invalid blobs
* Check disk on in-memory cache miss
* Update beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
* Update beacon_node/network/src/sync/network_context.rs
Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
* fix bug in matching blocks and blobs in range sync
* pr feedback
* fix conflicts
* upgrade logs from warn to crit when we receive incorrect response in range
* synced_and_connected_within_tolerance -> should_search_for_block
* remove todo
* Fix Broken Overflow Tests
* fix merge conflicts
* checkpoint sync without alignment
* add import
* query for checkpoint state by slot rather than state root (teku doesn't serve by state root)
* get state first and query by most recent block root
* simplify delay logic
* rename unknown parent sync message variants
* rename parameter, block_slot -> slot
* add some docs to the lookup module
* use interval instead of sleep
* drop request if blocks and blobs requests both return `None` for `Id`
* clean up `find_single_lookup` logic
* add lookup source enum
* clean up `find_single_lookup` logic
* add docs to find_single_lookup_request
* move LookupSource our of param where unnecessary
* remove unnecessary todo
* query for block by `state.latest_block_header.slot`
* fix lint
* fix test
* fix test
* fix observed blob sidecars test
* PR updates
* use optional params instead of a closure
* create lookup and trigger request in separate method calls
* remove `LookupSource`
* make sure duplicate lookups are not dropped
---------
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
Co-authored-by: Mark Mackey <mark@sigmaprime.io>
Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
## Issue Addressed
#4233
## Proposed Changes
Remove the `best_justified_checkpoint` from the `PersistedForkChoiceStore` type as it is now unused.
Additionally, remove the `Option`'s wrapping the `justified_checkpoint` and `finalized_checkpoint` fields on `ProtoNode` which were only present to facilitate a previous migration.
Include the necessary code to facilitate the migration to a new DB schema.
## Proposed Changes
This change attempts to prevent failed re-orgs by:
1. Lowering the re-org cutoff from 2s to 1s. This is informed by a failed re-org attempted by @yorickdowne's node. The failed block was requested in the 1.5-2s window due to a Vouch failure, and failed to propagate to the majority of the network before the attestation deadline at 4s.
2. Allow users to adjust their re-org cutoff depending on observed network conditions and their risk profile. The static 2 second cutoff was too rigid.
3. Add a `--proposer-reorg-disallowed-offsets` flag which can be used to prohibit reorgs at certain slots. This is intended to help workaround an issue whereby reorging blocks at slot 1 are currently taking ~1.6s to propagate on gossip rather than ~500ms. This is suspected to be due to a cache miss in current versions of Prysm, which should be fixed in their next release.
## Additional Info
I'm of two minds about removing the `shuffling_stable` check which checks for blocks at slot 0 in the epoch. If we removed it users would be able to configure Lighthouse to try reorging at slot 0, which likely wouldn't work very well due to interactions with the proposer index cache. I think we could leave it for now and revisit it later.