Commit Graph

47 Commits

Author SHA1 Message Date
Eitan Seri-Levi
cbf1c04a14 resolve merge conflicts between unstable and release-v7.0.0 2025-03-23 11:09:02 -06:00
Eitan Seri-Levi
8ce9edc584 Add block ban flag --invalid-block-roots (#7042) 2025-03-17 13:18:22 +00:00
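For illustration, a minimal sketch of the new flag (the root is a dummy placeholder; only the flag name appears in the commit title, so the comma-separated-roots form is an assumption):

```bash
# Sketch: mark specific block roots as permanently invalid so the node
# refuses to import them (substitute a real 0x-prefixed block root).
lighthouse bn --invalid-block-roots 0x1111111111111111111111111111111111111111111111111111111111111111
```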
Michael Sproul
b4e79edf2a Merge remote-tracking branch 'origin/release-v7.0.0' into unstable 2025-03-10 15:21:24 +11:00
Jimmy Chen
09849e841b Use sync_tolerance_epochs flag to control the proposer prep routines (#7044)
Replace the `2 + 2 == 5` hacks from `holesky-rescue` and use the existing `sync_tolerance_epochs` flag to control the proposer prep routines.
2025-03-06 03:50:42 +00:00
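A hedged usage sketch: the commit names `sync_tolerance_epochs`, and the kebab-case CLI form and example value below are assumptions:

```bash
# Sketch: keep proposer preparation routines running while the node is
# within 2 epochs of the head.
lighthouse bn --sync-tolerance-epochs 2
```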
Krishang Shah
6e11bddd4b feat: adds CLI flags to delay publishing for edge case testing on PeerDAS devnets (#6947)
Closes #6919
2025-02-24 06:03:17 +00:00
Eitan Seri-Levi
afdda83798 Enable Light Client server by default (#6950) 2025-02-10 01:27:03 +00:00
Jimmy Chen
5f053b0b6d Improving blob propagation post-PeerDAS with Decentralized Blob Building (#6268)
* Get blobs from EL.

Co-authored-by: Michael Sproul <michael@sigmaprime.io>

* Avoid cloning blobs after fetching blobs.

* Address review comments and refactor code.

* Fix lint.

* Move blob computation metric to the right spot.

* Merge branch 'unstable' into das-fetch-blobs

* Merge branch 'unstable' into das-fetch-blobs

# Conflicts:
#	beacon_node/beacon_chain/src/beacon_chain.rs
#	beacon_node/beacon_chain/src/block_verification.rs
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs

* Merge branch 'unstable' into das-fetch-blobs

# Conflicts:
#	beacon_node/beacon_chain/src/beacon_chain.rs

* Gradual publication of data columns for supernodes.

* Recompute head after importing block with blobs from the EL.

* Fix lint

* Merge branch 'unstable' into das-fetch-blobs

* Use blocking task instead of async when computing cells.

* Merge branch 'das-fetch-blobs' of github.com:jimmygchen/lighthouse into das-fetch-blobs

* Merge remote-tracking branch 'origin/unstable' into das-fetch-blobs

* Fix semantic conflicts

* Downgrade error log.

* Merge branch 'unstable' into das-fetch-blobs

# Conflicts:
#	beacon_node/beacon_chain/src/data_availability_checker.rs
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#	beacon_node/execution_layer/src/engine_api.rs
#	beacon_node/execution_layer/src/engine_api/json_structures.rs
#	beacon_node/network/src/network_beacon_processor/gossip_methods.rs
#	beacon_node/network/src/network_beacon_processor/mod.rs
#	beacon_node/network/src/network_beacon_processor/sync_methods.rs

* Merge branch 'unstable' into das-fetch-blobs

* Publish block without waiting for blob and column proof computation.

* Address review comments and refactor.

* Merge branch 'unstable' into das-fetch-blobs

* Fix test and docs.

* Comment cleanups.

* Merge branch 'unstable' into das-fetch-blobs

* Address review comments and cleanup

* Address review comments and cleanup

* Refactor to de-duplicate gradual publication logic.

* Add more logging.

* Merge remote-tracking branch 'origin/unstable' into das-fetch-blobs

# Conflicts:
#	Cargo.lock

* Fix incorrect comparison on `num_fetched_blobs`.

* Implement gradual blob publication.

* Merge branch 'unstable' into das-fetch-blobs

* Inline `publish_fn`.

* Merge branch 'das-fetch-blobs' of github.com:jimmygchen/lighthouse into das-fetch-blobs

* Gossip verify blobs before publishing

* Avoid queries for 0 blobs and error for duplicates

* Gossip verified engine blob before processing them, and use observe cache to detect duplicates before publishing.

* Merge branch 'das-fetch-blobs' of github.com:jimmygchen/lighthouse into das-fetch-blobs

# Conflicts:
#	beacon_node/network/src/network_beacon_processor/mod.rs

* Merge branch 'unstable' into das-fetch-blobs

* Fix invalid commitment inclusion proofs in blob sidecars created from EL blobs.

* Only publish EL blobs triggered from gossip block, and not RPC block.

* Downgrade gossip blob log to `debug`.

* Merge branch 'unstable' into das-fetch-blobs

* Merge branch 'unstable' into das-fetch-blobs

* Grammar
2024-11-15 03:34:13 +00:00
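For context, the "Get blobs from EL" step above corresponds to an Engine API query against the execution client; a minimal sketch, assuming an EL on the default authenticated port 8551 that supports `engine_getBlobsV1`, with `$JWT` and `$VERSIONED_HASH` as placeholders:

```bash
# Sketch: fetch blobs the EL already holds in its mempool, by versioned hash.
curl -s http://localhost:8551 \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  --data "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"engine_getBlobsV1\",\"params\":[[\"$VERSIONED_HASH\"]]}"
```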
Lion - dapplion
f75a2cf65b PeerDAS implementation (#5683)
* 1D PeerDAS prototype: Data format and Distribution (#5050)

* Build and publish column sidecars. Add stubs for gossip.

* Add blob column subnets

* Add `BlobColumnSubnetId` and initial compute subnet logic.

* Subscribe to blob column subnets.

* Introduce `BLOB_COLUMN_SUBNET_COUNT` based on DAS configuration parameter changes.

* Fix column sidecar type to use `VariableList` for data.

* Fix lint errors.

* Update types and naming to latest consensus-spec #3574.

* Fix test and some cleanups.

* Merge branch 'unstable' into das

* Merge branch 'unstable' into das

* Merge branch 'unstable' into das

# Conflicts:
#	consensus/types/src/chain_spec.rs

* Add `DataColumnSidecarsByRoot ` req/resp protocol (#5196)

* Add stub for `DataColumnsByRoot`

* Add basic implementation of serving RPC data column from DA checker.

* Store data columns in early attester cache and blobs db.

* Apply suggestions from code review

Co-authored-by: Eitan Seri-Levi <eserilev@gmail.com>
Co-authored-by: Jacob Kaufmann <jacobkaufmann18@gmail.com>

* Fix build.

* Store `DataColumnInfo` in database and various cleanups.

* Update `DataColumnSidecar` ssz max size and remove panic code.

---------

Co-authored-by: Eitan Seri-Levi <eserilev@gmail.com>
Co-authored-by: Jacob Kaufmann <jacobkaufmann18@gmail.com>

* feat: add DAS KZG in data col construction (#5210)

* feat: add DAS KZG in data col construction

* refactor data col sidecar construction

* refactor: add data cols to GossipVerifiedBlockContents

* Disable windows tests for `das` branch. (c-kzg doesn't build on windows)

* Formatting and lint changes only.

* refactor: remove iters in construction of data cols

* Update vec capacity and error handling.

* Add `data_column_sidecar_computation_seconds` metric.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Merge branch 'unstable' into das

# Conflicts:
#	.github/workflows/test-suite.yml
#	beacon_node/lighthouse_network/src/types/topics.rs

* fix: update data col subnet count from 64 to 32 (#5413)

* feat: add peerdas custody field to ENR (#5409)

* feat: add peerdas custody field to ENR

* add hash prefix step in subnet computation

* refactor test and fix possible u64 overflow

* default to min custody value if not present in ENR

* Merge branch 'unstable' into das

* Merge branch 'unstable' into das-unstable-merge-0415

# Conflicts:
#	Cargo.lock
#	beacon_node/beacon_chain/src/data_availability_checker.rs
#	beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#	beacon_node/beacon_chain/src/data_availability_checker/processing_cache.rs
#	beacon_node/lighthouse_network/src/rpc/methods.rs
#	beacon_node/network/src/network_beacon_processor/mod.rs
#	beacon_node/network/src/sync/block_lookups/tests.rs
#	crypto/kzg/Cargo.toml

* Merge remote-tracking branch 'sigp/unstable' into das

* Merge remote-tracking branch 'sigp/unstable' into das

* Fix merge conflicts.

* Send custody data column to `DataAvailabilityChecker` for determining block importability (#5570)

* Only import custody data columns after publishing a block.

* Add `subscribe-all-data-column-subnets` and pass custody column count to `availability_cache`.

* Add custody requirement checks to `availability_cache`.

* Fix config not being passed to DAChecker and add more logging.

* Introduce `peer_das_epoch` and make blobs and columns mutually exclusive.

* Add DA filter for PeerDAS.

* Fix data availability check and use test_logger in tests.

* Fix subscribe to all data column subnets not working correctly.

* Fix tests.

* Only publish column sidecars if PeerDAS is activated. Add `PEER_DAS_EPOCH` chain spec serialization.

* Remove unused data column index in `OverflowKey`.

* Fix column sidecars incorrectly produced when there are no blobs.

* Re-instate index to `OverflowKey::DataColumn` and downgrade noisy debug log to `trace`.

* DAS sampling on sync (#5616)

* Data availability sampling on sync

* Address @jimmygchen review

* Trigger sampling

* Address some review comments and only send `SamplingBlock` sync message after PEER_DAS_EPOCH.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Merge branch 'unstable' into das

# Conflicts:
#	Cargo.lock
#	Cargo.toml
#	beacon_node/beacon_chain/src/block_verification.rs
#	beacon_node/http_api/src/publish_blocks.rs
#	beacon_node/lighthouse_network/src/rpc/codec/ssz_snappy.rs
#	beacon_node/lighthouse_network/src/rpc/protocol.rs
#	beacon_node/lighthouse_network/src/types/pubsub.rs
#	beacon_node/network/src/sync/block_lookups/single_block_lookup.rs
#	beacon_node/store/src/hot_cold_store.rs
#	consensus/types/src/beacon_state.rs
#	consensus/types/src/chain_spec.rs
#	consensus/types/src/eth_spec.rs

* Merge branch 'unstable' into das

* Re-process early sampling requests (#5569)

* Re-process early sampling requests

# Conflicts:
#	beacon_node/beacon_processor/src/work_reprocessing_queue.rs
#	beacon_node/lighthouse_network/src/rpc/methods.rs
#	beacon_node/network/src/network_beacon_processor/rpc_methods.rs

* Update beacon_node/beacon_processor/src/work_reprocessing_queue.rs

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Add missing var

* Beta compiler fixes and small typo fixes.

* Remove duplicate method.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Merge remote-tracking branch 'sigp/unstable' into das

* Fix merge conflict.

* Add data columns by root to currently supported protocol list (#5678)

* Add data columns by root to currently supported protocol list.

* Add missing data column by roots handling.

* Merge branch 'unstable' into das

# Conflicts:
#	Cargo.lock
#	Cargo.toml
#	beacon_node/network/src/sync/block_lookups/tests.rs
#	beacon_node/network/src/sync/manager.rs

* Fix simulator tests on `das` branch (#5731)

* Bump genesis delay in sim tests as KZG setup takes longer for DAS.

* Fix incorrect YAML spacing.

* DataColumnByRange boilerplate (#5353)

* add boilerplate

* fmt

* PeerDAS custody lookup sync (#5684)

* Implement custody sync

* Lint

* Fix tests

* Fix rebase issue

* Add data column kzg verification and update `c-kzg`. (#5701)

* Add data column kzg verification and update `c-kzg`.

* Fix incorrect `Cell` size.

* Add kzg verification on rpc blocks.

* Add kzg verification on rpc data columns.

* Rename `PEER_DAS_EPOCH` to `EIP7594_FORK_EPOCH` for client interop. (#5750)

* Fetch custody columns in range sync (#5747)

* Fetch custody columns in range sync

* Clean up todos

* Remove `BlobSidecar` construction and publish after PeerDAS activated (#5759)

* Avoid building and publishing blob sidecars after PeerDAS.

* Ignore gossip blobs with a slot greater than peer das activation epoch.

* Only attempt to verify blob count and import blobs before PeerDAS.

* #5684 review comments (#5748)

* #5684 review comments.

* Doc and message update only.

* Fix incorrect condition when constructing `RpcBlock` with `DataColumn`s

* Make sampling tests deterministic (#5775)

* PeerDAS spec tests (#5772)

* Add get_custody_columns spec tests.

* Add kzg merkle proof spec tests.

* Add SSZ spec tests.

* Add remaining KZG tests

* Load KZG only once per process, exclude electra tests and add missing SSZ tests.

* Fix lint and missing changes.

* Ignore macOS generated file.

* Merge remote branch 'sigp/unstable' into das

* Merge remote tracking branch 'origin/unstable' into das

* Implement unconditional reconstruction for supernodes (#5781)

* Implement unconditional reconstruction for supernodes

* Move code into KzgVerifiedCustodyDataColumn

* Remove expect

* Add test

* Thanks justin

* Add withhold attack mode for interop (#5788)

* Add withhold attack mode

* Update readme

* Drop added readmes

* Undo styling changes

* Add column gossip verification and handle unknown parent block (#5783)

* Add column gossip verification and handle missing parent for columns.

* Review PR

* Fix rebase issue

* more lint issues :)

---------

Co-authored-by: dapplion <35266934+dapplion@users.noreply.github.com>

* Trigger sampling on sync events (#5776)

* Trigger sampling on sync events

* Update beacon_chain.rs

* Fix tests

* Fix tests

* PeerDAS parameter changes for devnet-0 (#5779)

* Update PeerDAS parameters to latest values.

* Lint fix

* Fix lint.

* Update hardcoded subnet count to 64 (#5791)

* Fix incorrect columns per subnet and config cleanup (#5792)

* Tidy up PeerDAS preset and config values.

* Fix broken config

* Fix DAS branch CI (#5793)

* Fix invalid syntax.

* Update cli doc. Ignore get_custody_columns test temporarily.

* Fix failing test and add verify inclusion test.

* Undo accidentally removed code.

* Only attempt reconstruct columns once. (#5794)

* Re-enable precompute table for peerdas kzg (#5795)

* Merge branch 'unstable' into das

* Update subscription filter. (#5797)

* Remove penalty for duplicate columns (expected due to reconstruction) (#5798)

* Revert DAS config for interop testing. Optimise get_custody_columns function. (#5799)

* Don't perform reconstruction for proposer node as it already has all the columns. (#5806)

* Multithread compute_cells_and_proofs (#5805)

* Multi-thread reconstruct data columns

* Multi-thread path for block production

* Merge branch 'unstable' into das

# Conflicts:
#	.github/workflows/test-suite.yml
#	beacon_node/network/src/sync/block_lookups/mod.rs
#	beacon_node/network/src/sync/block_lookups/single_block_lookup.rs
#	beacon_node/network/src/sync/network_context.rs

* Fix CI errors.

* Move PeerDAS type-level config to configurable `ChainSpec` (#5828)

* Move PeerDAS type level config to `ChainSpec`.

* Fix tests

* Misc custody lookup improvements (#5821)

* Improve custody requests

* Type DataColumnsByRootRequestId

* Prioritize peers and load balance

* Update tests

* Address PR review

* Merge branch 'unstable' into das

* Rename deploy_block in network config (`das` branch) (#5852)

* Rename deploy_block.txt to deposit_contract_block.txt

* fmt

---------

Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>

* Merge branch 'unstable' into das

* Fix CI and merge issues.

* Merge branch 'unstable' into das

# Conflicts:
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#	lcli/src/main.rs

* Store data columns individually in store and caches (#5890)

* Store data columns individually in store and caches

* Implement data column pruning

* Merge branch 'unstable' into das

# Conflicts:
#	Cargo.lock

* Update reconstruction benches to newer criterion version. (#5949)

* Merge branch 'unstable' into das

# Conflicts:
#	.github/workflows/test-suite.yml

* chore: add `recover_cells_and_compute_proofs` method (#5938)

* chore: add recover_cells_and_compute_proofs method

* Introduce type alias `CellsAndKzgProofs` to address type complexity.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Update `csc` format in ENR and spec tests for devnet-1 (#5966)

* Update `csc` format in ENR.

* Add spec tests for `recover_cells_and_kzg_proofs`.

* Add tests for ENR.

* Fix failing tests.

* Add protection against invalid csc value in ENR.

* Fix lint

* Fix csc encoding and decoding (#5997)

* Fix data column rpc request not being sent due to incorrect limits set. (#6000)

* Fix incorrect inbound request count causing rate limiting. (#6025)

* Merge branch 'stable' into das

# Conflicts:
#	beacon_node/network/src/sync/block_lookups/tests.rs
#	beacon_node/network/src/sync/block_sidecar_coupling.rs
#	beacon_node/network/src/sync/manager.rs
#	beacon_node/network/src/sync/network_context.rs
#	beacon_node/network/src/sync/network_context/requests.rs

* Merge remote-tracking branch 'unstable' into das

* Add kurtosis config for DAS testing (#5968)

* Add kurtosis config for DAS testing.

* Fix invalid yaml file

* Update network parameter files.

* chore: add rust PeerdasKZG crypto library for peerdas functionality and rollback c-kzg dependency to 4844 version (#5941)

* chore: add recover_cells_and_compute_proofs method

* chore: add rust peerdas crypto library

* chore: integrate peerdaskzg rust library into kzg crate

* chore(multi):

- update `ssz_cell_to_crypto_cell`
- update conversion from the crypto cell type to a Vec<u8>. Since the Rust library defines them as references to an array, the conversion is simply `to_vec`

* chore(multi):

- update rest of code to handle the new crypto `Cell` type
- update test case code to no longer use the Box type

* chore: cleanup of superfluous conversions

* chore: revert c-kzg dependency back to v1

* chore: move dependency into correct order

* chore: update rust dependency

- This version includes a new method `PeerDasContext::with_num_threads`

* chore: remove Default initialization of PeerDasContext and explicitly set the parameters in `new_from_trusted_setup`

* chore: cleanup exports

* chore: commit updated cargo.lock

* Update Cargo.toml

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* chore: rename dependency

* chore: update peerdas lib

- sets the blst version to 0.3 so that it matches whatever lighthouse is using. Although 0.3.12 is latest, lighthouse is pinned to 0.3.3

* chore: fix clippy lifetime

- Rust doesn't allow you to elide the lifetime on type aliases

* chore: cargo clippy fix

* chore: cargo fmt

* chore: update lib to add redundant checks (these will be removed in consensus-specs PR 3819)

* chore: update dependency to ignore proofs

* chore: update peerdas lib to latest

* update lib

* chore: remove empty proof parameter

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Update PeerDAS interop testnet config (#6069)

* Update interop testnet config.

* Fix typo and remove target peers

* Avoid retrying same sampling peer that previously failed. (#6084)

* Various fixes to custody range sync  (#6004)

* Only start requesting batches when there are good peers across all custody columns to avoid spamming block requests.

* Add custody peer check before mutating `BatchInfo` to avoid inconsistent state.

* Add check to cover a case where batch is not processed while waiting for custody peers to become available.

* Fix lint and logic error

* Fix `good_peers_on_subnet` always returning false for `DataColumnSubnet`.

* Add test for `get_custody_peers_for_column`

* Revert epoch parameter refactor.

* Fall back to default custody requirement if peer ENR is not present.

* Add metrics and update code comment.

* Add more debug logs.

* Use subscribed peers on subnet before MetaDataV3 is implemented. Remove peer_id matching when injecting error because multiple peers are used for range requests. Use randomized custodial peer to avoid repeatedly sending requests to failing peers. Batch by range request where possible.

* Remove unused code and update docs.

* Add comment

* chore: update peerdas-kzg library (#6118)

* chore: update peerDAS lib

* chore: update library

* chore: update library to version that include "init context" benchmarks and optional validation checks

* chore: (can remove) -- Add benchmarks for init context

* Prevent continuous searchers for low-peer networks (#6162)

* Merge branch 'unstable' into das

* Fix merge conflicts

* Add cli flag to enable sampling and disable by default. (#6209)

* chore: Use reference to an array representing a blob instead of an owned KzgBlob (#6179)

* add KzgBlobRef type

* modify code to use KzgBlobRef

* clippy

* Remove Deneb blob related changes to maintain compatibility with `c-kzg-4844`.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Store computed custody subnets in PeerDB and fix custody lookup test (#6218)

* Fix failing custody lookup tests.

* Store custody subnets in PeerDB, fix custody lookup test and refactor some methods.

* Merge branch 'unstable' into das

# Conflicts:
#	beacon_node/beacon_chain/src/beacon_chain.rs
#	beacon_node/beacon_chain/src/block_verification_types.rs
#	beacon_node/beacon_chain/src/builder.rs
#	beacon_node/beacon_chain/src/data_availability_checker.rs
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#	beacon_node/beacon_chain/src/data_column_verification.rs
#	beacon_node/beacon_chain/src/early_attester_cache.rs
#	beacon_node/beacon_chain/src/historical_blocks.rs
#	beacon_node/beacon_chain/tests/store_tests.rs
#	beacon_node/lighthouse_network/src/discovery/enr.rs
#	beacon_node/network/src/service.rs
#	beacon_node/src/cli.rs
#	beacon_node/store/src/hot_cold_store.rs
#	beacon_node/store/src/lib.rs
#	lcli/src/generate_bootnode_enr.rs

* Fix CI failures after merge.

* Batch sampling requests by peer (#6256)

* Batch sampling requests by peer

* Fix clippy errors

* Fix tests

* Add column_index to error message for ease of tracing

* Remove outdated comment

* Fix range sync never evaluating request as finished, causing it to get stuck. (#6276)

* Merge branch 'unstable' into das-0821-merge

# Conflicts:
#	Cargo.lock
#	Cargo.toml
#	beacon_node/beacon_chain/src/beacon_chain.rs
#	beacon_node/beacon_chain/src/data_availability_checker.rs
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#	beacon_node/beacon_chain/src/data_column_verification.rs
#	beacon_node/beacon_chain/src/kzg_utils.rs
#	beacon_node/beacon_chain/src/metrics.rs
#	beacon_node/beacon_processor/src/lib.rs
#	beacon_node/lighthouse_network/src/rpc/codec/ssz_snappy.rs
#	beacon_node/lighthouse_network/src/rpc/config.rs
#	beacon_node/lighthouse_network/src/rpc/methods.rs
#	beacon_node/lighthouse_network/src/rpc/outbound.rs
#	beacon_node/lighthouse_network/src/rpc/rate_limiter.rs
#	beacon_node/lighthouse_network/src/service/api_types.rs
#	beacon_node/lighthouse_network/src/types/globals.rs
#	beacon_node/network/src/network_beacon_processor/mod.rs
#	beacon_node/network/src/network_beacon_processor/rpc_methods.rs
#	beacon_node/network/src/network_beacon_processor/sync_methods.rs
#	beacon_node/network/src/sync/block_lookups/common.rs
#	beacon_node/network/src/sync/block_lookups/mod.rs
#	beacon_node/network/src/sync/block_lookups/single_block_lookup.rs
#	beacon_node/network/src/sync/block_lookups/tests.rs
#	beacon_node/network/src/sync/manager.rs
#	beacon_node/network/src/sync/network_context.rs
#	consensus/types/src/data_column_sidecar.rs
#	crypto/kzg/Cargo.toml
#	crypto/kzg/benches/benchmark.rs
#	crypto/kzg/src/lib.rs

* Fix custody tests and load PeerDAS KZG instead.

* Fix ef tests and bench compilation.

* Fix failing sampling test.

* Merge pull request #6287 from jimmygchen/das-0821-merge

Merge `unstable` into `das` 20240821

* Remove get_block_import_status

* Merge branch 'unstable' into das

* Re-enable Windows release tests.

* Address some review comments.

* Address more review comments and cleanups.

* Comment out peer DAS KZG EF tests for now

* Address more review comments and fix build.

* Merge branch 'das' of github.com:sigp/lighthouse into das

* Unignore Electra tests

* Fix metric name

* Address some of Pawan's review comments

* Merge remote-tracking branch 'origin/unstable' into das

* Update PeerDAS network parameters for peerdas-devnet-2  (#6290)

* update subnet count & custody req

* das network params

* update ef tests

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
2024-08-27 04:10:22 +00:00
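For illustration, the `--subscribe-all-data-column-subnets` flag introduced in this commit turns the node into a "supernode" that custodies every data column subnet; a minimal sketch (all other flags omitted):

```bash
# Sketch: custody, reconstruct and serve all data column subnets
# (PeerDAS supernode).
lighthouse bn --subscribe-all-data-column-subnets
```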
Lion - dapplion
b949db0a8b Remove timeout locks (#6048)
* Remove locks with timeouts

* Readd test

* Update docs

* Merge remote-tracking branch 'origin/unstable' into pk-cache-timeout
2024-07-26 02:01:40 +00:00
Michael Sproul
61962898e2 In-memory tree states (#5533)
* Consensus changes

* EF tests

* lcli

* common and watch

* account manager

* cargo

* fork choice

* promise cache

* beacon chain

* interop genesis

* http api

* lighthouse

* op pool

* beacon chain misc

* parallel state cache

* store

* fix issues in store

* IT COMPILES

* Remove some unnecessary module qualification

* Revert Arced pubkey optimization (#5536)

* Merge remote-tracking branch 'origin/unstable' into tree-states-memory

* Fix caching, rebasing and some tests

* Remove unused deps

* Merge remote-tracking branch 'origin/unstable' into tree-states-memory

* Small cleanups

* Revert shuffling cache/promise cache changes

* Fix state advance bugs

* Fix shuffling tests

* Remove some resolved FIXMEs

* Remove StateProcessingStrategy

* Optimise withdrawals calculation

* Don't reorg if state cache is missed

* Remove inconsistent state func

* Fix beta compiler

* Rebase early, rebase often

* Fix state caching behaviour

* Update to milhouse release

* Fix on-disk consensus context format

* Merge remote-tracking branch 'origin/unstable' into tree-states-memory

* Squashed commit of the following:

commit 3a16649023
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Thu Apr 18 14:26:09 2024 +1000

    Fix on-disk consensus context format

* Keep indexed attestations, thanks Sean

* Merge branch 'on-disk-consensus-context' into tree-states-memory

* Merge branch 'unstable' into tree-states-memory

* Address half of Sean's review

* More simplifications from Sean's review

* Cache state after get_advanced_hot_state
2024-04-24 01:22:36 +00:00
Eitan Seri-Levi
ee69e14db9 Add is_parent_strong proposer re-org check (#5417)
* initial fork choice additions

* add helper fns

* add is_parent_strong

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into add_is_parent_strong_check

* disabling proposer reorg should set parent_threshold to u64 max

* add new flag, is_parent_strong check in override fcu params

* cherry-pick changes

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into add_is_parent_strong_check

* cleanup

* fmt

* Minor review tweaks
2024-04-04 19:38:06 +00:00
Michael Sproul
feb531f85b Single-pass epoch processing and optimised block processing (#5279)
* Single-pass epoch processing (#4483, #4573)

Co-authored-by: Michael Sproul <michael@sigmaprime.io>

* Delete unused epoch processing code (#5170)

* Delete unused epoch processing code

* Compare total deltas

* Remove unnecessary apply_pending

* cargo fmt

* Remove newline

* Use epoch cache in block packing (#5223)

* Remove progressive balances mode (#5224)

* inline inactivity_penalty_quotient_for_state

* drop previous_epoch_total_active_balance

* fc lint

* spec compliant process_sync_aggregate (#15)

* spec compliant process_sync_aggregate

* Update consensus/state_processing/src/per_block_processing/altair/sync_committee.rs

Co-authored-by: Michael Sproul <micsproul@gmail.com>

---------

Co-authored-by: Michael Sproul <micsproul@gmail.com>

* Delete the participation cache (#16)

* update help

* Fix op_pool tests

* Fix fork choice tests

* Merge remote-tracking branch 'sigp/unstable' into epoch-single-pass

* Simplify exit cache (#5280)

* Fix clippy on exit cache

* Clean up single-pass a bit (#5282)

* Address Mark's review of single-pass (#5386)

* Merge remote-tracking branch 'origin/unstable' into epoch-single-pass

* Address Sean's review comments (#5414)

* Address most of Sean's review comments

* Simplify total balance cache building

* Clean up unused junk

* Merge remote-tracking branch 'origin/unstable' into epoch-single-pass

* More self-review

* Merge remote-tracking branch 'origin/unstable' into epoch-single-pass

* Merge branch 'unstable' into epoch-single-pass

* Fix imports for beta compiler

* Fix tests, probably
2024-04-04 13:14:36 +00:00
Pawan Dhananjay
3ab9d3a84e Add a cli option for the snapshot cache size (#5270)
* Add a cli option for snapshot cache size

* Remove junk

* Make snapshot_cache module public

* lint

* Update docs
2024-02-26 05:19:39 +00:00
Lion - dapplion
b035638f9b Compute recent lightclient updates (#4969)
* Compute recent lightclient updates

* Review PR

* Merge remote-tracking branch 'upstream/unstable' into lc-prod-recent-updates

* Review PR

* consistent naming

* add metrics

* revert dropping reprocessing queue

* Update light client optimistic update re-processing logic. (#7)

* Add light client server simulator tests. Co-authored by @dapplion.

* Merge branch 'unstable' into fork/dapplion/lc-prod-recent-updates

* Fix lint

* Enable light client server in simulator test.

* Fix test for light client optimistic updates and finality updates.
2024-01-31 05:25:51 +00:00
Michael Sproul
c8b2324880 Enable progressive balances fast mode by default (#4971)
* Enable progressive balances fast mode by default

* Fix default in chain_config
2023-12-04 14:42:49 +11:00
João Oliveira
dcd69dfc62 Move dependencies to workspace (#4650)
## Issue Addressed

Synchronize dependencies and edition on the workspace `Cargo.toml`

## Proposed Changes

With https://github.com/rust-lang/cargo/issues/8415 merged, it's now possible to synchronize details on the workspace `Cargo.toml`, like the metadata and dependencies.
By aligning dependencies that are shared between multiple crates on the workspace `Cargo.toml`, it's easier to avoid duplicate versions of the same dependency, which in turn eases compile times.

## Additional Info
This PR also removes the no-longer-required direct dependency on the `serde_derive` crate.

Should be reviewed after https://github.com/sigp/lighthouse/pull/4639 gets merged.
closes https://github.com/sigp/lighthouse/issues/4651


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
2023-09-22 04:30:56 +00:00
Paul Hauner
b60304b19f Use BeaconProcessor for API requests (#4462)
## Issue Addressed

NA

## Proposed Changes

Rather than spawning new tasks on the tokio executor to process each HTTP API request, send the tasks to the `BeaconProcessor`. This achieves:

1. Places a bound on how many concurrent requests are being served (i.e., how many we are actually trying to compute at one time).
1. Places a bound on how many requests can be awaiting a response at one time (i.e., starts dropping requests when we have too many queued).
1. Allows the BN to prioritise HTTP requests with respect to messages coming from the P2P network (i.e., prioritise importing gossip blocks rather than serving API requests).

Presently there are two levels of priorities:

- `Priority::P0`
    - The beacon processor will prioritise these above everything other than importing new blocks.
    - Roughly all validator-sensitive endpoints.
- `Priority::P1`
    - The beacon processor will prioritise practically all other P2P messages over these, except for historical backfill things.
    - Everything that's not `Priority::P0`
    
The `--http-enable-beacon-processor false` flag can be supplied to revert to the old behaviour of spawning new `tokio` tasks for each request:

```
        --http-enable-beacon-processor <BOOLEAN>
            The beacon processor is a scheduler which provides quality-of-service and DoS protection. When set to
            "true", HTTP API requests will queued and scheduled alongside other tasks. When set to "false", HTTP API
            responses will be executed immediately. [default: true]
```
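For example, reverting to the old behaviour is a single-flag change:

```bash
# Bypass the BeaconProcessor and spawn a tokio task per HTTP API request.
lighthouse bn --http-enable-beacon-processor false
```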
    
## New CLI Flags

I added some other new CLI flags:

```
        --beacon-processor-aggregate-batch-size <INTEGER>
            Specifies the number of gossip aggregate attestations in a signature verification batch. Higher values may
            reduce CPU usage in a healthy network while lower values may increase CPU usage in an unhealthy or hostile
            network. [default: 64]
        --beacon-processor-attestation-batch-size <INTEGER>
            Specifies the number of gossip attestations in a signature verification batch. Higher values may reduce CPU
            usage in a healthy network whilst lower values may increase CPU usage in an unhealthy or hostile network.
            [default: 64]
        --beacon-processor-max-workers <INTEGER>
            Specifies the maximum concurrent tasks for the task scheduler. Increasing this value may increase resource
            consumption. Reducing the value may result in decreased resource usage and diminished performance. The
            default value is the number of logical CPU cores on the host.
        --beacon-processor-reprocess-queue-len <INTEGER>
            Specifies the length of the queue for messages requiring delayed processing. Higher values may prevent
            messages from being dropped while lower values may help protect the node from becoming overwhelmed.
            [default: 12288]
```


I needed to add the max-workers flag since the "simulator" flavor tests started failing with HTTP timeouts on the test assertions. I believe they were failing because the Github runners only have 2 cores and there just weren't enough workers available to process our requests in time. I added the other flags since they seem fun to fiddle with.

## Additional Info

I bumped the timeouts on the "simulator" flavor test from 4s to 8s. The prioritisation of consensus messages seems to be causing slower responses, I guess this is what we signed up for 🤷 

The `validator/register` validator has some special handling because the relays have a bad habit of timing out on these calls. It seems like a waste of a `BeaconProcessor` worker to just wait for the builder API HTTP response, so we spawn a new `tokio` task to wait for a builder response.

I've added an optimisation for the `GET beacon/states/{state_id}/validators/{validator_id}` endpoint in [efbabe3](efbabe3252). That's the endpoint the VC uses to resolve pubkeys to validator indices, and it's the endpoint that was causing us grief. Perhaps I should move that into a new PR, not sure.
2023-08-08 23:30:15 +00:00
Michael Sproul
6c375205fb Fix HTTP state API bug and add --epochs-per-migration (#4236)
## Issue Addressed

Fix an issue observed by `@zlan` on Discord where Lighthouse would sometimes return this error when looking up states via the API:

> {"code":500,"message":"UNHANDLED_ERROR: ForkChoiceError(MissingProtoArrayBlock(0xc9cf1495421b6ef3215d82253b388d77321176a1dcef0db0e71a0cd0ffc8cdb7))","stacktraces":[]}

## Proposed Changes

The error stems from a faulty assumption in the HTTP API logic: that any state in the hot database must have its block in fork choice. This isn't true because the state's hot database may update much less frequently than the fork choice store, e.g. if reconstructing states (where freezer migration pauses), or if the freezer migration runs slowly. There could also be a race between loading the hot state and checking fork choice, e.g. even if the finalization migration of DB+fork choice were atomic, the update could happen between the 1st and 2nd calls.

To address this I've changed the HTTP API logic to use the finalized block's execution status as a fallback where it is safe to do so. In the case where a block is non-canonical and prior to finalization (permanently orphaned) we default `execution_optimistic` to `true`.

## Additional Info

I've also added a new CLI flag to reduce the frequency of the finalization migration as this is useful for several purposes:

- Spacing out database writes (less frequent, larger batches)
- Keeping a limited chain history with high availability, e.g. the last month in the hot database.

This new flag made it _substantially_ easier to test this change. It was extracted from `tree-states` (where it's called `--db-migration-period`), which is why this PR also carries the `tree-states` label.
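A usage sketch of the new flag (the value is illustrative; the unit is epochs, per the flag name):

```bash
# Run the finalization migration at most once every 32 epochs, spacing out
# database writes into less frequent, larger batches.
lighthouse bn --epochs-per-migration 32
```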
2023-07-17 00:14:12 +00:00
Jimmy Chen
46be05f728 Cache target attester balances for unrealized FFG progression calculation (#4362)
## Issue Addressed

#4118 

## Proposed Changes

This PR introduces a "progressive balances" cache on the `BeaconState`, which keeps track of the accumulated target attestation balance for the current & previous epochs. The cached values are utilised by fork choice to calculate unrealized justification and finalization (instead of converting epoch participation arrays to balances for each block we receive).

This optimization will be rolled out gradually to allow for more testing. A new `--progressive-balances disabled|checked|strict|fast` flag is introduced to support this:
- `checked`: enabled with checks against participation cache, and falls back to the existing epoch processing calculation if there is a total target attester balance mismatch. There is no performance gain from this as the participation cache still needs to be computed. **This is the default mode for now.**
- `strict`: enabled with checks against participation cache, returns error if there is a mismatch. **Used for testing only**.
- `fast`: enabled with no comparative checks and without computing the participation cache. This mode gives us the performance gains from the optimization. This is still experimental and not currently recommended for production usage, but will become the default mode in a future release.
- `disabled`: disable the usage of progressive cache, and use the existing method for FFG progression calculation. This mode may be useful if we find a bug and want to stop the frequent error logs.
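For illustration, opting into the experimental fast mode described above:

```bash
# Enable the progressive balances cache without comparative checks
# (experimental at the time of this PR; `checked` is the default).
lighthouse bn --progressive-balances fast
```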

### Tasks

- [x] Initial cache implementation in `BeaconState`
- [x] Perform checks in fork choice to compare the progressive balances cache against results from `ParticipationCache`
- [x] Add CLI flag, and disable the optimization by default
- [x] Testing on Goerli & Benchmarking
- [x]  Move caching logic from state processing to the `ProgressiveBalancesCache` (see [this comment](https://github.com/sigp/lighthouse/pull/4362#discussion_r1230877001))
- [x] Add attesting balance metrics



Co-authored-by: Jimmy Chen <jimmy@sigmaprime.io>
2023-06-30 01:13:06 +00:00
Mac L
c76afc6630 Remove legacy max-skip-slots checks (#4403)
## Proposed Changes

Remove `max-skip-slots` checks when processing blocks.
This was legacy code which was previously used in the Medalla testnet to sync to the correct fork.
With the addition of checkpoint sync which allows us to sync to any arbitrary fork, this is no longer a necessary feature, so it has been removed for simplicity.

## Additional Notes
The CLI flag and checks for attestation processing have been retained as it still may have uses in DoS protection.
2023-06-20 05:20:36 +00:00
Age Manning
35ca086269 Backfill blocks only to the WSP by default (#4082)
## Limit Backfill Sync

This PR transitions Lighthouse from syncing all the way back to genesis to only syncing back to the weak subjectivity point (~ 5 months) when syncing via a checkpoint sync.

There are a number of important points to note with this PR:

- Firstly and most importantly, this PR fundamentally shifts the default security guarantees of checkpoint syncing in Lighthouse. Prior to this PR, Lighthouse could verify the checkpoint of any given chain by ensuring the chain eventually terminates at the corresponding genesis. This guarantee can still be had via the new CLI flag --genesis-backfill (see the sketch after this list), which prompts Lighthouse to revert to the old behaviour of downloading all blocks back to genesis. The new behaviour only checks the proposer signatures for the last ~5 months of blocks and cannot guarantee the chain matches the genesis chain.
- I have not modified any of the peer scoring or RPC responses. Clients syncing from genesis will downscore new Lighthouse peers that do not possess blocks prior to the WSP. This is by design, as Lighthouse nodes of this form need a mechanism to sort through peers and find useful ones to complete their genesis sync. We therefore do not discriminate between empty/error responses for blocks prior to or post the local WSP. If we request a block that a peer does not possess, then fundamentally that peer is less useful to us than other peers.
- This will make a radical shift in that the majority of nodes will no longer store the full history of the chain. In the future we could add a pruning mechanism to remove old blocks from the db also.
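For illustration, restoring the old full-history guarantee while still checkpoint syncing (the checkpoint URL is a placeholder):

```bash
# Checkpoint sync, then backfill all the way to genesis rather than
# stopping at the weak subjectivity point.
lighthouse bn --checkpoint-sync-url https://example.com --genesis-backfill
```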


Co-authored-by: Paul Hauner <paul@paulhauner.com>
2023-05-05 03:49:23 +00:00
Michael Sproul
b90c0c3fb1 Make re-org strat more cautious and add more config (#4151)
## Proposed Changes

This change attempts to prevent failed re-orgs by:

1. Lowering the re-org cutoff from 2s to 1s. This is informed by a failed re-org attempted by @yorickdowne's node. The failed block was requested in the 1.5-2s window due to a Vouch failure, and failed to propagate to the majority of the network before the attestation deadline at 4s.
2. Allow users to adjust their re-org cutoff depending on observed network conditions and their risk profile. The static 2 second cutoff was too rigid.
3. Add a `--proposer-reorg-disallowed-offsets` flag which can be used to prohibit reorgs at certain slots. This is intended to help workaround an issue whereby reorging blocks at slot 1 are currently taking ~1.6s to propagate on gossip rather than ~500ms. This is suspected to be due to a cache miss in current versions of Prysm, which should be fixed in their next release.
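A hedged sketch of the new configuration: `--proposer-reorg-disallowed-offsets` is named above, while the cutoff flag name and its millisecond unit are assumptions:

```bash
# Never attempt re-orgs at slot offset 1 within the epoch, and only attempt
# a re-org proposal within the first 1000 ms of the slot (assumed flag name).
lighthouse bn --proposer-reorg-disallowed-offsets 1 --proposer-reorg-cutoff 1000
```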

## Additional Info

I'm of two minds about removing the `shuffling_stable` check which checks for blocks at slot 0 in the epoch. If we removed it users would be able to configure Lighthouse to try reorging at slot 0, which likely wouldn't work very well due to interactions with the proposer index cache. I think we could leave it for now and revisit it later.
2023-04-13 07:05:01 +00:00
Jimmy Chen
2de3451011 Rate limiting backfill sync (#3936)
## Issue Addressed

#3212 

## Proposed Changes

- Introduce a new `rate_limiting_backfill_queue` - any new inbound backfill work events gets immediately sent to this FIFO queue **without any processing**
- Spawn a `backfill_scheduler` routine that pops a backfill event from the FIFO queue at specified intervals (currently halfway through a slot, or at 6s after slot start for 12s slots) and sends the event to `BeaconProcessor` via a `scheduled_backfill_work_tx` channel
- This channel gets polled last in the `InboundEvents`, and work event received is  wrapped in a `InboundEvent::ScheduledBackfillWork` enum variant, which gets processed immediately or queued by the `BeaconProcessor` (existing logic applies from here)

Diagram comparing backfill processing with / without rate-limiting: 
https://github.com/sigp/lighthouse/issues/3212#issuecomment-1386249922

See this comment for @paulhauner's  explanation and solution: https://github.com/sigp/lighthouse/issues/3212#issuecomment-1384674956

## Additional Info

I've compared this branch (with backfill processing rate limited to to 1 and 3 batches per slot) against the latest stable version. The CPU usage during backfill sync is reduced by ~5% - 20%, more details on this page:

https://hackmd.io/@jimmygchen/SJuVpJL3j

The above testing is done on Goerli (as I don't currently have hardware for Mainnet), I'm guessing the differences are likely to be bigger on mainnet due to block size.

### TODOs

- [x] Experiment with processing multiple batches per slot. (need to think about how to do this for different slot durations)
- [x] Add option to disable rate-limiting, enabled by default.
- [x] (No longer required now we're reusing the reprocessing queue) Complete the `backfill_scheduler` task when backfill sync is completed or not required
2023-04-03 03:02:55 +00:00
Paul Hauner
1f8c17b530 Fork choice modifications and cleanup (#3962)
## Issue Addressed

NA

## Proposed Changes

- Implements https://github.com/ethereum/consensus-specs/pull/3290/
- Bumps `ef-tests` to [v1.3.0-rc.4](https://github.com/ethereum/consensus-spec-tests/releases/tag/v1.3.0-rc.4).

The `CountRealizedFull` concept has been removed and the `--count-unrealized-full` and `--count-unrealized` BN flags now do nothing but log a `WARN` when used.

## Database Migration Debt

This PR removes the `best_justified_checkpoint` from fork choice. This field is persisted on-disk and the correct way to go about this would be to make a DB migration to remove the field. However, in this PR I've simply stubbed out the value with a junk value. I've taken this approach because if we're going to do a DB migration I'd love to remove the `Option`s around the justified and finalized checkpoints on `ProtoNode` whilst we're at it. Those options were added in #2822 which was included in Lighthouse v2.1.0. The options were only put there to handle the migration and they've been set to `Some` ever since v2.1.0. There's no reason to keep them as options anymore.

I started adding the DB migration to this branch but I started to feel like I was bloating this rather critical PR with nice-to-haves. I've kept the partially-complete migration [over in my repo](https://github.com/paulhauner/lighthouse/tree/fc-pr-18-migration) so we can pick it up after this PR is merged.
2023-03-21 07:34:41 +00:00
Age Manning
785a9171e6 Customisable shuffling cache size (#4081)
This PR enables the user to adjust the shuffling cache size.

This is useful for some HTTP API requests which require re-computing old shufflings. This PR currently optimizes the `beacon/states/{state_id}/committees` HTTP API by first checking the cache before re-building the shuffling.

If the shuffling cache size is set to a non-default value, then the HTTP API request will also fill the cache as it constructs new shufflings.

If the CLI flag is not present, or the value is set to the default of 16, the default behaviour is observed.


Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2023-03-21 05:14:59 +00:00
Michael Sproul
01556f6f01 Optimise payload attributes calculation and add SSE (#4027)
## Issue Addressed

Closes #3896
Closes #3998
Closes #3700

## Proposed Changes

- Optimise the calculation of withdrawals for payload attributes by avoiding state clones, avoiding unnecessary state advances and reading from the snapshot cache if possible.
- Use the execution layer's payload attributes cache to avoid re-calculating payload attributes. I actually implemented a new LRU cache just for withdrawals but it had the exact same key and most of the same data as the existing payload attributes cache, so I deleted it.
- Add a new SSE event that fires when payloadAttributes are calculated. This is useful for block builders, a la https://github.com/ethereum/beacon-APIs/issues/244.
- Add a new CLI flag `--always-prepare-payload` which forces payload attributes to be sent with every fcU regardless of connected proposers. This is intended for use by builders/relays.

For maximum effect, the flags I've been using to run Lighthouse in "payload builder mode" are:

```
--always-prepare-payload \
--prepare-payload-lookahead 12000 \
--suggested-fee-recipient 0x0000000000000000000000000000000000000000
```

The fee recipient is required so Lighthouse has something to pack in the payload attributes (it can be ignored by the builder). The lookahead causes fcU to be sent at the start of every slot rather than at 8s. As usual, fcU will also be sent after each change of head block. I think this combination is sufficient for builders to build on all viable heads. Often there will be two fcU (and two payload attributes) sent for the same slot: one sent at the start of the slot with the head from `n - 1` as the parent, and one sent after the block arrives with `n` as the parent.

Example usage of the new event stream:

```bash
curl -N "http://localhost:5052/eth/v1/events?topics=payload_attributes"
```

## Additional Info

- [x] Tests added by updating the proposer re-org tests. This has the benefit of testing the proposer re-org code paths with withdrawals too, confirming that the new changes don't interact poorly.
- [ ] Benchmarking with `blockdreamer` on devnet-7 showed promising results but I'm yet to do a comparison to `unstable`.


Co-authored-by: Michael Sproul <micsproul@gmail.com>
2023-03-05 23:43:30 +00:00
Michael Sproul
8e2931d73b Verify blockHash with withdrawals 2023-01-13 12:46:54 +11:00
Michael Sproul
2af8110529 Merge remote-tracking branch 'origin/unstable' into capella
Fixing the conflicts involved patching up some of the `block_hash` verification;
the rest will be done as part of https://github.com/sigp/lighthouse/issues/3870
2023-01-12 16:22:00 +11:00
Michael Sproul
4bd2b777ec Verify execution block hashes during finalized sync (#3794)
## Issue Addressed

Recent discussions with other client devs about optimistic sync have revealed a conceptual issue with the optimisation implemented in #3738. In designing that feature I failed to consider that the execution node checks the `blockHash` of the execution payload before responding with `SYNCING`, and that omitting this check entirely results in a degradation of the full node's validation. A node omitting the `blockHash` checks could be tricked by a supermajority of validators into following an invalid chain, something which is ordinarily impossible.

## Proposed Changes

I've added verification of the `payload.block_hash` in Lighthouse. In case of failure we log a warning and fall back to verifying the payload with the execution client.

I've used our existing dependency on `ethers_core` for RLP support, and a new dependency on Parity's `triehash` crate for the Merkle patricia trie. Although the `triehash` crate is currently unmaintained it seems like our best option at the moment (it is also used by Reth, and requires vastly less boilerplate than Parity's generic `trie-root` library).

Block hash verification is pretty quick, about 500us per block on my machine (mainnet).

The optimistic finalized sync feature can be disabled using `--disable-optimistic-finalized-sync` which forces full verification with the EL.
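For example:

```bash
# Force full payload verification with the EL during finalized sync.
lighthouse bn --disable-optimistic-finalized-sync
```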

## Additional Info

This PR also introduces a new dependency on our [`metastruct`](https://github.com/sigp/metastruct) library, which was perfectly suited to the RLP serialization method. There will likely be changes as `metastruct` grows, but I think this is a good way to start dogfooding it.

I took inspiration from some Parity and Reth code while writing this, and have preserved the relevant license headers on the files containing code that was copied and modified.
2023-01-09 03:11:59 +00:00
Michael Sproul
775d222299 Enable proposer boost re-orging (#2860)
## Proposed Changes

With proposer boosting implemented (#2822) we have an opportunity to re-org out late blocks.

This PR adds three flags to the BN to control this behaviour:

* `--disable-proposer-reorgs`: turn aggressive re-orging off (it's on by default).
* `--proposer-reorg-threshold N`: attempt to orphan blocks with less than N% of the committee vote. If this parameter isn't set then N defaults to 20% when the feature is enabled.
* `--proposer-reorg-epochs-since-finalization N`: only attempt to re-org late blocks when the number of epochs since finalization is less than or equal to N. The default is 2 epochs, meaning re-orgs will only be attempted when the chain is finalizing optimally.
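For illustration, tuning the feature with the flags above (values illustrative):

```bash
# Attempt to orphan late blocks with under 10% of the committee vote, but
# only while finalization lags by at most 2 epochs.
lighthouse bn --proposer-reorg-threshold 10 --proposer-reorg-epochs-since-finalization 2
```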

For safety Lighthouse will only attempt a re-org under very specific conditions:

1. The block being proposed is 1 slot after the canonical head, and the canonical head is 1 slot after its parent. i.e. at slot `n + 1` rather than building on the block from slot `n` we build on the block from slot `n - 1`.
2. The current canonical head received less than N% of the committee vote. N should be set depending on the proposer boost fraction itself, the fraction of the network that is believed to be applying it, and the size of the largest entity that could be hoarding votes.
3. The current canonical head arrived after the attestation deadline from our perspective. This condition was only added to support suppression of forkchoiceUpdated messages, but makes intuitive sense.
4. The block is being proposed in the first 2 seconds of the slot. This gives it time to propagate and receive the proposer boost.


## Additional Info

For the initial idea and background, see: https://github.com/ethereum/consensus-specs/pull/2353#issuecomment-950238004

There is also a specification for this feature here: https://github.com/ethereum/consensus-specs/pull/3034

Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: pawan <pawandhananjay@gmail.com>
2022-12-13 09:57:26 +00:00
Giulio rebuffo
d5a2de759b Added LightClientBootstrap V1 (#3711)
## Issue Addressed

Partially addresses #3651

## Proposed Changes

Adds server-side support for light_client_bootstrap_v1 topic

## Additional Info

This PR creates each bootstrap on demand without using a cache. I do not know how necessary a cache is in this case, as this topic is not supposed to be called frequently, and IMHO we can just prevent abuse by using the limiter; but let me know what you think, whether there is any caveat to this, or whether a cache is necessary only for the sake of good practice.


Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
2022-11-25 05:19:00 +00:00
Michael Sproul
3be41006a6 Add --light-client-server flag and state cache utils (#3714)
## Issue Addressed

Part of https://github.com/sigp/lighthouse/issues/3651.

## Proposed Changes

Add a flag for enabling the light client server, which should be checked before gossip/RPC traffic is processed (e.g. https://github.com/sigp/lighthouse/pull/3693, https://github.com/sigp/lighthouse/pull/3711). The flag is available at runtime from `beacon_chain.config.enable_light_client_server`.
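For example:

```bash
# Opt in to the light client server (disabled by default at the time of
# this PR; enabled by default as of the 2025-02-10 commit above).
lighthouse bn --light-client-server
```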

Additionally, a new method `BeaconChain::with_mutable_state_for_block` is added which I envisage being used for computing light client updates. Unfortunately its performance will be quite poor on average because it will only run quickly with access to the tree hash cache. Each slot the tree hash cache is only available for a brief window of time between the head block being processed and the state advance at 9s in the slot. When the state advance happens the cache is moved and mutated to get ready for the next slot, which makes it no longer useful for merkle proofs related to the head block. Rather than spend more time trying to optimise this I think we should continue prototyping with this code, and I'll make sure `tree-states` is ready to ship before we enable the light client server in prod (cf. https://github.com/sigp/lighthouse/pull/3206).

## Additional Info

I also fixed a bug in the implementation of `BeaconState::compute_merkle_proof` whereby the tree hash cache was moved with `.take()` but never put back with `.restore()`.
2022-11-11 11:03:18 +00:00
GeemoCandama
c591fcd201 add checkpoint-sync-url-timeout flag (#3710)
## Issue Addressed
#3702 
## Proposed Changes
Added a `checkpoint-sync-url-timeout` flag to the CLI, and a timeout field to `ClientGenesis::CheckpointSyncUrl` to utilize the configured timeout.
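A usage sketch (the URL is a placeholder; seconds as the unit is an assumption):

```bash
# Tolerate a slow checkpoint provider for up to 180 (assumed seconds) per call.
lighthouse bn --checkpoint-sync-url https://example.com --checkpoint-sync-url-timeout 180
```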



Co-authored-by: GeemoCandama <104614073+GeemoCandama@users.noreply.github.com>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
2022-11-11 00:38:28 +00:00
realbigsean
cae40731a2 Strict count unrealized (#3522)
## Issue Addressed

Add a flag that can increase count-unrealized strictness; defaults to false.



Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: sean <seananderson33@gmail.com>
2022-09-05 04:50:47 +00:00
Paul Hauner
8609cced0e Reset payload statuses when resuming fork choice (#3498)
## Issue Addressed

NA

## Proposed Changes

This PR is motivated by a recent consensus failure in Geth where it returned `INVALID` for an `VALID` block. Without this PR, the only way to recover is by re-syncing Lighthouse. Whilst ELs "shouldn't have consensus failures", in reality it's something that we can expect from time to time due to the complex nature of Ethereum. Being able to recover easily will help the network recover and EL devs to troubleshoot.

The risk introduced with this PR is that genuinely INVALID payloads get a "second chance" at being imported. I believe the DoS risk here is negligible since LH needs to be restarted in order to re-process the payload. Furthermore, there's no reason to think that a well-performing EL will accept a truly invalid payload the second-time-around.

## Additional Info

This implementation has the following intricacies:

1. Instead of just resetting *invalid* payloads to optimistic, we'll also reset *valid* payloads. This is an artifact of our existing implementation.
1. We will only reset payload statuses when we detect an invalid payload present in `proto_array`
    - This helps save us from forgetting that all our blocks are valid in the "best case scenario" where there are no invalid blocks.
1. If we fail to revert the payload statuses we'll log a `CRIT` and just continue with a `proto_array` that *does not* have reverted payload statuses.
    - The code to revert statuses needs to deal with balances and proposer-boost, so it's a failure point. This is a defensive measure to avoid introducing new show-stopping bugs to LH.
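
As an illustrative sketch of intricacies 1 and 2 (the real logic lives in `proto_array` and must also handle balances and proposer boost; the types here are simplified stand-ins):

```rust
// Sketch only: reset all payload statuses to optimistic, but only when an
// invalid payload is actually present.
#[derive(Clone, Copy, PartialEq)]
enum ExecutionStatus {
    Valid,
    Invalid,
    Optimistic,
    Irrelevant, // pre-merge blocks
}

fn reset_payload_statuses(statuses: &mut [ExecutionStatus]) {
    // Only reset if at least one payload is invalid, so the best case
    // (no invalid blocks) keeps its verified statuses.
    if statuses.iter().any(|s| *s == ExecutionStatus::Invalid) {
        for s in statuses.iter_mut() {
            if *s != ExecutionStatus::Irrelevant {
                // Both valid *and* invalid statuses are reset (intricacy 1).
                *s = ExecutionStatus::Optimistic;
            }
        }
    }
}
```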
2022-08-29 14:34:41 +00:00
Michael Sproul
66eca1a882 Refactor op pool for speed and correctness (#3312)
## Proposed Changes

This PR has two aims: to speed up attestation packing in the op pool, and to fix bugs in the verification of attester slashings, proposer slashings and voluntary exits. The changes are bundled into a single database schema upgrade (v12).

Attestation packing is sped up by removing several inefficiencies: 

- No more recalculation of `attesting_indices` during packing.
- No (unnecessary) examination of the `ParticipationFlags`: a bitfield suffices. See `RewardCache`.
- No re-checking of attestation validity during packing: the `AttestationMap` provides attestations which are "correct by construction" (I have checked this using Hydra).
- No SSZ re-serialization for the clunky `AttestationId` type (it can be removed in a future release).

So far the speed-up seems to be roughly 2-10x, from 500ms down to 50-100ms.

Verification of attester slashings, proposer slashings and voluntary exits is fixed by:

- Tracking the `ForkVersion`s that were used to verify each message inside the `SigVerifiedOp`. This allows us to quickly re-verify that they match the head state's opinion of what the `ForkVersion` should be at the epoch(s) relevant to the message.
- Storing the `SigVerifiedOp` on disk rather than the raw operation. This allows us to continue tracking the fork versions after a reboot.

This is mostly contained in this commit 52bb1840ae.
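
A simplified sketch of the idea, with stand-in types (the real `SigVerifiedOp` is generic over the operation type and integrated with the fork schedule):

```rust
// Sketch only: remember which fork version(s) a signature was verified under,
// so re-checking against the head state is a cheap comparison.
#[derive(Clone, Copy, PartialEq)]
struct ForkVersion([u8; 4]);

struct SigVerifiedOp<T> {
    op: T,
    // Fork version(s) in use when the signature was originally verified.
    verified_against: Vec<ForkVersion>,
}

impl<T> SigVerifiedOp<T> {
    /// Cheap re-validation: the op is still good if the head state agrees
    /// about the fork version at every epoch relevant to the message.
    fn signature_is_still_valid(&self, head_fork_versions: &[ForkVersion]) -> bool {
        self.verified_against == head_fork_versions
    }
}
```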

## Additional Info

The schema upgrade uses the justified state to re-verify attestations and compute `attesting_indices` for them. It will drop any attestations that fail to verify, on the logic that attestations are most valuable in the few slots after they're observed, and are probably stale and useless by the time a node restarts. Exits and proposer slashings are similarly re-verified to obtain `SigVerifiedOp`s.

This PR contains a runtime killswitch `--paranoid-block-proposal` which opts out of all the optimisations in favour of closely verifying every included message. Although I'm quite sure that the optimisations are correct this flag could be useful in the event of an unforeseen emergency.

Finally, you might notice that the `RewardCache` appears quite useless in its current form, because it is only updated on the hot-path immediately before proposal. My hope is that in future we can shift calls to `RewardCache::update` into the background, e.g. while performing the state advance. It is also designed with `tree-states` compatibility in mind, where iterating and indexing `state.{previous,current}_epoch_participation` is expensive and needs to be minimised.
2022-08-29 09:10:26 +00:00
Michael Sproul
6bc4a2cc91 Update invalid head tests (#3400)
## Proposed Changes

Update the invalid head tests so that they work with the current default fork choice configuration.

Thanks @realbigsean for fixing the persistence test and the EF tests.

Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-08-05 23:41:09 +00:00
realbigsean
6c2d8b2262 Builder Specs v0.2.0 (#3134)
## Issue Addressed

https://github.com/sigp/lighthouse/issues/3091

Extends https://github.com/sigp/lighthouse/pull/3062, adding pre-bellatrix block support on blinded endpoints and allowing the normal proposal flow (local payload construction) on blinded endpoints. This results in better fallback logic, because the VC no longer has to switch endpoints when the BN <> Builder API fails: the BN can fall back immediately, without repeating block processing it shouldn't need to. We can also keep VC fallback from the VC <> BN API's blinded endpoint to the full endpoint.

## Proposed Changes

- Pre-bellatrix blocks on blinded endpoints
- Add a new `PayloadCache` to the execution layer
- Better fallback-from-builder logic
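
The `PayloadCache` might look roughly like this sketch (an unbounded map for brevity; the real cache should be bounded, and all types here are stand-ins):

```rust
use std::collections::HashMap;

// Sketch only: cache locally built payloads by block hash, so a blinded
// proposal can be completed from the local payload on builder failure.
struct ExecutionPayload {
    block_hash: [u8; 32],
    // ...transactions, fee recipient, etc.
}

#[derive(Default)]
struct PayloadCache {
    payloads: HashMap<[u8; 32], ExecutionPayload>,
}

impl PayloadCache {
    fn put(&mut self, payload: ExecutionPayload) {
        self.payloads.insert(payload.block_hash, payload);
    }

    // Remove-and-return, since each payload is used at most once.
    fn pop(&mut self, hash: &[u8; 32]) -> Option<ExecutionPayload> {
        self.payloads.remove(hash)
    }
}
```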

## Todos

- [x] Remove VC transition logic
- [x] Add logic to only enable builder flow after Merge transition finalization
- [x] Tests
- [x] Fix metrics
- [x] Rustdocs


Co-authored-by: Mac L <mjladson@pm.me>
Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-07-30 00:22:37 +00:00
realbigsean
20ebf1f3c1 Realized unrealized experimentation (#3322)
## Issue Addressed

Add a flag that optionally enables unrealized vote tracking. We'd like to test this out on testnets and benchmark the differences between vote tracking methods. This PR includes a DB schema upgrade to enable the new vote tracking style.


Co-authored-by: realbigsean <sean@sigmaprime.io>
Co-authored-by: Paul Hauner <paul@paulhauner.com>
Co-authored-by: sean <seananderson33@gmail.com>
Co-authored-by: Mac L <mjladson@pm.me>
2022-07-25 23:53:26 +00:00
Michael Sproul
8fa032c8ae Run fork choice before block proposal (#3168)
## Issue Addressed

Upcoming spec change https://github.com/ethereum/consensus-specs/pull/2878

## Proposed Changes

1. Run fork choice at the start of every slot, and wait for this run to complete before proposing a block.
2. As an optimisation, also run fork choice 3/4 of the way through the slot (at 9s), _dequeueing attestations for the next slot_.
3. Remove the fork choice run from the state advance timer that occurred before advancing the state.

## Additional Info

### Block Proposal Accuracy

This change makes us more likely to propose on top of the correct head in the presence of re-orgs with proposer boost in play. The main scenario that this change is designed to address is described in the linked spec issue.

### Attestation Accuracy

This change _also_ makes us more likely to attest to the correct head. Currently in the case of a skipped slot at `slot` we only run fork choice 9s into `slot - 1`. This means the attestations from `slot - 1` aren't taken into consideration, and any boost applied to the block from `slot - 1` is not removed (it should be). In the language of the linked spec issue, this means we are liable to attest to C, even when the majority voting weight has already caused a re-org to B.

### Why remove the call before the state advance?

If we've run fork choice at the start of the slot then it has already dequeued all the attestations from the previous slot, which are the only ones eligible to influence the head in the current slot. Running fork choice again is unnecessary (unless we run it for the next slot and try to pre-empt a re-org, but I don't currently think this is a great idea).

### Performance

Based on Prater testing this adds about 5-25ms of runtime to block proposal times, which are 500-1000ms on average (and spike to 5s+ sometimes due to state handling issues 😢 ). I believe this is a small enough penalty to enable it by default, with the option to disable it via the new flag `--fork-choice-before-proposal-timeout 0`. Upcoming work on block packing and state representation will also reduce block production times in general, while removing the spikes.

### Implementation

Fork choice gets invoked at the start of the slot via the `per_slot_task` function called from the slot timer. It then uses a condition variable to signal to block production that fork choice has been updated. This is a bit funky, but it seems to work. One downside of the timer-based approach is that it doesn't happen automatically in most of the tests. The test added by this PR has to trigger the run manually.
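
A minimal sketch of that handshake using a standard condition variable (the real implementation waits with the configurable timeout mentioned above rather than indefinitely):

```rust
use std::sync::{Condvar, Mutex};

// Sketch only: the slot timer signals completion, block production waits.
#[derive(Default)]
struct ForkChoiceSignal {
    complete: Mutex<bool>,
    condvar: Condvar,
}

impl ForkChoiceSignal {
    // Called from the per-slot task once fork choice has run.
    fn notify_fork_choice_complete(&self) {
        *self.complete.lock().unwrap() = true;
        self.condvar.notify_all();
    }

    // Called from block production before choosing a parent block.
    fn wait_for_fork_choice(&self) {
        let mut complete = self.complete.lock().unwrap();
        while !*complete {
            complete = self.condvar.wait(complete).unwrap();
        }
    }
}
```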
2022-05-20 05:02:11 +00:00
pawan
44a7b37ce3 Increase network limits (#2796)
- Fix max packet sizes
- Fix max_payload_size function
- Add merge block test
- Fix max size calculation; fix up test
- Clear comments
- Add a payload_size_function
- Use safe arith for payload calculation
- Return an error if block too big in block production
- Separate test to check if block is over limit
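
A sketch of the kind of safe-arith size calculation involved; the `32 + n + n/6` bound is snappy's documented worst-case compressed length, but the function shapes here are illustrative, not the actual API:

```rust
// Sketch only: worst-case compressed size with checked ("safe") arithmetic.
fn max_compressed_len(n: u64) -> Option<u64> {
    32u64.checked_add(n)?.checked_add(n / 6)
}

// Sketch only: refuse to produce a block that peers would reject as too big.
fn check_block_size(ssz_len: u64, max_payload_size: u64) -> Result<(), String> {
    if ssz_len > max_payload_size {
        return Err(format!(
            "block of {} bytes exceeds network limit of {}",
            ssz_len, max_payload_size
        ));
    }
    Ok(())
}
```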
2021-12-02 14:29:20 +11:00
Michael Sproul
d2e3d4c6f1 Add flag to disable lock timeouts (#2714)
## Issue Addressed

Mitigates #1096

## Proposed Changes

Add a flag to the beacon node called `--disable-lock-timeouts` which allows opting out of lock timeouts.

The lock timeouts serve a dual purpose:

1. They prevent any single operation from hogging the lock for too long. When a timeout occurs it logs a nasty error which indicates that there's suboptimal lock use occurring, which we can then act on.
2. They allow deadlock detection. We're fairly sure there are no deadlocks left in Lighthouse anymore but the timeout locks offer a safeguard against that.

However, timeouts on locks are not without downsides:

They allow for the possibility of livelock, particularly on slower hardware. If lock acquisitions keep timing out spuriously, the node can be prevented from making any progress, even if it could make progress slowly without the timeout. One particularly concerning scenario would be a DoS attack that succeeded in slowing block signature verification times across the network, causing all Lighthouse nodes to livelock because they timed out repeatedly. This could also occur on just a subset of nodes (e.g. dual core VPSs or Raspberry Pis).

By making the behaviour runtime configurable this PR allows us to choose the behaviour we want depending on circumstance. I suspect that long term we could make the timeout-free approach the default (#2381 moves in this direction) and just enable the timeouts on our testnet nodes for debugging purposes. This PR conservatively leaves the default as-is so we can gain some more experience before switching the default.
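
A rough sketch of a lock wrapper with runtime-configurable timeouts, in the spirit of the flag (Lighthouse's actual timeout locks differ in detail):

```rust
use parking_lot::{RwLock, RwLockWriteGuard};
use std::time::Duration;

// Sketch only: illustrative wrapper, not the real implementation.
struct MaybeTimeoutRwLock<T> {
    lock: RwLock<T>,
    timeout: Option<Duration>, // None when --disable-lock-timeouts is set
}

impl<T> MaybeTimeoutRwLock<T> {
    fn write(&self) -> Option<RwLockWriteGuard<'_, T>> {
        match self.timeout {
            // Timeouts enabled: give up (and let the caller log an error)
            // once the deadline passes.
            Some(timeout) => self.lock.try_write_for(timeout),
            // Timeouts disabled: block until the lock is available.
            None => Some(self.lock.write()),
        }
    }
}
```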
2021-10-19 00:30:40 +00:00
Michael Sproul
9667dc2f03 Implement checkpoint sync (#2244)
## Issue Addressed

Closes #1891
Closes #1784

## Proposed Changes

Implement checkpoint sync for Lighthouse, enabling it to start from a weak subjectivity checkpoint.

## Additional Info

- [x] Return unavailable status for out-of-range blocks requested by peers (#2561)
- [x] Implement sync daemon for fetching historical blocks (#2561)
- [x] Verify chain hashes (either in `historical_blocks.rs` or the calling module)
- [x] Consistency check for initial block + state
- [x] Fetch the initial state and block from a beacon node HTTP endpoint
- [x] Don't crash fetching beacon states by slot from the API
- [x] Background service for state reconstruction, triggered by CLI flag or API call.

Considered out of scope for this PR:

- Drop the requirement to provide the `--checkpoint-block` (this would require some pretty heavy refactoring of block verification)


Co-authored-by: Diva M <divma@protonmail.com>
2021-09-22 00:37:28 +00:00
Michael Sproul
acd49d988d Implement database temp states to reduce memory usage (#1798)
## Issue Addressed

Closes #800
Closes #1713

## Proposed Changes

Implement the temporary state storage algorithm described in #800. Specifically:

* Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values.
* Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully.
* Add a garbage collection process to delete leftover temporary states on start-up.
* Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784)
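
In terms of atomic key-value operations, the scheme looks roughly like this sketch (simplified column names and op types, not the real store API):

```rust
// Sketch only: stand-in for the store's atomic batch operations.
enum KeyValueStoreOp {
    Put { column: &'static str, key: [u8; 32], value: Vec<u8> },
    Delete { column: &'static str, key: [u8; 32] },
}

// 1. Store the state and its 0-length temporary marker in one atomic batch.
fn store_temp_state(state_root: [u8; 32], state_bytes: Vec<u8>) -> Vec<KeyValueStoreOp> {
    vec![
        KeyValueStoreOp::Put { column: "beacon_state", key: state_root, value: state_bytes },
        KeyValueStoreOp::Put { column: "beacon_state_temporary", key: state_root, value: vec![] },
    ]
}

// 2. On successful block processing, atomically delete the marker so the
//    state becomes permanent; the start-up GC deletes any state whose
//    marker is still present.
fn finalize_state(state_root: [u8; 32]) -> Vec<KeyValueStoreOp> {
    vec![KeyValueStoreOp::Delete { column: "beacon_state_temporary", key: state_root }]
}
```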

## Additional Info

There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant.

### Race 1: Permanent state marked temporary

EDIT: this has been fixed by the addition of a lock around the relevant critical section

There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events:

1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`.
2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag.
3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction.
4. Either:
    a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain! Alternatively, (4b) happens...
    b) Thread 1 finishes and re-deletes the temporary flag. In this case the failure is transient: state `s` will disappear temporarily, but will come back once thread 1 finishes running.

I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know.

This once again begs the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn).

### Race 2: Temporary state returned from `get_state`

I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data).

This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.
2020-10-23 01:27:51 +00:00
realbigsean
9d2d6239cd Weak subjectivity start from genesis (#1675)
## Issue Addressed
Solution 2 proposed here: https://github.com/sigp/lighthouse/issues/1435#issuecomment-692317639

## Proposed Changes
- Adds an optional `--wss-checkpoint` flag that takes a string `root:epoch`
- Verify that the given checkpoint exists in the chain, or that the chain syncs through this checkpoint. If not, shut down and prompt the user to purge state before restarting.
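
Parsing the `root:epoch` string might look like this sketch (a raw 32-byte array stands in for `Hash256`, and the error handling is simplified):

```rust
use std::str::FromStr;

// Sketch only: stand-in for the real checkpoint type.
struct WssCheckpoint {
    root: [u8; 32],
    epoch: u64,
}

impl FromStr for WssCheckpoint {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let (root_str, epoch_str) = s
            .split_once(':')
            .ok_or_else(|| "expected format root:epoch".to_string())?;
        let root_hex = root_str.trim_start_matches("0x");
        if root_hex.len() != 64 {
            return Err("root must be 32 bytes of hex".to_string());
        }
        let mut root = [0u8; 32];
        for (i, byte) in root.iter_mut().enumerate() {
            *byte = u8::from_str_radix(&root_hex[2 * i..2 * i + 2], 16)
                .map_err(|e| e.to_string())?;
        }
        let epoch = epoch_str.parse().map_err(|_| "invalid epoch".to_string())?;
        Ok(WssCheckpoint { root, epoch })
    }
}
```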

## Additional Info


Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-10-01 01:41:58 +00:00
Paul Hauner
93b7c3b7ff Set default max skips to 700 (#1542)
## Issue Addressed

NA

## Proposed Changes

Sets the default max skips to 700 so that it can cover the 693-slot skip from `80894 - 80201`.

## Additional Info

NA
2020-08-18 09:27:04 +00:00
Michael Sproul
719a69aee0 Ignore blocks that skip a large distance from their parent (#1530)
## Proposed Changes

To mitigate the impact of minority forks on RAM and disk usage, this change rejects blocks whose parent lies more than 320 slots (10 epochs, ~1 hour) in the past. The behaviour is configurable via `lighthouse bn --max-skip-slots N`, and can be turned off entirely using `--max-skip-slots none`.
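
The check itself is simple; roughly (with plain integers standing in for `Slot`):

```rust
// Sketch only: reject a block whose parent is too far in the past.
fn exceeds_max_skip_slots(block_slot: u64, parent_slot: u64, max_skip_slots: Option<u64>) -> bool {
    match max_skip_slots {
        // --max-skip-slots none: never reject on skip distance.
        None => false,
        Some(max) => block_slot.saturating_sub(parent_slot) > max,
    }
}
```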

Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-08-17 10:54:58 +00:00