lighthouse

mirror of https://github.com/sigp/lighthouse.git synced 2026-03-17 03:42:46 +00:00

Author	SHA1	Message	Date
chonghe	522bd9e9c6	Update Rust Edition to 2024 (#7766 ) * #7749 Thanks @dknopik and @michaelsproul for your help!	2025-08-13 03:04:31 +00:00
Jimmy Chen	40c2fd5ff4	Instrument tracing spans for block processing and import (#7816 ) #7815 - removes all existing spans, so some span fields that appear in logs like `service_name` may be lost. - instruments a few key code paths in the beacon node, starting from root spans named below: * Gossip block and blobs * `process_gossip_data_column_sidecar` * `process_gossip_blob` * `process_gossip_block` * Rpc block and blobs * `process_rpc_block` * `process_rpc_blobs` * `process_rpc_custody_columns` * Rpc blocks (range and backfill) * `process_chain_segment` * `PendingComponents` lifecycle * `pending_components` To test locally: * Run Grafana and Tempo with https://github.com/sigp/lighthouse-metrics/pull/57 * Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317` Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg Removing the old spans seem to have reduced the memory usage quite a lot - i think we were using them on long running tasks and too excessively: <img width="910" height="495" alt="image" src="https://github.com/user-attachments/assets/5208bbe4-53b2-4ead-bc71-0b782c788669" />	2025-08-08 05:32:22 +00:00
Michael Sproul	0dcce40ccb	Fix Clippy for Rust 1.90 beta (#7826 ) Fix Clippy for recently released Rust 1.90 beta. There may be more changes required when Rust 1.89 stable is released in a few days, but possibly not 🤞	2025-08-05 13:52:26 +00:00
Lion - dapplion	dd98534158	Hierarchical state diffs in hot DB (#6750 ) This PR implements https://github.com/sigp/lighthouse/pull/5978 (tree-states) but on the hot DB. It allows Lighthouse to massively reduce its disk footprint during non-finality and overall I/O in all cases. Closes https://github.com/sigp/lighthouse/issues/6580 Conga into https://github.com/sigp/lighthouse/pull/6744 ### TODOs - [x] Fix OOM in CI https://github.com/sigp/lighthouse/pull/7176 - [x] optimise store_hot_state to avoid storing a duplicate state if the summary already exists (should be safe from races now that pruning is cleaner) - [x] mispelled: get_ancenstor_state_root - [x] get_ancestor_state_root should use state summaries - [x] Prevent split from changing during ancestor calc - [x] Use same hierarchy for hot and cold ### TODO Good optimization for future PRs - [ ] On the migration, if the latest hot snapshot is aligned with the cold snapshot migrate the diffs instead of the full states. ``` align slot time 10485760 Nov-26-2024 12582912 Sep-14-2025 14680064 Jul-02-2026 ``` ### TODO Maybe things good to have - [ ] Rename anchor_slot https://github.com/sigp/lighthouse/compare/tree-states-hot-rebase-oom...dapplion:lighthouse:tree-states-hot-anchor-slot-rename?expand=1 - [ ] Make anchor fields not public such that they must be mutated through a method. To prevent un-wanted changes of the anchor_slot ### NOTTODO - [ ] Use fork-choice and a new method [`descendants_of_checkpoint`](`ca2388e196 (diff-046fbdb517ca16b80e4464c2c824cf001a74a0a94ac0065e635768ac391062a8)`) to filter only the state summaries that descend of finalized checkpoint]	2025-06-19 02:43:25 +00:00
Jimmy Chen	759b0612b3	Offloading KZG Proof Computation from the beacon node (#7117 ) Addresses #7108 - Add EL integration for `getPayloadV5` and `getBlobsV2` - Offload proof computation and use proofs from EL RPC APIs	2025-04-08 07:37:16 +00:00
Lion - dapplion	6f31d44343	Remove CGC from data_availability checker (#7033 ) - Part of https://github.com/sigp/lighthouse/issues/6767 Validator custody makes the CGC and set of sampling columns dynamic. Right now this information is stored twice: - in the data availability checker - in the network globals If that state becomes dynamic we must make sure it is in sync updating it twice, or guarding it behind a mutex. However, I noted that we don't really have to keep the CGC inside the data availability checker. All consumers can actually read it from the network globals, and we can update `make_available` to read the expected count of data columns from the block.	2025-03-26 05:19:51 +00:00
ThreeHrSleep	d60c24ef1c	Integrate tracing (#6339 ) Tracing Integration - [reference](`5bbf1859e9/projects/project-ideas.md (L297)`) - [x] replace slog & log with tracing throughout the codebase - [x] implement custom crit log - [x] make relevant changes in the formatter - [x] replace sloggers - [x] re-write SSE logging components cc: @macladson @eserilev	2025-03-12 22:31:05 +00:00
Lion - dapplion	3fab6a2c0b	Block availability data enum (#6866 ) PeerDAS has undergone multiple refactors + the blending with the get_blobs optimization has generated technical debt. A function signature like this `f008b84079/beacon_node/beacon_chain/src/beacon_chain.rs (L7171-L7178)` Allows at least the following combination of states: - blobs: Some / None - data_columns: Some / None - data_column_recv: Some / None - Block has data? Yes / No - Block post-PeerDAS? Yes / No In reality, we don't have that many possible states, only: - `NoData`: pre-deneb, pre-PeerDAS with 0 blobs or post-PeerDAS with 0 blobs - `Blobs(BlobSidecarList<E>)`: post-Deneb pre-PeerDAS with > 0 blobs - `DataColumns(DataColumnSidecarList<E>)`: post-PeerDAS with > 0 blobs - `DataColumnsRecv(oneshot::Receiver<DataColumnSidecarList<E>>)`: post-PeerDAS with > 0 blobs, but we obtained the columns via reconstruction ^ this are the variants of the new `AvailableBlockData` enum So we go from 2^5 states to 4 well-defined. Downstream code benefits nicely from this clarity and I think it makes the whole feature much more maintainable. Currently `is_available` returns a bool, and then we construct the available block in `make_available`. In a way the availability condition is duplicated in both functions. Instead, this PR constructs `AvailableBlockData` in `is_available` so the availability conditions are written once ```rust if let Some(block_data) = is_available(..) { let available_block = make_available(block_data); } ```	2025-02-24 04:47:09 +00:00
chonghe	d6596dbe21	Keep execution payload during historical backfill when prune-payloads set to false (#6766 ) - #6510 - Keep execution payload during historical backfill when `--prune-payloads false` is set - Add a field in the historical backfill debug log to indicate if execution payload is kept - Add a test to check historical blocks has execution payload when `--prune-payloads false is set - Very minor typo correction that I notice when working on this	2025-02-07 09:19:29 +00:00
Eitan Seri-Levi	a1b7d616b4	Modularize beacon node backend (#4718 ) #4669 Modularize the beacon node backend to make it easier to add new database implementations	2025-01-23 02:12:16 +00:00
Michael Sproul	9fdd53df56	Hierarchical state diffs (#5978 ) * Start extracting freezer changes for tree-states * Remove unused config args * Add comments * Remove unwraps * Subjective more clear implementation * Clean up hdiff * Update xdelta3 * Tree states archive metrics (#6040) * Add store cache size metrics * Add compress timer metrics * Add diff apply compute timer metrics * Add diff buffer cache hit metrics * Add hdiff buffer load times * Add blocks replayed metric * Move metrics to store * Future proof some metrics --------- Co-authored-by: Michael Sproul <michael@sigmaprime.io> * Port and clean up forwards iterator changes * Add and polish hierarchy-config flag * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Cleaner errors * Fix beacon_chain test compilation * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Patch a few more freezer block roots * Fix genesis block root bug * Fix test failing due to pending updates * Beacon chain tests passing * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Fix doc lint * Implement DB schema upgrade for hierarchical state diffs (#6193) * DB upgrade * Add flag * Delete RestorePointHash * Update docs * Update docs * Implement hierarchical state diffs config migration (#6245) * Implement hierarchical state diffs config migration * Review PR * Remove TODO * Set CURRENT_SCHEMA_VERSION correctly * Fix genesis state loading * Re-delete some PartialBeaconState stuff --------- Co-authored-by: Michael Sproul <michael@sigmaprime.io> * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Fix test compilation * Update schema downgrade test * Fix tests * Fix null anchor migration * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Fix tree states upgrade migration (#6328) * Towards crash safety * Fix compilation * Move cold summaries and state roots to new columns * Rename StateRoots chunked field * Update prune states * Clean hdiff CLI flag and metrics * Fix "staged reconstruction" * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Fix alloy issues * Fix staged reconstruction logic * Prevent weird slot drift * Remove "allow" flag * Update CLI help * Remove FIXME about downgrade * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Remove some unnecessary error variants * Fix new test * Tree states archive - review comments and metrics (#6386) * Review PR comments and metrics * Comments * Add anchor metrics * drop prev comment * Update metadata.rs * Apply suggestions from code review --------- Co-authored-by: Michael Sproul <micsproul@gmail.com> * Update beacon_node/store/src/hot_cold_store.rs Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com> * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Clarify comment and remove anchor_slot garbage * Simplify database anchor (#6397) * Simplify database anchor * Update beacon_node/store/src/reconstruct.rs * Add migration for anchor * Fix and simplify light_client store tests * Fix incompatible config test * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * More metrics * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * New historic state cache (#6475) * New historic state cache * Add more metrics * State cache hit rate metrics * Fix store metrics * More logs and metrics * Fix logger * Ensure cached states have built caches :O * Replay blocks in preference to diffing * Two separate caches * Distribute cache build time to next slot * Re-plumb historic-state-cache flag * Clean up metrics * Update book * Update beacon_node/store/src/hdiff.rs Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com> * Update beacon_node/store/src/historic_state_cache.rs Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com> --------- Co-authored-by: Lion - dapplion <35266934+dapplion@users.noreply.github.com> * Update database docs * Update diagram * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Update lockbud to work with bindgen/etc * Correct pkg name for Debian * Remove vestigial epochs_per_state_diff * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Markdown lint * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Address Jimmy's review comments * Simplify ReplayFrom case * Fix and document genesis_state_root * Typo Co-authored-by: Jimmy Chen <jchen.tc@gmail.com> * Merge branch 'unstable' into tree-states-archive * Compute diff of validators list manually (#6556) * Split hdiff computation * Dedicated logic for historical roots and summaries * Benchmark against real states * Mutated source? * Version the hdiff * Add lighthouse DB config for hierarchy exponents * Tidy up hierarchy exponents flag * Apply suggestions from code review Co-authored-by: Michael Sproul <micsproul@gmail.com> * Address PR review * Remove hardcoded paths in benchmarks * Delete unused function in benches * lint --------- Co-authored-by: Michael Sproul <michael@sigmaprime.io> * Test hdiff binary format stability (#6585) * Merge remote-tracking branch 'origin/unstable' into tree-states-archive * Add deprecation warning for SPRP * Update xdelta to get rid of duplicate deps * Document test	2024-11-18 01:51:44 +00:00
Lion - dapplion	38388979db	Strict match of errors in backfill sync (#6520 ) * Strict match of errors in backfill sync * Fix tests	2024-11-05 01:00:10 +00:00
Jimmy Chen	f3a5e256da	Implement Subnet Sampling for PeerDAS (#6410 ) * Add `SAMPLES_PER_SLOT` config. * Rename `sampling` module to `peer_sampling` * Implement subnet sampling. * Update lookup test. * Merge branch 'unstable' into subnet-sampling * Merge branch 'unstable' into subnet-sampling # Conflicts: # beacon_node/beacon_chain/src/data_availability_checker.rs # beacon_node/http_api/src/publish_blocks.rs # beacon_node/lighthouse_network/src/types/globals.rs # beacon_node/network/src/sync/manager.rs * Merge branch 'unstable' into subnet-sampling	2024-10-04 00:27:30 +00:00
Eitan Seri-Levi	99e53b88c3	Migrate from `ethereum-types` to `alloy-primitives` (#6078 ) * Remove use of ethers_core::RlpStream * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * Remove old code * Simplify keccak call * Remove unused package * Merge branch 'unstable' of https://github.com/ethDreamer/lighthouse into remove_use_of_ethers_core * Merge branch 'unstable' into remove_use_of_ethers_core * Run clippy * Merge branch 'remove_use_of_ethers_core' of https://github.com/dospore/lighthouse into remove_use_of_ethers_core * Check all cargo fmt * migrate to alloy primitives init * fix deps * integrate alloy-primitives * resolve dep issues * more changes based on dep changes * add TODOs * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * Revert lock * Add BeaconBlocksByRange v3 * continue migration * Revert "Add BeaconBlocksByRange v3" This reverts commit `e3ce7fc5ea`. * impl hash256 extended trait * revert some uneeded diffs * merge conflict resolved * fix subnet id rshift calc * rename to FixedBytesExtended * debugging * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fix failed test * fixing more tests * Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove_use_of_ethers_core * introduce a shim to convert between the two u256 types * move alloy to wrokspace * align alloy versions * update * update web3signer test certs * refactor * resolve failing tests * linting * fix graffiti string test * fmt * fix ef test * resolve merge conflicts * remove udep and revert cert * cargo patch * cyclic dep * fix build error * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * resolve conflicts, update deps * merge unstable * fmt * fix deps * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * resolve merge conflicts * resolve conflicts, make necessary changes * Remove patch * fmt * remove file * merge conflicts * sneaking in a smol change * bump versions * Merge remote-tracking branch 'origin/unstable' into migrate-to-alloy-primitives * Updates for peerDAS * Update ethereum_hashing to prevent dupe * updated alloy-consensus, removed TODOs * cargo update * endianess fix * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fmt * fix merge * fix test * fixed_bytes crate * minor fixes * convert u256 to i64 * panic free mixin to_low_u64_le * from_str_radix * computbe_subnet api and ensuring we use big-endian * Merge branch 'unstable' of https://github.com/sigp/lighthouse into migrate-to-alloy-primitives * fix test * Simplify subnet_id test * Simplify some more tests * Add tests to fixed_bytes crate * Merge branch 'unstable' into migrate-to-alloy-primitives	2024-09-02 08:03:24 +00:00
Jimmy Chen	18df7010c3	Persist data columns to store (#6255 ) * Persist data columns (from das PR #5196)	2024-08-14 04:36:24 +00:00
Jimmy Chen	acd3151184	Import gossip data column into data availability checker (#6197 ) * Import gossip data column into data availability checker	2024-08-02 12:42:11 +00:00
Michael Sproul	6f3af67362	Fix off-by-one in backfill sig verification (#5120 ) * Fix off-by-one in backfill sig verification * Add self-referential PR link	2024-01-30 00:33:01 +00:00
realbigsean	1cebf41452	Backfill blob storage fix (#5119 ) * store blobs in the correct db in backfill * add database migration * add migration file * remove log info suggesting deneb isn't schedule * add batching in blob migration	2024-01-24 09:35:02 +11:00
Michael Sproul	c574f8136e	Fix block backfill with genesis skip slots (#4820 ) ## Issue Addressed Closes #4817. ## Proposed Changes - Fill in the linear block roots array between 0 and the slot of the first block (e.g. slots 0 and 1 on Holesky). - Backport the `--freezer`, `--skip` and `--limit` options for `lighthouse db inspect` from tree-states. This allows us to easily view the database corruption of 4817 using `lighthouse db inspect --network holesky --freezer --column bbr --output values --limit 2`. - Backport the `iter_column_from` change and `MemoryStore` overhaul from tree-states. These are required to enable `lighthouse db inspect`. - Rework `freezer_upper_limit` to allow state lookups for slots below the `state_lower_limit`. Currently state lookups will fail until state reconstruction completes entirely. There is a new regression test for the main bug, but no test for the `freezer_upper_limit` fix because we don't currently support running state reconstruction partially (see #3026). This will be fixed once we merge `tree-states`! In lieu of an automated test, I've tested manually on a Holesky node while it was reconstructing. ## Additional Info Users who backfilled Holesky to slot 0 (e.g. using `--reconstruct-historic-states`) need to either: - Re-sync from genesis. - Re-sync using checkpoint sync and the changes from this PR. Due to the recency of the Holesky genesis, writing a custom pass to fix up broken databases (which would require its own thorough testing) was deemed unnecessary. This is the primary reason for this PR being marked `backwards-incompat`. This will create few conflicts with Deneb, which I've already resolved on `tree-states-deneb` and will be happy to backport to Deneb once this PR is merged to unstable.	2023-10-27 05:08:49 +00:00
Jimmy Chen	4ad7e15732	Address Clippy 1.73 lints on Deneb branch (#4810 ) * Address Clippy 1.73 lints (#4809) ## Proposed Changes Fix Clippy lints enabled by default in Rust 1.73.0, released today. * Address Clippy 1.73 lints. --------- Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2023-10-06 12:23:57 +05:30
realbigsean	c7ddf1f0b1	add processing and processed caching to the DA checker (#4732 ) * add processing and processed caching to the DA checker * move processing cache out of critical cache * get it compiling * fix lints * add docs to `AvailabilityView` * some self review * fix lints * fix beacon chain tests * cargo fmt * make availability view easier to implement, start on testing * move child component cache and finish test * cargo fix * cargo fix * cargo fix * fmt and lint * make blob commitments not optional, rename some caches, add missing blobs struct * Update beacon_node/beacon_chain/src/data_availability_checker/processing_cache.rs Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com> * marks review feedback and other general cleanup * cargo fix * improve availability view docs * some renames * some renames and docs * fix should delay lookup logic * get rid of some wrapper methods * fix up single lookup changes * add a couple docs * add single blob merge method and improve process_... docs * update some names * lints * fix merge * remove blob indices from lookup creation log * remove blob indices from lookup creation log * delayed lookup logging improvement * check fork choice before doing any blob processing * remove unused dep * Update beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs Co-authored-by: Michael Sproul <micsproul@gmail.com> * Update beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs Co-authored-by: Michael Sproul <micsproul@gmail.com> * Update beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs Co-authored-by: Michael Sproul <micsproul@gmail.com> * Update beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs Co-authored-by: Michael Sproul <micsproul@gmail.com> * Update beacon_node/network/src/sync/block_lookups/delayed_lookup.rs Co-authored-by: Michael Sproul <micsproul@gmail.com> * remove duplicate deps * use gen range in random blobs geneartor * rename processing cache fields * require block root in rpc block construction and check block root consistency * send peers as vec in single message * spawn delayed lookup service from network beacon processor * fix tests --------- Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com> Co-authored-by: Michael Sproul <micsproul@gmail.com>	2023-10-03 09:59:33 -04:00
Michael Sproul	9244f7f7bc	Improvements to Deneb `store` upon review (#4693 ) * Start testing blob pruning * Get rid of unnecessary orphaned blob column * Make random blob tests deterministic * Test for pruning being blocked by finality * Fix bugs and test fork boundary * A few more tweaks to pruning conditions * Tweak oldest_blob_slot semantics * Test margin pruning * Clean up some terminology and lints * Schema migrations for v18 * Remove FIXME * Prune blobs on finalization not every slot * Fix more bugs + tests * Address review comments	2023-09-25 14:21:54 -04:00
realbigsean	7d468cb487	More deneb cleanup (#4640 ) * remove protoc and token from network tests github action * delete unused beacon chain methods * downgrade writing blobs to store log * reduce diff in block import logic * remove some todo's and deneb built in network * remove unnecessary error, actually use some added metrics * remove some metrics, fix missing components on publish funcitonality * fix status tests * rename sidecar by root to blobs by root * clean up some metrics * remove unnecessary feature gate from attestation subnet tests, clean up blobs by range response code * pawan's suggestion in `protocol_info`, peer score in matching up batch sync block and blobs * fix range tests for deneb * pub block and blob db cache behind the same mutex * remove unused errs and an empty file * move sidecar trait to new file * move types from payload to eth2 crate * update comment and add flag value name * make function private again, remove allow unused * use reth rlp for tx decoding * fix compile after merge * rename kzg commitments * cargo fmt * remove unused dep * Update beacon_node/execution_layer/src/lib.rs Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com> * Update beacon_node/beacon_processor/src/lib.rs Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com> * pawan's suggestiong for vec capacity * cargo fmt * Revert "use reth rlp for tx decoding" This reverts commit `5181837d81`. * remove reth rlp --------- Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>	2023-08-20 21:17:17 -04:00
ethDreamer	ceaa740841	Validate & Store Blobs During Backfill (#4307 ) * Verify and Store Blobs During Backfill * Improve logs * Eliminated Clone * Fix Inital Vector Capacity * Addressed Sean's Comments	2023-06-05 09:09:42 -04:00
Jimmy Chen	70c4ae35ab	Merge branch 'unstable' into deneb-free-blobs # Conflicts: # .github/workflows/docker.yml # .github/workflows/local-testnet.yml # .github/workflows/test-suite.yml # Cargo.lock # Cargo.toml # beacon_node/beacon_chain/src/beacon_chain.rs # beacon_node/beacon_chain/src/builder.rs # beacon_node/beacon_chain/src/test_utils.rs # beacon_node/execution_layer/src/engine_api/json_structures.rs # beacon_node/network/src/beacon_processor/mod.rs # beacon_node/network/src/beacon_processor/worker/gossip_methods.rs # beacon_node/network/src/sync/backfill_sync/mod.rs # beacon_node/store/src/config.rs # beacon_node/store/src/hot_cold_store.rs # common/eth2_network_config/Cargo.toml # consensus/ssz/src/decode/impls.rs # consensus/ssz_derive/src/lib.rs # consensus/ssz_derive/tests/tests.rs # consensus/ssz_types/src/serde_utils/mod.rs # consensus/tree_hash/src/impls.rs # consensus/tree_hash/src/lib.rs # consensus/types/Cargo.toml # consensus/types/src/beacon_state.rs # consensus/types/src/chain_spec.rs # consensus/types/src/eth_spec.rs # consensus/types/src/fork_name.rs # lcli/Cargo.toml # lcli/src/main.rs # lcli/src/new_testnet.rs # scripts/local_testnet/el_bootnode.sh # scripts/local_testnet/genesis.json # scripts/local_testnet/geth.sh # scripts/local_testnet/setup.sh # scripts/local_testnet/start_local_testnet.sh # scripts/local_testnet/vars.env # scripts/tests/doppelganger_protection.sh # scripts/tests/genesis.json # scripts/tests/vars.env # testing/ef_tests/Cargo.toml # validator_client/src/block_service.rs	2023-05-30 22:44:05 +10:00
Age Manning	35ca086269	Backfill blocks only to the WSP by default (#4082 ) ## Limit Backfill Sync This PR transitions Lighthouse from syncing all the way back to genesis to only syncing back to the weak subjectivity point (~ 5 months) when syncing via a checkpoint sync. There are a number of important points to note with this PR: - Firstly and most importantly, this PR fundamentally shifts the default security guarantees of checkpoint syncing in Lighthouse. Prior to this PR, Lighthouse could verify the checkpoint of any given chain by ensuring the chain eventually terminates at the corresponding genesis. This guarantee can still be employed via the new CLI flag --genesis-backfill which will prompt lighthouse to the old behaviour of downloading all blocks back to genesis. The new behaviour only checks the proposer signatures for the last 5 months of blocks but cannot guarantee the chain matches the genesis chain. - I have not modified any of the peer scoring or RPC responses. Clients syncing from gensis, will downscore new Lighthouse peers that do not possess blocks prior to the WSP. This is by design, as Lighthouse nodes of this form, need a mechanism to sort through peers in order to find useful peers in order to complete their genesis sync. We therefore do not discriminate between empty/error responses for blocks prior or post the local WSP. If we request a block that a peer does not posses, then fundamentally that peer is less useful to us than other peers. - This will make a radical shift in that the majority of nodes will no longer store the full history of the chain. In the future we could add a pruning mechanism to remove old blocks from the db also. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2023-05-05 03:49:23 +00:00
realbigsean	e02fcb30ab	dont verify the signature of the genesis block in backfill sync (#3846 )	2023-01-03 17:23:01 -05:00
Paul Hauner	be4e261e74	Use async code when interacting with EL (#3244 ) ## Overview This rather extensive PR achieves two primary goals: 1. Uses the finalized/justified checkpoints of fork choice (FC), rather than that of the head state. 2. Refactors fork choice, block production and block processing to `async` functions. Additionally, it achieves: - Concurrent forkchoice updates to the EL and cache pruning after a new head is selected. - Concurrent "block packing" (attestations, etc) and execution payload retrieval during block production. - Concurrent per-block-processing and execution payload verification during block processing. - The `Arc`-ification of `SignedBeaconBlock` during block processing (it's never mutated, so why not?): - I had to do this to deal with sending blocks into spawned tasks. - Previously we were cloning the beacon block at least 2 times during each block processing, these clones are either removed or turned into cheaper `Arc` clones. - We were also `Box`-ing and un-`Box`-ing beacon blocks as they moved throughout the networking crate. This is not a big deal, but it's nice to avoid shifting things between the stack and heap. - Avoids cloning all the blocks in every chain segment during sync. - It also has the potential to clean up our code where we need to pass an owned block around so we can send it back in the case of an error (I didn't do much of this, my PR is already big enough 😅) - The `BeaconChain::HeadSafetyStatus` struct was removed. It was an old relic from prior merge specs. For motivation for this change, see https://github.com/sigp/lighthouse/pull/3244#issuecomment-1160963273 ## Changes to `canonical_head` and `fork_choice` Previously, the `BeaconChain` had two separate fields: ``` canonical_head: RwLock<Snapshot>, fork_choice: RwLock<BeaconForkChoice> ``` Now, we have grouped these values under a single struct: ``` canonical_head: CanonicalHead { cached_head: RwLock<Arc<Snapshot>>, fork_choice: RwLock<BeaconForkChoice> } ``` Apart from ergonomics, the only actual change here is wrapping the canonical head snapshot in an `Arc`. This means that we no longer need to hold the `cached_head` (`canonical_head`, in old terms) lock when we want to pull some values from it. This was done to avoid deadlock risks by preventing functions from acquiring (and holding) the `cached_head` and `fork_choice` locks simultaneously. ## Breaking Changes ### The `state` (root) field in the `finalized_checkpoint` SSE event Consider the scenario where epoch `n` is just finalized, but `start_slot(n)` is skipped. There are two state roots we might in the `finalized_checkpoint` SSE event: 1. The state root of the finalized block, which is `get_block(finalized_checkpoint.root).state_root`. 4. The state root at slot of `start_slot(n)`, which would be the state from (1), but "skipped forward" through any skip slots. Previously, Lighthouse would choose (2). However, we can see that when [Teku generates that event](`de2b2801c8/data/beaconrestapi/src/main/java/tech/pegasys/teku/beaconrestapi/handlers/v1/events/EventSubscriptionManager.java (L171-L182)`) it uses [`getStateRootFromBlockRoot`](`de2b2801c8/data/provider/src/main/java/tech/pegasys/teku/api/ChainDataProvider.java (L336-L341)`) which uses (1). I have switched Lighthouse from (2) to (1). I think it's a somewhat arbitrary choice between the two, where (1) is easier to compute and is consistent with Teku. ## Notes for Reviewers I've renamed `BeaconChain::fork_choice` to `BeaconChain::recompute_head`. Doing this helped ensure I broke all previous uses of fork choice and I also find it more descriptive. It describes an action and can't be confused with trying to get a reference to the `ForkChoice` struct. I've changed the ordering of SSE events when a block is received. It used to be `[block, finalized, head]` and now it's `[block, head, finalized]`. It was easier this way and I don't think we were making any promises about SSE event ordering so it's not "breaking". I've made it so fork choice will run when it's first constructed. I did this because I wanted to have a cached version of the last call to `get_head`. Ensuring `get_head` has been run at least once means that the cached values doesn't need to wrapped in an `Option`. This was fairly simple, it just involved passing a `slot` to the constructor so it knows when it's being run. When loading a fork choice from the store and a slot clock isn't handy I've just used the `slot` that was saved in the `fork_choice_store`. That seems like it would be a faithful representation of the slot when we saved it. I added the `genesis_time: u64` to the `BeaconChain`. It's small, constant and nice to have around. Since we're using FC for the fin/just checkpoints, we no longer get the `0x00..00` roots at genesis. You can see I had to remove a work-around in `ef-tests` here: b56be3bc2. I can't find any reason why this would be an issue, if anything I think it'll be better since the genesis-alias has caught us out a few times (0x00..00 isn't actually a real root). Edit: I did find a case where the `network` expected the 0x00..00 alias and patched it here: 3f26ac3e2. You'll notice a lot of changes in tests. Generally, tests should be functionally equivalent. Here are the things creating the most diff-noise in tests: - Changing tests to be `tokio::async` tests. - Adding `.await` to fork choice, block processing and block production functions. - Refactor of the `canonical_head` "API" provided by the `BeaconChain`. E.g., `chain.canonical_head.cached_head()` instead of `chain.canonical_head.read()`. - Wrapping `SignedBeaconBlock` in an `Arc`. - In the `beacon_chain/tests/block_verification`, we can't use the `lazy_static` `CHAIN_SEGMENT` variable anymore since it's generated with an async function. We just generate it in each test, not so efficient but hopefully insignificant. I had to disable `rayon` concurrent tests in the `fork_choice` tests. This is because the use of `rayon` and `block_on` was causing a panic. Co-authored-by: Mac L <mjladson@pm.me>	2022-07-03 05:36:50 +00:00
Michael Sproul	bcdd960ab1	Separate execution payloads in the DB (#3157 ) ## Proposed Changes Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database. ⚠️ This is achieved in a backwards-incompatible way for networks that have already merged ⚠️. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins. The main changes are: - New column in the database called `ExecPayload`, keyed by beacon block root. - The `BeaconBlock` column now stores blinded blocks only. - Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc. - On finalization: - `prune_abanonded_forks` deletes non-canonical payloads whilst deleting non-canonical blocks. - `migrate_db` deletes finalized canonical payloads whilst deleting finalized states. - Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134. - The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call. - I've tested manually that it works on Kiln, using Geth and Nethermind. - This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146. - We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134. - Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed. - Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated). ## Additional Info - [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller. - [x] We should measure the latency of blocks-by-root and blocks-by-range responses. - [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159) - [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2022-05-12 00:42:17 +00:00
Michael Sproul	5cde3fc4da	Reduce lock contention in backfill sync (#2716 ) ## Proposed Changes Clone the proposer pubkeys during backfill signature verification to reduce the time that the pubkey cache lock is held for. Cloning such a small number of pubkeys has negligible impact on the total running time, but greatly reduces lock contention. On a Ryzen 5950X, the setup step seems to take around 180us regardless of whether the key is cloned or not, while the verification takes 7ms. When Lighthouse is limited to 10% of one core using `sudo cpulimit --pid <pid> --limit 10` the total time jumps up to 800ms, but the setup step remains only 250us. This means that under heavy load this PR could cut the time the lock is held for from 800ms to 250us, which is a huge saving of 99.97%!	2021-10-15 03:28:03 +00:00
Michael Sproul	ed1fc7cca6	Fix I/O atomicity issues with checkpoint sync (#2671 ) ## Issue Addressed This PR addresses an issue found by @YorickDowne during testing of v2.0.0-rc.0. Due to a lack of atomic database writes on checkpoint sync start-up, it was possible for the database to get into an inconsistent state from which it couldn't recover without `--purge-db`. The core of the issue was that the store's anchor info was being stored _before_ the `PersistedBeaconChain`. If a crash occured so that anchor info was stored but _not_ the `PersistedBeaconChain`, then on restart Lighthouse would think the database was unitialized and attempt to compare-and-swap a `None` value, but would actually find the stale info from the previous run. ## Proposed Changes The issue is fixed by writing the anchor info, the split point, and the `PersistedBeaconChain` atomically on start-up. Some type-hinting ugliness was required, which could possibly be cleaned up in future refactors.	2021-10-05 03:53:17 +00:00
Michael Sproul	9667dc2f03	Implement checkpoint sync (#2244 ) ## Issue Addressed Closes #1891 Closes #1784 ## Proposed Changes Implement checkpoint sync for Lighthouse, enabling it to start from a weak subjectivity checkpoint. ## Additional Info - [x] Return unavailable status for out-of-range blocks requested by peers (#2561) - [x] Implement sync daemon for fetching historical blocks (#2561) - [x] Verify chain hashes (either in `historical_blocks.rs` or the calling module) - [x] Consistency check for initial block + state - [x] Fetch the initial state and block from a beacon node HTTP endpoint - [x] Don't crash fetching beacon states by slot from the API - [x] Background service for state reconstruction, triggered by CLI flag or API call. Considered out of scope for this PR: - Drop the requirement to provide the `--checkpoint-block` (this would require some pretty heavy refactoring of block verification) Co-authored-by: Diva M <divma@protonmail.com>	2021-09-22 00:37:28 +00:00

32 Commits