Commit Graph

95 Commits

Author SHA1 Message Date
Jimmy Chen
6f754bfd8d Merge branch 'peerdas-devnet-7' into peerdas-rangesync 2025-06-05 23:39:03 +10:00
Lion - dapplion
d457ceeaaf Don't create child lookup if parent is faulty (#7118)
Issue discovered on PeerDAS devnet (node `lighthouse-geth-2.peerdas-devnet-5.ethpandaops.io`). Summary:

- A lookup is created for block root `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- That block or a parent is faulty and `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` is added to the failed chains cache
- We later receive a block that is a child of a child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- We create a lookup, which attempts to process the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` and hit a processor error `UnknownParent`, hitting this line

bf955c7543/beacon_node/network/src/sync/block_lookups/mod.rs (L686-L688)

`search_parent_of_child` does not create a parent lookup because the parent root is in the failed chain cache. However, we have **already** marked the child as awaiting the parent. This results in an inconsistent state of lookup sync, as there's a lookup awaiting a parent that doesn't exist.

Now we have a lookup (the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`) that is awaiting a parent lookup that doesn't exist: hence stuck.

### Impact

This bug can affect Mainnet as well as PeerDAS devnets.

This bug may stall lookup sync for a few minutes (up to `LOOKUP_MAX_DURATION_STUCK_SECS = 15 min`) until the stuck prune routine deletes it. By that time the root will be cleared from the failed chain cache and sync should succeed. During that time the user will see a lot of `WARN` logs when attempting to add each peer to the inconsistent lookup. We may also sync the block through range sync if we fall behind by more than 2 epochs. We may also create the parent lookup successfully after the failed cache clears and complete the child lookup.

This bug is triggered if:
- We have a lookup that fails and its root is added to the failed chain cache (much more likely to happen in PeerDAS networks)
- We receive a block that builds on a child of the block added to the failed chain cache


  Ensure that we never create (or leave existing) a lookup that references a non-existing parent.

I added `must_use` lints to the functions that create lookups. To fix the specific bug we must recursively drop the child lookup if the parent is not created. So if `search_parent_of_child` returns `false` now return `LookupRequestError::Failed` instead of `LookupResult::Pending`.

As a bonus I have a added more logging and reason strings to the errors
2025-06-05 08:53:43 +00:00
dapplion
52722b7b2e Resolve TODO(das) 2025-05-27 14:28:52 -05:00
dapplion
4fb2ae658a Implement reliable range sync for PeerDAS 2025-05-22 00:03:25 -05:00
ThreeHrSleep
d60c24ef1c Integrate tracing (#6339)
Tracing Integration
- [reference](5bbf1859e9/projects/project-ideas.md (L297))


  - [x] replace slog & log with tracing throughout the codebase
- [x] implement custom crit log
- [x] make relevant changes in the formatter
- [x] replace sloggers
- [x] re-write SSE logging components

cc: @macladson @eserilev
2025-03-12 22:31:05 +00:00
Lion - dapplion
3992d6ba74 Fix misc PeerDAS todos (#6862)
Address misc PeerDAS TODOs that are not too big for a dedicated PR


  I'll justify each TODO on an inlined comment
2025-02-11 06:07:13 +00:00
Lion - dapplion
95cec45c38 Use data column batch verification consistently (#6851)
Resolve a `TODO(das)` to use KZG batch verification in `put_rpc_custody_columns`


  Uses `verify_kzg_for_data_column_list_with_scoring` in all paths that send more than one column. To use batch verification and have attributability of which peer is sending a bad column.

Needs to move `verify_kzg_for_data_column_list_with_scoring` into the type's module to convert to the KZG verified type.
2025-02-03 06:07:45 +00:00
Lion - dapplion
7a0388ef2a Fix custodial peer assumption on lookup custody requests (#6815)
* Fix custodial peer assumption on lookup custody requests

* lint
2025-01-20 12:31:18 +00:00
Lion - dapplion
8188e036a0 Generalize sync block lookup tests (#6498)
* Generalize sync block lookup tests
2024-10-25 05:50:31 +00:00
Lion - dapplion
a0a62ea3e1 Prevent sync lookups from reverting to awaiting block (#6443)
* Prevent sync lookups from reverting to awaiting block

* Remove stale comment
2024-10-10 23:44:18 +00:00
Lion - dapplion
71c5388461 Transition block lookup sync to range sync (#6122)
* Transition block lookup sync to range sync

* Log unexpected state

* Merge remote-tracking branch 'sigp/unstable' into lookup-to-range

* Add docs

* Merge remote-tracking branch 'sigp/unstable' into lookup-to-range
2024-10-08 21:18:41 +00:00
Michael Sproul
2792705331 Lenient duplicate checks on HTTP API for block publication (#5574)
* start splitting gossip verification

* WIP

* Gossip verify separate (#7)

* save

* save

* make ProvenancedBlock concrete

* delete into gossip verified block contents

* get rid of IntoBlobSidecar trait

* remove IntoGossipVerified trait

* get tests compiling

* don't check sidecar slashability in publish

* remove second publish closure

* drop blob bool. also prefer using message index over index of position in list

* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate

* Fix low-hanging tests

* Fix tests and clean up

* Clean up imports

* more cleanup

* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate

* Further refine behaviour and add tests

* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate

* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate

* Remove empty line

* Fix test (block is not fully imported just gossip verified)

* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate

* Update for unstable & use empty blob list

* Update comment

* Add test for duplicate block case

* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate

* Clarify unreachable case

* Fix another publish_block case

* Remove unreachable case in filter chain segment

* Revert unrelated blob optimisation

* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate

* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate

* Fix merge conflicts

* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate

* Fix some compilation issues. Impl is fucked though

* Support peerDAS

* Fix tests

* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate

* Fix conflict

* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate

* Address review comments

* Merge remote-tracking branch 'origin/unstable' into gossip-verify-separate
2024-09-24 04:52:44 +00:00
Lion - dapplion
d84df5799c Attribute invalid column proof error to correct peer (#6377)
* Attribute invalid column proof error to correct peer

* Update beacon_node/beacon_chain/src/data_availability_checker.rs

Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>

* fix conflicts
2024-09-23 18:49:26 +00:00
Lion - dapplion
ed7cd3bf47 Drop block data from BlockError and BlobError (#5735)
* Drop block data from BlockError and BlobError

* Debug release tests

* Fix release tests

* Revert unnecessary changes to lighthouse_metrics
2024-09-03 01:07:07 +00:00
Lion - dapplion
f75a2cf65b PeerDAS implementation (#5683)
* 1D PeerDAS prototype: Data format and Distribution (#5050)

* Build and publish column sidecars. Add stubs for gossip.

* Add blob column subnets

* Add `BlobColumnSubnetId` and initial compute subnet logic.

* Subscribe to blob column subnets.

* Introduce `BLOB_COLUMN_SUBNET_COUNT` based on DAS configuration parameter changes.

* Fix column sidecar type to use `VariableList` for data.

* Fix lint errors.

* Update types and naming to latest consensus-spec #3574.

* Fix test and some cleanups.

* Merge branch 'unstable' into das

* Merge branch 'unstable' into das

* Merge branch 'unstable' into das

# Conflicts:
#	consensus/types/src/chain_spec.rs

* Add `DataColumnSidecarsByRoot ` req/resp protocol (#5196)

* Add stub for `DataColumnsByRoot`

* Add basic implementation of serving RPC data column from DA checker.

* Store data columns in early attester cache and blobs db.

* Apply suggestions from code review

Co-authored-by: Eitan Seri-Levi <eserilev@gmail.com>
Co-authored-by: Jacob Kaufmann <jacobkaufmann18@gmail.com>

* Fix build.

* Store `DataColumnInfo` in database and various cleanups.

* Update `DataColumnSidecar` ssz max size and remove panic code.

---------

Co-authored-by: Eitan Seri-Levi <eserilev@gmail.com>
Co-authored-by: Jacob Kaufmann <jacobkaufmann18@gmail.com>

* feat: add DAS KZG in data col construction (#5210)

* feat: add DAS KZG in data col construction

* refactor data col sidecar construction

* refactor: add data cols to GossipVerifiedBlockContents

* Disable windows tests for `das` branch. (c-kzg doesn't build on windows)

* Formatting and lint changes only.

* refactor: remove iters in construction of data cols

* Update vec capacity and error handling.

* Add `data_column_sidecar_computation_seconds` metric.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Merge branch 'unstable' into das

# Conflicts:
#	.github/workflows/test-suite.yml
#	beacon_node/lighthouse_network/src/types/topics.rs

* fix: update data col subnet count from 64 to 32 (#5413)

* feat: add peerdas custody field to ENR (#5409)

* feat: add peerdas custody field to ENR

* add hash prefix step in subnet computation

* refactor test and fix possible u64 overflow

* default to min custody value if not present in ENR

* Merge branch 'unstable' into das

* Merge branch 'unstable' into das-unstable-merge-0415

# Conflicts:
#	Cargo.lock
#	beacon_node/beacon_chain/src/data_availability_checker.rs
#	beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#	beacon_node/beacon_chain/src/data_availability_checker/processing_cache.rs
#	beacon_node/lighthouse_network/src/rpc/methods.rs
#	beacon_node/network/src/network_beacon_processor/mod.rs
#	beacon_node/network/src/sync/block_lookups/tests.rs
#	crypto/kzg/Cargo.toml

* Merge remote-tracking branch 'sigp/unstable' into das

* Merge remote-tracking branch 'sigp/unstable' into das

* Fix merge conflicts.

* Send custody data column to `DataAvailabilityChecker` for determining block importability (#5570)

* Only import custody data columns after publishing a block.

* Add `subscribe-all-data-column-subnets` and pass custody column count to `availability_cache`.

* Add custody requirement checks to `availability_cache`.

* Fix config not being passed to DAChecker and add more logging.

* Introduce `peer_das_epoch` and make blobs and columns mutually exclusive.

* Add DA filter for PeerDAS.

* Fix data availability check and use test_logger in tests.

* Fix subscribe to all data column subnets not working correctly.

* Fix tests.

* Only publish column sidecars if PeerDAS is activated. Add `PEER_DAS_EPOCH` chain spec serialization.

* Remove unused data column index in `OverflowKey`.

* Fix column sidecars incorrectly produced when there are no blobs.

* Re-instate index to `OverflowKey::DataColumn` and downgrade noisy debug log to `trace`.

* DAS sampling on sync (#5616)

* Data availability sampling on sync

* Address @jimmygchen review

* Trigger sampling

* Address some review comments and only send `SamplingBlock` sync message after PEER_DAS_EPOCH.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Merge branch 'unstable' into das

# Conflicts:
#	Cargo.lock
#	Cargo.toml
#	beacon_node/beacon_chain/src/block_verification.rs
#	beacon_node/http_api/src/publish_blocks.rs
#	beacon_node/lighthouse_network/src/rpc/codec/ssz_snappy.rs
#	beacon_node/lighthouse_network/src/rpc/protocol.rs
#	beacon_node/lighthouse_network/src/types/pubsub.rs
#	beacon_node/network/src/sync/block_lookups/single_block_lookup.rs
#	beacon_node/store/src/hot_cold_store.rs
#	consensus/types/src/beacon_state.rs
#	consensus/types/src/chain_spec.rs
#	consensus/types/src/eth_spec.rs

* Merge branch 'unstable' into das

* Re-process early sampling requests (#5569)

* Re-process early sampling requests

# Conflicts:
#	beacon_node/beacon_processor/src/work_reprocessing_queue.rs
#	beacon_node/lighthouse_network/src/rpc/methods.rs
#	beacon_node/network/src/network_beacon_processor/rpc_methods.rs

* Update beacon_node/beacon_processor/src/work_reprocessing_queue.rs

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Add missing var

* Beta compiler fixes and small typo fixes.

* Remove duplicate method.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Merge remote-tracking branch 'sigp/unstable' into das

* Fix merge conflict.

* Add data columns by root to currently supported protocol list (#5678)

* Add data columns by root to currently supported protocol list.

* Add missing data column by roots handling.

* Merge branch 'unstable' into das

# Conflicts:
#	Cargo.lock
#	Cargo.toml
#	beacon_node/network/src/sync/block_lookups/tests.rs
#	beacon_node/network/src/sync/manager.rs

* Fix simulator tests on `das` branch (#5731)

* Bump genesis delay in sim tests as KZG setup takes longer for DAS.

* Fix incorrect YAML spacing.

* DataColumnByRange boilerplate (#5353)

* add boilerplate

* fmt

* PeerDAS custody lookup sync (#5684)

* Implement custody sync

* Lint

* Fix tests

* Fix rebase issue

* Add data column kzg verification and update `c-kzg`. (#5701)

* Add data column kzg verification and update `c-kzg`.

* Fix incorrect `Cell` size.

* Add kzg verification on rpc blocks.

* Add kzg verification on rpc data columns.

* Rename `PEER_DAS_EPOCH` to `EIP7594_FORK_EPOCH` for client interop. (#5750)

* Fetch custody columns in range sync (#5747)

* Fetch custody columns in range sync

* Clean up todos

* Remove `BlobSidecar` construction and publish after PeerDAS activated (#5759)

* Avoid building and publishing blob sidecars after PeerDAS.

* Ignore gossip blobs with a slot greater than peer das activation epoch.

* Only attempt to verify blob count and import blobs before PeerDAS.

* #5684 review comments (#5748)

* #5684 review comments.

* Doc and message update only.

* Fix incorrect condition when constructing `RpcBlock` with `DataColumn`s

* Make sampling tests deterministic (#5775)

* PeerDAS spec tests (#5772)

* Add get_custody_columns spec tests.

* Add kzg merkle proof spec tests.

* Add SSZ spec tests.

* Add remaining KZG tests

* Load KZG only once per process, exclude electra tests and add missing SSZ tests.

* Fix lint and missing changes.

* Ignore macOS generated file.

* Merge remote branch 'sigp/unstable' into das

* Merge remote tracking branch 'origin/unstable' into das

* Implement unconditional reconstruction for supernodes (#5781)

* Implement unconditional reconstruction for supernodes

* Move code into KzgVerifiedCustodyDataColumn

* Remove expect

* Add test

* Thanks justin

* Add withhold attack mode for interop (#5788)

* Add withhold attack mode

* Update readme

* Drop added readmes

* Undo styling changes

* Add column gossip verification and handle unknown parent block (#5783)

* Add column gossip verification and handle missing parent for columns.

* Review PR

* Fix rebase issue

* more lint issues :)

---------

Co-authored-by: dapplion <35266934+dapplion@users.noreply.github.com>

* Trigger sampling on sync events (#5776)

* Trigger sampling on sync events

* Update beacon_chain.rs

* Fix tests

* Fix tests

* PeerDAS parameter changes for devnet-0 (#5779)

* Update PeerDAS parameters to latest values.

* Lint fix

* Fix lint.

* Update hardcoded subnet count to 64 (#5791)

* Fix incorrect columns per subnet and config cleanup (#5792)

* Tidy up PeerDAS preset and config values.

* Fix broken config

* Fix DAS branch CI (#5793)

* Fix invalid syntax.

* Update cli doc. Ignore get_custody_columns test temporarily.

* Fix failing test and add verify inclusion test.

* Undo accidentally removed code.

* Only attempt reconstruct columns once. (#5794)

* Re-enable precompute table for peerdas kzg (#5795)

* Merge branch 'unstable' into das

* Update subscription filter. (#5797)

* Remove penalty for duplicate columns (expected due to reconstruction) (#5798)

* Revert DAS config for interop testing. Optimise get_custody_columns function. (#5799)

* Don't perform reconstruction for proposer node as it already has all the columns. (#5806)

* Multithread compute_cells_and_proofs (#5805)

* Multi-thread reconstruct data columns

* Multi-thread path for block production

* Merge branch 'unstable' into das

# Conflicts:
#	.github/workflows/test-suite.yml
#	beacon_node/network/src/sync/block_lookups/mod.rs
#	beacon_node/network/src/sync/block_lookups/single_block_lookup.rs
#	beacon_node/network/src/sync/network_context.rs

* Fix CI errors.

* Move PeerDAS type-level config to configurable `ChainSpec` (#5828)

* Move PeerDAS type level config to `ChainSpec`.

* Fix tests

* Misc custody lookup improvements (#5821)

* Improve custody requests

* Type DataColumnsByRootRequestId

* Prioritize peers and load balance

* Update tests

* Address PR review

* Merge branch 'unstable' into das

* Rename deploy_block in network config (`das` branch) (#5852)

* Rename deploy_block.txt to deposit_contract_block.txt

* fmt

---------

Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>

* Merge branch 'unstable' into das

* Fix CI and merge issues.

* Merge branch 'unstable' into das

# Conflicts:
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#	lcli/src/main.rs

* Store data columns individually in store and caches (#5890)

* Store data columns individually in store and caches

* Implement data column pruning

* Merge branch 'unstable' into das

# Conflicts:
#	Cargo.lock

* Update reconstruction benches to newer criterion version. (#5949)

* Merge branch 'unstable' into das

# Conflicts:
#	.github/workflows/test-suite.yml

* chore: add `recover_cells_and_compute_proofs` method (#5938)

* chore: add recover_cells_and_compute_proofs method

* Introduce type alias `CellsAndKzgProofs` to address type complexity.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Update `csc` format in ENR and spec tests for devnet-1 (#5966)

* Update `csc` format in ENR.

* Add spec tests for `recover_cells_and_kzg_proofs`.

* Add tests for ENR.

* Fix failing tests.

* Add protection against invalid csc value in ENR.

* Fix lint

* Fix csc encoding and decoding (#5997)

* Fix data column rpc request not being sent due to incorrect limits set. (#6000)

* Fix incorrect inbound request count causing rate limiting. (#6025)

* Merge branch 'stable' into das

# Conflicts:
#	beacon_node/network/src/sync/block_lookups/tests.rs
#	beacon_node/network/src/sync/block_sidecar_coupling.rs
#	beacon_node/network/src/sync/manager.rs
#	beacon_node/network/src/sync/network_context.rs
#	beacon_node/network/src/sync/network_context/requests.rs

* Merge remote-tracking branch 'unstable' into das

* Add kurtosis config for DAS testing (#5968)

* Add kurtosis config for DAS testing.

* Fix invalid yaml file

* Update network parameter files.

* chore: add rust PeerdasKZG crypto library for peerdas functionality and rollback c-kzg dependency to 4844 version (#5941)

* chore: add recover_cells_and_compute_proofs method

* chore: add rust peerdas crypto library

* chore: integrate peerdaskzg rust library into kzg crate

* chore(multi):

- update `ssz_cell_to_crypto_cell`
- update conversion from the crypto cell type to a Vec<u8>. Since the Rust library defines them as references to an array, the conversion is simply `to_vec`

* chore(multi):

- update rest of code to handle the new crypto `Cell` type
- update test case code to no longer use the Box type

* chore: cleanup of superfluous conversions

* chore: revert c-kzg dependency back to v1

* chore: move dependency into correct order

* chore: update rust dependency

- This version includes a new method `PeerDasContext::with_num_threads`

* chore: remove Default initialization of PeerDasContext and explicitly set the parameters in `new_from_trusted_setup`

* chore: cleanup exports

* chore: commit updated cargo.lock

* Update Cargo.toml

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* chore: rename dependency

* chore: update peerdas lib

- sets the blst version to 0.3 so that it matches whatever lighthouse is using. Although 0.3.12 is latest, lighthouse is pinned to 0.3.3

* chore: fix clippy lifetime

- Rust doesn't allow you to elide the lifetime on type aliases

* chore: cargo clippy fix

* chore: cargo fmt

* chore: update lib to add redundant checks (these will be removed in consensus-specs PR 3819)

* chore: update dependency to ignore proofs

* chore: update peerdas lib to latest

* update lib

* chore: remove empty proof parameter

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Update PeerDAS interop testnet config (#6069)

* Update interop testnet config.

* Fix typo and remove target peers

* Avoid retrying same sampling peer that previously failed. (#6084)

* Various fixes to custody range sync  (#6004)

* Only start requesting batches when there are good peers across all custody columns to avoid spaming block requests.

* Add custody peer check before mutating `BatchInfo` to avoid inconsistent state.

* Add check to cover a case where batch is not processed while waiting for custody peers to become available.

* Fix lint and logic error

* Fix `good_peers_on_subnet` always returning false for `DataColumnSubnet`.

* Add test for `get_custody_peers_for_column`

* Revert epoch parameter refactor.

* Fall back to default custody requiremnt if peer ENR is not present.

* Add metrics and update code comment.

* Add more debug logs.

* Use subscribed peers on subnet before MetaDataV3 is implemented. Remove peer_id matching when injecting error because multiple peers are used for range requests. Use randomized custodial peer to avoid repeatedly sending requests to failing peers. Batch by range request where possible.

* Remove unused code and update docs.

* Add comment

* chore: update peerdas-kzg library (#6118)

* chore: update peerDAS lib

* chore: update library

* chore: update library to version that include "init context" benchmarks and optional validation checks

* chore: (can remove) -- Add benchmarks for init context

* Prevent continuous searchers for low-peer networks (#6162)

* Merge branch 'unstable' into das

* Fix merge conflicts

* Add cli flag to enable sampling and disable by default. (#6209)

* chore: Use reference to an array representing a blob instead of an owned KzgBlob (#6179)

* add KzgBlobRef type

* modify code to use KzgBlobRef

* clippy

* Remove Deneb blob related changes to maintain compatibility with `c-kzg-4844`.

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>

* Store computed custody subnets in PeerDB and fix custody lookup test (#6218)

* Fix failing custody lookup tests.

* Store custody subnets in PeerDB, fix custody lookup test and refactor some methods.

* Merge branch 'unstable' into das

# Conflicts:
#	beacon_node/beacon_chain/src/beacon_chain.rs
#	beacon_node/beacon_chain/src/block_verification_types.rs
#	beacon_node/beacon_chain/src/builder.rs
#	beacon_node/beacon_chain/src/data_availability_checker.rs
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#	beacon_node/beacon_chain/src/data_column_verification.rs
#	beacon_node/beacon_chain/src/early_attester_cache.rs
#	beacon_node/beacon_chain/src/historical_blocks.rs
#	beacon_node/beacon_chain/tests/store_tests.rs
#	beacon_node/lighthouse_network/src/discovery/enr.rs
#	beacon_node/network/src/service.rs
#	beacon_node/src/cli.rs
#	beacon_node/store/src/hot_cold_store.rs
#	beacon_node/store/src/lib.rs
#	lcli/src/generate_bootnode_enr.rs

* Fix CI failures after merge.

* Batch sampling requests by peer (#6256)

* Batch sampling requests by peer

* Fix clippy errors

* Fix tests

* Add column_index to error message for ease of tracing

* Remove outdated comment

* Fix range sync never evaluating request as finished, causing it to get stuck. (#6276)

* Merge branch 'unstable' into das-0821-merge

# Conflicts:
#	Cargo.lock
#	Cargo.toml
#	beacon_node/beacon_chain/src/beacon_chain.rs
#	beacon_node/beacon_chain/src/data_availability_checker.rs
#	beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
#	beacon_node/beacon_chain/src/data_column_verification.rs
#	beacon_node/beacon_chain/src/kzg_utils.rs
#	beacon_node/beacon_chain/src/metrics.rs
#	beacon_node/beacon_processor/src/lib.rs
#	beacon_node/lighthouse_network/src/rpc/codec/ssz_snappy.rs
#	beacon_node/lighthouse_network/src/rpc/config.rs
#	beacon_node/lighthouse_network/src/rpc/methods.rs
#	beacon_node/lighthouse_network/src/rpc/outbound.rs
#	beacon_node/lighthouse_network/src/rpc/rate_limiter.rs
#	beacon_node/lighthouse_network/src/service/api_types.rs
#	beacon_node/lighthouse_network/src/types/globals.rs
#	beacon_node/network/src/network_beacon_processor/mod.rs
#	beacon_node/network/src/network_beacon_processor/rpc_methods.rs
#	beacon_node/network/src/network_beacon_processor/sync_methods.rs
#	beacon_node/network/src/sync/block_lookups/common.rs
#	beacon_node/network/src/sync/block_lookups/mod.rs
#	beacon_node/network/src/sync/block_lookups/single_block_lookup.rs
#	beacon_node/network/src/sync/block_lookups/tests.rs
#	beacon_node/network/src/sync/manager.rs
#	beacon_node/network/src/sync/network_context.rs
#	consensus/types/src/data_column_sidecar.rs
#	crypto/kzg/Cargo.toml
#	crypto/kzg/benches/benchmark.rs
#	crypto/kzg/src/lib.rs

* Fix custody tests and load PeerDAS KZG instead.

* Fix ef tests and bench compilation.

* Fix failing sampling test.

* Merge pull request #6287 from jimmygchen/das-0821-merge

Merge `unstable` into `das` 20240821

* Remove get_block_import_status

* Merge branch 'unstable' into das

* Re-enable Windows release tests.

* Address some review comments.

* Address more review comments and cleanups.

* Comment out peer DAS KZG EF tests for now

* Address more review comments and fix build.

* Merge branch 'das' of github.com:sigp/lighthouse into das

* Unignore Electra tests

* Fix metric name

* Address some of Pawan's review comments

* Merge remote-tracking branch 'origin/unstable' into das

* Update PeerDAS network parameters for peerdas-devnet-2  (#6290)

* update subnet count & custody req

* das network params

* update ef tests

---------

Co-authored-by: Jimmy Chen <jchen.tc@gmail.com>
2024-08-27 04:10:22 +00:00
Lion - dapplion
9fc0a662c3 Add sync lookup custody request state (#6257)
* Add sync lookup custody request state

* Review PR

* clippy

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into peerdas-network-lookup
2024-08-15 16:26:39 +00:00
Mac L
cc55e610b9 Rust 1.80.0 lints (#6183)
* Fix lints
2024-07-25 15:56:22 +00:00
Lion - dapplion
0c0b56d9e8 Bound max count of lookups (#6015)
* Bound max count of lookups

* Move up
2024-07-12 11:55:24 +00:00
João Oliveira
a59a61fef9 Remove generic Id param from RequestId (#6032)
* rename RequestId's for better context,

and move them to lighthouse_network crate.

* remove unrequired generic AppReqId from RequestID
2024-07-08 23:56:14 +00:00
Michael Sproul
a64cee3af1 Merge remote-tracking branch 'origin/stable' into unstable 2024-06-28 14:07:00 +10:00
Lion - dapplion
f14f21f37b Bound lookup parent chain length with tip extension (#5705)
* Bound lookup parent chain length with tip extension

* Add test
2024-06-27 18:21:37 +00:00
Lion - dapplion
b38019cb10 Attempt to continue lookups after adding peers (#5993)
* Attempt to continue lookups after adding peers
2024-06-26 23:53:55 +00:00
Pawan Dhananjay
bf4cbd3b0a Remove all batches related to a peer on disconnect (#5969)
* Remove all batches related to a peer on disconnect

* Cleanup map entries after disconnect

* Allow lookups to continue in case of disconnections

* Pretty response types

* fmt

* Fix lints

* Remove lookup if it cannot progress

* Fix tests

* Remove poll_close on rpc behaviour

* Remove redundant test

* Fix issue raised by lion

* Revert pretty response types

* Cleanup

* Fix test

* Merge remote-tracking branch 'origin/release-v5.2.1' into rpc-error-on-disconnect-revert

* Apply suggestions from joao

Co-authored-by: João Oliveira <hello@jxs.pt>

* Fix log

* update request status on no peers found

* Do not remove lookup after peer disconnection

* Add comments about expected event api

* Update single_block_lookup.rs

* Update mod.rs

* Merge branch 'rpc-error-on-disconnect-revert' into 5969-review

* Merge pull request #10 from dapplion/5969-review

Add comments about expected event api
2024-06-26 23:53:53 +00:00
Lion - dapplion
17dc978760 Add peers to parent lookups (#5858)
* Add peers to parent lookups

* Add test

* lint

* Do not attempt to continue

* add_peers_to_lookup_and_ancestors can't drop
2024-05-30 17:07:32 +00:00
Lion - dapplion
f187ad8bb4 Do not drop lookups without peers while awaiting events (#5839)
* Do not drop lookups without peers while awaiting events
2024-05-24 21:28:37 +00:00
Lion - dapplion
f1f1250784 Do not double log uknown lookup (#5837)
* Do not double log uknown lookup

* Fix log text, not necessarily a child lookup
2024-05-24 19:45:01 +00:00
Lion - dapplion
17d9086df3 Drop stuck lookups (#5824)
* Drop stuck lookups
2024-05-23 12:46:05 +00:00
Lion - dapplion
2a87016d94 Fix lookup disconnect peer (#5815)
* Test lookup peer disconnect modes

* Fix lookup peer disconnected return early
2024-05-20 18:27:57 +00:00
Lion - dapplion
8006418d80 Type sync network context send errors (#5808)
* Type sync network context send errors

* Consisntent naming
2024-05-17 11:34:21 +00:00
Lion - dapplion
319b4a2467 Skip creating child lookup if parent is never created (#5803)
* Skip creating child lookup if parent is never created
2024-05-17 10:58:27 +00:00
Lion - dapplion
6f45ad4534 Log stuck lookups (#5778)
* Log stuck lookups every interval

* Implement debug manually

* Add comment

* Do not print peers twice

* Add SYNC_LOOKUPS_STUCK metric

* Skip logging request root

* use derivative

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into log-stuck-lookups

* add req id to debug

* Merge remote-tracking branch 'sigp/unstable' into log-stuck-lookups

* Fix conflict with unstable
2024-05-14 17:34:26 +00:00
Lion - dapplion
683d9df63b Don't request block components until having block (#5774)
* Don't request block components until having block

* Update tests

* Resolve todo comment

* Merge branch 'unstable' into request-blocks-first
2024-05-14 14:50:38 +00:00
Lion - dapplion
ce66ab374e Enforce sync lookup receives a single result (#5777)
* Enforce sync lookup receives a single result
2024-05-14 10:12:48 +00:00
Lion - dapplion
f37ffe4b8d Do not request current child lookup peers (#5724)
* Do not request current child lookup peers

* Update tests
2024-05-13 15:13:32 +00:00
Lion - dapplion
93e0649abc Notify lookup sync of gossip processing results (#5722)
* Notify lookup sync of gossip processing results

* Add tests

* Add GossipBlockProcessResult event

* Re-add dropped comments

* Update beacon_node/network/src/network_beacon_processor/sync_methods.rs

* update test_lookup_disconnection_peer_left
2024-05-13 11:41:29 +00:00
Lion - dapplion
b87c36ac0e Report RPC Errors to the application on peer disconnections (#5680)
* Report RPC Errors to the application on peer disconnections

Co-authored-by: Age Manning <Age@AgeManning.com>

* Expect RPCError::Disconnect to fail ongoing requests

* Drop lookups after peer disconnect and not awaiting events

* Allow RPCError disconnect through network service

* Update beacon_node/lighthouse_network/src/service/mod.rs

Co-authored-by: Age Manning <Age@AgeManning.com>

* Merge branch 'unstable' into rpc-error-on-disconnect
2024-05-06 17:18:47 +00:00
Lion - dapplion
436d54e4bf Consistent logging of full root in lookup debug logs (#5700)
* Consistent logging of full root in lookup debug logs

* Tag sync log with service

* More logs

* Log when new peers are added

* Don't shortcircuit add_peer
2024-05-06 15:41:38 +00:00
Lion - dapplion
01e4e3527e Check da_checker before doing a block lookup request (#5681)
* Check da_checker before doing a block lookup request

* Ensure consistent handling of lookup result

* use req resp pre import cache rather than da checker
2024-05-01 14:45:36 +00:00
Lion - dapplion
ce66582c16 Merge parent and current sync lookups (#5655)
* Drop lookup type trait for a simple arg

* Drop reconstructed for processing

* Send parent blocks one by one

* Merge current and parent lookups

* Merge current and parent lookups clean up todos

* Merge current and parent lookups tests

* Merge remote-tracking branch 'origin/unstable' into sync-merged-lookup

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into sync-merged-lookup

* fix compile after merge

* #5655 pr review (#26)

* fix compile after merge

* remove todos, fix typos etc

* fix compile

* stable rng

* delete TODO and unfilled out test

* make download result a struct

* enums instead of bools as params

* fix comment

* Various fixes

* Track ignored child components

* Track dropped lookup reason as metric

* fix test

* add comment describing behavior of avail check error

*  update ordering
2024-04-30 20:12:15 +00:00
Lion - dapplion
c38b05d640 Drop lookup type trait for a simple arg (#5620)
* Drop lookup type trait for a simple arg
2024-04-24 06:02:45 +00:00
Lion - dapplion
f7aca97a55 Handle sync lookup request streams in network context (#5583)
* by-root-stream-terminator

* Fix tests

* Resolve merge conflicts

* Log report reason

* Some lints and bugfixes (#23)

* fix lints

* bug fixes

* Fix tests

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into handle-sync-lookup-requests

* Pr 5583 review (#24)

* add bad state warn log

* add rust docs to new fields in `SyncNetworkContext`

* remove timestamp todo

* add back lookup verify error

* remove TODOs
2024-04-22 16:06:39 +00:00
Lion - dapplion
e5b8d1237d Use Action in single_block_component_processed (#5564)
* Use Action in single_block_component_processed

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into single_block_component_processed-action

* fix compile after merge

* add continue action for when we are awaiting other parts of missing components
2024-04-16 12:59:53 +00:00
Lion - dapplion
fee2ee9c08 Make SingleLookupRequestState fields private (#5563)
* Make SingleLookupRequestState fields private

* Fix tests
2024-04-15 13:47:08 +00:00
Lion - dapplion
34dbb32610 Nest lookup type into request id SingleBlock and SingleBlob (#5562)
* Nest lookup type into block lookup RequestId
2024-04-11 16:32:06 +00:00
Lion - dapplion
b1f9751a69 Event-based block lookup tests (#5534)
* WIP

* Initial working version of new sync tests.

* Remove sync traits and fix lints.

* Reduce internal method visibility and make test method instead. Remove extra beacon chain harness instance created in tests.

* Improve `SyncTester` api.

* Fix lint.

* Test example

* Lookup tests using rig

* Tests should interface with events only

* lint

* Skip deneb test pre-deneb

* Add more assertions

* Remove logging changes

* Address @jimmygchen comments

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into bn-p2p-tests

* remove unused assertions

* fix lint
2024-04-10 12:05:18 +00:00
Lion - dapplion
b74da14261 Improve block lookup logging (#5535)
* Improve block lookup logging
2024-04-09 14:25:18 +00:00
Lion - dapplion
053525e281 Remove DataAvailabilityView trait from ChildComponents (#5421)
* Remove DataAvailabilityView trait from ChildComponents

* PR reviews

* Update beacon_node/network/src/sync/block_lookups/common.rs

Co-authored-by: realbigsean <seananderson33@GMAIL.com>

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into child_components_independent
2024-04-04 18:49:09 +00:00
Mac L
969d12dc6f Use E for EthSpec globally (#5264)
* Use `E` for `EthSpec` globally

* Fix tests

* Merge branch 'unstable' into e-ethspec

* Merge branch 'unstable' into e-ethspec

# Conflicts:
#	beacon_node/execution_layer/src/engine_api.rs
#	beacon_node/execution_layer/src/engine_api/http.rs
#	beacon_node/execution_layer/src/engine_api/json_structures.rs
#	beacon_node/execution_layer/src/test_utils/handle_rpc.rs
#	beacon_node/store/src/partial_beacon_state.rs
#	consensus/types/src/beacon_block.rs
#	consensus/types/src/beacon_block_body.rs
#	consensus/types/src/beacon_state.rs
#	consensus/types/src/config_and_preset.rs
#	consensus/types/src/execution_payload.rs
#	consensus/types/src/execution_payload_header.rs
#	consensus/types/src/light_client_optimistic_update.rs
#	consensus/types/src/payload.rs
#	lcli/src/parse_ssz.rs
2024-04-02 15:12:25 +00:00
realbigsean
9d24844f50 Lookup log improvements (#5491)
* log improvements
2024-03-27 13:24:04 +00:00
realbigsean
334aa2eabd Single lookup improvements (#5488)
* Fix unexpected `UnrequestedBlobId` and `ExtraBlocksReturned` errors due to race conditions.

* Continue chain segment processing and skip any blocks that are already known, rather than returning an error.

* more de-dup checking

* ensure we don't reset `requested_ids` during rpc download

* better fix

* Merge branch 'unstable' of https://github.com/sigp/lighthouse into more-dup-lookup-fixes

* remove chain hash check

* Merge branch 'fix-block-lookup-race' of https://github.com/jimmygchen/lighthouse into sean-test-lookups

* remove block check

* add back tests

* Log and CI fixes

* undue extra check

* Merge branch 'sean-test-lookups' of https://github.com/realbigsean/lighthouse into sean-test-lookups

* log improvements

* Improve logging
2024-03-27 10:01:27 +00:00