I feel it's preferable to do this explicitly by updating the revision on `Cargo.toml` rather than implicitly by letting `Cargo.lock` control the revision of the branch.
We forked `gossipsub` into the lighthouse repo sometime ago so that we could iterate quicker on implementing back pressure and IDONTWANT.
Meanwhile we have pushed all our changes upstream and we are now the main maintainers of `rust-libp2p` this allows us to use upstream `gossipsub` again.
Nonetheless we still use our forked repo to give us freedom to experiment with features before submitting them upstream
Currently we have very poor coverage of range sync with unit tests. With the event driven test framework we could cover much more ground and be confident when modifying the code.
Add two basic cases:
- Happy path, complete a finalized sync for 2 epochs
- Post-PeerDAS case where we start without enough custody peers and later we find enough
⚠️ If you have ideas for more test cases, please let me know! I'll write them
* Add test for ActiveSamplingRequest
* Fix the column_indexes field from the requested ones to the responded ones
* Fix clippy errors
* Move tests to tests.rs
* Fix unused import
* Fix clippy error
* Merge branch 'unstable' into fork/add-test-for-active-sampling-request
# Conflicts:
# beacon_node/network/Cargo.toml
# beacon_node/network/src/sync/sampling.rs
* Merge branch 'unstable' into fork/add-test-for-active-sampling-request
* WIP
* Initial working version of new sync tests.
* Remove sync traits and fix lints.
* Reduce internal method visibility and make test method instead. Remove extra beacon chain harness instance created in tests.
* Improve `SyncTester` api.
* Fix lint.
* Test example
* Lookup tests using rig
* Tests should interface with events only
* lint
* Skip deneb test pre-deneb
* Add more assertions
* Remove logging changes
* Address @jimmygchen comments
* Merge branch 'unstable' of https://github.com/sigp/lighthouse into bn-p2p-tests
* remove unused assertions
* fix lint
* move gossipsub into a separate crate
* Merge branch 'unstable' of github.com:sigp/lighthouse into separate-gossipsub
* address review 2
* clippy beta
* update logging to log gossipsub logs
* remove exit-future usage,
as it is non maintained, and replace with async-channel which is already in the repo.
* Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove-exit-future
* Merge branch 'unstable' of https://github.com/sigp/lighthouse into remove-exit-future
## Issue Addressed
Addresses the recent CI failures caused by caching `blst` for the wrong CPU type.
## Proposed Changes
- Use `FEATURES: jemalloc,portable` when building Lighthouse & `lcli` in tests
- Add a new `TEST_FEATURES` and set to `portable` for all CI test jobs.
- Updated Makefiles to read the `TEST_FEATURES` environment variable, and default to none.
* add processing and processed caching to the DA checker
* move processing cache out of critical cache
* get it compiling
* fix lints
* add docs to `AvailabilityView`
* some self review
* fix lints
* fix beacon chain tests
* cargo fmt
* make availability view easier to implement, start on testing
* move child component cache and finish test
* cargo fix
* cargo fix
* cargo fix
* fmt and lint
* make blob commitments not optional, rename some caches, add missing blobs struct
* Update beacon_node/beacon_chain/src/data_availability_checker/processing_cache.rs
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
* marks review feedback and other general cleanup
* cargo fix
* improve availability view docs
* some renames
* some renames and docs
* fix should delay lookup logic
* get rid of some wrapper methods
* fix up single lookup changes
* add a couple docs
* add single blob merge method and improve process_... docs
* update some names
* lints
* fix merge
* remove blob indices from lookup creation log
* remove blob indices from lookup creation log
* delayed lookup logging improvement
* check fork choice before doing any blob processing
* remove unused dep
* Update beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Update beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Update beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Update beacon_node/beacon_chain/src/data_availability_checker/availability_view.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Update beacon_node/network/src/sync/block_lookups/delayed_lookup.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* remove duplicate deps
* use gen range in random blobs geneartor
* rename processing cache fields
* require block root in rpc block construction and check block root consistency
* send peers as vec in single message
* spawn delayed lookup service from network beacon processor
* fix tests
---------
Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
Synchronize dependencies and edition on the workspace `Cargo.toml`
## Proposed Changes
with https://github.com/rust-lang/cargo/issues/8415 merged it's now possible to synchronize details on the workspace `Cargo.toml` like the metadata and dependencies.
By only having dependencies that are shared between multiple crates aligned on the workspace `Cargo.toml` it's easier to not miss duplicate versions of the same dependency and therefore ease on the compile times.
## Additional Info
this PR also removes the no longer required direct dependency of the `serde_derive` crate.
should be reviewed after https://github.com/sigp/lighthouse/pull/4639 get's merged.
closes https://github.com/sigp/lighthouse/issues/4651
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
Fixes a bug in the handling of `--beacon-process-max-workers` which caused it to have no effect.
## Proposed Changes
For this PR I channeled @ethDreamer and saw deep into the faulty CLI config -- this bug is almost identical to the one Mark found and fixed in #4622.
* remove protoc and token from network tests github action
* delete unused beacon chain methods
* downgrade writing blobs to store log
* reduce diff in block import logic
* remove some todo's and deneb built in network
* remove unnecessary error, actually use some added metrics
* remove some metrics, fix missing components on publish funcitonality
* fix status tests
* rename sidecar by root to blobs by root
* clean up some metrics
* remove unnecessary feature gate from attestation subnet tests, clean up blobs by range response code
* pawan's suggestion in `protocol_info`, peer score in matching up batch sync block and blobs
* fix range tests for deneb
* pub block and blob db cache behind the same mutex
* remove unused errs and an empty file
* move sidecar trait to new file
* move types from payload to eth2 crate
* update comment and add flag value name
* make function private again, remove allow unused
* use reth rlp for tx decoding
* fix compile after merge
* rename kzg commitments
* cargo fmt
* remove unused dep
* Update beacon_node/execution_layer/src/lib.rs
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
* Update beacon_node/beacon_processor/src/lib.rs
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
* pawan's suggestiong for vec capacity
* cargo fmt
* Revert "use reth rlp for tx decoding"
This reverts commit 5181837d81.
* remove reth rlp
---------
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
Often when testing I have to create a hack which is annoying to maintain.
I think it might be handy to add a custom compile-time flag that developers can use if they want to test things locally without having to backfill a bunch of blocks.
There is probably an argument to have a feature called "backfill" which is enabled by default and can be disabled. I didn't go this route because I think it's counter-intuitive to have a feature that enables a core and necessary behaviour.
*Replaces #4434. It is identical, but this PR has a smaller diff due to a curated commit history.*
## Issue Addressed
NA
## Proposed Changes
This PR moves the scheduling logic for the `BeaconProcessor` into a new crate in `beacon_node/beacon_processor`. Previously it existed in the `beacon_node/network` crate.
This addresses a circular-dependency problem where it's not possible to use the `BeaconProcessor` from the `beacon_chain` crate. The `network` crate depends on the `beacon_chain` crate (`network -> beacon_chain`), but importing the `BeaconProcessor` into the `beacon_chain` crate would create a circular dependancy of `beacon_chain -> network`.
The `BeaconProcessor` was designed to provide queuing and prioritized scheduling for messages from the network. It has proven to be quite valuable and I believe we'd make Lighthouse more stable and effective by using it elsewhere. In particular, I think we should use the `BeaconProcessor` for:
1. HTTP API requests.
1. Scheduled tasks in the `BeaconChain` (e.g., state advance).
Using the `BeaconProcessor` for these tasks would help prevent the BN from becoming overwhelmed and would also help it to prioritize operations (e.g., choosing to process blocks from gossip before responding to low-priority HTTP API requests).
## Additional Info
This PR is intended to have zero impact on runtime behaviour. It aims to simply separate the *scheduling* code (i.e., the `BeaconProcessor`) from the *business logic* in the `network` crate (i.e., the `Worker` impls). Future PRs (see #4462) can build upon these works to actually use the `BeaconProcessor` for more operations.
I've gone to some effort to use `git mv` to make the diff look more like "file was moved and modified" rather than "file was deleted and a new one added". This should reduce review burden and help maintain commit attribution.
## Issue Addressed
Resolves#3238
## Proposed Changes
Please list or describe the changes introduced by this PR.
## Additional Info
Please provide any additional information. For example, future considerations
or information useful for reviewers.
* some blob reprocessing work
* remove ForceBlockLookup
* reorder enum match arms in sync manager
* a lot more reprocessing work
* impl logic for triggerng blob lookups along with block lookups
* deal with rpc blobs in groups per block in the da checker. don't cache missing blob ids in the da checker.
* make single block lookup generic
* more work
* add delayed processing logic and combine some requests
* start fixing some compile errors
* fix compilation in main block lookup mod
* much work
* get things compiling
* parent blob lookups
* fix compile
* revert red/stevie changes
* fix up sync manager delay message logic
* add peer usefulness enum
* should remove lookup refactor
* consolidate retry error handling
* improve peer scoring during certain failures in parent lookups
* improve retry code
* drop parent lookup if either req has a peer disconnect during download
* refactor single block processed method
* processing peer refactor
* smol bugfix
* fix some todos
* fix lints
* fix lints
* fix compile in lookup tests
* fix lints
* fix lints
* fix existing block lookup tests
* renamings
* fix after merge
* cargo fmt
* compilation fix in beacon chain tests
* fix
* refactor lookup tests to work with multiple forks and response types
* make tests into macros
* wrap availability check error
* fix compile after merge
* add random blobs
* start fixing up lookup verify error handling
* some bug fixes and the start of deneb only tests
* make tests work for all forks
* track information about peer source
* error refactoring
* improve peer scoring
* fix test compilation
* make sure blobs are sent for processing after stream termination, delete copied tests
* add some tests and fix a bug
* smol bugfixes and moar tests
* add tests and fix some things
* compile after merge
* lots of refactoring
* retry on invalid block/blob
* merge unknown parent messages before current slot lookup
* get tests compiling
* penalize blob peer on invalid blobs
* Check disk on in-memory cache miss
* Update beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs
* Update beacon_node/network/src/sync/network_context.rs
Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
* fix bug in matching blocks and blobs in range sync
* pr feedback
* fix conflicts
* upgrade logs from warn to crit when we receive incorrect response in range
* synced_and_connected_within_tolerance -> should_search_for_block
* remove todo
* Fix Broken Overflow Tests
* fix merge conflicts
* checkpoint sync without alignment
* add import
* query for checkpoint state by slot rather than state root (teku doesn't serve by state root)
* get state first and query by most recent block root
* simplify delay logic
* rename unknown parent sync message variants
* rename parameter, block_slot -> slot
* add some docs to the lookup module
* use interval instead of sleep
* drop request if blocks and blobs requests both return `None` for `Id`
* clean up `find_single_lookup` logic
* add lookup source enum
* clean up `find_single_lookup` logic
* add docs to find_single_lookup_request
* move LookupSource our of param where unnecessary
* remove unnecessary todo
* query for block by `state.latest_block_header.slot`
* fix lint
* fix test
* fix test
* fix observed blob sidecars test
* PR updates
* use optional params instead of a closure
* create lookup and trigger request in separate method calls
* remove `LookupSource`
* make sure duplicate lookups are not dropped
---------
Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
Co-authored-by: Mark Mackey <mark@sigmaprime.io>
Co-authored-by: Divma <26765164+divagant-martian@users.noreply.github.com>
This PR address the following spec change: https://github.com/ethereum/consensus-specs/pull/3312
Instead of subscribing to a long-lived subnet for every attached validator to a beacon node, all beacon nodes will subscribe to `SUBNETS_PER_NODE` long-lived subnets. This is currently set to 2 for mainnet.
This PR does not include any scoring or advanced discovery mechanisms. A future PR will improve discovery and we can implement scoring after the next hard fork when we expect all client teams and all implementations to respect this spec change.
This will be a significant change in the subnet network structure for consensus clients and we will likely have to monitor and tweak our peer management logic.
There is a race condition which occurs when multiple discovery queries return at almost the exact same time and they independently contain a useful peer we would like to connect to.
The condition can occur that we can add the same peer to the dial queue, before we get a chance to process the queue.
This ends up displaying an error to the user:
```
ERRO Dialing an already dialing peer
```
Although this error is harmless it's not ideal.
There are two solutions to resolving this:
1. As we decide to dial the peer, we change the state in the peer-db to dialing (before we add it to the queue) which would prevent other requests from adding to the queue.
2. We prevent duplicates in the dial queue
This PR has opted for 2. because 1. will complicate the code in that we are changing states in non-intuitive places. Although this technically adds a very slight performance cost, its probably a cleaner solution as we can keep the state-changing logic in one place.
## Issue Addressed
NA
## Proposed Changes
Our `ERRO` stream has been rather noisy since the merge due to some unexpected behaviours of builders and EEs. Now that we've been running post-merge for a while, I think we can drop some of these `ERRO` to `WARN` so we're not "crying wolf".
The modified logs are:
#### `ERRO Execution engine call failed`
I'm seeing this quite frequently on Geth nodes. They seem to timeout when they're busy and it rarely indicates a serious issue. We also have logging across block import, fork choice updating and payload production that raise `ERRO` or `CRIT` when the EE times out, so I think we're not at risk of silencing actual issues.
#### `ERRO "Builder failed to reveal payload"`
In #3775 we reduced this log from `CRIT` to `ERRO` since it's common for builders to fail to reveal the block to the producer directly whilst still broadcasting it to the networ. I think it's worth dropping this to `WARN` since it's rarely interesting.
I elected to stay with `WARN` since I really do wish builders would fulfill their API promises by returning the block to us. Perhaps I'm just being pedantic here, I could be convinced otherwise.
#### `ERRO "Relay error when registering validator(s)"`
It seems like builders and/or mev-boost struggle to handle heavy loads of validator registrations. I haven't observed issues with validators not actually being registered, but I see timeouts on these endpoints many times a day. It doesn't seem like this `ERRO` is worth it.
#### `ERRO Error fetching block for peer ExecutionLayerErrorPayloadReconstruction`
This means we failed to respond to a peer on the P2P network with a block they requested because of an error in the `execution_layer`. It's very common to see timeouts or incomplete responses on this endpoint whilst the EE is busy and I don't think it's important enough for an `ERRO`. As long as the peer count stays high, I don't think the user needs to be actively concerned about how we're responding to peers.
## Additional Info
NA
* Add first efforts at broadcast
* Tidy
* Move broadcast code to client
* Progress with broadcast impl
* Rename to address change
* Fix compile errors
* Use `while` loop
* Tidy
* Flip broadcast condition
* Switch to forgetting individual indices
* Always broadcast when the node starts
* Refactor into two functions
* Add testing
* Add another test
* Tidy, add more testing
* Tidy
* Add test, rename enum
* Rename enum again
* Tidy
* Break loop early
* Add V15 schema migration
* Bump schema version
* Progress with migration
* Update beacon_node/client/src/address_change_broadcast.rs
Co-authored-by: Michael Sproul <micsproul@gmail.com>
* Fix typo in function name
---------
Co-authored-by: Michael Sproul <micsproul@gmail.com>
## Issue Addressed
Recent discussions with other client devs about optimistic sync have revealed a conceptual issue with the optimisation implemented in #3738. In designing that feature I failed to consider that the execution node checks the `blockHash` of the execution payload before responding with `SYNCING`, and that omitting this check entirely results in a degradation of the full node's validation. A node omitting the `blockHash` checks could be tricked by a supermajority of validators into following an invalid chain, something which is ordinarily impossible.
## Proposed Changes
I've added verification of the `payload.block_hash` in Lighthouse. In case of failure we log a warning and fall back to verifying the payload with the execution client.
I've used our existing dependency on `ethers_core` for RLP support, and a new dependency on Parity's `triehash` crate for the Merkle patricia trie. Although the `triehash` crate is currently unmaintained it seems like our best option at the moment (it is also used by Reth, and requires vastly less boilerplate than Parity's generic `trie-root` library).
Block hash verification is pretty quick, about 500us per block on my machine (mainnet).
The optimistic finalized sync feature can be disabled using `--disable-optimistic-finalized-sync` which forces full verification with the EL.
## Additional Info
This PR also introduces a new dependency on our [`metastruct`](https://github.com/sigp/metastruct) library, which was perfectly suited to the RLP serialization method. There will likely be changes as `metastruct` grows, but I think this is a good way to start dogfooding it.
I took inspiration from some Parity and Reth code while writing this, and have preserved the relevant license headers on the files containing code that was copied and modified.