Which issue # does this PR address?
None
All of these are performing a check, and adding a batch, or creating a new lookup, or a new query, etc..
Hence all of these limits would be off by one.
Example:
```rust
// BACKFILL_BATCH_BUFFER_SIZE = 5
if self.batches.iter().filter(...).count() >= BACKFILL_BATCH_BUFFER_SIZE {
return None; // ← REJECT
}
// ... later adds batch via Entry::Vacant(entry).insert(...)
```
Without the `>` being changed to a `>=` , we would allow 6. The same idea applies to all changes proposed.
Co-Authored-By: Antoine James <antoine@ethereum.org>
Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Since https://github.com/sigp/lighthouse/pull/6847, invalid `BlocksByRange`/`BlobsByRange` requests, which do not comply with the spec, are [handled in the Handler](3d16d1080f/beacon_node/lighthouse_network/src/rpc/handler.rs (L880-L911)). Any peer that sends an invalid request is penalized and disconnected.
However, other kinds of invalid rpc request, which result in decoding errors, are just dropped. No penalty is applied and the connection with the peer remains.
I have added handling for the `ListenUpgradeError` event to notify the application of an `RPCError:InvalidData` error and disconnect to the peer that sent the invalid rpc request.
I also added tests for handling invalid rpc requests.
Co-Authored-By: ackintosh <sora.akatsuki@gmail.com>
- Step 0 of the tree-sync roadmap https://github.com/sigp/lighthouse/issues/7678
Current lookup sync tests are written in an explicit way that assume how the internals of lookup sync work. For example the test would do:
- Emit unknown block parent message
- Expect block request for X
- Respond with successful block request
- Expect block processing request for X
- Response with successful processing request
- etc..
This is unnecessarily verbose. And it will requires a complete re-write when something changes in the internals of lookup sync (has happened a few times, mostly for deneb and fulu).
What we really want to assert is:
- WHEN: we receive an unknown block parent message
- THEN: Lookup sync can sync that block
- ASSERT: Without penalizing peers, without unnecessary retries
Keep all existing tests and add new cases but written in the new style described above. The logic to serve and respond to request is in this function `fn simulate` 2288a3aeb1/beacon_node/network/src/sync/tests/lookups.rs (L301)
- It controls peer behavior based on a `CompleteStrategy` where you can set for example "respond to BlocksByRoot requests with empty"
- It actually runs beacon processor messages running their clousures. Now sync tests actually import blocks, increasing the test coverage to the interaction of sync and the da_checker.
- To achieve the above the tests create real blocks with the test harness. To make the tests as fast as before, I disabled crypto with `TestConfig`
Along the way I found a couple bugs, which I documented on the diff.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Since https://github.com/sigp/lighthouse/pull/6847, invalid `BlocksByRange`/`BlobsByRange` requests, which do not comply with the spec, are [handled in the Handler](3d16d1080f/beacon_node/lighthouse_network/src/rpc/handler.rs (L880-L911)). Any peer that sends an invalid request is penalized and disconnected.
However, other kinds of invalid rpc request, which result in decoding errors, are just dropped. No penalty is applied and the connection with the peer remains.
I have added handling for the `ListenUpgradeError` event to notify the application of an `RPCError:InvalidData` error and disconnect to the peer that sent the invalid rpc request.
I also added tests for handling invalid rpc requests.
Co-Authored-By: ackintosh <sora.akatsuki@gmail.com>
Swaps out the `local_ip_address` dependency for `if-addrs`. The reason for this is that is that `local_ip_address` is a relatively heavy dependency (depends on `neli`) compared to `if-addrs` and we only use it to check the presence of an IPv6 interface. This is an experiment to see if we can use the more lightweight `if-addrs` instead.
Co-Authored-By: Mac L <mjladson@pm.me>
All the required boilerplate for gloas gossip. We'll include the gossip message processing logic in a separate PR
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
#8652
Implements "simplified" versions of `max_blobs_by_root_request` and `max_data_columns_by_root_request` which do not depend on type information from the `data` module. I've also added tests which test the original implementation against the simplified one to ensure they don't deviate.
Also moves `all_data_column_sidecar_subnets` from a method on `ChainSpec` to a function which just takes `ChainSpec` as an argument.
Co-Authored-By: Mac L <mjladson@pm.me>
Removes the remaining facade re-exports from `consensus/types`.
I have left `graffiti` as I think it has some utility so am leaning towards keeping it in the final API design.
Co-Authored-By: Mac L <mjladson@pm.me>
Just visual clean-up, making logging statements look uniform. There's no reason to use `tracing::debug` instead of `debug`. If we ever need to migrate our logging lib in the future it would make things easier too.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
Which issue # does this PR address?
None
Discussed in private with @jimmygchen, Lighthouse's `earliest_available_slot` is guaranteed to always align with epoch boundaries, but as a safety implementation, we should use `start_slot` just in case other clients differ in their implementations.
At least we agreed it would be safer for `synced_peers_for_epoch`, I also made the change in `has_good_custody_range_sync_peer`, but this is to be reviewed please.
Co-Authored-By: Antoine James <antoine@ethereum.org>
Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>
Which issue # does this PR address?
#8586
Please list or describe the changes introduced by this PR.
Remove `service_name` from `TaskExecutor`
Co-Authored-By: Abhivansh <31abhivanshj@gmail.com>
There are certain crates which we re-export within `types` which creates a fragmented DevEx, where there are various ways to import the same crates.
```rust
// consensus/types/src/lib.rs
pub use bls::{
AggregatePublicKey, AggregateSignature, Error as BlsError, Keypair, PUBLIC_KEY_BYTES_LEN,
PublicKey, PublicKeyBytes, SIGNATURE_BYTES_LEN, SecretKey, Signature, SignatureBytes,
get_withdrawal_credentials,
};
pub use context_deserialize::{ContextDeserialize, context_deserialize};
pub use fixed_bytes::FixedBytesExtended;
pub use milhouse::{self, List, Vector};
pub use ssz_types::{BitList, BitVector, FixedVector, VariableList, typenum, typenum::Unsigned};
pub use superstruct::superstruct;
```
This PR removes these re-exports and makes it explicit that these types are imported from a non-`consensus/types` crate.
Co-Authored-By: Mac L <mjladson@pm.me>
Remove certain dependencies from `eth2`, and feature-gate others which are only used by certain endpoints.
| Removed | Optional | Dev only |
| -------- | -------- | -------- |
| `either` `enr` `libp2p-identity` `multiaddr` | `protoarray` `eth2_keystore` `eip_3076` `zeroize` `reqwest-eventsource` `futures` `futures-util` | `rand` `test_random_derive` |
This is done by adding an `events` feature which enables the events endpoint and its associated dependencies.
The `lighthouse` feature also enables its associated dependencies making them optional.
The networking-adjacent dependencies were removed by just having certain fields use a `String` instead of an explicit network type. This means the user should handle conversion at the call site instead. This is a bit spicy, but I believe `PeerId`, `Enr` and `Multiaddr` are easily converted to and from `String`s so I think it's fine and reduces our dependency space by a lot. The alternative is to feature gate these types behind a `network` feature instead.
Co-Authored-By: Mac L <mjladson@pm.me>
Organize and categorize `consensus/types` into modules based on their relation to key consensus structures/concepts.
This is a precursor to a sensible public interface.
While this refactor is very opinionated, I am open to suggestions on module names, or type groupings if my current ones are inappropriate.
Co-Authored-By: Mac L <mjladson@pm.me>
Fixes#7785
- [x] Update all integration tests with >1 files to follow the `main` pattern.
- [x] `crypto/eth2_key_derivation/tests`
- [x] `crypto/eth2_keystore/tests`
- [x] `crypto/eth2_wallet/tests`
- [x] `slasher/tests`
- [x] `common/eth2_interop_keypairs/tests`
- [x] `beacon_node/lighthouse_network/tests`
- [x] Set `debug_assertions` to false on `.vscode/settings.json`.
- [x] Document how to make rust analyzer work on integration tests files. In `book/src/contributing_setup.md`
---
Tracking a `rust-analyzer.toml` with settings like the one provided in `.vscode/settings.json` would be nicer. But this is not possible yet. For now, that config should be a good enough indicator for devs using editors different to VSCode.
Co-Authored-By: Daniel Ramirez-Chiquillo <hi@danielrachi.com>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Consolidate our property-testing around `proptest`. This PR was written with Copilot and manually tweaked.
Co-Authored-By: Michael Sproul <michael@sproul.xyz>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Take 2 of #8390.
Fixes the race condition properly instead of propagating the error. I think this is a better alternative, and doesn't seem to look that bad.
* Lift node id loading or generation from `NetworkService ` startup to the `ClientBuilder`, so that it can be used to compute custody columns for the beacon chain without waiting for Network bootstrap.
I've considered and implemented a few alternatives:
1. passing `node_id` to beacon chain builder and compute columns when creating `CustodyContext`. This approach isn't good for separation of concerns and isn't great for testability
2. passing `ordered_custody_groups` to beacon chain. `CustodyContext` only uses this to compute ordered custody columns, so we might as well lift this logic out, so we don't have to do error handling in `CustodyContext` construction. Less tests to update;.
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
https://github.com/sigp/lighthouse/issues/8012
Replace all instances of `VariableList::from` and `FixedVector::from` to their `try_from` variants.
While I tried to use proper error handling in most cases, there were certain situations where adding an `expect` for situations where `try_from` can trivially never fail avoided adding a lot of extra complexity.
Co-Authored-By: Mac L <mjladson@pm.me>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
#7603
#### Custody backfill sync service
Similar in many ways to the current backfill service. There may be ways to unify the two services. The difficulty there is that the current backfill service tightly couples blocks and their associated blobs/data columns. Any attempts to unify the two services should be left to a separate PR in my opinion.
#### `SyncNeworkContext`
`SyncNetworkContext` manages custody sync data columns by range requests separetly from other sync RPC requests. I think this is a nice separation considering that custody backfill is its own service.
#### Data column import logic
The import logic verifies KZG committments and that the data columns block root matches the block root in the nodes store before importing columns
#### New channel to send messages to `SyncManager`
Now external services can communicate with the `SyncManager`. In this PR this channel is used to trigger a custody sync. Alternatively we may be able to use the existing `mpsc` channel that the `SyncNetworkContext` uses to communicate with the `SyncManager`. I will spend some time reviewing this.
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
N/A
1. In the batch retry logic, we were failing to set the batch state to `AwaitingDownload` before attempting a retry. This PR sets it to `AwaitingDownload` before the retry and sets it back to `Downloading` if the retry suceeded in sending out a request
2. Remove all peer scoring logic from retrying and rely on just de priorotizing the failed peer. I finally concede the point to @dapplion 😄
3. Changes `block_components_by_range_request` to accept `block_peers` and `column_peers`. This is to ensure that we use the full synced peerset for requesting columns in order to avoid splitting the column peers among multiple head chains. During forward sync, we want the block peers to be the peers from the syncing chain and column peers to be all synced peers from the peerdb.
Also, fixes a typo and calls `attempt_send_awaiting_download_batches` from more places
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
#8105 (to be confirmed)
I noticed a large number of failed discovery requests after deploying latest `unstable` to some of our testnet and mainnet nodes. This is because of a recent PeerDAS change to attempt to maintain sufficient peers across data column subnets - this shouldn't be enabled on network without peerdas scheduled, otherwise it will keep retrying discovery on these subnets and never succeed.
Also removed some unused files.
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>
Anchor currently depends on `lighthouse_network` for a few types and utilities that live within. As we use our own libp2p behaviours, we actually do not use the core logic in that crate. This makes us transitively depend on a bunch of unneeded crates (even a whole separate libp2p if the versions mismatch!)
Move things we require into it's own lightweight crate.
Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
I just noticed that one of the tests i added in #7915 is incorrect, after it was running flaky for a bit.
This PR fixes the scenario and ensure the outcome will always be the same.
Closes:
- #7865
- #7855
Changes extracted from earlier PR #7876
This PR fixes two main things with a few other improvements mentioned below:
- Prevent Lighthouse from repeatedly sending `DataColumnByRoot` requests to an unsynced peer, causing lookup sync to get stuck
- Allows Lighthouse to send discovery requests if there isn't enough **synced** peers in the required sampling subnets - this fixes the stuck sync scenario where there isn't enough usable peers in sampling subnet but no discovery is attempted.
- Make peer discovery queries if custody subnet peer count drops below the minimum threshold
- Update peer pruning logic to prioritise uniform distribution across all data column subnets and avoid pruning sampling peers if the count is below the target threshold (2)
- Check sync status when making discovery requests, to make sure we don't ignore requests if there isn't enough synced peers in the required sampling subnets
- Optimise some of the `PeerDB` functions checking custody peers
- Only send lookup requests to peers that are synced or advanced
This method is a footgun because it truncates the list. It is the source of a recent bug:
- https://github.com/sigp/lighthouse/pull/7927
- Delete uses of `RuntimeVariableList::from_vec` and replace them with `::new` which does validation and can fail.
- Propagate errors where possible, unwrap in tests and use `expect` for obviously-safe uses (in `chain_spec.rs`).
Adds the required boilerplate code for the Gloas (Glamsterdam) hard fork. This allows PRs testing Gloas-candidate features to test fork transition.
This also includes de-duplication of post-Bellatrix readiness notifiers from #6797 (credit to @dapplion)
Fixes#7926
This was a bug I introduced in #7890 and @pawanjay176 noticed it on some running nodes, and added a rpc test to confirm it.
The culprit is this line, where I failed to fill the vec to it's max size, so it doesn't calculate the max size properly, resulting in all `DataColumnByRoot` requests exceeding the max size during validation:
d24a6d2a45/consensus/types/src/chain_spec.rs (L1984)
The PR fixes this and includes new regression tests for this fix.
Was going to leave this as a comment on #7877 but when noticed it had already been merged.
we have `DEFAULT_TARGET_PEERS` which was set to 50 and only used on the `Default` impl for `peer_manager`'s `Config`, which then get's overridden by this `lighthouse_network::Config`s default
This PR unifies everything on `DEFAULT_TARGET_PEERS`