* WIP
* Initial working version of new sync tests.
* Remove sync traits and fix lints.
* Reduce internal method visibility and make test method instead. Remove extra beacon chain harness instance created in tests.
* Improve `SyncTester` api.
* Fix lint.
* Test example
* Lookup tests using rig
* Tests should interface with events only
* lint
* Skip deneb test pre-deneb
* Add more assertions
* Remove logging changes
* Address @jimmygchen comments
* Merge branch 'unstable' of https://github.com/sigp/lighthouse into bn-p2p-tests
* remove unused assertions
* fix lint
## Issue Addressed
https://github.com/sigp/lighthouse/issues/4543
## Proposed Changes
- Removes `NotBanned` from `BanResult`, implements `Display` and `std::error::Error` for `BanResult` and changes `ban_result` return type to `Option<BanResult>` which helps returning `BanResult` on `handle_established_inbound_connection`
- moves the check from for banned peers from `on_connection_established` to `handle_established_inbound_connection` to start addressing #4543.
- Removes `allow_block_list` as it's now redundant? Not sure about this one but if `PeerManager` keeps track of the banned peers, no need to send a `Swarm` event for `alow_block_list` to also keep that list right?
## Questions
- #4543 refers:
> More specifically, implement the connection limit behaviour inside the peer manager.
@AgeManning do you mean copying `libp2p::connection_limits::Behaviour`'s code into `PeerManager`/ having it as an inner `NetworkBehaviour` of `PeerManager`/other? If it's the first two, I think it probably makes more sense to have it as it is as it's less code to maintain.
> Also implement the banning of peers inside the behaviour, rather than passing messages back up to the swarm.
I tried to achieve this, but we still need to pass the `PeerManagerEvent::Banned` swarm event as `DiscV5` handles it's node and ip management internally and I did not find a method to query if a peer is banned. Is there anything else we can do from here?
3397612160/beacon_node/lighthouse_network/src/discovery/mod.rs (L931-L940)
Same as the question above, I did not find a way to check if `DiscV5` has the peer banned, so that we could check here and avoid sending `Swarm` events
3397612160/beacon_node/lighthouse_network/src/peer_manager/network_behaviour.rs (L168-L178)
Is there a chance we try to dial a peer that has been banned previously?
Thanks!
## Issue Addressed
#4402
## Proposed Changes
This PR adds QUIC support to Lighthouse. As this is not officially spec'd this will only work between lighthouse <-> lighthouse connections. We attempt a QUIC connection (if the node advertises it) and if it fails we fallback to TCP.
This should be a backwards compatible modification. We want to test this functionality on live networks to observe any improvements in bandwidth/latency.
NOTE: This also removes the websockets transport as I believe no one is really using it. It should be mentioned in our release however.
Co-authored-by: João Oliveira <hello@jxs.pt>
## Issue Addressed
#4150
## Proposed Changes
Maintain trusted peers in the pruning logic. ~~In principle the changes here are not necessary as a trusted peer has a max score (100) and all other peers can have at most 0 (because we don't implement positive scores). This means that we should never prune trusted peers unless we have more trusted peers than the target peer count.~~
This change shifts this logic to explicitly never prune trusted peers which I expect is the intuitive behaviour.
~~I suspect the issue in #4150 arises when a trusted peer disconnects from us for one reason or another and then we remove that peer from our peerdb as it becomes stale. When it re-connects at some large time later, it is no longer a trusted peer.~~
Currently we do disconnect trusted peers, and this PR corrects this to maintain trusted peers in the pruning logic.
As suggested in #4150 we maintain trusted peers in the db and thus we remember them even if they disconnect from us.
## Issue Addressed
N/A
## Proposed Changes
Adds a flag for disabling peer scoring. This is useful for local testing and testing small networks for new features.
This is a correction to #3757.
The correction registers a peer that is being disconnected in the local peer manager db to ensure we are tracking the correct state.
On heavily crowded networks, we are seeing many attempted connections to our node every second.
Often these connections come from peers that have just been disconnected. This can be for a number of reasons including:
- We have deemed them to be not as useful as other peers
- They have performed poorly
- They have dropped the connection with us
- The connection was spontaneously lost
- They were randomly removed because we have too many peers
In all of these cases, if we have reached or exceeded our target peer limit, there is no desire to accept new connections immediately after the disconnect from these peers. In fact, it often costs us resources to handle the established connections and defeats some of the logic of dropping them in the first place.
This PR adds a timeout, that prevents recently disconnected peers from reconnecting to us.
Technically we implement a ban at the swarm layer to prevent immediate re connections for at least 10 minutes. I decided to keep this light, and use a time-based LRUCache which only gets updated during the peer manager heartbeat to prevent added stress of polling a delay map for what could be a large number of peers.
This cache is bounded in time. An extra space bound could be added should people consider this a risk.
Co-authored-by: Diva M <divma@protonmail.com>
## Issue Addressed
I noticed in some logs some excess and unecessary discovery queries. What was happening was we were pruning our peers down to our outbound target and having some disconnect. When we are below this threshold we try to find more peers (even if we are at our peer limit). The request becomes futile because we have no more peer slots.
This PR corrects this issue and advances the pruning mechanism to favour subnet peers.
An overview the new logic added is:
- We prune peers down to a target outbound peer count which is higher than the minimum outbound peer count.
- We only search for more peers if there is room to do so, and we are below the minimum outbound peer count not the target. So this gives us some buffer for peers to disconnect. The buffer is currently 10%
The modified pruning logic is documented in the code but for reference it should do the following:
- Prune peers with bad scores first
- If we need to prune more peers, then prune peers that are subscribed to a long-lived subnet
- If we still need to prune peers, the prune peers that we have a higher density of on any given subnet which should drive for uniform peers across all subnets.
This will need a bit of testing as it modifies some significant peer management behaviours in lighthouse.
## Issue Addressed
We emit a warning to verify that all peer connection state information is consistent. A warning is given under one edge case;
We try to dial a peer with peer-id X and multiaddr Y. The peer responds to multiaddr Y with a different peer-id, Z. The dialing to the peer fails, but libp2p injects the failed attempt as peer-id Z.
In this instance, our PeerDB tries to add a new peer in the disconnected state under a previously unknown peer-id. This is harmless and so this PR permits this behaviour without logging a warning.
## Issue Addressed
The PeerDB was getting out of sync with the number of disconnected peers compared to the actual count. As this value determines how many we store in our cache, over time the cache was depleting and we were removing peers immediately resulting in errors that manifest as unknown peers for some operations.
The error occurs when dialing a peer fails, we were not correctly updating the peerdb counter because the increment to the counter was placed in the wrong order and was therefore not incrementing the count.
This PR corrects this.
## Issue Addressed
Users are experiencing `Status'd peer not found` errors
## Proposed Changes
Although I cannot reproduce this error, this is only one connection state change that is not addressed in the peer manager (that I could see). The error occurs because the number of disconnected peers in the peerdb becomes out of sync with the actual number of disconnected peers. From what I can tell almost all possible connection state changes are handled, except for the case when a disconnected peer changes to be disconnecting. This can potentially happen at the peer connection limit, where a previously connected peer switches to disconnecting.
This PR decrements the disconnected counter when this event occurs and from what I can tell, covers all possible disconnection state changes in the peer manager.
## Issue Addressed
Part of a bigger effort to make the network globals read only. This moves all writes to the `PeerDB` to the `eth2_libp2p` crate. Limiting writes to the peer manager is a slightly more complicated issue for a next PR, to keep things reviewable.
## Proposed Changes
- Make the peers field in the globals a private field.
- Allow mutable access to the peers field to `eth2_libp2p` for now.
- Add a new network message to update the sync state.
Co-authored-by: Age Manning <Age@AgeManning.com>
## Issue Addressed
I've done this change in a couple of WIPs already so I might as well submit it on its own. This changes no functionality but reduces coupling in a 0.0001%. It also helps new people who need to work in the peer manager to better understand what it actually needs from the outside
## Proposed Changes
Add a config to the peer manager
## Description
The `eth2_libp2p` crate was originally named and designed to incorporate a simple libp2p integration into lighthouse. Since its origins the crates purpose has expanded dramatically. It now houses a lot more sophistication that is specific to lighthouse and no longer just a libp2p integration.
As of this writing it currently houses the following high-level lighthouse-specific logic:
- Lighthouse's implementation of the eth2 RPC protocol and specific encodings/decodings
- Integration and handling of ENRs with respect to libp2p and eth2
- Lighthouse's discovery logic, its integration with discv5 and logic about searching and handling peers.
- Lighthouse's peer manager - This is a large module handling various aspects of Lighthouse's network, such as peer scoring, handling pings and metadata, connection maintenance and recording, etc.
- Lighthouse's peer database - This is a collection of information stored for each individual peer which is specific to lighthouse. We store connection state, sync state, last seen ips and scores etc. The data stored for each peer is designed for various elements of the lighthouse code base such as syncing and the http api.
- Gossipsub scoring - This stores a collection of gossipsub 1.1 scoring mechanisms that are continuously analyssed and updated based on the ethereum 2 networks and how Lighthouse performs on these networks.
- Lighthouse specific types for managing gossipsub topics, sync status and ENR fields
- Lighthouse's network HTTP API metrics - A collection of metrics for lighthouse network monitoring
- Lighthouse's custom configuration of all networking protocols, RPC, gossipsub, discovery, identify and libp2p.
Therefore it makes sense to rename the crate to be more akin to its current purposes, simply that it manages the majority of Lighthouse's network stack. This PR renames this crate to `lighthouse_network`
Co-authored-by: Paul Hauner <paul@paulhauner.com>