Commit Graph

26 Commits

Author SHA1 Message Date
Paul Hauner
4af6fcfafd Bump libp2p to address inconsistency in mesh peer tracking (#2493)
## Issue Addressed

- Resolves #2457
- Resolves #2443

## Proposed Changes

Target the (presently unreleased) head of `libp2p/rust-libp2p:master` in order to obtain the fix from https://github.com/libp2p/rust-libp2p/pull/2175.

Additionally:

- `libsecp256k1` needed to be upgraded to satisfy the new version of `libp2p`.
- There were also a handful of minor changes to `eth2_libp2p` to suit some interface changes.
- Two `cargo audit --ignore` flags were remove due to libp2p upgrades.

## Additional Info
 
 NA
2021-08-12 01:59:20 +00:00
Pawan Dhananjay
e8c0d1f19b Altair networking (#2300)
## Issue Addressed

Resolves #2278 

## Proposed Changes

Implements the networking components for the Altair hard fork https://github.com/ethereum/eth2.0-specs/blob/dev/specs/altair/p2p-interface.md

## Additional Info

This PR acts as the base branch for networking changes and tracks https://github.com/sigp/lighthouse/pull/2279 . Changes to gossip, rpc and discovery can be separate PRs to be merged here for ease of review.

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-08-04 01:44:57 +00:00
Age Manning
3c0d3227ab Global Network Behaviour Refactor (#2442)
* Network upgrades (#2345)

* Discovery patch (#2382)

* Upgrade libp2p and unstable gossip

* Network protocol upgrades

* Correct dependencies, reduce incoming bucket limit

* Clean up dirty DHT entries before repopulating

* Update cargo lock

* Update lockfile

* Update ENR dep

* Update deps to specific versions

* Update test dependencies

* Update docker rust, and remote signer tests

* More remote signer test fixes

* Temp commit

* Update discovery

* Remove cached enrs after dialing

* Increase the session capacity, for improved efficiency

* Bleeding edge discovery (#2435)

* Update discovery banning logic and tokio

* Update to latest discovery

* Shift to latest discovery

* Fmt

* Initial re-factor of the behaviour

* More progress

* Missed changes

* First draft

* Discovery as a behaviour

* Adding back event waker (not convinced its neccessary, but have made this many changes already)

* Corrections

* Speed up discovery

* Remove double log

* Fmt

* After disconnect inform swarm about ban

* More fmt

* Appease clippy

* Improve ban handling

* Update tests

* Update cargo.lock

* Correct tests

* Downgrade log
2021-07-15 16:43:17 +10:00
realbigsean
b84ff9f793 rust 1.53.0 updates (#2411)
## Issue Addressed

`make lint` failing on rust 1.53.0.

## Proposed Changes

1.53.0 updates

## Additional Info

I haven't figure out why yet, we were now hitting the recursion limit in a few crates. So I had to add `#![recursion_limit = "256"]` in a few places


Co-authored-by: realbigsean <seananderson33@gmail.com>
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2021-06-18 05:58:01 +00:00
divma
3261eff0bf split outbound and inbound codecs encoded types (#2410)
Splits the inbound and outbound requests, for maintainability.
2021-06-17 00:40:16 +00:00
realbigsean
e20f64b21a Update to tokio 1.1 (#2172)
## Issue Addressed

resolves #2129
resolves #2099 
addresses some of #1712
unblocks #2076
unblocks #2153 

## Proposed Changes

- Updates all the dependencies mentioned in #2129, except for web3. They haven't merged their tokio 1.0 update because they are waiting on some dependencies of their own. Since we only use web3 in tests, I think updating it in a separate issue is fine. If they are able to merge soon though, I can update in this PR. 

- Updates `tokio_util` to 0.6.2 and `bytes` to 1.0.1.

- We haven't made a discv5 release since merging tokio 1.0 updates so I'm using a commit rather than release atm. **Edit:** I think we should merge an update of `tokio_util` to 0.6.2 into discv5 before this release because it has panic fixes in `DelayQueue`  --> PR in discv5:  https://github.com/sigp/discv5/pull/58

## Additional Info

tokio 1.0 changes that required some changes in lighthouse:

- `interval.next().await.is_some()` -> `interval.tick().await`
- `sleep` future is now `!Unpin` -> https://github.com/tokio-rs/tokio/issues/3028
- `try_recv` has been temporarily removed from `mpsc` -> https://github.com/tokio-rs/tokio/issues/3350
- stream features have moved to `tokio-stream` and `broadcast::Receiver::into_stream()` has been temporarily removed -> `https://github.com/tokio-rs/tokio/issues/2870
- I've copied over the `BroadcastStream` wrapper from this PR, but can update to use `tokio-stream` once it's merged https://github.com/tokio-rs/tokio/pull/3384

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-02-10 23:29:49 +00:00
divma
f3200784b4 More metrics + RPC tweaks (#2041)
## Issue Addressed

NA

## Proposed Changes
This was mostly done to find the reason why LH was dropping peers from Nimbus. It proved to be useful so I think it's worth it. But there is also some functional stuff here
- Add metrics for rpc errors per client, error type and direction
- Add metrics for downscoring events per source type, client and penalty type
- Add metrics for gossip validation results per client for non-accepted messages
- Make the RPC handler return errors and requests/responses in the order we see them
- Allow a small burst for the Ping rate limit, from 1 every 5 seconds to 2 every 10 seconds
- Send rate limiting errors with a particular code and use that same code to identify them. I picked something different to 128 since that is most likely what other clients are using for their own errors
- Remove some unused code in the `PeerAction` and the rpc handler
- Remove the unused variant `RateLimited`. tTis was never produced directly, since the only way to get the request's protocol is via de handler. The handler upon receiving from LH a response with an error (rate limited in this case) emits this event with the missing info (It was always like this, just pointing out that we do downscore rate limiting errors regardless of the change)

Metrics for Nimbus looked like this:
Downscoring events: `increase(libp2p_peer_actions_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101210880-862bf280-3676-11eb-94c0-399f0bf5aa2e.png)

RPC Errors: `increase(libp2p_rpc_errors_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101210997-ba071800-3676-11eb-847a-f32405ede002.png)

Unaccepted gossip message: `increase(gossipsub_unaccepted_messages_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101211124-f470b500-3676-11eb-9459-132ecff058ec.png)
2020-12-08 03:55:50 +00:00
divma
8fcd22992c No string in slog (#2017)
## Issue Addressed

Following slog's documentation, this should help a bit with string allocations. I left it run for two days and mem usage is lower. This is of course anecdotal, but shouldn't harm anyway 

## Proposed Changes

remove `String` creation in logs when possible
2020-11-30 10:33:00 +00:00
Age Manning
a567f788bd Upgrade to tokio 0.3 (#1839)
## Description

This PR updates Lighthouse to tokio 0.3. It includes a number of dependency updates and some structural changes as to how we create and spawn tasks.

This also brings with it a number of various improvements:

- Discv5 update
- Libp2p update
- Fix for recompilation issues
- Improved UPnP port mapping handling
- Futures dependency update
- Log downgrade to traces for rejecting peers when we've reached our max



Co-authored-by: blacktemplar <blacktemplar@a1.net>
2020-11-28 05:30:57 +00:00
Age Manning
49c4630045 Performance improvement for db reads (#1909)
This PR adds a number of improvements:
- Downgrade a warning log when we ignore blocks for gossipsub processing
- Revert a a correction to improve logging of peer score changes
- Shift syncing DB reads off the core-executor allowing parallel processing of large sync messages
- Correct the timeout logic of RPC chunk sends, giving more time before timing out RPC outbound messages.
2020-11-16 07:28:30 +00:00
divma
eb56140582 Update logs + do not downscore peers if WE time out (#1901)
## Issue Addressed

- RPC Errors were being logged twice: first in the peer manager and then again in the router, so leave just the peer manager's one 
- The "reduce peer count" warn message gets thrown to the user for every missed chunk, so instead print it when the request times out and also do not include there info that is not relevant to the user
- The processor didn't have the service tag so add it
- Impl `KV` for status message
- Do not downscore peers if we are the ones that timed out

Other small improvements
2020-11-16 04:06:14 +00:00
Age Manning
c00e6c2c6f Small network adjustments (#1884)
## Issue Addressed

- Asymmetric pings - Currently with symmetric ping intervals, lighthouse nodes race each other to ping often ending in simultaneous ping connections. This shifts the ping interval to be asymmetric based on inbound/outbound connections
- Correct inbound/outbound peer-db registering - It appears we were accounting inbound as outbound and vice versa in the peerdb, this has been corrected
- Improved logging

There is likely more to come - I'll leave this open as we investigate further testnets
2020-11-13 06:06:33 +00:00
Age Manning
e2ae5010a6 Update libp2p (#1865)
Updates libp2p to the latest version. 

This adds tokio 0.3 support and brings back yamux support. 

This also updates some discv5 configuration parameters for leaner discovery queries
2020-11-06 04:14:14 +00:00
Age Manning
0a0f4daf9d Prevent errors for stream termination race (#1853)
Prevents an error being propagated on a race condition for RPC stream termination
2020-11-03 10:37:00 +00:00
Paul Hauner
da44821e39 Clean up obsolete TODOs (#1734)
Squashed commit of the following:

commit f99373cbae
Author: Age Manning <Age@AgeManning.com>
Date:   Mon Oct 5 18:44:09 2020 +1100

    Clean up obsolute TODOs
2020-10-05 21:08:14 +11:00
blacktemplar
c18d37c202 Use Gossipsub 1.1 (#1516)
## Issue Addressed

#1172

## Proposed Changes

* updates the libp2p dependency
* small adaptions based on changes in libp2p
* report not just valid messages but also invalid and distinguish between `IGNORE`d messages and `REJECT`ed messages


Co-authored-by: Age Manning <Age@AgeManning.com>
2020-08-30 13:06:50 +00:00
divma
1a67d15701 Mitigate too many outgoing connections (#1469)
limit simultaneous outgoing connections attempts to a reasonable top as an extra layer of protection
also shift the keep alive logic of the rpc handler to avoid needing to update it by hand. I think In rare cases this could make shutting down a connection a bit faster.
2020-08-11 02:16:31 +00:00
Age Manning
08a31c5a1a Disconnect peers (#1484)
## Issue Addressed

Peers that connected after the peer limit may remain connected in some circumstances. 

This ensures peers not in the peer manager's list get disconnected. Further logging is also added to track this behaviour.
2020-08-08 06:08:44 +00:00
blacktemplar
23a8f31f83 Fix clippy warnings (#1385)
## Issue Addressed

NA

## Proposed Changes

Fixes most clippy warnings and ignores the rest of them, see issue #1388.
2020-07-23 14:18:00 +00:00
divma
ba10c80633 Refactor inbound substream logic with async (#1325)
## Issue Addressed
#1112 

The logic is slightly different but still valid wrt to error handling.
- Inbound state is either Busy with a future that return the subtream (and info about the processing)
- The state machine works as follows:
  - `Idle` with pending responses => `Busy`
  - `Busy` => finished ? if so and there are new pending responses then `Busy`, if not then `Idle`
               => not finished remains `Busy`
- Add an `InboundInfo` for readability
- Other stuff:
  - Close inbound substreams when all expected responses are sent
  - Remove the error variants from `RPCCodedResponse` and use the codes instead
  - Fix various spelling mistakes because I got sloppy last time

Sorry for the delay

Co-authored-by: Age Manning <Age@AgeManning.com>
2020-07-23 12:30:43 +00:00
Age Manning
5bc8fea2e0 Activate peer scoring (#1284)
* Initial score structure

* Peer manager update

* Updates to dialing

* Correct tests

* Correct typos and remove unused function

* Integrate scoring into the network crate

* Clean warnings

* Formatting

* Shift core functionality into the behaviour

* Temp commit

* Shift disconnections into the behaviour

* Temp commit

* Update libp2p and gossipsub

* Remove gossipsub lru cache

* Correct merge conflicts

* Modify handler and correct tests

* Update enr network globals on socket update

* Apply clippy lints

* Add new prysm fingerprint

* More clippy fixes
2020-07-07 10:13:16 +10:00
Michael Sproul
7688b5f1dd Merge remote-tracking branch 'origin/master' into spec-v0.12 2020-06-26 12:57:56 +10:00
pscott
02174e21d8 Fix clippy's performance lints (#1286)
* Fix clippy perf lints

* Cargo fmt

* Add  and  to lint rule in Makefile

* Fix some leftover clippy lints
2020-06-26 00:04:08 +10:00
divma
259502829e fix wrong draining of queued requests on handler shutdown (#1288) 2020-06-24 17:44:28 +10:00
Paul Hauner
decea48c78 Merge branch 'master' into spec-v0.12 2020-06-21 10:33:02 +10:00
Age Manning
e379ad0f4e Silky smooth discovery (#1274)
* Initial structural re-write

* Improving discovery update and correcting attestation service logic

* Rework discovery.mod

* Handling lifetimes of query futures

* Discovery update first draft

* format fixes

* Stabalise discv5 update

* Formatting corrections

* Limit FindPeers queries and bug correction

* Update to stable release discv5

* Remove unnecessary pin

* formatting
2020-06-19 14:13:23 +10:00