lighthouse

mirror of https://github.com/sigp/lighthouse.git synced 2026-05-31 05:07:12 +00:00

Author	SHA1	Message	Date
Pawan Dhananjay	e8c0d1f19b	Altair networking (#2300 ) ## Issue Addressed Resolves #2278 ## Proposed Changes Implements the networking components for the Altair hard fork https://github.com/ethereum/eth2.0-specs/blob/dev/specs/altair/p2p-interface.md ## Additional Info This PR acts as the base branch for networking changes and tracks https://github.com/sigp/lighthouse/pull/2279 . Changes to gossip, rpc and discovery can be separate PRs to be merged here for ease of review. Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-08-04 01:44:57 +00:00
realbigsean	303deb9969	Rust 1.54.0 lints (#2483 ) ## Issue Addressed N/A ## Proposed Changes - Removing a bunch of unnecessary references - Updated `Error::VariantError` to `Error::Variant` - There were additional enum variant lints that I ignored, because I thought our variant names were fine - removed `MonitoredValidator`'s `pubkey` field, because I couldn't find it used anywhere. It looks like we just use the string version of the pubkey (the `id` field) if there is no index ## Additional Info Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-07-30 01:11:47 +00:00
Age Manning	08fedbfcba	Libp2p Connection Limit (#2455 ) * Get libp2p to handle connection limits * fmt	2021-07-15 16:43:18 +10:00
Age Manning	381befbf82	Ensure disconnecting peers are added to the peerdb (#2451 )	2021-07-15 16:43:18 +10:00
Age Manning	059d9ec1b1	Gossipsub scoring improvements (#2391 ) * Tweak gossipsub parameters for improved scoring * Modify gossip history * Update settings * Make mesh window constant * Decrease the mesh message deliveries weight * Fmt	2021-07-15 16:43:18 +10:00
Age Manning	c62810b408	Update to Libp2p to 39.1 (#2448 ) * Adjust beacon node timeouts for validator client HTTP requests (#2352) Resolves #2313 Provide `BeaconNodeHttpClient` with a dedicated `Timeouts` struct. This will allow granular adjustment of the timeout duration for different calls made from the VC to the BN. These can either be a constant value, or as a ratio of the slot duration. Improve timeout performance by using these adjusted timeout duration's only whenever a fallback endpoint is available. Add a CLI flag called `use-long-timeouts` to revert to the old behavior. Additionally set the default `BeaconNodeHttpClient` timeouts to the be the slot duration of the network, rather than a constant 12 seconds. This will allow it to adjust to different network specifications. Co-authored-by: Paul Hauner <paul@paulhauner.com> * Use read_recursive locks in database (#2417) Closes #2245 Replace all calls to `RwLock::read` in the `store` crate with `RwLock::read_recursive`. * Unfortunately we can't run the deadlock detector on CI because it's pinned to an old Rust 1.51.0 nightly which cannot compile Lighthouse (one of our deps uses `ptr::addr_of!` which is too new). A fun side-project at some point might be to update the deadlock detector. * The reason I think we haven't seen this deadlock (at all?) in practice is that _writes_ to the database's split point are quite infrequent, and a concurrent write is required to trigger the deadlock. The split point is only written when finalization advances, which is once per epoch (every ~6 minutes), and state reads are also quite sporadic. Perhaps we've just been incredibly lucky, or there's something about the timing of state reads vs database migration that protects us. 
* I wrote a few small programs to demo the deadlock, and the effectiveness of the `read_recursive` fix: https://github.com/michaelsproul/relock_deadlock_mvp * [The docs for `read_recursive`](https://docs.rs/lock_api/0.4.2/lock_api/struct.RwLock.html#method.read_recursive) warn of starvation for writers. I think in order for starvation to occur the database would have to be spammed with so many state reads that it's unable to ever clear them all and find time for a write, in which case migration of states to the freezer would cease. If an attack could be performed to trigger this starvation then it would likely trigger a deadlock in the current code, and I think ceasing migration is preferable to deadlocking in this extreme situation. In practice neither should occur due to protection from spammy peers at the network layer. Nevertheless, it would be prudent to run this change on the testnet nodes to check that it doesn't cause accidental starvation. * Return more detail when invalid data is found in the DB during startup (#2445) - Resolves #2444 Adds some more detail to the error message returned when the `BeaconChainBuilder` is unable to access or decode block/state objects during startup. NA * Use hardware acceleration for SHA256 (#2426) Modify the SHA256 implementation in `eth2_hashing` so that it switches between `ring` and `sha2` to take advantage of [x86_64 SHA extensions](https://en.wikipedia.org/wiki/Intel_SHA_extensions). The extensions are available on modern Intel and AMD CPUs, and seem to provide a considerable speed-up: on my Ryzen 5950X it dropped state tree hashing times by about 30% from 35ms to 25ms (on Prater). The extensions became available in the `sha2` crate [last year](https://www.reddit.com/r/rust/comments/hf2vcx/ann_rustcryptos_sha1_and_sha2_now_support/), and are not available in Ring, which uses a [pure Rust implementation of sha2](https://github.com/briansmith/ring/blob/main/src/digest/sha2.rs). 
Ring is faster on CPUs that lack the extensions so I've implemented a runtime switch to use `sha2` only when the extensions are available. The runtime switching seems to impose a minuscule penalty (see the benchmarks linked below). * Start a release checklist (#2270) NA Add a checklist to the release draft created by CI. I know @michaelsproul was also working on this and I suspect @realbigsean also might have useful input. NA * Serious banning * fmt Co-authored-by: Mac L <mjladson@pm.me> Co-authored-by: Paul Hauner <paul@paulhauner.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-07-15 16:43:18 +10:00
Age Manning	3c0d3227ab	Global Network Behaviour Refactor (#2442 ) * Network upgrades (#2345) * Discovery patch (#2382) * Upgrade libp2p and unstable gossip * Network protocol upgrades * Correct dependencies, reduce incoming bucket limit * Clean up dirty DHT entries before repopulating * Update cargo lock * Update lockfile * Update ENR dep * Update deps to specific versions * Update test dependencies * Update docker rust, and remote signer tests * More remote signer test fixes * Temp commit * Update discovery * Remove cached enrs after dialing * Increase the session capacity, for improved efficiency * Bleeding edge discovery (#2435) * Update discovery banning logic and tokio * Update to latest discovery * Shift to latest discovery * Fmt * Initial re-factor of the behaviour * More progress * Missed changes * First draft * Discovery as a behaviour * Adding back event waker (not convinced its neccessary, but have made this many changes already) * Corrections * Speed up discovery * Remove double log * Fmt * After disconnect inform swarm about ban * More fmt * Appease clippy * Improve ban handling * Update tests * Update cargo.lock * Correct tests * Downgrade log	2021-07-15 16:43:17 +10:00
Age Manning	6fb48b45fa	Discovery patch (#2382 ) * Upgrade libp2p and unstable gossip * Network protocol upgrades * Correct dependencies, reduce incoming bucket limit * Clean up dirty DHT entries before repopulating * Update cargo lock * Update lockfile * Update ENR dep * Update deps to specific versions * Update test dependencies * Update docker rust, and remote signer tests * More remote signer test fixes * Temp commit * Update discovery * Remove cached enrs after dialing * Increase the session capacity, for improved efficiency	2021-07-15 16:43:17 +10:00
realbigsean	b84ff9f793	rust 1.53.0 updates (#2411 ) ## Issue Addressed `make lint` failing on rust 1.53.0. ## Proposed Changes 1.53.0 updates ## Additional Info I haven't figured out why yet, but we are now hitting the recursion limit in a few crates, so I had to add `#![recursion_limit = "256"]` in a few places. Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-06-18 05:58:01 +00:00
Kevin Lu	320a683e72	Minimum Outbound-Only Peers Requirement (#2356 ) ## Issue Addressed #2325 ## Proposed Changes This pull request changes the behavior of the Peer Manager by including a minimum outbound-only peers requirement. The peer manager will continue querying for peers if this outbound-only target number hasn't been met. Additionally, when peers are being removed, an outbound-only peer will not be disconnected if doing so brings us below the minimum. ## Additional Info Unit test for heartbeat function tests that disconnection behavior is correct. Continual querying for peers if outbound-only hasn't been met is not directly tested, but indirectly through unit testing of the helper function that counts the number of outbound-only peers. EDIT: Am concerned about the behavior of ```update_peer_scores```. If we have connected to a peer with a score below the disconnection threshold (-20), then its connection status will remain connected, while its score state will change to disconnected. ```rust let previous_state = info.score_state(); // Update scores info.score_update(); Self::handle_score_transitions( previous_state, peer_id, info, &mut to_ban_peers, &mut to_unban_peers, &mut self.events, &self.log, ); ``` ```previous_state``` will be set to Disconnected, and then because ```handle_score_transitions``` only changes connection status for a peer if the state changed, the peer remains connected. Then in the heartbeat code, because we only disconnect healthy peers if we have too many peers, these peers don't get disconnected. I'm not sure realistically how often this scenario would occur, but it might be better to adjust the logic to account for scenarios where the score state implies a connection status different from the current connection status. Co-authored-by: Kevin Lu <kevlu93@gmail.com>	2021-05-31 04:18:19 +00:00
Age Manning	ec5cceba50	Correct issue with dialing peers (#2375 ) The ordering of adding new peers to the peerdb and deciding when to dial them was not considered in a previous update. This adds the condition that if a peer is not in the peer-db then it is an acceptable peer to dial. This makes #2374 obsolete.	2021-05-29 07:25:06 +00:00
Age Manning	55aada006f	More stringent dialing (#2363 ) * More stringent dialing * Cover cached enr dialing	2021-05-26 14:21:44 +10:00
ethDreamer	0aa8509525	Filter Disconnected Peers from Discv5 DHT (#2219 ) ## Issue Addressed #2107 ## Proposed Change The peer manager will mark peers as disconnected in the discv5 DHT when they disconnect or dial fails ## Additional Info Rationale for this particular change is explained in my comment on #2107	2021-04-28 04:07:37 +00:00
Paul Hauner	8e5c20b6d1	Update for clippy 1.50 (#2193 ) ## Issue Addressed NA ## Proposed Changes Rust 1.50 has landed 🎉 The shiny new `clippy` peers down upon us mere mortals with disgust. Brutish peasants wrapping our `usize`s in superfluous `Option`s... tsk tsk. I've performed the goat sacrifice and corrected our evil ways in this PR. Tonight we shall pray that Github Actions bestows the almighty green tick upon us. ## Additional Info NA Co-authored-by: realbigsean <seananderson33@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2021-02-15 00:09:12 +00:00
realbigsean	e20f64b21a	Update to tokio 1.1 (#2172 ) ## Issue Addressed resolves #2129 resolves #2099 addresses some of #1712 unblocks #2076 unblocks #2153 ## Proposed Changes - Updates all the dependencies mentioned in #2129, except for web3. They haven't merged their tokio 1.0 update because they are waiting on some dependencies of their own. Since we only use web3 in tests, I think updating it in a separate issue is fine. If they are able to merge soon though, I can update in this PR. - Updates `tokio_util` to 0.6.2 and `bytes` to 1.0.1. - We haven't made a discv5 release since merging tokio 1.0 updates so I'm using a commit rather than release atm. Edit: I think we should merge an update of `tokio_util` to 0.6.2 into discv5 before this release because it has panic fixes in `DelayQueue` --> PR in discv5: https://github.com/sigp/discv5/pull/58 ## Additional Info tokio 1.0 changes that required some changes in lighthouse: - `interval.next().await.is_some()` -> `interval.tick().await` - `sleep` future is now `!Unpin` -> https://github.com/tokio-rs/tokio/issues/3028 - `try_recv` has been temporarily removed from `mpsc` -> https://github.com/tokio-rs/tokio/issues/3350 - stream features have moved to `tokio-stream` and `broadcast::Receiver::into_stream()` has been temporarily removed -> https://github.com/tokio-rs/tokio/issues/2870 - I've copied over the `BroadcastStream` wrapper from this PR, but can update to use `tokio-stream` once it's merged https://github.com/tokio-rs/tokio/pull/3384 Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-02-10 23:29:49 +00:00
Akihito Nakano	1a22a096c6	Fix clippy errors on tests (#2160 ) ## Issue Addressed There are some clippy errors on tests. ## Proposed Changes Enable clippy check on tests and fix the errors. 💪	2021-01-28 23:31:06 +00:00
Paul Hauner	805e152f66	Simplify enum -> str with strum (#2164 ) ## Issue Addressed NA ## Proposed Changes As per #2100, uses derives from the strum library to implement AsRef<str> and AsStaticRef to easily get str values from enums without creating new Strings. Furthermore unifies all attestation error counters into one IntCounterVec vector. This work is originally by @blacktemplar, I've just created this PR so I can resolve some merge conflicts. ## Additional Info NA Co-authored-by: blacktemplar <blacktemplar@a1.net>	2021-01-19 06:33:58 +00:00
realbigsean	7a71977987	Clippy 1.49.0 updates and dht persistence test fix (#2156 ) ## Issue Addressed `test_dht_persistence` failing ## Proposed Changes Bind `NetworkService::start` to an underscore prefixed variable rather than `_`. `_` was causing it to be dropped immediately This was failing 5/100 times before this update, but I haven't been able to get it to fail after updating it Co-authored-by: realbigsean <seananderson33@gmail.com>	2021-01-19 00:34:28 +00:00
Pawan Dhananjay	28238d97b1	Disconnect from peers quicker on internet issues (#2147 ) ## Issue Addressed Fixes #2146 ## Proposed Changes Change ping timeout errors to return `LowToleranceErrors` so that we disconnect faster on internet failures/changes.	2021-01-13 08:09:10 +00:00
Age Manning	7e4b190df0	Reduce ping interval (#2132 ) ## Issue Addressed #2123 ## Description Reduces the TCP ping interval to increase our responsiveness to peer liveness changes.	2021-01-06 04:35:52 +00:00
Age Manning	2931b05582	Update libp2p (#2101 ) This is a little bit of a tip-of-the-iceberg PR. It houses a lot of code changes in the libp2p dependency. This needs a bit of thorough testing before merging. The primary code changes are: - General libp2p dependency update - Gossipsub refactor to shift compression into gossipsub providing performance improvements and improved API for handling compression Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-12-23 07:53:36 +00:00
Pawan Dhananjay	f998eff7ce	Subnet discovery fixes (#2095 ) ## Issue Addressed N/A ## Proposed Changes Fixes multiple issues related to discovery of subnet peers. 1. Subnet discovery retries after yielding no results 2. Metadata updates if a peer sends older metadata 3. peerdb stores the peer subscriptions from gossipsub	2020-12-17 00:39:15 +00:00
divma	11c299cbf6	impl Resource Unavailable RPC error (#2072 ) ## Issue Addressed Related to #1891, The error is not in the spec yet (see ethereum/eth2.0-specs#2131) ## Proposed Changes Implement the proposed error, banning peers that send it ## Additional Info NA	2020-12-15 00:17:32 +00:00
Age Manning	4f85371ce8	Downgrades a valid log (#2057 ) ## Issue Addressed #2046 ## Proposed Changes The log was originally intended to verify the correct logic and ordering of events when scoring peers. The queued tasks can be structured in such a way that peers can be banned after they are disconnected. Therefore the error log is now downgraded to debug log.	2020-12-08 10:48:45 +00:00
divma	f3200784b4	More metrics + RPC tweaks (#2041 ) ## Issue Addressed NA ## Proposed Changes This was mostly done to find the reason why LH was dropping peers from Nimbus. It proved to be useful so I think it's worth it. But there is also some functional stuff here - Add metrics for rpc errors per client, error type and direction - Add metrics for downscoring events per source type, client and penalty type - Add metrics for gossip validation results per client for non-accepted messages - Make the RPC handler return errors and requests/responses in the order we see them - Allow a small burst for the Ping rate limit, from 1 every 5 seconds to 2 every 10 seconds - Send rate limiting errors with a particular code and use that same code to identify them. I picked something different to 128 since that is most likely what other clients are using for their own errors - Remove some unused code in the `PeerAction` and the rpc handler - Remove the unused variant `RateLimited`. This was never produced directly, since the only way to get the request's protocol is via the handler. The handler upon receiving from LH a response with an error (rate limited in this case) emits this event with the missing info (It was always like this, just pointing out that we do downscore rate limiting errors regardless of the change) Metrics for Nimbus looked like this: Downscoring events: `increase(libp2p_peer_actions_per_client{client="Nimbus"}[5m])` ![image](https://user-images.githubusercontent.com/26765164/101210880-862bf280-3676-11eb-94c0-399f0bf5aa2e.png) RPC Errors: `increase(libp2p_rpc_errors_per_client{client="Nimbus"}[5m])` ![image](https://user-images.githubusercontent.com/26765164/101210997-ba071800-3676-11eb-847a-f32405ede002.png) Unaccepted gossip message: `increase(gossipsub_unaccepted_messages_per_client{client="Nimbus"}[5m])` ![image](https://user-images.githubusercontent.com/26765164/101211124-f470b500-3676-11eb-9459-132ecff058ec.png)	2020-12-08 03:55:50 +00:00
Age Manning	2682f46025	Fingerprint new client identify agent string (#2027 ) Nimbus have modified their identify agent string. This PR adds their new agent string to identify new nimbus peers.	2020-12-03 22:07:14 +00:00
blacktemplar	d8cda2d86e	Fix new clippy lints (#2036 ) ## Issue Addressed NA ## Proposed Changes Fixes new clippy lints in the whole project (mainly [manual_strip](https://rust-lang.github.io/rust-clippy/master/index.html#manual_strip) and [unnecessary_lazy_evaluations](https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_lazy_evaluations)). Furthermore, removes `to_string()` calls on literals when used with the `?`-operator.	2020-12-03 01:10:26 +00:00
divma	8fcd22992c	No string in slog (#2017 ) ## Issue Addressed Following slog's documentation, this should help a bit with string allocations. I let it run for two days and mem usage is lower. This is of course anecdotal, but shouldn't do any harm anyway ## Proposed Changes Remove `String` creation in logs when possible	2020-11-30 10:33:00 +00:00
Paul Hauner	85e69249e6	Drop discovery log to trace (#2007 ) ## Issue Addressed NA ## Proposed Changes This was causing: ``` Nov 28 21:56:08.154 ERRO slog-async: logger dropped messages due to channel overflow, count: 44, service: libp2p ``` ## Additional Info NA	2020-11-29 03:02:23 +00:00
Age Manning	a567f788bd	Upgrade to tokio 0.3 (#1839 ) ## Description This PR updates Lighthouse to tokio 0.3. It includes a number of dependency updates and some structural changes as to how we create and spawn tasks. This also brings with it a number of various improvements: - Discv5 update - Libp2p update - Fix for recompilation issues - Improved UPnP port mapping handling - Futures dependency update - Log downgrade to traces for rejecting peers when we've reached our max Co-authored-by: blacktemplar <blacktemplar@a1.net>	2020-11-28 05:30:57 +00:00
divma	fc07cc3fdf	Sync metrics (#1975 ) ## Issue Addressed - Add metrics to keep track of peer counts by sync type - Add metric to keep track of the number of syncing chains in range ## Proposed Changes Plug into the network metrics update interval and also update the counts for peers with respect to their sync status with us ## Additional Info For the peer counts - Because of the way it is implemented, the numbers won't always match the total peer count in the `libp2p` metric. - Updating the gauge with every change is messy because it needs to be updated on connection (in the `eth2_libp2p` crate, while metrics are defined in the `network` crate), on Goodbye sent (for an `IrrelevantPeer`) either in the `beacon_processor` or the `peer_manager`, and on disconnection. Since this is not a critical metric I think counting once every second is enough. (If you think more accuracy is needed we can do it too, but it would be harder to maintain) ATM those look like this ![image](https://user-images.githubusercontent.com/26765164/100275387-22137b00-2f60-11eb-93b9-94b0f265240c.png)	2020-11-26 05:23:17 +00:00
divma	3b4afc27bf	Status race condition (#1967 ) ## Issue Addressed Sync stalls due to race conditions between dc notifications and status processing	2020-11-25 02:15:38 +00:00
blacktemplar	3408de8151	Avoid string initialization in network metrics and replace by &str where possible (#1898 ) ## Issue Addressed NA ## Proposed Changes Removes most of the temporary string initializations in network metrics and replaces them by directly using `&str`. This further improves on PR https://github.com/sigp/lighthouse/pull/1895. For the subnet id handling the current approach uses a build script to create a static map. This has the disadvantage that the build script hardcodes the number of subnets. If we want to use more than 64 subnets we need to adjust this in the build script. ## Additional Info We still have some string initializations for the enum `PeerKind`. To also replace that by `&str` I created a PR in the libp2p dependency: https://github.com/sigp/rust-libp2p/pull/91. Either we wait with merging until this dependency PR is merged (and all conflicts with the newest libp2p version are resolved) or we just merge as is and I will create another PR when the dependency is ready.	2020-11-18 23:31:37 +00:00
Age Manning	49c4630045	Performance improvement for db reads (#1909 ) This PR adds a number of improvements: - Downgrade a warning log when we ignore blocks for gossipsub processing - Revert a correction to improve logging of peer score changes - Shift syncing DB reads off the core-executor allowing parallel processing of large sync messages - Correct the timeout logic of RPC chunk sends, giving more time before timing out RPC outbound messages.	2020-11-16 07:28:30 +00:00
divma	eb56140582	Update logs + do not downscore peers if WE time out (#1901 ) ## Issue Addressed - RPC Errors were being logged twice: first in the peer manager and then again in the router, so leave just the peer manager's one - The "reduce peer count" warn message gets thrown to the user for every missed chunk, so instead print it when the request times out and also do not include there info that is not relevant to the user - The processor didn't have the service tag so add it - Impl `KV` for status message - Do not downscore peers if we are the ones that timed out Other small improvements	2020-11-16 04:06:14 +00:00
divma	8a16548715	Misc Peer sync info adjustments (#1896 ) ## Issue Addressed #1856 ## Proposed Changes - For clarity, the router's processor now only decides if a peer is compatible and it disconnects it or sends it to sync accordingly. No logic here regarding how useful the peer is. - Update peer_sync_info's rules - Add an `IrrelevantPeer` sync status to account for incompatible peers (maybe this should be "IncompatiblePeer" now that I think about it?); this state is updated upon receiving an internal goodbye in the peer manager - Misc code cleanups - Reduce the need to create `StatusMessage`s (and thus, `Arc` accesses) - Add missing calls to update the global sync state The overall effect should be: - More peers recognized as Behind, and fewer as Unknown - Peers identified as incompatible	2020-11-13 09:00:10 +00:00
Age Manning	c00e6c2c6f	Small network adjustments (#1884 ) ## Issue Addressed - Asymmetric pings - Currently with symmetric ping intervals, lighthouse nodes race each other to ping often ending in simultaneous ping connections. This shifts the ping interval to be asymmetric based on inbound/outbound connections - Correct inbound/outbound peer-db registering - It appears we were accounting inbound as outbound and vice versa in the peerdb, this has been corrected - Improved logging There is likely more to come - I'll leave this open as we investigate further testnets	2020-11-13 06:06:33 +00:00
blacktemplar	c7ac967d5a	handle peer state transitions on gossipsub score changes + refactoring (#1892 ) ## Issue Addressed NA ## Proposed Changes Correctly handles peer state transitions on gossipsub changes + refactors handling of peer state transitions into one function used for lighthouse score changes and gossipsub score changes. Co-authored-by: Age Manning <Age@AgeManning.com>	2020-11-13 03:15:03 +00:00
blacktemplar	7404f1ce54	Gossipsub scoring (#1668 ) ## Issue Addressed #1606 ## Proposed Changes Uses dynamic gossipsub scoring parameters depending on the number of active validators as specified in https://gist.github.com/blacktemplar/5c1862cb3f0e32a1a7fb0b25e79e6e2c. ## Additional Info Although the parameters got tested on Medalla, extensive testing using simulations on larger networks is still to be done and we expect that we need to change the parameters, although this might only affect constants within the dynamic parameter framework.	2020-11-12 01:48:28 +00:00
divma	b0e9e3dcef	Seen addresses store port (#1841 ) ## Issue Addressed #1764	2020-11-09 04:01:03 +00:00
blacktemplar	7e7fad5734	Ignore RPC messages of disconnected peers and remove old peers based on disconnection time (#1854 ) ## Issue Addressed NA ## Proposed Changes Lets the networking behavior ignore messages of peers that are not connected. Furthermore, old peers are not removed from the peerdb based on score anymore but based on the disconnection time.	2020-11-03 23:43:10 +00:00
divma	6c0c050fbb	Tweak head syncing (#1845 ) ## Issue Addressed Fixes head syncing ## Proposed Changes - Get back to statusing peers after removing chain segments and making the peer manager deal with status according to the Sync status, preventing an old known deadlock - Also a bug where a chain would get removed if the optimistic batch succeeds being empty ## Additional Info Tested on Medalla and looking good	2020-11-01 23:37:39 +00:00
blacktemplar	2bd5b9182f	fix unbanning of peers (#1838 ) ## Issue Addressed NA ## Proposed Changes Currently a banned peer will remain banned indefinitely as long as update is called on the score struct regularly. This fixes this bug and the score decay starts after `BANNED_BEFORE_DECAY` seconds after banning.	2020-10-29 01:25:02 +00:00
Age Manning	7453f39d68	Prevent unbanning of disconnected peers (#1822 ) ## Issue Addressed Further testing revealed another edge case where we attempt to unban a peer that can be in a disconnected state. Although this causes no real issue, it does log an error to the user. This PR adds a check to prevent this edge case and prevents the error being logged to the user.	2020-10-24 05:24:20 +00:00
Age Manning	a3cc1a1e0f	Call unban only when necessary (#1821 ) This PR prevents a user-facing error. It prevents optimistically unbanning a peer and instead checks the state of the peer before requesting the peer's state to be unbanned.	2020-10-24 03:24:19 +00:00
Age Manning	66f0cf4430	Improve peer handling (#1796 ) ## Issue Addressed Potentially resolves #1647 and sync stalls. ## Proposed Changes The handling of the state of banned peers was inadequate for the complex peerdb data structure. We store a limited number of disconnected and banned peers in the db. We were not tracking intermediate "disconnecting" states and in some circumstances we were updating the peer state without informing the peerdb. This led to a number of inconsistencies in the peer state. Further, the peer manager could ban a peer, changing a peer's state from being connected to banned. In this circumstance, if the peer then disconnected, we didn't inform the application layer, which led to applications like sync not being informed of a peer's disconnection. This could lead to sync stalling and requiring a lighthouse restart. Improved handling for peer states and interactions with the peerdb is made in this PR.	2020-10-23 01:27:48 +00:00
realbigsean	a3552a4b70	Node endpoints (#1778 ) ## Issue Addressed `node` endpoints in #1434 ## Proposed Changes Implement these: ``` /eth/v1/node/health /eth/v1/node/peers/{peer_id} /eth/v1/node/peers ``` - Add an `Option<Enr>` to `PeerInfo` - Finish implementation of `/eth/v1/node/identity` ## Additional Info - should update the `peers` endpoints when #1764 is resolved Co-authored-by: realbigsean <seananderson33@gmail.com>	2020-10-22 02:59:42 +00:00
divma	668513b67e	Sync state adjustments (#1804 ) Check for advanced peers and the state of the chain with respect to the clock slot to decide whether a chain is synced or transitioning to a head sync. Also fixes a bug that prevented getting the right state while syncing heads	2020-10-22 00:26:06 +00:00
divma	2acf75785c	More sync updates (#1791 ) ## Issue Addressed #1614 and a couple of sync-stalling problems, the most important is a cyclic dependency between the sync manager and the peer manager	2020-10-20 22:34:18 +00:00
blacktemplar	6ba997b88e	add direction information to PeerInfo (#1768 ) ## Issue Addressed NA ## Proposed Changes Adds a direction field to `PeerConnectionStatus` that can be accessed by calling `is_outgoing` which will return `true` iff the peer is connected and the first connection was an outgoing one.	2020-10-16 05:24:21 +00:00
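
The direction field described in "add direction information to PeerInfo" (#1768) can be pictured roughly as follows. This is only a sketch: `PeerConnectionStatus` and `is_outgoing` are named in the commit message, but the variants, the `ConnectionDirection` enum and the `first_direction` field are assumptions made for illustration and are not taken from the Lighthouse source.

```rust
/// Hypothetical: direction of the first connection established with a peer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectionDirection {
    Incoming,
    Outgoing,
}

/// Hypothetical, simplified connection status; the real type has more states.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PeerConnectionStatus {
    /// The peer is connected; we remember how the first connection was made.
    Connected { first_direction: ConnectionDirection },
    Disconnected,
    Banned,
}

impl PeerConnectionStatus {
    /// Returns `true` iff the peer is connected and the first connection
    /// was an outgoing one, as described in the commit message.
    pub fn is_outgoing(&self) -> bool {
        matches!(
            self,
            PeerConnectionStatus::Connected {
                first_direction: ConnectionDirection::Outgoing
            }
        )
    }
}
```

With this shape, a disconnected or banned peer never reports as outgoing, regardless of how it was originally reached.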
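
The `test_dht_persistence` fix in "Clippy 1.49.0 updates and dht persistence test fix" (#2156) relies on a Rust binding rule: `let _ = expr;` does not bind the value, so it is dropped at the end of that statement, while an underscore-prefixed name such as `_service` keeps it alive until the end of the scope. A minimal sketch of the difference, where `Service` and `start` are hypothetical stand-ins for whatever `NetworkService::start` returns:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a service handle that shuts down when dropped.
pub struct Service {
    stopped: Arc<AtomicBool>,
}

impl Drop for Service {
    fn drop(&mut self) {
        // Record that the service has been torn down.
        self.stopped.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical constructor, standing in for `NetworkService::start`.
pub fn start(stopped: Arc<AtomicBool>) -> Service {
    Service { stopped }
}

/// Binds the service to `_`: the value is dropped at the end of the `let`
/// statement, so the service is already stopped when we check the flag.
pub fn stopped_with_wildcard() -> bool {
    let stopped = Arc::new(AtomicBool::new(false));
    let _ = start(stopped.clone());
    stopped.load(Ordering::SeqCst)
}

/// Binds the service to `_service`: the value lives until the function
/// returns, so the service is still running when we check the flag.
pub fn stopped_with_named_binding() -> bool {
    let stopped = Arc::new(AtomicBool::new(false));
    let _service = start(stopped.clone());
    stopped.load(Ordering::SeqCst)
}
```

Here `stopped_with_wildcard()` observes a stopped service and `stopped_with_named_binding()` a running one, which is the behaviour the commit message describes for the flaky test.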

79 Commits