mirror of
https://github.com/sigp/lighthouse.git
synced 2026-03-10 12:11:59 +00:00
Pause sync when EE is offline (#3428)
## Issue Addressed #3032 ## Proposed Changes Pause sync when the EE is offline. Changes include three main parts: - Online/offline notification system - Pause sync - Resume sync #### Online/offline notification system - The engine state is now guarded behind a new struct `State` that ensures every change is correctly notified. Notifications are only sent if the state changes. The new `State` is behind a `RwLock` (as before) as the synchronization mechanism. - The actual notification channel is a [tokio::sync::watch](https://docs.rs/tokio/latest/tokio/sync/watch/index.html), which ensures only the last value is in the receiver channel. This way we don't need to worry about message order etc. - Sync waits for state changes concurrently with normal messages. #### Pause Sync Sync has four components; pausing is done differently in each: - **Block lookups**: Disabled while in this state. We drop current requests and don't search for new blocks. Block lookups are infrequent and I don't think it's worth the extra logic of keeping these and delaying processing. If we later see that this is required, we can add it. - **Parent lookups**: Disabled while in this state. We drop current requests and don't search for new parents. Parent lookups are even less frequent and I don't think it's worth the extra logic of keeping these and delaying processing. If we later see that this is required, we can add it. - **Range**: Chains don't send batches for processing to the beacon processor. This is easily done by guarding the channel to the beacon processor and giving it access only if the EE is responsive. I find this the simplest and most powerful approach, since we don't need to deal with new sync states, and chain segments that are added while the EE is offline will follow the same logic without needing to synchronize a shared state among them. 
Another advantage of passive pause vs active pause is that we can still keep track of actively advertised chain segments, so that on resume we don't need to re-evaluate all our peers. - **Backfill**: Not affected by EE states; we don't pause. #### Resume Sync - **Block lookups**: Enabled again. - **Parent lookups**: Enabled again. - **Range**: Active resume. Since the only real thing pausing range does is not sending batches for processing, resume makes all chains that are holding ready-for-processing batches send them. - **Backfill**: Not affected by EE states; no need to resume. ## Additional Info **QUESTION**: Originally I made this notify and change on the synced state, but @pawanjay176, in discussions with @paulhauner, concluded we only need to check online/offline states. The upcheck function mentions extra checks to keep a very up-to-date sync status to aid the networking stack. However, the only need the networking stack has for this status is the one addressed here. I added a TODO to review whether the extra check can be removed. Next gen of #3094. Will work best with #3439 Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com>
This commit is contained in:
@@ -24,7 +24,6 @@ use std::collections::{
|
||||
HashMap, HashSet,
|
||||
};
|
||||
use std::sync::Arc;
|
||||
use tokio::sync::mpsc;
|
||||
use types::{Epoch, EthSpec, SignedBeaconBlock};
|
||||
|
||||
/// Blocks are downloaded in batches from peers. This constant specifies how many epochs worth of
|
||||
@@ -144,9 +143,6 @@ pub struct BackFillSync<T: BeaconChainTypes> {
|
||||
/// (i.e synced peers).
|
||||
network_globals: Arc<NetworkGlobals<T::EthSpec>>,
|
||||
|
||||
/// A multi-threaded, non-blocking processor for processing batches in the beacon chain.
|
||||
beacon_processor_send: mpsc::Sender<BeaconWorkEvent<T>>,
|
||||
|
||||
/// A logger for backfill sync.
|
||||
log: slog::Logger,
|
||||
}
|
||||
@@ -155,7 +151,6 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
pub fn new(
|
||||
beacon_chain: Arc<BeaconChain<T>>,
|
||||
network_globals: Arc<NetworkGlobals<T::EthSpec>>,
|
||||
beacon_processor_send: mpsc::Sender<BeaconWorkEvent<T>>,
|
||||
log: slog::Logger,
|
||||
) -> Self {
|
||||
// Determine if backfill is enabled or not.
|
||||
@@ -193,7 +188,6 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
participating_peers: HashSet::new(),
|
||||
restart_failed_sync: false,
|
||||
beacon_chain,
|
||||
beacon_processor_send,
|
||||
log,
|
||||
};
|
||||
|
||||
@@ -216,7 +210,7 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
#[must_use = "A failure here indicates the backfill sync has failed and the global sync state should be updated"]
|
||||
pub fn start(
|
||||
&mut self,
|
||||
network: &mut SyncNetworkContext<T::EthSpec>,
|
||||
network: &mut SyncNetworkContext<T>,
|
||||
) -> Result<SyncStart, BackFillError> {
|
||||
match self.state() {
|
||||
BackFillState::Syncing => {} // already syncing ignore.
|
||||
@@ -312,7 +306,7 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
pub fn peer_disconnected(
|
||||
&mut self,
|
||||
peer_id: &PeerId,
|
||||
network: &mut SyncNetworkContext<T::EthSpec>,
|
||||
network: &mut SyncNetworkContext<T>,
|
||||
) -> Result<(), BackFillError> {
|
||||
if matches!(
|
||||
self.state(),
|
||||
@@ -355,7 +349,7 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
#[must_use = "A failure here indicates the backfill sync has failed and the global sync state should be updated"]
|
||||
pub fn inject_error(
|
||||
&mut self,
|
||||
network: &mut SyncNetworkContext<T::EthSpec>,
|
||||
network: &mut SyncNetworkContext<T>,
|
||||
batch_id: BatchId,
|
||||
peer_id: &PeerId,
|
||||
request_id: Id,
|
||||
@@ -392,7 +386,7 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
#[must_use = "A failure here indicates the backfill sync has failed and the global sync state should be updated"]
|
||||
pub fn on_block_response(
|
||||
&mut self,
|
||||
network: &mut SyncNetworkContext<T::EthSpec>,
|
||||
network: &mut SyncNetworkContext<T>,
|
||||
batch_id: BatchId,
|
||||
peer_id: &PeerId,
|
||||
request_id: Id,
|
||||
@@ -505,7 +499,7 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
/// The batch must exist and be ready for processing
|
||||
fn process_batch(
|
||||
&mut self,
|
||||
network: &mut SyncNetworkContext<T::EthSpec>,
|
||||
network: &mut SyncNetworkContext<T>,
|
||||
batch_id: BatchId,
|
||||
) -> Result<ProcessResult, BackFillError> {
|
||||
// Only process batches if this chain is Syncing, and only one at a time
|
||||
@@ -541,8 +535,8 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
let process_id = ChainSegmentProcessId::BackSyncBatchId(batch_id);
|
||||
self.current_processing_batch = Some(batch_id);
|
||||
|
||||
if let Err(e) = self
|
||||
.beacon_processor_send
|
||||
if let Err(e) = network
|
||||
.processor_channel()
|
||||
.try_send(BeaconWorkEvent::chain_segment(process_id, blocks))
|
||||
{
|
||||
crit!(self.log, "Failed to send backfill segment to processor."; "msg" => "process_batch",
|
||||
@@ -563,7 +557,7 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
#[must_use = "A failure here indicates the backfill sync has failed and the global sync state should be updated"]
|
||||
pub fn on_batch_process_result(
|
||||
&mut self,
|
||||
network: &mut SyncNetworkContext<T::EthSpec>,
|
||||
network: &mut SyncNetworkContext<T>,
|
||||
batch_id: BatchId,
|
||||
result: &BatchProcessResult,
|
||||
) -> Result<ProcessResult, BackFillError> {
|
||||
@@ -704,7 +698,7 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
/// Processes the next ready batch.
|
||||
fn process_completed_batches(
|
||||
&mut self,
|
||||
network: &mut SyncNetworkContext<T::EthSpec>,
|
||||
network: &mut SyncNetworkContext<T>,
|
||||
) -> Result<ProcessResult, BackFillError> {
|
||||
// Only process batches if backfill is syncing and only process one batch at a time
|
||||
if self.state() != BackFillState::Syncing || self.current_processing_batch.is_some() {
|
||||
@@ -764,11 +758,7 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
///
|
||||
/// If a previous batch has been validated and it had been re-processed, penalize the original
|
||||
/// peer.
|
||||
fn advance_chain(
|
||||
&mut self,
|
||||
network: &mut SyncNetworkContext<T::EthSpec>,
|
||||
validating_epoch: Epoch,
|
||||
) {
|
||||
fn advance_chain(&mut self, network: &mut SyncNetworkContext<T>, validating_epoch: Epoch) {
|
||||
// make sure this epoch produces an advancement
|
||||
if validating_epoch >= self.current_start {
|
||||
return;
|
||||
@@ -863,7 +853,7 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
/// intended and can result in downvoting a peer.
|
||||
fn handle_invalid_batch(
|
||||
&mut self,
|
||||
network: &mut SyncNetworkContext<T::EthSpec>,
|
||||
network: &mut SyncNetworkContext<T>,
|
||||
batch_id: BatchId,
|
||||
) -> Result<(), BackFillError> {
|
||||
// The current batch could not be processed, indicating either the current or previous
|
||||
@@ -914,7 +904,7 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
/// Sends and registers the request of a batch awaiting download.
|
||||
fn retry_batch_download(
|
||||
&mut self,
|
||||
network: &mut SyncNetworkContext<T::EthSpec>,
|
||||
network: &mut SyncNetworkContext<T>,
|
||||
batch_id: BatchId,
|
||||
) -> Result<(), BackFillError> {
|
||||
let batch = match self.batches.get_mut(&batch_id) {
|
||||
@@ -958,7 +948,7 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
/// Requests the batch assigned to the given id from a given peer.
|
||||
fn send_batch(
|
||||
&mut self,
|
||||
network: &mut SyncNetworkContext<T::EthSpec>,
|
||||
network: &mut SyncNetworkContext<T>,
|
||||
batch_id: BatchId,
|
||||
peer: PeerId,
|
||||
) -> Result<(), BackFillError> {
|
||||
@@ -1011,10 +1001,7 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
|
||||
/// When resuming a chain, this function searches for batches that need to be re-downloaded and
|
||||
/// transitions their state to redownload the batch.
|
||||
fn resume_batches(
|
||||
&mut self,
|
||||
network: &mut SyncNetworkContext<T::EthSpec>,
|
||||
) -> Result<(), BackFillError> {
|
||||
fn resume_batches(&mut self, network: &mut SyncNetworkContext<T>) -> Result<(), BackFillError> {
|
||||
let batch_ids_to_retry = self
|
||||
.batches
|
||||
.iter()
|
||||
@@ -1040,7 +1027,7 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
|
||||
/// pool and left over batches until the batch buffer is reached or all peers are exhausted.
|
||||
fn request_batches(
|
||||
&mut self,
|
||||
network: &mut SyncNetworkContext<T::EthSpec>,
|
||||
network: &mut SyncNetworkContext<T>,
|
||||
) -> Result<(), BackFillError> {
|
||||
if !matches!(self.state(), BackFillState::Syncing) {
|
||||
return Ok(());
|
||||
|
||||
Reference in New Issue
Block a user