Allow AwaitingDownload to be a valid in-between state (#7984)

N/A


  Extracts (3) from https://github.com/sigp/lighthouse/pull/7946.

Prior to PeerDAS, a batch should never have remained in the `AwaitingDownload` state, because we immediately try to move it from `AwaitingDownload` to `Downloading` by sending the batch request. This was always possible in the pre-PeerDAS world as long as the `SyncingChain` had peers.

However, this is no longer the case: a batch can be stuck waiting in the `AwaitingDownload` state if we have no peers to request the columns from. This PR makes `AwaitingDownload` an allowable in-between state. If a batch is found in this state, we attempt to send the batch instead of erroring as before.
Note to reviewer: We need to make sure that this doesn't lead to a bunch of batches stuck in `AwaitingDownload` state if the chain can be progressed.
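
The behavior described above can be sketched as a small state machine. This is a simplified illustration, not the Lighthouse implementation: the `Outcome` type, the `handle_batch` function, and the `peers_available` flag are hypothetical names introduced here, and the real `BatchState` variants carry payloads that are omitted.

```rust
/// Simplified stand-in for the batch states named in this commit (payloads omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BatchState {
    AwaitingDownload,
    Downloading,
    AwaitingProcessing,
    Failed,
}

/// Hypothetical result type used only in this sketch.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// The request was sent and the batch moved to `Downloading`.
    Sent,
    /// No suitable peer yet; the batch stays in `AwaitingDownload` and is retried later.
    StillWaiting,
    /// Nothing to do for this state.
    Noop,
    /// The state is inconsistent and should be treated as an error.
    Invalid,
}

/// Assumed logic: `AwaitingDownload` is a valid in-between state, so we try to
/// send the batch instead of reporting an invalid state.
fn handle_batch(state: &mut BatchState, peers_available: bool) -> Outcome {
    match *state {
        BatchState::AwaitingDownload => {
            if peers_available {
                *state = BatchState::Downloading;
                Outcome::Sent
            } else {
                Outcome::StillWaiting
            }
        }
        BatchState::Downloading | BatchState::AwaitingProcessing => Outcome::Noop,
        BatchState::Failed => Outcome::Invalid,
    }
}
```

In this sketch, a batch with no available peers simply remains in `AwaitingDownload` and is picked up again on the next call, which is the concern raised in the note to the reviewer: something must keep calling the retry path while the chain can still progress.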

Backfill already retries all batches in `AwaitingDownload` state, so we just need to make `AwaitingDownload` a valid state during processing and validation.

This PR explicitly adds the same logic for forward sync to download batches stuck in `AwaitingDownload`.
Apart from that, we also force download of the `processing_target` when sync stops progressing. This is required in cases where `self.batches` holds more than `BATCH_BUFFER_SIZE` batches that are waiting to be processed but the `processing_batch` has repeatedly failed at the download or processing stage. Without the forced download, sync gets stuck in this situation and never recovers.
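
The forced download of the processing target can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the Lighthouse source: the `State` enum, the `next_batch_to_request` function, the value of `BATCH_BUFFER_SIZE`, and the use of a `BTreeMap` keyed by batch id are all chosen for illustration only.

```rust
use std::collections::BTreeMap;

/// Assumed value, for illustration only.
const BATCH_BUFFER_SIZE: usize = 5;

/// Simplified batch states (hypothetical, payloads omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    AwaitingDownload,
    Downloading,
    AwaitingProcessing,
}

/// Pick the next batch id to request, if any.
///
/// Normal path: while the buffer is not over-full, request the earliest batch
/// that is still in `AwaitingDownload`.
/// Stalled path: when more than `BATCH_BUFFER_SIZE` batches are waiting to be
/// processed, only the processing target may be requested, so that the batch
/// blocking progress is downloaded again.
fn next_batch_to_request(batches: &BTreeMap<u64, State>, processing_target: u64) -> Option<u64> {
    let buffered = batches
        .values()
        .filter(|s| **s == State::AwaitingProcessing)
        .count();

    if buffered <= BATCH_BUFFER_SIZE {
        return batches
            .iter()
            .find(|(_, s)| **s == State::AwaitingDownload)
            .map(|(id, _)| *id);
    }

    match batches.get(&processing_target) {
        Some(State::AwaitingDownload) => Some(processing_target),
        _ => None,
    }
}
```

With a full buffer, this sketch still returns the processing target while it sits in `AwaitingDownload`, and returns `None` for every other batch, so the buffer limit cannot block the one download that sync needs in order to move forward.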
This commit is contained in:
Pawan Dhananjay
2025-09-04 00:39:16 -07:00
committed by GitHub
parent c2a92f1a8c
commit 84ec209eba
2 changed files with 67 additions and 18 deletions


@@ -687,11 +687,12 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
 // Batch is not ready, nothing to process
 }
 BatchState::Poisoned => unreachable!("Poisoned batch"),
-BatchState::Failed | BatchState::AwaitingDownload | BatchState::Processing(_) => {
+// Batches can be in `AwaitingDownload` state if there weren't good data column subnet
+// peers to send the request to.
+BatchState::AwaitingDownload => return Ok(ProcessResult::Successful),
+BatchState::Failed | BatchState::Processing(_) => {
 // these are all inconsistent states:
 // - Failed -> non recoverable batch. Chain should have been removed
-// - AwaitingDownload -> A recoverable failed batch should have been
-// re-requested.
 // - Processing -> `self.current_processing_batch` is None
 self.fail_sync(BackFillError::InvalidSyncState(String::from(
 "Invalid expected batch state",
@@ -790,7 +791,8 @@ impl<T: BeaconChainTypes> BackFillSync<T> {
 }
 }
 BatchState::Downloading(..) => {}
-BatchState::Failed | BatchState::Poisoned | BatchState::AwaitingDownload => {
+BatchState::AwaitingDownload => return,
+BatchState::Failed | BatchState::Poisoned => {
 crit!("batch indicates inconsistent chain state while advancing chain")
 }
 BatchState::AwaitingProcessing(..) => {}