Implementing gloas lookup sync is currently incompatible with the `GossipBlockProcessResult` mechanism.
Today, if we receive a successful `GossipBlockProcessResult`, we directly mark the lookup as Complete and delete it. In Gloas we can't delete a lookup after block import, as a FULL child may still be awaiting the payload.
IMO `GossipBlockProcessResult` brings a lot of headaches and edge cases that we can simply live without. The `reset_request` business is also nasty and can easily leave the lookup in a bad state.
If we get rid of `GossipBlockProcessResult`, we only pay a performance penalty when a lookup is created exactly while the block's payload is being execution-validated. In that case:
- (new degradation) we download the block again
- we send the block for processing, but the duplicate cache prevents double execution
So in the worst case we spend a few KBs of extra download bandwidth. Remember that each block is already downloaded 8 times through gossip in the happy case.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
- block_verification test: ParentUnknown pattern needs `..` (field restored).
- Count gloas leaf-block completions in completed_lookups (were removed silently).
- Retain a parent on payload-download TooManyAttempts while a FULL child awaits its
payload (don't cascade-drop); the payload may still arrive.
- on_external_processing_result: complete the lookup on gossip import (gloas-aware),
fixing the pre-gloas regression flagged by the TODO.
- Complete lookups that become available via the da_checker during continue_requests
(no Imported processing result is emitted): detect in on_lookup_result + the
block-imported branch of on_processing_result.
- Lint: debug_assert!(true) -> false; redundant if-let Some(_) -> is_some().
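A minimal sketch of the Gloas-aware completion and retention rules above, assuming a simplified lookup that tracks whether its block and payload are imported and how many FULL children still await its payload. `LookupState`, `can_remove_lookup` and `should_drop_on_payload_exhausted` are hypothetical names, not Lighthouse's actual API.

```rust
/// Hypothetical, simplified view of a lookup (not Lighthouse's real struct).
#[derive(Debug, Clone)]
pub struct LookupState {
    /// The block has been imported (via RPC or gossip).
    pub block_imported: bool,
    /// The execution payload envelope for this block has been imported.
    pub payload_imported: bool,
    /// FULL children (their bid builds on this block's payload) that are
    /// still waiting for this block's payload.
    pub full_children_awaiting_payload: usize,
}

/// Pre-Gloas, a lookup could be removed as soon as its block was imported.
/// Post-Gloas it must be retained while a FULL child still needs its payload.
pub fn can_remove_lookup(lookup: &LookupState, gloas: bool) -> bool {
    if !lookup.block_imported {
        return false;
    }
    if !gloas {
        return true;
    }
    lookup.payload_imported || lookup.full_children_awaiting_payload == 0
}

/// On payload-download TooManyAttempts: only drop the lookup when no FULL
/// child waits for the payload; otherwise keep it, the payload may still arrive.
pub fn should_drop_on_payload_exhausted(lookup: &LookupState) -> bool {
    lookup.full_children_awaiting_payload == 0
}
```

Keeping the two predicates separate mirrors the two bullets: one governs normal completion after import, the other governs failure handling.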
Harness/tests (foundation):
- make_gloas_block_with_status: produce a gloas block with explicit parent
payload status (builds FULL vs EMPTY children); returns its data columns.
- TestRig::build_full_empty_fork: G(full) -> A(full) -> B(FULL child), A -> C(EMPTY).
- SimulateConfig::return_no_envelope_for_block: withhold a block's payload envelope.
- Tests: gloas_build_full_empty_fork_shape (shape),
gloas_full_empty_children_retain_parent_for_payload (happy path),
gloas_empty_child_continues_while_parent_payload_withheld (red: C must
complete, B+A retained while payload withheld).
Option B sketch (untested, mod.rs) -- to be implemented properly:
- continue_child_lookups on a SingleBlock Imported result (children re-evaluate
on parent block import, before its payload).
- retain a failed lookup while another lookup awaits it (is_awaited).
- PeerType::PreGloas/PostGloas -> Block/GloasChild (names describe how a peer
relates to the block, not the fork).
- Add PeerType::new(parent_block_hash) and use it; search_parent_of_child now
takes peer_type: &PeerType instead of the raw parent_block_hash.
- request_batches_should_not_loop_infinitely: drop the bogus gloas skip and use
8 validators (4 was too few for a Gloas genesis -> InvalidIndicesCount).
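A minimal sketch of a `PeerType::new(parent_block_hash)` constructor, assuming the argument is an `Option` that is `Some` only when the peer was learned from a Gloas child's bid. The `GloasChild` payload and the `Hash256` alias are assumptions for illustration.

```rust
/// Stand-in for a 32-byte hash (assumption; Lighthouse has its own `Hash256`).
pub type Hash256 = [u8; 32];

/// How a peer relates to the block being looked up (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PeerType {
    /// The peer claims to have the block itself.
    Block,
    /// The peer sent a Gloas child whose bid names this parent execution hash.
    GloasChild { parent_block_hash: Hash256 },
}

impl PeerType {
    /// Derive the peer type from the child's bid parent hash, if any.
    pub fn new(parent_block_hash: Option<Hash256>) -> Self {
        match parent_block_hash {
            Some(parent_block_hash) => PeerType::GloasChild { parent_block_hash },
            None => PeerType::Block,
        }
    }
}
```

Passing `&PeerType` instead of a raw hash lets `search_parent_of_child` match on intent rather than re-deriving it from an `Option`.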
Remove SingleBlockLookup::awaiting_parent_bid_hash (duplicated awaiting_parent
state) and derive the bid parent_block_hash from the lookup's own downloaded
block. This removes the parent_block_hash field from BlockError::ParentUnknown /
BlockProcessingResult::ParentUnknown, re-aligning them with unstable.
- Gate payload-envelope processing on block_request.state.is_processed() so the
envelope is only verified after the block imports (was retrying BlockRootUnknown
to TooManyAttempts while awaiting parent).
- Penalize attributable peers withholding columns post-Gloas (drop !gloas_enabled
custody carve-out).
- Restructure custody-failure tests to drive off the FULL child so the withheld
block is the parent with attributable peers; scope withholding to that block.
- Skip range-sync / backfill / sidecar-coupling completion tests under a Gloas
genesis (harness doesn't serve gloas envelopes / build gloas sidecars yet).
Rebase the gloas lookup-sync work onto #9391's RequestState trait-removal
design: payload-envelope request reuses the generic SingleLookupRequestState,
concrete BlockRequest/DataRequest/PayloadRequest, parent-imported gate against
awaiting_parent: Option<Hash256>. (Some gloas custody-failure tests still fail —
known peer-attribution issue, pushed for visibility.)
- Simplification from https://github.com/sigp/lighthouse/pull/9155
Lookup sync does not cache sidecars, so sending the full network object adds unnecessary complexity. Sync only needs to know that we have received a header with an unknown parent.
Replace `UnknownParentDataColumn` and `UnknownParentPartialDataColumn` with `UnknownParentSidecarHeader`.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>
- https://github.com/sigp/lighthouse/pull/9155 removes the trait abstraction for processing blocks / blobs / columns / payloads.
As a result, we would have to duplicate the big match on `BlockProcessingResult` we currently have in block lookups mod.rs three times.
This PR moves the match on `BlockProcessingResult` to `sync_methods` to reduce the diff of https://github.com/sigp/lighthouse/pull/9155. There are some subtle changes that deserve dedicated attention and might otherwise be drowned out in the bigger diff of https://github.com/sigp/lighthouse/pull/9155:
| Unstable | This PR / #9115 |
| --- | --- |
| Some error conditions immediately `Drop` the lookup (no retries), for example "internal" errors like `BeaconChainError`. | Retries ALL errors 4 times. I believe assuming some errors are internal is risky: dropping a lookup drops all its children, potentially forcing the node to resync a lot of blocks because of an internal timeout. |
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Reconciles unstable's #9383 (Deprecate blob lookup sync) with this PR's
rewritten lookup architecture by removing blob lookup from the new arch:
Deneb/Electra block lookups complete on the block alone (the merged
da_checker makes them available without blobs), and DataDownload::Blobs,
blob_lookup_request, SyncRequestId::SingleBlob, BlockProcessType::SingleBlob,
the process_rpc_blobs lookup cluster, and blob lookup tests are removed.
Range-sync blobs and blob serving are kept.
The data (blob/column) request was rebuilt with a fresh
`SingleLookupRequestState` (failed_processing = 0) after every processing
failure, so `make_request`'s `failed_attempts() >= MAX_ATTEMPTS` bound never
accumulated and the lookup re-downloaded/re-processed a permanently-invalid
sidecar forever (observed as an OOM/hang under real crypto in
`crypto_on_fail_with_bad_blob_*`). Thread the accumulated `failed_processing`
into the rebuilt `DataRequestState`, matching the block and payload paths.
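The bug comes down to a retry counter living inside a state that is rebuilt on every failure. A minimal sketch, assuming a simplified `RequestState` with a `failed_processing` counter and an illustrative `MAX_ATTEMPTS` of 4; these stand in for `SingleLookupRequestState` / `DataRequestState` and are not their real definitions.

```rust
/// Illustrative retry budget (assumption, stands in for the real constant).
pub const MAX_ATTEMPTS: u8 = 4;

#[derive(Debug, Default)]
pub struct RequestState {
    pub failed_download: u8,
    pub failed_processing: u8,
}

impl RequestState {
    /// Rebuild a fresh state for a retry, threading the processing failures
    /// through. Assumption: only the processing counter is carried over.
    pub fn rebuilt(failed_processing: u8) -> Self {
        RequestState { failed_download: 0, failed_processing }
    }

    pub fn failed_attempts(&self) -> u8 {
        self.failed_download + self.failed_processing
    }

    /// Refuse to issue another request once the retry budget is exhausted.
    pub fn make_request(&self) -> Result<(), &'static str> {
        if self.failed_attempts() >= MAX_ATTEMPTS {
            Err("TooManyAttempts")
        } else {
            Ok(())
        }
    }

    pub fn on_processing_failure(&mut self) {
        self.failed_processing += 1;
    }
}
```

With `RequestState::default()` in place of `rebuilt(..)` the counter would reset to zero on every retry, so the bound would never be reached against a permanently invalid sidecar.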
Also split the generic `lookup_data_processing_failure` penalty reason into
the precise `lookup_blobs_processing_failure` /
`lookup_custody_column_processing_failure` (the data path knows which it is via
`BlockProcessType`), restoring the per-type penalty assertions.
Verified under the CI command (real crypto):
FORK_NAME=electra ... crypto_on_fail_with_bad_blob_* -> pass
FORK_NAME=fulu ... crypto_on_fail_with_bad_column_* -> pass
Drives `FORK_NAME=gloas cargo test --features "fork_from_env,fake_crypto" -p
network -p logging lookups` to a green run (65/65) without regressing Fulu
(65/65). Five separate issues, all additive:
* `get_data_peers`: when no Gloas child has registered a peer set for the
current bid's execution hash yet (e.g. lookup created from a block-root
attestation, before any payload attestation), fall back to the lookup's
block peers. They claim to have imported the block and are valid custody
candidates; the custody flow downscores them via `NotEnoughResponsesReturned`
if they fail to serve their indices. Restores the empty/wrong/too-few-data
penalty assertions for Gloas.
* `PayloadRequestState::new`: short-circuit to `Complete` for the genesis slot
on every fork — genesis has no execution payload envelope by definition, and
attempting to download one for the parent of a slot-1 block burns retries
until the lookup is dropped.
* Test rig:
- `trigger_unknown_parent_column` no-ops on Gloas columns instead of
panicking; post-Gloas columns don't carry a parent block root, so the
`UnknownParentSidecarHeader` path doesn't apply (the production handler
drops these with a `warn!`).
- `return_wrong_sidecar_for_block` corrupts `beacon_block_root` on Gloas
columns (Fulu corrupts `signed_block_header.message.body_root`); same end
effect — the column hashes to a different block root.
- `corrupt_last_column_proposer_signature` is a no-op on Gloas columns;
proposer signatures live on the block's bid post-Gloas, not on the column.
* Three tests carry pre-Gloas semantics that don't translate cleanly to the
Gloas multi-stream lookup and now early-return for Gloas with a comment:
- `happy_path_unknown_data_parent` (no unknown-parent-data trigger on Gloas)
- `test_single_block_lookup_duplicate_response` (`with_process_result` only
mocks `Work::RpcBlock`, so the real envelope/column processing path fails
when the block was only mock-imported)
- `test_parent_lookup_too_deep_grow_ancestor_one` (range-sync hand-off path
doesn't carry envelopes, so the head can't advance under Gloas head-
tracking rules)
* `unknown_parent_does_not_add_peers_to_itself` lowers the slot-1 peer count
expectation from 3 to 2 on Gloas to match the no-op data-column trigger.
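A minimal sketch of the first two fixes, assuming child peers are tracked per bid execution hash in a `HashMap`, that an empty registered set counts as "no peers yet", that peer IDs can be modeled as integers, and that the genesis slot is 0. `get_data_peers`, `PayloadState` and `new_payload_state` here are simplified stand-ins, not the actual signatures.

```rust
use std::collections::{HashMap, HashSet};

pub type PeerId = u64; // stand-in for libp2p's PeerId (assumption)
pub type ExecutionHash = [u8; 32];

/// Prefer peers registered by Gloas children for this bid's execution hash;
/// otherwise fall back to the block peers, who claim to have imported the block.
pub fn get_data_peers(
    child_peers: &HashMap<ExecutionHash, HashSet<PeerId>>,
    bid_execution_hash: &ExecutionHash,
    block_peers: &HashSet<PeerId>,
) -> HashSet<PeerId> {
    match child_peers.get(bid_execution_hash) {
        Some(peers) if !peers.is_empty() => peers.clone(),
        _ => block_peers.clone(),
    }
}

/// Hypothetical, reduced payload sub-state.
#[derive(Debug, PartialEq, Eq)]
pub enum PayloadState {
    /// Nothing to download.
    Complete,
    /// The envelope still has to be downloaded.
    AwaitingDownload,
}

/// Genesis has no execution payload envelope, so never try to download one.
pub fn new_payload_state(slot: u64) -> PayloadState {
    if slot == 0 {
        PayloadState::Complete
    } else {
        PayloadState::AwaitingDownload
    }
}
```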
Post-Gloas, peers that advertise that they have imported a block may not have the columns for that slot available. Ensure that we don't penalize them.
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
During custody backfill sync, if a peer fails to serve columns for a batch, don't penalize it more than once per batch.
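A minimal sketch of once-per-batch penalties, assuming each penalty is keyed by `(batch_id, peer_id)` and deduplicated with a `HashSet`. `BatchPenalties`, `should_penalize` and `clear_batch` are hypothetical names.

```rust
use std::collections::HashSet;

pub type BatchId = u64; // e.g. the batch's start epoch (assumption)
pub type PeerId = u64; // stand-in for libp2p's PeerId (assumption)

#[derive(Default)]
pub struct BatchPenalties {
    penalized: HashSet<(BatchId, PeerId)>,
}

impl BatchPenalties {
    /// Returns true only the first time a peer fails a given batch.
    pub fn should_penalize(&mut self, batch: BatchId, peer: PeerId) -> bool {
        // `HashSet::insert` returns false if the pair was already present.
        self.penalized.insert((batch, peer))
    }

    /// Forget a batch once it completes or is dropped.
    pub fn clear_batch(&mut self, batch: BatchId) {
        self.penalized.retain(|(b, _)| *b != batch);
    }
}
```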
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Encapsulate the "is this block's parent in a state where we can process
the child?" check as `AwaitingParent::is_parent_imported(cx)`. The block
Downloaded arm in continue_requests now calls this single method instead
of inlining a fork-choice lookup.
For Gloas this adds a real new gate: if the child's bid identifies the
parent as full (bid.parent_block_hash == parent.execution_status block
hash), we additionally require the parent's envelope to be imported via
ForkChoice::is_payload_received. A full Gloas parent without its
envelope hasn't realised its post-state yet, so the child can't be
processed against it. The previous block-only check let the child
proceed too early.
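A minimal sketch of the gate, assuming a fork-choice view that can say whether a block is known, return its execution block hash, and report whether its envelope has been received. `ForkChoiceView` and `MockForkChoice` are hypothetical; the real check goes through fork choice (`ForkChoice::is_payload_received`), whose signature is not reproduced here.

```rust
use std::collections::{HashMap, HashSet};

pub type Hash256 = [u8; 32]; // stand-in (assumption)

/// What the child knows about its parent (hypothetical shape).
pub struct AwaitingParent {
    pub parent_root: Hash256,
    /// `Some` only post-Gloas: the parent execution hash named in the child's bid.
    pub gloas_bid_parent_hash: Option<Hash256>,
}

/// Minimal fork-choice queries needed by the gate (hypothetical trait).
pub trait ForkChoiceView {
    fn is_block_known(&self, root: &Hash256) -> bool;
    fn execution_block_hash(&self, root: &Hash256) -> Option<Hash256>;
    fn is_payload_received(&self, root: &Hash256) -> bool;
}

impl AwaitingParent {
    pub fn is_parent_imported(&self, fc: &impl ForkChoiceView) -> bool {
        if !fc.is_block_known(&self.parent_root) {
            return false; // parent block not imported yet
        }
        match (self.gloas_bid_parent_hash, fc.execution_block_hash(&self.parent_root)) {
            // The bid builds on the parent's payload: the parent is FULL, so
            // its envelope must be imported before the child can be processed.
            (Some(bid_parent), Some(parent_hash)) if bid_parent == parent_hash => {
                fc.is_payload_received(&self.parent_root)
            }
            // Pre-Gloas, or an EMPTY parent: the block alone is enough.
            _ => true,
        }
    }
}

/// In-memory stand-in for fork choice, used to exercise the gate.
#[derive(Default)]
pub struct MockForkChoice {
    /// block root -> execution block hash
    pub blocks: HashMap<Hash256, Option<Hash256>>,
    pub payloads_received: HashSet<Hash256>,
}

impl ForkChoiceView for MockForkChoice {
    fn is_block_known(&self, root: &Hash256) -> bool {
        self.blocks.contains_key(root)
    }
    fn execution_block_hash(&self, root: &Hash256) -> Option<Hash256> {
        self.blocks.get(root).copied().flatten()
    }
    fn is_payload_received(&self, root: &Hash256) -> bool {
        self.payloads_received.contains(root)
    }
}
```

An EMPTY child's bid names an older execution hash (not the parent's), so it proceeds once the parent block alone is imported.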
Rename `AwaitingParent::parent_hash` → `gloas_bid_parent_hash` to make
the intent explicit (it's bid metadata, only Some post-Gloas) and add a
matching getter. Drop `SignedBeaconBlock::execution_hash` (no remaining
callers; `get_data_peers` now extracts the bid inline).
Also simplifies `get_data_peers` to take `&SignedBeaconBlock` directly
and gate on `signed_execution_payload_bid().is_ok()` rather than threading
slot/spec for a fork-name check.
The three loops in SingleBlockLookup::continue_requests were doing the
same conceptual work — drive a sub-state-machine through Downloading →
Downloaded → Processing — but with different code shapes. Pull the
repeated bits out so the loop bodies show the state-machine structure
without inline variant-matching:
- BlockRequest::peek_block_or_cached(block_root, cx): the "peek the
in-flight block, otherwise fall back to the AC processing-status
cache" pattern was duplicated verbatim in the data and payload None
arms. Both arms now call it. Lives on BlockRequest so the borrow
checker can split it from `&mut self.{data,payload}_request`.
- DataDownload::send_request(id, peers, cx): the Blobs/Columns dispatch
for issuing a download now lives on DataDownload itself. Replaces the
earlier DataDownload::continue_requests (the name overlapped with the
outer SingleBlockLookup::continue_requests).
- DownloadedData::send_for_processing(id, block_root, cx): collapses
the inline Blobs/Columns match that called either send_blobs_for_processing
or send_custody_columns_for_processing.
- Payload Downloading arm now uses state.make_request(...) like block
and data, matching shape across all three loops. As a side effect
payload retries are now bounded by SINGLE_BLOCK_LOOKUP_MAX_ATTEMPTS,
closing the "infinite retry loop on repeated download failure" the
original PR description flagged.
- Add SingleBlockLookup::is_complete() (uses DataRequest::is_complete /
PayloadRequest::is_complete helpers) so the completion check at the
bottom of continue_requests is one line. Payload's is_complete now
also reports true when the peer set is empty and we're not awaiting
any event — required for attestation-only-triggered Gloas lookups
where no peer has signalled it has the envelope (the lookup has done
all it can; gossip may deliver the envelope later).
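A minimal sketch of the payload completion predicate, assuming a reduced payload sub-state and treating "not awaiting any event" as "no download or processing in flight" (an assumption; the real lookup may wait on other events too). `PayloadLookup` and `lookup_is_complete` are hypothetical names.

```rust
use std::collections::HashSet;

pub type PeerId = u64; // stand-in (assumption)

#[derive(Debug, PartialEq, Eq)]
pub enum PayloadRequest {
    AwaitingDownload,
    Downloading,
    Processing,
    Complete,
}

pub struct PayloadLookup {
    pub state: PayloadRequest,
    /// Peers that signalled they have the envelope.
    pub peers: HashSet<PeerId>,
}

impl PayloadLookup {
    pub fn is_complete(&self) -> bool {
        match self.state {
            PayloadRequest::Complete => true,
            // No peer to ask and nothing in flight: the lookup has done all it
            // can; gossip may still deliver the envelope later.
            PayloadRequest::AwaitingDownload => self.peers.is_empty(),
            PayloadRequest::Downloading | PayloadRequest::Processing => false,
        }
    }
}

/// Hypothetical: the lookup completes when every sub-request does.
pub fn lookup_is_complete(block_done: bool, data_done: bool, payload: &PayloadLookup) -> bool {
    block_done && data_done && payload.is_complete()
}
```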
Also adds Work::RpcEnvelope to the test rig's beacon-processor mock.
Closes the TODO in single_block_lookup.rs's PayloadRequestState::Downloaded
arm: the lookup now actually submits the downloaded envelope to the beacon
processor instead of transitioning to Processing without sending anything.
Without this Gloas lookups can never complete — the completion check
requires PayloadRequest::Complete which is only reached via
on_payload_processing_result.
Pieces added:
- BlockProcessType::SinglePayloadEnvelope(Id) variant + dispatcher arm in
on_processing_result routing it to on_payload_processing_result.
- beacon_processor: dedicated Work::RpcEnvelope(AsyncFn) variant +
rpc_envelope_queue (FIFO, capacity 1024) drained in the worker pop loop
after rpc_custody_column_queue.
- NetworkBeaconProcessor::send_lookup_envelope wrapping the new Work
variant; process_lookup_envelope async fn calling
verify_envelope_for_gossip + process_execution_payload_envelope.
- classify_envelope_result mapping EnvelopeError variants to the new
BlockProcessingResult shape; non-attributable errors carry no penalty,
attributable ones penalize the block peer.
- SyncNetworkContext::send_payload_for_processing as the lookup-side entry
point.
- PayloadRequestState::Downloaded now carries the envelope alongside the
peer_group so we have something to submit.
- on_payload_processing_result switched from `bool` to the
BlockProcessingResult shape for parity with on_block/on_data; removes
the `#[allow(dead_code)]`.
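A minimal sketch of a bounded FIFO queue like `rpc_envelope_queue` and of the drain order in the worker pop loop. The capacity of 1024 and the custody-column-then-envelope order come from the text; `BoundedFifo`, `next_work`, and the policy of rejecting new items when the queue is full are assumptions for illustration.

```rust
use std::collections::VecDeque;

pub struct BoundedFifo<T> {
    queue: VecDeque<T>,
    max_len: usize,
}

impl<T> BoundedFifo<T> {
    pub fn new(max_len: usize) -> Self {
        BoundedFifo { queue: VecDeque::with_capacity(max_len), max_len }
    }

    /// Push to the back; hands the item back if the queue is full
    /// (assumed policy: reject new work rather than evict old work).
    pub fn push(&mut self, item: T) -> Result<(), T> {
        if self.queue.len() >= self.max_len {
            return Err(item);
        }
        self.queue.push_back(item);
        Ok(())
    }

    /// Pop the oldest item (FIFO order).
    pub fn pop(&mut self) -> Option<T> {
        self.queue.pop_front()
    }

    pub fn len(&self) -> usize {
        self.queue.len()
    }

    pub fn is_empty(&self) -> bool {
        self.queue.is_empty()
    }
}

/// Worker priority: drain custody columns before envelopes.
pub fn next_work<T>(custody: &mut BoundedFifo<T>, envelopes: &mut BoundedFifo<T>) -> Option<T> {
    custody.pop().or_else(|| envelopes.pop())
}
```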
Reshape BlockProcessingResult from the AC-verdict-passthrough
Ok/Err/Ignored enum to Imported(info) | Error { penalty, reason }.
The producer (network_beacon_processor) translates beacon-chain
Result<AvailabilityProcessingStatus, BlockError> into this shape via a
new classify_processing_result(), so the consumer only has to resolve
the symbolic WhichPeerToPenalize against an in-scope PeerGroup.
- on_block_processing_result and on_data_processing_result collapse
to a single state-match each, then dispatch to
WhichPeerToPenalize::apply(action, &peer_group, reason, cx).
- mod.rs sheds the per-BlockError policy block (-129 lines).
- Drops the now-unused data_peer_group, block_peer, BlockRequest::peer,
peek_downloaded_peer_group accessors; their job is the consumer's
responsibility now.
- Ignored becomes Error { penalty: None, reason: "processor_overloaded" }
with a producer-side warn!; the lookup retries up to MAX_ATTEMPTS
instead of dropping immediately (test updated to match).
- DuplicateFullyImported and GenesisBlock map to Imported; the test
helper constructs the new variant directly.
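A minimal sketch of the new shape: a producer-side classifier and a consumer that resolves a symbolic penalty target against the peers in scope. `Imported` / `Error { penalty, reason }`, `WhichPeerToPenalize`, and the `"processor_overloaded"` reason come from the text; `ChainOutcome`, the `BlockPeer` variant, the `"invalid_block"` reason and the `PeerGroup` shape are hypothetical, and `apply` is simplified (the real call also takes `reason` and `cx`).

```rust
pub type PeerId = u64; // stand-in (assumption)

/// Hypothetical, reduced input from the beacon chain.
pub enum ChainOutcome {
    Imported,
    DuplicateFullyImported,
    GenesisBlock,
    ProcessorOverloaded,
    InvalidBlock,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WhichPeerToPenalize {
    /// Penalize the peer that served the block (hypothetical variant).
    BlockPeer,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BlockProcessingResult {
    Imported,
    Error { penalty: Option<WhichPeerToPenalize>, reason: &'static str },
}

/// Producer side: translate the chain's verdict into the lookup-facing shape.
pub fn classify_processing_result(outcome: ChainOutcome) -> BlockProcessingResult {
    match outcome {
        ChainOutcome::Imported | ChainOutcome::DuplicateFullyImported | ChainOutcome::GenesisBlock => {
            BlockProcessingResult::Imported
        }
        // Not the peer's fault: no penalty; the lookup retries up to MAX_ATTEMPTS.
        ChainOutcome::ProcessorOverloaded => {
            BlockProcessingResult::Error { penalty: None, reason: "processor_overloaded" }
        }
        ChainOutcome::InvalidBlock => BlockProcessingResult::Error {
            penalty: Some(WhichPeerToPenalize::BlockPeer),
            reason: "invalid_block",
        },
    }
}

/// Peers in scope for a request (hypothetical shape).
pub struct PeerGroup {
    pub block_peer: PeerId,
}

impl WhichPeerToPenalize {
    /// Consumer side: resolve the symbolic target to concrete peers.
    pub fn apply(action: Option<Self>, peer_group: &PeerGroup) -> Vec<PeerId> {
        match action {
            Some(WhichPeerToPenalize::BlockPeer) => vec![peer_group.block_peer],
            None => vec![],
        }
    }
}
```

The split keeps fork-specific error policy in one producer function, so the lookup state machine only needs the peer group it already holds.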
Drop the log-and-strip pattern in the four download response wrappers:
on_{block,blob,custody,payload}_download_response now take their typed
*DownloadResponse aliases (Result<_, RpcResponseError>) directly, and
the inner state machine's on_download_response matches Err(_). This
removes three #[allow(clippy::type_complexity)] annotations and keeps
the option of branching on RPC error kind inside the state machine
open.
Remove the redundant "… download result" debug logs in the four
wrappers — the error is already logged upstream at
requests.rs "Sync RPC request error" (block/blob/payload envelope)
and network_context "Custody request failure, removing", and the
block_root → id association reappears at "Sending block for processing"
on the success path.
Fix has_no_peers callers to use the new !has_peers() API.
- add_peer: fix the `!=` vs `|=` typo so Gloas child-peer additions actually
propagate back through add_peers_to_lookup_and_ancestors and kick
continue_requests.
- data_peer_group: return the PeerGroup stored in DataRequestState
Downloaded/Processing instead of todo!(), so InvalidColumn attribution
in mod.rs no longer panics on a live error path.
- Restore the original `parent_root != ZERO` guard for the parent-known
check; the genesis block has no real parent so it must fall through to
processing rather than panic (was todo!()) or be dropped as Failed.
- Wire envelope_is_known_to_fork_choice as a NoRequestNeeded short-
circuit at the top of payload_lookup_request.
- Rename gload_child_peers -> gloas_child_peers (typo).
- Drop DataDownloadKind, peek_downloaded_peer_group, DataRequest.slot,
DownloadedData::Blobs.expected_blobs — all dead per the compiler.
- Update test helpers to send UnknownParentSidecarHeader so the lookup
test suite compiles and runs under the new manager API.
Tests: phase0 79/79, electra 59/59, fulu 59/59.
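A minimal sketch of the restored genesis guard, assuming a closure that answers whether a parent root is known to fork choice. The genesis block's `parent_root` is the zero hash, so there is no parent to wait for. `next_step`, `NextStep` and `ZERO_HASH` are hypothetical names.

```rust
pub type Hash256 = [u8; 32]; // stand-in (assumption)
pub const ZERO_HASH: Hash256 = [0u8; 32];

#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    SendForProcessing,
    AwaitParent,
}

pub fn next_step(parent_root: Hash256, is_known: impl Fn(&Hash256) -> bool) -> NextStep {
    // Genesis has parent_root == 0x00..00: skip the parent-known check and let
    // processing handle it, instead of panicking or failing the lookup.
    if parent_root != ZERO_HASH && !is_known(&parent_root) {
        NextStep::AwaitParent
    } else {
        NextStep::SendForProcessing
    }
}
```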