Don't create child lookup if parent is faulty (#7118)

Issue discovered on PeerDAS devnet (node `lighthouse-geth-2.peerdas-devnet-5.ethpandaops.io`). Summary:

- A lookup is created for block root `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- That block or a parent is faulty and `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` is added to the failed chains cache
- We later receive a block that is a child of a child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`
- We create a lookup, which attempts to process the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12` and hit a processor error `UnknownParent`, hitting this line

bf955c7543/beacon_node/network/src/sync/block_lookups/mod.rs (L686-L688)

`search_parent_of_child` does not create a parent lookup because the parent root is in the failed chain cache. However, we have **already** marked the child as awaiting the parent. This results in an inconsistent state of lookup sync, as there's a lookup awaiting a parent that doesn't exist.

Now we have a lookup (the child of `0x28299de15843970c8ea4f95f11f07f75e76a690f9a8af31d354c38505eebbe12`) that is awaiting a parent lookup that doesn't exist: hence stuck.

### Impact

This bug can affect Mainnet as well as PeerDAS devnets.

This bug may stall lookup sync for a few minutes (up to `LOOKUP_MAX_DURATION_STUCK_SECS = 15 min`) until the stuck prune routine deletes it. By that time the root will be cleared from the failed chain cache and sync should succeed. During that time the user will see a lot of `WARN` logs when attempting to add each peer to the inconsistent lookup. We may also sync the block through range sync if we fall behind by more than 2 epochs. We may also create the parent lookup successfully after the failed cache clears and complete the child lookup.

This bug is triggered if:
- We have a lookup that fails and its root is added to the failed chain cache (much more likely to happen in PeerDAS networks)
- We receive a block that builds on a child of the block added to the failed chain cache


  Ensure that we never create (or leave existing) a lookup that references a non-existing parent.

I added `must_use` lints to the functions that create lookups. To fix the specific bug we must recursively drop the child lookup if the parent is not created. So if `search_parent_of_child` returns `false` now return `LookupRequestError::Failed` instead of `LookupResult::Pending`.

As a bonus I have a added more logging and reason strings to the errors
This commit is contained in:
Lion - dapplion
2025-06-05 02:53:43 -06:00
committed by GitHub
parent 9a4972053e
commit d457ceeaaf
4 changed files with 119 additions and 25 deletions

View File

@@ -1725,6 +1725,63 @@ fn test_parent_lookup_too_deep_grow_ancestor() {
rig.assert_failed_chain(chain_hash);
}
// Regression test for https://github.com/sigp/lighthouse/pull/7118
#[test]
fn test_child_lookup_not_created_for_failed_chain_parent_after_processing() {
// GIVEN: A parent chain longer than PARENT_DEPTH_TOLERANCE.
let mut rig = TestRig::test_setup();
let mut blocks = rig.rand_blockchain(PARENT_DEPTH_TOLERANCE + 1);
let peer_id = rig.new_connected_peer();
// The child of the trigger block to be used to extend the chain.
let trigger_block_child = blocks.pop().unwrap();
// The trigger block that starts the lookup.
let trigger_block = blocks.pop().unwrap();
let tip_root = trigger_block.canonical_root();
// Trigger the initial unknown parent block for the tip.
rig.trigger_unknown_parent_block(peer_id, trigger_block.clone());
// Simulate the lookup chain building up via `ParentUnknown` errors.
for block in blocks.into_iter().rev() {
let id = rig.expect_block_parent_request(block.canonical_root());
rig.parent_lookup_block_response(id, peer_id, Some(block.clone()));
rig.parent_lookup_block_response(id, peer_id, None);
rig.expect_block_process(ResponseType::Block);
rig.parent_block_processed(
tip_root,
BlockProcessingResult::Err(BlockError::ParentUnknown {
parent_root: block.parent_root(),
}),
);
}
// At this point, the chain should have been deemed too deep and pruned.
// The tip root should have been inserted into failed chains.
rig.assert_failed_chain(tip_root);
rig.expect_no_penalty_for(peer_id);
// WHEN: Trigger the extending block that points to the tip.
let trigger_block_child_root = trigger_block_child.canonical_root();
rig.trigger_unknown_block_from_attestation(trigger_block_child_root, peer_id);
let id = rig.expect_block_lookup_request(trigger_block_child_root);
rig.single_lookup_block_response(id, peer_id, Some(trigger_block_child.clone()));
rig.single_lookup_block_response(id, peer_id, None);
rig.expect_block_process(ResponseType::Block);
rig.single_block_component_processed(
id.lookup_id,
BlockProcessingResult::Err(BlockError::ParentUnknown {
parent_root: tip_root,
}),
);
// THEN: The extending block should not create a lookup because the tip was inserted into failed chains.
rig.expect_no_active_lookups();
// AND: The peer should be penalized for extending a failed chain.
rig.expect_single_penalty(peer_id, "failed_chain");
rig.expect_empty_network();
}
#[test]
fn test_parent_lookup_too_deep_grow_tip() {
let mut rig = TestRig::test_setup();