Make re-org strat more cautious and add more config (#4151)

## Proposed Changes

This change attempts to prevent failed re-orgs by:

1. Lowering the re-org cutoff from 2s to 1s. This is informed by a failed re-org attempted by @yorickdowne's node. The failed block was requested in the 1.5-2s window due to a Vouch failure, and failed to propagate to the majority of the network before the attestation deadline at 4s.
2. Allowing users to adjust their re-org cutoff depending on observed network conditions and their risk profile. The static 2 second cutoff was too rigid.
3. Adding a `--proposer-reorg-disallowed-offsets` flag which can be used to prohibit reorgs at certain slots. This is intended to help work around an issue whereby reorging blocks at slot 1 are currently taking ~1.6s to propagate on gossip rather than ~500ms. This is suspected to be due to a cache miss in current versions of Prysm, which should be fixed in their next release.
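
To make the timing argument in point 1 concrete, here is a minimal sketch of the cutoff check and the time left for signing and propagation. It is illustrative only and is not Lighthouse's implementation: the function names are hypothetical, and the strict `<` comparison is an assumption based on the "before T milliseconds" wording.

```python
# Illustrative sketch only; names are hypothetical, not Lighthouse's API.

ATTESTATION_DEADLINE_MS = 4000  # attestation deadline cited above (4s into the slot)


def within_reorg_cutoff(request_delay_ms: int, cutoff_ms: int) -> bool:
    """Return True if the block was requested early enough to attempt a re-org.

    Assumes a strict comparison: the request must arrive before the cutoff.
    """
    return request_delay_ms < cutoff_ms


def propagation_budget_ms(request_delay_ms: int) -> int:
    """Time left to sign and propagate the block before the attestation deadline."""
    return max(ATTESTATION_DEADLINE_MS - request_delay_ms, 0)


if __name__ == "__main__":
    # A block requested 1.5s into the slot passes the old 2s cutoff
    # but is rejected by the new 1s cutoff.
    print(within_reorg_cutoff(1500, 2000), within_reorg_cutoff(1500, 1000))
    # A request right at the new cutoff still leaves 3000ms before the deadline.
    print(propagation_budget_ms(1000))
```

Under these assumptions, the failed re-org described in point 1 (requested 1.5-2s into the slot) had only 2-2.5s to propagate, whereas any request accepted under the 1s cutoff has more than 3s.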

## Additional Info

I'm of two minds about removing the `shuffling_stable` check which checks for blocks at slot 0 in the epoch. If we removed it users would be able to configure Lighthouse to try reorging at slot 0, which likely wouldn't work very well due to interactions with the proposer index cache. I think we could leave it for now and revisit it later.
Michael Sproul
2023-04-13 07:05:01 +00:00
parent 00cf5fc184
commit b90c0c3fb1
12 changed files with 218 additions and 18 deletions


@@ -14,6 +14,15 @@ There are three flags which control the re-orging behaviour:
* `--proposer-reorg-threshold N`: attempt to orphan blocks with less than N% of the committee vote. If this parameter isn't set then N defaults to 20% when the feature is enabled.
* `--proposer-reorg-epochs-since-finalization N`: only attempt to re-org late blocks when the number of epochs since finalization is less than or equal to N. The default is 2 epochs,
meaning re-orgs will only be attempted when the chain is finalizing optimally.
* `--proposer-reorg-cutoff T`: only attempt to re-org late blocks when the proposal is being made
before T milliseconds into the slot. Delays between the validator client and the beacon node can
cause some blocks to be requested later than the start of the slot, which makes them more likely
to fail. The default cutoff is 1000ms on mainnet, which gives blocks 3000ms to be signed and
propagated before the attestation deadline at 4000ms.
* `--proposer-reorg-disallowed-offsets N1,N2,N3...`: prohibit Lighthouse from attempting to reorg at
specific offsets in each epoch. A disallowed offset `N` prevents reorging blocks from being
proposed at any `slot` such that `slot % SLOTS_PER_EPOCH == N`. The value of this flag is a
comma-separated list of integer offsets.
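
The offset rule above can be sketched as follows. This is a minimal illustration, not Lighthouse's actual code: the function names are hypothetical, `SLOTS_PER_EPOCH = 32` is the mainnet value, and rejecting offsets outside `0..SLOTS_PER_EPOCH` is an assumption about how invalid input might be handled.

```python
# Illustrative sketch only; names are hypothetical, not Lighthouse's API.
from __future__ import annotations

SLOTS_PER_EPOCH = 32  # mainnet value


def parse_disallowed_offsets(value: str) -> set[int]:
    """Parse a comma-separated list such as "1,2,3" into a set of offsets."""
    offsets = set()
    for part in value.split(","):
        offset = int(part.strip())
        # Assumption: an offset outside the epoch can never match, so reject it.
        if not 0 <= offset < SLOTS_PER_EPOCH:
            raise ValueError(f"offset {offset} is not in 0..{SLOTS_PER_EPOCH - 1}")
        offsets.add(offset)
    return offsets


def reorg_allowed_at_slot(slot: int, disallowed: set[int]) -> bool:
    """Return False when the slot's offset within its epoch is disallowed."""
    return slot % SLOTS_PER_EPOCH not in disallowed


if __name__ == "__main__":
    disallowed = parse_disallowed_offsets("1")
    # Slot 33 is at offset 1 of its epoch, so no re-org is attempted there.
    print(reorg_allowed_at_slot(33, disallowed))
    # Slot 34 is at offset 2, which is not in the disallowed set.
    print(reorg_allowed_at_slot(34, disallowed))
```

With this reading, passing `1` as the flag value would skip re-org attempts at the second slot of every epoch (slots 1, 33, 65, and so on).
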
All flags should be applied to `lighthouse bn`. The default configuration is recommended as it
balances the chance of the re-org succeeding against the chance of failure due to attestations