CI fix: add retries to eth1 sim tests (#4501)

## Issue Addressed

This PR attempts to workaround the recent frequent eth1 simulator failures caused by missing eth logs from Anvil. 

> FailedToInsertDeposit(NonConsecutive { log_index: 1, expected: 0 })

This usually occurs at the beginning of the tests, and it guarantees a timeout after a few hours if this log shows up, and this is currently causing our CIs to fail quite frequently. 

Example failure here: https://github.com/sigp/lighthouse/actions/runs/5525760195/jobs/10079736914

## Proposed Changes

The quick fix applied here adds a timeout to node startup and restarts the node again.

- Add a 60 seconds timeout to beacon node startup in eth1 simulator tests. It takes ~10 seconds on my machine, but could take longer on CI runners.
- Wrap the startup code in a retry function, that allows for 3 retries before returning an error.

## Additional Info

We should probably raise an issue under the Anvil GitHub repo there so this can be further investigated.
This commit is contained in:
Jimmy Chen
2023-07-17 00:14:18 +00:00
parent 68d5a6cf99
commit d4a61756ca
6 changed files with 183 additions and 70 deletions

View File

@@ -14,3 +14,4 @@ validator_client = { path = "../../validator_client" }
validator_dir = { path = "../../common/validator_dir", features = ["insecure_keys"] }
sensitive_url = { path = "../../common/sensitive_url" }
execution_layer = { path = "../../beacon_node/execution_layer" }
tokio = { version = "1.14.0", features = ["time"] }