Fix I/O atomicity issues with checkpoint sync (#2671)

## Issue Addressed

This PR addresses an issue found by @YorickDowne during testing of v2.0.0-rc.0.

Due to a lack of atomic database writes on checkpoint sync start-up, it was possible for the database to get into an inconsistent state from which it couldn't recover without `--purge-db`. The core of the issue was that the store's anchor info was being stored _before_ the `PersistedBeaconChain`. If a crash occured so that anchor info was stored but _not_ the `PersistedBeaconChain`, then on restart Lighthouse would think the database was unitialized and attempt to compare-and-swap a `None` value, but would actually find the stale info from the previous run.

## Proposed Changes

The issue is fixed by writing the anchor info, the split point, and the `PersistedBeaconChain` atomically on start-up. Some type-hinting ugliness was required, which could possibly be cleaned up in future refactors.
This commit is contained in:
Michael Sproul
2021-10-05 03:53:17 +00:00
parent 28b79084cd
commit ed1fc7cca6
7 changed files with 95 additions and 36 deletions

View File

@@ -189,7 +189,6 @@ impl<T: EthSpec> PartialBeaconState<T> {
}
/// Prepare the partial state for storage in the KV database.
#[must_use]
pub fn as_kv_store_op(&self, state_root: Hash256) -> KeyValueStoreOp {
let db_key = get_key_for_col(DBColumn::BeaconState.into(), state_root.as_bytes());
KeyValueStoreOp::PutKeyValue(db_key, self.as_ssz_bytes())