mirror of
https://github.com/sigp/lighthouse.git
Fix regression in DB write atomicity (#3931)
## Issue Addressed

Fix a bug introduced by #3696. The bug is not expected to occur frequently, so releasing this PR is non-urgent.

## Proposed Changes

* Add a variant to `StoreOp` that allows a raw KV operation to be passed around.
* Return to using `self.store.do_atomically` rather than `self.store.hot_db.do_atomically`. This streamlines the write back into a single call and makes our auto-revert work again.
* Prevent `import_block_update_shuffling_cache` from failing block import. This is an outstanding bug from before v3.4.0 which may have contributed to some random unexplained database corruption.

## Additional Info

In #3696 I split the database write into two calls, one to convert the `StoreOp`s to `KeyValueStoreOp`s and one to write them. This had the unfortunate side-effect of damaging our atomicity guarantees in case of a write error. If the first call failed, we would be left with the block in fork choice but not on-disk (or the snapshot cache), which would prevent us from processing any descendant blocks. On `unstable` the first call is very unlikely to fail unless the disk is full, but on `tree-states` the conversion is more involved and a user reported database corruption after it failed in a way that should have been recoverable.

Additionally, as @emhane observed, #3696 also inadvertently removed the import of the new block into the block cache. Although this seems like it could have negatively impacted performance, there are several mitigating factors:

- For regular block processing we should almost always load the parent block (and state) from the snapshot cache.
- We often load blinded blocks, which bypass the block cache anyway.
- Metrics show no noticeable increase in the block cache miss rate with v3.4.0.

However, I expect the block cache _will_ be useful again in `tree-states`, so it is restored to use by this PR.
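The idea of converting and writing in a single call can be illustrated with a minimal, self-contained sketch. This is not Lighthouse's actual code: the `Store` struct, its `HashMap`-backed `hot_db` and `block_cache`, the `convert` helper, the string keys, and the "empty block fails to encode" rule are all hypothetical simplifications. Only the names `StoreOp`, `KeyValueStoreOp`, `KeyValueOp`, and `do_atomically` are taken from this PR, and their real signatures differ.

```rust
use std::collections::HashMap;

/// Low-level key-value operation (simplified, hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
enum KeyValueStoreOp {
    PutKeyValue(String, Vec<u8>),
    DeleteKey(String),
}

/// High-level store operation (simplified, hypothetical shape).
#[derive(Debug, Clone)]
enum StoreOp {
    PutBlock { root: String, bytes: Vec<u8> },
    /// Carries an already-built raw KV operation through unchanged.
    KeyValueOp(KeyValueStoreOp),
}

/// Toy store: two in-memory maps stand in for the hot database and block cache.
#[derive(Default)]
struct Store {
    hot_db: HashMap<String, Vec<u8>>,
    block_cache: HashMap<String, Vec<u8>>,
}

impl Store {
    /// Convert high-level ops to raw KV ops. This step is fallible.
    fn convert(&self, ops: &[StoreOp]) -> Result<Vec<KeyValueStoreOp>, String> {
        ops.iter()
            .map(|op| match op {
                StoreOp::PutBlock { root, bytes } => {
                    // Assumption for illustration: an empty block cannot be encoded.
                    if bytes.is_empty() {
                        return Err(format!("cannot encode block {}", root));
                    }
                    Ok(KeyValueStoreOp::PutKeyValue(
                        format!("blk:{}", root),
                        bytes.clone(),
                    ))
                }
                StoreOp::KeyValueOp(kv) => Ok(kv.clone()),
            })
            .collect()
    }

    /// Convert and write in one call. If conversion fails, nothing is written,
    /// so the caller sees a single error and can revert its own state.
    fn do_atomically(&mut self, ops: Vec<StoreOp>) -> Result<(), String> {
        let kv_ops = self.convert(&ops)?;
        for op in kv_ops {
            match op {
                KeyValueStoreOp::PutKeyValue(key, value) => {
                    self.hot_db.insert(key, value);
                }
                KeyValueStoreOp::DeleteKey(key) => {
                    self.hot_db.remove(&key);
                }
            }
        }
        // Only after the write has gone through is the block cache updated.
        for op in ops {
            if let StoreOp::PutBlock { root, bytes } = op {
                self.block_cache.insert(root, bytes);
            }
        }
        Ok(())
    }
}
```

In this sketch a batch that mixes a raw `KeyValueOp` with a failing `PutBlock` leaves `hot_db` untouched, which is the property the two-call split in #3696 weakened.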
@@ -4,7 +4,7 @@ use ssz::{Decode, Encode};
 use std::collections::HashMap;
 use std::convert::TryInto;
 use std::marker::PhantomData;
-use store::{DBColumn, Error as StoreError, KeyValueStore, KeyValueStoreOp, StoreItem};
+use store::{DBColumn, Error as StoreError, StoreItem, StoreOp};
 use types::{BeaconState, Hash256, PublicKey, PublicKeyBytes};
 
 /// Provides a mapping of `validator_index -> validator_publickey`.
@@ -38,7 +38,7 @@ impl<T: BeaconChainTypes> ValidatorPubkeyCache<T> {
         };
 
         let store_ops = cache.import_new_pubkeys(state)?;
-        store.hot_db.do_atomically(store_ops)?;
+        store.do_atomically(store_ops)?;
 
         Ok(cache)
     }
@@ -79,7 +79,7 @@ impl<T: BeaconChainTypes> ValidatorPubkeyCache<T> {
     pub fn import_new_pubkeys(
         &mut self,
         state: &BeaconState<T::EthSpec>,
-    ) -> Result<Vec<KeyValueStoreOp>, BeaconChainError> {
+    ) -> Result<Vec<StoreOp<'static, T::EthSpec>>, BeaconChainError> {
         if state.validators().len() > self.pubkeys.len() {
             self.import(
                 state.validators()[self.pubkeys.len()..]
@@ -92,7 +92,10 @@ impl<T: BeaconChainTypes> ValidatorPubkeyCache<T> {
     }
 
     /// Adds zero or more validators to `self`.
-    fn import<I>(&mut self, validator_keys: I) -> Result<Vec<KeyValueStoreOp>, BeaconChainError>
+    fn import<I>(
+        &mut self,
+        validator_keys: I,
+    ) -> Result<Vec<StoreOp<'static, T::EthSpec>>, BeaconChainError>
     where
         I: Iterator<Item = PublicKeyBytes> + ExactSizeIterator,
     {
@@ -112,7 +115,9 @@ impl<T: BeaconChainTypes> ValidatorPubkeyCache<T> {
             // It will be committed atomically when the block that introduced it is written to disk.
             // Notably it is NOT written while the write lock on the cache is held.
             // See: https://github.com/sigp/lighthouse/issues/2327
-            store_ops.push(DatabasePubkey(pubkey).as_kv_store_op(DatabasePubkey::key_for_index(i)));
+            store_ops.push(StoreOp::KeyValueOp(
+                DatabasePubkey(pubkey).as_kv_store_op(DatabasePubkey::key_for_index(i)),
+            ));
 
             self.pubkeys.push(
                 (&pubkey)
@@ -294,7 +299,7 @@ mod test {
         let ops = cache
             .import_new_pubkeys(&state)
             .expect("should import pubkeys");
-        store.hot_db.do_atomically(ops).unwrap();
+        store.do_atomically(ops).unwrap();
         check_cache_get(&cache, &keypairs[..]);
         drop(cache);
 