Fix head tracker concurrency bugs (#1771)

## Issue Addressed Closes #1557 ## Proposed Changes Modify the pruning algorithm so that it mutates the head-tracker _before_ committing the database transaction to disk, and _only if_ all the heads to be removed are still present in the head-tracker (i.e. no concurrent mutations). In the process of writing and testing this I also had to make a few other changes: * Use internal mutability for all `BeaconChainHarness` functions (namely the RNG and the graffiti), in order to enable parallel calls (see testing section below). * Disable logging in harness tests unless the `test_logger` feature is turned on And chose to make some clean-ups: * Delete the `NullMigrator` * Remove type-based configuration for the migrator in favour of runtime config (simpler, less duplicated code) * Use the non-blocking migrator unless the blocking migrator is required. In the store tests we need the blocking migrator because some tests make asserts about the state of the DB after the migration has run. * Rename `validators_keypairs` -> `validator_keypairs` in the `BeaconChainHarness` ## Testing To confirm that the fix worked, I wrote a test using [Hiatus](https://crates.io/crates/hiatus), which can be found here: https://github.com/michaelsproul/lighthouse/tree/hiatus-issue-1557 That test can't be merged because it inserts random breakpoints everywhere, but if you check out that branch you can run the test with: ``` $ cd beacon_node/beacon_chain $ cargo test --release --test parallel_tests --features test_logger ``` It should pass, and the log output should show: ``` WARN Pruning deferred because of a concurrent mutation, message: this is expected only very rarely! ``` ## Additional Info This is a backwards-compatible change with no impact on consensus.
2026-04-19 13:58:28 +00:00 · 2020-10-19 05:58:39 +00:00
parent 6ba997b88e
commit 703c33bdc7
25 changed files with 538 additions and 729 deletions
--- a/beacon_node/store/src/hot_cold_store.rs
+++ b/beacon_node/store/src/hot_cold_store.rs
@@ -55,11 +55,11 @@ pub struct HotColdDB<E: EthSpec, Hot: ItemStore<E>, Cold: ItemStore<E>> {
    split: RwLock<Split>,
    config: StoreConfig,
    /// Cold database containing compact historical data.
-    pub(crate) cold_db: Cold,
+    pub cold_db: Cold,
    /// Hot database containing duplicated but quick-to-access recent data.
    ///
    /// The hot database also contains all blocks.
-    pub(crate) hot_db: Hot,
+    pub hot_db: Hot,
    /// LRU cache of deserialized blocks. Updated whenever a block is loaded.
    block_cache: Mutex<LruCache<Hash256, SignedBeaconBlock<E>>>,
    /// Chain spec.
@@ -386,11 +386,10 @@ impl<E: EthSpec, Hot: ItemStore<E>, Cold: ItemStore<E>> HotColdDB<E, Hot, Cold>
        self.hot_db.exists::<I>(key)
    }

-    pub fn do_atomically(&self, batch: Vec<StoreOp<E>>) -> Result<(), Error> {
-        let mut guard = self.block_cache.lock();
-
-        let mut key_value_batch: Vec<KeyValueStoreOp> = Vec::with_capacity(batch.len());
-        for op in &batch {
+    /// Convert a batch of `StoreOp` to a batch of `KeyValueStoreOp`.
+    pub fn convert_to_kv_batch(&self, batch: &[StoreOp<E>]) -> Result<Vec<KeyValueStoreOp>, Error> {
+        let mut key_value_batch = Vec::with_capacity(batch.len());
+        for op in batch {
            match op {
                StoreOp::PutBlock(block_hash, block) => {
                    let untyped_hash: Hash256 = (*block_hash).into();
@@ -430,7 +429,14 @@ impl<E: EthSpec, Hot: ItemStore<E>, Cold: ItemStore<E>> HotColdDB<E, Hot, Cold>
                }
            }
        }
-        self.hot_db.do_atomically(key_value_batch)?;
+        Ok(key_value_batch)
+    }
+
+    pub fn do_atomically(&self, batch: Vec<StoreOp<E>>) -> Result<(), Error> {
+        let mut guard = self.block_cache.lock();
+
+        self.hot_db
+            .do_atomically(self.convert_to_kv_batch(&batch)?)?;

        for op in &batch {
            match op {