mirror of
https://github.com/sigp/lighthouse.git
synced 2026-03-21 22:04:44 +00:00
Implement database temp states to reduce memory usage (#1798)
## Issue Addressed Closes #800 Closes #1713 ## Proposed Changes Implement the temporary state storage algorithm described in #800. Specifically: * Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values. * Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully. * Add a garbage collection process to delete leftover temporary states on start-up. * Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784) ## Additional Info There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant. ### Race 1: Permanent state marked temporary EDIT: this has been fixed by the addition of a lock around the relevant critical section There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events: 1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`. 2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag. 3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction. 4. a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain! Alternatively (4b) happens... b) Thread 1 finishes, and re-deletes the temporary flag. In this case the failure is transient, state `s` will disappear temporarily, but will come back once thread 1 finishes running. I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know This once again begs the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn). ### Race 2: Temporary state returned from `get_state` I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data). This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.
This commit is contained in:
@@ -15,6 +15,7 @@ pub mod chunked_vector;
|
||||
pub mod config;
|
||||
pub mod errors;
|
||||
mod forwards_iter;
|
||||
mod garbage_collection;
|
||||
pub mod hot_cold_store;
|
||||
mod impls;
|
||||
mod leveldb_store;
|
||||
@@ -22,11 +23,10 @@ mod memory_store;
|
||||
mod metadata;
|
||||
mod metrics;
|
||||
mod partial_beacon_state;
|
||||
mod schema_change;
|
||||
|
||||
pub mod iter;
|
||||
|
||||
use std::borrow::Cow;
|
||||
|
||||
pub use self::config::StoreConfig;
|
||||
pub use self::hot_cold_store::{BlockReplay, HotColdDB, HotStateSummary, Split};
|
||||
pub use self::leveldb_store::LevelDB;
|
||||
@@ -35,6 +35,7 @@ pub use self::partial_beacon_state::PartialBeaconState;
|
||||
pub use errors::Error;
|
||||
pub use impls::beacon_state::StorageContainer as BeaconStateStorageContainer;
|
||||
pub use metrics::scrape_for_metrics;
|
||||
use parking_lot::MutexGuard;
|
||||
pub use types::*;
|
||||
|
||||
pub trait KeyValueStore<E: EthSpec>: Sync + Send + Sized + 'static {
|
||||
@@ -60,6 +61,12 @@ pub trait KeyValueStore<E: EthSpec>: Sync + Send + Sized + 'static {
|
||||
|
||||
/// Execute either all of the operations in `batch` or none at all, returning an error.
|
||||
fn do_atomically(&self, batch: Vec<KeyValueStoreOp>) -> Result<(), Error>;
|
||||
|
||||
/// Return a mutex guard that can be used to synchronize sensitive transactions.
|
||||
///
|
||||
/// This doesn't prevent other threads writing to the DB unless they also use
|
||||
/// this method. In future we may implement a safer mandatory locking scheme.
|
||||
fn begin_rw_transaction(&self) -> MutexGuard<()>;
|
||||
}
|
||||
|
||||
pub fn get_key_for_col(column: &str, key: &[u8]) -> Vec<u8> {
|
||||
@@ -121,13 +128,14 @@ pub trait ItemStore<E: EthSpec>: KeyValueStore<E> + Sync + Send + Sized + 'stati
|
||||
|
||||
/// Reified key-value storage operation. Helps in modifying the storage atomically.
|
||||
/// See also https://github.com/sigp/lighthouse/issues/692
|
||||
#[allow(clippy::large_enum_variant)]
|
||||
pub enum StoreOp<'a, E: EthSpec> {
|
||||
PutBlock(SignedBeaconBlockHash, SignedBeaconBlock<E>),
|
||||
PutState(BeaconStateHash, Cow<'a, BeaconState<E>>),
|
||||
PutStateSummary(BeaconStateHash, HotStateSummary),
|
||||
DeleteBlock(SignedBeaconBlockHash),
|
||||
DeleteState(BeaconStateHash, Slot),
|
||||
PutBlock(Hash256, Box<SignedBeaconBlock<E>>),
|
||||
PutState(Hash256, &'a BeaconState<E>),
|
||||
PutStateSummary(Hash256, HotStateSummary),
|
||||
PutStateTemporaryFlag(Hash256),
|
||||
DeleteStateTemporaryFlag(Hash256),
|
||||
DeleteBlock(Hash256),
|
||||
DeleteState(Hash256, Option<Slot>),
|
||||
}
|
||||
|
||||
/// A unique column identifier.
|
||||
@@ -146,6 +154,9 @@ pub enum DBColumn {
|
||||
BeaconRestorePoint,
|
||||
/// For the mapping from state roots to their slots or summaries.
|
||||
BeaconStateSummary,
|
||||
/// For the list of temporary states stored during block import,
|
||||
/// and then made non-temporary by the deletion of their state root from this column.
|
||||
BeaconStateTemporary,
|
||||
BeaconBlockRoots,
|
||||
BeaconStateRoots,
|
||||
BeaconHistoricalRoots,
|
||||
@@ -166,6 +177,7 @@ impl Into<&'static str> for DBColumn {
|
||||
DBColumn::ForkChoice => "frk",
|
||||
DBColumn::BeaconRestorePoint => "brp",
|
||||
DBColumn::BeaconStateSummary => "bss",
|
||||
DBColumn::BeaconStateTemporary => "bst",
|
||||
DBColumn::BeaconBlockRoots => "bbr",
|
||||
DBColumn::BeaconStateRoots => "bsr",
|
||||
DBColumn::BeaconHistoricalRoots => "bhr",
|
||||
@@ -175,6 +187,16 @@ impl Into<&'static str> for DBColumn {
|
||||
}
|
||||
}
|
||||
|
||||
impl DBColumn {
|
||||
pub fn as_str(self) -> &'static str {
|
||||
self.into()
|
||||
}
|
||||
|
||||
pub fn as_bytes(self) -> &'static [u8] {
|
||||
self.as_str().as_bytes()
|
||||
}
|
||||
}
|
||||
|
||||
/// An item that may stored in a `Store` by serializing and deserializing from bytes.
|
||||
pub trait StoreItem: Sized {
|
||||
/// Identifies which column this item should be placed in.
|
||||
|
||||
Reference in New Issue
Block a user