## Proposed Changes

Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database.

⚠️ **This is achieved in a backwards-incompatible way for networks that have already merged** ⚠️. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins.

The main changes are:

- New column in the database called `ExecPayload`, keyed by beacon block root.
- The `BeaconBlock` column now stores blinded blocks only.
- Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc.
- On finalization:
  - `prune_abandoned_forks` deletes non-canonical payloads whilst deleting non-canonical blocks.
  - `migrate_db` deletes finalized canonical payloads whilst deleting finalized states.
- Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134.
- The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call.
  - I've tested manually that it works on Kiln, using Geth and Nethermind.
  - This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: https://github.com/ethereum/execution-apis/pull/146.
- We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for https://github.com/gakonst/ethers-rs/issues/1134.
- Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed.
- Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated).

## Additional Info

- [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller.
- [x] We should measure the latency of blocks-by-root and blocks-by-range responses.
- [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: https://github.com/sigp/lighthouse/issues/3159)
- [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks.

Co-authored-by: Paul Hauner <paul@paulhauner.com>
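The by-value `put_block` idea from the checklist can be sketched as follows. This is a hedged illustration only: `FullBlock`, `BlindedBlock`, `Payload`, and `Store` are invented stand-ins, not Lighthouse's actual `BeaconBlock`/`ExecutionPayload` types, and the in-memory maps stand in for the on-disk database.

```rust
use std::collections::HashMap;

// Invented stand-in types for this sketch -- not the real Lighthouse types.
#[derive(Clone, Debug, PartialEq)]
pub struct Payload(pub Vec<u8>);

#[derive(Clone, Debug)]
pub struct BlindedBlock {
    pub root: u64,
    pub slot: u64,
}

#[derive(Debug)]
pub struct FullBlock {
    pub root: u64,
    pub slot: u64,
    pub payload: Payload,
}

#[derive(Default)]
pub struct Store {
    pub blinded_blocks: HashMap<u64, BlindedBlock>,
    pub payloads: HashMap<u64, Vec<u8>>, // "serialized" payloads, keyed by block root
}

impl Store {
    /// Take the full block by value, split it into a blinded block and a
    /// payload, persist both halves, then re-assemble the full block for the
    /// caller. The payload is written out once rather than the whole block
    /// being cloned.
    pub fn put_block(&mut self, block: FullBlock) -> FullBlock {
        let FullBlock { root, slot, payload } = block;
        // Stand-in for serializing the payload to disk.
        self.payloads.insert(root, payload.0.clone());
        self.blinded_blocks.insert(root, BlindedBlock { root, slot });
        // Hand the reassembled full block back without cloning the payload.
        FullBlock { root, slot, payload }
    }
}
```

The point of the design is ownership flow: the caller gives up the block, the store keeps the two halves, and the same payload allocation is returned to the caller.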
# Database Schema Migrations
This document is an attempt to record some best practices and design conventions for applying database schema migrations within Lighthouse.
## General Structure
If you make a breaking change to an on-disk data structure, you need to increment the
`SCHEMA_VERSION` in `beacon_node/store/src/metadata.rs` and add a migration from the previous
version to the new version.
The entry-point for database migrations is in `schema_change.rs`, not `migrate.rs` (which deals
with finalization). Supporting code for a specific migration may be added in
`schema_change/migration_schema_vX.rs`, where `X` is the version being migrated to.
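For orientation, here is a minimal sketch of what such a dispatch might look like. It is not Lighthouse's actual code: versions are modelled as bare integers, the store argument is omitted, and `upgrade_to_v9` is a made-up placeholder for a per-version module.

```rust
// Simplified sketch: the real entry-point in schema_change.rs also takes the
// database as an argument; here only the version dispatch is shown.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SchemaVersion(pub u64);

/// Current version, as it would be declared in beacon_node/store/src/metadata.rs.
pub const SCHEMA_VERSION: SchemaVersion = SchemaVersion(9);

pub fn migrate_schema(from: SchemaVersion, to: SchemaVersion) -> Result<(), String> {
    match (from, to) {
        // Nothing to do when the versions already match.
        (x, y) if x == y => Ok(()),
        // Each single-step upgrade delegates to its own module, e.g.
        // schema_change/migration_schema_v9.rs.
        (SchemaVersion(8), SchemaVersion(9)) => upgrade_to_v9(),
        (x, y) => Err(format!("no migration from {:?} to {:?}", x, y)),
    }
}

fn upgrade_to_v9() -> Result<(), String> {
    // Re-write the affected on-disk structures here.
    Ok(())
}
```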
## Combining Schema Changes
Schema changes may be combined if they are part of the same pull request to
`unstable`. Once a schema version is defined in `unstable` we should not apply changes to it
without incrementing the version. This prevents conflicts between versions that appear to be the
same. This allows us to deploy `unstable` to nodes without having to worry about needing to resync
because of a sneaky schema change.
Changing the on-disk structure for a version before it is merged to `unstable` is OK. You will
just have to handle manually resyncing any test nodes (use checkpoint sync).
## Naming Conventions
Prefer to name versions of structs by the version at which the change was introduced. For example,
if you add a field to `Foo` in v9, call the previous version `FooV1` (assuming this is `Foo`'s first
migration) and write a schema change that migrates from `FooV1` to `FooV9`.
Prefer to use explicit version names in `schema_change.rs` and the `schema_change` module. To
interface with the outside, either:

- Define a type alias to the latest version, e.g. `pub type Foo = FooV9`, or
- Define a mapping from the latest version to the version used elsewhere, e.g. `impl From<FooV9> for Foo {}`.
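Both options can be sketched as follows. The struct contents are invented for illustration, and `FooExternal` is a hypothetical name for the outside-facing type in option 2.

```rust
// Illustrative structs; the field layout is invented for this sketch.
pub struct FooV9 {
    pub a: u64,
    pub b: u64, // field added in v9
}

// Option 1: alias the externally-used name to the latest version.
pub type Foo = FooV9;

// Option 2: keep a separate external type (hypothetical `FooExternal`)
// and map the latest schema version into it.
pub struct FooExternal {
    pub a: u64,
    pub b: u64,
}

impl From<FooV9> for FooExternal {
    fn from(foo: FooV9) -> Self {
        FooExternal { a: foo.a, b: foo.b }
    }
}
```

With either option, code outside the `schema_change` module never mentions `FooV9` directly, so bumping the alias or the `From` impl is the single point of change.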
Avoid names like:

- `LegacyFoo`
- `OldFoo`
- `FooWithoutX`
## First-version vs Last-version
Previously the schema migration code would name types by the last version at which they were
valid. For example, if `Foo` changed in v9 then we would name the two variants `FooV8` and `FooV9`.
The problem with this scheme is that if `Foo` changes again in the future, at say v12, then `FooV9` would
need to be renamed to `FooV11`, which is annoying. Using the first valid version as described
above does not have this issue.
## Using SuperStruct
If possible, consider using `superstruct` to handle data
structure changes between versions.
- Use `superstruct(no_enum)` to avoid generating an unnecessary top-level enum.
## Example
A field is added to `Foo` in v9, and there are two variants: `FooV1` and `FooV9`. There is a
migration from `FooV1` to `FooV9`. `Foo` is aliased to `FooV9`.

Some time later another field is added to `Foo` in v12. A new `FooV12` is created, along with a
migration from `FooV9` to `FooV12`. The primary `Foo` type gets re-aliased to `FooV12`. The previous
migration from v1 to v9 shouldn't break because the schema migration refers to `FooV9` explicitly
rather than `Foo`. Due to the re-aliasing (or re-mapping), the compiler will check every usage
of `Foo` to make sure that it still makes sense with `FooV12`.
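The example above can be sketched in code. The fields are invented and real migrations would also re-encode data on disk; only the naming and aliasing pattern is the point here.

```rust
// Illustrative only: fields are made up for this sketch.
pub struct FooV1 {
    pub a: u64,
}

pub struct FooV9 {
    pub a: u64,
    pub b: u64, // added in v9
}

pub struct FooV12 {
    pub a: u64,
    pub b: u64,
    pub c: u64, // added in v12
}

// The primary alias moves forward with each schema change.
pub type Foo = FooV12;

// Migrations name explicit versions, so the V1 -> V9 migration is untouched
// when `Foo` is re-aliased from `FooV9` to `FooV12`.
impl From<FooV1> for FooV9 {
    fn from(old: FooV1) -> Self {
        FooV9 { a: old.a, b: 0 } // the new field takes a default value
    }
}

impl From<FooV9> for FooV12 {
    fn from(old: FooV9) -> Self {
        FooV12 { a: old.a, b: old.b, c: 0 }
    }
}
```

Because `Foo` is just an alias, every call site that constructs or matches on `Foo` is re-checked by the compiler against `FooV12` after the re-alias.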