Commit Graph

494 Commits

Author SHA1 Message Date
Michael Sproul
43843ca802 Release v3.4.0-tree.2 2023-02-07 09:23:30 +11:00
Tim Gretler
481e792898 Gradual state reconstruction
Co-authored-by: Michael Sproul <michael@sigmaprime.io>
2023-02-02 15:03:55 +11:00
Michael Sproul
90797a1b04 Fix serde quoting of u64 lists & vectors 2023-01-23 17:24:22 +11:00
Michael Sproul
1fd944a09b Tree states v3.4.0 alpha.1 2023-01-18 12:37:06 +11:00
Michael Sproul
44a106a8af Switch allocator to jemalloc (#3697)
Squashed commit of the following:

commit 974b3359f8
Merge: ac205b7ba 480309fb9
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Wed Jan 18 10:01:26 2023 +1100

    Merge remote-tracking branch 'origin/unstable' into jemalloc

commit 480309fb96
Author: aliask <aliask@gmail.com>
Date:   Tue Jan 17 05:13:49 2023 +0000

    Fix some dead links in markdown files (#3885)

    ## Issue Addressed

    No issue has been raised for these broken links.

    ## Proposed Changes

    Update links with the new URLs for the same document.

    ## Additional Info

    ~The link for the [Lighthouse Development Updates](https://eepurl.com/dh9Lvb/) mailing list is also broken, but I can't find the correct link.~

    Co-authored-by: Paul Hauner <paul@paulhauner.com>

commit b4d9fc03ee
Author: GeemoCandama <geemo@tutanota.com>
Date:   Tue Jan 17 05:13:48 2023 +0000

    add logging for starting request  and receiving block (#3858)

    ## Issue Addressed

    #3853

    ## Proposed Changes

    Added `INFO` level logs for requesting and receiving the unsigned block.

    ## Additional Info

    Logging for successfully publishing the signed block is already there. And seemingly there is a log for when "We realize we are going to produce a block" in the `start_update_service`: `info!(log, "Block production service started");
    `.  Is there anywhere else you'd like to see logging around this event?

    Co-authored-by: GeemoCandama <104614073+GeemoCandama@users.noreply.github.com>

commit 9a970ce3a2
Author: David Theodore <prodigalsonsolutions@gmail.com>
Date:   Tue Jan 17 05:13:47 2023 +0000

    add better err reporting UnableToOpenVotingKeystore (#3781)

    ## Issue Addressed

    #3780

    ## Proposed Changes

    Add error reporting that notifies the node operator that the `voting_keystore_path` in their `validator_definitions.yml` file may be incorrect.

    ## Additional Info

    There is more info in issue #3780

    Co-authored-by: Paul Hauner <paul@paulhauner.com>

commit ac205b7bab
Merge: 93457d85b bf533c8e4
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Fri Nov 25 16:32:33 2022 +1100

    Merge remote-tracking branch 'origin/unstable' into jemalloc

commit 93457d85b7
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Wed Nov 9 11:53:59 2022 +1100

    Fix cargo-udeps

commit 6c42aef1b5
Author: Michael Sproul <micsproul@gmail.com>
Date:   Tue Nov 8 19:12:19 2022 +1100

    Fixups

commit f14b87bb88
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Tue Nov 8 16:28:16 2022 +1100

    Update docs

commit 5005dc3b65
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Tue Nov 8 16:22:42 2022 +1100

    Fix lcli

commit a082ba5904
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Tue Nov 8 16:17:10 2022 +1100

    Remove check-consensus

commit 81441e9cea
Author: Michael Sproul <micsproul@gmail.com>
Date:   Tue Nov 8 15:28:11 2022 +1100

    Disable jemalloc on Windows

commit 41eac5d0c1
Author: Michael Sproul <micsproul@gmail.com>
Date:   Tue Nov 8 13:46:17 2022 +1100

    Compatibility with macOS

commit 69ecba7876
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Nov 7 18:48:31 2022 +1100

    Add jemalloc support
2023-01-18 10:07:21 +11:00
Michael Sproul
a70ee29c08 Tree states v3.4.0 alpha.0 2023-01-17 16:53:56 +11:00
Michael Sproul
5ce14c8dce Fix ups and Clippy 2023-01-17 15:57:34 +11:00
Michael Sproul
2b84597525 Split common crates out into their own repos (#3890)
Squashed commit of the following:

commit 1ba4f80cc0
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Tue Jan 17 11:43:18 2023 +1100

    Bye 1.0.0 beta, hello 0.5.x

commit a862b234b2
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Tue Jan 17 10:54:46 2023 +1100

    Cargo fmt

commit e29f358a9e
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 16 18:21:42 2023 +1100

    It compiles :O

commit 1ee4514b7d
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 16 17:27:10 2023 +1100

    Ethereum hashing

commit 69bdd1d61f
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 16 17:24:58 2023 +1100

    Tree hash et al

commit 7cae5d99d7
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 16 17:21:03 2023 +1100

    Delete crates!

commit dd9ee38084
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 16 17:19:19 2023 +1100

    Delete overrides

commit 0d54534eb4
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Mon Jan 16 17:19:04 2023 +1100

    Crate renames
2023-01-17 13:41:34 +11:00
Michael Sproul
8d8df17551 Merge unstable (needs a few more fixes) 2023-01-17 13:15:41 +11:00
Paul Hauner
38514c07f2 Release v3.4.0 (#3862)
## Issue Addressed

NA

## Proposed Changes

Bump versions

## Additional Info

- [x] ~~Blocked on #3728, #3801~~
- [x] ~~Blocked on #3866~~
- [x] Requires additional testing
2023-01-11 03:27:08 +00:00
Michael Sproul
0c74cd4696 Update dependencies incl Tokio (#3866)
## Proposed Changes

Update all dependencies to new semver-compatible releases with `cargo update`. Importantly this patches a Tokio vuln: https://rustsec.org/advisories/RUSTSEC-2023-0001. I don't think we were affected by the vuln because it only applies to named pipes on Windows, but it's still good hygiene to patch.
2023-01-09 23:29:23 +00:00
Michael Sproul
4bd2b777ec Verify execution block hashes during finalized sync (#3794)
## Issue Addressed

Recent discussions with other client devs about optimistic sync have revealed a conceptual issue with the optimisation implemented in #3738. In designing that feature I failed to consider that the execution node checks the `blockHash` of the execution payload before responding with `SYNCING`, and that omitting this check entirely results in a degradation of the full node's validation. A node omitting the `blockHash` checks could be tricked by a supermajority of validators into following an invalid chain, something which is ordinarily impossible.

## Proposed Changes

I've added verification of the `payload.block_hash` in Lighthouse. In case of failure we log a warning and fall back to verifying the payload with the execution client.

I've used our existing dependency on `ethers_core` for RLP support, and a new dependency on Parity's `triehash` crate for the Merkle patricia trie. Although the `triehash` crate is currently unmaintained it seems like our best option at the moment (it is also used by Reth, and requires vastly less boilerplate than Parity's generic `trie-root` library).

Block hash verification is pretty quick, about 500us per block on my machine (mainnet).

The optimistic finalized sync feature can be disabled using `--disable-optimistic-finalized-sync` which forces full verification with the EL.

## Additional Info

This PR also introduces a new dependency on our [`metastruct`](https://github.com/sigp/metastruct) library, which was perfectly suited to the RLP serialization method. There will likely be changes as `metastruct` grows, but I think this is a good way to start dogfooding it.

I took inspiration from some Parity and Reth code while writing this, and have preserved the relevant license headers on the files containing code that was copied and modified.
2023-01-09 03:11:59 +00:00
Age Manning
1d9a2022b4 Upgrade to libp2p v0.50.0 (#3764)
I've needed to do this work in order to do some episub testing. 

This version of libp2p has not yet been released, so this is left as a draft for when we wish to update.

Co-authored-by: Diva M <divma@protonmail.com>
2023-01-06 15:59:33 +00:00
Michael Sproul
775d222299 Enable proposer boost re-orging (#2860)
## Proposed Changes

With proposer boosting implemented (#2822) we have an opportunity to re-org out late blocks.

This PR adds three flags to the BN to control this behaviour:

* `--disable-proposer-reorgs`: turn aggressive re-orging off (it's on by default).
* `--proposer-reorg-threshold N`: attempt to orphan blocks with less than N% of the committee vote. If this parameter isn't set then N defaults to 20% when the feature is enabled.
* `--proposer-reorg-epochs-since-finalization N`: only attempt to re-org late blocks when the number of epochs since finalization is less than or equal to N. The default is 2 epochs, meaning re-orgs will only be attempted when the chain is finalizing optimally.

For safety Lighthouse will only attempt a re-org under very specific conditions:

1. The block being proposed is 1 slot after the canonical head, and the canonical head is 1 slot after its parent. i.e. at slot `n + 1` rather than building on the block from slot `n` we build on the block from slot `n - 1`.
2. The current canonical head received less than N% of the committee vote. N should be set depending on the proposer boost fraction itself, the fraction of the network that is believed to be applying it, and the size of the largest entity that could be hoarding votes.
3. The current canonical head arrived after the attestation deadline from our perspective. This condition was only added to support suppression of forkchoiceUpdated messages, but makes intuitive sense.
4. The block is being proposed in the first 2 seconds of the slot. This gives it time to propagate and receive the proposer boost.


## Additional Info

For the initial idea and background, see: https://github.com/ethereum/consensus-specs/pull/2353#issuecomment-950238004

There is also a specification for this feature here: https://github.com/ethereum/consensus-specs/pull/3034

Co-authored-by: Michael Sproul <micsproul@gmail.com>
Co-authored-by: pawan <pawandhananjay@gmail.com>
2022-12-13 09:57:26 +00:00
Michael Sproul
3b657a3b0b Automate merkle proofs with metastruct 2022-12-08 15:50:27 +11:00
Michael Sproul
5d628d7857 Merge remote-tracking branch 'origin/unstable' into tree-states 2022-11-30 14:14:17 +11:00
Paul Hauner
bf533c8e42 v3.3.0 (#3741)
## Issue Addressed

NA

## Proposed Changes

- Bump versions
- Pin the `nethermind` version since our method of getting the latest tags on `master` is giving us an old version (`1.14.1`).
- Increase timeout for execution engine startup.

## Additional Info

- [x] ~Awaiting further testing~
2022-11-23 23:38:32 +00:00
Age Manning
230168deff Health Endpoints for UI (#3668)
This PR adds some health endpoints for the beacon node and the validator client.

Specifically it adds the endpoint:
`/lighthouse/ui/health`

These are not entirely stable yet. But provide a base for modification for our UI. 

These also may have issues with various platforms and may need modification.
2022-11-15 05:21:26 +00:00
Michael Sproul
bfabaa10e0 Merge and test fixups 2022-11-10 16:53:40 +11:00
Giulio rebuffo
9d6209725f Added Merkle Proof Generation for Beacon State (#3674)
## Issue Addressed

This PR addresses partially #3651

## Proposed Changes

This PR adds the following methods:

* a new method to trait `TreeHash`, `hash_tree_leaves` which returns all the Merkle leaves of the ssz object.
* a new method to `BeaconState`: `compute_merkle_proof` which generates a specific merkle proof for given depth and index by using the `hash_tree_leaves` as leaves function.

## Additional Info

Now here is some rationale on why I decided to go down this route: adding a new function to commonly used trait is a pain but was necessary to make sure we have all merkle leaves for every object, that is why I just added  `hash_tree_leaves`  in the trait and not  `compute_merkle_proof` as well. although it would make sense it gives us code duplication/harder review time and we just need it from one specific object in one specific usecase so not worth the effort YET. In my humble opinion.

Co-authored-by: Michael Sproul <micsproul@gmail.com>
2022-11-08 01:58:18 +00:00
ethDreamer
e8604757a2 Deposit Cache Finalization & Fast WS Sync (#2915)
## Summary

The deposit cache now has the ability to finalize deposits. This will cause it to drop unneeded deposit logs and hashes in the deposit Merkle tree that are no longer required to construct deposit proofs. The cache is finalized whenever the latest finalized checkpoint has a new `Eth1Data` with all deposits imported.

This has three benefits:

1. Improves the speed of constructing Merkle proofs for deposits as we can just replay deposits since the last finalized checkpoint instead of all historical deposits when re-constructing the Merkle tree.
2. Significantly faster weak subjectivity sync as the deposit cache can be transferred to the newly syncing node in compressed form. The Merkle tree that stores `N` finalized deposits requires a maximum of `log2(N)` hashes. The newly syncing node then only needs to download deposits since the last finalized checkpoint to have a full tree.
3. Future proofing in preparation for [EIP-4444](https://eips.ethereum.org/EIPS/eip-4444) as execution nodes will no longer be required to store logs permanently so we won't always have all historical logs available to us.

## More Details

Image to illustrate how the deposit contract merkle tree evolves and finalizes along with the resulting `DepositTreeSnapshot`
![image](https://user-images.githubusercontent.com/37123614/151465302-5fc56284-8a69-4998-b20e-45db3934ac70.png)

## Other Considerations

I've changed the structure of the `SszDepositCache` so once you load & save your database from this version of lighthouse, you will no longer be able to load it from older versions.

Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
2022-10-30 04:04:24 +00:00
Divma
46fbf5b98b Update discv5 (#3171)
## Issue Addressed

Updates discv5

Pending on
- [x] #3547 
- [x] Alex upgrades his deps

## Proposed Changes

updates discv5 and the enr crate. The only relevant change would be some clear indications of ipv4 usage in lighthouse

## Additional Info

Functionally, this should be equivalent to the prev version.
As draft pending a discv5 release
2022-10-28 05:40:06 +00:00
Michael Sproul
6d5a2b509f Release v3.2.1 (#3660)
## Proposed Changes

Patch release to include the performance regression fix https://github.com/sigp/lighthouse/pull/3658.

## Additional Info

~~Blocked on the merge of https://github.com/sigp/lighthouse/pull/3658.~~
2022-10-26 09:38:25 +00:00
Paul Hauner
fcfd02aeec Release v3.2.0 (#3647)
## Issue Addressed

NA

## Proposed Changes

Bump version to `v3.2.0`

## Additional Info

- ~~Blocked on #3597~~
- ~~Blocked on #3645~~
- ~~Blocked on #3653~~
- ~~Requires additional testing~~
2022-10-25 06:36:51 +00:00
Michael Sproul
77b28177a4 Update Cargo lock 2022-10-21 10:31:34 +11:00
Michael Sproul
03fde98737 bls: uncompressed serialization 2022-10-20 23:05:01 +11:00
Michael Sproul
fd800ce755 Merge remote-tracking branch 'origin/freezer-tools' into tree-states 2022-10-19 15:07:27 +11:00
Michael Sproul
3841aa3580 Merge remote-tracking branch 'michael/separate-blocks' into tree-states 2022-10-19 14:37:30 +11:00
Michael Sproul
ff26c80068 Merge remote-tracking branch 'origin/unstable' into tree-states 2022-10-19 13:21:47 +11:00
Michael Sproul
e4cbdc1c77 Optimistic sync spec tests (v1.2.0) (#3564)
## Issue Addressed

Implements new optimistic sync test format from https://github.com/ethereum/consensus-specs/pull/2982.

## Proposed Changes

- Add parsing and runner support for the new test format.
- Extend the mock EL with a set of canned responses keyed by block hash. Although this doubles up on some of the existing functionality I think it's really nice to use compared to the `preloaded_responses` or static responses. I think we could write novel new opt sync tests using these primtives much more easily than the previous ones. Forks are natively supported, and different responses to `forkchoiceUpdated` and `newPayload` are also straight-forward.

## Additional Info

Blocked on merge of the spec PR and release of new test vectors.
2022-10-15 22:25:52 +00:00
Divma
4926e3967f [DEV FEATURE] Deterministic long lived subnets (#3453)
## Issue Addressed

#2847 

## Proposed Changes
Add under a feature flag the required changes to subscribe to long lived subnets in a deterministic way

## Additional Info

There is an additional required change that is actually searching for peers using the prefix, but I find that it's best to make this change in the future
2022-10-04 10:37:48 +00:00
GeemoCandama
6a92bf70e4 CLI tests for logging flags (#3609)
## Issue Addressed
Adding CLI tests for logging flags: log-color and disable-log-timestamp
Which issue # does this PR address?
#3588 
## Proposed Changes
Add CLI tests for logging flags as described in #3588 
Please list or describe the changes introduced by this PR.
Added logger_config to client::Config as suggested. Implemented Default for LoggerConfig based on what was being done elsewhere in the repo. Created 2 tests for each flag addressed.
## Additional Info

Please provide any additional information. For example, future considerations
or information useful for reviewers.
2022-10-04 08:33:40 +00:00
Pawan Dhananjay
8728c40102 Remove fallback support from eth1 service (#3594)
## Issue Addressed

N/A

## Proposed Changes

With https://github.com/sigp/lighthouse/pull/3214 we made it such that you can either have 1 auth endpoint or multiple non auth endpoints. Now that we are post merge on all networks (testnets and mainnet), we cannot progress a chain without a dedicated auth execution layer connection so there is no point in having a non-auth eth1-endpoint for syncing deposit cache. 

This code removes all fallback related code in the eth1 service. We still keep the single non-auth endpoint since it's useful for testing.

## Additional Info

This removes all eth1 fallback related metrics that were relevant for the monitoring service, so we might need to change the api upstream.
2022-10-04 08:33:39 +00:00
Michael Sproul
14135cf9be Cargo lock update 2022-09-29 17:02:23 +10:00
Divma
b1d2510d1b Libp2p v0.48.0 upgrade (#3547)
## Issue Addressed

Upgrades libp2p to v.0.47.0. This is the compilation of
- [x] #3495 
- [x] #3497 
- [x] #3491 
- [x] #3546 
- [x] #3553 

Co-authored-by: Age Manning <Age@AgeManning.com>
2022-09-29 01:50:11 +00:00
Paul Hauner
01e84b71f5 v3.1.2 (#3603)
## Issue Addressed

NA

## Proposed Changes

Bump versions to v3.1.2

## Additional Info

- ~~Blocked on several PRs.~~
- ~~Requires further testing.~~
2022-09-26 01:17:36 +00:00
Paul Hauner
3128b5b430 v3.1.1 (#3585)
## Issue Addressed

NA

## Proposed Changes

Bump versions

## Additional Info

- ~~Requires additional testing~~
- ~~Blocked on:~~
    - ~~#3589~~
    - ~~#3540~~
    - ~~#3587~~
2022-09-22 06:08:52 +00:00
Michael Sproul
dce526391b Merge remote-tracking branch 'origin/unstable' into tree-states 2022-09-22 10:13:02 +10:00
Paul Hauner
96692b8e43 Impl oneshot_broadcast for committee promises (#3595)
## Issue Addressed

NA

## Proposed Changes

Fixes an issue introduced in #3574 where I erroneously assumed that a `crossbeam_channel` multiple receiver queue was a *broadcast* queue. This is incorrect, each message will be received by *only one* receiver. The effect of this mistake is these logs:

```
Sep 20 06:56:17.001 INFO Synced                                  slot: 4736079, block: 0xaa8a…180d, epoch: 148002, finalized_epoch: 148000, finalized_root: 0x2775…47f2, exec_hash: 0x2ca5…ffde (verified), peers: 6, service: slot_notifier
Sep 20 06:56:23.237 ERRO Unable to validate attestation          error: CommitteeCacheWait(RecvError), peer_id: 16Uiu2HAm2Jnnj8868tb7hCta1rmkXUf5YjqUH1YPj35DCwNyeEzs, type: "aggregated", slot: Slot(4736047), beacon_block_root: 0x88d318534b1010e0ebd79aed60b6b6da1d70357d72b271c01adf55c2b46206c1
```

## Additional Info

NA
2022-09-21 01:01:50 +00:00
Paul Hauner
2cd3e3a768 Avoid duplicate committee cache loads (#3574)
## Issue Addressed

NA

## Proposed Changes

I have observed scenarios on Goerli where Lighthouse was receiving attestations which reference the same, un-cached shuffling on multiple threads at the same time. Lighthouse was then loading the same state from database and determining the shuffling on multiple threads at the same time. This is unnecessary load on the disk and RAM.

This PR modifies the shuffling cache so that each entry can be either:

- A committee
- A promise for a committee (i.e., a `crossbeam_channel::Receiver`)

Now, in the scenario where we have thread A and thread B simultaneously requesting the same un-cached shuffling, we will have the following:

1. Thread A will take the write-lock on the shuffling cache, find that there's no cached committee and then create a "promise" (a `crossbeam_channel::Sender`) for a committee before dropping the write-lock.
1. Thread B will then be allowed to take the write-lock for the shuffling cache and find the promise created by thread A. It will block the current thread waiting for thread A to fulfill that promise.
1. Thread A will load the state from disk, obtain the shuffling, send it down the channel, insert the entry into the cache and then continue to verify the attestation.
1. Thread B will then receive the shuffling from the receiver, be un-blocked and then continue to verify the attestation.

In the case where thread A fails to generate the shuffling and drops the sender, the next time that specific shuffling is requested we will detect that the channel is disconnected and return a `None` entry for that shuffling. This will cause the shuffling to be re-calculated.

## Additional Info

NA
2022-09-16 08:54:03 +00:00
Michael Sproul
2bd784ef68 Work in progress block separation 2022-09-16 17:32:22 +10:00
Michael Sproul
69584aa348 Merge remote-tracking branch 'origin/unstable' into tree-states 2022-09-14 13:51:23 +10:00
Michael Sproul
c4744849ea Cargo.lock fixes and EF test fixes 2022-09-14 11:38:46 +10:00
Michael Sproul
cd31e54b99 Bump axum deps (#3570)
## Issue Addressed

Fix a `cargo-audit` failure. We don't use `axum` for anything besides tests, but `cargo-audit` is failing due to this vulnerability in `axum-core`: https://rustsec.org/advisories/RUSTSEC-2022-0055
2022-09-13 01:57:47 +00:00
realbigsean
d1a8d6cf91 Pin mev rs deps (#3557)
## Issue Addressed

We were unable to update lighthouse by running `cargo update` because some of the `mev-build-rs` deps weren't pinned. But `mev-build-rs` is now pinned here and includes it's own pinned commits for `ssz-rs` and `etheruem-consensus`



Co-authored-by: realbigsean <sean@sigmaprime.io>
2022-09-08 23:46:03 +00:00
Divma
473abc14ca Subscribe to subnets only when needed (#3419)
## Issue Addressed

We currently subscribe to attestation subnets as soon as the subscription arrives (one epoch in advance), this makes it so that subscriptions for future slots are scheduled instead of done immediately. 

## Proposed Changes

- Schedule subscriptions to subnets for future slots.
- Finish removing hashmap_delay, in favor of [delay_map](https://github.com/AgeManning/delay_map). This was the only remaining service to do this.
- Subscriptions for past slots are rejected, before we would subscribe for one slot.
- Add a new test for subscriptions that are not consecutive.

## Additional Info

This is also an effort in making the code easier to understand
2022-09-05 00:22:48 +00:00
Paul Hauner
aa022f4685 v3.1.0 (#3525)
## Issue Addressed

NA

## Proposed Changes

- Bump versions

## Additional Info

- ~~Blocked on #3508~~
- ~~Blocked on #3526~~
- ~~Requires additional testing.~~
- Expected release date is 2022-09-01
2022-08-31 22:21:55 +00:00
Paul Hauner
8609cced0e Reset payload statuses when resuming fork choice (#3498)
## Issue Addressed

NA

## Proposed Changes

This PR is motivated by a recent consensus failure in Geth where it returned `INVALID` for an `VALID` block. Without this PR, the only way to recover is by re-syncing Lighthouse. Whilst ELs "shouldn't have consensus failures", in reality it's something that we can expect from time to time due to the complex nature of Ethereum. Being able to recover easily will help the network recover and EL devs to troubleshoot.

The risk introduced with this PR is that genuinely INVALID payloads get a "second chance" at being imported. I believe the DoS risk here is negligible since LH needs to be restarted in order to re-process the payload. Furthermore, there's no reason to think that a well-performing EL will accept a truly invalid payload the second-time-around.

## Additional Info

This implementation has the following intricacies:

1. Instead of just resetting *invalid* payloads to optimistic, we'll also reset *valid* payloads. This is an artifact of our existing implementation.
1. We will only reset payload statuses when we detect an invalid payload present in `proto_array`
    - This helps save us from forgetting that all our blocks are valid in the "best case scenario" where there are no invalid blocks.
1. If we fail to revert the payload statuses we'll log a `CRIT` and just continue with a `proto_array` that *does not* have reverted payload statuses.
    - The code to revert statuses needs to deal with balances and proposer-boost, so it's a failure point. This is a defensive measure to avoid introducing new show-stopping bugs to LH.
2022-08-29 14:34:41 +00:00
Michael Sproul
66eca1a882 Refactor op pool for speed and correctness (#3312)
## Proposed Changes

This PR has two aims: to speed up attestation packing in the op pool, and to fix bugs in the verification of attester slashings, proposer slashings and voluntary exits. The changes are bundled into a single database schema upgrade (v12).

Attestation packing is sped up by removing several inefficiencies: 

- No more recalculation of `attesting_indices` during packing.
- No (unnecessary) examination of the `ParticipationFlags`: a bitfield suffices. See `RewardCache`.
- No re-checking of attestation validity during packing: the `AttestationMap` provides attestations which are "correct by construction" (I have checked this using Hydra).
- No SSZ re-serialization for the clunky `AttestationId` type (it can be removed in a future release).

So far the speed-up seems to be roughly 2-10x, from 500ms down to 50-100ms.

Verification of attester slashings, proposer slashings and voluntary exits is fixed by:

- Tracking the `ForkVersion`s that were used to verify each message inside the `SigVerifiedOp`. This allows us to quickly re-verify that they match the head state's opinion of what the `ForkVersion` should be at the epoch(s) relevant to the message.
- Storing the `SigVerifiedOp` on disk rather than the raw operation. This allows us to continue track the fork versions after a reboot.

This is mostly contained in this commit 52bb1840ae.

## Additional Info

The schema upgrade uses the justified state to re-verify attestations and compute `attesting_indices` for them. It will drop any attestations that fail to verify, by the logic that attestations are most valuable in the few slots after they're observed, and are probably stale and useless by the time a node restarts. Exits and proposer slashings and similarly re-verified to obtain `SigVerifiedOp`s.

This PR contains a runtime killswitch `--paranoid-block-proposal` which opts out of all the optimisations in favour of closely verifying every included message. Although I'm quite sure that the optimisations are correct this flag could be useful in the event of an unforeseen emergency.

Finally, you might notice that the `RewardCache` appears quite useless in its current form because it is only updated on the hot-path immediately before proposal. My hope is that in future we can shift calls to `RewardCache::update` into the background, e.g. while performing the state advance. It is also forward-looking to `tree-states` compatibility, where iterating and indexing `state.{previous,current}_epoch_participation` is expensive and needs to be minimised.
2022-08-29 09:10:26 +00:00
Michael Sproul
209a109877 Add freezer DB debugging tools 2022-08-26 16:50:43 +10:00