N/A
The difference is computed by taking the difference of expected with received. We were doing the inverse.
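For illustration only, here is a minimal sketch of why the direction matters. It assumes the values being compared are sets of indices. The helper name `missing` is hypothetical and not taken from the codebase.

```rust
use std::collections::BTreeSet;

/// Returns the items that were expected but never received.
/// `missing` is a hypothetical helper name used only for this sketch.
fn missing(expected: &BTreeSet<u64>, received: &BTreeSet<u64>) -> BTreeSet<u64> {
    // Set difference is not symmetric: `expected - received` keeps only the
    // elements of `expected` that are absent from `received`.
    expected.difference(received).copied().collect()
}
```

Swapping the arguments returns the items that were received but not expected, which is the inverse described above.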
Thanks to Yassine for finding the issue.
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
Take 2 of #8390.
Fixes the race condition properly instead of propagating the error. I think this is a better alternative, and doesn't seem to look that bad.
* Lift node id loading or generation from `NetworkService` startup to the `ClientBuilder`, so that it can be used to compute custody columns for the beacon chain without waiting for Network bootstrap.
I've considered and implemented a few alternatives:
1. Passing `node_id` to the beacon chain builder and computing columns when creating `CustodyContext`. This approach isn't good for separation of concerns and isn't great for testability.
2. Passing `ordered_custody_groups` to the beacon chain. `CustodyContext` only uses this to compute ordered custody columns, so we might as well lift this logic out, so we don't have to do error handling in `CustodyContext` construction. Fewer tests to update.
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Addressed this comment here: https://github.com/sigp/lighthouse/issues/6837#issuecomment-3509209465
Lighthouse can only checkpoint sync from a server that can serve blob sidecars, which means the server needs to be custodying at least 50% of columns (semi-supernodes).
This PR lifts this constraint, as the blob sidecar endpoint is being deprecated in Fulu, and we plan to fetch the checkpoint data columns from peers (#6837).
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Fix the span on execution payload verification (newPayload), by creating a new span rather than using the parent span. Using the parent span was incorrectly associating the time spent verifying the payload with `from_signature_verified_components`.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Fix an issue detected by @jimmygchen that occurs when checkpoint sync is aborted midway and then later restarted.
The characteristic error is something like:
> Nov 13 00:51:35.832 ERROR Database write failed error: Hdiff(LessThanStart(Slot(1728288), Slot(1728320))), action: "reverting blob DB changes"
> Nov 13 00:51:35.833 WARN Hot DB pruning failed error: DBError(HotColdDBError(Rollback))
This issue has existed since v7.1.0.
Delete snapshot/diff in the case where `hot_storage_strategy` fails.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
#6022
Migrate the `execution_engine_integration` tests to the `alloy` ecosystem. This removes the last remaining `ethers` dependencies.
Co-Authored-By: Mac L <mjladson@pm.me>
Part of a fork-choice tech debt clean-up: https://github.com/sigp/lighthouse/issues/8325
https://github.com/sigp/lighthouse/issues/7089 (non-finalized checkpoint sync) changes the meaning of the checkpoints inside fork-choice. It turns out that we persist the justified and finalized checkpoints **twice** in fork-choice:
1. Inside the fork-choice store
2. Inside the proto-array
There's no reason for 2. except for making the function signatures of some methods smaller. It's not consistent with the rest of the crate, because in some functions we pass the external variable of time (`current_slot`) via args, but then read the finalized checkpoint from the internal state. Passing both variables as args makes fork-choice easier to reason about at the cost of a few extra lines.
Remove the unnecessary state (`justified_checkpoint`, `finalized_checkpoint`) inside `ProtoArray`, to make it easier to reason about.
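As a rough sketch of the shape this gives the API: the types below are heavily reduced stand-ins, and both the method name `node_is_viable` and its toy viability rule are assumptions for illustration, not the actual proto-array logic.

```rust
/// Simplified stand-in for a checkpoint (the real type also carries a block root).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Checkpoint {
    epoch: u64,
}

/// Hypothetical, reduced proto-array: it no longer stores any checkpoints.
struct ProtoArray {
    /// Justified epoch recorded for each node, indexed by node position.
    node_justified_epochs: Vec<u64>,
}

impl ProtoArray {
    /// Both checkpoints arrive as arguments, in the same way `current_slot`
    /// is passed in, so there is no second copy that can drift out of sync
    /// with the fork-choice store. The rule below is a toy placeholder.
    fn node_is_viable(
        &self,
        node_index: usize,
        justified_checkpoint: Checkpoint,
        finalized_checkpoint: Checkpoint,
    ) -> bool {
        match self.node_justified_epochs.get(node_index) {
            Some(&epoch) => {
                epoch == justified_checkpoint.epoch && epoch >= finalized_checkpoint.epoch
            }
            None => false,
        }
    }
}
```

The cost is two extra parameters at each call site. The benefit is that the fork-choice store stays the single source of truth.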
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
State advances were observed as especially slow on pre-Fulu networks (mainnet).
The reason: we were doing an extra epoch of state advance because of code that should only have been running after Fulu, when proposer shufflings are determined with lookahead.
Only attempt to cache the _next epoch_ shuffling if the state's slot determines it (this will only be true post-Fulu). Reusing the logic for `proposer_shuffling_decision_slot` avoids having to repeat the fiddly logic about the Fulu fork epoch itself.
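A minimal sketch of the check, under stated assumptions: 32 slots per epoch, a one-epoch lookahead post-Fulu, and a plain `fulu_enabled` flag standing in for the real fork lookup. Only the name `proposer_shuffling_decision_slot` comes from the description above. `should_cache_next_epoch_shuffling` is a hypothetical name.

```rust
const SLOTS_PER_EPOCH: u64 = 32;

fn start_slot(epoch: u64) -> u64 {
    epoch * SLOTS_PER_EPOCH
}

/// Slot whose block root fixes the proposer shuffling for `epoch`.
/// Pre-Fulu this is the last slot of `epoch - 1`; post-Fulu, assuming a
/// one-epoch lookahead, it is the last slot of `epoch - 2`.
fn proposer_shuffling_decision_slot(epoch: u64, fulu_enabled: bool) -> u64 {
    let lookahead_epochs = if fulu_enabled { 1 } else { 0 };
    start_slot(epoch.saturating_sub(lookahead_epochs)).saturating_sub(1)
}

/// True only if the state is already past the decision slot for the next
/// epoch, i.e. the next epoch's shuffling can no longer change.
fn should_cache_next_epoch_shuffling(state_slot: u64, fulu_enabled: bool) -> bool {
    let next_epoch = state_slot / SLOTS_PER_EPOCH + 1;
    state_slot > proposer_shuffling_decision_slot(next_epoch, fulu_enabled)
}
```

With these assumptions, a pre-Fulu state can never be past the next epoch's decision slot, because that slot is the last slot of the state's own epoch. The check is therefore false pre-Fulu without any explicit comparison against the fork epoch.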
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Since `block` and `blob` both start with `bl`, it was not clear how to differentiate between `blbroots_queue` and `bbroots_queue`.
After renaming, there also seems to be a discrepancy
Co-Authored-By: Kevaundray Wedderburn <kevtheappdev@gmail.com>
While debugging https://github.com/sigp/lighthouse/issues/8104, it would have been helpful to quickly see in the logs that a specific block was submitted to the HTTP API.
Because we want to optimize the block root computation we don't include it in the logs, and just log the block slot. I believe we can take a minute performance hit to have the block root in all the logs during block publishing.
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
#6022
Use `alloy_rpc_types::Transaction` to replace the `ethers_core::Transaction` inside the execution block generator.
Co-Authored-By: Mac L <mjladson@pm.me>
Fix flaky tests that depend on timing. Previously the tests processed all 128 columns and expected reconstruction to happen after all columns were processed. There is a race here, and reconstruction could be triggered before all columns are processed.
I've updated the tests to process 64 columns, just enough for reconstruction, and then wait 50ms for reconstruction to be triggered.
This PR requires the change made in https://github.com/sigp/lighthouse/pull/8194 for the test to pass consistently (blob count set to 1 for all blocks instead of random blob count between 0..max)
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>
Another good candidate for publishing separately from Lighthouse is `sensitive_url` as it's a general utility crate and not related to Ethereum. This PR prepares it to be spun out into its own crate.
I've made the `full` field on `SensitiveUrl` private and instead provided an explicit getter called `.expose_full()`. It's a bit ugly for the diff but I prefer the explicit nature of the getter.
I've also added some extra tests and doc strings along with feature gating `Serialize` and `Deserialize` implementations behind the `serde` feature.
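To illustrate the private-field-plus-getter shape, here is a dependency-free sketch. It is a simplification: plain strings stand in for a parsed URL, and the `new` constructor and `redacted` field are assumptions made for the example. Only the `full` field and `.expose_full()` come from the description above.

```rust
use std::fmt;

/// Simplified stand-in for the real type.
pub struct SensitiveUrl {
    /// Full URL, possibly containing credentials. Private so that it cannot
    /// be read or logged by accident.
    full: String,
    /// Redacted form that is safe to log (assumed field for this sketch).
    redacted: String,
}

impl SensitiveUrl {
    /// Hypothetical constructor: the caller supplies both forms so the
    /// sketch needs no URL parsing.
    pub fn new(full: &str, redacted: &str) -> Self {
        SensitiveUrl {
            full: full.to_string(),
            redacted: redacted.to_string(),
        }
    }

    /// Explicit opt-in to reading the unredacted URL.
    pub fn expose_full(&self) -> &str {
        &self.full
    }
}

impl fmt::Display for SensitiveUrl {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Formatting only ever reveals the redacted form.
        f.write_str(&self.redacted)
    }
}
```

Every place that needs the credentials now has to call `expose_full()`, which makes such call sites easy to find in review.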
Co-Authored-By: Mac L <mjladson@pm.me>
### Downgrade a non error to `Debug`
I noticed this error on one of our hoodi nodes:
```
Nov 04 05:13:38.892 ERROR Error during data column reconstruction block_root: 0x4271b9efae7deccec3989bd2418e998b83ce8144210c2b17200abb62b7951190, error: DuplicateFullyImported(0x4271b9efae7deccec3989bd2418e998b83ce8144210c2b17200abb62b7951190)
```
This shouldn't be logged as an error: it's caused by a normal race condition and doesn't impact the node negatively.
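A sketch of the idea, using hypothetical `Level` and `ReconstructionError` types rather than the real logging macros or error enum. Only the `DuplicateFullyImported` variant name comes from the log line above.

```rust
#[derive(Debug, PartialEq)]
enum Level {
    Debug,
    Error,
}

/// Reduced, hypothetical error type; a `u64` stands in for the block root.
#[derive(Debug)]
enum ReconstructionError {
    /// The block was already fully imported when reconstruction finished.
    DuplicateFullyImported(u64),
    /// Any other failure (placeholder variant for this sketch).
    Other(String),
}

/// Chooses the log level per error variant instead of logging every failure
/// as an error.
fn log_level_for(error: &ReconstructionError) -> Level {
    match error {
        // Expected outcome of a benign race: not worth an error-level log.
        ReconstructionError::DuplicateFullyImported(_) => Level::Debug,
        ReconstructionError::Other(_) => Level::Error,
    }
}
```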
### Remove spammy logs
This log is filling up the log files quite quickly and it is also something we'd expect during normal operation - getting columns via EL before gossip. We haven't found this debug log to be useful, so I propose we remove it to avoid spamming debug logs.
```
Received already available column sidecar. Ignoring the column sidecar
```
In the process of removing this, I noticed we aren't propagating the validation result, which I think we should so I've added this. The impact should be quite minimal - the message will stay in the gossip memcache for a bit longer but should be evicted in the next heartbeat.
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
While working on #7892, @michaelsproul pointed out that it might be a better metric to measure the delay from the start of the slot instead of from the current `slot_duration / 3`, since attestation duties now start before the `1/3rd` mark with the change in the linked PR.
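A small sketch of the two reference points, assuming a 12 second slot and timestamps expressed as durations since genesis. Both function names are hypothetical.

```rust
use std::time::Duration;

/// Assumed slot duration for this sketch.
const SLOT_DURATION: Duration = Duration::from_secs(12);

/// Delay measured from the start of the slot.
fn delay_from_slot_start(now: Duration, slot_start: Duration) -> Duration {
    now.saturating_sub(slot_start)
}

/// Delay measured from one third of the way into the slot. Anything that
/// happens before that mark saturates to zero and becomes indistinguishable.
fn delay_from_one_third(now: Duration, slot_start: Duration) -> Duration {
    now.saturating_sub(slot_start + SLOT_DURATION / 3)
}
```

An attestation produced 3 seconds into the slot reads as a 3 second delay with the first function and as zero with the second. That is why the slot start is the more informative reference once duties can begin before the one-third mark.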
Co-Authored-By: hopinheimer <knmanas6@gmail.com>
Co-Authored-By: hopinheimer <48147533+hopinheimer@users.noreply.github.com>
This is an optimisation targeted at Fulu networks in non-finality.
While debugging on Holesky, we found that `state_root_at_slot` was being called from `prepare_beacon_proposer` a lot, for the finalized state:
2c9b670f5d/beacon_node/http_api/src/lib.rs (L3860-L3861)
This was causing `prepare_beacon_proposer` calls to take upwards of 5 seconds, sometimes 10 seconds, because it would trigger _multiple_ beacon state loads in order to iterate back to the finalized slot. Ideally, loading the finalized state should be quick because we keep it cached in the state cache (technically we keep the split state, but they usually coincide). Instead we are computing the finalized state root separately (slow), and then loading the state from the cache (fast).
Although it would be possible to make the API faster by removing the `state_root_at_slot` call, I believe it's simpler to change `state_root_at_slot` itself and remove the footgun. Devs rightly expect operations involving the finalized state to be fast.
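The description does not spell out the exact change, so the following is only one way such a fast path could look. The `Chain` struct, its fields, and `slow_state_root_at_slot` are hypothetical. A `u64` stands in for a state root and a counter stands in for expensive state loads.

```rust
use std::cell::Cell;

/// Stand-in for a 32-byte state root.
type Root = u64;

/// Hypothetical, reduced chain handle.
struct Chain {
    finalized_slot: u64,
    finalized_state_root: Root,
    /// Counts simulated expensive state loads.
    state_loads: Cell<u32>,
}

impl Chain {
    fn state_root_at_slot(&self, slot: u64) -> Option<Root> {
        // Fast path: the finalized root is already known, so there is no
        // need to iterate back through states to find it.
        if slot == self.finalized_slot {
            return Some(self.finalized_state_root);
        }
        self.slow_state_root_at_slot(slot)
    }

    /// Placeholder for the slow path that loads states and iterates backwards.
    fn slow_state_root_at_slot(&self, slot: u64) -> Option<Root> {
        self.state_loads.set(self.state_loads.get() + 1);
        Some(slot.wrapping_mul(31))
    }
}
```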
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
`beacon-chain-tests` is now regularly taking 1h+ on CI since Fulu fork was added.
This PR attempts to reduce the test time by bringing down the number of blobs generated in tests - instead of generating 0..max_blobs, the generator now generates 0..1 blobs by default, and this can be modified by setting `harness.execution_block_generator.set_min_blob_count(n)`.
Note: The blobs are pre-generated and don't require much CPU to generate; however, processing a larger number of them on the beacon chain does take a lot of time.
This PR also includes a few other small improvements:
- Our slowest test (`chain_segment_varying_chunk_size`) runs 3x faster in Fulu just by reusing chain segments
- Avoid re-running fork specific tests on all forks
- Fix a bunch of tests that depend on the harness's existing random blob generation, which is fragile
Beacon chain test time on the test machine is **~2x** faster:
### `unstable`
```
Summary [ 751.586s] 291 tests run: 291 passed (13 slow), 0 skipped
```
### this branch
```
Summary [ 373.792s] 291 tests run: 291 passed (2 slow), 0 skipped
```
The next set of tests to optimise are the ones that use [`get_chain_segment`](77a9af96de/beacon_node/beacon_chain/tests/block_verification.rs (L45)), as it builds 320 blocks with a supernode by default - an easy optimisation would be to build these blocks with cgc = 8 for tests that only require fullnodes.
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>
N/A
Includes the following unmerged PRs:
- #8344
- #8335
- #8339
This PR should be merged after all above PRs are merged.
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>
Custody backfill sync has a bug when we request columns from more than one peer per batch. The fix here ensures we wait for all requests to be completed before performing verification and importing the responses.
I've also added an endpoint `lighthouse/custody/backfill` that resets a node's earliest available data column to the current epoch so that custody backfill can be triggered. This endpoint is needed to rescue any nodes that may have missing columns due to the custody backfill sync bug without requiring a full re-sync.
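The "wait for all requests" part of the fix can be sketched as below. `BatchRequests` and its methods are hypothetical names, request ids are plain integers, and a `u64` stands in for a data column.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Tracks the per-peer requests that make up one batch (hypothetical type).
struct BatchRequests {
    /// Requests still awaiting a response.
    pending: BTreeSet<u32>,
    /// Responses collected so far, keyed by request id.
    received: BTreeMap<u32, Vec<u64>>,
}

impl BatchRequests {
    fn new(request_ids: &[u32]) -> Self {
        BatchRequests {
            pending: request_ids.iter().copied().collect(),
            received: BTreeMap::new(),
        }
    }

    /// Records one response. Returns the combined columns only once every
    /// request in the batch has completed, so verification and import never
    /// run on a partial batch.
    fn on_response(&mut self, request_id: u32, columns: Vec<u64>) -> Option<Vec<u64>> {
        if !self.pending.remove(&request_id) {
            // Unknown or duplicate response: ignore it.
            return None;
        }
        self.received.insert(request_id, columns);
        if !self.pending.is_empty() {
            return None;
        }
        let mut all: Vec<u64> = std::mem::take(&mut self.received)
            .into_values()
            .flatten()
            .collect();
        all.sort_unstable();
        Some(all)
    }
}
```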
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
The beacon API spec wasn't updated to use the Fulu definition of `dependent_root` for the proposer duties endpoint. No other client updated their logic, so to retain backwards compatibility the decision has been made to continue using the block root at the end of epoch `N - 1`, and introduce a new v2 endpoint down the track to use the correct dependent root.
Eth R&D discussion: https://discord.com/channels/595666850260713488/598292067260825641/1433036715848765562
Change the behaviour of the v1 endpoint back to using the last slot of `N - 1` rather than the last slot of `N - 2`. This introduces the possibility of dependent root false positives (the root can change without changing the shuffling), but causes the least compatibility issues with other clients.
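For reference, the two candidate slots can be written out as follows, assuming 32 slots per epoch. The function names are hypothetical.

```rust
const SLOTS_PER_EPOCH: u64 = 32;

/// v1 behaviour after this change: the last slot of epoch `N - 1`.
fn v1_dependent_slot(epoch: u64) -> u64 {
    (epoch * SLOTS_PER_EPOCH).saturating_sub(1)
}

/// Fulu definition, left for a future v2 endpoint: the last slot of epoch `N - 2`.
fn fulu_dependent_slot(epoch: u64) -> u64 {
    (epoch.saturating_sub(1) * SLOTS_PER_EPOCH).saturating_sub(1)
}
```

For epoch 10 these give slots 319 and 287 respectively. A re-org confined to slots 288..=319 changes the v1 root even though the shuffling is unaffected, which is the false positive mentioned above.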
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
#8311
Removes the `git_version` crate from `lighthouse_version` and implements git `HEAD` tracking manually.
This removes the (mostly) broken dirty tracking but prevents spurious recompilation of the `lighthouse_version` crate.
This also reworks the way crate versions are handled by utilizing workspace version inheritance and Cargo environment variables.
This means the _only_ place where Lighthouse's version is defined is in the top level `Cargo.toml` for the workspace. All relevant binaries then inherit this version. This largely makes the `change_version.sh` script useless so I've removed it, although we could keep a version which just alters the workspace version (if we need to maintain compatibility with certain build/release tooling).
### When is a Rebuild Triggered?
1. When the build.rs file is changed.
2. When the HEAD commit changes (added, removed, rebased, etc)
3. When the branch changes (this includes changing to the current branch, and creating a detached HEAD)
Note that working/staged changes will not trigger a recompile of `lighthouse_version`.
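The rules above map naturally onto Cargo's `cargo:rerun-if-changed` directive. The sketch below is an assumption about how such a build script could be structured, not the actual `build.rs`. The helper name and the `.git/...` paths are illustrative, and packed refs are not handled.

```rust
/// Returns the directives a build script could print, given the contents of
/// `.git/HEAD`. Hypothetical helper for this sketch.
fn rerun_directives(head_contents: &str) -> Vec<String> {
    let mut out = vec![
        // Rule 1: the build script itself changed.
        "cargo:rerun-if-changed=build.rs".to_string(),
        // Rule 3: HEAD is rewritten on branch switches and detached checkouts.
        "cargo:rerun-if-changed=.git/HEAD".to_string(),
    ];
    // Rule 2: when HEAD points at a branch ("ref: refs/heads/<name>"), the
    // commit hash lives in that ref file, which changes on commit or rebase.
    if let Some(reference) = head_contents.trim().strip_prefix("ref: ") {
        out.push(format!("cargo:rerun-if-changed=.git/{}", reference));
    }
    out
}
```

None of these files are touched by working tree or staged changes, which matches the note above.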
Co-Authored-By: Mac L <mjladson@pm.me>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
During custody backfill sync there could be an edge case where we update CGC at the same time as we are importing a batch of columns, which may cause us to incorrectly overwrite values when calling `backfill_validator_custody_requirements`. To prevent this race condition, the expected CGC is now passed into this function and checked against the current validator CGC. If the values aren't equal, this probably indicates that a very recent CGC change occurred, so we do not prune/update values in the `epoch_validator_custody_requirements` map.
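A reduced sketch of the guard follows. The struct, its `validator_cgc` field, the `effective_epoch` parameter, and the exact prune/update rule are assumptions for illustration. Only the function name and the `epoch_validator_custody_requirements` map come from the description above.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced custody state.
struct CustodyState {
    /// Current validator custody group count.
    validator_cgc: u64,
    /// Epoch -> CGC that applies from that epoch onwards.
    epoch_validator_custody_requirements: BTreeMap<u64, u64>,
}

impl CustodyState {
    /// Returns `false` and leaves the map untouched when the CGC changed
    /// since the backfill batch was requested.
    fn backfill_validator_custody_requirements(
        &mut self,
        expected_cgc: u64,
        effective_epoch: u64,
    ) -> bool {
        if expected_cgc != self.validator_cgc {
            // A concurrent CGC change happened: do not prune or overwrite.
            return false;
        }
        // Assumed update rule: drop entries at or after `effective_epoch`
        // and record that the current CGC now applies from that epoch.
        self.epoch_validator_custody_requirements
            .retain(|epoch, _| *epoch < effective_epoch);
        self.epoch_validator_custody_requirements
            .insert(effective_epoch, expected_cgc);
        true
    }
}
```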
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Bump gas limit to 60M as part of Fusaka mainnet release.
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>
Fixes #8268
Switch `est_time` from time until DA boundary slot, to time to finish total custody work from the original earliest data-column slot down to the DA boundary
Co-Authored-By: PoulavBhowmick03 <bpoulav@gmail.com>
Update the EF spec tests to v1.6.0-beta.1
There are a few new light client tests (which we pass), and some for progressive containers, which we haven't implemented (we ignore them).
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Partially addresses #8248
Run the beacon chain, http and network tests only for recent forks instead of everything from phase 0.
Also added gloas to the recent forks list. I thought that would be a good way to know if changes in the current fork affect future forks.
Not completely sure if we should run for future forks, but added it so that we can discuss here.
Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Currently the `eth2` crate lib file is a large monolith of almost 3000 lines of code. As part of the bosun migration we are trying to increase code readability and modularity in the lighthouse crates initially, which can then be transferred to bosun.
Co-Authored-By: hopinheimer <knmanas6@gmail.com>
Co-Authored-By: hopinheimer <48147533+hopinheimer@users.noreply.github.com>
https://github.com/sigp/lighthouse/issues/8012
Replace all instances of `VariableList::from` and `FixedVector::from` with their `try_from` variants.
While I tried to use proper error handling in most cases, in certain situations where `try_from` can trivially never fail, adding an `expect` avoided a lot of extra complexity.
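To show the pattern without pulling in the real list types, here is a sketch with a hypothetical `BoundedList` standing in for a length-limited list. `MAX_LEN`, `TooLong`, and `empty_list` are all illustrative names.

```rust
use std::convert::TryFrom;

/// Maximum length of the hypothetical bounded list.
const MAX_LEN: usize = 4;

#[derive(Debug, PartialEq)]
struct BoundedList {
    items: Vec<u8>,
}

#[derive(Debug, PartialEq)]
struct TooLong {
    len: usize,
}

impl TryFrom<Vec<u8>> for BoundedList {
    type Error = TooLong;

    /// Fails instead of silently truncating when the input is too long.
    fn try_from(items: Vec<u8>) -> Result<Self, Self::Error> {
        if items.len() > MAX_LEN {
            Err(TooLong { len: items.len() })
        } else {
            Ok(BoundedList { items })
        }
    }
}

/// A case where failure is trivially impossible, so `expect` with a message
/// stating the invariant is simpler than threading an error type through.
fn empty_list() -> BoundedList {
    BoundedList::try_from(Vec::new()).expect("an empty vec never exceeds MAX_LEN")
}
```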
Co-Authored-By: Mac L <mjladson@pm.me>
Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>