Which issue # does this PR address?
#8586
Please list or describe the changes introduced by this PR.
Remove `service_name` from `TaskExecutor`
Co-Authored-By: Abhivansh <31abhivanshj@gmail.com>
Instrument beacon node startup and parallelise pubkey cache initialisation.
I instrumented beacon node startup and noticed that pubkey cache takes a long time to initialise, mostly due to decompressing all the validator pubkeys.
This PR uses rayon to parallelize the decompression on initial checkpoint sync. The pubkeys are stored uncompressed, so the decopression time is not a problem on subsequent restarts. On restarts, we still deserialize pubkeys, but the timing is quite minimal on Sepolia so I didn't investigate further.
`validator_pubkey_cache_new` timing on Sepolia:
* before: 109.64ms
* with parallelization: 21ms
on Hoodi:
* before: times out with Kurtosis after 120s
* with parallelization: 12.77s to import keys
**UPDATE**: downloading checkpoint state + genesis state takes about 2 minutes on my laptop, so it seems like the BN managed to start the http server just before timing out (after the optimisation).
<img width="1380" height="625" alt="image" src="https://github.com/user-attachments/assets/4c548c14-57dd-4b47-af9a-115b15791940" />
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Take 2 of #8390.
Fixes the race condition properly instead of propagating the error. I think this is a better alternative, and doesn't seem to look that bad.
* Lift node id loading or generation from `NetworkService ` startup to the `ClientBuilder`, so that it can be used to compute custody columns for the beacon chain without waiting for Network bootstrap.
I've considered and implemented a few alternatives:
1. passing `node_id` to beacon chain builder and compute columns when creating `CustodyContext`. This approach isn't good for separation of concerns and isn't great for testability
2. passing `ordered_custody_groups` to beacon chain. `CustodyContext` only uses this to compute ordered custody columns, so we might as well lift this logic out, so we don't have to do error handling in `CustodyContext` construction. Less tests to update;.
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Addressed this comment here: https://github.com/sigp/lighthouse/issues/6837#issuecomment-3509209465
Lighthouse can only checkpoint sync from a server that can serve blob sidecars, which means they need to be at least custdoying 50% of columns (semi-supernodes)
This PR lifts this constraint, as blob sidecar endpoint is getting deprecated in Fulu, and we plan to fetch the checkpoint data columns from peers (#6837)
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Addresses #8218
A simplified version of #8241 for the initial release.
I've tried to minimise the logic change in this PR, although introducing the `NodeCustodyType` enum still result in quite a bit a of diff, but the actual logic change in `CustodyContext` is quite small.
The main changes are in the `CustdoyContext` struct
* ~~combining `validator_custody_count` and `current_is_supernode` fields into a single `custody_group_count_at_head` field. We persist the cgc of the initial cli values into the `custody_group_count_at_head` field and only allow for increase (same behaviour as before).~~
* I noticed the above approach caused a backward compatibility issue, I've [made a fix](15569bc085) and changed the approach slightly (which was actually what I had originally in mind):
* when initialising, only override the `validator_custody_count` value if either flag `--supernode` or `--semi-supernode` is used; otherwise leave it as the existing default `0`. Most other logic remains unchanged.
All existing validator custody unit tests are still all passing, and I've added additional tests to cover semi-supernode, and restoring `CustodyContext` from disk.
Note: I've added a `WARN` if the user attempts to switch to a `--semi-supernode` or `--supernode` - this currently has no effect, but once @eserilev column backfill is merged, we should be able to support this quite easily.
Things to test
- [x] cgc in metadata / enr
- [x] cgc in metrics
- [x] subscribed subnets
- [x] getBlobs endpoint
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Part of #7866
- Continuation of #7921
In the above PR, we enabled rayon for batch KZG verification in chain segment processing. However, using the global rayon thread pool for backfill is likely to create resource contention with higher-priority beacon processor work.
This PR introduces a dedicated low-priority rayon thread pool `LOW_PRIORITY_RAYON_POOL` and uses it for processing backfill chain segments.
This prevents backfill KZG verification from using the global rayon thread pool and competing with high-priority beacon processor tasks for CPU resources.
However, this PR by itself doesn't prevent CPU oversubscription because other tasks could still fill up the global rayon thread pool, and having an extra thread pool could make things worse. To address this we need the beacon
processor to coordinate total CPU allocation across all tasks, which is covered in:
- #7789
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
This PR fixes a bug where wrong columns could get processed immediately after a CGC increase.
Scenario:
- The node's CGC increased due to additional validators attached to it (lets say from 10 to 11)
- The new CGC is advertised and new subnets are subscribed immediately, however the change won't be effective in the data availability check until the next epoch (See [this](ab0e8870b4/beacon_node/beacon_chain/src/validator_custody.rs (L93-L99))). Data availability checker still only require 10 columns for the current epoch.
- During this time, data columns for the additional custody column (lets say column 11) may arrive via gossip as we're already subscribed to the topic, and it may be incorrectly used to satisfy the existing data availability requirement (10 columns), and result in this additional column (instead of a required one) getting persisted, resulting in database inconsistency.
N/A
After the electra fork which includes EIP 6110, the beacon node no longer needs the eth1 bridging mechanism to include new deposits as they are provided by the EL as a `deposit_request`. So after electra + a transition period where the finalized bridge deposits pre-fork are included through the old mechanism, we no longer need the elaborate machinery we had to get deposit contract data from the execution layer.
Since holesky has already forked to electra and completed the transition period, this PR basically checks to see if removing all the eth1 related logic leads to any surprises.
Partially https://github.com/sigp/lighthouse/issues/6291
This PR removes the reprocess event channel from being externally exposed. All work events are now sent through the single `BeaconProcessorSend` channel. I've introduced a new `Work::Reprocess` enum variant which we then use to schedule jobs for reprocess. I've also created a new scheduler module which will eventually house the different scheduler impls.
This is all needed as an initial step to generalize the beacon processor
A "full" implementation for the generalized beacon processor can be found here
https://github.com/sigp/lighthouse/pull/6448
I'm going to try to break up the full implementation into smaller PR's so it can actually be reviewed
Which issue # does this PR address?
Closes#7457
Added `E::slots_per_epoch()` and now it ensures conversion from epochs to slots while calculating deneb time
#6296: Deterministic RNG in peer DAS publish block tests
Made test functions to call publish-block APIs with true for the deterministic RNG boolean parameter while production code with false. This will deterministically shuffle columns for unit tests under broadcast_validation_tests.rs.
* First pass
* Add restrictions to RuntimeVariableList api
* Use empty_uninitialized and fix warnings
* Fix some todos
* Merge branch 'unstable' into max-blobs-preset
* Fix take impl on RuntimeFixedList
* cleanup
* Fix test compilations
* Fix some more tests
* Fix test from unstable
* Merge branch 'unstable' into max-blobs-preset
* Merge remote-tracking branch 'origin/unstable' into max-blobs-preset
* Remove footgun function
* Minor simplifications
* Move from preset to config
* Fix typo
* Revert "Remove footgun function"
This reverts commit de01f923c7.
* Try fixing tests
* Thread through ChainSpec
* Fix release tests
* Move RuntimeFixedVector into module and rename
* Add test
* Remove empty RuntimeVarList awefullness
* Fix tests
* Simplify BlobSidecarListFromRoot
* Merge remote-tracking branch 'origin/unstable' into max-blobs-preset
* Bump quota to account for new target (6)
* Remove clone
* Fix issue from review
* Try to remove ugliness
* Merge branch 'unstable' into max-blobs-preset
* Fix max value
* Fix doctest
* Fix formatting
* Fix max check
* Delete hardcoded max_blobs_per_block in RPC limits
* Merge remote-tracking branch 'origin/unstable' into max-blobs-preset
* Add peerdas KZG library and use it for data column construction and cell kzg verification (#5701, #5941, #6118, #6179)
Co-authored-by: kevaundray <kevtheappdev@gmail.com>
* Update `rust_eth_kzg` crate to published version.
* Update kzg metrics buckets.
* Merge branch 'unstable' into peerdas-kzg
* Update KZG version to fix windows mem allocation.
* Refactor common logic from build sidecar and reconstruction. Remove unnecessary `needless_lifetimes`.
Co-authored-by: realbigsean <sean@sigmaprime.io>
* Copy existing trusted setup into `PeerDASTrustedSetup` for consistency and maintain `--trusted-setup` functionality.
* Merge branch 'unstable' into peerdas-kzg
* Merge branch 'peerdas-kzg' of github.com:jimmygchen/lighthouse into peerdas-kzg
* Merge branch 'unstable' into peerdas-kzg
* Merge branch 'unstable' into peerdas-kzg
* Load PeerDAS KZG only if PeerDAS is enabled.
* Add plumbing for peerdas supernodes (#5050, #5409, #5570, #5966)
- add cli option `--subscribe-to-all-data-columns`
- add custody subnet count to ENR, only if PeerDAS is scheduled
- subscribe to data column topics, only if PeerDAS is scheduled
Co-authored-by: Jacob Kaufmann <jacobkaufmann18@gmail.com>
* Merge branch 'unstable' into das-supernode
* Update CLI docs.
* Merge branch 'unstable' into das-supernode
* Fix fork epoch comparison with `FAR_FUTURE_EPOCH`.
* Merge branch 'unstable' into das-supernode
* Hide `--subscribe-all-data-column-subnets` flag and update help.
* Fix docs only
* Merge branch 'unstable' into das-supernode
* MVP implementation (untested)
* update store checkpoint sync test
* update cli help
* Merge pull request #5253 from realbigsean/checkpoint-blobs-sean
Checkpoint blobs sean
* Warn only if blobs are missing from server
* Merge remote-tracking branch 'origin/unstable' into checkpoint-blobs
* Verify checkpoint blobs
* Move blob verification earlier
* Disallow Syncing From Genesis By Default
* Fix CLI Tests
* Perform checks in the `ClientBuilder`
* Tidy, fix tests
* Return an error based on the Deneb fork
* Fix typos
* Fix failing test
* Add missing CLI flag
* Fix CLI flags
* Add suggestion from Sean
* Fix conflict with blob sidecars epochs
---------
Co-authored-by: Mark Mackey <mark@sigmaprime.io>
* update libp2p and address compiler errors
* remove bandwidth logging from transport
* use libp2p registry
* make clippy happy
* use rust 1.73
* correct rpc keep alive
* remove comments and obsolte code
* remove libp2p prefix
* make clippy happy
* use quic under facade
* remove fast msg id
* bubble up close statements
* fix wrong comment
* use workspace deps in kzg crate
* delete unused blobs dp path field
* full match on fork name in engine api get payload v3
* only accept v3 payloads on get payload v3 endpoint in mock el
* remove FIXMEs related to merge transition tests
* move static tx to test utils
* default max_per_epoch_activation_churn_limit to mainnet value
* remove unnecessary async
* remove comment
* use task executor in `blob_sidecars` endpoint
## Issue Addressed
#4738
## Proposed Changes
See the above issue for details. Went with option #2 to use the async reqwest client in `Eth2NetworkConfig` and propagate the async-ness.
## Issue Addressed
NA
## Proposed Changes
Add the Holesky network config as per 36e4ff2d51/custom_config_data.
Since the genesis state is ~190MB, I've opted to *not* include it in the binary and instead download it at runtime (see #4564 for context). To download this file we have:
- A hard-coded URL for a SigP-hosted S3 bucket with the Holesky genesis state. Assuming this download works correctly, users will be none the wiser that the state wasn't included in the binary (apart from some additional logs)
- If the user provides a `--checkpoint-sync-url` flag, then LH will download the genesis state from that server rather than our S3 bucket.
- If the user provides a `--genesis-state-url` flag, then LH will download the genesis state from that server regardless of the S3 bucket or `--checkpoint-sync-url` flag.
- Whenever a genesis state is downloaded it is checked against a checksum baked into the binary.
- A genesis state will never be downloaded if it's already included in the binary.
- There is a `--genesis-state-url-timeout` flag to tweak the timeout for downloading the genesis state file.
## Log Output
Example of log output when a state is downloaded:
```bash
Aug 23 05:40:13.424 INFO Logging to file path: "/Users/paul/.lighthouse/holesky/beacon/logs/beacon.log"
Aug 23 05:40:13.425 INFO Lighthouse started version: Lighthouse/v4.3.0-bd9931f+
Aug 23 05:40:13.425 INFO Configured for network name: holesky
Aug 23 05:40:13.426 INFO Data directory initialised datadir: /Users/paul/.lighthouse/holesky
Aug 23 05:40:13.427 INFO Deposit contract address: 0x4242424242424242424242424242424242424242, deploy_block: 0
Aug 23 05:40:13.427 INFO Downloading genesis state info: this may take some time on testnets with large validator counts, timeout: 60s, server: https://sigp-public-genesis-states.s3.ap-southeast-2.amazonaws.com/
Aug 23 05:40:29.895 INFO Starting from known genesis state service: beacon
```
Example of log output when there are no URLs specified:
```
Aug 23 06:29:51.645 INFO Logging to file path: "/Users/paul/.lighthouse/goerli/beacon/logs/beacon.log"
Aug 23 06:29:51.646 INFO Lighthouse started version: Lighthouse/v4.3.0-666a39c+
Aug 23 06:29:51.646 INFO Configured for network name: goerli
Aug 23 06:29:51.647 INFO Data directory initialised datadir: /Users/paul/.lighthouse/goerli
Aug 23 06:29:51.647 INFO Deposit contract address: 0xff50ed3d0ec03ac01d4c79aad74928bff48a7b2b, deploy_block: 4367322
The genesis state is not present in the binary and there are no known download URLs. Please use --checkpoint-sync-url or --genesis-state-url.
```
## Additional Info
I tested the `--genesis-state-url` flag with all 9 Goerli checkpoint sync servers on https://eth-clients.github.io/checkpoint-sync-endpoints/ and they all worked 🎉
My IDE eagerly formatted some `Cargo.toml`. I've disabled it but I don't see the value in spending time reverting the changes that are already there.
I also added the `GenesisStateBytes` enum to avoid an unnecessary clone on the genesis state bytes baked into the binary. This is not a huge deal on Mainnet, but will become more relevant when testing with big genesis states.
When we do a fresh checkpoint sync we're downloading the genesis state to check the `genesis_validators_root` against the finalised state we receive. This is not *entirely* pointless, since we verify the checksum when we download the genesis state so we are actually guaranteeing that the finalised state is on the same network. There might be a smarter/less-download-y way to go about this, but I've run out of cycles to figure that out. Perhaps we can grab it in the next release?
## Issue Addressed
Fixes a bug in the handling of `--beacon-process-max-workers` which caused it to have no effect.
## Proposed Changes
For this PR I channeled @ethDreamer and saw deep into the faulty CLI config -- this bug is almost identical to the one Mark found and fixed in #4622.
## Issue Addressed
Closes#3210Closes#3211
## Proposed Changes
- Checkpoint sync from the latest finalized state regardless of its alignment.
- Add the `block_root` to the database's split point. This is _only_ added to the in-memory split in order to avoid a schema migration. See `load_split`.
- Add a new method to the DB called `get_advanced_state`, which looks up a state _by block root_, with a `state_root` as fallback. Using this method prevents accidental accesses of the split's unadvanced state, which does not exist in the hot DB and is not guaranteed to exist in the freezer DB at all. Previously Lighthouse would look up this state _from the freezer DB_, even if it was required for block/attestation processing, which was suboptimal.
- Replace several state look-ups in block and attestation processing with `get_advanced_state` so that they can't hit the split block's unadvanced state.
- Do not store any states in the freezer database by default. All states will be deleted upon being evicted from the hot database unless `--reconstruct-historic-states` is set. The anchor info which was previously used for checkpoint sync is used to implement this, including when syncing from genesis.
## Additional Info
Needs further testing. I want to stress-test the pruned database under Hydra.
The `get_advanced_state` method is intended to become more relevant over time: `tree-states` includes an identically named method that returns advanced states from its in-memory cache.
Co-authored-by: realbigsean <seananderson33@gmail.com>