Which issue # does this PR address?
#8586
Please list or describe the changes introduced by this PR.
Remove `service_name` from `TaskExecutor`
Co-Authored-By: Abhivansh <31abhivanshj@gmail.com>
Instrument beacon node startup and parallelise pubkey cache initialisation.
I instrumented beacon node startup and noticed that the pubkey cache takes a long time to initialise, mostly due to decompressing all the validator pubkeys.
This PR uses rayon to parallelise the decompression on initial checkpoint sync. The pubkeys are stored uncompressed, so decompression time is not a problem on subsequent restarts. On restarts we still deserialise pubkeys, but the timing is quite minimal on Sepolia, so I didn't investigate further.
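For illustration, a minimal sketch of the approach, assuming a hypothetical `decompress_pubkey` helper standing in for the BLS library's point decompression:

```rust
use rayon::prelude::*;

// Stand-in for the BLS library's 48-byte compressed -> uncompressed G1
// decompression; the helper name and signature are illustrative only.
fn decompress_pubkey(compressed: &[u8; 48]) -> Result<Vec<u8>, String> {
    Ok(compressed.to_vec()) // placeholder body
}

// Decompress all cached pubkeys across rayon's worker threads instead of
// one at a time on a single thread.
fn decompress_all(compressed: &[[u8; 48]]) -> Result<Vec<Vec<u8>>, String> {
    compressed.par_iter().map(decompress_pubkey).collect()
}
```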
`validator_pubkey_cache_new` timing on Sepolia:
* before: 109.64ms
* with parallelisation: 21ms
on Hoodi:
* before: times out with Kurtosis after 120s
* with parallelisation: 12.77s to import keys
**UPDATE**: downloading checkpoint state + genesis state takes about 2 minutes on my laptop, so it seems like the BN managed to start the http server just before timing out (after the optimisation).
<img width="1380" height="625" alt="image" src="https://github.com/user-attachments/assets/4c548c14-57dd-4b47-af9a-115b15791940" />
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Take 2 of #8390.
Fixes the race condition properly instead of propagating the error. I think this is the better alternative, and the resulting diff doesn't look that bad.
* Lift node id loading or generation from `NetworkService` startup into the `ClientBuilder`, so that it can be used to compute custody columns for the beacon chain without waiting for network bootstrap (sketched after the list below).
I've considered and implemented a few alternatives:
1. passing `node_id` to the beacon chain builder and computing columns when creating `CustodyContext`. This approach isn't good for separation of concerns and isn't great for testability.
2. passing `ordered_custody_groups` to the beacon chain. `CustodyContext` only uses this to compute ordered custody columns, so we might as well lift this logic out, so we don't have to do error handling in `CustodyContext` construction. Fewer tests to update.
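A rough sketch of the chosen shape; all names here are illustrative, not the actual Lighthouse APIs:

```rust
// Illustrative node id type; the real code uses the network crate's NodeId.
struct NodeId([u8; 32]);

fn load_or_generate_node_id() -> NodeId {
    // Real code loads the persisted network key from the datadir, or
    // generates and persists a new one.
    NodeId([0u8; 32])
}

fn ordered_custody_columns(_node_id: &NodeId, cgc: u64) -> Vec<u64> {
    // Placeholder for the spec's deterministic custody-group computation.
    (0..cgc).collect()
}

fn build_client() {
    // The ClientBuilder resolves the node id up front...
    let node_id = load_or_generate_node_id();
    // ...so custody columns are known before the network service starts,
    // and can be handed to the beacon chain builder / CustodyContext.
    let _columns = ordered_custody_columns(&node_id, 4);
}
```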
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Addressed this comment here: https://github.com/sigp/lighthouse/issues/6837#issuecomment-3509209465
Lighthouse can only checkpoint sync from a server that can serve blob sidecars, which means the server needs to custody at least 50% of columns (i.e. be a semi-supernode).
This PR lifts this constraint, as the blob sidecar endpoint is being deprecated in Fulu and we plan to fetch the checkpoint data columns from peers instead (#6837).
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Fixes #8268
Switch `est_time` from the time until the DA boundary slot to the estimated time to finish the total custody work, from the original earliest data-column slot down to the DA boundary.
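As a sketch of the new estimate, assuming a measured processing rate (names and signature are illustrative):

```rust
use std::time::Duration;

// `est_time` now covers the whole job: every slot from the original earliest
// data-column slot down to the DA boundary, divided by the observed rate.
fn est_time(
    earliest_data_column_slot: u64,
    da_boundary_slot: u64,
    slots_per_sec: f64,
) -> Duration {
    let total_slots = earliest_data_column_slot.saturating_sub(da_boundary_slot);
    Duration::from_secs_f64(total_slots as f64 / slots_per_sec)
}
```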
Co-Authored-By: PoulavBhowmick03 <bpoulav@gmail.com>
Addresses #8218
A simplified version of #8241 for the initial release.
I've tried to minimise the logic change in this PR; introducing the `NodeCustodyType` enum still results in quite a bit of diff, but the actual logic change in `CustodyContext` is quite small.
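For context, a sketch of roughly what such an enum looks like (the variant names are my assumption, not necessarily the PR's):

```rust
// Illustrative variants; the real enum may name or shape these differently.
enum NodeCustodyType {
    /// Default: custody the minimum, plus any validator-driven increase.
    Fullnode,
    /// Custody at least 50% of columns, enough to serve checkpoint-sync blobs.
    SemiSupernode,
    /// Custody and serve all columns.
    Supernode,
}
```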
The main changes are in the `CustodyContext` struct:
* ~~combining `validator_custody_count` and `current_is_supernode` fields into a single `custody_group_count_at_head` field. We persist the cgc of the initial cli values into the `custody_group_count_at_head` field and only allow for increase (same behaviour as before).~~
* I noticed the above approach caused a backward compatibility issue; I've [made a fix](15569bc085) and changed the approach slightly (which was actually what I had originally in mind):
* when initialising, only override the `validator_custody_count` value if either the `--supernode` or `--semi-supernode` flag is used; otherwise leave it at the existing default of `0`. Most other logic remains unchanged.
All existing validator custody unit tests still pass, and I've added tests to cover semi-supernodes and restoring `CustodyContext` from disk.
Note: I've added a `WARN` if the user attempts to switch to `--semi-supernode` or `--supernode` - switching currently has no effect, but once @eserilev's column backfill is merged, we should be able to support this quite easily.
Things to test
- [x] cgc in metadata / enr
- [x] cgc in metrics
- [x] subscribed subnets
- [x] getBlobs endpoint
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
#7603
#### Custody backfill sync service
Similar in many ways to the current backfill service. There may be ways to unify the two services. The difficulty there is that the current backfill service tightly couples blocks and their associated blobs/data columns. Any attempts to unify the two services should be left to a separate PR in my opinion.
#### `SyncNetworkContext`
`SyncNetworkContext` manages custody sync data-columns-by-range requests separately from other sync RPC requests. I think this is a nice separation, considering that custody backfill is its own service.
#### Data column import logic
The import logic verifies KZG commitments, and that each data column's block root matches a block root in the node's store, before importing columns.
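A minimal sketch of those checks, using hypothetical types and helpers (the real code lives in the data column import path):

```rust
// Illustrative shapes only.
struct DataColumn {
    block_root: [u8; 32],
    // cells, commitments, proofs, ...
}

enum ImportError {
    InvalidKzgProof,
    UnknownBlockRoot,
}

trait Store {
    fn has_block_root(&self, root: &[u8; 32]) -> bool;
    fn put_data_column(&self, column: DataColumn) -> Result<(), ImportError>;
}

fn verify_kzg_proofs(_column: &DataColumn) -> Result<(), ImportError> {
    // Placeholder: real code batch-verifies cell KZG proofs against the
    // block's commitments.
    Ok(())
}

fn import_data_column<S: Store>(column: DataColumn, store: &S) -> Result<(), ImportError> {
    // 1. Every cell in the column must pass KZG verification.
    verify_kzg_proofs(&column)?;
    // 2. The column's block root must match a block the store already has.
    if !store.has_block_root(&column.block_root) {
        return Err(ImportError::UnknownBlockRoot);
    }
    store.put_data_column(column)
}
```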
#### New channel to send messages to `SyncManager`
Now external services can communicate with the `SyncManager`. In this PR this channel is used to trigger a custody sync. Alternatively we may be able to use the existing `mpsc` channel that the `SyncNetworkContext` uses to communicate with the `SyncManager`. I will spend some time reviewing this.
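A minimal sketch of the shape of such a channel, assuming tokio's `mpsc` and an illustrative message type:

```rust
use tokio::sync::mpsc;

// Illustrative message type; the real enum and variant names may differ.
enum ExternalSyncMessage {
    TriggerCustodyBackfill,
}

// The SyncManager owns the receiver; external services clone the sender.
fn spawn_sync_manager() -> mpsc::UnboundedSender<ExternalSyncMessage> {
    let (tx, mut rx) = mpsc::unbounded_channel();
    tokio::spawn(async move {
        while let Some(msg) = rx.recv().await {
            match msg {
                ExternalSyncMessage::TriggerCustodyBackfill => {
                    // ...kick off the custody backfill sync...
                }
            }
        }
    });
    tx
}
```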
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
Temporary stopgap for #7002
Downgrade light client errors to debug
We should eventually fix our light client objects so they can contain data from across forks.
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Part of #7866
- Continuation of #7921
In the above PR, we enabled rayon for batch KZG verification in chain segment processing. However, using the global rayon thread pool for backfill is likely to create resource contention with higher-priority beacon processor work.
This PR introduces a dedicated low-priority rayon thread pool `LOW_PRIORITY_RAYON_POOL` and uses it for processing backfill chain segments.
This prevents backfill KZG verification from using the global rayon thread pool and competing with high-priority beacon processor tasks for CPU resources.
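Roughly, the pattern looks like this; the pool sizing and naming here are illustrative assumptions:

```rust
use std::sync::LazyLock;
use rayon::ThreadPool;

// A dedicated pool, sized smaller than the global one so backfill work
// can't monopolise the CPU. The sizing policy shown is an assumption.
static LOW_PRIORITY_RAYON_POOL: LazyLock<ThreadPool> = LazyLock::new(|| {
    rayon::ThreadPoolBuilder::new()
        .num_threads(std::thread::available_parallelism().map_or(1, |n| (n.get() / 2).max(1)))
        .thread_name(|i| format!("low_priority_rayon_{i}"))
        .build()
        .expect("failed to build low-priority rayon pool")
});

fn verify_backfill_batch() {
    // `install` runs the closure on the dedicated pool; nested par_iter
    // calls inside it use this pool's threads, not the global rayon pool.
    LOW_PRIORITY_RAYON_POOL.install(|| {
        // ...batch KZG verification with rayon parallel iterators...
    });
}
```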
However, this PR by itself doesn't prevent CPU oversubscription, because other tasks could still fill up the global rayon thread pool, and having an extra thread pool could make things worse. To address this we need the beacon processor to coordinate total CPU allocation across all tasks, which is covered in:
- #7789
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>
Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>
Adds the required boilerplate code for the Gloas (Glamsterdam) hard fork. This allows PRs testing Gloas-candidate features to test fork transition.
This also includes de-duplication of post-Bellatrix readiness notifiers from #6797 (credit to @dapplion)
This PR fixes a bug where wrong columns could get processed immediately after a CGC increase.
Scenario:
- The node's CGC increased due to additional validators attached to it (let's say from 10 to 11)
- The new CGC is advertised and new subnets are subscribed immediately; however, the change won't be effective in the data availability check until the next epoch (see [this](ab0e8870b4/beacon_node/beacon_chain/src/validator_custody.rs (L93-L99))). The data availability checker still only requires 10 columns for the current epoch.
- During this time, a data column for the additional custody column (let's say column 11) may arrive via gossip, as we're already subscribed to the topic. It may then be incorrectly used to satisfy the existing data availability requirement (10 columns), causing this additional column (instead of a required one) to be persisted and leaving the database inconsistent. The fix is sketched below.
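The fix boils down to only letting a column satisfy availability if its index is within the custody set effective at that epoch. A sketch with hypothetical names:

```rust
// Hypothetical context type; the real logic lives in `CustodyContext` and
// the data availability checker.
struct CustodyCtx;

impl CustodyCtx {
    // CGC effective at `epoch`: an increase only applies from the next epoch.
    fn effective_cgc_at(&self, _epoch: u64) -> u64 {
        10
    }
    fn ordered_custody_columns(&self, cgc: u64) -> Vec<u64> {
        (0..cgc).collect() // placeholder for the deterministic computation
    }
}

// A gossiped column may only count towards data availability if its index
// is custodied under the CGC effective at the column's epoch.
fn may_satisfy_da(ctx: &CustodyCtx, column_index: u64, epoch: u64) -> bool {
    let cgc = ctx.effective_cgc_at(epoch);
    ctx.ordered_custody_columns(cgc).contains(&column_index)
}
```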
N/A
After the electra fork, which includes EIP-6110, the beacon node no longer needs the eth1 bridging mechanism to include new deposits, as they are provided by the EL as `deposit_request`s. So after electra, plus a transition period in which the finalized pre-fork bridge deposits are included through the old mechanism, we no longer need the elaborate machinery we had for getting deposit contract data from the execution layer.
Since holesky has already forked to electra and completed the transition period, this PR basically checks whether removing all the eth1-related logic leads to any surprises.
Partially https://github.com/sigp/lighthouse/issues/6291
This PR removes the reprocess event channel from being externally exposed. All work events are now sent through the single `BeaconProcessorSend` channel. I've introduced a new `Work::Reprocess` enum variant which we then use to schedule jobs for reprocess. I've also created a new scheduler module which will eventually house the different scheduler impls.
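Schematically (types abbreviated; the real `Work` enum has many more variants):

```rust
// e.g. a gossip block queued until its parent arrives; illustrative only.
struct ReprocessEvent;

enum Work {
    Reprocess(ReprocessEvent),
    // ...all the existing beacon processor work variants...
}

// Producers no longer need a separate reprocess channel; they wrap the
// event in `Work::Reprocess` and send it through the one
// `BeaconProcessorSend` like any other work event.
```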
This is all needed as an initial step to generalize the beacon processor.
A "full" implementation of the generalized beacon processor can be found here:
https://github.com/sigp/lighthouse/pull/6448
I'm going to try to break up the full implementation into smaller PRs so it can actually be reviewed.
Which issue # does this PR address?
Closes #7457
Added `E::slots_per_epoch()` to ensure correct conversion from epochs to slots when calculating the deneb time.
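The fix guards against the classic epoch/slot mix-up. A minimal example of the computation:

```rust
// fork timestamp = genesis_time + fork_epoch * slots_per_epoch * seconds_per_slot.
// Dropping the slots_per_epoch factor (treating epochs as slots) would put
// the deneb time 32x too early on mainnet.
fn fork_timestamp(
    genesis_time: u64,
    fork_epoch: u64,
    slots_per_epoch: u64,
    seconds_per_slot: u64,
) -> u64 {
    genesis_time + fork_epoch * slots_per_epoch * seconds_per_slot
}
```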
#6296: Deterministic RNG in peer DAS publish block tests
Made the test functions call the publish-block APIs with `true` for the deterministic-RNG boolean parameter, while production code passes `false`. This deterministically shuffles columns for the unit tests in `broadcast_validation_tests.rs`.
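The pattern, sketched with an illustrative signature (the real parameter threading differs):

```rust
use rand::rngs::StdRng;
use rand::SeedableRng;

// Tests pass `true` for a fixed-seed RNG (reproducible column shuffles);
// production passes `false` for an entropy-seeded one. The seed value is
// arbitrary and illustrative.
fn column_shuffle_rng(deterministic: bool) -> StdRng {
    if deterministic {
        StdRng::seed_from_u64(42)
    } else {
        StdRng::from_entropy()
    }
}
```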
N/A
2 changes:
1. Replace `Option::map_or(true, ...)` with `is_none_or(...)` (example below)
2. Remove unnecessary `Into::into` calls where the type conversion is apparent from the types
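For example:

```rust
fn main() {
    let opt: Option<u64> = None;
    // Before: `map_or` with a `true` default for the `None` case.
    let before = opt.map_or(true, |x| x > 3);
    // After: `is_none_or` (stable since Rust 1.82) states the intent directly.
    let after = opt.is_none_or(|x| x > 3);
    assert_eq!(before, after);
}
```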
* First pass
* Add restrictions to RuntimeVariableList api
* Use empty_uninitialized and fix warnings
* Fix some todos
* Merge branch 'unstable' into max-blobs-preset
* Fix take impl on RuntimeFixedList
* cleanup
* Fix test compilations
* Fix some more tests
* Fix test from unstable
* Merge branch 'unstable' into max-blobs-preset
* Merge remote-tracking branch 'origin/unstable' into max-blobs-preset
* Remove footgun function
* Minor simplifications
* Move from preset to config
* Fix typo
* Revert "Remove footgun function"
This reverts commit de01f923c7.
* Try fixing tests
* Thread through ChainSpec
* Fix release tests
* Move RuntimeFixedVector into module and rename
* Add test
* Remove empty RuntimeVarList awfulness
* Fix tests
* Simplify BlobSidecarListFromRoot
* Merge remote-tracking branch 'origin/unstable' into max-blobs-preset
* Bump quota to account for new target (6)
* Remove clone
* Fix issue from review
* Try to remove ugliness
* Merge branch 'unstable' into max-blobs-preset
* Fix max value
* Fix doctest
* Fix formatting
* Fix max check
* Delete hardcoded max_blobs_per_block in RPC limits
* Merge remote-tracking branch 'origin/unstable' into max-blobs-preset
* add fork_name_enabled fn to ForkName impl
* refactor codebase to use new fork_enabled fn
* fmt
* Merge branch 'unstable' of https://github.com/sigp/lighthouse into fork-ord-impl
* small code cleanup
* resolve merge conflicts
* fix beacon chain test
* merge conflicts
* fix ef test issue
* resolve merge conflicts
* Add peerdas KZG library and use it for data column construction and cell kzg verification (#5701, #5941, #6118, #6179)
Co-authored-by: kevaundray <kevtheappdev@gmail.com>
* Update `rust_eth_kzg` crate to published version.
* Update kzg metrics buckets.
* Merge branch 'unstable' into peerdas-kzg
* Update KZG version to fix windows mem allocation.
* Refactor common logic from build sidecar and reconstruction. Remove unnecessary `needless_lifetimes`.
Co-authored-by: realbigsean <sean@sigmaprime.io>
* Copy existing trusted setup into `PeerDASTrustedSetup` for consistency and maintain `--trusted-setup` functionality.
* Merge branch 'unstable' into peerdas-kzg
* Merge branch 'peerdas-kzg' of github.com:jimmygchen/lighthouse into peerdas-kzg
* Merge branch 'unstable' into peerdas-kzg
* Merge branch 'unstable' into peerdas-kzg
* Load PeerDAS KZG only if PeerDAS is enabled.
* Add plumbing for peerdas supernodes (#5050, #5409, #5570, #5966)
- add cli option `--subscribe-to-all-data-columns`
- add custody subnet count to ENR, only if PeerDAS is scheduled
- subscribe to data column topics, only if PeerDAS is scheduled
Co-authored-by: Jacob Kaufmann <jacobkaufmann18@gmail.com>
* Merge branch 'unstable' into das-supernode
* Update CLI docs.
* Merge branch 'unstable' into das-supernode
* Fix fork epoch comparison with `FAR_FUTURE_EPOCH`.
* Merge branch 'unstable' into das-supernode
* Hide `--subscribe-all-data-column-subnets` flag and update help.
* Fix docs only
* Merge branch 'unstable' into das-supernode