lighthouse

mirror of https://github.com/sigp/lighthouse.git synced 2026-04-28 02:03:32 +00:00

Author	SHA1	Message	Date
Pawan Dhananjay	658163cfde	Testing	2026-01-19 15:21:04 -08:00
Mac L	3903e1c67f	More `consensus/types` re-export cleanup (#8665 ) Remove more of the temporary re-exports from `consensus/types` Co-Authored-By: Mac L <mjladson@pm.me>	2026-01-16 04:43:05 +00:00
Mac L	1abc41e337	Cleanup `consensus/types` re-exports (#8643 ) Removes some of the temporary re-exports in `consensus/types`. I am doing this in multiple parts to keep each diff small. Co-Authored-By: Mac L <mjladson@pm.me>	2026-01-15 02:23:55 +00:00
Mac L	605ef8e8e6	Remove `state` dependency from `core` module in `consensus/types` (#8653 ) #8652 - This removes instances of `BeaconStateError` from `eth_spec.rs`, and replaces them directly with `ArithError` which can be trivially converted back to `BeaconStateError` at the call site. - Also moves the state related methods on `ChainSpec` to be methods on `BeaconState` instead. I think this might be a more natural place for them to exist anyway. Co-Authored-By: Mac L <mjladson@pm.me>	2026-01-15 02:16:40 +00:00
Pawan Dhananjay	c91345782a	Get blobs v2 metrics (#8641 ) N/A Add standardized metrics for getBlobsV2 from https://github.com/ethereum/beacon-metrics/pull/14. Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>	2026-01-13 07:50:40 +00:00
Pawan Dhananjay	57bbc93d75	Update buckets for metric (#8651 ) N/A The `beacon_data_column_sidecar_computation_seconds` used to record the full kzg proof generation times before we changed getBlobsV2 to just return the full proofs + cells. This metric should be taking way less time than 100ms which was the minimum bucket previously. Update the metric to use the default buckets for better granularity. Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>	2026-01-13 05:58:34 +00:00
Jimmy Chen	dbe474e132	Delete attester cache (#8469 ) Fixes attester cache write lock contention. Alternative to #8463. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2026-01-06 03:08:02 +00:00
ethDreamer	a39e991557	Gloas(EIP-7732): Containers / Constants (#7923 ) * #7850 This is the first round of the conga line! 🎉 Just spec constants and container changes so far. Co-Authored-By: shane-moore <skm1790@gmail.com> Co-Authored-By: Mark Mackey <mark@sigmaprime.io> Co-Authored-By: Shane K Moore <41407272+shane-moore@users.noreply.github.com> Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com> Co-Authored-By: ethDreamer <37123614+ethDreamer@users.noreply.github.com> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io> Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-12-16 06:45:45 +00:00
chonghe	86c2b7cfbe	Append client version info to graffiti (#7558 ) * #7201 Co-Authored-By: Tan Chee Keong <tanck@sigmaprime.io> Co-Authored-By: chonghe <44791194+chong-he@users.noreply.github.com> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io> Co-Authored-By: Tan Chee Keong <tanck2005@gmail.com>	2025-12-16 03:19:28 +00:00
Andrés David Ramírez Chiquillo	49e1112da2	Add regression test for unaligned checkpoint sync with payload pruning (#8458 ) Closes #8426 Added a new regression test: `reproduction_unaligned_checkpoint_sync_pruned_payload`. This test reproduces the bug where unaligned checkpoint syncs (skipped slots at epoch boundaries) fail to import the anchor block's execution payload when `prune_payloads` is enabled. The test simulates the failure mode by: - Skipping if execution payloads are not applicable. - Creating a harness with an unaligned checkpoint (gap of 3 slots). - Configuring the client with prune_payloads = true. It asserts that the Beacon Chain builds successfully (previously it panicked with `MissingFullBlockExecutionPayloadPruned`), confirming the fix logic in `try_get_full_block`. Co-Authored-By: Andrurachi <andruvrch@gmail.com> Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>	2025-12-15 02:33:29 +00:00
Eitan Seri-Levi	556e917092	Rust 1.92 lints (#8567 ) Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>	2025-12-12 08:45:38 +00:00
Mac L	f3fd1f210b	Remove `consensus/types` re-exports (#8540 ) There are certain crates which we re-export within `types` which creates a fragmented DevEx, where there are various ways to import the same crates. ```rust // consensus/types/src/lib.rs pub use bls::{ AggregatePublicKey, AggregateSignature, Error as BlsError, Keypair, PUBLIC_KEY_BYTES_LEN, PublicKey, PublicKeyBytes, SIGNATURE_BYTES_LEN, SecretKey, Signature, SignatureBytes, get_withdrawal_credentials, }; pub use context_deserialize::{ContextDeserialize, context_deserialize}; pub use fixed_bytes::FixedBytesExtended; pub use milhouse::{self, List, Vector}; pub use ssz_types::{BitList, BitVector, FixedVector, VariableList, typenum, typenum::Unsigned}; pub use superstruct::superstruct; ``` This PR removes these re-exports and makes it explicit that these types are imported from a non-`consensus/types` crate. Co-Authored-By: Mac L <mjladson@pm.me>	2025-12-09 07:13:41 +00:00
Mac L	7bfcc03520	Reduce `eth2` dependency space (#8524 ) Remove certain dependencies from `eth2`, and feature-gate others which are only used by certain endpoints. \| Removed \| Optional \| Dev only \| \| -------- \| -------- \| -------- \| \| `either` `enr` `libp2p-identity` `multiaddr` \| `protoarray` `eth2_keystore` `eip_3076` `zeroize` `reqwest-eventsource` `futures` `futures-util` \| `rand` `test_random_derive` \| This is done by adding an `events` feature which enables the events endpoint and its associated dependencies. The `lighthouse` feature also enables its associated dependencies making them optional. The networking-adjacent dependencies were removed by just having certain fields use a `String` instead of an explicit network type. This means the user should handle conversion at the call site instead. This is a bit spicy, but I believe `PeerId`, `Enr` and `Multiaddr` are easily converted to and from `String`s so I think it's fine and reduces our dependency space by a lot. The alternative is to feature gate these types behind a `network` feature instead. Co-Authored-By: Mac L <mjladson@pm.me>	2025-12-08 05:37:23 +00:00
Mac L	4e958a92d3	Refactor `consensus/types` (#7827 ) Organize and categorize `consensus/types` into modules based on their relation to key consensus structures/concepts. This is a precursor to a sensible public interface. While this refactor is very opinionated, I am open to suggestions on module names, or type groupings if my current ones are inappropriate. Co-Authored-By: Mac L <mjladson@pm.me>	2025-12-04 09:28:52 +00:00
0xMushow	4fbe517491	Fix data columns sorting when reconstructing blobs (#8510 ) Closes https://github.com/sigp/lighthouse/issues/8509 Co-Authored-By: Antoine James <antoine@ethereum.org>	2025-12-02 03:06:29 +00:00
Jimmy Chen	7cee5d6090	Optimise pubkey cache initialisation during beacon node startup (#8451 ) Instrument beacon node startup and parallelise pubkey cache initialisation. I instrumented beacon node startup and noticed that pubkey cache takes a long time to initialise, mostly due to decompressing all the validator pubkeys. This PR uses rayon to parallelize the decompression on initial checkpoint sync. The pubkeys are stored uncompressed, so the decompression time is not a problem on subsequent restarts. On restarts, we still deserialize pubkeys, but the timing is quite minimal on Sepolia so I didn't investigate further. `validator_pubkey_cache_new` timing on Sepolia: * before: 109.64ms * with parallelization: 21ms on Hoodi: * before: times out with Kurtosis after 120s * with parallelization: 12.77s to import keys UPDATE: downloading checkpoint state + genesis state takes about 2 minutes on my laptop, so it seems like the BN managed to start the http server just before timing out (after the optimisation). <img width="1380" height="625" alt="image" src="https://github.com/user-attachments/assets/4c548c14-57dd-4b47-af9a-115b15791940" /> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2025-11-28 04:30:49 +00:00
Michael Sproul	e21a433748	Allow manual checkpoint sync without blobs (#8470 ) Since merging this PR, we don't need `--checkpoint-blobs`, even prior to Fulu: - https://github.com/sigp/lighthouse/pull/8417 This PR removes the mandatory check for blobs prior to Fulu, enabling simpler manual checkpoint sync. Co-Authored-By: Michael Sproul <michael@sigmaprime.io> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>	2025-11-26 23:00:21 +00:00
Michael Sproul	0d0232e8fc	Optimise out block header calculation (#8446 ) This is a `tracing`-driven optimisation. While investigating why Lighthouse is slow to send `newPayload`, I found a suspicious 13ms of computation on the hot path in `gossip_block_into_execution_pending_block_slashable`: <img width="1998" height="1022" alt="headercalc" src="https://github.com/user-attachments/assets/e4f88c1a-da23-47b4-b533-cf5479a1c55c" /> Looking at the current implementation we can see that the _only_ thing that happens prior to calling into `from_gossip_verified_block` is the calculation of a `header`. We first call `SignatureVerifiedBlock::from_gossip_verified_block_check_slashable`: `261322c3e3/beacon_node/beacon_chain/src/block_verification.rs (L1075-L1076)` Which is where the `header` is calculated prior to calling `from_gossip_verified_block`: `261322c3e3/beacon_node/beacon_chain/src/block_verification.rs (L1224-L1226)` Notice that the `header` is _only_ used in the case of an error, yet we spend time computing it every time! This PR moves the calculation of the header (which involves hashing the whole beacon block, including the execution payload), into the error case. We take a cheap clone of the `Arc`'d beacon block on the hot path, and use this for calculating the header _only_ in the case an error actually occurs. This shaves 10-20ms off our pre-newPayload delays, and 10-20ms off every block processing 🎉 Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-11-24 05:25:46 +00:00
Michael Sproul	261322c3e3	Merge remote-tracking branch 'origin/stable' into unstable	2025-11-20 13:04:32 +11:00
Lion - dapplion	74b8c02630	Reimport the checkpoint sync block (#8417 ) We want to not require checkpoint sync starts to include the required custody data columns, and instead fetch them from p2p. Closes https://github.com/sigp/lighthouse/issues/6837 The checkpoint sync slot can: 1. Be the first slot in the epoch, such that the epoch of the block == the start checkpoint epoch 2. Be in an epoch prior to the start checkpoint epoch In both cases backfill sync already fetches that epoch worth of blocks with current code. This PR modifies the backfill import filter function to allow to re-importing the oldest block slot in the DB. I feel this solution is sufficient unless I'm missing something. ~~I have not tested this yet!~~ Michael has tested this and it works. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com> Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-11-19 11:00:38 +00:00
Jimmy Chen	af1d9b9991	Fix custody context initialization race condition that caused panic (#8391 ) Take 2 of #8390. Fixes the race condition properly instead of propagating the error. I think this is a better alternative, and doesn't seem to look that bad. * Lift node id loading or generation from `NetworkService` startup to the `ClientBuilder`, so that it can be used to compute custody columns for the beacon chain without waiting for Network bootstrap. I've considered and implemented a few alternatives: 1. passing `node_id` to beacon chain builder and compute columns when creating `CustodyContext`. This approach isn't good for separation of concerns and isn't great for testability 2. passing `ordered_custody_groups` to beacon chain. `CustodyContext` only uses this to compute ordered custody columns, so we might as well lift this logic out, so we don't have to do error handling in `CustodyContext` construction. Fewer tests to update. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2025-11-17 05:23:12 +00:00
Michael Sproul	01a654bfa8	Fix tracing span for execution payload verif (#8419 ) Fix the span on execution payload verification (newPayload), by creating a new span rather than using the parent span. Using the parent span was incorrectly associating the time spent verifying the payload with `from_signature_verified_components`. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-11-17 02:04:46 +00:00
Lion - dapplion	53e73fa376	Remove duplicate state in ProtoArray (#8324 ) Part of a fork-choice tech debt clean-up https://github.com/sigp/lighthouse/issues/8325 https://github.com/sigp/lighthouse/issues/7089 (non-finalized checkpoint sync) changes the meaning of the checkpoints inside fork-choice. It turns out that we persist the justified and finalized checkpoints twice in fork-choice 1. Inside the fork-choice store 2. Inside the proto-array There's no reason for 2. except for making the function signature of some methods smaller. It's not consistent with the rest of the crate, because in some functions we pass the external variable of time (current_slot) via args, but then read the finalized checkpoint from the internal state. Passing both variables as args makes fork-choice easier to reason about at the cost of a few extra lines. Remove the unnecessary state (`justified_checkpoint`, `finalized_checkpoint`) inside `ProtoArray`, to make it easier to reason about. Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com> Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>	2025-11-12 03:42:17 +00:00
Michael Sproul	f854afa352	Prevent unnecessary state advances pre-Fulu (#8388 ) State advances were observed as especially slow on pre-Fulu networks (mainnet). The reason being: we were doing an extra epoch of state advance because of code that should only have been running after Fulu, when proposer shufflings are determined with lookahead. Only attempt to cache the _next epoch_ shuffling if the state's slot determines it (this will only be true post-Fulu). Reusing the logic for `proposer_shuffling_decision_slot` avoids having to repeat the fiddly logic about the Fulu fork epoch itself. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-11-12 01:46:05 +00:00
Javier Chávarri	2c1f1c1605	Migrate derivative to educe (#8125 ) Fixes #7001. Mostly mechanical replacement of `derivative` attributes with `educe` ones. ### Attribute Syntax Changes ```rust // Bounds: = "..." → (...) #[derivative(Hash(bound = "E: EthSpec"))] #[educe(Hash(bound(E: EthSpec)))] // Ignore: = "ignore" → (ignore) #[derivative(PartialEq = "ignore")] #[educe(PartialEq(ignore))] // Default values: value = "..." → expression = ... #[derivative(Default(value = "ForkName::Base"))] #[educe(Default(expression = ForkName::Base))] // Methods: format_with/compare_with = "..." → method(...) #[derivative(Debug(format_with = "fmt_peer_set_as_len"))] #[educe(Debug(method(fmt_peer_set_as_len)))] // Empty bounds: removed entirely, educe can infer appropriate bounds #[derivative(Default(bound = ""))] #[educe(Default)] // Transparent debug: manual implementation (educe doesn't support it) #[derivative(Debug = "transparent")] // Replaced with manual Debug impl that delegates to inner field ``` Note: Some bounds use strings (`bound("E: EthSpec")`) for superstruct compatibility (`expected ','` errors). Co-Authored-By: Javier Chávarri <javier.chavarri@gmail.com> Co-Authored-By: Mac L <mjladson@pm.me>	2025-11-06 14:13:57 +00:00
hopinheimer	8f7dcf02ba	Fix unaggregated delay metric (#8366 ) While working on #7892, @michaelsproul pointed out that it might be a better metric to measure the delay from the start of the slot instead of the current `slot_duration / 3`, since the attestation duties now start before the `1/3rd` mark with the change in the linked PR. Co-Authored-By: hopinheimer <knmanas6@gmail.com> Co-Authored-By: hopinheimer <48147533+hopinheimer@users.noreply.github.com>	2025-11-05 06:19:35 +00:00
Michael Sproul	a7e89a8761	Optimise `state_root_at_slot` for finalized slot (#8353 ) This is an optimisation targeted at Fulu networks in non-finality. While debugging on Holesky, we found that `state_root_at_slot` was being called from `prepare_beacon_proposer` a lot, for the finalized state: `2c9b670f5d/beacon_node/http_api/src/lib.rs (L3860-L3861)` This was causing `prepare_beacon_proposer` calls to take upwards of 5 seconds, sometimes 10 seconds, because it would trigger _multiple_ beacon state loads in order to iterate back to the finalized slot. Ideally, loading the finalized state should be quick because we keep it cached in the state cache (technically we keep the split state, but they usually coincide). Instead we are computing the finalized state root separately (slow), and then loading the state from the cache (fast). Although it would be possible to make the API faster by removing the `state_root_at_slot` call, I believe it's simpler to change `state_root_at_slot` itself and remove the footgun. Devs rightly expect operations involving the finalized state to be fast. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-11-05 02:08:46 +00:00
Michael Sproul	0507eca7b4	Merge remote-tracking branch 'origin/stable' into unstable-merge-v8	2025-11-04 16:08:34 +11:00
Jimmy Chen	bc86dc09e5	Reduce number of blobs used in tests to speed up CI (#8194 ) `beacon-chain-tests` is now regularly taking 1h+ on CI since Fulu fork was added. This PR attempts to reduce the test time by bringing down the number of blobs generated in tests - instead of generating 0..max_blobs, the generator now generates 0..1 blobs by default, and this can be modified by setting `harness.execution_block_generator.set_min_blob_count(n)`. Note: The blobs are pre-generated and don't require too much CPU to generate, however processing a larger number of them on the beacon chain does take a lot of time. This PR also includes a few other small improvements - Our slowest test (`chain_segment_varying_chunk_size`) runs 3x faster in Fulu just by reusing chain segments - Avoid re-running fork specific tests on all forks - Fix a bunch of tests that depend on the harness's existing random blob generation, which is fragile beacon chain test time on test machine is ~2x faster: ### `unstable` ``` Summary [ 751.586s] 291 tests run: 291 passed (13 slow), 0 skipped ``` ### this branch ``` Summary [ 373.792s] 291 tests run: 291 passed (2 slow), 0 skipped ``` The next set of tests to optimise is the ones that use [`get_chain_segment`](`77a9af96de/beacon_node/beacon_chain/tests/block_verification.rs (L45)`), as it by default builds 320 blocks with supernode - an easy optimisation would be to build these blocks with cgc = 8 for tests that only require fullnodes. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>	2025-11-04 02:40:44 +00:00
Eitan Seri-Levi	5d0f8a083a	Ensure custody backfill sync couples all responses before importing (#8339 ) Custody backfill sync has a bug when we request columns from more than one peer per batch. The fix here ensures we wait for all requests to be completed before performing verification and importing the responses. I've also added an endpoint `lighthouse/custody/backfill` that resets a node's earliest available data column to the current epoch so that custody backfill can be triggered. This endpoint is needed to rescue any nodes that may have missing columns due to the custody backfill sync bug without requiring a full re-sync. Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com> Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com> Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com>	2025-11-03 08:06:06 +00:00
Michael Sproul	4908687e7d	Proposer duties backwards compat (#8335 ) The beacon API spec wasn't updated to use the Fulu definition of `dependent_root` for the proposer duties endpoint. No other client updated their logic, so to retain backwards compatibility the decision has been made to continue using the block root at the end of epoch `N - 1`, and introduce a new v2 endpoint down the track to use the correct dependent root. Eth R&D discussion: https://discord.com/channels/595666850260713488/598292067260825641/1433036715848765562 Change the behaviour of the v1 endpoint back to using the last slot of `N - 1` rather than the last slot of `N - 2`. This introduces the possibility of dependent root false positives (the root can change without changing the shuffling), but causes the least compatibility issues with other clients. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-11-03 08:06:03 +00:00
Eitan Seri-Levi	25832e5862	Add mainnet configs (#8344 ) #8135 mainnet config PR: https://github.com/eth-clients/mainnet/pull/11 Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Michael Sproul <michael@sigmaprime.io> Co-Authored-By: Tan Chee Keong <tanck@sigmaprime.io>	2025-11-03 06:53:13 +00:00
Eitan Seri-Levi	b57d046c4a	Fix CGC backfill race condition (#8267 ) During custody backfill sync there could be an edge case where we update CGC at the same time as we are importing a batch of columns, which may cause us to incorrectly overwrite values when calling `backfill_validator_custody_requirements`. To prevent this race condition, the expected cgc is now passed into this function and is used to check if the expected cgc == the current validator cgc. If the values aren't equal, this probably indicates that a very recent CGC change occurred, so we do not prune/update values in the `epoch_validator_custody_requirements` map. Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>	2025-11-03 00:51:42 +00:00
Michael Sproul	c46cb0b5b0	Merge remote-tracking branch 'origin/release-v8.0' into unstable	2025-11-03 09:28:48 +11:00
Eitan Seri-Levi	55588f7789	Rust 1.91 lints (#8340 ) Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com>	2025-10-31 08:08:37 +00:00
Mac L	f5809aff87	Bump `ssz_types` to `v0.12.2` (#8032 ) https://github.com/sigp/lighthouse/issues/8012 Replace all instances of `VariableList::from` and `FixedVector::from` with their `try_from` variants. While I tried to use proper error handling in most cases, there were certain situations where adding an `expect` for situations where `try_from` can trivially never fail avoided adding a lot of extra complexity. Co-Authored-By: Mac L <mjladson@pm.me> Co-Authored-By: Michael Sproul <michaelsproul@users.noreply.github.com> Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-10-28 04:01:09 +00:00
kevaundray	613ce3c011	chore!: remove pub visibility on `OVERFLOW_LRU_CAPACITY` and `STATE_LRU_CAPACITY_NON_ZERO` (#8234 ) - Renames `OVERFLOW_LRU_CAPACITY` to `OVERFLOW_LRU_CAPACITY_NON_ZERO` to follow naming convention of `STATE_LRU_CAPACITY_NON_ZERO` - Makes `OVERFLOW_LRU_CAPACITY_NON_ZERO` and `STATE_LRU_CAPACITY_NON_ZERO` private since they are only used in this module - Moves `STATE_LRU_CAPACITY` into test module since it is only used for tests Co-Authored-By: Kevaundray Wedderburn <kevtheappdev@gmail.com>	2025-10-27 11:23:45 +00:00
Pawan Dhananjay	c668cb7d9a	Only publish reconstructed columns that we need to sample (#8269 ) N/A We were publishing all columns that we didn't already have in the da cache when reconstructing. This is unnecessary outbound bandwidth for the node that is supposed to sample fewer columns. This PR changes the behaviour to publish only columns that we are supposed to sample in the topics that we are subscribed to. Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>	2025-10-23 05:05:08 +00:00
Jimmy Chen	d8c6c57029	Trigger backfill on startup if user switches to a supernode or semi-supernode (#8265 ) This PR adds backfill functionality to nodes switching to become a supernode or semi-supernode. Please note that we currently only support a CGC increase, i.e. if the node's already custodying 67 columns, switching to semi-supernode (64) will have no effect. From @eserilev > if a node's cgc increases on start up, we just need two things for custody backfill to do its thing > > - data column custody info needs to be updated to reflect the cgc change > - `CustodyContext::validator_registrations::epoch_validator_custody_requirements` needs to be updated to reflect the cgc change - [x] Add tests - [x] Test on devnet-3 - [x] switch to supernode - [x] switch to semisupernode - [x] Test on live testnets - [x] Update docs (functions) Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2025-10-23 02:56:09 +00:00
Jimmy Chen	43c5e924d7	Add `--semi-supernode` support (#8254 ) Addresses #8218 A simplified version of #8241 for the initial release. I've tried to minimise the logic change in this PR; although introducing the `NodeCustodyType` enum still results in quite a bit of diff, the actual logic change in `CustodyContext` is quite small. The main changes are in the `CustodyContext` struct * ~~combining `validator_custody_count` and `current_is_supernode` fields into a single `custody_group_count_at_head` field. We persist the cgc of the initial cli values into the `custody_group_count_at_head` field and only allow for increase (same behaviour as before).~~ * I noticed the above approach caused a backward compatibility issue, I've [made a fix](`15569bc085`) and changed the approach slightly (which was actually what I had originally in mind): * when initialising, only override the `validator_custody_count` value if either flag `--supernode` or `--semi-supernode` is used; otherwise leave it as the existing default `0`. Most other logic remains unchanged. All existing validator custody unit tests are still passing, and I've added additional tests to cover semi-supernode, and restoring `CustodyContext` from disk. Note: I've added a `WARN` if the user attempts to switch to a `--semi-supernode` or `--supernode` - this currently has no effect, but once @eserilev column backfill is merged, we should be able to support this quite easily. Things to test - [x] cgc in metadata / enr - [x] cgc in metrics - [x] subscribed subnets - [x] getBlobs endpoint Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2025-10-22 05:23:17 +00:00
Eitan Seri-Levi	33e21634cb	Custody backfill sync (#7907 ) #7603 #### Custody backfill sync service Similar in many ways to the current backfill service. There may be ways to unify the two services. The difficulty there is that the current backfill service tightly couples blocks and their associated blobs/data columns. Any attempts to unify the two services should be left to a separate PR in my opinion. #### `SyncNetworkContext` `SyncNetworkContext` manages custody sync data columns by range requests separately from other sync RPC requests. I think this is a nice separation considering that custody backfill is its own service. #### Data column import logic The import logic verifies KZG commitments and that the data column's block root matches the block root in the node's store before importing columns #### New channel to send messages to `SyncManager` Now external services can communicate with the `SyncManager`. In this PR this channel is used to trigger a custody sync. Alternatively we may be able to use the existing `mpsc` channel that the `SyncNetworkContext` uses to communicate with the `SyncManager`. I will spend some time reviewing this. Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Eitan Seri- Levi <eserilev@gmail.com> Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>	2025-10-22 03:51:34 +00:00
Eitan Seri-Levi	46dde9afee	Fix data column rpc request (#8247 ) Fixes an issue mentioned in this comment regarding data column rpc requests: https://github.com/sigp/lighthouse/issues/6572#issuecomment-3400076236 Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Michael Sproul <micsproul@gmail.com>	2025-10-21 23:54:35 +00:00
Michael Sproul	21bab0899a	Improve block header signature handling (#8253 ) Closes: - https://github.com/sigp/lighthouse/issues/7650 Reject blob and data column sidecars from RPC with invalid signatures. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-10-21 13:58:12 +00:00
Michael Sproul	2f8587301d	More proposer shuffling cleanup (#8130 ) Addressing more review comments from: - https://github.com/sigp/lighthouse/pull/8101 I've also tweaked a few more things that I think are minor bugs. - Instrument `ensure_state_can_determine_proposers_for_epoch` - Fix `block_root` usage in `compute_proposer_duties_from_head`. This was a regression introduced in 8101 😬 . - Update the `state_advance_timer` to prime the next-epoch proposer cache post-Fulu. Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-10-20 03:14:14 +00:00
Jimmy Chen	76a37a0aef	Revert incorrect fix made in #8179 (#8215 ) This PR reverts #8179. It turns out that the fix was invalid because an unknown root is always not a finalized descendant: `522bd9e9c6/consensus/proto_array/src/proto_array.rs (L976-L979)` so for any data columns with unknown parents, it will always penalise the gossip peer and disconnect it pretty quickly. On a small network, the node may lose all of its peers. The impact is pretty obvious when the peer count is small and sync speed is slow, and is therefore easily reproducible by running a fresh supernode on devnet-3. This isn't as obvious on a live testnet like holesky / sepolia, we haven't noticed this, probably due to its high peer count and sync speed - the nodes might be able to reach head quickly before losing too many peers. The previous behaviour isn't ideal but safe: triggering unknown parent lookup and penalise the bad peer if it happens to be malicious or faulty. So for now it's safer to revert the change and plan for a proper fix after the v8 release. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2025-10-16 23:25:30 +00:00
SunnysidedJ	d1e06dc40d	#6853 Adding store tests for data column pruning (#7228 ) #6853 Update store tests to cover data column pruning Created a helper function `check_data_column_existence` which is a copy of `check_blob_existence` but checking data columns instead. The helper function is then used to check whether data columns are also pruned when blobs are pruned if PeerDAS is enabled. Co-Authored-By: SunnysidedJ <j@testinprod.io> Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu> Co-Authored-By: Michael Sproul <michael@sigmaprime.io>	2025-10-16 15:20:26 +00:00
Pawan Dhananjay	73e75e3e69	Ignore extra columns in da cache (#8201 ) N/A Found this issue in sepolia. Note: the custody requirement for this node is 100. ``` Oct 14 11:25:40.053 DEBUG Reconstructed columns count: 28, block_root: 0x4d7946dec0ab59f2afd46610d7c54af555cb4c2851d9eea7d83dd17cf6e96aae, slot: 8725628 Oct 14 11:25:45.568 WARN Internal availability check failure block_root: 0x4d7946dec0ab59f2afd46610d7c54af555cb4c2851d9eea7d83dd17cf6e96aae, error: Unexpected("too many columns got 128 expected 100") ``` So if any of the block components arrives late, then we reconstruct all 128 columns and try to add them to the da cache and have more columns than needed for availability in the cache. There are 2 ways I can think of fixing this: 1. pass only the required columns to the da cache after reconstruction here `60df5f4ab6/beacon_node/beacon_chain/src/data_availability_checker.rs (L647-L648)` 2. Ensure that we add only columns that we need to sample in the da cache. I think this is safer since we can add columns to the cache from multiple code paths and this fixes it at the source. ~~This PR implements (2).~~ Thought more about it, I think (1) is cleaner since we filter gossip and rpc columns also before calling `put_kzg_verified_data_columns`. Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>	2025-10-16 09:25:44 +00:00
Jimmy Chen	5886a48d96	Add `max_blobs_per_block` check to data column gossip validation (#8198 ) Addresses this spec change https://github.com/ethereum/consensus-specs/pull/4650 Add `max_blobs_per_block` to gossip data column check so we reject large columns before processing. (we currently do this check during processing) Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2025-10-15 01:52:35 +00:00
Pawan Dhananjay	2c328e32a6	Persist only custody columns in db (#8188 ) * Only persist custody columns * Get claude to write tests * lint * Address review comments and fix tests. * Use supernode only when building chain segments * Clean up * Rewrite tests. * Fix tests * Clippy --------- Co-authored-by: Jimmy Chen <jchen.tc@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2025-10-13 20:32:13 +11:00
Jimmy Chen	538b70495c	Reject data columns that does not descend from finalize root instead of ignoring it (#8179 ) This issue was identified during the fusaka audit competition. The [`verify_parent_block_and_finalized_descendant`](`62d9302e0f/beacon_node/beacon_chain/src/data_column_verification.rs (L606-L627)`) in data column gossip verification currently loads the parent first before checking if the column descends from the finalized root. However, the `fork_choice.get_block(&block_parent_root)` function also makes the same check internally: `8a4f6cf0d5/consensus/fork_choice/src/fork_choice.rs (L1242-L1249)` Therefore, if the column does not descend from the finalized root, we return an `UnknownParent` error, before hitting the `is_finalized_checkpoint_or_descendant` check just below. Which means we `IGNORE` the gossip message instead of `REJECT`, and the gossip peer is not _immediately_ penalised. This deviates from the spec. However, worth noting that lighthouse will currently attempt to request the parent from this peer, and if the peer is not able to serve the parent, it gets penalised with a `LowToleranceError`, and will get banned after ~5 occurrences. `ffa7b2b2b9/beacon_node/network/src/sync/network_context.rs (L1530-L1532)` This PR will penalise the bad peer immediately instead of performing block lookups before penalising it. Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>	2025-10-09 07:32:43 +00:00
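
Illustrative note on commit 0d0232e8fc ("Optimise out block header calculation"): the message describes moving an expensive header computation off the hot path by keeping a cheap `Arc` clone of the block and computing the header only if verification fails. Below is a minimal sketch of that general pattern, not Lighthouse's actual code. `Block`, `Header`, `VerifyError`, `verify` and `verify_block` are all hypothetical stand-ins, and the "hash" is a toy fold rather than a real tree hash.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a beacon block; not a Lighthouse type.
#[derive(Debug)]
struct Block {
    slot: u64,
    body: Vec<u8>,
}

// Hypothetical stand-in for a block header.
#[derive(Debug, PartialEq)]
struct Header {
    slot: u64,
    body_root: u64,
}

impl Block {
    // Toy replacement for an expensive hash over the whole block.
    fn header(&self) -> Header {
        let body_root = self
            .body
            .iter()
            .fold(0u64, |acc, b| acc.wrapping_mul(31).wrapping_add(u64::from(*b)));
        Header { slot: self.slot, body_root }
    }
}

#[derive(Debug, PartialEq)]
enum VerifyError {
    Invalid { header: Header },
}

// Hypothetical verification step: an empty body counts as invalid.
fn verify(block: &Block) -> Result<(), ()> {
    if block.body.is_empty() { Err(()) } else { Ok(()) }
}

fn verify_block(block: Arc<Block>) -> Result<Arc<Block>, VerifyError> {
    // Cheap on the hot path: this clones the pointer, not the block.
    let block_for_error = Arc::clone(&block);
    verify(&block).map_err(|()| VerifyError::Invalid {
        // The header is computed only when an error actually occurs.
        header: block_for_error.header(),
    })?;
    Ok(block)
}
```

Because `map_err` takes a closure, the call to `header()` runs only on the `Err` branch; the success path pays for one reference-count increment and nothing else.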


1600 Commits