lighthouse

mirror of https://github.com/sigp/lighthouse.git synced 2026-03-02 16:21:42 +00:00

Author	SHA1	Message	Date
Pawan Dhananjay	80ba0b169b	Backfill peer attribution (#7762 ) Partly addresses https://github.com/sigp/lighthouse/issues/7744 Implement peer sync attribution for backfill sync, similar to #7733.	2025-08-12 02:11:56 +00:00
Eitan Seri-Levi	122f16776f	Add metrics to track beacon processor queue times (#7808 ) This PR adds a `created_timestamp` to the beacon processor send channel. When work items are sent through that channel, `try_send` forwards the work event along with the current timestamp to the beacon processor. When the work event is completed, the `Drop` impl for `SendOnDrop` records the time from work event creation to completion. Previously we only had data on how long a work event took to process, not on how long it sat in the queue plus how long it took to process.	2025-08-12 01:06:42 +00:00
Pawan Dhananjay	4262ad3e01	Add a flag to disable getBlobs (#7853 ) N/A Add a flag to disable getBlobs. I configured the flag to disable it regardless of version because it's most likely something we use for testing anyway.	2025-08-11 23:17:00 +00:00
Jimmy Chen	40c2fd5ff4	Instrument tracing spans for block processing and import (#7816 ) #7815 - removes all existing spans, so some span fields that appear in logs like `service_name` may be lost. - instruments a few key code paths in the beacon node, starting from the root spans named below: * Gossip block and blobs * `process_gossip_data_column_sidecar` * `process_gossip_blob` * `process_gossip_block` * RPC block and blobs * `process_rpc_block` * `process_rpc_blobs` * `process_rpc_custody_columns` * RPC blocks (range and backfill) * `process_chain_segment` * `PendingComponents` lifecycle * `pending_components` To test locally: * Run Grafana and Tempo with https://github.com/sigp/lighthouse-metrics/pull/57 * Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317` Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg Removing the old spans seems to have reduced memory usage quite a lot; I think we were using them too heavily, including on long-running tasks.	2025-08-08 05:32:22 +00:00
Jimmy Chen	6dfab22267	Fix Rust 1.89 compiler warnings in slasher tests. (#7844 ) As described in the title. Failing test: https://github.com/sigp/lighthouse/actions/runs/16818997885/job/47646515894	2025-08-08 04:41:08 +00:00
Daniel Ramirez-Chiquillo	cafb3644e2	Fix Makefile line continuation syntax in test-release target (#7834 ) #7833 Fix a typo in the `Makefile` that was causing `make test` to run `http_api` tests when they should have been ignored.	2025-08-07 08:32:52 +00:00
Jimmy Chen	3a02bdd94a	Adjust DA checker cache size (#7825 ) The current `OVERFLOW_LRU_CAPACITY` of `1024` seems a bit excessive now that we rarely store more than 1 `PendingComponents` (under normal network conditions). Additionally, given the blob count increases, the max size of `PendingComponents` has also increased and is expected to increase further. This PR brings the max capacity of the cache down to `64`, which should be more than enough headroom while also giving us better protection from the network.	2025-08-07 05:11:38 +00:00
Jimmy Chen	8bc6693dac	Fix wrong columns getting processed on a CGC change (#7792 ) This PR fixes a bug where wrong columns could get processed immediately after a CGC increase. Scenario: - The node's CGC increased due to additional validators attached to it (let's say from 10 to 11). - The new CGC is advertised and new subnets are subscribed immediately; however, the change won't be effective in the data availability check until the next epoch (See [this](`ab0e8870b4/beacon_node/beacon_chain/src/validator_custody.rs (L93-L99)`)). The data availability checker still only requires 10 columns for the current epoch. - During this time, data columns for the additional custody column (let's say column 11) may arrive via gossip as we're already subscribed to the topic. Such a column may be incorrectly used to satisfy the existing data availability requirement (10 columns), so this additional column (instead of a required one) gets persisted, resulting in database inconsistency.	2025-08-07 00:45:04 +00:00
Daniel Ramirez-Chiquillo	9c972201bc	Fix: RPC test failures (#7734 ) Fixes #7735 Use `tracing::subscriber::set_default` to ensure that each test/thread has its own subscriber.	2025-08-06 14:59:41 +00:00
Eric Tu	c06ac81c67	Shuffling for 32 bit platforms (#7725 ) - In shuffling, the `raw_pivot` (u64) was cast to a `usize`, which breaks on 32-bit systems. Now it is reduced modulo `list_size` first, then cast to a `usize`. - `ruint` doesn't implement shifting by `u64` on 32-bit architectures. Since `prefix_bits` is a `u8` and `NODE_ID_BITS` = 256, we use them as `u32`s instead. See: https://docs.rs/ruint/latest/src/ruint/bits.rs.html#711	2025-08-06 02:37:07 +00:00
Michael Sproul	0dcce40ccb	Fix Clippy for Rust 1.90 beta (#7826 ) Fix Clippy for recently released Rust 1.90 beta. There may be more changes required when Rust 1.89 stable is released in a few days, but possibly not 🤞	2025-08-05 13:52:26 +00:00
Jimmy Chen	adf6ad70f0	Update fetch blobs metrics buckets (#7823 ) While looking at metrics I noticed that `beacon_blobs_from_el_expected` and `beacon_blobs_from_el_received_total` have different buckets. This PR adds more buckets to both (to prepare for Fusaka) and makes them consistent.	2025-08-01 18:27:53 +00:00
Age Manning	2f59d5208a	Filter dependencies from SSE logging (#7819 )	2025-08-01 04:45:20 +00:00
Michael Sproul	134039d014	Simplify ConfigAndPreset (#7777 ) I noticed that we are serving preset values for Fulu on mainnet nodes prior to the fork. This has already gone live in v7.1.0, but should hopefully be handled in a graceful way by API consumers. This PR _reverts_ the serving of Fulu data prior to Fulu, by serving Fulu data only if Fulu is scheduled.	2025-07-25 08:53:24 +00:00
Pawan Dhananjay	09065a851f	Add builder blinded_blocks v2 (#7778 ) Partially addresses https://github.com/sigp/lighthouse/issues/7381 Add blinded_blocks v2 method specified in https://github.com/ethereum/builder-specs/pull/123/	2025-07-25 08:29:19 +00:00
Jimmy Chen	2aae08a8aa	Remove KZG verification on blobs fetched from the EL (#7771 ) Continuation of #7713, addresses comment about skipping KZG verification on EL fetched blobs: https://github.com/sigp/lighthouse/pull/7713#discussion_r2198542501	2025-07-25 06:49:50 +00:00
Eitan Seri-Levi	6a52454647	Update spec tests to 1.6.0-alpha.3 (#7786 ) #7782	2025-07-25 06:49:47 +00:00
Jimmy Chen	1a6eeb228c	Bump Rust version to 1.88 (#7787 ) In #7743, rust version was bumped: - msrv to 1.87 - `Dockerfile` to 1.88 We also need to bump the other docker images as well, and might as well keep them all consistent at 1.88.	2025-07-25 05:52:51 +00:00
Michael Sproul	b904956074	Skip serializing blob_schedule before Fulu (#7779 ) Alternative to: - https://github.com/sigp/lighthouse/pull/7758 Serve the `blob_schedule` field on `/eth/v1/config/spec` _only_ when Fulu is enabled. If the blob schedule is empty, we will still serve it as `[]`, so long as Fulu is enabled.	2025-07-24 18:14:25 +00:00
Eric Tu	9911f348bc	Feature gate arbitrary crate in the consensus types crate (#7743 ) Puts the `arbitrary` crate behind a feature flag in the `types` crate.	2025-07-23 16:55:02 +00:00
Jimmy Chen	4daa015971	Remove peer sampling code (#7768 ) Peer sampling has been completely removed from the spec. This PR removes our partial implementation from the codebase. https://github.com/ethereum/consensus-specs/pull/4393	2025-07-23 03:24:45 +00:00
chonghe	c4b973f5ba	Use SSZ by default when calling /eth/v3/validator/blocks (#7727 ) * #7698	2025-07-23 00:29:21 +00:00
Michael Sproul	ce99e0c383	Refine delayed head block logging (#7705 ) Small tweak to `Delayed head block` logging to make it more representative of actual issues. Previously we used the total import delay to determine whether a block was late, but this includes the time taken for IO (and now hdiff computation) which happens _after_ the block is made attestable. This PR changes the logic to use the attestable delay where possible, falling back to the previous value if the block doesn't have one, e.g. if it didn't meet the conditions to make it into the attestable cache.	2025-07-23 00:29:18 +00:00
Mac L	e6089fe7db	Control span data through tracing Extensions (#7239 ) #7234 Removes the `Arc<Mutex<_>>` which was used to store and manage span data and replaces it with the inbuilt `Extensions` for managing span-specific data. This also avoids an `unwrap` which was used when acquiring the lock over the mutex'd span data.	2025-07-22 14:22:03 +00:00
Eitan Seri-Levi	db8b6be9df	Data column custody info (#7648 ) #7647 Introduces a new record in the blobs db, `DataColumnCustodyInfo`. When `DataColumnCustodyInfo` exists in the db, this indicates that a recent cgc change has occurred and/or that a custody backfill sync is currently in progress (custody backfill will be added as a separate PR). When a cgc change has occurred, `earliest_available_slot` will be equal to the slot at which the cgc change occurred. During custody backfill sync, `earliest_available_slot` should be updated incrementally as it progresses. ~~Note that if `advertise_false_custody_group_count` is enabled we do not add a `DataColumnCustodyInfo` record in the db as that would affect the status v2 response.~~ (See comment https://github.com/sigp/lighthouse/pull/7648#discussion_r2212403389) ~~If `DataColumnCustodyInfo` doesn't exist in the db this indicates that we have fulfilled our custody requirements up to the DA window.~~ (It now always exists, and the slot will be set to `None` once backfill is complete.) StatusV2 now uses `DataColumnCustodyInfo` to calculate the `earliest_available_slot` if a `DataColumnCustodyInfo` record exists in the db; if it's `None`, then we return the `oldest_block_slot`.	2025-07-22 13:30:30 +00:00
Jimmy Chen	b48879a566	Remove KZG verification from local block production and blobs fetched from the EL (#7713 ) #7700 As described in the title, the EL already performs KZG verification on all blobs when they enter the mempool, so it's redundant to perform extra validation on blobs returned from the EL. This PR removes - KZG verification for both blobs and data columns during block production - KZG verification for data columns after the fetch engine blobs call. I have not done this for blobs because it requires extra changes to check the observed cache, and it doesn't feel like a worthwhile optimisation given the number of blobs per block. This PR does not remove KZG verification on the block publishing path yet.	2025-07-22 10:48:49 +00:00
Michael Sproul	4a3e248b7e	Add heaptrack support (#7764 ) Although we're working on jemalloc profiler support in https://github.com/sigp/lighthouse/pull/7746, heaptrack seems to be producing more sensible results. This PR adds a heaptrack profile and a heaptrack feature so that we no longer need to patch the code in order to use heaptrack. This may prove complementary to jemalloc profiling, so I think there is no harm in having both.	2025-07-21 02:11:27 +00:00
Pawan Dhananjay	1046dfbfe7	Serialize bpo schedule in ascending order (#7753 ) N/A Serializes the blob_schedule in ascending order to match other clients. This is needed to keep the output of `eth/v1/config/spec` http endpoint consistent across clients. cc @barnabasbusa	2025-07-18 05:36:18 +00:00
Pawan Dhananjay	3f06e5dfba	Fix enr loading from disk with cgc (#7754 ) N/A When building the ENR on startup, we weren't using the value in the custody context. As a result, when the cgc updated, the ENR value was updated and the change persisted, but it was set back to the default on restart. This PR takes the value explicitly from the custody context.	2025-07-18 04:51:11 +00:00
Eitan Seri-Levi	d6de8a7484	Add additional broadcast validation tests for Fulu/PeerDAS (#7325 ) Closes #6855 Add PeerDAS broadcast validation tests and fix a small bug where `sampling_columns_indices` is `None` (indicating that we've already sampled the necessary columns) and `process_gossip_data_columns` gets called	2025-07-17 07:50:28 +00:00
Pawan Dhananjay	309c301363	Allow /validator apis to work pre-genesis (#7729 ) N/A The Lighthouse BN HTTP API would return a server error pre-genesis on the `validator/duties/attester` and `validator/prepare_beacon_proposer` endpoints because `slot_clock.now()` would return `None` pre-genesis. The prysm VC depends on these endpoints pre-genesis and was having issues interoperating with the lighthouse bn for this reason. The proposer duties endpoint explicitly handles the pre-genesis case here `538067f1ff/beacon_node/http_api/src/proposer_duties.rs (L23-L28)`. I see no reason why we can't make the other endpoints more flexible to work pre-genesis. This PR handles the pre-genesis case on the attester and prepare_beacon_proposer endpoints as well. Thanks for raising this, @james-prysm.	2025-07-14 06:42:55 +00:00
chonghe	6409a32274	Add a guide to partially reconstruct historic states to Lighthouse book (#7679 ) The main change is adding a guide to partially reconstruct historic states to the FAQ. Other changes: - Update the database schema info - Delete the Homebrew issue as it has been solved in https://github.com/Homebrew/homebrew-core/pull/225877 - Update default gas limit in: [`7cbf7f1`](`7cbf7f1516`) - Updated the binary installation page [`8076ca7`](`8076ca7905`) as Lighthouse has provided an aarch-apple binary build since v7.1.0	2025-07-14 03:24:49 +00:00
Pawan Dhananjay	90ff64381e	Sync peer attribution (#7733 ) Closes #7604 Improvements to range sync including: 1. Contain column requests only to peers that are part of the SyncingChain 2. Attribute the fault to the correct peer and downscore them if they don't return the data columns for the request 3. Improve sync performance by retrying only the failed columns from other peers instead of failing the entire batch 4. Use the earliest_available_slot to make requests to peers that claim to have the epoch. Note: if no earliest_available_slot info is available, fall back to the previous logic, i.e. assume the peer has everything backfilled up to the WS checkpoint/DA boundary. Tested this on fusaka-devnet-2 with a full node and supernode and the recovery logic seems to work well. Also tested this a little on mainnet. Need to do more testing and possibly add some unit tests.	2025-07-12 00:02:30 +00:00
ethDreamer	b43e0b446c	Final changes for `fusaka-devnet-2` (#7655 ) Closes #7467. This PR primarily addresses [the P2P changes](https://github.com/ethereum/EIPs/pull/9840) in [fusaka-devnet-2](https://fusaka-devnet-2.ethpandaops.io/). Specifically: * [the new `nfd` parameter added to the `ENR`](https://github.com/ethereum/EIPs/pull/9840) * [the modified `compute_fork_digest()` changes for every BPO fork](https://github.com/ethereum/EIPs/pull/9840) 90% of this PR was hacked together as fast as I could during Berlinterop while running between Glamsterdam debates. Luckily, it seems to work. But I was unable to be as careful in avoiding bugs as I usually am. I've cleaned up the things I remember wanting to come back to and take a closer look at, but I'm still working on this. Progress: * [x] get it working on `fusaka-devnet-2` * [ ] [optional disconnect from peers with incorrect `nfd` at the fork boundary](https://github.com/ethereum/consensus-specs/pull/4407) - Can be addressed in a future PR if necessary * [x] first pass clean-up * [x] fix up all the broken tests * [x] final self-review * [x] more thorough review from people more familiar with affected code	2025-07-10 21:32:58 +00:00
Jimmy Chen	3826fe91f4	Improve data column KZG verification metric buckets (#7717 ) The current data column KZG verification buckets are not giving us useful info as the upper bound is too low. We see most of the numbers above 70ms for batch verification, and we don't know how much time it really takes. This PR improves the buckets based on the numbers we got from testing. Exponential buckets seem like a good candidate here given we're expecting to increase the blob count with a similar approach (possibly 2x each fork if it goes well).	2025-07-10 20:59:45 +00:00
Michael Sproul	538067f1ff	Merge remote-tracking branch 'origin/stable' into unstable	2025-07-10 15:53:45 +10:00
Michael Sproul	cfb1f73310	Release v7.1.0 (#7609 ) Post-Pectra release for tree-states hot 🎉 Already merged to `release-v7.1.0`: - https://github.com/sigp/lighthouse/pull/7444 - https://github.com/sigp/lighthouse/pull/6750 - https://github.com/sigp/lighthouse/pull/7437 - https://github.com/sigp/lighthouse/pull/7133 - https://github.com/sigp/lighthouse/pull/7620 - https://github.com/sigp/lighthouse/pull/7663 v7.1.0	2025-07-10 01:44:46 +00:00
João Oliveira	8b5ccacac9	Error from RPC `send_response` when request doesn't exist on the active inbound requests (#7663 ) Lighthouse is currently logging a lot of errors in the `RPC` behaviour whenever a response is received for a request_id that no longer exists in active_inbound_requests. This is likely due to a data race or timing issue (e.g., the peer disconnecting before the response is handled). This PR addresses that by removing the error logging from the RPC layer. Instead, `RPC::send_response` now simply returns an `Err`, shifting the responsibility to the main service. The main service can then determine whether the peer is still connected and only log an error if the peer remains connected. Thanks @ackintosh for helping debug!	2025-07-09 14:26:51 +00:00
Michael Sproul	8e55684b06	Reintroduce `--logfile` with deprecation warning (#7723 ) Reintroduce the `--logfile` flag with a deprecation warning so that it doesn't prevent nodes from starting. This is considered preferable to breaking node startups so that users fix the flag, even though it means the `--logfile` flag is completely ineffective. The flag was initially removed in: - https://github.com/sigp/lighthouse/pull/6339	2025-07-09 08:08:27 +00:00
cakevm	734ad90dd8	Upgrade to c-kzg 2.1.0 and alloy-primitives 1.0 (#7271 ) Update `c-kzg` from `v1` to `v2`. My motivation here is that `alloy-consensus` now uses `c-kzg` `v2`, and this results in a conflict when using lighthouse in combination with the latest alloy. I also tried to disable the `czkg` feature in alloy, but the conflict persisted. See here for the alloy update to `c-kzg v2`: https://github.com/alloy-rs/alloy/pull/2240 Error: ``` error: failed to select a version for `c-kzg`. ... versions that meet the requirements `^1` are: 1.0.3, 1.0.2, 1.0.0 the package `c-kzg` links to the native library `ckzg`, but it conflicts with a previous package which links to `ckzg` as well: package `c-kzg v2.1.0` ... which satisfies dependency `c-kzg = "^2.1"` of package `alloy-consensus v0.13.0` ... which satisfies dependency `alloy-consensus = "^0.13.0"` of package ... ... ``` - Upgrade `alloy-consensus` to `0.14.0` and disable all default features - Upgrade `c-kzg` to `v2.1.0` - Upgrade `alloy-primitives` to `1.0.0` - Adapt the code to the new `c-kzg` API - There is now `NO_PRECOMPUTE`; as I understand from https://github.com/ethereum/c-kzg-4844/pull/545/files, we should use `0` here because `new_from_trusted_setup_no_precomp` does not precompute. But maybe the name is misleading. For all other places I used `RECOMMENDED_PRECOMP_WIDTH` because `8` matches the recommendation. - `BYTES_PER_G1_POINT` and `BYTES_PER_G2_POINT` are no longer public in `c-kzg` - I adapted two tests that check the `Attestation` bitfield size. But I could not pinpoint what changed or why it is now 8 bytes smaller. I would be happy about any hint, and whether this is correct. I found a related PR here: https://github.com/sigp/lighthouse/pull/6915 - Use the same field names in JSON as `c-kzg` and `rust_eth_kzg` for `g1_monomial`, `g1_lagrange`, and `g2_monomial`	2025-07-09 05:02:41 +00:00
Michael Sproul	7b2f138ca7	Merge remote-tracking branch 'origin/stable' into release-v7.1.0	2025-07-09 11:19:16 +10:00
Michael Sproul	b9c1a2b0c0	Fix description of DB read bytes metric (#7716 ) Fix a trivial typo that mixed up reads and writes.	2025-07-08 08:50:15 +00:00
Eitan Seri-Levi	bd8a2a8ffb	Gossip recently computed light client data (#7023 )	2025-07-08 07:07:10 +00:00
Jimmy Chen	56485cc986	Remove unneeded spans that caused debug logs to appear when level is set to `info` (#7707 ) Fixes #7155. It turns out the issue is caused by calling a function that creates an info span (`chain.id()` here), e.g. ```rust debug!(id = chain.id(), ?sync_type, reason = ?remove_reason, op, "Chain removed"); ``` I've removed all unneeded spans, especially on getter functions; there's little reason for spans there and they often get used in logging. We should also revisit all the spans after the release; I think we could make them more useful than they are today. I've let it run for a while and am no longer seeing any `DEBUG` logs.	2025-07-08 00:37:54 +00:00
Daniel Knopik	3e6b0bd0a3	Make `notifier_service::notify` pub (#7708 ) Anchor wants the `notify` function to run only in certain cases, so the `spawn_notifier` function is unsuitable for us. Anchor uses its own `notify` function, which then calls `notifier_service::notify` (in most circumstances). To enable that, `notify` needs to be `pub`.	2025-07-07 10:46:18 +00:00
Michael Sproul	01ec2ec7ad	Update LH book for v7.1.0 (#7706 ) Update the book for upcoming v7.1.0 release. This is targeted at `unstable` rather than `release-v7.1.0` because the book is built from `unstable`.	2025-07-07 04:42:34 +00:00
Pawan Dhananjay	0f895f3066	Bump default gas limit (#7695 ) N/A Bump the default gas limit to 45 million based on recommendation from EL teams https://x.com/vdWijden/status/1939234101631856969 and pandas https://ethpandaops.io/posts/gaslimit-scaling/	2025-07-04 22:54:30 +00:00
Michael Sproul	c7bb3b00e4	Fix lookups of the block at `oldest_block_slot` (#7693 ) Closes: - https://github.com/sigp/lighthouse/issues/7690 Another checkpoint sync related fix! See issue for a description of the bug. We fix it by just loading the block root of the `oldest_block_slot`, rather than trying to load the slot prior, which will always fail.	2025-07-02 23:40:04 +00:00
Jimmy Chen	b35854b71f	Record v2 beacon blocks http api metrics separately (#7692 ) This PR adds v2 beacon block paths to the function that records http api usage, so they don't all get recorded as "/v2/beacon".	2025-07-02 08:47:35 +00:00
Michael Sproul	a459a9af98	Fix and test checkpoint sync from genesis (#7689 ) Fix a bug involving checkpoint sync from genesis reported by Sunnyside labs. Ensure that the store's `anchor` is initialised prior to storing the genesis state. In the case of checkpoint sync from genesis, the genesis state will be in the _hot DB_, so we need the hot DB metadata to be initialised in order to store it. I've extended the existing checkpoint sync tests to cover this case as well. There are some subtleties around what the `state_upper_limit` should be set to in this case. I've opted to just enable state reconstruction from the start in the test so it gets set to 0, which results in an end state more consistent with the other test cases (full state reconstruction). This is required because we can't meaningfully do any state reconstruction when the split slot is 0 (there is no range of frozen slots to reconstruct).	2025-07-02 04:50:33 +00:00

7270 Commits
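Add metrics to track beacon processor queue times (#7808 ): the idea of stamping a work item when it is sent and recording the elapsed time when it is dropped after processing can be sketched in std-only Rust. `TimedWork` and `QueueTimeMetric` are hypothetical stand-ins for Lighthouse's `SendOnDrop` and its metrics, not the actual code; a real implementation would observe into a histogram rather than a `Vec`.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a histogram metric: it simply collects samples.
#[derive(Clone, Default)]
pub struct QueueTimeMetric {
    samples: Arc<Mutex<Vec<Duration>>>,
}

impl QueueTimeMetric {
    pub fn observe(&self, elapsed: Duration) {
        self.samples.lock().unwrap().push(elapsed);
    }

    pub fn samples(&self) -> Vec<Duration> {
        self.samples.lock().unwrap().clone()
    }
}

/// Hypothetical wrapper: stamps a work item when it is sent and records the
/// total time (queueing + processing) when the wrapper is dropped.
pub struct TimedWork<T> {
    pub work: T,
    created_timestamp: Instant,
    metric: QueueTimeMetric,
}

impl<T> TimedWork<T> {
    /// Called at send time, e.g. inside a `try_send`-style helper.
    pub fn new(work: T, metric: QueueTimeMetric) -> Self {
        Self { work, created_timestamp: Instant::now(), metric }
    }
}

impl<T> Drop for TimedWork<T> {
    /// Runs once the worker is finished with the item.
    fn drop(&mut self) {
        self.metric.observe(self.created_timestamp.elapsed());
    }
}
```

Measuring from creation rather than from dequeue is what lets the metric capture queue time in addition to processing time.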
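Adjust DA checker cache size (#7825 ): capping `OVERFLOW_LRU_CAPACITY` at 64 means the least recently used `PendingComponents` entry is evicted once a 65th entry arrives. Below is a toy std-only LRU cache to illustrate that behaviour. It is a hypothetical sketch, not Lighthouse's cache (which likely builds on an existing LRU crate), and it uses an O(n) recency update for brevity.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Capacity from #7825 (previously 1024).
pub const OVERFLOW_LRU_CAPACITY: usize = 64;

/// Minimal LRU cache: `order` holds keys from least to most recently used.
pub struct LruCache<K, V> {
    capacity: usize,
    map: HashMap<K, V>,
    order: VecDeque<K>,
}

impl<K: Hash + Eq + Clone, V> LruCache<K, V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        LruCache { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Moves `key` to the most-recently-used position (O(n), fine for a sketch).
    fn touch(&mut self, key: &K) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            if let Some(k) = self.order.remove(pos) {
                self.order.push_back(k);
            }
        }
    }

    pub fn get(&mut self, key: &K) -> Option<&V> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    /// Inserts an entry; if this exceeds capacity, evicts and returns the LRU entry.
    pub fn put(&mut self, key: K, value: V) -> Option<(K, V)> {
        if self.map.insert(key.clone(), value).is_some() {
            self.touch(&key);
            return None;
        }
        self.order.push_back(key);
        if self.map.len() > self.capacity {
            let lru = self.order.pop_front()?;
            let evicted = self.map.remove(&lru)?;
            return Some((lru, evicted));
        }
        None
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```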
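Fix wrong columns getting processed on a CGC change (#7792 ): the bug is easier to see as two ways of checking availability. The sketch below assumes columns are plain `u64` indices and that the custody set effective for the current epoch is known. The function names are hypothetical, and the code does not describe the actual data availability checker.

```rust
use std::collections::BTreeSet;

/// Buggy shape: any received column counts toward the requirement, so an
/// extra column from a newly subscribed subnet can stand in for a required one.
pub fn is_available_by_count(required_count: usize, received: &BTreeSet<u64>) -> bool {
    received.len() >= required_count
}

/// Safer shape: every column in the custody set effective for this epoch
/// must be present; columns outside that set are ignored.
pub fn is_available_by_membership(required: &BTreeSet<u64>, received: &BTreeSet<u64>) -> bool {
    required.is_subset(received)
}

/// Only columns in the effective custody set are persisted.
pub fn columns_to_persist(required: &BTreeSet<u64>, received: &BTreeSet<u64>) -> Vec<u64> {
    received.intersection(required).copied().collect()
}
```

With 10 required columns, receiving 9 of them plus one extra column passes the count check but fails the membership check.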
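Shuffling for 32 bit platforms (#7725 ): the pivot fix amounts to reducing in `u64` before casting. `pivot_index` is a hypothetical helper. The truncating variant simulates a 32-bit `usize` with an explicit `as u32`, so the difference is visible on any host.

```rust
/// Fixed shape: reduce modulo `list_size` in u64 first. The result is
/// < list_size, so it always fits in usize and the cast is lossless.
pub fn pivot_index(raw_pivot: u64, list_size: usize) -> usize {
    (raw_pivot % list_size as u64) as usize
}

/// Old shape as seen on a 32-bit target: casting u64 -> usize first drops
/// the upper 32 bits (simulated here with `as u32`), then reduces.
pub fn pivot_index_truncating(raw_pivot: u64, list_size: usize) -> usize {
    (raw_pivot as u32 as usize) % list_size
}
```

For `raw_pivot = 2^32 + 5` and `list_size = 7`, the fixed version yields 2, while the truncating version yields 5.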
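Refine delayed head block logging (#7705 ): a sketch of the delay selection. It assumes the attestable delay is an `Option<Duration>` that is `None` when the block never made it into the attestable cache. The `threshold` parameter is illustrative, because the commit does not state the cutoff.

```rust
use std::time::Duration;

/// Hypothetical: pick the delay used to decide whether to log "Delayed head block".
pub fn delay_for_logging(attestable_delay: Option<Duration>, total_import_delay: Duration) -> Duration {
    // Prefer the time until the block became attestable; fall back to the
    // total import delay (which includes post-attestable IO) if unavailable.
    attestable_delay.unwrap_or(total_import_delay)
}

/// Hypothetical threshold check; the real cutoff is not given in the commit.
pub fn is_late(delay: Duration, threshold: Duration) -> bool {
    delay > threshold
}
```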
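Data column custody info (#7648 ): a simplified sketch of how `earliest_available_slot` could be derived for a StatusV2-style response. The struct's field name, the `u64` slot type and `on_backfill_batch` are assumptions for illustration. They are not the actual `DataColumnCustodyInfo` definition.

```rust
/// Hypothetical, simplified record; the real `DataColumnCustodyInfo` may differ.
pub struct DataColumnCustodyInfo {
    /// Slot from which columns are fully custodied under the current CGC,
    /// or `None` once custody backfill has completed.
    pub earliest_data_column_slot: Option<u64>,
}

/// Value advertised as `earliest_available_slot`.
pub fn earliest_available_slot(info: &DataColumnCustodyInfo, oldest_block_slot: u64) -> u64 {
    info.earliest_data_column_slot.unwrap_or(oldest_block_slot)
}

/// Hypothetical: during custody backfill the slot moves back as batches complete,
/// and is cleared to `None` when backfill is done.
pub fn on_backfill_batch(info: &mut DataColumnCustodyInfo, batch_start_slot: u64, backfill_done: bool) {
    info.earliest_data_column_slot = if backfill_done { None } else { Some(batch_start_slot) };
}
```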
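Skip serializing blob_schedule before Fulu (#7779 ) and Serialize bpo schedule in ascending order (#7753 ) together describe how `blob_schedule` is served on `/eth/v1/config/spec`. The field is omitted before Fulu, rendered as `[]` when the schedule is empty, and otherwise sorted by ascending epoch. The sketch below is hand-rolled; the JSON key casing and string-encoded numbers are assumptions, and real code would use serde.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlobScheduleEntry {
    pub epoch: u64,
    pub max_blobs_per_block: u64,
}

/// Returns the JSON for `blob_schedule`, or `None` to omit the field.
pub fn render_blob_schedule(fulu_enabled: bool, schedule: &[BlobScheduleEntry]) -> Option<String> {
    // Omit the field entirely unless Fulu is enabled.
    if !fulu_enabled {
        return None;
    }
    // Serialize in ascending epoch order to match other clients.
    let mut entries = schedule.to_vec();
    entries.sort_by_key(|e| e.epoch);
    let items: Vec<String> = entries
        .iter()
        .map(|e| {
            format!(
                "{{\"EPOCH\":\"{}\",\"MAX_BLOBS_PER_BLOCK\":\"{}\"}}",
                e.epoch, e.max_blobs_per_block
            )
        })
        .collect();
    // An empty schedule still renders as `[]` once Fulu is enabled.
    Some(format!("[{}]", items.join(",")))
}
```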
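Allow /validator apis to work pre-genesis (#7729 ): a sketch of the pre-genesis problem using a hypothetical `SimpleSlotClock` whose `now` returns `None` before genesis, mirroring how `slot_clock.now()` is described. The fallback to slot 0 is an assumption about how such endpoints could handle the case. It does not describe the actual change.

```rust
/// Hypothetical, simplified slot clock working in whole seconds.
pub struct SimpleSlotClock {
    pub genesis_time: u64,
    pub seconds_per_slot: u64,
}

impl SimpleSlotClock {
    /// Current slot, or `None` before genesis (the case that caused server errors).
    pub fn now(&self, unix_time: u64) -> Option<u64> {
        unix_time
            .checked_sub(self.genesis_time)
            .map(|elapsed| elapsed / self.seconds_per_slot)
    }

    /// Hypothetical pre-genesis fallback: treat the current slot as genesis (0).
    pub fn now_or_genesis_slot(&self, unix_time: u64) -> u64 {
        self.now(unix_time).unwrap_or(0)
    }
}
```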
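Sync peer attribution (#7733 ), item 4: filtering peers by `earliest_available_slot` can be sketched as follows. The sketch includes the fallback from the note: a peer with no info is assumed to have everything back to the DA boundary. `PeerInfo` and `peers_for_epoch` are hypothetical, and peers are plain integer ids.

```rust
/// Hypothetical, simplified view of a peer's advertised status.
pub struct PeerInfo {
    pub id: u32,
    /// `None` when the peer did not advertise an earliest available slot.
    pub earliest_available_slot: Option<u64>,
}

/// Returns the ids of peers that claim to have data for every slot of `epoch`.
pub fn peers_for_epoch(peers: &[PeerInfo], epoch: u64, slots_per_epoch: u64) -> Vec<u32> {
    let epoch_start_slot = epoch * slots_per_epoch;
    peers
        .iter()
        .filter(|peer| match peer.earliest_available_slot {
            Some(slot) => slot <= epoch_start_slot,
            // Fallback: assume the peer has everything back to the DA boundary.
            None => true,
        })
        .map(|peer| peer.id)
        .collect()
}
```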
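Improve data column KZG verification metric buckets (#7717 ): exponential bucket bounds can be generated with a helper similar in shape to the `exponential_buckets(start, factor, count)` helper found in Prometheus client libraries. This is a std-only sketch. The parameters mentioned below are illustrative and are not the buckets Lighthouse chose.

```rust
/// Bucket upper bounds growing geometrically: start, start*factor, start*factor^2, ...
pub fn exponential_buckets(start: f64, factor: f64, count: usize) -> Vec<f64> {
    assert!(start > 0.0 && factor > 1.0 && count > 0, "invalid bucket parameters");
    let mut buckets = Vec::with_capacity(count);
    let mut upper = start;
    for _ in 0..count {
        buckets.push(upper);
        upper *= factor;
    }
    buckets
}
```

For example, starting at 5 ms with a factor of 2 and 8 buckets reaches 640 ms, well past the old upper bound that most batch verifications exceeded.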
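Error from RPC `send_response` when request doesn't exist on the active inbound requests (#7663 ): a sketch of moving the logging decision out of the RPC layer. `Rpc`, `SendError` and `should_log_error` are hypothetical simplifications, with requests and peers as plain integers. In this sketch, the RPC layer only reports the missing request, and the caller decides whether an error log is warranted.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
pub enum SendError {
    UnknownRequest(u64),
}

pub struct Rpc {
    /// request_id -> peer_id (hypothetical representation)
    pub active_inbound_requests: HashMap<u64, u32>,
}

impl Rpc {
    /// Returns an error instead of logging when the request is no longer active.
    pub fn send_response(&self, request_id: u64) -> Result<(), SendError> {
        if self.active_inbound_requests.contains_key(&request_id) {
            Ok(())
        } else {
            Err(SendError::UnknownRequest(request_id))
        }
    }
}

/// Service-level decision: an error is only worth logging if the peer is still connected.
pub fn should_log_error(result: &Result<(), SendError>, peer_id: u32, connected: &HashSet<u32>) -> bool {
    result.is_err() && connected.contains(&peer_id)
}
```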