lighthouse

mirror of https://github.com/sigp/lighthouse.git synced 2026-04-17 04:48:21 +00:00

Author	SHA1	Message	Date
ethDreamer	7684d1f866	ContextDeserialize and Beacon API Improvements (#7372 ) * #7286 * BeaconAPI is not returning a versioned response when it should for some V1 endpoints * these [strange functions with vX in the name that still accept `endpoint_version` arguments](https://github.com/sigp/lighthouse/blob/stable/beacon_node/http_api/src/produce_block.rs#L192) This refactor is a prerequisite to get the fulu EF tests running.	2025-05-19 05:05:16 +00:00
Pawan Dhananjay	23ad833747	Change default EngineState to online (#7417 ) Resolves https://github.com/sigp/lighthouse/issues/7414 The health endpoint returns a 503 if the engine state is offline. The default state for the engine is `Offline`. So until the first request to the EL is made and the state is updated, the health endpoint will keep returning 503s. This PR changes the default state to Online to avoid that. I don't think this causes any issues because in case the EL is actually offline, the first fcu will set the state to offline. Pending testing on kurtosis.	2025-05-16 19:04:30 +00:00
Eitan Seri-Levi	268809a530	Rust clippy 1.87 lint fixes (#7471 ) Fix clippy lints for `rustc` 1.87 clippy complains about `BeaconChainError` being too large. I went on a bit of a boxing spree because of this. We may instead want to `Box` some of the `BeaconChainError` variants?	2025-05-16 05:03:00 +00:00
Odinson	1853d836b7	Added E::slots_per_epoch() to deneb time calculation (#7458 ) Which issue # does this PR address? Closes #7457 Added `E::slots_per_epoch()` and now it ensures conversion from epochs to slots while calculating deneb time	2025-05-15 07:31:31 +00:00
Michael Sproul	c2c7fb87a8	Make DAG construction more permissive (#7460 ) Workaround/fix for: - https://github.com/sigp/lighthouse/issues/7323 - Remove the `StateSummariesNotContiguousError`. This allows us to continue with DAG construction and pruning, even in the case where the DAG is disjointed. We will treat any disjoint summaries as roots of their own tree, and prune them (as they are not descended from finalized). This should be safe, as canonical summaries should not be disjoint (if they are, then the DB is already corrupt).	2025-05-15 02:15:35 +00:00
Eitan Seri-Levi	807848bc7a	Next sync committee branch bug (#7443 ) #7441 Make sure we're correctly caching light client data	2025-05-13 01:13:15 +00:00
SunnysidedJ	593390162f	`peerdas-devnet-7`: update `DataColumnSidecarsByRoot` request to use `DataColumnsByRootIdentifier` (#7399 ) Update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier #7377 As described in https://github.com/ethereum/consensus-specs/pull/4284	2025-05-12 00:20:55 +00:00
Lion - dapplion	a497ec601c	Retry custody requests after peer metadata updates (#6975 ) Closes https://github.com/sigp/lighthouse/issues/6895 We need sync to retry custody requests when a peer CGC updates. A higher CGC can result in a data column subnet peer count increasing from 0 to 1, allowing requests to happen. Add new sync event `SyncMessage::UpdatedPeerCgc`. It's sent by the router when a metadata response updates the known CGC	2025-05-09 08:27:17 +00:00
Jimmy Chen	4b9c16fc71	Add Electra forks to basic sim tests (#7199 ) This PR adds transitions to Electra ~~and Fulu~~ fork epochs in the simulator tests. ~~It also covers blob inclusion verification and data column syncing on a full node in Fulu.~~ UPDATE: Remove fulu fork from sim tests due to https://github.com/sigp/lighthouse/pull/7199#issuecomment-2852281176	2025-05-08 08:43:44 +00:00
Jimmy Chen	0f13029c7d	Don't publish data columns reconstructed from RPC columns to the gossip network (#7409 ) Don't publish data columns reconstructed from RPC columns to the gossip network, as this may result in peer downscoring if we're sending columns from past slots.	2025-05-07 23:24:48 +00:00
Lion - dapplion	beb0ce68bd	Make range sync peer loadbalancing PeerDAS-friendly (#6922 ) - Re-opens https://github.com/sigp/lighthouse/pull/6864 targeting unstable Range sync and backfill sync still assume that each batch request is done by a single peer. This assumption breaks with PeerDAS, where we request custody columns to N peers. Issues with current unstable: - Peer prioritization counts batch requests per peer. This accounting is broken now, data columns by range request are not accounted - Peer selection for data columns by range ignores the set of peers on a syncing chain, instead draws from the global pool of peers - The implementation is very strict when we have no peers to request from. After PeerDAS this case is very common and we want to be flexible or easy and handle that case better than just hard failing everything. - [x] Upstream peer prioritization to the network context, it knows exactly how many active requests a peer (including columns by range) - [x] Upstream peer selection to the network context, now `block_components_by_range_request` gets a set of peers to choose from instead of a single peer. If it can't find a peer, it returns the error `RpcRequestSendError::NoPeer` - [ ] Range sync and backfill sync handle `RpcRequestSendError::NoPeer` explicitly - [ ] Range sync: leaves the batch in `AwaitingDownload` state and does nothing. TODO: we should have some mechanism to fail the chain if it's stale for too long - EDIT: Not done in this PR - [x] Backfill sync: pauses the sync until another peer joins - EDIT: Same logic as unstable ### TODOs - [ ] Add tests :) - [x] Manually test backfill sync Note: this touches the mainnet path!	2025-05-07 02:03:07 +00:00
Lion - dapplion	2aa5d5c25e	Make sure to log SyncingChain ID (#7359 ) Debugging an sync issue from @pawanjay176 I'm missing some key info where instead of logging the ID of the SyncingChain we just log "Finalized" (the sync type). This looks like some typo or something was lost in translation when refactoring things. ``` Apr 17 12:12:00.707 DEBUG Syncing new finalized chain chain: Finalized, component: "range_sync" ``` This log should include more info about the new chain but just logs "Finalized" ``` Apr 17 12:12:00.810 DEBUG New chain added to sync peer_id: "16Uiu2HAmHP8QLYQJwZ4cjMUEyRgxzpkJF87qPgNecLTpUdruYbdA", sync_type: Finalized, new_chain: Finalized, component: "range_sync" ``` - Remove the Display impl and log the ID explicitly for all logs. - Log more details when creating a new SyncingChain	2025-05-01 19:53:29 +00:00
Jimmy Chen	93ec9df137	Compute proposer shuffling only once in gossip verification (#7304 ) When we perform data column gossip verification, we sometimes see multiple proposer shuffling cache miss simultaneously and this results in multiple threads computing the shuffling cache and potentially slows down the gossip verification. Proposal here is to use a `OnceCell` for each shuffling key to make sure it's only computed once. I have only implemented this in data column verification as a PoC, but this can also be applied to blob and block verification Related issues: - https://github.com/sigp/lighthouse/issues/4447 - https://github.com/sigp/lighthouse/issues/7203	2025-05-01 01:30:42 +00:00
Eitan Seri-Levi	9779b4ba2c	Optimize `validate_data_columns` (#7326 )	2025-04-30 04:36:50 +00:00
Akihito Nakano	1324d3d3c4	Delayed RPC Send Using Tokens (#5923 ) closes https://github.com/sigp/lighthouse/issues/5785 The diagram below shows the differences in how the receiver (responder) behaves before and after this PR. The following sentences will detail the changes. ```mermaid flowchart TD subgraph "* After " Start2([START]) --> AA[Receive request] AA --> COND1{Is there already an active request <br> with the same protocol?} COND1 --> \|Yes\| CC[Send error response] CC --> End2([END]) %% COND1 --> \|No\| COND2{Request is too large?} %% COND2 --> \|Yes\| CC COND1 --> \|No\| DD[Process request] DD --> EE{Rate limit reached?} EE --> \|Yes\| FF[Wait until tokens are regenerated] FF --> EE EE --> \|No\| GG[Send response] GG --> End2 end subgraph " Before *" Start([START]) --> A[Receive request] A --> B{Rate limit reached <br> or <br> request is too large?} B -->\|Yes\| C[Send error response] C --> End([END]) B -->\|No\| E[Process request] E --> F[Send response] F --> End end ``` ### `Is there already an active request with the same protocol?` This check is not performed in `Before`. This is taken from the PR in the consensus-spec, which proposes updates regarding rate limiting and response timeout. https://github.com/ethereum/consensus-specs/pull/3767/files > The requester MUST NOT make more than two concurrent requests with the same ID. The PR mentions the requester side. In this PR, I introduced the `ActiveRequestsLimiter` for the `responder` side to restrict more than two requests from running simultaneously on the same protocol per peer. If the limiter disallows a request, the responder sends a rate-limited error and penalizes the requester. ### `Rate limit reached?` and `Wait until tokens are regenerated` UPDATE: I moved the limiter logic to the behaviour side. https://github.com/sigp/lighthouse/pull/5923#issuecomment-2379535927 ~~The rate limiter is shared between the behaviour and the handler. (`Arc<Mutex<RateLimiter>>>`) The handler checks the rate limit and queues the response if the limit is reached. The behaviour handles pruning.~~ ~~I considered not sharing the rate limiter between the behaviour and the handler, and performing all of these either within the behaviour or handler. However, I decided against this for the following reasons:~~ - ~~Regarding performing everything within the behaviour: The behaviour is unable to recognize the response protocol when `RPC::send_response()` is called, especially when the response is `RPCCodedResponse::Error`. Therefore, the behaviour can't rate limit responses based on the response protocol.~~ - ~~Regarding performing everything within the handler: When multiple connections are established with a peer, there could be multiple handlers interacting with that peer. Thus, we cannot enforce rate limiting per peer solely within the handler. (Any ideas? 🤔 )~~	2025-04-24 03:46:16 +00:00
chonghe	c13e069c9c	Revise logging when `queue is full` (#7324 )	2025-04-22 22:46:30 +00:00
Michael Sproul	e61e92b926	Merge remote-tracking branch 'origin/stable' into unstable	2025-04-22 18:55:06 +10:00
Michael Sproul	54f7bc5b2c	Release v7.0.0 (#7288 ) New v7.0.0 release for Electra on mainnet.	2025-04-22 09:21:03 +10:00
chonghe	80fe133d2c	Update Lighthouse Book for Electra features (#7280 ) * #7227	2025-04-17 09:31:26 +00:00
Mac L	c32569ab83	Restore HTTP API logging and add more metrics (#7225 ) #7124 - Restores previous HTTP logging with tracing compatible syntax - Adds metrics for certain missing endpoints (and alphabetized the existing ones)	2025-04-17 08:18:45 +00:00
Michael Sproul	fd82ee2f81	Release v7.0.0-beta.7 (#7333 )	2025-04-17 14:46:43 +10:00
Jean-Baptiste Pinalie	5352d5f78a	Update proposer_slashings and attester_slashings amounts for electra. (#7316 ) Did not find a specific issue beside https://github.com/sigp/lighthouse/issues/6821 Leverage `whistleblower_reward_quotient_for_state` to have accurate post-electra `proposer_slashings` and `attester_slashings` fields returned by `/eth/v1/beacon/rewards/blocks/<id>`.	2025-04-17 00:58:36 +00:00
Michael Sproul	6fad6fba6a	Release v7.0.0-beta.6	2025-04-16 08:54:53 +10:00
Jimmy Chen	476f3a593c	Add `MAX_BLOBS_PER_BLOCK_FULU` config (#7161 ) Add `MAX_BLOBS_PER_BLOCK_FULU` config.	2025-04-15 00:20:46 +00:00
Lion - dapplion	be68dd24d0	Fix wrong custody column count for lookup blocks (#7281 ) Fixes - https://github.com/sigp/lighthouse/issues/7278 Don't assume 0 columns for `RpcBlockInner::Block`	2025-04-11 22:00:57 +00:00
Mac L	39eb8145f8	Merge branch 'release-v7.0.0' into unstable	2025-04-11 21:32:24 +10:00
Eitan Seri-Levi	af51d50b05	Ensure `/eth/v2/beacon/pool/attestations` honors `committee_index` (#7298 ) #7294 Fix the filtering logic so that we actually filter by committee index for both `Base` and `Electra` attestations. Added a tiny optimization when calculating committee_index to prevent unneeded memory allocations Added a regression test	2025-04-11 04:47:30 +00:00
Eitan Seri-Levi	ef8ec35ac5	Ensure `light_client/updates` endpoint returns spec compliant SSZ data (#7230 ) Closes #7167 - Ensure the fork digest is generated from ther light client updates attested header and not the signature slot - Ensure the format of the SSZ response is spec compliant	2025-04-11 04:47:27 +00:00
Eitan Seri-Levi	aed562abef	Downgrade light client errors (#7300 ) Downgrade light client errors to debug Error messages are alarming and usually indicate somethings wrong with the beacon node. The Light Client service is supposed to minimally impact users, and most will not care if the light client server is erroring. Furthermore, the only errors we've seen in the wild are during hard forks, for the first few epochs before the fork finalizes.	2025-04-10 02:17:07 +00:00
Mac L	7534f5752d	Add `pending_consolidations` Beacon API endpoint (#7290 ) #7282 Adds the missing `beacon/states/{state_id}/pending_consolidations` Beacon API endpoint along with related tests.	2025-04-10 01:21:01 +00:00
SunnysidedJ	d96b73152e	Fix for #6296 : Deterministic RNG in peer DAS publish block tests (#7192 ) #6296: Deterministic RNG in peer DAS publish block tests Made test functions to call publish-block APIs with true for the deterministic RNG boolean parameter while production code with false. This will deterministically shuffle columns for unit tests under broadcast_validation_tests.rs.	2025-04-09 15:35:15 +00:00
Michael Sproul	ec643843e0	Remove/document remaining Electra TODOs (#6982 ) Not essential to merge this now, but I'm going through TODOs for Electra to make sure we haven't missed anything. Targeting this at the release branch anyway so that auditors/readers don't get alarmed 😅	2025-04-09 04:14:50 +00:00
Pawan Dhananjay	076f3f0984	Clarify network limits (#7175 ) Resolves #6811 Rename `GOSSIP_MAX_SIZE` to `MAX_PAYLOAD_SIZE` and remove `MAX_CHUNK_SIZE` in accordance with the spec. The spec also "clarifies" the message size limits at different levels. The rpc limits are equivalent to what we had before imo. The gossip limits have additional checks. I have gotten rid of the `is_bellatrix_enabled` checks that used a lower limit (1mb) pre-merge. Since all networks we run start from the merge, I don't think this will break any setups.	2025-04-09 02:50:45 +00:00
Jimmy Chen	759b0612b3	Offloading KZG Proof Computation from the beacon node (#7117 ) Addresses #7108 - Add EL integration for `getPayloadV5` and `getBlobsV2` - Offload proof computation and use proofs from EL RPC APIs	2025-04-08 07:37:16 +00:00
Jimmy Chen	e924264e17	Fullnodes to publish data columns from EL `getBlobs` (#7258 ) Previously only supernode contributes to data column publishing in Lighthouse. Recently we've [updated the spec](https://github.com/ethereum/consensus-specs/pull/4183) to have full nodes publishing data columns as well, to ensure all nodes contributes to propagation. This also prevents already imported data columns from being imported again (because we don't "observe" them), and ensures columns that are observed in the [gossip seen cache](`d60c24ef1c/beacon_node/beacon_chain/src/data_column_verification.rs (L492)`) are forwarded to its peers, rather than being ignored.	2025-04-08 03:20:31 +00:00
Michael Sproul	47a85cd118	Bump version to v7.1.0-beta.0 (not a release) (#7269 ) Having merged the drop-headtracker PR we now have a DB schema change in `unstable` compared to `release-v7.0.0`: - https://github.com/sigp/lighthouse/pull/6744 There is a DB downgrade available, however this needs to be applied manually and it's usually a bit of a hassle. This PR bumps the version on `unstable` to `v7.1.0-beta.0` _without_ actually cutting a `v7.1.0-beta.0` release, so that we can tell at a glance which schema version a node is using.	2025-04-07 06:01:20 +00:00
Lion - dapplion	70850fe58d	Drop head tracker for summaries DAG (#6744 ) The head tracker is a persisted piece of state that must be kept in sync with the fork-choice. It has been a source of pruning issues in the past, so we want to remove it - see https://github.com/sigp/lighthouse/issues/1785 When implementing tree-states in the hot DB we have to change the pruning routine (more details below) so we want to do those changes first in isolation. - see https://github.com/sigp/lighthouse/issues/6580 - If you want to see the full feature of tree-states hot https://github.com/dapplion/lighthouse/pull/39 Closes https://github.com/sigp/lighthouse/issues/1785 Current DB migration routine - Locate abandoned heads with head tracker - Use a roots iterator to collect the ancestors of those heads can be pruned - Delete those abandoned blocks / states - Migrate the newly finalized chain to the freezer In summary, it computes what it has to delete and keeps the rest. Then it migrates data to the freezer. If the abandoned forks routine has a bug it can break the freezer migration. Proposed migration routine (this PR) - Migrate the newly finalized chain to the freezer - Load all state summaries from disk - From those, just knowing the head and finalized block compute two sets: (1) descendants of finalized (2) newly finalized chain - Iterate all summaries, if a summary does not belong to set (1) or (2), delete This strategy is more sound as it just checks what's there in the hot DB, computes what it has to keep and deletes the rest. Because it does not rely and 3rd pieces of data we can drop the head tracker and pruning checkpoint. Since the DB migration happens first now, as long as the computation of the sets to keep is correct we won't have pruning issues.	2025-04-07 04:23:52 +00:00
Pawan Dhananjay	091e292c99	Return eth1_data early post transition (#7248 ) N/A Return state.eth1_data() early if we have passed the transition period post electra. Even if we don't return early, the function would still return state.eth1_data() based on the current conditions. However, doing this explicitly here to match the spec. This covers setting the right eth1_data in our block. The other thing we need to ensure is that the deposits returned by the eth1_chain is empty post transition. The only way we get non-empty deposits post the transition is if `state.eth1_deposit_index` in the below code is less than `min(deposit_requests_start_index, state.eth1_data().deposit_count)`. `0850bcfb89/beacon_node/beacon_chain/src/eth1_chain.rs (L543-L579)` This can never happen because state.eth1_deposit_index will be equal to state.eth1_data.deposit count and cannot exceed the value. @michaelsproul @ethDreamer please double check the logic for deposits being empty post transition. Following the logic in the spec makes my head hurt.	2025-04-07 03:16:48 +00:00
Lion - dapplion	d511ca0494	Compute roots for unfinalized by_range requests with fork-choice (#7098 ) Includes PRs - https://github.com/sigp/lighthouse/pull/7058 - https://github.com/sigp/lighthouse/pull/7066 Cleaner for the `release-v7.0.0` branch	2025-04-07 03:16:41 +00:00
Jimmy Chen	7cc64cab83	Add missing error log and remove redundant id field from lookup logs (#6990 ) Partially #6989. This PR adds the missing error log when a batch fails due to issues with converting the response into `RpcBlock`. See the above linked issue for more details. Adding this log reveals that we're completing range requests with missing columns, hence causing the batch to fail. It looks like we've hit the case where we've received enough stream terminations, but not all columns are returned. ``` Feb 12 06:12:16.558 DEBG Failed to convert range block components into RpcBlock, error: No column for block 0xc5b6c7fa02f5ef603d45819c08c6519f1dba661fd5d44a2fc849d3e7028b6007 index 18, id: 3456/RangeSync/116/3432, service: sync, module: network::sync::network_context:488 ``` I've also removed some redundant `id` logging, as the `id` debug representation is difficult to read, and is now being logged as part of `req_id` in a more succinct format (relevant PR: #6914)	2025-04-04 09:01:42 +00:00
Jimmy Chen	6a75f24ab1	Fix the `getBlobs` metric and ensure it is recorded promptly to prevent miscounts (#7188 ) From testing conducted by Sunnyside Labs, they noticed that the "expected blobs" are quite low on bandwidth constrained nodes. This observation revealed that we don't record the `beacon_blobs_from_el_expected_total` metric at all if the EL doesn't return any response. The fetch blobs function returns without recording the metric. To fix this, I've moved `BLOBS_FROM_EL_EXPECTED_TOTAL` and `BLOBS_FROM_EL_RECEIVED_TOTAL` to as early as possible, to make the metric more accurate.	2025-04-04 09:01:39 +00:00
Mac L	0e6da0fcaf	Merge branch 'release-v7.0.0' into v7-backmerge	2025-04-04 13:32:58 +11:00
Mac L	82d1674455	Rust 1.86.0 lints (#7254 ) Implement lints for the new Rust compiler version 1.86.0.	2025-04-04 02:30:22 +00:00
Age Manning	d6cd049a45	RPC RequestId Cleanup (#7238 ) I've been working at updating another library to latest Lighthouse and got very confused with RPC request Ids. There were types that had fields called `request_id` and `id`. And interchangeably could have types `PeerRequestId`, `rpc::RequestId`, `AppRequestId`, `api_types::RequestId` or even `Request.id`. I couldn't keep track of which Id was linked to what and what each type meant. So this PR mainly does a few things: - Changes the field naming to match the actual type. So any field that has an `AppRequestId` will be named `app_request_id` rather than `id` or `request_id` for example. - I simplified the types. I removed the two different `RequestId` types (one in Lighthouse_network the other in the rpc) and grouped them into one. It has one downside tho. I had to add a few unreachable lines of code in the beacon processor, which the extra type would prevent, but I feel like it might be worth it. Happy to add an extra type to avoid those few lines. - I also removed the concept of `PeerRequestId` which sometimes went alongside a `request_id`. There were times were had a `PeerRequest` and a `Request` being returned, both of which contain a `RequestId` so we had redundant information. I've simplified the logic by removing `PeerRequestId` and made a `ResponseId`. I think if you look at the code changes, it simplifies things a bit and removes the redundant extra info. I think with this PR things are a little bit easier to reasonable about what is going on with all these RPC Ids. NOTE: I did this with the help of AI, so probably should be checked	2025-04-03 10:10:15 +00:00
Jimmy Chen	80626e58d2	Attempt to fix flaky network tests (#7244 )	2025-04-03 04:01:34 +00:00
Michael Sproul	578db67755	Merge remote-tracking branch 'origin/release-v7.0.0' into backmerge-apr-2	2025-04-02 09:57:42 +11:00
Michael Sproul	9bc0d5161e	Disable LevelDB snappy feature (#7235 ) Disable the `snappy` feature of LevelDB to prevent compilation issues with CMake 4.0, e.g. https://github.com/sigp/lighthouse/actions/runs/14182783816/job/39732457274?pr=7231 We do not use Snappy compression in LevelDB, and do not need to compile this. This might also shave a few seconds off compilation!	2025-04-01 07:56:06 +00:00
Michael Sproul	bde0f1ef0b	Merge remote-tracking branch 'origin/release-v7.0.0' into unstable	2025-03-29 13:01:58 +11:00
Pawan Dhananjay	54aef2d043	Admin add/remove peer (#7198 ) N/A Adds endpoints to add and remove trusted peers from the http api. The added peers are trusted peers so they won't be disconnected for bad scores. We try to maintain a connection to the peer in case they disconnect from us by trying to dial it every heartbeat.	2025-03-28 12:59:09 +00:00
Eitan Seri-Levi	a5ea05ce2a	Top-up pubkey cache on startup (#7217 ) This is a workaround for #7216 In the case of gaps between the in-memory pub key cache and its on-disk representation, use the head state on startup to "top-up" the cache/db w/ any missing validators	2025-03-28 08:29:19 +00:00

1 2 3 4 5 ...

3407 Commits