lighthouse

mirror of https://github.com/sigp/lighthouse.git synced 2026-06-01 13:47:16 +00:00

Author	SHA1	Message	Date
Age Manning	9200042910	Transition network key to hex format (#7665 ) #7181 Instead of storing the network key as binary data we store it as hex, allowing users to modify it via the file. We can read old-binary forms, however we will migrate binary to hex as it will be the new standard.	2025-08-15 07:12:19 +00:00
chonghe	522bd9e9c6	Update Rust Edition to 2024 (#7766 ) * #7749 Thanks @dknopik and @michaelsproul for your help!	2025-08-13 03:04:31 +00:00
Pawan Dhananjay	80ba0b169b	Backfill peer attribution (#7762 ) Partly addresses https://github.com/sigp/lighthouse/issues/7744 Implement similar peer sync attribution like in #7733 for backfill sync.	2025-08-12 02:11:56 +00:00
Jimmy Chen	40c2fd5ff4	Instrument tracing spans for block processing and import (#7816 ) #7815 - removes all existing spans, so some span fields that appear in logs like `service_name` may be lost. - instruments a few key code paths in the beacon node, starting from root spans named below: * Gossip block and blobs * `process_gossip_data_column_sidecar` * `process_gossip_blob` * `process_gossip_block` * Rpc block and blobs * `process_rpc_block` * `process_rpc_blobs` * `process_rpc_custody_columns` * Rpc blocks (range and backfill) * `process_chain_segment` * `PendingComponents` lifecycle * `pending_components` To test locally: * Run Grafana and Tempo with https://github.com/sigp/lighthouse-metrics/pull/57 * Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317` Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg Removing the old spans seem to have reduced the memory usage quite a lot - i think we were using them on long running tasks and too excessively: <img width="910" height="495" alt="image" src="https://github.com/user-attachments/assets/5208bbe4-53b2-4ead-bc71-0b782c788669" />	2025-08-08 05:32:22 +00:00
Jimmy Chen	8bc6693dac	Fix wrong columns getting processed on a CGC change (#7792 ) This PR fixes a bug where wrong columns could get processed immediately after a CGC increase. Scenario: - The node's CGC increased due to additional validators attached to it (lets say from 10 to 11) - The new CGC is advertised and new subnets are subscribed immediately, however the change won't be effective in the data availability check until the next epoch (See [this](`ab0e8870b4/beacon_node/beacon_chain/src/validator_custody.rs (L93-L99)`)). Data availability checker still only require 10 columns for the current epoch. - During this time, data columns for the additional custody column (lets say column 11) may arrive via gossip as we're already subscribed to the topic, and it may be incorrectly used to satisfy the existing data availability requirement (10 columns), and result in this additional column (instead of a required one) getting persisted, resulting in database inconsistency.	2025-08-07 00:45:04 +00:00
Daniel Ramirez-Chiquillo	9c972201bc	Fix: RPC test failures (#7734 ) Fixes #7735 Use `tracing::subscriber::set_default` to ensure that each test/thread has its own subscirber.	2025-08-06 14:59:41 +00:00
Michael Sproul	0dcce40ccb	Fix Clippy for Rust 1.90 beta (#7826 ) Fix Clippy for recently released Rust 1.90 beta. There may be more changes required when Rust 1.89 stable is released in a few days, but possibly not 🤞	2025-08-05 13:52:26 +00:00
Jimmy Chen	4daa015971	Remove peer sampling code (#7768 ) Peer sampling has been completely removed from the spec. This PR removes our partial implementation from the codebase. https://github.com/ethereum/consensus-specs/pull/4393	2025-07-23 03:24:45 +00:00
Pawan Dhananjay	3f06e5dfba	Fix enr loading from disk with cgc (#7754 ) N/A During building an enr on startup, we weren't using the value in the custody context. This was resulting in the enr value getting updated when the cgc updates, the change getting persisted, but getting set back to the default on restart. This PR takes the value explicitly from the custody context.	2025-07-18 04:51:11 +00:00
Pawan Dhananjay	90ff64381e	Sync peer attribution (#7733 ) Which issue # does this PR address? Closes #7604 Improvements to range sync including: 1. Contain column requests only to peers that are part of the SyncingChain 2. Attribute the fault to the correct peer and downscore them if they don't return the data columns for the request 3. Improve sync performance by retrying only the failed columns from other peers instead of failing the entire batch 4. Uses the earliest_available_slot to make requests to peers that claim to have the epoch. Note: if no earliest_available_slot info is available, fallback to using previous logic i.e. assume peer has everything backfilled upto WS checkpoint/da boundary Tested this on fusaka-devnet-2 with a full node and supernode and the recovering logic seems to works well. Also tested this a little on mainnet. Need to do more testing and possibly add some unit tests.	2025-07-12 00:02:30 +00:00
ethDreamer	b43e0b446c	Final changes for `fusaka-devnet-2` (#7655 ) Closes #7467. This PR primarily addresses [the P2P changes](https://github.com/ethereum/EIPs/pull/9840) in [fusaka-devnet-2](https://fusaka-devnet-2.ethpandaops.io/). Specifically: * [the new `nfd` parameter added to the `ENR`](https://github.com/ethereum/EIPs/pull/9840) * [the modified `compute_fork_digest()` changes for every BPO fork](https://github.com/ethereum/EIPs/pull/9840) 90% of this PR was absolutely hacked together as fast as possible during the Berlinterop as fast as I could while running between Glamsterdam debates. Luckily, it seems to work. But I was unable to be as careful in avoiding bugs as I usually am. I've cleaned up the things I remember wanting to come back and have a closer look at. But still working on this. Progress: * [x] get it working on `fusaka-devnet-2` * [ ] [optional disconnect from peers with incorrect `nfd` at the fork boundary](https://github.com/ethereum/consensus-specs/pull/4407) - Can be addressed in a future PR if necessary * [x] first pass clean-up * [x] fix up all the broken tests * [x] final self-review * [x] more thorough review from people more familiar with affected code	2025-07-10 21:32:58 +00:00
João Oliveira	8b5ccacac9	Error from RPC `send_response` when request doesn't exist on the active inbound requests (#7663 ) Lighthouse is currently loggign a lot errors in the `RPC` behaviour whenever a response is received for a request_id that no longer exists in active_inbound_requests. This is likely due to a data race or timing issue (e.g., the peer disconnecting before the response is handled). This PR addresses that by removing the error logging from the RPC layer. Instead, RPC::send_response now simply returns an Err, shifting the responsibility to the main service. The main service can then determine whether the peer is still connected and only log an error if the peer remains connected. Thanks @ackintosh for helping debug!	2025-07-09 14:26:51 +00:00
Jimmy Chen	41742ce2bd	Update `SAMPLES_PER_SLOT` to be number of custody groups instead of data columns (#7683 ) Update `SAMPLES_PER_SLOT` to be number of custody groups instead of data columns. This should have no impact on the current implementation as config currently maintains a `group:subnet:column` ratio of `1:1:1`. In short, this PR doesn't change anything for Fusaka, but ensures compliance with the spec and potential future changes. I've added separate methods to compute sampling columns and custody groups for clarity: `spec.sampling_size_columns` and `spec.sampling_size_custod_groups` See the clarifications in this PR for more details: https://github.com/ethereum/consensus-specs/pull/4251	2025-07-02 00:08:40 +00:00
Jimmy Chen	522e00f48d	Fix incorrect `waker` update condition (#7656 ) This bug was first found and partially fixed by @VolodymyrBg in #7317 - this PR applies the same fix everywhere else. The old logic updated the waker when it already matched the context, and did nothing when it was stale: ```rust if waker.will_wake(cx.waker()) { self.waker = Some(cx.waker().clone()); } ``` This is the wrong way around. We only want to update the waker if it doesn't match the current context: ```rust if !waker.will_wake(cx.waker()) { self.waker = Some(cx.waker().clone()); } ``` I don't think we've ever noticed any issues, but it’s a subtle bug that could lead to missed wakeups.	2025-06-27 19:01:46 +00:00
Jimmy Chen	83cad25d98	Fix Rust 1.88 clippy errors & execution engine tests (#7657 ) Fix Rust 1.88 clippy errors.	2025-06-27 18:21:17 +00:00
Eitan Seri-Levi	6786b9d12a	Single attestation "Full" implementation (#7444 ) #6970 This allows for us to receive `SingleAttestation` over gossip and process it without converting. There is still a conversion to `Attestation` as a final step in the attestation verification process, but by then the `SingleAttestation` is fully verified. I've also fully removed the `submitPoolAttestationsV1` endpoint as its been deprecated I've also pre-emptively deprecated supporting `Attestation` in `submitPoolAttestationsV2` endpoint. See here for more info: https://github.com/ethereum/beacon-APIs/pull/531 I tried to the minimize the diff here by only making the "required" changes. There are some unnecessary complexities with the way we manage the different attestation verification wrapper types. We could probably consolidate this to one wrapper type and refactor this even further. We could leave that to a separate PR if we feel like cleaning things up in the future. Note that I've also updated the test harness to always submit `SingleAttestation` regardless of fork variant. I don't see a problem in that approach and it allows us to delete more code :)	2025-06-17 09:01:26 +00:00
Jimmy Chen	3d2d65bf8d	Advertise `--advertise-false-custody-group-count` for testing PeerDAS (#7593 ) #6973	2025-06-16 11:10:28 +00:00
Pawan Dhananjay	9803d69d80	Implement status v2 version (#7590 ) N/A Implements status v2 as defined in https://github.com/ethereum/consensus-specs/pull/4374/	2025-06-12 07:17:06 +00:00
Pawan Dhananjay	5f208bb858	Implement basic validator custody framework (no backfill) (#7578 ) Resolves #6767 This PR implements a basic version of validator custody. - It introduces a new `CustodyContext` object which contains info regarding number of validators attached to a node and the custody count they contribute to the cgc. - The `CustodyContext` is added in the da_checker and has methods for returning the current cgc and the number of columns to sample at head. Note that the logic for returning the cgc existed previously in the network globals. - To estimate the number of validators attached, we use the `beacon_committee_subscriptions` endpoint. This might overestimate the number of validators actually publishing attestations from the node in the case of multi BN setups. We could also potentially use the `publish_attestations` endpoint to get a more conservative estimate at a later point. - Anytime there's a change in the `custody_group_count` due to addition/removal of validators, the custody context should send an event on a broadcast channnel. The only subscriber for the channel exists in the network service which simply subscribes to more subnets. There can be additional subscribers in sync that will start a backfill once the cgc changes. TODO - [ ] NOT REQUIRED: Currently, the logic only handles an increase in validator count and does not handle a decrease. We should ideally unsubscribe from subnets when the cgc has decreased. - [ ] NOT REQUIRED: Add a service in the `CustodyContext` that emits an event once `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS ` passes after updating the current cgc. This event should be picked up by a subscriber which updates the enr and metadata. - [x] Add more tests	2025-06-11 18:10:06 +00:00
ethDreamer	ae30480926	Implement EIP-7892 BPO hardforks (#7521 ) [EIP-7892: Blob Parameter Only Hardforks](https://eips.ethereum.org/EIPS/eip-7892) #7467	2025-06-02 06:54:42 +00:00
Akihito Nakano	5cda6a6f9e	Mitigate flakiness in test_delayed_rpc_response (#7522 ) https://github.com/sigp/lighthouse/issues/7466 Expanded the margin from 100ms to 500ms.	2025-05-29 01:37:04 +00:00
Akihito Nakano	8989ef8fb1	Enable arithmetic lint in rate-limiter (#7025 ) https://github.com/sigp/lighthouse/issues/6875 - Enabled the linter in rate-limiter and fixed errors. - Changed the type of `Quota::max_tokens` from `u64` to `NonZeroU64` because `max_tokens` cannot be zero. - Added a test to ensure that a large value for `tokens`, which causes an overflow, is handled properly.	2025-05-27 15:43:22 +00:00
Akihito Nakano	a2797d4bbd	Fix formatting errors from cargo-sort (#7512 ) [cargo-sort is currently failing on CI](https://github.com/sigp/lighthouse/actions/runs/15198128212/job/42746931918?pr=7025), likely due to new checks introduced in version [2.0.0](https://github.com/DevinR528/cargo-sort/releases/tag/v2.0.0). Fixed the errors by running cargo-sort with formatting enabled.	2025-05-23 05:25:56 +00:00
Akihito Nakano	cf0f959855	Improve log readability during rpc_tests (#7180 ) It is unclear from the logs during rpc_tests whether the output comes from the sender or the receiver. ``` 2025-03-20T11:21:50.038868Z DEBUG rpc_tests: Sending message 2 2025-03-20T11:21:50.041129Z DEBUG rpc_tests: Sender received a response 2025-03-20T11:21:50.041242Z DEBUG rpc_tests: Chunk received 2025-03-20T11:21:51.040837Z DEBUG rpc_tests: Sending message 3 2025-03-20T11:21:51.043635Z DEBUG rpc_tests: Sender received a response 2025-03-20T11:21:51.043855Z DEBUG rpc_tests: Chunk received 2025-03-20T11:21:52.043427Z DEBUG rpc_tests: Sending message 4 2025-03-20T11:21:52.052831Z DEBUG rpc_tests: Sender received a response 2025-03-20T11:21:52.052953Z DEBUG rpc_tests: Chunk received 2025-03-20T11:21:53.045589Z DEBUG rpc_tests: Sending message 5 2025-03-20T11:21:53.052718Z DEBUG rpc_tests: Sender received a response 2025-03-20T11:21:53.052825Z DEBUG rpc_tests: Chunk received 2025-03-20T11:21:54.049157Z DEBUG rpc_tests: Sending message 6 2025-03-20T11:21:54.058072Z DEBUG rpc_tests: Sender received a response 2025-03-20T11:21:54.058603Z DEBUG rpc_tests: Chunk received 2025-03-20T11:21:55.018822Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat 2025-03-20T11:21:55.018953Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat 2025-03-20T11:21:55.027100Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat 2025-03-20T11:21:55.027199Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat ``` Added `info_span` to both the sender and receiver in each test. ``` 2025-03-20T11:20:04.172699Z DEBUG Receiver: rpc_tests: Sending message 2 2025-03-20T11:20:04.179147Z DEBUG Sender: rpc_tests: Sender received a response 2025-03-20T11:20:04.179281Z DEBUG Sender: rpc_tests: Chunk received 2025-03-20T11:20:05.175300Z DEBUG Receiver: rpc_tests: Sending message 3 2025-03-20T11:20:05.177202Z DEBUG Sender: rpc_tests: Sender received a response 2025-03-20T11:20:05.177292Z DEBUG Sender: rpc_tests: Chunk received 2025-03-20T11:20:06.176868Z DEBUG Receiver: rpc_tests: Sending message 4 2025-03-20T11:20:06.179379Z DEBUG Sender: rpc_tests: Sender received a response 2025-03-20T11:20:06.179460Z DEBUG Sender: rpc_tests: Chunk received 2025-03-20T11:20:07.179257Z DEBUG Receiver: rpc_tests: Sending message 5 2025-03-20T11:20:07.181386Z DEBUG Sender: rpc_tests: Sender received a response 2025-03-20T11:20:07.181503Z DEBUG Sender: rpc_tests: Chunk received 2025-03-20T11:20:08.181428Z DEBUG Receiver: rpc_tests: Sending message 6 2025-03-20T11:20:08.190231Z DEBUG Sender: rpc_tests: Sender received a response 2025-03-20T11:20:08.190358Z DEBUG Sender: rpc_tests: Chunk received 2025-03-20T11:20:09.151699Z DEBUG Sender:Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat 2025-03-20T11:20:09.151748Z DEBUG Sender:Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat 2025-03-20T11:20:09.160244Z DEBUG Receiver:Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat 2025-03-20T11:20:09.160288Z DEBUG Receiver:Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat ```	2025-05-22 02:51:25 +00:00
Akihito Nakano	a8035d7395	Enable stdout logging in rpc_tests (#7506 ) Currently `test_delayed_rpc_response` is flaky (possibly specific to Windows?), but I'm not sure why. Enabled stdout logging in rpc_tests. Note that in nextest, std output is only displayed when a test fails.	2025-05-22 02:14:48 +00:00
SunnysidedJ	593390162f	`peerdas-devnet-7`: update `DataColumnSidecarsByRoot` request to use `DataColumnsByRootIdentifier` (#7399 ) Update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier #7377 As described in https://github.com/ethereum/consensus-specs/pull/4284	2025-05-12 00:20:55 +00:00
Lion - dapplion	a497ec601c	Retry custody requests after peer metadata updates (#6975 ) Closes https://github.com/sigp/lighthouse/issues/6895 We need sync to retry custody requests when a peer CGC updates. A higher CGC can result in a data column subnet peer count increasing from 0 to 1, allowing requests to happen. Add new sync event `SyncMessage::UpdatedPeerCgc`. It's sent by the router when a metadata response updates the known CGC	2025-05-09 08:27:17 +00:00
Lion - dapplion	beb0ce68bd	Make range sync peer loadbalancing PeerDAS-friendly (#6922 ) - Re-opens https://github.com/sigp/lighthouse/pull/6864 targeting unstable Range sync and backfill sync still assume that each batch request is done by a single peer. This assumption breaks with PeerDAS, where we request custody columns to N peers. Issues with current unstable: - Peer prioritization counts batch requests per peer. This accounting is broken now, data columns by range request are not accounted - Peer selection for data columns by range ignores the set of peers on a syncing chain, instead draws from the global pool of peers - The implementation is very strict when we have no peers to request from. After PeerDAS this case is very common and we want to be flexible or easy and handle that case better than just hard failing everything. - [x] Upstream peer prioritization to the network context, it knows exactly how many active requests a peer (including columns by range) - [x] Upstream peer selection to the network context, now `block_components_by_range_request` gets a set of peers to choose from instead of a single peer. If it can't find a peer, it returns the error `RpcRequestSendError::NoPeer` - [ ] Range sync and backfill sync handle `RpcRequestSendError::NoPeer` explicitly - [ ] Range sync: leaves the batch in `AwaitingDownload` state and does nothing. TODO: we should have some mechanism to fail the chain if it's stale for too long - EDIT: Not done in this PR - [x] Backfill sync: pauses the sync until another peer joins - EDIT: Same logic as unstable ### TODOs - [ ] Add tests :) - [x] Manually test backfill sync Note: this touches the mainnet path!	2025-05-07 02:03:07 +00:00
Akihito Nakano	1324d3d3c4	Delayed RPC Send Using Tokens (#5923 ) closes https://github.com/sigp/lighthouse/issues/5785 The diagram below shows the differences in how the receiver (responder) behaves before and after this PR. The following sentences will detail the changes. ```mermaid flowchart TD subgraph "* After " Start2([START]) --> AA[Receive request] AA --> COND1{Is there already an active request <br> with the same protocol?} COND1 --> \|Yes\| CC[Send error response] CC --> End2([END]) %% COND1 --> \|No\| COND2{Request is too large?} %% COND2 --> \|Yes\| CC COND1 --> \|No\| DD[Process request] DD --> EE{Rate limit reached?} EE --> \|Yes\| FF[Wait until tokens are regenerated] FF --> EE EE --> \|No\| GG[Send response] GG --> End2 end subgraph " Before *" Start([START]) --> A[Receive request] A --> B{Rate limit reached <br> or <br> request is too large?} B -->\|Yes\| C[Send error response] C --> End([END]) B -->\|No\| E[Process request] E --> F[Send response] F --> End end ``` ### `Is there already an active request with the same protocol?` This check is not performed in `Before`. This is taken from the PR in the consensus-spec, which proposes updates regarding rate limiting and response timeout. https://github.com/ethereum/consensus-specs/pull/3767/files > The requester MUST NOT make more than two concurrent requests with the same ID. The PR mentions the requester side. In this PR, I introduced the `ActiveRequestsLimiter` for the `responder` side to restrict more than two requests from running simultaneously on the same protocol per peer. If the limiter disallows a request, the responder sends a rate-limited error and penalizes the requester. ### `Rate limit reached?` and `Wait until tokens are regenerated` UPDATE: I moved the limiter logic to the behaviour side. https://github.com/sigp/lighthouse/pull/5923#issuecomment-2379535927 ~~The rate limiter is shared between the behaviour and the handler. (`Arc<Mutex<RateLimiter>>>`) The handler checks the rate limit and queues the response if the limit is reached. The behaviour handles pruning.~~ ~~I considered not sharing the rate limiter between the behaviour and the handler, and performing all of these either within the behaviour or handler. However, I decided against this for the following reasons:~~ - ~~Regarding performing everything within the behaviour: The behaviour is unable to recognize the response protocol when `RPC::send_response()` is called, especially when the response is `RPCCodedResponse::Error`. Therefore, the behaviour can't rate limit responses based on the response protocol.~~ - ~~Regarding performing everything within the handler: When multiple connections are established with a peer, there could be multiple handlers interacting with that peer. Thus, we cannot enforce rate limiting per peer solely within the handler. (Any ideas? 🤔 )~~	2025-04-24 03:46:16 +00:00
Mac L	39eb8145f8	Merge branch 'release-v7.0.0' into unstable	2025-04-11 21:32:24 +10:00
Pawan Dhananjay	076f3f0984	Clarify network limits (#7175 ) Resolves #6811 Rename `GOSSIP_MAX_SIZE` to `MAX_PAYLOAD_SIZE` and remove `MAX_CHUNK_SIZE` in accordance with the spec. The spec also "clarifies" the message size limits at different levels. The rpc limits are equivalent to what we had before imo. The gossip limits have additional checks. I have gotten rid of the `is_bellatrix_enabled` checks that used a lower limit (1mb) pre-merge. Since all networks we run start from the merge, I don't think this will break any setups.	2025-04-09 02:50:45 +00:00
Mac L	0e6da0fcaf	Merge branch 'release-v7.0.0' into v7-backmerge	2025-04-04 13:32:58 +11:00
Mac L	82d1674455	Rust 1.86.0 lints (#7254 ) Implement lints for the new Rust compiler version 1.86.0.	2025-04-04 02:30:22 +00:00
Age Manning	d6cd049a45	RPC RequestId Cleanup (#7238 ) I've been working at updating another library to latest Lighthouse and got very confused with RPC request Ids. There were types that had fields called `request_id` and `id`. And interchangeably could have types `PeerRequestId`, `rpc::RequestId`, `AppRequestId`, `api_types::RequestId` or even `Request.id`. I couldn't keep track of which Id was linked to what and what each type meant. So this PR mainly does a few things: - Changes the field naming to match the actual type. So any field that has an `AppRequestId` will be named `app_request_id` rather than `id` or `request_id` for example. - I simplified the types. I removed the two different `RequestId` types (one in Lighthouse_network the other in the rpc) and grouped them into one. It has one downside tho. I had to add a few unreachable lines of code in the beacon processor, which the extra type would prevent, but I feel like it might be worth it. Happy to add an extra type to avoid those few lines. - I also removed the concept of `PeerRequestId` which sometimes went alongside a `request_id`. There were times were had a `PeerRequest` and a `Request` being returned, both of which contain a `RequestId` so we had redundant information. I've simplified the logic by removing `PeerRequestId` and made a `ResponseId`. I think if you look at the code changes, it simplifies things a bit and removes the redundant extra info. I think with this PR things are a little bit easier to reasonable about what is going on with all these RPC Ids. NOTE: I did this with the help of AI, so probably should be checked	2025-04-03 10:10:15 +00:00
Michael Sproul	bde0f1ef0b	Merge remote-tracking branch 'origin/release-v7.0.0' into unstable	2025-03-29 13:01:58 +11:00
Pawan Dhananjay	54aef2d043	Admin add/remove peer (#7198 ) N/A Adds endpoints to add and remove trusted peers from the http api. The added peers are trusted peers so they won't be disconnected for bad scores. We try to maintain a connection to the peer in case they disconnect from us by trying to dial it every heartbeat.	2025-03-28 12:59:09 +00:00
Lion - dapplion	6f31d44343	Remove CGC from data_availability checker (#7033 ) - Part of https://github.com/sigp/lighthouse/issues/6767 Validator custody makes the CGC and set of sampling columns dynamic. Right now this information is stored twice: - in the data availability checker - in the network globals If that state becomes dynamic we must make sure it is in sync updating it twice, or guarding it behind a mutex. However, I noted that we don't really have to keep the CGC inside the data availability checker. All consumers can actually read it from the network globals, and we can update `make_available` to read the expected count of data columns from the block.	2025-03-26 05:19:51 +00:00
João Oliveira	c095a0a58f	update gossipsub to the latest upstream revision (#7130 ) I feel it's preferable to do this explicitly by updating the revision on `Cargo.toml` rather than implicitly by letting `Cargo.lock` control the revision of the branch.	2025-03-17 00:22:19 +00:00
Daniel Knopik	574b204bdb	decouple `eth2` from `store` and `lighthouse_network` (#6680 ) - #6452 (partially) Remove dependencies on `store` and `lighthouse_network` from `eth2`. This was achieved as follows: - depend on `enr` and `multiaddr` directly instead of using `lighthouse_network`'s reexports. - make `lighthouse_network` responsible for converting between API and internal types. - in two cases, remove complex internal types and use the generic `serde_json::Value` instead - this is not ideal, but should be fine for now, as this affects two internal non-spec endpoints which are meant for debugging, unstable, and subject to change without notice anyway. Inspired by #6679. The alternative is to move all relevant types to `eth2` or `types` instead - what do you think?	2025-03-14 16:44:48 +00:00
ThreeHrSleep	d60c24ef1c	Integrate tracing (#6339 ) Tracing Integration - [reference](`5bbf1859e9/projects/project-ideas.md (L297)`) - [x] replace slog & log with tracing throughout the codebase - [x] implement custom crit log - [x] make relevant changes in the formatter - [x] replace sloggers - [x] re-write SSE logging components cc: @macladson @eserilev	2025-03-12 22:31:05 +00:00
João Oliveira	f23f984f85	switch to upstream gossipsub (#7057 ) We forked `gossipsub` into the lighthouse repo sometime ago so that we could iterate quicker on implementing back pressure and IDONTWANT. Meanwhile we have pushed all our changes upstream and we are now the main maintainers of `rust-libp2p` this allows us to use upstream `gossipsub` again. Nonetheless we still use our forked repo to give us freedom to experiment with features before submitting them upstream	2025-03-11 12:59:16 +00:00
Paul Hauner	8d1abce26e	Bump SSZ version for larger bitfield `SmallVec` (#6915 ) NA Bumps the `ethereum_ssz` version, along with other crates that share the dep. Primarily, this give us bitfields which can store 128 bytes on the stack before allocating, rather than 32 bytes (https://github.com/sigp/ethereum_ssz/pull/38). The validator count has increase massively since we set it at 32 bytes, so aggregation bitfields (et al) now require a heap allocation. This new value of 128 should get us to ~2m active validators.	2025-03-10 08:18:33 +00:00
Michael Sproul	e5e43ecd81	Merge remote-tracking branch 'origin/release-v7.0.0' into unstable	2025-02-24 13:59:40 +11:00
Pawan Dhananjay	b3b6aea1c5	Rust 1.85 lints (#7019 ) N/A 2 changes: 1. Replace Option::map_or(true, ...) with is_none_or(...) 2. Remove unnecessary `Into::into` blocks where the type conversion is apparent from the types	2025-02-24 02:36:13 +00:00
João Oliveira	193061ff73	Use RpcSend on RPC::self_limiter::ready_requests (#6634 ) We don't need to store `BehaviourAction` for `ready_requests` and therefore avoid having an `unreachable!` on #6625. Therefore this PR should be merged before it	2025-02-19 21:17:19 +00:00
Lion - dapplion	0055af56b6	Unsubscribe blob topics at Fulu fork (#6932 ) Addresses #6854. PeerDAS requires unsubscribing a Gossip topic at a fork boundary. This is not possible with our current topic machinery. Instead of defining which topics have to be added at a given fork, we define the complete set of topics at a given fork. The new start of the show and key function is: ```rust pub fn core_topics_to_subscribe<E: EthSpec>( fork_name: ForkName, opts: &TopicConfig, spec: &ChainSpec, ) -> Vec<GossipKind> { // ... if fork_name.deneb_enabled() && !fork_name.fulu_enabled() { // All of deneb blob topics are core topics for i in 0..spec.blob_sidecar_subnet_count(fork_name) { topics.push(GossipKind::BlobSidecar(i)); } } // ... } ``` `core_topics_to_subscribe` only returns the blob topics if `fork < Fulu`. Then at the fork boundary, we subscribe with the new fork digest to `core_topics_to_subscribe(next_fork)`, which excludes the blob topics. I added `is_fork_non_core_topic` to carry on to the next fork the aggregator topics for attestations and sync committee messages. This approach is future-proof if those topics ever become fork-dependent. Closes https://github.com/sigp/lighthouse/issues/6854	2025-02-11 23:40:14 +00:00
Lion - dapplion	d60388134d	Add PeerDAS metrics to track subnets without peers (#6928 ) Currently we track a key metric `PEERS_PER_COLUMN_SUBNET` in a getter `good_peers_on_sampling_subnets`. Another PR https://github.com/sigp/lighthouse/pull/6922 deletes that function, so we have to move the metric anyway. This PR moves that metric computation to the metrics spawned task which is refreshed every 5 seconds. I also added a few more useful metrics. The total set and intended usage is: - `sync_peers_per_column_subnet`: Track health of overall subnets in your node - `sync_peers_per_custody_column_subnet`: Track health of the subnets your node needs. We should track this metric closely in our dashboards with a heatmap and bar plot - ~~`sync_column_subnets_with_zero_peers`: Is equivalent to the Grafana query `count(sync_peers_per_column_subnet == 0) by (instance)`. We may prefer to skip it, but I believe it's the most important metric as if `sync_column_subnets_with_zero_peers > 0` your node stalls.~~ - ~~`sync_custody_column_subnets_with_zero_peers`: `count(sync_peers_per_custody_column_subnet == 0) by (instance)`~~	2025-02-11 06:56:38 +00:00
Lion - dapplion	d5a03c9d86	Add more range sync tests (#6872 ) Currently we have very poor coverage of range sync with unit tests. With the event driven test framework we could cover much more ground and be confident when modifying the code. Add two basic cases: - Happy path, complete a finalized sync for 2 epochs - Post-PeerDAS case where we start without enough custody peers and later we find enough ⚠️ If you have ideas for more test cases, please let me know! I'll write them	2025-02-10 07:55:22 +00:00
Age Manning	62a0f25f97	IPv6 By Default (#6808 )	2025-02-10 01:58:11 +00:00
Lion - dapplion	f35213ebe7	Sync active request byrange ids logs (#6914 ) - Re-opened PR from https://github.com/sigp/lighthouse/pull/6869 Writing and running tests I noted that the sync RPC requests are very verbose now. `DataColumnsByRootRequestId { id: 123, requester: Custody(CustodyId { requester: CustodyRequester(SingleLookupReqId { req_id: 121, lookup_id: 101 }) }) }` Since this Id is logged rather often I believe there's value in 1. Making them more succinct for log verbosity 2. Make them a string that's easy to copy and work with elastic Write custom `Display` implementations to render Ids in a more DX format _ DataColumnsByRootRequestId with a block lookup_ ``` 123/Custody/121/Lookup/101 ``` _DataColumnsByRangeRequestId_ ``` 123/122/RangeSync/0/5492900659401505034 ``` - This one will be shorter after https://github.com/sigp/lighthouse/pull/6868 Also made the logs format and text consistent across all methods	2025-02-10 01:27:05 +00:00

1 2 3 4 5 ...

468 Commits