Commit Graph

485 Commits

Author SHA1 Message Date
Eitan Seri-Levi
33e21634cb Custody backfill sync (#7907)
#7603


  #### Custody backfill sync service
Similar in many ways to the current backfill service. There may be ways to unify the two services. The difficulty there is that the current backfill service tightly couples blocks and their associated blobs/data columns. Any attempts to unify the two services should be left to a separate PR in my opinion.

#### `SyncNetworkContext`
`SyncNetworkContext` manages custody-sync data-columns-by-range requests separately from other sync RPC requests. I think this is a nice separation, considering that custody backfill is its own service.

#### Data column import logic
The import logic verifies KZG commitments and that the data column's block root matches the block root in the node's store before importing columns.
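Schematically, the import check looks something like this (a minimal sketch with stand-in types and a hypothetical `verify_kzg_commitments` helper, not the actual Lighthouse code):

```rust
// Stand-in type; the real sidecar carries the column data and KZG proofs.
struct DataColumnSidecar {
    block_root: [u8; 32],
}

enum ImportError {
    KzgVerificationFailed,
    BlockRootMismatch,
}

// Hypothetical helper standing in for the real batch KZG verifier.
fn verify_kzg_commitments(_column: &DataColumnSidecar) -> bool {
    true
}

/// Check a column before persisting it: the KZG commitments must verify and
/// the column's block root must match the block root already in the store.
fn check_column_for_import(
    column: &DataColumnSidecar,
    stored_block_root: [u8; 32],
) -> Result<(), ImportError> {
    if !verify_kzg_commitments(column) {
        return Err(ImportError::KzgVerificationFailed);
    }
    if column.block_root != stored_block_root {
        return Err(ImportError::BlockRootMismatch);
    }
    Ok(())
}
```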

#### New channel to send messages to `SyncManager`
Now external services can communicate with the `SyncManager`. In this PR, this channel is used to trigger a custody sync. Alternatively, we may be able to use the existing `mpsc` channel that the `SyncNetworkContext` uses to communicate with the `SyncManager`. I will spend some time reviewing this.
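Roughly, the new channel wiring looks like this (a sketch with a hypothetical message variant, using `tokio::sync::mpsc` like the existing sync channels):

```rust
use tokio::sync::mpsc;

// Hypothetical message type for external services -> SyncManager.
enum SyncMessage {
    StartCustodyBackfill { custody_group_count: u64 },
}

/// Spawn a SyncManager-like task and hand back the sender that external
/// services can use to reach it.
fn spawn_sync_manager() -> mpsc::UnboundedSender<SyncMessage> {
    let (tx, mut rx) = mpsc::unbounded_channel();
    tokio::spawn(async move {
        while let Some(msg) = rx.recv().await {
            match msg {
                SyncMessage::StartCustodyBackfill { custody_group_count } => {
                    // Trigger the custody backfill sync service here.
                    let _ = custody_group_count;
                }
            }
        }
    });
    tx
}
```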


Co-Authored-By: Eitan Seri-Levi <eserilev@ucsc.edu>

Co-Authored-By: Eitan Seri-Levi <eserilev@gmail.com>

Co-Authored-By: dapplion <35266934+dapplion@users.noreply.github.com>
2025-10-22 03:51:34 +00:00
Pawan Dhananjay
092aaae961 Sync cleanups (#8230)
N/A


1. In the batch retry logic, we were failing to set the batch state to `AwaitingDownload` before attempting a retry. This PR sets it to `AwaitingDownload` before the retry and sets it back to `Downloading` if the retry succeeded in sending out a request (see the sketch after this list)
2. Remove all peer scoring logic from retrying and rely on just de-prioritizing the failed peer. I finally concede the point to @dapplion 😄
3. Changes `block_components_by_range_request` to accept `block_peers` and `column_peers`. This is to ensure that we use the full synced peer set for requesting columns, in order to avoid splitting the column peers among multiple head chains. During forward sync, we want the block peers to be the peers from the syncing chain and the column peers to be all synced peers from the peerdb.

Also fixes a typo and calls `attempt_send_awaiting_download_batches` from more places
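For point 1, the retry transition is roughly the following (a simplified state machine with a hypothetical `send_request` closure, not the actual batch code):

```rust
// Simplified batch state machine for the retry path described in point 1.
enum BatchState {
    AwaitingDownload,
    Downloading { request_id: u64 },
    // ...other states elided
}

fn retry_batch(state: &mut BatchState, send_request: impl FnOnce() -> Option<u64>) {
    // Reset to `AwaitingDownload` *before* attempting the retry...
    *state = BatchState::AwaitingDownload;
    // ...and only move back to `Downloading` if a request was actually sent.
    if let Some(request_id) = send_request() {
        *state = BatchState::Downloading { request_id };
    }
}
```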


Co-Authored-By: Pawan Dhananjay <pawandhananjay@gmail.com>
2025-10-20 11:50:00 +00:00
Jimmy Chen
79b33214ea Only send data column subnet discovery requests after PeerDAS is scheduled (#8109)
#8105 (to be confirmed)

I noticed a large number of failed discovery requests after deploying the latest `unstable` to some of our testnet and mainnet nodes. This is because of a recent PeerDAS change that attempts to maintain sufficient peers across data column subnets - this shouldn't be enabled on networks without PeerDAS scheduled, otherwise the node will keep retrying discovery on these subnets and never succeed.
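The gating is conceptually simple (a sketch with stand-in types; that "PeerDAS scheduled" means a Fulu fork epoch is set is an assumption about the actual implementation):

```rust
// Stand-in for the relevant part of the chain spec.
struct ChainSpec {
    fulu_fork_epoch: Option<u64>,
}

impl ChainSpec {
    fn is_peer_das_scheduled(&self) -> bool {
        self.fulu_fork_epoch.is_some()
    }
}

fn maybe_discover_column_subnet_peers(spec: &ChainSpec) {
    // Without PeerDAS scheduled, column subnet discovery can never
    // succeed, so skip it entirely instead of retrying forever.
    if !spec.is_peer_das_scheduled() {
        return;
    }
    // ...issue data column subnet discovery queries here...
}
```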

Also removed some unused files.

Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>

Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>
2025-09-25 02:52:07 +00:00
Toki
5928407ce4 fix(rate_limiter): add missing prune calls for light client protocols (#8058)
Co-Authored-By: Jimmy Chen <jimmy@sigmaprime.io>

Co-Authored-By: gitToki <tokipro@proton.me>
2025-09-17 04:51:43 +00:00
Daniel Knopik
ee1b6bc81b Create network_utils crate (#7761)
Anchor currently depends on `lighthouse_network` for a few types and utilities that live within it. As we use our own libp2p behaviours, we don't actually use the core logic in that crate. This makes us transitively depend on a bunch of unneeded crates (even a whole separate libp2p if the versions mismatch!)


Move the things we require into their own lightweight crate.


Co-Authored-By: Daniel Knopik <daniel@dknopik.de>
2025-09-10 12:59:24 +00:00
Jimmy Chen
677de70025 Fix incorrect prune test logic (#7999)
I just noticed that one of the tests I added in #7915 is incorrect, after it ran flakily for a while.
This PR fixes the scenario and ensures the outcome will always be the same.
2025-09-04 19:53:38 +00:00
Jimmy Chen
c2a92f1a8c Maintain peers across all data column subnets (#7915)
Closes:
- #7865
- #7855

Changes extracted from earlier PR #7876

This PR fixes two main things with a few other improvements mentioned below:
- Prevent Lighthouse from repeatedly sending `DataColumnByRoot` requests to an unsynced peer, causing lookup sync to get stuck
- Allow Lighthouse to send discovery requests if there aren't enough **synced** peers in the required sampling subnets - this fixes the stuck sync scenario where there aren't enough usable peers in a sampling subnet but no discovery is attempted.


- Make peer discovery queries if the custody subnet peer count drops below the minimum threshold
- Update peer pruning logic to prioritise uniform distribution across all data column subnets and avoid pruning sampling peers if the count is below the target threshold (2) - see the sketch after this list
- Check sync status when making discovery requests, to make sure we don't ignore requests if there aren't enough synced peers in the required sampling subnets
- Optimise some of the `PeerDB` functions checking custody peers
- Only send lookup requests to peers that are synced or advanced
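A sketch of the pruning guard mentioned above (stand-in types; the real logic also balances distribution across subnets):

```rust
use std::collections::HashMap;

type PeerId = u64;   // stand-in
type SubnetId = u64; // stand-in

const TARGET_PEERS_PER_COLUMN_SUBNET: usize = 2;

/// Never prune a peer out of a data column subnet that is already at or
/// below the target threshold.
fn prunable_peers(
    candidates: Vec<PeerId>,
    subnet_peers: &HashMap<SubnetId, Vec<PeerId>>,
) -> Vec<PeerId> {
    candidates
        .into_iter()
        .filter(|peer| {
            !subnet_peers.values().any(|peers| {
                peers.contains(peer) && peers.len() <= TARGET_PEERS_PER_COLUMN_SUBNET
            })
        })
        .collect()
}
```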
2025-09-04 05:36:20 +00:00
Akihito Nakano
7b5be8b1e7 Remove ttfb_timeout and resp_timeout (#7925)
`TTFB_TIMEOUT` was deprecated in https://github.com/ethereum/consensus-specs/pull/3767.


  Remove `ttfb_timeout` from `InboundUpgrade` and other related structs.

(Update)
Also removed `resp_timeout` and also removed the `NetworkParams` struct since its fields are no longer used. https://github.com/sigp/lighthouse/pull/7925#issuecomment-3226886352
2025-09-03 02:00:15 +00:00
Michael Sproul
d235f2c697 Delete RuntimeVariableList::from_vec (#7930)
This method is a footgun because it truncates the list. It is the source of a recent bug:

- https://github.com/sigp/lighthouse/pull/7927


- Delete uses of `RuntimeVariableList::from_vec` and replace them with `::new`, which does validation and can fail (see the sketch after this list).
- Propagate errors where possible, unwrap in tests, and use `expect` for obviously-safe uses (in `chain_spec.rs`).
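To illustrate the difference (simplified, non-generic versions of both methods; the real type is generic and stores its limit differently):

```rust
struct RuntimeVariableList<T> {
    items: Vec<T>,
    max_len: usize,
}

impl<T> RuntimeVariableList<T> {
    // The removed footgun: silently truncates to `max_len`.
    fn from_vec(mut items: Vec<T>, max_len: usize) -> Self {
        items.truncate(max_len);
        Self { items, max_len }
    }

    // The replacement: validates and can fail instead of dropping data.
    fn new(items: Vec<T>, max_len: usize) -> Result<Self, &'static str> {
        if items.len() > max_len {
            return Err("list exceeds maximum length");
        }
        Ok(Self { items, max_len })
    }
}
```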
2025-08-27 06:52:14 +00:00
Mac L
e438691683 Add Gloas boilerplate (#7728)
Adds the required boilerplate code for the Gloas (Glamsterdam) hard fork. This allows PRs testing Gloas-candidate features to test fork transition.

This also includes de-duplication of post-Bellatrix readiness notifiers from #6797 (credit to @dapplion)
2025-08-26 02:49:48 +00:00
Jimmy Chen
747d9118ff Fix DataColumnsByRoot request limit validation bug (#7928)
Fixes #7926

This was a bug I introduced in #7890; @pawanjay176 noticed it on some running nodes and added an RPC test to confirm it.

The culprit is this line, where I failed to fill the vec to its max size, so the max size isn't calculated properly, resulting in all `DataColumnByRoot` requests exceeding the max size during validation:
d24a6d2a45/consensus/types/src/chain_spec.rs (L1984)

The PR fixes this and includes new regression tests for this fix.
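In other words, the validation limit has to be computed from a request filled to its maximum length (an illustrative sketch with a made-up identifier size, not the actual SSZ size calculation):

```rust
// Stand-in for the maximum SSZ length of one DataColumnsByRootIdentifier.
const IDENTIFIER_MAX_SSZ_LEN: usize = 64;

/// The validation limit must be derived from a request filled to its maximum
/// length. Measuring a partially filled vec (the original bug) produces a
/// limit smaller than real requests, so every request fails validation.
fn max_data_columns_by_root_request_len(max_request_ids: usize) -> usize {
    let full_request = vec![IDENTIFIER_MAX_SSZ_LEN; max_request_ids];
    full_request.iter().sum()
}
```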
2025-08-25 04:13:36 +00:00
João Oliveira
884f30094a use DEFAULT_TARGET_PEERS for target peers everywhere (#7916)
Was going to leave this as a comment on #7877, but then noticed it had already been merged.
We have `DEFAULT_TARGET_PEERS`, which was set to 50 and only used in the `Default` impl for `peer_manager`'s `Config`, which then gets overridden by this `lighthouse_network::Config` default.
This PR unifies everything on `DEFAULT_TARGET_PEERS`
2025-08-22 00:24:24 +00:00
Jimmy Chen
d24a6d2a45 Prioritise StatusV2 over StatusV1 RPC protocol (#7912)
Prioritise `StatusV2` over `StatusV1` RPC protocol.

A bug discovered during devnet-4 testing and extracted from the sync fixes PR #7876.
2025-08-21 23:02:18 +00:00
João Oliveira
cee30d8ca5 Update lighthouse to the latest upstream libp2p and gossipsub (#7828) 2025-08-21 07:57:46 +00:00
Age Manning
c9ffdf7f71 Re-assess Lighthouse's peer count for Fusaka (#7877) 2025-08-21 06:12:53 +00:00
Jimmy Chen
b4704eab4a Fulu update to spec v1.6.0-alpha.4 (#7890)
Fulu update to spec [v1.6.0-alpha.4](https://github.com/ethereum/consensus-specs/releases/tag/v1.6.0-alpha.4).
- Make `number_of_columns` a preset
- Optimise `get_custody_groups` to avoid computing if cgc = 128
- Add support for additional typenum values in type_dispatch macro
2025-08-20 02:05:04 +00:00
Jimmy Chen
34dd1b27ae Revise data column rpc limits and queue sizes (#7887)
Revise data column rpc limits and queue sizes. Also removed some outdated TODOs for Fulu / das.
2025-08-19 03:48:08 +00:00
Age Manning
9200042910 Transition network key to hex format (#7665)
#7181


Instead of storing the network key as binary data, we store it as hex, allowing users to modify it via the file.

We can read the old binary form; however, we will migrate binary keys to hex, as hex will be the new standard.
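A rough sketch of that load-and-migrate path (assuming the `hex` crate; the function name and exact behaviour are hypothetical):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Load the network key, accepting both the legacy binary format and the
/// new hex format; legacy keys are rewritten as hex on load.
fn load_network_key(path: &Path) -> io::Result<Vec<u8>> {
    let raw = fs::read(path)?;
    // New format: the file contents are a hex string.
    if let Ok(text) = std::str::from_utf8(&raw) {
        if let Ok(key) = hex::decode(text.trim()) {
            return Ok(key);
        }
    }
    // Legacy format: raw binary. Migrate the file to hex for next time.
    fs::write(path, hex::encode(&raw))?;
    Ok(raw)
}
```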
2025-08-15 07:12:19 +00:00
chonghe
522bd9e9c6 Update Rust Edition to 2024 (#7766)
* #7749

Thanks @dknopik and @michaelsproul for your help!
2025-08-13 03:04:31 +00:00
Pawan Dhananjay
80ba0b169b Backfill peer attribution (#7762)
Partly addresses https://github.com/sigp/lighthouse/issues/7744


Implement peer sync attribution for backfill sync, similar to #7733.
2025-08-12 02:11:56 +00:00
Jimmy Chen
40c2fd5ff4 Instrument tracing spans for block processing and import (#7816)
#7815

- removes all existing spans, so some span fields that appear in logs, like `service_name`, may be lost.
- instruments a few key code paths in the beacon node, starting from the **root spans** named below:

* Gossip block and blobs
  * `process_gossip_data_column_sidecar`
  * `process_gossip_blob`
  * `process_gossip_block`
* Rpc block and blobs
  * `process_rpc_block`
  * `process_rpc_blobs`
  * `process_rpc_custody_columns`
* Rpc blocks (range and backfill)
  * `process_chain_segment`
* `PendingComponents` lifecycle
  * `pending_components`

To test locally:
* Run Grafana and Tempo with https://github.com/sigp/lighthouse-metrics/pull/57
* Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317`

Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg

Removing the old spans seems to have reduced the memory usage quite a lot - I think we were using them on long-running tasks and too excessively:

![Memory usage drop after removing the old spans](https://github.com/user-attachments/assets/5208bbe4-53b2-4ead-bc71-0b782c788669)
2025-08-08 05:32:22 +00:00
Jimmy Chen
8bc6693dac Fix wrong columns getting processed on a CGC change (#7792)
This PR fixes a bug where wrong columns could get processed immediately after a CGC increase.

Scenario:
- The node's CGC increased due to additional validators attaching to it (let's say from 10 to 11).
- The new CGC is advertised and the new subnets are subscribed to immediately; however, the change won't be effective in the data availability check until the next epoch (see [this](ab0e8870b4/beacon_node/beacon_chain/src/validator_custody.rs (L93-L99))). The data availability checker still only requires 10 columns for the current epoch.
- During this time, data columns for the additional custody column (let's say column 11) may arrive via gossip, as we're already subscribed to the topic. Such a column may be incorrectly used to satisfy the existing data availability requirement (10 columns), so this additional column (instead of a required one) gets persisted, resulting in database inconsistency.
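A sketch of the epoch-gating described above (a hypothetical structure, not the actual `CustodyContext`):

```rust
// The advertised CGC increases immediately, but the value used by the data
// availability checker only changes at the epoch where it takes effect.
struct CustodyContext {
    current_cgc: u64,
    // (new_cgc, effective_from_epoch), set when validators are added.
    pending_cgc: Option<(u64, u64)>,
}

impl CustodyContext {
    /// The CGC the data availability checker should use at `epoch`.
    fn cgc_at_epoch(&self, epoch: u64) -> u64 {
        match self.pending_cgc {
            Some((new_cgc, from_epoch)) if epoch >= from_epoch => new_cgc,
            _ => self.current_cgc,
        }
    }
}
```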
2025-08-07 00:45:04 +00:00
Daniel Ramirez-Chiquillo
9c972201bc Fix: RPC test failures (#7734)
Fixes #7735


Use `tracing::subscriber::set_default` to ensure that each test/thread has its own subscriber.
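For example (a minimal sketch using `tracing` / `tracing-subscriber`; the scoped default is dropped with the guard at the end of the test):

```rust
#[test]
fn my_rpc_test() {
    // Scope the subscriber to this test/thread instead of installing a
    // global default, so parallel tests don't fight over it.
    let subscriber = tracing_subscriber::fmt().with_test_writer().finish();
    let _guard = tracing::subscriber::set_default(subscriber);

    // ...test body runs with its own subscriber until `_guard` drops...
}
```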
2025-08-06 14:59:41 +00:00
Michael Sproul
0dcce40ccb Fix Clippy for Rust 1.90 beta (#7826)
Fix Clippy for recently released Rust 1.90 beta. There may be more changes required when Rust 1.89 stable is released in a few days, but possibly not 🤞
2025-08-05 13:52:26 +00:00
Jimmy Chen
4daa015971 Remove peer sampling code (#7768)
Peer sampling has been completely removed from the spec. This PR removes our partial implementation from the codebase.
https://github.com/ethereum/consensus-specs/pull/4393
2025-07-23 03:24:45 +00:00
Pawan Dhananjay
3f06e5dfba Fix enr loading from disk with cgc (#7754)
N/A


While building an ENR on startup, we weren't using the value in the custody context.
This resulted in the ENR value getting updated when the CGC updated and the change getting persisted, but then being set back to the default on restart.
This PR takes the value explicitly from the custody context.
2025-07-18 04:51:11 +00:00
Pawan Dhananjay
90ff64381e Sync peer attribution (#7733)
Closes #7604


  Improvements to range sync including:

1. Contain column requests only to peers that are part of the `SyncingChain`
2. Attribute the fault to the correct peer and downscore them if they don't return the data columns for the request
3. Improve sync performance by retrying only the failed columns from other peers, instead of failing the entire batch
4. Use the `earliest_available_slot` to make requests to peers that claim to have the epoch (see the sketch after this list). Note: if no `earliest_available_slot` info is available, fall back to the previous logic, i.e. assume the peer has everything backfilled up to the WS checkpoint/DA boundary
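A sketch of the check in point 4 (stand-in types; assuming `earliest_available_slot` is the field added by StatusV2):

```rust
// Stand-in for the peer info tracked by sync.
struct PeerInfo {
    earliest_available_slot: Option<u64>,
}

/// Only request a range from peers that claim to have it. Peers without the
/// field are assumed to have everything back to the WS checkpoint / DA
/// boundary (the previous behaviour).
fn peer_can_serve_from(peer: &PeerInfo, start_slot: u64) -> bool {
    match peer.earliest_available_slot {
        Some(earliest) => earliest <= start_slot,
        None => true, // fall back to the previous assumption
    }
}
```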

Tested this on fusaka-devnet-2 with a full node and a supernode, and the recovery logic seems to work well.
Also tested this a little on mainnet.

Need to do more testing and possibly add some unit tests.
2025-07-12 00:02:30 +00:00
ethDreamer
b43e0b446c Final changes for fusaka-devnet-2 (#7655)
Closes #7467.

This PR primarily addresses [the P2P changes](https://github.com/ethereum/EIPs/pull/9840) in [fusaka-devnet-2](https://fusaka-devnet-2.ethpandaops.io/). Specifically:

* [the new `nfd` parameter added to the `ENR`](https://github.com/ethereum/EIPs/pull/9840)
* [the modified `compute_fork_digest()` changes for every BPO fork](https://github.com/ethereum/EIPs/pull/9840)

90% of this PR was hacked together as fast as possible during Berlinterop while running between Glamsterdam debates. Luckily, it seems to work. But I was unable to be as careful in avoiding bugs as I usually am. I've cleaned up the things *I remember* wanting to come back and have a closer look at, but I'm still working on this.

Progress:
* [x] get it working on `fusaka-devnet-2`
* [ ] [*optional* disconnect from peers with incorrect `nfd` at the fork boundary](https://github.com/ethereum/consensus-specs/pull/4407) - Can be addressed in a future PR if necessary
* [x] first pass clean-up
* [x] fix up all the broken tests
* [x] final self-review
* [x] more thorough review from people more familiar with affected code
2025-07-10 21:32:58 +00:00
João Oliveira
8b5ccacac9 Error from RPC send_response when request doesn't exist on the active inbound requests (#7663)
Lighthouse is currently logging a lot of errors in the `RPC` behaviour whenever a response is received for a request_id that no longer exists in `active_inbound_requests`. This is likely due to a data race or timing issue (e.g., the peer disconnecting before the response is handled).
This PR addresses that by removing the error logging from the RPC layer. Instead, `RPC::send_response` now simply returns an `Err`, shifting the responsibility to the main service. The main service can then determine whether the peer is still connected and only log an error if the peer remains connected.
Thanks @ackintosh for helping debug!
2025-07-09 14:26:51 +00:00
Jimmy Chen
41742ce2bd Update SAMPLES_PER_SLOT to be number of custody groups instead of data columns (#7683)
Update `SAMPLES_PER_SLOT` to be number of custody groups instead of data columns. This should have no impact on the current implementation as config currently maintains a `group:subnet:column` ratio of `1:1:1`.  **In short, this PR doesn't change anything for Fusaka, but ensures compliance with the spec and potential future changes.**

I've added separate methods to compute sampling columns and custody groups for clarity: `spec.sampling_size_columns` and `spec.sampling_size_custody_groups`

See the clarifications in this PR for more details: https://github.com/ethereum/consensus-specs/pull/4251
2025-07-02 00:08:40 +00:00
Jimmy Chen
522e00f48d Fix incorrect waker update condition (#7656)
This bug was first found and partially fixed by @VolodymyrBg in #7317 - this PR applies the same fix everywhere else.

The old logic updated the waker when it already matched the context, and did nothing when it was stale:

```rust
if waker.will_wake(cx.waker()) {
    self.waker = Some(cx.waker().clone());
}
```

This is the wrong way around. We only want to update the waker if it doesn't match the current context:

```rust
if !waker.will_wake(cx.waker()) {
    self.waker = Some(cx.waker().clone());
}
```

I don't think we've ever noticed any issues, but it’s a subtle bug that could lead to missed wakeups.
2025-06-27 19:01:46 +00:00
Jimmy Chen
83cad25d98 Fix Rust 1.88 clippy errors & execution engine tests (#7657)
Fix Rust 1.88 clippy errors.
2025-06-27 18:21:17 +00:00
Eitan Seri-Levi
6786b9d12a Single attestation "Full" implementation (#7444)
#6970


This allows us to receive `SingleAttestation` over gossip and process it without converting. There is still a conversion to `Attestation` as a final step in the attestation verification process, but by then the `SingleAttestation` is fully verified.

I've also fully removed the `submitPoolAttestationsV1` endpoint, as it's been deprecated

I've also pre-emptively deprecated supporting `Attestation` in `submitPoolAttestationsV2` endpoint. See here for more info: https://github.com/ethereum/beacon-APIs/pull/531

I tried to minimize the diff here by only making the "required" changes. There are some unnecessary complexities in the way we manage the different attestation verification wrapper types. We could probably consolidate these into one wrapper type and refactor even further. We could leave that to a separate PR if we feel like cleaning things up in the future.

Note that I've also updated the test harness to always submit `SingleAttestation` regardless of fork variant. I don't see a problem in that approach and it allows us to delete more code :)
2025-06-17 09:01:26 +00:00
Jimmy Chen
3d2d65bf8d Advertise --advertise-false-custody-group-count for testing PeerDAS (#7593)
#6973
2025-06-16 11:10:28 +00:00
Pawan Dhananjay
9803d69d80 Implement status v2 version (#7590)
N/A


  Implements status v2 as defined in https://github.com/ethereum/consensus-specs/pull/4374/
2025-06-12 07:17:06 +00:00
Pawan Dhananjay
5f208bb858 Implement basic validator custody framework (no backfill) (#7578)
Resolves #6767


  This PR implements a basic version of validator custody.
- It introduces a new `CustodyContext` object which contains info regarding the number of validators attached to a node and the custody count they contribute to the CGC.
- The `CustodyContext` is added to the da_checker and has methods for returning the current CGC and the number of columns to sample at head. Note that the logic for returning the CGC previously existed in the network globals.
- To estimate the number of validators attached, we use the `beacon_committee_subscriptions` endpoint. This might overestimate the number of validators actually publishing attestations from the node in the case of multi-BN setups. We could also potentially use the `publish_attestations` endpoint to get a more conservative estimate at a later point.
- Any time there's a change in the `custody_group_count` due to the addition/removal of validators, the custody context sends an event on a broadcast channel (see the sketch after this list). The only subscriber for the channel exists in the network service, which simply subscribes to more subnets. There can be additional subscribers in sync that will start a backfill once the CGC changes.
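A sketch of that broadcast pattern (a hypothetical event type, using `tokio::sync::broadcast`):

```rust
use tokio::sync::broadcast;

// Hypothetical event emitted when the custody group count changes.
#[derive(Clone, Debug)]
struct CgcChanged {
    new_cgc: u64,
}

// The custody context would hold the `Sender` and fire on validator changes.
fn cgc_event_channel() -> (broadcast::Sender<CgcChanged>, broadcast::Receiver<CgcChanged>) {
    broadcast::channel(16)
}

// The network service subscriber: reacts by subscribing to more subnets;
// a sync subscriber could start a backfill on the same event.
async fn network_service_listener(mut rx: broadcast::Receiver<CgcChanged>) {
    while let Ok(event) = rx.recv().await {
        let _ = event.new_cgc;
        // ...subscribe to the newly required subnets here...
    }
}
```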

TODO

- [ ] **NOT REQUIRED:** Currently, the logic only handles an increase in validator count and does not handle a decrease. We should ideally unsubscribe from subnets when the cgc has decreased.
- [ ] **NOT REQUIRED:** Add a service in the `CustodyContext` that emits an event once `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS` passes after updating the current cgc. This event should be picked up by a subscriber which updates the ENR and metadata.
- [x] Add more tests
2025-06-11 18:10:06 +00:00
ethDreamer
ae30480926 Implement EIP-7892 BPO hardforks (#7521)
[EIP-7892: Blob Parameter Only Hardforks](https://eips.ethereum.org/EIPS/eip-7892)

#7467
2025-06-02 06:54:42 +00:00
Akihito Nakano
5cda6a6f9e Mitigate flakiness in test_delayed_rpc_response (#7522)
https://github.com/sigp/lighthouse/issues/7466


  Expanded the margin from 100ms to 500ms.
2025-05-29 01:37:04 +00:00
Akihito Nakano
8989ef8fb1 Enable arithmetic lint in rate-limiter (#7025)
https://github.com/sigp/lighthouse/issues/6875


- Enabled the linter in the rate-limiter and fixed the errors.
- Changed the type of `Quota::max_tokens` from `u64` to `NonZeroU64`, because `max_tokens` cannot be zero (see the sketch after this list).
- Added a test to ensure that a large value for `tokens`, which would cause an overflow, is handled properly.
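A sketch of the `NonZeroU64` quota and the overflow-safe arithmetic (simplified; field and method names here are assumptions, not the actual rate-limiter API):

```rust
use std::num::NonZeroU64;

// `max_tokens` cannot be zero, so encode that invariant in the type.
struct Quota {
    max_tokens: NonZeroU64,
    replenish_all_every_millis: u64,
}

impl Quota {
    /// Milliseconds until `tokens` tokens are regenerated, using checked
    /// arithmetic so a huge `tokens` value saturates instead of overflowing.
    fn millis_until_available(&self, tokens: u64) -> u64 {
        tokens
            .checked_mul(self.replenish_all_every_millis)
            .map(|total| total / self.max_tokens.get())
            .unwrap_or(u64::MAX)
    }
}
```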
2025-05-27 15:43:22 +00:00
Akihito Nakano
a2797d4bbd Fix formatting errors from cargo-sort (#7512)
[cargo-sort is currently failing on CI](https://github.com/sigp/lighthouse/actions/runs/15198128212/job/42746931918?pr=7025), likely due to new checks introduced in version [2.0.0](https://github.com/DevinR528/cargo-sort/releases/tag/v2.0.0).


  Fixed the errors by running cargo-sort with formatting enabled.
2025-05-23 05:25:56 +00:00
Akihito Nakano
cf0f959855 Improve log readability during rpc_tests (#7180)
It is unclear from the logs during rpc_tests whether the output comes from the sender or the receiver.

```
2025-03-20T11:21:50.038868Z DEBUG rpc_tests: Sending message 2
2025-03-20T11:21:50.041129Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:50.041242Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:51.040837Z DEBUG rpc_tests: Sending message 3
2025-03-20T11:21:51.043635Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:51.043855Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:52.043427Z DEBUG rpc_tests: Sending message 4
2025-03-20T11:21:52.052831Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:52.052953Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:53.045589Z DEBUG rpc_tests: Sending message 5
2025-03-20T11:21:53.052718Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:53.052825Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:54.049157Z DEBUG rpc_tests: Sending message 6
2025-03-20T11:21:54.058072Z DEBUG rpc_tests: Sender received a response
2025-03-20T11:21:54.058603Z DEBUG rpc_tests: Chunk received
2025-03-20T11:21:55.018822Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat
2025-03-20T11:21:55.018953Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat
2025-03-20T11:21:55.027100Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat
2025-03-20T11:21:55.027199Z DEBUG Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat
```


  Added `info_span` to both the sender and receiver in each test.

```
2025-03-20T11:20:04.172699Z DEBUG Receiver: rpc_tests: Sending message 2
2025-03-20T11:20:04.179147Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:04.179281Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:05.175300Z DEBUG Receiver: rpc_tests: Sending message 3
2025-03-20T11:20:05.177202Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:05.177292Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:06.176868Z DEBUG Receiver: rpc_tests: Sending message 4
2025-03-20T11:20:06.179379Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:06.179460Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:07.179257Z DEBUG Receiver: rpc_tests: Sending message 5
2025-03-20T11:20:07.181386Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:07.181503Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:08.181428Z DEBUG Receiver: rpc_tests: Sending message 6
2025-03-20T11:20:08.190231Z DEBUG Sender: rpc_tests: Sender received a response
2025-03-20T11:20:08.190358Z DEBUG Sender: rpc_tests: Chunk received
2025-03-20T11:20:09.151699Z DEBUG Sender:Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat
2025-03-20T11:20:09.151748Z DEBUG Sender:Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat
2025-03-20T11:20:09.160244Z DEBUG Receiver:Swarm::poll: libp2p_gossipsub::behaviour: Starting heartbeat
2025-03-20T11:20:09.160288Z DEBUG Receiver:Swarm::poll: libp2p_gossipsub::behaviour: Completed Heartbeat
```
2025-05-22 02:51:25 +00:00
Akihito Nakano
a8035d7395 Enable stdout logging in rpc_tests (#7506)
Currently `test_delayed_rpc_response` is flaky (possibly specific to Windows?), but I'm not sure why.


  Enabled stdout logging in rpc_tests. Note that in nextest, std output is only displayed when a test fails.
2025-05-22 02:14:48 +00:00
SunnysidedJ
593390162f peerdas-devnet-7: update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier (#7399)
Update DataColumnSidecarsByRoot request to use DataColumnsByRootIdentifier #7377


  As described in https://github.com/ethereum/consensus-specs/pull/4284
2025-05-12 00:20:55 +00:00
Lion - dapplion
a497ec601c Retry custody requests after peer metadata updates (#6975)
Closes https://github.com/sigp/lighthouse/issues/6895

We need sync to retry custody requests when a peer's CGC updates. A higher CGC can result in a data column subnet peer count increasing from 0 to 1, allowing requests to happen.


Add a new sync event, `SyncMessage::UpdatedPeerCgc`. It's sent by the router when a metadata response updates the known CGC.
2025-05-09 08:27:17 +00:00
Lion - dapplion
beb0ce68bd Make range sync peer loadbalancing PeerDAS-friendly (#6922)
- Re-opens https://github.com/sigp/lighthouse/pull/6864 targeting unstable

Range sync and backfill sync still assume that each batch request is served by a single peer. This assumption breaks with PeerDAS, where we request custody columns from N peers.

Issues with current unstable:

- Peer prioritization counts batch requests per peer. This accounting is now broken: data columns by range requests are not accounted for
- Peer selection for data columns by range ignores the set of peers on a syncing chain, and instead draws from the global pool of peers
- The implementation is very strict when we have no peers to request from. After PeerDAS this case is very common, and we want to be flexible and handle it better than just hard-failing everything.


- [x] Upstream peer prioritization to the network context, which knows exactly how many active requests a peer has (including columns by range)
- [x] Upstream peer selection to the network context; now `block_components_by_range_request` gets a set of peers to choose from instead of a single peer. If it can't find a peer, it returns the error `RpcRequestSendError::NoPeer`
- [ ] Range sync and backfill sync handle `RpcRequestSendError::NoPeer` explicitly
- [ ] Range sync: leaves the batch in `AwaitingDownload` state and does nothing. **TODO**: we should have some mechanism to fail the chain if it's stale for too long - **EDIT**: Not done in this PR
- [x] Backfill sync: pauses the sync until another peer joins - **EDIT**: Same logic as unstable

### TODOs

- [ ] Add tests :)
- [x] Manually test backfill sync

Note: this touches the mainnet path!
2025-05-07 02:03:07 +00:00
Akihito Nakano
1324d3d3c4 Delayed RPC Send Using Tokens (#5923)
closes https://github.com/sigp/lighthouse/issues/5785


  The diagram below shows the differences in how the receiver (responder) behaves before and after this PR. The following sentences will detail the changes.

```mermaid
flowchart TD

subgraph "*** After ***"
Start2([START]) --> AA[Receive request]
AA --> COND1{Is there already an active request <br> with the same protocol?}
COND1 --> |Yes| CC[Send error response]
CC --> End2([END])
%% COND1 --> |No| COND2{Request is too large?}
%% COND2 --> |Yes| CC
COND1 --> |No| DD[Process request]
DD --> EE{Rate limit reached?}
EE --> |Yes| FF[Wait until tokens are regenerated]
FF --> EE
EE --> |No| GG[Send response]
GG --> End2
end

subgraph "*** Before ***"
Start([START]) --> A[Receive request]
A --> B{Rate limit reached <br> or <br> request is too large?}
B -->|Yes| C[Send error response]
C --> End([END])
B -->|No| E[Process request]
E --> F[Send response]
F --> End
end
```

### `Is there already an active request with the same protocol?`

This check is not performed in `Before`. This is taken from the PR in the consensus-spec, which proposes updates regarding rate limiting and response timeout.
https://github.com/ethereum/consensus-specs/pull/3767/files
> The requester MUST NOT make more than two concurrent requests with the same ID.

The PR mentions the requester side. In this PR, I introduced the `ActiveRequestsLimiter` on the responder side to restrict more than two requests from running simultaneously on the same protocol per peer. If the limiter disallows a request, the responder sends a rate-limited error and penalizes the requester.
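A minimal sketch of such a limiter (stand-in types; not the actual implementation):

```rust
use std::collections::HashMap;

type PeerId = u64;            // stand-in
type Protocol = &'static str; // stand-in

// Per the quoted spec text, allow at most two concurrent requests on the
// same protocol per peer.
const MAX_CONCURRENT_REQUESTS: usize = 2;

#[derive(Default)]
struct ActiveRequestsLimiter {
    active: HashMap<(PeerId, Protocol), usize>,
}

impl ActiveRequestsLimiter {
    /// Returns false if the peer already has the maximum number of requests
    /// in flight on this protocol; the responder then sends a rate-limited
    /// error and penalizes the requester.
    fn allows(&mut self, peer: PeerId, protocol: Protocol) -> bool {
        let count = self.active.entry((peer, protocol)).or_insert(0);
        if *count >= MAX_CONCURRENT_REQUESTS {
            false
        } else {
            *count += 1;
            true
        }
    }

    fn request_finished(&mut self, peer: PeerId, protocol: Protocol) {
        if let Some(count) = self.active.get_mut(&(peer, protocol)) {
            *count = count.saturating_sub(1);
        }
    }
}
```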



### `Rate limit reached?` and `Wait until tokens are regenerated`

UPDATE: I moved the limiter logic to the behaviour side. https://github.com/sigp/lighthouse/pull/5923#issuecomment-2379535927

~~The rate limiter is shared between the behaviour and the handler.  (`Arc<Mutex<RateLimiter>>>`) The handler checks the rate limit and queues the response if the limit is reached. The behaviour handles pruning.~~

~~I considered not sharing the rate limiter between the behaviour and the handler, and performing all of these either within the behaviour or handler. However, I decided against this for the following reasons:~~

- ~~Regarding performing everything within the behaviour: The behaviour is unable to recognize the response protocol when `RPC::send_response()` is called, especially when the response is `RPCCodedResponse::Error`. Therefore, the behaviour can't rate limit responses based on the response protocol.~~
- ~~Regarding performing everything within the handler: When multiple connections are established with a peer, there could be multiple handlers interacting with that peer. Thus, we cannot enforce rate limiting per peer solely within the handler. (Any ideas? 🤔 )~~
2025-04-24 03:46:16 +00:00
Mac L
39eb8145f8 Merge branch 'release-v7.0.0' into unstable 2025-04-11 21:32:24 +10:00
Pawan Dhananjay
076f3f0984 Clarify network limits (#7175)
Resolves #6811


  Rename `GOSSIP_MAX_SIZE` to `MAX_PAYLOAD_SIZE` and remove `MAX_CHUNK_SIZE` in accordance with the spec.

The spec also "clarifies"  the message size limits at different levels. The rpc limits are equivalent to what we had before imo.
The gossip limits have additional checks.

I have gotten rid of the `is_bellatrix_enabled`  checks that used a lower limit (1mb) pre-merge. Since all networks we run start from the merge, I don't think this will break any setups.
2025-04-09 02:50:45 +00:00
Mac L
0e6da0fcaf Merge branch 'release-v7.0.0' into v7-backmerge 2025-04-04 13:32:58 +11:00
Mac L
82d1674455 Rust 1.86.0 lints (#7254)
Implement lints for the new Rust compiler version 1.86.0.
2025-04-04 02:30:22 +00:00