Previously, only supernodes contributed to data column publishing in Lighthouse.
Recently we [updated the spec](https://github.com/ethereum/consensus-specs/pull/4183) so that full nodes publish data columns as well, ensuring that all nodes contribute to propagation.
This also prevents already imported data columns from being imported again (because we don't "observe" them), and ensures columns that are observed in the [gossip seen cache](d60c24ef1c/beacon_node/beacon_chain/src/data_column_verification.rs (L492)) are forwarded to its peers, rather than being ignored.
Having merged the drop-headtracker PR we now have a DB schema change in `unstable` compared to `release-v7.0.0`:
- https://github.com/sigp/lighthouse/pull/6744
A DB downgrade is available; however, it needs to be applied manually, which is usually a bit of a hassle.
This PR bumps the version on `unstable` to `v7.1.0-beta.0` _without_ actually cutting a `v7.1.0-beta.0` release, so that we can tell at a glance which schema version a node is using.
The head tracker is a persisted piece of state that must be kept in sync with the fork-choice. It has been a source of pruning issues in the past, so we want to remove it.
- see https://github.com/sigp/lighthouse/issues/1785
When implementing tree-states in the hot DB we have to change the pruning routine (more details below) so we want to do those changes first in isolation.
- see https://github.com/sigp/lighthouse/issues/6580
- To see the full tree-states hot feature, see https://github.com/dapplion/lighthouse/pull/39
Closes https://github.com/sigp/lighthouse/issues/1785
**Current DB migration routine**
- Locate abandoned heads with head tracker
- Use a roots iterator to collect the ancestors of those heads that can be pruned
- Delete those abandoned blocks / states
- Migrate the newly finalized chain to the freezer
In summary, it computes what it has to delete and keeps the rest. Then it migrates data to the freezer. If the abandoned forks routine has a bug it can break the freezer migration.
**Proposed migration routine (this PR)**
- Migrate the newly finalized chain to the freezer
- Load all state summaries from disk
- From those, knowing only the head and finalized block, compute two sets: (1) descendants of finalized, (2) newly finalized chain
- Iterate over all summaries; if a summary belongs to neither set (1) nor set (2), delete it
This strategy is more sound as it just checks what's there in the hot DB, computes what it has to keep, and deletes the rest. Because it does not rely on any extra pieces of data, we can drop the head tracker and pruning checkpoint. Since the DB migration happens **first** now, as long as the computation of the sets to keep is correct we won't have pruning issues.
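The set computation described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `Root`, `Summary`, `descends_from` and `roots_to_prune` are hypothetical names, roots are shortened to `u64`, and a summary is assumed to carry only a link to its parent state.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for a 32-byte state root (assumption, for brevity).
pub type Root = u64;

/// Simplified hot state summary: only the link to the parent state is kept.
#[derive(Clone, Copy)]
pub struct Summary {
    pub parent: Option<Root>,
}

/// True if walking parent links from `start` reaches `ancestor`.
fn descends_from(summaries: &HashMap<Root, Summary>, start: Root, ancestor: Root) -> bool {
    let mut cursor = Some(start);
    // Bound the walk so malformed (cyclic) input cannot loop forever.
    for _ in 0..=summaries.len() {
        match cursor {
            Some(root) if root == ancestor => return true,
            Some(root) => cursor = summaries.get(&root).and_then(|s| s.parent),
            None => return false,
        }
    }
    false
}

/// Returns the roots to delete: every summary that is neither
/// (1) a descendant of `new_finalized`, nor
/// (2) on the newly finalized chain from `new_finalized` back to `old_finalized`.
pub fn roots_to_prune(
    summaries: &HashMap<Root, Summary>,
    old_finalized: Root,
    new_finalized: Root,
) -> Vec<Root> {
    // Set (2): walk back from the new finalized root to the old one.
    let mut newly_finalized: HashSet<Root> = HashSet::new();
    let mut cursor = Some(new_finalized);
    while let Some(root) = cursor {
        // Stop if the root was already visited (guards against cycles).
        if !newly_finalized.insert(root) || root == old_finalized {
            break;
        }
        cursor = summaries.get(&root).and_then(|s| s.parent);
    }

    // Everything outside sets (1) and (2) is pruned.
    let mut prune: Vec<Root> = summaries
        .keys()
        .copied()
        .filter(|root| {
            !newly_finalized.contains(root) && !descends_from(summaries, *root, new_finalized)
        })
        .collect();
    prune.sort_unstable();
    prune
}
```

The sketch only looks at what is present in the map, which mirrors the point made above: nothing outside the hot DB contents, the head and the finalized block is consulted.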
N/A
Return `state.eth1_data()` early if we have passed the transition period post-Electra. Even without the early return, the function would still return `state.eth1_data()` under the current conditions, but doing it explicitly here matches the spec. This covers setting the right `eth1_data` in our block.
The other thing we need to ensure is that the deposits returned by the `eth1_chain` are empty post-transition.
The only way we get non-empty deposits post the transition is if `state.eth1_deposit_index` in the below code is less than `min(deposit_requests_start_index, state.eth1_data().deposit_count)`.
0850bcfb89/beacon_node/beacon_chain/src/eth1_chain.rs (L543-L579)
This can never happen because `state.eth1_deposit_index` will be equal to `state.eth1_data.deposit_count` and cannot exceed that value.
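The condition above can be illustrated with a small sketch. It is an assumption-laden simplification rather than the code in `eth1_chain.rs`: `pending_deposit_range` is a hypothetical helper, the state fields are passed as plain integers, and the per-block `MAX_DEPOSITS` cap is left out.

```rust
use std::ops::Range;

/// Returns the half-open range of deposit indices a block would still need to include.
/// An empty range means no eth1 deposits are expected.
pub fn pending_deposit_range(
    eth1_deposit_index: u64,
    deposit_count: u64,
    deposit_requests_start_index: u64,
) -> Range<u64> {
    // Upper bound on eth1 deposits: min(deposit_requests_start_index, deposit_count).
    let limit = deposit_count.min(deposit_requests_start_index);
    if eth1_deposit_index < limit {
        eth1_deposit_index..limit
    } else {
        // Index has caught up with (or passed) the limit: nothing left to include.
        eth1_deposit_index..eth1_deposit_index
    }
}
```

Once `eth1_deposit_index` equals `deposit_count`, the range is empty for any value of `deposit_requests_start_index`, which is the post-transition case discussed above.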
@michaelsproul @ethDreamer please double check the logic for deposits being empty post transition. Following the logic in the spec makes my head hurt.
Partially #6989.
This PR adds the missing error log when a batch fails due to issues with converting the response into `RpcBlock`. See the above linked issue for more details.
Adding this log reveals that we're completing range requests with missing columns, hence causing the batch to fail. It looks like we've hit the case where we've received enough stream terminations, but not all columns are returned.
```
Feb 12 06:12:16.558 DEBG Failed to convert range block components into RpcBlock, error: No column for block 0xc5b6c7fa02f5ef603d45819c08c6519f1dba661fd5d44a2fc849d3e7028b6007 index 18, id: 3456/RangeSync/116/3432, service: sync, module: network::sync::network_context:488
```
I've also removed some redundant `id` logging, as the `id` debug representation is difficult to read, and is now being logged as part of `req_id` in a more succinct format (relevant PR: #6914)
From testing conducted by Sunnyside Labs, they noticed that the "expected blobs" are quite low on bandwidth constrained nodes. This observation revealed that we don't record the `beacon_blobs_from_el_expected_total` metric at all if the EL doesn't return any response. The fetch blobs function returns without recording the metric.
To fix this, I've moved the recording of `BLOBS_FROM_EL_EXPECTED_TOTAL` and `BLOBS_FROM_EL_RECEIVED_TOTAL` to as early as possible, to make the metrics more accurate.
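A rough sketch of the ordering issue, using plain atomic counters instead of the real metrics. `BlobMetrics`, `Blob` and `record_fetch` are hypothetical names, and the EL response is modelled as an `Option`, where `None` means the EL returned nothing.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for a blob returned by the EL.
pub type Blob = Vec<u8>;

/// Hypothetical counters standing in for the expected/received metrics.
#[derive(Default)]
pub struct BlobMetrics {
    pub expected_total: AtomicU64,
    pub received_total: AtomicU64,
}

/// Records both counters and returns the number of blobs received.
pub fn record_fetch(
    metrics: &BlobMetrics,
    num_expected: u64,
    el_response: Option<Vec<Option<Blob>>>,
) -> usize {
    // Record the expectation first, so it is counted even when the EL returns nothing.
    metrics.expected_total.fetch_add(num_expected, Ordering::Relaxed);

    // Count only the blobs that are actually present in the response.
    let received = match &el_response {
        Some(blobs) => blobs.iter().flatten().count(),
        None => 0,
    };
    metrics.received_total.fetch_add(received as u64, Ordering::Relaxed);
    received
}
```

With the expectation recorded before the response is inspected, a missing response shows up as "expected but not received" rather than disappearing from both counters.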
#7226
Checks whether the application is running in a terminal, or in non-interactive mode (e.g. using systemd). It will then set the value of `--log-color` to `false` when running non-interactively.
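One way to perform such a check is the standard library's `std::io::IsTerminal` trait (Rust 1.70+). The sketch below is an assumption about the shape of the logic, not the code in this PR; `resolve_log_color` and `log_color_for_stdout` are hypothetical helpers.

```rust
use std::io::IsTerminal;

/// Color is used only if it was requested and the output is an interactive terminal.
pub fn resolve_log_color(requested: bool, stdout_is_terminal: bool) -> bool {
    requested && stdout_is_terminal
}

/// Applies the rule to the real stdout of the current process.
pub fn log_color_for_stdout(requested: bool) -> bool {
    // `is_terminal` returns false when stdout is redirected, e.g. to a journal or a file.
    resolve_log_color(requested, std::io::stdout().is_terminal())
}
```

Keeping the decision in a pure function makes it testable without depending on how the test process itself is launched.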
I've been working on updating another library to the latest Lighthouse and got very confused by the RPC request Ids.
There were types that had fields called `request_id` and `id`. And interchangeably could have types `PeerRequestId`, `rpc::RequestId`, `AppRequestId`, `api_types::RequestId` or even `Request.id`.
I couldn't keep track of which Id was linked to what and what each type meant.
So this PR mainly does a few things:
- Changes the field naming to match the actual type. So any field that has an `AppRequestId` will be named `app_request_id` rather than `id` or `request_id` for example.
- I simplified the types. I removed the two different `RequestId` types (one in `lighthouse_network`, the other in the rpc) and grouped them into one. It has one downside though. I had to add a few unreachable lines of code in the beacon processor, which the extra type would prevent, but I feel like it might be worth it. Happy to add an extra type to avoid those few lines.
- I also removed the concept of `PeerRequestId` which sometimes went alongside a `request_id`. There were times where we had a `PeerRequest` and a `Request` being returned, both of which contain a `RequestId`, so we had redundant information. I've simplified the logic by removing `PeerRequestId` and made a `ResponseId`. I think if you look at the code changes, it simplifies things a bit and removes the redundant extra info.
I think with this PR it is a little bit easier to reason about what is going on with all these RPC Ids.
NOTE: I did this with the help of AI, so probably should be checked
#7153, #7146, #7147, #7148 -> Thanks to @ackintosh
This PR does the following:
1. Disable logging to file when using either `--logfile-max-number 0` or `--logfile-max-size 0`. Note that disabling the log file in this way will also disable `discv5` and `libp2p` logging.
1. `discv5` and `libp2p` logging will be disabled by default unless running `beacon_node` or `boot_node`. This also should fix the VC panic we were seeing.
1. Removes log rotation and compression from `libp2p` and `discv5` logs. It is now limited to 1 file and will rotate based on the value of the `--logfile-max-size` flag. We could potentially add flags specifically to control the size/number of these, however I felt a single log file was sufficient. Perhaps @AgeManning has opinions about this?
1. Removes all dependency logging and references to `dep_log`.
1. Introduces workspace filtering to file and stdout. This explicitly allows logs from members of the Lighthouse workspace, disallowing all others. It uses a proc macro which pulls the member list from cargo metadata at compile time. This might be over-engineered but my hope is that this list will not require maintenance.
1. Unifies file and stdout JSON format. With slog, the formats were slightly different. @threehrsleep worked to maintain that format difference, to ensure there were no breaking changes. If these format differences are actually needed we can restore them, however I felt the added complexity wasn't worth it.
1. General code improvements and cleanup.
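The workspace filtering in the list above can be pictured roughly as a check on the log target's crate name. This is only a sketch: the member list here is hard-coded and hypothetical (the PR derives it from cargo metadata via a proc macro), and `is_workspace_target` is an invented helper.

```rust
/// Hypothetical, hard-coded subset of workspace members (crate names use underscores).
pub const WORKSPACE_MEMBERS: &[&str] = &["beacon_node", "lighthouse_network", "validator_client"];

/// True if a log target (a module path such as `lighthouse_network::service`)
/// belongs to one of the listed workspace crates.
pub fn is_workspace_target(target: &str, members: &[&str]) -> bool {
    // The crate name is the first path segment of the target.
    let crate_name = target.split("::").next().unwrap_or(target);
    members.iter().any(|member| *member == crate_name)
}
```

Matching on the whole first path segment, rather than on a string prefix, avoids accidentally allowing an unrelated crate whose name merely starts with a workspace member's name.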
Timeouts sometimes occur when downloading the Holesky genesis state from AWS; we've had reputable outside reports of this.
It's around 200MB and hosted in APAC, so it makes sense to bump the default, at least for Holesky.
Bump the default timeout from 180 to 300 seconds.
N/A
Adds endpoints to add and remove trusted peers from the http api. The added peers are trusted peers so they won't be disconnected for bad scores. We try to maintain a connection to the peer in case they disconnect from us by trying to dial it every heartbeat.
This is a workaround for #7216
In the case of gaps between the in-memory pubkey cache and its on-disk representation, use the head state on startup to "top up" the cache/DB with any missing validators.
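A minimal sketch of the top-up idea, assuming the cache is indexed by validator index and is always a prefix of the head state's validator list. `PubkeyCache`, `PublicKeyBytes` and `top_up` are hypothetical names, and persistence to disk is omitted.

```rust
/// Hypothetical stand-in for a compressed BLS public key.
pub type PublicKeyBytes = [u8; 48];

/// Simplified cache: position in the vector is the validator index.
pub struct PubkeyCache {
    pubkeys: Vec<PublicKeyBytes>,
}

impl PubkeyCache {
    pub fn new(pubkeys: Vec<PublicKeyBytes>) -> Self {
        Self { pubkeys }
    }

    pub fn len(&self) -> usize {
        self.pubkeys.len()
    }

    /// Appends any validators present in the head state but missing from the cache.
    /// Returns the number of keys added.
    pub fn top_up(&mut self, head_state_validators: &[PublicKeyBytes]) -> usize {
        // Everything past the cache's current length is missing; `get` returns None
        // if the cache is already longer than the state's validator list.
        let missing = head_state_validators
            .get(self.pubkeys.len()..)
            .unwrap_or(&[]);
        self.pubkeys.extend_from_slice(missing);
        missing.len()
    }
}
```

Calling `top_up` a second time with the same state adds nothing, so running it on every startup is harmless in this model.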
- Part of https://github.com/sigp/lighthouse/issues/6767
Validator custody makes the CGC and set of sampling columns dynamic. Right now this information is stored twice:
- in the data availability checker
- in the network globals
If that state becomes dynamic, we must keep both copies in sync, either by updating them in both places or by guarding the state behind a mutex. However, I noticed that we don't really have to keep the CGC inside the data availability checker. All consumers can actually read it from the network globals, and we can update `make_available` to read the expected count of data columns from the block.
Even though the `consensus/types` crate has a feature named `sqlite`, it unconditionally depends on the `rusqlite` crate, which in turn depends on the `sqlite` crate, even when the feature is disabled. With the feature disabled, the code that imports from `rusqlite` is compiled out, so the dependency is not needed.
This is not a problem for Lighthouse itself, but I’m interested in using the types defined here in a different Rust project, which depends on a conflicting version of the `sqlite` crate.
Ensure that the dependency on `rusqlite` is only present when the `sqlite` feature is enabled.
* #6447
- Move some deprecated pages to a new section under `Archived`
- Remove fallback log in mev, as the log is no longer present now that the VC uses the `/eth/v3/validator/blocks` endpoint by default
- Add warning against using Btrfs file system (thank you @ChosunOne for the report)
- Add data shared by @mcdee on tree-states API query times
- Rename partial withdrawals to validator sweep to differentiate it from the upcoming execution layer partial withdrawals
- Update NAT API response
- Update docs on IPv6
- Rename .md files to follow a standard prefix section name, e.g., installation_*.md, advanced_*.md
- Standardise .md files using underscore `_` instead of hyphen `-` to be consistent with the naming conventions of other files.
Getting this when running lcli with `--osaka-time`:
```
error: unexpected argument '--osaka-time' found
tip: a similar argument exists: '--shanghai-time'
```
This PR adds the missing `--osaka-time` option to `lcli`.
Part of
- https://github.com/sigp/lighthouse/issues/6258
`RangeBlockComponentsRequest` handles a set of by_range requests. It's quite loose about these requests, not tracking them by ID. We want to implement individual request retries, so we must make `RangeBlockComponentsRequest` aware of its request IDs. We don't want the result of a prior by_range request to affect the state of a future retry. Lookup sync uses this mechanism.
Now `RangeBlockComponentsRequest` tracks:
```rust
pub struct RangeBlockComponentsRequest<E: EthSpec> {
    blocks_request: ByRangeRequest<BlocksByRangeRequestId, Vec<Arc<SignedBeaconBlock<E>>>>,
    block_data_request: RangeBlockDataRequest<E>,
}

enum RangeBlockDataRequest<E: EthSpec> {
    NoData,
    Blobs(ByRangeRequest<BlobsByRangeRequestId, Vec<Arc<BlobSidecar<E>>>>),
    DataColumns {
        requests: HashMap<
            DataColumnsByRangeRequestId,
            ByRangeRequest<DataColumnsByRangeRequestId, DataColumnSidecarList<E>>,
        >,
        expected_custody_columns: Vec<ColumnIndex>,
    },
}

enum ByRangeRequest<I: PartialEq + std::fmt::Display, T> {
    Active(I),
    Complete(T),
}
```
I have merged `is_finished` and `into_responses` into the same function. Otherwise, we would need to duplicate the logic that determines whether the requests are done.
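The benefit of a single function can be sketched with a simplified `ByRangeRequest` (trait bounds dropped). The names `to_finished` and `responses` are hypothetical and not necessarily those used in the PR: returning `Option` answers "is it finished?" and "what are the responses?" in one pass.

```rust
/// Simplified version of the enum above, without the trait bounds.
pub enum ByRangeRequest<I, T> {
    Active(I),
    Complete(T),
}

impl<I, T> ByRangeRequest<I, T> {
    /// Returns the response if the request has completed, `None` while it is still active.
    pub fn to_finished(&self) -> Option<&T> {
        match self {
            Self::Active(_) => None,
            Self::Complete(response) => Some(response),
        }
    }
}

/// Returns `Some(all responses)` only when every request has completed.
/// A single active request makes the whole result `None`.
pub fn responses<I, T: Clone>(requests: &[ByRangeRequest<I, T>]) -> Option<Vec<T>> {
    requests
        .iter()
        .map(|request| request.to_finished().cloned())
        .collect()
}
```

Because collecting an iterator of `Option<T>` into `Option<Vec<T>>` short-circuits on the first `None`, there is no separate "are we done" check to keep in sync with the extraction logic.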
Cross builds were failing since:
- https://github.com/sigp/lighthouse/pull/7086
This seems to have been due to a regression upstream in `ring` which is noted in the v0.17.14 release notes. I'm hoping that updating remedies it.
> Compatibility with GNU binutils 2.29 (used on Amazon Linux 2), and probably even earlier versions, was restored. It is expected that ring 0.17.14 will build on all the systems that 0.17.12 would build on.
https://github.com/briansmith/ring/blob/main/RELEASES.md#version-01714-2025-03-11
Backport of:
- https://github.com/sigp/lighthouse/pull/7067
For:
- https://github.com/sigp/lighthouse/issues/7039
- Prevent writing to state cache when migrating the database
- Add `state-cache-headroom` flag to control pruning
- Prune old epoch boundary states ahead of mid-epoch states
- Never prune head block's state
- Avoid caching ancestor states unless they are on an epoch boundary
- Log when states enter/exit the cache
Co-authored-by: Eitan Seri-Levi <eserilev@ucsc.edu>
https://github.com/sigp/lighthouse/issues/7146
Removes `filter_layer` from the builder as this was acting as a "global minimum". We don't actually need this, since we are using more granular control in `stdout_logging_layer` and `file_logging_layer`. Removing this restores control of the logfile's level to the `--logfile-debug-level` flag (which defaults to debug).
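Why a global filter overrides the per-layer levels can be shown with a small model that does not use the real logging crates. `Level` and `is_logged` are hypothetical: an event must pass every filter in front of an output, so the most restrictive one wins.

```rust
/// Verbosity levels, ordered from least to most verbose.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// An event reaches a layer's output only if it passes the optional global filter
/// and the layer's own level.
pub fn is_logged(event: Level, global_filter: Option<Level>, layer_level: Level) -> bool {
    let passes_global = global_filter.map_or(true, |max| event <= max);
    passes_global && event <= layer_level
}
```

In this model, a global filter at `Info` hides `Debug` events from a file layer configured at `Debug`; with no global filter, the file layer's own level decides.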