#7461 and partly #6439.
Desired behaviour after receiving `engine_getBlobs` response:
1. Gossip verify the blobs and proofs, but don't mark them as observed yet. This is because not all blobs are published immediately (due to staggered publishing). If we mark them as observed and not publish them, we could end up blocking the gossip propagation.
2. Blobs are marked as observed _either_ when:
* They are received from gossip and forwarded to the network .
* They are published by the node.
Current behaviour:
- ❗ We only gossip verify `engine_getBlobsV1` responses, but not `engine_getBlobsV2` responses (PeerDAS).
- ❗ After importing EL blobs AND before they're published, if the same blobs arrive via gossip, they will get re-processed, which may result in a re-import.
1. Perform gossip verification on data columns computed from EL `getBlobsV2` response. We currently only do this for `getBlobsV1` to prevent importing blobs with invalid proofs into the `DataAvailabilityChecker`, this should be done on V2 responses too.
2. Add additional gossip verification to make sure we don't re-process a ~~blob~~ or data column that was imported via the EL `getBlobs` but not yet "seen" on the gossip network. If an "unobserved" gossip blob is found in the availability cache, then we know it has passed verification so we can immediately propagate the `ACCEPT` result and forward it to the network, but without re-processing it.
**UPDATE:** I've left blobs out for the second change mentioned above, as the likelihood and impact is very slow and we haven't seen it enough, but under PeerDAS this issue is a regular occurrence and we do see the same block getting imported many times.
Update `engine_getBlobsV2` response type to `Option<Vec<BlobsAndProofV2>>`. See recent spec change [here](https://github.com/ethereum/execution-apis/pull/630).
Added some tests to cover basic fetch blob scenarios.
Currently `test_delayed_rpc_response` is flaky (possibly specific to Windows?), but I'm not sure why.
Enabled stdout logging in rpc_tests. Note that in nextest, std output is only displayed when a test fails.
Use slice.is_sorted which was stabilised in Rust 1.82.0
I thought there would be more places we could use this, but it seems we often want to check strict monotonicity (i.e. sorted + no duplicates)
We would like to reuse the `notifier` and `latency_service` in Anchor. To make this possible, this PR moves these from `validator_client` to `validator_services` and makes them use the new `ValidatorStore` trait is used so that the code can be reused in Anchor.
While the Lighthouse implementation of the `ValidatorStore` does not really care about blobs, Anchor needs to be able to return different blobs from `sign_blocks` than what was passed into it, in case it decides to sign another Anchor node's block. Only passing the unsigned block into `sign_block` and only returning a signed block from it (without any blobs and proofs) was an oversight in #6705.
- Replace `validator_store::{Uns,S}ignedBlock` with `validator_store::block_service::{Uns,S}ignedBlock`, as we need all data in there.
- In `lighthouse_validator_store`, just add the received blobs back to the signed block after signing it.
Closes:
- https://github.com/sigp/lighthouse/issues/7489
Use `ForkName::latest_stable`, i.e. Electra when decoding blobs from a server that does not provide `version`. This is only a temporary workaround that should be reverted once `checkpointz` is fixed. Having a default fork is potentially incorrect, and glossing over bugs in servers in general is not ideal.
However, even in the case where we update `ForkName::latest_stable` to `Fulu`, this should continue to work, as the blob limit is likely to increase and the `RuntimeVariableList` will just have a slightly higher limit than necessary (which is OK so long as the server isn't buggy enough to violate the correct lower bound: e.g. if the block is an Electra one and the server sends 10 blobs, which exceeds the Electra max (9) but not the Fulu max).
The endpoint `/eth/v1/beacon/states/head/validator_balances` returns an empty data when the data field is `[]`. According to the beacon API spec, it should return the balances of all validators:
Reference: https://ethereum.github.io/beacon-APIs/#/Beacon/postStateValidatorBalances
`If the supplied list is empty (i.e. the body is []) or no body is supplied then balances will be returned for all validators.`
This PR changes so that: `curl -X 'POST' 'http://localhost:5052/eth/v1/beacon/states/head/validator_balances' -d '[]' | jq` returns balances of all validators.
Closes#5016
The op pool was using the wrong denominator when calculating proposer block rewards! This was mostly inconsequential as our studies of Lighthouse's block profitability already showed that it is very close to optimal. The wrong denominator was leftover from phase0 code, and wasn't properly updated for Altair.
- Small revision in Siren documentation in: `book/src/ui_installation.md`
- Add a section about slashing protection in web3signer, as per: https://github.com/sigp/lighthouse/issues/5310 in: `book/src/advanced_web3signer.md`
- Add a presign option in `lighthouse account validator exit` in `book/src/validator_voluntary_exit.md`
- Replace 'Holesky' with 'Hoodi' in all related parts in Lighthouse book
- Add https://ethpandaops.io/posts/kurtosis-deep-dive/ to local testnet documentation
Which issue # does this PR address?
This PR addresses reproducible builds. The current dockerfile builds the lighthouse binary but not reproducibly.
You can verify that by following these steps:
```
docker build --no-cache --output=. .
mv usr/local/bin/lighthouse lighthouse1
rm usr/local/bin/lighthouse
docker build --no-cache --output=. .
mv usr/local/bin/lighthouse lighthouse2
sha256sum lighthouse1 lighthouse2
```
You will notice that each one of the binaries has a different checksum upon each build. This is critical for systems that depends on requiring reproducible builds, such as running lighthouse in confidential computing, like Intel TDX.
This PR adds a new build profile as well as a Dockerfile.reproducible that enables building the lighthouse binary reproducibly.
By following the steps I listed above, you will be able to verify that the resulted binary has the same hash upon several subsequent builds for the same version.
How to test it:
```
mkdir output1 output2
docker build --no-cache -f Dockerfile.reproducible --output=output1 .
docker build --no-cache -f Dockerfile.reproducible --output=output2 .
sha256sum output1/lighthouse output2/lighthouse
# hashes should be identical
rm -rf output1 output2
```
Resolves https://github.com/sigp/lighthouse/issues/7414
The health endpoint returns a 503 if the engine state is offline. The default state for the engine is `Offline`. So until the first request to the EL is made and the state is updated, the health endpoint will keep returning 503s.
This PR changes the default state to Online to avoid that. I don't think this causes any issues because in case the EL is actually offline, the first fcu will set the state to offline.
Pending testing on kurtosis.
Fix clippy lints for `rustc` 1.87
clippy complains about `BeaconChainError` being too large. I went on a bit of a boxing spree because of this. We may instead want to `Box` some of the `BeaconChainError` variants?
This PR relates to:
- https://github.com/sigp/lighthouse/pull/7199
- -> workspace_filter has been enabled (dependency logging has been disabled)
- https://github.com/sigp/lighthouse/pull/7394
- -> file logging has been optionally enabled
Building on these, this PR enables dependency logging for the simulators. The logs are written to separate files.
The libp2p/discv5 logs:
- are saved to the directory specified with `--log-dir`
- respects the `RUST_LOG` environment variable for log level configuration
Which issue # does this PR address?
Closes#7457
Added `E::slots_per_epoch()` and now it ensures conversion from epochs to slots while calculating deneb time
Workaround/fix for:
- https://github.com/sigp/lighthouse/issues/7323
- Remove the `StateSummariesNotContiguousError`. This allows us to continue with DAG construction and pruning, even in the case where the DAG is disjointed. We will treat any disjoint summaries as roots of their own tree, and prune them (as they are not descended from finalized). This should be safe, as canonical summaries should not be disjoint (if they are, then the DB is already corrupt).
Beacon logs in the simulator are printed only to stdout. The logs are usually large, so persisting them would be helpful for debugging.
Added `--log-dir` parameter to the simulators and a step to upload the logs to Artifacts.
(Update)
Added `--disable-stdout-logging` to disable stdout logging, making the CI page cleaner.
Closes https://github.com/sigp/lighthouse/issues/6895
We need sync to retry custody requests when a peer CGC updates. A higher CGC can result in a data column subnet peer count increasing from 0 to 1, allowing requests to happen.
Add new sync event `SyncMessage::UpdatedPeerCgc`. It's sent by the router when a metadata response updates the known CGC
This PR adds transitions to Electra ~~and Fulu~~ fork epochs in the simulator tests.
~~It also covers blob inclusion verification and data column syncing on a full node in Fulu.~~
UPDATE: Remove fulu fork from sim tests due to https://github.com/sigp/lighthouse/pull/7199#issuecomment-2852281176
Don't publish data columns reconstructed from RPC columns to the gossip network, as this may result in peer downscoring if we're sending columns from past slots.
Prevent running `lighthouse vc --http-port <PORT>` without `--http`.
Issue: https://github.com/sigp/lighthouse/issues/7402
Added requires `--http` when using `lighthouse vc --http-port <PORT>`.
Implemented a test code for this issue.
- Create trait `ValidatorStore` with all functions used by the `validator_services`
- Make `validator_services` generic on `S: ValidatorStore`
- Introduce `LighthouseValidatorStore`, which has identical functionality to the old `ValidatorStore`
- Remove dependencies (especially `environment`) from `validator_services` and `beacon_node_fallback` in order to be able to cleanly use them in Anchor
- Re-opens https://github.com/sigp/lighthouse/pull/6864 targeting unstable
Range sync and backfill sync still assume that each batch request is done by a single peer. This assumption breaks with PeerDAS, where we request custody columns to N peers.
Issues with current unstable:
- Peer prioritization counts batch requests per peer. This accounting is broken now, data columns by range request are not accounted
- Peer selection for data columns by range ignores the set of peers on a syncing chain, instead draws from the global pool of peers
- The implementation is very strict when we have no peers to request from. After PeerDAS this case is very common and we want to be flexible or easy and handle that case better than just hard failing everything.
- [x] Upstream peer prioritization to the network context, it knows exactly how many active requests a peer (including columns by range)
- [x] Upstream peer selection to the network context, now `block_components_by_range_request` gets a set of peers to choose from instead of a single peer. If it can't find a peer, it returns the error `RpcRequestSendError::NoPeer`
- [ ] Range sync and backfill sync handle `RpcRequestSendError::NoPeer` explicitly
- [ ] Range sync: leaves the batch in `AwaitingDownload` state and does nothing. **TODO**: we should have some mechanism to fail the chain if it's stale for too long - **EDIT**: Not done in this PR
- [x] Backfill sync: pauses the sync until another peer joins - **EDIT**: Same logic as unstable
### TODOs
- [ ] Add tests :)
- [x] Manually test backfill sync
Note: this touches the mainnet path!
Debugging an sync issue from @pawanjay176 I'm missing some key info where instead of logging the ID of the SyncingChain we just log "Finalized" (the sync type). This looks like some typo or something was lost in translation when refactoring things.
```
Apr 17 12:12:00.707 DEBUG Syncing new finalized chain chain: Finalized, component: "range_sync"
```
This log should include more info about the new chain but just logs "Finalized"
```
Apr 17 12:12:00.810 DEBUG New chain added to sync peer_id: "16Uiu2HAmHP8QLYQJwZ4cjMUEyRgxzpkJF87qPgNecLTpUdruYbdA", sync_type: Finalized, new_chain: Finalized, component: "range_sync"
```
- Remove the Display impl and log the ID explicitly for all logs.
- Log more details when creating a new SyncingChain
When we perform data column gossip verification, we sometimes see multiple proposer shuffling cache miss simultaneously and this results in multiple threads computing the shuffling cache and potentially slows down the gossip verification.
Proposal here is to use a `OnceCell` for each shuffling key to make sure it's only computed once. I have only implemented this in data column verification as a PoC, but this can also be applied to blob and block verification
Related issues:
- https://github.com/sigp/lighthouse/issues/4447
- https://github.com/sigp/lighthouse/issues/7203
Changes the endpoint to get fallback health information from `/lighthouse/ui/fallback_health` to `/lighthouse/beacon/health`. This more accurately describes that the endpoint is related to the connected beacon nodes and also matched the `/lighthouse/beacon/update` endpoint being added in #6551.
Adds documentation for both fallback health and the endpoint to the Lighthouse book.