Addresses #8218
A simplified version of #8241 for the initial release.
I've tried to minimise the logic change in this PR, although introducing the `NodeCustodyType` enum still result in quite a bit a of diff, but the actual logic change in `CustodyContext` is quite small.
The main changes are in the `CustdoyContext` struct
* ~~combining `validator_custody_count` and `current_is_supernode` fields into a single `custody_group_count_at_head` field. We persist the cgc of the initial cli values into the `custody_group_count_at_head` field and only allow for increase (same behaviour as before).~~
* I noticed the above approach caused a backward compatibility issue, I've [made a fix](15569bc085) and changed the approach slightly (which was actually what I had originally in mind):
* when initialising, only override the `validator_custody_count` value if either flag `--supernode` or `--semi-supernode` is used; otherwise leave it as the existing default `0`. Most other logic remains unchanged.
All existing validator custody unit tests are still all passing, and I've added additional tests to cover semi-supernode, and restoring `CustodyContext` from disk.
Note: I've added a `WARN` if the user attempts to switch to a `--semi-supernode` or `--supernode` - this currently has no effect, but once @eserilev column backfill is merged, we should be able to support this quite easily.
Things to test
- [x] cgc in metadata / enr
- [x] cgc in metrics
- [x] subscribed subnets
- [x] getBlobs endpoint
Co-Authored-By: Jimmy Chen <jchen.tc@gmail.com>
Address contention on the store's `block_cache` by allowing it to be disabled when `--block-cache-size 0` is provided, and also making this the default.
Co-Authored-By: Michael Sproul <michael@sigmaprime.io>
Allows users to customize the OpenTelemetry service name instead of using the hardcoded default `lighthouse`. Defaults to 'lighthouse-bn' for beacon node, 'lighthouse-vc' for validator client, or 'lighthouse' for other subcommands.
This is useful when analysing traces from multiple nodes, see Grafana screenshot below with service name overrides in Kurtosis (`ethereum-package` PR: https://github.com/ethpandaops/ethereum-package/pull/1160):
<img width="1148" height="627" alt="image" src="https://github.com/user-attachments/assets/7e875639-10f7-4756-837f-2006fa4b12e0" />
N/A
Add a flag to disable get blobs. I configured the flag to disable it regardless of version because its most likely something we use for testing anyway.
#7815
- removes all existing spans, so some span fields that appear in logs like `service_name` may be lost.
- instruments a few key code paths in the beacon node, starting from **root spans** named below:
* Gossip block and blobs
* `process_gossip_data_column_sidecar`
* `process_gossip_blob`
* `process_gossip_block`
* Rpc block and blobs
* `process_rpc_block`
* `process_rpc_blobs`
* `process_rpc_custody_columns`
* Rpc blocks (range and backfill)
* `process_chain_segment`
* `PendingComponents` lifecycle
* `pending_components`
To test locally:
* Run Grafana and Tempo with https://github.com/sigp/lighthouse-metrics/pull/57
* Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317`
Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg
Removing the old spans seem to have reduced the memory usage quite a lot - i think we were using them on long running tasks and too excessively:
<img width="910" height="495" alt="image" src="https://github.com/user-attachments/assets/5208bbe4-53b2-4ead-bc71-0b782c788669" />
N/A
After the electra fork which includes EIP 6110, the beacon node no longer needs the eth1 bridging mechanism to include new deposits as they are provided by the EL as a `deposit_request`. So after electra + a transition period where the finalized bridge deposits pre-fork are included through the old mechanism, we no longer need the elaborate machinery we had to get deposit contract data from the execution layer.
Since holesky has already forked to electra and completed the transition period, this PR basically checks to see if removing all the eth1 related logic leads to any surprises.
Closes:
- https://github.com/sigp/lighthouse/issues/7363
- Change default state cache size back to 128.
- Make state pruning properly LRU rather than MSU after skipping the cull-exempt states.
Timeouts sometimes occur on downloading the Holeksy genesis state from AWS, we've had reputable outside reports on this.
It's around 200MB and hosted in APAC, it makes sense to bump the default, at least for Holesky.
Bump default timeout from 180 to 300 secs
Backport of:
- https://github.com/sigp/lighthouse/pull/7067
For:
- https://github.com/sigp/lighthouse/issues/7039
- Prevent writing to state cache when migrating the database
- Add `state-cache-headroom` flag to control pruning
- Prune old epoch boundary states ahead of mid-epoch states
- Never prune head block's state
- Avoid caching ancestor states unless they are on an epoch boundary
- Log when states enter/exit the cache
Co-authored-by: Eitan Seri-Levi <eserilev@ucsc.edu>
Resolves https://github.com/sigp/lighthouse/issues/7000
Set the accept header on builder to the correct value when requesting ssz.
This PR also adds a flag to disable ssz over the builder api altogether. In the case that builders/relays have an ssz bug, we can react quickly by asking clients to restart their nodes with the `--disable-ssz-builder` flag to force json. I'm not fully convinced if this is useful so open to removing it or opening another PR for it.
Testing this currently.
There were two things I came across during some recent testing, that this PR addresses.
1 - The default port for IPv6 was set to 9090, which is confusing. I've set this to match its ipv4 counterpart (i.e 9000 and 9001). This makes more sense and is easier to firewall, for those firewalls that support both versions for a single rule.
2 - Watching the NAT status of lighthouse, I notice we only set the field to 1 once the NAT is passed. We don't give it a default 0 (false). So we only see results when its successful. On peer disconnects, i've piggy-backed a loop of the connected peers to also watch and check for NAT status updates.
* make `execution-endpoint` mandatory
* use parse_required instead
* make test pass
* Merge branch 'unstable' into make_ee_required
* fix test
* Merge branch 'unstable' into make_ee_required
* Fix cli help text
* Fix tests
* Merge branch 'unstable' into make_ee_required
* Add comment
* Clarification
* Merge remote-tracking branch 'origin/unstable' into make_ee_required
* Require manual confirmation to purge database
* Fix tests
* Rename to `purge-db-force and skip in non-interactive mode
* Do not skip when stdin_inputs is true
* Change prompt to be info logging to ensure consistent output
* Update warning text
* Move delete log after deletion
* fix bitvector test random impl
* add a new flag that allows for configuring the timeout for get builder header api calls
* make cli
* add upper limit check, changes based on feedback
* update cli
* capitalization
* cli fix
* MVP implementation (untested)
* update store checkpoint sync test
* update cli help
* Merge pull request #5253 from realbigsean/checkpoint-blobs-sean
Checkpoint blobs sean
* Warn only if blobs are missing from server
* Merge remote-tracking branch 'origin/unstable' into checkpoint-blobs
* Verify checkpoint blobs
* Move blob verification earlier
* Downgrade duplicate publish logs
* Maintain backwards compatiblity, deprecate flag
* The tests had to go, because there's no config to test against
* Update help_bn.md
* Disallow Syncing From Genesis By Default
* Fix CLI Tests
* Perform checks in the `ClientBuilder`
* Tidy, fix tests
* Return an error based on the Deneb fork
* Fix typos
* Fix failing test
* Add missing CLI flag
* Fix CLI flags
* Add suggestion from Sean
* Fix conflict with blob sidecars epochs
---------
Co-authored-by: Mark Mackey <mark@sigmaprime.io>