Instrument tracing spans for block processing and import (#7816)

#7815

- removes all existing spans, so some span fields that appear in logs like `service_name` may be lost.
- instruments a few key code paths in the beacon node, starting from **root spans** named below:

* Gossip block and blobs
* `process_gossip_data_column_sidecar`
* `process_gossip_blob`
* `process_gossip_block`
* Rpc block and blobs
* `process_rpc_block`
* `process_rpc_blobs`
* `process_rpc_custody_columns`
* Rpc blocks (range and backfill)
* `process_chain_segment`
* `PendingComponents` lifecycle
* `pending_components`

To test locally:
* Run Grafana and Tempo with https://github.com/sigp/lighthouse-metrics/pull/57
* Run Lighthouse BN with `--telemetry-collector-url http://localhost:4317`

Some captured traces can be found here: https://hackmd.io/@jimmygchen/r1sLOxPPeg

Removing the old spans seem to have reduced the memory usage quite a lot - i think we were using them on long running tasks and too excessively:
<img width="910" height="495" alt="image" src="https://github.com/user-attachments/assets/5208bbe4-53b2-4ead-bc71-0b782c788669" />
This commit is contained in:
Jimmy Chen
2025-08-08 15:32:22 +10:00
committed by GitHub
parent 6dfab22267
commit 40c2fd5ff4
52 changed files with 633 additions and 1164 deletions

View File

@@ -15,7 +15,7 @@ use ssz_derive::{Decode, Encode};
use std::iter;
use std::marker::PhantomData;
use std::sync::Arc;
use tracing::debug;
use tracing::{debug, instrument};
use types::data_column_sidecar::ColumnIndex;
use types::{
BeaconStateError, ChainSpec, DataColumnSidecar, DataColumnSubnetId, EthSpec, Hash256,
@@ -412,6 +412,7 @@ impl<E: EthSpec> KzgVerifiedCustodyDataColumn<E> {
/// Complete kzg verification for a `DataColumnSidecar`.
///
/// Returns an error if the kzg verification check fails.
#[instrument(skip_all, level = "debug")]
pub fn verify_kzg_for_data_column<E: EthSpec>(
data_column: Arc<DataColumnSidecar<E>>,
kzg: &Kzg,
@@ -426,6 +427,7 @@ pub fn verify_kzg_for_data_column<E: EthSpec>(
///
/// Note: This function should be preferred over calling `verify_kzg_for_data_column`
/// in a loop since this function kzg verifies a list of data columns more efficiently.
#[instrument(skip_all, level = "debug")]
pub fn verify_kzg_for_data_column_list<'a, E: EthSpec, I>(
data_column_iter: I,
kzg: &'a Kzg,
@@ -470,6 +472,7 @@ where
Err(errors)
}
#[instrument(skip_all, level = "debug")]
pub fn validate_data_column_sidecar_for_gossip<T: BeaconChainTypes, O: ObservationStrategy>(
data_column: Arc<DataColumnSidecar<T::EthSpec>>,
subnet: DataColumnSubnetId,