Add PeerDAS metrics to track subnets without peers (#6928)

Currently we track a key metric `PEERS_PER_COLUMN_SUBNET` in a getter `good_peers_on_sampling_subnets`. Another PR https://github.com/sigp/lighthouse/pull/6922 deletes that function, so we have to move the metric anyway. This PR moves that metric computation to the metrics spawned task which is refreshed every 5 seconds.

I also added a few more useful metrics. The total set and intended usage is:

- `sync_peers_per_column_subnet`: Track health of overall subnets in your node
- `sync_peers_per_custody_column_subnet`: Track health of the subnets your node needs. We should track this metric closely in our dashboards with a heatmap and bar plot
- ~~`sync_column_subnets_with_zero_peers`: Is equivalent to the Grafana query `count(sync_peers_per_column_subnet == 0) by (instance)`. We may prefer to skip it, but I believe it's the most important metric as if `sync_column_subnets_with_zero_peers > 0` your node stalls.~~
- ~~`sync_custody_column_subnets_with_zero_peers`: `count(sync_peers_per_custody_column_subnet == 0) by (instance)`~~
This commit is contained in:
Lion - dapplion
2025-02-11 03:56:38 -03:00
committed by GitHub
parent 3992d6ba74
commit d60388134d
3 changed files with 46 additions and 14 deletions

View File

@@ -1,7 +1,6 @@
use super::batch::{BatchInfo, BatchProcessingResult, BatchState};
use super::RangeSyncType;
use crate::metrics;
use crate::metrics::PEERS_PER_COLUMN_SUBNET;
use crate::network_beacon_processor::ChainSegmentProcessId;
use crate::sync::network_context::RangeRequestId;
use crate::sync::{network_context::SyncNetworkContext, BatchOperationOutcome, BatchProcessResult};
@@ -10,7 +9,6 @@ use beacon_chain::BeaconChainTypes;
use fnv::FnvHashMap;
use lighthouse_network::service::api_types::Id;
use lighthouse_network::{PeerAction, PeerId};
use metrics::set_int_gauge;
use rand::seq::SliceRandom;
use rand::Rng;
use slog::{crit, debug, o, warn};
@@ -1106,11 +1104,6 @@ impl<T: BeaconChainTypes> SyncingChain<T> {
.good_custody_subnet_peer(*subnet_id)
.count();
set_int_gauge(
&PEERS_PER_COLUMN_SUBNET,
&[&subnet_id.to_string()],
peer_count as i64,
);
peer_count > 0
});
peers_on_all_custody_subnets