docs: document prometheus metrics (#399)

This commit is contained in:
l5y
2025-10-31 11:04:20 +01:00
committed by GitHub
parent 49e0f39ca9
commit 3c2c7611ee
2 changed files with 106 additions and 0 deletions

100
PROMETHEUS.md Normal file
View File

@@ -0,0 +1,100 @@
# Prometheus Monitoring for PotatoMesh
PotatoMesh exposes runtime telemetry through a dedicated Prometheus endpoint so you can
observe message flow, node health, and geospatial metadata alongside the rest of your
infrastructure. This guide explains how the exporter is wired into the web
application, which metrics are available, and how to integrate the endpoint with a
Prometheus server.
## Runtime integration
The Sinatra application automatically loads the `prometheus-client` gem and mounts the
collector and exporter middlewares during boot. No additional configuration is
required to enable the `/metrics` endpoint—running the web application is enough to
serve Prometheus data on the same port as the dashboard. The middleware pair both
collects default Rack statistics and publishes PotatoMesh-specific gauges and
counters that are updated whenever the ingestors process new node records.
A background refresh is triggered during start-up via
`update_all_prometheus_metrics_from_nodes`, which seeds the gauges based on the latest
state in the database. Subsequent POST requests to the ingest APIs update each metric
in near real time.
## Selecting which nodes are exported
To avoid creating high-cardinality time series, PotatoMesh does not export per-node
metrics unless you opt in by providing node identifiers. Control this behaviour with
the `PROM_REPORT_IDS` environment variable:
- Leave the variable unset or blank to only export aggregate gauges such as the total
node count.
- Set `PROM_REPORT_IDS=*` to export metrics for every node in the database.
- Provide a comma-separated list (for example `PROM_REPORT_IDS=ABCD1234,EFGH5678`) to
expose metrics for specific nodes.
The selection applies to both the initial refresh and the incremental updates handled
by the ingest pipeline.
## Available metrics
| Metric name | Type | Labels | Description |
| --- | --- | --- | --- |
| `meshtastic_messages_total` | Counter | _none_ | Increments each time the ingest pipeline accepts a new message payload. |
| `meshtastic_nodes` | Gauge | _none_ | Tracks the number of nodes currently stored in the database. |
| `meshtastic_node` | Gauge | `node`, `short_name`, `long_name`, `hw_model`, `role` | Reports a node as present (value `1`) along with identity metadata. |
| `meshtastic_node_battery_level` | Gauge | `node` | Most recent battery percentage reported by the node. |
| `meshtastic_node_voltage` | Gauge | `node` | Most recent battery voltage reading. |
| `meshtastic_node_uptime_seconds` | Gauge | `node` | Uptime reported by the device in seconds. |
| `meshtastic_node_channel_utilization` | Gauge | `node` | Latest channel utilisation ratio supplied by the node. |
| `meshtastic_node_transmit_air_utilization` | Gauge | `node` | Proportion of on-air time spent transmitting. |
| `meshtastic_node_latitude` | Gauge | `node` | Latitude component of the last known position. |
| `meshtastic_node_longitude` | Gauge | `node` | Longitude component of the last known position. |
| `meshtastic_node_altitude` | Gauge | `node` | Altitude (in metres) of the last known position. |
All per-node gauges are only emitted for identifiers included in `PROM_REPORT_IDS`.
Some values require telemetry packets to be present—for example, devices must provide
metrics or positional updates before the related gauges appear.
## Accessing the `/metrics` endpoint
Once the application is running, query the exporter directly:
```bash
curl http://localhost:41447/metrics
```
Use any HTTP client capable of plain-text requests. Prometheus scrapers should target
the same URL. The endpoint returns data in the standard exposition format produced by
`prometheus-client`.
## Prometheus scrape configuration
Add a job to your Prometheus server configuration that points to the PotatoMesh
instance. This example polls an instance running locally on the default port every 15
seconds:
```yaml
scrape_configs:
- job_name: potatomesh
scrape_interval: 15s
static_configs:
- targets:
- localhost:41447
```
If your deployment requires authentication or runs behind a reverse proxy, configure
Prometheus to match your network topology (for example by adding basic authentication
credentials, custom headers, or TLS settings).
## Troubleshooting
- **No per-node metrics appear.** Ensure that `PROM_REPORT_IDS` is set and that the
specified nodes exist in the database. Set the value to `*` if you want to export
every node during initial validation.
- **Metrics look stale after a restart.** Confirm that the ingestor is still posting
telemetry. The exporter only reflects data stored in the PotatoMesh database.
- **Scrapes time out.** Verify that the Prometheus server can reach the PotatoMesh
HTTP port and that no reverse proxy is blocking the `/metrics` path.
With the endpoint configured, you can build Grafana dashboards or alerting rules to
keep track of community mesh health in real time.

View File

@@ -133,6 +133,12 @@ The web app contains an API:
The `API_TOKEN` environment variable must be set to a non-empty value and match the token supplied in the `Authorization` header for `POST` requests.
### Observability
PotatoMesh ships with a Prometheus exporter mounted at `/metrics`. Consult
[`PROMETHEUS.md`](./PROMETHEUS.md) for deployment guidance, metric details, and
scrape configuration examples.
## Python Ingestor
The web app is not meant to be run locally connected to a Meshtastic node but rather