diff --git a/PROMETHEUS.md b/PROMETHEUS.md new file mode 100644 index 0000000..0b29a9c --- /dev/null +++ b/PROMETHEUS.md @@ -0,0 +1,100 @@ +# Prometheus Monitoring for PotatoMesh + +PotatoMesh exposes runtime telemetry through a dedicated Prometheus endpoint so you can +observe message flow, node health, and geospatial metadata alongside the rest of your +infrastructure. This guide explains how the exporter is wired into the web +application, which metrics are available, and how to integrate the endpoint with a +Prometheus server. + +## Runtime integration + +The Sinatra application automatically loads the `prometheus-client` gem and mounts the +collector and exporter middlewares during boot. No additional configuration is +required to enable the `/metrics` endpoint—running the web application is enough to +serve Prometheus data on the same port as the dashboard. The middleware pair both +collects default Rack statistics and publishes PotatoMesh-specific gauges and +counters that are updated whenever the ingestors process new node records. + +A background refresh is triggered during start-up via +`update_all_prometheus_metrics_from_nodes`, which seeds the gauges based on the latest +state in the database. Subsequent POST requests to the ingest APIs update each metric +in near real time. + +## Selecting which nodes are exported + +To avoid creating high-cardinality time series, PotatoMesh does not export per-node +metrics unless you opt in by providing node identifiers. Control this behaviour with +the `PROM_REPORT_IDS` environment variable: + +- Leave the variable unset or blank to only export aggregate gauges such as the total + node count. +- Set `PROM_REPORT_IDS=*` to export metrics for every node in the database. +- Provide a comma-separated list (for example `PROM_REPORT_IDS=ABCD1234,EFGH5678`) to + expose metrics for specific nodes. + +The selection applies to both the initial refresh and the incremental updates handled +by the ingest pipeline. + +## Available metrics + +| Metric name | Type | Labels | Description | +| --- | --- | --- | --- | +| `meshtastic_messages_total` | Counter | _none_ | Increments each time the ingest pipeline accepts a new message payload. | +| `meshtastic_nodes` | Gauge | _none_ | Tracks the number of nodes currently stored in the database. | +| `meshtastic_node` | Gauge | `node`, `short_name`, `long_name`, `hw_model`, `role` | Reports a node as present (value `1`) along with identity metadata. | +| `meshtastic_node_battery_level` | Gauge | `node` | Most recent battery percentage reported by the node. | +| `meshtastic_node_voltage` | Gauge | `node` | Most recent battery voltage reading. | +| `meshtastic_node_uptime_seconds` | Gauge | `node` | Uptime reported by the device in seconds. | +| `meshtastic_node_channel_utilization` | Gauge | `node` | Latest channel utilisation ratio supplied by the node. | +| `meshtastic_node_transmit_air_utilization` | Gauge | `node` | Proportion of on-air time spent transmitting. | +| `meshtastic_node_latitude` | Gauge | `node` | Latitude component of the last known position. | +| `meshtastic_node_longitude` | Gauge | `node` | Longitude component of the last known position. | +| `meshtastic_node_altitude` | Gauge | `node` | Altitude (in metres) of the last known position. | + +All per-node gauges are only emitted for identifiers included in `PROM_REPORT_IDS`. +Some values require telemetry packets to be present—for example, devices must provide +metrics or positional updates before the related gauges appear. + +## Accessing the `/metrics` endpoint + +Once the application is running, query the exporter directly: + +```bash +curl http://localhost:41447/metrics +``` + +Use any HTTP client capable of plain-text requests. Prometheus scrapers should target +the same URL. The endpoint returns data in the standard exposition format produced by +`prometheus-client`. + +## Prometheus scrape configuration + +Add a job to your Prometheus server configuration that points to the PotatoMesh +instance. This example polls an instance running locally on the default port every 15 +seconds: + +```yaml +scrape_configs: + - job_name: potatomesh + scrape_interval: 15s + static_configs: + - targets: + - localhost:41447 +``` + +If your deployment requires authentication or runs behind a reverse proxy, configure +Prometheus to match your network topology (for example by adding basic authentication +credentials, custom headers, or TLS settings). + +## Troubleshooting + +- **No per-node metrics appear.** Ensure that `PROM_REPORT_IDS` is set and that the + specified nodes exist in the database. Set the value to `*` if you want to export + every node during initial validation. +- **Metrics look stale after a restart.** Confirm that the ingestor is still posting + telemetry. The exporter only reflects data stored in the PotatoMesh database. +- **Scrapes time out.** Verify that the Prometheus server can reach the PotatoMesh + HTTP port and that no reverse proxy is blocking the `/metrics` path. + +With the endpoint configured, you can build Grafana dashboards or alerting rules to +keep track of community mesh health in real time. diff --git a/README.md b/README.md index fbc29bd..ed8becc 100644 --- a/README.md +++ b/README.md @@ -133,6 +133,12 @@ The web app contains an API: The `API_TOKEN` environment variable must be set to a non-empty value and match the token supplied in the `Authorization` header for `POST` requests. +### Observability + +PotatoMesh ships with a Prometheus exporter mounted at `/metrics`. Consult +[`PROMETHEUS.md`](./PROMETHEUS.md) for deployment guidance, metric details, and +scrape configuration examples. + ## Python Ingestor The web app is not meant to be run locally connected to a Meshtastic node but rather