meshexplorer/.env.example
Alex Vanderpot 72aa6be3d3 ingest: resubscribe on reconnect + staleness watchdog for zombie MQTT conns (#38)
The letsmesh broker was migrated behind Cloudflare and changed its topic
layout on 2026-06-02, which left prod's MQTT client in a zombie state:
connected per paho's IsConnected() (so the 30s monitor never rebuilt it) but
receiving zero messages, because the subscription was established only once
after the initial connect and never re-applied on paho auto-reconnects. Result:
12 days of silently missing letsmesh ingestion while davekeogh masked the loss.

Make reconnection robust instead of relying on broker-side session persistence:

- Subscribe inside the OnConnect handler so every (re)connect — including paho
  auto-reconnects — restores delivery. Use CleanSession(true)+ResumeSubs(false)
  so we never depend on the broker remembering our session.
- Add a per-broker data-staleness watchdog: a broker that reports connected but
  delivers no messages for MQTT_STALE_AFTER_SECONDS (default 300) is treated as a
  zombie and force-rebuilt (disconnect + fresh connect/subscribe). This catches
  exactly the failure IsConnected() misses.
- Reduce the external monitor to that watchdog role; transient drops are left to
  paho auto-reconnect rather than racing it with a brand-new client.
- Stable per-broker client IDs (by index) and pre-sized MQTTClients slice so
  indices stay aligned when an earlier broker fails; guard BrokerStatus/lastActivity
  with a mutex; promote connect/subscribe logs to Info for visibility.

Adds unit tests for the watchdog and env parsing; documents the new env var.

Co-authored-by: Alex Vanderpot <alex@Alexs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 11:55:33 -04:00

# MeshExplorer unified stack configuration.
# Copy this file to .env and fill in the values, then run:
# docker compose up --build
# (add `--profile bot` to also start the Discord relay).
# ─── ClickHouse ──────────────────────────────────────────────────────────────
# The read/write "default" user is used by the ingest daemon and the migration
# runner. Set a real password before deploying.
CLICKHOUSE_DB=default
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=changeme
# Read-only user used by the web app and the Discord bot. This account is only
# reachable on the internal docker network; the default matches ingest/clickhouse/users.xml.
CLICKHOUSE_READONLY_USER=readonly
CLICKHOUSE_READONLY_PASSWORD=readonly
# ─── MeshCore MQTT ingest ────────────────────────────────────────────────────
# JSON array of MQTT brokers to subscribe to for meshcore packets. Each entry:
# { "url": "...", "username": "...", "password": "...", "topics": ["meshcore/#"] }
# "topics" is optional and defaults to ["meshcore/#"]. The ingest daemon exits
# with an error if this is empty, so configure at least one broker.
MQTT_BROKERS=[{"url":"tcp://mqtt.example.com:1883","username":"CHANGE_ME","password":"CHANGE_ME","topics":["meshcore/#"]}]
MQTT_CLIENT_ID=meshcore-ingest
# Staleness watchdog: if a broker reports connected but delivers no messages for
# this many seconds, the daemon forces a fresh reconnect + resubscribe. Guards
# against "zombie" connections that survive an upstream broker swap. Default 300.
MQTT_STALE_AFTER_SECONDS=300
# ─── Web app ─────────────────────────────────────────────────────────────────
# Base URL for client-side API calls. Leave empty to use relative URLs.
NEXT_PUBLIC_API_URL=
# ─── Discord relay bot (optional, --profile bot) ─────────────────────────────
# Required when running the bot. Create a webhook in your Discord server.
DISCORD_WEBHOOK_URL=
# Optional: post into a specific thread instead of the channel.
DISCORD_THREAD_ID=
# Region filter for messages (e.g. seattle).
MESH_REGION=seattle
# Poll interval (ms) and batch size.
POLL_INTERVAL=300
MAX_ROWS_PER_POLL=50
# Comma-separated base64 private keys used to decrypt channel messages.
PRIVATE_KEYS=
# ─── Grafana ─────────────────────────────────────────────────────────────────
# Admin password for the bundled Grafana (published on 127.0.0.1:3000). A
# ClickHouse datasource is auto-provisioned using the read-only user above.
GRAFANA_ADMIN_PASSWORD=admin