mirror of
https://github.com/ajvpot/meshexplorer.git
synced 2026-07-05 09:11:00 +02:00
72aa6be3d3
The letsmesh broker was migrated behind Cloudflare and changed its topic layout on 2026-06-02, which left prod's MQTT client in a zombie state: connected per paho's IsConnected() (so the 30s monitor never rebuilt it) but receiving zero messages, because the subscription was established only once after the initial connect and never re-applied on paho auto-reconnects. Result: 12 days of silently missing letsmesh ingestion while davekeogh masked the loss.

Make reconnection robust instead of relying on broker-side session persistence:

- Subscribe inside the OnConnect handler so every (re)connect, including paho auto-reconnects, restores delivery. Use CleanSession(true)+ResumeSubs(false) so we never depend on the broker remembering our session.
- Add a per-broker data-staleness watchdog: a broker that reports connected but delivers no messages for MQTT_STALE_AFTER_SECONDS (default 300) is treated as a zombie and force-rebuilt (disconnect + fresh connect/subscribe). This catches exactly the failure IsConnected() misses.
- Reduce the external monitor to that watchdog role; transient drops are left to paho auto-reconnect rather than racing it with a brand-new client.
- Stable per-broker client IDs (by index) and pre-sized MQTTClients slice so indices stay aligned when an earlier broker fails; guard BrokerStatus/lastActivity with a mutex; promote connect/subscribe logs to Info for visibility.

Adds unit tests for the watchdog and env parsing; documents the new env var.

Co-authored-by: Alex Vanderpot <alex@Alexs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
51 lines
3.0 KiB
Bash
# MeshExplorer unified stack configuration.
# Copy this file to .env and fill in the values, then run:
# docker compose up --build
# (add `--profile bot` to also start the Discord relay).

# ─── ClickHouse ──────────────────────────────────────────────────────────────
# The read/write "default" user is used by the ingest daemon and the migration
# runner. Set a real password before deploying.
CLICKHOUSE_DB=default
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=changeme

# Read-only user used by the web app and the Discord bot. This account is only
# reachable on the internal docker network; the default matches ingest/clickhouse/users.xml.
CLICKHOUSE_READONLY_USER=readonly
CLICKHOUSE_READONLY_PASSWORD=readonly

# ─── MeshCore MQTT ingest ────────────────────────────────────────────────────
# JSON array of MQTT brokers to subscribe to for meshcore packets. Each entry:
# { "url": "...", "username": "...", "password": "...", "topics": ["meshcore/#"] }
# "topics" is optional and defaults to ["meshcore/#"]. The ingest daemon exits
# with an error if this is empty, so configure at least one broker.
MQTT_BROKERS=[{"url":"tcp://mqtt.example.com:1883","username":"CHANGE_ME","password":"CHANGE_ME","topics":["meshcore/#"]}]
MQTT_CLIENT_ID=meshcore-ingest
# Staleness watchdog: if a broker reports connected but delivers no messages for
# this many seconds, the daemon forces a fresh reconnect + resubscribe. Guards
# against "zombie" connections that survive an upstream broker swap. Default 300.
MQTT_STALE_AFTER_SECONDS=300

# ─── Web app ─────────────────────────────────────────────────────────────────
# Base URL for client-side API calls. Leave empty to use relative URLs.
NEXT_PUBLIC_API_URL=

# ─── Discord relay bot (optional, --profile bot) ─────────────────────────────
# Required when running the bot. Create a webhook in your Discord server.
DISCORD_WEBHOOK_URL=
# Optional: post into a specific thread instead of the channel.
DISCORD_THREAD_ID=
# Region filter for messages (e.g. seattle).
MESH_REGION=seattle
# Poll interval (ms) and batch size.
POLL_INTERVAL=300
MAX_ROWS_PER_POLL=50
# Comma-separated base64 private keys used to decrypt channel messages.
PRIVATE_KEYS=

# ─── Grafana ─────────────────────────────────────────────────────────────────
# Admin password for the bundled Grafana (published on 127.0.0.1:3000). A
# ClickHouse datasource is auto-provisioned using the read-only user above.
GRAFANA_ADMIN_PASSWORD=admin
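
The MQTT_BROKERS rules documented in the file (JSON array, optional "topics" defaulting to ["meshcore/#"], error when no broker is configured) could be applied roughly as in the sketch below. It uses only the standard library and is not taken from the ingest daemon: `brokerConfig`, `parseBrokers`, and `clientID` are hypothetical names, and the `<base>-<index>` client ID format is an assumption based on the commit's mention of stable per-broker IDs by index.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// brokerConfig (hypothetical) matches one entry of the MQTT_BROKERS array.
type brokerConfig struct {
	URL      string   `json:"url"`
	Username string   `json:"username"`
	Password string   `json:"password"`
	Topics   []string `json:"topics"`
}

// parseBrokers decodes the MQTT_BROKERS value, rejects an empty list, and
// fills in the documented default topic when "topics" is omitted.
func parseBrokers(raw string) ([]brokerConfig, error) {
	var brokers []brokerConfig
	if err := json.Unmarshal([]byte(raw), &brokers); err != nil {
		return nil, fmt.Errorf("parse MQTT_BROKERS: %w", err)
	}
	if len(brokers) == 0 {
		return nil, errors.New("MQTT_BROKERS must list at least one broker")
	}
	for i := range brokers {
		if len(brokers[i].Topics) == 0 {
			brokers[i].Topics = []string{"meshcore/#"}
		}
	}
	return brokers, nil
}

// clientID derives a stable per-broker ID from the base ID and the broker's
// position in the list. The "<base>-<index>" format is an assumption.
func clientID(base string, idx int) string {
	return fmt.Sprintf("%s-%d", base, idx)
}

func main() {
	// Sample value: the second entry omits "topics" to show the default.
	raw := `[
		{"url":"tcp://mqtt.example.com:1883","username":"CHANGE_ME","password":"CHANGE_ME","topics":["meshcore/#"]},
		{"url":"tcp://mqtt2.example.com:1883"}
	]`

	brokers, err := parseBrokers(raw)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	for i, b := range brokers {
		fmt.Println(clientID("meshcore-ingest", i), b.URL, b.Topics)
	}
}
```

Deriving the ID from the list index rather than from connection order means a broker keeps the same ID even if an earlier broker in the list fails to connect.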