mirror of
https://github.com/ajvpot/meshexplorer.git
synced 2026-07-04 08:40:57 +02:00
72aa6be3d3
The letsmesh broker was migrated behind Cloudflare and changed its topic layout on 2026-06-02, which left prod's MQTT client in a zombie state: connected per paho's IsConnected() (so the 30s monitor never rebuilt it) but receiving zero messages, because the subscription was established only once after the initial connect and never re-applied on paho auto-reconnects. Result: 12 days of silently missing letsmesh ingestion while davekeogh masked the loss.

Make reconnection robust instead of relying on broker-side session persistence:

- Subscribe inside the OnConnect handler so every (re)connect, including paho auto-reconnects, restores delivery. Use CleanSession(true)+ResumeSubs(false) so we never depend on the broker remembering our session.
- Add a per-broker data-staleness watchdog: a broker that reports connected but delivers no messages for MQTT_STALE_AFTER_SECONDS (default 300) is treated as a zombie and force-rebuilt (disconnect + fresh connect/subscribe). This catches exactly the failure IsConnected() misses.
- Reduce the external monitor to that watchdog role; transient drops are left to paho auto-reconnect rather than racing it with a brand-new client.
- Stable per-broker client IDs (by index) and pre-sized MQTTClients slice so indices stay aligned when an earlier broker fails; guard BrokerStatus/lastActivity with a mutex; promote connect/subscribe logs to Info for visibility.

Adds unit tests for the watchdog and env parsing; documents the new env var.

Co-authored-by: Alex Vanderpot <alex@Alexs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>