mirror of
https://github.com/MarekWo/mc-webui.git
synced 2026-06-11 09:14:52 +02:00
422e7a3b34
The container watchdog only restarted on three legacy "device clearly dead" log lines, so today's failure mode (firmware briefly stalls and get_stats_* / get_battery commands time out with an empty error while passive RX keeps working) never tripped it, leaving the user with 10-15 s freezes several times a day and no automatic recovery.

DeviceManager now tracks two liveness signals:
- _last_rx_at, bumped on every RX_LOG_DATA event
- _consecutive_stats_failures, incremented on get_stats_* / get_bat exceptions and cleared on success

New /health/strict endpoint exposes these to the watchdog. It returns 503 when the device is connected but has 5+ consecutive stats failures, or when no RX event has been seen for over 5 minutes on a serial transport. The cheap /health endpoint keeps its lenient behavior so Docker's healthcheck doesn't suddenly start tripping.

The watchdog's check_device_unresponsive() gains a "soft" pattern class with a count threshold of 5 in the last 2 minutes, matching against get_stats_core/radio/packets failed:, Failed to get battery:, and Failed to get channel. Hard patterns still trigger on a single hit.

Deploy note: the watchdog runs as a host-level systemd service and is NOT restarted by mcupdate, so after deploy run:
sudo systemctl restart mc-webui-watchdog.service

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
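The two rules in the commit message above (the strict health condition and the soft-pattern threshold) can be sketched roughly as follows. This is an illustrative reading of the description, not the project's actual code: the function names, argument shapes, the `"serial"` transport label, and the exact regular expressions are assumptions, as is the handling of a device that has never produced an RX event.

```python
import re
from typing import Iterable, Optional, Tuple

STATS_FAILURE_LIMIT = 5       # "5+ consecutive stats failures"
RX_SILENCE_LIMIT_S = 5 * 60   # "no RX event ... for over 5 minutes"
SOFT_THRESHOLD = 5            # soft-pattern hits needed to trigger
SOFT_WINDOW_S = 2 * 60        # "... in the last 2 minutes"

# Assumed spellings of the soft log patterns named in the commit message.
SOFT_PATTERNS = [
    re.compile(r"get_stats_(core|radio|packets) failed:"),
    re.compile(r"Failed to get battery:"),
    re.compile(r"Failed to get channel"),
]


def strict_health_fails(connected: bool, transport: str,
                        consecutive_stats_failures: int,
                        last_rx_at: Optional[float], now: float) -> bool:
    """Return True when a strict health check would answer 503 (hypothetical helper)."""
    # Rule 1: device is connected but stats commands keep failing.
    if connected and consecutive_stats_failures >= STATS_FAILURE_LIMIT:
        return True
    # Rule 2: serial transport with no RX event for over 5 minutes.
    # Assumption: with no RX timestamp at all there is no baseline, so do not fail.
    if transport == "serial" and last_rx_at is not None:
        return now - last_rx_at > RX_SILENCE_LIMIT_S
    return False


def soft_threshold_reached(entries: Iterable[Tuple[float, str]], now: float) -> bool:
    """Count soft-pattern hits among (timestamp, log line) pairs in the window."""
    hits = sum(
        1 for ts, line in entries
        if now - ts <= SOFT_WINDOW_S and any(p.search(line) for p in SOFT_PATTERNS)
    )
    return hits >= SOFT_THRESHOLD
```

A hard pattern would bypass this counting entirely and trigger on its first match, as the commit message states.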
mc-webui Container Watchdog
The watchdog is a host-level service that runs on the machine hosting the mc-webui project's Docker containers. It continuously monitors the health of the application's containers, in particular the mc-webui container, which handles the physical connection to the LoRa device (such as a Heltec V3 or V4).
Key Capabilities
- Automated Restarts: If a container becomes unhealthy, stops, or reports device connection issues in its logs, the watchdog automatically restarts it to restore service without human intervention.
- Hardware USB Bus Reset: If the mc-webui container fails to recover after three successive restarts (e.g., due to a hardware freeze on the LoRa device itself), the watchdog simulates a physical disconnection and reconnection of the device via a low-level USB bus reset to clear the hardware lockup.
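The escalation between the two capabilities above can be pictured as a small decision function. This is a minimal sketch under stated assumptions: `next_recovery_action`, its arguments, and the returned action labels are hypothetical and only mirror the "three successive restarts" rule described here, not the watchdog's real implementation.

```python
MAX_RESTARTS_BEFORE_USB_RESET = 3  # "fails to recover after three successive restarts"


def next_recovery_action(healthy: bool, successive_restarts: int) -> str:
    """Pick the next recovery step for the mc-webui container (hypothetical helper)."""
    if healthy:
        return "none"        # nothing to do; the caller would reset its restart counter
    if successive_restarts >= MAX_RESTARTS_BEFORE_USB_RESET:
        return "usb_reset"   # restarts did not help: escalate to a USB bus reset
    return "restart"         # ordinary container restart


# Walk through a container that never recovers on its own.
for attempts in range(5):
    print(attempts, next_recovery_action(False, attempts))
```

Keeping the decision separate from the side effects (restarting the container, resetting the bus) makes the threshold easy to test without touching Docker or USB hardware.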
Installation / Update
Install or update the watchdog by running the provided installer script with root privileges:
cd ~/mc-webui/scripts/watchdog
sudo ./install.sh
Detailed Documentation
For full details on configuration, logs, troubleshooting, and more advanced features, please refer to the main Container Watchdog Documentation located in the docs folder.