MarekWo 422e7a3b34 feat(watchdog): catch sluggish-device failures via soft-pattern counting
The container watchdog only restarted on three legacy "device clearly dead"
log lines, so today's failure mode (the firmware briefly stalls, and get_stats_*
/ get_battery commands time out with an empty error while passive RX
keeps working) never tripped it. The user was left with 10-15 s freezes
several times a day and no automatic recovery.

DeviceManager now tracks two liveness signals:
- _last_rx_at, bumped on every RX_LOG_DATA event
- _consecutive_stats_failures, incremented on get_stats_* / get_bat
  exceptions and cleared on success
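A minimal sketch of how those two signals could be maintained, assuming a synchronous wrapper around each stats call. The attribute names come from this commit message; the class name LivenessTracker and the methods on_rx_log_data() and call_stats() are hypothetical and only illustrate the bookkeeping, not the actual DeviceManager code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class LivenessTracker:
    """Hypothetical stand-in for the liveness state kept by DeviceManager.

    Attribute names follow the commit message; method names are illustrative.
    The sketch is synchronous for brevity.
    """

    def __init__(self) -> None:
        self._last_rx_at: Optional[float] = None  # time of the last RX_LOG_DATA event
        self._consecutive_stats_failures = 0      # current failure streak

    def on_rx_log_data(self, now: Optional[float] = None) -> None:
        # Passive RX proves the radio is still receiving packets.
        self._last_rx_at = time.monotonic() if now is None else now

    def call_stats(self, command: Callable[[], T]) -> T:
        """Run a get_stats_* / get_bat style command and record the outcome."""
        try:
            result = command()
        except Exception:
            self._consecutive_stats_failures += 1  # any exception extends the streak
            raise
        self._consecutive_stats_failures = 0       # a success clears it
        return result
```

Re-raising after counting keeps the caller's existing error handling unchanged while still feeding the failure streak.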

New /health/strict endpoint exposes these to the watchdog. It returns 503
when the device is connected but has 5+ consecutive stats failures, or
when no RX event has been seen for over 5 minutes on a serial transport.
The cheap /health endpoint keeps its lenient behavior so Docker's
healthcheck doesn't suddenly start tripping.
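The /health/strict decision described above can be sketched as a pure function. The two thresholds are taken from this message; the function name strict_health(), the transport strings ("serial", "tcp"), returning 200 for a disconnected device, and treating "no RX seen yet" as healthy are all assumptions made for illustration.

```python
from typing import Optional, Tuple

STATS_FAILURE_LIMIT = 5      # "5+ consecutive stats failures"
RX_SILENCE_LIMIT_S = 5 * 60  # "no RX event ... for over 5 minutes"


def strict_health(connected: bool,
                  consecutive_stats_failures: int,
                  last_rx_at: Optional[float],
                  now: float,
                  transport: str) -> Tuple[int, str]:
    """Return (HTTP status, reason) following the /health/strict rules.

    Hypothetical sketch: a disconnected device and a missing last_rx_at
    (no RX seen yet) are both treated as healthy here, by assumption.
    """
    if connected and consecutive_stats_failures >= STATS_FAILURE_LIMIT:
        return 503, f"{consecutive_stats_failures} consecutive stats failures"
    if (transport == "serial" and last_rx_at is not None
            and now - last_rx_at > RX_SILENCE_LIMIT_S):
        return 503, "no RX event for over 5 minutes"
    return 200, "ok"
```

Keeping the decision separate from the web framework makes the thresholds easy to test; a route handler would only need to pass in the tracker's current values and return the status code.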

The watchdog's check_device_unresponsive() gains a "soft" pattern class
with a count threshold of 5 matches in the last 2 minutes. Soft patterns
match "get_stats_core/radio/packets failed:", "Failed to get battery:", and
"Failed to get channel". Hard patterns still trigger on a single hit.
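The soft/hard split can be illustrated with a small window-and-count function over timestamped log lines. This is a sketch, not the watchdog's code: it assumes "get_stats_core/radio/packets failed:" is shorthand for three separate log prefixes, uses plain substring matching, and applies the same 2-minute window to hard patterns. The three legacy hard patterns are not quoted in this message, so HARD_PATTERNS holds hypothetical placeholders, and device_unresponsive() is an illustrative name.

```python
from typing import Iterable, Tuple

# Soft patterns from the commit message, assuming the get_stats_* shorthand
# expands to three separate log prefixes. Matching is by substring.
SOFT_PATTERNS = (
    "get_stats_core failed:",
    "get_stats_radio failed:",
    "get_stats_packets failed:",
    "Failed to get battery:",
    "Failed to get channel",
)
SOFT_THRESHOLD = 5  # soft matches needed...
WINDOW_S = 120      # ...within the last 2 minutes

# HYPOTHETICAL placeholders: the legacy "device clearly dead" lines are not
# quoted in the source.
HARD_PATTERNS = (
    "<hard pattern 1>",
    "<hard pattern 2>",
    "<hard pattern 3>",
)


def device_unresponsive(log: Iterable[Tuple[float, str]], now: float) -> bool:
    """Return True if recent (timestamp, line) pairs warrant a restart.

    Assumption: hard patterns are checked over the same 2-minute window.
    """
    soft_hits = 0
    for ts, line in log:
        if now - ts > WINDOW_S:
            continue  # older than the window, ignore
        if any(p in line for p in HARD_PATTERNS):
            return True  # a single hard hit is enough
        if any(p in line for p in SOFT_PATTERNS):
            soft_hits += 1
    return soft_hits >= SOFT_THRESHOLD
```

One possible input source is `docker logs --since 2m --timestamps mc-webui`, parsed into (timestamp, line) pairs; whether the watchdog reads logs this way is an assumption. Counting instead of matching once means a single transient stats timeout no longer causes a restart, while a sustained stall does.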

Deploy note: the watchdog runs as a host-level systemd service and is
NOT restarted by mcupdate, so after deploy run:
  sudo systemctl restart mc-webui-watchdog.service

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-07 09:43:43 +02:00

mc-webui Container Watchdog

The watchdog is a service that runs on the host machine hosting the mc-webui Docker containers. It continuously monitors the health of the application's containers, in particular the mc-webui container, which owns the physical connection to the LoRa device (such as a Heltec V3 or V4).

Key Capabilities

  • Automated Restarts: If a container becomes unhealthy, stops, or reports device connection issues in its logs, the watchdog automatically restarts it to restore service without human intervention.
  • Hardware USB Bus Reset: If the mc-webui container still fails to recover after three successive restarts (e.g., because the LoRa device itself has frozen), the watchdog performs a low-level USB bus reset. This simulates physically unplugging and reconnecting the device and is intended to clear hardware lockups that a container restart cannot.
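The escalation from container restarts to a USB bus reset can be sketched as a small state machine. The limit of three restarts comes from the text above; the class RecoveryState, the action strings, and resetting the counter after a USB reset are hypothetical choices for illustration, and the reset mechanism itself is not shown.

```python
from dataclasses import dataclass

MAX_RESTARTS_BEFORE_USB_RESET = 3  # "three successive restarts"


@dataclass
class RecoveryState:
    """Hypothetical escalation tracker; names are illustrative only."""
    failed_restarts: int = 0

    def next_action(self, healthy: bool) -> str:
        """Return 'none', 'restart', or 'usb_reset' for one watchdog check."""
        if healthy:
            self.failed_restarts = 0  # recovery clears the escalation
            return "none"
        if self.failed_restarts >= MAX_RESTARTS_BEFORE_USB_RESET:
            self.failed_restarts = 0  # start a fresh restart cycle after the reset
            return "usb_reset"
        self.failed_restarts += 1
        return "restart"
```

With a device that stays unhealthy, successive checks yield restart, restart, restart, usb_reset, and then the cycle begins again.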

Installation / Update

To install or update the watchdog, run the provided installer script with root privileges:

cd ~/mc-webui/scripts/watchdog
sudo ./install.sh

Detailed Documentation

For full details on configuration, logs, troubleshooting, and more advanced features, please refer to the main Container Watchdog Documentation located in the docs folder.