mirror of
https://github.com/MarekWo/mc-webui.git
synced 2026-06-11 01:04:56 +02:00
422e7a3b34
The container watchdog only restarted on three legacy "device clearly dead" log lines, so today's failure mode (firmware briefly stalls and get_stats_* / get_battery commands time out with an empty error while passive RX keeps working) never tripped it, leaving the user with 10-15 s freezes several times a day and no automatic recovery.

DeviceManager now tracks two liveness signals:
- _last_rx_at, bumped on every RX_LOG_DATA event
- _consecutive_stats_failures, incremented on get_stats_* / get_bat exceptions and cleared on success

New /health/strict endpoint exposes these to the watchdog. It returns 503 when the device is connected but has 5+ consecutive stats failures, or when no RX event has been seen for over 5 minutes on a serial transport. The cheap /health endpoint keeps its lenient behavior so Docker's healthcheck doesn't suddenly start tripping.

The watchdog's check_device_unresponsive() gains a "soft" pattern class with a count threshold of 5 in the last 2 minutes, matching against get_stats_core/radio/packets failed:, Failed to get battery:, and Failed to get channel. Hard patterns still trigger on a single hit.

Deploy note: the watchdog runs as a host-level systemd service and is NOT restarted by mcupdate, so after deploy run:

  sudo systemctl restart mc-webui-watchdog.service

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
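
Illustrative sketch (not the code from this commit): the two decision rules described above, written as pure functions so the thresholds are easy to see. Only the numbers (5 failures, 5 minutes, 5 hits in 2 minutes) and the quoted log patterns come from the message. The Liveness dataclass, the function names, the reason strings, the "serial" transport label, and the choice to skip the RX check when no RX has ever been seen are assumptions made for the example.

```python
import re
import time
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

STATS_FAILURE_THRESHOLD = 5   # "5+ consecutive stats failures"
RX_SILENCE_LIMIT_S = 5 * 60   # "no RX event ... for over 5 minutes"
SOFT_HIT_THRESHOLD = 5        # soft-pattern hits needed ...
SOFT_WINDOW_S = 2 * 60        # ... within the last 2 minutes

# Soft patterns as quoted in the commit message.
SOFT_PATTERNS = re.compile(
    r"get_stats_(?:core|radio|packets) failed:"
    r"|Failed to get battery:"
    r"|Failed to get channel"
)


@dataclass
class Liveness:
    """Hypothetical snapshot of the signals DeviceManager tracks."""
    connected: bool
    transport: str                  # assumed label, e.g. "serial"
    last_rx_at: Optional[float]     # epoch seconds; None if no RX seen yet
    consecutive_stats_failures: int


def strict_health(state: Liveness, now: Optional[float] = None) -> Tuple[int, str]:
    """Return (http_status, reason) following the rules in the message."""
    now = time.time() if now is None else now
    if state.connected and state.consecutive_stats_failures >= STATS_FAILURE_THRESHOLD:
        return 503, "stats_failures"
    # Assumption: with no RX recorded yet there is no silence to measure.
    if (state.transport == "serial"
            and state.last_rx_at is not None
            and now - state.last_rx_at > RX_SILENCE_LIMIT_S):
        return 503, "rx_silent"
    return 200, "ok"


def soft_threshold_reached(entries: Iterable[Tuple[float, str]], now: float) -> bool:
    """entries are (timestamp, log_line) pairs; count recent soft-pattern hits."""
    hits = sum(
        1 for ts, line in entries
        if now - ts <= SOFT_WINDOW_S and SOFT_PATTERNS.search(line)
    )
    return hits >= SOFT_HIT_THRESHOLD
```

Keeping the rules free of I/O means they can be unit-tested with a fixed "now" value, independent of the real endpoint and log reader.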