mc-webui/docs/watchdog.md
MarekWo 53ef2759d5 docs: cover analyzer settings, vacuum/optimize, path apply, watchdog soft patterns
2026-06-08 11:53:41 +02:00

Container Watchdog

The Container Watchdog is a systemd service that monitors the mc-webui Docker container and automatically restarts it if it becomes unhealthy or if the LoRa device becomes unresponsive. This is useful for ensuring reliability, especially on resource-constrained systems or when the LoRa hardware hangs.

Features

  • Health monitoring - Checks container status every 30 seconds
  • Log monitoring - Scans recent container logs for hard and soft failure patterns (see Failure detection)
  • Automatic restart - Restarts the container when issues are detected
  • Auto-start stopped container - Starts the container if it has stopped (configurable)
  • Hardware USB reset - Performs a low-level USB bus reset (unbind/bind or DTR/RTS) if the LoRa device freezes. Note: USB reset is automatically skipped if a TCP connection is used.
  • Diagnostic logging - Captures container logs before restart for troubleshooting
  • HTTP status endpoint - Query watchdog status via HTTP API
  • Restart history - Tracks all automatic restarts with timestamps
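As a rough illustration of how the features above could combine in a single check cycle, the sketch below maps observations to a list of actions. The function name plan_actions, its parameters, the action names, and the ordering of steps are all assumptions made for this example; the actual watchdog script may structure this differently.

```python
def plan_actions(running, healthy, device_unresponsive,
                 auto_start=True, uses_tcp=False):
    """Return the actions for one check cycle (hypothetical action names)."""
    if not running:
        # Stopped container: start it only if auto-start is enabled.
        return ["start"] if auto_start else []

    actions = []
    if device_unresponsive:
        # Save diagnostics first so the logs survive the restart.
        actions.append("capture_logs")
        if not uses_tcp:
            # A USB reset only makes sense for a locally attached device.
            actions.append("usb_reset")
        actions.append("restart")
    elif not healthy:
        actions.extend(["capture_logs", "restart"])
    return actions
```

For example, an unresponsive device on a TCP connection yields a log capture followed by a restart, with no USB reset step.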

Failure detection

check_device_unresponsive() scans the last 2 minutes of container logs against two pattern classes:

  • Hard patterns: any single occurrence triggers a restart. These are the long-standing "device clearly dead" messages: No response from meshcore node, disconnecting, Device connected but self_info is empty, Failed to connect after 10 attempts.
  • Soft patterns: 5 or more occurrences of any of these within the last 2 minutes trigger a restart. This catches the "sluggish but not dead" mode, in which the firmware briefly stalls on get_stats_* / get_battery commands (logged as a concurrent.futures.TimeoutError with an empty message) while passive RX still works: get_stats_core failed:, get_stats_radio failed:, get_stats_packets failed:, Failed to get battery:, Failed to get channel.
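The two-class check can be sketched as follows. This is a minimal illustration, not the watchdog's actual code: the name classify_window, the (timestamp, line) input shape, and the choice to count soft hits cumulatively across all soft patterns are assumptions. Only the pattern strings, the 5-hit threshold, and the 2-minute window come from the description above, and "No response from meshcore node, disconnecting" is assumed to be a single message.

```python
from datetime import datetime, timedelta

# Pattern strings copied from the lists above.
HARD_PATTERNS = (
    "No response from meshcore node, disconnecting",  # assumed to be one message
    "Device connected but self_info is empty",
    "Failed to connect after 10 attempts",
)
SOFT_PATTERNS = (
    "get_stats_core failed:",
    "get_stats_radio failed:",
    "get_stats_packets failed:",
    "Failed to get battery:",
    "Failed to get channel",
)
SOFT_THRESHOLD = 5
WINDOW = timedelta(minutes=2)


def classify_window(entries, now):
    """Classify recent log entries.

    entries: iterable of (timestamp, line) tuples (hypothetical input shape).
    Returns "hard", "soft", or None when no restart is warranted.
    """
    cutoff = now - WINDOW
    soft_hits = 0
    for timestamp, line in entries:
        if timestamp < cutoff:
            continue  # older than the 2-minute window
        if any(pattern in line for pattern in HARD_PATTERNS):
            return "hard"  # a single hard match is enough
        if any(pattern in line for pattern in SOFT_PATTERNS):
            soft_hits += 1
    # Assumption: soft hits are summed across all soft patterns.
    return "soft" if soft_hits >= SOFT_THRESHOLD else None
```

With this counting rule, four get_stats_core failures plus one battery failure inside the window would trigger a restart; a per-pattern threshold would behave differently.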

In parallel, the app exposes /health/strict, a stricter device-health check that the watchdog (or any external monitor) can poll to react before the soft-pattern threshold is reached.
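An external monitor could consume /health/strict along these lines. This is a sketch under stated assumptions: it treats HTTP 200 as healthy and anything else (including connection errors) as unhealthy, which the text above does not confirm, and the function name strict_health_ok and the base URL in the usage note are hypothetical.

```python
import urllib.error
import urllib.request


def strict_health_ok(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True only if GET <base_url>/health/strict answers HTTP 200.

    Assumption: the endpoint signals an unhealthy device with a non-200
    status. The opener parameter exists so the call can be stubbed in tests.
    """
    url = base_url.rstrip("/") + "/health/strict"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx) is a URLError subclass; refused connections
        # and timeouts surface as URLError or OSError.
        return False
```

A caller would pass the address where the mc-webui app is reachable, for example strict_health_ok("http://localhost:PORT") with PORT replaced by the app's actual port.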

Deploy note: the watchdog runs as a host-level systemd service and is not restarted by mcupdate. After deploying changes to scripts/watchdog/, run:

sudo systemctl restart mc-webui-watchdog.service

Installation

cd ~/mc-webui
sudo ./scripts/watchdog/install.sh

The installer will:

  • Create a systemd service mc-webui-watchdog
  • Start monitoring the container immediately
  • Enable automatic startup on boot
  • Create a log file at /var/log/mc-webui-watchdog.log

Usage

Check service status

systemctl status mc-webui-watchdog

View watchdog logs

# Real-time logs
tail -f /var/log/mc-webui-watchdog.log

# Or via journalctl
journalctl -u mc-webui-watchdog -f

HTTP Status Endpoints

The watchdog provides HTTP endpoints on port 5051:

# Service health
curl http://localhost:5051/health

# Container status
curl http://localhost:5051/status

# Restart history
curl http://localhost:5051/history

Diagnostic Files

When the container is restarted, diagnostic information is saved to:

/tmp/mc-webui-watchdog-mc-webui-{timestamp}.log

These files contain:

  • Container status at the time of failure
  • Recent container logs (last 200 lines)
  • Timestamp and restart result
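To find the most recent diagnostic files after an incident, a small helper like the one below can be used. It is a sketch: the function name latest_diagnostics and its parameters are made up for this example, and it only relies on the file name pattern shown above.

```python
import glob
import os


def latest_diagnostics(directory="/tmp", limit=5):
    """Return up to `limit` watchdog diagnostic files, newest first."""
    pattern = os.path.join(directory, "mc-webui-watchdog-mc-webui-*.log")
    # Sort by modification time so the latest restart comes first.
    files = sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    for path in latest_diagnostics():
        print(path)
```

Because /tmp is typically cleared on reboot, copy any file you want to keep to a persistent location.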

Configuration (Optional)

No configuration required - the installer automatically detects paths and sets sensible defaults.

If you need to customize the behavior, the service supports these environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| MCWEBUI_DIR | (auto-detected) | Path to mc-webui directory |
| CHECK_INTERVAL | 30 | Seconds between health checks |
| LOG_FILE | /var/log/mc-webui-watchdog.log | Path to log file |
| HTTP_PORT | 5051 | HTTP status port (0 to disable) |
| AUTO_START | true | Start stopped container (set to false to disable) |
| USB_DEVICE_PATH | (auto-detected) | Path to the LoRa device for hardware USB bus reset |

To modify defaults, create an override file:

sudo systemctl edit mc-webui-watchdog

Then add your overrides, for example:

[Service]
Environment=CHECK_INTERVAL=60
Environment=AUTO_START=false
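For orientation, a service written in Python might read these variables roughly as shown below. This is an illustrative sketch, not the watchdog's actual parsing code: the function name load_settings, the dictionary keys, and the set of strings accepted as "true" are assumptions; the variable names and defaults are taken from the table above.

```python
import os


def load_settings(env=os.environ):
    """Read watchdog settings from the environment, applying documented defaults."""

    def as_bool(value):
        # Assumption: these spellings count as true; anything else is false.
        return value.strip().lower() in ("1", "true", "yes", "on")

    http_port = int(env.get("HTTP_PORT", "5051"))
    return {
        "mcwebui_dir": env.get("MCWEBUI_DIR"),          # None means auto-detect
        "check_interval": int(env.get("CHECK_INTERVAL", "30")),
        "log_file": env.get("LOG_FILE", "/var/log/mc-webui-watchdog.log"),
        "http_port": http_port,
        "http_enabled": http_port != 0,                 # 0 disables the HTTP server
        "auto_start": as_bool(env.get("AUTO_START", "true")),
        "usb_device_path": env.get("USB_DEVICE_PATH"),  # None means auto-detect
    }
```

With the override file above in place, load_settings() would report a check interval of 60 seconds and auto-start disabled.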

Uninstall

sudo ~/mc-webui/scripts/watchdog/install.sh --uninstall

Note: The log file is preserved after uninstall. Remove it manually if needed:

sudo rm /var/log/mc-webui-watchdog.log