Files
mc-webui/docs/watchdog.md
MarekWo 2f82c589c7 feat(watchdog): Hardware USB bus reset for stuck LoRa devices
Implement a smart auto-detection and low-level fcntl ioctl reset mechanism for LoRa USB devices. This 'last resort' recovery is triggered if the meshcore-bridge container fails to recover after 3 restarts within an 8-minute window. Includes updates to the installer, systemd service, and newly added README.

Co-Authored-By: Gemini CLI <noreply@google.com>
2026-02-22 20:15:27 +00:00

3.5 KiB

Container Watchdog

The Container Watchdog is a systemd service that monitors Docker containers and automatically restarts unhealthy or stopped ones. This is useful for ensuring reliability, especially on resource-constrained systems.

Features

  • Health monitoring - Checks container status every 30 seconds
  • Automatic restart - Restarts containers that become unhealthy
  • Auto-start stopped containers - Starts containers that have stopped (configurable)
  • Hardware USB reset - Performs a low-level USB bus reset if the LoRa device freezes (detected after 3 failed container restarts within 8 minutes)
  • Diagnostic logging - Captures container logs before restart for troubleshooting
  • HTTP status endpoint - Query container status via HTTP API
  • Restart history - Tracks all automatic restarts with timestamps

Installation

cd ~/mc-webui
sudo ./scripts/watchdog/install.sh

The installer will:

  • Create a systemd service mc-webui-watchdog
  • Start monitoring containers immediately
  • Enable automatic startup on boot
  • Create log file at /var/log/mc-webui-watchdog.log

Usage

Check service status

systemctl status mc-webui-watchdog

View watchdog logs

# Real-time logs
tail -f /var/log/mc-webui-watchdog.log

# Or via journalctl
journalctl -u mc-webui-watchdog -f

HTTP Status Endpoints

The watchdog provides HTTP endpoints on port 5051:

# Service health
curl http://localhost:5051/health

# Container status
curl http://localhost:5051/status

# Restart history
curl http://localhost:5051/history

Diagnostic Files

When a container is restarted, diagnostic information is saved to:

/tmp/mc-webui-watchdog-{container}-{timestamp}.log

These files contain:

  • Container status at the time of failure
  • Recent container logs (last 200 lines)
  • Timestamp and restart result

Configuration (Optional)

No configuration required - the installer automatically detects paths and sets sensible defaults.

If you need to customize the behavior, the service supports these environment variables:

Variable Default Description
MCWEBUI_DIR (auto-detected) Path to mc-webui directory
CHECK_INTERVAL 30 Seconds between health checks
LOG_FILE /var/log/mc-webui-watchdog.log Path to log file
HTTP_PORT 5051 HTTP status port (0 to disable)
AUTO_START true Start stopped containers (set to false to disable)
USB_DEVICE_PATH (auto-detected) Path to the LoRa device (e.g., /dev/bus/usb/001/002) for hardware USB bus reset

To modify defaults, create an override file:

sudo systemctl edit mc-webui-watchdog

Then add your overrides, for example:

[Service]
Environment=CHECK_INTERVAL=60
Environment=AUTO_START=false

Uninstall

sudo ~/mc-webui/scripts/watchdog/install.sh --uninstall

Note: The log file is preserved after uninstall. Remove manually if needed:

sudo rm /var/log/mc-webui-watchdog.log

Troubleshooting

Service won't start

Check the logs:

journalctl -u mc-webui-watchdog -n 50

Common issues:

  • Docker not running
  • Python 3 not installed
  • Permission issues

Containers keep restarting

Check the diagnostic files in /tmp/mc-webui-watchdog-*.log to see what's causing the containers to become unhealthy.

HTTP endpoint not responding

Verify the service is running and check if port 5051 is available:

systemctl status mc-webui-watchdog
ss -tlnp | grep 5051