Container Watchdog
The Container Watchdog is a systemd service that monitors Docker containers and automatically restarts unhealthy or stopped ones. This is useful for ensuring reliability, especially on resource-constrained systems.
Features
- Health monitoring - Checks container status every 30 seconds
- Automatic restart - Restarts containers that become unhealthy
- Auto-start stopped containers - Starts containers that have stopped (configurable)
- Hardware USB reset - As a last resort, performs a low-level USB bus reset of the LoRa device if the meshcore-bridge container fails to recover after 3 restarts within 8 minutes
- Diagnostic logging - Captures container logs before restart for troubleshooting
- HTTP status endpoint - Query container status via HTTP API
- Restart history - Tracks all automatic restarts with timestamps
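The escalation rule behind the hardware USB reset can be pictured as a sliding-window counter paired with the Linux USBDEVFS_RESET ioctl. The following is a minimal sketch under stated assumptions, not the watchdog's actual code: the names RestartWindow and usb_reset are hypothetical, the thresholds (3 restarts, 480 seconds) come from the feature description above, and the request number 0x5514 is the value of USBDEVFS_RESET on x86 and ARM Linux.

```python
import fcntl
import os
import time
from collections import deque

# USBDEVFS_RESET from <linux/usbdevice_fs.h>: _IO('U', 20), i.e. 0x5514 on x86 and ARM.
USBDEVFS_RESET = 0x5514


class RestartWindow:
    """Hypothetical helper: counts restarts inside a sliding time window."""

    def __init__(self, max_restarts=3, window_seconds=480):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._stamps = deque()

    def record(self, now=None):
        """Record one restart; return True once the threshold is reached."""
        now = time.monotonic() if now is None else now
        self._stamps.append(now)
        # Drop restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] > self.window_seconds:
            self._stamps.popleft()
        return len(self._stamps) >= self.max_restarts

    def clear(self):
        """Forget all recorded restarts, e.g. after a reset was performed."""
        self._stamps.clear()


def usb_reset(device_path):
    """Ask the kernel to reset one USB device, e.g. /dev/bus/usb/001/002 (needs root)."""
    fd = os.open(device_path, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, USBDEVFS_RESET, 0)
    finally:
        os.close(fd)
```

In this sketch a caller would invoke record() after each container restart and call usb_reset() followed by clear() when it returns True. Whether the real watchdog resets on the third restart or only after the third restart has also failed is not specified here.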
Installation
cd ~/mc-webui
sudo ./scripts/watchdog/install.sh
The installer will:
- Create a systemd service mc-webui-watchdog
- Start monitoring containers immediately
- Enable automatic startup on boot
- Create log file at /var/log/mc-webui-watchdog.log
Usage
Check service status
systemctl status mc-webui-watchdog
View watchdog logs
# Real-time logs
tail -f /var/log/mc-webui-watchdog.log
# Or via journalctl
journalctl -u mc-webui-watchdog -f
HTTP Status Endpoints
The watchdog provides HTTP endpoints on port 5051:
# Service health
curl http://localhost:5051/health
# Container status
curl http://localhost:5051/status
# Restart history
curl http://localhost:5051/history
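The same endpoints can be polled from a script. This is a minimal sketch using only the Python standard library; the helper name fetch is hypothetical, and because the response format is not documented here, the code tries JSON first and falls back to the raw text.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:5051"  # default HTTP_PORT


def fetch(endpoint, base_url=BASE_URL, timeout=5):
    """GET one watchdog endpoint; parse JSON if possible, else return raw text."""
    with urlopen(f"{base_url}/{endpoint.lstrip('/')}", timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

For example, fetch("status") requests http://localhost:5051/status. A connection error (urllib.error.URLError) usually means the service is not running or HTTP_PORT is set to 0.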
Diagnostic Files
When a container is restarted, diagnostic information is saved to:
/tmp/mc-webui-watchdog-{container}-{timestamp}.log
These files contain:
- Container status at the time of failure
- Recent container logs (last 200 lines)
- Timestamp and restart result
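To find the most recent diagnostic files without guessing the timestamp format in the file name, one option is to sort by modification time. A small sketch (the function names latest_diagnostics and tail are hypothetical helpers, not part of the watchdog):

```python
import glob
import os


def latest_diagnostics(directory="/tmp", limit=5):
    """Return the newest watchdog diagnostic files, most recent first."""
    pattern = os.path.join(directory, "mc-webui-watchdog-*.log")
    files = sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)
    return files[:limit]


def tail(path, lines=20):
    """Return the last `lines` lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.readlines()[-lines:]
```

Sorting by modification time avoids depending on how {timestamp} is formatted, which is not specified above.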
Configuration (Optional)
No configuration is required: the installer automatically detects paths and sets sensible defaults.
If you need to customize the behavior, the service supports these environment variables:
| Variable | Default | Description |
|---|---|---|
| MCWEBUI_DIR | (auto-detected) | Path to mc-webui directory |
| CHECK_INTERVAL | 30 | Seconds between health checks |
| LOG_FILE | /var/log/mc-webui-watchdog.log | Path to log file |
| HTTP_PORT | 5051 | HTTP status port (0 to disable) |
| AUTO_START | true | Start stopped containers (set to false to disable) |
| USB_DEVICE_PATH | (auto-detected) | Path to the LoRa device (e.g., /dev/bus/usb/001/002) for hardware USB bus reset |
To modify defaults, create an override file:
sudo systemctl edit mc-webui-watchdog
Then add your overrides, for example:
[Service]
Environment=CHECK_INTERVAL=60
Environment=AUTO_START=false
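Variables set through the override reach the service as ordinary process environment variables. As an illustration only, a service could resolve them to typed values like this; load_config is a hypothetical function, and the parsing rules (for example, treating only the string "true" as enabled) are assumptions rather than the watchdog's confirmed behavior.

```python
import os


def load_config(env=None):
    """Hypothetical loader: resolve the documented variables to typed values."""
    env = os.environ if env is None else env
    return {
        "check_interval": int(env.get("CHECK_INTERVAL", "30")),
        "log_file": env.get("LOG_FILE", "/var/log/mc-webui-watchdog.log"),
        "http_port": int(env.get("HTTP_PORT", "5051")),  # 0 disables the endpoint
        "auto_start": env.get("AUTO_START", "true").strip().lower() == "true",
        # No fixed default: both are documented as auto-detected.
        "mcwebui_dir": env.get("MCWEBUI_DIR"),
        "usb_device_path": env.get("USB_DEVICE_PATH"),
    }
```

With the override shown above, this sketch would yield a check interval of 60 seconds and auto-start disabled, while every other value keeps its default.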
Uninstall
sudo ~/mc-webui/scripts/watchdog/install.sh --uninstall
Note: The log file is preserved after uninstall. Remove manually if needed:
sudo rm /var/log/mc-webui-watchdog.log
Troubleshooting
Service won't start
Check the logs:
journalctl -u mc-webui-watchdog -n 50
Common issues:
- Docker not running
- Python 3 not installed
- Permission issues
Containers keep restarting
Check the diagnostic files in /tmp/mc-webui-watchdog-*.log to see what's causing the containers to become unhealthy.
HTTP endpoint not responding
Verify the service is running and check if port 5051 is available:
systemctl status mc-webui-watchdog
ss -tlnp | grep 5051
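If ss is not available, the same check can be approximated with a plain TCP connection attempt. A minimal sketch, assuming the watchdog listens on the loopback interface as the curl examples suggest; port_is_listening is a hypothetical helper name.

```python
import socket


def port_is_listening(port=5051, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0
```

A False result while systemctl reports the service as active may indicate that HTTP_PORT was set to 0 or changed to another port.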