# Container Watchdog

The Container Watchdog is a systemd service that monitors the `mc-webui` Docker container and automatically restarts it if it becomes unhealthy or if the LoRa device becomes unresponsive. This is useful for ensuring reliability, especially on resource-constrained systems or when the LoRa hardware hangs.

## Features

- **Health monitoring** - Checks container status every 30 seconds
- **Log monitoring** - Scans recent container logs for two pattern classes (see [Failure detection](#failure-detection))
- **Automatic restart** - Restarts the container when issues are detected
- **Auto-start stopped container** - Starts the container if it has stopped (configurable)
- **Hardware USB reset** - Performs a low-level USB bus reset (unbind/bind or DTR/RTS) if the LoRa device freezes. *Note: USB reset is automatically skipped if a TCP connection is used.*
- **Diagnostic logging** - Captures container logs before restart for troubleshooting
- **HTTP status endpoint** - Query watchdog status via HTTP API
- **Restart history** - Tracks all automatic restarts with timestamps

## Failure detection

`check_device_unresponsive()` scans the last 2 minutes of container logs against two pattern classes:

- **Hard patterns**: any single occurrence triggers a restart. These are the long-standing "device clearly dead" messages: `No response from meshcore node, disconnecting`, `Device connected but self_info is empty`, `Failed to connect after 10 attempts`.
- **Soft patterns**: any of these failing **5 or more times in the last 2 minutes** triggers a restart. This catches the "sluggish but not dead" mode where the firmware briefly stalls on `get_stats_*` / `get_battery` commands (logged as a `concurrent.futures.TimeoutError` with an empty message) while passive RX still works: `get_stats_core failed:`, `get_stats_radio failed:`, `get_stats_packets failed:`, `Failed to get battery:`, `Failed to get channel`.
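
The two pattern classes can be illustrated with a minimal sketch. This is not the watchdog's actual `check_device_unresponsive()` code: the function name `should_restart` is hypothetical, the caller is assumed to pass only lines from the 2-minute window, and soft hits are assumed to be counted together across all soft patterns (the description above does not say whether the threshold is per pattern or combined).

```python
from typing import Iterable

# Pattern strings copied from the lists above.
HARD_PATTERNS = (
    "No response from meshcore node, disconnecting",
    "Device connected but self_info is empty",
    "Failed to connect after 10 attempts",
)
SOFT_PATTERNS = (
    "get_stats_core failed:",
    "get_stats_radio failed:",
    "get_stats_packets failed:",
    "Failed to get battery:",
    "Failed to get channel",
)
SOFT_THRESHOLD = 5  # "5 or more times in the last 2 minutes"


def should_restart(recent_lines: Iterable[str]) -> bool:
    """Return True if the given log lines warrant a restart.

    Assumptions (not confirmed by the documentation): `recent_lines`
    is already limited to the last 2 minutes, and soft hits are
    counted together across all soft patterns.
    """
    soft_hits = 0
    for line in recent_lines:
        if any(p in line for p in HARD_PATTERNS):
            return True  # a single hard match is enough
        if any(p in line for p in SOFT_PATTERNS):
            soft_hits += 1
            if soft_hits >= SOFT_THRESHOLD:
                return True
    return False
```

Four soft failures in the window leave the container alone under this sketch; a fifth, or any single hard message, would trigger the restart.
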

In parallel, the app exposes [`/health/strict`](architecture.md#health-endpoints), a stricter device-health check that the watchdog (or any external monitor) can consume to react before the soft-pattern threshold is reached.

> **Deploy note:** the watchdog runs as a host-level systemd service and is **not** restarted by `mcupdate`. After deploying changes to `scripts/watchdog/`, run:
> ```bash
> sudo systemctl restart mc-webui-watchdog.service
> ```

## Installation

```bash
cd ~/mc-webui
sudo ./scripts/watchdog/install.sh
```

The installer will:
- Create a systemd service `mc-webui-watchdog`
- Start monitoring the container immediately
- Enable automatic startup on boot
- Create a log file at `/var/log/mc-webui-watchdog.log`

## Usage

### Check service status

```bash
systemctl status mc-webui-watchdog
```

### View watchdog logs

```bash
# Real-time logs
tail -f /var/log/mc-webui-watchdog.log

# Or via journalctl
journalctl -u mc-webui-watchdog -f
```

### HTTP Status Endpoints

The watchdog provides HTTP endpoints on port 5051:

```bash
# Service health
curl http://localhost:5051/health

# Container status
curl http://localhost:5051/status

# Restart history
curl http://localhost:5051/history
```
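
For scripted monitoring, the same three endpoints could be polled from Python. The sketch below is only an illustration: the helper names `http_get` and `collect_watchdog_info` are hypothetical, and because the response format is not documented here, the bodies are returned as plain text without being parsed.

```python
import urllib.request
from typing import Callable, Dict

# Endpoint paths taken from the curl examples above.
WATCHDOG_PATHS = ("/health", "/status", "/history")


def http_get(url: str, timeout: float = 5.0) -> str:
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def collect_watchdog_info(
    base_url: str = "http://localhost:5051",
    fetch: Callable[[str], str] = http_get,
) -> Dict[str, str]:
    """Return {path: response body, or an error string} per endpoint."""
    results: Dict[str, str] = {}
    for path in WATCHDOG_PATHS:
        try:
            results[path] = fetch(base_url + path)
        except OSError as exc:  # URLError and socket timeouts derive from OSError
            results[path] = f"error: {exc}"
    return results
```

The `fetch` parameter exists so the function can be exercised without a running watchdog. If `HTTP_PORT` has been changed, pass the matching `base_url`.
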

### Diagnostic Files

When the container is restarted, diagnostic information is saved to:
```
/tmp/mc-webui-watchdog-mc-webui-{timestamp}.log
```

These files contain:
- Container status at the time of failure
- Recent container logs (last 200 lines)
- Timestamp and restart result
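
When several restarts have occurred, it can help to jump straight to the newest diagnostic file. A minimal sketch, assuming the file name pattern shown above; the helper name `latest_diagnostic` is hypothetical, and files are ordered by modification time because the exact `{timestamp}` format is not specified here.

```python
from pathlib import Path
from typing import Optional

# Glob form of /tmp/mc-webui-watchdog-mc-webui-{timestamp}.log
DIAGNOSTIC_GLOB = "mc-webui-watchdog-mc-webui-*.log"


def latest_diagnostic(directory: str = "/tmp") -> Optional[Path]:
    """Return the most recently modified diagnostic file, or None if there is none."""
    candidates = list(Path(directory).glob(DIAGNOSTIC_GLOB))
    if not candidates:
        return None
    # Order by mtime rather than by name: the timestamp format is an unknown.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Calling `latest_diagnostic()` with no arguments looks in `/tmp`; the returned path can then be opened or printed as needed.
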

## Configuration (Optional)

**No configuration required** - the installer automatically detects paths and sets sensible defaults.

If you need to customize the behavior, the service supports these environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `MCWEBUI_DIR` | *(auto-detected)* | Path to mc-webui directory |
| `CHECK_INTERVAL` | `30` | Seconds between health checks |
| `LOG_FILE` | `/var/log/mc-webui-watchdog.log` | Path to log file |
| `HTTP_PORT` | `5051` | HTTP status port (0 to disable) |
| `AUTO_START` | `true` | Start stopped container (set to `false` to disable) |
| `USB_DEVICE_PATH` | *(auto-detected)* | Path to the LoRa device for hardware USB bus reset |

To modify defaults, create an override file:
```bash
sudo systemctl edit mc-webui-watchdog
```

Then add your overrides, for example:
```ini
[Service]
Environment=CHECK_INTERVAL=60
Environment=AUTO_START=false
```
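
To make the effect of these overrides concrete, the sketch below shows one way a service could resolve the variables from the table against their documented defaults. It is not the watchdog's own parsing code: `load_watchdog_config` and the dictionary keys are hypothetical, and the rule that only the literal value `false` disables auto-start is an assumption.

```python
import os
from typing import Any, Dict, Mapping, Optional


def load_watchdog_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Resolve watchdog settings from environment variables with documented defaults."""
    env = os.environ if env is None else env
    http_port = int(env.get("HTTP_PORT", "5051"))
    return {
        "check_interval": int(env.get("CHECK_INTERVAL", "30")),
        "log_file": env.get("LOG_FILE", "/var/log/mc-webui-watchdog.log"),
        "http_port": http_port,
        "http_enabled": http_port != 0,  # 0 disables the status server
        # Assumption: only "false" (any letter case) turns auto-start off.
        "auto_start": env.get("AUTO_START", "true").strip().lower() != "false",
        # None stands for "auto-detected"; the detection itself is not sketched.
        "mcwebui_dir": env.get("MCWEBUI_DIR"),
        "usb_device_path": env.get("USB_DEVICE_PATH"),
    }
```

With the example override above, this sketch would yield a 60-second check interval and auto-start disabled, while every other setting keeps its default.
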

## Uninstall

```bash
sudo ~/mc-webui/scripts/watchdog/install.sh --uninstall
```

Note: The log file is preserved after uninstall. Remove manually if needed:
```bash
sudo rm /var/log/mc-webui-watchdog.log
```