The container watchdog only restarted on three legacy "device clearly dead"
log lines, so today's failure mode (firmware briefly stalls and get_stats_*
/ get_battery commands time out with an empty error while passive RX
keeps working) never tripped it — leaving the user with 10-15 s freezes
several times a day and no automatic recovery.
DeviceManager now tracks two liveness signals:
- _last_rx_at, bumped on every RX_LOG_DATA event
- _consecutive_stats_failures, incremented on get_stats_* / get_bat
exceptions and cleared on success
New /health/strict endpoint exposes these to the watchdog. It returns 503
when the device is connected but has 5+ consecutive stats failures, or
when no RX event has been seen for over 5 minutes on a serial transport.
The cheap /health endpoint keeps its lenient behavior so Docker's
healthcheck doesn't suddenly start tripping.
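The strict check described above can be sketched as a pure function. This is only an illustration: the Liveness container, the transport labels, and the function names are assumptions, and the message does not say how a device that has never produced an RX event is treated (the sketch does not trip in that case).

```python
from dataclasses import dataclass
from typing import Optional

MAX_STATS_FAILURES = 5      # "5+ consecutive stats failures"
MAX_RX_SILENCE_S = 5 * 60   # "no RX event ... for over 5 minutes"


@dataclass
class Liveness:
    """Hypothetical snapshot of the two liveness signals."""
    connected: bool
    transport: str                 # "serial", "tcp" or "ble" (labels assumed)
    last_rx_at: Optional[float]    # time of the last RX event, None if never seen
    consecutive_stats_failures: int


def strict_health_failure(state: Liveness, now: float) -> Optional[str]:
    """Return a failure reason, or None when neither condition is met."""
    if state.connected and state.consecutive_stats_failures >= MAX_STATS_FAILURES:
        return "stats_failures"
    # Assumption: a device that has never reported RX is not treated as silent.
    if (state.transport == "serial"
            and state.last_rx_at is not None
            and now - state.last_rx_at > MAX_RX_SILENCE_S):
        return "rx_silence"
    return None


def strict_health_status(state: Liveness, now: float) -> int:
    """Map the failure reason to the HTTP status the watchdog would see."""
    return 503 if strict_health_failure(state, now) else 200
```

Keeping the decision separate from the HTTP handler makes the thresholds easy to unit-test without a device attached.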
The watchdog's check_device_unresponsive() gains a "soft" pattern class
with a count threshold of 5 in the last 2 minutes — matching against
get_stats_core/radio/packets failed:, Failed to get battery:, and
Failed to get channel. Hard patterns still trigger on a single hit.
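A minimal sketch of the soft-pattern counting, assuming log entries arrive as (timestamp, line) pairs and that "get_stats_core/radio/packets failed:" stands for three separate messages. The function names are illustrative, not the watchdog's actual code.

```python
import re
from typing import Iterable, Tuple

# Soft patterns named in the commit message; the regex form is an assumption.
SOFT_PATTERNS = [
    re.compile(r"get_stats_(core|radio|packets) failed:"),
    re.compile(r"Failed to get battery:"),
    re.compile(r"Failed to get channel"),
]
SOFT_THRESHOLD = 5    # hits required
SOFT_WINDOW_S = 120   # "in the last 2 minutes"


def soft_failure_count(entries: Iterable[Tuple[float, str]], now: float) -> int:
    """Count soft-pattern hits whose timestamp falls inside the window."""
    cutoff = now - SOFT_WINDOW_S
    return sum(
        1
        for ts, line in entries
        if ts >= cutoff and any(p.search(line) for p in SOFT_PATTERNS)
    )


def is_soft_unresponsive(entries: Iterable[Tuple[float, str]], now: float) -> bool:
    """True once the soft-pattern count reaches the threshold."""
    return soft_failure_count(entries, now) >= SOFT_THRESHOLD
```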
Deploy note: the watchdog runs as a host-level systemd service and is
NOT restarted by mcupdate, so after deploy run:
sudo systemctl restart mc-webui-watchdog.service
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
In-container BLE reconnection is unreliable because bleak leaves stale
GATT notification handles after abnormal disconnect, and adapter power-
cycling from within Docker corrupts bleak's internal BlueZ manager state.
New approach:
- On BLE disconnect or keepalive failure, immediately mark the connection
as permanently failed (no in-container reconnect attempts)
- Health endpoint returns 503, Docker healthcheck triggers container restart
- Docker entrypoint script disconnects stale BLE connections before app
starts, ensuring clean GATT state for bleak
This is reliable because:
- MeshCore.create_ble(address=...) works on fresh container starts
- The BlueZ daemon on the host maintains adapter state correctly
- Container restart is fast (~5s) and gives a truly clean BLE state
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integrate meshcore library's BLE connection (via bleak) as a third
transport option alongside serial and TCP. Priority: BLE > TCP > Serial.
Config: MC_BLE_ADDRESS and MC_BLE_PIN environment variables.
Docker: bluez/dbus packages, NET_ADMIN cap, D-Bus socket mount.
UI: transport type badge in navbar, transport_type in /api/status.
Watchdog: skip USB reset for BLE connections (same as TCP).
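The transport priority and the watchdog rule can be sketched as follows. MC_BLE_ADDRESS comes from this message; MC_TCP_HOST is a hypothetical stand-in for whatever variable enables TCP, and both function names are illustrative.

```python
from typing import Mapping


def select_transport(env: Mapping[str, str]) -> str:
    """Pick a transport using the stated priority: BLE > TCP > Serial."""
    if env.get("MC_BLE_ADDRESS"):
        return "ble"
    if env.get("MC_TCP_HOST"):  # hypothetical variable name
        return "tcp"
    return "serial"


def usb_reset_applicable(transport: str) -> bool:
    """A USB bus reset only makes sense for a locally attached serial device."""
    return transport == "serial"
```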
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The MeshCore community uses "companion" not "client" for type 1 nodes.
Rename the CLI label to COM across all UI, API, JS, and docs to align
with official terminology. Includes cache migration for old CLI entries.
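A cache migration of this kind might look like the sketch below. The cache layout (a list of dicts with a "type_label" key) is purely hypothetical; only the CLI-to-COM rename comes from this message.

```python
from typing import Dict, List


def migrate_type_labels(entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return a copy of the cache with legacy "CLI" labels renamed to "COM"."""
    migrated = []
    for entry in entries:
        if entry.get("type_label") == "CLI":  # hypothetical key name
            entry = {**entry, "type_label": "COM"}
        migrated.append(entry)
    return migrated
```

Returning new dicts instead of mutating in place keeps the migration idempotent and safe to run on every startup.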
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Since mc-webui can now connect via TCP to a remote proxy instead of a local USB/serial device, the watchdog's hardware USB bus reset logic no longer blindly attempts a reset on repeated container crashes.
Added an is_tcp_connection() helper that reads the config and skips the USB reset when TCP is active.
When a native USB ESP32 device freezes, an ioctl reset or a DTR/RTS toggle is often ignored. This uses sysfs unbind/bind and toggles of the authorized attribute to forcibly detach the device from the kernel, so it re-enumerates cleanly without a physical power cycle.
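A rough sketch of the two sysfs techniques, assuming the device's bus ID (for example "1-1.2") is already known and the caller runs as root. The function names, the sysfs_root parameter (there so the logic can be exercised against a temporary directory), and the settle delay are illustrative choices, not the watchdog's actual code.

```python
import time
from pathlib import Path


def rebind_usb_device(bus_id: str, sysfs_root: str = "/sys", settle_s: float = 1.0) -> None:
    """Detach a USB device from the generic usb driver, then attach it again."""
    driver = Path(sysfs_root) / "bus/usb/drivers/usb"
    (driver / "unbind").write_text(bus_id)
    time.sleep(settle_s)  # give the kernel time to tear the device down
    (driver / "bind").write_text(bus_id)


def toggle_authorized(bus_id: str, sysfs_root: str = "/sys", settle_s: float = 1.0) -> None:
    """Deauthorize and reauthorize the device via its 'authorized' attribute."""
    attr = Path(sysfs_root) / "bus/usb/devices" / bus_id / "authorized"
    attr.write_text("0")
    time.sleep(settle_s)
    attr.write_text("1")
```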
Since a standard USB bus reset often isn't enough to revive a hung ESP32, this adds a serial DTR/RTS toggle sequence (used by esptool) to physically reset the chip before trying a USB bus reset.
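The exact toggle sequence is not spelled out here. One plausible form, modeled on esptool's hard reset for boards with the common auto-reset circuit (RTS wired to EN, DTR wired to IO0), is sketched below. The port argument is any object with writable dtr and rts attributes, such as an open pyserial serial.Serial; the function name and hold time are assumptions.

```python
import time


def pulse_reset(port, hold_s: float = 0.1) -> None:
    """Pulse the chip's EN line via RTS while leaving the boot strap alone."""
    port.dtr = False    # keep IO0 released so the chip boots its normal firmware
    port.rts = True     # assert RTS: pulls EN low and holds the chip in reset
    time.sleep(hold_s)
    port.rts = False    # release EN so the chip starts up again
```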
This prevents the container from holding the serial port open during the hardware reset, which was causing the reset to fail or the device to re-enumerate on a different port.
The v2 branch consolidated meshcore-bridge into mc-webui. Watchdog now:
- Monitors mc-webui logs for specific device connection errors
- Automatically restarts the container when errors are detected
- Performs a hardware USB bus reset if errors persist across 3 restarts
- Updated README.md to reflect the removal of meshcore-bridge
- Add pkt_payload column to direct_messages table for stable packet
hash generation and Analyzer URL linking
- Update insert_direct_message() and DeviceManager to store pkt_payload
- Add test for DM pkt_payload storage (43 tests pass)
- Update watchdog to monitor only mc-webui (meshcore-bridge removed)
- USB reset trigger now fires for mc-webui container failures
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement a smart auto-detection and low-level fcntl ioctl reset mechanism for LoRa USB devices. This 'last resort' recovery is triggered if the meshcore-bridge container fails to recover after 3 restarts within an 8-minute window. Includes updates to the installer, systemd service, and newly added README.
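The escalation rule (3 restarts within an 8-minute window) can be sketched as a small sliding-window tracker. The class name is illustrative, and the sketch assumes the reset fires on the third restart inside the window.

```python
from collections import deque


class RestartTracker:
    """Track container restarts and flag when the last-resort reset is due."""

    def __init__(self, max_restarts: int = 3, window_s: float = 8 * 60):
        self.max_restarts = max_restarts
        self.window_s = window_s
        self._restarts = deque()

    def record_restart(self, now: float) -> bool:
        """Record a restart at time `now`; return True if the USB reset should run."""
        self._restarts.append(now)
        # Drop restarts that have aged out of the window.
        while self._restarts and now - self._restarts[0] > self.window_s:
            self._restarts.popleft()
        return len(self._restarts) >= self.max_restarts
```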
Co-Authored-By: Gemini CLI <noreply@google.com>
Prompt lines (DeviceName|* ...) and summary lines (> N contacts)
are normal meshcli output, not format changes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The script is piped from the host into the container over stdin, so
argparse doesn't work. Use env vars (BRIDGE_URL, FULL) as the primary
config, with CLI argument parsing as a fallback.
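A minimal sketch of env-first configuration with a manual CLI fallback. BRIDGE_URL and FULL come from this message; the flag names (--bridge-url, --full), the accepted truthy values, and the default URL are placeholders, not the script's real interface.

```python
import os
import sys
from typing import Dict, List, Mapping, Optional

DEFAULT_BRIDGE_URL = "http://localhost:8000"  # placeholder, not the project's default


def load_config(env: Optional[Mapping[str, str]] = None,
                argv: Optional[List[str]] = None) -> Dict[str, object]:
    """Read BRIDGE_URL and FULL from the environment, falling back to CLI args."""
    env = os.environ if env is None else env
    argv = sys.argv[1:] if argv is None else argv

    bridge_url = env.get("BRIDGE_URL")
    full = env.get("FULL")

    # Fallback: scan argv by hand, filling only what the environment left unset.
    for i, arg in enumerate(argv):
        if arg == "--bridge-url" and bridge_url is None and i + 1 < len(argv):
            bridge_url = argv[i + 1]
        elif arg == "--full" and full is None:
            full = "1"

    return {
        "bridge_url": bridge_url or DEFAULT_BRIDGE_URL,
        "full": (full or "").strip().lower() in {"1", "true", "yes"},
    }
```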
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Diagnostic tool that tests all meshcli commands and response formats
used by mc-webui against a running bridge instance, detecting breaking
changes early when updating meshcore-cli versions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Added AUTO_START option (default: true) to automatically start
stopped containers, not just restart unhealthy ones
- Added handle_stopped_container() function
- Updated documentation with new configuration option
Set AUTO_START=false to disable automatic starting of stopped containers.
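The decision this option adds can be sketched as below. The helper names, the set of states treated as "stopped", and the rule that only the literal value false disables the feature are assumptions; handle_stopped_container() itself is not reproduced here.

```python
from typing import Mapping

STOPPED_STATES = {"exited", "created"}  # assumption: which Docker states count as stopped


def auto_start_enabled(env: Mapping[str, str]) -> bool:
    """AUTO_START defaults to true; only an explicit 'false' disables it."""
    return env.get("AUTO_START", "true").strip().lower() != "false"


def plan_action(state: str, health: str, auto_start: bool) -> str:
    """Return 'restart', 'start' or 'none' for a container's state and health."""
    if state == "running":
        return "restart" if health == "unhealthy" else "none"
    if state in STOPPED_STATES:
        return "start" if auto_start else "none"
    return "none"
```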
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added docs/watchdog.md with installation and usage guide
- Added watchdog reference to README.md documentation table
- Fixed executable permissions on watchdog scripts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New systemd service that monitors Docker containers and automatically
restarts unhealthy ones. Features:
- Checks container health every 30 seconds
- Captures logs before restart for diagnostics
- Saves diagnostic files to /tmp/mc-webui-watchdog-*.log
- HTTP status endpoint on port 5051
- Restart history tracking
Install with: sudo ./scripts/watchdog/install.sh
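The polling loop might be sketched as follows, using docker inspect to read the health status. The function names and the injectable run, sleep, and probe parameters are illustrative, and log capture, diagnostics files, and the HTTP status endpoint are left out.

```python
import subprocess
import time
from typing import Callable, Iterable, Optional

CHECK_INTERVAL_S = 30  # "Checks container health every 30 seconds"


def container_health(name: str, run=subprocess.run) -> str:
    """Return Docker's health status for a container, or 'unknown' on error."""
    result = run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", name],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return "unknown"
    return result.stdout.strip()


def watch(names: Iterable[str],
          on_unhealthy: Callable[[str], None],
          interval_s: float = CHECK_INTERVAL_S,
          iterations: Optional[int] = None,
          sleep: Callable[[float], None] = time.sleep,
          probe: Callable[[str], str] = container_health) -> None:
    """Poll each container and call on_unhealthy(name) for unhealthy ones."""
    names = list(names)
    done = 0
    while iterations is None or done < iterations:
        for name in names:
            if probe(name) == "unhealthy":
                on_unhealthy(name)
        done += 1
        sleep(interval_s)
```

A caller would pass a restart function as on_unhealthy and leave iterations unset to run indefinitely.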
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add sudo to journalctl command in install.sh help text
- Move "Update now" link below version number to prevent line wrap
- Add "What's new?" link in update modal pointing to GitHub commits
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add -u flag to Python for unbuffered logging to journald
- Configure git safe.directory automatically during install
- Revert test marker from base.html
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds webhook-based update system that allows triggering updates
directly from the mc-webui menu. Includes:
- Webhook server (updater.py) on port 5050
- Systemd service and install script
- API proxy endpoints for container-to-host communication
- Update modal with progress tracking and auto-reload
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add scripts/update.sh with colored output and error handling
- Automates: git pull, version freeze, docker compose rebuild
- Update README with script usage and alias instructions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>