12 Commits

Author SHA1 Message Date
MarekWo 422e7a3b34 feat(watchdog): catch sluggish-device failures via soft-pattern counting
The container watchdog only restarted on three legacy "device clearly dead"
log lines, so today's failure mode (firmware briefly stalls and get_stats_*
/ get_battery commands time out with an empty error while passive RX
keeps working) never tripped it — leaving the user with 10-15 s freezes
several times a day and no automatic recovery.

DeviceManager now tracks two liveness signals:
- _last_rx_at, bumped on every RX_LOG_DATA event
- _consecutive_stats_failures, incremented on get_stats_* / get_battery
  exceptions and cleared on success

New /health/strict endpoint exposes these to the watchdog. It returns 503
when the device is connected but has 5+ consecutive stats failures, or
when no RX event has been seen for over 5 minutes on a serial transport.
The cheap /health endpoint keeps its lenient behavior so Docker's
healthcheck doesn't suddenly start tripping.
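
A minimal sketch of the strict check described above, not the actual
DeviceManager code. The Liveness class, its field names, and the choice to
return 200 when disconnected or when no RX has been seen yet are
assumptions made for illustration; only the two thresholds come from this
message.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

STATS_FAILURE_LIMIT = 5       # "5+ consecutive stats failures"
RX_SILENCE_LIMIT_S = 5 * 60   # "no RX event ... for over 5 minutes"


@dataclass
class Liveness:
    """Hypothetical stand-in for the two signals DeviceManager tracks."""
    connected: bool = False
    transport: str = "serial"
    last_rx_at: Optional[float] = None
    consecutive_stats_failures: int = 0

    def on_rx(self, now: float) -> None:
        # Bumped on every RX_LOG_DATA event.
        self.last_rx_at = now

    def on_stats_result(self, ok: bool) -> None:
        # Cleared on success, incremented on each failed stats/battery call.
        if ok:
            self.consecutive_stats_failures = 0
        else:
            self.consecutive_stats_failures += 1


def strict_health(state: Liveness, now: float) -> Tuple[int, str]:
    """Return (HTTP status, reason) for a /health/strict style probe."""
    if not state.connected:
        # Assumption: a disconnected device is handled by other checks.
        return 200, "not connected"
    if state.consecutive_stats_failures >= STATS_FAILURE_LIMIT:
        return 503, "stats failures"
    rx_silent = (
        state.last_rx_at is not None  # assumption: no RX yet means no verdict
        and now - state.last_rx_at > RX_SILENCE_LIMIT_S
    )
    if state.transport == "serial" and rx_silent:
        return 503, "rx silence"
    return 200, "ok"
```

Passing the clock in as an argument keeps the decision a pure function,
which makes the thresholds easy to exercise without a device attached.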

The watchdog's check_device_unresponsive() gains a "soft" pattern class
with a count threshold of 5 in the last 2 minutes — matching against
get_stats_core/radio/packets failed:, Failed to get battery:, and
Failed to get channel. Hard patterns still trigger on a single hit.
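
The soft/hard split can be sketched roughly as below. This is an
illustration, not the watchdog's check_device_unresponsive(): the function
name, the (timestamp, text) input shape, and the HARD_PATTERNS placeholder
are assumptions, and the three get_stats_* strings are one reading of
"get_stats_core/radio/packets failed:".

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

# Substrings quoted in this commit message.
SOFT_PATTERNS = (
    "get_stats_core failed:",
    "get_stats_radio failed:",
    "get_stats_packets failed:",
    "Failed to get battery:",
    "Failed to get channel",
)
# The legacy hard patterns are not listed here, so this is a placeholder.
HARD_PATTERNS = ("EXAMPLE HARD FAILURE",)

SOFT_THRESHOLD = 5
SOFT_WINDOW = timedelta(minutes=2)


def device_unresponsive(
    lines: Iterable[Tuple[datetime, str]], now: datetime
) -> bool:
    """Return True on one hard hit, or on 5+ soft hits in the last 2 minutes."""
    soft_hits = 0
    for ts, text in lines:
        if any(p in text for p in HARD_PATTERNS):
            return True  # hard patterns trigger on a single hit
        recent = now - ts <= SOFT_WINDOW
        if recent and any(p in text for p in SOFT_PATTERNS):
            soft_hits += 1
    return soft_hits >= SOFT_THRESHOLD
```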

Deploy note: the watchdog runs as a host-level systemd service and is
NOT restarted by mcupdate, so after deploy run:
  sudo systemctl restart mc-webui-watchdog.service

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-07 09:43:43 +02:00
MarekWo 710f69c350 feat: add BLE transport support for companion devices
Integrate meshcore library's BLE connection (via bleak) as a third
transport option alongside serial and TCP. Priority: BLE > TCP > Serial.

Config: MC_BLE_ADDRESS and MC_BLE_PIN environment variables.
Docker: bluez/dbus packages, NET_ADMIN cap, D-Bus socket mount.
UI: transport type badge in navbar, transport_type in /api/status.
Watchdog: skip USB reset for BLE connections (same as TCP).
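
A rough sketch of the BLE > TCP > Serial priority and of the watchdog's
"skip USB reset" rule. MC_BLE_ADDRESS is the variable named above;
MC_TCP_HOST and both function names are hypothetical and only stand in for
whatever the real configuration uses.

```python
import os
from typing import Mapping


def pick_transport(env: Mapping[str, str] = os.environ) -> str:
    """Choose a transport with the priority BLE > TCP > Serial."""
    if env.get("MC_BLE_ADDRESS"):
        return "ble"
    if env.get("MC_TCP_HOST"):  # hypothetical variable name
        return "tcp"
    return "serial"


def usb_reset_applies(transport: str) -> bool:
    """A USB bus reset only makes sense for a locally attached serial device."""
    return transport == "serial"
```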

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 10:03:45 +02:00
MarekWo c1b0085710 feat(watchdog): skip USB reset if TCP connection is used
Since mc-webui can now connect via TCP to a remote proxy instead of a local USB/serial device, the watchdog's hardware USB bus reset logic no longer blindly attempts a reset on repeated container crashes.

Added an is_tcp_connection() helper that reads the config and skips the USB reset if TCP is active.
2026-03-06 09:53:06 +00:00
MarekWo 5c47a5b617 fix(watchdog): add logical USB unbind/bind and authorized toggle
When a native USB ESP32 device freezes, an ioctl reset or a DTR/RTS toggle is often ignored. This uses sysfs unbind/bind and authorized toggles to forcibly detach the device from the kernel, causing it to re-enumerate cleanly without a physical power cycle.
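
A minimal sketch of such a logical replug, assuming the standard Linux
sysfs layout under /sys/bus/usb. The function name, the order of the two
steps, the pause length, and the example device ID "1-1.2" are assumptions;
on a real system the writes need root.

```python
import time
from pathlib import Path


def logical_replug(
    device_id: str,
    sysfs_root: Path = Path("/sys"),
    pause_s: float = 1.0,
) -> None:
    """Detach and reattach a USB device by bus-port ID, e.g. "1-1.2"."""
    driver = sysfs_root / "bus/usb/drivers/usb"
    authorized = sysfs_root / "bus/usb/devices" / device_id / "authorized"

    # Unbind the device from the generic USB driver, then bind it again.
    (driver / "unbind").write_text(device_id)
    time.sleep(pause_s)
    (driver / "bind").write_text(device_id)

    # De-authorize and re-authorize so the kernel re-enumerates the device.
    authorized.write_text("0")
    time.sleep(pause_s)
    authorized.write_text("1")
```

The sysfs_root parameter exists only so the sequence can be tried against a
scratch directory before pointing it at real hardware.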
2026-03-03 20:28:40 +00:00
MarekWo 02b75c167b fix(watchdog): add ESP32 hardware reset via DTR/RTS
Since a standard USB bus reset often isn't enough to revive a hung ESP32, this adds a serial DTR/RTS toggle sequence (used by esptool) to physically reset the chip before trying a USB bus reset.
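
A sketch of a hard-reset pulse in the style esptool uses, assuming the
common auto-reset wiring where RTS drives the chip's EN pin. The function
name and hold time are illustrative; the port argument is anything with
settable dtr and rts attributes, such as an open pyserial Serial object.

```python
import time
from typing import Protocol


class HasControlLines(Protocol):
    """Anything exposing settable dtr/rts attributes."""
    dtr: bool
    rts: bool


def pulse_reset(port: HasControlLines, hold_s: float = 0.1) -> None:
    """Pulse EN low via RTS so the ESP32 restarts into normal boot."""
    port.dtr = False   # leave the boot-mode line released (normal boot)
    port.rts = True    # assert RTS: holds EN low on typical boards
    time.sleep(hold_s)
    port.rts = False   # release EN so the chip starts again
```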
2026-03-03 20:20:52 +00:00
MarekWo d079f97a38 fix(watchdog): stop container before resetting USB bus
This prevents the container from holding the serial port open during the hardware reset, which was causing the reset to fail or the device to re-enumerate on a different port.
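
The ordering can be sketched as follows, assuming the container is managed
with plain "docker stop" / "docker start". The function name and the
injectable run parameter are illustrative, not taken from the watchdog.

```python
import subprocess
from typing import Callable


def reset_with_container_stopped(
    container: str,
    reset: Callable[[], None],
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Stop the container, run the hardware reset, then start it again."""
    # Stopping first releases the serial port held by the container.
    run(["docker", "stop", container], check=True)
    try:
        reset()
    finally:
        # Bring the container back even if the reset itself failed.
        run(["docker", "start", container], check=True)
```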
2026-03-03 20:13:19 +00:00
MarekWo ad8c5702f9 feat(watchdog): monitor mc-webui logs for unresponsive LoRa device
The v2 branch consolidated meshcore-bridge into mc-webui. Watchdog now:
- Monitors mc-webui logs for specific device connection errors
- Automatically restarts the container when errors are detected
- Performs a hardware USB bus reset if errors persist across 3 restarts

README.md is updated to reflect the removal of meshcore-bridge.
2026-03-03 20:01:46 +00:00
MarekWo e98acf6afa feat(v2): Add pkt_payload to DMs, update watchdog for single container
- Add pkt_payload column to direct_messages table for stable packet
  hash generation and Analyzer URL linking
- Update insert_direct_message() and DeviceManager to store pkt_payload
- Add test for DM pkt_payload storage (43 tests pass)
- Update watchdog to monitor only mc-webui (meshcore-bridge removed)
- USB reset trigger now fires for mc-webui container failures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 10:01:43 +01:00
MarekWo 2f82c589c7 feat(watchdog): Hardware USB bus reset for stuck LoRa devices
Implement smart auto-detection and a low-level fcntl ioctl reset mechanism for LoRa USB devices. This 'last resort' recovery is triggered if the meshcore-bridge container fails to recover after 3 restarts within an 8-minute window. Includes updates to the installer and the systemd service, plus a newly added README.
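
A minimal sketch of the escalation rule and the ioctl call, for
illustration only. RestartTracker, usb_bus_reset, and the example device
path are hypothetical names; USBDEVFS_RESET is the kernel's _IO('U', 20)
request, computed here the way it encodes on x86 and ARM.

```python
import fcntl
import os
from collections import deque

# _IO('U', 20) from <linux/usbdevice_fs.h>; 0x5514 on x86 and ARM.
USBDEVFS_RESET = (ord("U") << 8) | 20


class RestartTracker:
    """Remember recent restarts and decide when to escalate."""

    def __init__(self, max_restarts: int = 3, window_s: float = 8 * 60):
        self.max_restarts = max_restarts
        self.window_s = window_s
        self._restarts = deque()

    def record_restart(self, now: float) -> None:
        self._restarts.append(now)

    def should_reset_usb(self, now: float) -> bool:
        # Forget restarts that have fallen out of the 8-minute window.
        while self._restarts and now - self._restarts[0] > self.window_s:
            self._restarts.popleft()
        return len(self._restarts) >= self.max_restarts


def usb_bus_reset(dev_path: str) -> None:
    """Reset one USB device node, e.g. "/dev/bus/usb/001/004" (needs root)."""
    fd = os.open(dev_path, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, USBDEVFS_RESET, 0)
    finally:
        os.close(fd)
```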

Co-Authored-By: Gemini CLI <noreply@google.com>
2026-02-22 20:15:27 +00:00
MarekWo aa788d7a0b feat: Add auto-start for stopped containers in watchdog
- Added AUTO_START option (default: true) to automatically start
  stopped containers, not just restart unhealthy ones
- Added handle_stopped_container() function
- Updated documentation with new configuration option

Set AUTO_START=false to disable automatic starting of stopped containers.
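
Roughly, the option changes the decision as sketched below. Reading
AUTO_START from the environment, the extra falsy spellings, and the
plan_action name are assumptions; the state and health strings are the
ones Docker reports for a container.

```python
import os
from typing import Mapping


def auto_start_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """AUTO_START defaults to true; AUTO_START=false disables it."""
    value = env.get("AUTO_START", "true").strip().lower()
    # Accepting 0/no/off as well is an assumption, not documented behavior.
    return value not in ("false", "0", "no", "off")


def plan_action(state: str, health: str, auto_start: bool) -> str:
    """Return "restart", "start", or "none" for one container."""
    if state == "running":
        return "restart" if health == "unhealthy" else "none"
    if state == "exited" and auto_start:
        return "start"
    return "none"
```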

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 14:05:51 +01:00
MarekWo 96efc2a716 docs: Add watchdog documentation and fix executable flags
- Added docs/watchdog.md with installation and usage guide
- Added watchdog reference to README.md documentation table
- Fixed executable permissions on watchdog scripts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 13:43:17 +01:00
MarekWo 73e1c63083 feat: Add container watchdog service
New systemd service that monitors Docker containers and automatically
restarts unhealthy ones. Features:

- Checks container health every 30 seconds
- Captures logs before restart for diagnostics
- Saves diagnostic files to /tmp/mc-webui-watchdog-*.log
- HTTP status endpoint on port 5051
- Restart history tracking
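
A minimal sketch of what a status endpoint with restart history could look
like, using only the standard library. The JSON shape, the handler and
function names, and binding to 127.0.0.1 are assumptions; only the port
number comes from the list above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory history; the real service may persist it elsewhere.
restart_history = []


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the current restart history as JSON on any path.
        body = json.dumps({"restarts": restart_history}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Keep the sketch quiet; the default logs every request to stderr.
        pass


def make_server(port: int = 5051) -> HTTPServer:
    """Create (but do not start) the status server."""
    return HTTPServer(("127.0.0.1", port), StatusHandler)
```

Calling make_server().serve_forever() from a background thread would keep
the endpoint available while the 30-second check loop runs.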

Install with: sudo ./scripts/watchdog/install.sh

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 13:39:08 +01:00