mirror of
https://github.com/MarekWo/mc-webui.git
synced 2026-07-05 17:31:39 +02:00
docs: cover analyzer settings, vacuum/optimize, path apply, watchdog soft patterns
User-guide: new Settings > Analyzer tab (custom analyzer services with default/disabled
toggles and {packetHash} placeholder), apply-path upload button in DM Path Management,
Backup modal's Optimize button + live size label, console change_path now accepts
arrow/whitespace separators with consistent multi-byte chunk length and "path" output
shows hop count and byte size.
Architecture: new /api/analyzers CRUD + default endpoints, /api/db/size and the split
/api/db/vacuum kickoff + /api/db/vacuum/status polling (worker-thread VACUUM to survive
proxy idle timeouts), /api/contacts/<key>/paths/<id>/apply, /health and /health/strict
top-level routes, analyzers table and direct_messages.delivery_path_hash_size column,
recombined path_len byte storage. DeviceManager: per-send channel-secret refresh,
liveness telemetry (_last_rx_at + _consecutive_stats_failures), TCP self-heal via
_liveness_watcher_loop + in-place reconnect. Retention scheduler: on-by-default
90/90/60/30, post-cleanup VACUUM at >=1000 deletions, app-context wrapping, archiver
emoji-name fallback. Socket.IO clients forced to polling transport.
Watchdog: documented hard- vs soft-pattern detection (5 hits in 2 min for sluggish
get_stats / get_battery failures), pointer to /health/strict, and the systemd-restart
deploy note for scripts/watchdog/ changes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -78,6 +78,9 @@ The `DeviceManager` handles the connection to the MeshCore device via a direct s
|
||||
- **BLE keepalive & reconnect** - When using Bluetooth transport, a 60s keepalive loop detects "zombie" connections (reads still succeed but writes silently fail). On disconnect or keepalive failure, the manager marks the session as permanently failed and the `/health` endpoint returns 503, letting the Docker healthcheck trigger a fast container restart (~5s) to get a clean BLE state rather than attempting unreliable in-process reconnects
|
||||
- **Echo correlation** - Sent channel messages pre-compute their expected `pkt_payload` using the channel secret and send timestamp (±3s for clock drift), so incoming echoes are matched exactly instead of only by 1-byte channel hash (prevents misattribution when two messages go out simultaneously on the same channel)
|
||||
- **Per-channel region scope** - Before each channel send, the channel's mapped region scope key (16 bytes) is pushed to the firmware via `CMD_SET_FLOOD_SCOPE_KEY` (54). The scope-set + send pair is serialised under a `_send_lock` so concurrent sends on different channels can't swap each other's scope. Channels without a mapping get an all-zero key so a previously-set scope doesn't leak across channels
|
||||
- **Per-send channel-secret refresh** - Channel indices on the device compact down after a deletion, so the boot-time `_load_channel_secrets()` cache can drift. `send_channel_message` calls `_refresh_channel_secret(idx)` first (one extra `get_channel(idx)` round-trip) to fetch the current secret straight from firmware, update the in-memory cache and DB if they had drifted, and use it for the `pkt_payload` echo correlation
|
||||
- **Liveness telemetry** - Tracks `_last_rx_at` (bumped on every `RX_LOG_DATA` event) and `_consecutive_stats_failures` (incremented on `get_stats_*` / `get_bat` exceptions, cleared on success). Surfaced via `/health/strict` for the external watchdog
|
||||
- **TCP self-heal** - A `_liveness_watcher_loop` task on the DM event loop calls `force_reconnect()` when no RX event has arrived for `HEALTH_STRICT_MAX_RX_STALE_SEC` (5 min). `send_channel_message` also detects empty-string `concurrent.futures.TimeoutError` from `set_flood_scope_key` (the symptom of a degraded long-lived TCP) and runs an in-place reconnect + one retry before failing. A 30 s cooldown and `_reconnect_lock` prevent churn; `_intentional_disconnect` keeps the DISCONNECTED handler from racing the reconnect
|
||||
|
||||
---
|
||||
|
||||
@@ -138,9 +141,22 @@ Key tables:
|
||||
- `regions` - User-curated MeshCore flood scopes (`name`, `key_hex`, `is_default`)
|
||||
- `channel_scopes` - Per-channel region mapping (`channel_idx` → `region_id`, CASCADE on region delete; absent row = no override → firmware default applies)
|
||||
- `read_status` - Per-channel read counters and favorites (`is_favorite` column; used to pin channels in the sidebar/dropdown sort order)
|
||||
- `analyzers` - User-configured MeshCore Analyzer services (`name`, `url_template` with `{packetHash}` placeholder, `is_default`, `is_disabled`; partial unique index enforces a single default)
|
||||
|
||||
`direct_messages` gained a `delivery_path_hash_size` column (auto-migrated, defaults to 1) so reloaded DM bubbles render multi-byte routes correctly. The `path_len` column on `channel_messages`, `direct_messages`, and `paths` now stores the raw firmware byte (masked hop count plus path_hash_mode in the upper bits), recombined at write time via `pack_path_len()`; the API endpoints decode it back into `path_hash_size` on read.
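
The recombination described above can be sketched roughly as follows. This is an illustration only: `pack_path_len()` is named in the text, but its signature, the 6-bit/2-bit split, and the `mode = size - 1` mapping used here are assumptions, and `unpack_path_len()` is a hypothetical helper standing in for the decode the API endpoints perform on read.

```python
HOP_COUNT_MASK = 0x3F  # assumed: low 6 bits carry the hop count
MODE_SHIFT = 6         # assumed: upper 2 bits carry path_hash_mode


def pack_path_len(hop_count: int, path_hash_size: int = 1) -> int:
    """Recombine hop count and hash size into one raw byte (assumed layout)."""
    if not 0 <= hop_count <= HOP_COUNT_MASK:
        raise ValueError("hop_count out of range")
    if path_hash_size not in (1, 2, 3):
        raise ValueError("path_hash_size must be 1, 2, or 3")
    mode = path_hash_size - 1  # assumed mapping: mode 0 means 1 byte per hop
    return (mode << MODE_SHIFT) | hop_count


def unpack_path_len(raw: int) -> tuple[int, int]:
    """Hypothetical inverse: return (hop_count, path_hash_size)."""
    return raw & HOP_COUNT_MASK, (raw >> MODE_SHIFT) + 1
```

Under these assumptions a legacy 1-byte-per-hop row keeps its old numeric value (the upper bits are zero), which would be consistent with `delivery_path_hash_size` defaulting to 1.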
|
||||
|
||||
The use of SQLite allows for fast queries, reliable data storage, full-text search, and complex filtering (such as contact ignoring/blocking) without the risk of file corruption inherent to flat JSON files.
|
||||
|
||||
### Retention scheduler
|
||||
|
||||
Retention is enabled by default with `90 / 90 / 60 / 30` days for `channel_messages / direct_messages / advertisements / diagnostics`. The job runs daily at 03:30 local time (`TZ` from `.env`). `cleanup_old_messages()` also deletes from `echoes`, `paths`, and `acks`, the diagnostic tables that historically made up the bulk of the DB size. When at least 1000 rows are removed in a pass, the scheduler immediately runs `VACUUM` to reclaim file space, because a SQLite `DELETE` only marks pages as free.
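
A minimal sketch of the delete-then-`VACUUM` pass, assuming every table has a `timestamp` column holding Unix seconds. The function name `run_retention`, the column name, and the per-table retention mapping are all hypothetical; only the 1000-row threshold comes from the text.

```python
import sqlite3
import time

VACUUM_THRESHOLD = 1000  # from the text: VACUUM after at least 1000 deletions


def run_retention(conn, retention_days, now=None):
    """Delete expired rows from each table, then VACUUM if enough were removed.

    retention_days maps table name -> max age in days (hypothetical shape).
    Table names come from trusted config here, never from user input.
    """
    now = time.time() if now is None else now
    deleted = 0
    for table, days in retention_days.items():
        cutoff = now - days * 86400
        cur = conn.execute(f"DELETE FROM {table} WHERE timestamp < ?", (cutoff,))
        deleted += cur.rowcount
    conn.commit()  # VACUUM cannot run inside an open transaction
    vacuumed = deleted >= VACUUM_THRESHOLD
    if vacuumed:
        conn.execute("VACUUM")
    return deleted, vacuumed
```

Committing before `VACUUM` matters because SQLite refuses to vacuum from within a transaction.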
|
||||
|
||||
The retention/cleanup scheduler runs APScheduler jobs in worker threads, so each job is decorated with `@_with_app_context` and the Flask app is passed in via `set_flask_app()`; the `init_*_schedule()` callers also wrap themselves in `app.app_context()` so the boot-time read of `current_app.db` doesn't blow up with "Working outside of application context".
|
||||
|
||||
The archiver builds the `.msgs` path from `device_name`, but the `meshcore` library strips non-ASCII when writing the file (so a device renamed to include an emoji breaks the strict path match). The archiver now falls back to globbing the data directory for a single non-archive `.msgs` file when the expected path is missing — mirroring `migrate_v1`.
|
||||
|
||||
The channels API reads from the `channels` DB table rather than iterating device slots. `_load_channel_secrets()` syncs the table on every device connect (and prunes stale rows), `set_channel()` / `remove_channel()` update it synchronously with the device, and `_refresh_channel_secret()` updates individual rows during the per-send refresh. `/api/channels` is therefore a single sub-millisecond `SELECT` that does not depend on device responsiveness, which fixes the original symptom: only "Public" showing up after a refresh while the device briefly stalled.
|
||||
|
||||
---
|
||||
|
||||
## API Reference
|
||||
@@ -188,6 +204,7 @@ The use of SQLite allows for fast queries, reliable data storage, full-text sear
|
||||
| PUT | `/api/contacts/<key>/paths/<id>` | Update path (star, label) |
|
||||
| DELETE | `/api/contacts/<key>/paths/<id>` | Delete path |
|
||||
| POST | `/api/contacts/<key>/paths/reorder` | Reorder paths |
|
||||
| POST | `/api/contacts/<key>/paths/<id>/apply` | Push a configured path to the firmware as the active route (mirrors `change_path`); invalidates the contacts cache |
|
||||
| POST | `/api/contacts/<key>/paths/reset_flood` | Reset to FLOOD routing |
|
||||
| POST | `/api/contacts/<key>/paths/clear` | Clear all paths |
|
||||
| GET | `/api/contacts/<key>/no_auto_flood` | Get "Keep path" flag |
|
||||
@@ -219,6 +236,21 @@ The use of SQLite allows for fast queries, reliable data storage, full-text sear
|
||||
| POST | `/api/regions/<id>/default` | Mark default in DB AND push to firmware (CMD_SET_DEFAULT_FLOOD_SCOPE = 63, requires firmware v1.15+) |
|
||||
| DELETE | `/api/regions/default` | Clear default region in DB and on firmware |
|
||||
|
||||
The `PUT /api/channels/<index>/scope` endpoint accepts any `index` in `[0, device_manager._max_channels)` (40 on current firmwares; falls back to 8 if the DM is unreachable).
|
||||
|
||||
### Analyzers
|
||||
|
||||
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/analyzers` | List configured analyzer services |
| POST | `/api/analyzers` | Create analyzer (`{name, url_template}`); template must contain `{packetHash}` |
| PUT | `/api/analyzers/<id>` | Update analyzer (name / url / is_disabled) |
| DELETE | `/api/analyzers/<id>` | Delete analyzer |
| POST | `/api/analyzers/<id>/default` | Mark as default (enforced single-default via partial unique index) |
| DELETE | `/api/analyzers/default` | Clear the default analyzer |
|
||||
|
||||
The backend no longer ships a pre-built `analyzer_url` per message — channel-message payloads include `packet_hash` instead, and the frontend substitutes `{packetHash}` in the chosen URL template at click time.
|
||||
|
||||
### Direct Messages
|
||||
|
||||
| Method | Endpoint | Description |
|
||||
@@ -259,6 +291,18 @@ The use of SQLite allows for fast queries, reliable data storage, full-text sear
|
||||
| GET | `/api/backup/list` | List database backups |
|
||||
| POST | `/api/backup/create` | Create database backup |
|
||||
| GET | `/api/backup/download` | Download backup file |
|
||||
| GET | `/api/db/size` | Current DB file size (bytes) |
|
||||
| POST | `/api/db/vacuum` | Kick off SQLite `VACUUM` in a worker thread. Returns 202 immediately, or 409 if a vacuum is already running. Kickoff is deliberately separate from status polling so that reverse proxies with ~30 s idle timeouts can't drop the request while the database is being rewritten |
|
||||
| GET | `/api/db/vacuum/status` | Poll vacuum progress: `{running, elapsed_seconds, size_before, size_after}` |
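
The kickoff/poll split can be sketched as a small job tracker. This is not the project's implementation: `VacuumJob` and its methods are hypothetical names, and the Flask routes that would wrap `start()` and `status()` are omitted. Only the 202/409 behaviour and the status fields follow the table above.

```python
import os
import sqlite3
import threading
import time


class VacuumJob:
    """Tracks one background VACUUM at a time (hypothetical helper)."""

    def __init__(self, db_path):
        self.db_path = db_path
        self._lock = threading.Lock()
        self._thread = None
        self._started = time.monotonic()
        self._state = {"running": False, "elapsed_seconds": 0.0,
                       "size_before": None, "size_after": None}

    def start(self):
        """Return the HTTP status a kickoff handler would send: 202 or 409."""
        with self._lock:
            if self._state["running"]:
                return 409
            self._started = time.monotonic()
            self._state = {"running": True, "elapsed_seconds": 0.0,
                           "size_before": os.path.getsize(self.db_path),
                           "size_after": None}
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()
            return 202

    def _run(self):
        conn = sqlite3.connect(self.db_path)  # connection owned by this thread
        try:
            conn.execute("VACUUM")
        finally:
            conn.close()
            with self._lock:
                self._state.update(
                    running=False,
                    elapsed_seconds=round(time.monotonic() - self._started, 3),
                    size_after=os.path.getsize(self.db_path))

    def status(self):
        """Snapshot for the polling endpoint; elapsed time is live while running."""
        with self._lock:
            snapshot = dict(self._state)
            if snapshot["running"]:
                snapshot["elapsed_seconds"] = round(
                    time.monotonic() - self._started, 3)
            return snapshot
```

Because the kickoff request returns before the rewrite starts, no single HTTP request stays open for the duration of the `VACUUM`; the client only issues short status polls.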
|
||||
|
||||
### Health endpoints
|
||||
|
||||
These are top-level routes (not under `/api/`), consumed by Docker's healthcheck and the host-level watchdog.
|
||||
|
||||
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Lenient liveness check. Returns 503 only when BLE reconnection has permanently failed (so Docker triggers a container restart to clear BLE state). Returns 200 otherwise |
| GET | `/health/strict` | Strict device-health check for the external watchdog. JSON response. Returns 503 when (a) BLE permanently failed, (b) `_consecutive_stats_failures` ≥ 5, or (c) transport is serial/usb/tcp and no RX event for > `HEALTH_STRICT_MAX_RX_STALE_SEC` (5 min). Returns 200 with the same counters when healthy |
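
The three `/health/strict` conditions can be expressed as one pure function. This is a sketch under assumptions: the function name, its parameters, the payload keys, and the choice to treat "no RX seen yet" as not stale are all hypothetical; the thresholds are the ones stated in the table.

```python
import time

HEALTH_STRICT_MAX_RX_STALE_SEC = 300  # 5 min, per the table above
MAX_STATS_FAILURES = 5                # threshold from the table; name is hypothetical


def strict_health(transport, ble_failed, stats_failures, last_rx_at, now=None):
    """Return (http_status, payload) for the three documented failure conditions."""
    now = time.time() if now is None else now
    # Assumption: before the first RX event there is no age to judge.
    rx_age = None if last_rx_at is None else now - last_rx_at
    reasons = []
    if ble_failed:
        reasons.append("ble_permanently_failed")
    if stats_failures >= MAX_STATS_FAILURES:
        reasons.append("stats_failures")
    if (transport in ("serial", "usb", "tcp")
            and rx_age is not None
            and rx_age > HEALTH_STRICT_MAX_RX_STALE_SEC):
        reasons.append("rx_stale")
    payload = {
        "healthy": not reasons,
        "reasons": reasons,
        "consecutive_stats_failures": stats_failures,
        "rx_age_seconds": rx_age,
    }
    return (503 if reasons else 200), payload
```

Returning the counters in both the 200 and 503 cases lets an external monitor see how close the device is to a threshold before it trips.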
|
||||
|
||||
### Other
|
||||
|
||||
@@ -284,6 +328,8 @@ The use of SQLite allows for fast queries, reliable data storage, full-text sear
|
||||
|
||||
## WebSocket API
|
||||
|
||||
All Socket.IO clients (`/chat`, `/console`, `/logs`) are configured with `transports: ['polling']`. The Werkzeug dev server can't upgrade WebSockets, so every `io()` upgrade attempt previously returned HTTP 500 and clients fell into a polling/upgrade reconnect loop — visible as 10–15 s freezes on app load. Long-polling keeps real-time pushes working with ~1–2 s latency.
|
||||
|
||||
### Console Namespace (`/console`)
|
||||
|
||||
Interactive console via Socket.IO WebSocket connection.
|
||||
|
||||
+22
-3
@@ -456,6 +456,7 @@ Configure message routing paths for individual contacts:
|
||||
- **Repeater picker** - Browse available repeaters by name or ID
|
||||
- **Map picker** - Select repeaters from a map view showing their GPS locations
|
||||
- **Import current path** - Import the path currently stored on the device
|
||||
- **Apply to device** (upload-arrow icon) - Push a configured path to the firmware as the active route without leaving the modal. The device-path line refreshes once the change is confirmed, mirroring the console's `change_path` command
|
||||
- **Reorder** - Drag paths to change priority (starred path is used first)
|
||||
- **Star** - Mark a preferred primary path (used first in retry rotation)
|
||||
- **Delete** - Remove individual paths
|
||||
@@ -500,8 +501,8 @@ The console supports a comprehensive set of MeshCore commands organized into cat
|
||||
- `.pending_contacts` - List pending contacts
|
||||
- `add_pending <key>` - Approve pending contact
|
||||
- `remove_contact <name>` - Remove contact
|
||||
- `change_path <name> <path>` - Change contact's routing path. Accepts comma-separated hex bytes (`D1,90,05`), continuous hex (`D19005`), or space-separated bytes. Use the keyword `direct` to set a Direct (0-hop) path. Hash size is auto-detected from the chunk length. Use `reset_path <name>` to switch back to Flood
|
||||
- `path <name>` - Show the current path for a contact
|
||||
- `change_path <name> <path>` - Change contact's routing path. Accepts comma-, whitespace-, or arrow-separated hex chunks (`D1,90,05`, `D103 5E34`, `D1->90->05`) or continuous hex (`D19005`). For multi-byte paths all chunks must have a consistent length — that length determines the hash-size mode (1, 2, or 3 bytes per hop). Use the keyword `direct` to set a Direct (0-hop) path; use `reset_path <name>` to switch back to Flood
|
||||
- `path <name>` - Show the current path for a contact (e.g. `D103,5E34 (2 hops, 2B)` — hop count and byte size)
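
The separator and chunk-length rules for `change_path` can be sketched as a small parser. The function names are hypothetical, and two behaviours are assumptions rather than documented facts: a single continuous hex string is treated as 1-byte hops, and `direct` is reported with hash size 1.

```python
import re

SEPARATORS = re.compile(r"(?:->|[,\s])+")  # arrow, comma, or whitespace


def parse_path(text):
    """Return (path_bytes, hash_size) for a change_path argument."""
    text = text.strip()
    if text.lower() == "direct":
        return b"", 1  # Direct (0-hop) path; hash size 1 is an assumed default
    chunks = [c for c in SEPARATORS.split(text) if c]
    if not chunks:
        raise ValueError("empty path")
    if len(chunks) == 1:
        # Assumption: continuous hex such as D19005 means 1-byte hops.
        hexstr = chunks[0]
        if len(hexstr) % 2:
            raise ValueError("odd number of hex digits")
        chunks = [hexstr[i:i + 2] for i in range(0, len(hexstr), 2)]
    sizes = {len(c) for c in chunks}
    if len(sizes) != 1:
        raise ValueError("all chunks must have the same length")
    digits = sizes.pop()
    if digits not in (2, 4, 6):
        raise ValueError("chunks must be 1, 2, or 3 bytes")
    return bytes.fromhex("".join(chunks)), digits // 2


def format_path(path_bytes, hash_size):
    """Render a path the way the `path` output example shows it."""
    hops = [path_bytes[i:i + hash_size].hex().upper()
            for i in range(0, len(path_bytes), hash_size)]
    return f"{','.join(hops)} ({len(hops)} hops, {hash_size}B)"
```

For example, `format_path(*parse_path("D103 5E34"))` yields `D103,5E34 (2 hops, 2B)`, matching the sample output above.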
|
||||
|
||||
**Device & Channel Management:**
|
||||
- `infos` / `ver` - Device info / firmware version
|
||||
@@ -592,7 +593,7 @@ Access the Settings modal to configure application behavior:
|
||||
1. Click the menu icon (☰) in the navbar (or tap the gear FAB button)
|
||||
2. Select "Settings" from the menu
|
||||
|
||||
The modal is organized into tabs: **Device**, **Messages**, **Group Chat**, **Interface**, **Appearance**, **Contacts**, **Regions**, and **Notifications**. A global **Close** button at the bottom of the modal dismisses Settings from any tab.
|
||||
The modal is organized into tabs: **Device**, **Messages**, **Group Chat**, **Interface**, **Appearance**, **Contacts**, **Regions**, **Analyzer**, and **Notifications**. A global **Close** button at the bottom of the modal dismisses Settings from any tab.
|
||||
|
||||
### Device Tab
|
||||
|
||||
@@ -676,6 +677,22 @@ Manage MeshCore region scopes (also called flood scopes). See [Region Scopes](#r
|
||||
- Pick **None** to clear the firmware default
|
||||
- Delete regions you no longer need (channels using a deleted region revert to "no scope")
|
||||
|
||||
### Analyzer Tab
|
||||
|
||||
Configure MeshCore Analyzer services used by the chart icon under each group-chat message. The icon resolves at click time depending on what you configure here:
|
||||
|
||||
- **No custom analyzers (or all disabled)** → opens the built-in Letsmesh analyzer
|
||||
- **One default analyzer set** → opens that service directly
|
||||
- **Multiple enabled analyzers, no default** → opens a chooser modal
|
||||
|
||||
Each row supports:
|
||||
|
||||
- **Star toggle** — mark this analyzer as the default. Only one default is allowed
|
||||
- **Enabled switch** — temporarily disable a service without deleting it
|
||||
- **Edit / Delete** buttons
|
||||
|
||||
When adding or editing, the URL template must contain the placeholder `{packetHash}` — it is substituted with the message's packet hash at click time.
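
The click-time resolution can be sketched as follows. The real logic lives in the frontend; this Python version only illustrates the rules. `resolve_analyzer` and `builtin_template` are hypothetical names, the built-in URL is passed in rather than guessed, and opening a single enabled non-default analyzer directly is an assumption the text does not spell out.

```python
PLACEHOLDER = "{packetHash}"


def resolve_analyzer(analyzers, packet_hash, builtin_template):
    """Decide what the chart icon does for one message.

    Returns ("open", url) or ("choose", [(name, url), ...]).
    """
    def fill(template):
        return template.replace(PLACEHOLDER, packet_hash)

    # Disabled services are ignored entirely (assumed to include a disabled default).
    enabled = [a for a in analyzers if not a.get("is_disabled")]
    if not enabled:
        return "open", fill(builtin_template)
    default = next((a for a in enabled if a.get("is_default")), None)
    if default is not None:
        return "open", fill(default["url_template"])
    if len(enabled) == 1:
        # Assumption: one enabled service without a default opens directly.
        return "open", fill(enabled[0]["url_template"])
    return "choose", [(a["name"], fill(a["url_template"])) for a in enabled]
```

Substituting at click time means a change to the default or to a URL template applies to messages already on screen.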
|
||||
|
||||
### Notifications Tab
|
||||
|
||||
Enable or disable browser push notifications for new messages received while the app is hidden or in the background.
|
||||
@@ -710,6 +727,8 @@ Create and manage database backups:
|
||||
- **Create backup** - Creates a timestamped copy of the current database
|
||||
- **List backups** - View all available backups with timestamps and file sizes
|
||||
- **Download** - Download any backup file to your local machine
|
||||
- **Current size** - Live label showing the active DB file size
|
||||
- **Optimize now** - Run `VACUUM` on demand to reclaim free pages left behind by the retention job. The kickoff returns immediately and the UI polls for completion; a toast reports `freed X bytes in Y s` when done. A concurrent request returns HTTP 409. A nightly `VACUUM` already runs automatically when the retention job deletes 1000+ rows, so use this only when you want to reclaim space before the next 03:30 run
|
||||
|
||||
Backups are stored in the `./data/` directory alongside the main database.
|
||||
|
||||
|
||||
+15
-1
@@ -5,7 +5,7 @@ The Container Watchdog is a systemd service that monitors the `mc-webui` Docker
|
||||
## Features
|
||||
|
||||
- **Health monitoring** - Checks container status every 30 seconds
|
||||
- **Log monitoring** - Monitors `mc-webui` logs for specific "unresponsive LoRa device" errors
|
||||
- **Log monitoring** - Two pattern classes (see [Failure detection](#failure-detection))
|
||||
- **Automatic restart** - Restarts the container when issues are detected
|
||||
- **Auto-start stopped container** - Starts the container if it has stopped (configurable)
|
||||
- **Hardware USB reset** - Performs a low-level USB bus reset (unbind/bind or DTR/RTS) if the LoRa device freezes. *Note: USB reset is automatically skipped if a TCP connection is used.*
|
||||
@@ -13,6 +13,20 @@ The Container Watchdog is a systemd service that monitors the `mc-webui` Docker
|
||||
- **HTTP status endpoint** - Query watchdog status via HTTP API
|
||||
- **Restart history** - Tracks all automatic restarts with timestamps
|
||||
|
||||
## Failure detection
|
||||
|
||||
`check_device_unresponsive()` scans the last 2 minutes of container logs against two pattern classes:
|
||||
|
||||
- **Hard patterns** — any single occurrence triggers a restart. These are the long-standing "device clearly dead" messages: `No response from meshcore node, disconnecting`, `Device connected but self_info is empty`, `Failed to connect after 10 attempts`.
|
||||
- **Soft patterns** — any of these failing **5 or more times in the last 2 minutes** triggers a restart. Catches the "sluggish but not dead" mode where the firmware briefly stalls on `get_stats_*` / `get_battery` commands (empty-string `concurrent.futures.TimeoutError`) while passive RX still works: `get_stats_core failed:`, `get_stats_radio failed:`, `get_stats_packets failed:`, `Failed to get battery:`, `Failed to get channel`.
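
The two pattern classes can be sketched as a single pass over the log window. This is not the body of `check_device_unresponsive()`: the function name `classify_log_window` is hypothetical, the caller is assumed to have already limited the input to the last 2 minutes of logs, and counting soft hits across all soft patterns combined is an assumption.

```python
HARD_PATTERNS = (
    "No response from meshcore node, disconnecting",
    "Device connected but self_info is empty",
    "Failed to connect after 10 attempts",
)
SOFT_PATTERNS = (
    "get_stats_core failed:",
    "get_stats_radio failed:",
    "get_stats_packets failed:",
    "Failed to get battery:",
    "Failed to get channel",
)
SOFT_THRESHOLD = 5  # hits within the 2-minute window


def classify_log_window(log_lines):
    """Return (restart_needed, reason) for log lines from the last 2 minutes."""
    soft_hits = 0
    for line in log_lines:
        if any(p in line for p in HARD_PATTERNS):
            return True, "hard"  # a single occurrence is enough
        if any(p in line for p in SOFT_PATTERNS):
            soft_hits += 1  # assumed: counted across all soft patterns
    if soft_hits >= SOFT_THRESHOLD:
        return True, "soft"
    return False, None
```
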
|
||||
|
||||
In parallel, the app exposes [`/health/strict`](architecture.md#health-endpoints) — a stricter device-health check that the watchdog (or any external monitor) can consume to react before the soft-pattern threshold is reached.
|
||||
|
||||
> **Deploy note:** the watchdog runs as a host-level systemd service and is **not** restarted by `mcupdate`. After deploying changes to `scripts/watchdog/`, run:
> ```bash
> sudo systemctl restart mc-webui-watchdog.service
> ```
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
|
||||