MarekWo 53ef2759d5 docs: cover analyzer settings, vacuum/optimize, path apply, watchdog soft patterns
User-guide: new Settings > Analyzer tab (custom analyzer services with default/disabled
toggles and {packetHash} placeholder), apply-path upload button in DM Path Management,
Backup modal's Optimize button + live size label, console change_path now accepts
arrow/whitespace separators with consistent multi-byte chunk length and "path" output
shows hop count and byte size.

Architecture: new /api/analyzers CRUD + default endpoints, /api/db/size and the split
/api/db/vacuum kickoff + /api/db/vacuum/status polling (worker-thread VACUUM to survive
proxy idle timeouts), /api/contacts/<key>/paths/<id>/apply, /health and /health/strict
top-level routes, analyzers table and direct_messages.delivery_path_hash_size column,
recombined path_len byte storage. DeviceManager: per-send channel-secret refresh,
liveness telemetry (_last_rx_at + _consecutive_stats_failures), TCP self-heal via
_liveness_watcher_loop + in-place reconnect. Retention scheduler: on-by-default
90/90/60/30, post-cleanup VACUUM at >=1000 deletions, app-context wrapping, archiver
emoji-name fallback. Socket.IO clients forced to polling transport.

Watchdog: documented hard- vs soft-pattern detection (5 hits in 2 min for sluggish
get_stats / get_battery failures), pointer to /health/strict, and the systemd-restart
deploy note for scripts/watchdog/ changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-08 11:53:41 +02:00

# mc-webui Architecture
Technical documentation for mc-webui, covering system architecture, project structure, and internal APIs.
## Table of Contents
- [Tech Stack](#tech-stack)
- [Container Architecture](#container-architecture)
- [DeviceManager Architecture](#devicemanager-architecture)
- [Project Structure](#project-structure)
- [Database Architecture](#database-architecture)
- [API Reference](#api-reference)
- [WebSocket API](#websocket-api)
- [Offline Support](#offline-support)
---
## Tech Stack
- **Backend:** Python 3.11+, Flask, Flask-SocketIO (gevent), SQLite
- **Frontend:** HTML5, Bootstrap 5, vanilla JavaScript, Socket.IO client
- **Deployment:** Docker / Docker Compose (Single-container architecture)
- **Communication:** Direct hardware access (USB, BLE, or TCP) via `meshcore` library
- **Data source:** SQLite Database (`./data/meshcore/<pubkey_prefix>.db`)
---
## Container Architecture
mc-webui uses a **single-container architecture** for simplified deployment and direct hardware communication:
```text
┌─────────────────────────────────────────────────────────────┐
│                       Docker Network                        │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                       mc-webui                        │  │
│  │                                                       │  │
│  │  - Flask web app (Port 5000)                          │  │
│  │  - DeviceManager (Direct USB/BLE/TCP access)          │  │
│  │  - Database (SQLite)                                  │  │
│  │                                                       │  │
│  └─────────┬─────────────────────────────────────────────┘  │
│            │                                                │
└────────────┼─────────────────────────────────────────────────┘
      ┌──────────────┐
      │ USB/BLE/TCP  │
      │    Device    │
      └──────────────┘
```
Three transport options are supported with the following priority: **BLE > TCP > Serial (USB)**. Set the `MC_BLE_ADDRESS` or `MC_TCP_HOST` environment variable to activate BLE or TCP transport respectively; otherwise, USB serial is used by default.
This v2 architecture eliminates the need for a separate bridge container and relies on the native `meshcore` Python library for direct communication, ensuring lower latency and greater stability.
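
The transport priority described above can be sketched as a small selection function. This is a minimal illustration, not the project's code: `select_transport` is a hypothetical name, and only the two environment variables named in the text are consulted.

```python
import os
from typing import Mapping, Optional


def select_transport(env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'ble', 'tcp', or 'serial' following the BLE > TCP > Serial priority."""
    env = os.environ if env is None else env
    if env.get("MC_BLE_ADDRESS"):  # BLE wins whenever an address is configured
        return "ble"
    if env.get("MC_TCP_HOST"):     # TCP is used only when BLE is not configured
        return "tcp"
    return "serial"                # USB serial is the default
```

With both variables set, the sketch picks BLE, which matches the stated priority order.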
### Docker Entrypoint (BLE cleanup)
`scripts/docker-entrypoint.sh` runs before the Flask app starts. When `MC_BLE_ADDRESS` is set, it uses D-Bus to check if BlueZ has an active session to the device and disconnects it. BlueZ auto-reconnects trusted devices, which leaves stale GATT notification handles that block `bleak` from establishing a new session. A clean disconnect at startup ensures the app starts with a fresh BLE state.
### Multi-architecture Images
Official images are built via GitHub Actions for `linux/amd64`, `linux/arm64`, and `linux/arm/v7` (Raspberry Pi 2/3/4/5 supported). Build dependencies (`gcc`, `python3-dev`, `libjpeg-dev`, `zlib1g-dev`) are installed and then purged to keep the final image size small while still compiling `Pillow` and `pycryptodome` from source when wheels are unavailable (notably on `arm/v7`). GHA layer cache (`cache-from` / `cache-to`) speeds up subsequent rebuilds. Images are pushed to both Docker Hub (`mawoj/mc-webui`) and GitHub Container Registry (`ghcr.io/marekwo/mc-webui`), with `latest` tag on `main` and `dev` tag on the `dev` branch.
---
## DeviceManager Architecture
The `DeviceManager` handles the connection to the MeshCore device via a direct session:
- **Single persistent session** - One long-lived connection utilizing the `meshcore` library
- **Event-driven** - Subscribes to device events (e.g., incoming messages, advert receptions, ACKs) and triggers appropriate handlers
- **Direct Database integration** - Seamlessly syncs contacts, messages, and device settings to the SQLite database
- **Real-time messages** - Instant message processing via callback events without polling
- **Thread-safe queue** - Commands are serialized to prevent device lockups
- **Auto-restart watchdog** - Monitors connection health and restarts the session on crash
- **BLE keepalive & reconnect** - When using Bluetooth transport, a 60s keepalive loop detects "zombie" connections (reads still succeed but writes silently fail). On disconnect or keepalive failure, the manager marks the session as permanently failed and the `/health` endpoint returns 503, letting the Docker healthcheck trigger a fast container restart (~5s) to get a clean BLE state rather than attempting unreliable in-process reconnects
- **Echo correlation** - Sent channel messages pre-compute their expected `pkt_payload` using the channel secret and send timestamp (±3s for clock drift), so incoming echoes are matched exactly instead of only by 1-byte channel hash (prevents misattribution when two messages go out simultaneously on the same channel)
- **Per-channel region scope** - Before each channel send, the channel's mapped region scope key (16 bytes) is pushed to the firmware via `CMD_SET_FLOOD_SCOPE_KEY` (54). The scope-set + send pair is serialised under a `_send_lock` so concurrent sends on different channels can't swap each other's scope. Channels without a mapping get an all-zero key so a previously-set scope doesn't leak across channels
- **Per-send channel-secret refresh** - Channel indices on the device compact down after a deletion, so the boot-time `_load_channel_secrets()` cache can drift. `send_channel_message` calls `_refresh_channel_secret(idx)` first (one extra `get_channel(idx)` round-trip) to fetch the current secret straight from firmware, update the in-memory cache and DB if they had drifted, and use it for the `pkt_payload` echo correlation
- **Liveness telemetry** - Tracks `_last_rx_at` (bumped on every `RX_LOG_DATA` event) and `_consecutive_stats_failures` (incremented on `get_stats_*` / `get_bat` exceptions, cleared on success). Surfaced via `/health/strict` for the external watchdog
- **TCP self-heal** - A `_liveness_watcher_loop` task on the DM event loop calls `force_reconnect()` when no RX event has arrived for `HEALTH_STRICT_MAX_RX_STALE_SEC` (5 min). `send_channel_message` also detects empty-string `concurrent.futures.TimeoutError` from `set_flood_scope_key` (the symptom of a degraded long-lived TCP) and runs an in-place reconnect + one retry before failing. A 30 s cooldown and `_reconnect_lock` prevent churn; `_intentional_disconnect` keeps the DISCONNECTED handler from racing the reconnect
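
The cooldown-plus-lock guard from the TCP self-heal bullet can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: `ReconnectGate` and `try_reconnect` are hypothetical names, and the real watcher runs on the DeviceManager event loop, whereas this sketch uses a plain thread lock and an injectable clock.

```python
import threading
import time
from typing import Callable, Optional


class ReconnectGate:
    """Allow at most one reconnect at a time, and at most one per cooldown window."""

    def __init__(self, cooldown_sec: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._cooldown = cooldown_sec
        self._clock = clock
        self._lock = threading.Lock()          # plays the role of _reconnect_lock
        self._last_attempt: Optional[float] = None

    def try_reconnect(self, reconnect: Callable[[], None]) -> bool:
        """Run reconnect() if permitted; return False when the attempt is skipped."""
        if not self._lock.acquire(blocking=False):
            return False                       # another reconnect is in flight
        try:
            now = self._clock()
            if (self._last_attempt is not None
                    and now - self._last_attempt < self._cooldown):
                return False                   # still inside the cooldown window
            self._last_attempt = now
            reconnect()
            return True
        finally:
            self._lock.release()
```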
---
## Project Structure
```text
mc-webui/
├── Dockerfile # Main app Docker image
├── docker-compose.yml # Single-container orchestration
├── app/
│ ├── __init__.py
│ ├── main.py # Flask entry point + Socket.IO handlers
│ ├── config.py # Configuration from env vars
│ ├── database.py # SQLite database models and CRUD operations
│ ├── device_manager.py # Core logic for meshcore communication
│ ├── contacts_cache.py # Persistent contacts cache (DB-backed)
│ ├── read_status.py # Server-side read status manager (DB-backed)
│ ├── version.py # Git-based version management
│ ├── migrate_v1.py # Migration script from v1 flat files to v2 SQLite
│ ├── meshcore/
│ │ ├── __init__.py
│ │ ├── cli.py # Meshcore library wrapper interface
│ │ └── parser.py # Data parsers
│ ├── archiver/
│ │ └── manager.py # Archive scheduler and management
│ ├── routes/
│ │ ├── __init__.py
│ │ ├── api.py # REST API endpoints
│ │ └── views.py # HTML views
│ ├── static/ # Frontend assets (CSS, JS, images, vendors)
│ │ └── js/fab-utils.js # Floating-button drag/collapse/sizing helpers
│ └── templates/ # HTML templates
├── docs/ # Documentation
├── scripts/
│ ├── update.sh # Automated update script
│ ├── docker-entrypoint.sh # Container startup (BLE cleanup)
│ ├── updater/ # Remote update webhook service
│ └── watchdog/ # Container health monitor
└── README.md
```
---
## Database Architecture
mc-webui v2 uses a **SQLite database** with WAL (Write-Ahead Logging) enabled.
Location: `./data/meshcore/<pubkey_prefix>.db`
Key tables:
- `messages` - All channel and direct messages (with FTS5 index for full-text search)
- `contacts` - Contact list with sync status, types, block/ignore flags, `no_auto_flood` flag
- `channels` - Channel configuration and keys
- `echoes` - Sent message tracking and repeater paths, `hash_size` for path_hash_mode
- `direct_messages` - DM messages with delivery tracking (`delivery_status`, `delivery_attempt`, `delivery_max_attempts`, `delivery_path`)
- `acks` - DM delivery status
- `settings` - Application settings (migrated from .webui_settings.json)
- `regions` - User-curated MeshCore flood scopes (`name`, `key_hex`, `is_default`)
- `channel_scopes` - Per-channel region mapping (`channel_idx` → `region_id`, CASCADE on region delete; absent row = no override → firmware default applies)
- `read_status` - Per-channel read counters and favorites (`is_favorite` column; used to pin channels in the sidebar/dropdown sort order)
- `analyzers` - User-configured MeshCore Analyzer services (`name`, `url_template` with `{packetHash}` placeholder, `is_default`, `is_disabled`; partial unique index enforces a single default)
`direct_messages` gained a `delivery_path_hash_size` column (auto-migrated, defaults to 1) so reloaded DM bubbles render multi-byte routes correctly. The `path_len` column on `channel_messages`, `direct_messages`, and `paths` now stores the raw firmware byte (masked hop count plus path_hash_mode in the upper bits), recombined at write time via `pack_path_len()`; the API endpoints decode it back into `path_hash_size` on read.
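
To make the recombined byte concrete, here is a hedged sketch of packing and unpacking. The exact bit layout is an assumption (hop count in the low 6 bits, `path_hash_mode` in the top 2 bits, hash size = mode + 1); the function names are hypothetical and are not the project's `pack_path_len()`.

```python
from typing import Tuple

HOP_MASK = 0x3F   # assumed: low 6 bits carry the hop count
MODE_SHIFT = 6    # assumed: top 2 bits carry path_hash_mode (0=1B, 1=2B, 2=3B)


def pack_path_len_sketch(hop_count: int, path_hash_mode: int) -> int:
    """Combine hop count and path_hash_mode into one raw byte."""
    if not 0 <= hop_count <= HOP_MASK:
        raise ValueError("hop_count does not fit in the masked field")
    if not 0 <= path_hash_mode <= 2:
        raise ValueError("path_hash_mode must be 0, 1, or 2")
    return (path_hash_mode << MODE_SHIFT) | hop_count


def unpack_path_len_sketch(raw: int) -> Tuple[int, int]:
    """Return (hop_count, path_hash_size_in_bytes) decoded from the raw byte."""
    hop_count = raw & HOP_MASK
    path_hash_size = ((raw >> MODE_SHIFT) & 0x03) + 1
    return hop_count, path_hash_size
```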
The use of SQLite allows for fast queries, reliable data storage, full-text search, and complex filtering (such as contact ignoring/blocking) without the risk of file corruption inherent to flat JSON files.
### Retention scheduler
Retention is enabled by default with `90 / 90 / 60 / 30` days for `channel_messages / direct_messages / advertisements / diagnostics`. The job runs daily at 03:30 local (`TZ` from `.env`) and `cleanup_old_messages()` also deletes from `echoes`, `paths`, and `acks` (the diagnostic tables — historically the bulk of DB size). When at least 1 000 rows are removed in a pass, the scheduler immediately runs `VACUUM` to reclaim file space (a SQLite `DELETE` only marks pages free).
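
The delete-then-VACUUM step can be sketched with the standard `sqlite3` module. This is a minimal illustration, assuming a hypothetical integer `timestamp` column holding Unix seconds; `cleanup_and_maybe_vacuum` is not the project's `cleanup_old_messages()`.

```python
import sqlite3
import time
from typing import Optional, Tuple

VACUUM_THRESHOLD = 1000  # from the text: VACUUM once at least 1 000 rows were removed


def cleanup_and_maybe_vacuum(conn: sqlite3.Connection, table: str,
                             max_age_days: int,
                             now: Optional[float] = None) -> Tuple[int, bool]:
    """Delete rows older than max_age_days; return (deleted_rows, vacuumed)."""
    now = time.time() if now is None else now
    cutoff = int(now - max_age_days * 86400)
    # The table name comes from trusted configuration, never from user input.
    cur = conn.execute(f'DELETE FROM "{table}" WHERE timestamp < ?', (cutoff,))
    deleted = cur.rowcount
    conn.commit()  # VACUUM cannot run inside an open transaction
    if deleted >= VACUUM_THRESHOLD:
        conn.execute("VACUUM")  # rewrites the file so freed pages are released
        return deleted, True
    return deleted, False
```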
The retention/cleanup scheduler runs APScheduler jobs in worker threads, so each job is decorated with `@_with_app_context` and the Flask app is passed in via `set_flask_app()`; the `init_*_schedule()` callers also wrap themselves in `app.app_context()` so the boot-time read of `current_app.db` doesn't blow up with "Working outside of application context".
The archiver builds the `.msgs` path from `device_name`, but the `meshcore` library strips non-ASCII when writing the file (so a device renamed to include an emoji breaks the strict path match). The archiver now falls back to globbing the data directory for a single non-archive `.msgs` file when the expected path is missing — mirroring `migrate_v1`.
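
A rough sketch of that fallback, under stated assumptions: the expected file is taken to be `<device_name>.msgs`, and archive files are assumed to carry an extra dotted part in their name (for example a date). Both conventions, and the name `find_msgs_file`, are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_msgs_file(data_dir: Path, device_name: str) -> Optional[Path]:
    """Return the live .msgs file, falling back to a glob when the name mismatches."""
    expected = data_dir / f"{device_name}.msgs"
    if expected.exists():
        return expected
    # Assumption: archives look like "name.<something>.msgs", so a live file
    # has exactly one dot in its file name.
    candidates = [p for p in data_dir.glob("*.msgs") if p.name.count(".") == 1]
    # Only accept the fallback when it is unambiguous.
    return candidates[0] if len(candidates) == 1 else None
```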
The channels API reads from the `channels` DB table rather than iterating device slots. `_load_channel_secrets()` syncs the table on every device connect (and prunes stale rows), `set_channel()` / `remove_channel()` update it synchronously with the device, and `_refresh_channel_secret()` refreshes individual rows on per-send refresh. This makes `/api/channels` a single sub-millisecond `SELECT` and unaffected by device responsiveness — the original symptom (only "Public" showing up after a refresh when the device briefly stalls) is gone.
---
## API Reference
### Messages
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/messages` | List messages (`?archive_date`, `?days`, `?channel_idx`) |
| POST | `/api/messages` | Send message (`{text, channel_idx, reply_to?}`) |
| GET | `/api/messages/updates` | Check for new messages (smart refresh) |
| GET | `/api/messages/<id>/meta` | Get message metadata (echoes, paths) |
| GET | `/api/messages/search` | Full-text search (`?q=`, `?channel_idx=`, `?limit=`) |
### Contacts
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/contacts` | List contacts |
| GET | `/api/contacts/detailed` | Full contact data (includes protection, ignore, block flags) |
| GET | `/api/contacts/cached` | Get cached contacts (superset of device contacts) |
| POST | `/api/contacts/delete` | Soft-delete contact (`{selector}`) |
| POST | `/api/contacts/cached/delete` | Delete cached contact |
| GET | `/api/contacts/protected` | List protected public keys |
| POST | `/api/contacts/<key>/protect` | Toggle contact protection |
| POST | `/api/contacts/<key>/ignore` | Toggle contact ignore |
| POST | `/api/contacts/<key>/block` | Toggle contact block |
| GET | `/api/contacts/blocked-names` | Get blocked names count |
| POST | `/api/contacts/block-name` | Block a name pattern |
| GET | `/api/contacts/blocked-names-list` | List blocked name patterns |
| POST | `/api/contacts/preview-cleanup` | Preview cleanup criteria |
| POST | `/api/contacts/cleanup` | Remove contacts by filter |
| GET | `/api/contacts/cleanup-settings` | Get auto-cleanup settings |
| POST | `/api/contacts/cleanup-settings` | Update auto-cleanup settings |
| GET | `/api/contacts/pending` | Pending contacts (`?types=1&types=2`) |
| POST | `/api/contacts/pending/approve` | Approve pending contact |
| POST | `/api/contacts/pending/reject` | Reject pending contact |
| POST | `/api/contacts/pending/clear` | Clear all pending contacts |
| POST | `/api/contacts/manual-add` | Add contact from URI or params |
| POST | `/api/contacts/<key>/push-to-device` | Push cached contact to device |
| POST | `/api/contacts/<key>/move-to-cache` | Move device contact to cache |
| GET | `/api/contacts/repeaters` | List repeater contacts (for path picker) |
| GET | `/api/contacts/<key>/paths` | Get contact paths |
| POST | `/api/contacts/<key>/paths` | Add path to contact |
| PUT | `/api/contacts/<key>/paths/<id>` | Update path (star, label) |
| DELETE | `/api/contacts/<key>/paths/<id>` | Delete path |
| POST | `/api/contacts/<key>/paths/reorder` | Reorder paths |
| POST | `/api/contacts/<key>/paths/<id>/apply` | Push a configured path to the firmware as the active route (mirrors `change_path`); invalidates the contacts cache |
| POST | `/api/contacts/<key>/paths/reset_flood` | Reset to FLOOD routing |
| POST | `/api/contacts/<key>/paths/clear` | Clear all paths |
| GET | `/api/contacts/<key>/no_auto_flood` | Get "Keep path" flag |
| PUT | `/api/contacts/<key>/no_auto_flood` | Set "Keep path" flag |
### Channels
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/channels` | List all channels |
| POST | `/api/channels` | Create new channel (idempotent — returns existing slot if name already used) |
| POST | `/api/channels/join` | Join existing channel (idempotent unless explicit `index` overrides) |
| DELETE | `/api/channels/<index>` | Remove channel |
| GET | `/api/channels/<index>/qr` | QR code (`?format=json\|png`) |
| GET | `/api/channels/muted` | Get muted channels |
| POST | `/api/channels/<index>/mute` | Toggle channel mute |
| GET | `/api/channels/scopes` | Bulk per-channel region mapping for UI |
| PUT | `/api/channels/<index>/scope` | Assign/clear region scope (`{region_id: int\|null}`) |
| GET | `/api/channels/favorites` | List favorite channel indices |
| POST | `/api/channels/<index>/favorite` | Set favorite state (`{favorite: bool}`) |
### Regions (MeshCore flood scopes)
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/regions` | List the device's region registry |
| POST | `/api/regions` | Create region (`{name}`); key derived as `SHA256('#'+name)[:16]` |
| DELETE | `/api/regions/<id>` | Delete region; CASCADE clears channel mappings; if it was the firmware default, clears it on device |
| POST | `/api/regions/<id>/default` | Mark default in DB AND push to firmware (CMD_SET_DEFAULT_FLOOD_SCOPE = 63, requires firmware v1.15+) |
| DELETE | `/api/regions/default` | Clear default region in DB and on firmware |
The `PUT /api/channels/<index>/scope` endpoint accepts any `index` in `[0, device_manager._max_channels)` (40 on current firmwares; falls back to 8 if the DM is unreachable).
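
The key derivation listed for `POST /api/regions` can be reproduced in a few lines. This is a sketch: `derive_region_key` is a hypothetical helper name, and UTF-8 encoding of the region name is an assumption.

```python
import hashlib


def derive_region_key(name: str) -> bytes:
    """Return the 16-byte scope key: the first 16 bytes of SHA256('#' + name)."""
    digest = hashlib.sha256(("#" + name).encode("utf-8")).digest()
    return digest[:16]
```

Calling `derive_region_key("europe").hex()` yields a 32-character hex string, which is the shape one would expect for the `key_hex` column of the `regions` table.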
### Analyzers
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/analyzers` | List configured analyzer services |
| POST | `/api/analyzers` | Create analyzer (`{name, url_template}`); template must contain `{packetHash}` |
| PUT | `/api/analyzers/<id>` | Update analyzer (name / url / is_disabled) |
| DELETE | `/api/analyzers/<id>` | Delete analyzer |
| POST | `/api/analyzers/<id>/default` | Mark as default (enforced single-default via partial unique index) |
| DELETE | `/api/analyzers/default` | Clear the default analyzer |
The backend no longer ships a pre-built `analyzer_url` per message — channel-message payloads include `packet_hash` instead, and the frontend substitutes `{packetHash}` in the chosen URL template at click time.
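
The substitution itself happens in the frontend; the same logic is shown here in Python for illustration only. The helper names and the example host are hypothetical, and URL-quoting the hash is an added precaution rather than documented behaviour.

```python
from urllib.parse import quote

PLACEHOLDER = "{packetHash}"


def is_valid_template(url_template: str) -> bool:
    """A template is accepted only if it contains the {packetHash} placeholder."""
    return PLACEHOLDER in url_template


def build_analyzer_url(url_template: str, packet_hash: str) -> str:
    """Substitute the packet hash into the chosen analyzer URL template."""
    if not is_valid_template(url_template):
        raise ValueError("url_template must contain {packetHash}")
    return url_template.replace(PLACEHOLDER, quote(packet_hash, safe=""))
```

For example, `build_analyzer_url("https://analyzer.example/packets/{packetHash}", "ABCD1234")` produces `https://analyzer.example/packets/ABCD1234`.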
### Direct Messages
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/dm/conversations` | List DM conversations |
| GET | `/api/dm/messages` | Get messages (`?conversation_id=`, `?limit=`) |
| POST | `/api/dm/messages` | Send DM (`{recipient, text}`) |
| GET | `/api/dm/updates` | Check for new DMs |
| GET | `/api/dm/auto_retry` | Get DM retry configuration |
| POST | `/api/dm/auto_retry` | Update DM retry configuration |
### Device & Settings
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/status` | Connection status (device name, transport type, serial port / BLE address) |
| GET | `/api/device/info` | Device information |
| GET | `/api/device/stats` | Device statistics |
| GET | `/api/device/settings` | Get device settings |
| POST | `/api/device/settings` | Update device settings |
| GET | `/api/device/config` | Get device configuration (name, coords, advert_loc_policy, path_hash_mode, radio params, tx_power) |
| POST | `/api/device/config` | Update device configuration from Settings > Device tab. Subset of fields incl. `path_hash_mode` (0=1B, 1=2B, 2=3B) |
| POST | `/api/device/command` | Execute command (advert, floodadv) |
| GET | `/api/device/commands` | List available special commands |
| GET | `/api/chat/settings` | Get chat settings (quote length, route popup timeout/no-autoclose) |
| POST | `/api/chat/settings` | Update chat settings |
| GET | `/api/ui/settings` | Get UI settings (toast timeout, no-autoclose, position) |
| POST | `/api/ui/settings` | Update UI settings |
| GET | `/api/retention-settings` | Get message retention settings |
| POST | `/api/retention-settings` | Update retention settings |
### Archives & Backup
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/archives` | List archives |
| POST | `/api/archive/trigger` | Manual archive |
| GET | `/api/backup/list` | List database backups |
| POST | `/api/backup/create` | Create database backup |
| GET | `/api/backup/download` | Download backup file |
| GET | `/api/db/size` | Current DB file size (bytes) |
| POST | `/api/db/vacuum` | Kick off SQLite `VACUUM` in a worker thread. Returns 202 immediately; 409 if already running. The kickoff endpoint deliberately splits from polling so reverse proxies with ~30 s idle timeouts can't kill it mid-rewrite |
| GET | `/api/db/vacuum/status` | Poll vacuum progress: `{running, elapsed_seconds, size_before, size_after}` |
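
The kickoff/poll split can be sketched as a small state holder around a worker thread. This is an illustration, not the project's code: `VacuumJob` is a hypothetical class, the actual `VACUUM` call is injected as `work`, and the file-size lookup is injected as `size_fn`.

```python
import threading
import time
from typing import Callable, Optional


class VacuumJob:
    """Track one background job: kickoff() mirrors POST, status() mirrors GET."""

    def __init__(self, work: Callable[[], object], size_fn: Callable[[], int]) -> None:
        self._work = work
        self._size_fn = size_fn
        self._lock = threading.Lock()
        self._running = False
        self._started_at = 0.0
        self._elapsed = 0.0
        self._size_before: Optional[int] = None
        self._size_after: Optional[int] = None
        self.thread: Optional[threading.Thread] = None

    def kickoff(self) -> int:
        """Return 202 when the job starts, 409 when one is already running."""
        with self._lock:
            if self._running:
                return 409
            self._running = True
            self._started_at = time.monotonic()
            self._size_before = self._size_fn()
            self._size_after = None
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()
        return 202

    def _run(self) -> None:
        try:
            self._work()
        finally:
            with self._lock:
                self._elapsed = time.monotonic() - self._started_at
                self._size_after = self._size_fn()
                self._running = False

    def status(self) -> dict:
        """Return the same fields as the status endpoint described above."""
        with self._lock:
            elapsed = (time.monotonic() - self._started_at
                       if self._running else self._elapsed)
            return {"running": self._running,
                    "elapsed_seconds": round(elapsed, 1),
                    "size_before": self._size_before,
                    "size_after": self._size_after}
```

Because the kickoff call returns immediately, no single HTTP request stays open for the duration of the rewrite; the client simply polls `status()` until `running` turns false.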
### Health endpoints
These are top-level routes (not under `/api/`), consumed by Docker's healthcheck and the host-level watchdog.
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Lenient liveness check. Returns 503 only when BLE reconnection has permanently failed (so Docker triggers a container restart to clear BLE state). Returns 200 otherwise |
| GET | `/health/strict` | Strict device-health check for the external watchdog. JSON response. Returns 503 when (a) BLE permanently failed, (b) `_consecutive_stats_failures` ≥ 5, or (c) transport is serial/usb/tcp and no RX event for > `HEALTH_STRICT_MAX_RX_STALE_SEC` (5 min). Returns 200 with the same counters when healthy |
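
The three failure conditions for `/health/strict` can be expressed as a pure decision function. This is a sketch using the thresholds stated in the table; `strict_health`, its parameter names, and the reason strings are hypothetical, and treating an unknown RX age (`None`) as healthy is an assumption.

```python
from typing import Optional, Tuple

MAX_STATS_FAILURES = 5     # threshold for _consecutive_stats_failures
MAX_RX_STALE_SEC = 300     # HEALTH_STRICT_MAX_RX_STALE_SEC (5 min)
RX_CHECKED_TRANSPORTS = {"serial", "usb", "tcp"}


def strict_health(transport: str, ble_failed: bool, stats_failures: int,
                  rx_age_sec: Optional[float]) -> Tuple[int, str]:
    """Return (http_status, reason) following conditions (a), (b), and (c)."""
    if ble_failed:                                   # (a) BLE permanently failed
        return 503, "ble_permanently_failed"
    if stats_failures >= MAX_STATS_FAILURES:         # (b) repeated stats failures
        return 503, "stats_failures"
    if (transport in RX_CHECKED_TRANSPORTS           # (c) RX silence on wired/TCP
            and rx_age_sec is not None
            and rx_age_sec > MAX_RX_STALE_SEC):
        return 503, "rx_stale"
    return 200, "ok"
```

Note that the RX-staleness check is skipped for BLE in this sketch, mirroring condition (c), which names only serial, USB, and TCP transports.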
### Other
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/read_status` | Get server-side read status |
| POST | `/api/read_status/mark_read` | Mark messages as read |
| POST | `/api/read_status/mark_all_read` | Mark all messages as read |
| GET | `/api/version` | Get app version |
| GET | `/api/check-update` | Check for available updates |
| GET | `/api/updater/status` | Get updater service status |
| POST | `/api/updater/trigger` | Trigger remote update |
| GET | `/api/advertisements` | Get recent advertisements |
| GET | `/api/console/history` | Get console command history |
| POST | `/api/console/history` | Save console command |
| DELETE | `/api/console/history` | Clear console history |
| GET | `/api/console/output` | Get persisted console output transcript (capped at 500 entries) |
| POST | `/api/console/output` | Append entry to transcript |
| DELETE | `/api/console/output` | Clear transcript |
| GET | `/api/logs` | Get application logs |
---
## WebSocket API
All Socket.IO clients (`/chat`, `/console`, `/logs`) are configured with `transports: ['polling']`. The Werkzeug dev server can't upgrade WebSockets, so every `io()` upgrade attempt previously returned HTTP 500 and clients fell into a polling/upgrade reconnect loop, visible as 10-15 s freezes on app load. Long-polling keeps real-time pushes working with ~1-2 s latency.
### Console Namespace (`/console`)
Interactive console via Socket.IO.
**Client → Server:**
- `send_command` - Execute command (`{command: "infos"}`)
**Server → Client:**
- `console_status` - Connection status
- `command_response` - Command result (`{success, command, output}`)
### Chat Namespace (`/chat`)
Real-time message delivery via Socket.IO.
**Server → Client:**
- `new_channel_message` - New channel message received
- `new_dm_message` - New DM received
- `message_echo` - Echo/ACK update for sent message (includes `hash_size`)
- `dm_ack` - DM delivery confirmation
- `dm_retry_status` - Real-time retry progress (`dm_id`, `attempt`, `max_attempts`)
- `dm_retry_failed` - All retry attempts exhausted (`dm_id`)
- `dm_delivered_info` - Delivery details after ACK (`dm_id`, `attempt`, `max_attempts`, `path`, `hash_size`)
- `path_changed` - Contact path discovered/updated (`public_key`)
### Logs Namespace (`/logs`)
Real-time log streaming via Socket.IO.
**Server → Client:**
- `log_line` - New log line
---
## Offline Support
The application works fully offline, without an internet connection. Vendor libraries (Bootstrap, Bootstrap Icons, Socket.IO, Emoji Picker) are bundled locally. A Service Worker provides hybrid caching to ensure functionality without connectivity.