From ff0d52e281ec101ac0fdb90cd3029a3fb228bb2d Mon Sep 17 00:00:00 2001
From: MarekWo
Date: Sun, 28 Dec 2025 18:10:32 +0100
Subject: [PATCH] docs: Update documentation for persistent meshcli session architecture
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Updated documentation to reflect the fundamental architectural change from
per-request subprocess spawning to a persistent meshcli session in meshcore-bridge.

Changes:
- Updated README.md with detailed bridge session architecture section
- Added TZ environment variable to configuration table
- Created comprehensive technical note (technotes/persistent-meshcli-session.md)
  documenting the refactor, implementation details, and benefits

Key architectural changes documented:
- Single subprocess.Popen with stdin/stdout pipes (not subprocess.run per request)
- Multiplexing: JSON adverts → .adverts.jsonl log, CLI responses → HTTP
- Real-time message reception via msgs_subscribe (no polling required)
- Thread-safe command queue with event-based synchronization
- Watchdog thread for automatic crash recovery
- Timeout-based response detection (300ms idle threshold)

This persistent session enables:
✅ Real-time message reception without polling
✅ Network advertisement logging
✅ Advanced interactive features (manual_add_contacts, etc.)
✅ Better stability and lower latency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5
---
 README.md                               |  25 +-
 technotes/persistent-meshcli-session.md | 504 ++++++++++++++++++++++++
 2 files changed, 526 insertions(+), 3 deletions(-)
 create mode 100644 technotes/persistent-meshcli-session.md

diff --git a/README.md b/README.md
index c7ba70f..6695181 100644
--- a/README.md
+++ b/README.md
@@ -88,7 +88,7 @@ All configuration is done via environment variables in the `.env` file:
 | Variable | Description | Default |
 |----------|-------------|---------|
 | `MC_SERIAL_PORT` | Path to serial device | `/dev/ttyUSB0` |
-| `MC_DEVICE_NAME` | Device name (for .msgs file) | `MeshCore` |
+| `MC_DEVICE_NAME` | Device name (for .msgs and .adverts.jsonl files) | `MeshCore` |
 | `MC_CONFIG_DIR` | meshcore configuration directory | `/root/.config/meshcore` |
 | `MC_REFRESH_INTERVAL` | Auto-refresh interval (seconds) | `60` |
 | `MC_INACTIVE_HOURS` | Inactivity threshold for cleanup | `48` |
@@ -98,6 +98,7 @@ All configuration is done via environment variables in the `.env` file:
 | `FLASK_HOST` | Listen address | `0.0.0.0` |
 | `FLASK_PORT` | Application port | `5000` |
 | `FLASK_DEBUG` | Debug mode | `false` |
+| `TZ` | Timezone for container logs | `UTC` |
 
 See [.env.example](.env.example) for a complete example.
 
@@ -106,9 +107,12 @@ See [.env.example](.env.example) for a complete example.
 mc-webui uses a **2-container architecture** for improved USB stability:
 
 1. **meshcore-bridge** - Lightweight service with exclusive USB device access
-   - Runs meshcore-cli subprocess calls
+   - Maintains a **persistent meshcli session** (single long-lived process)
+   - Multiplexes stdout: JSON adverts → `.adverts.jsonl` log, CLI commands → HTTP responses
+   - Real-time message reception via `msgs_subscribe` (no polling)
+   - Thread-safe command queue with event-based synchronization
+   - Watchdog thread for automatic crash recovery
    - Exposes HTTP API on port 5001 (internal only)
-   - Automatically restarts on USB communication issues
 
 2. **mc-webui** - Main web application
    - Flask-based web interface
@@ -117,6 +121,21 @@ mc-webui uses a **2-container architecture** for improved USB stability:
 
 This separation solves USB timeout/deadlock issues common in Docker + VM environments.
 
+### Bridge Session Architecture
+
+The meshcore-bridge maintains a **single persistent meshcli session** instead of spawning new processes per request:
+
+- **Single subprocess.Popen** - One long-lived meshcli process with stdin/stdout pipes
+- **Multiplexing** - Intelligently routes output:
+  - JSON adverts (with `payload_typename: "ADVERT"`) → logged to `{device_name}.adverts.jsonl`
+  - CLI command responses → returned via HTTP API
+- **Real-time messages** - `msgs_subscribe` command enables instant message reception without polling
+- **Thread-safe queue** - Commands are serialized through a queue.Queue for FIFO execution
+- **Timeout-based detection** - Response completion detected when no new lines arrive for 300ms
+- **Auto-restart watchdog** - Monitors process health and restarts on crash
+
+This architecture enables advanced features like pending contact management (`manual_add_contacts`) and provides better stability and performance.
+
 ## Project Structure
 
 ```
diff --git a/technotes/persistent-meshcli-session.md b/technotes/persistent-meshcli-session.md
new file mode 100644
index 0000000..547edfd
--- /dev/null
+++ b/technotes/persistent-meshcli-session.md
@@ -0,0 +1,504 @@
+# Persistent meshcli Session Architecture - Technical Notes
+
+## Overview
+
+This document describes the architectural refactor from per-request subprocess spawning to a **persistent meshcli session** in the `meshcore-bridge` container. This fundamental change enables real-time message reception, advert logging, and advanced features like pending contact management.
+
+## Previous Architecture (Before Refactor)
+
+### How it Worked
+
+The original `meshcore-bridge` implementation used **subprocess.run()** for each HTTP request:
+
+```python
+def run_meshcli_command(args, timeout=DEFAULT_TIMEOUT):
+    result = subprocess.run(
+        ['meshcli', '-s', MC_SERIAL_PORT] + args,
+        capture_output=True,
+        text=True,
+        timeout=timeout
+    )
+    return result
+```
+
+### Limitations
+
+1. **Serial Port Conflicts** - Each command spawned a new meshcli process, risking USB device locking
+2. **No Real-time Messages** - Required periodic `recv` polling (inefficient, 30-60s delay)
+3. **No Advert Logging** - JSON adverts from the mesh network were discarded
+4. **No Interactive Features** - Commands like `msgs_subscribe` or `manual_add_contacts` require a persistent session
+5. **Higher Overhead** - Process spawn/teardown for every command added latency
+
+### Why Change Was Needed
+
+User reported: **"since the changes, i.e. for over 1.5 hours, NOT A SINGLE message has arrived"**
+
+In non-interactive mode (subprocess.run), meshcli doesn't automatically receive new messages. The `recv` command only reads what's already in the `.msgs` file; it doesn't fetch NEW messages from the radio.
+
+## New Architecture (Persistent Session)
+
+### Core Concept
+
+Instead of spawning a new process per request, the bridge maintains a **single long-lived meshcli process** with:
+- **stdin pipe** - Send commands
+- **stdout pipe** - Receive responses and adverts
+- **stderr pipe** - Monitor errors
+
+### Key Components
+
+#### 1. MeshCLISession Class
+
+The `MeshCLISession` class encapsulates the entire persistent session:
+
+```python
+class MeshCLISession:
+    def __init__(self, serial_port, config_dir, device_name):
+        self.process = subprocess.Popen(
+            ['meshcli', '-s', serial_port],
+            stdin=subprocess.PIPE,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+            text=True,
+            bufsize=1  # Line-buffered
+        )
+```
+
+#### 2. Worker Threads (4 Concurrent Threads)
+
+**a) stdout_thread** - Reads stdout line-by-line
+- Parses each line as JSON
+- If `payload_typename == "ADVERT"` → log to `.adverts.jsonl`
+- Otherwise → append to current CLI command response buffer
+
+**b) stderr_thread** - Reads stderr and logs errors
+- Monitors `meshcli stderr: ...` messages
+- TTY errors are harmless (meshcli tries to use terminal features that don't exist in pipes)
+
+**c) stdin_thread** - Sends queued commands to stdin
+- Pulls commands from thread-safe `queue.Queue`
+- Writes to `process.stdin`
+- Starts timeout monitor thread for each command
+
+**d) watchdog_thread** - Monitors process health
+- Checks `process.poll()` every 5 seconds
+- If process crashed → cancels pending commands, restarts session
+
+#### 3. Command Queue System
+
+Commands are executed serially through a thread-safe queue:
+
+```python
+self.command_queue = queue.Queue()
+
+# Client calls execute_command()
+self.command_queue.put((cmd_id, command, event, response_dict))
+
+# stdin_thread pulls from queue
+cmd_id, command, event, response_dict = self.command_queue.get(timeout=1.0)
+```
+
+#### 4. Event-based Synchronization
+
+Each command gets a `threading.Event` for completion notification:
+
+```python
+event = threading.Event()
+response_dict = {
+    "event": event,
+    "response": [],
+    "done": False,
+    "error": None,
+    "last_line_time": time.time()
+}
+
+# Queue command
+self.command_queue.put((cmd_id, command, event, response_dict))
+
+# Wait for completion
+if not event.wait(timeout):
+    return {'success': False, 'stderr': 'Command timeout'}
+```
+
+#### 5. Timeout-based Response Detection
+
+Since meshcli doesn't provide end-of-response markers, we use **idle timeout detection**:
+
+- Monitor `last_line_time` timestamp for each command
+- If no new lines arrive for **300ms** → command is complete
+- `event.set()` signals completion to waiting client
+
+```python
+def _monitor_response_timeout(self, cmd_id, response_dict, event, timeout_ms=300):
+    while not self.shutdown_flag.is_set():
+        time.sleep(timeout_ms / 1000.0)
+
+        with self.pending_lock:
+            time_since_last_line = time.time() - response_dict["last_line_time"]
+
+            if time_since_last_line >= (timeout_ms / 1000.0):
+                logger.info(f"Command [{cmd_id}] completed (timeout-based)")
+                response_dict["done"] = True
+                event.set()
+                return
+```
+
+### Session Initialization Commands
+
+On startup, the bridge configures the meshcli session:
+
+```python
+def _init_session_settings(self):
+    self.process.stdin.write('set json_log_rx on\n')
+    self.process.stdin.write('set print_adverts on\n')
+    self.process.stdin.write('msgs_subscribe\n')
+    self.process.stdin.flush()
+```
+
+#### Command Breakdown:
+
+1. **`set json_log_rx on`** - Enable JSON output for received messages
+2. **`set print_adverts on`** - Print advertisement frames to stdout
+3. **`msgs_subscribe`** - Subscribe to real-time message events (critical for instant message reception!)
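
The idle-timeout detection from "Timeout-based Response Detection" can be tried in isolation, without a radio or meshcli. The following is a minimal sketch under stated assumptions: `IdleDetector`, `feed()` and `demo()` are hypothetical names that do not exist in the bridge code, and `time.monotonic()` is swapped in for the `time.time()` used in the technote so that wall-clock adjustments cannot distort the idle gap.

```python
import threading
import time


class IdleDetector:
    """Hypothetical helper: signals completion once no line arrives for idle_s seconds."""

    def __init__(self, idle_s=0.3):
        self.idle_s = idle_s
        self.lines = []
        self.done = threading.Event()
        self._lock = threading.Lock()
        self._last_line_time = time.monotonic()

    def feed(self, line):
        # A reader thread would call this for every line of command output.
        with self._lock:
            self.lines.append(line)
            self._last_line_time = time.monotonic()

    def _monitor(self):
        # Poll until the gap since the last line reaches the idle threshold.
        while not self.done.is_set():
            time.sleep(self.idle_s / 4)
            with self._lock:
                idle_for = time.monotonic() - self._last_line_time
            if idle_for >= self.idle_s:
                self.done.set()

    def start(self):
        threading.Thread(target=self._monitor, daemon=True).start()


def demo():
    detector = IdleDetector(idle_s=0.3)
    detector.start()
    for text in ("line 1", "line 2", "line 3"):
        detector.feed(text)
        time.sleep(0.05)  # gaps shorter than the idle threshold
    finished = detector.done.wait(timeout=2.0)
    return finished, list(detector.lines)
```

Calling `demo()` returns once the monitor has seen roughly 300 ms of silence after the last line. The sketch also makes the trade-off of this approach visible: every command pays at least the idle threshold in latency, and a response that pauses for longer than the threshold would be treated as complete too early.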
+
+### Multiplexing Logic
+
+The `_read_stdout()` thread routes each line to the correct destination:
+
+```python
+def _read_stdout(self):
+    for line in iter(self.process.stdout.readline, ''):
+        line = line.rstrip('\n\r')
+
+        # Try to parse as JSON advert
+        if self._is_advert_json(line):
+            self._log_advert(line)  # → .adverts.jsonl
+            continue
+
+        # Otherwise, append to current CLI response
+        self._append_to_current_response(line)  # → HTTP response
+```
+
+### Advert Logging
+
+JSON adverts are logged to `{device_name}.adverts.jsonl`:
+
+```python
+def _log_advert(self, json_line):
+    data = json.loads(json_line)
+    data["ts"] = time.time()  # Add timestamp
+
+    with open(self.advert_log_path, 'a', encoding='utf-8') as f:
+        f.write(json.dumps(data, ensure_ascii=False) + '\n')
+```
+
+**File format**: JSON Lines (.jsonl) - one JSON object per line:
+```json
+{"payload_typename":"ADVERT","from_id":"abc123",...,"ts":1735425678.123}
+{"payload_typename":"ADVERT","from_id":"def456",...,"ts":1735425680.456}
+```
+
+## Command Argument Quoting
+
+meshcli in interactive mode requires proper quoting for arguments with spaces:
+
+```python
+def execute_command(self, args, timeout=DEFAULT_TIMEOUT):
+    quoted_args = []
+    for arg in args:
+        # If argument contains spaces or special chars, wrap in double quotes
+        if ' ' in arg or '"' in arg or "'" in arg:
+            escaped = arg.replace('"', '\\"')
+            quoted_args.append(f'"{escaped}"')
+        else:
+            quoted_args.append(arg)
+
+    command = ' '.join(quoted_args)
+```
+
+**Why not shlex.quote()?**
+- `shlex.quote()` uses single quotes (`'message'`)
+- meshcli treats single quotes literally, so they appear in sent messages
+- **Solution**: Custom double-quote wrapping with escaped internal double quotes
+
+## Real-time Message Reception
+
+### The Problem (Before msgs_subscribe)
+
+With periodic `recv` polling:
+- `recv` command only reads from `.msgs` file
+- It doesn't fetch NEW messages from the radio
+- User reported: "for over 1.5 hours, NOT A SINGLE message has arrived"
+
+### The Solution (msgs_subscribe)
+
+User insight: **"In interactive mode, `msg_subscribe` enables displaying messages the moment they arrive"**
+
+When `msgs_subscribe` is active in interactive mode:
+- meshcli listens for message events from the radio
+- New messages are immediately printed to stdout
+- No polling needed - true event-driven architecture
+
+### How It Works
+
+1. Session init sends `msgs_subscribe\n` to stdin
+2. meshcli subscribes to radio message events
+3. When new message arrives:
+   - meshcli writes message to `.msgs` file
+   - meshcli prints message to stdout (captured by `_read_stdout` thread)
+4. mc-webui detects change in `.msgs` file (file watcher or periodic stat check)
+5. UI updates in real-time
+
+## Watchdog and Auto-restart
+
+The watchdog thread monitors process health:
+
+```python
+def _watchdog(self):
+    while not self.shutdown_flag.is_set():
+        time.sleep(5)
+
+        if self.process and self.process.poll() is not None:
+            logger.error(f"meshcli process died (exit code: {self.process.returncode})")
+
+            # Cancel all pending commands
+            with self.pending_lock:
+                for cmd_id, resp_dict in self.pending_commands.items():
+                    resp_dict["error"] = "meshcli process crashed"
+                    resp_dict["done"] = True
+                    resp_dict["event"].set()
+                self.pending_commands.clear()
+
+            # Restart
+            self._start_session()
+```
+
+**Benefits:**
+- Automatic recovery from crashes
+- No manual intervention required
+- Pending commands receive error responses instead of hanging
+
+## Thread Safety
+
+### Locks Used
+
+1. **`self.pending_lock`** - Protects `pending_commands` dict and `current_cmd_id`
+2. **`self.process_lock`** - Protects process handle (currently unused, reserved for future)
+
+### Thread-safe Data Structures
+
+- **`queue.Queue()`** - Thread-safe command queue (built-in locking)
+
+## Docker Configuration Changes
+
+### Environment Variables Added
+
+```yaml
+# docker-compose.yml
+meshcore-bridge:
+  environment:
+    - MC_CONFIG_DIR=/root/.config/meshcore  # For advert log path
+    - MC_DEVICE_NAME=${MC_DEVICE_NAME}  # For .adverts.jsonl filename
+    - TZ=${TZ:-UTC}  # Configurable timezone
+```
+
+### .env Configuration
+
+```bash
+# .env
+TZ=Europe/Warsaw  # Timezone for container logs (default: UTC)
+```
+
+## Benefits of Persistent Session
+
+### Immediate Benefits
+
+1. **Real-time Messages** - `msgs_subscribe` enables instant message reception
+2. **Advert Logging** - Network advertisements logged to `.adverts.jsonl`
+3. **Better Stability** - Single USB session, no serial port conflicts
+4. **Lower Latency** - No process spawn/teardown overhead
+
+### Future Possibilities
+
+The persistent session enables advanced features that were impossible before:
+
+1. **Pending Contact Management**
+   ```bash
+   set manual_add_contacts on  # Disable auto-add
+   pending_contacts  # List pending contact requests
+   add_pending  # Approve specific contact
+   ```
+
+2. **Interactive Configuration**
+   ```bash
+   set