# Persistent meshcli Session Architecture - Technical Notes ## Overview This document describes the architectural refactor from per-request subprocess spawning to a **persistent meshcli session** in the `meshcore-bridge` container. This fundamental change enables real-time message reception, advert logging, and advanced features like pending contact management. ## Previous Architecture (Before Refactor) ### How it Worked The original `meshcore-bridge` implementation used **subprocess.run()** for each HTTP request: ```python def run_meshcli_command(args, timeout=DEFAULT_TIMEOUT): result = subprocess.run( ['meshcli', '-s', MC_SERIAL_PORT] + args, capture_output=True, text=True, timeout=timeout ) return result ``` ### Limitations 1. **Serial Port Conflicts** - Each command spawned a new meshcli process, risking USB device locking 2. **No Real-time Messages** - Required periodic `recv` polling (inefficient, 30-60s delay) 3. **No Advert Logging** - JSON adverts from the mesh network were discarded 4. **No Interactive Features** - Commands like `msgs_subscribe` or `manual_add_contacts` require persistent session 5. **Higher Overhead** - Process spawn/teardown for every command added latency ### Why Change Was Needed User reported: **"od czasu zmian, czyli od ponad 1.5 godziny, nie dotarła ANI JEDNA wiadomość"** In non-interactive mode (subprocess.run), meshcli doesn't automatically receive new messages. The `recv` command only reads what's already in the `.msgs` file, it doesn't fetch NEW messages from the radio. ## New Architecture (Persistent Session) ### Core Concept Instead of spawning a new process per request, the bridge maintains a **single long-lived meshcli process** with: - **stdin pipe** - Send commands - **stdout pipe** - Receive responses and adverts - **stderr pipe** - Monitor errors ### Key Components #### 1. MeshCLISession Class The `MeshCLISession` class encapsulates the entire persistent session: ```python class MeshCLISession: def __init__(self, serial_port, config_dir, device_name): self.process = subprocess.Popen( ['meshcli', '-s', serial_port], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, bufsize=1 # Line-buffered ) ``` #### 2. Worker Threads (4 Concurrent Threads) **a) stdout_thread** - Reads stdout line-by-line - Parses each line as JSON - If `payload_typename == "ADVERT"` → log to `.adverts.jsonl` - Otherwise → append to current CLI command response buffer **b) stderr_thread** - Reads stderr and logs errors - Monitors `meshcli stderr: ...` messages - TTY errors are harmless (meshcli tries to use terminal features that don't exist in pipes) **c) stdin_thread** - Sends queued commands to stdin - Pulls commands from thread-safe `queue.Queue` - Writes to `process.stdin` - Starts timeout monitor thread for each command **d) watchdog_thread** - Monitors process health - Checks `process.poll()` every 5 seconds - If process crashed → cancels pending commands, restarts session #### 3. Command Queue System Commands are executed serially through a thread-safe queue: ```python self.command_queue = queue.Queue() # Client calls execute_command() self.command_queue.put((cmd_id, command, event, response_dict)) # stdin_thread pulls from queue cmd_id, command, event, response_dict = self.command_queue.get(timeout=1.0) ``` #### 4. Event-based Synchronization Each command gets a `threading.Event` for completion notification: ```python event = threading.Event() response_dict = { "event": event, "response": [], "done": False, "error": None, "last_line_time": time.time() } # Queue command self.command_queue.put((cmd_id, command, event, response_dict)) # Wait for completion if not event.wait(timeout): return {'success': False, 'stderr': 'Command timeout'} ``` #### 5. Timeout-based Response Detection Since meshcli doesn't provide end-of-response markers, we use **idle timeout detection**: - Monitor `last_line_time` timestamp for each command - If no new lines arrive for **300ms** → command is complete - `event.set()` signals completion to waiting client ```python def _monitor_response_timeout(self, cmd_id, response_dict, event, timeout_ms=300): while not self.shutdown_flag.is_set(): time.sleep(timeout_ms / 1000.0) with self.pending_lock: time_since_last_line = time.time() - response_dict["last_line_time"] if time_since_last_line >= (timeout_ms / 1000.0): logger.info(f"Command [{cmd_id}] completed (timeout-based)") response_dict["done"] = True event.set() return ``` ### Session Initialization Commands On startup, the bridge configures the meshcli session: ```python def _init_session_settings(self): self.process.stdin.write('set json_log_rx on\n') self.process.stdin.write('set print_adverts on\n') self.process.stdin.write('msgs_subscribe\n') self.process.stdin.flush() ``` #### Command Breakdown: 1. **`set json_log_rx on`** - Enable JSON output for received messages 2. **`set print_adverts on`** - Print advertisement frames to stdout 3. **`msgs_subscribe`** - Subscribe to real-time message events (critical for instant message reception!) ### Multiplexing Logic The `_read_stdout()` thread routes each line to the correct destination: ```python def _read_stdout(self): for line in iter(self.process.stdout.readline, ''): line = line.rstrip('\n\r') # Try to parse as JSON advert if self._is_advert_json(line): self._log_advert(line) # → .adverts.jsonl continue # Otherwise, append to current CLI response self._append_to_current_response(line) # → HTTP response ``` ### Advert Logging JSON adverts are logged to `{device_name}.adverts.jsonl`: ```python def _log_advert(self, json_line): data = json.loads(json_line) data["ts"] = time.time() # Add timestamp with open(self.advert_log_path, 'a', encoding='utf-8') as f: f.write(json.dumps(data, ensure_ascii=False) + '\n') ``` **File format**: JSON Lines (.jsonl) - one JSON object per line: ```json {"payload_typename":"ADVERT","from_id":"abc123",...,"ts":1735425678.123} {"payload_typename":"ADVERT","from_id":"def456",...,"ts":1735425680.456} ``` ## Command Argument Quoting meshcli in interactive mode requires proper quoting for arguments with spaces: ```python def execute_command(self, args, timeout=DEFAULT_TIMEOUT): quoted_args = [] for arg in args: # If argument contains spaces or special chars, wrap in double quotes if ' ' in arg or '"' in arg or "'" in arg: escaped = arg.replace('"', '\\"') quoted_args.append(f'"{escaped}"') else: quoted_args.append(arg) command = ' '.join(quoted_args) ``` **Why not shlex.quote()?** - `shlex.quote()` uses single quotes (`'message'`) - meshcli treats single quotes literally, so they appear in sent messages - **Solution**: Custom double-quote wrapping with escaped internal double quotes ## Real-time Message Reception ### The Problem (Before msgs_subscribe) With periodic `recv` polling: - `recv` command only reads from `.msgs` file - It doesn't fetch NEW messages from the radio - User reported: "od ponad 1.5 godziny, nie dotarła ANI JEDNA wiadomość" ### The Solution (msgs_subscribe) User insight: **"W trybie interaktywnym, `msg_subscribe` włącza wyświetlanie wiadomości w momencie ich nadejścia"** When `msgs_subscribe` is active in interactive mode: - meshcli listens for message events from the radio - New messages are immediately printed to stdout - No polling needed - true event-driven architecture ### How It Works 1. Session init sends `msgs_subscribe\n` to stdin 2. meshcli subscribes to radio message events 3. When new message arrives: - meshcli writes message to `.msgs` file - meshcli prints message to stdout (captured by `_read_stdout` thread) 4. mc-webui detects change in `.msgs` file (file watcher or periodic stat check) 5. UI updates in real-time ## Watchdog and Auto-restart The watchdog thread monitors process health: ```python def _watchdog(self): while not self.shutdown_flag.is_set(): time.sleep(5) if self.process and self.process.poll() is not None: logger.error(f"meshcli process died (exit code: {self.process.returncode})") # Cancel all pending commands with self.pending_lock: for cmd_id, resp_dict in self.pending_commands.items(): resp_dict["error"] = "meshcli process crashed" resp_dict["done"] = True resp_dict["event"].set() self.pending_commands.clear() # Restart self._start_session() ``` **Benefits:** - Automatic recovery from crashes - No manual intervention required - Pending commands receive error responses instead of hanging ## Thread Safety ### Locks Used 1. **`self.pending_lock`** - Protects `pending_commands` dict and `current_cmd_id` 2. **`self.process_lock`** - Protects process handle (currently unused, reserved for future) ### Thread-safe Data Structures - **`queue.Queue()`** - Thread-safe command queue (built-in locking) ## Docker Configuration Changes ### Environment Variables Added ```yaml # docker-compose.yml meshcore-bridge: environment: - MC_CONFIG_DIR=/root/.config/meshcore # For advert log path - MC_DEVICE_NAME=${MC_DEVICE_NAME} # For .adverts.jsonl filename - TZ=${TZ:-UTC} # Configurable timezone ``` ### .env Configuration ```bash # .env TZ=Europe/Warsaw # Timezone for container logs (default: UTC) ``` ## Benefits of Persistent Session ### Immediate Benefits 1. **Real-time Messages** - `msgs_subscribe` enables instant message reception 2. **Advert Logging** - Network advertisements logged to `.adverts.jsonl` 3. **Better Stability** - Single USB session, no serial port conflicts 4. **Lower Latency** - No process spawn/teardown overhead ### Future Possibilities The persistent session enables advanced features that were impossible before: 1. **Pending Contact Management** ```bash set manual_add_contacts on # Disable auto-add pending_contacts # List pending contact requests add_pending # Approve specific contact ``` 2. **Interactive Configuration** ```bash set