Merge pull request #7 from pe1hvh/feature/debug

feat: pretty-print debug output for dicts and arrays
This commit is contained in:
pe1hvh
2026-02-09 23:52:14 +01:00
committed by GitHub
8 changed files with 327 additions and 37 deletions

View File

@@ -8,6 +8,27 @@ Format follows [Keep a Changelog](https://keepachangelog.com/) and [Semantic Ver
---
## [5.6.0] - 2026-02-09 — SDK Event Race Condition Fix
### Fixed
- 🛠 **BLE startup delay of ~2 minutes eliminated** — The meshcore Python SDK (`commands/base.py`) dispatched device response events before `wait_for_events()` registered its subscription. On busy networks with frequent `RX_LOG_DATA` events, this caused `send_device_query()` and `get_channel()` to fail repeatedly with `no_event_received`, wasting 110+ seconds in timeouts
### Changed
- 📄 `meshcore` SDK `commands/base.py`: Rewritten `send()` method to subscribe to expected events **before** transmitting the BLE command (subscribe-before-send pattern), matching the approach used by the companion apps (meshcore.js, iOS, Android). Submitted upstream as [meshcore_py PR #52](https://github.com/meshcore-dev/meshcore_py/pull/52)
### Impact
- Startup time reduced from ~2+ minutes to ~10 seconds on busy networks
- All BLE commands (`send_device_query`, `get_channel`, `get_bat`, `send_appstart`, etc.) now succeed on first attempt instead of requiring multiple retries
- No changes to meshcore_gui code required — the fix is entirely in the meshcore SDK
### Temporary Installation
Until the fix is merged upstream, install the patched meshcore SDK:
```bash
pip install --force-reinstall git+https://github.com/PE1HVH/meshcore_py.git@fix/event-race-condition
```
---
<!-- ADDED: v5.5.2 bugfix entry -->
## [5.5.2] - 2026-02-09 — Bugfix: Bot Device Name Restoration After Restart

View File

@@ -274,13 +274,15 @@ Device info, contacts and channel keys are automatically cached to disk in `~/.m
**Channel key loading:**
BLE commands to the MeshCore device are fundamentally unreliable — `get_channel()`, `send_appstart()` and `send_device_query()` can all fail even after multiple retries. The channel key loading strategy handles this gracefully:
Channel key loading uses a cache-first strategy with BLE fallback:
1. Cached keys are loaded first and never overwritten by name-derived fallbacks
2. Each channel gets 2 quick attempts at startup (non-blocking)
2. Each channel is queried from the device at startup
3. Channels that fail are retried in the background every 30 seconds
4. Successfully loaded keys are immediately written to the cache for next startup
> **Note:** Prior to v5.6.0, BLE commands frequently timed out due to a race condition in the meshcore SDK. The patched SDK resolves this — see [Known Limitations](#known-limitations) for installation instructions.
**Contact merge strategy:**
- New contacts from the device are added to the cache with a `last_seen` timestamp
- Existing contacts are updated (fresh data wins)
@@ -381,8 +383,8 @@ The built-in bot automatically replies to messages containing recognised keyword
## Known Limitations
1. **Channels hardcoded**The `get_channel()` function in meshcore-py is unreliable via BLE (mitigated by background retry and disk caching of channel keys)
2. **BLE command unreliability**`send_appstart()`, `send_device_query()` and `get_channel()` can all fail intermittently. The application uses aggressive retries (10 attempts for device info, background retry every 30s for channel keys) and disk caching to compensate
1. **Channels hardcoded**Channel configuration is defined in `config.py` rather than auto-discovered from the device
2. **BLE command reliability**Resolved in v5.6.0. The meshcore SDK previously had a race condition where device responses were missed. The patched SDK ([PR #52](https://github.com/meshcore-dev/meshcore_py/pull/52)) uses subscribe-before-send to eliminate this. Until merged upstream, install the patched version: `pip install --force-reinstall git+https://github.com/PE1HVH/meshcore_py.git@fix/event-race-condition`
3. **Initial load time** — GUI waits for BLE data before the first render is complete (mitigated by cache: if cached data exists, the GUI populates instantly)
4. **Archive route visualization** — Route data for archived messages depends on contacts currently in memory; archived-only messages without recent contact data may show incomplete routes
<!-- ADDED: Archive-related limitation -->

File diff suppressed because one or more lines are too long

View File

@@ -1 +0,0 @@
,hans,hans-NLx0AU,09.02.2026 15:24,file:///home/hans/.config/libreoffice/4;

Binary file not shown.

View File

@@ -35,9 +35,13 @@ from typing import Optional, Set
from meshcore import MeshCore, EventType
from meshcore_gui.config import (
BLE_DEFAULT_TIMEOUT,
BLE_LIB_DEBUG,
CHANNELS_CONFIG,
CONTACT_REFRESH_SECONDS,
debug_data,
debug_print,
pp,
)
from meshcore_gui.core.protocols import SharedDataWriter
from meshcore_gui.ble.commands import CommandHandler
@@ -142,10 +146,11 @@ class BLEWorker:
self.shared.set_status(f"🔄 Connecting to {self.address}...")
try:
print(f"BLE: Connecting to {self.address}...")
self.mc = await MeshCore.create_ble(self.address)
self.mc = await MeshCore.create_ble(self.address, default_timeout=BLE_DEFAULT_TIMEOUT, debug=BLE_LIB_DEBUG)
print("BLE: Connected!")
await asyncio.sleep(1)
debug_print("Post-connection sleep done, wiring collaborators")
# Wire collaborators now that mc is available
self._evt_handler = EventHandler(
@@ -242,34 +247,76 @@ class BLEWorker:
"""Load device info, channels and contacts from device.
Updates both SharedData (for GUI) and the disk cache.
Uses longer delays between retries because BLE command/response
over the meshcore library is unreliable with short intervals.
Key insight: ``MeshCore.connect()`` already sends ``send_appstart``
internally and stores the result in ``self.mc.self_info``. We reuse
that instead of sending a duplicate command that is likely to fail
on a busy mesh network. Only ``send_device_query`` needs a fresh
BLE round-trip.
"""
# send_appstart (retries with longer delays)
# ----------------------------------------------------------
# send_appstart — reuse result from MeshCore.connect()
# ----------------------------------------------------------
self.shared.set_status("🔄 Device info...")
appstart_ok = False
for i in range(10):
debug_print(f"send_appstart attempt {i + 1}/10")
try:
r = await self.mc.commands.send_appstart()
if r.type != EventType.ERROR:
print(f"BLE: send_appstart OK: {r.payload.get('name')} (attempt {i + 1})")
self.shared.update_from_appstart(r.payload)
self._cache.set_device(r.payload)
appstart_ok = True
break
except Exception as exc:
debug_print(f"send_appstart attempt {i + 1} exception: {exc}")
await asyncio.sleep(1.0)
if not appstart_ok:
print("BLE: ⚠️ send_appstart failed after 10 attempts")
cached_info = self.mc.self_info # Filled by connect() → send_appstart()
if cached_info and cached_info.get("name"):
print(f"BLE: send_appstart OK (from connect): {cached_info.get('name')}")
self.shared.update_from_appstart(cached_info)
self._cache.set_device(cached_info)
else:
# Fallback: device info not populated by connect() — retry manually
debug_print(
"self_info empty after connect(), falling back to manual send_appstart"
)
appstart_ok = False
for i in range(3):
debug_print(f"send_appstart fallback attempt {i + 1}/3")
try:
r = await self.mc.commands.send_appstart()
if r is None:
debug_print(
f"send_appstart fallback {i + 1}: received None, retrying"
)
await asyncio.sleep(2.0)
continue
if r.type != EventType.ERROR:
print(
f"BLE: send_appstart OK: {r.payload.get('name')} "
f"(fallback attempt {i + 1})"
)
self.shared.update_from_appstart(r.payload)
self._cache.set_device(r.payload)
appstart_ok = True
break
else:
debug_print(
f"send_appstart fallback {i + 1}: "
f"ERROR — payload={pp(r.payload)}"
)
except Exception as exc:
debug_print(f"send_appstart fallback {i + 1} exception: {exc}")
await asyncio.sleep(2.0)
# send_device_query (retries)
for i in range(10):
debug_print(f"send_device_query attempt {i + 1}/10")
if not appstart_ok:
print("BLE: ⚠️ send_appstart failed after 3 fallback attempts")
# ----------------------------------------------------------
# send_device_query — no internal cache, must query device
# Fewer attempts (5) with longer delays (2s) to give the
# firmware time to process between mesh traffic bursts.
# ----------------------------------------------------------
for i in range(5):
debug_print(f"send_device_query attempt {i + 1}/5")
try:
r = await self.mc.commands.send_device_query()
if r is None:
debug_print(
f"send_device_query attempt {i + 1}: "
f"received None response, retrying"
)
await asyncio.sleep(2.0)
continue
if r.type != EventType.ERROR:
fw = r.payload.get("ver", "")
print(f"BLE: send_device_query OK: {fw} (attempt {i + 1})")
@@ -277,21 +324,39 @@ class BLEWorker:
if fw:
self._cache.set_firmware_version(fw)
break
else:
debug_print(
f"send_device_query attempt {i + 1}: "
f"ERROR response — payload={pp(r.payload)}"
)
except Exception as exc:
debug_print(f"send_device_query attempt {i + 1} exception: {exc}")
await asyncio.sleep(1.0)
await asyncio.sleep(2.0)
# ----------------------------------------------------------
# Channels (hardcoded — BLE get_channel is unreliable)
# ----------------------------------------------------------
self.shared.set_status("🔄 Channels...")
self.shared.set_channels(CHANNELS_CONFIG)
self._cache.set_channels(CHANNELS_CONFIG)
print(f"BLE: Channels loaded: {[c['name'] for c in CHANNELS_CONFIG]}")
# ----------------------------------------------------------
# Contacts (merge with cache)
# ----------------------------------------------------------
self.shared.set_status("🔄 Contacts...")
debug_print("get_contacts starting")
try:
r = await self.mc.commands.get_contacts()
if r.type != EventType.ERROR:
debug_print(f"get_contacts result: type={r.type if r else None}")
if r and r.payload:
debug_data("get_contacts payload", r.payload)
if r is None:
debug_print(
"BLE: get_contacts returned None, "
"keeping cached contacts"
)
elif r.type != EventType.ERROR:
merged = self._cache.merge_contacts(r.payload)
self.shared.set_contacts(merged)
print(
@@ -299,7 +364,10 @@ class BLEWorker:
f"{len(merged)} total (with cache)"
)
else:
debug_print("BLE: get_contacts failed, keeping cached contacts")
debug_print(
"BLE: get_contacts failed — "
f"payload={pp(r.payload)}, keeping cached contacts"
)
except Exception as exc:
debug_print(f"BLE: get_contacts exception: {exc}")
@@ -375,10 +443,18 @@ class BLEWorker:
try:
r = await self.mc.commands.get_channel(idx)
if r is None:
debug_print(
f"get_channel({idx}) attempt {attempt + 1}/{max_attempts}: "
f"received None response, retrying"
)
await asyncio.sleep(delay)
continue
if r.type == EventType.ERROR:
debug_print(
f"get_channel({idx}) attempt {attempt + 1}/{max_attempts}: "
f"ERROR response"
f"ERROR response — payload={pp(r.payload)}"
)
await asyncio.sleep(delay)
continue
@@ -489,6 +565,9 @@ class BLEWorker:
"""Periodic background contact refresh — merge new/changed."""
try:
r = await self.mc.commands.get_contacts()
if r is None:
debug_print("Periodic refresh: get_contacts returned None, skipping")
return
if r.type != EventType.ERROR:
merged = self._cache.merge_contacts(r.payload)
self.shared.set_contacts(merged)

View File

@@ -7,9 +7,39 @@ UI display constants live in :mod:`meshcore_gui.gui.constants`.
The ``DEBUG`` flag defaults to False and can be activated at startup
with the ``--debug-on`` command-line option.
Debug output is written to both stdout and a rotating log file at
``~/.meshcore-gui/logs/meshcore_gui.log``.
"""
from typing import Dict, List
import json
import logging
import sys
from logging.handlers import RotatingFileHandler
from pathlib import Path
from typing import Any, Dict, List
# ==============================================================================
# DIRECTORY STRUCTURE
# ==============================================================================
# Base data directory — all persistent data lives under this root.
# Existing services (cache, pins, archive) each define their own
# sub-directory; this constant centralises the root for new consumers.
DATA_DIR: Path = Path.home() / ".meshcore-gui"
# Log directory for debug and error log files.
LOG_DIR: Path = DATA_DIR / "logs"
# Log file path (rotating: max 5 MB per file, 3 backups = 20 MB total).
LOG_FILE: Path = LOG_DIR / "meshcore_gui.log"
# Maximum size per log file in bytes (5 MB).
LOG_MAX_BYTES: int = 5 * 1024 * 1024
# Number of rotated backup files to keep.
LOG_BACKUP_COUNT: int = 3
# ==============================================================================
@@ -18,11 +48,162 @@ from typing import Dict, List
DEBUG: bool = False
# Internal file logger — initialised lazily on first debug_print() call.
_file_logger: logging.Logger | None = None
def _init_file_logger() -> logging.Logger:
"""Create and configure the rotating file logger (called once)."""
LOG_DIR.mkdir(parents=True, exist_ok=True)
logger = logging.getLogger("meshcore_gui.debug")
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = RotatingFileHandler(
LOG_FILE,
maxBytes=LOG_MAX_BYTES,
backupCount=LOG_BACKUP_COUNT,
encoding="utf-8",
)
handler.setFormatter(
logging.Formatter("%(asctime)s %(message)s", datefmt="%Y-%m-%d %H:%M:%S")
)
logger.addHandler(handler)
return logger
def _caller_module() -> str:
"""Return a short module label for the calling code.
Walks two frames up (debug_print -> caller) and extracts the
module ``__name__``. The common ``meshcore_gui.`` prefix is
stripped for brevity, e.g. ``ble.worker`` instead of
``meshcore_gui.ble.worker``.
"""
frame = sys._getframe(2) # 0=_caller_module, 1=debug_print, 2=actual caller
module = frame.f_globals.get("__name__", "<unknown>")
if module.startswith("meshcore_gui."):
module = module[len("meshcore_gui."):]
return module
def _init_meshcore_logger() -> None:
"""Route meshcore library debug output to our rotating log file.
The meshcore library uses ``logging.getLogger("meshcore")`` throughout,
but never attaches a handler. Without this function all library-level
debug output (raw BLE send/receive, event dispatching, command flow)
is silently dropped because Python's root logger only forwards
WARNING and above.
Call once at startup (or lazily from ``debug_print``) so that
``BLE_LIB_DEBUG=True`` actually produces visible output.
"""
LOG_DIR.mkdir(parents=True, exist_ok=True)
mc_logger = logging.getLogger("meshcore")
# Guard against duplicate handlers on repeated calls
if any(isinstance(h, RotatingFileHandler) for h in mc_logger.handlers):
return
handler = RotatingFileHandler(
LOG_FILE,
maxBytes=LOG_MAX_BYTES,
backupCount=LOG_BACKUP_COUNT,
encoding="utf-8",
)
handler.setFormatter(
logging.Formatter(
"%(asctime)s LIB [%(name)s]: %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
)
)
mc_logger.addHandler(handler)
# Also add a stdout handler so library output appears in the console
stdout_handler = logging.StreamHandler(sys.stdout)
stdout_handler.setFormatter(
logging.Formatter(
"%(asctime)s LIB [%(name)s]: %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
)
)
mc_logger.addHandler(stdout_handler)
def debug_print(msg: str) -> None:
"""Print a debug message when ``DEBUG`` is enabled."""
if DEBUG:
print(f"DEBUG: {msg}")
"""Print a debug message when ``DEBUG`` is enabled.
Output goes to both stdout and the rotating log file.
The calling module name is automatically included so that
exception context is immediately clear, e.g.::
DEBUG [ble.worker]: send_appstart attempt 3 exception: TimeoutError
"""
global _file_logger
if not DEBUG:
return
module = _caller_module()
formatted = f"DEBUG [{module}]: {msg}"
# stdout (existing behaviour, now with module tag)
print(formatted)
# Rotating log file
if _file_logger is None:
_file_logger = _init_file_logger()
# Also wire up the meshcore library logger so BLE_LIB_DEBUG
# output actually appears in the same log file + stdout.
_init_meshcore_logger()
_file_logger.debug(formatted)
def pp(obj: Any, indent: int = 2) -> str:
"""Pretty-format a dict, list, or other object for debug output.
Use inside f-strings::
debug_print(f"payload={pp(r.payload)}")
Dicts/lists get indented JSON; everything else falls back to repr().
"""
if isinstance(obj, (dict, list)):
try:
return json.dumps(obj, indent=indent, default=str, ensure_ascii=False)
except (TypeError, ValueError):
return repr(obj)
return repr(obj)
def debug_data(label: str, obj: Any) -> None:
"""Print a labelled data structure with pretty indentation.
Combines a header line with pretty-printed data below it::
debug_data("get_contacts result", r.payload)
Output::
DEBUG [ble.worker]: get_contacts result ↓
{
"name": "PE1HVH",
"contacts": 629,
...
}
"""
if not DEBUG:
return
formatted = pp(obj)
# Single-line values stay on the same line
if '\n' not in formatted:
debug_print(f"{label}: {formatted}")
else:
# Multi-line: indent each line for readability
indented = '\n'.join(f" {line}" for line in formatted.splitlines())
debug_print(f"{label}\n{indented}")
# ==============================================================================
@@ -61,6 +242,14 @@ DEVICE_NAME: str = "PE1HVH T1000e"
# CACHE / REFRESH
# ==============================================================================
# Default timeout (seconds) for BLE command responses.
# Increase if you see frequent 'no_event_received' errors during startup.
BLE_DEFAULT_TIMEOUT: float = 10.0
# Enable debug logging inside the meshcore BLE library itself.
# When True, raw BLE send/receive data and event parsing are logged.
BLE_LIB_DEBUG: bool = True
# Interval in seconds between periodic contact refreshes from the device.
# Contacts are merged (new/changed contacts update the cache; contacts
# only present in cache are kept so offline nodes are preserved).

1
requirements.txt Normal file
View File

@@ -0,0 +1 @@
meshcore @ git+https://github.com/PE1HVH/meshcore_py.git@fix/event-race-condition