mirror of
https://github.com/pe1hvh/meshcore-gui.git
synced 2026-05-02 11:32:26 +02:00
Merge pull request #7 from pe1hvh/feature/debug
feat: pretty-print debug output for dicts and arrays
This commit is contained in:
21
CHANGELOG.md
21
CHANGELOG.md
@@ -8,6 +8,27 @@ Format follows [Keep a Changelog](https://keepachangelog.com/) and [Semantic Ver
|
||||
|
||||
---
|
||||
|
||||
## [5.6.0] - 2026-02-09 — SDK Event Race Condition Fix
|
||||
|
||||
### Fixed
|
||||
- 🛠 **BLE startup delay of ~2 minutes eliminated** — The meshcore Python SDK (`commands/base.py`) dispatched device response events before `wait_for_events()` registered its subscription. On busy networks with frequent `RX_LOG_DATA` events, this caused `send_device_query()` and `get_channel()` to fail repeatedly with `no_event_received`, wasting 110+ seconds in timeouts
|
||||
|
||||
### Changed
|
||||
- 📄 `meshcore` SDK `commands/base.py`: Rewritten `send()` method to subscribe to expected events **before** transmitting the BLE command (subscribe-before-send pattern), matching the approach used by the companion apps (meshcore.js, iOS, Android). Submitted upstream as [meshcore_py PR #52](https://github.com/meshcore-dev/meshcore_py/pull/52)
|
||||
|
||||
### Impact
|
||||
- Startup time reduced from ~2+ minutes to ~10 seconds on busy networks
|
||||
- All BLE commands (`send_device_query`, `get_channel`, `get_bat`, `send_appstart`, etc.) now succeed on first attempt instead of requiring multiple retries
|
||||
- No changes to meshcore_gui code required — the fix is entirely in the meshcore SDK
|
||||
|
||||
### Temporary Installation
|
||||
Until the fix is merged upstream, install the patched meshcore SDK:
|
||||
```bash
|
||||
pip install --force-reinstall git+https://github.com/PE1HVH/meshcore_py.git@fix/event-race-condition
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
<!-- ADDED: v5.5.2 bugfix entry -->
|
||||
|
||||
## [5.5.2] - 2026-02-09 — Bugfix: Bot Device Name Restoration After Restart
|
||||
|
||||
10
README.md
10
README.md
@@ -274,13 +274,15 @@ Device info, contacts and channel keys are automatically cached to disk in `~/.m
|
||||
|
||||
**Channel key loading:**
|
||||
|
||||
BLE commands to the MeshCore device are fundamentally unreliable — `get_channel()`, `send_appstart()` and `send_device_query()` can all fail even after multiple retries. The channel key loading strategy handles this gracefully:
|
||||
Channel key loading uses a cache-first strategy with BLE fallback:
|
||||
|
||||
1. Cached keys are loaded first and never overwritten by name-derived fallbacks
|
||||
2. Each channel gets 2 quick attempts at startup (non-blocking)
|
||||
2. Each channel is queried from the device at startup
|
||||
3. Channels that fail are retried in the background every 30 seconds
|
||||
4. Successfully loaded keys are immediately written to the cache for next startup
|
||||
|
||||
> **Note:** Prior to v5.6.0, BLE commands frequently timed out due to a race condition in the meshcore SDK. The patched SDK resolves this — see [Known Limitations](#known-limitations) for installation instructions.
|
||||
|
||||
**Contact merge strategy:**
|
||||
- New contacts from the device are added to the cache with a `last_seen` timestamp
|
||||
- Existing contacts are updated (fresh data wins)
|
||||
@@ -381,8 +383,8 @@ The built-in bot automatically replies to messages containing recognised keyword
|
||||
|
||||
## Known Limitations
|
||||
|
||||
1. **Channels hardcoded** — The `get_channel()` function in meshcore-py is unreliable via BLE (mitigated by background retry and disk caching of channel keys)
|
||||
2. **BLE command unreliability** — `send_appstart()`, `send_device_query()` and `get_channel()` can all fail intermittently. The application uses aggressive retries (10 attempts for device info, background retry every 30s for channel keys) and disk caching to compensate
|
||||
1. **Channels hardcoded** — Channel configuration is defined in `config.py` rather than auto-discovered from the device
|
||||
2. **BLE command reliability** — Resolved in v5.6.0. The meshcore SDK previously had a race condition where device responses were missed. The patched SDK ([PR #52](https://github.com/meshcore-dev/meshcore_py/pull/52)) uses subscribe-before-send to eliminate this. Until merged upstream, install the patched version: `pip install --force-reinstall git+https://github.com/PE1HVH/meshcore_py.git@fix/event-race-condition`
|
||||
3. **Initial load time** — GUI waits for BLE data before the first render is complete (mitigated by cache: if cached data exists, the GUI populates instantly)
|
||||
4. **Archive route visualization** — Route data for archived messages depends on contacts currently in memory; archived-only messages without recent contact data may show incomplete routes
|
||||
<!-- ADDED: Archive-related limitation -->
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -1 +0,0 @@
|
||||
,hans,hans-NLx0AU,09.02.2026 15:24,file:///home/hans/.config/libreoffice/4;
|
||||
Binary file not shown.
@@ -35,9 +35,13 @@ from typing import Optional, Set
|
||||
from meshcore import MeshCore, EventType
|
||||
|
||||
from meshcore_gui.config import (
|
||||
BLE_DEFAULT_TIMEOUT,
|
||||
BLE_LIB_DEBUG,
|
||||
CHANNELS_CONFIG,
|
||||
CONTACT_REFRESH_SECONDS,
|
||||
debug_data,
|
||||
debug_print,
|
||||
pp,
|
||||
)
|
||||
from meshcore_gui.core.protocols import SharedDataWriter
|
||||
from meshcore_gui.ble.commands import CommandHandler
|
||||
@@ -142,10 +146,11 @@ class BLEWorker:
|
||||
self.shared.set_status(f"🔄 Connecting to {self.address}...")
|
||||
try:
|
||||
print(f"BLE: Connecting to {self.address}...")
|
||||
self.mc = await MeshCore.create_ble(self.address)
|
||||
self.mc = await MeshCore.create_ble(self.address, default_timeout=BLE_DEFAULT_TIMEOUT, debug=BLE_LIB_DEBUG)
|
||||
print("BLE: Connected!")
|
||||
|
||||
await asyncio.sleep(1)
|
||||
debug_print("Post-connection sleep done, wiring collaborators")
|
||||
|
||||
# Wire collaborators now that mc is available
|
||||
self._evt_handler = EventHandler(
|
||||
@@ -242,34 +247,76 @@ class BLEWorker:
|
||||
"""Load device info, channels and contacts from device.
|
||||
|
||||
Updates both SharedData (for GUI) and the disk cache.
|
||||
Uses longer delays between retries because BLE command/response
|
||||
over the meshcore library is unreliable with short intervals.
|
||||
|
||||
Key insight: ``MeshCore.connect()`` already sends ``send_appstart``
|
||||
internally and stores the result in ``self.mc.self_info``. We reuse
|
||||
that instead of sending a duplicate command that is likely to fail
|
||||
on a busy mesh network. Only ``send_device_query`` needs a fresh
|
||||
BLE round-trip.
|
||||
"""
|
||||
# send_appstart (retries with longer delays)
|
||||
# ----------------------------------------------------------
|
||||
# send_appstart — reuse result from MeshCore.connect()
|
||||
# ----------------------------------------------------------
|
||||
self.shared.set_status("🔄 Device info...")
|
||||
appstart_ok = False
|
||||
for i in range(10):
|
||||
debug_print(f"send_appstart attempt {i + 1}/10")
|
||||
try:
|
||||
r = await self.mc.commands.send_appstart()
|
||||
if r.type != EventType.ERROR:
|
||||
print(f"BLE: send_appstart OK: {r.payload.get('name')} (attempt {i + 1})")
|
||||
self.shared.update_from_appstart(r.payload)
|
||||
self._cache.set_device(r.payload)
|
||||
appstart_ok = True
|
||||
break
|
||||
except Exception as exc:
|
||||
debug_print(f"send_appstart attempt {i + 1} exception: {exc}")
|
||||
await asyncio.sleep(1.0)
|
||||
|
||||
if not appstart_ok:
|
||||
print("BLE: ⚠️ send_appstart failed after 10 attempts")
|
||||
cached_info = self.mc.self_info # Filled by connect() → send_appstart()
|
||||
if cached_info and cached_info.get("name"):
|
||||
print(f"BLE: send_appstart OK (from connect): {cached_info.get('name')}")
|
||||
self.shared.update_from_appstart(cached_info)
|
||||
self._cache.set_device(cached_info)
|
||||
else:
|
||||
# Fallback: device info not populated by connect() — retry manually
|
||||
debug_print(
|
||||
"self_info empty after connect(), falling back to manual send_appstart"
|
||||
)
|
||||
appstart_ok = False
|
||||
for i in range(3):
|
||||
debug_print(f"send_appstart fallback attempt {i + 1}/3")
|
||||
try:
|
||||
r = await self.mc.commands.send_appstart()
|
||||
if r is None:
|
||||
debug_print(
|
||||
f"send_appstart fallback {i + 1}: received None, retrying"
|
||||
)
|
||||
await asyncio.sleep(2.0)
|
||||
continue
|
||||
if r.type != EventType.ERROR:
|
||||
print(
|
||||
f"BLE: send_appstart OK: {r.payload.get('name')} "
|
||||
f"(fallback attempt {i + 1})"
|
||||
)
|
||||
self.shared.update_from_appstart(r.payload)
|
||||
self._cache.set_device(r.payload)
|
||||
appstart_ok = True
|
||||
break
|
||||
else:
|
||||
debug_print(
|
||||
f"send_appstart fallback {i + 1}: "
|
||||
f"ERROR — payload={pp(r.payload)}"
|
||||
)
|
||||
except Exception as exc:
|
||||
debug_print(f"send_appstart fallback {i + 1} exception: {exc}")
|
||||
await asyncio.sleep(2.0)
|
||||
|
||||
# send_device_query (retries)
|
||||
for i in range(10):
|
||||
debug_print(f"send_device_query attempt {i + 1}/10")
|
||||
if not appstart_ok:
|
||||
print("BLE: ⚠️ send_appstart failed after 3 fallback attempts")
|
||||
|
||||
# ----------------------------------------------------------
|
||||
# send_device_query — no internal cache, must query device
|
||||
# Fewer attempts (5) with longer delays (2s) to give the
|
||||
# firmware time to process between mesh traffic bursts.
|
||||
# ----------------------------------------------------------
|
||||
for i in range(5):
|
||||
debug_print(f"send_device_query attempt {i + 1}/5")
|
||||
try:
|
||||
r = await self.mc.commands.send_device_query()
|
||||
if r is None:
|
||||
debug_print(
|
||||
f"send_device_query attempt {i + 1}: "
|
||||
f"received None response, retrying"
|
||||
)
|
||||
await asyncio.sleep(2.0)
|
||||
continue
|
||||
if r.type != EventType.ERROR:
|
||||
fw = r.payload.get("ver", "")
|
||||
print(f"BLE: send_device_query OK: {fw} (attempt {i + 1})")
|
||||
@@ -277,21 +324,39 @@ class BLEWorker:
|
||||
if fw:
|
||||
self._cache.set_firmware_version(fw)
|
||||
break
|
||||
else:
|
||||
debug_print(
|
||||
f"send_device_query attempt {i + 1}: "
|
||||
f"ERROR response — payload={pp(r.payload)}"
|
||||
)
|
||||
except Exception as exc:
|
||||
debug_print(f"send_device_query attempt {i + 1} exception: {exc}")
|
||||
await asyncio.sleep(1.0)
|
||||
await asyncio.sleep(2.0)
|
||||
|
||||
# ----------------------------------------------------------
|
||||
# Channels (hardcoded — BLE get_channel is unreliable)
|
||||
# ----------------------------------------------------------
|
||||
self.shared.set_status("🔄 Channels...")
|
||||
self.shared.set_channels(CHANNELS_CONFIG)
|
||||
self._cache.set_channels(CHANNELS_CONFIG)
|
||||
print(f"BLE: Channels loaded: {[c['name'] for c in CHANNELS_CONFIG]}")
|
||||
|
||||
# ----------------------------------------------------------
|
||||
# Contacts (merge with cache)
|
||||
# ----------------------------------------------------------
|
||||
self.shared.set_status("🔄 Contacts...")
|
||||
debug_print("get_contacts starting")
|
||||
try:
|
||||
r = await self.mc.commands.get_contacts()
|
||||
if r.type != EventType.ERROR:
|
||||
debug_print(f"get_contacts result: type={r.type if r else None}")
|
||||
if r and r.payload:
|
||||
debug_data("get_contacts payload", r.payload)
|
||||
if r is None:
|
||||
debug_print(
|
||||
"BLE: get_contacts returned None, "
|
||||
"keeping cached contacts"
|
||||
)
|
||||
elif r.type != EventType.ERROR:
|
||||
merged = self._cache.merge_contacts(r.payload)
|
||||
self.shared.set_contacts(merged)
|
||||
print(
|
||||
@@ -299,7 +364,10 @@ class BLEWorker:
|
||||
f"{len(merged)} total (with cache)"
|
||||
)
|
||||
else:
|
||||
debug_print("BLE: get_contacts failed, keeping cached contacts")
|
||||
debug_print(
|
||||
"BLE: get_contacts failed — "
|
||||
f"payload={pp(r.payload)}, keeping cached contacts"
|
||||
)
|
||||
except Exception as exc:
|
||||
debug_print(f"BLE: get_contacts exception: {exc}")
|
||||
|
||||
@@ -375,10 +443,18 @@ class BLEWorker:
|
||||
try:
|
||||
r = await self.mc.commands.get_channel(idx)
|
||||
|
||||
if r is None:
|
||||
debug_print(
|
||||
f"get_channel({idx}) attempt {attempt + 1}/{max_attempts}: "
|
||||
f"received None response, retrying"
|
||||
)
|
||||
await asyncio.sleep(delay)
|
||||
continue
|
||||
|
||||
if r.type == EventType.ERROR:
|
||||
debug_print(
|
||||
f"get_channel({idx}) attempt {attempt + 1}/{max_attempts}: "
|
||||
f"ERROR response"
|
||||
f"ERROR response — payload={pp(r.payload)}"
|
||||
)
|
||||
await asyncio.sleep(delay)
|
||||
continue
|
||||
@@ -489,6 +565,9 @@ class BLEWorker:
|
||||
"""Periodic background contact refresh — merge new/changed."""
|
||||
try:
|
||||
r = await self.mc.commands.get_contacts()
|
||||
if r is None:
|
||||
debug_print("Periodic refresh: get_contacts returned None, skipping")
|
||||
return
|
||||
if r.type != EventType.ERROR:
|
||||
merged = self._cache.merge_contacts(r.payload)
|
||||
self.shared.set_contacts(merged)
|
||||
|
||||
@@ -7,9 +7,39 @@ UI display constants live in :mod:`meshcore_gui.gui.constants`.
|
||||
|
||||
The ``DEBUG`` flag defaults to False and can be activated at startup
|
||||
with the ``--debug-on`` command-line option.
|
||||
|
||||
Debug output is written to both stdout and a rotating log file at
|
||||
``~/.meshcore-gui/logs/meshcore_gui.log``.
|
||||
"""
|
||||
|
||||
from typing import Dict, List
|
||||
import json
|
||||
import logging
|
||||
import sys
|
||||
from logging.handlers import RotatingFileHandler
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List
|
||||
|
||||
|
||||
# ==============================================================================
|
||||
# DIRECTORY STRUCTURE
|
||||
# ==============================================================================
|
||||
|
||||
# Base data directory — all persistent data lives under this root.
|
||||
# Existing services (cache, pins, archive) each define their own
|
||||
# sub-directory; this constant centralises the root for new consumers.
|
||||
DATA_DIR: Path = Path.home() / ".meshcore-gui"
|
||||
|
||||
# Log directory for debug and error log files.
|
||||
LOG_DIR: Path = DATA_DIR / "logs"
|
||||
|
||||
# Log file path (rotating: max 5 MB per file, 3 backups = 20 MB total).
|
||||
LOG_FILE: Path = LOG_DIR / "meshcore_gui.log"
|
||||
|
||||
# Maximum size per log file in bytes (5 MB).
|
||||
LOG_MAX_BYTES: int = 5 * 1024 * 1024
|
||||
|
||||
# Number of rotated backup files to keep.
|
||||
LOG_BACKUP_COUNT: int = 3
|
||||
|
||||
|
||||
# ==============================================================================
|
||||
@@ -18,11 +48,162 @@ from typing import Dict, List
|
||||
|
||||
DEBUG: bool = False
|
||||
|
||||
# Internal file logger — initialised lazily on first debug_print() call.
|
||||
_file_logger: logging.Logger | None = None
|
||||
|
||||
|
||||
def _init_file_logger() -> logging.Logger:
|
||||
"""Create and configure the rotating file logger (called once)."""
|
||||
LOG_DIR.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
logger = logging.getLogger("meshcore_gui.debug")
|
||||
logger.setLevel(logging.DEBUG)
|
||||
logger.propagate = False
|
||||
|
||||
handler = RotatingFileHandler(
|
||||
LOG_FILE,
|
||||
maxBytes=LOG_MAX_BYTES,
|
||||
backupCount=LOG_BACKUP_COUNT,
|
||||
encoding="utf-8",
|
||||
)
|
||||
handler.setFormatter(
|
||||
logging.Formatter("%(asctime)s %(message)s", datefmt="%Y-%m-%d %H:%M:%S")
|
||||
)
|
||||
logger.addHandler(handler)
|
||||
return logger
|
||||
|
||||
|
||||
def _caller_module() -> str:
|
||||
"""Return a short module label for the calling code.
|
||||
|
||||
Walks two frames up (debug_print -> caller) and extracts the
|
||||
module ``__name__``. The common ``meshcore_gui.`` prefix is
|
||||
stripped for brevity, e.g. ``ble.worker`` instead of
|
||||
``meshcore_gui.ble.worker``.
|
||||
"""
|
||||
frame = sys._getframe(2) # 0=_caller_module, 1=debug_print, 2=actual caller
|
||||
module = frame.f_globals.get("__name__", "<unknown>")
|
||||
if module.startswith("meshcore_gui."):
|
||||
module = module[len("meshcore_gui."):]
|
||||
return module
|
||||
|
||||
|
||||
def _init_meshcore_logger() -> None:
|
||||
"""Route meshcore library debug output to our rotating log file.
|
||||
|
||||
The meshcore library uses ``logging.getLogger("meshcore")`` throughout,
|
||||
but never attaches a handler. Without this function all library-level
|
||||
debug output (raw BLE send/receive, event dispatching, command flow)
|
||||
is silently dropped because Python's root logger only forwards
|
||||
WARNING and above.
|
||||
|
||||
Call once at startup (or lazily from ``debug_print``) so that
|
||||
``BLE_LIB_DEBUG=True`` actually produces visible output.
|
||||
"""
|
||||
LOG_DIR.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
mc_logger = logging.getLogger("meshcore")
|
||||
# Guard against duplicate handlers on repeated calls
|
||||
if any(isinstance(h, RotatingFileHandler) for h in mc_logger.handlers):
|
||||
return
|
||||
|
||||
handler = RotatingFileHandler(
|
||||
LOG_FILE,
|
||||
maxBytes=LOG_MAX_BYTES,
|
||||
backupCount=LOG_BACKUP_COUNT,
|
||||
encoding="utf-8",
|
||||
)
|
||||
handler.setFormatter(
|
||||
logging.Formatter(
|
||||
"%(asctime)s LIB [%(name)s]: %(message)s",
|
||||
datefmt="%Y-%m-%d %H:%M:%S",
|
||||
)
|
||||
)
|
||||
mc_logger.addHandler(handler)
|
||||
|
||||
# Also add a stdout handler so library output appears in the console
|
||||
stdout_handler = logging.StreamHandler(sys.stdout)
|
||||
stdout_handler.setFormatter(
|
||||
logging.Formatter(
|
||||
"%(asctime)s LIB [%(name)s]: %(message)s",
|
||||
datefmt="%Y-%m-%d %H:%M:%S",
|
||||
)
|
||||
)
|
||||
mc_logger.addHandler(stdout_handler)
|
||||
|
||||
|
||||
def debug_print(msg: str) -> None:
|
||||
"""Print a debug message when ``DEBUG`` is enabled."""
|
||||
if DEBUG:
|
||||
print(f"DEBUG: {msg}")
|
||||
"""Print a debug message when ``DEBUG`` is enabled.
|
||||
|
||||
Output goes to both stdout and the rotating log file.
|
||||
The calling module name is automatically included so that
|
||||
exception context is immediately clear, e.g.::
|
||||
|
||||
DEBUG [ble.worker]: send_appstart attempt 3 exception: TimeoutError
|
||||
"""
|
||||
global _file_logger
|
||||
|
||||
if not DEBUG:
|
||||
return
|
||||
|
||||
module = _caller_module()
|
||||
formatted = f"DEBUG [{module}]: {msg}"
|
||||
|
||||
# stdout (existing behaviour, now with module tag)
|
||||
print(formatted)
|
||||
|
||||
# Rotating log file
|
||||
if _file_logger is None:
|
||||
_file_logger = _init_file_logger()
|
||||
# Also wire up the meshcore library logger so BLE_LIB_DEBUG
|
||||
# output actually appears in the same log file + stdout.
|
||||
_init_meshcore_logger()
|
||||
_file_logger.debug(formatted)
|
||||
|
||||
|
||||
def pp(obj: Any, indent: int = 2) -> str:
|
||||
"""Pretty-format a dict, list, or other object for debug output.
|
||||
|
||||
Use inside f-strings::
|
||||
|
||||
debug_print(f"payload={pp(r.payload)}")
|
||||
|
||||
Dicts/lists get indented JSON; everything else falls back to repr().
|
||||
"""
|
||||
if isinstance(obj, (dict, list)):
|
||||
try:
|
||||
return json.dumps(obj, indent=indent, default=str, ensure_ascii=False)
|
||||
except (TypeError, ValueError):
|
||||
return repr(obj)
|
||||
return repr(obj)
|
||||
|
||||
|
||||
def debug_data(label: str, obj: Any) -> None:
|
||||
"""Print a labelled data structure with pretty indentation.
|
||||
|
||||
Combines a header line with pretty-printed data below it::
|
||||
|
||||
debug_data("get_contacts result", r.payload)
|
||||
|
||||
Output::
|
||||
|
||||
DEBUG [ble.worker]: get_contacts result ↓
|
||||
{
|
||||
"name": "PE1HVH",
|
||||
"contacts": 629,
|
||||
...
|
||||
}
|
||||
"""
|
||||
if not DEBUG:
|
||||
return
|
||||
formatted = pp(obj)
|
||||
# Single-line values stay on the same line
|
||||
if '\n' not in formatted:
|
||||
debug_print(f"{label}: {formatted}")
|
||||
else:
|
||||
# Multi-line: indent each line for readability
|
||||
indented = '\n'.join(f" {line}" for line in formatted.splitlines())
|
||||
debug_print(f"{label} ↓\n{indented}")
|
||||
|
||||
|
||||
# ==============================================================================
|
||||
@@ -61,6 +242,14 @@ DEVICE_NAME: str = "PE1HVH T1000e"
|
||||
# CACHE / REFRESH
|
||||
# ==============================================================================
|
||||
|
||||
# Default timeout (seconds) for BLE command responses.
|
||||
# Increase if you see frequent 'no_event_received' errors during startup.
|
||||
BLE_DEFAULT_TIMEOUT: float = 10.0
|
||||
|
||||
# Enable debug logging inside the meshcore BLE library itself.
|
||||
# When True, raw BLE send/receive data and event parsing are logged.
|
||||
BLE_LIB_DEBUG: bool = True
|
||||
|
||||
# Interval in seconds between periodic contact refreshes from the device.
|
||||
# Contacts are merged (new/changed contacts update the cache; contacts
|
||||
# only present in cache are kept so offline nodes are preserved).
|
||||
|
||||
1
requirements.txt
Normal file
1
requirements.txt
Normal file
@@ -0,0 +1 @@
|
||||
meshcore @ git+https://github.com/PE1HVH/meshcore_py.git@fix/event-race-condition
|
||||
Reference in New Issue
Block a user