Empty device channel slots have all-zero secrets (32 hex chars), which
passed the length check and were persisted to the DB as "Channel N". This
caused ghost channels (e.g. Channel 14) to appear in unread counts
while the sidebar correctly showed only real channels.
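
A minimal sketch of the stricter check, assuming secrets are handled as
hex strings. `is_real_channel_secret` is a hypothetical helper name used
for illustration, not taken from the codebase:

```python
def is_real_channel_secret(secret_hex: str) -> bool:
    """Return True only for a 32-hex-char secret that is not all zeros."""
    if len(secret_hex) != 32:
        return False
    try:
        raw = bytes.fromhex(secret_hex)
    except ValueError:
        return False  # not valid hex
    # An empty device slot reports 16 zero bytes; treat it as "no channel".
    return any(raw)
```

Only slots that pass this check would be upserted as channels; the
length test alone cannot tell an empty slot from a real one.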
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In-container BLE reconnection is unreliable because bleak leaves stale
GATT notification handles after abnormal disconnect, and adapter power-
cycling from within Docker corrupts bleak's internal BlueZ manager state.
New approach:
- On BLE disconnect or keepalive failure, immediately mark the connection
as permanently failed (no in-container reconnect attempts)
- Health endpoint returns 503, Docker healthcheck triggers container restart
- Docker entrypoint script disconnects stale BLE connections before app
starts, ensuring clean GATT state for bleak
This is reliable because:
- MeshCore.create_ble(address=...) works on fresh container starts
- The BlueZ daemon on the host maintains adapter state correctly
- Container restart is fast (~5s) and gives a truly clean BLE state
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bleak inside Docker cannot initiate new BLE connections — it can only
take over connections already established by BlueZ. Replace the
force-disconnect approach with a connect-via-BlueZ approach:
1. _ble_ensure_connected() connects the device via BlueZ D-Bus
(Device1.Connect) before bleak tries to take over
2. BleakScanner.find_device_by_address() provides the BLEDevice
object that bleak 3.x needs (raw MAC address doesn't work)
3. MeshCore.create_ble(device=...) takes over the BlueZ connection
On reconnect after disconnect:
1. Power-cycling the adapter clears stale GATT notification handles
2. BlueZ re-connects the trusted device automatically
3. bleak takes over the re-established connection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In bleak 3.x, BleakClient(address_string) can't find paired BLE
devices that aren't actively advertising. This caused
BleakDeviceNotFoundError or 30-second connection timeouts.
Fix: pre-scan via BleakScanner.find_device_by_address() which queries
BlueZ's D-Bus object tree directly, then pass the BLEDevice object to
MeshCore.create_ble(device=...) instead of the raw MAC address.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BlueZ auto-reconnects trusted BLE devices, which races with bleak's
connect and causes 'failed to discover services' or 'Notify acquired' errors.
Now we temporarily untrust the device before connecting (to prevent
BlueZ from auto-reconnecting during the handoff), then re-trust it
after bleak has established its GATT session.
Also adds _ble_retrust() helper to re-trust the device in a finally
block, ensuring the bond is maintained even on connection failure.
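
The untrust/re-trust handoff can be sketched as follows. This is an
illustration only: `set_trusted` and `connect` are hypothetical callables
standing in for the BlueZ D-Bus property write and the bleak connect, and
the real code is presumably asynchronous:

```python
from typing import Callable


def connect_with_untrust(
    set_trusted: Callable[[bool], None],
    connect: Callable[[], object],
) -> object:
    """Untrust the device for the duration of the connect, then re-trust it."""
    set_trusted(False)  # stop BlueZ from auto-reconnecting during the handoff
    try:
        return connect()
    finally:
        # Runs on success and on failure, so the device is never left untrusted.
        set_trusted(True)
```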
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On startup, _connect_with_retry also needs adapter power-cycling every
3rd failed attempt to clear stale GATT state from previous sessions.
Without this, the container can fail all 10 startup retries when BlueZ
holds stale notification handles.
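
A rough sketch of the retry schedule, written synchronously for brevity.
`connect` and `power_cycle` are hypothetical callables, and the real
_connect_with_retry may differ in structure:

```python
def connect_with_retry(connect, power_cycle, max_attempts=10, cycle_every=3):
    """Try to connect up to max_attempts times, resetting the adapter
    after every cycle_every-th failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except Exception:
            if attempt == max_attempts:
                raise  # out of retries: surface the last error
            if attempt % cycle_every == 0:
                power_cycle()  # clear stale GATT state before trying again
```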
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BLE connections can enter a "zombie" state where notifications (reads) still
arrive but writes silently fail. This went undetected until the user tried
to send a message, at which point the connection was already dead.
Additionally, after an abnormal BLE disconnect, BlueZ retains stale GATT
notification handles, causing reconnection to fail with
"[org.bluez.Error.NotPermitted] Notify acquired".
Changes:
- Add BLE keepalive loop (60s interval) that sends get_bat() to detect
zombie connections proactively and trigger reconnection automatically
- Add adapter power-cycle (hci0 off/on via D-Bus) during BLE reconnection
to clear stale GATT notification state
- Dedicated _ble_reconnect() with 5 attempts + adapter reset between each
- Health endpoint returns 503 when BLE permanently fails, triggering
Docker container restart via healthcheck
- Guard against concurrent reconnection attempts
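
The keepalive idea can be sketched like this. It assumes the probe (for
example a wrapper around get_bat()) returns None or raises when the write
path is dead; `probe`, `on_dead` and the `timeout` parameter are
hypothetical names for illustration:

```python
import asyncio


async def keepalive_loop(probe, on_dead, interval=60.0, timeout=10.0):
    """Probe the link every `interval` seconds; call `on_dead` once on failure."""
    while True:
        await asyncio.sleep(interval)
        try:
            result = await asyncio.wait_for(probe(), timeout)
        except Exception:
            result = None  # a timeout or write error both count as a dead link
        if result is None:
            await on_dead()  # e.g. start the reconnection procedure
            return
```

Because the probe requires a write followed by a response, it catches the
zombie state that incoming notifications alone would hide.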
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After set_channel(), read back the actual secret from the device and
update both _channel_secrets in-memory cache and the DB. This fixes
newly-joined # channels (where firmware auto-generates the key) having
no repeater info, missing Analyzer URLs, and incorrect route data until
container restart.
Also clean up _channel_secrets on channel removal.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 60s checkForUpdates poll was detecting has_updates due to clock skew
between client and server timestamps. Now the send API returns the server
timestamp, and the frontend uses it for markChannelAsRead — ensuring the
poll sees no updates for own sent messages.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_confirm_delivery() now saves retry context (attempt, max_attempts,
path) and emits dm_delivered_info so the frontend shows delivery
details instantly. Similarly, dm_retry_failed now includes attempt
count so the failure state shows how many attempts were made.
Previously this info was only available after reloading messages
from DB (closing and reopening the conversation).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Stage 2 of path_hash_mode support. All API endpoints and SocketIO
emissions now include decoded hop_count and path_hash_size fields
alongside the raw path_len, so the frontend can display and segment
paths correctly for any hash mode.
Changes:
- Import decode_path_len in api.py
- GET /api/messages: add hop_count, path_hash_size, echo_hash_sizes
- GET /api/messages/<id>/meta: add hop_count, path_hash_size, echo_hash_sizes
- GET /api/dm/messages: add hop_count, path_hash_size
- SocketIO new_message emission: add hop_count, path_hash_size
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Stage 1 of path_hash_mode support. The critical bug in _on_rx_log_data
treated the raw path_len byte as a direct byte count, which breaks with
mode>0 (e.g. mode=1, 0 hops → path_len=0x40=64, reading 64 bytes of
non-existent path data). Now properly decodes the encoded path_len byte
into hop_count, hash_size, and path_byte_len.
Changes:
- Add decode_path_len() utility for MeshCore v1.14+ path_len encoding
- Fix _on_rx_log_data binary parsing to use decoded path length
- Pass hash_size through _process_echo → DB insert → SocketIO emission
- Add hash_size column to echoes table (schema + migration)
- Update insert_echo() to store hash_size (default 1 for backward compat)
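
A sketch of the decoding, not the project's actual decode_path_len(). The
bit layout (mode in the top two bits, hop count in the low six) follows
from the 0x40 example above; hash_size = mode + 1 is an assumption based
on mode 0 meaning 1-byte hashes:

```python
from typing import NamedTuple


class PathLen(NamedTuple):
    hop_count: int
    hash_size: int
    path_byte_len: int


def decode_path_len(path_len: int) -> PathLen:
    """Split the encoded path_len byte into hop count, hash size and byte length."""
    mode = (path_len >> 6) & 0x03   # top two bits: path hash mode
    hop_count = path_len & 0x3F     # low six bits: number of hops
    hash_size = mode + 1            # assumption: mode 0 -> 1 byte per hop
    return PathLen(hop_count, hash_size, hop_count * hash_size)
```

With this decoding, path_len=0x40 yields zero hops and zero path bytes
instead of 64 bytes of non-existent path data.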
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BlueZ auto-reconnects trusted BLE devices after container restart,
blocking bleak from establishing a new GATT session. Clear the stale
connection via D-Bus before each connect attempt.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BLE connections with retries can take >60s, exceeding the startup
wait timeout. Move runtime_config.set_device_name() into _connect()
so the navbar shows the correct name regardless of connection delay.
Also fixes name update on reconnections.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DM delivery status was lost when switching conversations because
_confirm_delivery() only stored the ACK record and emitted a socket
event, but never set delivery_status='delivered' in direct_messages.
During retries, each attempt generates a new ACK code. The DM record
stores the initial expected_ack, but the actual ACK may arrive for a
later retry's code. The ACK lookup by expected_ack then fails to match.
Now _confirm_delivery() also sets delivery_status='delivered', and
message loading checks this DB field first (like it already did for
'failed'), so delivery persists across page navigations.
Also fixed 213 existing DMs on server via data migration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MC_BLE_PIN was non-functional — bleak in Docker cannot perform
interactive pairing (no BlueZ agent). Pairing must be done on
the host before starting mc-webui. Added comprehensive pairing
guide at docs/meshcore_bluetooth_pairing.md.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integrate meshcore library's BLE connection (via bleak) as a third
transport option alongside serial and TCP. Priority: BLE > TCP > Serial.
Config: MC_BLE_ADDRESS and MC_BLE_PIN environment variables.
Docker: bluez/dbus packages, NET_ADMIN cap, D-Bus socket mount.
UI: transport type badge in navbar, transport_type in /api/status.
Watchdog: skip USB reset for BLE connections (same as TCP).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Contact Info dialog showed stale path data (e.g. "Flood" instead of
the discovered route) because auto_update_contacts is OFF and PATH_UPDATE
only sets _contacts_dirty=True without refreshing mc.contacts. The API
then served stale in-memory data even after cache invalidation.
Now ensure_contacts(follow=True) is called on PATH_UPDATE to read fresh
contact data from the device before invalidating cache and emitting the
socket event. PATH_UPDATE events are rare (only on path discovery), so
the serial I/O cost is acceptable, unlike for advertisements.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When FLOOD delivery is confirmed, the PATH_UPDATE event payload often
has empty path data because firmware updates the contact's out_path
asynchronously. After 3s delay, read the contact's updated path from
the meshcore library's in-memory contacts dict and backfill the DB.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When both ACK and PATH_UPDATE fire for FLOOD delivery, _on_ack may
store empty path before PATH_UPDATE can provide the discovered route.
Now _on_path_update also checks for recently-delivered DMs with empty
delivery_path and backfills with the discovered path from the event.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When PATH_UPDATE confirms delivery, use the actual path from the
event data instead of the empty path_desc from _retry_context (which
is empty during FLOOD phase). This captures the route firmware
discovered via the flood delivery.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The _on_ack handler cancels the retry task before _retry() can store
delivery info (attempt count, path). Fix by maintaining a _retry_context
dict updated before each send. _on_ack reads context and stores delivery
info + emits dm_delivered_info BEFORE cancelling the task. Same fix
applied to PATH_UPDATE backup delivery handler.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Store actual hex path instead of DIRECT/FLOOD labels in delivery_path.
Format route as AB→CD→EF (same as channel messages, truncated if >4
hops). Add dm_delivered_info WebSocket event so delivery meta appears
in real-time without needing page reload. Remove path info from failed
messages since it's not meaningful for undelivered messages.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the attempt counter (e.g. "Attempt 15/24") from next to the status
icon to below the message text, left of the Resend button. Add visible
delivery meta line for delivered/failed messages showing attempt count
and path used. Store attempt info for failed messages too. Replace
Polish abbreviations (ŚK, ŚD, ŚG) with English in all log messages.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Show retry progress in DM message bubble via WebSocket:
- "attempt X/Y" counter updates in real-time during retries
- Failed icon (✗) when all retries exhausted
- Delivery info persisted in DB (attempt number, path used)
Backend: emit dm_retry_status/dm_retry_failed socket events,
store delivery_attempt/delivery_path in direct_messages table.
Frontend: socket listeners update status icon and counter,
delivered tooltip shows attempt info and path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 3-way branching (configured_paths/has_path/else) with
4-scenario matrix based on (has_path × has_configured_paths):
- S1: No path, no configured paths → FLOOD only
- S2: Has path, no configured paths → DIRECT + optional FLOOD
- S3: No path, has configured paths → FLOOD first, then ŚD rotation
- S4: Has path, has configured paths → DIRECT on ŚK, ŚD rotation, optional FLOOD
Key changes:
- S3: FLOOD before configured paths (discover new routes)
- S4: exhaust retries on current ŚK before rotating ŚD
- S4: dedup ŚG/ŚK to skip redundant retries on same path
- Add _paths_match() helper for path deduplication
- Update tooltip text for settings clarity
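
The matrix amounts to a lookup on the two booleans. A minimal sketch,
where `pick_scenario` is a hypothetical name and the real code presumably
branches directly into the retry logic:

```python
def pick_scenario(has_path: bool, has_configured_paths: bool) -> str:
    """Map (has_path, has_configured_paths) to the scenario label S1..S4."""
    table = {
        (False, False): "S1",  # FLOOD only
        (True, False): "S2",   # DIRECT + optional FLOOD
        (False, True): "S3",   # FLOOD first, then configured paths
        (True, True): "S4",    # DIRECT, then rotation, optional FLOOD
    }
    return table[(has_path, has_configured_paths)]
```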
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Path info in Contact Info modal was stale due to 60s server cache
and no refresh after path operations. Now:
- Invalidate contacts cache after reset_path, change_path, path_update
- Emit 'path_changed' socket event on PATH_UPDATE from device
- UI listens and re-renders Contact Info when path changes
- Reset to FLOOD button immediately refreshes the path display
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable moving contacts between device and cache directly from the
Existing Contacts UI:
- "To device" button on cache-only contacts (pushes to device)
- "To cache" button on device contacts (removes from device, keeps in DB)
This helps manage the 350-contact device limit by offloading inactive
contacts to cache and restoring them when needed.
- Add DeviceManager.push_to_device() and move_to_cache() methods
- Add API endpoints: POST /contacts/<pk>/push-to-device, move-to-cache
- Add UI buttons with confirm dialogs in contacts.js
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The meshcore library's update_contact() reads out_path_hash_mode directly
from the contact dict. Without it, add_contact_manual() fails with
KeyError: 'out_path_hash_mode'. Default value 0 is correct for new
contacts with no known path (flood mode).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add support for adding contacts manually using the MeshCore mobile app URI
format (meshcore://contact/add?name=...&public_key=...&type=...) or raw
parameters (public_key, type, name). This enables contact sharing between
mc-webui and the MeshCore Android/iOS app via URI/QR codes.
- Add parse_meshcore_uri() helper to parse mobile app URIs
- Add DeviceManager.add_contact_manual() using CMD_ADD_UPDATE_CONTACT
- Update import_contact_uri() to handle both mobile app and hex blob URIs
- Add manual_add console command with two usage variants
- Update console help text
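
A sketch of how such a URI could be parsed with the standard library. It
is not the project's parse_meshcore_uri(); the returned dict shape and the
assumption that `type` is numeric are illustrative:

```python
from urllib.parse import parse_qs, urlparse


def parse_meshcore_uri(uri: str):
    """Return name/public_key/type from a meshcore://contact/add URI, or None."""
    parsed = urlparse(uri)
    if (parsed.scheme, parsed.netloc, parsed.path) != ("meshcore", "contact", "/add"):
        return None
    params = parse_qs(parsed.query)  # also percent-decodes the values
    try:
        return {
            "name": params["name"][0],
            "public_key": params["public_key"][0].lower(),
            "type": int(params["type"][0]),  # assumption: numeric type code
        }
    except (KeyError, ValueError):
        return None  # missing or malformed parameter
```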
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The DISCOVER_RESPONSE payload uses 'pubkey' and 'node_type', not
'public_key'/'name'/'adv_name'. Now shows pubkey prefix, resolved
contact name, node type, SNR, and RSSI. Also rename CLI->COM type.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_load_channel_secrets() cached secrets in memory only. After dfc3b14
switched /api/messages to use DB channels instead of device calls,
the empty channels table caused Route info and Analyzer links to
disappear from message bubbles.
Now upserts each channel (name + secret) to DB during startup so
the API can compute pkt_payload without hitting the device.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
meshcore 2.3.0's ConnectionManager has a bug: when auto-reconnect creates
a new TCP connection, the old connection's connection_lost callback fires,
triggering another reconnect cycle. Since each success resets the attempt
counter, this loops forever (~1 TCP connection/second).
Disabled library auto_reconnect and added reconnection logic to
_on_disconnected() with 3 attempts and increasing backoff (5/10/15s).
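
The bounded reconnect can be sketched as below. Whether the delay comes
before or after each attempt is an assumption (here: before), and
`connect` is a hypothetical callable returning True on success:

```python
import time


def reconnect_with_backoff(connect, delays=(5, 10, 15), sleep=time.sleep):
    """Make one connect attempt per delay; give up when the delays run out."""
    for delay in delays:
        sleep(delay)  # increasing backoff: 5s, 10s, 15s
        try:
            if connect():
                return True
        except Exception:
            pass  # treat an exception like a failed attempt
    return False
```

Because the attempt count is fixed by the length of `delays`, a successful
connection cannot reset a counter and restart the cycle.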
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PATH_ROTATION now has 3 phases:
1. Exhaust retries on primary path first (initial send + retries_per_path-1)
2. Rotate through remaining non-primary paths
3. Optional FLOOD fallback (if no_auto_flood=False)
Previously, retry iterated over all paths in sort_order, giving the primary
path only the initial send attempt before switching to the first path
on the list, which was often an older/worse path.
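
The three phases can be illustrated by building the attempt order up
front. This sketch assumes each non-primary path also gets
retries_per_path attempts and that the fallback is a single FLOOD send;
neither detail is stated above, and `build_attempt_plan` is a hypothetical
name:

```python
def build_attempt_plan(primary, other_paths, retries_per_path, no_auto_flood):
    """Return the ordered list of paths to try for one DM."""
    # Phase 1: initial send plus retries, all on the primary path.
    plan = [primary] * retries_per_path
    # Phase 2: rotate through the remaining configured paths.
    for path in other_paths:
        if path != primary:
            plan.extend([path] * retries_per_path)
    # Phase 3: optional FLOOD fallback.
    if not no_auto_flood:
        plan.append("FLOOD")
    return plan
```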
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_change_path_async manually set out_path and out_path_len on the contact
dict, then called update_contact(contact) with path=None. That code path
reads out_path_hash_mode from the contact dict, which is -1 when the contact
is in flood mode (after reset_path or device read with plen=255).
The encoding then produced: hop_count | (-1 << 6) = negative number,
causing "can't convert negative int to unsigned" in to_bytes().
Fix: use mc.commands.change_contact_path() which properly computes all
fields including out_path_hash_mode, avoiding the negative value issue.
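
The failure is easy to reproduce in isolation. `encode_path_len` below is
a hypothetical stand-in for the library's packing step, shown only to
illustrate why a hash mode of -1 cannot be encoded:

```python
def encode_path_len(hop_count: int, hash_mode: int) -> bytes:
    """Pack hop count (low 6 bits) and hash mode (high 2 bits) into one byte."""
    return (hop_count | (hash_mode << 6)).to_bytes(1, "little")


def can_encode(hop_count: int, hash_mode: int) -> bool:
    """Report whether the pair packs into an unsigned byte."""
    try:
        encode_path_len(hop_count, hash_mode)
        return True
    except OverflowError:
        # -1 << 6 is -64, so the OR result is negative and to_bytes() refuses it.
        return False
```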
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enables detailed tracking of each DM retry step: send attempt,
ACK wait timeout, and ACK timeout results. device_manager logger
set to DEBUG level so these messages appear in System Log.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Endpoint now returns error if device reset fails instead of always
returning success:true. Added logging to both endpoint and
device_manager.reset_path to diagnose reset failures.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New `contact_paths` table for storing multiple user-configured paths per contact
- New `no_auto_flood` column on contacts to prevent automatic DIRECT→FLOOD reset
- Path rotation during DM retry: cycles through configured paths before optional flood fallback
- REST API for path CRUD, reorder, reset-to-flood, repeater listing
- Path management UI in Contact Info modal: add/delete/reorder paths, repeater picker with uniqueness warnings, hash size selector (1B/2B/3B)
- "No Flood" per-contact toggle in modal footer
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
req_status, req_acl, req_neighbours, req_mma use send_binary_req
which calculates timeout from suggested_timeout/800. After firmware
updates this can be too short, causing instant timeouts. Adding
min_timeout=15 ensures we wait at least 15 seconds for a response.
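
In effect the wait becomes the larger of the two values. A sketch of the
arithmetic only; `effective_timeout` is a hypothetical name and the
library's internals may differ:

```python
def effective_timeout(suggested_timeout: float, min_timeout: float = 15.0) -> float:
    """Derive the wait in seconds from the device hint, never below min_timeout."""
    return max(suggested_timeout / 800, min_timeout)
```

For example, a suggested_timeout of 4000 gives 5 seconds on its own, which
the floor raises to 15.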
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded DM retry logic with user-configurable settings stored
in app_settings DB. Settings modal opens from menu with tab-based UI
(ready for future settings tabs). Defaults: 3 direct + 1 flood retries
(was 8+2), 30s/60s intervals, 60s grace period.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All settings (protected_contacts, cleanup_settings, retention_settings,
manual_add_contacts) moved from .webui_settings.json file to SQLite database.
Startup migration auto-imports existing file and renames it to .json.bak.
Added safeguard in _on_new_contact: if firmware fires NEW_CONTACT for a
contact already on the device, skip pending and log a warning. Also added
diagnostic logging showing previous DB state (source, protected) when
contacts reappear as pending.
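
The file-to-database migration could look roughly like this. The
key/value layout of app_settings and the function name are assumptions
made for the sketch, not the project's actual schema:

```python
import json
import sqlite3
from pathlib import Path


def migrate_settings_file(json_path: Path, conn: sqlite3.Connection) -> bool:
    """Import a legacy JSON settings file into SQLite, then rename it to .bak."""
    if not json_path.exists():
        return False  # nothing to migrate (fresh install or already done)
    settings = json.loads(json_path.read_text(encoding="utf-8"))
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS app_settings "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        for key, value in settings.items():
            conn.execute(
                "INSERT OR REPLACE INTO app_settings (key, value) VALUES (?, ?)",
                (key, json.dumps(value)),
            )
    # Rename only after the import succeeded, so a failed run can be retried.
    json_path.rename(json_path.with_name(json_path.name + ".bak"))
    return True
```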
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- trace: accepts comma-separated hex path (e.g. "trace 5e,d1,e7"),
waits for TRACE_DATA response with proper timeout from device
- stats: fix field names (uptime_secs, queue_len, battery_mv, etc.),
show all radio/packet stats with detail breakdown
- self_telemetry: format LPP sensor data nicely instead of raw dict
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- get help / set help: detailed parameter descriptions with
explanations, matching meshcore-cli style
- get path_hash_mode: library returns int not Event, fixed check
- set help: now reachable (was behind len(args)>=3 guard)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The command was using self_info, which carries no firmware data. It now
uses send_device_query() like meshcore-cli, showing model, version,
build date, and repeat mode.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- req_clock: parse timestamp from binary hex data (little-endian)
and display as human-readable datetime, matching meshcore-cli
- req_neighbours: new command that fetches neighbour list from
repeater with formatted output (name resolution from device
contacts and DB cache, time ago, SNR)
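
The req_clock parsing can be sketched as follows, assuming the timestamp
is a 4-byte little-endian epoch at the start of the hex payload. The
sketch formats in UTC to stay deterministic, and `parse_clock_hex` is a
hypothetical name:

```python
from datetime import datetime, timezone


def parse_clock_hex(hex_data: str):
    """Decode a little-endian epoch from hex and format it for display."""
    raw = bytes.fromhex(hex_data)
    epoch = int.from_bytes(raw[:4], "little")  # assumption: first 4 bytes
    stamp = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return epoch, stamp
```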
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
meshcore _sync methods return dict (data) or None (error/timeout),
not Event objects. hasattr(dict, 'payload') is always False, causing
instant "timeout" errors. Changed to check `result is not None`.
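
The difference between the two checks, as a small sketch.
`describe_result` is a hypothetical helper, not code from the project:

```python
def describe_result(result):
    """Interpret a _sync return value: a dict means data, None means failure."""
    if result is not None:
        return f"ok: {result}"
    return "timeout"


def old_check(result) -> bool:
    """The previous test: a plain dict has no 'payload' attribute, so this
    is False even for a successful response."""
    return hasattr(result, "payload")
```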
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Clock command now shows datetime like meshcore-cli: "Current time: 2026-03-19 11:39:07 (1773916747)"
- Repeater req_* commands: pass timeout=0 to meshcore library so it uses
device's suggested_timeout instead of hardcoded 30s (matching meshcore-cli behavior)
- Execute timeout raised to 120s to accommodate slow repeater responses
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add device management: get/set params, clock/clock sync, time,
reboot, ver, scope, self_telemetry, node_discover.
Add channel management: get_channel, set_channel, add_channel,
remove_channel. Update help text with all command categories.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>