Commit Graph

131 Commits

Author SHA1 Message Date
MarekWo 201bc137e5 fix(channels): also pass path_hash_size on the echo-driven raw_packet rebuild
Earlier path_hash_mode fix updated the send-time build but the matching
edit to _refresh_raw_packet_if_drifted didn't make it into commit 10df846.
For channels where the secret isn't available at send time, guess_pkt_payload
stays None and raw_packet is created for the first time in this fallback
path (triggered when echo correlation matches via the channel-hash branch).
Without the path_hash_size argument the build defaulted to 1-byte hashes,
producing the same mixed-size badge the prior fix was meant to eliminate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-09 14:54:42 +02:00
MarekWo 10df8464b7 fix(channels): honor device path_hash_mode when building raw_packet
Resends were building raw_packet with the default 1-byte path-hash size,
ignoring the device's actual path_hash_mode. When path_hash_mode=1 (2-byte
hashes) the original send produced 2-byte path entries in repeater echoes,
but the resend's path_len byte said "1-byte" — so post-resend echoes
appended 1-byte hashes, mixing into the badge as inconsistent tokens
(e.g. "44D8, D103, E7" — the trailing E7 was a 1-byte fragment).

Cache path_hash_mode from DEVICE_INFO at connect (fw_ver_code >= 10) and
expose path_hash_size = max(1, mode+1). Pass it through to
_build_grp_txt_raw_packet in send_channel_message and the clock-drift
refresh path. Keep cache in sync with set_param('path_hash_mode', N).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-09 14:40:21 +02:00
MarekWo d23e865f35 feat(channels): merge post-resend echoes into existing repeater badge
PR #4 of 5. After a successful resend, re-arm _pending_echo with the
original msg_id and known pkt_payload so echoes from previously-unreached
repeaters that pick up the rebroadcast are classified as 'sent' and carry
msg_id in the SocketIO emit.

The frontend echo handler now collects forced msg_ids and passes them to
refreshMessagesMeta(forceIds), which bypasses the "already has route info,
skip" guard for those ids. End result: clicking resend extends the
repeater list on the existing message's badge in place — no duplicate row,
no stale count.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-09 14:32:44 +02:00
MarekWo 4729055900 feat(channels): firmware version gate for raw resend (requires ≥1.16)
CMD_SEND_RAW_PACKET (0x41) was introduced in companion-v1.16.0
(FIRMWARE_VER_CODE bump 11 → 13). Older firmware returns
ERR_CODE_UNSUPPORTED_CMD with no useful context for the user.

Capture fw_ver_code from the DEVICE_INFO event at connect (re-using the
existing send_device_query call) and expose a supports_raw_resend
property. The resend endpoint now refuses early with a clear message
("Firmware too old for raw resend, need ≥1.16, device reports
fw_ver_code=N") and /api/status surfaces both fw_ver_code and the
supports_raw_resend flag so the UI can hide or disable the button on
older firmware.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-09 13:03:48 +02:00
MarekWo 9c48518771 fix(channels): surface meshcore lib's error_code/code_string on resend failure
The lib's reader.py wraps device ERROR frames as {error_code, code_string},
not {reason, error}. The previous extraction collapsed every device error to
"unknown error", hiding the actual ERR_CODE_* the firmware sent back. Check
code_string/reason/error in order, then fall back to a raw error_code, then
"unknown error".
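
The lookup order can be sketched as follows. This is an illustration under
assumptions, not the code from the commit: the function name and the
"error_code=N" formatting are hypothetical; only the key names and their
order come from the message above.

```python
from typing import Any, Mapping


def describe_device_error(payload: Mapping[str, Any]) -> str:
    """Pick the most informative error text from a device ERROR payload."""
    # Textual fields first, in the order named in the commit message.
    for key in ("code_string", "reason", "error"):
        value = payload.get(key)
        if value:
            return str(value)
    # Then the raw numeric code (output format here is an assumption).
    code = payload.get("error_code")
    if code is not None:
        return f"error_code={code}"
    return "unknown error"
```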

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-09 12:47:44 +02:00
MarekWo 67c59cc341 feat(channels): backend resend endpoint via CMD_SEND_RAW_PACKET
PR #3 of 5. Adds POST /api/messages/<msg_id>/resend, which re-broadcasts an
own channel message verbatim using the raw_packet bytes captured at send
time. Pushes the wire bytes directly through companion command 0x41
(CMD_SEND_RAW_PACKET), bypassing the higher-level send paths so repeaters
dedupe by packet hash via Mesh::hasSeen — only previously-unreached nodes
will pick up the resend.

Returns 404 for unknown msg_id, 400 for not-own / missing snapshot /
disconnected device, 500 for unexpected device errors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-09 12:39:34 +02:00
MarekWo fa0b1c9109 feat(channels): capture raw_packet at send time for raw resend
PR #2 of 5. Builds the full GRP_TXT wire bytes (header + transport_codes if
scoped + path_len + encrypted payload) from the ts+0 pkt_payload guess and
stores it in channel_messages.raw_packet right after the send. When echo
correlation later identifies the actual pkt_payload (potentially using a
different ±dt candidate due to host/firmware clock drift), the raw_packet is
rebuilt from the actual one so a future resend matches the original packet
hash and dedupes at the repeaters.

Transport-scope codes are computed in Python via HMAC-SHA256(scope_key,
payload_type||payload)[:2], mirroring TransportKey::calcTransportCode in
MeshCore Core (including the 0x0000/0xFFFF reservations).
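
A hedged sketch of that computation using only the standard library. The
HMAC input (payload_type followed by payload) and the 2-byte truncation come
from the description above. The little-endian interpretation of those two
bytes and the way the reserved values are remapped (0x0000 to 0x0001, 0xFFFF
to 0xFFFE) are assumptions and should be checked against
TransportKey::calcTransportCode. The function name is illustrative.

```python
import hashlib
import hmac


def calc_transport_code(scope_key: bytes, payload_type: int, payload: bytes) -> int:
    """Return a 16-bit transport code for one packet under a scope key."""
    message = bytes([payload_type]) + payload
    digest = hmac.new(scope_key, message, hashlib.sha256).digest()
    # Assumption: the first two digest bytes are read as a little-endian uint16.
    code = int.from_bytes(digest[:2], "little")
    # Assumption: reserved values are nudged to their nearest neighbour.
    if code == 0x0000:
        code = 0x0001
    elif code == 0xFFFF:
        code = 0xFFFE
    return code
```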

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-09 12:27:17 +02:00
MarekWo fef6845c03 fix(connection): self-heal degraded long-lived TCP via in-place reconnect
Long-lived TCP against the meshcore-proxy can degrade in a way the socket
can't see: some commands (set_flood_scope_key with all-zero key) start
timing out while RX events and other commands keep working. The 5 s
execute() timeout fires with concurrent.futures.TimeoutError() — whose
str() is empty — so the UI showed "Could not set region scope (none):"
with no error text, and only channels with a mapped region could send
because their non-zero scope_key happened to keep working.

Two recovery paths:

- send_channel_message now detects the timeout case (set_flood_scope_key
  surfaces timed_out=True) and runs force_reconnect() + one retry before
  failing. The user sees a brief delay instead of a cryptic error and
  having to restart the container.

- A new _liveness_watcher_loop task runs on the DM event loop and forces
  a reconnect when no RX event has arrived for HEALTH_STRICT_MAX_RX_STALE_SEC
  (5 min). /health/strict now also reports rx_stale for TCP (previously
  serial/USB only), so an external watchdog could act on it too.

force_reconnect() runs on the DM loop via run_coroutine_threadsafe with
a 20 s cap, a 30 s cooldown to avoid churn under fire, and a
_reconnect_lock to prevent concurrent attempts. mc.disconnect() fires
DISCONNECTED — _intentional_disconnect tells _on_disconnected to skip
its own reconnect loop so the two don't race.
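
The cooldown-plus-lock guard can be sketched like this. It is an
illustration only: the class and method names are hypothetical, the 20 s cap
and the _intentional_disconnect flag are not modeled, and the injectable
clock exists purely to make the sketch testable.

```python
import threading
import time
from typing import Callable, Optional


class ReconnectGuard:
    """Allow one reconnect at a time and at most one per cooldown window."""

    def __init__(self, cooldown_sec: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lock = threading.Lock()
        self._cooldown_sec = cooldown_sec
        self._clock = clock
        self._last_attempt_at: Optional[float] = None

    def try_run(self, reconnect: Callable[[], bool]) -> bool:
        # Non-blocking acquire: a concurrent caller gives up immediately.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            now = self._clock()
            last = self._last_attempt_at
            if last is not None and now - last < self._cooldown_sec:
                return False  # still inside the cooldown window
            self._last_attempt_at = now
            return reconnect()
        finally:
            self._lock.release()
```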

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-07 21:10:03 +02:00
MarekWo 13a650bb6c fix(channels): read channels from DB instead of iterating device slots
The TimeoutError-based fallback added in 1d47c9c only fires when
mc.commands.get_channel() actually raises. On a sluggish device the
call instead returns an empty/falsy event without raising, so the loop
walks all dm._max_channels slots (40 on the firmware in production),
every empty result maps to None, and the API yields just Public (or
whichever slot happened to succeed). The DB fallback never triggered
and the user kept seeing just Public after refresh.

The channels table in the DB is already the authoritative cache:
- _load_channel_secrets() syncs it on every device connect and prunes
  stale rows,
- set_channel()/remove_channel() update it synchronously with the
  device,
- _refresh_channel_secret() refreshes individual rows on per-send
  refresh.

Drop the device-slot iteration in cli.get_channels() and read from the
DB. /api/channels response time becomes a single SELECT (<1 ms) and is
unaffected by device responsiveness — exactly what we wanted from the
fallback in the first place.

Also revert the TimeoutError re-raise in get_channel_info(): the
console `channels` and `add_channel` commands iterate slots and would
crash on the first slow one. Logging + None on failure is the right
behavior for slot iteration. The 3 s default timeout stays since it
still keeps individual slot probes cheap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-07 12:21:34 +02:00
MarekWo 422e7a3b34 feat(watchdog): catch sluggish-device failures via soft-pattern counting
The container watchdog only restarted on three legacy "device clearly dead"
log lines, so today's failure mode (firmware briefly stalls and get_stats_*
/ get_battery commands time out with an empty error while passive RX
keeps working) never tripped it — leaving the user with 10-15 s freezes
several times a day and no automatic recovery.

DeviceManager now tracks two liveness signals:
- _last_rx_at, bumped on every RX_LOG_DATA event
- _consecutive_stats_failures, incremented on get_stats_* / get_bat
  exceptions and cleared on success

New /health/strict endpoint exposes these to the watchdog. It returns 503
when the device is connected but has 5+ consecutive stats failures, or
when no RX event has been seen for over 5 minutes on a serial transport.
The cheap /health endpoint keeps its lenient behavior so Docker's
healthcheck doesn't suddenly start tripping.
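
A minimal sketch of the 503 rule for a device that reports as connected. The
function name, the "serial" transport label, and the choice to treat "no RX
seen yet" (None) as not stale are assumptions; the thresholds come from the
paragraph above.

```python
from typing import Optional

STATS_FAILURE_LIMIT = 5      # consecutive get_stats_* / battery failures
MAX_RX_STALE_SEC = 5 * 60    # RX silence tolerated on a serial transport


def is_degraded(consecutive_stats_failures: int,
                rx_age_sec: Optional[float],
                transport: str) -> bool:
    """Return True when /health/strict should answer 503 (device connected)."""
    if consecutive_stats_failures >= STATS_FAILURE_LIMIT:
        return True
    # Assumption: rx_age_sec is None until the first RX event arrives.
    if transport == "serial" and rx_age_sec is not None:
        return rx_age_sec > MAX_RX_STALE_SEC
    return False
```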

The watchdog's check_device_unresponsive() gains a "soft" pattern class
with a count threshold of 5 in the last 2 minutes — matching against
get_stats_core/radio/packets failed:, Failed to get battery:, and
Failed to get channel. Hard patterns still trigger on a single hit.
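
The soft-pattern counting can be sketched as below. The input shape (an
iterable of (unix_timestamp, line) pairs) and the function name are
assumptions made for illustration; the patterns, the threshold of 5, and the
2-minute window come from the paragraph above.

```python
import re
from typing import Iterable, Tuple

SOFT_PATTERNS = [
    re.compile(r"get_stats_(core|radio|packets) failed:"),
    re.compile(r"Failed to get battery:"),
    re.compile(r"Failed to get channel"),
]


def soft_threshold_reached(entries: Iterable[Tuple[float, str]],
                           now: float,
                           window_sec: float = 120.0,
                           threshold: int = 5) -> bool:
    """Count soft-pattern hits inside the window and compare to the threshold."""
    hits = 0
    for timestamp, line in entries:
        if now - timestamp > window_sec:
            continue  # too old to count
        if any(pattern.search(line) for pattern in SOFT_PATTERNS):
            hits += 1
    return hits >= threshold
```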

Deploy note: the watchdog runs as a host-level systemd service and is
NOT restarted by mcupdate, so after deploy run:
  sudo systemctl restart mc-webui-watchdog.service

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-07 09:43:43 +02:00
MarekWo 1d47c9c0e8 fix(perf): polling-only Socket.IO + channels DB fallback on USB timeout
Werkzeug dev server can't upgrade WebSockets, so every io() upgrade attempt
returned HTTP 500 and clients fell into a polling/upgrade reconnect loop —
visible as 10-15s freezes on app load. Force transports: ['polling'] on
/chat, /console and /logs clients; long-poll keeps real-time pushes
working with ~1-2s latency.

When the MeshCore device briefly stalls, get_channel_info() used to block
on the default 30s timeout per slot, so iterating max_channels slots could
take minutes; in practice only Public answered and the rest timed out,
leaving the UI with just one channel. Drop per-call timeout to 3s, raise
TimeoutError to the caller, and have cli.get_channels() break on first
timeout and merge the remaining slots from the channels table in the DB
(which already mirrors device state via upsert_channel).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-07 07:31:47 +02:00
MarekWo 10792b8566 feat(analyzer): add configurable analyzer services in Settings
Add a Settings > Analyzer tab letting users CRUD custom MeshCore Analyzer
services with a star-toggle default and inline disabled switch. The chart
icon under each group-chat message now resolves at click time: built-in
Letsmesh when no enabled customs, the default when set, or a chooser
modal otherwise. Backend stops shipping the prebuilt analyzer_url and
emits packet_hash instead — the frontend substitutes {packetHash} in the
chosen URL template.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-05 15:34:45 +02:00
MarekWo 843d59a2d6 fix(channels): refresh per-send channel secret to keep echo correlation working
Channel indices on the device can shift after the user deletes a
channel — subsequent slots compact down by one — but mc-webui only
ran _load_channel_secrets() once at startup, so the in-memory cache
mapped channel_idx to whichever secret was there at boot. Once the
indices moved, expected_payloads for sent channel messages were
encrypted with the wrong key, so legitimate repeater echoes always
fell into the 'doesn't match expected candidates' branch and never
got linked to the originating send.

send_channel_message now calls _refresh_channel_secret(idx) before
building the candidate list: one extra get_channel(idx) round-trip
that fetches the current secret straight from the firmware, updates
the in-memory cache + DB if they had drifted, and is used for the
pkt_payload computation. If the slot is empty, the stale cache entry
is dropped.

Also bump the set_param timeout for path_hash_mode and custom_var to
20s — the meshcore lib has a 15s internal timeout, so the previous
5s outer wrapper raised a bare concurrent.futures.TimeoutError with
empty str(e) before the device's ERROR event could surface. The
exception handler now logs the exception type as well so future
empty-string errors are still diagnosable, and stores the
event.payload (not the never-defined event.data) when capturing the
sent message's pkt_payload field.
2026-06-05 10:42:07 +02:00
MarekWo c39037214c fix(messages): persist raw path_len byte so incoming path_hash_size is correct
meshcore lib 2.x splits the wire path_len byte into payload['path_len']
(masked hop count) + payload['path_hash_mode'] (hash-size mode). We were
storing only the masked half in channel_messages / direct_messages /
paths, so the downstream decode_path_len() in the API endpoints always
returned hash_size=1 — fine for the Hops counter but wrong for any UI
that renders the incoming hex path (e.g. echo-fallback rendering).

Added pack_path_len() that recombines the two fields back into the
firmware byte and routed all three insertion sites through it. The
channel-message socket emit now uses the recombined byte too, so
realtime path_hash_size matches the value the API will return on reload.
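
A sketch of the recombination, under a stated assumption about the bit
layout: the hash-size mode sits in the upper two bits and the hop count in
the lower six. That layout is inferred from the "mode=1, 0 hops →
path_len=0x40" example in the path_hash_mode Stage 1 commit and is not
confirmed here. The function name is illustrative and is not the
pack_path_len() added by this commit.

```python
def pack_path_len_sketch(hop_count: int, path_hash_mode: int) -> int:
    """Recombine a masked hop count and a hash-size mode into one byte."""
    # Assumed layout: bits 7..6 = mode, bits 5..0 = hop count.
    if not 0 <= hop_count <= 0x3F:
        raise ValueError("hop_count must fit in 6 bits")
    if not 0 <= path_hash_mode <= 0x03:
        raise ValueError("path_hash_mode must fit in 2 bits")
    return (path_hash_mode << 6) | hop_count
```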

No schema migration needed — the column still holds an INTEGER. Old
rows continue to decode as hash_size=1 (their original behavior); only
newly received messages benefit from the fix.
2026-06-05 09:54:56 +02:00
MarekWo bcaa550809 fix(dm): persist delivery_path_hash_size so reloaded bubbles render multi-byte routes
Live dm_delivered_info already carried the correct hash_size, but the
DB row only kept delivery_path. After a reload the API filled in
path_hash_size from the incoming path_len column (NULL for outgoing
DMs → default 1), so 2-byte routes were re-rendered as single-byte
hops.

Added a delivery_path_hash_size column (auto-migrated, defaults to 1)
that update_dm_delivery_info now stores alongside the delivery path,
populated from the same hash_size already known by each delivery path
(retry ctx, PATH event, delayed contact backfill). /api/dm/messages
returns the new field; dm.js prefers it over path_hash_size when
rendering the Route line, falling back to the old field for legacy
rows.
2026-06-05 09:22:51 +02:00
MarekWo 4effa47fe1 fix(ui): multi-byte path rendering across contact list, DM modal, retry
Same root cause as the previous console fix: meshcore lib 2.x stores
out_path_len as the masked hop count and out_path_hash_mode separately.
Several UI surfaces and the DM retry logic were still decoding the
hash-size mode from the upper bits of out_path_len, which always yields
1 for in-memory contact data and silently truncates multi-byte paths.

Fixed sites:
- /api/contacts/detailed: path_or_mode and outgoing payload now use
  out_path_hash_mode; the field is included in /api/contacts too.
- dm.js: Contact Info modal computes hashSize for the import button
  from out_path_hash_mode.
- console "contacts" command: same correction as "path".
- device_manager._paths_match / _extract_path_hex: accept hash mode as
  a parameter; callers (_dm_retry_task, _delayed_path_backfill, Phase 2
  rotation dedup) pass contact.out_path_hash_mode.
- PATH event handlers: derive hash_size from path_hash_mode instead of
  decoding it from an already-masked path_len.
2026-06-05 08:54:29 +02:00
MarekWo fecf8cdccb fix(console): multi-byte hops in change_path parser and path display
The console treated 2-/3-byte hops as 1-byte:
- change_path "<name>" d103 5e34 (space-separated) was joined into
  continuous hex with hash_size=1, producing four 1-byte hops instead
  of two 2-byte ones.
- path <name> always rendered 1-byte hops because it decoded the
  hash-size mode from upper bits of out_path_len. In meshcore 2.x the
  library already masks out_path_len to the hop count and exposes the
  mode separately in out_path_hash_mode.

Parser now splits on commas, whitespace, or arrow separators and
requires consistent hop length. Display reads out_path_hash_mode and
also shows the byte size, e.g. "D103,5E34 (2 hops, 2B)".
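
A minimal sketch of the separator-based parsing and the display format. The
function names are hypothetical, the accepted arrow spellings ("->" and "→")
are assumptions, and continuous hex without separators is deliberately not
handled here.

```python
import re
from typing import List

_SEPARATORS = re.compile(r"(?:\s|,|->|→)+")
_HEX = re.compile(r"[0-9A-Fa-f]+")


def parse_hops(text: str) -> List[str]:
    """Split a path string into hops and require one consistent hop length."""
    tokens = [t for t in _SEPARATORS.split(text.strip()) if t]
    if not tokens:
        raise ValueError("no hops given")
    if not all(_HEX.fullmatch(t) for t in tokens):
        raise ValueError("hops must be hex")
    hop_len = len(tokens[0])
    if hop_len not in (2, 4, 6):  # 1-, 2-, or 3-byte hops
        raise ValueError("hops must be 1, 2, or 3 bytes long")
    if any(len(t) != hop_len for t in tokens):
        raise ValueError("all hops must have the same length")
    return [t.upper() for t in tokens]


def format_path(hops: List[str]) -> str:
    """Render hops as 'D103,5E34 (2 hops, 2B)'."""
    return f"{','.join(hops)} ({len(hops)} hops, {len(hops[0]) // 2}B)"
```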
2026-06-05 08:29:33 +02:00
MarekWo 3ef1eac0be feat(console): rename to mc-webui, fix change_path, persist transcript
- Rename "meshcli Console" to "mc-webui Console" (modal title + docs).
- Drop redundant "Connected to..." messages; replace intro with a one-line "Type 'help' for available commands." hint.
- Use a teal device-name style so the header label is readable on the dark background.
- Display contact paths with commas (D1,90,05,54) instead of arrows in `contacts` and `path`, matching the standard MeshCore client.
- Fix `change_path`: previously read only args[2] after shlex split, silently writing a 1-byte path. Now joins remaining args, accepts comma/space/continuous-hex, validates hex, auto-deduces hash_size from comma-chunk length (1/2/3-byte hops), and routes through _change_path_async so path_hash_mode is set and the contacts cache is invalidated.
- Update `help` line and add a usage hint for the no-args form.
- Add capped persistent output transcript: GET/POST/DELETE /api/console/output (cap 500 entries). Console restores prior entries (faded) above a divider on open and exposes a trash button to clear it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-05 22:18:46 +02:00
MarekWo e293de2a76 fix(regions): rename tab to Regions and soften v1.14 firmware error
Two small follow-ups after initial deployment.

- Rename the Settings tab 'Channels' -> 'Regions' (id now tabSettingsRegions).
  The tab manages the region registry, not channels; the old label was
  confusing. The per-channel picker still lives under Manage Channels as
  before.
- Graceful handling of firmware rejection: CMD_SET_DEFAULT_FLOOD_SCOPE
  (63) and CMD_GET_DEFAULT_FLOOD_SCOPE (64) were introduced in firmware
  v1.15.0; on v1.14.x the device replies with a generic ERR frame and
  our toast showed the unhelpful 'Firmware error: unknown'. Now the
  device_manager translates the empty/timeout reason into a concrete
  message naming the v1.15 requirement, and the api handler appends
  'Your choice is saved locally' so the user knows the local state
  still persists. Same treatment for the delete-default-region clear
  path.
2026-04-24 11:54:01 +02:00
MarekWo afe0c7cf17 feat(regions): per-channel scope picker + send-flow integration
Fourth slice — the feature is now functional end-to-end from UI to radio.

- Manage Channels modal: each row now has a pin-map button between Mute
  and Share that opens a region picker for that channel; rows show an
  inline badge with the assigned region name.
- Region picker modal (new #regionPickerModal): radio list of regions
  with a "(None) — use firmware default" option at the top. Empty-state
  shows a "Manage Regions" CTA that deep-links to Settings > Channels.
- api.py: two new routes —
  - GET /api/channels/scopes          → bulk map for UI rendering
  - PUT /api/channels/<idx>/scope     → {region_id: int | null} set/clear
- device_manager.send_channel_message: looks up the channel's scope,
  then — under _send_lock — pushes the 16-byte key via CMD 54 before
  the actual send_chan_msg. Channels without a mapping get an all-zero
  key so a previously-set scope doesn't leak across channels (firmware's
  send_scope is sticky until overwritten, not one-shot).
2026-04-24 07:27:33 +02:00
MarekWo 0e38e0ce8c feat(regions): DeviceManager wrappers for flood-scope commands
Second slice of the per-channel region-scope feature — firmware plumbing.
No UI, routes, or send-flow integration yet; those land in PR #3 / #4.

- _send_lock: threading.Lock added to __init__ (consumed in PR #4 to
  serialize the set-scope + send-channel-message pair across Flask
  threads; introduced here to keep the init diff small).
- set_flood_scope_key(key_hex): thin wrapper over the existing
  meshcore-py `set_flood_scope(bytes)` path (CMD 54). None/empty clears
  the volatile scope. Used on the channel-send hot path in PR #4.
- set_default_flood_scope(name, key_hex): hand-rolled CMD 63 frame
  (opcode + 31-byte NUL-padded name + 16-byte key = 48 bytes) via the
  lib's generic send() with [OK, ERROR] wait. Installed meshcore-py
  (<=2.2.15) has no wrapper for this opcode; frame format matches
  MyMesh.cpp lines 1893-1909.
- Deliberately NOT implementing CMD 64 (GET_DEFAULT_FLOOD_SCOPE): the
  library's reader drops RESP_CODE 28 as "unhandled" (reader.py:919-921),
  so there is no Event we can wait for. Until upstream adds support,
  mc-webui treats its own regions.is_default row as the source of truth
  and pushes one-way via CMD 63. Comment in code documents the reason.
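
The CMD 63 frame layout described above can be sketched as follows. The
function name is illustrative, UTF-8 encoding of the name is an assumption,
and whether the firmware expects a terminating NUL inside the 31-byte field
is not covered; only the opcode and the field sizes come from this message.

```python
CMD_SET_DEFAULT_FLOOD_SCOPE = 63
NAME_FIELD_LEN = 31
KEY_LEN = 16


def build_set_default_flood_scope_frame(name: str, key_hex: str) -> bytes:
    """Build opcode + 31-byte NUL-padded name + 16-byte key (48 bytes)."""
    name_bytes = name.encode("utf-8")  # assumption: names are UTF-8
    if len(name_bytes) > NAME_FIELD_LEN:
        raise ValueError("name does not fit in 31 bytes")
    key = bytes.fromhex(key_hex)
    if len(key) != KEY_LEN:
        raise ValueError("key must be exactly 16 bytes")
    frame = (bytes([CMD_SET_DEFAULT_FLOOD_SCOPE])
             + name_bytes.ljust(NAME_FIELD_LEN, b"\x00")
             + key)
    assert len(frame) == 1 + NAME_FIELD_LEN + KEY_LEN  # 48 bytes
    return frame
```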
2026-04-24 07:20:30 +02:00
MarekWo 57a0ca018d fix: treat slots with empty name as empty regardless of secret bytes
Some firmwares return SHA256("")[:16] (e3b0c442...) for an empty
channel slot's secret instead of all zeros. The load path checked only
for the all-zero sentinel, so those slots passed the "valid" branch
and got persisted to the DB with a synthetic 'Channel N' name plus the
bogus secret. The stale rows then leaked into db.get_channels() and
would have supplied wrong keys for pkt_payload computation.

Anchor the decision on name presence: a slot is used iff firmware
returned a non-empty name. Drop the 'Channel {idx}' fallback so we
never invent names for empty slots. The existing end-of-loop cleanup
then removes any phantom rows already in the DB on next connect.
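
A short sketch of why the all-zero check misses these slots and what the
name-based rule looks like. The helper names are hypothetical, and stripping
NULs and whitespace before testing the name is an assumption.

```python
import hashlib
from typing import Optional

# First 16 bytes of SHA-256 over the empty string (starts with e3b0c442).
EMPTY_NAME_SECRET = hashlib.sha256(b"").digest()[:16]


def is_all_zero_secret(secret: bytes) -> bool:
    """The old sentinel check: true only when every byte is zero."""
    return not any(secret)


def slot_is_used(name: Optional[str]) -> bool:
    """The new rule: a slot is used only if the firmware returned a name."""
    # Assumption: padding NULs and whitespace do not count as a name.
    return bool(name and name.strip("\x00 \t\r\n"))
```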

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 20:52:35 +02:00
MarekWo 3dd1c52687 feat: contacts settings tab with suppress + auto-ignore options
Move Manual approval toggle into a new Contacts tab in the global
Settings modal and clean up the Contact Management panel (drop the
duplicated Settings/Manage Contacts headers, shorten the Existing
Contacts blurb). Add two new persisted options gated on Manual
approval being ON: Suppress new advert notifications (frontend hides
FAB badge + browser notification while the Pending list itself stays
populated) and Automatically add new contacts to "Ignored" (advert
handler marks the new contact ignored before emitting pending_contact,
so the user is silenced end-to-end while contacts remain in the cache
for promotion via "To Device").

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 10:01:58 +02:00
MarekWo 77c3ffa5c2 fix: prevent echo mis-correlation for sent channel messages
Pre-compute expected pkt_payloads at send time using channel secret +
timestamp (±3s for clock drift), then match echoes exactly instead of
only checking the 1-byte channel hash. Fixes race condition where an
incoming message's echo on the same channel could be incorrectly
attributed to a just-sent message (wrong Analyzer URL).

Falls back to channel-hash matching when channel secret is unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 21:47:07 +02:00
MarekWo 8f8bd30747 fix: refresh mc.contacts from device on dirty flag to update stale names
Contact names stayed stale indefinitely because mc.contacts (in-memory
dict) was only populated at startup. When a remote node renamed itself,
the device firmware updated its contact list but the app never re-read it.

Now ensure_contacts(follow=True) is called when contacts_dirty is set:
- In _on_advertisement(): refresh before name lookup (incremental via lastmod)
- In get_contacts_with_last_seen(): refresh + DB sync before serving API data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 12:29:45 +02:00
MarekWo bbfca38d34 fix: use adv_lat/adv_lon keys for device coordinates
Device info from meshcore uses adv_lat/adv_lon, not lat/lon.
Fixed in get_param, set_param (lat/lon individually), and the new
/api/device/config endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 19:26:42 +02:00
MarekWo bc1da9e45e fix: get_device_info checked for 'data' attr instead of 'payload'
Event objects use 'payload', not 'data'. This bug was latent because
the cache was always populated during connect — only exposed after
the cache invalidation fix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 14:43:26 +02:00
MarekWo 1e6f8caf03 fix: invalidate self_info cache after set_param
get_device_info() cached SELF_INFO payload in _self_info and never
refreshed it after set operations, so get always returned stale values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 14:40:14 +02:00
MarekWo c3f61ce3f7 fix: get radio returns actual values, implement set radio command
get radio used wrong key names (freq/bw/sf/cr instead of
radio_freq/radio_bw/radio_sf/radio_cr from SELF_INFO payload).

set radio was missing entirely, so it silently fell through to the
custom variable handler. It now parses freq,bw,sf,cr and calls
mc.commands.set_radio().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 14:33:13 +02:00
MarekWo 1a194d5050 fix: implement get advert_loc_policy console command
The set command was implemented but get was missing, causing
"Unknown param" error. Reads adv_loc_policy from device info.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 14:10:20 +02:00
MarekWo 6c02220719 fix: skip empty channel slots during sync, clean up stale DB channels
Empty device channel slots have all-zero secrets (32 hex chars) which
passed the length check and got persisted to DB as "Channel N". This
caused ghost channels (e.g. Channel 14) to appear in unread counts
while the sidebar correctly showed only real channels.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 11:08:38 +02:00
MarekWo c36d7b5fbf fix(ble): simplify reconnection — rely on container restart for clean state
In-container BLE reconnection is unreliable because bleak leaves stale
GATT notification handles after abnormal disconnect, and adapter power-
cycling from within Docker corrupts bleak's internal BlueZ manager state.

New approach:
- On BLE disconnect or keepalive failure, immediately mark as permanently
  failed (no in-container reconnect attempts)
- Health endpoint returns 503, Docker healthcheck triggers container restart
- Docker entrypoint script disconnects stale BLE connections before app
  starts, ensuring clean GATT state for bleak

This is reliable because:
- MeshCore.create_ble(address=...) works on fresh container starts
- The BlueZ daemon on the host maintains adapter state correctly
- Container restart is fast (~5s) and gives a truly clean BLE state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 16:39:03 +02:00
MarekWo 53063f199a fix(ble): connect via BlueZ D-Bus instead of bleak direct connect
bleak inside Docker cannot initiate new BLE connections — it can only
take over connections already established by BlueZ.  Replace the
force-disconnect approach with a connect-via-BlueZ approach:

1. _ble_ensure_connected() connects the device via BlueZ D-Bus
   (Device1.Connect) before bleak tries to take over
2. BleakScanner.find_device_by_address() provides the BLEDevice
   object that bleak 3.x needs (raw MAC address doesn't work)
3. MeshCore.create_ble(device=...) takes over the BlueZ connection

On reconnect after disconnect:
1. Power-cycle adapter clears stale GATT notification handles
2. BlueZ re-connects the trusted device automatically
3. bleak takes over the re-established connection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 14:13:06 +02:00
MarekWo 9c692fac8b fix(ble): use BleakScanner to find device before connecting
In bleak 3.x, BleakClient(address_string) can't find paired BLE
devices that aren't actively advertising.  This caused
BleakDeviceNotFoundError or 30-second connection timeouts.

Fix: pre-scan via BleakScanner.find_device_by_address() which queries
BlueZ's D-Bus object tree directly, then pass the BLEDevice object to
MeshCore.create_ble(device=...) instead of the raw MAC address.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 14:10:47 +02:00
MarekWo a92b505975 fix(ble): untrust device during connect to prevent BlueZ auto-reconnect
BlueZ auto-reconnects trusted BLE devices, which races with bleak's
connect and causes 'failed to discover services' or 'Notify acquired'.
Now we temporarily untrust the device before connecting (to prevent
BlueZ from auto-reconnecting during the handoff), then re-trust it
after bleak has established its GATT session.

Also adds _ble_retrust() helper to re-trust the device in a finally
block, ensuring the bond is maintained even on connection failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 14:06:18 +02:00
MarekWo 1de98433d4 fix(ble): add adapter power-cycle to startup retry loop
On startup, _connect_with_retry also needs adapter power-cycling every
3rd failed attempt to clear stale GATT state from previous sessions.
Without this, the container can fail all 10 startup retries when BlueZ
holds stale notification handles.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 13:40:53 +02:00
MarekWo f352ccd968 fix(ble): add keepalive and robust reconnection for BLE zombie connections
BLE connections can enter a "zombie" state where notifications (reads) still
arrive but writes silently fail.  This went undetected until the user tried
to send a message, at which point the connection was already dead.

Additionally, after an abnormal BLE disconnect, BlueZ retains stale GATT
notification handles, causing reconnection to fail with
"[org.bluez.Error.NotPermitted] Notify acquired".

Changes:
- Add BLE keepalive loop (60s interval) that sends get_bat() to detect
  zombie connections proactively and trigger reconnection automatically
- Add adapter power-cycle (hci0 off/on via D-Bus) during BLE reconnection
  to clear stale GATT notification state
- Dedicated _ble_reconnect() with 5 attempts + adapter reset between each
- Health endpoint returns 503 when BLE permanently fails, triggering
  Docker container restart via healthcheck
- Guard against concurrent reconnection attempts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 13:37:33 +02:00
MarekWo f6c9c65a51 fix(channels): refresh channel secret cache after join/create
After set_channel(), read back the actual secret from the device and
update both _channel_secrets in-memory cache and the DB. This fixes
newly-joined # channels (where firmware auto-generates the key) having
no repeater info, missing Analyzer URLs, and incorrect route data until
container restart.

Also clean up _channel_secrets on channel removal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 21:00:34 +02:00
MarekWo 29e5e6982d fix(chat): prevent poll-triggered reload after send by using server timestamp
The 60s checkForUpdates poll was detecting has_updates due to clock skew
between client and server timestamps. Now the send API returns the server
timestamp, and the frontend uses it for markChannelAsRead — ensuring the
poll sees no updates for own sent messages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 10:28:54 +02:00
MarekWo 695321c0c9 fix(dm): show delivery info immediately on ACK/failure without reopen
_confirm_delivery() now saves retry context (attempt, max_attempts,
path) and emits dm_delivered_info so the frontend shows delivery
details instantly. Similarly, dm_retry_failed now includes attempt
count so the failure state shows how many attempts were made.

Previously this info was only available after reloading messages
from DB (closing and reopening the conversation).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 09:58:41 +02:00
MarekWo 2368ec656e feat(path_hash_mode): fix DM route display and delivery path segmentation
Stage 4 of path_hash_mode support. DM delivery paths now carry hash_size
through the entire pipeline: retry context → ACK handler → SocketIO
emission → frontend rendering. All hardcoded 2-char hex segmentation
removed from dm.js.

Backend changes (device_manager.py):
- Track path_hash_size alongside path_desc in DM retry context
- Update path_hash_size on path rotation and flood fallback
- Add hash_size to all 4 dm_delivered_info SocketIO emissions
- Derive hash_size from PATH event path_len for discovered paths

Frontend changes (dm.js):
- Add segmentHexPath() utility (shared by all 3 route functions)
- formatDmRoute(), buildDmRouteHtml(), showDmRoutePopup() accept hashSize
- All call sites pass hash_size from event data or message context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 13:11:00 +02:00
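
The segmentation that the commit above moves into segmentHexPath() can be illustrated with a short Python analogue. The real helper lives in dm.js and its exact signature is not shown here, so `segment_hex_path` and its handling of a trailing partial fragment are assumptions. The idea is only that each hop occupies `hash_size` bytes, i.e. `2 * hash_size` hex characters.

```python
from __future__ import annotations


def segment_hex_path(hex_path: str, hash_size: int = 1) -> list[str]:
    """Split a hex path string into per-hop segments (illustrative sketch)."""
    if hash_size < 1:
        raise ValueError("hash_size must be at least 1")
    step = 2 * hash_size  # two hex characters per byte
    # A trailing fragment shorter than `step` is kept as-is rather than dropped.
    return [hex_path[i:i + step] for i in range(0, len(hex_path), step)]
```

With `hash_size=2`, the path "44D8D103" splits into "44D8" and "D103"; with the default of 1 it splits into four 2-character segments.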
MarekWo e8f271f4ef feat(path_hash_mode): add hop_count and path_hash_size to API responses
Stage 2 of path_hash_mode support. All API endpoints and SocketIO
emissions now include decoded hop_count and path_hash_size fields
alongside the raw path_len, so the frontend can display and segment
paths correctly for any hash mode.

Changes:
- Import decode_path_len in api.py
- GET /api/messages: add hop_count, path_hash_size, echo_hash_sizes
- GET /api/messages/<id>/meta: add hop_count, path_hash_size, echo_hash_sizes
- GET /api/dm/messages: add hop_count, path_hash_size
- SocketIO new_message emission: add hop_count, path_hash_size

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 10:00:03 +02:00
MarekWo 719e11e868 feat(path_hash_mode): add decode_path_len and fix RX_LOG_DATA parsing
Stage 1 of path_hash_mode support. The critical bug in _on_rx_log_data
treated the raw path_len byte as a direct byte count, which breaks with
mode>0 (e.g. mode=1, 0 hops → path_len=0x40=64, reading 64 bytes of
non-existent path data). Now properly decodes the encoded path_len byte
into hop_count, hash_size, and path_byte_len.

Changes:
- Add decode_path_len() utility for MeshCore v1.14+ path_len encoding
- Fix _on_rx_log_data binary parsing to use decoded path length
- Pass hash_size through _process_echo → DB insert → SocketIO emission
- Add hash_size column to echoes table (schema + migration)
- Update insert_echo() to store hash_size (default 1 for backward compat)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 09:47:20 +02:00
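
The decoding described in the commit above can be sketched as follows. The bit layout is an assumption inferred from the example in the message (mode=1 with 0 hops gives 0x40): the top two bits are taken as path_hash_mode and the low six bits as the hop count. The `PathLen` return type is hypothetical, and the project's actual decode_path_len() may differ in signature and validation.

```python
from typing import NamedTuple


class PathLen(NamedTuple):
    hop_count: int
    hash_size: int
    path_byte_len: int


def decode_path_len(path_len: int) -> PathLen:
    """Decode an encoded path_len byte (assumed layout, see note above)."""
    if not 0 <= path_len <= 0xFF:
        raise ValueError("path_len must fit in one byte")
    mode = path_len >> 6         # assumed: top 2 bits carry path_hash_mode
    hop_count = path_len & 0x3F  # assumed: low 6 bits carry the hop count
    hash_size = mode + 1         # mode 0 -> 1-byte hashes, mode 1 -> 2-byte
    # Number of path bytes that actually follow in the packet.
    return PathLen(hop_count, hash_size, hop_count * hash_size)
```

Under this layout, the byte 0x40 decodes to zero hops with 2-byte hashes and zero path bytes, instead of being misread as 64 bytes of path data.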
MarekWo 10c232fc7d fix(ble): force-disconnect stale BlueZ connection before connecting
BlueZ auto-reconnects trusted BLE devices after container restart,
blocking bleak from establishing a new GATT session. Clear the stale
connection via D-Bus before each connect attempt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 19:42:34 +02:00
MarekWo 9f335794e4 fix(ble): update runtime device name on every connect
BLE connections with retries can take >60s, exceeding the startup
wait timeout. Move runtime_config.set_device_name() into _connect()
so the navbar shows the correct name regardless of connection delay.
Also fixes name update on reconnections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 19:24:45 +02:00
MarekWo 147a12c8f5 fix(dm): persist delivery_status='delivered' on ACK receipt
DM delivery status was lost when switching conversations because
_confirm_delivery() only stored the ACK record and emitted a socket
event, but never set delivery_status='delivered' in direct_messages.

During retries, each attempt generates a new ACK code. The DM record
stores the initial expected_ack, but the actual ACK may arrive for a
later retry's code. The ACK lookup by expected_ack then fails to match.

Now _confirm_delivery() also sets delivery_status='delivered', and
message loading checks this DB field first (like it already did for
'failed'), so delivery persists across page navigations.

Also fixed 213 existing DMs on server via data migration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 14:49:37 +02:00
MarekWo b18c0145dd docs(ble): add pairing guide, remove unused MC_BLE_PIN config
MC_BLE_PIN was non-functional — bleak in Docker cannot perform
interactive pairing (no BlueZ agent). Pairing must be done on
the host before starting mc-webui. Added comprehensive pairing
guide at docs/meshcore_bluetooth_pairing.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 13:56:45 +02:00
MarekWo 710f69c350 feat: add BLE transport support for companion devices
Integrate meshcore library's BLE connection (via bleak) as a third
transport option alongside serial and TCP. Priority: BLE > TCP > Serial.

Config: MC_BLE_ADDRESS and MC_BLE_PIN environment variables.
Docker: bluez/dbus packages, NET_ADMIN cap, D-Bus socket mount.
UI: transport type badge in navbar, transport_type in /api/status.
Watchdog: skip USB reset for BLE connections (same as TCP).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 10:03:45 +02:00
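
The transport priority stated in the commit above (BLE > TCP > Serial) amounts to a simple first-match selection. In the sketch below, `select_transport` and its parameter names are hypothetical. Only MC_BLE_ADDRESS is named in the commit message, so the TCP and serial settings are passed in as plain arguments rather than read from guessed environment variables.

```python
from typing import Optional


def select_transport(ble_address: Optional[str],
                     tcp_host: Optional[str],
                     serial_port: Optional[str]) -> str:
    """Pick the first configured transport in BLE > TCP > Serial order."""
    if ble_address:      # e.g. the value of MC_BLE_ADDRESS
        return "ble"
    if tcp_host:         # hypothetical TCP setting
        return "tcp"
    if serial_port:      # hypothetical serial setting
        return "serial"
    raise ValueError("no transport configured")
```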
MarekWo 701f6f1197 fix(dm): refresh mc.contacts from device on PATH_UPDATE event
The Contact Info dialog showed stale path data (e.g. "Flood" instead of
the discovered route) because auto_update_contacts is OFF and PATH_UPDATE
only sets _contacts_dirty=True without refreshing mc.contacts. The API
then served stale in-memory data even after cache invalidation.

Now ensure_contacts(follow=True) is called on PATH_UPDATE to read fresh
contact data from the device before invalidating cache and emitting the
socket event. PATH_UPDATE events are rare (only on path discovery), so
the serial I/O cost is acceptable, unlike for advertisements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 17:52:41 +01:00
MarekWo 0b3bd1da60 fix(dm): delayed path backfill for FLOOD-delivered messages
When FLOOD delivery is confirmed, the PATH_UPDATE event payload often
has empty path data because firmware updates the contact's out_path
asynchronously. After a 3s delay, read the contact's updated path from
the meshcore library's in-memory contacts dict and backfill the DB.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 15:23:35 +01:00