Operators can opt a node out of the public-facing surfaces by putting an
opt-out emoji in the node name. Hidden nodes are removed from the map,
neighbor edges/lists, search, and their own detail page (server-side
ClickHouse filters), and chat messages from a hidden sender are dropped
client-side after decryption. Matching is keyed on the base codepoint, so the
variation-selector form (⛔️) is caught too.
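A minimal Go sketch of the name check, assuming ⛔ (U+26D4) is the opt-out emoji. The real filtering lives in ClickHouse queries and client code, and the actual emoji set may differ; `optOutEmoji` and `isHiddenNodeName` are hypothetical names. Matching the base codepoint is enough to catch the variation-selector form, because that form is just the base codepoint followed by U+FE0F.

```go
package main

import "strings"

// optOutEmoji holds the base codepoints treated as opt-out markers.
// Assumption: only U+26D4 (⛔); the real list may differ.
var optOutEmoji = []rune{'\u26D4'}

// isHiddenNodeName reports whether a node name contains an opt-out emoji.
// Because it matches the base codepoint, "⛔" and "⛔\uFE0F" (the same emoji
// followed by the emoji-presentation variation selector) both match.
func isHiddenNodeName(name string) bool {
	for _, r := range optOutEmoji {
		if strings.ContainsRune(name, r) {
			return true
		}
	}
	return false
}
```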
Co-authored-by: Alex Vanderpot <alex@Alexs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* ingest: batch ClickHouse inserts to stop MQTT flapping & packet loss
The meshcore handler did a synchronous per-message ClickHouse insert on
paho's single inbound goroutine. At ~86ms/insert (single-row inserts +
async_insert wait + materialized views) the goroutine couldn't keep up
with the high-volume letsmesh feed, so it stalled past PingTimeout and
paho declared "pingresp not received" and reconnected — ~847 cycles in
19.5h, ~45% downtime, ~50% of letsmesh packets lost. The low-volume
davekeogh broker never saturated the goroutine and was unaffected.
Decouple receipt from insertion: the handler now enqueues decoded rows
onto a buffered channel and a single background writer flushes them to
meshcore_packets in batched native inserts (every
MESHCORE_BATCH_FLUSH_SECONDS or MESHCORE_BATCH_MAX_ROWS rows). The inbound goroutine never
blocks, so PINGRESP is always processed in time.
- New batch writer with env-configurable flush interval / max rows /
buffer size (MESHCORE_BATCH_*), wired in docker-compose.
- Drop server-side async_insert (redundant once we batch app-side).
- Bump PingTimeout 10s -> 20s (env MQTT_PING_TIMEOUT_SECONDS) for margin
against Cloudflare WebSocket buffering jitter.
- Enqueue is non-blocking; rows are dropped+counted only if the buffer
fills (ClickHouse unavailable). A failed batch is dropped and retried
by the next flush (native blocks commit atomically).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* ingest: make MQTT KeepAlive configurable (MQTT_KEEPALIVE_SECONDS)
Because the client is a near-silent subscriber, paho emits a PINGREQ roughly
every KeepAlive seconds; lowering KeepAlive sends client->server frames more often to keep the
Cloudflare-proxied WebSocket path warm in both directions, a lever for the
residual mid-stream "pingresp not received" stalls on the letsmesh broker.
Default unchanged (30s); wired through docker-compose.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* ingest: add configurable MQTT write timeout (MQTT_WRITE_TIMEOUT_SECONDS)
Bounds PINGREQ/SUBSCRIBE writes so a stalled write through the Cloudflare
WebSocket proxy can't hang the client. Default 0 (paho's existing no-timeout
behavior); wired through docker-compose. Recommended ~20s when behind a
buffering reverse proxy.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Alex Vanderpot <alex@Alexs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the Location History coordinate table with an embedded Leaflet
map (new LocationHistoryMap component): one circle marker per position
with timestamp/coord popups, the most-recent point highlighted, and a
subtle polyline connecting points chronologically, auto-fit to bounds.
Cap Recent Adverts at the 5 most recent with a "Show more"/"Show less"
toggle and a count in the subtitle.
Co-authored-by: Alex Vanderpot <alex@Alexs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The map page slowed the browser at high node counts (~5k) because every
marker eagerly rendered its popover to an HTML string at creation, and the
visual-update effects re-rendered every marker's icon and popup on each
selection change. Hovering a meshcore node (the default type) re-rendered
all markers.
Bind popups lazily so PopupContent is only rendered when a popup actually
opens, drop the now-unnecessary popup setContent calls, and re-skin only the
markers whose selected state changed instead of the whole set.
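The "re-skin only what changed" step comes down to a set difference between the previous and next selection. The map page itself is TypeScript/Leaflet; this Go sketch only illustrates the diffing idea, and `changedMarkers` plus the string marker IDs are hypothetical.

```go
package main

import "sort"

// changedMarkers returns the IDs whose selected state differs between the
// previous and next selection sets. Only these markers need their icon
// re-skinned; every other marker is left untouched.
func changedMarkers(prev, next map[string]struct{}) []string {
	var out []string
	for id := range prev {
		if _, ok := next[id]; !ok {
			out = append(out, id) // deselected
		}
	}
	for id := range next {
		if _, ok := prev[id]; !ok {
			out = append(out, id) // newly selected
		}
	}
	sort.Strings(out) // deterministic order for callers and tests
	return out
}
```

With ~5k markers and a one-node selection change, this touches at most two markers instead of all of them.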
Co-authored-by: Alex Vanderpot <alex@Alexs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the per-request argMax/GROUP-BY views with insert-triggered
(incremental) materialized views so map node positions, node search, and
public-channel chat read pre-aggregated state instead of re-scanning all of
meshcore_packets on every query.
- 005: meshcore_adverts_latest_state (AggregatingMergeTree of argMaxState/
minState/maxState) + incremental MV + backfill; meshcore_adverts_latest becomes a
-Merge view with the identical column contract. Node search reads it directly;
map (unified_latest_nodeinfo) is unchanged.
- 006: meshcore_public_channel_messages_raw, a decoded payload_type=5 MergeTree
keyed (channel_hash, ingest_timestamp); chat dedups by message_id at read time
over a timestamp-bounded scan. Streaming/pagination push channel+cursor onto
the primary key.
- Neighbor-edge MVs stay hourly REFRESH (they read the preserved view).
Verified against full prod data (14.5M rows): exact parity (0 mismatches) and
5-9x faster reads with no regressions.
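Why the pre-aggregated state gives the same answer as a full scan: each insert contributes a partial state, and merging partial states is order-independent. ClickHouse's -State/-Merge combinators do this internally; this Go sketch only models the semantics for one node, with hypothetical field names and assuming distinct timestamps (argMax tie-breaking is not modeled).

```go
package main

// advertState models one partial aggregate: argMaxState(name, ts) together
// with minState(ts) and maxState(ts). Field names are hypothetical.
type advertState struct {
	Name      string // value carried by the latest timestamp (argMax)
	FirstSeen int64  // minState(ts)
	LastSeen  int64  // maxState(ts), also the argMax key
}

// fromRow builds the state a single inserted advert contributes.
func fromRow(name string, ts int64) advertState {
	return advertState{Name: name, FirstSeen: ts, LastSeen: ts}
}

// merge combines two partial states. It is associative and commutative for
// distinct timestamps, so background part merges in any order converge to
// the same result a full GROUP BY over meshcore_packets would produce.
func merge(a, b advertState) advertState {
	out := a
	if b.LastSeen > a.LastSeen {
		out.Name = b.Name
		out.LastSeen = b.LastSeen
	}
	if b.FirstSeen < a.FirstSeen {
		out.FirstSeen = b.FirstSeen
	}
	return out
}
```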
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the hand-curated region groups with data-driven ones and make the
region_groups ClickHouse table the single source of truth.
- scripts/generate-region-groups.ts: offline generator — clusters regions by
cross-region packet co-occurrence (min-share single-linkage) at two levels
(broad "region" + tight "metro"), names clusters via `claude -p`, reconciles
codes by member overlap so permalinks stay stable, and emits the region_groups
seed. Migration 004 reseeded with the resulting 39 groups.
- Groups are DB-sourced: getRegionGroups() (cached) feeds /api/regions and the
dropdown/labels; filtering resolves a selector in SQL to a region
(region = 'X') or a group (region IN / hasAny ... SELECT region_code FROM
region_groups WHERE group_code = ...). No hardcoded membership in TS;
resolveSelector removed.
- Drop the TS<->SQL parity script (no membership left to sync); regionSql and
the migration ALIAS are kept in sync by hand.
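A sketch of single-linkage clustering with union-find, under an assumed reading of "min-share": two regions are linked when the packets seen in both, divided by the smaller region's packet count, reach a threshold. That definition, `coOccurrence`, and `clusterRegions` are assumptions; cluster naming via `claude -p` and permalink reconciliation are not modeled.

```go
package main

import "sort"

// coOccurrence counts packets observed in both region A and region B.
type coOccurrence struct {
	A, B   string
	Shared int
}

// clusterRegions links two regions when Shared / min(count(A), count(B)) >= minShare
// and returns the connected components (single linkage), each sorted, in sorted order.
func clusterRegions(counts map[string]int, pairs []coOccurrence, minShare float64) [][]string {
	parent := make(map[string]string, len(counts))
	for r := range counts {
		parent[r] = r
	}
	var find func(string) string
	find = func(x string) string {
		if parent[x] != x {
			parent[x] = find(parent[x]) // path compression
		}
		return parent[x]
	}
	for _, p := range pairs {
		ca, okA := counts[p.A]
		cb, okB := counts[p.B]
		if !okA || !okB {
			continue // ignore pairs that mention unknown regions
		}
		denom := ca
		if cb < denom {
			denom = cb
		}
		if denom > 0 && float64(p.Shared)/float64(denom) >= minShare {
			parent[find(p.A)] = find(p.B)
		}
	}
	byRoot := map[string][]string{}
	for r := range counts {
		root := find(r)
		byRoot[root] = append(byRoot[root], r)
	}
	out := make([][]string, 0, len(byRoot))
	for _, g := range byRoot {
		sort.Strings(g)
		out = append(out, g)
	}
	sort.Slice(out, func(i, j int) bool { return out[i][0] < out[j][0] })
	return out
}
```

Running it twice, with a lower threshold for the broad "region" level and a higher one for "metro", yields the two nested levels.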
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace hardcoded (broker, topic) region slugs with uppercase IATA codes
derived from the meshcore/{IATA} base topic, discovered dynamically from
data (adding a region needs no code change). Adds region groups, Grafana
region/group filtering, and fixes the neighbor graph.
- regions.ts: single source of truth — regionFromTopic / normalizeRegion /
regionSql / resolveSelector / selectorLabel. Legacy slugs (seattle->SEA)
and bare meshcore + meshcore/salish -> SEA still resolve.
- regionGroups.ts + seeded region_groups table: PNW/CAL/DEU/POL.
- migration 004: region ALIAS column on meshcore_packets; 001 views expose
region / regions[]; reworked neighbor MV (region-scoped, no cross-region
edges, drops implausible >150km and (0,0) edges); scheduled meshcore_regions MV.
- API/streaming/actions resolve selectors; stream routes drop the hardcoded
region allow-lists; map node query excludes (0,0) sentinel nodes.
- Dynamic region/group dropdowns (useRegions/RegionSelect); /api/regions.
- Grafana: cascading $region / $region_group template vars + panel filters.
- region-parity.ts (npm run check:regions) guards TS<->SQL drift.
- nix dev shell (flake.nix, Node 24).
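The topic-to-region rule can be sketched as below. The real helpers live in regions.ts (TypeScript); this Go port is illustrative only. The `legacySlugs` table is a partial, hypothetical subset, and treating any segments after meshcore/{IATA} as irrelevant is an assumption about the topic layout.

```go
package main

import "strings"

// legacySlugs maps old region slugs to IATA codes (partial, illustrative).
var legacySlugs = map[string]string{"seattle": "SEA", "salish": "SEA"}

// normalizeRegion maps a legacy slug to its IATA code, otherwise uppercases.
func normalizeRegion(s string) string {
	if code, ok := legacySlugs[strings.ToLower(s)]; ok {
		return code
	}
	return strings.ToUpper(s)
}

// regionFromTopic extracts the region from a meshcore/{IATA}/... topic.
// A bare "meshcore" topic resolves to SEA for backward compatibility.
func regionFromTopic(topic string) (string, bool) {
	parts := strings.Split(strings.Trim(topic, "/"), "/")
	if parts[0] != "meshcore" {
		return "", false
	}
	if len(parts) == 1 {
		return "SEA", true
	}
	if parts[1] == "" {
		return "", false // malformed topic such as "meshcore//x"
	}
	return normalizeRegion(parts[1]), true
}
```

Because the region comes from the topic itself, a new meshcore/{IATA} feed shows up as a new region without a code change.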
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The letsmesh broker was migrated behind Cloudflare and changed its topic
layout on 2026-06-02, which left prod's MQTT client in a zombie state:
connected per paho's IsConnected() (so the 30s monitor never rebuilt it) but
receiving zero messages, because the subscription was established only once
after the initial connect and never re-applied on paho auto-reconnects. Result:
12 days of silently missing letsmesh ingestion while davekeogh masked the loss.
Make reconnection robust instead of relying on broker-side session persistence:
- Subscribe inside the OnConnect handler so every (re)connect — including paho
auto-reconnects — restores delivery. Use CleanSession(true)+ResumeSubs(false)
so we never depend on the broker remembering our session.
- Add a per-broker data-staleness watchdog: a broker that reports connected but
delivers no messages for MQTT_STALE_AFTER_SECONDS (default 300) is treated as a
zombie and force-rebuilt (disconnect + fresh connect/subscribe). This catches
exactly the failure IsConnected() misses.
- Reduce the external monitor to that watchdog role; transient drops are left to
paho auto-reconnect rather than racing it with a brand-new client.
- Stable per-broker client IDs (by index) and pre-sized MQTTClients slice so
indices stay aligned when an earlier broker fails; guard BrokerStatus/lastActivity
with a mutex; promote connect/subscribe logs to Info for visibility.
Adds unit tests for the watchdog and env parsing; documents the new env var.
Co-authored-by: Alex Vanderpot <alex@Alexs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes a ClickHouse 25.x global memory-tracker drift in which the tracker pinned
at the max-memory cap (while actual RSS stayed far below it), causing the
OvercommitTracker to kill every
query (map/stats/neighbors all 500ing). Deployed in-place on prod over the
existing data dir after a cold backup.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The /api/chat endpoint queried the meshcore_public_channel_messages
VIEW, which does GROUP BY payload over all payload_type=5 packets.
Filtering its output on ingest_timestamp/channel_hash can't push below
the GROUP BY (they're max(...)/derived-from-grouped-payload), so every
call re-aggregated the entire history (~8M rows / 1.2 GiB / ~700ms),
ignoring the ingest_timestamp primary key.
Replace the view reference with an inline subquery
(publicChannelMessagesSubquery) that pushes the time/channel filters
into the inner meshcore_packets scan, so partition + primary-key
pruning applies. Region filtering stays on the outer query since
origin_path_info only exists post-aggregation. Same change to the chat
streaming poller.
Verified on prod: identical output, 8.06M->114K rows read, ~700ms->28ms.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Point the seattle region at the letsmesh broker (wss://mqtt-us-v1.letsmesh.net:443,
topic meshcore/SEA) where Seattle traffic now lives.
- Fix a pre-existing bug in the path-edge extraction: `path` is a hex string of
1-byte hop prefixes, so use substring(path, 2*i-1, 2) instead of
hex(substring(path, i, 1)) (which re-hexed a single hex char and never matched
the 2-char repeater prefixes -> path edges were always empty). Seattle now yields
path edges again.
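The corrected slicing, expressed in Go for illustration. The real extraction is a ClickHouse query; this sketch assumes path edges pair consecutive hop prefixes, and `hopPrefixes`/`pathEdges` are hypothetical names.

```go
package main

// hopPrefixes splits a hex-encoded path into 1-byte hop prefixes (2 hex chars each).
// Hop i (1-based) is path[2*(i-1):2*i], the 0-based equivalent of ClickHouse's
// substring(path, 2*i-1, 2). A trailing odd character is ignored.
func hopPrefixes(path string) []string {
	n := len(path) / 2
	out := make([]string, 0, n)
	for i := 1; i <= n; i++ {
		out = append(out, path[2*(i-1):2*i])
	}
	return out
}

// pathEdges pairs consecutive hop prefixes into edges.
func pathEdges(path string) [][2]string {
	hops := hopPrefixes(path)
	var edges [][2]string
	for i := 1; i < len(hops); i++ {
		edges = append(edges, [2]string{hops[i-1], hops[i]})
	}
	return edges
}
```

The old expression took one character per hop and hex-encoded it again, producing values like "61" for "a", which can never equal a 2-char repeater prefix such as "a1".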
Verified on a full prod snapshot: the MV-backed "show all neighbors" query drops
from ~1.6s / 145M rows / 11.8 GiB to ~1ms / 108 rows / 3.8 KiB.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The two slow neighbor queries now read precomputed tables maintained by
refreshable materialized views (REFRESH EVERY 1 HOUR) instead of re-aggregating
meshcore_packets per request:
- meshcore_all_neighbor_edges: the global per-region edge graph (direct path_len=0
adverts + repeater-prefix path edges) with endpoint details. getAllNodeNeighbors
now filters it by region + bbox + lastSeen + has_location.
- meshcore_node_direct_neighbors: per-node direct adjacency (both directions) with
neighbor details. getMeshcoreNodeNeighbors now filters it by node_public_key.
Also add the meshcore/SEA topic to the seattle region. Validated on a clean local
stack: migration 001->003 applies, both refreshable MVs create + refresh + populate.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The readonly profile's max_rows_to_read / max_bytes_to_read (500MB) limits are exceeded by
the map/stats views, which scan the full (growing) meshcore_packets table -> the web
app failed with TOO_MANY_BYTES. Remove the read-size caps; readonly=1, allow_ddl=0,
max_memory_usage and max_execution_time remain the guardrails.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ClickHouse's internal diagnostics grew unbounded (text_log at Trace level and the
1s query profiler -> trace_log accumulated ~160G over months). Add short TTLs to
all system *_log tables, cap text_log at warning level, and disable the query
profiler in both profiles so trace_log stays empty.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pin the bundled service images to known versions for reproducible releases and
safe in-place reuse of an existing data dir (matching the production deployment).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add the MeshCore dashboard (exported from prod) as a provisioned file
dashboard, with a file provider config. Pin the ClickHouse datasource
uid to "clickhouse" so the dashboard's panel datasource references
resolve at provision time.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bundle Grafana (127.0.0.1:3000) with the grafana-clickhouse-datasource plugin
and an auto-provisioned ClickHouse datasource using the read-only user. Adds
GRAFANA_ADMIN_PASSWORD to .env.example. Verified: datasource health returns OK.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Module is now github.com/ajvpot/meshexplorer/ingest (the code lives under
ingest/ in the meshexplorer repo), updated from the old standalone
clickhouse-meshingest path. build/vet/test pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Add a deploy-focused root README; update the web app README (meshcore-only,
point Docker usage at the unified root compose).
- Fix the migration runner: set the goose clickhouse dialect (it defaulted to
postgres and failed to create its version table). Migrations now apply cleanly.
- Remove the unused meshcore decrypt UDF (meshcore_try_decrypt was never called
by any view/query/code) and simplify the ClickHouse image to a single stage.
Verified end-to-end: `docker compose up` brings up clickhouse -> migrate ->
meshcoreingest + meshexplorer; live ingestion from the real MQTT brokers lands
packets in ClickHouse and the web API serves decoded meshcore nodes via the
readonly user.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Single root compose brings up the whole stack on one internal network:
clickhouse (healthchecked) -> migrate (one-shot) -> meshcoreingest + meshexplorer,
with the discord-bot behind a "bot" profile. Web app/bot connect as the readonly
ClickHouse user; ingest/migrate use the default user. Named volume replaces the
host /tank path. .env.example documents every variable with placeholders; root
.gitignore keeps real .env out of git. Drops the per-project compose files.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Meshtastic support was a UI filter only (there was no Meshtastic data backend). Drop it as a
node type/option, and simplify the map marker/cluster/popup rendering now that
every node is meshcore. Update product copy to MeshCore-only. The nodeTypes
query plumbing stays (the unified view's type is always 'meshcore').
Production build passes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Vendor the ingest service under ingest/ and move the web app under meshexplorer/.
The ingest builds the meshcoreingest daemon and the goose migration runner,
applies the meshcore ClickHouse schema (packets, adverts, unified node view),
and loads its MQTT broker list and ClickHouse settings entirely from environment
variables (MQTT_BROKERS as a JSON array, CLICKHOUSE_*). No credentials are baked
into the source.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>