Commit Graph

144 Commits

Author SHA1 Message Date
Alex Vanderpot c7bfce1268 Add node privacy: hide nodes whose name contains 🛑 🚫 (#44)
Operators can opt a node out of the public-facing surfaces by putting an
opt-out emoji in the node name. Hidden nodes are removed from the map,
neighbor edges/lists, search, and their own detail page (server-side
ClickHouse filters), and chat messages from a hidden sender are dropped
client-side after decryption. Matching keys on the base codepoint so the
variation-selector form (the emoji followed by U+FE0F) is caught too.
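
A minimal sketch of the base-codepoint match described above, written in Python for brevity and not taken from the project's code. The names `OPT_OUT_CODEPOINTS` and `is_hidden` are hypothetical, and the variation selector is assumed to be U+FE0F.

```python
# Hypothetical illustration: the real filter lives in ClickHouse queries and
# client-side code that this log does not show.
OPT_OUT_CODEPOINTS = {"\U0001F6D1", "\U0001F6AB"}  # base codepoints of 🛑 and 🚫


def is_hidden(node_name: str) -> bool:
    """Return True if the name contains an opt-out emoji.

    Iterating over a str yields one codepoint at a time, so a trailing
    variation selector (assumed U+FE0F) is a separate element and does not
    prevent the base codepoint from matching.
    """
    return any(ch in OPT_OUT_CODEPOINTS for ch in node_name)
```

Matching on the base codepoint avoids listing every emoji-plus-selector combination as a separate pattern.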

Co-authored-by: Alex Vanderpot <alex@Alexs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 00:06:07 -04:00
Alex Vanderpot 7cea182c6d ingest: batch ClickHouse inserts to stop MQTT flapping & packet loss (#41)
* ingest: batch ClickHouse inserts to stop MQTT flapping & packet loss

The meshcore handler did a synchronous per-message ClickHouse insert on
paho's single inbound goroutine. At ~86ms/insert (single-row inserts +
async_insert wait + materialized views) the goroutine couldn't keep up
with the high-volume letsmesh feed, so it stalled past PingTimeout and
paho declared "pingresp not received" and reconnected — ~847 cycles in
19.5h, ~45% downtime, ~50% of letsmesh packets lost. The low-volume
davekeogh broker never saturated the goroutine and was unaffected.

Decouple receipt from insertion: the handler now enqueues decoded rows
onto a buffered channel and a single background writer flushes them to
meshcore_packets in batched native inserts (every
MESHCORE_BATCH_FLUSH_SECONDS or MESHCORE_BATCH_MAX_ROWS rows). The inbound
goroutine never blocks, so PINGRESP is always processed in time.

- New batch writer with env-configurable flush interval / max rows /
  buffer size (MESHCORE_BATCH_*), wired in docker-compose.
- Drop server-side async_insert (redundant once we batch app-side).
- Bump PingTimeout 10s -> 20s (env MQTT_PING_TIMEOUT_SECONDS) for margin
  against Cloudflare WebSocket buffering jitter.
- Enqueue is non-blocking; rows are dropped+counted only if the buffer
  fills (ClickHouse unavailable). A failed batch is dropped and retried
  by the next flush (native blocks commit atomically).
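
The decoupling described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the Go ingest code: `BatchWriter`, `insert_batch`, `max_rows`, and `buffer_size` are hypothetical names standing in for the MESHCORE_BATCH_* settings. As a simplification, flushing is driven only by the timer, each insert is capped at `max_rows`, and error handling is omitted.

```python
import queue
import threading
from typing import Any, Callable, List


class BatchWriter:
    """Buffers rows from a receive handler and writes them in batches."""

    def __init__(self, insert_batch: Callable[[List[Any]], None],
                 max_rows: int = 1000, buffer_size: int = 10000) -> None:
        self._buf: "queue.Queue[Any]" = queue.Queue(maxsize=buffer_size)
        self._insert_batch = insert_batch  # stand-in for the database insert
        self._max_rows = max_rows
        self.dropped = 0  # rows discarded because the buffer was full

    def enqueue(self, row: Any) -> bool:
        """Non-blocking: the receive path never waits on the database."""
        try:
            self._buf.put_nowait(row)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def flush(self) -> int:
        """Drain up to max_rows rows and write them as one batch."""
        batch: List[Any] = []
        while len(batch) < self._max_rows:
            try:
                batch.append(self._buf.get_nowait())
            except queue.Empty:
                break
        if batch:
            self._insert_batch(batch)
        return len(batch)

    def run(self, stop: threading.Event, flush_seconds: float) -> None:
        """Background loop: flush on every tick until stop is set."""
        while not stop.wait(flush_seconds):
            # Keep flushing while full batches are still available.
            while self.flush() == self._max_rows:
                pass
```

In this sketch the handler calls only `enqueue`, while `run` executes on a separate thread, so a slow insert delays the writer and not the receive path.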

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ingest: make MQTT KeepAlive configurable (MQTT_KEEPALIVE_SECONDS)

As a near-silent subscriber, paho emits a PINGREQ roughly every KeepAlive
seconds; lowering it sends client->server frames more often to keep the
Cloudflare-proxied WebSocket path warm in both directions, a lever for the
residual mid-stream "pingresp not received" stalls on the letsmesh broker.
Default unchanged (30s); wired through docker-compose.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ingest: add configurable MQTT write timeout (MQTT_WRITE_TIMEOUT_SECONDS)

Bounds PINGREQ/SUBSCRIBE writes so a stalled write through the Cloudflare
WebSocket proxy can't hang the client. Default 0 (paho's existing no-timeout
behavior); wired through docker-compose. Recommended ~20s when behind a
buffering reverse proxy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Alex Vanderpot <alex@Alexs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 22:42:46 -04:00
Alex Vanderpot 27c94e1aee node page: map-based location history + collapsible recent adverts (#43)
Replace the Location History coordinate table with an embedded Leaflet
map (new LocationHistoryMap component): one circle marker per position
with timestamp/coord popups, the most-recent point highlighted, and a
subtle polyline connecting points chronologically, auto-fit to bounds.

Cap Recent Adverts at the 5 most recent with a "Show more"/"Show less"
toggle and a count in the subtitle.

Co-authored-by: Alex Vanderpot <alex@Alexs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 22:38:16 -04:00
Alex Vanderpot eaaf729b15 map: render node popovers lazily and skip needless marker re-renders (#42)
The map page slowed the browser at high node counts (~5k) because every
marker eagerly rendered its popover to an HTML string at creation, and the
visual-update effects re-rendered every marker's icon and popup on each
selection change. Hovering a meshcore node (the default type) re-rendered
all markers.

Bind popups lazily so PopupContent is only rendered when a popup actually
opens, drop the now-unnecessary popup setContent calls, and re-skin only the
markers whose selected state changed instead of the whole set.

Co-authored-by: Alex Vanderpot <alex@Alexs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 22:02:40 -04:00
Alex Vanderpot 166ef69f51 Merge pull request #40 from ajvpot/ajvpot/clickhouse-live-materialized-views
clickhouse: back map, node search, and chat with live materialized views
2026-06-15 00:13:44 -04:00
Alex Vanderpot c689ca1b3d clickhouse: back map, node search, and chat with live materialized views
Replace the per-request argMax/GROUP-BY views with insert-triggered
(incremental) materialized views so map node positions, node search, and
public-channel chat read pre-aggregated state instead of re-scanning all of
meshcore_packets on every query.

- 005: meshcore_adverts_latest_state (AggregatingMergeTree of argMaxState/
  min/maxState) + incremental MV + backfill; meshcore_adverts_latest becomes a
  -Merge view with the identical column contract. Node search reads it directly;
  map (unified_latest_nodeinfo) is unchanged.
- 006: meshcore_public_channel_messages_raw, a decoded payload_type=5 MergeTree
  keyed (channel_hash, ingest_timestamp); chat dedups by message_id at read time
  over a timestamp-bounded scan. Streaming/pagination push channel+cursor onto
  the primary key.
- Neighbor-edge MVs stay hourly REFRESH (they read the preserved view).

Verified against full prod data (14.5M rows): exact parity (0 mismatches) and
5-9x faster reads with no regressions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 00:13:00 -04:00
Alex Vanderpot ee65644fe9 Merge pull request #39 from ajvpot/ajvpot/region-system-iata-codes
Region system: IATA codes, data-driven groups, Grafana filtering
2026-06-14 17:42:26 -04:00
Alex Vanderpot c5ee493d8c region groups: data-driven generation, DB-sourced (drop TS/SQL sync)
Replace the hand-curated region groups with data-driven ones and make the
region_groups ClickHouse table the single source of truth.

- scripts/generate-region-groups.ts: offline generator — clusters regions by
  cross-region packet co-occurrence (min-share single-linkage) at two levels
  (broad "region" + tight "metro"), names clusters via `claude -p`, reconciles
  codes by member overlap so permalinks stay stable, and emits the region_groups
  seed. Migration 004 reseeded with the resulting 39 groups.
- Groups are DB-sourced: getRegionGroups() (cached) feeds /api/regions and the
  dropdown/labels; filtering resolves a selector in SQL to a region
  (region = 'X') or a group (region IN / hasAny ... SELECT region_code FROM
  region_groups WHERE group_code = ...). No hardcoded membership in TS;
  resolveSelector removed.
- Drop the TS<->SQL parity script (no membership left to sync); regionSql and
  the migration ALIAS are kept in sync by hand.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 17:40:44 -04:00
Alex Vanderpot a72c64a008 region system: derive IATA region codes from MQTT topics
Replace hardcoded (broker, topic) region slugs with uppercase IATA codes
derived from the meshcore/{IATA} base topic, discovered dynamically from
data (adding a region needs no code change). Adds region groups, Grafana
region/group filtering, and fixes the neighbor graph.

- regions.ts: single source of truth — regionFromTopic / normalizeRegion /
  regionSql / resolveSelector / selectorLabel. Legacy slugs (seattle->SEA)
  and bare meshcore + meshcore/salish -> SEA still resolve.
- regionGroups.ts + seeded region_groups table: PNW/CAL/DEU/POL.
- migration 004: region ALIAS column on meshcore_packets; 001 views expose
  region / regions[]; reworked neighbor MV (region-scoped, no cross-region
  edges, drops implausible >150km and (0,0) edges); scheduled meshcore_regions MV.
- API/streaming/actions resolve selectors; stream routes drop the hardcoded
  region allow-lists; map node query excludes (0,0) sentinel nodes.
- Dynamic region/group dropdowns (useRegions/RegionSelect); /api/regions.
- Grafana: cascading $region / $region_group template vars + panel filters.
- region-parity.ts (npm run check:regions) guards TS<->SQL drift.
- nix dev shell (flake.nix, Node 24).
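
A rough sketch of the topic-to-region derivation listed above, in Python and not the project's regions.ts. The function names mirror the ones in the commit message, but their bodies are assumptions. Folding the legacy slug and the legacy topic segment into a single `LEGACY` table is a simplification made here, and any topic layout beyond the `meshcore/{IATA}` base is hypothetical.

```python
from typing import Optional

# Assumption: one lookup table covers both the legacy region slug and the
# legacy topic segment mentioned in the commit message.
LEGACY = {"seattle": "SEA", "salish": "SEA"}


def normalize_region(value: str) -> Optional[str]:
    """Map a legacy name or a three-letter code to an uppercase IATA code."""
    v = value.strip()
    if v.lower() in LEGACY:
        return LEGACY[v.lower()]
    if len(v) == 3 and v.isascii() and v.isalpha():
        return v.upper()
    return None


def region_from_topic(topic: str) -> Optional[str]:
    """Derive the region from a meshcore/{IATA}/... MQTT topic."""
    parts = topic.strip("/").split("/")
    if parts[0] != "meshcore":
        return None
    if len(parts) == 1:
        return "SEA"  # bare "meshcore" resolves to SEA per the commit message
    return normalize_region(parts[1])
```

Because the code is read from the topic itself, a new region appearing in the data needs no change to this function.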

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 15:20:00 -04:00
Alex Vanderpot 72aa6be3d3 ingest: resubscribe on reconnect + staleness watchdog for zombie MQTT conns (#38)
The letsmesh broker was migrated behind Cloudflare and changed its topic
layout on 2026-06-02, which left prod's MQTT client in a zombie state:
connected per paho's IsConnected() (so the 30s monitor never rebuilt it) but
receiving zero messages, because the subscription was established only once
after the initial connect and never re-applied on paho auto-reconnects. Result:
12 days of silently missing letsmesh ingestion while davekeogh masked the loss.

Make reconnection robust instead of relying on broker-side session persistence:

- Subscribe inside the OnConnect handler so every (re)connect — including paho
  auto-reconnects — restores delivery. Use CleanSession(true)+ResumeSubs(false)
  so we never depend on the broker remembering our session.
- Add a per-broker data-staleness watchdog: a broker that reports connected but
  delivers no messages for MQTT_STALE_AFTER_SECONDS (default 300) is treated as a
  zombie and force-rebuilt (disconnect + fresh connect/subscribe). This catches
  exactly the failure IsConnected() misses.
- Reduce the external monitor to that watchdog role; transient drops are left to
  paho auto-reconnect rather than racing it with a brand-new client.
- Stable per-broker client IDs (by index) and pre-sized MQTTClients slice so
  indices stay aligned when an earlier broker fails; guard BrokerStatus/lastActivity
  with a mutex; promote connect/subscribe logs to Info for visibility.
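
The staleness check above can be sketched as follows. This is a Python illustration, not the Go implementation: `BrokerActivity`, `touch`, and `is_zombie` are hypothetical names, the injectable `clock` exists only to make the sketch testable, and treating exactly `stale_after_seconds` of silence as stale is an assumption.

```python
import threading
import time
from typing import Callable


class BrokerActivity:
    """Tracks the last message time for one broker, guarded by a lock."""

    def __init__(self, stale_after_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lock = threading.Lock()
        self._clock = clock
        self._stale_after = stale_after_seconds
        self._last_activity = clock()

    def touch(self) -> None:
        """Call from the message handler whenever a message arrives."""
        with self._lock:
            self._last_activity = self._clock()

    def is_zombie(self, connected: bool) -> bool:
        """True when the client reports connected but has gone silent."""
        with self._lock:
            idle = self._clock() - self._last_activity
        return connected and idle >= self._stale_after
```

A monitor loop would call `is_zombie` periodically and rebuild the client when it returns True. A disconnected client is deliberately not flagged, which leaves transient drops to the library's own reconnect logic.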

Adds unit tests for the watchdog and env parsing; documents the new env var.

Co-authored-by: Alex Vanderpot <alex@Alexs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 11:55:33 -04:00
Alex Vanderpot a0ab900da3 clickhouse: upgrade image 25.6.2.5 -> 26.5.1.882
Fixes a 25.x global memory-tracker drift where the tracker pinned at the
max-memory cap (RSS far below it), causing the OvercommitTracker to kill every
query (map/stats/neighbors all 500ing). Deployed in-place on prod over the
existing data dir after a cold backup.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 10:16:22 -04:00
Alex Vanderpot 5966564b29 Merge pull request #35 from ajvpot/ajvpot/slow-chat-api-query
chat: fix slow /api/chat query (full-history aggregation)
2026-05-29 03:36:08 -04:00
Alex Vanderpot f7da741f74 chat: push time/channel filters into base scan to fix slow query
The /api/chat endpoint queried the meshcore_public_channel_messages
VIEW, which does GROUP BY payload over all payload_type=5 packets.
Filtering its output on ingest_timestamp/channel_hash can't push below
the GROUP BY (they're max(...)/derived-from-grouped-payload), so every
call re-aggregated the entire history (~8M rows / 1.2 GiB / ~700ms),
ignoring the ingest_timestamp primary key.

Replace the view reference with an inline subquery
(publicChannelMessagesSubquery) that pushes the time/channel filters
into the inner meshcore_packets scan, so partition + primary-key
pruning applies. Region filtering stays on the outer query since
origin_path_info only exists post-aggregation. Same change to the chat
streaming poller.

Verified on prod: identical output, 8.06M->114K rows read, ~700ms->28ms.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 03:33:06 -04:00
Alex Vanderpot 1940f995dc Merge pull request #34 from ajvpot/neighbor-refreshable-mvs
Speed up neighbors with hourly refreshable materialized views
2026-05-29 03:03:10 -04:00
Alex Vanderpot 02623e5559 neighbors: seattle->letsmesh region + fix path-edge prefix extraction
- Point the seattle region at the letsmesh broker (wss://mqtt-us-v1.letsmesh.net:443,
  topic meshcore/SEA) where Seattle traffic now lives.
- Fix a pre-existing bug in the path-edge extraction: `path` is a hex string of
  1-byte hop prefixes, so use substring(path, 2*i-1, 2) instead of
  hex(substring(path, i, 1)) (which re-hexed a single hex char and never matched
  the 2-char repeater prefixes -> path edges were always empty). Seattle now yields
  path edges again.
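
To illustrate the fix above, here is a small Python sketch of both expressions. It assumes 1-based ClickHouse `substring(s, offset, length)` semantics translated to 0-based slices. The sample path and the function names are hypothetical.

```python
from typing import List


def hop_prefixes(path_hex: str) -> List[str]:
    """Equivalent of substring(path, 2*i-1, 2) for i = 1..len/2.

    Each hop prefix is one byte, i.e. two hex characters. A trailing odd
    character, if any, is ignored.
    """
    return [path_hex[2 * i - 2:2 * i] for i in range(1, len(path_hex) // 2 + 1)]


def buggy_prefix(path_hex: str, i: int) -> str:
    """Equivalent of hex(substring(path, i, 1)): hex-encodes one hex character."""
    return path_hex[i - 1].encode("ascii").hex().upper()
```

For the sample path `"A1B2C3"`, `hop_prefixes` yields `["A1", "B2", "C3"]`, while `buggy_prefix("A1B2C3", 1)` yields `"41"`, the hex encoding of the character `A`. That value is not a hop prefix of the path, which is consistent with the empty path edges described above.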

Verified on a full prod snapshot: the MV-backed "show all neighbors" query drops
from ~1.6s / 145M rows / 11.8 GiB to ~1ms / 108 rows / 3.8 KiB.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 03:02:09 -04:00
Alex Vanderpot cab5d821cf neighbors: serve from hourly refreshable materialized views
The two slow neighbor queries are converted to read precomputed tables that an
hourly REFRESH EVERY 1 HOUR materialized view maintains, instead of re-aggregating
meshcore_packets per request:

- meshcore_all_neighbor_edges: the global per-region edge graph (direct path_len=0
  adverts + repeater-prefix path edges) with endpoint details. getAllNodeNeighbors
  now filters it by region + bbox + lastSeen + has_location.
- meshcore_node_direct_neighbors: per-node direct adjacency (both directions) with
  neighbor details. getMeshcoreNodeNeighbors now filters it by node_public_key.

Also add the meshcore/SEA topic to the seattle region. Validated on a clean local
stack: migration 001->003 applies, both refreshable MVs create + refresh + populate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 02:33:57 -04:00
Alex Vanderpot 49847c46af Merge pull request #33 from ajvpot/release-ingest
add ingest code: monorepo + one-command deploy
2026-05-29 01:51:20 -04:00
Alex Vanderpot 6be73b04a6 clickhouse: lift readonly read-size caps
The readonly profile's max_rows_to_read / max_bytes_to_read (500MB) is exceeded by
the map/stats views, which scan the full (growing) meshcore_packets table -> the web
app failed with TOO_MANY_BYTES. Remove the read-size caps; readonly=1, allow_ddl=0,
max_memory_usage and max_execution_time remain the guardrails.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 01:48:56 -04:00
Alex Vanderpot 7785158934 clickhouse: cap system log growth (TTL + disable profiler)
ClickHouse's internal diagnostics grew unbounded (text_log at Trace level and the
1s query profiler -> trace_log accumulated ~160G over months). Add short TTLs to
all system *_log tables, cap text_log at warning level, and disable the query
profiler in both profiles so trace_log stays empty.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 01:46:04 -04:00
Alex Vanderpot c233952840 pin clickhouse to 25.6.2.5 and grafana to 12.1.1
Pin the bundled service images to known versions for reproducible releases and
safe in-place reuse of an existing data dir (matching the production deployment).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 01:28:01 -04:00
Alex Vanderpot e6c4200448 Provision MeshCore Grafana dashboard
Add the MeshCore dashboard (exported from prod) as a provisioned file
dashboard, with a file provider config. Pin the ClickHouse datasource
uid to "clickhouse" so the dashboard's panel datasource references
resolve at provision time.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 01:24:02 -04:00
Alex Vanderpot 78bf1c5855 add Grafana service with provisioned ClickHouse datasource
Bundle Grafana (127.0.0.1:3000) with the grafana-clickhouse-datasource plugin
and an auto-provisioned ClickHouse datasource using the read-only user. Adds
GRAFANA_ADMIN_PASSWORD to .env.example. Verified: datasource health returns OK.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 01:19:42 -04:00
Alex Vanderpot 384f7f8b14 ingest: fix go module path to match repo
Module is now github.com/ajvpot/meshexplorer/ingest (the code lives under
ingest/ in the meshexplorer repo), updated from the old standalone
clickhouse-meshingest path. build/vet/test pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 01:19:42 -04:00
Alex Vanderpot b7bcca6bf3 release-prep: docs, migration dialect fix, drop unused UDF
- Add a deploy-focused root README; update the web app README (meshcore-only,
  point Docker usage at the unified root compose).
- Fix the migration runner: set the goose clickhouse dialect (it defaulted to
  postgres and failed to create its version table). Migrations now apply cleanly.
- Remove the unused meshcore decrypt UDF (meshcore_try_decrypt was never called
  by any view/query/code) and simplify the ClickHouse image to a single stage.

Verified end-to-end: `docker compose up` brings up clickhouse -> migrate ->
meshcoreingest + meshexplorer; live ingestion from the real MQTT brokers lands
packets in ClickHouse and the web API serves decoded meshcore nodes via the
readonly user.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 01:01:33 -04:00
Alex Vanderpot 6a1536410c release-prep: unified docker-compose + .env.example
Single root compose brings up the whole stack on one internal network:
clickhouse (healthchecked) -> migrate (one-shot) -> meshcoreingest + meshexplorer,
with the discord-bot behind a "bot" profile. Web app/bot connect as the readonly
ClickHouse user; ingest/migrate use the default user. Named volume replaces the
host /tank path. .env.example documents every variable with placeholders; root
.gitignore keeps real .env out of git. Drops the per-project compose files.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 01:01:33 -04:00
Alex Vanderpot cd1a345242 release-prep: remove meshtastic from web app
Meshtastic was UI-filtering only (no meshtastic data backend). Drop it as a
node type/option, and simplify the map marker/cluster/popup rendering now that
every node is meshcore. Update product copy to MeshCore-only. The nodeTypes
query plumbing stays (the unified view's type is always 'meshcore').
Production build passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 01:01:33 -04:00
Alex Vanderpot 83978609f0 ingest: meshcore MQTT->ClickHouse stack
Vendor the ingest service under ingest/ and move the web app under meshexplorer/.
The ingest builds the meshcoreingest daemon and the goose migration runner,
applies the meshcore ClickHouse schema (packets, adverts, unified node view),
and loads its MQTT broker list and ClickHouse settings entirely from environment
variables (MQTT_BROKERS as a JSON array, CLICKHOUSE_*). No credentials are baked
into the source.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 01:01:33 -04:00
ajvpot 815d465566 Bump nextjs 2026-01-06 03:11:33 +01:00
ajvpot e6e74589c1 save map position, stroke width slider 2025-09-29 18:06:11 +02:00
ajvpot b173351011 path display overhaul 2025-09-23 04:04:17 +02:00
ajvpot 665b91f1fa add analyze links to PathVisualization 2025-09-23 03:13:10 +02:00
Alex Vanderpot 580c315a20 Merge pull request #24 from CoryNQ1E/main
Support Lowercase Public Keys in Node API Endpoints
2025-09-22 19:38:14 -04:00
ajvpot b12d570142 analyze link in discord bot relay 2025-09-23 01:28:47 +02:00
ajvpot add14fa37b fix concurrency in bot 2025-09-23 00:05:56 +02:00
ajvpot b296017331 Add anchors to stats pages (closes #26) 2025-09-19 17:43:00 +02:00
ajvpot a1416bcc05 handle numeric values in useQueryParams (closes #25) 2025-09-19 17:34:12 +02:00
ajvpot f34836763c min packet thresh 2025-09-18 01:58:48 +02:00
ajvpot 4faa6491c2 ui rearranging 2025-09-18 01:53:15 +02:00
ajvpot 0f1327469d hide meshcore overlay setting 2025-09-18 01:41:52 +02:00
ajvpot f3ad947b9a All neighbors, map layer settings 2025-09-18 01:40:45 +02:00
ajvpot fe6e44cc80 path magic 2025-09-17 17:46:43 +02:00
ajvpot ea32ccea77 support # channels 2025-09-15 23:03:29 +02:00
ajvpot ed9943e091 missing file 2025-09-15 22:52:25 +02:00
ajvpot 4c7e7d8e1c highlights for discord 2025-09-15 04:36:14 +02:00
ajvpot 0aac045c1b discord bot: support threads 2025-09-15 04:14:58 +02:00
ajvpot 9f1056095b all neighbors 2025-09-15 04:14:50 +02:00
ajvpot 7addbb3165 update stats and path viz to use 2 day last seen time 2025-09-14 18:26:47 +02:00
ajvpot e41dfe83d3 stuff 2025-09-14 18:25:56 +02:00
ajvpot 3ade624a51 emoji 2025-09-13 03:31:30 +02:00
ajvpot 8ac7d5eece Discord bot, profile picture endpoint, channel management ux, streaming apis, refactor region logic 2025-09-13 02:57:52 +02:00