* ingest: batch ClickHouse inserts to stop MQTT flapping & packet loss
The meshcore handler did a synchronous per-message ClickHouse insert on
paho's single inbound goroutine. At ~86ms/insert (single-row inserts +
async_insert wait + materialized views) the goroutine couldn't keep up
with the high-volume letsmesh feed, so it stalled past PingTimeout and
paho declared "pingresp not received" and reconnected — ~847 cycles in
19.5h, ~45% downtime, ~50% of letsmesh packets lost. The low-volume
davekeogh broker never saturated the goroutine and was unaffected.
Decouple receipt from insertion: the handler now enqueues decoded rows
onto a buffered channel and a single background writer flushes them to
meshcore_packets in batched native inserts (every MESHCORE_BATCH_FLUSH_
SECONDS or MESHCORE_BATCH_MAX_ROWS rows). The inbound goroutine never
blocks, so PINGRESP is always processed in time.
- New batch writer with env-configurable flush interval / max rows /
buffer size (MESHCORE_BATCH_* ), wired in docker-compose.
- Drop server-side async_insert (redundant once we batch app-side).
- Bump PingTimeout 10s -> 20s (env MQTT_PING_TIMEOUT_SECONDS) for margin
against Cloudflare WebSocket buffering jitter.
- Enqueue is non-blocking; rows are dropped+counted only if the buffer
fills (ClickHouse unavailable). A failed batch is dropped and retried
by the next flush (native blocks commit atomically).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* ingest: make MQTT KeepAlive configurable (MQTT_KEEPALIVE_SECONDS)
As a near-silent subscriber, paho emits a PINGREQ roughly every KeepAlive
seconds; lowering it sends client->server frames more often to keep the
Cloudflare-proxied WebSocket path warm in both directions, a lever for the
residual mid-stream "pingresp not received" stalls on the letsmesh broker.
Default unchanged (30s); wired through docker-compose.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* ingest: add configurable MQTT write timeout (MQTT_WRITE_TIMEOUT_SECONDS)
Bounds PINGREQ/SUBSCRIBE writes so a stalled write through the Cloudflare
WebSocket proxy can't hang the client. Default 0 (paho's existing no-timeout
behavior); wired through docker-compose. Recommended ~20s when behind a
buffering reverse proxy.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Alex Vanderpot <alex@Alexs-MacBook-Pro-2.local>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace hardcoded (broker, topic) region slugs with uppercase IATA codes
derived from the meshcore/{IATA} base topic, discovered dynamically from
data (adding a region needs no code change). Adds region groups, Grafana
region/group filtering, and fixes the neighbor graph.
- regions.ts: single source of truth — regionFromTopic / normalizeRegion /
regionSql / resolveSelector / selectorLabel. Legacy slugs (seattle->SEA)
and bare meshcore + meshcore/salish -> SEA still resolve.
- regionGroups.ts + seeded region_groups table: PNW/CAL/DEU/POL.
- migration 004: region ALIAS column on meshcore_packets; 001 views expose
region / regions[]; reworked neighbor MV (region-scoped, no cross-region
edges, drops implausible >150km and (0,0) edges); scheduled meshcore_regions MV.
- API/streaming/actions resolve selectors; stream routes drop the hardcoded
region allow-lists; map node query excludes (0,0) sentinel nodes.
- Dynamic region/group dropdowns (useRegions/RegionSelect); /api/regions.
- Grafana: cascading $region / $region_group template vars + panel filters.
- region-parity.ts (npm run check:regions) guards TS<->SQL drift.
- nix dev shell (flake.nix, Node 24).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pin the bundled service images to known versions for reproducible releases and
safe in-place reuse of an existing data dir (matching the production deployment).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bundle Grafana (127.0.0.1:3000) with the grafana-clickhouse-datasource plugin
and an auto-provisioned ClickHouse datasource using the read-only user. Adds
GRAFANA_ADMIN_PASSWORD to .env.example. Verified: datasource health returns OK.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Single root compose brings up the whole stack on one internal network:
clickhouse (healthchecked) -> migrate (one-shot) -> meshcoreingest + meshexplorer,
with the discord-bot behind a "bot" profile. Web app/bot connect as the readonly
ClickHouse user; ingest/migrate use the default user. Named volume replaces the
host /tank path. .env.example documents every variable with placeholders; root
.gitignore keeps real .env out of git. Drops the per-project compose files.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Vendor the ingest service under ingest/ and move the web app under meshexplorer/.
The ingest builds the meshcoreingest daemon and the goose migration runner,
applies the meshcore ClickHouse schema (packets, adverts, unified node view),
and loads its MQTT broker list and ClickHouse settings entirely from environment
variables (MQTT_BROKERS as a JSON array, CLICKHOUSE_*). No credentials are baked
into the source.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>