Addresses PR 191 reviewer feedback:
1. Shutdown drain
stop() now waits up to 5 s for in-flight _route_packet tasks to finish,
then cancels any that remain. Previously only the queue-consumer loop was
cancelled; created tasks were abandoned with no guarantee they completed.
Mechanism: _route_tasks set tracks live tasks (added on create, discarded
in the done-callback). stop() takes a snapshot and calls asyncio.wait()
with timeout=5.0, then cancels the still-pending subset.
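The drain mechanism can be sketched as follows. This is a hypothetical reconstruction from the description above, not the actual code: the class shape, `_spawn_route`, and the `delay` stand-in for real routing work are assumptions.

```python
import asyncio

class PacketRouter:
    def __init__(self) -> None:
        self._route_tasks: set[asyncio.Task] = set()

    def _spawn_route(self, delay: float) -> None:
        # Track the task for the lifetime of _route_packet; the done-callback
        # discards it, so the set only ever holds live tasks.
        task = asyncio.create_task(self._route_packet(delay))
        self._route_tasks.add(task)
        task.add_done_callback(self._route_tasks.discard)

    async def _route_packet(self, delay: float) -> None:
        # Stand-in for the real routing work (tx-delay sleep + transmit).
        await asyncio.sleep(delay)

    async def stop(self) -> None:
        # Snapshot: tasks may finish (and self-discard) while we wait.
        pending = set(self._route_tasks)
        if pending:
            done, still_pending = await asyncio.wait(pending, timeout=5.0)
            for task in still_pending:
                task.cancel()
```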
2. Drop counter
_cap_drop_count increments each time a packet is dropped at the cap.
The running total is included in every WARNING log line and also printed
at shutdown so operators can tell at a glance whether the safety valve is
actually firing in production.
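A minimal sketch of the counter and its WARNING log line; the class name, message wording, and admission method are illustrative, not the project's actual code.

```python
import logging

logger = logging.getLogger("router")

class CapGuard:
    def __init__(self, cap: int) -> None:
        self._cap = cap
        self._in_flight = 0
        self._cap_drop_count = 0

    def try_admit(self) -> bool:
        if self._in_flight >= self._cap:
            # Running total goes into every WARNING so operators can see
            # at a glance whether the safety valve is firing.
            self._cap_drop_count += 1
            logger.warning(
                "In-flight cap reached (%d); dropping packet (total drops: %d)",
                self._cap, self._cap_drop_count,
            )
            return False
        self._in_flight += 1
        return True
```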
3. Tests (tests/test_packet_router.py)
test_cap_drops_packets_when_full — cap=3, send 8 → 5 drops, 3 in-flight
test_cap_drop_count_increments — count increments by 1 per drop
test_cap_drop_count_zero_... — count stays 0 when cap never reached
test_stop_waits_for_in_flight_tasks — slow task (0.2 s) completes, not cancelled
test_stop_cancels_tasks_...timeout — hanging task cancelled after timeout
test_route_tasks_set_cleaned_up — set empty after all tasks finish
test_counter_matches_set_size — _in_flight == len(_route_tasks) at cap
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Merged this now. It’s a safe change with no behavioural impact, and it removes unnecessary work in the hot paths when DEBUG logging is off. Happy to revisit if we want to standardise on lazy formatting later, but this gives us an immediate win.
Problem
-------
update_packet_metrics() called rrdtool.info() (cached for 5 s) to get the
RRD's last_update timestamp. rrdtool.info() returns a massive Python dict:
17 data sources × 5 RRAs × ~8 fields each = ~700+ dict entries per call.
tracemalloc showed +10696 new allocations / +251 KB at this exact line,
flagged as "Investigate" in the memory diagnostics dashboard.
The rrdtool.info() approach was also unnecessarily complex: it required a
5-second secondary cache, a _pending_rrd_update buffer, and two extra
instance attributes — all to answer one question ("did we already write
this period?") that we can answer ourselves with a single integer.
Fix
---
Replace _last_rrd_info_cache / _last_rrd_info_time / _pending_rrd_update
with a single self._last_rrd_update: int = 0 that stores the timestamp of
the last successful rrdtool.update() call. The throttle check becomes:
    if timestamp <= self._last_rrd_update:
        return
On success: self._last_rrd_update = timestamp
Zero dict allocations per call. The only downside vs rrdtool.info() is
that _last_rrd_update resets to 0 on process restart, meaning the first
packet after a restart always triggers a write — correct behaviour.
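A sketch of the resulting throttle, assuming the surrounding class and the injected `rrd_update` callable (a stand-in for the real `rrdtool.update()` call) as illustrative names:

```python
class MetricsWriter:
    def __init__(self) -> None:
        # 0 on startup: the first packet after a restart always writes.
        self._last_rrd_update: int = 0

    def update_packet_metrics(self, timestamp: int, values, rrd_update) -> bool:
        # Throttle: skip if we already wrote this period. One integer
        # comparison, zero dict allocations.
        if timestamp <= self._last_rrd_update:
            return False
        rrd_update(timestamp, values)
        # Record only after a successful update so a failed write is retried.
        self._last_rrd_update = timestamp
        return True
```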
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Problem
-------
rrdtool.fetch() is a blocking C library call that reads 24 hours of RRD
data from disk. The dashboard can call get_data() on every page refresh.
On an SD card each fetch can cost several milliseconds of I/O, and because
the RRD step is 60 seconds the data cannot change more often than that —
any fetch within the same 60-second window returns identical data.
The combined-optimizations branch had a 60-second read cache; rightup's
batching refactor inadvertently removed it. This PR restores it.
Solution
--------
* Add self._get_data_cache: tuple = (0.0, None) to __init__
* In get_data(): set use_cache = (start_time is None and end_time is None)
- if use_cache and cache is < 60 s old: return cached result immediately
- after a successful live fetch with use_cache: store (now, result)
* Explicit start_time / end_time callers always bypass the cache so
fine-grained or historical queries are never stale
Why 60 s TTL?
The RRD step is 60 s, so the database cannot hold a newer sample until
the next step boundary. A 60-second cache is tight enough that the
dashboard always shows data ≤ one step stale, and loose enough that
a burst of refreshes costs one disk read instead of N.
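The cache logic above can be sketched like this; the class name and the injected `fetch` callable (standing in for the blocking `rrdtool.fetch()`) are assumptions:

```python
import time

class RRDReader:
    CACHE_TTL = 60.0  # matches the 60 s RRD step

    def __init__(self, fetch) -> None:
        self._fetch = fetch
        self._get_data_cache: tuple = (0.0, None)

    def get_data(self, start_time=None, end_time=None):
        # Only default-range queries are cacheable; explicit start/end
        # callers always bypass so historical queries are never stale.
        use_cache = start_time is None and end_time is None
        if use_cache:
            cached_at, cached = self._get_data_cache
            if cached is not None and time.monotonic() - cached_at < self.CACHE_TTL:
                return cached
        result = self._fetch(start_time, end_time)
        if use_cache:
            self._get_data_cache = (time.monotonic(), result)
        return result
```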
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Problem 1 — _recent_drops: the list was evicted with pop(0), an O(n)
memmove every time a drop is recorded. With a 20-entry cap this is
negligible today, but pop(0) on a list is always O(n) and the pattern is
worth eliminating.
Problem 2 — _known_neighbors cap: the eviction path did
set(list(self._known_neighbors)[500:])
which first materialises the entire set as a list (O(n) allocation) before
slicing. itertools.islice works directly on the set iterator and only
allocates the 500 kept items, halving peak memory pressure during cleanup.
Changes:
* Import itertools (not previously imported in this file)
* Import deque from collections alongside OrderedDict
* self._recent_drops initialised as deque(maxlen=20); self._max_recent_drops
removed (maxlen is the single source of truth)
* Drop-recording block: rebuild deque from generator (preserves pubkey dedup
filter) then append — automatic eviction replaces the explicit pop(0) guard
* Known-neighbors cap: itertools.islice(self._known_neighbors, 500) replaces
list(self._known_neighbors)[500:]
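Both eviction fixes can be sketched as standalone helpers; the function names and the 500-item cap constant mirror the description, but the surrounding class is hypothetical.

```python
import itertools
from collections import deque

MAX_NEIGHBORS = 500

def record_drop(recent_drops: deque, pubkey: str) -> deque:
    # Rebuild without any existing entry for this pubkey (the dedup filter),
    # then append; deque(maxlen=...) evicts the oldest entry automatically,
    # replacing the explicit pop(0) guard.
    kept = deque((p for p in recent_drops if p != pubkey),
                 maxlen=recent_drops.maxlen)
    kept.append(pubkey)
    return kept

def cap_neighbors(known: set) -> set:
    if len(known) <= MAX_NEIGHBORS:
        return known
    # islice reads straight off the set iterator and allocates only the
    # kept items, unlike list(known)[...] which copies everything first.
    return set(itertools.islice(known, MAX_NEIGHBORS))
```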
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Python evaluates f-string arguments before calling logger.debug(), so in
production (INFO level) every debug log call in the hot path still paid the
cost of string formatting even though the output was discarded.
The most expensive sites are in __call__ (runs on every received packet):
- "RX packet: header=0x{...}, payload_len=..., path_len=..., rssi=..., snr=..."
- "Packet header=0x{...}, type=..., route=..."
And in _calculate_tx_delay (runs on every forwarded packet):
- "Route=FLOOD/DIRECT, len=...B, airtime=...ms, delay=...s"
- "Congestion detected, score=..., delay multiplier=..."
Plus transport code and local-TX debug logs (less frequent but same issue).
Fix: wrap each f-string logger.debug() call with
    if logger.isEnabledFor(logging.DEBUG):
so the f-string is never constructed when debug logging is disabled.
logger.isEnabledFor() is a pure in-memory integer comparison — essentially
free at runtime. In production at INFO level this eliminates string
concatenation, attribute lookups (packet.header, len(packet.payload), etc.),
and format operations on every forwarded packet.
Eight call sites guarded; no logic changes.
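The guard pattern at one illustrative call site (the function name and field values are invented for the sketch; only the guard itself follows the change described above):

```python
import logging

logger = logging.getLogger("mesh.router")

def log_rx(packet_header: int, payload: bytes, rssi: int, snr: float) -> None:
    # Guard: the f-string, and the attribute lookups and len() call feeding
    # it, are only evaluated when DEBUG output would actually be emitted.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug(
            f"RX packet: header=0x{packet_header:02x}, "
            f"payload_len={len(payload)}, rssi={rssi}, snr={snr}"
        )
```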
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Five targeted changes to sqlite_handler.py, all in the same file.
1. Thread-local persistent connections
_connect() previously opened a new sqlite3.connect() on every DB call and
ran journal_mode + busy_timeout PRAGMAs each time. On SD-card storage each
connection open involves file-system operations; each PRAGMA is a round-trip.
threading.local() now caches one connection per thread (write executor thread
+ event-loop/HTTP threads), eliminating per-call setup overhead.
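A sketch of the per-thread caching, assuming the class name and PRAGMA values from the description (the synchronous PRAGMA from change 2 is included here since both run at connection setup):

```python
import sqlite3
import threading

class SqliteHandler:
    def __init__(self, db_path: str) -> None:
        self._db_path = db_path
        self._local = threading.local()

    def _connect(self) -> sqlite3.Connection:
        conn = getattr(self._local, "conn", None)
        if conn is None:
            # One-time setup per thread: open + PRAGMAs, then reuse forever.
            conn = sqlite3.connect(self._db_path)
            conn.execute("PRAGMA journal_mode=WAL")
            conn.execute("PRAGMA busy_timeout=5000")
            conn.execute("PRAGMA synchronous=NORMAL")
            self._local.conn = conn
        return conn
```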
2. PRAGMA synchronous=NORMAL
Default synchronous=FULL flushes WAL frames to disk after every transaction.
NORMAL flushes only at WAL checkpoints — in WAL mode this cannot corrupt the
database, though transactions committed since the last checkpoint may be
rolled back on power failure. That trade-off is acceptable for this workload
and significantly faster on SD cards, which have slow fsync (5-20 ms per
flush).
3. Migration 8: UNIQUE index on companion_messages(companion_hash, packet_hash)
companion_push_message previously deduped via SELECT + INSERT (two statements,
two SD-card reads per message). The new UNIQUE index enables INSERT OR IGNORE,
replacing the round-trip with a single atomic statement.
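The fast path can be sketched as follows; table and index column names follow the description, while the `body` column, function names, and schema details are assumptions:

```python
import sqlite3

def migrate_8(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS companion_messages "
        "(companion_hash TEXT, packet_hash TEXT, body TEXT)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_companion_dedup "
        "ON companion_messages(companion_hash, packet_hash)"
    )

def companion_push_message(conn: sqlite3.Connection, companion_hash: str,
                           packet_hash: str, body: str) -> bool:
    # One atomic statement: the UNIQUE index silently absorbs duplicates.
    cur = conn.execute(
        "INSERT OR IGNORE INTO companion_messages "
        "(companion_hash, packet_hash, body) VALUES (?, ?, ?)",
        (companion_hash, packet_hash, body),
    )
    # rowcount is 0 when the UNIQUE index suppressed a duplicate.
    return cur.rowcount == 1
```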
4. Migration 9: UNIQUE index on adverts(pubkey)
Without this index store_advert's ON CONFLICT clause cannot fire and each
advert inserts a new row instead of updating the existing one — unbounded
table growth on busy meshes. The migration deduplicates existing rows
(keeping the most-recently-seen per pubkey) before adding the index.
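The migration's shape might look like this sketch; the `last_seen` column is an assumption, and MAX(rowid) is used here as a proxy for "most-recently-seen", which may differ from the actual migration's criterion:

```python
import sqlite3

def migrate_9(conn: sqlite3.Connection) -> None:
    # Deduplicate first: keep one row per pubkey (highest rowid as a proxy
    # for most recently inserted), then add the index that lets the
    # ON CONFLICT clause in store_advert actually fire.
    conn.execute(
        "DELETE FROM adverts WHERE rowid NOT IN "
        "(SELECT MAX(rowid) FROM adverts GROUP BY pubkey)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_adverts_pubkey "
        "ON adverts(pubkey)"
    )
```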
5. Remove duplicate get_unsynced_count definition
The method was defined twice with the same signature. Python silently uses
the last definition; the first was dead code with reversed SQL parameter
binding order. Removed the first; added a note to the surviving definition.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Before this change, calculate_packet_hash() (SHA-256 + hex + upper) was called
3 times per forwarded packet and 4 times per dropped packet:
__call__ → pkt_hash_full = packet.calculate_packet_hash() #1
→ flood/direct_forward → is_duplicate → calculate_packet_hash() #2
→ flood/direct_forward → mark_seen → calculate_packet_hash() #3
(drop) → _get_drop_reason → is_duplicate → calculate_packet_hash() #4
pkt_hash_full was computed in __call__ but never threaded down into
process_packet, flood_forward, direct_forward, is_duplicate, or _get_drop_reason.
Each method recomputed it independently.
Fix: add optional packet_hash: Optional[str] = None to is_duplicate,
_get_drop_reason, flood_forward, direct_forward, and process_packet. Pass
pkt_hash_full from __call__ through the chain. Each method uses the provided
hash or falls back to computing it — preserving backward compatibility for
external callers (TraceHelper, etc.) that have no pre-computed hash.
Result: 1 SHA-256 computation per packet in the hot path regardless of whether
the packet is forwarded or dropped.
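The threading pattern, reduced to a minimal sketch (class bodies, the `hash_calls` instrumentation, and the simplified `__call__` flow are illustrative; only the optional-parameter-with-fallback shape follows the change):

```python
import hashlib
from typing import Optional

class Packet:
    def __init__(self, payload: bytes) -> None:
        self.payload = payload
        self.hash_calls = 0  # instrumentation for the hash-count assertion

    def calculate_packet_hash(self) -> str:
        self.hash_calls += 1
        return hashlib.sha256(self.payload).hexdigest().upper()

class Router:
    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_duplicate(self, packet: Packet,
                     packet_hash: Optional[str] = None) -> bool:
        # Use the caller's pre-computed hash, else compute — so external
        # callers with no hash in hand still work.
        h = packet_hash if packet_hash is not None else packet.calculate_packet_hash()
        return h in self._seen

    def mark_seen(self, packet: Packet,
                  packet_hash: Optional[str] = None) -> None:
        h = packet_hash if packet_hash is not None else packet.calculate_packet_hash()
        self._seen.add(h)

    def __call__(self, packet: Packet) -> bool:
        # Compute once, thread down the chain.
        pkt_hash_full = packet.calculate_packet_hash()
        if self.is_duplicate(packet, packet_hash=pkt_hash_full):
            return False
        self.mark_seen(packet, packet_hash=pkt_hash_full)
        return True
```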
Also adds explicit INVARIANT docstrings to flood_forward, direct_forward, and
is_duplicate documenting that these methods must remain synchronous (no await).
The is_duplicate + mark_seen pair is atomic within the asyncio event loop; adding
an await between them would allow two concurrent tasks to both pass the duplicate
check for the same packet — forwarding it twice.
Docs: docs/pr_hash_once.md — problem analysis, call-chain diagram, per-method
diffs, quantification (~3-8 µs saved per packet), test plan (including hash-count
assertion), and proof that passing the original's hash to the deep-copied packet
is correct.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the _route_tasks set in PacketRouter with a simple integer counter
(_in_flight / _max_in_flight=30) and add an early-drop guard in _process_queue.
Problems solved:
1. No cap on concurrent sleeping tasks: burst arrivals (multi-hop amplification,
collision retries) could stack unbounded _route_packet tasks, each holding a
packet closure and asyncio Task overhead, before the duty-cycle gate fired.
2. _route_tasks set held a strong reference to every Task object for the full
duration of its sleep — unnecessary in Python 3.12+ where the event loop
already holds tasks alive.
3. stop() iterated the full set to cancel tasks on shutdown — O(n) where n is
the in-flight count at shutdown time.
Fix: _in_flight counter increments before create_task and decrements in the
_on_route_done callback. The cap check (>= 30) in _process_queue is a last-resort
safety valve — LoRa airtime and the duty-cycle gate keep _in_flight in the
low single digits under normal load.
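The counter lifecycle can be sketched like this; `enqueue` and the `delay` stand-in for real routing work are assumptions, while `_in_flight`, `_max_in_flight`, and `_on_route_done` follow the description:

```python
import asyncio

class PacketRouter:
    _max_in_flight = 30

    def __init__(self) -> None:
        self._in_flight = 0
        self.dropped = 0

    def _on_route_done(self, task: asyncio.Task) -> None:
        self._in_flight -= 1

    async def _route_packet(self, delay: float) -> None:
        await asyncio.sleep(delay)  # stand-in for tx-delay + transmit

    def enqueue(self, delay: float) -> bool:
        # Last-resort safety valve: drop instead of stacking sleeping tasks.
        if self._in_flight >= self._max_in_flight:
            self.dropped += 1
            return False
        # Increment BEFORE create_task; decrement in the done-callback.
        self._in_flight += 1
        task = asyncio.create_task(self._route_packet(delay))
        task.add_done_callback(self._on_route_done)
        return True
```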
Also lower companion dedup prune threshold from 1000 to 200: the original 1000
allowed stale entries to accumulate for hundreds of PATH packets before the
O(n) dict comprehension sweep ran.
Trade-off documented: explicit task cancellation on shutdown is removed; tasks
are cancelled implicitly by event loop shutdown with identical outcome (no packet
transmits after the radio is closed regardless).
Docs: docs/pr_in_flight_cap.md — full problem analysis, alternative approaches
(semaphore, keep set + add cap), proof of counter sufficiency, rationale for
cap=30, and unit + field test plan.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>