Addresses PR 191 reviewer feedback:
1. Shutdown drain
stop() now waits up to 5 s for in-flight _route_packet tasks to finish,
then cancels any that remain. Previously only the queue-consumer loop was
cancelled; created tasks were abandoned with no guarantee they completed.
Mechanism: _route_tasks set tracks live tasks (added on create, discarded
in the done-callback). stop() takes a snapshot and calls asyncio.wait()
with timeout=5.0, then cancels the still-pending subset.
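
For reviewers who want the shape of the drain without opening the diff, here is a minimal sketch of the mechanism described above. Only _route_tasks, stop(), and the 5 s timeout come from this change; the PacketRouter class, _spawn_route helper, drain_timeout parameter, and the return value are hypothetical names used for illustration.

```python
import asyncio


class PacketRouter:
    """Hypothetical stand-in for the real router; only the drain logic is shown."""

    def __init__(self, drain_timeout: float = 5.0) -> None:
        # Live routing tasks: added on create, discarded in the done-callback.
        self._route_tasks: set[asyncio.Task] = set()
        self._drain_timeout = drain_timeout

    def _spawn_route(self, coro) -> asyncio.Task:
        # Assumed helper: wraps task creation so every task is tracked.
        task = asyncio.create_task(coro)
        self._route_tasks.add(task)
        task.add_done_callback(self._route_tasks.discard)
        return task

    async def stop(self) -> int:
        """Wait for in-flight tasks, cancel stragglers, return how many were cancelled."""
        # Snapshot first: the done-callbacks mutate the live set while we wait.
        snapshot = set(self._route_tasks)
        if not snapshot:
            return 0  # asyncio.wait() rejects an empty collection
        _done, pending = await asyncio.wait(snapshot, timeout=self._drain_timeout)
        for task in pending:
            task.cancel()
        if pending:
            # Let the cancellations settle so nothing is left half-finished.
            await asyncio.gather(*pending, return_exceptions=True)
        return len(pending)
```

Taking the snapshot before waiting matters because the done-callbacks shrink _route_tasks concurrently; waiting on the live set would risk iterating a set that changes size.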
2. Drop counter
_cap_drop_count increments each time a packet is dropped at the cap.
The running total is included in every WARNING log line and also printed
at shutdown so operators can tell at a glance whether the safety valve is
actually firing in production.
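
A rough sketch of the counter, assuming the cap is a simple in-flight limit. _cap_drop_count and _in_flight are the names used in this change; CapGate, try_admit, release, log_shutdown_summary, and the log wording are hypothetical.

```python
import logging

logger = logging.getLogger("packet_router_sketch")  # hypothetical logger name


class CapGate:
    """Hypothetical helper: admits packets up to max_in_flight and counts the rest."""

    def __init__(self, max_in_flight: int) -> None:
        self._max_in_flight = max_in_flight
        self._in_flight = 0
        self._cap_drop_count = 0

    def try_admit(self) -> bool:
        if self._in_flight >= self._max_in_flight:
            self._cap_drop_count += 1
            # The running total rides along on every WARNING line.
            logger.warning(
                "in-flight cap (%d) reached, packet dropped (total drops: %d)",
                self._max_in_flight,
                self._cap_drop_count,
            )
            return False
        self._in_flight += 1
        return True

    def release(self) -> None:
        # Called when a routing task finishes.
        self._in_flight -= 1

    def log_shutdown_summary(self) -> None:
        logger.info("packets dropped at cap: %d", self._cap_drop_count)
```

With a cap of 3 and 8 packets offered back to back, this admits 3 and counts 5 drops, the same scenario test_cap_drops_packets_when_full covers.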
3. Tests (tests/test_packet_router.py)
test_cap_drops_packets_when_full — cap=3, send 8 → 5 drops, 3 in-flight
test_cap_drop_count_increments — count increments by 1 per drop
test_cap_drop_count_zero_... — count stays 0 when cap never reached
test_stop_waits_for_in_flight_tasks — slow task (0.2 s) completes, not cancelled
test_stop_cancels_tasks_...timeout — hanging task cancelled after timeout
test_route_tasks_set_cleaned_up — set empty after all tasks finish
test_counter_matches_set_size — _in_flight == len(_route_tasks) at cap
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reviewer concern (PR 190):
The 1-second backoff sleep for local_transmission retry happened inside
`async with self._tx_lock`, blocking all other queued TX tasks for the
full second — hurting latency and throughput under load.
Fix — tighten lock scope to one attempt per acquisition:
  Before: acquire lock → [attempt 0 → sleep(1) → attempt 1] → release
  After:  for each attempt:
            [sleep(1) if retry]      ← OUTSIDE the lock
            acquire lock
            re-check can_transmit    ← fresh check every acquisition
            attempt single send
            record_tx on success
            release lock
The duty-cycle gate now runs on every lock acquisition (not just the first),
which is correct: airtime state may change during the backoff sleep.
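
The "After" flow can be sketched as below. _tx_lock, can_transmit, and record_tx are the names from this change; TxSketch, _send_once, the integer budget, and the fail_first knob are hypothetical stand-ins for the radio and the duty-cycle accounting. For brevity the sketch returns False on failure, whereas the real code propagates a relayed send failure as an exception.

```python
import asyncio


class TxSketch:
    """Hypothetical transmitter; radio and duty-cycle budget are faked."""

    def __init__(self, budget: int, fail_first: int = 0, backoff: float = 1.0) -> None:
        self._tx_lock = asyncio.Lock()
        self._budget = budget          # sends still allowed by the duty cycle
        self._fail_first = fail_first  # how many initial sends should fail
        self._backoff = backoff
        self.attempts = 0

    def can_transmit(self) -> bool:
        return self._budget > 0

    def record_tx(self) -> None:
        self._budget -= 1

    async def _send_once(self) -> bool:
        # Stand-in for the real radio call.
        self.attempts += 1
        if self._fail_first > 0:
            self._fail_first -= 1
            return False
        return True

    async def transmit(self, local_transmission: bool) -> bool:
        # Only locally originated packets get a retry.
        max_attempts = 2 if local_transmission else 1
        for attempt in range(max_attempts):
            if attempt > 0:
                # Backoff runs with the lock released, so other TX tasks proceed.
                await asyncio.sleep(self._backoff)
            async with self._tx_lock:
                # Fresh duty-cycle check on every acquisition.
                if not self.can_transmit():
                    return False  # dropped by the in-lock gate
                if await self._send_once():
                    self.record_tx()
                    return True
        return False
```

Holding the lock for exactly one attempt keeps the check, the send, and record_tx atomic with respect to other senders, while the sleep no longer stalls the queue.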
Tests added (tests/test_tx_lock.py):
1. test_concurrent_sends_do_not_interleave — two tasks racing to the same
delay timer must never overlap inside send_packet.
2. test_duty_cycle_toctou_is_fixed — second packet is dropped when the
first consumes the budget inside the lock.
3. test_local_retry_releases_lock_during_backoff — a concurrent relayed
packet fires at ~0.1s while local retry sleeps 1s; confirms it is not
blocked by the backoff.
4. test_non_local_failure_propagates — relayed send failure raises
immediately with exactly one attempt.
5. test_duty_cycle_rechecked_on_retry — if the budget is exhausted during
backoff, the retry is dropped by the in-lock gate (not sent).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Merged this now. It’s a safe change with no behavioural impact, and it removes unnecessary work in the hot paths when DEBUG logging is off. Happy to revisit if we want to standardise on lazy formatting later, but this gives us an immediate win.
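
For anyone reading this later without the diff: one common way to skip log-only work in a hot path is an explicit level check, sketched below. This is an assumption about the general pattern, not a copy of the merged code; handle_packet, expensive_hex_dump, and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("hot_path_sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)

dump_calls = {"count": 0}  # instrumentation for the sketch only


def expensive_hex_dump(payload: bytes) -> str:
    # Stand-in for any costly formatting done purely for a log line.
    dump_calls["count"] += 1
    return payload.hex(" ")


def handle_packet(payload: bytes) -> None:
    # Guard: skip the expensive call entirely unless DEBUG is enabled.
    # Lazy %-style arguments alone would defer string interpolation,
    # but the argument expressions would still be evaluated.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("rx %d bytes: %s", len(payload), expensive_hex_dump(payload))
```

The guard changes only whether the debug-only work runs, so output at INFO and above is unaffected.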