perf(rrdtool): cache get_data() result for 60 s to avoid repeated disk reads

Problem
-------
rrdtool.fetch() is a blocking C library call that reads 24 hours of RRD
data from disk.  The dashboard can call get_data() on every page refresh.
On an SD card each fetch can cost several milliseconds of I/O, and because
the RRD step is 60 seconds the data cannot change more often than that —
any fetch within the same 60-second window returns identical data.
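
That step-alignment argument can be sanity-checked in plain Python. This is a sketch; last_step_boundary is an illustrative helper, not part of the handler:

```python
# RRD consolidates samples on fixed step boundaries. Any two fetch times
# that land inside the same 60-second step see the same newest complete
# row, which is why a one-step cache can never serve wrong data.
STEP = 60  # seconds, matching the RRD step

def last_step_boundary(ts: int) -> int:
    # Newest step boundary at or before ts: the freshest data RRD can hold.
    return (ts // STEP) * STEP

# Two refreshes inside the same step window resolve to identical data:
assert last_step_boundary(1_000_020) == last_step_boundary(1_000_079)
# Only crossing a step boundary exposes a newer sample:
assert last_step_boundary(1_000_080) > last_step_boundary(1_000_079)
```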

The combined-optimizations branch had a 60-second read cache; rightup's
batching refactor inadvertently removed it.  This PR restores it.

Solution
--------
* Add self._get_data_cache: tuple = (0.0, None) to __init__
* In get_data(): set use_cache = (start_time is None and end_time is None)
  - if use_cache and cache is < 60 s old: return cached result immediately
  - after a successful live fetch with use_cache: store (now, result)
* Explicit start_time / end_time callers always bypass the cache so
  fine-grained or historical queries are never stale
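
The control flow in those bullets can be sketched in isolation. Names like RRDReader, _fetch_from_disk, and the injectable clock are illustrative stand-ins; the real handler wraps rrdtool.fetch():

```python
import time

class RRDReader:
    """Sketch of the read-side cache described above. _fetch_from_disk
    stands in for the blocking rrdtool.fetch() call so the control flow
    (cache hit, cache store, explicit-bounds bypass) is observable."""

    TTL = 60.0  # seconds, matching the RRD step

    def __init__(self, clock=time.time):
        self._clock = clock                  # injectable for tests
        self._get_data_cache = (0.0, None)   # (fetched_at, result)
        self.disk_reads = 0                  # instrumentation for the demo

    def _fetch_from_disk(self, start_time, end_time):
        self.disk_reads += 1
        return {"start": start_time, "end": end_time}

    def get_data(self, start_time=None, end_time=None):
        now = self._clock()
        # Only the default full-window call is cacheable; explicit bounds
        # always hit the disk so historical queries are never stale.
        use_cache = start_time is None and end_time is None
        if use_cache:
            fetched_at, result = self._get_data_cache
            if now - fetched_at < self.TTL and result is not None:
                return result
        if end_time is None:
            end_time = int(now)
        if start_time is None:
            start_time = end_time - 24 * 3600
        result = self._fetch_from_disk(start_time, end_time)
        if use_cache:
            self._get_data_cache = (now, result)
        return result
```

With this shape, a burst of default get_data() calls costs one disk read, while any call with explicit bounds reads unconditionally and leaves the cache untouched.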

Why 60 s TTL?
-------------
The RRD step is 60 s, so the database cannot hold a newer sample until
the next step boundary.  A 60-second cache is tight enough that the
dashboard always shows data ≤ one step stale, and loose enough that
a burst of refreshes costs one disk read instead of N.
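
Under the assumption that a cached result is served for at most TTL = step = 60 s, the "at most one step stale" claim can be checked exhaustively over every possible fetch offset within a step:

```python
STEP = TTL = 60  # seconds; cache TTL equals the RRD step

def newest_sample(ts: int) -> int:
    # Newest step boundary at or before ts (what a live fetch would see).
    return (ts // STEP) * STEP

# For each fetch time t, compare what a live fetch would see at the last
# instant the cache is still valid (t + TTL - 1) against the cached view
# from time t. The cached result trails a live fetch by at most one step.
worst_lag = max(
    (newest_sample(t + TTL - 1) - newest_sample(t)) // STEP
    for t in range(1_000_000, 1_000_000 + STEP)
)
assert worst_lag == 1  # never more than one 60 s step behind a live fetch
```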

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author: TJ Downes
Date:   2026-04-21 19:55:38 -07:00
parent: c82f0cfce6
commit: fdd788212d

@@ -23,6 +23,10 @@ class RRDToolHandler:
         self._pending_rrd_update = None
         self._last_rrd_info_time = 0
         self._last_rrd_info_cache = None
+        # Read-side cache: rrdtool.fetch() returns 24 h of data and is a
+        # blocking disk read. Cache the result for 60 s — matching the RRD
+        # step size — so repeated dashboard refreshes don't hammer the SD card.
+        self._get_data_cache: tuple = (0.0, None)  # (fetched_at, result)

     def _init_rrd(self):
         if not self.available:
@@ -162,9 +166,20 @@ class RRDToolHandler:
             )
             return None

+        # Serve from cache if result is still fresh. RRD step is 60 s, so
+        # anything newer than that is guaranteed to be identical to a live fetch.
+        # Only the default (full 24-hour, no explicit bounds) call is cached —
+        # explicit start/end requests always bypass the cache.
+        now = time.time()
+        use_cache = start_time is None and end_time is None
+        if use_cache:
+            cache_fetched_at, cache_result = self._get_data_cache
+            if now - cache_fetched_at < 60.0 and cache_result is not None:
+                return cache_result
+
         try:
             if end_time is None:
-                end_time = int(time.time())
+                end_time = int(now)
             if start_time is None:
                 start_time = end_time - (24 * 3600)
@@ -220,6 +235,10 @@ class RRDToolHandler:
             result["timestamps"] = timestamps

+            # Populate read cache for default (unconstrained) calls only.
+            if use_cache:
+                self._get_data_cache = (now, result)
+
             return result
         except Exception as e: