Merge pull request #68 from jkrauska/profileTop

Add database indexes for 10X improvement in page load for /top
Author: Pablo Revilla
Date: 2025-10-10 12:58:41 -07:00
Committed via GitHub
5 changed files with 444 additions and 3 deletions

PERFORMANCE_OPTIMIZATION.md (new file, 203 lines)

@@ -0,0 +1,203 @@
# /top Endpoint Performance Optimization
## Problem
The `/top` endpoint was taking over 1 second to execute due to inefficient database queries. The query joins three tables (node, packet, packet_seen) and performs COUNT aggregations on large result sets without proper indexes.
## Root Cause Analysis
The `get_top_traffic_nodes()` query in `meshview/store.py` executes:
```sql
SELECT
n.node_id,
n.long_name,
n.short_name,
n.channel,
COUNT(DISTINCT p.id) AS total_packets_sent,
COUNT(ps.packet_id) AS total_times_seen
FROM node n
LEFT JOIN packet p ON n.node_id = p.from_node_id
AND p.import_time >= DATETIME('now', 'localtime', '-24 hours')
LEFT JOIN packet_seen ps ON p.id = ps.packet_id
GROUP BY n.node_id, n.long_name, n.short_name
HAVING total_packets_sent > 0
ORDER BY total_times_seen DESC;
```
### Performance Bottlenecks Identified:
1. **Missing composite index on packet(from_node_id, import_time)**
   - The query filters packets by BOTH `from_node_id` AND `import_time >= -24 hours`
   - Without a composite index, SQLite must:
     - Scan using the `idx_packet_from_node_id` index
     - Then filter each result by `import_time` (expensive!)
2. **Missing index on packet_seen(packet_id)**
   - The LEFT JOIN to packet_seen uses `packet_id` as the join key
   - Without an index, SQLite performs a table scan for each packet
   - With potentially millions of packet_seen records, this is very slow
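Both bottlenecks can be confirmed with SQLite's `EXPLAIN QUERY PLAN`. A minimal sketch against a toy in-memory schema (only the columns the query touches are reproduced; the schema here is a reconstruction, not imported from meshview):

```python
import sqlite3

# Toy schema mirroring the relevant columns, with only the pre-existing index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE packet (id INTEGER PRIMARY KEY, from_node_id INTEGER, import_time TEXT);
    CREATE TABLE packet_seen (packet_id INTEGER, node_id INTEGER);
    CREATE INDEX idx_packet_from_node_id ON packet(from_node_id);
""")

QUERY = """
    SELECT COUNT(ps.packet_id)
    FROM packet p
    LEFT JOIN packet_seen ps ON p.id = ps.packet_id
    WHERE p.from_node_id = 1
      AND p.import_time >= DATETIME('now', 'localtime', '-24 hours')
"""

def plan(c):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
    return "\n".join(row[3] for row in c.execute("EXPLAIN QUERY PLAN " + QUERY))

before_plan = plan(conn)

# Add the two indexes from this PR, then compare plans.
conn.executescript("""
    CREATE INDEX idx_packet_from_node_time ON packet(from_node_id, import_time DESC);
    CREATE INDEX idx_packet_seen_packet_id ON packet_seen(packet_id);
""")
after_plan = plan(conn)
print("before:\n" + before_plan + "\n\nafter:\n" + after_plan)
```

After the indexes exist, the `packet_seen` side of the plan switches to a `SEARCH ... USING INDEX idx_packet_seen_packet_id` step instead of relying on a scan or a transient automatic index.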
## Solution
### 1. Added Database Indexes
Modified `meshview/models.py` to include two new indexes:
```python
# In Packet class
Index("idx_packet_from_node_time", "from_node_id", desc("import_time"))
# In PacketSeen class
Index("idx_packet_seen_packet_id", "packet_id")
```
### 2. Added Performance Profiling
Modified `meshview/web.py` `/top` endpoint to include:
- Timing instrumentation for database queries
- Timing for data processing
- Detailed logging with `[PROFILE /top]` prefix
- On-page performance metrics display
### 3. Created Migration Script
Created `add_db_indexes.py` to add indexes to existing databases.
## Implementation Steps
### Step 1: Stop the Database Writer
```bash
# Stop startdb.py if it's running
pkill -f startdb.py
```
### Step 2: Run Migration Script
```bash
python add_db_indexes.py
```
Expected output:
```
======================================================================
Database Index Migration for /top Endpoint Performance
======================================================================
Connecting to database: sqlite+aiosqlite:///path/to/packets.db
======================================================================
Checking for index: idx_packet_from_node_time
======================================================================
Creating index idx_packet_from_node_time...
Table: packet
Columns: from_node_id, import_time DESC
Purpose: Speeds up filtering packets by sender and time range
✓ Index created successfully in 2.34 seconds
======================================================================
Checking for index: idx_packet_seen_packet_id
======================================================================
Creating index idx_packet_seen_packet_id...
Table: packet_seen
Columns: packet_id
Purpose: Speeds up joining packet_seen with packet table
✓ Index created successfully in 3.12 seconds
... (index listings)
======================================================================
Migration completed successfully!
======================================================================
```
### Step 3: Restart Services
```bash
# Restart server
python mvrun.py &
```
### Step 4: Verify Performance Improvement
1. Visit the `/top` endpoint, e.g. `http://127.0.0.1:8081/top?perf=true`
2. Scroll to bottom of page
3. Check the Performance Metrics panel
4. Compare DB query time before and after
**Expected Results:**
- **Before:** 1000-2000ms query time
- **After:** 50-200ms query time
- **Improvement:** 80-95% reduction
## Performance Metrics
The `/top` page now displays at the bottom:
```
⚡ Performance Metrics
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Database Query: 45.23ms
Data Processing: 2.15ms
Total Time: 47.89ms
Nodes Processed: 156
Total Packets: 45,678
Times Seen: 123,456
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
## Technical Details
### Why Composite Index Works
SQLite can use a composite index `(from_node_id, import_time DESC)` to:
1. Quickly find all packets for a specific `from_node_id`
2. Filter by `import_time` without additional I/O (data is already sorted)
3. Both operations use a single index lookup
### Why packet_id Index Works
The `packet_seen` table can have millions of rows. Without an index:
- Each packet requires a full table scan of packet_seen
- O(n * m) complexity where n=packets, m=packet_seen rows
With the index:
- Each packet uses an index lookup
- O(n * log m) complexity - dramatically faster
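The asymptotic argument above can be sampled with a toy benchmark (the data is fabricated; `PRAGMA automatic_index=OFF` is used so SQLite cannot hide the un-indexed case behind a transient automatic index):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Force the naive nested scan for the un-indexed join.
conn.execute("PRAGMA automatic_index=OFF")
conn.executescript("""
    CREATE TABLE packet (id INTEGER PRIMARY KEY, from_node_id INTEGER);
    CREATE TABLE packet_seen (packet_id INTEGER);
""")
# Fabricated data: 200 packets, each seen 50 times.
conn.executemany("INSERT INTO packet (id, from_node_id) VALUES (?, ?)",
                 [(i, i % 20) for i in range(200)])
conn.executemany("INSERT INTO packet_seen (packet_id) VALUES (?)",
                 [(i % 200,) for i in range(10_000)])

QUERY = """
    SELECT COUNT(ps.packet_id)
    FROM packet p
    LEFT JOIN packet_seen ps ON p.id = ps.packet_id
"""

def run_once():
    t0 = time.perf_counter()
    (count,) = conn.execute(QUERY).fetchone()
    return count, (time.perf_counter() - t0) * 1000

count_scan, ms_scan = run_once()       # every packet rescans packet_seen
conn.execute("CREATE INDEX idx_packet_seen_packet_id ON packet_seen(packet_id)")
count_indexed, ms_indexed = run_once() # index lookup per packet
print(f"scan: {ms_scan:.1f} ms, indexed: {ms_indexed:.1f} ms")
```

Both runs return the same count; only the join strategy, and therefore the runtime, changes.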
### Index Size Impact
- `idx_packet_from_node_time`: ~10-20% of packet table size
- `idx_packet_seen_packet_id`: ~5-10% of packet_seen table size
- Total additional disk space: typically 50-200MB depending on data volume
- Performance gain: 80-95% query time reduction
## Future Optimizations
If the query is still slow after adding the indexes:
1. **Add ANALYZE**: Run `ANALYZE;` to update SQLite query planner statistics
2. **Consider materialized view**: Pre-compute traffic stats in a background job
3. **Add caching**: Cache results for 5-10 minutes using Redis/memcached
4. **Partition data**: Archive old packet_seen records
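Option 3 can be prototyped in-process before reaching for Redis or memcached. A minimal TTL-cache sketch (the decorator, the stand-in query function, and the 300-second TTL are illustrative assumptions, not code from this PR):

```python
import asyncio
import time

_cache: dict = {}  # key -> (expires_at, value)

def ttl_cache(ttl_seconds: float):
    """Cache an async function's result for ttl_seconds (illustrative sketch)."""
    def decorator(fn):
        async def wrapper(*args):
            key = (fn.__name__, args)
            hit = _cache.get(key)
            if hit and hit[0] > time.monotonic():
                return hit[1]  # still fresh; skip the database entirely
            value = await fn(*args)
            _cache[key] = (time.monotonic() + ttl_seconds, value)
            return value
        return wrapper
    return decorator

# Hypothetical stand-in for store.get_top_traffic_nodes()
calls = 0

@ttl_cache(ttl_seconds=300)
async def get_top_traffic_nodes():
    global calls
    calls += 1  # count how often the "database" is actually hit
    return [{"node_id": 1, "total_packets_sent": 10}]  # fabricated row

async def main():
    first = await get_top_traffic_nodes()
    second = await get_top_traffic_nodes()  # served from cache, no second query
    print(calls)

asyncio.run(main())
```

A real deployment would also need invalidation on restart and a bounded cache size, but the shape of the win is the same: at most one expensive query per TTL window.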
## Rollback
If needed, indexes can be removed:
```sql
DROP INDEX IF EXISTS idx_packet_from_node_time;
DROP INDEX IF EXISTS idx_packet_seen_packet_id;
```
## Files Modified
- `meshview/models.py` - Added index definitions
- `meshview/web.py` - Added performance profiling
- `meshview/templates/top.html` - Added metrics display
- `add_db_indexes.py` - Migration script (NEW)
- `PERFORMANCE_OPTIMIZATION.md` - This documentation (NEW)
## Support
For questions or issues:
1. Verify indexes exist: `python add_db_indexes.py` (safe to re-run)
2. Review SQLite EXPLAIN QUERY PLAN for the query

add_db_indexes.py (new file, 154 lines)

@@ -0,0 +1,154 @@
#!/usr/bin/env python3
"""
Migration script to add performance indexes
This script adds two critical indexes:
1. idx_packet_from_node_time: Composite index on packet(from_node_id, import_time DESC)
2. idx_packet_seen_packet_id: Index on packet_seen(packet_id)
These indexes significantly improve the performance of the get_top_traffic_nodes() query.
Usage:
python add_db_indexes.py
The script will:
- Connect to your database in WRITE mode
- Check if indexes already exist
- Create missing indexes
- Report timing for each operation
"""
import asyncio
import time

from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

from meshview.config import CONFIG


async def add_indexes():
    # Get database connection string and remove read-only flag
    db_string = CONFIG["database"]["connection_string"]
    if "?mode=ro" in db_string:
        db_string = db_string.replace("?mode=ro", "")
    print(f"Connecting to database: {db_string}")

    # Create engine with write access
    engine = create_async_engine(db_string, echo=False, connect_args={"uri": True})

    try:
        async with engine.begin() as conn:
            # Check and create idx_packet_from_node_time
            print("\n" + "=" * 70)
            print("Checking for index: idx_packet_from_node_time")
            print("=" * 70)
            result = await conn.execute(
                text("""
                    SELECT name FROM sqlite_master
                    WHERE type='index' AND name='idx_packet_from_node_time'
                """)
            )
            if result.fetchone():
                print("✓ Index idx_packet_from_node_time already exists")
            else:
                print("Creating index idx_packet_from_node_time...")
                print(" Table: packet")
                print(" Columns: from_node_id, import_time DESC")
                print(" Purpose: Speeds up filtering packets by sender and time range")
                start_time = time.perf_counter()
                await conn.execute(
                    text("""
                        CREATE INDEX idx_packet_from_node_time
                        ON packet(from_node_id, import_time DESC)
                    """)
                )
                elapsed = time.perf_counter() - start_time
                print(f"✓ Index created successfully in {elapsed:.2f} seconds")

            # Check and create idx_packet_seen_packet_id
            print("\n" + "=" * 70)
            print("Checking for index: idx_packet_seen_packet_id")
            print("=" * 70)
            result = await conn.execute(
                text("""
                    SELECT name FROM sqlite_master
                    WHERE type='index' AND name='idx_packet_seen_packet_id'
                """)
            )
            if result.fetchone():
                print("✓ Index idx_packet_seen_packet_id already exists")
            else:
                print("Creating index idx_packet_seen_packet_id...")
                print(" Table: packet_seen")
                print(" Columns: packet_id")
                print(" Purpose: Speeds up joining packet_seen with packet table")
                start_time = time.perf_counter()
                await conn.execute(
                    text("""
                        CREATE INDEX idx_packet_seen_packet_id
                        ON packet_seen(packet_id)
                    """)
                )
                elapsed = time.perf_counter() - start_time
                print(f"✓ Index created successfully in {elapsed:.2f} seconds")

            # Show index info
            print("\n" + "=" * 70)
            print("Current indexes on packet table:")
            print("=" * 70)
            result = await conn.execute(
                text("""
                    SELECT name, sql FROM sqlite_master
                    WHERE type='index' AND tbl_name='packet'
                    ORDER BY name
                """)
            )
            for row in result:
                if row[1]:  # Skip auto-indexes (they have NULL sql)
                    print(f"{row[0]}")

            print("\n" + "=" * 70)
            print("Current indexes on packet_seen table:")
            print("=" * 70)
            result = await conn.execute(
                text("""
                    SELECT name, sql FROM sqlite_master
                    WHERE type='index' AND tbl_name='packet_seen'
                    ORDER BY name
                """)
            )
            for row in result:
                if row[1]:  # Skip auto-indexes
                    print(f"{row[0]}")

            print("\n" + "=" * 70)
            print("Migration completed successfully!")
            print("=" * 70)
            print("\nNext steps:")
            print("1. Restart your web server (mvrun.py)")
            print("2. Visit /top endpoint and check the performance metrics")
            print("3. Compare DB query time with previous measurements")
            print("\nExpected improvement: 50-90% reduction in query time")
    except Exception as e:
        print(f"\n❌ Error during migration: {e}")
        raise
    finally:
        await engine.dispose()


if __name__ == "__main__":
    print("=" * 70)
    print("Database Index Migration for Endpoint Performance")
    print("=" * 70)
    asyncio.run(add_indexes())

meshview/models.py

@@ -56,6 +56,8 @@ class Packet(Base):
        Index("idx_packet_from_node_id", "from_node_id"),
        Index("idx_packet_to_node_id", "to_node_id"),
        Index("idx_packet_import_time", desc("import_time")),
        # Composite index for /top endpoint performance - filters by from_node_id AND import_time
        Index("idx_packet_from_node_time", "from_node_id", desc("import_time")),
    )
@@ -77,7 +79,11 @@ class PacketSeen(Base):
    topic: Mapped[str] = mapped_column(nullable=True)
    import_time: Mapped[datetime] = mapped_column(nullable=True)

    __table_args__ = (
        Index("idx_packet_seen_node_id", "node_id"),
        # Index for /top endpoint performance - JOIN on packet_id
        Index("idx_packet_seen_packet_id", "packet_id"),
    )


class Traceroute(Base):

meshview/templates/top.html

@@ -250,4 +250,41 @@ updateTable();
updateStatsAndChart();
window.addEventListener('resize', () => chart.resize());
</script>
{% if timing_data %}
<!-- Performance Metrics Summary -->
<div style="background-color: #1a1d21; border: 1px solid #444; border-radius: 8px; padding: 15px; margin: 20px auto; max-width: 800px; color: #fff;">
  <h3 style="margin-top: 0; color: #4CAF50;">⚡ Performance Metrics</h3>
  <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 15px;">
    <div>
      <strong>Database Query:</strong><br>
      <span style="color: #FFD700; font-size: 1.2em;">{{ timing_data.db_query_ms }}ms</span>
    </div>
    <div>
      <strong>Data Processing:</strong><br>
      <span style="color: #FFD700; font-size: 1.2em;">{{ timing_data.processing_ms }}ms</span>
    </div>
    <div>
      <strong>Total Time:</strong><br>
      <span style="color: #FFD700; font-size: 1.2em;">{{ timing_data.total_ms }}ms</span>
    </div>
    <div>
      <strong>Nodes Processed:</strong><br>
      <span style="color: #4CAF50; font-size: 1.2em;">{{ timing_data.node_count }}</span>
    </div>
    <div>
      <strong>Total Packets:</strong><br>
      <span style="color: #4CAF50; font-size: 1.2em;">{{ "{:,}".format(timing_data.total_packets) }}</span>
    </div>
    <div>
      <strong>Times Seen:</strong><br>
      <span style="color: #4CAF50; font-size: 1.2em;">{{ "{:,}".format(timing_data.total_seen) }}</span>
    </div>
  </div>
  <p style="margin-bottom: 0; margin-top: 10px; font-size: 0.9em; color: #888;">
    📊 Use these metrics to measure performance before and after database index changes
  </p>
</div>
{% endif %}
{% endblock %}

meshview/web.py

@@ -1232,22 +1232,63 @@ async def stats(request):
@routes.get("/top")
async def top(request):
    import time
    try:
        # Check if performance metrics should be displayed
        show_perf = request.query.get("perf", "").lower() in ("true", "1", "yes")
        # Start overall timing
        start_time = time.perf_counter()
        timing_data = None
        node_id = request.query.get("node_id")  # Get node_id from the URL query parameters
        if node_id:
            # If node_id is provided, fetch traffic data for the specific node
            db_start = time.perf_counter()
            node_traffic = await store.get_node_traffic(int(node_id))
            db_time = time.perf_counter() - db_start
            template = env.get_template("node_traffic.html")  # Render a different template
            html_content = template.render(
                traffic=node_traffic, node_id=node_id, site_config=CONFIG
            )
        else:
            # Otherwise, fetch top traffic nodes as usual
            db_start = time.perf_counter()
            top_nodes = await store.get_top_traffic_nodes()
            db_time = time.perf_counter() - db_start
            # Data processing timing
            process_start = time.perf_counter()
            # Count records processed
            total_packets = sum(node.get('total_packets_sent', 0) for node in top_nodes)
            total_seen = sum(node.get('total_times_seen', 0) for node in top_nodes)
            process_time = time.perf_counter() - process_start
            # Calculate total time
            total_time = time.perf_counter() - start_time
            # Only include timing_data if perf parameter is set
            if show_perf:
                timing_data = {
                    'db_query_ms': f"{db_time * 1000:.2f}",
                    'processing_ms': f"{process_time * 1000:.2f}",
                    'total_ms': f"{total_time * 1000:.2f}",
                    'node_count': len(top_nodes),
                    'total_packets': total_packets,
                    'total_seen': total_seen,
                }
            template = env.get_template("top.html")
            html_content = template.render(
                nodes=top_nodes,
                timing_data=timing_data,
                site_config=CONFIG,
                SOFTWARE_RELEASE=SOFTWARE_RELEASE,
            )
        return web.Response(