# High-Resolution Timestamp Migration
This document describes the implementation of GitHub issue #55: storing high-resolution timestamps as integers in the database for improved performance and query efficiency.
## Overview
The meshview database now stores timestamps in two formats:
1. **TEXT format** (`import_time`): Human-readable ISO 8601-style string with microseconds (e.g., `2025-03-12 04:15:56.058038`)
2. **INTEGER format** (`import_time_us`): Microseconds since Unix epoch (1970-01-01 00:00:00 UTC)
The dual format approach provides:
- **Backward compatibility**: Existing `import_time` TEXT columns remain unchanged
- **Performance**: Fast integer comparisons and math operations
- **Precision**: Microsecond resolution for accurate timing
- **Efficiency**: Compact storage and fast indexed lookups
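Both columns encode the same instant. A short Python sketch of the round trip between the two formats (the sample instant is illustrative):

```python
import datetime

# One instant, expressed in both storage formats.
moment = datetime.datetime(2025, 3, 12, 4, 15, 56, 58038,
                           tzinfo=datetime.timezone.utc)

# TEXT form, as stored in import_time
text_form = moment.strftime("%Y-%m-%d %H:%M:%S.%f")

# INTEGER form, as stored in import_time_us:
# microseconds since the Unix epoch (~1741752956058038 here)
int_form = int(moment.timestamp() * 1_000_000)

print(text_form)  # 2025-03-12 04:15:56.058038
```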
## Database Changes
### New Columns Added
Three tables have new `import_time_us` columns:
1. **packet.import_time_us** (INTEGER)
- Stores when the packet was imported into the database
- Indexed for fast queries
2. **packet_seen.import_time_us** (INTEGER)
- Stores when the packet_seen record was imported
- Indexed for performance
3. **traceroute.import_time_us** (INTEGER)
- Stores when the traceroute was imported
- Indexed for fast lookups
### New Indexes
The following indexes were created for optimal query performance:
```sql
CREATE INDEX idx_packet_import_time_us ON packet(import_time_us DESC);
CREATE INDEX idx_packet_from_node_time_us ON packet(from_node_id, import_time_us DESC);
CREATE INDEX idx_packet_seen_import_time_us ON packet_seen(import_time_us);
CREATE INDEX idx_traceroute_import_time_us ON traceroute(import_time_us);
```
## Migration Process
### For Existing Databases
Run the migration script to add the new columns and populate them from existing data:
```bash
python migrate_add_timestamp_us.py [database_path]
```
If no path is provided, it defaults to `packets.db` in the current directory.
The migration script:
1. Checks if migration is needed (idempotent)
2. Adds `import_time_us` columns to the three tables
3. Populates the new columns from existing `import_time` values
4. Creates indexes for optimal performance
5. Verifies the migration completed successfully
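Step 3 amounts to parsing each TEXT timestamp as UTC and writing back the equivalent microsecond integer. A minimal sketch of that idea, assuming the stored strings always carry six fractional digits (the actual `migrate_add_timestamp_us.py` may differ in detail):

```python
import datetime
import sqlite3

def backfill_import_time_us(conn: sqlite3.Connection, table: str) -> None:
    """Populate import_time_us from the TEXT import_time column (UTC assumed)."""
    rows = conn.execute(
        f"SELECT id, import_time FROM {table} WHERE import_time_us IS NULL"
    ).fetchall()
    for row_id, text_ts in rows:
        if text_ts is None:
            continue
        dt = datetime.datetime.strptime(text_ts, "%Y-%m-%d %H:%M:%S.%f")
        us = int(dt.replace(tzinfo=datetime.timezone.utc).timestamp() * 1_000_000)
        conn.execute(
            f"UPDATE {table} SET import_time_us = ? WHERE id = ?", (us, row_id)
        )
    conn.commit()
```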
### For New Databases
New databases created with the updated schema will automatically include the `import_time_us` columns. The MQTT store module populates both columns when inserting new records.
## Code Changes
### Models (meshview/models.py)
The ORM models now include the new `import_time_us` fields:
```python
class Packet(Base):
    import_time: Mapped[datetime] = mapped_column(nullable=True)
    import_time_us: Mapped[int] = mapped_column(BigInteger, nullable=True)
```
### MQTT Store (meshview/mqtt_store.py)
The data ingestion logic now populates both timestamp columns using UTC time:
```python
now = datetime.datetime.now(datetime.timezone.utc)
now_us = int(now.timestamp() * 1_000_000)
# Both columns are populated
import_time=now,
import_time_us=now_us,
```
**Important**: All new timestamps use UTC (Coordinated Universal Time) for consistency across time zones.
## Using the New Timestamps
### Example Queries
**Query packets from the last 7 days:**
```sql
-- Old way (slower)
SELECT * FROM packet
WHERE import_time >= datetime('now', '-7 days');
-- New way (faster)
SELECT * FROM packet
WHERE import_time_us >= (strftime('%s', 'now', '-7 days') * 1000000);
```
**Query packets in a specific time range:**
```sql
SELECT * FROM packet
WHERE import_time_us BETWEEN 1759254380000000 AND 1759254390000000;
```
**Calculate time differences (in microseconds):**
```sql
SELECT
    id,
    (import_time_us - LAG(import_time_us) OVER (ORDER BY import_time_us)) / 1000000.0 AS seconds_since_last
FROM packet
LIMIT 10;
```
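The window-function query above can be exercised from Python's built-in `sqlite3` module (SQLite 3.25+ is required for `LAG`); the timestamps below are made-up sample values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packet (id INTEGER PRIMARY KEY, import_time_us INTEGER)")
conn.executemany(
    "INSERT INTO packet (import_time_us) VALUES (?)",
    [(1759254380000000,), (1759254380500000,), (1759254382250000,)],
)
rows = conn.execute("""
    SELECT id,
           (import_time_us - LAG(import_time_us) OVER (ORDER BY import_time_us))
               / 1000000.0 AS seconds_since_last
    FROM packet
    ORDER BY import_time_us
""").fetchall()
print(rows)  # [(1, None), (2, 0.5), (3, 1.75)]
```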
### Converting Timestamps
**From datetime to microseconds (UTC):**
```python
import datetime
now = datetime.datetime.now(datetime.timezone.utc)
now_us = int(now.timestamp() * 1_000_000)
```
**From microseconds to datetime:**
```python
import datetime
timestamp_us = 1759254380813451
# Pass tz explicitly; a bare fromtimestamp() would return local time
dt = datetime.datetime.fromtimestamp(timestamp_us / 1_000_000,
                                     tz=datetime.timezone.utc)
```
**In SQL queries:**
```sql
-- Datetime to microseconds
-- (assumes six fractional digits at positions 21-26 of the string)
SELECT CAST((strftime('%s', import_time) || substr(import_time, 21, 6)) AS INTEGER);
-- Microseconds to datetime (truncates to whole seconds)
SELECT datetime(import_time_us / 1000000, 'unixepoch');
```
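Because the TEXT-to-integer expression depends on the microseconds occupying character positions 21-26 of the stored string, it is worth cross-checking it against the Python conversion; a quick sanity check:

```python
import datetime
import sqlite3

text_ts = "2025-03-12 04:15:56.058038"

# SQL-side conversion (strftime('%s', ...) treats the string as UTC)
conn = sqlite3.connect(":memory:")
(sql_us,) = conn.execute(
    "SELECT CAST((strftime('%s', ?) || substr(?, 21, 6)) AS INTEGER)",
    (text_ts, text_ts),
).fetchone()

# Python-side conversion of the same string
dt = datetime.datetime.strptime(text_ts, "%Y-%m-%d %H:%M:%S.%f")
py_us = int(dt.replace(tzinfo=datetime.timezone.utc).timestamp() * 1_000_000)

assert abs(sql_us - py_us) <= 1  # same instant, to float rounding
```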
## Performance Benefits
The integer timestamp columns provide significant performance improvements:
1. **Faster comparisons**: Integer comparisons are much faster than string/datetime comparisons
2. **Smaller index size**: Integer indexes are more compact than datetime indexes
3. **Range queries**: BETWEEN operations on integers are highly optimized
4. **Math operations**: Easy to calculate time differences, averages, etc.
5. **Sorting**: Integer sorting is faster than datetime sorting
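One way to confirm the planner actually uses the new indexes for range queries is `EXPLAIN QUERY PLAN`; a sketch against a throwaway database (index DDL taken from the section above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packet (id INTEGER PRIMARY KEY, import_time_us INTEGER)")
conn.execute("CREATE INDEX idx_packet_import_time_us ON packet(import_time_us DESC)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM packet WHERE import_time_us BETWEEN ? AND ?",
    (1759254380000000, 1759254390000000),
).fetchall()

# The detail column should report a SEARCH via the index,
# not a full-table SCAN.
print(plan[-1][-1])
```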
## Backward Compatibility
The original `import_time` TEXT columns remain unchanged:
- Existing code continues to work
- Human-readable timestamps still available
- Gradual migration to new columns possible
- No breaking changes for existing queries
## Future Work
Future improvements could include:
- Migrating queries to use `import_time_us` columns
- Deprecating the TEXT `import_time` columns (after transition period)
- Adding helper functions for timestamp conversion
- Creating views that expose both formats
## Testing
The migration was tested on a production database with:
- 132,466 packet records
- 1,385,659 packet_seen records
- 28,414 traceroute records
All records were successfully migrated with microsecond precision preserved.
## References
- GitHub Issue: #55 - Storing High-Resolution Timestamps in SQLite
- SQLite datetime functions: https://www.sqlite.org/lang_datefunc.html
- Python datetime module: https://docs.python.org/3/library/datetime.html