5.2 KiB
Message Retry Logic
This document describes the retry logic implementation for message sending in the MeshCore MQTT bridge.
Overview
The bridge now includes automatic retry logic for both direct messages (send_msg) and channel messages (send_chan_msg) to improve reliability, especially for multi-hop mesh network scenarios where messages may fail to reach their destination.
Features
Acknowledgement Tracking
- Monitors
MSG_SENTevents from MeshCore that includeexpected_ackandsuggested_timeout - Waits for acknowledgement (ACK) events with the specified timeout
- Tracks pending acknowledgements to determine message delivery success
Retry Mechanism
- Automatically retries failed message sends up to a configurable number of times
- Uses exponential backoff between retries to avoid network congestion
- Provides detailed logging of retry attempts and outcomes
Path Reset
- After exhausting regular retries, can reset the routing path and try once more
- Useful when the mesh network topology has changed or cached routes are stale
- Only applies to direct messages (not channel messages)
Configuration
Configuration Parameters
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
message_retry_count |
int | 3 | 0-10 | Number of retry attempts after initial send |
message_retry_delay |
float | 2.0 | 0.5-30.0 | Base delay in seconds between retries |
reset_path_on_failure |
bool | true | - | Reset routing path after max retries |
Configuration Methods
1. Configuration File (YAML)
meshcore:
connection_type: tcp
address: 192.168.1.100
port: 12345
message_retry_count: 5
message_retry_delay: 3.0
reset_path_on_failure: true
2. Environment Variables
export MESHCORE_MESSAGE_RETRY_COUNT=5
export MESHCORE_MESSAGE_RETRY_DELAY=3.0
export MESHCORE_RESET_PATH_ON_FAILURE=true
3. Command Line Arguments
python -m meshcore_mqtt.main \
--meshcore-message-retry-count 5 \
--meshcore-message-retry-delay 3.0 \
--meshcore-reset-path-on-failure
How It Works
Retry Flow
- Initial Send: Message is sent using MeshCore's
send_msg()orsend_chan_msg() - Check Response: If response includes
expected_ackandsuggested_timeout:- Wait for acknowledgement with the suggested timeout
- If ACK received → Success
- If timeout → Continue to retry logic
- Retry Logic:
- Retry up to
message_retry_counttimes - Wait
message_retry_delay * (2^attempt)seconds between retries (exponential backoff) - Log each retry attempt
- Retry up to
- Path Reset (direct messages only):
- After exhausting regular retries, if
reset_path_on_failureis true - Reset the routing path (sends a trace packet to refresh routing)
- Try sending once more with the new path
- After exhausting regular retries, if
- Final Result: Return success or failure after all attempts
Exponential Backoff
The delay between retries increases exponentially to avoid overwhelming the network:
- 1st retry:
message_retry_delayseconds (default 2s) - 2nd retry:
message_retry_delay * 2seconds (default 4s) - 3rd retry:
message_retry_delay * 4seconds (default 8s) - And so on...
Example Timeline
With default settings (3 retries, 2s base delay, path reset enabled):
- 0s: Initial send attempt
- 2s: First retry (if initial failed)
- 6s: Second retry (2s + 4s delay)
- 14s: Third retry (2s + 4s + 8s delay)
- 15s: Path reset and final attempt (if enabled)
Logging
The retry logic provides detailed logging at different levels:
- INFO: Retry attempts, successful acknowledgements
- WARNING: Missing acknowledgements, retry notifications
- ERROR: Final failure after all retries exhausted
- DEBUG: ACK tracking details, timeout information
Example log output:
INFO - Sending message to Alice (attempt 1/4)
WARNING - No acknowledgement received for message to Alice
INFO - Retrying in 2.0 seconds...
INFO - Sending message to Alice (attempt 2/4)
INFO - Message to Alice acknowledged successfully
Use Cases
Multi-Hop Networks
In mesh networks where messages must traverse multiple nodes, the retry logic significantly improves delivery reliability by:
- Handling temporary routing failures
- Adapting to topology changes
- Working around intermittent node availability
Network Congestion
Exponential backoff helps prevent network congestion by:
- Spacing out retry attempts
- Giving the network time to recover
- Avoiding message storms
Dynamic Topologies
Path reset functionality helps with:
- Stale routing tables
- Node mobility
- Network reconfiguration
Limitations
- Retry logic only applies to
send_msgandsend_chan_msgcommands - ACK tracking depends on MeshCore library providing acknowledgement events
- Path reset may not be effective for all network issues
- Maximum total time for all retries depends on configuration but could exceed 30 seconds with aggressive settings
Testing
The implementation includes comprehensive unit tests covering:
- Successful message delivery on first attempt
- Retry after initial failure
- Exponential backoff timing
- Path reset functionality
- ACK timeout handling
- Configuration validation
Run tests with:
pytest tests/test_retry_logic.py -v