Add sensor_health_node to saltybot_health_monitor package.

Monitors 8 sensor topics for staleness, publishing DiagnosticArray on /saltybot/diagnostics and MQTT JSON on saltybot/health.

Sensors monitored (configurable thresholds):
- /camera/color/image_raw
- /camera/depth/image_rect_raw
- /camera/color/camera_info
- /scan
- /imu/data
- /saltybot/uwb/range
- /saltybot/battery
- /saltybot/motor_daemon/status

Each sensor: OK/WARN/ERROR based on topic age vs warn_s/error_s thresholds. Critical sensors (camera, lidar, imu, motor_daemon) escalate overall status.

Files added:
- sensor_health_node.py: SensorWatcher + SensorHealthNode
- config/sensor_health_params.yaml: per-sensor thresholds
- launch/sensor_health.launch.py
- test/test_sensor_health.py: 35 tests, all passing

setup.py/package.xml updated: sensor_msgs, diagnostic_msgs deps + new entry point.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

SaltyBot Health Monitor
Central system health monitor for SaltyBot. Tracks heartbeats from all critical nodes, detects failures, triggers auto-restart, and publishes system health status.
Features
- Heartbeat Monitoring: Subscribes to heartbeat signals from all tracked nodes
- Automatic Dead Node Detection: Marks a node as DOWN after more than 5 seconds without a heartbeat (default, configurable)
- Auto-Restart Capability: Attempts to restart dead nodes via ROS2 launch
- System Health Publishing: Publishes /saltybot/system_health JSON with full status
- Face Alerts: Triggers visual alerts on robot face display for critical failures
- Configurable: YAML-based node list and timeout parameters
Topics
Subscribed
- /saltybot/<node_name>/heartbeat (std_msgs/String): Heartbeat from each monitored node
Published
- /saltybot/system_health (std_msgs/String): System health status as JSON
- /saltybot/face/alert (std_msgs/String): Critical alerts for face display
Configuration
Edit config/health_config.yaml to configure:
- monitored_nodes: List of all nodes to track
- heartbeat_timeout_s: Seconds before node is marked DOWN (default: 5 s)
- check_frequency_hz: Health check rate (default: 1 Hz)
- enable_auto_restart: Enable automatic restart attempts (default: true)
- critical_nodes: Nodes that trigger face alerts when down
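To illustrate how heartbeat_timeout_s and check_frequency_hz interact, here is a minimal sketch of the UP/DOWN decision in plain Python. It is not the package's actual implementation: the class name HeartbeatTracker, its methods, and the injectable clock are hypothetical, and it assumes a node that has never reported is timed from monitor start.

```python
import time
from typing import Callable, Dict, Iterable


class HeartbeatTracker:
    """Tracks last-heartbeat times and classifies nodes as UP or DOWN."""

    def __init__(
        self,
        monitored_nodes: Iterable[str],
        heartbeat_timeout_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout_s = heartbeat_timeout_s
        self._clock = clock
        start = clock()
        # Assumption: a node that has never reported is timed from monitor start.
        self._last_seen: Dict[str, float] = {name: start for name in monitored_nodes}

    def record_heartbeat(self, node_name: str) -> None:
        """Call from the heartbeat subscription callback."""
        if node_name in self._last_seen:
            self._last_seen[node_name] = self._clock()

    def statuses(self) -> Dict[str, str]:
        """Call once per health check; returns {node_name: "UP" or "DOWN"}."""
        now = self._clock()
        return {
            name: "DOWN" if (now - seen) > self._timeout_s else "UP"
            for name, seen in self._last_seen.items()
        }
```

With check_frequency_hz at 1 Hz, statuses() would run once per second, so a silent node would be flagged somewhere between 5 and 6 seconds after its last heartbeat under the default timeout.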
Launch
# Default launch with built-in config
ros2 launch saltybot_health_monitor health_monitor.launch.py
# Custom config
ros2 launch saltybot_health_monitor health_monitor.launch.py \
config_file:=/path/to/custom_config.yaml
# Disable auto-restart
ros2 launch saltybot_health_monitor health_monitor.launch.py \
enable_auto_restart:=false
Health Status JSON
The /saltybot/system_health topic publishes:
{
"timestamp": "2025-03-05T10:00:00.123456",
"uptime_s": 3600.5,
"nodes": {
"rover_driver": {
"status": "UP",
"time_since_heartbeat_s": 0.5,
"heartbeat_count": 1200,
"restart_count": 0,
"expected": true
},
"slam_node": {
"status": "DOWN",
"time_since_heartbeat_s": 6.0,
"heartbeat_count": 500,
"restart_count": 1,
"expected": true
}
},
"critical_down": ["slam_node"],
"system_healthy": false
}
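A consumer of this topic might reduce the payload to the fields it cares about. The following is a small sketch that relies only on the keys shown in the example above; the function name summarize_health is hypothetical.

```python
import json
from typing import List, Tuple


def summarize_health(payload: str) -> Tuple[bool, List[str]]:
    """Return (system_healthy, sorted names of nodes whose status is not "UP")."""
    report = json.loads(payload)
    not_up = sorted(
        name for name, info in report["nodes"].items() if info["status"] != "UP"
    )
    return bool(report["system_healthy"]), not_up
```

In a subscriber callback for /saltybot/system_health, the string to pass in would be msg.data. For the example payload above, the function would report the system as unhealthy with slam_node as the only node that is not UP.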
Node Integration
Each node should publish heartbeats periodically (e.g., every 1-2 seconds):
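# In your ROS2 node (create the publisher once, e.g. in __init__)
from std_msgs.msg import String

heartbeat_pub = self.create_publisher(String, "/saltybot/node_name/heartbeat", 10)
# Publish periodically, e.g. from a timer callback
heartbeat_pub.publish(String(data="node_name:alive"))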
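For a fuller picture, the snippet above could be wrapped in a timer-driven node along the following lines. This is a sketch under assumptions: the helper names, the HeartbeatNode class, the "_heartbeat" node-name suffix, and the 1-second default period are illustrative and not taken from the package. ROS imports are placed inside main() so the two helpers can be used without a ROS 2 environment.

```python
def heartbeat_topic(node_name: str) -> str:
    """Topic the health monitor subscribes to for a given node."""
    return f"/saltybot/{node_name}/heartbeat"


def heartbeat_payload(node_name: str) -> str:
    """Payload in the "<node_name>:alive" form used in this README."""
    return f"{node_name}:alive"


def main(node_name: str = "node_name", period_s: float = 1.0) -> None:
    # Imported here so the helpers above work without ROS 2 installed.
    import rclpy
    from rclpy.node import Node
    from std_msgs.msg import String

    class HeartbeatNode(Node):
        def __init__(self) -> None:
            # Hypothetical node name; any unique name works.
            super().__init__(f"{node_name}_heartbeat")
            self._pub = self.create_publisher(String, heartbeat_topic(node_name), 10)
            # Publish one heartbeat every period_s seconds.
            self.create_timer(period_s, self._tick)

        def _tick(self) -> None:
            self._pub.publish(String(data=heartbeat_payload(node_name)))

    rclpy.init()
    node = HeartbeatNode()
    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass
    finally:
        node.destroy_node()
        if rclpy.ok():
            rclpy.shutdown()
```

In an existing node, only the publisher and timer lines from __init__ and the _tick callback would need to be copied over.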
Restart Behavior
When a node is detected as DOWN:
- Health monitor logs a warning
- If enable_auto_restart: true, queues a restart command
- Node status changes to "RESTARTING"
- Restart count is incremented
- Face alert is published for critical nodes
The actual restart mechanism can be:
- Direct ROS2 launch subprocess
- Systemd service restart
- Custom restart script
- Manual restart via external monitor
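The sequence described in this section can be sketched as a small state update, independent of which restart mechanism is used. Everything here is illustrative: NodeState, handle_down_node, the restart queue, and the alert text are hypothetical names, and the sketch assumes the status change and restart count apply only when auto-restart is enabled, which the description above does not state explicitly.

```python
import logging
from dataclasses import dataclass
from typing import Dict, List, Set

log = logging.getLogger("health_monitor_sketch")


@dataclass
class NodeState:
    status: str = "UP"
    restart_count: int = 0


def handle_down_node(
    name: str,
    states: Dict[str, NodeState],
    restart_queue: List[str],
    alerts: List[str],
    enable_auto_restart: bool,
    critical_nodes: Set[str],
) -> None:
    """Apply the documented steps to a node that was just detected as DOWN."""
    state = states[name]
    log.warning("node %s is DOWN", name)
    state.status = "DOWN"
    if enable_auto_restart:
        # Only queue the request; the restart itself is done elsewhere
        # (launch subprocess, systemd, custom script, ...).
        restart_queue.append(name)
        state.status = "RESTARTING"
        state.restart_count += 1
    if name in critical_nodes:
        # Hypothetical alert text for /saltybot/face/alert.
        alerts.append(f"{name} DOWN")
```

Keeping the restart queue separate from the detection step means the mechanism from the list above can be swapped without touching the health-check logic.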
Debugging
Check health status:
ros2 topic echo /saltybot/system_health
Simulate a node heartbeat:
ros2 topic pub /saltybot/test_node/heartbeat std_msgs/String '{data: "test_node:alive"}'
View monitor logs:
ros2 launch saltybot_health_monitor health_monitor.launch.py | grep health