sl-jetson 8e03a209be feat: ROS2 sensor health monitor (Issue #566)
Add sensor_health_node to saltybot_health_monitor package. Monitors 8
sensor topics for staleness, publishing DiagnosticArray on
/saltybot/diagnostics and MQTT JSON on saltybot/health.

Sensors monitored (configurable thresholds):
  /camera/color/image_raw, /camera/depth/image_rect_raw,
  /camera/color/camera_info, /scan, /imu/data,
  /saltybot/uwb/range, /saltybot/battery, /saltybot/motor_daemon/status

Each sensor: OK/WARN/ERROR based on topic age vs warn_s/error_s thresholds.
Critical sensors (camera, lidar, imu, motor_daemon) escalate overall status.

Files added:
  sensor_health_node.py — SensorWatcher + SensorHealthNode
  config/sensor_health_params.yaml — per-sensor thresholds
  launch/sensor_health.launch.py
  test/test_sensor_health.py — 35 tests, all passing

setup.py/package.xml updated: sensor_msgs, diagnostic_msgs deps + new entry point.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 11:47:01 -04:00

SaltyBot Health Monitor

Central system health monitor for SaltyBot. It tracks heartbeats from all monitored nodes, detects nodes that have gone silent, optionally triggers an auto-restart, and publishes the overall system health status.

Features

  • Heartbeat Monitoring: Subscribes to heartbeat signals from all tracked nodes
  • Automatic Dead Node Detection: Marks a node as DOWN if it is silent for longer than heartbeat_timeout_s (default: 5 seconds)
  • Auto-Restart Capability: Attempts to restart dead nodes via ROS2 launch
  • System Health Publishing: Publishes /saltybot/system_health JSON with full status
  • Face Alerts: Triggers visual alerts on robot face display for critical failures
  • Configurable: YAML-based node list and timeout parameters

Topics

Subscribed

  • /saltybot/<node_name>/heartbeat (std_msgs/msg/String): Heartbeat from each monitored node

Published

  • /saltybot/system_health (std_msgs/msg/String): System health status as JSON
  • /saltybot/face/alert (std_msgs/msg/String): Critical alerts for face display

Configuration

Edit config/health_config.yaml to configure:

  • monitored_nodes: List of all nodes to track
  • heartbeat_timeout_s: Seconds before node is marked DOWN (default: 5s)
  • check_frequency_hz: Health check rate (default: 1Hz)
  • enable_auto_restart: Enable automatic restart attempts (default: true)
  • critical_nodes: Nodes that trigger face alerts when down
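To illustrate how these parameters interact, the sketch below mirrors the config as a Python dataclass and applies the timeout rule. It uses the defaults listed above. The names HealthConfig, check_period_s, and is_down are hypothetical and are not taken from the package; the node names are borrowed from the status example in this README.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HealthConfig:
    """Hypothetical in-memory mirror of config/health_config.yaml."""
    monitored_nodes: List[str] = field(default_factory=list)
    critical_nodes: List[str] = field(default_factory=list)
    heartbeat_timeout_s: float = 5.0   # documented default
    check_frequency_hz: float = 1.0    # documented default
    enable_auto_restart: bool = True   # documented default

    @property
    def check_period_s(self) -> float:
        # Interval between health checks implied by the check rate.
        return 1.0 / self.check_frequency_hz


def is_down(time_since_heartbeat_s: float, cfg: HealthConfig) -> bool:
    # A node counts as DOWN once it has been silent longer than the timeout.
    return time_since_heartbeat_s > cfg.heartbeat_timeout_s


cfg = HealthConfig(
    monitored_nodes=["rover_driver", "slam_node"],
    critical_nodes=["slam_node"],
)
print(cfg.check_period_s)   # 1.0
print(is_down(0.5, cfg))    # False
print(is_down(6.0, cfg))    # True
```

With the default 1 Hz check rate and 5 s timeout, a dead node would be noticed at most about one check period after the timeout expires.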

Launch

# Default launch with built-in config
ros2 launch saltybot_health_monitor health_monitor.launch.py

# Custom config
ros2 launch saltybot_health_monitor health_monitor.launch.py \
  config_file:=/path/to/custom_config.yaml

# Disable auto-restart
ros2 launch saltybot_health_monitor health_monitor.launch.py \
  enable_auto_restart:=false

Health Status JSON

The /saltybot/system_health topic publishes:

{
  "timestamp": "2025-03-05T10:00:00.123456",
  "uptime_s": 3600.5,
  "nodes": {
    "rover_driver": {
      "status": "UP",
      "time_since_heartbeat_s": 0.5,
      "heartbeat_count": 1200,
      "restart_count": 0,
      "expected": true
    },
    "slam_node": {
      "status": "DOWN",
      "time_since_heartbeat_s": 6.0,
      "heartbeat_count": 500,
      "restart_count": 1,
      "expected": true
    }
  },
  "critical_down": ["slam_node"],
  "system_healthy": false
}
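A consumer of this topic might reduce the payload to a one-line summary. The following is a minimal sketch using only the fields shown in the example above; the function name summarize is hypothetical, and in a real subscriber callback the raw string would come from the message's data field.

```python
import json

# Compact copy of the example payload shown above.
payload = """
{
  "timestamp": "2025-03-05T10:00:00.123456",
  "uptime_s": 3600.5,
  "nodes": {
    "rover_driver": {"status": "UP", "time_since_heartbeat_s": 0.5,
                     "heartbeat_count": 1200, "restart_count": 0, "expected": true},
    "slam_node": {"status": "DOWN", "time_since_heartbeat_s": 6.0,
                  "heartbeat_count": 500, "restart_count": 1, "expected": true}
  },
  "critical_down": ["slam_node"],
  "system_healthy": false
}
"""


def summarize(raw: str) -> str:
    """Return a one-line summary of a system_health JSON string."""
    health = json.loads(raw)
    if health["system_healthy"]:
        return "healthy"
    # Collect every node whose status is anything other than UP.
    down = sorted(name for name, info in health["nodes"].items()
                  if info["status"] != "UP")
    return f"unhealthy: down={down} critical={health['critical_down']}"


print(summarize(payload))
# unhealthy: down=['slam_node'] critical=['slam_node']
```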

Node Integration

Each monitored node should publish a heartbeat periodically, at an interval well below heartbeat_timeout_s (e.g., every 1-2 seconds):

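# In your ROS2 node's __init__ (needs: from std_msgs.msg import String)
self.heartbeat_pub = self.create_publisher(String, "/saltybot/node_name/heartbeat", 10)
# Publish on a 1 s timer so the gap between heartbeats stays well under the 5 s timeout
self.create_timer(1.0, lambda: self.heartbeat_pub.publish(String(data="node_name:alive")))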

Restart Behavior

When a node is detected as DOWN:

  1. Health monitor logs a warning
  2. If enable_auto_restart is true, it queues a restart command
  3. Node status changes to "RESTARTING"
  4. Restart count is incremented
  5. Face alert is published for critical nodes
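The steps above can be sketched in plain Python, independent of ROS2. This is an illustration and not the package's actual implementation. It makes two assumptions: steps 3 and 4 apply only when auto-restart is enabled, and the face alert fires for critical nodes regardless. NodeState, handle_down, and the alert text format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class NodeState:
    name: str
    status: str = "UP"
    restart_count: int = 0


def handle_down(node: NodeState,
                critical_nodes: Set[str],
                enable_auto_restart: bool,
                restart_queue: List[str],
                publish_alert: Callable[[str], None],
                log_warning: Callable[[str], None] = print) -> None:
    """Apply the five documented steps to a node that has timed out."""
    log_warning(f"{node.name} is DOWN")        # step 1: log a warning
    node.status = "DOWN"
    if enable_auto_restart:
        restart_queue.append(node.name)        # step 2: queue a restart
        node.status = "RESTARTING"             # step 3: update status
        node.restart_count += 1                # step 4: count the attempt
    if node.name in critical_nodes:
        publish_alert(f"{node.name} down")     # step 5: alert (assumed text format)


alerts: List[str] = []
queue: List[str] = []
slam = NodeState("slam_node")
handle_down(slam, {"slam_node"}, True, queue, alerts.append)
print(slam.status, slam.restart_count, queue, alerts)
# After the warning line, prints:
# RESTARTING 1 ['slam_node'] ['slam_node down']
```

Passing the restart queue and the alert publisher in as arguments keeps the decision logic separate from whichever restart mechanism is actually used.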

The actual restart mechanism can be:

  • Direct ROS2 launch subprocess
  • Systemd service restart
  • Custom restart script
  • Manual restart via external monitor

Debugging

Check health status:

ros2 topic echo /saltybot/system_health

Simulate a node heartbeat:

ros2 topic pub /saltybot/test_node/heartbeat std_msgs/msg/String '{data: "test_node:alive"}'

View monitor logs:

ros2 launch saltybot_health_monitor health_monitor.launch.py 2>&1 | grep -i health