sl-firmware c66a5ce974 feat: Add Issue #408 - ROS2 system health monitor with node heartbeats + auto-restart
Implements a central health monitoring system for SaltyBot with:
- Heartbeat subscription from /saltybot/<node_name>/heartbeat
- Dead node detection (>5s timeout, configurable)
- Automatic restart via ros2 launch with configurable retry limits
- System health publishing to /saltybot/system_health (JSON)
- Face alert integration for CRITICAL node failures
- Full_stack.launch.py integration at t=1s in the launch sequence

Package structure:
- saltybot_system_health: Main ROS2 package
  - health_monitor_node.py: Central monitoring node
  - msg/SystemHealth.msg, msg/NodeStatus.msg: Health status messages
  - config/health_monitor.yaml: Node definitions and criticality levels
  - launch/health_monitor.launch.py: Standalone launch

Configuration:
- heartbeat_timeout: 5.0 seconds (node marked DEAD if no heartbeat arrives within this window)
- monitor_freq: 2.0 Hz (check interval)
- auto_restart: enabled with max 3 restarts per node
- face_alert: triggers on CRITICAL node down

Node definitions include: robot_state_publisher, STM32 bridge,
cmd_vel bridge, sensors (RPLIDAR, RealSense), SLAM (RTAB-Map),
Nav2, perception, follower, and rosbridge.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-03-04 22:44:30 -05:00


# NodeStatus.msg — Status of a single ROS2 node
#
# node_name         : Name of the monitored node (e.g., saltybot_bridge)
# status            : One of ALIVE, DEGRADED, DEAD
# last_heartbeat_ms : Timestamp of last received heartbeat, in milliseconds
# downtime_sec      : Seconds since last heartbeat
# restart_count     : Number of auto-restarts performed
#
string node_name
string status
int64 last_heartbeat_ms
float32 downtime_sec
uint32 restart_count
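
The fields above map naturally onto the timeout and restart rules described in the commit message. The following is a minimal sketch of that logic in plain Python, not the actual contents of health_monitor_node.py. The dataclass, the helper names `update_status` and `should_restart`, and the use of a caller-supplied `now_ms` clock are assumptions made for illustration. The 5.0 s timeout and the limit of 3 restarts come from the commit message. DEGRADED is left out because the source does not say when it applies.

```python
from dataclasses import dataclass

# Values taken from the commit message; both are configurable there.
HEARTBEAT_TIMEOUT_SEC = 5.0
MAX_RESTARTS = 3


@dataclass
class NodeStatus:
    """Plain-Python mirror of the NodeStatus.msg fields (illustrative only)."""
    node_name: str
    status: str = "ALIVE"
    last_heartbeat_ms: int = 0
    downtime_sec: float = 0.0
    restart_count: int = 0


def update_status(node: NodeStatus, now_ms: int,
                  timeout_sec: float = HEARTBEAT_TIMEOUT_SEC) -> NodeStatus:
    """Recompute downtime and mark the node DEAD once it exceeds the timeout."""
    node.downtime_sec = max(0.0, (now_ms - node.last_heartbeat_ms) / 1000.0)
    # The commit message specifies ">5s", so exactly 5.0 s still counts as ALIVE.
    node.status = "DEAD" if node.downtime_sec > timeout_sec else "ALIVE"
    return node


def should_restart(node: NodeStatus, max_restarts: int = MAX_RESTARTS) -> bool:
    """A DEAD node is eligible for restart until its retry budget is spent."""
    return node.status == "DEAD" and node.restart_count < max_restarts
```

A monitor loop running at monitor_freq would call `update_status` for each tracked node on every tick, then launch a restart and increment `restart_count` whenever `should_restart` returns true.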