Implements central health monitoring system for SaltyBot with:
- Heartbeat subscription from /saltybot/<node_name>/heartbeat
- Dead node detection (>5s timeout, configurable)
- Automatic restart via ros2 launch with configurable retry limits
- System health publishing to /saltybot/system_health (JSON)
- Face alert integration for CRITICAL node failures
- Full_stack.launch.py integration at t=1s launch sequence

Package structure:
- saltybot_system_health: Main ROS2 package
- health_monitor_node.py: Central monitoring node
- msg/SystemHealth.msg, msg/NodeStatus.msg: Health status messages
- config/health_monitor.yaml: Node definitions and criticality levels
- launch/health_monitor.launch.py: Standalone launch

Configuration:
- heartbeat_timeout: 5.0 seconds (node marked DEAD if missing)
- monitor_freq: 2.0 Hz (check interval)
- auto_restart: enabled with max 3 restarts per node
- face_alert: triggers on CRITICAL node down

Node definitions include: robot_state_publisher, STM32 bridge, cmd_vel bridge, sensors (RPLIDAR, RealSense), SLAM (RTAB-Map), Nav2, perception, follower, and rosbridge.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
14 lines
419 B
Plaintext
# NodeStatus.msg — Status of a single ROS2 node
#
# node_name         : Name of the monitored node (e.g., saltybot_bridge)
# status            : ALIVE, DEGRADED, DEAD
# last_heartbeat_ms : Timestamp of last received heartbeat, in milliseconds
# downtime_sec      : Seconds since last heartbeat
# restart_count     : Number of auto-restarts performed
#
string node_name
string status
int64 last_heartbeat_ms
float32 downtime_sec
uint32 restart_count
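A minimal sketch of how the fields in this message could be derived from heartbeat timestamps. It is not the code in health_monitor_node.py. The 5.0 s timeout and the ALIVE / DEGRADED / DEAD labels come from the commit message and the message comments; the `classify` helper, the `NodeStatus` dataclass standing in for the generated message class, and the half-timeout DEGRADED threshold are assumptions made for illustration only.

```python
from dataclasses import dataclass

HEARTBEAT_TIMEOUT_SEC = 5.0   # heartbeat_timeout from the commit message
DEGRADED_FRACTION = 0.5       # assumption: DEGRADED after half the timeout


@dataclass
class NodeStatus:
    """Plain-Python stand-in mirroring the fields of NodeStatus.msg."""
    node_name: str
    status: str
    last_heartbeat_ms: int
    downtime_sec: float
    restart_count: int


def classify(node_name, last_heartbeat_ms, now_ms, restart_count=0,
             timeout_sec=HEARTBEAT_TIMEOUT_SEC):
    """Build a NodeStatus from the last heartbeat time (hypothetical helper)."""
    # Clamp at zero so clock jitter never yields a negative downtime.
    downtime_sec = max(0.0, (now_ms - last_heartbeat_ms) / 1000.0)

    if downtime_sec > timeout_sec:
        status = "DEAD"
    elif downtime_sec > timeout_sec * DEGRADED_FRACTION:
        status = "DEGRADED"
    else:
        status = "ALIVE"

    return NodeStatus(
        node_name=node_name,
        status=status,
        last_heartbeat_ms=last_heartbeat_ms,
        downtime_sec=downtime_sec,
        restart_count=restart_count,
    )


if __name__ == "__main__":
    # Last heartbeat at t=0 ms, checked 6 s later.
    print(classify("saltybot_bridge", last_heartbeat_ms=0, now_ms=6000))
```

In a real node the stand-in dataclass would be replaced by the message class generated from NodeStatus.msg, and `now_ms` would come from the node's clock at each monitor tick (2.0 Hz per the configuration above).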