sl-firmware 9683fd3685 feat: Add ROS2 system health monitor (Issue #408)
Implement centralized health monitoring node that:
- Subscribes to /saltybot/<node>/heartbeat from all tracked nodes
- Tracks expected nodes from YAML configuration
- Marks nodes DEAD if silent >5 seconds
- Triggers auto-restart via ros2 launch when nodes fail
- Publishes /saltybot/system_health JSON with full status
- Alerts face display on critical node failures
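The dead-node rule above compares each node's last heartbeat time against a timeout. The following is a minimal sketch with no ROS dependency, not the package's actual code. It assumes two things: heartbeat message contents are irrelevant, and a tracked node that has never reported counts as silent since the monitor started. `HeartbeatTracker`, `heartbeat_topic` and `node_from_topic` are hypothetical names.

```python
import time
from typing import Dict, Iterable, List, Optional

TOPIC_PREFIX = "/saltybot/"
TOPIC_SUFFIX = "/heartbeat"


def heartbeat_topic(node: str) -> str:
    """Build /saltybot/<node>/heartbeat for a tracked node."""
    return f"{TOPIC_PREFIX}{node}{TOPIC_SUFFIX}"


def node_from_topic(topic: str) -> Optional[str]:
    """Extract <node> from /saltybot/<node>/heartbeat, or None if it does not match."""
    if topic.startswith(TOPIC_PREFIX) and topic.endswith(TOPIC_SUFFIX):
        node = topic[len(TOPIC_PREFIX):-len(TOPIC_SUFFIX)]
        if node and "/" not in node:
            return node
    return None


class HeartbeatTracker:
    """Hypothetical tracker: reports expected nodes silent longer than timeout_s."""

    def __init__(self, expected: Iterable[str], timeout_s: float = 5.0,
                 start: Optional[float] = None) -> None:
        self.timeout_s = timeout_s
        t0 = time.monotonic() if start is None else start
        # Assumption: a node that never reported is treated as last heard at startup.
        self.last_seen: Dict[str, float] = {name: t0 for name in expected}

    def beat(self, node: str, now: Optional[float] = None) -> None:
        """Record a heartbeat; heartbeats from untracked nodes are ignored."""
        if node in self.last_seen:
            self.last_seen[node] = time.monotonic() if now is None else now

    def dead_nodes(self, now: Optional[float] = None) -> List[str]:
        """Return tracked nodes silent for strictly more than timeout_s, sorted."""
        t = time.monotonic() if now is None else now
        return sorted(n for n, seen in self.last_seen.items()
                      if t - seen > self.timeout_s)
```

For example, with `start=0.0` and a 5 s timeout, `dead_nodes(now=6.0)` reports a node last heard at t=0 but not one heard at t=4. The sketch defaults to `time.monotonic()` so that wall-clock jumps cannot fake a timeout. A real rclpy node would more likely take timestamps from its ROS clock (`get_clock()`) so that simulated time also works.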

Features:
- Configurable heartbeat timeout (default 5s)
- Automatic dead node detection and restart
- System health JSON publishing (timestamp, uptime, node status, critical alerts)
- Face alert system for critical failures
- Rate-limited alerting to avoid spam
- Comprehensive monitoring config with critical/important node tiers
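Two of the features above can be sketched in plain Python: building the health JSON and rate-limiting alerts. This is an illustration under stated assumptions, not the commit's implementation. The JSON key names (`timestamp`, `uptime_s`, `nodes`, `critical_alerts`), the `"DEAD"` status string and the 30 s default alert interval are all assumptions. `AlertRateLimiter` and `build_health_json` are hypothetical names. Only the decision whether to alert is shown; publishing to the face display is left out.

```python
import json
import time
from typing import Dict, Iterable, List, Optional


class AlertRateLimiter:
    """Allows at most one alert per key every min_interval_s seconds."""

    def __init__(self, min_interval_s: float = 30.0) -> None:  # 30 s is an assumed default
        self.min_interval_s = min_interval_s
        self._last_sent: Dict[str, float] = {}

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Return True and record the send time, or False if the key alerted too recently."""
        t = time.monotonic() if now is None else now
        last = self._last_sent.get(key)
        if last is not None and t - last < self.min_interval_s:
            return False  # suppressed: same alert sent too recently
        self._last_sent[key] = t
        return True


def build_health_json(node_status: Dict[str, str], critical: Iterable[str],
                      uptime_s: float, timestamp: Optional[float] = None) -> str:
    """Serialize a health snapshot; key names are illustrative, not the real schema."""
    critical_set = set(critical)
    # Critical alerts are the critical-tier nodes currently marked DEAD.
    alerts: List[str] = sorted(n for n, s in node_status.items()
                               if s == "DEAD" and n in critical_set)
    payload = {
        "timestamp": time.time() if timestamp is None else timestamp,
        "uptime_s": round(uptime_s, 1),
        "nodes": dict(sorted(node_status.items())),
        "critical_alerts": alerts,
    }
    return json.dumps(payload)
```

The limiter keys on the node name, so a flapping node cannot flood the face display. A different node failing still alerts immediately.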

Package structure:
- saltybot_health_monitor: Main health monitoring node
- health_config.yaml: Configurable list of monitored nodes
- health_monitor.launch.py: Launch file with parameters
- Unit tests for heartbeat parsing and health status generation
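The commit does not show the layout of `health_config.yaml`. The sketch below assumes a hypothetical schema: a `heartbeat_timeout` key plus `nodes.critical` / `nodes.important` lists. It validates the dict that `yaml.safe_load()` (PyYAML, which `setup.py` declares in `install_requires`) would return for such a file. `MonitorConfig`, `parse_config` and the key names are all illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class MonitorConfig:
    """Hypothetical parsed form of health_config.yaml."""
    timeout_s: float = 5.0
    critical: List[str] = field(default_factory=list)
    important: List[str] = field(default_factory=list)

    @property
    def expected(self) -> List[str]:
        """All tracked nodes, critical tier first."""
        return self.critical + self.important


def parse_config(raw: Dict[str, Any]) -> MonitorConfig:
    """Validate a dict (e.g. from yaml.safe_load) against the assumed schema."""
    timeout = float(raw.get("heartbeat_timeout", 5.0))
    if timeout <= 0:
        raise ValueError("heartbeat_timeout must be positive")
    nodes = raw.get("nodes") or {}
    critical = [str(n) for n in nodes.get("critical") or []]
    important = [str(n) for n in nodes.get("important") or []]
    # A node in both tiers would make alert severity ambiguous, so reject it.
    overlap = set(critical) & set(important)
    if overlap:
        raise ValueError(f"nodes listed in both tiers: {sorted(overlap)}")
    return MonitorConfig(timeout, critical, important)
```

Rejecting a bad config at startup is assumed to be preferable to running a monitor that silently tracks the wrong set of nodes.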

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-03-05 08:52:52 -05:00


from setuptools import setup

package_name = "saltybot_health_monitor"

setup(
    name=package_name,
    version="0.1.0",
    packages=[package_name],
    # Installed under share/ so ament can index the package and locate its
    # manifest, launch file and default config.
    data_files=[
        ("share/ament_index/resource_index/packages", [f"resource/{package_name}"]),
        (f"share/{package_name}", ["package.xml"]),
        (f"share/{package_name}/launch", ["launch/health_monitor.launch.py"]),
        (f"share/{package_name}/config", ["config/health_config.yaml"]),
    ],
    install_requires=["setuptools", "pyyaml"],
    zip_safe=True,
    maintainer="sl-controls",
    maintainer_email="sl-controls@saltylab.local",
    description=(
        "System health monitor: tracks node heartbeats, detects down nodes, "
        "triggers auto-restart, publishes system health status"
    ),
    license="MIT",
    tests_require=["pytest"],
    entry_points={
        "console_scripts": [
            # Installs a health_monitor_node executable that calls health_monitor_node.main()
            "health_monitor_node = saltybot_health_monitor.health_monitor_node:main",
        ],
    },
)