Implement centralized health monitoring node that:
- Subscribes to /saltybot/<node>/heartbeat from all tracked nodes
- Tracks expected nodes from YAML configuration
- Marks nodes DEAD if silent >5 seconds
- Triggers auto-restart via ros2 launch when nodes fail
- Publishes /saltybot/system_health JSON with full status
- Alerts face display on critical node failures

Features:
- Configurable heartbeat timeout (default 5s)
- Automatic dead node detection and restart
- System health JSON publishing (timestamp, uptime, node status, critical alerts)
- Face alert system for critical failures
- Rate-limited alerting to avoid spam
- Comprehensive monitoring config with critical/important node tiers

Package structure:
- saltybot_health_monitor: Main health monitoring node
- health_config.yaml: Configurable list of monitored nodes
- health_monitor.launch.py: Launch file with parameters
- Unit tests for heartbeat parsing and health status generation

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
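The timeout, rate-limiting, and JSON behaviour described in this commit message can be sketched in plain Python without any ROS dependency. This is only an illustrative sketch, not the code in saltybot_health_monitor: the class name HeartbeatTracker, its method names, the 30-second alert interval, and the JSON key names are all assumptions. Only the 5-second default timeout, the critical/important tiers, and the four JSON contents (timestamp, uptime, node status, critical alerts) come from the description above.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatTracker:
    """Hypothetical helper; names and JSON keys are assumptions."""

    expected_nodes: Dict[str, str]   # node name -> tier ("critical" or "important")
    timeout_s: float = 5.0           # default timeout stated in the commit message
    alert_interval_s: float = 30.0   # assumed rate limit; not given in the source
    start_time: float = 0.0
    last_seen: Dict[str, float] = field(default_factory=dict)
    last_alert: Dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, node: str, now: float) -> None:
        # Ignore heartbeats from nodes that are not in the configured list.
        if node in self.expected_nodes:
            self.last_seen[node] = now

    def status(self, node: str, now: float) -> str:
        # A node that has never reported is measured from monitor start.
        seen = self.last_seen.get(node, self.start_time)
        return "DEAD" if now - seen > self.timeout_s else "ALIVE"

    def alerts_due(self, now: float) -> List[str]:
        # Alert only for dead critical nodes, at most once per interval each.
        due = []
        for node, tier in self.expected_nodes.items():
            if tier != "critical" or self.status(node, now) != "DEAD":
                continue
            last = self.last_alert.get(node)
            if last is None or now - last >= self.alert_interval_s:
                self.last_alert[node] = now
                due.append(node)
        return due

    def health_json(self, now: float) -> str:
        nodes = {n: self.status(n, now) for n in self.expected_nodes}
        critical = [
            n for n, s in nodes.items()
            if s == "DEAD" and self.expected_nodes[n] == "critical"
        ]
        return json.dumps({
            "timestamp": now,
            "uptime": now - self.start_time,
            "nodes": nodes,
            "critical_alerts": critical,
        })


# Usage with explicit timestamps so the behaviour is deterministic.
tracker = HeartbeatTracker({"motor_driver": "critical", "camera": "important"})
tracker.record_heartbeat("motor_driver", now=1.0)
first_alerts = tracker.alerts_due(now=7.0)    # motor_driver silent for 6 s
second_alerts = tracker.alerts_due(now=8.0)   # suppressed by the rate limit
report = json.loads(tracker.health_json(now=7.0))
```

Passing the current time in explicitly, rather than reading a clock inside the class, keeps the timeout logic easy to unit test; a real node would presumably feed it from its ROS clock or time.monotonic().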
31 lines
982 B
Python
from setuptools import setup

package_name = "saltybot_health_monitor"

setup(
    name=package_name,
    version="0.1.0",
    packages=[package_name],
    data_files=[
        # Marker file that registers the package with the ament resource index.
        ("share/ament_index/resource_index/packages", [f"resource/{package_name}"]),
        (f"share/{package_name}", ["package.xml"]),
        # Launch and config files installed into the package share directory.
        (f"share/{package_name}/launch", ["launch/health_monitor.launch.py"]),
        (f"share/{package_name}/config", ["config/health_config.yaml"]),
    ],
    install_requires=["setuptools", "pyyaml"],
    zip_safe=True,
    maintainer="sl-controls",
    maintainer_email="sl-controls@saltylab.local",
    description=(
        "System health monitor: tracks node heartbeats, detects down nodes, "
        "triggers auto-restart, publishes system health status"
    ),
    license="MIT",
    tests_require=["pytest"],
    entry_points={
        # Executable name mapped to the module path and callable that runs it.
        "console_scripts": [
            "health_monitor_node = saltybot_health_monitor.health_monitor_node:main",
        ],
    },
)
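The console_scripts entry above follows the usual "name = module:callable" form. As a small sketch of how that string is interpreted, the standard library's importlib.metadata.EntryPoint (Python 3.9 or newer for the module and attr properties) can split it without importing the package. The variable name ep is just for illustration.

```python
from importlib.metadata import EntryPoint

# Build an EntryPoint from the same string used in setup.py; nothing is imported.
ep = EntryPoint(
    name="health_monitor_node",
    value="saltybot_health_monitor.health_monitor_node:main",
    group="console_scripts",
)

# The part before ":" is the module to import, the part after is the callable.
print(ep.module)
print(ep.attr)
```

So the installed health_monitor_node executable expects saltybot_health_monitor/health_monitor_node.py to define a top-level main callable.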