feat: system health monitor (Issue #408) #439

Merged
sl-jetson merged 1 commit from sl-firmware/issue-408-health-monitor into main 2026-03-05 09:00:35 -05:00
Heartbeats + auto-restart + face alert

sl-jetson added 1 commit 2026-03-05 08:59:49 -05:00
Implement a centralized health monitoring node that:
- Subscribes to /saltybot/<node>/heartbeat from all tracked nodes
- Tracks expected nodes from YAML configuration
- Marks nodes DEAD if silent >5 seconds
- Triggers auto-restart via ros2 launch when nodes fail
- Publishes /saltybot/system_health JSON with full status
- Alerts face display on critical node failures
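
To illustrate the dead-node rule above (silent for more than 5 seconds), here is a minimal sketch of the timeout bookkeeping. It is not taken from the merged code: the `HeartbeatTracker` class, its method names, and the injectable `clock` are hypothetical, and ROS 2 wiring (subscriptions, timers) is left out so the logic can be read on its own.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatTracker:
    """Hypothetical helper: remembers when each expected node was last heard."""

    expected_nodes: List[str]
    timeout_s: float = 5.0  # default timeout stated in the PR description
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    last_seen: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumption: startup counts as a first heartbeat, so every node
        # gets one full timeout window before it can be marked DEAD.
        now = self.clock()
        for name in self.expected_nodes:
            self.last_seen.setdefault(name, now)

    def record(self, node: str) -> None:
        """Would be called from a /saltybot/<node>/heartbeat callback."""
        if node in self.last_seen:  # ignore nodes that are not tracked
            self.last_seen[node] = self.clock()

    def dead_nodes(self) -> List[str]:
        """Return tracked nodes silent for strictly more than timeout_s."""
        now = self.clock()
        return [
            name
            for name in self.expected_nodes
            if now - self.last_seen[name] > self.timeout_s
        ]
```

Using a monotonic clock rather than wall time keeps the timeout stable if the system clock is adjusted, which can happen on an embedded board after a time sync.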

Features:
- Configurable heartbeat timeout (default 5s)
- Automatic dead node detection and restart
- System health JSON publishing (timestamp, uptime, node status, critical alerts)
- Face alert system for critical failures
- Rate-limited alerting to avoid spam
- Comprehensive monitoring config with critical/important node tiers
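
As a rough illustration of the health JSON and rate-limited alerting features listed above, the sketch below builds a status payload and suppresses repeated alerts for the same node. The JSON key names, the `"DEAD"`/`"ALIVE"` strings, the `AlertLimiter` class, and the 10-second interval are assumptions for illustration; the PR only states that the payload carries timestamp, uptime, node status, and critical alerts.

```python
import json
from typing import Dict, Iterable


def build_health_json(statuses: Dict[str, str], critical: Iterable[str],
                      start_time: float, now: float) -> str:
    """Serialize system health; key names here are assumed, not confirmed."""
    critical_set = set(critical)
    # Only DEAD nodes in the critical tier are reported as critical alerts.
    alerts = sorted(
        name for name, state in statuses.items()
        if state == "DEAD" and name in critical_set
    )
    return json.dumps({
        "timestamp": now,
        "uptime": now - start_time,
        "nodes": statuses,
        "critical_alerts": alerts,
    })


class AlertLimiter:
    """Hypothetical per-node rate limiter for face-display alerts."""

    def __init__(self, min_interval_s: float = 10.0) -> None:
        self.min_interval_s = min_interval_s  # assumed value
        self._last_sent: Dict[str, float] = {}

    def should_alert(self, node: str, now: float) -> bool:
        """Allow an alert only if enough time has passed for this node."""
        last = self._last_sent.get(node)
        if last is not None and now - last < self.min_interval_s:
            return False
        self._last_sent[node] = now
        return True
```

Limiting per node rather than globally means a second critical failure is still shown promptly even while the first one is being suppressed.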

Package structure:
- saltybot_health_monitor: Main health monitoring node
- health_config.yaml: Configurable list of monitored nodes
- health_monitor.launch.py: Launch file with parameters
- Unit tests for heartbeat parsing and health status generation
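
For a sense of what a heartbeat parsing test might exercise, here is a small sketch that recovers the node name from a topic following the `/saltybot/<node>/heartbeat` pattern. The function name `node_from_topic` and the allowed character set are assumptions; the actual tests in the package may cover different behavior.

```python
import re
from typing import Optional

# Assumed naming rule: one path segment of letters, digits, or underscores.
_HEARTBEAT_TOPIC = re.compile(r"^/saltybot/(?P<node>[A-Za-z0-9_]+)/heartbeat$")


def node_from_topic(topic: str) -> Optional[str]:
    """Return the node name for a heartbeat topic, or None if it does not match."""
    match = _HEARTBEAT_TOPIC.match(topic)
    return match.group("node") if match else None
```

A test can then assert that `/saltybot/system_health` and nested paths are rejected, so that only genuine heartbeat topics update a node's status.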

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
sl-jetson merged commit d657696840 into main 2026-03-05 09:00:35 -05:00
Reference: seb/saltylab-firmware#439