# saltybot_gesture_recognition

Hand and body gesture recognition via MediaPipe on the Jetson Orin GPU (Issue #454). Detects human hand and body gestures in a real-time camera feed and publishes recognized gestures for multimodal interaction. Integrates with the voice command router for combined audio + gesture control.

## Recognized Gestures

### Hand Gestures

- **wave** — Lateral wrist oscillation (temporal) | Greeting, acknowledgment
- **point** — Index extended, others curled | Direction indication ("left"/"right"/"up"/"forward")
- **stop_palm** — All fingers extended, palm forward | Emergency stop (e-stop)
- **thumbs_up** — Thumb extended up, fist closed | Confirmation, approval
- **come_here** — Beckoning: index curled toward palm (temporal) | Call to approach
- **follow** — Index extended horizontally | Follow me

### Body Gestures

- **arms_up** — Both wrists above shoulders | Stop / emergency
- **arms_spread** — Arms extended laterally | Back off / clear space
- **crouch** — Hips below standing threshold | Come closer

## Performance

- **Frame Rate**: 10–15 fps on Jetson Orin (with GPU acceleration)
- **Latency**: ~100–150 ms end-to-end
- **Range**: 2–5 meters (optimal 2–3 m)
- **Accuracy**: ~85–90% for known gestures (varies by lighting, occlusion)
- **Simultaneous Detections**: Up to 10 people + gestures per frame

## Topics

### Published

- **`/saltybot/gestures`** (`saltybot_social_msgs/GestureArray`) — Array of detected gestures with type, confidence, position, and source (hand/body)

## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `camera_topic` | str | `/camera/color/image_raw` | RGB camera topic |
| `confidence_threshold` | float | 0.7 | Min confidence to publish (0–1) |
| `publish_hz` | float | 15.0 | Output rate (Hz) |
| `max_distance_m` | float | 5.0 | Max gesture range (meters) |
| `enable_gpu` | bool | true | Use Jetson GPU acceleration |

## Messages

### GestureArray

```
Header header
Gesture[] gestures
uint32 count
```

### Gesture (from saltybot_social_msgs)

```
Header header
string gesture_type    # "wave", "point", "stop_palm", etc.
int32 person_id        # -1 if unidentified
float32 confidence     # 0–1 (typically >= 0.7)
int32 camera_id        # 0 = front
float32 hand_x         # Normalized x position (0–1)
float32 hand_y         # Normalized y position (0–1)
bool is_right_hand     # True for right hand
string direction       # For "point": "left"/"right"/"up"/"forward"/"down"
string source          # "hand" or "body_pose"
```

## Usage

### Launch Node

```bash
ros2 launch saltybot_gesture_recognition gesture_recognition.launch.py
```

### With Custom Parameters

```bash
ros2 launch saltybot_gesture_recognition gesture_recognition.launch.py \
  camera_topic:='/camera/front/image_raw' \
  confidence_threshold:=0.75 \
  publish_hz:=20.0
```

### Using Config File

```bash
ros2 launch saltybot_gesture_recognition gesture_recognition.launch.py \
  --ros-args --params-file config/gesture_params.yaml
```

## Algorithm

### MediaPipe Hands

- 21 landmarks per hand (wrist + finger joints)
- Detects: palm orientation, finger extension, hand pose
- Model complexity: 0 (lite, faster) for Jetson

### MediaPipe Pose

- 33 body landmarks (shoulders, hips, wrists, knees, etc.)
- Detects: arm angle, body orientation, posture
- Model complexity: 1 (balanced accuracy/speed)

### Gesture Classification

1. **Thumbs-up**: Thumb extended > 0.3, no other fingers extended
2. **Stop-palm**: All fingers extended, palm normal > 0.3 (facing camera)
3. **Point**: Only index extended, direction from hand position
4. **Wave**: High variance in hand x-position over ~5 frames
5. **Beckon**: High variance in hand y-position over ~4 frames
6. **Arms-up**: Both wrists above shoulder height
7. **Arms-spread**: Wrist distance > shoulder width × 1.2
8. **Crouch**: Hip y > shoulder y + 0.3 (normalized image coordinates, y increases downward)
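The static-pose rules (1–3) above can be sketched directly from the 21 MediaPipe Hands landmarks. Landmark indices follow the MediaPipe convention (0 = wrist, fingertips at 4/8/12/16/20, PIP joints at 6/10/14/18); the function names and the 0.15 thumb margin below are illustrative assumptions, not the package's actual implementation.

```python
from math import dist

# MediaPipe Hands landmark indices: 0 = wrist, fingertips at
# 4/8/12/16/20, PIP joints at 6/10/14/18.
WRIST = 0
TIPS = {"thumb": 4, "index": 8, "middle": 12, "ring": 16, "pinky": 20}
PIPS = {"index": 6, "middle": 10, "ring": 14, "pinky": 18}

def finger_extended(lm, finger):
    # Heuristic: a finger counts as extended when its tip lies farther
    # from the wrist than its PIP joint does.
    return dist(lm[TIPS[finger]], lm[WRIST]) > dist(lm[PIPS[finger]], lm[WRIST])

def classify_static(lm):
    """lm: 21 (x, y) points in normalized image coordinates (y grows
    downward). Returns a static gesture label or None."""
    extended = [f for f in PIPS if finger_extended(lm, f)]
    thumb_raised = lm[TIPS["thumb"]][1] < lm[WRIST][1] - 0.15  # tip well above wrist
    if len(extended) == 4:
        return "stop_palm"        # open palm: all four fingers extended
    if extended == ["index"]:
        return "point"            # only the index extended
    if not extended and thumb_raised:
        return "thumbs_up"        # closed fist with a raised thumb
    return None
```

The tip-beyond-PIP test is a common rule-of-thumb for finger extension; the temporal gestures (wave, beckon, rules 4–5) additionally require a short frame history of hand positions.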
### Confidence Scoring

- MediaPipe detection confidence × gesture classification confidence
- Temporal smoothing: history over the last 10 frames
- Threshold: 0.7 (configurable) for publication

## Integration with Voice Command Router

```python
# Inside a rclpy Node: subscribe to both streams
self.create_subscription(SpeechTranscript, '/saltybot/speech', self.voice_callback, 10)
self.create_subscription(GestureArray, '/saltybot/gestures', self.gesture_callback, 10)

def multimodal_command(self, voice_cmd, gesture):
    # "robot forward" (voice) + point-forward (gesture) = confirmed forward
    if gesture.gesture_type == 'point' and gesture.direction == 'forward':
        if 'forward' in voice_cmd:
            self.nav.set_goal(forward_pos)  # high confidence
```

## Dependencies

- `mediapipe` — Hand and Pose detection
- `opencv-python` — Image processing
- `numpy`, `scipy` — Numerical computation
- `rclpy` — ROS 2 Python client
- `saltybot_social_msgs` — Custom gesture messages

## Build & Test

### Build

```bash
colcon build --packages-select saltybot_gesture_recognition
```

### Run Tests

```bash
pytest jetson/ros2_ws/src/saltybot_gesture_recognition/test/
```

### Benchmark on Jetson Orin

```bash
ros2 run saltybot_gesture_recognition gesture_node \
  --ros-args -p publish_hz:=30.0 &
ros2 topic hz /saltybot/gestures
# Expected: ~15 Hz (GPU-limited, not message processing)
```

## Troubleshooting

**Issue**: Low frame rate (< 10 Hz)
- **Solution**: Reduce camera resolution or use `model_complexity=0`

**Issue**: False positives (confidence > 0.7 but wrong gesture)
- **Solution**: Increase `confidence_threshold` to 0.75–0.8

**Issue**: No gesture detection beyond 3 m
- **Solution**: Improve lighting, move closer, or reduce `max_distance_m`
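Raising `confidence_threshold` is one fix for false positives; the 10-frame temporal smoothing described under Confidence Scoring is another. A minimal sketch of such a smoother, assuming a majority-vote policy over the window (class and method names are hypothetical, not the package's API):

```python
from collections import Counter, deque

class GestureSmoother:
    """Keep the last N per-frame detections and report a gesture only
    when it dominates the window and its mean confidence clears the
    publication threshold."""

    def __init__(self, window=10, threshold=0.7):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def update(self, gesture, confidence):
        """Feed one per-frame detection; returns (gesture, mean_confidence)
        when the result is stable enough to publish, else None."""
        self.history.append((gesture, confidence))
        top, count = Counter(g for g, _ in self.history).most_common(1)[0]
        if count < len(self.history) // 2 + 1:
            return None  # no majority in the window yet
        mean_conf = sum(c for g, c in self.history if g == top) / count
        return (top, mean_conf) if mean_conf >= self.threshold else None
```

A single noisy frame then cannot flip the published gesture, and brief low-confidence detections are suppressed rather than forwarded downstream.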
## Future Enhancements

- **Dynamic Gesture Timeout**: Stop publishing after 2 s without an update
- **Person Association**: Match gestures to tracked persons (from `saltybot_multi_person_tracker`)
- **Custom Gesture Training**: TensorFlow Lite fine-tuning on robot-specific gestures
- **Gesture Sequences**: Recognize multi-step command chains ("wave → point → thumbs-up")
- **Sign Language**: ASL/BSL recognition (larger model, future phase)
- **Accessibility**: Voice + gesture for accessibility (e.g., hands-free "stop")

## Performance Targets (Jetson Orin Nano Super)

| Metric | Target | Actual |
|--------|--------|--------|
| Frame Rate | 10+ fps | ~15 fps (GPU) |
| Latency | < 200 ms | ~100–150 ms |
| Max People | 5–10 | ~10 (GPU-limited) |
| Confidence | 0.7+ | 0.75–0.95 |
| GPU Memory | < 1 GB | ~400–500 MB |

## References

- [MediaPipe Solutions](https://developers.google.com/mediapipe/solutions)
- [MediaPipe Hands](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker)
- [MediaPipe Pose](https://developers.google.com/mediapipe/solutions/vision/pose_landmarker)

## License

MIT