feat(scene): semantic scene understanding — YOLOv8n TRT + room classification + hazards (Issue #141) #153

Merged
sl-jetson merged 1 commit from sl-perception/issue-141-scene-understanding into main 2026-03-02 10:07:29 -05:00

Issue #141 — Semantic Scene Understanding

Summary

  • saltybot_scene_msgs — 4 msgs: SceneObject, SceneObjectArray, RoomClassification, BehaviorHint
  • saltybot_scene — 3 nodes: scene detector (YOLOv8n TRT FP16), behavior adapter, Nav2 costmap publisher
  • 15+ FPS target on Jetson Orin Nano Super (Ampere GPU, 67 TOPS)

Architecture

/camera/color/image_raw ──┐
/camera/depth/image_rect_raw ─┤  scene_detector_node
/camera/color/camera_info ──┘  │ YOLOv8n TRT FP16 (640×640)
                               │ HazardClassifier (depth patterns)
                               │ RoomClassifier (rule_based / MobileNetV2 TRT)
                               ├──► /social/scene/objects       (SceneObjectArray ~15 FPS)
                               ├──► /social/scene/room_type     (RoomClassification ~2 Hz)
                               └──► /social/scene/hazards       (SceneObjectArray on-hazard)

/social/scene/room_type ─┐
/social/scene/hazards ───┤  behavior_adapter_node
                          └──► /social/scene/behavior_hint  (BehaviorHint, on-change+1Hz)

/social/scene/objects ───┐  costmap_publisher_node
                          ├──► /social/scene/obstacle_cloud  (PointCloud2 → Nav2)
                          └──► /social/scene/object_markers  (MarkerArray → RViz)

Detection Classes

Scene-filtered COCO subset: person, cat, dog, chair, couch, potted plant, dining table, toilet, tv, microwave, oven, sink, refrigerator + custom: door (80), stairs (81), wet_floor (82)
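
The class filter above can be sketched as a lookup table plus a list comprehension. The COCO ids follow the standard 80-class ordering used by YOLOv8 COCO checkpoints; the custom ids 80 to 82 come from this PR. The function name, the tuple layout, and the `min_conf` threshold are assumptions for illustration, not the contents of `yolo_utils.py`.

```python
# Standard COCO-80 indices for the scene subset, plus the PR's custom ids.
SCENE_CLASSES = {
    0: "person", 15: "cat", 16: "dog", 56: "chair", 57: "couch",
    58: "potted plant", 60: "dining table", 61: "toilet", 62: "tv",
    68: "microwave", 69: "oven", 71: "sink", 72: "refrigerator",
    80: "door", 81: "stairs", 82: "wet_floor",
}


def filter_scene_detections(detections, min_conf=0.4):
    """Keep (class_id, confidence, box) tuples that belong to the scene subset.

    min_conf is a hypothetical threshold, not a value taken from the PR.
    """
    return [
        (cid, conf, box)
        for cid, conf, box in detections
        if cid in SCENE_CLASSES and conf >= min_conf
    ]


raw = [
    (2, 0.90, (0, 0, 10, 10)),   # car: not in the scene subset, dropped
    (56, 0.80, (5, 5, 20, 20)),  # chair: kept
    (81, 0.30, (1, 1, 4, 4)),    # stairs below threshold, dropped
]
kept = filter_scene_detections(raw)
```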

Hazard Detection

| Hazard | Method |
|---|---|
| Stairs | Horizontal depth band alternation (≥3 bands, 0.3–3 m range) |
| Floor drop | Row-mean depth cliff in lower 20% of frame (>0.25 m jump) |
| Wet floor | High depth std-dev in floor strip (>0.08 m) |
| Glass door | Near-zero depth + strong Sobel vertical edges in RGB |
| Pet | YOLO class 15 (cat) or 16 (dog) → gentle approach |
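
As a rough illustration of the floor-drop and wet-floor rows, here is a minimal pure-Python sketch. The thresholds (0.25 m, 0.08 m, lower 20%) are the ones listed above. The assumptions are mine: the depth frame is a list of rows in metres, non-positive values mean "no data", the wet-floor strip is the same lower 20% of the frame, and a jump in either direction counts as a cliff. The actual `HazardClassifier` may differ, and would more likely operate on NumPy arrays.

```python
from statistics import fmean, pstdev

DROP_JUMP_M = 0.25     # PR table: row-mean jump larger than 0.25 m
WET_STD_M = 0.08       # PR table: depth std-dev larger than 0.08 m
FLOOR_FRACTION = 0.20  # PR table: lower 20% of the frame


def floor_rows(depth):
    """Return the bottom rows of a depth image given as a list of rows (metres)."""
    n = max(2, round(len(depth) * FLOOR_FRACTION))
    return depth[-n:]


def valid(row):
    """Drop non-positive readings, assumed here to mean 'no depth data'."""
    return [d for d in row if d > 0.0]


def detect_floor_drop(depth):
    """True if consecutive row means in the floor strip jump by more than 0.25 m."""
    means = [fmean(v) for v in map(valid, floor_rows(depth)) if v]
    return any(abs(b - a) > DROP_JUMP_M for a, b in zip(means, means[1:]))


def detect_wet_floor(depth):
    """True if the pooled depth std-dev in the floor strip exceeds 0.08 m."""
    pixels = [d for row in floor_rows(depth) for d in valid(row)]
    return len(pixels) > 1 and pstdev(pixels) > WET_STD_M


flat = [[1.0] * 4 for _ in range(10)]
cliff = [row[:] for row in flat]
cliff[8] = [1.5] * 4                # row mean jumps by 0.5 m
noisy = [row[:] for row in flat]
noisy[9] = [0.8, 1.2, 0.8, 1.2]     # scattered returns, same row mean
```

A pooled std-dev also responds to the perspective gradient of a real floor, so a production version would probably detrend per row first.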

Room Classification

Rule-based (always): object co-occurrence weights → softmax → room type (8 classes)
MobileNetV2 TRT (optional): 224×224 RGB → 8 logits; built via build_scene_trt -- --room
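
The rule-based path can be sketched as a weighted vote followed by a softmax. The weight values are invented for illustration, only three of the eight room classes are shown, and `bathroom` is a hypothetical class name (the PR names only `living_room`, `kitchen`, and `outdoor`). This is not the table in `room_classifier.py`.

```python
import math

# Hypothetical co-occurrence weights: room -> {object name: weight}.
ROOM_WEIGHTS = {
    "living_room": {"couch": 2.0, "tv": 1.5, "potted plant": 0.5},
    "kitchen": {"refrigerator": 2.0, "oven": 2.0, "microwave": 1.5, "sink": 1.0},
    "bathroom": {"toilet": 3.0, "sink": 1.0},
}


def classify_room(object_names):
    """Sum per-room weights of the detected objects, then apply a softmax."""
    scores = {
        room: sum(weights.get(name, 0.0) for name in object_names)
        for room, weights in ROOM_WEIGHTS.items()
    }
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {room: math.exp(s - peak) for room, s in scores.items()}
    total = sum(exps.values())
    probs = {room: e / total for room, e in exps.items()}
    return max(probs, key=probs.get), probs


room, probs = classify_room(["refrigerator", "oven", "sink"])
```

With no detections every score is zero and the softmax is uniform, so a caller would likely want a confidence floor before trusting the label.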

Behavior Adaptation

| Room/Hazard | Speed | Personality |
|---|---|---|
| living_room | 0.6 m/s | gentle |
| kitchen | 0.5 m/s | careful |
| outdoor | 1.5 m/s | active |
| stairs/drop | 0.1 m/s | cautious + alert |
| wet floor | 0.3 m/s | careful + alert |
| pet nearby | 0.3 m/s | gentle |
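
One plausible way to combine the room and hazard rows is "most restrictive speed wins". The sketch below assumes that rule. The speeds and personalities are taken from the table, and `speed_limit_mps` / `personality_mode` are the field names mentioned in the commit message. The `alert` flag, the hazard keys, and the fallback profile are assumptions, not the real `BehaviorHint` definition.

```python
ROOM_PROFILES = {
    "living_room": (0.6, "gentle"),
    "kitchen": (0.5, "careful"),
    "outdoor": (1.5, "active"),
}
# hazard -> (speed limit in m/s, personality, raise alert?)
HAZARD_PROFILES = {
    "stairs": (0.1, "cautious", True),
    "drop": (0.1, "cautious", True),
    "wet_floor": (0.3, "careful", True),
    "pet": (0.3, "gentle", False),
}
DEFAULT_PROFILE = (0.5, "careful")  # hypothetical fallback for unlisted rooms


def behavior_hint(room, hazards):
    """Start from the room profile, then let the slowest hazard override it."""
    speed, personality = ROOM_PROFILES.get(room, DEFAULT_PROFILE)
    alert = False
    for hazard in hazards:
        if hazard not in HAZARD_PROFILES:
            continue
        h_speed, h_personality, h_alert = HAZARD_PROFILES[hazard]
        if h_speed < speed:
            speed, personality = h_speed, h_personality
        alert = alert or h_alert
    return {"speed_limit_mps": speed, "personality_mode": personality, "alert": alert}


hint = behavior_hint("kitchen", ["wet_floor", "stairs"])
```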

Nav2 Integration

Obstacle cloud: ring of PointCloud2 points around each object (radius 0.15m normal, 0.4m for hazards) → Nav2 obstacle_layer via observation_sources. Config snippet in scene_params.yaml.
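
The ring geometry can be illustrated with a few lines of trigonometry. The radii (0.15 m normal, 0.4 m for hazards) are from the description above. The function name, the number of points per ring, and the fixed `z` height are assumptions.

```python
import math

NORMAL_RADIUS_M = 0.15  # from the PR description
HAZARD_RADIUS_M = 0.4   # from the PR description


def ring_points(x, y, is_hazard=False, n_points=12, z=0.0):
    """Return n_points (x, y, z) tuples evenly spaced on a circle around (x, y)."""
    radius = HAZARD_RADIUS_M if is_hazard else NORMAL_RADIUS_M
    step = 2.0 * math.pi / n_points
    return [
        (x + radius * math.cos(k * step), y + radius * math.sin(k * step), z)
        for k in range(n_points)
    ]


chair_ring = ring_points(1.0, 2.0)
stairs_ring = ring_points(3.0, 0.0, is_hazard=True)
```

In a node, the concatenated tuples for all objects would then be packed into a single `PointCloud2` message before publishing on `/social/scene/obstacle_cloud`.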

Test Plan

  • colcon build --packages-select saltybot_scene_msgs saltybot_scene
  • ros2 run saltybot_scene build_scene_trt — builds TRT engine
  • ros2 launch saltybot_scene scene_understanding.launch.py
  • ros2 topic hz /social/scene/objects → ≥15 Hz
  • ros2 topic hz /social/scene/room_type → ~2 Hz
  • ros2 topic echo /social/scene/room_type → correct room name
  • ros2 topic echo /social/scene/behavior_hint → speed/personality updates
  • Walk robot toward stairs → /social/scene/hazards publishes HAZARD_STAIRS
  • ros2 topic hz /social/scene/obstacle_cloud → cloud is being published; confirm objects appear in the Nav2 costmap

🤖 Generated with Claude Code

sl-perception added 1 commit 2026-03-02 10:00:34 -05:00
New packages:
  saltybot_scene_msgs — 4 msgs (SceneObject, SceneObjectArray, RoomClassification, BehaviorHint)
  saltybot_scene      — 3 nodes + launch + config + TRT build script

Nodes:
  scene_detector_node   — YOLOv8-nano TRT FP16 (target ≥15 FPS @ 640×640);
                          synchronized RGB+depth input; filters scene classes
                          (chairs, tables, doors, stairs, pets, appliances);
                          3D back-projection via aligned depth; depth-based hazard
                          scan (HazardClassifier); room classification at 2Hz;
                          publishes /social/scene/objects + /social/scene/hazards
                          + /social/scene/room_type
  behavior_adapter_node — adapts speed_limit_mps + personality_mode from room
                          type and hazard severity; publishes BehaviorHint on
                          /social/scene/behavior_hint (on-change + 1Hz heartbeat)
  costmap_publisher_node — converts SceneObjectArray → PointCloud2 disc rings
                           for Nav2 obstacle_layer + MarkerArray for RViz;
                           publishes /social/scene/obstacle_cloud
                           + /social/scene/object_markers
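
The "3D back-projection via aligned depth" step in scene_detector_node is presumably the standard pinhole model. A minimal sketch, assuming the depth image is aligned to the color image and the intrinsics come from the row-major `CameraInfo` K matrix (fx = K[0], cx = K[2], fy = K[4], cy = K[5]). The function name and example intrinsics are illustrative:

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Map pixel (u, v) with depth in metres to a 3D point in the camera frame.

    Returns None for non-positive depth, assumed here to mean 'no data'.
    """
    if depth_m <= 0.0:
        return None
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Hypothetical intrinsics for a 640x480 image.
center = backproject(320, 240, 2.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
right = backproject(620, 240, 2.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```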

Modules:
  yolo_utils.py        — YOLOv8 preprocess/postprocess (letterbox, cx/cy/w/h decode,
                         NMS), COCO+custom class table (door=80, stairs=81, wet=82),
                         hazard-by-class mapping
  room_classifier.py   — rule-based (object co-occurrence weights + softmax) with
                         optional MobileNetV2 TRT/ONNX backend (Places365-style 8-class)
  hazard_classifier.py — depth-based hazard patterns: drop (row-mean cliff), stairs
                         (alternating depth bands), wet floor (depth std-dev), glass
                         (zero depth + strong Sobel edges in RGB)
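
The yolo_utils.py steps (letterbox, cx/cy/w/h decode, NMS) can be sketched in plain Python as below. This is a generic, class-agnostic version under my own assumptions (function names, a 0.45 IoU threshold, greedy NMS), not the module's actual code, which would normally be vectorized with NumPy.

```python
def letterbox_params(src_w, src_h, dst=640):
    """Scale and padding that fit a src_w x src_h image into a dst x dst square."""
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    return scale, (dst - new_w) / 2, (dst - new_h) / 2


def xywh_to_xyxy(cx, cy, w, h):
    """Convert a center/size box to corner coordinates (x1, y1, x2, y2)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
kept = nms(boxes, scores=[0.9, 0.8, 0.7])
```

To map a kept box back to the original image, subtract the padding and divide by the scale returned by `letterbox_params`.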

scripts/build_scene_trt.py — export YOLOv8n → ONNX → TRT FP16; optionally build
                             MobileNetV2 room classifier engine; includes benchmark

Topic map:
  /social/scene/objects          SceneObjectArray  ~15+ FPS
  /social/scene/room_type        RoomClassification ~2 Hz
  /social/scene/hazards          SceneObjectArray  on hazard
  /social/scene/behavior_hint    BehaviorHint      on-change + 1 Hz
  /social/scene/obstacle_cloud   PointCloud2       Nav2 obstacle_layer
  /social/scene/object_markers   MarkerArray       RViz debug

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sl-jetson merged commit cb961edb9f into main 2026-03-02 10:07:29 -05:00
Reference: seb/saltylab-firmware#153