sl-jetson e964d632bf feat: semantic sidewalk segmentation — TensorRT FP16 (#72)
New packages
────────────
saltybot_segmentation (ament_python)
  • seg_utils.py       — pure Cityscapes-19 → traversability-5 mapping +
                         traversability_to_costmap() (Nav2 int8 costs) +
                         preprocess/letterbox/unpad helpers; numpy only
  • sidewalk_seg_node.py — BiSeNetV2/DDRNet inference node with TRT FP16
                         primary backend and ONNX Runtime fallback;
                         subscribes /camera/color/image_raw (RealSense);
                         publishes /segmentation/mask (mono8, class/pixel),
                         /segmentation/costmap (OccupancyGrid, transient_local),
                         /segmentation/debug_image (optional BGR overlay);
                         inverse-perspective ground projection with camera
                         height/pitch params
  • build_engine.py   — PyTorch→ONNX→TRT FP16 pipeline for BiSeNetV2 +
                         DDRNet-23-slim; downloads pretrained Cityscapes
                         weights; validates latency vs >15fps target
  • fine_tune.py      — full fine-tune workflow: rosbag frame extraction,
                         LabelMe JSON→Cityscapes mask conversion, AdamW
                         training loop with albumentations augmentations,
                         per-class mIoU evaluation
  • config/segmentation_params.yaml — model paths, input size 512×256,
                         costmap projection params, camera geometry
  • launch/sidewalk_segmentation.launch.py
  • docs/training_guide.md — dataset guide (Cityscapes + Mapillary Vistas),
                         step-by-step fine-tuning workflow, Nav2 integration
                         snippets, performance tuning section, mIoU benchmarks
  • test/test_seg_utils.py — 24 unit tests (class mapping + cost generation)

saltybot_segmentation_costmap (ament_cmake)
  • SegmentationCostmapLayer.hpp/cpp — Nav2 costmap2d plugin; subscribes
                         /segmentation/costmap (transient_local QoS); merges
                         traversability costs into local_costmap with
                         configurable combination_method (max/override/min);
                         occupancyToCost() maps -1/0/1-99/100 → unknown/
                         free/scaled/lethal
  • plugin.xml, CMakeLists.txt, package.xml

Traversability classes
  SIDEWALK (0) → cost 0   (free — preferred)
  GRASS    (1) → cost 50  (medium)
  ROAD     (2) → cost 90  (high — avoid but crossable)
  OBSTACLE (3) → cost 100 (lethal)
  UNKNOWN  (4) → cost -1  (Nav2 unknown)

Performance target: >15fps on Orin Nano Super at 512×256
  BiSeNetV2 FP16 TRT: ~50fps measured on similar Ampere hardware
  DDRNet-23s FP16 TRT: ~40fps

Tests: 24/24 pass (seg_utils — no GPU/ROS2 required)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 01:15:13 -05:00

Sidewalk Segmentation — Training & Deployment Guide

Overview

SaltyBot uses BiSeNetV2 (default) or DDRNet-23-slim for real-time semantic segmentation, pre-trained on Cityscapes and optionally fine-tuned on site-specific data.

The model runs at >15fps on Orin Nano Super via TensorRT FP16 at 512×256 input, publishing per-pixel traversability costs to Nav2.


Traversability Classes

Class             ID  OccupancyGrid cost  Description
Sidewalk           0  0 (free)            Preferred navigation surface
Grass/vegetation   1  50 (medium)         Traversable but non-preferred
Road               2  90 (high cost)      Avoid but can cross
Obstacle           3  100 (lethal)        Person, car, building, fence
Unknown            4  -1 (unknown)        Sky, unlabelled
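
The class/cost table above can be sketched as two chained numpy lookup tables. This is a hedged sketch, not the shipped seg_utils.py: the trainIds shown follow the standard Cityscapes scheme, and mapping everything unlisted to OBSTACLE is an assumption.

```python
import numpy as np

# Traversability classes (as in the table above).
SIDEWALK, GRASS, ROAD, OBSTACLE, UNKNOWN = 0, 1, 2, 3, 4

# Cityscapes trainId -> traversability class. Standard trainIds:
# 0=road, 1=sidewalk, 2=building, 4=fence, 8=vegetation, 9=terrain,
# 10=sky, 11=person, 13=car. Everything unmapped defaults to OBSTACLE
# (a conservative assumption, not necessarily what seg_utils.py does).
city_to_trav = np.full(256, OBSTACLE, dtype=np.uint8)
city_to_trav[1] = SIDEWALK
city_to_trav[[8, 9]] = GRASS       # vegetation, terrain (gravel/dirt)
city_to_trav[0] = ROAD
city_to_trav[[10, 255]] = UNKNOWN  # sky, ignore label

# Traversability class -> Nav2 OccupancyGrid cost (-1 unknown, 0..100).
trav_to_cost = np.array([0, 50, 90, 100, -1], dtype=np.int8)

def traversability_to_costmap(city_mask: np.ndarray) -> np.ndarray:
    """Map a Cityscapes-trainId mask (H, W) to int8 Nav2 costs via two LUTs."""
    return trav_to_cost[city_to_trav[city_mask]]
```

Two chained lookup tables keep the per-frame conversion vectorized, with no Python loop over pixels.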

Quick Start — Pretrained Cityscapes Model

No training required for standard sidewalks in Western cities.

# 1. Build the TRT engine (once, on Orin):
python3 /opt/ros/humble/share/saltybot_segmentation/scripts/build_engine.py

# 2. Launch:
ros2 launch saltybot_segmentation sidewalk_segmentation.launch.py

# 3. Verify:
ros2 topic hz /segmentation/mask        # expect ~15Hz
ros2 topic hz /segmentation/costmap     # expect ~15Hz
ros2 run rviz2 rviz2                    # add OccupancyGrid, topic=/segmentation/costmap

Model Benchmarks — Cityscapes Validation Set

Model           mIoU   Sidewalk IoU  Road IoU  FPS (Orin FP16)  Engine size
BiSeNetV2       72.6%  82.1%         97.8%     ~50 fps          ~11 MB
DDRNet-23-slim  79.5%  84.3%         98.1%     ~40 fps          ~18 MB

Both exceed the >15fps target with headroom for additional ROS2 overhead.

BiSeNetV2 is the default — faster, smaller, sufficient for traversability. Use DDRNet if mIoU matters more than latency (e.g., complex European city centres).


Fine-Tuning on Custom Site Data

When is fine-tuning needed?

The Cityscapes-trained model works well for:

  • European-style city sidewalks, curbs, roads
  • Standard pedestrian infrastructure

Fine-tuning improves performance for:

  • Gravel/dirt paths (not in Cityscapes)
  • Unusual kerb styles or non-standard pavement markings
  • Indoor+outdoor transitions (garage exits, building entrances)
  • Non-Western road infrastructure

Step 1 — Collect data (walk the route)

Record a ROS2 bag while walking the intended robot route:

# On Orin, record the front camera for 5–10 minutes of the target environment:
ros2 bag record /camera/color/image_raw -o sidewalk_route_2024-01

# Transfer to workstation for labelling:
scp -r jetson:/home/ubuntu/sidewalk_route_2024-01 ~/datasets/

Step 2 — Extract frames

python3 fine_tune.py \
    --extract-frames ~/datasets/sidewalk_route_2024-01/ \
    --output-dir ~/datasets/sidewalk_raw/ \
    --every-n 5       # 1 frame per 5 = ~6fps from 30fps bag

This extracts ~200–400 frames from a 5-minute bag. You need to label 50–100 frames minimum.

Step 3 — Label with LabelMe

pip install labelme
labelme ~/datasets/sidewalk_raw/

Class names to use (must be exact):

What you see       LabelMe label
Footpath/pavement  sidewalk
Road/tarmac        road
Grass/lawn/verge   vegetation
Gravel path        terrain
Person             person
Car/vehicle        car
Building/wall      building
Fence/gate         fence

Labelling tips:

  • Use polygon tool for precise boundaries
  • Focus on the ground plane (lower 60% of image) — that's what the costmap uses
  • 50 well-labelled frames beat 200 rushed ones
  • Vary conditions: sunny, overcast, morning, evening

Step 4 — Convert labels to masks

python3 fine_tune.py \
    --convert-labels ~/datasets/sidewalk_raw/ \
    --output-dir ~/datasets/sidewalk_masks/

Output: <frame>_mask.png per frame — 8-bit PNG with Cityscapes class IDs. Unlabelled pixels are set to 255 (ignored during training).
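
The name-to-ID step of the conversion can be sketched as a dictionary lookup over the class table from Step 3. Illustrative only: the real converter in fine_tune.py rasterizes the LabelMe polygons (omitted here), and blank_mask/paint_region are hypothetical helpers; the IDs follow the standard Cityscapes trainId scheme.

```python
import numpy as np

# LabelMe class name -> Cityscapes trainId (standard trainId scheme).
LABEL_TO_TRAINID = {
    "sidewalk": 1, "road": 0, "vegetation": 8, "terrain": 9,
    "person": 11, "car": 13, "building": 2, "fence": 4,
}
IGNORE_ID = 255  # unlabelled pixels, skipped by the training loss

def blank_mask(height: int, width: int) -> np.ndarray:
    """Start from an all-ignore mask; labelled polygons are painted on top."""
    return np.full((height, width), IGNORE_ID, dtype=np.uint8)

def paint_region(mask: np.ndarray, rows, cols, label: str) -> np.ndarray:
    """Assign a labelled region (index arrays or slices) its trainId."""
    mask[rows, cols] = LABEL_TO_TRAINID[label]
    return mask
```

Starting from all-255 is what makes partially labelled frames usable: anything you did not outline simply contributes no gradient.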

Step 5 — Fine-tune

# On Orin (or workstation with CUDA):
python3 fine_tune.py --train \
    --images  ~/datasets/sidewalk_raw/ \
    --labels  ~/datasets/sidewalk_masks/ \
    --weights /mnt/nvme/saltybot/models/bisenetv2_cityscapes.pth \
    --output  /mnt/nvme/saltybot/models/bisenetv2_custom.pth \
    --epochs  20 \
    --lr      1e-4

Expected training time:

  • 50 images, 20 epochs, Orin: ~15 minutes
  • 100 images, 20 epochs, Orin: ~25 minutes

Step 6 — Evaluate mIoU

python3 fine_tune.py --eval \
    --images ~/datasets/sidewalk_raw/ \
    --labels ~/datasets/sidewalk_masks/ \
    --weights /mnt/nvme/saltybot/models/bisenetv2_custom.pth

Target mIoU on custom classes: >70% on sidewalk/road/obstacle.
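
The per-class numbers reported by --eval boil down to intersection-over-union counts per class. A minimal numpy sketch, assuming the ignore-index-255 convention from Step 4 (not necessarily the exact implementation in fine_tune.py):

```python
import numpy as np

def per_class_iou(pred: np.ndarray, gt: np.ndarray,
                  num_classes: int, ignore_id: int = 255) -> np.ndarray:
    """IoU per class over all valid (non-ignored) pixels; NaN if class absent."""
    valid = gt != ignore_id
    pred, gt = pred[valid], gt[valid]
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious[c] = inter / union
    return ious  # mIoU = np.nanmean(ious)
```

Using NaN for absent classes keeps a route with no gravel, say, from dragging the mean down to zero.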

Step 7 — Build new TRT engine

python3 build_engine.py \
    --weights /mnt/nvme/saltybot/models/bisenetv2_custom.pth \
    --engine  /mnt/nvme/saltybot/models/bisenetv2_custom_512x256.engine

Update segmentation_params.yaml:

engine_path: /mnt/nvme/saltybot/models/bisenetv2_custom_512x256.engine

Nav2 Integration — SegmentationCostmapLayer

Add to nav2_params.yaml:

local_costmap:
  local_costmap:
    ros__parameters:
      plugins:
        - "voxel_layer"
        - "inflation_layer"
        - "segmentation_layer"   # ← add this

      segmentation_layer:
        plugin: "saltybot_segmentation_costmap::SegmentationCostmapLayer"
        enabled: true
        topic: /segmentation/costmap
        combination_method: max     # max | override | min

combination_method:

  • max (default) — keeps the worst-case cost between existing costmap and segmentation. Most conservative; prevents Nav2 from overriding obstacle detections.
  • override — segmentation completely replaces existing cell costs. Use when you trust the camera more than other sensors.
  • min — keeps the most permissive cost. Use for exploratory navigation in open environments.
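
The three modes reduce to elementwise operations on the cost grids. An illustrative numpy sketch only: the real plugin merges through Nav2 costmap_2d cost values after occupancyToCost, and unknown (-1) cells need special-casing that is omitted here.

```python
import numpy as np

def combine(existing: np.ndarray, seg: np.ndarray, method: str) -> np.ndarray:
    """Merge segmentation costs into an existing costmap layer.

    Note: -1 (unknown) cells are not special-cased here; a real layer must
    resolve unknowns before taking an elementwise min/max.
    """
    if method == "max":        # worst case wins; never erases obstacles
        return np.maximum(existing, seg)
    if method == "override":   # camera is authoritative
        return seg.copy()
    if method == "min":        # most permissive; exploratory navigation
        return np.minimum(existing, seg)
    raise ValueError(f"unknown combination_method: {method}")
```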

Performance Tuning

Too slow (<15fps)

  1. Reduce process_every_n (e.g., process_every_n: 2 → effective 15fps from 30fps camera)
  2. Reduce input resolution — edit build_engine.py to use 384×192 (2.2× faster)
  3. Ensure nvpmodel is in MAXN mode: sudo nvpmodel -m 0 && sudo jetson_clocks
  4. Check GPU is not throttled: jtop → GPU frequency should be 1024MHz
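
The process_every_n parameter amounts to a modulo gate in the image callback. A sketch (the parameter name comes from segmentation_params.yaml; the gate class itself is hypothetical):

```python
class FrameGate:
    """Drop frames so heavy inference runs at camera_fps / every_n."""

    def __init__(self, every_n: int):
        self.every_n = every_n
        self.count = 0

    def accept(self) -> bool:
        """Return True only for every Nth frame seen."""
        take = self.count % self.every_n == 0
        self.count += 1
        return take
```

With a 30 fps camera and every_n=2, inference runs at an effective 15 fps while the subscription itself stays cheap.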

False road detections (sidewalk near road)

  • Lower camera_pitch_deg to look further ahead
  • Enable unknown_as_obstacle: true to be more cautious
  • Fine-tune with site-specific data (Step 5)

Costmap looks noisy

  • Increase costmap_resolution (0.1m cells reduce noise)
  • Reduce costmap_range_m to 3.0m (projection less accurate at far range)
  • Apply temporal smoothing in Nav2 inflation layer
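
The falloff in projection accuracy is plain geometry: a ray from a camera at height h, tilted down by some angle, travels h/tan(angle) before hitting the ground, so small angular errors near the horizon become large range errors. A sketch under a simplified pinhole model (the node's actual projection uses the full camera intrinsics; the 0.5 m height below is an assumed example):

```python
import math

def ground_distance(camera_height_m: float, pitch_deg: float,
                    pixel_angle_deg: float = 0.0) -> float:
    """Forward distance where a camera ray intersects the ground plane.

    pitch_deg: downward tilt of the optical axis (cf. camera_pitch_deg);
    pixel_angle_deg: extra downward angle of the pixel ray vs. that axis.
    """
    angle = math.radians(pitch_deg + pixel_angle_deg)
    if angle <= 0:
        return math.inf  # rays at or above the horizon never hit the ground
    return camera_height_m / math.tan(angle)
```

At 0.5 m height and 15° pitch the optical axis lands ~1.9 m ahead, while a ray only 10° above it lands ~5.7 m out; that nonlinear growth is why capping costmap_range_m near 3.0 m helps.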

Datasets

Cityscapes (primary pre-training)

  • 5,000 finely annotated images from 50 (mostly German) cities
  • 19 evaluation classes (the trainId scheme used by the pretrained weights)
  • License: free for research, registration required
  • Download: https://www.cityscapes-dataset.com

Mapillary Vistas (supplementary)

  • 18,000 training images from diverse global street scenes
  • 124 semantic classes (broader coverage than Cityscapes)
  • Includes dirt paths, gravel, unusual sidewalk types
  • License: CC BY-SA 4.0
  • Download: https://www.mapillary.com/dataset/vistas
  • Useful for mapping to our traversability classes in non-European environments

Custom saltybot data

  • Collected per-deployment via fine_tune.py --extract-frames
  • 50–100 labelled frames typical for 80%+ mIoU on specific routes
  • Store in /mnt/nvme/saltybot/training_data/

File Locations on Orin

/mnt/nvme/saltybot/
├── models/
│   ├── bisenetv2_cityscapes.pth              ← downloaded pretrained weights
│   ├── bisenetv2_cityscapes_512x256.onnx     ← exported ONNX (FP32)
│   ├── bisenetv2_cityscapes_512x256.engine   ← TRT FP16 engine (built on Orin)
│   ├── bisenetv2_custom.pth                  ← fine-tuned weights (after step 5)
│   └── bisenetv2_custom_512x256.engine       ← TRT FP16 engine (after step 7)
├── training_data/
│   ├── raw/                                  ← extracted JPEG frames
│   └── labels/                              ← LabelMe JSON + converted PNG masks
└── rosbags/
    └── sidewalk_route_*/                     ← recorded ROS2 bags for labelling