# Sidewalk Segmentation — Training & Deployment Guide

## Overview

SaltyBot uses **BiSeNetV2** (default) or **DDRNet-23-slim** for real-time semantic segmentation, pre-trained on Cityscapes and optionally fine-tuned on site-specific data. The model runs at **>15fps on the Orin Nano Super** via TensorRT FP16 at 512×256 input, publishing per-pixel traversability costs to Nav2.

---

## Traversability Classes

| Class | ID | OccupancyGrid | Description |
|-------|----|---------------|-------------|
| Sidewalk | 0 | 0 (free) | Preferred navigation surface |
| Grass/vegetation | 1 | 50 (medium) | Traversable but non-preferred |
| Road | 2 | 90 (high cost) | Avoid but can cross |
| Obstacle | 3 | 100 (lethal) | Person, car, building, fence |
| Unknown | 4 | -1 (unknown) | Sky, unlabelled |

---

## Quick Start — Pretrained Cityscapes Model

No training required for standard sidewalks in Western cities.

```bash
# 1. Build the TRT engine (once, on Orin):
python3 /opt/ros/humble/share/saltybot_segmentation/scripts/build_engine.py

# 2. Launch:
ros2 launch saltybot_segmentation sidewalk_segmentation.launch.py

# 3. Verify:
ros2 topic hz /segmentation/mask      # expect ~15Hz
ros2 topic hz /segmentation/costmap   # expect ~15Hz
ros2 run rviz2 rviz2                  # add OccupancyGrid, topic=/segmentation/costmap
```

---

## Model Benchmarks — Cityscapes Validation Set

| Model | mIoU | Sidewalk IoU | Road IoU | FPS (Orin FP16) | Engine size |
|-------|------|--------------|----------|-----------------|-------------|
| BiSeNetV2 | 72.6% | 82.1% | 97.8% | ~50 fps | ~11 MB |
| DDRNet-23-slim | 79.5% | 84.3% | 98.1% | ~40 fps | ~18 MB |

Both exceed the **>15fps target** with headroom for additional ROS2 overhead. BiSeNetV2 is the default — faster, smaller, and sufficient for traversability. Use DDRNet if mIoU matters more than latency (e.g., complex European city centres).

---

## Fine-Tuning on Custom Site Data

### When is fine-tuning needed?
The Cityscapes-trained model works well for:

- European-style city sidewalks, curbs, roads
- Standard pedestrian infrastructure

Fine-tuning improves performance for:

- Gravel/dirt paths (not in Cityscapes)
- Unusual kerb styles or non-standard pavement markings
- Indoor/outdoor transitions (garage exits, building entrances)
- Non-Western road infrastructure

### Step 1 — Collect data (walk the route)

Record a ROS2 bag while walking the intended robot route:

```bash
# On Orin, record the front camera for 5–10 minutes of the target environment:
ros2 bag record /camera/color/image_raw -o sidewalk_route_2024-01

# Transfer to workstation for labelling:
scp -r jetson:/home/ubuntu/sidewalk_route_2024-01 ~/datasets/
```

### Step 2 — Extract frames

```bash
python3 fine_tune.py \
  --extract-frames ~/datasets/sidewalk_route_2024-01/ \
  --output-dir ~/datasets/sidewalk_raw/ \
  --every-n 5   # keep 1 frame in 5 = ~6fps from a 30fps bag
```

This extracts ~200–400 frames from a 5-minute bag. You need to label **50–100 frames** minimum.

### Step 3 — Label with LabelMe

```bash
pip install labelme
labelme ~/datasets/sidewalk_raw/
```

**Class names to use** (must be exact):

| What you see | LabelMe label |
|--------------|---------------|
| Footpath/pavement | `sidewalk` |
| Road/tarmac | `road` |
| Grass/lawn/verge | `vegetation` |
| Gravel path | `terrain` |
| Person | `person` |
| Car/vehicle | `car` |
| Building/wall | `building` |
| Fence/gate | `fence` |

**Labelling tips:**

- Use the **polygon** tool for precise boundaries
- Focus on the ground plane (lower 60% of the image) — that's what the costmap uses
- 50 well-labelled frames beat 200 rushed ones
- Vary conditions: sunny, overcast, morning, evening

### Step 4 — Convert labels to masks

```bash
python3 fine_tune.py \
  --convert-labels ~/datasets/sidewalk_raw/ \
  --output-dir ~/datasets/sidewalk_masks/
```

Output: one `_mask.png` per frame — an 8-bit PNG with Cityscapes class IDs. Unlabelled pixels are set to 255 (ignored during training).
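The conversion in Step 4 can be illustrated with a self-contained sketch. Everything here is hypothetical: the `labelme_to_mask` helper and the `LABEL_TO_ID` table (Cityscapes train IDs) are assumptions for illustration, not the actual `fine_tune.py` implementation, which would use a fast rasteriser rather than a pure-Python point-in-polygon loop.

```python
import json

# Assumed Cityscapes train-ID mapping for the labels from Step 3;
# check fine_tune.py for the authoritative table.
LABEL_TO_ID = {
    "road": 0, "sidewalk": 1, "building": 2, "fence": 4,
    "vegetation": 8, "terrain": 9, "person": 11, "car": 13,
}
IGNORE = 255  # unlabelled pixels, skipped by the training loss


def point_in_polygon(x, y, pts):
    """Even-odd ray-casting test: cast a ray to the right, count edge crossings."""
    inside = False
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside


def labelme_to_mask(labelme_json, width, height):
    """Rasterise LabelMe polygons into a flat row-major class-ID mask."""
    shapes = json.loads(labelme_json)["shapes"]
    mask = [IGNORE] * (width * height)
    for shape in shapes:  # later shapes overwrite earlier ones
        cls = LABEL_TO_ID.get(shape["label"], IGNORE)
        pts = shape["points"]
        for y in range(height):
            for x in range(width):
                # sample at pixel centres
                if point_in_polygon(x + 0.5, y + 0.5, pts):
                    mask[y * width + x] = cls
    return mask
```

In practice a converter would rasterise with an image library (e.g. Pillow's `ImageDraw.polygon`) instead of the O(pixels × vertices) loop above, and write the result as an 8-bit PNG, but the mask semantics are the same.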
### Step 5 — Fine-tune

```bash
# On Orin (or a workstation with CUDA):
python3 fine_tune.py --train \
  --images ~/datasets/sidewalk_raw/ \
  --labels ~/datasets/sidewalk_masks/ \
  --weights /mnt/nvme/saltybot/models/bisenetv2_cityscapes.pth \
  --output /mnt/nvme/saltybot/models/bisenetv2_custom.pth \
  --epochs 20 \
  --lr 1e-4
```

Expected training time:

- 50 images, 20 epochs, Orin: ~15 minutes
- 100 images, 20 epochs, Orin: ~25 minutes

### Step 6 — Evaluate mIoU

```bash
python3 fine_tune.py --eval \
  --images ~/datasets/sidewalk_raw/ \
  --labels ~/datasets/sidewalk_masks/ \
  --weights /mnt/nvme/saltybot/models/bisenetv2_custom.pth
```

Target mIoU on custom classes: >70% on sidewalk/road/obstacle.

### Step 7 — Build new TRT engine

```bash
python3 build_engine.py \
  --weights /mnt/nvme/saltybot/models/bisenetv2_custom.pth \
  --engine /mnt/nvme/saltybot/models/bisenetv2_custom_512x256.engine
```

Update `segmentation_params.yaml`:

```yaml
engine_path: /mnt/nvme/saltybot/models/bisenetv2_custom_512x256.engine
```

---

## Nav2 Integration — SegmentationCostmapLayer

Add to `nav2_params.yaml`:

```yaml
local_costmap:
  local_costmap:
    ros__parameters:
      plugins:
        - "voxel_layer"
        - "inflation_layer"
        - "segmentation_layer"   # ← add this
      segmentation_layer:
        plugin: "saltybot_segmentation_costmap::SegmentationCostmapLayer"
        enabled: true
        topic: /segmentation/costmap
        combination_method: max   # max | override | min
```

**combination_method:**

- `max` (default) — keeps the worst-case cost between the existing costmap and segmentation. Most conservative; prevents Nav2 from overriding obstacle detections.
- `override` — segmentation completely replaces existing cell costs. Use when you trust the camera more than other sensors.
- `min` — keeps the most permissive cost. Use for exploratory navigation in open environments.

---

## Performance Tuning

### Too slow (<15fps)

1. **Reduce process_every_n** (e.g., `process_every_n: 2` → effective 15fps from a 30fps camera)
2. **Reduce input resolution** — edit `build_engine.py` to use 384×192 (~2.2× faster)
3. **Ensure nvpmodel is in MAXN mode**: `sudo nvpmodel -m 0 && sudo jetson_clocks`
4. **Check the GPU is not throttled**: in `jtop`, GPU frequency should be 1024MHz

### False road detections (sidewalk near road)

- Lower `camera_pitch_deg` to look further ahead
- Enable `unknown_as_obstacle: true` to be more cautious
- Fine-tune with site-specific data (Step 5)

### Costmap looks noisy

- Increase `costmap_resolution` (0.1m cells reduce noise)
- Reduce `costmap_range_m` to 3.0m (the projection is less accurate at far range)
- Apply temporal smoothing in the Nav2 inflation layer

---

## Datasets

### Cityscapes (primary pre-training)

- **2975 training / 500 validation** finely annotated images
- 19 semantic classes (roads, sidewalks, people, vehicles, etc.)
- License: Cityscapes Terms of Use (non-commercial research)
- Download: https://www.cityscapes-dataset.com/
- Required for training from scratch; a BiSeNetV2 pretrained checkpoint (~25MB) is available at: https://github.com/CoinCheung/BiSeNet/releases

### Mapillary Vistas (supplementary)

- **18,000 training images** from diverse global street scenes
- 124 semantic classes (broader coverage than Cityscapes)
- Includes dirt paths, gravel, unusual sidewalk types
- License: CC BY-SA 4.0
- Download: https://www.mapillary.com/dataset/vistas
- Useful for mapping to our traversability classes in non-European environments

### Custom saltybot data

- Collected per deployment via `fine_tune.py --extract-frames`
- 50–100 labelled frames typical for 80%+ mIoU on specific routes
- Store in `/mnt/nvme/saltybot/training_data/`

---

## File Locations on Orin

```
/mnt/nvme/saltybot/
├── models/
│   ├── bisenetv2_cityscapes.pth              ← downloaded pretrained weights
│   ├── bisenetv2_cityscapes_512x256.onnx     ← exported ONNX (FP32)
│   ├── bisenetv2_cityscapes_512x256.engine   ← TRT FP16 engine (built on Orin)
│   ├── bisenetv2_custom.pth                  ← fine-tuned weights (after step 5)
│   └── bisenetv2_custom_512x256.engine       ← TRT FP16 engine (after step 7)
├── training_data/
│   ├── raw/                                  ← extracted JPEG frames
│   └── labels/                               ← LabelMe JSON + converted PNG masks
└── rosbags/
    └── sidewalk_route_*/                     ← recorded ROS2 bags for labelling
```
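For reference, the class-to-cost mapping from the Traversability Classes table can be sketched in Python. This is a minimal illustration, not the node's actual implementation: the `mask_to_costmap` helper and its `unknown_as_obstacle` flag (named after the parameter mentioned under Performance Tuning) are assumptions.

```python
# Cost table taken directly from the Traversability Classes table above.
TRAVERSABILITY_COST = {
    0: 0,    # sidewalk          -> free
    1: 50,   # grass/vegetation  -> medium
    2: 90,   # road              -> high cost
    3: 100,  # obstacle          -> lethal
    4: -1,   # unknown           -> unknown
}


def mask_to_costmap(class_mask, unknown_as_obstacle=False):
    """Map a flat row-major list of class IDs to OccupancyGrid cell costs."""
    costs = []
    for cls in class_mask:
        cost = TRAVERSABILITY_COST.get(cls, -1)  # unmapped IDs -> unknown
        if unknown_as_obstacle and cost == -1:
            # mirrors the cautious behaviour of unknown_as_obstacle: true
            cost = 100
        costs.append(cost)
    return costs
```

With the flag off, one pixel of each class maps to `[0, 50, 90, 100, -1]`; with it on, the trailing unknown becomes lethal (`100`), which is why enabling it makes the robot more cautious near unclassified regions.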