# Social-bot Model Directory
## Layout

```
/models/
├── onnx/                          # Source ONNX models (version-pinned)
│   ├── scrfd_10g_bnkps.onnx      # Face detection — InsightFace SCRFD-10GF
│   ├── arcface_r100.onnx         # Face recognition — ArcFace R100 (buffalo_l)
│   └── ecapa_tdnn.onnx           # Speaker embedding — ECAPA-TDNN (SpeechBrain export)
│
├── engines/                       # TensorRT FP16 compiled engines
│   ├── scrfd_10g_fp16.engine     # SCRFD → TRT FP16 (640×640)
│   ├── arcface_r100_fp16.engine  # ArcFace → TRT FP16 (112×112)
│   └── ecapa_tdnn_fp16.engine    # ECAPA-TDNN → TRT FP16 (variable len)
│
├── whisper-small-ct2/             # faster-whisper CTranslate2 format (auto-downloaded)
│   ├── model.bin
│   └── tokenizer.json
│
├── piper/                         # Piper TTS voice models
│   ├── en_US-lessac-medium.onnx
│   └── en_US-lessac-medium.onnx.json
│
├── gguf/                          # Quantized LLM (llama-cpp-python)
│   └── phi-3-mini-4k-instruct-q4_k_m.gguf  # ~2.2GB — Phi-3-mini Q4_K_M
│
└── speechbrain_ecapa/             # SpeechBrain pretrained checkpoint cache
```
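A minimal sketch of how a runtime might resolve engine paths from this layout. The helper name and task-to-file mapping are illustrative assumptions, not part of the repo:

```python
from pathlib import Path

MODELS_ROOT = Path("/models")  # root of the layout above

# Illustrative mapping; the actual loaders may resolve paths differently.
ENGINES = {
    "face_detect": MODELS_ROOT / "engines" / "scrfd_10g_fp16.engine",
    "face_embed": MODELS_ROOT / "engines" / "arcface_r100_fp16.engine",
    "speaker_embed": MODELS_ROOT / "engines" / "ecapa_tdnn_fp16.engine",
}

def engine_path(task: str) -> Path:
    """Return the TensorRT engine path for a task, or raise KeyError."""
    return ENGINES[task]
```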
## Model Versions

| Model | Version | Source | Size |
|-------|---------|--------|------|
| SCRFD-10GF | InsightFace 0.7 | GitHub releases | 17MB |
| ArcFace R100 (w600k_r50) | InsightFace buffalo_l | Auto via insightface | 166MB |
| ECAPA-TDNN | SpeechBrain spkrec-ecapa-voxceleb | HuggingFace | 87MB |
| Whisper small | faster-whisper 1.0+ | CTranslate2 hub | 488MB |
| Piper en_US-lessac-medium | Rhasspy piper-voices | HuggingFace | 63MB |
| Phi-3-mini-4k Q4_K_M | microsoft/Phi-3-mini-4k-instruct | GGUF / HuggingFace | 2.2GB |
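Summing the sizes above gives a rough disk budget. This is back-of-envelope arithmetic only; it does not count the TensorRT engines or the SpeechBrain checkpoint cache:

```python
# Sizes from the table, in MB (2.2 GB ≈ 2252.8 MB, using 1 GB = 1024 MB).
sizes_mb = {
    "scrfd_10g": 17,
    "arcface_r100": 166,
    "ecapa_tdnn": 87,
    "whisper_small": 488,
    "piper_lessac": 63,
    "phi3_mini_q4": 2.2 * 1024,
}

total_gb = sum(sizes_mb.values()) / 1024
print(f"~{total_gb:.1f} GB of source models")  # roughly 3.0 GB before conversion
```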
## Setup

```bash
# From within the social container:
/scripts/convert_models.sh all         # download + convert all models
/scripts/convert_models.sh benchmark   # run latency benchmark suite
/scripts/convert_models.sh health      # check GPU memory
```
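The ONNX→engine conversion presumably wraps `trtexec`. A sketch of the kind of command such a step would build; the flags shown (`--onnx`, `--saveEngine`, `--fp16`) are standard `trtexec` options, but the exact invocation inside `convert_models.sh` may differ:

```python
def trtexec_cmd(onnx_path, engine_path, extra=None):
    """Build a trtexec command line for an FP16 engine build (illustrative)."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--fp16",  # FP16 precision, matching the engines/ directory
    ]
    return cmd + (extra or [])

# Example: the SCRFD build from the layout above.
print(" ".join(trtexec_cmd(
    "/models/onnx/scrfd_10g_bnkps.onnx",
    "/models/engines/scrfd_10g_fp16.engine",
)))
```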
## Performance Targets (Orin Nano Super, JetPack 6, FP16)

| Model | Input | Target | Typical |
|-------|-------|--------|---------|
| SCRFD-10GF | 640×640 | <15ms | ~8ms |
| ArcFace R100 | 4×112×112 | <5ms | ~3ms |
| ECAPA-TDNN | 1s audio | <20ms | ~12ms |
| Whisper small | 1s audio | <300ms | ~180ms |
| Piper lessac-medium | 10 words | <200ms | ~60ms |
| Phi-3-mini Q4_K_M | prompt | <500ms TTFT | ~350ms |
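As a sanity check, the typical numbers above imply a rough worst-case budget for one fully serial voice turn. This is illustrative arithmetic only; in practice the vision stages run continuously and overlap the speech pipeline:

```python
# Typical per-stage latencies from the table, in ms.
typical_ms = {
    "scrfd_detect": 8,
    "arcface_embed": 3,
    "ecapa_speaker": 12,
    "whisper_asr": 180,
    "phi3_ttft": 350,
    "piper_tts": 60,
}

serial_total = sum(typical_ms.values())
print(f"worst-case serial turn: ~{serial_total} ms")  # ~613 ms if nothing overlaps
```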
## LLM Download

```bash
# Download the Phi-3-mini GGUF manually (2.2GB):
wget -O /models/gguf/phi-3-mini-4k-instruct-q4_k_m.gguf \
  "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf"

# Or use llama-cpp-python's built-in download:
python3 -c "
from llama_cpp import Llama
llm = Llama.from_pretrained(
    repo_id='microsoft/Phi-3-mini-4k-instruct-gguf',
    filename='Phi-3-mini-4k-instruct-q4.gguf',
    cache_dir='/models/gguf',
    n_gpu_layers=20
)
"
```