# Social-bot Model Directory
## Layout

```
/models/
├── onnx/                          # Source ONNX models (version-pinned)
│   ├── scrfd_10g_bnkps.onnx      # Face detection — InsightFace SCRFD-10GF
│   ├── arcface_r100.onnx         # Face recognition — ArcFace R100 (buffalo_l)
│   └── ecapa_tdnn.onnx           # Speaker embedding — ECAPA-TDNN (SpeechBrain export)
│
├── engines/                       # TensorRT FP16 compiled engines
│   ├── scrfd_10g_fp16.engine     # SCRFD → TRT FP16 (640×640)
│   ├── arcface_r100_fp16.engine  # ArcFace → TRT FP16 (112×112)
│   └── ecapa_tdnn_fp16.engine    # ECAPA-TDNN → TRT FP16 (variable len)
│
├── whisper-small-ct2/             # faster-whisper CTranslate2 format (auto-downloaded)
│   ├── model.bin
│   └── tokenizer.json
│
├── piper/                         # Piper TTS voice models
│   ├── en_US-lessac-medium.onnx
│   └── en_US-lessac-medium.onnx.json
│
├── gguf/                          # Quantized LLM (llama-cpp-python)
│   └── phi-3-mini-4k-instruct-q4_k_m.gguf  # ~2.2GB — Phi-3-mini Q4_K_M
│
└── speechbrain_ecapa/             # SpeechBrain pretrained checkpoint cache
```
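A minimal sketch of how a runtime might resolve engine paths from this layout. The helper name and task-to-file mapping are illustrative assumptions, not part of the repo:

```python
from pathlib import Path

MODELS_ROOT = Path("/models")  # root of the layout above

# Illustrative mapping; the actual loaders may resolve paths differently.
ENGINES = {
    "face_detect": MODELS_ROOT / "engines" / "scrfd_10g_fp16.engine",
    "face_embed": MODELS_ROOT / "engines" / "arcface_r100_fp16.engine",
    "speaker_embed": MODELS_ROOT / "engines" / "ecapa_tdnn_fp16.engine",
}

def engine_path(task: str) -> Path:
    """Return the TensorRT engine path for a task, or raise KeyError."""
    return ENGINES[task]
```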
## Model Versions

| Model | Version | Source | Size |
|-------|---------|--------|------|
| SCRFD-10GF | InsightFace 0.7 | GitHub releases | 17MB |
| ArcFace R100 (w600k_r50) | InsightFace buffalo_l | Auto via insightface | 166MB |
| ECAPA-TDNN | SpeechBrain spkrec-ecapa-voxceleb | HuggingFace | 87MB |
| Whisper small | faster-whisper 1.0+ | CTranslate2 hub | 488MB |
| Piper en_US-lessac-medium | Rhasspy piper-voices | HuggingFace | 63MB |
| Phi-3-mini-4k Q4_K_M | microsoft/Phi-3-mini-4k-instruct | GGUF / HuggingFace | 2.2GB |
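Summing the sizes above gives a rough disk budget. This is back-of-envelope arithmetic only; it does not count the TensorRT engines or the SpeechBrain checkpoint cache:

```python
# Sizes from the table, in MB (2.2 GB ≈ 2252.8 MB, using 1 GB = 1024 MB).
sizes_mb = {
    "scrfd_10g": 17,
    "arcface_r100": 166,
    "ecapa_tdnn": 87,
    "whisper_small": 488,
    "piper_lessac": 63,
    "phi3_mini_q4": 2.2 * 1024,
}

total_gb = sum(sizes_mb.values()) / 1024
print(f"~{total_gb:.1f} GB of source models")  # roughly 3.0 GB before conversion
```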
## Setup

```bash
# From within the social container:
/scripts/convert_models.sh all         # download + convert all models
/scripts/convert_models.sh benchmark   # run latency benchmark suite
/scripts/convert_models.sh health      # check GPU memory
```
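The ONNX→engine conversion presumably wraps `trtexec`. A sketch of the kind of command such a step would build; the flags shown (`--onnx`, `--saveEngine`, `--fp16`) are standard `trtexec` options, but the exact invocation inside `convert_models.sh` may differ:

```python
def trtexec_cmd(onnx_path, engine_path, extra=None):
    """Build a trtexec command line for an FP16 engine build (illustrative)."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--fp16",  # FP16 precision, matching the engines/ directory
    ]
    return cmd + (extra or [])

# Example: the SCRFD build from the layout above.
print(" ".join(trtexec_cmd(
    "/models/onnx/scrfd_10g_bnkps.onnx",
    "/models/engines/scrfd_10g_fp16.engine",
)))
```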
## Performance Targets (Orin Nano Super, JetPack 6, FP16)

| Model | Input | Target | Typical |
|-------|-------|--------|---------|
| SCRFD-10GF | 640×640 | <15ms | ~8ms |
| ArcFace R100 | 4×112×112 | <5ms | ~3ms |
| ECAPA-TDNN | 1s audio | <20ms | ~12ms |
| Whisper small | 1s audio | <300ms | ~180ms |
| Piper lessac-medium | 10 words | <200ms | ~60ms |
| Phi-3-mini Q4_K_M | prompt | <500ms TTFT | ~350ms |
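As a sanity check, the typical numbers above imply a rough worst-case budget for one fully serial voice turn. This is illustrative arithmetic only; in practice the vision stages run continuously and overlap the speech pipeline:

```python
# Typical per-stage latencies from the table, in ms.
typical_ms = {
    "scrfd_detect": 8,
    "arcface_embed": 3,
    "ecapa_speaker": 12,
    "whisper_asr": 180,
    "phi3_ttft": 350,
    "piper_tts": 60,
}

serial_total = sum(typical_ms.values())
print(f"worst-case serial turn: ~{serial_total} ms")  # ~613 ms if nothing overlaps
```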
## LLM Download

```bash
# Download the Phi-3-mini GGUF manually (2.2GB):
wget -O /models/gguf/phi-3-mini-4k-instruct-q4_k_m.gguf \
  "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf"

# Or use llama-cpp-python's built-in download:
python3 -c "
from llama_cpp import Llama
llm = Llama.from_pretrained(
    repo_id='microsoft/Phi-3-mini-4k-instruct-gguf',
    filename='Phi-3-mini-4k-instruct-q4.gguf',
    cache_dir='/models/gguf',
    n_gpu_layers=20
)
"
```