# Social-bot Model Directory

## Layout

```
/models/
├── onnx/                      # Source ONNX models (version-pinned)
│   ├── scrfd_10g_bnkps.onnx       # Face detection — InsightFace SCRFD-10GF
│   ├── arcface_r100.onnx          # Face recognition — ArcFace R100 (buffalo_l)
│   └── ecapa_tdnn.onnx            # Speaker embedding — ECAPA-TDNN (SpeechBrain export)
│
├── engines/                   # TensorRT FP16 compiled engines
│   ├── scrfd_10g_fp16.engine      # SCRFD → TRT FP16 (640×640)
│   ├── arcface_r100_fp16.engine   # ArcFace → TRT FP16 (112×112)
│   └── ecapa_tdnn_fp16.engine     # ECAPA-TDNN → TRT FP16 (variable len)
│
├── whisper-small-ct2/         # faster-whisper CTranslate2 format (auto-downloaded)
│   ├── model.bin
│   └── tokenizer.json
│
├── piper/                     # Piper TTS voice models
│   ├── en_US-lessac-medium.onnx
│   └── en_US-lessac-medium.onnx.json
│
├── gguf/                      # Quantized LLM (llama-cpp-python)
│   └── phi-3-mini-4k-instruct-q4_k_m.gguf   # ~2.2GB — Phi-3-mini Q4_K_M
│
└── speechbrain_ecapa/         # SpeechBrain pretrained checkpoint cache
```

## Model Versions

| Model | Version | Source | Size |
|---|---|---|---|
| SCRFD-10GF | InsightFace 0.7 | GitHub releases | 17MB |
| ArcFace R100 (w600k_r50) | InsightFace buffalo_l | Auto via insightface | 166MB |
| ECAPA-TDNN | SpeechBrain spkrec-ecapa-voxceleb | HuggingFace | 87MB |
| Whisper small | faster-whisper 1.0+ | CTranslate2 hub | 488MB |
| Piper en_US-lessac-medium | Rhasspy piper-voices | HuggingFace | 63MB |
| Phi-3-mini-4k Q4_K_M | microsoft/Phi-3-mini-4k-instruct | GGUF / HuggingFace | 2.2GB |

## Setup

```bash
# From within the social container:
/scripts/convert_models.sh all        # download + convert all models
/scripts/convert_models.sh benchmark  # run latency benchmark suite
/scripts/convert_models.sh health     # check GPU memory
```

## Performance Targets (Orin Nano Super, JetPack 6, FP16)

| Model | Input | Target | Typical |
|---|---|---|---|
| SCRFD-10GF | 640×640 | <15ms | ~8ms |
| ArcFace R100 | 4×112×112 | <5ms | ~3ms |
| ECAPA-TDNN | 1s audio | <20ms | ~12ms |
| Whisper small | 1s audio | <300ms | ~180ms |
| Piper lessac-medium | 10 words | <200ms | ~60ms |
| Phi-3-mini Q4_K_M | prompt | <500ms TTFT | ~350ms |

## LLM Download

```bash
# Download Phi-3-mini GGUF manually (2.2GB):
wget -O /models/gguf/phi-3-mini-4k-instruct-q4_k_m.gguf \
  "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf"

# Or use llama-cpp-python's built-in download:
python3 -c "
from llama_cpp import Llama
llm = Llama.from_pretrained(
    repo_id='microsoft/Phi-3-mini-4k-instruct-gguf',
    filename='Phi-3-mini-4k-instruct-q4.gguf',
    cache_dir='/models/gguf',
    n_gpu_layers=20
)
"
```
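## Verifying the Tree

After running the downloads and conversions above, it can be useful to sanity-check that nothing is missing or truncated (e.g. an interrupted `wget` leaving a partial GGUF). The sketch below is a hypothetical helper, not part of `convert_models.sh`: the relative paths come from the layout above, and the minimum sizes are rough lower bounds derived from the version table.

```python
# Hypothetical sanity check for the /models tree. File names follow the
# layout section; minimum sizes are conservative approximations of the
# sizes listed in the Model Versions table (assumption, not authoritative).
from pathlib import Path

# Expected files and rough minimum sizes in bytes.
EXPECTED = {
    "onnx/scrfd_10g_bnkps.onnx": 15 * 1024**2,          # ~17MB
    "onnx/arcface_r100.onnx": 150 * 1024**2,            # ~166MB
    "onnx/ecapa_tdnn.onnx": 80 * 1024**2,               # ~87MB
    "piper/en_US-lessac-medium.onnx": 50 * 1024**2,     # ~63MB
    "gguf/phi-3-mini-4k-instruct-q4_k_m.gguf": 2 * 1024**3,  # ~2.2GB
}

def check_models(root="/models"):
    """Return a list of (relative_path, problem) for missing or truncated files."""
    problems = []
    for rel, min_size in EXPECTED.items():
        p = Path(root) / rel
        if not p.exists():
            problems.append((rel, "missing"))
        elif p.stat().st_size < min_size:
            problems.append((rel, f"truncated ({p.stat().st_size} bytes)"))
    return problems

if __name__ == "__main__":
    for rel, why in check_models():
        print(f"WARN {rel}: {why}")
```

A check like this is cheap enough to run at container start, before the first inference request warms up the TensorRT engines.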