# Social-bot Model Directory

## Layout

```
/models/
├── onnx/                      # Source ONNX models (version-pinned)
│   ├── scrfd_10g_bnkps.onnx       # Face detection — InsightFace SCRFD-10GF
│   ├── arcface_r100.onnx          # Face recognition — ArcFace R100 (buffalo_l)
│   └── ecapa_tdnn.onnx            # Speaker embedding — ECAPA-TDNN (SpeechBrain export)
│
├── engines/                   # TensorRT FP16 compiled engines
│   ├── scrfd_10g_fp16.engine      # SCRFD → TRT FP16 (640×640)
│   ├── arcface_r100_fp16.engine   # ArcFace → TRT FP16 (112×112)
│   └── ecapa_tdnn_fp16.engine     # ECAPA-TDNN → TRT FP16 (variable len)
│
├── whisper-small-ct2/         # faster-whisper CTranslate2 format (auto-downloaded)
│   ├── model.bin
│   └── tokenizer.json
│
├── piper/                     # Piper TTS voice models
│   ├── en_US-lessac-medium.onnx
│   └── en_US-lessac-medium.onnx.json
│
├── gguf/                      # Quantized LLM (llama-cpp-python)
│   └── phi-3-mini-4k-instruct-q4_k_m.gguf   # ~2.2GB — Phi-3-mini Q4_K_M
│
└── speechbrain_ecapa/         # SpeechBrain pretrained checkpoint cache
```

## Model Versions

| Model | Version | Source | Size |
|---|---|---|---|
| SCRFD-10GF | InsightFace 0.7 | GitHub releases | 17MB |
| ArcFace R100 (w600k_r50) | InsightFace buffalo_l | Auto via insightface | 166MB |
| ECAPA-TDNN | SpeechBrain spkrec-ecapa-voxceleb | HuggingFace | 87MB |
| Whisper small | faster-whisper 1.0+ | CTranslate2 hub | 488MB |
| Piper en_US-lessac-medium | Rhasspy piper-voices | HuggingFace | 63MB |
| Phi-3-mini-4k Q4_K_M | microsoft/Phi-3-mini-4k-instruct | GGUF / HuggingFace | 2.2GB |

## Setup

```bash
# From within the social container:
/scripts/convert_models.sh all        # download + convert all models
/scripts/convert_models.sh benchmark  # run latency benchmark suite
/scripts/convert_models.sh health     # check GPU memory
```

## Performance Targets (Orin Nano Super, JetPack 6, FP16)

| Model | Input | Target | Typical |
|---|---|---|---|
| SCRFD-10GF | 640×640 | <15ms | ~8ms |
| ArcFace R100 | 4×112×112 | <5ms | ~3ms |
| ECAPA-TDNN | 1s audio | <20ms | ~12ms |
| Whisper small | 1s audio | <300ms | ~180ms |
| Piper lessac-medium | 10 words | <200ms | ~60ms |
| Phi-3-mini Q4_K_M | prompt | <500ms TTFT | ~350ms |

## LLM Download

```bash
# Download Phi-3-mini GGUF manually (2.2GB):
wget -O /models/gguf/phi-3-mini-4k-instruct-q4_k_m.gguf \
  "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf"

# Or use llama-cpp-python's built-in download:
python3 -c "
from llama_cpp import Llama
llm = Llama.from_pretrained(
    repo_id='microsoft/Phi-3-mini-4k-instruct-gguf',
    filename='Phi-3-mini-4k-instruct-q4.gguf',
    cache_dir='/models/gguf',
    n_gpu_layers=20
)
"
```
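## Verifying the Tree

After running the downloads and conversions above, it can be useful to sanity-check that nothing is missing or truncated (e.g. an interrupted `wget` leaving a partial GGUF). The sketch below is a hypothetical helper, not part of `convert_models.sh`: the relative paths come from the layout above, and the minimum sizes are rough lower bounds derived from the version table.

```python
# Hypothetical sanity check for the /models tree. File names follow the
# layout section; minimum sizes are conservative approximations of the
# sizes listed in the Model Versions table (assumption, not authoritative).
from pathlib import Path

# Expected files and rough minimum sizes in bytes.
EXPECTED = {
    "onnx/scrfd_10g_bnkps.onnx": 15 * 1024**2,          # ~17MB
    "onnx/arcface_r100.onnx": 150 * 1024**2,            # ~166MB
    "onnx/ecapa_tdnn.onnx": 80 * 1024**2,               # ~87MB
    "piper/en_US-lessac-medium.onnx": 50 * 1024**2,     # ~63MB
    "gguf/phi-3-mini-4k-instruct-q4_k_m.gguf": 2 * 1024**3,  # ~2.2GB
}

def check_models(root="/models"):
    """Return a list of (relative_path, problem) for missing or truncated files."""
    problems = []
    for rel, min_size in EXPECTED.items():
        p = Path(root) / rel
        if not p.exists():
            problems.append((rel, "missing"))
        elif p.stat().st_size < min_size:
            problems.append((rel, f"truncated ({p.stat().st_size} bytes)"))
    return problems

if __name__ == "__main__":
    for rel, why in check_models():
        print(f"WARN {rel}: {why}")
```

A check like this is cheap enough to run at container start, before the first inference request warms up the TensorRT engines.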