feat(social): First Encounter routine — autonomous introduction + person enrollment #400

Closed
opened 2026-03-04 13:01:02 -05:00 by seb · 0 comments
Owner

Overview

When Salty detects an unknown person (no face match in gallery), he should initiate a First Encounter routine — a natural, friendly self-introduction that captures basic details about the new person. This must run fully offline (no internet required); captured data is queued for cloud AI processing when connectivity is available.

Flow

  1. Detection: Unknown face detected (no match in speaker_embeddings.json or face gallery)
  2. Greeting: Salty introduces himself naturally via TTS through the Jabra speaker
    • e.g. "Hey! I don't think we've met. I'm Salty — I'm the little gremlin that lives in this robot."
  3. Name capture: "What's your name?" → STT → store
  4. Small talk: "So what are you up to?" / "What brings you here?" → STT → store context
  5. Face enrollment: Capture face embedding + ECAPA-TDNN voice embedding during conversation
  6. Wrap up: "Nice to meet you, [name]! I'll remember you next time."
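The six steps above can be sketched as a small state machine. This is a minimal illustration only — the phase names, the `advance` helper, and the abort semantics are assumptions; the real logic would live in the `social_enrollment` package:

```python
from enum import Enum, auto

class Phase(Enum):
    # Phases mirror the numbered flow: detect, greet, ask name,
    # small talk, enroll embeddings, wrap up (or abort on walk-away).
    DETECT = auto()
    GREET = auto()
    NAME_CAPTURE = auto()
    SMALL_TALK = auto()
    ENROLL = auto()
    WRAP_UP = auto()
    ABORTED = auto()
    DONE = auto()

# Nominal ordering of the encounter; any phase may jump to ABORTED
# if the person walks out of frame (partial data is still saved).
NEXT = {
    Phase.DETECT: Phase.GREET,
    Phase.GREET: Phase.NAME_CAPTURE,
    Phase.NAME_CAPTURE: Phase.SMALL_TALK,
    Phase.SMALL_TALK: Phase.ENROLL,
    Phase.ENROLL: Phase.WRAP_UP,
    Phase.WRAP_UP: Phase.DONE,
}

def advance(phase: Phase, person_present: bool = True) -> Phase:
    """Move to the next phase, or abort if the person has left."""
    if not person_present and phase not in (Phase.DONE, Phase.ABORTED):
        return Phase.ABORTED
    return NEXT.get(phase, phase)
```

Keeping the transitions in a plain dict makes the graceful-interruption requirement a single check rather than special-case code in every step.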

Data Captured (offline)

  • Face embedding (SCRFD + ArcFace)
  • Voice fingerprint (ECAPA-TDNN speaker embedding)
  • Name (STT transcription)
  • Context/notes (STT transcription of small talk)
  • Timestamp + location context
  • Photo snapshot (RealSense RGB)

Offline Queue

  • Store encounter data as JSON in /home/seb/encounter-queue/
  • When internet available, push to cloud AI (Salty/OpenClaw) for:
    • Memory storage (who this person is, context)
    • Relationship tier assignment (stranger → regular → favorite)
    • Notification to Tee if relevant
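The queue-then-flush behavior could look like the sketch below. The one-file-per-encounter layout, the `push` callback, and the DNS-port connectivity probe are all assumptions — a real node might watch NetworkManager instead of probing:

```python
import json
import socket
import uuid
from pathlib import Path

def enqueue(record: dict, queue_dir: Path) -> Path:
    """Persist one encounter as its own JSON file in the queue dir."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    path = queue_dir / f"encounter-{uuid.uuid4().hex}.json"
    path.write_text(json.dumps(record))
    return path

def have_connectivity(host: str = "8.8.8.8", port: int = 53) -> bool:
    """Cheap reachability probe (TCP to a public DNS server)."""
    try:
        with socket.create_connection((host, port), timeout=2):
            return True
    except OSError:
        return False

def flush(queue_dir: Path, push) -> int:
    """Send every queued encounter via push(record); delete on success."""
    sent = 0
    for path in sorted(queue_dir.glob("encounter-*.json")):
        record = json.loads(path.read_text())
        if push(record):  # push returns True once the cloud side acks
            path.unlink()
            sent += 1
    return sent
```

One file per encounter keeps the flush idempotent: a crash mid-flush just leaves the unsent files behind for the next attempt.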

Requirements

  • Must feel natural, not robotic — conversational pace, appropriate pauses
  • Must work without internet (all models run locally on Orin)
  • Should integrate with existing person_state_tracker_node and social_enrollment package
  • Face display should show Social expression (4) during encounter
  • TTS via Piper (local) through Jabra speaker
  • Encounter can be interrupted gracefully (person walks away → save partial data)
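For the local-TTS requirement, Piper's CLI reads text on stdin and writes a WAV file; a thin wrapper might look like this (flags per Piper's README; the model path is a placeholder, and playback routing to the Jabra device is omitted):

```python
import subprocess

def piper_cmd(model_path: str, wav_out: str) -> list[str]:
    """Command line for local Piper TTS; text is supplied on stdin."""
    return ["piper", "--model", model_path, "--output_file", wav_out]

def speak(text: str, model_path: str, wav_out: str = "/tmp/salty_tts.wav",
          run=subprocess.run) -> None:
    # `run` is injectable so the node can stub it out in tests.
    run(piper_cmd(model_path, wav_out), input=text.encode(), check=True)
```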

Depends On

  • #393 (wake word model)
  • #394 (face display bridge)
  • Working speech pipeline with STT
  • Piper TTS installed on Orin
  • Face recognition pipeline (SCRFD + ArcFace)

Nice to Have

  • Detect language preference (Whisper LID) and switch TTS language
  • Remember partial encounters (saw face before but never completed intro)
  • Multi-person: if two unknowns approach, handle sequentially

Reported by: Salty + Tee (design session)

Reference: seb/saltylab-firmware#400