3 Commits

Author SHA1 Message Date
5043578934 feat(social): speech pipeline + LLM conversation + TTS + orchestrator (#81 #83 #85 #89)
Issue #81 — Speech pipeline:
- speech_pipeline_node.py: OpenWakeWord "hey_salty" → Silero VAD → faster-whisper
  STT (Orin GPU, <500ms wake-to-transcript) → ECAPA-TDNN speaker diarization
- speech_utils.py: pcm16↔float32, EnergyVad, UtteranceSegmenter (pre-roll, max-
  duration), cosine speaker identification — all pure Python, no ROS2/GPU needed
- Publishes /social/speech/transcript (SpeechTranscript) + /social/speech/vad_state

Issue #83 — Conversation engine:
- conversation_node.py: llama-cpp-python GGUF (Phi-3-mini Q4_K_M, 20 GPU layers),
  streaming token output, per-person sliding-window context (4K tokens), summary
  compression, SOUL.md system prompt, group mode
- llm_context.py: PersonContext, ContextStore (JSON persistence), build_llama_prompt
  (ChatML format), context compression via LLM summarization
- Publishes /social/conversation/response (ConversationResponse, partial + final)

Issue #85 — Streaming TTS:
- tts_node.py: Piper ONNX streaming synthesis, sentence-by-sentence first-chunk
  streaming (<200ms to first audio), sounddevice USB speaker playback, volume control
- tts_utils.py: split_sentences, pcm16_to_wav_bytes, chunk_pcm, apply_volume, strip_ssml

Issue #89 — Pipeline orchestrator:
- orchestrator_node.py: IDLE→LISTENING→THINKING→SPEAKING state machine, GPU memory
  watchdog (throttle at <2GB free), rolling latency stats (p50/p95 per stage),
  VAD watchdog (alert if speech pipeline hangs), /social/orchestrator/state JSON pub
- social_bot.launch.py: brings up all 4 nodes with TimerAction delays

New messages: SpeechTranscript.msg, VadState.msg, ConversationResponse.msg
Config YAMLs: speech_params, conversation_params, tts_params, orchestrator_params
Tests: 58 tests (28 speech_utils + 30 llm_context/tts_utils), all passing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 08:23:19 -05:00
5143e5bfac feat(social): Issue #86 — physical expression + motor attention
ESP32-C3 NeoPixel sketch (esp32/social_expression/social_expression.ino):
  - Adafruit NeoPixel + ArduinoJson, serial JSON protocol 115200 8N1
  - Mood→colour: happy=green, curious=blue, annoyed=red, playful=rainbow
  - Idle breathing animation (sine-modulated warm white)
  - Auto-falls to idle after IDLE_TIMEOUT_MS (3 s) with no command

ROS2 saltybot_social_msgs (new package):
  - Mood.msg — {mood, intensity}
  - Person.msg — {track_id, bearing_rad, distance_m, confidence, is_speaking, source}
  - PersonArray.msg — {persons[], active_id}

ROS2 saltybot_social (new package):
  - expression_node: subscribes /social/mood → JSON serial to ESP32-C3
    reconnects on port error; sends idle frame after idle_timeout_s
  - attention_node: subscribes /social/persons → /cmd_vel rotation-only
    proportional control with dead zone; prefers active speaker, falls
    back to highest-confidence person; lost-target idle after 2 s
  - launch/social.launch.py — combined launch
  - config YAML for both nodes with documented parameters
  - test/test_attention.py — 15 pytest-only unit tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 23:35:59 -05:00
84790412d6 feat(social): multi-modal person state tracker (Issue #82) 2026-03-01 23:08:22 -05:00