Issue #81 - Speech pipeline:
- speech_pipeline_node.py: OpenWakeWord "hey_salty" → Silero VAD → faster-whisper STT
  (Orin GPU, <500ms wake-to-transcript) → ECAPA-TDNN speaker diarization
- speech_utils.py: pcm16↔float32, EnergyVad, UtteranceSegmenter (pre-roll, max-duration),
  cosine speaker identification; all pure Python, no ROS2/GPU needed
- Publishes /social/speech/transcript (SpeechTranscript) + /social/speech/vad_state

Issue #83 - Conversation engine:
- conversation_node.py: llama-cpp-python GGUF (Phi-3-mini Q4_K_M, 20 GPU layers),
  streaming token output, per-person sliding-window context (4K tokens),
  summary compression, SOUL.md system prompt, group mode
- llm_context.py: PersonContext, ContextStore (JSON persistence), build_llama_prompt
  (ChatML format), context compression via LLM summarization
- Publishes /social/conversation/response (ConversationResponse, partial + final)

Issue #85 - Streaming TTS:
- tts_node.py: Piper ONNX streaming synthesis, sentence-by-sentence first-chunk
  streaming (<200ms to first audio), sounddevice USB speaker playback, volume control
- tts_utils.py: split_sentences, pcm16_to_wav_bytes, chunk_pcm, apply_volume, strip_ssml

Issue #89 - Pipeline orchestrator:
- orchestrator_node.py: IDLE→LISTENING→THINKING→SPEAKING state machine, GPU memory
  watchdog (throttle at <2GB free), rolling latency stats (p50/p95 per stage),
  VAD watchdog (alert if speech pipeline hangs), /social/orchestrator/state JSON pub
- social_bot.launch.py: brings up all 4 nodes with TimerAction delays

New messages: SpeechTranscript.msg, VadState.msg, ConversationResponse.msg
Config YAMLs: speech_params, conversation_params, tts_params, orchestrator_params
Tests: 58 tests (28 speech_utils + 30 llm_context/tts_utils), all passing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
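
The pure-Python helpers listed for speech_utils.py (pcm16↔float32 conversion and cosine speaker identification) could look roughly like the sketch below. This is not the code from the commit: the function names, signatures, and the 0.6 similarity threshold are assumptions chosen for illustration, and the real module may use NumPy or different scaling.

```python
import math
import struct


def pcm16_to_float32(pcm: bytes) -> list:
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(pcm) // 2  # ignore a trailing odd byte, if any
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return [s / 32768.0 for s in samples]


def float32_to_pcm16(samples) -> bytes:
    """Convert floats to little-endian 16-bit PCM, clipping to [-1.0, 1.0]."""
    ints = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))
        ints.append(int(round(clipped * 32767.0)))
    return struct.pack(f"<{len(ints)}h", *ints)


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(embedding, known, threshold=0.6):
    """Return the best-matching known speaker name, or None below the threshold.

    `known` maps a speaker name to a reference embedding. The threshold value
    is a placeholder assumption, not a number taken from the commit.
    """
    best_name, best_score = None, threshold
    for name, reference in known.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

In use, an embedding produced by the diarization stage would be passed to `identify_speaker` together with a dictionary of enrolled speakers; a `None` result would mean the voice is treated as unknown.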
34 lines
1.3 KiB
XML
<?xml version="1.0"?>
<?xml-model href="http://download.ros.org/schema/package_format3.xsd" schematypens="http://www.w3.org/2001/XMLSchema"?>
<package format="3">
  <name>saltybot_social</name>
  <version>0.1.0</version>
  <description>
    Social interaction layer for saltybot.
    speech_pipeline_node: wake word + VAD + Whisper STT + diarization (Issue #81).
    conversation_node: local LLM with per-person context (Issue #83).
    tts_node: streaming TTS with Piper first-chunk (Issue #85).
    orchestrator_node: pipeline state machine + GPU watchdog + latency profiler (Issue #89).
    person_state_tracker: multi-modal person identity fusion (Issue #82).
    expression_node: LED expression + motor attention (Issue #86).
  </description>
  <maintainer email="seb@vayrette.com">seb</maintainer>
  <license>MIT</license>
  <depend>rclpy</depend>
  <depend>std_msgs</depend>
  <depend>geometry_msgs</depend>
  <depend>sensor_msgs</depend>
  <depend>vision_msgs</depend>
  <depend>saltybot_social_msgs</depend>
  <depend>tf2_ros</depend>
  <depend>tf2_geometry_msgs</depend>
  <depend>cv_bridge</depend>
  <test_depend>ament_copyright</test_depend>
  <test_depend>ament_flake8</test_depend>
  <test_depend>ament_pep257</test_depend>
  <test_depend>python3-pytest</test_depend>
  <export>
    <build_type>ament_python</build_type>
  </export>
</package>
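
The package description mentions a pipeline state machine in orchestrator_node, which the commit message spells out as IDLE→LISTENING→THINKING→SPEAKING. A minimal sketch of such a machine follows. The event names (`wake_word`, `transcript`, `timeout`, `response`, `playback_done`) and the `Orchestrator` class are hypothetical; the actual node presumably drives its transitions from ROS 2 topic callbacks and may define additional transitions.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


# Allowed transitions, keyed by (current state, event).
# Event names are illustrative assumptions, not taken from the source.
TRANSITIONS = {
    (State.IDLE, "wake_word"): State.LISTENING,
    (State.LISTENING, "transcript"): State.THINKING,
    (State.LISTENING, "timeout"): State.IDLE,
    (State.THINKING, "response"): State.SPEAKING,
    (State.SPEAKING, "playback_done"): State.IDLE,
}


class Orchestrator:
    """Tracks the pipeline state; events with no matching transition are ignored."""

    def __init__(self):
        self.state = State.IDLE

    def handle(self, event: str) -> State:
        # Look up the next state, staying put when the event does not apply.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```

Keeping the transitions in a lookup table makes it easy to see which events are legal in each state and to serialize the current state, for example as the JSON published on /social/orchestrator/state.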