SaltyBot Wake Word Models

Current Model: hey_salty.npy

Issue #393 — Custom OpenWakeWord model for "hey salty" wake phrase detection.

Model Details

  • File: hey_salty.npy
  • Type: Log-mel spectrogram template (numpy array)
  • Shape: (40, 61) — 40 mel bands, ~61 time frames
  • Generation Method: Synthetic speech using sine-wave approximation
  • Integration: Used by wake_word_node.py via cosine similarity matching

How It Works

The wake_word_node subscribes to raw PCM-16 audio at 16 kHz mono and:

  1. Maintains a sliding window of the last 1.5 seconds of audio
  2. Extracts log-mel spectrogram features every 100 ms
  3. Compares the log-mel features to this template via cosine similarity
  4. Fires a detection event (/saltybot/wake_word_detected → True) when both gates pass:
    • Energy gate: RMS amplitude > threshold (default 0.02)
    • Match gate: Cosine similarity > threshold (default 0.82)
  5. Applies cooldown (default 2.0 s) to prevent rapid re-fires
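The gating logic in steps 1-5 can be sketched as a small standalone class. This is a minimal approximation for illustration only; the class and method names here are hypothetical and do not reflect the actual wake_word_node API, and the caller is assumed to crop/pad the incoming log-mel window to the template shape:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two equal-shape feature arrays."""
    a, b = a.ravel(), b.ravel()
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.dot(a, b) / denom) if denom > 0.0 else 0.0

class WakeWordGate:
    """Energy gate + template match + cooldown, mirroring steps 4-5 above."""

    def __init__(self, template, energy_threshold=0.02,
                 match_threshold=0.82, cooldown_s=2.0):
        self.template = template            # log-mel template, e.g. (40, 61)
        self.energy_threshold = energy_threshold
        self.match_threshold = match_threshold
        self.cooldown_s = cooldown_s
        self._last_fire = -np.inf

    def step(self, audio_window, log_mel, t_now):
        """Return True if a detection should fire for this window."""
        # Energy gate: ignore quiet windows
        rms = float(np.sqrt(np.mean(np.square(audio_window, dtype=np.float64))))
        if rms <= self.energy_threshold:
            return False
        # Match gate: cosine similarity against the stored template
        if cosine_similarity(log_mel, self.template) <= self.match_threshold:
            return False
        # Cooldown: suppress rapid re-fires
        if t_now - self._last_fire < self.cooldown_s:
            return False
        self._last_fire = t_now
        return True
```

The cooldown is stateful, so one gate instance must persist across audio windows.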

Configuration (wake_word_params.yaml)

template_path: "jetson/ros2_ws/src/saltybot_social/models/hey_salty.npy"
energy_threshold: 0.02      # RMS gate
match_threshold: 0.82       # cosine-similarity threshold
cooldown_s: 2.0             # minimum gap between detections (s)

Adjust match_threshold to control sensitivity:

  • Lower (e.g., 0.75) → more sensitive, higher false-positive rate
  • Higher (e.g., 0.90) → less sensitive, more robust to noise

Retraining with Real Recordings (Future)

To improve accuracy, follow these steps on a development machine:

1. Collect Training Data

Record 10-20 natural utterances of "hey salty" in varied conditions:

  • Different speakers (male, female, child)
  • Different background noise (quiet room, kitchen, outdoor)
  • Different distances from microphone

# Using arecord (ALSA) on Jetson or Linux; -d 2 stops each take after 2 seconds:
for i in {1..20}; do
  echo "Recording sample $i. Say 'hey salty'..."
  arecord -r 16000 -f S16_LE -c 1 -d 2 "hey_salty_${i}.wav"
done
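Before extracting templates, it is worth confirming that every take actually matches the format the node expects (16 kHz, mono, PCM-16). A quick sanity check using only the standard library, assuming the hey_salty_*.wav naming from the loop above:

```python
import glob
import wave

def check_wav(path):
    """Return (format_ok, duration_s) for an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as w:
        ok = (w.getframerate() == 16000      # -r 16000
              and w.getnchannels() == 1      # -c 1
              and w.getsampwidth() == 2)     # S16_LE -> 2 bytes/sample
        return ok, w.getnframes() / w.getframerate()

for path in sorted(glob.glob("hey_salty_*.wav")):
    ok, dur = check_wav(path)
    print(f"{path}: {'OK' if ok else 'WRONG FORMAT'} ({dur:.1f}s)")
```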

2. Extract Templates from Training Data

Use the same DSP pipeline as wake_word_node.py:

import numpy as np
import scipy.io.wavfile
import scipy.signal
from glob import glob

from wake_word_node import compute_log_mel

samples = []
for wav_file in sorted(glob("hey_salty_*.wav")):
    sr, data = scipy.io.wavfile.read(wav_file)
    float_data = data / 32768.0  # convert PCM-16 to [-1, 1]
    if sr != 16000:
        # Resample to the 16 kHz rate the node expects
        float_data = scipy.signal.resample_poly(float_data, 16000, sr)
    log_mel = compute_log_mel(float_data, sr=16000, n_fft=512, n_mels=40)
    samples.append(log_mel)

# Pad all spectrograms to the same length, then average
max_len = max(m.shape[1] for m in samples)
padded = [np.pad(m, ((0, 0), (0, max_len - m.shape[1])), mode='edge')
          for m in samples]
template = np.mean(padded, axis=0).astype(np.float32)
np.save("hey_salty.npy", template)
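As a quick sanity check on the averaged template, you can measure how similar each training clip is to it; match_threshold should sit comfortably below the minimum of these scores. A sketch under that assumption (training_similarities is an illustrative helper, not part of the node):

```python
import numpy as np

def training_similarities(padded, template):
    """Cosine similarity of each padded training spectrogram vs. the template."""
    t = template.ravel()
    t = t / np.linalg.norm(t)          # unit-normalize the template once
    sims = []
    for m in padded:
        v = m.ravel()
        sims.append(float(v @ t / np.linalg.norm(v)))
    return sims

# With `padded` and `template` from the averaging step above:
#   sims = training_similarities(padded, template)
#   print(f"min={min(sims):.3f} mean={np.mean(sims):.3f}")
```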

3. Test and Tune

  1. Replace the current template with your new one
  2. Test with wake_word_node in real environment
  3. Adjust match_threshold in wake_word_params.yaml to find the sweet spot
  4. Collect false-positive and false-negative cases; add them to training set
  5. Retrain
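Step 3 can be made systematic by scoring a labeled set of positive and negative clips once, then sweeping candidate thresholds over those scores. A minimal sketch; the helper name and the example scores below are made up for illustration:

```python
import numpy as np

def sweep_thresholds(pos_scores, neg_scores, thresholds):
    """(threshold, false-negative rate, false-positive rate) per candidate."""
    pos = np.asarray(pos_scores)   # cosine scores on real "hey salty" clips
    neg = np.asarray(neg_scores)   # cosine scores on background/noise clips
    rows = []
    for t in thresholds:
        fn = float(np.mean(pos <= t))   # wake phrases missed
        fp = float(np.mean(neg > t))    # spurious fires
        rows.append((t, fn, fp))
    return rows

# Example with made-up scores:
for t, fn, fp in sweep_thresholds([0.90, 0.85, 0.80], [0.50, 0.70, 0.84],
                                  [0.75, 0.82, 0.90]):
    print(f"match_threshold={t:.2f}  FN={fn:.2f}  FP={fp:.2f}")
```

Pick the threshold whose false-negative/false-positive trade-off fits the deployment environment, then confirm it live on the robot.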

4. Version Control

Once satisfied, replace models/hey_salty.npy and commit:

git add jetson/ros2_ws/src/saltybot_social/models/hey_salty.npy
git commit -m "refactor: hey salty template with real training data (v2)"

Files

  • generate_wake_word_template.py — Script that synthesizes speech and generates the current template
  • hey_salty.npy — Current template (generated from synthetic speech)
  • README.md — This file

References

  • wake_word_node.py — Wake word detection node (cosine similarity, energy gating)
  • wake_word_params.yaml — Detection parameters
  • test_wake_word.py — Unit tests for DSP pipeline

Future Improvements

  • Collect real user recordings
  • Fine-tune with multiple speakers/environments
  • Evaluate false-positive rate
  • Consider speaker-adaptive templates (per user)
  • Explore end-to-end learned models (TinyWakeWord, etc.)