Voice notes
Configure voice transcription for Discord and Telegram bots. Use local Whisper or NVIDIA NIM for speech-to-text.
Send voice messages to free-claude-code via Discord or Telegram. The bot transcribes audio and processes it as a normal text prompt.
How it works
- You send a voice message in Discord or Telegram
- The bot downloads the audio file
- Audio is transcribed using the configured Whisper backend
- Transcribed text is sent to the LLM as a prompt
- Claude responds to the transcribed content
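The steps above can be sketched as a single function. This is only an illustration of the flow, not the project's actual code: `VoiceNote`, `handle_voice_note`, and the three callables passed in are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceNote:
    """Hypothetical container for an incoming voice message."""
    url: str       # where the platform hosts the audio file
    platform: str  # "discord" or "telegram"


def handle_voice_note(
    note: VoiceNote,
    download: Callable[[str], bytes],
    transcribe: Callable[[bytes], str],
    ask_llm: Callable[[str], str],
) -> str:
    """Run the download -> transcribe -> prompt steps in order."""
    audio = download(note.url)        # fetch the audio file
    text = transcribe(audio).strip()  # speech-to-text via the configured backend
    if not text:
        # Assumed behavior for empty transcripts; the real bot may differ.
        return "Could not make out any speech in that voice note."
    return ask_llm(text)              # the transcript is treated as a normal prompt
```

Passing the three stages in as callables keeps the sketch independent of any particular Discord, Telegram, or Whisper library.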
Backend options
| Backend | Cost | Speed | Privacy | Requirements |
|---|---|---|---|---|
| Local Whisper | Free | Medium | Fully offline | CPU or GPU, ~2 GB disk |
| NVIDIA NIM | API credits | Fast | Audio sent to NVIDIA | NVIDIA API key |
Local Whisper setup
Local Whisper runs entirely on your machine. No internet required after initial model download.
Install
# Source install
uv sync --extra voice_local
# Package install
uv tool install "free-claude-code[voice_local] @ git+https://github.com/Alishahryar1/free-claude-code.git"
Configure
VOICE_NOTE_ENABLED=true
WHISPER_DEVICE="cpu" # or "cuda" for GPU acceleration
WHISPER_MODEL="base" # model size (see table below)
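For illustration, here is a minimal sketch of how these three variables might be read and validated. It is not the project's loader: `VoiceSettings` and `load_voice_settings` are hypothetical names, and the fallback defaults and the set of accepted truthy strings are assumptions made for this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSettings:
    enabled: bool
    device: str
    model: str


def load_voice_settings(env=None) -> VoiceSettings:
    """Read the voice variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Assumption: "1", "true", and "yes" all count as enabled.
    raw_enabled = env.get("VOICE_NOTE_ENABLED", "false").strip().lower()
    enabled = raw_enabled in {"1", "true", "yes"}
    # Assumption: fall back to "cpu" and "base" when a variable is unset.
    device = env.get("WHISPER_DEVICE", "cpu").strip().lower()
    model = env.get("WHISPER_MODEL", "base").strip()
    if device not in {"cpu", "cuda", "nvidia_nim"}:
        raise ValueError(f"unsupported WHISPER_DEVICE: {device!r}")
    return VoiceSettings(enabled, device, model)
```

Rejecting an unknown device at load time surfaces a typo such as `WHISPER_DEVICE="gpu"` immediately rather than at the first voice message.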
Model sizes
| Model | Parameters | Speed | Quality | VRAM (GPU) |
|---|---|---|---|---|
| tiny | 39M | Fastest | Basic | ~1 GB |
| base | 74M | Fast | Good | ~1 GB |
| small | 244M | Medium | Better | ~2 GB |
| medium | 769M | Slower | Great | ~5 GB |
| large-v2 | 1550M | Slow | Excellent | ~10 GB |
| large-v3 | 1550M | Slow | Excellent | ~10 GB |
| large-v3-turbo | 809M | Fast | Excellent | ~6 GB |
Recommendation: Start with base for CPU, large-v3-turbo for GPU.
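If you are unsure which model your GPU can hold, the VRAM column can be turned into a small lookup. The sketch below uses only the tiny through large-v3 rows of the table; `MODELS_BY_VRAM` and `largest_model_that_fits` are hypothetical names, and the figures are approximate, so leave some headroom.

```python
from typing import Optional

# (model, approximate VRAM in GB), ordered smallest to largest,
# taken from the tiny through large-v3 rows of the table above.
MODELS_BY_VRAM = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def largest_model_that_fits(vram_gb: float) -> Optional[str]:
    """Return the largest listed model whose VRAM need is within vram_gb."""
    fitting = [name for name, need in MODELS_BY_VRAM if need <= vram_gb]
    return fitting[-1] if fitting else None
```

For example, `largest_model_that_fits(4)` returns `"small"`, since `medium` needs about 5 GB.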
GPU acceleration
For NVIDIA GPUs, enable CUDA. Depending on the GPU and model size, transcription can be 10x or more faster than on CPU:
WHISPER_DEVICE="cuda"
Ensure CUDA drivers are installed. The first run downloads PyTorch with CUDA support (~2GB).
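Before switching to `cuda`, it can help to confirm that PyTorch actually sees a GPU. The following is a sketch of such a check, with a CPU fallback; `resolve_device` is a hypothetical helper, and whether free-claude-code itself falls back this way is not something this page states.

```python
def resolve_device(requested: str) -> str:
    """Return "cuda" only when PyTorch reports a usable GPU, else "cpu".

    Any other requested value (for example "cpu") is returned unchanged.
    """
    if requested != "cuda":
        return requested
    try:
        import torch  # may be absent if the local voice extra is not installed
    except ImportError:
        return "cpu"
    # torch.cuda.is_available() is False when drivers or a CUDA build are missing.
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Running `resolve_device("cuda")` in the bot's environment and getting `"cpu"` back suggests a driver or PyTorch build problem rather than a Whisper problem.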
Hugging Face token (optional)
Model downloads work without a token, but authenticated requests are less likely to be rate-limited. To use one, set a Hugging Face token:
HF_TOKEN="hf_xxxxxxxxxxxxxxxx"
Get a token at huggingface.co/settings/tokens.
NVIDIA NIM setup
NVIDIA NIM provides cloud-based transcription via API.
Install
# Source install
uv sync --extra voice
# Package install
uv tool install "free-claude-code[voice] @ git+https://github.com/Alishahryar1/free-claude-code.git"
Configure
VOICE_NOTE_ENABLED=true
WHISPER_DEVICE="nvidia_nim"
WHISPER_MODEL="openai/whisper-large-v3"
NVIDIA_NIM_API_KEY="nvapi-your-key"
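A missing key is an easy mistake with this backend, so a startup check can be useful. This is a minimal sketch, assuming the settings arrive as a plain mapping; `missing_nim_settings` is a hypothetical helper, not part of free-claude-code.

```python
from typing import List, Mapping


def missing_nim_settings(env: Mapping[str, str]) -> List[str]:
    """List NIM variables that are unset or blank.

    Returns an empty list when the NIM backend is not selected.
    """
    if env.get("WHISPER_DEVICE") != "nvidia_nim":
        return []
    required = ("WHISPER_MODEL", "NVIDIA_NIM_API_KEY")
    return [name for name in required if not env.get(name, "").strip()]
```

Calling it with `os.environ` at startup and logging any names it returns gives a clearer message than a failed API call later.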
NIM voice models
| Model | Quality | Use Case |
|---|---|---|
| openai/whisper-large-v3 | Excellent | General transcription |
| nvidia/parakeet-ctc-1.1b-asr | Excellent | English optimized |
Discord voice notes
Discord voice messages are automatically transcribed when VOICE_NOTE_ENABLED=true.
- Hold the microphone icon in Discord
- Record your message
- Release to send
- The bot transcribes and responds
Supported formats: Discord’s default .ogg Opus format.
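When debugging a failed transcription, it can be worth checking that the downloaded bytes really are Ogg Opus. The sketch below relies on two markers from the container format: every Ogg page begins with `OggS`, and an Opus stream's first packet begins with `OpusHead`. `looks_like_ogg_opus` is a hypothetical helper, and this is a quick heuristic rather than a full parser.

```python
def looks_like_ogg_opus(data: bytes) -> bool:
    """Heuristic check for an Ogg container carrying an Opus stream."""
    # An Ogg page starts with the 4-byte capture pattern "OggS".
    if data[:4] != b"OggS":
        return False
    # The Opus identification header follows the short page header,
    # so it appears within the first few dozen bytes of the file.
    return b"OpusHead" in data[:64]
```

A file that fails this check is more likely a download or format problem than a Whisper problem.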
Telegram voice notes
Telegram voice messages work identically:
- Hold the microphone in Telegram
- Record your message
- Release to send
- The bot transcribes and responds
Troubleshooting
“Voice transcription failed”: Check voice extras are installed (uv sync --extra voice_local or --extra voice).
Slow transcription on CPU: Local Whisper on CPU is usable but slow. Switch to WHISPER_DEVICE="cuda" if you have an NVIDIA GPU, or use WHISPER_MODEL="tiny" for faster (lower quality) results.
Out of memory errors: The Whisper model is too large for your VRAM. Use a smaller model or CPU inference.
“NVIDIA NIM voice failed”: Verify your NVIDIA_NIM_API_KEY is correct and has voice API access.
Transcription quality poor: Upgrade to a larger model (small → medium → large-v3). Ensure clear audio input. Background noise significantly impacts accuracy.
Privacy considerations
Local Whisper: Transcription runs on your machine, so audio is never sent to a third-party transcription service. Models are downloaded from Hugging Face on first use.
NVIDIA NIM: Audio is sent to NVIDIA’s servers for transcription. Consider it if you need the fastest transcription and accept cloud processing.
Discord/Telegram: Voice messages pass through Discord/Telegram servers before reaching your bot. This is inherent to the platform, not free-claude-code.
Disabling voice
To disable voice note processing:
VOICE_NOTE_ENABLED=false
The bot will ignore voice messages when disabled.