Voice notes

Configure voice transcription for Discord and Telegram bots. Use local Whisper or NVIDIA NIM for speech-to-text.

Send voice messages to free-claude-code via Discord or Telegram. The bot transcribes audio and processes it as a normal text prompt.

How it works

You send a voice message in Discord or Telegram
The bot downloads the audio file
The audio is transcribed using the configured backend (local Whisper or NVIDIA NIM)
Transcribed text is sent to the LLM as a prompt
Claude responds to the transcribed content
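
The steps above can be sketched as a small pipeline. This is an illustrative outline only, not the project's actual code: `VoiceMessage`, `handle_voice_message`, and the three callables passed into it are hypothetical names, and stub functions stand in for the real download, transcription, and LLM calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceMessage:
    """Hypothetical container for an incoming voice note."""
    audio_url: str


def handle_voice_message(
    message: VoiceMessage,
    download: Callable[[str], bytes],
    transcribe: Callable[[bytes], str],
    ask_llm: Callable[[str], str],
) -> str:
    """Run the download -> transcribe -> prompt steps in order."""
    audio = download(message.audio_url)  # fetch the audio file
    text = transcribe(audio).strip()     # speech-to-text
    if not text:
        # Nothing usable came back, so skip the LLM call entirely.
        return "Could not understand the voice message."
    return ask_llm(text)                 # treat the transcript as a normal text prompt


# Stub callables stand in for the real network and model calls.
reply = handle_voice_message(
    VoiceMessage(audio_url="https://example.invalid/note.ogg"),
    download=lambda url: b"fake-audio",
    transcribe=lambda audio: " list the files ",
    ask_llm=lambda prompt: f"LLM received: {prompt}",
)
print(reply)
```

Passing the three stages in as callables keeps the flow independent of the platform (Discord or Telegram) and of the transcription backend.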

Backend options

Backend	Cost	Speed	Privacy	Requirements
Local Whisper	Free	Medium	Fully offline	CPU or GPU, ~2 GB disk
NVIDIA NIM	API credits	Fast	Audio sent to NVIDIA	NVIDIA API key

Local Whisper setup

Local Whisper runs entirely on your machine. No internet required after initial model download.

Install

# Source install
uv sync --extra voice_local

# Package install
uv tool install "free-claude-code[voice_local] @ git+https://github.com/Alishahryar1/free-claude-code.git"

Configure

VOICE_NOTE_ENABLED=true
WHISPER_DEVICE="cpu"        # or "cuda" for GPU acceleration
WHISPER_MODEL="base"        # model size (see table below)
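
For readers who want to see how such variables might be consumed, here is a minimal sketch of reading them in Python. The `VoiceSettings` class, the `load_voice_settings` function, the accepted truthy strings, and the fallback defaults are all assumptions made for illustration; the project may parse its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical settings object for the three variables shown above."""
    enabled: bool
    device: str
    model: str


def load_voice_settings(env: Mapping[str, str] = os.environ) -> VoiceSettings:
    """Read voice settings from an environment-like mapping."""
    # Assumed defaults: voice disabled, CPU device, "base" model.
    raw_enabled = env.get("VOICE_NOTE_ENABLED", "false").strip().lower()
    return VoiceSettings(
        enabled=raw_enabled in ("1", "true", "yes"),
        device=env.get("WHISPER_DEVICE", "cpu").strip().lower(),
        model=env.get("WHISPER_MODEL", "base").strip(),
    )


settings = load_voice_settings(
    {"VOICE_NOTE_ENABLED": "true", "WHISPER_DEVICE": "cpu", "WHISPER_MODEL": "base"}
)
print(settings)
```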

Model sizes

Model	Parameters	Speed	Quality	VRAM (GPU, approx.)
`tiny`	39M	Fastest	Basic	1 GB
`base`	74M	Fast	Good	1 GB
`small`	244M	Medium	Better	2 GB
`medium`	769M	Slower	Great	5 GB
`large-v2`	1550M	Slow	Excellent	10 GB
`large-v3`	1550M	Slow	Excellent	10 GB
`large-v3-turbo`	809M	Medium	Excellent	6 GB

Recommendation: Start with base for CPU, large-v3-turbo for GPU.
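
If you know roughly how much VRAM is free, the choice can be automated. The sketch below uses an abbreviated copy of the table above; `VRAM_GB` and `largest_model_that_fits` are hypothetical helper names, not part of free-claude-code, and the figures are approximate.

```python
from typing import Optional

# Approximate VRAM needs in GB, abbreviated from the table above,
# ordered from smallest to largest model.
VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def largest_model_that_fits(available_gb: float) -> Optional[str]:
    """Return the largest listed model whose VRAM need fits, or None."""
    best = None
    for name, needed_gb in VRAM_GB:
        if needed_gb <= available_gb:
            best = name  # later entries are larger, so keep overwriting
    return best


for gb in (1, 4, 8, 12):
    print(gb, "GB ->", largest_model_that_fits(gb))
```

Leave some headroom in practice: other processes on the GPU and longer audio clips both increase memory use.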

GPU acceleration

For NVIDIA GPUs, set the device to CUDA. Transcription is typically 10x or more faster than on CPU:

WHISPER_DEVICE="cuda"

Ensure CUDA drivers are installed. The first run downloads PyTorch with CUDA support (~2GB).
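
To check whether CUDA is actually usable before setting the variable, a short probe like the following can help. It is a sketch that assumes PyTorch is the runtime in use; `pick_whisper_device` is a hypothetical helper name, and it falls back to "cpu" when PyTorch is not installed.

```python
def pick_whisper_device() -> str:
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # only present once the local voice extra is installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f'WHISPER_DEVICE="{pick_whisper_device()}"')
```

If this prints "cpu" on a machine with an NVIDIA GPU, the driver or the CUDA-enabled PyTorch build is the likely culprit.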

Hugging Face token (optional)

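Model downloads work without a token. Setting a Hugging Face token authenticates your requests, which can make downloads faster and less likely to be rate limited: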

HF_TOKEN="hf_xxxxxxxxxxxxxxxx"

Get a token at huggingface.co/settings/tokens.

NVIDIA NIM setup

NVIDIA NIM provides cloud-based transcription via API.

Install

# Source install
uv sync --extra voice

# Package install
uv tool install "free-claude-code[voice] @ git+https://github.com/Alishahryar1/free-claude-code.git"

Configure

VOICE_NOTE_ENABLED=true
WHISPER_DEVICE="nvidia_nim"
WHISPER_MODEL="openai/whisper-large-v3"
NVIDIA_NIM_API_KEY="nvapi-your-key"
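
The `WHISPER_DEVICE` value effectively selects the backend: "cpu" and "cuda" mean local Whisper, while "nvidia_nim" means the cloud API. A validation step along these lines could catch a missing key early. `resolve_backend` and the returned labels are hypothetical, and the real project may validate its configuration differently.

```python
from typing import Mapping

LOCAL_DEVICES = {"cpu", "cuda"}


def resolve_backend(env: Mapping[str, str]) -> str:
    """Map WHISPER_DEVICE to a backend label and check required settings."""
    device = env.get("WHISPER_DEVICE", "").strip().lower()
    if device in LOCAL_DEVICES:
        return "local_whisper"
    if device == "nvidia_nim":
        # The cloud backend cannot work without an API key.
        if not env.get("NVIDIA_NIM_API_KEY", "").strip():
            raise ValueError(
                "NVIDIA_NIM_API_KEY is required when WHISPER_DEVICE=nvidia_nim"
            )
        return "nvidia_nim"
    raise ValueError(f"Unsupported WHISPER_DEVICE: {device!r}")


print(resolve_backend({"WHISPER_DEVICE": "cuda"}))
print(resolve_backend(
    {"WHISPER_DEVICE": "nvidia_nim", "NVIDIA_NIM_API_KEY": "nvapi-your-key"}
))
```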

NIM voice models

Model	Quality	Use Case
`openai/whisper-large-v3`	Excellent	General transcription
`nvidia/parakeet-ctc-1.1b-asr`	Excellent	English optimized

Discord voice notes

Discord voice messages are automatically transcribed when VOICE_NOTE_ENABLED=true.

Hold the microphone icon in Discord
Record your message
Release to send
The bot transcribes and responds

Supported formats: Discord’s default .ogg Opus format.
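
When debugging a failed transcription, it can help to confirm that a downloaded attachment really is Ogg Opus. The sketch below relies on two facts about the container format: every Ogg page starts with the bytes "OggS", and the first packet of an Opus stream starts with "OpusHead". `looks_like_ogg_opus` is a hypothetical helper, and the sample bytes are a hand-built minimal page header rather than real audio.

```python
def looks_like_ogg_opus(data: bytes) -> bool:
    """Heuristic check for an Ogg container carrying an Opus stream."""
    # Every Ogg page begins with the capture pattern "OggS".
    if not data.startswith(b"OggS"):
        return False
    # The Opus identification header follows the short first page header,
    # so it should appear within the first few dozen bytes.
    return b"OpusHead" in data[:64]


# Hand-built first page: 27-byte page header, a 1-byte segment table,
# then a 19-byte "OpusHead" packet (zero-filled after the magic string).
sample = (
    b"OggS"
    + bytes([0, 2])    # version, header type (beginning of stream)
    + bytes(20)        # granule position, serial, sequence, checksum
    + bytes([1, 19])   # one segment of 19 bytes
    + b"OpusHead"
    + bytes(11)
)

print(looks_like_ogg_opus(sample))
print(looks_like_ogg_opus(b"RIFF....WAVEfmt "))
```

This is only a quick sanity check. It does not validate checksums or decode any audio.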

Telegram voice notes

Telegram voice messages work identically:

Hold the microphone icon in Telegram
Record your message
Release to send
The bot transcribes and responds

Troubleshooting

“Voice transcription failed”: Check that the voice extras are installed (uv sync --extra voice_local or uv sync --extra voice).

Slow transcription on CPU: Local Whisper on CPU is usable but slow. Switch to WHISPER_DEVICE="cuda" if you have an NVIDIA GPU, or use WHISPER_MODEL="tiny" for faster (lower quality) results.

Out of memory errors: The Whisper model is too large for your VRAM. Use a smaller model or CPU inference.

“NVIDIA NIM voice failed”: Verify your NVIDIA_NIM_API_KEY is correct and has voice API access.

Poor transcription quality: Upgrade to a larger model (small → medium → large-v3) and make sure the audio input is clear. Background noise significantly reduces accuracy.

Privacy considerations

Local Whisper: Audio never leaves your machine. Models are downloaded from Hugging Face once, and transcription then runs fully offline.

NVIDIA NIM: Audio is sent to NVIDIA’s servers for transcription. Consider it if you need the fastest transcription and accept cloud processing.

Discord/Telegram: Voice messages pass through Discord/Telegram servers before reaching your bot. This is inherent to the platform, not free-claude-code.

Disabling voice

To disable voice note processing:

VOICE_NOTE_ENABLED=false

The bot will ignore voice messages when disabled.