Voice notes

Configure voice transcription for Discord and Telegram bots. Use local Whisper or NVIDIA NIM for speech-to-text.

Send voice messages to free-claude-code via Discord or Telegram. The bot transcribes audio and processes it as a normal text prompt.

How it works

  1. You send a voice message in Discord or Telegram
  2. The bot downloads the audio file
  3. Audio is transcribed using the configured backend (local Whisper or NVIDIA NIM)
  4. Transcribed text is sent to the LLM as a prompt
  5. Claude responds to the transcribed content
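
The steps above can be sketched as a small handler. This is a minimal illustration, not the project's actual code: `VoiceMessage`, `handle_voice_message`, and the two callables are hypothetical names, and the download step is assumed to have already produced raw bytes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceMessage:
    """Hypothetical container for an incoming voice message."""
    platform: str  # "discord" or "telegram"
    audio: bytes   # raw audio bytes, already downloaded (step 2)


def handle_voice_message(
    message: VoiceMessage,
    transcribe: Callable[[bytes], str],
    ask_llm: Callable[[str], str],
) -> str:
    """Transcribe the audio (step 3) and treat the text as a normal prompt (steps 4-5)."""
    text = transcribe(message.audio).strip()
    if not text:
        # Nothing intelligible was transcribed, so skip the LLM call.
        return "Could not make out any speech in that voice message."
    return ask_llm(text)


if __name__ == "__main__":
    # Stub backends stand in for the real transcriber and LLM.
    reply = handle_voice_message(
        VoiceMessage("telegram", b"fake-audio"),
        transcribe=lambda audio: " list the files in this repo ",
        ask_llm=lambda prompt: f"LLM received: {prompt}",
    )
    print(reply)
```

Passing the transcriber and the LLM call in as plain callables keeps the flow identical for both backends; only the `transcribe` function would differ.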

Backend options

| Backend | Cost | Speed | Privacy | Requirements |
|---|---|---|---|---|
| Local Whisper | Free | Medium | Fully offline | CPU or GPU, ~2 GB disk |
| NVIDIA NIM | API credits | Fast | Audio sent to NVIDIA | NVIDIA API key |

Local Whisper setup

Local Whisper runs entirely on your machine. No internet required after initial model download.

Install

# Source install
uv sync --extra voice_local

# Package install
uv tool install "free-claude-code[voice_local] @ git+https://github.com/Alishahryar1/free-claude-code.git"

Configure

VOICE_NOTE_ENABLED=true
WHISPER_DEVICE="cpu"        # or "cuda" for GPU acceleration
WHISPER_MODEL="base"        # model size (see table below)

Model sizes

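| Model | Parameters | Speed | Quality | VRAM (GPU) |
|---|---|---|---|---|
| tiny | 39 M | Fastest | Basic | ~1 GB |
| base | 74 M | Fast | Good | ~1 GB |
| small | 244 M | Medium | Better | ~2 GB |
| medium | 769 M | Slower | Great | ~5 GB |
| large-v2 | 1550 M | Slow | Excellent | ~10 GB |
| large-v3 | 1550 M | Slow | Excellent | ~10 GB |
| large-v3-turbo | 809 M | Medium | Excellent | ~6 GB |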

Recommendation: Start with base for CPU, large-v3-turbo for GPU.
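
As a rough illustration of choosing a model from available VRAM, the sketch below encodes a subset of the table (the non-turbo rows, with one `large` entry). `pick_model` and `MODEL_VRAM_GB` are hypothetical names, not part of free-claude-code, and the VRAM figures are approximate.

```python
from typing import Optional

# (model, approximate VRAM in GB), smallest first. Subset of the table above.
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def pick_model(vram_gb: Optional[float]) -> str:
    """Pick the largest listed model whose VRAM estimate fits.

    Pass None for CPU-only machines; that returns "base", matching the
    recommendation above.
    """
    if vram_gb is None:
        return "base"
    best = "tiny"  # fallback when even the smallest estimate does not fit
    for name, needed in MODEL_VRAM_GB:
        if needed <= vram_gb:
            best = name
    return best


if __name__ == "__main__":
    for vram in (None, 4, 8, 12):
        print(vram, "->", pick_model(vram))
```

The result would go into `WHISPER_MODEL`. Leave headroom in practice, since other processes may also be using the GPU.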

GPU acceleration

On NVIDIA GPUs, CUDA typically makes transcription much faster than CPU inference (often 10x or more, depending on the model and hardware):

WHISPER_DEVICE="cuda"

Ensure CUDA drivers are installed. The first run downloads PyTorch with CUDA support (~2GB).
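
To check whether `cuda` is a usable setting before changing the config, a small probe like the one below can help. It is a sketch that assumes the local backend relies on PyTorch; `detect_whisper_device` is a hypothetical helper, not a function shipped with the project.

```python
import importlib.util


def detect_whisper_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so only CPU can be assumed.
        return "cpu"
    try:
        import torch
    except ImportError:
        # Installed but not importable (for example, a broken install).
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f'WHISPER_DEVICE="{detect_whisper_device()}"')
```

If this prints `cpu` on a machine with an NVIDIA GPU, the usual causes are missing drivers or a CPU-only PyTorch build.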

Hugging Face token (optional)

Model downloads work without a token. Authenticated requests get higher Hugging Face rate limits, which can speed up downloads:

HF_TOKEN="hf_xxxxxxxxxxxxxxxx"

Get a token at huggingface.co/settings/tokens.

NVIDIA NIM setup

NVIDIA NIM provides cloud-based transcription via API.

Install

# Source install
uv sync --extra voice

# Package install
uv tool install "free-claude-code[voice] @ git+https://github.com/Alishahryar1/free-claude-code.git"

Configure

VOICE_NOTE_ENABLED=true
WHISPER_DEVICE="nvidia_nim"
WHISPER_MODEL="openai/whisper-large-v3"
NVIDIA_NIM_API_KEY="nvapi-your-key"
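
The variables above depend on each other: `nvidia_nim` only makes sense with an API key. The sketch below shows one way to validate that combination at startup. `VoiceSettings` and `load_voice_settings` are hypothetical names, and the default values are assumptions rather than the project's documented defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_DEVICES = {"cpu", "cuda", "nvidia_nim"}


@dataclass(frozen=True)
class VoiceSettings:
    enabled: bool
    device: str
    model: str
    nim_api_key: Optional[str]


def load_voice_settings(env: Mapping[str, str] = os.environ) -> VoiceSettings:
    """Read the voice-related variables and reject inconsistent combinations."""
    # Defaults below are assumptions made for this sketch.
    enabled = env.get("VOICE_NOTE_ENABLED", "false").strip().lower() == "true"
    device = env.get("WHISPER_DEVICE", "cpu").strip().lower()
    model = env.get("WHISPER_MODEL", "base").strip()
    key = env.get("NVIDIA_NIM_API_KEY") or None

    if device not in VALID_DEVICES:
        raise ValueError(f"Unknown WHISPER_DEVICE: {device!r}")
    if enabled and device == "nvidia_nim" and not key:
        raise ValueError("WHISPER_DEVICE=nvidia_nim requires NVIDIA_NIM_API_KEY")
    return VoiceSettings(enabled, device, model, key)


if __name__ == "__main__":
    print(load_voice_settings())
```

Failing fast on a missing key gives a clearer error than a failed transcription request later on.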

NIM voice models

| Model | Quality | Use case |
|---|---|---|
| openai/whisper-large-v3 | Excellent | General transcription |
| nvidia/parakeet-ctc-1.1b-asr | Excellent | English optimized |

Discord voice notes

Discord voice messages are automatically transcribed when VOICE_NOTE_ENABLED=true.

  1. Hold the microphone icon in Discord
  2. Record your message
  3. Release to send
  4. The bot transcribes and responds

Supported formats: Discord’s default .ogg Opus format.
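
When debugging a failed transcription, it can help to confirm that a downloaded file really is Ogg Opus. The check below is a heuristic sketch, not the bot's own validation: it relies on Ogg pages starting with the `OggS` capture pattern and on Opus streams carrying an `OpusHead` identification packet in the first page. `looks_like_ogg_opus` is a hypothetical helper.

```python
def looks_like_ogg_opus(data: bytes) -> bool:
    """Heuristically detect an Ogg container holding an Opus stream."""
    # Every Ogg page begins with the 4-byte capture pattern "OggS".
    if data[:4] != b"OggS":
        return False
    # The Opus identification header starts with "OpusHead" and sits in the
    # first page, shortly after the page header and its segment table.
    return b"OpusHead" in data[:64]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print(looks_like_ogg_opus(f.read(64)))
```

Only the first 64 bytes are inspected, so the check is cheap but does not prove the rest of the file is valid.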

Telegram voice notes

Telegram voice messages work identically:

  1. Hold the microphone icon in Telegram
  2. Record your message
  3. Release to send
  4. The bot transcribes and responds

Troubleshooting

“Voice transcription failed”: Check that the voice extras are installed (uv sync --extra voice_local for local Whisper, or uv sync --extra voice for NVIDIA NIM).

Slow transcription on CPU: Local Whisper on CPU is usable but slow. Switch to WHISPER_DEVICE="cuda" if you have an NVIDIA GPU, or use WHISPER_MODEL="tiny" for faster (lower quality) results.

Out of memory errors: The Whisper model is too large for your VRAM. Use a smaller model or CPU inference.

“NVIDIA NIM voice failed”: Verify your NVIDIA_NIM_API_KEY is correct and has voice API access.

Poor transcription quality: Move up to a larger model (small → medium → large-v3) and make sure the audio input is clear. Background noise significantly reduces accuracy.

Privacy considerations

Local Whisper: Transcription runs on your machine, so audio is never sent to a transcription service. Models are downloaded from Hugging Face once; after that, no network access is needed.

NVIDIA NIM: Audio is sent to NVIDIA’s servers for transcription. Consider it if you need the fastest transcription and accept cloud processing.

Discord/Telegram: Voice messages pass through Discord/Telegram servers before reaching your bot. This is inherent to the platform, not free-claude-code.

Disabling voice

To disable voice note processing:

VOICE_NOTE_ENABLED=false

The bot will ignore voice messages when disabled.