Set up speech services with Speaches
Speaches is an OpenAI-compatible speech server for speech-to-text (STT) and text-to-speech (TTS). It comes with pre-loaded models, so you can use it out of the box or integrate it as a drop-in backend for any app that supports the OpenAI SDK.
This guide walks you through installing and using Speaches on Olares, including speech-to-text, text-to-speech, Audio Chat, API access, and basic model management.
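Because Speaches exposes an OpenAI-compatible API, an app built on the official OpenAI SDKs can usually be redirected to it through the OPENAI_BASE_URL and OPENAI_API_KEY environment variables, which those SDKs read at startup. The snippet below is a minimal sketch: the address is a placeholder assumption (8000 is the port the upstream Speaches container commonly listens on), and it assumes no API key has been configured on the Speaches side.

```bash
# Placeholder address: replace with the URL where your Speaches API is reachable.
# Keep the /v1 suffix; the SDKs append paths such as /audio/speech to it.
export OPENAI_BASE_URL="http://localhost:8000/v1"
# The SDKs refuse to start without a key, so set a placeholder value
# (assumes Speaches is not configured to require a real key).
export OPENAI_API_KEY="placeholder"
```

Any app started from this shell that uses the OpenAI SDK defaults then sends its audio requests to Speaches instead of OpenAI.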
Learning objectives
In this guide, you will learn how to:
- Install Speaches on Olares.
- Transcribe or translate audio files using speech-to-text.
- Generate speech from text using text-to-speech.
- Have voice conversations with an AI model using Audio Chat.
- Access the Speaches API from other apps.
- Manage speech models.
Prerequisites
- Olares is running on a device with an NVIDIA GPU.
- Ollama installed and running with at least one chat model downloaded (required for Audio Chat only).
Install Speaches
Open Market and search for "Speaches".

Click Get, then Install, and wait for installation to complete.
After installation, you will see two icons on Launchpad:
- Speaches: The main interface for speech-to-text, text-to-speech, and audio chat.
- Speaches Terminal: A command-line terminal for managing models.
Model setup on first launch
When you open Speaches for the first time, it downloads and initializes its built-in models. Depending on your network connection, this process may take some time.
If initialization does not finish within 30 minutes, it may time out and be canceled automatically. If this happens, wait until your network connection is stable, then open Speaches again to retry initialization.
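If you prefer to watch first-launch initialization from a terminal, a polling loop can report when the API starts answering. This is a sketch under stated assumptions: SPEACHES_BASE_URL is a placeholder, wait_for_speaches is a hypothetical helper, and it treats a successful GET /v1/models (the OpenAI-compatible model listing) as a sign the service is up. The 1800-second default mirrors the 30-minute limit above.

```bash
# Placeholder address: replace with the URL where your Speaches API is reachable.
SPEACHES_BASE_URL="${SPEACHES_BASE_URL:-http://localhost:8000}"

# wait_for_speaches [max_seconds] [interval_seconds]
# Returns 0 once GET /v1/models succeeds, 1 if the deadline passes first.
wait_for_speaches() {
  local max="${1:-1800}" interval="${2:-10}" waited=0
  while [ "$waited" -lt "$max" ]; do
    # -f makes curl fail on HTTP errors returned while the server is starting.
    if curl -fsS -o /dev/null "$SPEACHES_BASE_URL/v1/models"; then
      echo "Speaches is ready after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "Speaches did not become ready within ${max}s" >&2
  return 1
}
```

For example, wait_for_speaches 1800 15 checks every 15 seconds for up to 30 minutes.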
Use Speaches
Speaches ships with two models ready to use out of the box:
| Model | Type | Purpose |
|---|---|---|
| Systran/faster-whisper-small | STT | Speech recognition and translation |
| speaches-ai/Kokoro-82M-v1.0-ONNX | TTS | Speech synthesis |
Transcribe audio
Open Speaches and click the Speech-to-Text tab.
Under Model, select an STT model, such as Systran/faster-whisper-small.
Under Task, select transcribe.
Upload an audio file, or click the mic icon to record audio from your microphone.
(Optional) Enable Stream if you want to receive partial results while transcription is still in progress.
Click Generate.

The transcription appears in Textbox after processing completes.
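Other apps request the same transcription over the OpenAI-compatible API. The sketch below assumes the standard OpenAI endpoint POST /v1/audio/transcriptions with multipart fields file and model, and a placeholder SPEACHES_BASE_URL; transcribe_file is a hypothetical helper name.

```bash
# Placeholder address: replace with the URL where your Speaches API is reachable.
SPEACHES_BASE_URL="${SPEACHES_BASE_URL:-http://localhost:8000}"

# transcribe_file <audio_file> [model]
# Uploads the file as multipart/form-data, like the Speech-to-Text tab
# with Task set to transcribe.
transcribe_file() {
  local file="$1" model="${2:-Systran/faster-whisper-small}"
  curl -fsS "$SPEACHES_BASE_URL/v1/audio/transcriptions" \
    -F "file=@${file}" \
    -F "model=${model}"
}
```

For example, transcribe_file meeting.wav should print a JSON object with a text field, which is the OpenAI default response format.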
Translate audio to English
Speaches can automatically detect the language of the audio and translate it into English.
- Open Speaches and click the Speech-to-Text tab.
- Under Model, select an STT model, such as Systran/faster-whisper-small.
- Under Task, select translate.
- Upload an audio file, or click the mic icon to record audio from your microphone.
- (Optional) Enable Stream if you want to receive partial results while translation is still in progress.
- Click Generate.

The English translation appears in Textbox after processing completes.
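The OpenAI-compatible counterpart for translation is POST /v1/audio/translations. This sketch (hypothetical helper translate_file, placeholder SPEACHES_BASE_URL) also sends response_format=text so the English translation prints as plain text instead of JSON; that field comes from the OpenAI API and is assumed to be honored by Speaches.

```bash
# Placeholder address: replace with the URL where your Speaches API is reachable.
SPEACHES_BASE_URL="${SPEACHES_BASE_URL:-http://localhost:8000}"

# translate_file <audio_file> [model]
# Detects the spoken language and prints an English translation as plain text.
translate_file() {
  local file="$1" model="${2:-Systran/faster-whisper-small}"
  curl -fsS "$SPEACHES_BASE_URL/v1/audio/translations" \
    -F "file=@${file}" \
    -F "model=${model}" \
    -F "response_format=text"
}
```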
Generate speech from text
Open Speaches and click the Text-to-Speech tab.
Enter the text you want to convert in Input Text.
Under Model, select a TTS model.
Select a voice from Voice.
Under Response Format, select an output format.
Click Generate Speech.

Play the generated audio and download it if needed.
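Text-to-speech maps to POST /v1/audio/speech with a JSON body containing model, voice, input, and response_format, as in the OpenAI API. In this sketch, the voice name af_heart is an assumption (use a name from the Voice list in the UI), SPEACHES_BASE_URL is a placeholder, and speak and json_escape are hypothetical helpers; json_escape only handles quotes, backslashes, tabs, and newlines.

```bash
# Placeholder address: replace with the URL where your Speaches API is reachable.
SPEACHES_BASE_URL="${SPEACHES_BASE_URL:-http://localhost:8000}"

# Minimal JSON string escaping: backslashes and quotes are escaped,
# newlines and tabs become spaces.
json_escape() {
  printf '%s' "$1" | tr '\n\t' '  ' | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# speak <text> <output_file> [voice] [format]
# Synthesizes <text> with the built-in Kokoro model and saves it to <output_file>.
speak() {
  local text="$1" out="$2" voice="${3:-af_heart}" format="${4:-mp3}" body
  body="$(printf '{"model":"%s","voice":"%s","input":"%s","response_format":"%s"}' \
    "speaches-ai/Kokoro-82M-v1.0-ONNX" "$voice" "$(json_escape "$text")" "$format")"
  curl -fsS "$SPEACHES_BASE_URL/v1/audio/speech" \
    -H "Content-Type: application/json" \
    -d "$body" \
    -o "$out"
}
```

For example, speak "Hello from Olares" hello.mp3 writes the generated audio to hello.mp3.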
Chat with AI using voice
Use Audio Chat to talk to an AI model with voice, text, or an audio file. Speaches first converts your voice to text, sends the text to the chat model, and can convert the reply back to speech.
INFO
- Audio Chat requires Ollama to be installed, with at least one chat model downloaded.
- Audio playback is currently available for English replies only. For other languages, the reply is shown as text only.
Start a voice conversation
Open Speaches and click the Audio Chat tab.
Under Chat Model, select an Ollama model, such as qwen2.5:7b.
Send a message using one of these methods:
- Audio file: Upload an audio file.
- Text: Type your message in the input field next to the microphone icon and send it.
- Voice: Click the mic icon to record your message, then click the send icon to send it.

Wait for Speaches to generate the reply.
WARNING
The full voice pipeline (STT, LLM, TTS) takes time to complete, and the UI might flicker while a reply is being generated. Do not refresh the page during processing.
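The voice pipeline described at the start of this section (speech-to-text, chat model, text-to-speech) can be approximated outside the UI with three API calls. This is not how the Audio Chat tab is implemented internally; it is an illustrative sketch that assumes Ollama's OpenAI-compatible API at a placeholder OLLAMA_BASE_URL (11434 is Ollama's default port), the default models named in this guide, the hypothetical voice af_heart, and jq being installed to extract the reply. audio_chat and json_escape are hypothetical helpers.

```bash
# Placeholder addresses: adjust to where Speaches and Ollama are reachable.
SPEACHES_BASE_URL="${SPEACHES_BASE_URL:-http://localhost:8000}"
OLLAMA_BASE_URL="${OLLAMA_BASE_URL:-http://localhost:11434/v1}"

# Minimal JSON string escaping: backslashes and quotes are escaped,
# newlines and tabs become spaces.
json_escape() {
  printf '%s' "$1" | tr '\n\t' '  ' | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# audio_chat <question_audio> <reply_audio> [chat_model]
# Prints the text reply and writes the spoken reply to <reply_audio>.
audio_chat() {
  local question reply
  # 1. Speech-to-text: plain-text transcript of the question.
  question="$(curl -fsS "$SPEACHES_BASE_URL/v1/audio/transcriptions" \
    -F "file=@$1" -F "model=Systran/faster-whisper-small" \
    -F "response_format=text")" || return 1
  # 2. Chat model through Ollama's OpenAI-compatible endpoint; jq extracts the reply.
  reply="$(curl -fsS "$OLLAMA_BASE_URL/chat/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\":\"${3:-qwen2.5:7b}\",\"messages\":[{\"role\":\"user\",\"content\":\"$(json_escape "$question")\"}]}" \
    | jq -r '.choices[0].message.content')" || return 1
  # 3. Text-to-speech: synthesize the reply into the output file.
  curl -fsS "$SPEACHES_BASE_URL/v1/audio/speech" \
    -H "Content-Type: application/json" \
    -d "{\"model\":\"speaches-ai/Kokoro-82M-v1.0-ONNX\",\"voice\":\"af_heart\",\"input\":\"$(json_escape "$reply")\"}" \
    -o "$2" || return 1
  printf '%s\n' "$reply"
}
```

As in the UI, the spoken reply from the English Kokoro voice is only meaningful for English text.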
Optional: Improve transcription accuracy for Audio Chat
Audio Chat uses the pre-installed Systran/faster-whisper-small speech-to-text model by default. For better transcription accuracy, you can switch to a larger model such as Systran/faster-whisper-large-v3.
More GPU resources may be required
Larger models require more GPU resources. If generation tasks start failing after switching to a larger model, see "Why do tasks fail after switching to a larger model?" in the FAQs.
Open Speaches Terminal and download the model:
```bash
hf download Systran/faster-whisper-large-v3
```
If you see a warning about HF_TOKEN, you can ignore it. The download still proceeds without this setting.
Go to Settings > Applications > Speaches > Manage environment variables.
Click the edit icon (edit_square) next to SPEACHES_WHISPER_MODEL.
Set the value to the model you downloaded, for example, Systran/faster-whisper-large-v3, then click Confirm.
Click Apply to save the changes.
Speaches restarts automatically to apply the change.
Wait for service initialization
After the app shows as running again, wait a little longer before using it, as the service may still be initializing.
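To confirm from a terminal that the restarted service serves the new model, you can check the model listing. This sketch assumes GET /v1/models on Speaches reports locally available models by ID and that the placeholder SPEACHES_BASE_URL points at the service; model_is_listed is a hypothetical helper.

```bash
# Placeholder address: replace with the URL where your Speaches API is reachable.
SPEACHES_BASE_URL="${SPEACHES_BASE_URL:-http://localhost:8000}"

# model_is_listed <model_id>
# Succeeds if the exact model ID (in quotes) appears in the /v1/models response.
model_is_listed() {
  curl -fsS "$SPEACHES_BASE_URL/v1/models" | grep -qF "\"$1\""
}
```

For example: model_is_listed Systran/faster-whisper-large-v3 && echo "model available".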
Manage models
Manage models when you want to use a different model, improve quality, or free up storage space.
Check downloaded models
To see all downloaded models, open Speaches Terminal and run:
```bash
hf cache list
```
Download a new model
Open Speaches Terminal and run:
```bash
hf download <model-name>
```
For example:
```bash
# Download a larger Whisper model for higher accuracy
hf download Systran/faster-whisper-medium
# Highest accuracy Whisper model, requires more memory
hf download Systran/faster-whisper-large-v3
```
Shared model storage
Models are downloaded to Olares Files, at /Home/Huggingface/speaches/. If other apps on your Olares also use Hugging Face models, they share this directory.
Refresh the Speaches page to load the new model into the list.
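Hugging Face tools store each model repository in a cache folder named models--<org>--<name>. The hypothetical helper below prints that path for a repo ID, resolving the cache root from HF_HUB_CACHE, then HF_HOME, then the library default ~/.cache/huggingface (a simplification that ignores XDG_CACHE_HOME and legacy variable names). It assumes the Speaches Terminal points one of these variables at the shared model directory.

```bash
# hf_cache_dir <repo_id>
# Prints <cache_root>/models--<org>--<name> for a Hugging Face model repo.
hf_cache_dir() {
  local cache="${HF_HUB_CACHE:-${HF_HOME:-$HOME/.cache/huggingface}/hub}"
  # Repo IDs use "/" between org and name; the cache folder uses "--".
  printf '%s/models--%s\n' "$cache" "$(printf '%s' "$1" | sed 's|/|--|g')"
}
```

For example, ls "$(hf_cache_dir Systran/faster-whisper-medium)/snapshots" lists the downloaded revisions of that model.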
Remove a model
To free up storage space, you can remove models you no longer need:
- Open Speaches Terminal and run:
```bash
hf cache rm model/<model_name>
```
For example:
```bash
hf cache rm model/Systran/faster-whisper-medium
```
- Refresh the Speaches page to update the model list.
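Before removing a model, it can help to see how much space it takes. This sketch relies on the Hugging Face cache layout, where each repo lives in a folder named models--<org>--<name> under HF_HUB_CACHE (or HF_HOME/hub, or ~/.cache/huggingface/hub by default), and assumes the Speaches Terminal resolves that root to the shared model directory. model_disk_usage is a hypothetical helper; du does not follow the snapshot symlinks by default, so it counts the real files once.

```bash
# model_disk_usage <repo_id>
# Prints the on-disk size of a cached model, or fails if it is not cached.
model_disk_usage() {
  local cache="${HF_HUB_CACHE:-${HF_HOME:-$HOME/.cache/huggingface}/hub}"
  local dir="$cache/models--$(printf '%s' "$1" | sed 's|/|--|g')"
  if [ ! -d "$dir" ]; then
    echo "not cached: $1" >&2
    return 1
  fi
  # -s: one total for the folder, -h: human-readable size; keep only the size column.
  du -sh "$dir" | cut -f1
}
```

For example, model_disk_usage Systran/faster-whisper-medium prints a size such as a few hundred megabytes, which you can weigh before running the removal command.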
Switch to CPU mode
Speaches uses GPU mode by default. If needed, you can switch it to CPU mode instead. CPU mode is slower and is mainly suitable for small tasks.
To switch to CPU mode:
Go to Settings > Applications > Speaches > Manage environment variables.
Click the edit icon (edit_square) next to SPEACHES_GPU, change its value to false, then click Confirm.
Click Apply to save the changes.
Speaches automatically redeploys in CPU mode. Processing will be slower compared to GPU mode.
FAQs
Why does Audio Chat show an error?
Audio Chat requires Ollama to be running with at least one chat model downloaded. If Ollama is not installed or has no models available, Audio Chat displays an error.
To fix this issue, install Ollama and download a chat model by following the Ollama guide. Speaches detects Ollama automatically, so you do not need to restart Speaches.
Why do tasks fail after switching to a larger model?
This issue usually happens when the GPU is in Memory slicing mode.
Larger models require more VRAM. If Speaches is assigned only a small amount of VRAM, generation tasks may fail after you switch to a larger model.
To fix this issue:
- Increase the VRAM assigned to Speaches in Memory slicing mode.
- Or switch the GPU to another mode.
For detailed instructions, see Manage GPU resources.
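To compare VRAM usage before and after switching models, nvidia-smi's CSV query mode gives a compact per-GPU summary. This sketch assumes nvidia-smi is available in the shell you use, which is not guaranteed inside the Speaches Terminal; in Memory slicing mode the figures it reports may not match the slice assigned to Speaches. gpu_memory_report is a hypothetical helper.

```bash
# gpu_memory_report
# Prints used, total, and free VRAM (MiB) for each GPU.
gpu_memory_report() {
  nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits |
    while IFS=', ' read -r idx used total; do
      echo "GPU $idx: ${used} MiB used of ${total} MiB ($((total - used)) MiB free)"
    done
}
```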
Can I use a different Ollama instance for Audio Chat?
Yes. Update the CHAT_COMPLETION_BASE_URL in the deployment configuration:
Open Control Hub and navigate to Browse > System > speachesserver-shared > Deployments > speaches.
Click the edit icon (edit_square) to edit the YAML file.

In Edit YAML, find CHAT_COMPLETION_BASE_URL and update its value to your Ollama endpoint. Make sure the URL ends with /v1.
Go to Settings > Applications > Speaches, click Stop, then click Resume to restart Speaches.
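Because the URL must end with /v1, a small helper can normalize it before you paste it into the YAML, and a probe can confirm the endpoint answers. Both helpers are hypothetical; the probe relies on GET /v1/models, which Ollama's OpenAI-compatible API serves. Run it from a machine that can reach the endpoint, and treat any example address as a placeholder.

```bash
# normalize_chat_base_url <url>
# Strips trailing slashes and appends /v1 when it is missing.
normalize_chat_base_url() {
  local url="$1"
  while [ "${url%/}" != "$url" ]; do
    url="${url%/}"
  done
  case "$url" in
    */v1) printf '%s\n' "$url" ;;
    *) printf '%s/v1\n' "$url" ;;
  esac
}

# check_chat_endpoint <url>
# Lists the models the OpenAI-compatible endpoint reports; fails on HTTP errors.
check_chat_endpoint() {
  curl -fsS "$(normalize_chat_base_url "$1")/models"
}
```

For example, normalize_chat_base_url http://192.168.1.10:11434 prints http://192.168.1.10:11434/v1.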
Learn more
- Speaches official documentation: Full API reference and model compatibility.
- Ollama: Download and run local AI models.
- Open WebUI: Chat interface that can use Speaches as a speech backend.