Configuration
The Aloud TTS plugin offers a variety of settings to customize your text-to-speech experience. You can access them by navigating to Settings → Aloud Text To Speech.
General Settings
These settings control the core behavior of the plugin.
- TTS Provider: Choose the text-to-speech service you want to use. Each provider offers different voices and pricing. Supported providers include OpenAI, Google Gemini, Hume AI, Inworld, ElevenLabs, Azure Speech Services, MiniMax, AWS Polly, and any OpenAI-compatible API.
- Playback Speed: Adjust the default playback speed. The default is `1.0x`. This can also be adjusted from the player UI.
- Audio Folder: The directory in your vault where exported audio files are saved. The default is `aloud/`.
Cache Settings
These settings control the audio cache behavior.
- Cache Storage:
  - Device: (Default) Stores audio on your local device’s storage (IndexedDB). The cache is not synced across devices.
  - Vault: Stores audio in a `.tts` folder inside your vault. This allows the cache to be synced, but increases your vault’s size.
- Cache Duration: How long audio files are kept in the cache before being automatically deleted. The default is 7 days.
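The Cache Duration rule can be pictured as a simple age check. The following is a minimal sketch of that idea, not the plugin's actual code: `is_expired` and `DEFAULT_CACHE_DAYS` are hypothetical names, and only the 7-day default comes from the setting above.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_CACHE_DAYS = 7  # default Cache Duration from the settings above


def is_expired(created_at: datetime, now: datetime,
               max_age_days: int = DEFAULT_CACHE_DAYS) -> bool:
    """Return True when a cached entry is older than the allowed age."""
    return now - created_at > timedelta(days=max_age_days)


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
print(is_expired(datetime(2025, 1, 1, tzinfo=timezone.utc), now))  # 9 days old -> True
print(is_expired(datetime(2025, 1, 5, tzinfo=timezone.utc), now))  # 5 days old -> False
```

An entry that is exactly at the limit is kept in this sketch; how the plugin treats that boundary is not documented here.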
Provider Settings
Each TTS provider has its own specific settings. You only need to configure the provider you have selected.
OpenAI
- API Key: Your API key from OpenAI.
- Model: The TTS model to use (e.g., `tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`).
- Voice: The voice to use for playback.
Google Gemini
- API Key: Your API key for the Gemini API.
- Model: The Gemini model to use.
- Voice: The voice to use for playback.
Hume AI
- API Key: Your API key from Hume AI.
- Voice: The Hume AI voice to use.
Inworld
- API Key: Your API key from Inworld, in Basic Auth (base64) format.
- Model: The Inworld TTS model to use:
  - Standard (`inworld-tts-1`)
  - Max Quality (`inworld-tts-1-max`)
- Voice: The Inworld voice to use. The voice list is loaded from the API after entering a valid API key.
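If the voice list does not load, it can help to check that the pasted key really is in Basic Auth base64 form. The sketch below is a heuristic under stated assumptions: it assumes the key decodes to `id:secret` text (the usual HTTP Basic layout) and that it is sent as an `Authorization: Basic ...` header. Both function names and the sample credential are made up for illustration.

```python
import base64
import binascii


def looks_like_basic_credential(key: str) -> bool:
    """Heuristic: valid base64 that decodes to 'id:secret' text."""
    try:
        decoded = base64.b64decode(key, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    left, sep, right = decoded.partition(":")
    return bool(left and sep and right)


def basic_auth_header(key: str) -> dict[str, str]:
    """Build the Authorization header from an already-encoded key (assumed scheme)."""
    return {"Authorization": f"Basic {key}"}


# Made-up credential, encoded the way HTTP Basic auth encodes 'id:secret'.
sample = base64.b64encode(b"example-id:example-secret").decode("ascii")
print(looks_like_basic_credential(sample))         # True
print(looks_like_basic_credential("not base64!"))  # False
```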
OpenAI-Compatible API
Use this provider if you self-host a TTS service or use a third-party provider with an OpenAI-compatible API.
- API Key: The API key for your service.
- API Base URL: The URL of your API endpoint (e.g., `http://localhost:8020/v1`).
- Model: The name of the model your service uses.
- Voice: The name of the voice to use.
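To check that a self-hosted service is reachable with these four values, a request like the one below can be built outside Obsidian. This is a sketch, assuming the service implements the OpenAI-style `POST {base URL}/audio/speech` route with a JSON body of `model`, `voice`, and `input`. The function name `build_speech_request` and the key, model, and voice values are placeholders, and whether the plugin sends exactly this request is not confirmed here.

```python
import json
import urllib.request


def build_speech_request(base_url: str, api_key: str, model: str,
                         voice: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-style /audio/speech route."""
    url = base_url.rstrip("/") + "/audio/speech"
    body = json.dumps({"model": model, "voice": voice, "input": text}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values; substitute the ones your service expects.
req = build_speech_request("http://localhost:8020/v1", "sk-local",
                           "tts-1", "alloy", "Hello from my vault.")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; a compatible service is expected to answer with audio bytes.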
ElevenLabs
- API Key: Your API key from ElevenLabs.
- Model: One of the supported ElevenLabs TTS models (e.g., `eleven_multilingual_v2`, `eleven_flash_v2.5`).
- Voice: The ElevenLabs voice ID to use.
- Stability: Controls how stable/consistent the voice is (0–1).
- Similarity Boost: Controls how closely the output matches the base voice (0–1).
- Context Mode: Optionally include previous sentences for continuity.
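As a rough illustration of how these settings could map onto an ElevenLabs text-to-speech request body, the sketch below clamps both sliders to the 0–1 range and optionally attaches earlier text. The field names (`text`, `model_id`, `voice_settings`, `previous_text`) follow the public ElevenLabs API; that Context Mode uses `previous_text` is an assumption, and `build_elevenlabs_payload` is a hypothetical helper rather than plugin code.

```python
from typing import Optional


def clamp01(value: float) -> float:
    """Keep a slider value inside the documented 0-1 range."""
    return max(0.0, min(1.0, value))


def build_elevenlabs_payload(text: str, model_id: str, stability: float,
                             similarity_boost: float,
                             previous_text: Optional[str] = None) -> dict:
    """Assemble a request body; the voice ID goes in the URL, not the body."""
    payload = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": clamp01(stability),
            "similarity_boost": clamp01(similarity_boost),
        },
    }
    if previous_text:
        # Assumed mapping for Context Mode: pass the preceding sentences along.
        payload["previous_text"] = previous_text
    return payload


print(build_elevenlabs_payload("Second sentence.", "eleven_multilingual_v2",
                               0.5, 0.75, previous_text="First sentence."))
```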
Azure Speech Services
- API Key: Your Azure Speech Services API key.
- Region: Your Azure Speech resource region (e.g., `eastus`).
- Voice: The Azure voice to use (e.g., `en-US-JennyNeural`).
- Output Format: Select the desired audio format (e.g., MP3/WAV variants).
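The sketch below shows how these four settings fit together in a request to the Azure Speech REST API: the region selects the host, the voice goes into the SSML body, and the output format is sent as a header. The helper names are hypothetical, the `audio-24khz-48kbitrate-mono-mp3` value is just one of Azure's documented formats, and deriving the language from the first two parts of the voice name is a simplification. It illustrates the REST API rather than the plugin's own request code.

```python
from xml.sax.saxutils import escape


def azure_tts_endpoint(region: str) -> str:
    """Regional text-to-speech endpoint of the Azure Speech REST API."""
    return f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"


def build_ssml(voice: str, text: str) -> str:
    """Wrap text in minimal SSML; language is guessed from the voice name."""
    lang = "-".join(voice.split("-")[:2])  # "en-US-JennyNeural" -> "en-US"
    return (
        f'<speak version="1.0" xml:lang="{lang}">'
        f'<voice name="{voice}">{escape(text)}</voice>'
        "</speak>"
    )


def build_headers(api_key: str, output_format: str) -> dict[str, str]:
    return {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "aloud-example",  # placeholder application name
    }


print(azure_tts_endpoint("eastus"))
print(build_ssml("en-US-JennyNeural", "Tom & Jerry"))
print(build_headers("YOUR_KEY", "audio-24khz-48kbitrate-mono-mp3"))
```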
MiniMax
- Use China Mainland Endpoint: Toggle this if you have an API key from the China mainland platform (platform.minimaxi.com). Leave disabled for the international platform (platform.minimax.io). API keys from one platform are not compatible with the other.
- API Key: Your API key from MiniMax.
- GroupId: Your MiniMax GroupId. This is required and is appended to requests as a `?GroupId=` query parameter.
- Model: One of: `speech-2.6-hd`, `speech-2.6-turbo`, `speech-02-hd`, `speech-02-turbo`, `speech-01-hd`, `speech-01-turbo`.
- Voice: A supported voice ID.
Notes:
- MiniMax operates two separate platforms with incompatible API keys:
  - International (default): platform.minimax.io, with a higher minimum recharge (~$25)
  - China Mainland: platform.minimaxi.com, with a lower entry cost
- The MiniMax integration currently uses non-streaming synthesis and returns `mp3` audio decoded from `hex`.
- Advanced options such as `language_boost`, `timbre_weights`, or streaming are not yet exposed in settings.
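Two details above lend themselves to a short illustration: the GroupId travelling as a query parameter, and the audio arriving as a hex string that is decoded into MP3 bytes. The following is a minimal sketch with hypothetical helper names; the URL is a placeholder, not a real MiniMax endpoint, and the hex sample is just the three-byte `ID3` tag that many MP3 files start with.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_group_id(url: str, group_id: str) -> str:
    """Append GroupId to the URL's query string, keeping existing parameters."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query) + [("GroupId", group_id)]
    return urlunsplit(parts._replace(query=urlencode(query)))


def decode_hex_audio(hex_audio: str) -> bytes:
    """Turn a hex-encoded audio string into raw bytes."""
    return bytes.fromhex(hex_audio)


# Placeholder endpoint and GroupId, for illustration only.
print(with_group_id("https://api.example.invalid/v1/tts", "12345"))
print(decode_hex_audio("494433"))  # b'ID3'
```

The decoded bytes could then be written to a `.mp3` file or handed to an audio player.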
AWS Polly
- AWS Access Key ID: The access key ID of an IAM user with `polly:SynthesizeSpeech` and `polly:DescribeVoices` permissions.
- AWS Secret Access Key: The corresponding secret key.
- Region: The AWS region to use (e.g., `us-east-1`).
- Voice: The Polly voice to use (e.g., `Joanna`).
- Engine: Select `neural` (recommended where available) or `standard`.
Notes:
- Keys are stored locally inside Obsidian’s plugin data. For best security, consider using a limited-permission IAM user.
- The voice list is loaded from Polly when credentials and region are set.
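A limited-permission IAM user can be given a policy that allows only the two Polly actions named above. The sketch below prints such a policy document as JSON, ready to paste into the IAM console. It is one reasonable shape for the policy, not an official recommendation from the plugin; `polly_policy` is a hypothetical helper name, and `"Resource": "*"` is an assumption that keeps the example simple.

```python
import json


def polly_policy() -> dict:
    """Minimal IAM policy document allowing only the two required Polly actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["polly:SynthesizeSpeech", "polly:DescribeVoices"],
                "Resource": "*",  # assumption: not scoped to specific resources
            }
        ],
    }


print(json.dumps(polly_policy(), indent=2))
```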