Convert your speech to text directly within Raycast using whisper.cpp. This extension provides a simple interface to record audio and transcribe it locally and privately on your machine. You can optionally refine the transcribed text with custom prompts, either privately using Ollama or with Raycast AI or any OpenAI-compatible (v1) API.
whisper.cpp running locally on your machine through Raycast.
Before installing the extension, you need the following installed and configured on your system:
whisper.cpp: You must install whisper-cpp.
brew install whisper-cpp
Download a model using the Download Whisper Model extension command. This will configure the model's path automatically.
Alternatively, download a model file yourself (ggml-{model}.bin) and point the extension to its path in preferences.
sox: This extension uses the SoX (Sound eXchange) utility for audio recording.
brew install sox
Note: The extension currently expects sox to be at /opt/homebrew/bin/sox by default. If yours is installed somewhere else, point the extension to its executable in preferences.
This extension is now available to download from the Raycast Store. However, if you'd prefer to build from source, see below.
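To illustrate the path note above, here is a minimal sketch of how an executable path could be resolved from a preference value with a fallback to common Homebrew locations. This is not the extension's actual code: `resolveExecutable`, `SOX_CANDIDATES`, and the injected `exists` predicate are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper, not taken from the extension's source.
// Candidate locations: Homebrew installs under /opt/homebrew on Apple Silicon
// and under /usr/local on Intel Macs.
const SOX_CANDIDATES: string[] = ["/opt/homebrew/bin/sox", "/usr/local/bin/sox"];

/**
 * Returns the configured path if it exists, otherwise the first candidate
 * that exists, otherwise undefined.
 * `exists` is injected so the logic can be exercised without touching the disk.
 */
function resolveExecutable(
  configured: string | undefined,
  candidates: string[],
  exists: (path: string) => boolean,
): string | undefined {
  if (configured !== undefined && configured !== "" && exists(configured)) {
    return configured;
  }
  for (const candidate of candidates) {
    if (exists(candidate)) {
      return candidate;
    }
  }
  return undefined;
}
```

In a Node.js context, `existsSync` from `node:fs` could be passed as the `exists` argument. An `undefined` result would mean the path has to be set manually in preferences.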
After installing, configure the extension preferences in Raycast. If you installed both SoX and whisper-cpp using Homebrew and downloaded a model using the extension, this should all be pre-configured for you. The extension will also confirm both the SoX and whisper-cli paths on first launch, which allows you to start using simple dictation as soon as the following are configured:
Open Raycast Preferences (⌘ + ,).
Go to Extensions.
Path to your whisper.cpp executable (e.g., /path/to/your/whisper.cpp/build/bin/whisper-cli).
/usr/local/bin/whisper-cpp
Path to the .bin model file (e.g., /path/to/your/whisper.cpp/models/ggml-base.en.bin).
/usr/local/bin/sox
Paste Text: Pastes the text into the active application.
Copy to Clipboard: Copies the text to the clipboard.
None (Show Options): Shows the transcribed text in Raycast with manual Paste/Copy actions (Default).
Configure AI Refinement by default.
Run the Download Whisper Model command and choose the model you would like to download with Enter.
Enter if you have multiple models downloaded.
Ctrl+X
Run the Dictate Text command. The extension window will appear, showing a "RECORDING AUDIO..." message and a waveform animation. Start speaking clearly.
Press Enter when you are finished speaking.
Press ⌘ + . or click "Cancel Recording" to abort.
whisper.cpp processes the audio. This may take a few seconds depending on the audio length and model size.
Paste Text: Pastes the content.
Copy Text (⌘ + Enter): Copies the content.
Preferences
Close (Esc): Closes the Raycast window.
Use Dictation History anytime you need a past transcription. It currently stores up to 100 transcriptions.
Ctrl+X
Ctrl+Shift+X for a fresh start.
Automate the formatting and style of your transcriptions by refining them with AI. This feature can reformat text, correct grammar, or apply custom instructions based on your needs.
How it Works:
Configure AI Refinement and press Enter.
API endpoint (e.g., http://localhost:11434, https://api.openai.com).
Model name (e.g., llama3.2:latest, gpt-4o-latest). If using Ollama, make sure this model is pulled and available: ollama ls.
Use the Configure AI Refinement command to manage how the AI refines your text.
Using the Configure AI Refinement Command:
This command allows you to customize the instructions given to the AI:
Ctrl-E).
Ctrl + X).
When AI refinement is enabled, the text is sent to your chosen AI along with the active prompt after the initial transcription. The refined text is then handled the same way as regularly transcribed text and stored in your dictation history.
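As a rough illustration of the refinement step, the sketch below builds a request for an OpenAI-compatible chat completions endpoint, sending the active prompt as the system message and the transcription as the user message. It is an assumption about how such a request could look, not the extension's actual implementation. `buildRefinementRequest` and its parameter names are hypothetical, and the `/v1/chat/completions` path is the standard OpenAI-style route, which Ollama also exposes.

```typescript
// One message in the OpenAI-style chat completions format.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Everything needed to issue the HTTP POST; no network call happens here.
interface RefinementRequest {
  url: string;
  headers: Record<string, string>;
  body: string; // JSON-encoded payload
}

// Hypothetical helper: pairs the active prompt with the raw transcription.
function buildRefinementRequest(
  baseUrl: string, // e.g. http://localhost:11434 or https://api.openai.com
  model: string, // e.g. llama3.2:latest
  prompt: string, // the active refinement prompt
  transcription: string, // raw text produced by whisper.cpp
  apiKey?: string, // only needed for hosted APIs
): RefinementRequest {
  const messages: ChatMessage[] = [
    { role: "system", content: prompt },
    { role: "user", content: transcription },
  ];
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) {
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return {
    // Strip trailing slashes so the joined URL has exactly one separator.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    headers,
    body: JSON.stringify({ model, messages }),
  };
}
```

The returned object could be passed to `fetch` as `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`. Keeping the request construction separate from the network call makes the prompt handling easy to check on its own.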
You can download any model you need from ggerganov/whisper.cpp and configure its path in the extension's preferences. The extension downloader currently supports the following Whisper models:
tiny.en (78 MB) - Smallest, speediest, least accurate; optimised for English
base.en (148 MB) - Small and speedy; same size as base but more accurate if only transcribing English
small.en (488 MB) - Optimised for English; slightly larger and more accurate than base without consuming too many resources
medium.en (1.53 GB) - Larger again and optimised for English; transcriptions are slower than with the models above and consume more resources, but are more accurate
tiny (78 MB) - Smallest, speediest, least accurate
base (148 MB) - Small, speedy and multilingual
small (488 MB) - Still fairly speedy and multilingual
medium (1.53 GB) - Slower, more accurate and multilingual
large-v3 (3.1 GB) - The largest, slowest, most accurate model available. Use only if you have a powerful computer or a lot of time on your hands, especially for longer transcriptions.
large-v3-turbo (1.62 GB) - Based on the large model but much faster at the cost of accuracy. May begin repeating itself on longer transcriptions.
Ensure sox is installed correctly (brew install sox).
Check that /opt/homebrew/bin/sox is correct for your installation. If not, you may need to edit dictate.tsx or create a symlink.
sox manually) has microphone permissions in System Settings > Privacy & Security > Microphone.
.bin file.
main binary).
whisper.cpp.
Developer Tools > Show Extension Logs).
Run npm install and npm run build in the extension directory.
Run ollama ls to check the models installed with Ollama, or find the model names for external APIs in their documentation.
This project is licensed under the MIT License - see the LICENSE file for details.
whisper.cpp project.