Press, speak, paste.
Dictation that never leaves your device.
On-device AI dictation for Apple Silicon Macs (M1 or newer, macOS 14+) — Windows on the roadmap. No cloud round-trip, no app account, no waiting. Hold a key, speak, and the cleaned-up text lands on your clipboard ready to paste.
Hotkey ⌘⌥. — push to talk
On-device.
Whisper or Parakeet for transcription, Apple Intelligence or Gemma 4 for cleanup. Everything runs on your device.
No account needed.
Nothing to sign up for to use the app. Commercial licenses do require a billing email at Stripe.
No dictation telemetry.
Audio, transcripts and crash reports never leave your machine. See our Privacy page for the full network picture.
Works on a plane.
After the one-time model download on first run, Vox runs without internet. Verifiable with any network monitor.
Estimated time saved
Speaking is roughly 3× faster than typing.
Stanford measured 153 WPM speaking vs 52 WPM typing. Rounded to 150 and 50 WPM, one person typing ~3,000 words a day at work spends about 60 minutes typing what would take 20 minutes to say, a gap of about 40 minutes a day. Slide your hourly rate to see what those minutes could be worth. Figures are illustrative; your savings will vary.
Default $75 — close to the BLS average for US software developers. Move it to whatever your time is actually worth.
Estimated savings · per year
147 hr
$11,000
≈ 3.7 full work-weeks of typing you don't have to do — about $917/mo at $75/hr.
Where these numbers come from →
- Speaking ≈ 3× faster than typing — Ruan et al., Stanford HCI (2016): 153 WPM speech vs 52 WPM typing in English; 123 vs 43 WPM in Mandarin.
- Conversational speaking sits at ~150 WPM — National Center for Voice and Speech via VirtualSpeech.
- 50 WPM typing — adult average is 40 WPM; office workers typically target 60 WPM. 50 is a defensible midpoint (Wonderlic; TypingSpeedHub 2025).
- 3,000 words/day — knowledge workers send ~40 emails/day at ~75 words each, plus Slack and AI chats; ~3,000 words is a defensible midpoint (cloudHQ; Boomerang via EmailAnalytics).
- $75/hr default — the BLS mean wage for US software developers (May 2024) is $66.78/hr ($138,890/yr). $75 nudges that up slightly to reflect the startup premium most readers will recognize.
- We assume 220 workdays per year, which leaves room for vacation and sick days on top of weekends and ~10 holidays (weekends and holidays alone would give about 250). The math doesn't count time spent reading or thinking, only the keystroke-vs-utterance gap on text you compose.
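The arithmetic behind the headline figures can be reproduced in a few lines. This is a minimal sketch, not Vox's actual calculator: the function name and parameters are illustrative, and the defaults are the rounded assumptions listed above (50 WPM typing, 150 WPM speaking, 3,000 words a day, 220 workdays, $75/hr).

```python
def estimate_savings(words_per_day=3000, typing_wpm=50, speaking_wpm=150,
                     workdays_per_year=220, hourly_rate=75.0):
    """Estimate time and money saved by dictating instead of typing.

    Hypothetical helper; the defaults mirror the rounded assumptions
    stated in the list above.
    """
    minutes_typing = words_per_day / typing_wpm
    minutes_speaking = words_per_day / speaking_wpm
    minutes_saved_per_day = minutes_typing - minutes_speaking
    hours_per_year = minutes_saved_per_day * workdays_per_year / 60
    dollars_per_year = hours_per_year * hourly_rate
    return {
        "minutes_per_day": minutes_saved_per_day,
        "hours_per_year": hours_per_year,
        "dollars_per_year": dollars_per_year,
        "dollars_per_month": dollars_per_year / 12,
        "work_weeks_per_year": hours_per_year / 40,  # 40-hour work-week
    }


savings = estimate_savings()
print(round(savings["minutes_per_day"]))         # 40
print(round(savings["hours_per_year"]))          # 147
print(round(savings["dollars_per_year"]))        # 11000
print(round(savings["dollars_per_month"]))       # 917
print(round(savings["work_weeks_per_year"], 1))  # 3.7
```

Passing the measured 153 and 52 WPM instead of the rounded values gives a daily gap of about 38 minutes, so the rounded inputs slightly favor the headline number.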
Free for personal use
Free for you. Paid for your company.
Vox is fair-source: free for personal use — your own writing, side projects, hobby work — under the perpetual Personal Use license in the EULA. If you (or anyone on your team) use Vox as part of your job at a company with more than one person, each user needs a commercial license. Pricing starts at $12 USD/seat/mo. See plans & pricing →
How it works
Three keys. No setup.
No account, no model selection, no permissions tour. Vox ships with sensible defaults so the first dictation works.
Hold the hotkey
Default ⌘⌥. on Mac (Ctrl+Alt+. on Windows). Vox shows a small listening pill near your cursor.
Speak normally
Filler words and self-corrections are fine — Vox cleans them up. Don't worry about punctuation.
Release. Paste.
Cleaned-up text lands on your clipboard. Press ⌘V where you want it. No silent keystroke synthesis.
Voice modes
One hotkey, the right voice.
Vox picks a mode based on the app you're typing into. Each mode is a tuned cleanup engine — same dictation, different output style.
General
Anywhere
Balanced cleanup. Drops fillers, fixes self-corrections, enumerates lists.
You say
“uh so like the meeting is um at three pm tomorrow”
Vox writes
The meeting is at 3 PM tomorrow.
Email
Mail · Gmail · Outlook
Formal, fully punctuated email body. Never invents a salutation or sign-off.
You say
“hey just wanted to follow up on the proposal um can we sync next week”
Vox writes
Just wanted to follow up on the proposal. Can we sync next week?
Chat
Slack · Discord · iMessage
Casual and short. Fragments OK, contractions preserved, ruthless trim.
You say
“yeah I think that works for me um lets just do it on tuesday then”
Vox writes
yeah works for me — let's do tuesday
Code Comment
Xcode · GitHub · VS Code
Present-tense third-person. Preserves identifiers verbatim. No markdown synthesis.
You say
“so this method invalidates the cache when the user updates their profile”
Vox writes
Invalidates the cache when the user updates their profile.
Notes
Apple Notes · Notion · Obsidian
Full sentences. Bullets on enumeration, paragraph breaks at topic shifts.
You say
“things to do today buy groceries pick up dry cleaning email Sara also need to book the flight”
Vox writes
Things to do today:
• Buy groceries
• Pick up dry cleaning
• Email Sara
• Book the flight
Roll your own.
Custom modes with your own system prompt, post-processing rules, and per-app auto-trigger. Useful for stand-up updates, PR reviews, or your own tone.
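Per-app mode selection can be pictured as a lookup with a fallback. The sketch below is illustrative only and is not Vox's implementation: the `Mode` class, `pick_mode`, the "Email" mode name, the "Stand-up" custom mode, and the rule that custom modes take precedence over built-in ones are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    apps: frozenset  # app names that auto-trigger this mode
    style: str       # one-line summary of the output style


# Built-in modes as described above. "Email" is an assumed name for the
# mode that targets Mail, Gmail and Outlook.
BUILT_IN_MODES = [
    Mode("Email", frozenset({"Mail", "Gmail", "Outlook"}),
         "formal, fully punctuated"),
    Mode("Chat", frozenset({"Slack", "Discord", "iMessage"}),
         "casual and short"),
    Mode("Code Comment", frozenset({"Xcode", "GitHub", "VS Code"}),
         "present-tense third-person"),
    Mode("Notes", frozenset({"Apple Notes", "Notion", "Obsidian"}),
         "full sentences, bullets on enumeration"),
]
GENERAL = Mode("General", frozenset(), "balanced cleanup")


def pick_mode(app_name, custom_modes=()):
    """Return the mode for the app being typed into, else General.

    Custom modes are checked first so a user-defined per-app trigger can
    override a built-in one (an assumed precedence, not documented).
    """
    for mode in (*custom_modes, *BUILT_IN_MODES):
        if app_name in mode.apps:
            return mode
    return GENERAL


# Hypothetical custom mode with its own per-app auto-trigger.
standup = Mode("Stand-up", frozenset({"Slack"}), "done, doing, blocked")

print(pick_mode("Discord").name)           # Chat
print(pick_mode("Terminal").name)          # General
print(pick_mode("Slack", [standup]).name)  # Stand-up
```

The fallback to General matches the "Anywhere" scope of that mode: any app without a specific trigger still gets balanced cleanup.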
Vox vs cloud dictation
The architecture is the differentiator.
Most dictation tools send your audio to a server. Vox doesn't. The differences cascade from there.
- Where audio is processed · Vox: On your device · Cloud: Typically on their servers
- Internet required at runtime · Vox: No (after first-run model download) · Cloud: Typically yes
- App account required · Vox: No · Cloud: Typically yes
- Audio retained after transcription · Vox: Never written to disk · Cloud: Depends on the vendor
- Telemetry during dictation · Vox: None · Cloud: Varies by vendor
- Works on a plane · Vox: Yes · Cloud: Typically no
- Network inspection · Vox: Independently verifiable with Little Snitch or GlassWire · Cloud: Server-side, not user-verifiable
We don't name competitors here on purpose — the architecture is the comparison, not the brand.
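If you prefer a terminal check to a GUI monitor, the same inspection can be approximated with `lsof`, which ships with macOS. This is a rough sketch under stated assumptions: it swaps in `lsof -i -n -P` in place of Little Snitch or GlassWire, the process name "Vox" is a guess at how the app appears in the process list, and the sample output and addresses are made up for illustration.

```python
import subprocess


def connections_for(lsof_output, process_name):
    """Return the NAME column of lsof rows whose COMMAND matches process_name."""
    rows = []
    for line in lsof_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 8)           # NAME (9th column) may contain spaces
        if len(fields) == 9 and fields[0] == process_name:
            rows.append(fields[8].strip())
    return rows


def live_connections(process_name="Vox"):
    """List open internet sockets for a process (process name is an assumption)."""
    # -i: internet sockets only; -n and -P: skip host and port name lookups.
    result = subprocess.run(["lsof", "-i", "-n", "-P"],
                            capture_output=True, text=True, check=False)
    return connections_for(result.stdout, process_name)


# Made-up sample in lsof's column layout, so the parser can be tried anywhere.
sample = (
    "COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME\n"
    "Safari   501 me     20u  IPv4 0x1      0t0  TCP 10.0.0.2:50000->93.184.216.34:443 (ESTABLISHED)\n"
    "Vox      777 me     12u  IPv4 0x2      0t0  TCP 10.0.0.2:50001->203.0.113.7:443 (ESTABLISHED)\n"
)
print(connections_for(sample, "Vox"))
# ['10.0.0.2:50001->203.0.113.7:443 (ESTABLISHED)']
```

Calling `live_connections()` while dictating should return an empty list if no sockets are open for that process. A snapshot like this only shows connections open at that instant, so a dedicated network monitor remains the more thorough check.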
FAQ
Questions, answered.
Does Vox work offline?
Yes. Audio is transcribed by a local model (Whisper or the faster NVIDIA Parakeet model) running on your Mac's Neural Engine, Metal GPU, or a Windows NVIDIA GPU (with CPU fallback). Cleanup runs on Apple Intelligence (macOS 26+) or a local Gemma 4 model. No part of the pipeline requires the internet at runtime.
What gets sent to a server?
Nothing during dictation. Vox does not collect audio, transcripts, telemetry, crash reports, or analytics. The only network calls Vox makes are one-time model downloads on first run and an optional update check.
What hardware does Vox need?
Today: any Apple Silicon Mac (M1 or newer, late 2020 onwards) running macOS 14 or newer. Intel Macs are not supported. A Windows build is on the roadmap; when released, it will target Windows 11 with an optional NVIDIA GPU for faster Parakeet transcription (CPU fallback supported). Cleanup with Gemma 4 will run on either platform; Apple Intelligence cleanup stays macOS-only.
Do I need to set up a hotkey?
No. Vox ships with ⌘⌥. on Mac (Ctrl+Alt+. on Windows) bound by default, and it works without macOS Accessibility permission. You can rebind it in Settings.
Will it auto-paste?
No. Vox copies the cleaned-up text to your clipboard. You press ⌘V where you want it. This is deliberate: silent keystroke synthesis is fragile, requires Accessibility permission, and breaks confirmations in apps that intercept paste.
Is Vox open source?
The dictation pipeline ships with the desktop app, and you can verify what does and doesn't leave your machine with any network monitor. We're considering open-sourcing the modes engine; let us know if it would matter to you.
Can I use Vox at work?
Vox is free for personal use (your own writing, side projects, hobby work). If you use Vox as part of your job at a company with more than one person, you need a commercial license. See Vox for Teams.
What about other languages?
Vox supports Whisper (multilingual) and Parakeet (English, with multilingual variants on the roadmap). The cleanup engine is currently tuned for English — other languages will produce competent transcripts with light cleanup until per-language tuning ships.