Press, speak, paste.

Dictation that never leaves your device.

On-device AI dictation for Apple Silicon Macs (M1 or newer, macOS 14+) — Windows on the roadmap. No cloud round-trip, no app account, no waiting. Hold a key, speak, and the cleaned-up text lands on your clipboard ready to paste.

Download for Apple Silicon Mac

Apple Silicon (M1 or newer) · macOS 14+

Windows · On the roadmap

Free for personal use

No account. No cloud. No tracking.

Hold ⌘⌥. to start

ON-DEVICE

Hotkey ⌘⌥. — push to talk

On-device.

Whisper or Parakeet for transcription, Apple Intelligence or Gemma 4 for cleanup. Everything runs on your device.

No account needed.

Nothing to sign up for to use the app. Commercial licenses do require a billing email at Stripe.

No dictation telemetry.

Audio, transcripts and crash reports never leave your machine. See our Privacy page for the full network picture.

Works on a plane.

After the one-time model download on first run, Vox runs without internet. Verifiable with any network monitor.

Estimated time saved

Speaking is roughly 3× faster than typing.

Stanford measured 153 WPM speaking vs 52 WPM typing. For one person typing ~3,000 words a day at work, the gap is about 40 minutes a day. Slide your hourly rate to see what those minutes could be worth. Figures are illustrative; your savings will vary.

What your hour is worth: $75/hr

Default $75 — close to the BLS average for US software developers. Move it to whatever your time is actually worth.

Estimated savings · per year

147 hr

$11,000

≈ 3.7 full work-weeks of typing you don't have to do — about $917/mo at $75/hr.

Math: 3,000 words ÷ 50 WPM = 60 min typing; 3,000 words ÷ 150 WPM = 20 min speaking. That is 40 min back per day × 220 workdays/year ≈ 147 hr/year, or about $11,000 at $75/hr.
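For anyone who wants to check the figures or plug in their own, the math above can be sketched in a few lines of Python. This is an illustrative sketch, not the code behind the slider: the function and field names are hypothetical, and the 40-hour work week used for the "work-weeks" figure is an assumption inferred from the numbers shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Savings:
    minutes_per_day: float
    hours_per_year: float
    dollars_per_year: float
    dollars_per_month: float
    work_weeks_per_year: float


def estimate_savings(
    hourly_rate: float = 75.0,        # slider default on this page
    words_per_day: float = 3_000,     # words composed per workday
    typing_wpm: float = 50,
    speaking_wpm: float = 150,
    workdays_per_year: int = 220,
    hours_per_work_week: float = 40,  # assumption: not stated on the page
) -> Savings:
    """Estimate time and money saved by speaking instead of typing."""
    typing_minutes = words_per_day / typing_wpm      # 60 min with defaults
    speaking_minutes = words_per_day / speaking_wpm  # 20 min with defaults
    minutes_per_day = typing_minutes - speaking_minutes

    hours_per_year = minutes_per_day * workdays_per_year / 60
    dollars_per_year = hours_per_year * hourly_rate
    return Savings(
        minutes_per_day=minutes_per_day,
        hours_per_year=hours_per_year,
        dollars_per_year=dollars_per_year,
        dollars_per_month=dollars_per_year / 12,
        work_weeks_per_year=hours_per_year / hours_per_work_week,
    )


if __name__ == "__main__":
    s = estimate_savings()
    print(f"{s.minutes_per_day:.0f} min/day")     # 40 min/day
    print(f"{s.hours_per_year:.0f} hr/year")      # 147 hr/year
    print(f"${s.dollars_per_year:,.0f}/year")     # $11,000/year
    print(f"${s.dollars_per_month:,.0f}/month")   # $917/month
    print(f"{s.work_weeks_per_year:.1f} work-weeks")  # 3.7 work-weeks
```

To model your own situation, call `estimate_savings(hourly_rate=..., words_per_day=...)` with your numbers. The result scales linearly with both the hourly rate and the words you compose per day.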

Where these numbers come from

Speaking ≈ 3× faster than typing: Ruan et al., Stanford HCI (2016), measured 153 WPM speech vs 52 WPM typing in English, and 123 vs 43 WPM in Mandarin.
Conversational speaking sits at ~150 WPM: National Center for Voice and Speech via VirtualSpeech.
50 WPM typing: the adult average is 40 WPM, and office workers typically target 60 WPM, so 50 is a defensible midpoint (Wonderlic; TypingSpeedHub 2025).
3,000 words/day: knowledge workers send ~40 emails/day at ~75 words each, which is already ~3,000 words before counting Slack and AI chats, so ~3,000 is a defensible working figure (cloudHQ; Boomerang via EmailAnalytics).
$75/hr default: the BLS mean wage for US software developers (May 2024) is $66.78/hr ($138,890/yr). $75 nudges that up slightly to reflect the startup premium most readers will recognize.
We assume 220 workdays per year (a common US figure: about 260 weekdays, minus ~10 holidays and typical vacation and sick leave). The math doesn't count time spent reading or thinking, only the keystroke-vs-utterance gap on text you compose.

Free for personal use

Free for you. Paid for your company.

Vox is fair-source: free for personal use — your own writing, side projects, hobby work — under the perpetual Personal Use license in the EULA. If you (or anyone on your team) use Vox as part of your job at a company with more than one person, each user needs a commercial license. Pricing starts at $12 USD/seat/mo. See plans & pricing →

How it works

Three keys. No setup.

No account, no model selection, no permissions tour. Vox ships with sensible defaults so the first dictation works.

01 ⌘⌥.

Hold the hotkey

Default ⌘⌥. on Mac (Ctrl+Alt+. on Windows). Vox shows a small listening pill near your cursor.

02

Speak normally

Filler words and self-corrections are fine — Vox cleans them up. Don't worry about punctuation.

03 ⌘V

Release. Paste.

The architecture is the differentiator.

Most dictation tools send your audio to a server. Vox doesn't. The differences cascade from there.

Property	Vox	Typical cloud tool
Where audio is processed	On your device	Typically on their servers
Internet required at runtime	No (after first-run model download)	Typically yes
App account required	No	Typically yes
Audio retained after transcription	Never written to disk	Depends on the vendor
Telemetry during dictation	None	Varies by vendor
Works on a plane	Yes	Typically no
Network inspection	Independently verifiable with Little Snitch or GlassWire	Server-side, not user-verifiable

We don't name competitors here on purpose — the architecture is the comparison, not the brand.

FAQ

Questions, answered.

Does Vox work offline?
Yes, once the one-time model download on first run has finished. Audio is transcribed by a local model, either Whisper or the faster NVIDIA Parakeet model, running on your Mac's Neural Engine or Metal GPU (the planned Windows build will use an NVIDIA GPU, with CPU fallback). Cleanup runs on Apple Intelligence (macOS 26+) or a local Gemma 4 model. No part of the pipeline requires the internet at runtime.
What gets sent to a server?
Nothing during dictation. Vox does not collect audio, transcripts, telemetry, crash reports, or analytics. The only network calls Vox makes are the one-time model downloads on first run and an optional update check.
What hardware does Vox need?
Today: any Apple Silicon Mac (M1 or newer — late 2020 onwards) running macOS 14 or newer. Intel Macs are not supported. A Windows build is on the roadmap — when released, it will target Windows 11 with an optional NVIDIA GPU for faster Parakeet transcription (CPU fallback supported). Cleanup with Gemma 4 will run on either platform; Apple Intelligence cleanup stays macOS-only.
Do I need to set up a hotkey?
No. Vox ships with ⌘⌥. on Mac (Ctrl+Alt+. on Windows) bound by default, and the hotkey works without macOS Accessibility permission. You can rebind it in Settings.
Will it auto-paste?
No. Vox copies the cleaned-up text to your clipboard. You press ⌘V where you want it. This is deliberate: silent keystroke synthesis is fragile, requires Accessibility permission, and breaks confirmations in apps that intercept paste.
Is Vox open source?
The dictation pipeline ships with the desktop app, and you can verify what does and doesn't leave your machine with any network monitor. We're considering open-sourcing the modes engine; let us know if it would matter to you.
Can I use Vox at work?
Vox is free for personal use — your own writing, side projects, hobby work. If you use Vox as part of your job at a company with more than one person, you need a commercial license. See Vox for Teams.
What about other languages?
Vox supports Whisper (multilingual) and Parakeet (English, with multilingual variants on the roadmap). The cleanup engine is currently tuned for English — other languages will produce competent transcripts with light cleanup until per-language tuning ships.