macOS menu bar app for private, on-device meeting transcription.

Recordy is a menu bar app that records, transcribes, and summarizes meetings entirely on-device. No audio leaves the machine. No cloud service, no account required.

Role Solo Designer + Engineer via Claude Code

Timeline 2026

Platform macOS app

Stack Swift, SwiftUI, AVFoundation, ScreenCaptureKit, WhisperKit

The Principle

Nothing leaves the machine.

Built for professionals with a hard constraint: lawyers, therapists, financial advisors, and HR teams who need accurate meeting records but cannot route audio through a cloud service for security and compliance reasons. Most alternatives either send audio to external servers or stay local at the cost of usability and setup overhead. I designed Recordy to avoid both trade-offs: it captures both sides of a call, transcribes locally, and exports a transcript with no configuration required.

The Design

Speaker separation for clarity.

I approached speaker separation by leaning on what macOS already provides. The microphone captures the user; system audio captures the far end of the call. I routed each stream through its own transcription pass, producing a naturally labeled transcript. [You] and [Them] are determined by audio source, which means no configuration and no ambiguity.

[You] and [Them] assigned by audio source.

Two audio streams = two transcription passes.
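To make the idea concrete, here is a minimal sketch of how two per-source transcription results could be merged into one labeled transcript. It is not Recordy's actual code: `AudioSource`, `Segment`, `LabeledSegment`, `mergeTranscripts`, and `render` are hypothetical names, and the sketch assumes each transcription pass returns segments with a start time measured from the same recording clock.

```swift
import Foundation

// Hypothetical types for illustration; not taken from Recordy or WhisperKit.
enum AudioSource: String {
    case microphone = "You"    // the local user
    case systemAudio = "Them"  // the far end of the call
}

// One chunk of text from a single transcription pass.
struct Segment {
    let start: TimeInterval  // seconds from the start of the recording (assumed shared clock)
    let text: String
}

struct LabeledSegment {
    let source: AudioSource
    let start: TimeInterval
    let text: String
}

// Tag each segment with the stream it came from, then interleave by start time.
func mergeTranscripts(mic: [Segment], system: [Segment]) -> [LabeledSegment] {
    let you = mic.map { LabeledSegment(source: .microphone, start: $0.start, text: $0.text) }
    let them = system.map { LabeledSegment(source: .systemAudio, start: $0.start, text: $0.text) }
    return (you + them).sorted { $0.start < $1.start }
}

// Render one line per segment, prefixed with its speaker label.
func render(_ segments: [LabeledSegment]) -> String {
    segments.map { "[\($0.source.rawValue)] \($0.text)" }.joined(separator: "\n")
}
```

Because the label comes from the stream rather than from voice analysis, no speaker model or enrollment step is involved; the only shared requirement in this sketch is a common time base for the two passes.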

Keeping the panel visible.

I designed the panel to stay open during a call, not disappear into the background. It gives continuous visual confirmation that recording is running, and stays accessible whenever the user needs to interact with it mid-call.

Two files on export.

Each session exports a WAV audio file for playback reference and a Markdown file as the written record, including speaker labels, timestamps, and a summary. The audio is there when you need to verify exactly what was said; the transcript is for quick search and reference, or as input for AI summaries.
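As an illustration of the written record, the sketch below renders a summary and a list of labeled lines into Markdown. The layout, the headings, and the names `TranscriptLine`, `timestamp`, and `renderMarkdown` are assumptions made for this example; the real export format may differ.

```swift
import Foundation

// Hypothetical record for one transcript line; not Recordy's actual model.
struct TranscriptLine {
    let speaker: String      // "You" or "Them"
    let start: TimeInterval  // seconds from the start of the recording
    let text: String
}

// Format seconds as MM:SS, or H:MM:SS once the recording passes an hour.
func timestamp(_ seconds: TimeInterval) -> String {
    let total = max(0, Int(seconds.rounded(.down)))
    let h = total / 3600
    let m = (total % 3600) / 60
    let s = total % 60
    if h > 0 {
        return String(format: "%d:%02d:%02d", h, m, s)
    }
    return String(format: "%02d:%02d", m, s)
}

// Assemble the Markdown document: title, summary, then one line per segment.
func renderMarkdown(title: String, summary: String, lines: [TranscriptLine]) -> String {
    var out = ["# \(title)", "", "## Summary", "", summary, "", "## Transcript", ""]
    for line in lines {
        out.append("**[\(line.speaker)]** `\(timestamp(line.start))` \(line.text)")
    }
    return out.joined(separator: "\n") + "\n"
}
```

Keeping the timestamp next to each line is what lets a reader jump from a sentence in the transcript to the matching moment in the WAV file.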

Technical Challenges

Debugging through live testing.

When a room goes quiet, the transcription model can invent words that were never spoken, filling the silence with text that sounds real. A single volume cutoff could not catch this everywhere, since every room is different. So I added a series of filters that each catch a different kind of false output.
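The text does not name the individual filters, so the sketch below is only one plausible shape for such a chain. All three checks (an audio-level floor, a blocklist of stock phrases, and a repeated-word check), along with the names `Candidate`, `SegmentFilter`, and `keep`, and every threshold and phrase shown, are assumptions for illustration.

```swift
import Foundation

// Hypothetical input: transcribed text plus the level of the audio behind it.
struct Candidate {
    let text: String
    let rms: Float  // root-mean-square level of the segment's samples
}

// A filter returns true when the segment should be kept.
typealias SegmentFilter = (Candidate) -> Bool

// Assumed filter 1: drop text produced from near-silent audio.
func minimumLevel(_ threshold: Float) -> SegmentFilter {
    return { $0.rms >= threshold }
}

// Assumed filter 2: drop stock phrases that speech models are often
// observed to emit over silence. The phrase list is supplied by the caller.
func notInBlocklist(_ phrases: Set<String>) -> SegmentFilter {
    return { candidate in
        let trimSet = CharacterSet.whitespacesAndNewlines.union(.punctuationCharacters)
        let normalized = candidate.text.lowercased().trimmingCharacters(in: trimSet)
        return !phrases.contains(normalized)
    }
}

// Assumed filter 3: drop segments where one word repeats in a long run.
func notLooping(maxRepeats: Int) -> SegmentFilter {
    return { candidate in
        let words = candidate.text.lowercased().split(separator: " ")
        var run = 1
        for (previous, next) in zip(words, words.dropFirst()) {
            run = (previous == next) ? run + 1 : 1
            if run > maxRepeats { return false }
        }
        return true
    }
}

// A segment survives only if every filter in the chain accepts it.
func keep(_ candidate: Candidate, filters: [SegmentFilter]) -> Bool {
    filters.allSatisfy { $0(candidate) }
}
```

Composing small filters this way means each one can stay narrow: a quiet but real sentence can pass the phrase and repetition checks even if the level floor is tuned conservatively, and a new failure mode found in live testing becomes one more entry in the array.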

What I Learned

Debugging revealed how interconnected the system actually was.

Issues rarely had a single cause. A problem in transcription often traced back to how audio was being routed, or how the engine was initialized. Live testing during real calls was the only way to surface what static testing couldn't. Fixing one thing consistently exposed something else. The process forced a much deeper understanding of how each piece depended on the others than I had anticipated going in.
