Performance: reduce memory footprint and idle energy use by PatelUtkarsh · Pull Request #12 · PatelUtkarsh/audio-type

PatelUtkarsh · 2026-05-02T06:18:39Z

Summary

A pass over AudioType to address two real-world signals:

Idle Energy Impact 26 (a menu-bar app should idle near zero)
96 MB resident after long use vs. ~16 MB on a fresh launch (~6× memory growth)

17 focused commits, each independently reviewable and revertable. No behavioral changes — same hotkey, same UX, same output — just less work and less retained state.

Highlights

Idle / hot-path

Tear down AVAudioEngine between recordings so the audio HAL doesn't stay warm. Previously the engine and its buffer survived for the app's lifetime. (f87d0ca)
Cache CGEventSource for the duration of a key-stroke insertion instead of recreating one per character. (820c7e8)
Use clipboard paste for inserts longer than 30 chars — saves ~1 ms per character of post-recording latency. (820c7e8)
Return CGEvent unretained from the global event-tap callback, fixing a leak on every fn keystroke. (56716c1)

SwiftUI / AppKit lifecycle

Cache the RecordingOverlay NSHostingView and drive its text via an observable, instead of rebuilding the SwiftUI graph + Metal layers every recording. (275c623)
Cache the four tinted status-bar icons instead of re-rendering on every state transition. (bd99574)
Remove notification observers explicitly in MenuBarController.deinit, so cleanup stays correct if the controller is ever re-instantiated. (30363b1)

Audio path

Vectorise per-tap RMS via vDSP_measqv, append without an intermediate [Float] allocation, and pair every bufferLock acquire with defer { unlock() } for safety. (3c2cc13)
Rewrite WAVEncoder.encode: preallocated Data, vDSP_vfix16 for Float32→Int16 conversion in one shot instead of ~480 000 appendLittleEndian calls per 30 s clip. (0bd3203)
Switch upload to URLSession.upload(for:from:) so the request body isn't held both in URLRequest.httpBody and in URLSession's internal copy. (6a0dfa1)

Concurrency / lifecycle

Hold the active transcriptionTask so a fresh recording cancels any in-flight transcription from the previous one — prevents stale text from landing in the user's new context. (4cd8d0d)
Resolve the engine once at recording start and reuse it after stop, instead of re-resolving (and re-hitting Keychain) on the post-recording hot path. (42855a2)
Retain self for the event-tap lifetime in HotKeyManager, invalidate the run-loop source on stop. (d003aac)

Misc

In-memory cache for Keychain reads, invalidated on save/delete. Removes the keychain round-trip from every transcription. (283ed78)
Compile a single NSRegularExpression once in TextPostProcessor instead of recompiling per pass. (f22a1d5)
Mark AppDelegate @MainActor to silence the pre-existing [#SendableClosureCaptures] warning at AudioTypeApp.swift:53; replace DispatchQueue.main.async with await MainActor.run. (7272bd0)
Stop auto-closing the onboarding window the moment all permissions flip green — let the user dismiss it themselves via the existing "Get Started" button. (541448d)
Fix two SwiftLint violations introduced by the perf work. (b4f9efe)

Verification

swift build clean
swiftlint lint AudioType — 0 violations, 0 serious in 18 files
After fresh launch + onboarding dismissal: 16 MB resident (matches pre-change baseline)
After first transcription: 21 MB (one-time costs: lazy overlay, AVAudioEngine Core Audio buffers, URLSession TLS state, regex compile)
From the second transcription onwards, resident memory needs to stay flat; that is the real signal. I recommend the maintainer record 5–10 short clips and watch Activity Monitor.

Notes for review

0bd3203 (WAV encoder rewrite) is the highest-risk commit. Output should be byte-identical to the previous encoder. If transcription quality regresses on this branch, that's the place to look first.
The vDSP commits link Accelerate.framework. I checked: it's served from the shared dyld cache and doesn't show up as resident memory cost in the actual measurements.
Two pre-existing untracked files (key-debug.swift, key-debug-README.txt) were left alone.

The recorder held an AVAudioEngine for the lifetime of the process and kept the audio buffer's high-water capacity forever. For a menu-bar app that lives for days this kept the audio HAL warm and steadily grew RAM.

- Make audioEngine optional, create on startRecording, nil on stop
- Drop buffer capacity instead of removeAll(keepingCapacity: true)
- Move buffer out via COW transfer instead of copying on stop

AUDIOTYPE-1
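The buffer hand-off described in this commit can be sketched in plain Swift. This is a minimal illustration, not the AudioType source: SampleBuffer, takeSamples, and reservedCapacity are hypothetical names, and the AVAudioEngine teardown itself is not shown.

```swift
// Hypothetical stand-in for the recorder's sample buffer.
final class SampleBuffer {
    private var samples: [Float] = []

    func append(_ chunk: [Float]) {
        samples.append(contentsOf: chunk)
    }

    /// Hands the recorded samples to the caller and leaves behind an empty
    /// array with no reserved capacity.
    func takeSamples() -> [Float] {
        let taken = samples  // shares the same storage; no element copy
        samples = []         // drops this object's reference, so `taken` is the sole owner
        return taken
    }

    /// Exposed only so the effect is observable in this sketch.
    var reservedCapacity: Int { samples.capacity }
}
```

By contrast, removeAll(keepingCapacity: true) would keep the largest allocation ever made alive for as long as the buffer object exists.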

Per-character keystroke synthesis with a 1 ms sleep was the dominant post-release latency for any non-trivial transcription, and a fresh CGEventSource was being allocated for every character.

- Route text > 30 chars through the existing clipboard paste path
- Create CGEventSource once per insertion, pass it to insertCharacter

AUDIOTYPE-1

Each recording start and recording→processing transition was building a fresh NSHostingView<RecordingOverlay>, which leaked the SwiftUI graph and Metal layers and was a primary contributor to the +80 MB drift seen after long sessions.

- Add overlayText to AudioLevelMonitor as @Published
- Read text from the env object inside RecordingOverlay
- Build the NSHostingView once and just mutate overlayText afterwards

AUDIOTYPE-1
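A rough sketch of the "build once, mutate the observable" pattern from this commit follows. AudioLevelMonitor, overlayText, and RecordingOverlay are named in the commit message; OverlayController, its show method, and the view body are assumptions made for illustration only.

```swift
import AppKit
import SwiftUI

final class AudioLevelMonitor: ObservableObject {
    // Changing this property re-renders any view observing the monitor.
    @Published var overlayText = ""
}

struct RecordingOverlay: View {
    @EnvironmentObject var monitor: AudioLevelMonitor

    var body: some View {
        Text(monitor.overlayText).padding()
    }
}

// Hypothetical owner of the overlay view.
@MainActor
final class OverlayController {
    let monitor = AudioLevelMonitor()

    // Created on first use and then reused for every recording.
    private(set) lazy var hostingView: NSHostingView<AnyView> =
        NSHostingView(rootView: AnyView(RecordingOverlay().environmentObject(monitor)))

    func show(_ text: String) {
        _ = hostingView             // make sure the single view exists
        monitor.overlayText = text  // update text without rebuilding the view
    }
}
```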

Selector-based observers on NotificationCenter.default were never explicitly removed. While unzeroed-weak crashes are no longer a risk on modern macOS, explicit cleanup is best practice and matters if the controller is ever re-instantiated.

AUDIOTYPE-1

NSImage.tinted uses lockFocus/unlockFocus, which allocates a fresh offscreen bitmap rep on every call. With four distinct icon states that's a fixed set; pre-render each one once and reuse.

AUDIOTYPE-1
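The pre-render-and-reuse idea can be shown with a small generic cache. This is a sketch under assumptions: the four state names and the IconCache type are hypothetical, and the image type is left generic so the example does not depend on NSImage.

```swift
// Hypothetical names for the four icon states.
enum StatusIconState: CaseIterable, Hashable {
    case idle, recording, processing, error
}

final class IconCache<Image> {
    private let icons: [StatusIconState: Image]

    /// Renders every state exactly once, up front.
    init(render: (StatusIconState) -> Image) {
        var rendered: [StatusIconState: Image] = [:]
        for state in StatusIconState.allCases {
            rendered[state] = render(state)
        }
        icons = rendered
    }

    func icon(for state: StatusIconState) -> Image {
        // Safe to force-unwrap: init populated every case.
        icons[state]!
    }
}
```

In the app the render closure would presumably wrap the existing tinting call, so state transitions only perform a dictionary lookup.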

The tap callback fires on every modifier-key change system-wide. Each return value used Unmanaged.passRetained(event), which adds a retain the system then has to release, which is wasted work per event. Apple's own sample code returns passUnretained because the event is already owned by the system.

AUDIOTYPE-1

The audio-tap callback fires every ~100 ms during recording. Each call was allocating an intermediate [Float] from the converter output, then appending it to the main buffer (a second copy), then computing RMS via a scalar Array.reduce.

- Compute RMS with vDSP_measqv (vectorised, ~5-10× faster)
- Append directly from UnsafeBufferPointer; no intermediate Array
- Wrap all bufferLock acquires with defer { unlock() } for safety

AUDIOTYPE-1
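As an illustration of the three bullets above, here is a minimal sketch using Accelerate. vDSP_measqv computes the mean of squares, so the RMS is its square root. TapSink and consume are hypothetical names; the real tap callback and converter are not shown.

```swift
import Accelerate
import Foundation

/// Root-mean-square level of a chunk, computed without a scalar loop.
func rms(of chunk: UnsafeBufferPointer<Float>) -> Float {
    guard let base = chunk.baseAddress, chunk.count > 0 else { return 0 }
    var meanSquare: Float = 0
    vDSP_measqv(base, 1, &meanSquare, vDSP_Length(chunk.count))
    return meanSquare.squareRoot()
}

// Hypothetical receiver for converted tap output.
final class TapSink {
    private let bufferLock = NSLock()
    private var samples: [Float] = []

    var count: Int {
        bufferLock.lock()
        defer { bufferLock.unlock() }
        return samples.count
    }

    /// Appends straight from the pointer (no intermediate Array) and returns the level.
    func consume(_ chunk: UnsafeBufferPointer<Float>) -> Float {
        let level = rms(of: chunk)      // computed outside the lock
        bufferLock.lock()
        defer { bufferLock.unlock() }   // released on every exit path
        samples.append(contentsOf: chunk)
        return level
    }
}
```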

The encoder allocated an intermediate [Int16] (~960 KB for a 30 s clip), let Data realloc as it grew from 0, then made one appendLittleEndian call per sample (~480 000 calls).

- Allocate final Data once at exact size
- Write header in place via storeBytes
- Clip + scale + Float→Int16 conversion via vDSP into the data region

Produces byte-identical output. Significant peak-memory reduction and encode-time speedup on long recordings.

AUDIOTYPE-1
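The encoder steps listed above might look roughly like the following. This is a sketch, not the WAVEncoder from the PR: the function name, the 16 kHz mono default, the 32767 scale factor, and the scratch Float buffer used for clipping and scaling are all assumptions. It also assumes a little-endian host for the sample data.

```swift
import Accelerate
import Foundation

/// Encodes mono Float samples as a 16-bit PCM WAV file image.
func encodeWAV(samples: [Float], sampleRate: UInt32 = 16_000) -> Data {
    let headerSize = 44
    let dataSize = samples.count * MemoryLayout<Int16>.size
    let n = vDSP_Length(samples.count)
    var data = Data(count: headerSize + dataSize)  // single zero-filled allocation

    // Clip to [-1, 1], then scale to the Int16 range (scratch buffer is an assumption).
    var scaled = [Float](repeating: 0, count: samples.count)
    scaled.withUnsafeMutableBufferPointer { buffer in
        guard let p = buffer.baseAddress else { return }
        var low: Float = -1, high: Float = 1, scale = Float(Int16.max)
        vDSP_vclip(samples, 1, &low, &high, p, 1, n)
        vDSP_vsmul(p, 1, &scale, p, 1, n)
    }

    data.withUnsafeMutableBytes { raw in
        func tag(_ text: String, at offset: Int) {
            for (i, byte) in text.utf8.enumerated() { raw[offset + i] = byte }
        }
        // Canonical 44-byte RIFF/WAVE header, written in place.
        tag("RIFF", at: 0)
        raw.storeBytes(of: UInt32(36 + dataSize).littleEndian, toByteOffset: 4, as: UInt32.self)
        tag("WAVE", at: 8)
        tag("fmt ", at: 12)
        raw.storeBytes(of: UInt32(16).littleEndian, toByteOffset: 16, as: UInt32.self)  // fmt chunk size
        raw.storeBytes(of: UInt16(1).littleEndian, toByteOffset: 20, as: UInt16.self)   // PCM
        raw.storeBytes(of: UInt16(1).littleEndian, toByteOffset: 22, as: UInt16.self)   // channels
        raw.storeBytes(of: sampleRate.littleEndian, toByteOffset: 24, as: UInt32.self)
        raw.storeBytes(of: (sampleRate * 2).littleEndian, toByteOffset: 28, as: UInt32.self)  // byte rate
        raw.storeBytes(of: UInt16(2).littleEndian, toByteOffset: 32, as: UInt16.self)   // block align
        raw.storeBytes(of: UInt16(16).littleEndian, toByteOffset: 34, as: UInt16.self)  // bits per sample
        tag("data", at: 36)
        raw.storeBytes(of: UInt32(dataSize).littleEndian, toByteOffset: 40, as: UInt32.self)

        // Convert straight into the data region; vDSP_vfix16 truncates toward zero.
        guard !samples.isEmpty, let base = raw.baseAddress else { return }
        let pcm = (base + headerSize).bindMemory(to: Int16.self, capacity: samples.count)
        vDSP_vfix16(scaled, 1, pcm, 1, n)
    }
    return data
}
```

Whether this matches the previous encoder byte for byte depends on the scale factor and rounding mode that encoder used, which the commit message does not state.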

Setting URLRequest.httpBody and calling URLSession.shared.data(for:) typically holds the body in two places (the request and URLSession's internal copy). For ~2 MB WAV bodies that's wasted memory. upload(for:from:) takes the body once and forwards it.

- buildRequest now returns (URLRequest, Data) instead of mutating httpBody
- transcribe now uses upload(for:from:)

AUDIOTYPE-1
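A minimal sketch of the request/body split described in this commit. buildRequest and transcribe are named in the commit message; TranscriptionClient, the endpoint, and the Content-Type header are placeholders, since the real request format is not shown in the PR.

```swift
import Foundation

// Hypothetical client; the endpoint is supplied by the caller.
struct TranscriptionClient {
    let endpoint: URL

    /// Returns the request and its body separately; httpBody is never set.
    func buildRequest(wav: Data) -> (URLRequest, Data) {
        var request = URLRequest(url: endpoint)
        request.httpMethod = "POST"
        request.setValue("audio/wav", forHTTPHeaderField: "Content-Type")  // assumed header
        return (request, wav)
    }

    /// Sends the body via upload(for:from:) and returns the raw response bytes.
    func transcribe(wav: Data) async throws -> Data {
        let (request, body) = buildRequest(wav: wav)
        let (responseData, _) = try await URLSession.shared.upload(for: request, from: body)
        return responseData
    }
}
```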

The processor was running ~85 case-insensitive replacingOccurrences calls on the full transcription, each O(n), and rebuilding a merged dictionary on every call. Dictionary iteration order is also undefined, so identical inputs could produce different outputs across runs.

- Compile a single NSRegularExpression with alternation, longest-first
- Cache the compiled regex + lookup; rebuild only on catalog changes
- Apply replacements in one match-and-stitch pass
- Deterministic output ordering as a side benefit

Substring (not word-bounded) matching is preserved to match prior behavior.

AUDIOTYPE-1
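The single-pass approach can be sketched as follows. ReplacementTable is a hypothetical type, not TextPostProcessor itself; keys are lowercased for lookup, escaped, sorted longest-first (ties alphabetical), and joined into one case-insensitive alternation.

```swift
import Foundation

// Hypothetical container for the compiled regex and its lookup table.
struct ReplacementTable {
    private let regex: NSRegularExpression?
    private let lookup: [String: String]

    init(_ replacements: [String: String]) {
        var lookup: [String: String] = [:]
        for key in replacements.keys.sorted() where !key.isEmpty {
            lookup[key.lowercased()] = replacements[key]
        }
        self.lookup = lookup

        // Longest-first so "new york" wins over "new" at the same position.
        let keys = lookup.keys.sorted {
            $0.count != $1.count ? $0.count > $1.count : $0 < $1
        }
        let pattern = keys.map { NSRegularExpression.escapedPattern(for: $0) }
            .joined(separator: "|")
        regex = keys.isEmpty
            ? nil
            : try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    }

    /// One pass: copy the text between matches, substitute each match.
    func apply(to text: String) -> String {
        guard let regex = regex else { return text }
        let source = NSString(string: text)
        let fullRange = NSRange(location: 0, length: source.length)
        var result = ""
        var cursor = 0
        for match in regex.matches(in: text, options: [], range: fullRange) {
            let gap = NSRange(location: cursor, length: match.range.location - cursor)
            result += source.substring(with: gap)
            let matched = source.substring(with: match.range)
            result += lookup[matched.lowercased()] ?? matched
            cursor = match.range.location + match.range.length
        }
        result += source.substring(from: cursor)
        return result
    }
}
```

Building the table once and reusing it across calls corresponds to the "cache the compiled regex + lookup" bullet; matching stays substring-based, as in the commit.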

Every transcription resolved the API key via SecItemCopyMatching, often multiple times across the engine's isAvailable/apiKey accessors. Cache the resolved value (including the absent state) and invalidate on save/delete.

AUDIOTYPE-1
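One way to cache "including the absent state" is to distinguish "not yet looked up" from "looked up and found nothing". The sketch below assumes that shape; CachedSecretStore is hypothetical, and the load/store closures stand in for the real SecItemCopyMatching and save/delete calls, which are not shown.

```swift
import Foundation

final class CachedSecretStore {
    private enum Entry {
        case unknown            // no lookup performed since the last invalidation
        case resolved(String?)  // lookup done; nil means the key is absent
    }

    private var entry: Entry = .unknown
    private let lock = NSLock()
    private let load: () -> String?       // stand-in for the Keychain read
    private let store: (String?) -> Void  // stand-in for the Keychain write/delete

    init(load: @escaping () -> String?, store: @escaping (String?) -> Void) {
        self.load = load
        self.store = store
    }

    var apiKey: String? {
        lock.lock()
        defer { lock.unlock() }
        if case .resolved(let value) = entry { return value }
        let value = load()
        entry = .resolved(value)  // absent results are cached too
        return value
    }

    func save(_ key: String) { store(key); invalidate() }
    func delete() { store(nil); invalidate() }

    private func invalidate() {
        lock.lock()
        defer { lock.unlock() }
        entry = .unknown
    }
}
```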

A new Task.detached was spawned per recording without holding the handle. If the user fired the hotkey while a previous transcription was still in-flight (e.g. slow network), both would race and stale text could land in the user's new focus.

- Hold the task in transcriptionTask
- Cancel any pending task before starting a new recording

AUDIOTYPE-1
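A minimal sketch of holding and cancelling the task handle. transcriptionTask is the name used in the commit; TranscriptionCoordinator and the closure parameters are hypothetical, and the check before inserting is an assumption about how stale text would be suppressed.

```swift
final class TranscriptionCoordinator {
    private(set) var transcriptionTask: Task<Void, Never>?

    /// A new recording cancels whatever transcription is still in flight.
    func startRecording() {
        transcriptionTask?.cancel()
        transcriptionTask = nil
    }

    /// Starts transcription and keeps the handle so it can be cancelled later.
    func stopRecording(
        transcribe: @escaping @Sendable () async -> String,
        insert: @escaping @Sendable (String) -> Void
    ) {
        transcriptionTask = Task.detached {
            let text = await transcribe()
            // Skip insertion if a newer recording cancelled this task.
            guard !Task.isCancelled else { return }
            insert(text)
        }
    }
}
```

Cancellation in Swift is cooperative, so the check after the await is what actually keeps stale text out of the new context in this sketch.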

EngineResolver.resolve() was called twice: once via anyEngineAvailable at startRecording and once at transcribeAndInsert. Each call instantiated a fresh engine and (for cloud engines) hit the Keychain. Resolve once at recording start and reuse for the matching transcription. This also ensures the engine identity can't flip mid-recording if the user edits settings during capture.

AUDIOTYPE-1

The previous code passed self unretained as the tap's refcon. If self were ever released while a callback was in-flight on another thread, takeUnretainedValue would dereference freed memory.

- Retain self with passRetained on startListening; release on stopListening
- Invalidate the CFMachPort before tearing down the run loop source so no further callbacks can fire while we clean up
- Release the retain after the tap is dead so any in-flight callback still sees a live self

This makes deinit unreachable while listening, which is the correct trade-off: cleanup must go through stopListening explicitly (which TranscriptionManager.cleanup already does).

AUDIOTYPE-1
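The retain/release pairing can be shown with Unmanaged alone. This sketch leaves out the actual event tap: HotKeyListener and handleCallback are hypothetical names, and the comments mark where the CFMachPort and run-loop source teardown would go.

```swift
final class HotKeyListener {
    private var retainedSelf: Unmanaged<HotKeyListener>?
    /// The pointer that would be handed to the tap as its refcon / userInfo.
    private(set) var refcon: UnsafeMutableRawPointer?

    func startListening() {
        guard retainedSelf == nil else { return }
        let retained = Unmanaged.passRetained(self)  // +1 for the tap's lifetime
        retainedSelf = retained
        refcon = retained.toOpaque()
        // Real code would create the event tap here with `refcon` as userInfo.
    }

    func stopListening() {
        guard let retained = retainedSelf else { return }
        // Real code would invalidate the CFMachPort and remove the run-loop
        // source first, so no callback can fire after this point.
        refcon = nil
        retainedSelf = nil
        retained.release()  // balances passRetained, only once the tap is dead
    }

    /// What a tap callback would do to recover the listener.
    static func handleCallback(refcon: UnsafeMutableRawPointer) -> HotKeyListener {
        Unmanaged<HotKeyListener>.fromOpaque(refcon).takeUnretainedValue()
    }
}
```

While listening, the extra retain keeps the object alive even if every other reference is dropped, which is why cleanup has to go through stopListening.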

- KeychainHelper: opening_brace: keep brace on the same line as the multi-line if-let condition
- WAVEncoder: redundant_void_return: drop explicit -> Void on the withUnsafeMutableBytes closure

AUDIOTYPE-1

PatelUtkarsh added 17 commits May 2, 2026 11:47

Cache tinted status-bar icons instead of re-rendering per state

635811d

Fix SwiftLint violations introduced by perf work

3430ab2

Mark AppDelegate @MainActor to silence Sendable warning

89aa5dc

Stop auto-closing onboarding; let user click Get Started

6252cc4

PatelUtkarsh merged commit 0976e9d into main May 2, 2026
3 checks passed

PatelUtkarsh mentioned this pull request May 5, 2026

chore(release): Bump version to 2.3.0 #13

Merged

PatelUtkarsh merged 17 commits into main from perf/memory-and-energy-fixes
