Performance: reduce memory footprint and idle energy use #12
Merged
The recorder held an AVAudioEngine for the lifetime of the process and kept the audio buffer's high-water capacity forever. For a menu-bar app that stays running for days, this kept the audio HAL warm and steadily grew RAM.
- Make audioEngine optional: create it in startRecording, set it to nil on stop
- Drop the buffer's capacity instead of calling removeAll(keepingCapacity: true)
- Move the buffer out via a copy-on-write transfer instead of copying it on stop
AUDIOTYPE-1
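A minimal sketch of the buffer half of this commit, assuming the samples live in a lock-guarded [Float] (the AVAudioEngine half needs AVFoundation and is left out). SampleBuffer, takeSamples and capacity are hypothetical names, not the recorder's actual API. Assigning the array to a local shares its storage under copy-on-write; reassigning the property to [] then leaves the caller holding the only reference, so no elements are copied and the high-water capacity is released.

```swift
import Foundation

/// Hypothetical stand-in for the recorder's sample storage.
final class SampleBuffer {
    private var samples: [Float] = []
    private let lock = NSLock()

    /// Called from the audio tap with each converted chunk.
    func append(_ chunk: [Float]) {
        lock.lock()
        defer { lock.unlock() }
        samples.append(contentsOf: chunk)
    }

    /// Called on stop: hands the samples to the encoder without copying them.
    func takeSamples() -> [Float] {
        lock.lock()
        defer { lock.unlock() }
        let recorded = samples  // shares the same storage (copy-on-write), no element copy
        samples = []            // drops our reference and the high-water capacity
        return recorded         // the caller now holds the only reference
    }

    /// Current allocation size of the internal array (for inspection).
    var capacity: Int {
        lock.lock()
        defer { lock.unlock() }
        return samples.capacity
    }
}
```

By contrast, removeAll(keepingCapacity: true) empties the array but keeps its allocation, which is what let the capacity ratchet up over a long-running session.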
Per-character keystroke synthesis with a 1 ms sleep per character was the dominant post-release latency for any non-trivial transcription, and a fresh CGEventSource was being allocated for every character.
- Route text longer than 30 characters through the existing clipboard paste path
- Create one CGEventSource per insertion and pass it to insertCharacter
AUDIOTYPE-1
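The routing and source-sharing rules can be sketched without CoreGraphics by making the event source generic. TextInserter, InsertionPath and the closure parameters are hypothetical: in the app, Source would be CGEventSource, insertCharacter would post the key events, and paste would go through the existing clipboard path. The 30-character threshold is the one stated in the commit.

```swift
/// How a transcription gets into the focused app (hypothetical names).
enum InsertionPath: Equatable {
    case keystrokes      // synthesize key events character by character
    case clipboardPaste  // hand the whole string to the clipboard paste path
}

/// Generic over `Source` so the sketch runs without CoreGraphics;
/// in the app `Source` would be CGEventSource.
struct TextInserter<Source> {
    var pasteThreshold = 30
    let makeSource: () -> Source
    let insertCharacter: (Character, Source) -> Void
    let paste: (String) -> Void

    /// Text longer than the threshold is pasted instead of typed.
    func path(for text: String) -> InsertionPath {
        text.count > pasteThreshold ? .clipboardPaste : .keystrokes
    }

    func insert(_ text: String) {
        switch path(for: text) {
        case .clipboardPaste:
            paste(text)
        case .keystrokes:
            // One event source for the whole insertion instead of one per character.
            let source = makeSource()
            for character in text {
                insertCharacter(character, source)
            }
        }
    }
}
```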
Each recording start and recording→processing transition was building a fresh NSHostingView<RecordingOverlay>, which leaked the SwiftUI graph and Metal layers and was a primary contributor to the +80 MB drift seen after long sessions.
- Add overlayText to AudioLevelMonitor as @Published
- Read the text from the environment object inside RecordingOverlay
- Build the NSHostingView once and only mutate overlayText afterwards
AUDIOTYPE-1
Selector-based observers on NotificationCenter.default were never explicitly removed. While crashes from dangling (non-zeroing weak) observer references are no longer a risk on modern macOS, explicit cleanup is best practice and matters if the controller is ever re-instantiated.
AUDIOTYPE-1
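A sketch of explicit observer cleanup. It swaps in block-based observers (addObserver(forName:object:queue:using:)) for the selector-based ones the commit describes, because the selector form needs @objc; with selector observers the equivalent cleanup is a NotificationCenter.default.removeObserver(self) call. MenuBarControllerSketch and the settingsDidChange name are hypothetical.

```swift
import Foundation

extension Notification.Name {
    /// Hypothetical notification the menu bar reacts to.
    static let settingsDidChange = Notification.Name("settingsDidChange")
}

/// Hypothetical controller that keeps its observer tokens and removes them itself.
final class MenuBarControllerSketch {
    private let center: NotificationCenter
    private var tokens: [NSObjectProtocol] = []
    private(set) var refreshCount = 0

    init(center: NotificationCenter = .default) {
        self.center = center
        let token = center.addObserver(forName: .settingsDidChange, object: nil, queue: nil) { [weak self] _ in
            self?.refreshCount += 1
        }
        tokens.append(token)
    }

    /// Explicit cleanup; safe to call more than once.
    func tearDown() {
        for token in tokens {
            center.removeObserver(token)
        }
        tokens.removeAll()
    }

    deinit {
        tearDown()
    }
}
```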
NSImage.tinted uses lockFocus/unlockFocus, which allocates a fresh offscreen bitmap rep on every call. The icon only has four distinct states, a fixed set, so pre-render each one once and reuse it.
AUDIOTYPE-1
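Pre-rendering can be sketched as a dictionary filled once per state at init. IconState's case names and IconCache are hypothetical (the commit only says there are four states), and the cache is generic over the image type so it runs without AppKit; in the app, render would wrap the NSImage.tinted call.

```swift
/// Hypothetical names for the four icon states.
enum IconState: CaseIterable, Hashable {
    case idle, recording, processing, error
}

/// Renders every state exactly once, then serves lookups from memory.
final class IconCache<Image> {
    private let images: [IconState: Image]

    init(render: (IconState) -> Image) {
        var table: [IconState: Image] = [:]
        for state in IconState.allCases {
            table[state] = render(state)  // the only place the expensive render runs
        }
        images = table
    }

    func image(for state: IconState) -> Image {
        // Safe to force-unwrap: init rendered every case.
        images[state]!
    }
}
```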
The tap callback fires on every modifier-key change system-wide. Every return path used Unmanaged.passRetained(event), which adds a retain the system then has to release: wasted work on every event. Apple's own sample code returns passUnretained because the event is already owned by the system.
AUDIOTYPE-1
The audio-tap callback fires every ~100 ms during recording. Each call
was allocating an intermediate [Float] from the converter output, then
appending it to the main buffer (a second copy), then computing RMS via
a scalar Array.reduce.
- Compute RMS with vDSP_measqv (vectorised, ~5-10× faster)
- Append directly from UnsafeBufferPointer; no intermediate Array
- Wrap all bufferLock acquires with defer { unlock() } for safety
AUDIOTYPE-1
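A portable sketch of the new tap path, assuming the converter output is available as an UnsafeBufferPointer<Float> (in the app, something like UnsafeBufferPointer(start: pcm.floatChannelData![0], count: Int(pcm.frameLength))). It calls vDSP_measqv when Accelerate is available and falls back to a scalar loop elsewhere; TapAccumulator and rms are hypothetical names. vDSP_measqv returns the mean of the squares, so the RMS is its square root.

```swift
import Foundation
#if canImport(Accelerate)
import Accelerate
#endif

/// Root-mean-square level of one block of samples.
func rms(_ samples: UnsafeBufferPointer<Float>) -> Float {
    guard !samples.isEmpty else { return 0 }
    #if canImport(Accelerate)
    var meanSquare: Float = 0
    // Vectorised mean of squares; non-empty buffers always have a base address.
    vDSP_measqv(samples.baseAddress!, 1, &meanSquare, vDSP_Length(samples.count))
    #else
    // Scalar fallback for platforms without Accelerate.
    var sum: Float = 0
    for sample in samples { sum += sample * sample }
    let meanSquare = sum / Float(samples.count)
    #endif
    return meanSquare.squareRoot()
}

/// Hypothetical accumulator fed by the audio tap.
final class TapAccumulator {
    private var buffer: [Float] = []
    private var level: Float = 0
    private let bufferLock = NSLock()

    func handle(_ chunk: UnsafeBufferPointer<Float>) {
        let chunkLevel = rms(chunk)        // computed outside the lock
        bufferLock.lock()
        defer { bufferLock.unlock() }      // released on every exit path
        buffer.append(contentsOf: chunk)   // one copy, straight from the converter's memory
        level = chunkLevel
    }

    var sampleCount: Int {
        bufferLock.lock()
        defer { bufferLock.unlock() }
        return buffer.count
    }

    var currentLevel: Float {
        bufferLock.lock()
        defer { bufferLock.unlock() }
        return level
    }
}
```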
The encoder allocated an intermediate [Int16] (~960 KB for a 30 s clip), let Data realloc as it grew from 0, then made one appendLittleEndian call per sample (~480 000 calls).
- Allocate the final Data once at its exact size
- Write the header in place via storeBytes
- Do the clip + scale + Float→Int16 conversion via vDSP directly into the data region
Produces byte-identical output. Significant peak-memory reduction and encode-time speedup on long recordings.
AUDIOTYPE-1
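The preallocate-and-write-in-place layout can be sketched with the standard 44-byte PCM header for mono 16-bit audio. This version uses a scalar loop for clip + scale + convert so it runs without Accelerate; the commit performs the same steps with vDSP. The conversion truncates toward zero, which is what vDSP_vfix16 does (vDSP_vfixr16 is the rounding variant); the 32767 scale factor and the assumption that this matches the previous encoder's rounding are mine, and encodeWAV is a hypothetical name.

```swift
import Foundation

/// Hypothetical encoder: mono Float32 samples in [-1, 1] to 16-bit PCM WAV.
func encodeWAV(_ samples: [Float], sampleRate: UInt32) -> Data {
    let headerSize = 44
    let dataSize = samples.count * MemoryLayout<Int16>.size
    // One allocation at the exact final size; Data(count:) zero-fills it.
    var data = Data(count: headerSize + dataSize)

    data.withUnsafeMutableBytes { (raw: UnsafeMutableRawBufferPointer) in
        func tag(_ text: String, at offset: Int) {
            for (i, byte) in text.utf8.enumerated() { raw[offset + i] = byte }
        }
        func u32(_ value: UInt32, at offset: Int) {
            raw.storeBytes(of: value.littleEndian, toByteOffset: offset, as: UInt32.self)
        }
        func u16(_ value: UInt16, at offset: Int) {
            raw.storeBytes(of: value.littleEndian, toByteOffset: offset, as: UInt16.self)
        }

        // RIFF/WAVE header written in place.
        tag("RIFF", at: 0)
        u32(UInt32(36 + dataSize), at: 4)  // file size minus the first 8 bytes
        tag("WAVE", at: 8)
        tag("fmt ", at: 12)
        u32(16, at: 16)                    // fmt chunk size for PCM
        u16(1, at: 20)                     // format 1 = integer PCM
        u16(1, at: 22)                     // mono
        u32(sampleRate, at: 24)
        u32(sampleRate * 2, at: 28)        // byte rate = rate * channels * 2 bytes
        u16(2, at: 32)                     // block align
        u16(16, at: 34)                    // bits per sample
        tag("data", at: 36)
        u32(UInt32(dataSize), at: 40)

        // Clip, scale and convert straight into the data region (no [Int16] temporary).
        for (i, sample) in samples.enumerated() {
            let clipped = min(max(sample, -1), 1)
            let value = Int16(clipped * Float(Int16.max))  // truncates toward zero
            raw.storeBytes(of: value.littleEndian, toByteOffset: headerSize + 2 * i, as: Int16.self)
        }
    }
    return data
}
```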
Setting URLRequest.httpBody and calling URLSession.shared.data(for:) typically holds the body in two places (the request and URLSession's internal copy). For ~2 MB WAV bodies that's wasted memory. upload(for:from:) takes the body once and forwards it.
- buildRequest now returns (URLRequest, Data) instead of mutating httpBody
- transcribe now uses upload(for:from:)
AUDIOTYPE-1
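A sketch of the new buildRequest shape, assuming a raw audio/wav body and bearer-token auth (the real endpoint may expect multipart form data instead). The URL and header values are placeholders. The async URLSession.upload(for:from:delegate:) call needs macOS 12 or later, so it is guarded to Apple platforms here.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest and URLSession live here on Linux
#endif

/// Hypothetical builder: returns the body alongside the request instead of
/// storing it in httpBody.
func buildRequest(url: URL, apiKey: String, wav: Data) -> (URLRequest, Data) {
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("audio/wav", forHTTPHeaderField: "Content-Type")
    return (request, wav)
}

#if canImport(Darwin)
@available(macOS 12.0, iOS 15.0, *)
func transcribe(url: URL, apiKey: String, wav: Data) async throws -> Data {
    let (request, body) = buildRequest(url: url, apiKey: apiKey, wav: wav)
    // The body is passed as a separate argument rather than living on the request.
    let (responseData, _) = try await URLSession.shared.upload(for: request, from: body)
    return responseData
}
#endif
```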
The processor was running ~85 case-insensitive replacingOccurrences calls on the full transcription, each O(n), and rebuilding a merged dictionary on every call. Dictionary iteration order is also undefined, so identical inputs could produce different outputs across runs.
- Compile a single NSRegularExpression with alternation, longest-first
- Cache the compiled regex + lookup; rebuild only on catalog changes
- Apply replacements in one match-and-stitch pass
- Deterministic output ordering as a side benefit
Substring (not word-bounded) matching is preserved to match prior behavior.
AUDIOTYPE-1
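The single-regex approach can be sketched as follows. CompiledReplacements is a hypothetical name. Keys are escaped with NSRegularExpression.escapedPattern(for:), sorted longest first (ties alphabetically, which is what makes the output deterministic), and matched case-insensitively; the apply pass copies the unmatched text between matches and substitutes each match from a lowercased lookup table.

```swift
import Foundation

/// Hypothetical compiled form of the replacement catalog: build one per
/// catalog version and reuse it for every transcription.
final class CompiledReplacements {
    private let regex: NSRegularExpression?
    private let lookup: [String: String]

    init(replacements: [String: String]) {
        var table: [String: String] = [:]
        for (key, value) in replacements where !key.isEmpty {
            table[key.lowercased()] = value
        }
        lookup = table

        // Longest first so "gonna" wins over "gon"; ties sorted alphabetically.
        let keys = table.keys.sorted { a, b in
            a.count != b.count ? a.count > b.count : a < b
        }
        if keys.isEmpty {
            regex = nil
        } else {
            let pattern = keys.map { NSRegularExpression.escapedPattern(for: $0) }
                .joined(separator: "|")
            regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
        }
    }

    /// One pass: copy the text between matches and substitute each match.
    func apply(to text: String) -> String {
        guard let regex = regex else { return text }
        let source = NSString(string: text)
        var output = ""
        var cursor = 0
        let fullRange = NSRange(location: 0, length: source.length)
        for match in regex.matches(in: text, range: fullRange) {
            let range = match.range
            output += source.substring(with: NSRange(location: cursor, length: range.location - cursor))
            let matched = source.substring(with: range)
            output += lookup[matched.lowercased()] ?? matched
            cursor = range.location + range.length
        }
        output += source.substring(from: cursor)
        return output
    }
}
```

Because matching is substring-based, a "gon" entry also fires inside "wagon", which is the prior behavior the commit preserves. Rebuilding only on catalog changes amounts to constructing a new instance when the catalog is edited and reusing it otherwise.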
Every transcription resolved the API key via SecItemCopyMatching, often multiple times across the engine's isAvailable/apiKey accessors. Cache the resolved value (including the absent state) and invalidate it on save/delete.
AUDIOTYPE-1
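A sketch of caching the key, including the absent state, assuming the Keychain lookup is wrapped in a closure (in the app it would call SecItemCopyMatching). The String?? storage distinguishes "not loaded yet" (nil) from "loaded, no key stored" (.some(nil)), so a missing key does not trigger a Keychain query on every call. APIKeyCache is a hypothetical name; invalidate() is what save and delete would call.

```swift
import Foundation

/// Hypothetical cache in front of a Keychain lookup.
final class APIKeyCache {
    private let load: () -> String?
    private var cached: String?? = nil  // nil: not loaded; .some(nil): known absent
    private let lock = NSLock()

    init(load: @escaping () -> String?) {
        self.load = load
    }

    var apiKey: String? {
        lock.lock()
        defer { lock.unlock() }
        if let hit = cached {
            return hit  // may itself be nil: the absent state is cached too
        }
        let value = load()
        cached = .some(value)
        return value
    }

    /// Call after saving or deleting the key.
    func invalidate() {
        lock.lock()
        defer { lock.unlock() }
        cached = nil
    }
}
```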
A new Task.detached was spawned per recording without holding the handle. If the user fired the hotkey while a previous transcription was still in flight (e.g. slow network), both would race and stale text could land in the user's new focus.
- Hold the task in transcriptionTask
- Cancel any pending task before starting a new recording
AUDIOTYPE-1
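A sketch of holding and cancelling the handle. TranscriptionCoordinator and its closure parameters are hypothetical. Cancellation in Swift concurrency is cooperative, so the task checks Task.isCancelled after the slow await and drops its result instead of inserting stale text.

```swift
/// Hypothetical coordinator holding the in-flight transcription handle.
final class TranscriptionCoordinator {
    private(set) var transcriptionTask: Task<Void, Never>?

    /// Called when the hotkey starts a new recording.
    func startRecording() {
        transcriptionTask?.cancel()  // stale work from the previous recording
        transcriptionTask = nil
    }

    /// Called when a recording ends.
    func transcribeAndInsert(
        _ transcribe: @escaping @Sendable () async -> String,
        insert: @escaping @Sendable (String) -> Void
    ) {
        transcriptionTask?.cancel()
        transcriptionTask = Task.detached {
            let text = await transcribe()
            // Bail out instead of typing text into the user's new context.
            guard !Task.isCancelled else { return }
            insert(text)
        }
    }
}
```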
EngineResolver.resolve() was called twice: once via anyEngineAvailable at startRecording and once at transcribeAndInsert. Each call instantiated a fresh engine and (for cloud engines) hit the Keychain. Resolve once at recording start and reuse the result for the matching transcription. This also ensures the engine identity can't flip mid-recording if the user edits settings during capture.
AUDIOTYPE-1
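Pinning the resolved engine at recording start can be sketched generically. EnginePin is a hypothetical name and resolve stands in for EngineResolver.resolve(). A successful startRecording() doubles as the availability check, and a settings change during capture does not affect the engine used for that recording.

```swift
/// Hypothetical holder that pins one resolved engine per recording.
final class EnginePin<Engine> {
    private let resolve: () -> Engine?
    private(set) var activeEngine: Engine?

    init(resolve: @escaping () -> Engine?) {
        self.resolve = resolve
    }

    /// Resolve exactly once at recording start; false means no engine is available.
    func startRecording() -> Bool {
        activeEngine = resolve()
        return activeEngine != nil
    }

    /// Hand the pinned engine to the transcription step, then clear it.
    func finishRecording() -> Engine? {
        defer { activeEngine = nil }
        return activeEngine
    }
}
```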
The previous code passed self unretained as the tap's refcon. If self were ever released while a callback was in flight on another thread, takeUnretainedValue would dereference freed memory.
- Retain self with passRetained in startListening; release it in stopListening
- Invalidate the CFMachPort before tearing down the run loop source so no further callbacks can fire while we clean up
- Release the retain after the tap is dead so any in-flight callback still sees a live self
This makes deinit unreachable while listening, which is the correct trade-off: cleanup must go through stopListening explicitly (which TranscriptionManager.cleanup already does).
AUDIOTYPE-1
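The ownership scheme can be sketched with Unmanaged alone, leaving the CGEvent.tapCreate and run-loop calls as comments so it runs anywhere. HotKeyListener and callback are hypothetical names. passRetained(self).toOpaque() produces the refcon with a +1 retain, the callback borrows it with takeUnretainedValue(), and stopListening balances the retain only after the point where the tap would be invalidated.

```swift
/// Hypothetical listener whose refcon keeps it alive while the tap exists.
final class HotKeyListener {
    /// Opaque pointer handed to the C API as userInfo; non-nil while listening.
    private(set) var refcon: UnsafeMutableRawPointer?

    var isListening: Bool { refcon != nil }

    func startListening() {
        guard refcon == nil else { return }
        // +1 owned by the tap: self cannot be deallocated while callbacks may fire.
        refcon = Unmanaged.passRetained(self).toOpaque()
        // Real code: CGEvent.tapCreate(..., userInfo: refcon), add the run loop source.
    }

    func stopListening() {
        guard let pointer = refcon else { return }
        // Real code, in this order: CFMachPortInvalidate(tap), then remove the
        // run loop source. Only once no callback can run is the retain balanced.
        refcon = nil
        Unmanaged<HotKeyListener>.fromOpaque(pointer).release()
    }

    /// Stand-in for the C callback body: borrows self from the refcon without retaining.
    static func callback(refcon: UnsafeMutableRawPointer) -> HotKeyListener {
        Unmanaged<HotKeyListener>.fromOpaque(refcon).takeUnretainedValue()
    }
}
```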
- KeychainHelper: opening_brace (keep the brace on the same line as the multi-line if-let condition)
- WAVEncoder: redundant_void_return (drop the explicit -> Void on the withUnsafeMutableBytes closure)
AUDIOTYPE-1
Summary
A pass over AudioType to address two real-world signals:
17 focused commits, each independently reviewable and revertable. No behavioral changes (same hotkey, same UX, same output), just less work and less retained state.
Highlights
Idle / hot-path
- Release AVAudioEngine between recordings so the audio HAL doesn't stay warm. Previously the engine and its buffer survived for the app's lifetime. (f87d0ca)
- Reuse one CGEventSource for the duration of a keystroke insertion instead of recreating one per character. (820c7e8)
- … (820c7e8)
- Return CGEvent unretained from the global event-tap callback, fixing a leak on every fn keystroke. (56716c1)
SwiftUI / AppKit lifecycle
- Build the RecordingOverlay NSHostingView once and drive its text via an observable, instead of rebuilding the SwiftUI graph + Metal layers every recording. (275c623)
- … (bd99574)
- … MenuBarController.deinit to break a retain cycle. (30363b1)
Audio path
- Audio-tap callback: RMS via vDSP_measqv, no intermediate [Float] allocation, defer { unlock() } for safety. (3c2cc13)
- WAVEncoder.encode: preallocated Data, vDSP_vfix16 for Float32→Int16 conversion in one shot instead of ~480 000 appendLittleEndian calls per 30 s clip. (0bd3203)
- Switch to URLSession.upload(for:from:) so the request body isn't kept in memory after the request completes. (6a0dfa1)
Concurrency / lifecycle
- Hold the task in transcriptionTask so a fresh recording cancels any in-flight transcription from the previous one, which prevents stale text from landing in the user's new context. (4cd8d0d)
- … (42855a2)
- Retain self for the event-tap lifetime in HotKeyManager; invalidate the run-loop source on stop. (d003aac)
Misc
- … (283ed78)
- Compile the NSRegularExpression once in TextPostProcessor instead of recompiling per pass. (f22a1d5)
- … @MainActor on AppDelegate to silence the pre-existing [#SendableClosureCaptures] warning at AudioTypeApp.swift:53; replaces DispatchQueue.main.async with await MainActor.run. (7272bd0)
- … (541448d)
- … (b4f9efe)
Verification
- swift build clean
- swiftlint lint AudioType: 0 violations, 0 serious in 18 files
Notes for review
- 0bd3203 (WAV encoder rewrite) is the highest-risk commit. Output should be byte-identical to the previous encoder. If transcription quality regresses on this branch, that's the place to look first.
- … Accelerate.framework. I checked: it's served from the shared dyld cache and doesn't show up as resident memory cost in the actual measurements.
- … (key-debug.swift, key-debug-README.txt) were left alone.