
Performance: reduce memory footprint and idle energy use #12

Merged
PatelUtkarsh merged 17 commits into main from perf/memory-and-energy-fixes
May 2, 2026

@PatelUtkarsh
Owner

Summary

A pass over AudioType to address two real-world signals:

  • Idle Energy Impact 26 (a menu-bar app should idle near zero)
  • 96 MB resident after long use vs. ~16 MB on a fresh launch (~6× memory growth)

17 focused commits, each independently reviewable and revertable. No behavioral changes — same hotkey, same UX, same output — just less work and less retained state.

Highlights

Idle / hot-path

  • Tear down AVAudioEngine between recordings so the audio HAL doesn't stay warm. Previously the engine and its buffer survived for the app's lifetime. (f87d0ca)
  • Cache CGEventSource for the duration of a key-stroke insertion instead of recreating one per character. (820c7e8)
  • Use clipboard paste for inserts longer than 30 chars — saves ~1 ms per character of post-recording latency. (820c7e8)
  • Return CGEvent unretained from the global event-tap callback, fixing a leak on every fn keystroke. (56716c1)

SwiftUI / AppKit lifecycle

  • Cache the RecordingOverlay NSHostingView and drive its text via an observable, instead of rebuilding the SwiftUI graph + Metal layers every recording. (275c623)
  • Cache the four tinted status-bar icons instead of re-rendering on every state transition. (bd99574)
  • Remove notification observers in MenuBarController.deinit to break a retain cycle. (30363b1)

Audio path

  • Vectorise per-tap RMS via vDSP_measqv, no intermediate [Float] allocation, defer { unlock() } for safety. (3c2cc13)
  • Rewrite WAVEncoder.encode: preallocated Data, vDSP_vfix16 for Float32→Int16 conversion in one shot instead of ~480 000 appendLittleEndian calls per 30 s clip. (0bd3203)
  • Switch upload to URLSession.upload(for:from:) so the request body isn't kept in memory after the request completes. (6a0dfa1)

Concurrency / lifecycle

  • Hold the active transcriptionTask so a fresh recording cancels any in-flight transcription from the previous one — prevents stale text from landing in the user's new context. (4cd8d0d)
  • Resolve the engine once at recording start and reuse it after stop, instead of re-resolving (and re-hitting Keychain) on the post-recording hot path. (42855a2)
  • Retain self for the event-tap lifetime in HotKeyManager, invalidate the run-loop source on stop. (d003aac)

Misc

  • In-memory cache for Keychain reads, invalidated on save/delete. Removes the keychain round-trip from every transcription. (283ed78)
  • Compile a single NSRegularExpression once in TextPostProcessor instead of recompiling per pass. (f22a1d5)
  • @MainActor on AppDelegate to silence the pre-existing [#SendableClosureCaptures] warning at AudioTypeApp.swift:53; replaces DispatchQueue.main.async with await MainActor.run. (7272bd0)
  • Stop auto-closing the onboarding window the moment all permissions flip green — let the user dismiss it themselves via the existing "Get Started" button. (541448d)
  • Two SwiftLint violations introduced by the perf work, fixed. (b4f9efe)

Verification

  • swift build clean
  • swiftlint lint AudioType: 0 violations, 0 serious in 18 files
  • After fresh launch + onboarding dismissal: 16 MB resident (matches pre-change baseline)
  • After first transcription: 21 MB (one-time costs: lazy overlay, AVAudioEngine Core Audio buffers, URLSession TLS state, regex compile)
  • The second transcription onwards is the real signal — needs to stay flat. Recommend the maintainer record 5–10 short clips and watch Activity Monitor.

Notes for review

  • 0bd3203 (WAV encoder rewrite) is the highest-risk commit. Output should be byte-identical to the previous encoder. If transcription quality regresses on this branch, that's the place to look first.
  • The vDSP commits link Accelerate.framework. I checked: it's served from the shared dyld cache and doesn't show up as resident memory cost in the actual measurements.
  • Two pre-existing untracked files (key-debug.swift, key-debug-README.txt) were left alone.

The recorder held an AVAudioEngine for the lifetime of the process and
kept the audio buffer's high-water capacity forever. For a menu-bar app
that lives for days this kept the audio HAL warm and steadily grew RAM.

- Make audioEngine optional, create on startRecording, nil on stop
- Drop buffer capacity instead of removeAll(keepingCapacity: true)
- Move buffer out via COW transfer instead of copying on stop
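A minimal sketch of the move-out and capacity drop described above. `SampleBuffer` is a hypothetical stand-in for the recorder's sample storage, not the actual type in the codebase; the engine teardown itself is not shown.

```swift
// Hypothetical container standing in for the recorder's sample buffer.
final class SampleBuffer {
    private var samples: [Float] = []

    func append(_ chunk: [Float]) {
        samples.append(contentsOf: chunk)
    }

    /// Hands the recorded samples to the caller and leaves an empty array behind.
    func takeSamples() -> [Float] {
        let taken = samples   // second reference to the same storage; no element copy
        samples = []          // drop our reference, so `taken` becomes the sole owner
        return taken          // no high-water capacity is retained by this object
    }

    /// Exposed only so the retained capacity can be inspected.
    var capacity: Int { samples.capacity }
}
```

Assigning a fresh empty array, rather than calling removeAll(keepingCapacity: true), is what lets the old allocation be freed once the caller is done with the returned samples.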

AUDIOTYPE-1

Per-character keystroke synthesis with a 1 ms sleep was the dominant
post-release latency for any non-trivial transcription, and a fresh
CGEventSource was being allocated for every character.

- Route text > 30 chars through the existing clipboard paste path
- Create CGEventSource once per insertion, pass it to insertCharacter
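The two changes above could look roughly like this. The names `route(for:)`, `InsertionRoute`, and `typeText(_:)` are illustrative, not the project's; the 30-character threshold and the 1 ms sleep come from the commit message. Posting events requires the Accessibility permission, and the clipboard path is not shown.

```swift
import CoreGraphics
import Foundation

// Threshold from the commit message; everything else here is an assumption.
let pasteThreshold = 30

enum InsertionRoute { case keystrokes, clipboardPaste }

/// Long text goes through the clipboard; short text is typed.
func route(for text: String) -> InsertionRoute {
    text.count > pasteThreshold ? .clipboardPaste : .keystrokes
}

/// Types `text` one character at a time, sharing one event source across all characters.
func typeText(_ text: String) {
    // Created once per insertion instead of once per character.
    guard let source = CGEventSource(stateID: .hidSystemState) else { return }
    for character in text {
        let units = Array(String(character).utf16)
        for keyDown in [true, false] {
            guard let event = CGEvent(keyboardEventSource: source,
                                      virtualKey: 0,
                                      keyDown: keyDown) else { continue }
            event.keyboardSetUnicodeString(stringLength: units.count, unicodeString: units)
            event.post(tap: .cghidEventTap)
        }
        usleep(1_000)   // the 1 ms per-character pause mentioned above
    }
}
```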

AUDIOTYPE-1

Each recording start and recording→processing transition was building a
fresh NSHostingView<RecordingOverlay>, which leaked the SwiftUI graph
and Metal layers and was a primary contributor to the +80 MB drift seen
after long sessions.

- Add overlayText to AudioLevelMonitor as @Published
- Read text from the env object inside RecordingOverlay
- Build the NSHostingView once and just mutate overlayText afterwards
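A rough sketch of that shape, assuming `AudioLevelMonitor` is an ObservableObject injected as an environment object. `OverlayController` and its `show(_:)` method are hypothetical, and the overlay body is reduced to a single Text.

```swift
import AppKit
import SwiftUI

final class AudioLevelMonitor: ObservableObject {
    @Published var overlayText = ""
}

struct RecordingOverlay: View {
    @EnvironmentObject var monitor: AudioLevelMonitor

    var body: some View {
        Text(monitor.overlayText).padding()
    }
}

// Hypothetical owner of the overlay; the real controller may differ.
@MainActor
final class OverlayController {
    private let monitor = AudioLevelMonitor()

    // Built on first use, then reused for every later recording.
    private lazy var hostingView: NSHostingView<AnyView> = NSHostingView(
        rootView: AnyView(RecordingOverlay().environmentObject(monitor))
    )

    /// Updates the text through the observable and returns the cached view.
    func show(_ text: String) -> NSView {
        monitor.overlayText = text
        return hostingView
    }
}
```

With this arrangement a state change only publishes a new string; the hosting view and its backing layers are created once.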

AUDIOTYPE-1

Selector-based observers on NotificationCenter.default were never
explicitly removed. While unzeroed-weak crashes are no longer a risk on
modern macOS, explicit cleanup is best practice and matters if the
controller is ever re-instantiated.

AUDIOTYPE-1

NSImage.tinted uses lockFocus/unlockFocus, which allocates a fresh
offscreen bitmap rep on every call. With four distinct icon states
that's a fixed set; pre-render each one once and reuse.

AUDIOTYPE-1

The tap callback fires on every modifier-key change system-wide. Each
return value used Unmanaged.passRetained(event), which adds a retain
the system then has to release — wasted work per event. Apple's own
sample code returns passUnretained because the event is already owned
by the system.

AUDIOTYPE-1

The audio-tap callback fires every ~100 ms during recording. Each call
was allocating an intermediate [Float] from the converter output, then
appending it to the main buffer (a second copy), then computing RMS via
a scalar Array.reduce.

- Compute RMS with vDSP_measqv (vectorised, ~5-10× faster)
- Append directly from UnsafeBufferPointer; no intermediate Array
- Wrap all bufferLock acquires with defer { unlock() } for safety
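For illustration, the three points above might combine as follows. `LevelMeter` and `consume(_:)` are hypothetical names. vDSP_measqv returns the mean of squares, so the square root is taken afterwards to get the RMS.

```swift
import Accelerate
import Foundation

/// Root-mean-square level of a block of samples, computed with vDSP.
func rms(_ samples: UnsafeBufferPointer<Float>) -> Float {
    guard let base = samples.baseAddress, samples.count > 0 else { return 0 }
    var meanSquare: Float = 0
    vDSP_measqv(base, 1, &meanSquare, vDSP_Length(samples.count))
    return meanSquare.squareRoot()
}

// Hypothetical stand-in for the recorder's tap handling.
final class LevelMeter {
    private let bufferLock = NSLock()
    private var samples: [Float] = []
    private var level: Float = 0

    /// Called from the audio tap with the converter's output.
    func consume(_ chunk: UnsafeBufferPointer<Float>) {
        let newLevel = rms(chunk)            // computed outside the lock
        bufferLock.lock()
        defer { bufferLock.unlock() }        // released on every exit path
        samples.append(contentsOf: chunk)    // appended straight from the pointer
        level = newLevel
    }

    func currentLevel() -> Float {
        bufferLock.lock()
        defer { bufferLock.unlock() }
        return level
    }
}
```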

AUDIOTYPE-1

The encoder allocated an intermediate [Int16] (~960 KB for a 30 s clip),
let Data realloc as it grew from 0, then made one appendLittleEndian
call per sample (~480 000 calls).

- Allocate final Data once at exact size
- Write header in place via storeBytes
- Clip + scale + Float→Int16 conversion via vDSP into the data region

Produces byte-identical output. Significant peak-memory reduction and
encode-time speedup on long recordings.
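A sketch of an encoder along these lines, not the project's actual WAVEncoder. It assumes mono 16 kHz input, a scale factor of 32 767, and a little-endian host (true of current Apple platforms), and it uses one Float scratch buffer that the real implementation may avoid. Whether its rounding matches the previous encoder byte for byte is not something this sketch can confirm.

```swift
import Accelerate
import Foundation

/// Encodes mono Float32 samples in [-1, 1] as 16-bit PCM WAV.
func encodeWAV(_ samples: [Float], sampleRate: UInt32 = 16_000) -> Data {
    let headerSize = 44
    let dataSize = samples.count * MemoryLayout<Int16>.size
    var wav = Data(count: headerSize + dataSize)   // one allocation at the exact final size

    wav.withUnsafeMutableBytes { raw in
        func putTag(_ value: UInt32, at offset: Int) {
            raw.storeBytes(of: value.bigEndian, toByteOffset: offset, as: UInt32.self)
        }
        func put32(_ value: UInt32, at offset: Int) {
            raw.storeBytes(of: value.littleEndian, toByteOffset: offset, as: UInt32.self)
        }
        func put16(_ value: UInt16, at offset: Int) {
            raw.storeBytes(of: value.littleEndian, toByteOffset: offset, as: UInt16.self)
        }

        putTag(0x5249_4646, at: 0)              // "RIFF"
        put32(UInt32(36 + dataSize), at: 4)     // RIFF chunk size
        putTag(0x5741_5645, at: 8)              // "WAVE"
        putTag(0x666D_7420, at: 12)             // "fmt "
        put32(16, at: 16)                       // fmt chunk size
        put16(1, at: 20)                        // PCM
        put16(1, at: 22)                        // mono
        put32(sampleRate, at: 24)
        put32(sampleRate * 2, at: 28)           // byte rate
        put16(2, at: 32)                        // block align
        put16(16, at: 34)                       // bits per sample
        putTag(0x6461_7461, at: 36)             // "data"
        put32(UInt32(dataSize), at: 40)

        guard !samples.isEmpty else { return }
        let pcm = raw.baseAddress!.advanced(by: headerSize)
            .assumingMemoryBound(to: Int16.self)
        var low: Float = -1, high: Float = 1, scale: Float = 32_767
        var scratch = [Float](repeating: 0, count: samples.count)
        let n = vDSP_Length(samples.count)
        scratch.withUnsafeMutableBufferPointer { buf in
            let p = buf.baseAddress!
            vDSP_vclip(samples, 1, &low, &high, p, 1, n)   // clip to [-1, 1]
            vDSP_vsmul(p, 1, &scale, p, 1, n)              // scale to the Int16 range
            vDSP_vfix16(p, 1, pcm, 1, n)                   // truncate into the data region
        }
    }
    return wav
}
```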

AUDIOTYPE-1

Setting URLRequest.httpBody and calling URLSession.shared.data(for:)
typically holds the body in two places (the request and URLSession's
internal copy). For ~2 MB WAV bodies that's wasted memory. upload(for:
from:) takes the body once and forwards it.

- buildRequest now returns (URLRequest, Data) instead of mutating httpBody
- transcribe now uses upload(for:from:)
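A sketch of the split described above. The function names follow the commit message, but the parameters, headers, and error handling are assumptions for illustration and do not reflect the project's actual API client.

```swift
import Foundation

/// Returns the request and its body separately; httpBody is deliberately left unset.
func buildRequest(url: URL, apiKey: String, wav: Data) -> (URLRequest, Data) {
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    // Header names and values are placeholders for whatever the service expects.
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("audio/wav", forHTTPHeaderField: "Content-Type")
    return (request, wav)
}

/// Uploads the body via upload(for:from:) instead of storing it on the request.
func transcribe(url: URL, apiKey: String, wav: Data) async throws -> Data {
    let (request, body) = buildRequest(url: url, apiKey: apiKey, wav: wav)
    let (data, response) = try await URLSession.shared.upload(for: request, from: body)
    guard let http = response as? HTTPURLResponse,
          (200..<300).contains(http.statusCode) else {
        throw URLError(.badServerResponse)
    }
    return data
}
```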

AUDIOTYPE-1

The processor was running ~85 case-insensitive replacingOccurrences
calls on the full transcription, each O(n), and rebuilding a merged
dictionary on every call. Dictionary iteration order is also undefined,
so identical inputs could produce different outputs across runs.

- Compile a single NSRegularExpression with alternation, longest-first
- Cache the compiled regex + lookup; rebuild only on catalog changes
- Apply replacements in one match-and-stitch pass
- Deterministic output ordering as a side benefit

Substring (not word-bounded) matching is preserved to match prior
behavior.
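One way to build such a single-pass replacer, shown as a sketch. `ReplacementCatalog` is a hypothetical type; the real TextPostProcessor and its catalog format may differ. Phrases are escaped, sorted longest-first so a longer phrase wins over its own prefix, and joined into one alternation.

```swift
import Foundation

// Hypothetical catalog type; compile once, reuse for every transcription.
struct ReplacementCatalog {
    private let lookup: [String: String]       // lowercased phrase -> replacement
    private let regex: NSRegularExpression?

    init(_ replacements: [String: String]) {
        var lookup: [String: String] = [:]
        for (phrase, replacement) in replacements.sorted(by: { $0.key < $1.key }) {
            lookup[phrase.lowercased()] = replacement
        }
        self.lookup = lookup
        let alternatives = lookup.keys
            .sorted { $0.count != $1.count ? $0.count > $1.count : $0 < $1 }
            .map { NSRegularExpression.escapedPattern(for: $0) }
        regex = alternatives.isEmpty ? nil : try? NSRegularExpression(
            pattern: alternatives.joined(separator: "|"),
            options: [.caseInsensitive]
        )
    }

    /// Replaces every match in one left-to-right pass, stitching the gaps back in.
    func apply(to text: String) -> String {
        guard let regex = regex else { return text }
        let source = text as NSString
        var result = ""
        var cursor = 0
        let fullRange = NSRange(location: 0, length: source.length)
        for match in regex.matches(in: text, range: fullRange) {
            let gap = NSRange(location: cursor, length: match.range.location - cursor)
            result += source.substring(with: gap)
            let matched = source.substring(with: match.range)
            result += lookup[matched.lowercased()] ?? matched
            cursor = match.range.location + match.range.length
        }
        result += source.substring(from: cursor)
        return result
    }
}
```

Because the pattern has no word boundaries, a phrase also matches inside longer words, which mirrors the substring behavior noted above.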

AUDIOTYPE-1

Every transcription resolved the API key via SecItemCopyMatching, often
multiple times across the engine's isAvailable/apiKey accessors. Cache
the resolved value (including the absent state) and invalidate on
save/delete.
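The caching rule can be sketched independently of the Keychain API. `CachedSecret` and its injected `load` closure are hypothetical; in the app the closure would wrap the SecItemCopyMatching lookup. The point of the separate `unknown` state is that a missing key is also remembered and does not trigger repeated reads.

```swift
import Foundation

// Hypothetical cache wrapper; `load` stands in for the real Keychain read.
final class CachedSecret {
    private enum State {
        case unknown             // never read, or invalidated
        case resolved(String?)   // read once; nil means "no key stored"
    }

    private var state = State.unknown
    private let load: () -> String?
    private let lock = NSLock()

    init(load: @escaping () -> String?) {
        self.load = load
    }

    var value: String? {
        lock.lock()
        defer { lock.unlock() }
        if case .resolved(let cached) = state { return cached }
        let fresh = load()
        state = .resolved(fresh)
        return fresh
    }

    /// Call after saving or deleting the key.
    func invalidate() {
        lock.lock()
        defer { lock.unlock() }
        state = .unknown
    }
}
```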

AUDIOTYPE-1

A new Task.detached was spawned per recording without holding the
handle. If the user fired the hotkey while a previous transcription was
still in-flight (e.g. slow network), both would race and stale text
could land in the user's new focus.

- Hold the task in transcriptionTask
- Cancel any pending task before starting a new recording
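A simplified sketch of the hold-and-cancel pattern. `TranscriptionCoordinator` is hypothetical, and it uses a main-actor Task rather than the Task.detached mentioned above, so that the cancellation check and the insertion cannot interleave with a new recording start.

```swift
// Hypothetical coordinator; the real TranscriptionManager is more involved.
@MainActor
final class TranscriptionCoordinator {
    private var transcriptionTask: Task<Void, Never>?
    private(set) var insertedTexts: [String] = []

    /// A new recording cancels whatever transcription is still in flight.
    func startRecording() {
        transcriptionTask?.cancel()
        transcriptionTask = nil
    }

    /// Keeps the handle so a later startRecording() can cancel it.
    func finishRecording(transcribe: @escaping @Sendable () async throws -> String) {
        transcriptionTask = Task { [weak self] in
            // A failed or cancelled transcription inserts nothing.
            guard let text = try? await transcribe(), !Task.isCancelled else { return }
            self?.insert(text)
        }
    }

    private func insert(_ text: String) {
        insertedTexts.append(text)
    }
}
```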

AUDIOTYPE-1

EngineResolver.resolve() was called twice: once via anyEngineAvailable
at startRecording and once at transcribeAndInsert. Each call instantiated
a fresh engine and (for cloud engines) hit the Keychain. Resolve once at
recording start and reuse for the matching transcription.

This also ensures the engine identity can't flip mid-recording if the
user edits settings during capture.

AUDIOTYPE-1

The previous code passed self unretained as the tap's refcon. If self
were ever released while a callback was in-flight on another thread,
takeUnretainedValue would dereference freed memory.

- Retain self with passRetained on startListening; release on stopListening
- Invalidate the CFMachPort before tearing down the run loop source so
  no further callbacks can fire while we clean up
- Release the retain after the tap is dead so any in-flight callback
  still sees a live self

This makes deinit unreachable while listening, which is the correct
trade-off — cleanup must go through stopListening explicitly (which
TranscriptionManager.cleanup already does).
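The retain/release pairing can be illustrated without the event-tap machinery. `Listener` is a hypothetical stand-in for HotKeyManager; the CFMachPort invalidation and run-loop source removal are reduced to a comment.

```swift
// Hypothetical stand-in for HotKeyManager; only the refcon ownership is modelled.
final class Listener {
    private var refcon: UnsafeMutableRawPointer?
    private(set) var events = 0

    /// Takes a +1 on self and returns the pointer that would be passed as the tap's refcon.
    func startListening() -> UnsafeMutableRawPointer {
        let pointer = Unmanaged.passRetained(self).toOpaque()
        refcon = pointer
        return pointer
    }

    func stopListening() {
        guard let pointer = refcon else { return }
        refcon = nil
        // Real code would invalidate the CFMachPort and remove the run-loop source here,
        // so no callback can start after this point.
        Unmanaged<Listener>.fromOpaque(pointer).release()   // balance the +1 last
    }

    /// Shape of the C callback: borrows the object without changing its retain count.
    static func callback(refcon: UnsafeMutableRawPointer) {
        let listener = Unmanaged<Listener>.fromOpaque(refcon).takeUnretainedValue()
        listener.events += 1
    }
}
```

While listening, the extra retain keeps the object alive even if every other reference is dropped, which is why cleanup has to go through stopListening().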

AUDIOTYPE-1

- KeychainHelper: opening_brace, keep brace on the same line as the
  multi-line if-let condition
- WAVEncoder: redundant_void_return — drop explicit -> Void on the
  withUnsafeMutableBytes closure

AUDIOTYPE-1

PatelUtkarsh merged commit 0976e9d into main on May 2, 2026
3 checks passed