Fix menu-bar stats popover stuck on loading spinner (#78)#79
Open
iliyami wants to merge 2 commits into
Open
Conversation
The Live system stats popover could hang on its loading spinner forever. `stats` is only assigned after collect() and measure() both return, serially, with no timeout and no partial rendering, so a single blocking syscall on the first iteration pins the panel at nil with no recovery. Root cause: getDiskInfo() read .volumeAvailableCapacityForImportantUsageKey, which computes purgeable space and can block indefinitely on machines with heavy APFS snapshots or Time Machine backups. The popover chrome still renders, so it is a background-actor hang, not a main-thread block or a crash loop (the helper is an SMAppService login item with no KeepAlive, so a crash would drop the menu-bar icon, not leave it spinning). - Disk: read free/total via statfs (fast, in-kernel) instead of the purgeable "important usage" resource key. New DiskUsage.fromStatfs keeps the clamping math under test. - Polling: fence each collector step with withTimeout, assign stats the moment collect() returns (network no longer gates the panel), and os_log slow or timed-out steps so the culprit is diagnosable from a user's Console. - Network: guard against NULL ifa_addr in getNetworkBytes (latent crash on interfaces lacking a link-layer address). - Add a withTimeout + TimeoutError helper, with tests. Bump version to 1.11.5.
Tags with a SemVer prerelease identifier (a hyphen, e.g. v1.11.5-beta.1) now publish as a GitHub pre-release instead of a stable release: - Marked --prerelease, so GitHub excludes it from /releases/latest and the manual in-app updater (which queries that endpoint) never offers it. - The Homebrew cask update step is skipped, so brew users are unaffected. - Tag/version come from the pushed tag, and the beta version is stamped into the binary so About/Settings report it without a committed VERSION bump. - Stable tags (no hyphen) are guarded to match the committed VERSION file so a stable release can never drift from the binary's appVersion. This is the channel used to hand the menu-bar stats fix to a reporter for testing before it ships to everyone.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #78.
Problem
The menu bar "Live system stats" popover could hang on its loading spinner forever. The spinner shows whenever
stats == nil, and the polling loop only assignedstatsafter bothcollect()andmeasure()returned, serially, with no timeout and no partial rendering. A single blocking syscall on the first iteration pinned the panel at nil with no recovery.The popover chrome (header, footer, button) still renders, which tells us the main thread is fine and the stuck work is the background
collect()actor task. It is a hang, not a crash loop: the helper is anSMAppServicelogin item with noKeepAlive, so a crash would drop the menu bar icon rather than leave it spinning.Prime suspect:
getDiskInfo()read.volumeAvailableCapacityForImportantUsageKey, which computes purgeable space and can block indefinitely on machines with heavy APFS snapshots or Time Machine backups. See the issue for the full evidence trail.Changes
statfs("/")(fast, in-kernel, no purgeable/Spotlight dependency) instead of the "important usage" resource key. NewDiskUsage.fromStatfskeeps the byte-scaling and clamping math under test.withTimeouthelper, assignstatsthe momentcollect()returns so the network measurement can never gate the panel, andos_logany slow or timed-out step so a future hang can be pinned to a specific collector from a user's Console without a debug build.ifa_addringetNetworkBytes(), a latent crash on interfaces that lack a link-layer address.withTimeout+TimeoutErrorto MacCleanKit, with tests.Defense in depth
The
statfsswap removes the known blocker at its source. The per-step timeout is a safety net for any other call that might block in the future, and it degrades to "show what we have" plus a log line rather than a frozen panel. Note that Swift cancellation is cooperative, so the timeout keeps the UI responsive but does not kill a wedged syscall; that is why the disk read also had to be made fundamentally non-blocking.Testing
TimeoutTests: returns value when fast, throwsTimeoutErrorwhen slow (verified it returns in ~55ms, not the operation's 10s), propagates the operation's own error.DiskStatsTests.testFromStatfsMultipliesBlockCountsByBlockSize.check-version-sync.sh,swift build,swift testall green locally.Field confirmation still wanted
A
sample MacCleanMenu 3from the reporter while the spinner shows would confirm the disk read is the exact blocker. This fix is correct either way, and the newos_loglines will also show which step was slow once they run a build with it.