Improve Playground CLI child process error handling#2844
Improve Playground CLI child process error handling#2844fredrikekelund merged 18 commits intotrunkfrom
Conversation
📊 Performance Test ResultsComparing 10dc943 vs trunk app-size
site-editor
site-startup
Results are median values from multiple test runs. Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff) |
| // Cache the process list returned from the process manager for a very short time to make multiple | ||
| // calls in quick succession more efficient | ||
| const listProcesses = cacheFunctionTTL( async () => { | ||
| async function listProcesses() { |
There was a problem hiding this comment.
Removing the cacheFunctionTTL is not crucial to this PR's purpose. I just started viewing it as a liability while working on the stopWordPressServer logic. isProcessRunning depends on listProcesses, which caches its result for 1 second. That could easily lead to a sneaky bug at some point, and communicating with the process manager daemon over a socket is pretty fast to begin with.
| signal.addEventListener( | ||
| 'abort', | ||
| () => { | ||
| throw new Error( 'Operation aborted' ); | ||
| }, | ||
| { once: true } | ||
| ); | ||
| stopSignal.throwIfAborted(); |
There was a problem hiding this comment.
It was a mistake on my part to think that an exception thrown from this event handler would be caught by the surrounding try..catch. We need to rely on throwIfAborted instead.
| function sendErrorMessage( messageId: string, error: unknown ): Promise< void > { | ||
| return new Promise( ( resolve ) => { | ||
| const errorResponse: ChildMessageRaw = { | ||
| originalMessageId: messageId, | ||
| topic: 'error', | ||
| errorMessage: parsePhpError( error ), | ||
| errorStack: error instanceof Error ? error.stack : undefined, | ||
| cliArgs: lastCliArgs ?? undefined, | ||
| }; | ||
| process.send!( errorResponse, () => { | ||
| resolve(); | ||
| } ); | ||
| } ); | ||
| } |
There was a problem hiding this comment.
Not sure if we really need to wait for the process.send callback or if that operation is effectively synchronous, but since the next thing that happens after this function is called is that we kill the process, I thought it best to.
| } finally { | ||
| delete abortControllers[ validMessage.messageId ]; |
epeicher
left a comment
There was a problem hiding this comment.
Thanks @fredrikekelund! I tested it and received the expected error message when stopping a site that was starting. I can also see the timestamped logs. Also, code changes LGTM!
| Stopping a Starting site | Timestamped Logs |
|---|---|
![]() |
![]() |


Related issues
How AI was used in this PR
Codex was used to implement the timestamping in the process manager daemon and to reason about
startingPromiselogic in the child process.Proposed Changes
A couple of simpler changes:
stackprop onCliCommandErrorerrors. This is because the strongest "event grouping" signal Sentry uses is the error stack trace. AllCliCommandErrorstack traces are identical, because we always construct them at the same place inexecute-command.ts. This currently results in allCliCommandErrorerrors being grouped into a single Sentry issue, which is incorrect.And some more complex changes:
startServerand madestopServerable to stop an in-flightstartServeroperation.CliCommandErrorSentry events. It's possible to run asite stopCLI command for a site that's currently starting, and it's also possible that users might triggersite stopoperations from Studio if another error leads to Studio losing track of the current started/stopped state for sites. Reproducing this error from within Studio is impossible under normal circumstances, so this falls mostly under "robustness".stopWordPressServerwait for the child process to exit cleanly (with fallback), and I've made the child process actually exit cleanly in response tostop-servermessages.stop-servermessage should lead to a0exit code unless something actually went wrong.Testing Instructions
For the changes in
wordpress-server-child.ts, code review and CI are paramount. However, we can verify that stopping an in-flight start operation works as expected:npm run cli:buildnode apps/cli/dist/cli/main.js site stop --path PATH_TO_SITEin the first tab. Don't run this command yet. Leave it so you can quickly return to the tab after starting the site.node apps/cli/dist/cli/main.js site start --path PATH_TO_SITEin the second tab and run the command.Failed to start WordPress server: This operation was abortedTest that log file lines are timestamped:
~/.studio/pm2/logsnpm startPre-merge Checklist