fix(session): accurate wait exit codes + KV-warm distinction (#1442, #1439) by RyderFreeman4Logos · Pull Request #1445 · RyderFreeman4Logos/cli-sub-agent

RyderFreeman4Logos · 2026-05-17T16:27:53Z

Summary

csa session wait exits 1 despite result.toml showing success #1442: csa session wait now reads result.toml to determine actual exit code instead of using daemon process exit. Sessions that completed successfully no longer report false failures.
csa session wait: distinguish KV-warm exit from real timeout via exit code + caller hint #1439: KV-warm exits (session alive, interval reached) now use exit 0 with structured CSA:SESSION_WAIT_KV_WARM hint. Exit 124 reserved for actual timeouts.

Test plan

just pre-commit passes
csa review --check-verdict PASS (codex gpt-5.5)
Verify: successful session → wait exits 0
Verify: KV-warm exit prints CSA:SESSION_WAIT_KV_WARM hint with exit 0

Closes #1442, Closes #1439

🤖 Generated with Claude Code

`csa session wait` was reporting `status=failure exit=1` for sessions whose `result.toml` recorded `status=success exit_code=0`. The wait loop loaded the refreshed `result.toml` after seeing the daemon completion packet, then dropped it via an mtime-based filter that preferred the daemon completion packet whenever `result.toml` was older or had the same mtime (except for one narrow corner case). In practice this filter always lost: the agent writes `result.toml` before the daemon process exits and writes `daemon-completion.toml`, so `result.toml.mtime <= daemon-completion.toml.mtime` is the normal ordering. Any session where the daemon happened to exit non-zero for reasons unrelated to the session outcome (post-write cleanup failure, parent SIGTERM, etc.) would be reported as a failure to callers. Trust `result.toml` whenever the refresh succeeds. It is the canonical session artifact recording the actual outcome; the daemon completion packet only records the daemon process exit and is meant to be a fallback when `result.toml` is unavailable. The earlier regression test `handle_session_wait_prefers_daemon_completion_exit_code` codified the buggy behaviour and is renamed/inverted to `handle_session_wait_prefers_result_toml_over_daemon_completion_exit_code` so the result-authority contract is explicit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`csa session wait` previously emitted exit code 124 whenever the configured wait cap fired, regardless of whether the underlying session was still healthy. AI agents (and shell callers reading the standard Unix timeout convention) interpreted 124 as failure, even though the loop's intent has always been a polite poll cap that keeps the caller's KV cache warm. This change splits the exit-cap behaviour: * Session daemon still alive at the cap (the common case): emit `CSA:SESSION_WAIT_KV_WARM` with `status=alive` and return exit code 0. The accompanying CSA:CALLER_HINT explicitly tells callers the exit is healthy and to re-wait in a fresh Bash call. * Daemon already gone with no `result.toml` at the cap (defensive, rare): emit `CSA:SESSION_WAIT_TIMEOUT` with `status=dead` and return 124. Earlier loop branches usually catch this case with exit-1 first. Existing tests that asserted the legacy 124-on-alive behaviour are updated, and a new regression test `kv_warm_tests::handle_session_wait_ kv_warm_exit_when_daemon_alive_at_cap` pins the contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request updates the session wait mechanism to prioritize result.toml as the authoritative source for session outcomes, addressing issues where daemon process exit codes caused false failures. It also introduces a new 'KV-warm' exit behavior (returning code 0) when a session remains active at the wait timeout, enabling callers to warm their KV caches before re-waiting. Review feedback identifies opportunities to reduce redundancy in liveness checks and string formatting within session_cmds_daemon_wait.rs, as well as a suggestion to consolidate duplicated test utility code into a shared module to improve maintainability.

gemini-code-assist · 2026-05-17T16:30:39Z

+            let session_alive = session_has_terminal_process(&session_dir)
+                || csa_process::ToolLiveness::is_alive(&session_dir);


The check session_has_terminal_process(&session_dir) is redundant here because csa_process::ToolLiveness::is_alive(&session_dir) already includes checks for live processes (both via daemon.pid and lock files). Simplifying this to just use is_alive improves readability and reduces redundant filesystem operations.

Suggested change

let session_alive = session_has_terminal_process(&session_dir)

|| csa_process::ToolLiveness::is_alive(&session_dir);

let session_alive = csa_process::ToolLiveness::is_alive(&session_dir);

gemini-code-assist · 2026-05-17T16:30:39Z

+                    cd = cd
+                        .as_ref()
+                        .map(|p| format!(" --cd '{p}'"))
+                        .unwrap_or_default(),


The cd argument formatting is redundant here. It was already calculated and stored in cd_arg at line 479. Using cd_arg directly simplifies the code and ensures consistency with the other eprintln! calls in this block.

Suggested change

cd = cd

.as_ref()

.map(|p| format!(" --cd '{p}'"))

.unwrap_or_default(),

cd = cd_arg,

gemini-code-assist · 2026-05-17T16:30:39Z

+struct EnvVarGuard {
+    key: &'static str,
+    original: Option<String>,
+}
+
+impl EnvVarGuard {
+    fn set(key: &'static str, value: impl AsRef<std::ffi::OsStr>) -> Self {
+        let original = std::env::var(key).ok();
+        // SAFETY: test-scoped env mutation guarded by a process-wide mutex.
+        unsafe { std::env::set_var(key, value) };
+        Self { key, original }
+    }
+}
+
+impl Drop for EnvVarGuard {
+    fn drop(&mut self) {
+        // SAFETY: test-scoped env mutation guarded by a process-wide mutex.
+        unsafe {
+            match self.original.as_deref() {
+                Some(value) => std::env::set_var(self.key, value),
+                None => std::env::remove_var(self.key),
+            }
+        }
+    }
+}
+
+#[cfg(target_os = "linux")]
+fn read_process_start_time_ticks(pid: u32) -> u64 {
+    let stat_path = format!("/proc/{pid}/stat");
+    let content = std::fs::read_to_string(stat_path).expect("read /proc stat");
+    let close_paren = content.rfind(')').expect("stat comm terminator");
+    let after_comm = &content[close_paren + 1..];
+    let mut parts = after_comm.split_whitespace();
+    parts.next().expect("state");
+    parts.next().expect("ppid");
+    parts.next().expect("pgrp");
+    for _ in 0..16 {
+        parts.next().expect("intermediate stat field");
+    }
+    parts
+        .next()
+        .expect("starttime")
+        .parse::<u64>()
+        .expect("starttime parse")
+}
+
+#[cfg(target_os = "linux")]
+fn daemon_pid_record(pid: u32) -> String {
+    format!("{pid} {}\n", read_process_start_time_ticks(pid))
+}


The EnvVarGuard struct and read_process_start_time_ticks helper function are duplicated from other test files in this crate (e.g., session_cmds_tests_tail_wait.rs, session_cmds_tests_daemon_pid_tail.rs). This duplication increases maintenance overhead. Consider moving these shared test utilities to a common module within the test suite to improve maintainability.

RyderFreeman4Logos and others added 2 commits May 17, 2026 09:03

RyderFreeman4Logos merged commit 80bd6f0 into main May 17, 2026
5 of 7 checks passed

gemini-code-assist Bot reviewed May 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(session): accurate wait exit codes + KV-warm distinction (#1442, #1439)#1445

fix(session): accurate wait exit codes + KV-warm distinction (#1442, #1439)#1445
RyderFreeman4Logos merged 2 commits into
mainfrom
fix/1442-1439-session-wait-exit-code

RyderFreeman4Logos commented May 17, 2026

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 17, 2026

Uh oh!

gemini-code-assist Bot May 17, 2026

Uh oh!

gemini-code-assist Bot May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		let session_alive = session_has_terminal_process(&session_dir)
		\|\| csa_process::ToolLiveness::is_alive(&session_dir);

	let session_alive = session_has_terminal_process(&session_dir)
	\|\| csa_process::ToolLiveness::is_alive(&session_dir);
	let session_alive = csa_process::ToolLiveness::is_alive(&session_dir);

Conversation

RyderFreeman4Logos commented May 17, 2026

Summary

Test plan

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant