Fix Codex session provider repair after provider switch by YBloom · Pull Request #704 · jlcodes99/cockpit-tools

YBloom · 2026-05-09T07:21:12Z

Summary

repair stale Codex session provider metadata after switching the default profile between OAuth, API Key, and Local API Service providers
repair the affected Codex profile before instance launch when bound account injection changes its provider
add directory-level session visibility repair coverage for rollout files and state_5.sqlite rows
fix an existing browser timer type so frontend typecheck/build can pass

Verification

npm run typecheck
npm run build
git diff --check

Not run

cargo fmt: not run because this local machine does not have rustfmt installed
cargo test codex_session_visibility: not run because this local machine does not have cargo installed

Copilot

Pull request overview

This PR adds an automatic “session visibility” repair path for Codex profiles when the effective provider changes (OAuth / API Key / Local API Service), ensuring rollout files and state_5.sqlite thread metadata don’t remain pinned to a stale provider after switching.

Changes:

Add a directory-scoped repair API (repair_session_visibility_for_dir) plus unit tests covering rollout + SQLite rewrites and no-op behavior.
Trigger automatic repairs after account switches and after enabling local access for the default Codex home.
Trigger automatic repairs before instance launch when bound-account injection changes a profile’s provider, and fix a frontend timer type so TS build/typecheck passes.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
src/stores/usePlatformLayoutStore.ts	Adjust timer handle typing for browser `window.setTimeout` usage to satisfy frontend typechecking.
src-tauri/src/modules/codex_session_visibility.rs	Introduce single-directory repair helper + add tests for rollout/SQLite provider repair behavior.
src-tauri/src/commands/codex.rs	Invoke automatic session visibility repair after default-home provider changes (account switch / local access activate).
src-tauri/src/commands/codex_instance.rs	Repair profile session visibility pre-launch when bound account injection alters provider.
CHANGELOG.zh-CN.md	Document the Codex provider-switch repair behavior.
CHANGELOG.md	Document the Codex provider-switch repair behavior (English).


+    let backup_dir = backup_instance_files(
+        data_dir,
+        &rollout_changes,
+        sqlite_rows_to_update > 0,
+        instance_id,
+        &target_provider,
+    )?;
+    let backup_dir_string = backup_dir.to_string_lossy().to_string();


+        return Ok(CodexSessionVisibilityRepairItem {
+            instance_id: instance_id.to_string(),
+            instance_name: instance_name.to_string(),
+            target_provider,
+            changed_rollout_file_count: 0,
+            updated_sqlite_row_count: 0,
+            skipped_sqlite_file: sqlite_scan.skipped_unusable_database,
+            backup_dir: None,
+            running: false,
+        });


+    match modules::codex_session_visibility::repair_session_visibility_for_dir(
+        profile_dir,
+        "__launch__",
+        "启动实例",
+    ) {


Str1ckl4nd · 2026-05-10T12:28:42Z

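Thanks for fixing this. What I am seeing locally looks like the same class of problem as this PR, but the current fix may still miss a key scenario.

Below is a redacted timeline captured with a local read-only watcher script. The action was clicking the launch buttons for the subscription account / Local API Service on Cockpit's Codex page: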

22:17:47 initial state
config_provider = codex_local_access
auth = apikey:agt
sqlite_threads = { codex_local_access: 366 }

22:17:57 after switching to OAuth / subscription account
config_provider = openai
auth = unknown:none
sqlite_threads = { codex_local_access: 366 }

22:18:07-22:18:24 mixed state while the repair is in progress
sqlite_threads = { codex_local_access: 50, openai: 316 }
sqlite_threads = { codex_local_access: 103, openai: 263 }
sqlite_threads = { openai: 366 }

22:19:19 after switching back to the Local API Service
config_provider = codex_local_access
auth = apikey:agt
sqlite_threads = { openai: 367 }

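The dangerous state is the last one: Codex already holds the Local API Service's agt... key, but the historical sessions are still marked as openai. Resuming an old session at this point could send agt... to https://api.openai.com/v1/responses, which ends in: 401 Incorrect API key provided: agt_code....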
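The check implied by the timeline above can be sketched as follows. This is a minimal illustration, not code from the PR: `stale_thread_count` and `safe_to_resume` are hypothetical helper names, and the map mirrors the `sqlite_threads` counts printed by the watcher script.

```rust
use std::collections::BTreeMap;

/// Count threads whose recorded provider differs from the provider
/// currently selected in config (hypothetical helper, not from the PR).
pub fn stale_thread_count(
    threads_by_provider: &BTreeMap<String, u64>,
    config_provider: &str,
) -> u64 {
    threads_by_provider
        .iter()
        .filter(|(provider, _)| provider.as_str() != config_provider)
        .map(|(_, count)| *count)
        .sum()
}

/// Resuming history is treated as safe only when no thread is stale.
pub fn safe_to_resume(
    threads_by_provider: &BTreeMap<String, u64>,
    config_provider: &str,
) -> bool {
    stale_thread_count(threads_by_provider, config_provider) == 0
}
```

Applied to the 22:19:19 snapshot (config_provider = codex_local_access, sqlite_threads = { openai: 367 }), this reports 367 stale threads, which is exactly the state in which an agt... key could be sent to the openai endpoint.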

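I observed two problems locally:

The provider/auth switch and the historical session provider repair are not one atomic flow. config.toml / auth.json change first, and state_5.sqlite and the rollout metadata are only repaired gradually afterwards. If Codex starts or resumes a session inside that window, it ends up in a mixed-provider state.
A rollout file may contain more than one session_meta. Locally I had 37 rollout files with 53 leftover session_meta.payload.model_provider values in total; SQLite already looked repaired at that point, but the rollouts still had leftovers. The current implementation still appears to use read_first_line() / updated_first_line, which means it may repair only the first session_meta in each rollout.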

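Suggested fixes:

The provider switch path needs to complete synchronously: record before_provider, write the target auth/config, and then block until the profile repair finishes before launching or resuming a session, covering at least state_5.sqlite and the rollout files.
Rollout repair should not change only the first line but scan the whole JSONL: parse line by line, and if row.type == "session_meta", set row.payload.model_provider to the target provider. Write the file back atomically only when something changed.
Add a test: one rollout contains several session_meta rows, and only the second or a later one has the old provider value; the expectation is that every stale session_meta is repaired, not just the first line.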
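The "scan every line, write back atomically only on change" part of the suggestions above can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's implementation: `rewrite_lines_if_changed` is a hypothetical name, and the per-line JSON edit is left to a caller-supplied closure (a real implementation would presumably parse each row with a JSON library and set payload.model_provider there). The temp-file-plus-rename step stands in for "atomic write"; a production version would likely also fsync.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Apply `rewrite` to every line of the file (not just the first).
/// The file is replaced via temp file + rename only if at least one
/// line actually changed. Returns the number of changed lines.
pub fn rewrite_lines_if_changed<F>(path: &Path, rewrite: F) -> io::Result<usize>
where
    F: Fn(&str) -> Option<String>,
{
    let original = fs::read_to_string(path)?;
    let mut output = String::with_capacity(original.len());
    let mut changed = 0usize;

    for segment in original.split_inclusive('\n') {
        // Separate the line body from its ending so endings are preserved.
        let (line, ending) = if let Some(body) = segment.strip_suffix("\r\n") {
            (body, "\r\n")
        } else if let Some(body) = segment.strip_suffix('\n') {
            (body, "\n")
        } else {
            (segment, "")
        };
        match rewrite(line) {
            Some(new_line) if new_line != line => {
                output.push_str(&new_line);
                changed += 1;
            }
            _ => output.push_str(line),
        }
        output.push_str(ending);
    }

    if changed == 0 {
        return Ok(0); // no-op: leave the file untouched
    }

    // Write a sibling temp file, then rename it over the original.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);
    fs::write(&tmp_path, output.as_bytes())?;
    fs::rename(&tmp_path, path)?;
    Ok(changed)
}
```

With a closure that only touches session_meta rows, a file whose first session_meta is already correct and whose later one is stale yields a change count of 1, and a second run is a no-op.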

The logic of my temporary local repair script is roughly:

1. Read the target provider from config.toml: model_provider, defaulting to openai.
2. UPDATE threads SET model_provider = target WHERE model_provider <> target.
3. Walk sessions/**/rollout-*.jsonl and archived_sessions/**/rollout-*.jsonl:
   parse each line as JSON;
   if row.type == "session_meta", set row.payload.model_provider = target;
   write the file back atomically when it changed.
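Step 1 of the script above (target provider from config.toml, defaulting to openai) might look like the sketch below. It is deliberately simplified and rests on assumptions: `target_provider_from_config` is a hypothetical name, model_provider is assumed to be a top-level key with a quoted string value, and the line scan is not a full TOML parser (a real implementation would more likely use a TOML library).

```rust
/// Return the top-level `model_provider` value from config.toml text,
/// or "openai" when the key is absent. Simplified line scan, not a TOML parser.
pub fn target_provider_from_config(config_toml: &str) -> String {
    for raw in config_toml.lines() {
        let line = raw.trim();
        // Top-level keys end at the first table header such as [profiles.x].
        if line.starts_with('[') {
            break;
        }
        if let Some((key, value)) = line.split_once('=') {
            // Commented-out keys ("# model_provider") do not match here.
            if key.trim() != "model_provider" {
                continue;
            }
            let value = value.trim();
            // Accept "basic" and 'literal' quoted strings; ignore trailing comments.
            for quote in ['"', '\''] {
                if let Some(rest) = value.strip_prefix(quote) {
                    if let Some(end) = rest.find(quote) {
                        if end > 0 {
                            return rest[..end].to_string();
                        }
                    }
                }
            }
        }
    }
    "openai".to_string()
}
```

A model_provider key that appears only inside a table is intentionally ignored, so the function falls back to openai in that case.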

After the full repair, the local audit converged to:

config_provider = codex_local_access
sqlite_threads = { codex_local_access: 367 }
rollout_session_meta = { codex_local_access: 440 }

YBloom · 2026-05-13T06:23:24Z

Updated the PR branch with a narrower follow-up for the stale rollout metadata case.

What changed:

The provider switch paths already block on repair_session_visibility_for_dir after writing target auth/config and before launch, for both default Codex account switches and Local API Service activation.
Rollout repair no longer rewrites only the first line. It now scans the full JSONL file and updates every type == "session_meta" row whose payload.model_provider is stale, then writes the file atomically only when something changed.
Added a regression test where the first session_meta already matches the target provider but a later session_meta is stale; after the repair, both rows carry the target provider.
Also kept the OAuth reverse-switch fix that removes legacy top-level base_url from config.toml.

Validation I could run locally:

npm run typecheck passed
npm run build passed
git diff --check passed

Not run locally:

cargo fmt / Rust tests, because this machine still does not have cargo / rustfmt installed.

Extra local observation while reproducing: after the on-disk repair converged, an old orphaned Codex app-server process that started days earlier was still holding an old rollout file descriptor and could continue sending requests with a cached stale provider. The PR fix reduces the startup/switch window, but already-running old app-server processes may still need to be restarted once after applying the fix.

Copilot AI review requested due to automatic review settings May 9, 2026 07:21

Copilot started reviewing on behalf of YBloom May 9, 2026 07:21

Copilot AI reviewed May 9, 2026


YBloom added 3 commits May 13, 2026 13:11

fix codex session provider repair

b14a797

address codex session repair review

561e794

fix codex rollout provider repair

c5ea9f7

YBloom force-pushed the fix/codex-session-provider-repair branch from 35f26c7 to c5ea9f7 on May 13, 2026 06:18

YBloom wants to merge 3 commits into jlcodes99:main from YBloom:fix/codex-session-provider-repair
