
Tag PreCompact/SessionEnd commits with session UUID + name; document reconstruction path #18

@djdarcy


Problem

Today csb's automated commits all share the same generic subjects:

~/.claude noise: sync transient state files
~/.claude user: sync configs, skills, session logs, and plugins

When a user asks "did the PreCompact hook fire for session f2d0d074-... before compaction #6?", answering it requires:

  1. Find the session JSONL on disk and grep for compact_boundary events (timestamps).
  2. git log -- <jsonl path> and align timestamps by hand.
  3. Cross-reference against an external sesslog if one exists.

This works: a recent verification of session f2d0d074-f06c-435c-94c3-46606a91d32c (the "AMD_INTIGRITI" session) confirmed that all 8 compactions had a csb commit landing 2-4 minutes before the corresponding compact_boundary event. But the inspection took several minutes of git archaeology because the commit subjects carry no session attribution. With many concurrent sessions on a busy machine, attribution gets even harder.
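Step 1 of the manual recipe can be scripted. The following is a minimal sketch, not csb code. It assumes each JSONL record is a JSON object with a "timestamp" field and that compaction markers carry the string compact_boundary in a "type" or "subtype" field; both field names are assumptions that this issue does not confirm, and find_compact_boundaries is a hypothetical helper.

```python
import json
from typing import Iterable, List


def find_compact_boundaries(lines: Iterable[str]) -> List[str]:
    """Return the timestamps of compact_boundary records, in file order.

    Assumed schema (not confirmed by the issue): each record is a JSON
    object with a "timestamp" field, and compaction markers have the value
    "compact_boundary" in "type" or "subtype".
    """
    stamps = []
    for line in lines:
        if "compact_boundary" not in line:  # cheap pre-filter before parsing
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        if not isinstance(record, dict):
            continue
        if "compact_boundary" in (record.get("type"), record.get("subtype")):
            stamps.append(record.get("timestamp", ""))
    return stamps
```

The returned timestamps can then be lined up against the commit dates from git log --format=%cI -- <jsonl path>, which is the by-hand alignment described in step 2.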

Separately, but related: csb's longer-term goal is to make any past session fully reassemblable -- so a tool like Claude Code History Viewer can read an entire conversation including content that was summarized away by compaction. That goal depends on knowing exactly what csb has captured per session and being able to retrieve the right snapshot quickly.

Findings from the verification (motivating context)

A direct before/after diff was performed on the AMD_INTIGRITI session across compaction #6:

  • Pre-compact commit 50a8421 captured 4412 lines of the JSONL.
  • Post-compact commit 84e895a had 4938 lines, with git diff --numstat showing 526 insertions, 0 deletions.
  • SHA1 of the pre-compact JSONL (full file) matches SHA1 of the first 4412 lines of the post-compact JSONL exactly.

Conclusion: Claude Code writes the main session JSONL append-only -- pre-compact transcript content is preserved on disk even after the compaction summary is appended. csb's role for the main JSONL is defense-in-depth (against deletion, corruption, manual cleanup), not primary recovery.
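The prefix comparison behind this conclusion is easy to reproduce. A minimal sketch, assuming the two snapshots have already been read into memory as bytes (for example from git show <commit>:<path>); is_append_only is a hypothetical name, not an existing csb function.

```python
import hashlib


def is_append_only(pre: bytes, post: bytes) -> bool:
    """True if the first N lines of `post` are byte-identical to `pre`,
    where N is the number of lines in `pre`."""
    n_lines = len(pre.splitlines(keepends=True))
    head = b"".join(post.splitlines(keepends=True)[:n_lines])
    # Comparing SHA-1 digests mirrors the manual check in the findings;
    # comparing the byte strings directly would be equivalent.
    return hashlib.sha1(pre).digest() == hashlib.sha1(head).digest()
```

For the case above, pre would be the 4412-line blob at 50a8421 and post the 4938-line blob at 84e895a.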

But two adjacent classes of files behave differently and do depend on csb:

  • subagents/agent-*.jsonl -- written once, then frozen. They live on disk and so are usually present, but csb is the only off-disk record if the project dir is wiped.
  • tool-results/*.txt -- transient. The AMD_INTIGRITI session has many agent-*.meta.json sidecars but only 2 surviving tool-results/*.txt files on disk. Older tool-result snippets are pruned. For sessions older than the prune horizon, csb is the sole source of these.

This means full reassembly of an old session in CCHV requires pulling from both the live disk (main JSONL, surviving subagents/tool-results) and from csb's git history (whichever transient files have already been pruned). Knowing which commits cover which session is the precondition.

Proposed solution

Part A -- Enrich commit subjects with session attribution

When the PreCompact or SessionEnd hook fires, csb already receives the triggering session UUID via the hook payload. Use it -- plus the friendly name from ~/.claude/session-states/${UUID}.json (the same file claude-session-logger's /sessioninfo command reads) -- to produce richer commit subjects:

csb noise: PreCompact f2d0d074 (AMD_INTIGRITI__reply-caught-up__2026-05-01_DONE)
csb user: PreCompact f2d0d074 (AMD_INTIGRITI__reply-caught-up__2026-05-01_DONE)
csb noise: SessionEnd 8ace3e9d (cchv-session-flag-sanity-check)
csb noise: backup (manual)                           # for `csb backup` outside hooks

Format: csb {noise|user}: {trigger} {short-uuid} ({name}) where:

  • {trigger} is PreCompact, SessionEnd, or backup (manual/cron)
  • {short-uuid} is the first 8 chars of the session UUID
  • {name} is the friendly name from session-states/${UUID}.json:current_name if available, else omitted

The csb prefix replaces the current ~/.claude to make these commits self-identifying in git log output across mixed repos.
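A minimal sketch of the subject format described above, under stated assumptions: build_subject and lookup_session_name are hypothetical names, the states file is assumed to be a JSON object with a current_name key (as the format description implies), and the name is assumed to have been sanitized already.

```python
import json
from pathlib import Path
from typing import Optional


def lookup_session_name(states_dir: Path, session_uuid: str) -> Optional[str]:
    """Read current_name from <states_dir>/<uuid>.json; None on any problem."""
    try:
        text = (states_dir / f"{session_uuid}.json").read_text(encoding="utf-8")
        data = json.loads(text)
    except (OSError, ValueError):  # missing, unreadable, or malformed file
        return None
    name = data.get("current_name") if isinstance(data, dict) else None
    if not isinstance(name, str):
        return None
    return name.strip() or None


def build_subject(kind: str, trigger: str,
                  session_uuid: Optional[str] = None,
                  name: Optional[str] = None) -> str:
    """Format: csb {noise|user}: {trigger} {short-uuid} ({name})."""
    if kind not in ("noise", "user"):
        raise ValueError(f"unknown commit kind: {kind!r}")
    if trigger == "manual" or not session_uuid:
        return f"csb {kind}: backup (manual)"
    subject = f"csb {kind}: {trigger} {session_uuid[:8]}"
    if name:  # omit the segment entirely rather than emitting "()"
        subject += f" ({name})"
    return subject
```

For example, build_subject("noise", "SessionEnd", "8ace3e9d-...", "cchv-session-flag-sanity-check") yields the third sample subject shown above.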

Part B -- Document the reconstruction path and support it with tooling

Add a csb reconstruct <session-id> subcommand that produces a reassembled view of a session by combining live disk state and git-archived state:

csb reconstruct f2d0d074
# Or full UUID, or name fragment:
csb reconstruct AMD_INTIGRITI

# What it does:
#   1. Resolve session-id -> UUID via the index (same resolver csb show uses)
#   2. Read the live main JSONL from ~/.claude/projects/<proj>/<uuid>.jsonl
#      (already complete via append-only behavior)
#   3. Read live subagents/* and tool-results/* if present
#   4. For any subagent/tool-result NOT on disk, recover the most-recent
#      git-archived version
#   5. Emit a single bundled view (default: directory layout that mirrors
#      ~/.claude/projects/<proj>/<uuid>/ but is fully populated)

Optional output modes:

  • --out <dir> -- write the reassembled session to a directory (default: <cwd>/reconstructed/<uuid>/)
  • --cchv -- emit in a layout CCHV (claude-code-history-viewer) can open directly
  • --at <commit> -- reconstruct as-of a specific csb commit hash

This subcommand is what makes the new commit subjects load-bearing: rather than scanning every noise commit for a touched-file path, the resolver can run git log --grep "csb noise: .* {short-uuid}" and find the session-relevant commits by subject alone, without inspecting the file list of every commit ever made.
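A minimal sketch of how the resolver might build that query. The helper names are hypothetical; only git log, --format, and --grep are real git options. The pattern relies on git's default basic-regex matching, where .* behaves as expected.

```python
import re
from typing import List, Optional

_SHORT_UUID = re.compile(r"[0-9a-f]{8}")


def session_grep_pattern(short_uuid: str, kind: Optional[str] = None) -> str:
    """Build the --grep pattern for one session's csb commits.

    kind is "noise", "user", or None to match both commit families.
    """
    if not _SHORT_UUID.fullmatch(short_uuid):
        raise ValueError(f"not an 8-char hex prefix: {short_uuid!r}")
    if kind is None:
        return f"csb .* {short_uuid}"
    if kind not in ("noise", "user"):
        raise ValueError(f"unknown commit kind: {kind!r}")
    return f"csb {kind}: .* {short_uuid}"


def git_log_args(short_uuid: str, kind: Optional[str] = None) -> List[str]:
    """argv for listing matching commits as '<hash> <subject>' lines."""
    pattern = session_grep_pattern(short_uuid, kind)
    return ["git", "log", "--format=%H %s", f"--grep={pattern}"]
```

The argv list can be passed to subprocess.run with cwd set to the backup repository. Validating the prefix as hex keeps regex metacharacters out of the pattern.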

Implementation approach

Phase 1 -- Commit subject enrichment

  • hooks/scripts/backup-hook.py: Read ${CLAUDE_HOOK_EVENT} and ${CLAUDE_SESSION_ID} from environment. Pass through to csb backup.
  • claude_session_backup/cli.py: Add --trigger {PreCompact|SessionEnd|manual} and --session-id <uuid> flags to csb backup.
  • claude_session_backup/git_ops.py: Update commit-subject template to include trigger + short-uuid + name. Read session-states/${UUID}.json to look up the name; fall back gracefully if file missing or schema differs.
  • Tests: assert commit subjects match the new format under PreCompact / SessionEnd / manual triggers; assert graceful fallback when session-states/ is missing or unreadable.

Phase 2 -- csb reconstruct subcommand

  • claude_session_backup/commands.py: New cmd_reconstruct(args).
  • claude_session_backup/git_ops.py: New helper iter_archived_files(session_uuid) that walks git log --grep "csb .* {short-uuid}" and yields (commit, path, blob) tuples.
  • Reassembly logic: prefer live disk; merge in git-archived versions only for files missing from disk. Detect divergence (live differs from git) and warn.
  • Tests: synthetic session with deleted tool-results -> reconstruct restores them; live-disk-divergent file -> warning surfaces.
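The reassembly rule (prefer live disk, fill gaps from git, warn on divergence) can be sketched as a pure function over two path-to-content mappings. This is an illustration under assumptions, not the planned cmd_reconstruct: merge_session_files is a hypothetical name, and both inputs are assumed to be keyed by the same session-relative paths.

```python
from typing import Dict, List, Tuple


def merge_session_files(
    live: Dict[str, bytes],
    archived: Dict[str, bytes],
) -> Tuple[Dict[str, bytes], List[str]]:
    """Merge live-disk and git-archived files for one session.

    Returns (merged, warnings). Live content always wins; archived content
    is used only for paths that are missing from disk.
    """
    merged = dict(archived)
    merged.update(live)  # live disk takes precedence
    warnings = [
        f"divergent: {path}"
        for path in sorted(live)
        if path in archived and archived[path] != live[path]
    ]
    return merged, warnings
```

One design question this surfaces: the main JSONL is append-only, so its live copy will normally be longer than the last archived copy. A real implementation may want to treat "live starts with archived" as growth rather than divergence.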

Design considerations

  • UUID-prefix collisions: 8 chars is fine for human eyeballing in git log, but the resolver should match on full UUID for correctness. The 8-char display is cosmetic.
  • Name sanitization: session names can contain spaces, slashes, and arbitrary unicode. Sanitize for the commit subject (strip newlines, trim to ~60 chars, replace path separators).
  • Performance: reading session-states/${UUID}.json on every commit is one small file read -- negligible. Cache for the duration of a single csb backup invocation.
  • Backward compatibility: pre-existing commits with the generic subject keep working with csb show/csb resume. The reconstruct command falls back to file-path matching for older commits -- no migration needed.
  • Privacy: session names sometimes include sensitive context (project codenames, private domains). Add a --no-name config option that omits the name from commit subjects but keeps the UUID. Default behavior includes the name.
  • Interaction with #15 (Session fork tracking -- parent/child relationships across branched sessions): rich commit subjects make parent/child session traversal easier later. Coordinate the schema so both features share commit-subject parsing.
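The name-sanitization rule from the list above can be sketched as follows. sanitize_session_name is a hypothetical helper, the underscore replacement is an arbitrary choice, and the 60-character limit is the approximate figure from the bullet rather than a settled value.

```python
import re


def sanitize_session_name(name: str, max_len: int = 60) -> str:
    """Make a session name safe for a one-line commit subject."""
    name = re.sub(r"[\\/]", "_", name)        # replace path separators
    name = re.sub(r"\s+", " ", name).strip()  # newlines, tabs, space runs -> one space
    return name[:max_len].rstrip()            # trim; no trailing space after the cut
```

Unicode characters other than whitespace pass through unchanged, since git commit subjects handle them without trouble.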

Acceptance criteria

  • PreCompact hook commits use subject format csb {noise|user}: PreCompact {short-uuid} ({name})
  • SessionEnd hook commits use the same format with SessionEnd instead of PreCompact
  • Manual csb backup (no hook context) commits use csb {noise|user}: backup (manual)
  • If session-states/${UUID}.json is missing or its current_name is empty, the ({name}) segment is omitted cleanly (no ()).
  • --no-name config flag suppresses the name segment globally for users who consider it sensitive
  • csb reconstruct <session-id> resolves a UUID prefix or name fragment and produces a reassembled directory tree
  • Reconstruction prefers live-disk files over git-archived versions and warns on divergence
  • csb reconstruct --cchv <session-id> produces output that CCHV can open via --session <path> (depends on CCHV --session <path> form, ref. CCHV PR #261 commit B)
  • Documentation: short doc page covering "what csb captures" and "how reconstruction works" -- explicitly noting that the main JSONL is append-only and that csb's irreplaceable role is for pruned tool-results (plus subagent transcripts if the project dir is wiped)
  • Verification recipe documented: how to confirm PreCompact fired for a given session (the same recipe used in this issue's findings section)
  • Tests for new commit-subject format under all three trigger types
  • Tests for csb reconstruct with a synthetic deleted-tool-results scenario

Related issues

Analysis

See 2026-05-02__23-07-43__verify-csb-precompact-hook-on-amd-intigriti-session.md for the full verification analysis (compaction inventory, csb commit alignment table, append-only proof via SHA1 match, subagent/tool-results pruning observations).


Labels: enhancement (New feature or request), epic (Large multi-part feature or effort)
