Removes fields that are unnecessary and converts full.jsonl files to transcript.jsonl files as per the specification in RFD 9.
Entire-Checkpoint: 7a4de317c664
Pull request overview
Adds a transcript “compaction” layer to transform checkpoint full.jsonl transcripts into a smaller transcript.jsonl-style JSONL format intended for API consumption by removing/flattening fields and splitting out user tool results.
Changes:
- Introduce Compact and helpers to convert full.jsonl → compacted JSONL (dropping certain line types, stripping thinking blocks, minimizing tool results).
- Add golden/fixture-based unit tests covering common conversion scenarios, truncation behavior, and deterministic field ordering.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| cmd/entire/cli/transcript/compact.go | Implements the compaction/conversion logic and JSON-building helpers. |
| cmd/entire/cli/transcript/compact_test.go | Adds golden tests/fixtures for compaction output, truncation, and edge cases. |
| prettyGot, _ := json.MarshalIndent(got, "", " ") | ||
| prettyWant, _ := json.MarshalIndent(want, "", " ") |
In assertJSONLines, json.MarshalIndent errors are ignored (prettyGot, _ := ...). errchkjson is enabled for test files too, so this will fail lint. Please handle these errors (e.g., fail the test) or add a narrowly-scoped //nolint:errchkjson with an explanation if you intentionally want to ignore them.
| prettyGot, _ := json.MarshalIndent(got, "", " ") | |
| prettyWant, _ := json.MarshalIndent(want, "", " ") | |
| prettyGot, err := json.MarshalIndent(got, "", " ") | |
| if err != nil { | |
| t.Fatalf("line %d: failed to marshal actual JSON for diff: %v\nvalue: %#v", i, err, got) | |
| } | |
| prettyWant, err := json.MarshalIndent(want, "", " ") | |
| if err != nil { | |
| t.Fatalf("line %d: failed to marshal expected JSON for diff: %v\nvalue: %#v", i, err, want) | |
| } |
cmd/entire/cli/transcript/compact.go
| entryType := unquote(raw["type"]) | ||
| if droppedTypes[entryType] { | ||
| return nil | ||
| } | ||
|
|
||
| switch entryType { | ||
| case "assistant": | ||
| return convertAssistant(raw, opts) | ||
| case "user": | ||
| return convertUser(raw, opts) | ||
| default: | ||
| return nil // drop unknown types in the new format | ||
| } |
convertLine only reads the entry type from the type field. Cursor transcripts use role (and the transcript package already normalizes role→type in parse.go), so compaction would silently drop all Cursor lines. Consider falling back to role when type is missing/empty (or reusing the existing normalization logic) so both formats are supported.
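The fallback suggested above could look roughly like the following sketch. `resolveEntryType` and `entryTypeOfLine` are hypothetical names, not functions from the PR, and `unquote` is assumed to return an empty string for missing or non-string values, as the reviewed code appears to rely on. The existing normalization in parse.go may already cover this and would be preferable to reuse.

```go
package main

import "encoding/json"

// unquote decodes a JSON string value. It returns "" when the value is
// missing, null, or not a string (assumed behavior of the PR's helper).
func unquote(raw json.RawMessage) string {
	var s string
	if err := json.Unmarshal(raw, &s); err != nil {
		return ""
	}
	return s
}

// resolveEntryType (hypothetical) prefers the "type" field and falls back
// to "role" when "type" is missing or empty.
func resolveEntryType(raw map[string]json.RawMessage) string {
	if t := unquote(raw["type"]); t != "" {
		return t
	}
	return unquote(raw["role"])
}

// entryTypeOfLine (hypothetical) parses one JSONL line and resolves its
// entry type. Unparseable lines yield "".
func entryTypeOfLine(line []byte) string {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(line, &raw); err != nil {
		return ""
	}
	return resolveEntryType(raw)
}
```

With this in place, convertLine would switch on the resolved value rather than on `unquote(raw["type"])` directly, so a line carrying only `role` is no longer dropped as an unknown type.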
| // extractUserContent separates user message content into text and tool_result entries. | ||
| func extractUserContent(contentRaw json.RawMessage) (string, []toolResultEntry) { | ||
| // String content | ||
| var str string | ||
| if json.Unmarshal(contentRaw, &str) == nil { | ||
| return str, nil | ||
| } | ||
|
|
||
| // Array content | ||
| var blocks []map[string]json.RawMessage | ||
| if json.Unmarshal(contentRaw, &blocks) != nil { | ||
| return "", nil | ||
| } | ||
|
|
||
| var texts []string | ||
| var toolResults []toolResultEntry | ||
|
|
||
| for _, block := range blocks { | ||
| blockType := unquote(block["type"]) | ||
|
|
||
| if blockType == "tool_result" { | ||
| toolResults = append(toolResults, toolResultEntry{ | ||
| toolUseID: unquote(block["tool_use_id"]), | ||
| }) | ||
| continue | ||
| } | ||
|
|
||
| if blockType == "text" { | ||
| texts = append(texts, unquote(block["text"])) | ||
| } | ||
| } | ||
|
|
||
| text := "" | ||
| if len(texts) > 0 { | ||
| text = texts[0] | ||
| for i := 1; i < len(texts); i++ { | ||
| text += "\n\n" + texts[i] | ||
| } | ||
| } | ||
|
|
||
| return text, toolResults | ||
| } |
User text extraction here doesn’t apply textutil.StripIDEContextTags, so IDE/system tags like <ide_opened_file> or Cursor’s <user_query> wrappers can leak into the compacted transcript. The rest of the codebase relies on transcript.ExtractUserContent / StripIDEContextTags to remove these; this function should strip them too (after joining text blocks).
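As an illustration of stripping after the join, here is a minimal sketch. It does not use the repository's `textutil.StripIDEContextTags`, whose exact signature and behavior are not shown in this review. The regular expressions and the `joinAndStrip` helper are assumptions made for the example, including the choice to delete `<ide_*>` blocks entirely while only unwrapping `<user_query>`.

```go
package main

import (
	"regexp"
	"strings"
)

// Assumed tag shapes, for illustration only. The real tag set lives in the
// repository's textutil package.
var (
	// Matches a whole <ide_...>...</ide_...> block, including its content.
	ideBlock = regexp.MustCompile(`(?s)<ide_[a-z_]+>.*?</ide_[a-z_]+>`)
	// Matches only the <user_query> wrapper tags, keeping the inner text.
	userQueryTag = regexp.MustCompile(`</?user_query>`)
)

// joinAndStrip (hypothetical) joins text blocks with a blank line between
// them, then removes IDE context tags from the combined string.
func joinAndStrip(texts []string) string {
	joined := strings.Join(texts, "\n\n")
	joined = ideBlock.ReplaceAllString(joined, "")
	joined = userQueryTag.ReplaceAllString(joined, "")
	return strings.TrimSpace(joined)
}
```

In the actual change, calling the existing strip function on the joined text would be the better option, since it keeps compaction consistent with the rest of the codebase. `strings.Join` also replaces the manual concatenation loop.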
| // marshalOrdered produces a JSON object with keys in the given order. | ||
| // Pairs with nil values are omitted. | ||
| func marshalOrdered(pairs ...interface{}) []byte { | ||
| var buf bytes.Buffer | ||
| buf.WriteByte('{') | ||
| first := true | ||
| for i := 0; i < len(pairs)-1; i += 2 { | ||
| key := pairs[i].(string) | ||
| val, _ := pairs[i+1].(json.RawMessage) | ||
| if val == nil { | ||
| continue | ||
| } | ||
| if !first { | ||
| buf.WriteByte(',') | ||
| } | ||
| keyJSON, _ := json.Marshal(key) | ||
| buf.Write(keyJSON) | ||
| buf.WriteByte(':') | ||
| buf.Write(val) | ||
| first = false | ||
| } | ||
| buf.WriteByte('}') | ||
| return buf.Bytes() | ||
| } | ||
|
|
||
| func mustMarshal(v interface{}) json.RawMessage { | ||
| b, _ := json.Marshal(v) | ||
| return b | ||
| } |
marshalOrdered / mustMarshal currently ignore encoding/json errors and also ignore the ok result of a type assertion (val, _ := ...). With errchkjson and errcheck (check-type-assertions/check-blank) enabled in this repo, this will fail lint. Recommend changing these helpers to (1) avoid json.Marshal for constant keys, (2) return errors instead of discarding them, and (3) avoid comma-ok assertions if you’re not going to handle the boolean.
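One possible shape for these helpers, sketched under assumptions: the `field` struct and `marshalValue` name are hypothetical and not part of the PR. Typed pairs remove the need for any type assertion, and both helpers return errors instead of discarding them. The key is still passed through json.Marshal here so that arbitrary keys are escaped correctly; pre-encoding constant keys, as the comment suggests, would be a further refinement.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// field (hypothetical) is one key/value pair. A nil Value means the key is
// omitted from the output.
type field struct {
	Key   string
	Value json.RawMessage
}

// marshalOrdered writes a JSON object whose keys appear in argument order.
// Values are assumed to be valid, already-encoded JSON.
func marshalOrdered(fields ...field) ([]byte, error) {
	var buf bytes.Buffer
	buf.WriteByte('{')
	first := true
	for _, f := range fields {
		if f.Value == nil {
			continue
		}
		keyJSON, err := json.Marshal(f.Key)
		if err != nil {
			return nil, fmt.Errorf("marshal key %q: %w", f.Key, err)
		}
		if !first {
			buf.WriteByte(',')
		}
		buf.Write(keyJSON)
		buf.WriteByte(':')
		buf.Write(f.Value)
		first = false
	}
	buf.WriteByte('}')
	return buf.Bytes(), nil
}

// marshalValue (hypothetical replacement for mustMarshal) encodes v and
// reports encoding failures to the caller.
func marshalValue(v interface{}) (json.RawMessage, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return nil, fmt.Errorf("marshal value: %w", err)
	}
	return b, nil
}
```

The trade-off is that every call site must now handle an error, which is more verbose than the variadic `interface{}` form but keeps both linters satisfied without nolint directives.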
| lineBytes, err := reader.ReadBytes('\n') | ||
| if err != nil && err != io.EOF { | ||
| return nil, err | ||
| } |
Compact returns the raw ReadBytes error directly. wrapcheck is enabled and elsewhere in this package errors are wrapped with context (e.g., parse.go). Consider wrapping this with a message like “failed to read transcript” to satisfy linting and improve diagnosability.
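A sketch of the wrapped form follows. The `readLines` function is a hypothetical stand-in for the read loop inside Compact, and `stubReader` and `errDisk` exist only to demonstrate the failure path. The example also uses `errors.Is` for the EOF comparison, which is a small addition beyond what the comment asks for.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
)

// readLines (hypothetical) reads newline-delimited records and wraps any
// non-EOF read error with context.
func readLines(r io.Reader) ([][]byte, error) {
	reader := bufio.NewReader(r)
	var lines [][]byte
	for {
		lineBytes, err := reader.ReadBytes('\n')
		if err != nil && !errors.Is(err, io.EOF) {
			return nil, fmt.Errorf("failed to read transcript: %w", err)
		}
		if len(lineBytes) > 0 {
			lines = append(lines, lineBytes)
		}
		if err != nil { // io.EOF: the final, possibly unterminated, line was handled above
			return lines, nil
		}
	}
}

// errDisk is a demo error used to exercise the failure path.
var errDisk = errors.New("disk unplugged")

// stubReader is a demo double: it yields data, then fails with err, or
// with io.EOF when err is nil.
type stubReader struct {
	data []byte
	err  error
}

func (s *stubReader) Read(p []byte) (int, error) {
	if len(s.data) > 0 {
		n := copy(p, s.data)
		s.data = s.data[n:]
		return n, nil
	}
	if s.err != nil {
		return 0, s.err
	}
	return 0, io.EOF
}
```

Because the error is wrapped with `%w`, callers can still match the underlying cause with `errors.Is` or `errors.As` while the message identifies which stage failed.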
Entire-Checkpoint: 8ab41f9871d4
Entire-Checkpoint: 29768eb8b417
Removes fields that are unused or unnecessary for API consumption and converts full.jsonl files to transcript.jsonl files.
Entire-Checkpoint: 7a4de317c664
Note
Medium Risk
Introduces new transcript transformation logic that affects what data is retained/omitted (e.g., tool results, thinking blocks, dropped entry types), so downstream consumers may see behavior changes if assumptions differ across agent formats.
Overview
Adds a new transcript.Compact pipeline that converts full.jsonl transcripts into a normalized transcript.jsonl stream with per-line metadata (v, agent, cli_version) and optional truncation via StartLine.
The compaction drops non-message noise entries, normalizes cross-agent schemas (supports type, role, human, gemini, and Factory AI Droid type:"message" envelopes), strips IDE context tags from user text, and removes assistant thinking blocks/tool_use.caller.
User messages containing tool_result blocks are split into a user text line plus one or more user_tool_result lines, with toolUseResult minimized to only API-relevant fields; extensive golden/edge-case tests were added to lock output semantics and field order.
Written by Cursor Bugbot for commit 0593dab.