Skip to content

Add support for compacting full.jsonl#754

Draft
computermode wants to merge 7 commits intomainfrom
convert-full-transcript
Draft

Add support for compacting full.jsonl#754
computermode wants to merge 7 commits intomainfrom
convert-full-transcript

Conversation

@computermode
Copy link
Contributor

@computermode computermode commented Mar 23, 2026

Removes fields that are unused/unnecessary for the API consumption and converts full.jsonl files to transcript.jsonl files.

Entire-Checkpoint: 7a4de317c664


Note

Medium Risk
Introduces new transcript transformation logic that affects what data is retained/omitted (e.g., tool results, thinking blocks, dropped entry types), so downstream consumers may see behavior changes if assumptions differ across agent formats.

Overview
Adds a new transcript.Compact pipeline that converts full.jsonl transcripts into a normalized transcript.jsonl stream with per-line metadata (v, agent, cli_version) and optional truncation via StartLine.

The compaction drops non-message noise entries, normalizes cross-agent schemas (supports type, role, human, gemini, and Factory AI Droid type:"message" envelopes), strips IDE context tags from user text, and removes assistant thinking blocks/tool_use.caller.

User messages containing tool_result blocks are split into a user text line plus one or more user_tool_result lines, with toolUseResult minimized to only API-relevant fields; extensive golden/edge-case tests were added to lock output semantics and field order.

Written by Cursor Bugbot for commit 0593dab. Configure here.

Removes fields that are unnecessary and converts full.jsonl files to
transcript.jsonl files as per specification in RFD 9

Entire-Checkpoint: 7a4de317c664
Copilot AI review requested due to automatic review settings March 23, 2026 18:34
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a transcript “compaction” layer to transform checkpoint full.jsonl transcripts into a smaller transcript.jsonl-style JSONL format intended for API consumption by removing/flattening fields and splitting out user tool results.

Changes:

  • Introduce Compact + helpers to convert full.jsonl → compacted JSONL (dropping certain line types, stripping thinking blocks, minimizing tool results).
  • Add golden/fixture-based unit tests covering common conversion scenarios, truncation behavior, and deterministic field ordering.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
cmd/entire/cli/transcript/compact.go Implements the compaction/conversion logic and JSON-building helpers.
cmd/entire/cli/transcript/compact_test.go Adds golden tests/fixtures for compaction output, truncation, and edge cases.

Comment on lines +345 to +346
prettyGot, _ := json.MarshalIndent(got, "", " ")
prettyWant, _ := json.MarshalIndent(want, "", " ")
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In assertJSONLines, json.MarshalIndent errors are ignored (prettyGot, _ := ...). errchkjson is enabled for test files too, so this will fail lint. Please handle these errors (e.g., fail the test) or add a narrowly-scoped //nolint:errchkjson with an explanation if you intentionally want to ignore them.

Suggested change
prettyGot, _ := json.MarshalIndent(got, "", " ")
prettyWant, _ := json.MarshalIndent(want, "", " ")
prettyGot, err := json.MarshalIndent(got, "", " ")
if err != nil {
t.Fatalf("line %d: failed to marshal actual JSON for diff: %v\nvalue: %#v", i, err, got)
}
prettyWant, err := json.MarshalIndent(want, "", " ")
if err != nil {
t.Fatalf("line %d: failed to marshal expected JSON for diff: %v\nvalue: %#v", i, err, want)
}

Copilot uses AI. Check for mistakes.
Comment on lines +72 to +84
entryType := unquote(raw["type"])
if droppedTypes[entryType] {
return nil
}

switch entryType {
case "assistant":
return convertAssistant(raw, opts)
case "user":
return convertUser(raw, opts)
default:
return nil // drop unknown types in the new format
}
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convertLine only reads the entry type from the type field. Cursor transcripts use role (and the transcript package already normalizes role→type in parse.go), so compaction would silently drop all Cursor lines. Consider falling back to role when type is missing/empty (or reusing the existing normalization logic) so both formats are supported.

Copilot uses AI. Check for mistakes.
Comment on lines +183 to +224
// extractUserContent separates user message content into text and tool_result entries.
func extractUserContent(contentRaw json.RawMessage) (string, []toolResultEntry) {
// String content
var str string
if json.Unmarshal(contentRaw, &str) == nil {
return str, nil
}

// Array content
var blocks []map[string]json.RawMessage
if json.Unmarshal(contentRaw, &blocks) != nil {
return "", nil
}

var texts []string
var toolResults []toolResultEntry

for _, block := range blocks {
blockType := unquote(block["type"])

if blockType == "tool_result" {
toolResults = append(toolResults, toolResultEntry{
toolUseID: unquote(block["tool_use_id"]),
})
continue
}

if blockType == "text" {
texts = append(texts, unquote(block["text"]))
}
}

text := ""
if len(texts) > 0 {
text = texts[0]
for i := 1; i < len(texts); i++ {
text += "\n\n" + texts[i]
}
}

return text, toolResults
}
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User text extraction here doesn’t apply textutil.StripIDEContextTags, so IDE/system tags like <ide_opened_file> or Cursor’s <user_query> wrappers can leak into the compacted transcript. The rest of the codebase relies on transcript.ExtractUserContent / StripIDEContextTags to remove these; this function should strip them too (after joining text blocks).

Copilot uses AI. Check for mistakes.
Comment on lines +285 to +313
// marshalOrdered produces a JSON object with keys in the given order.
// Pairs with nil values are omitted.
func marshalOrdered(pairs ...interface{}) []byte {
var buf bytes.Buffer
buf.WriteByte('{')
first := true
for i := 0; i < len(pairs)-1; i += 2 {
key := pairs[i].(string)
val, _ := pairs[i+1].(json.RawMessage)
if val == nil {
continue
}
if !first {
buf.WriteByte(',')
}
keyJSON, _ := json.Marshal(key)
buf.Write(keyJSON)
buf.WriteByte(':')
buf.Write(val)
first = false
}
buf.WriteByte('}')
return buf.Bytes()
}

func mustMarshal(v interface{}) json.RawMessage {
b, _ := json.Marshal(v)
return b
}
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

marshalOrdered / mustMarshal currently ignore encoding/json errors and also ignore the ok result of a type assertion (val, _ := ...). With errchkjson and errcheck (check-type-assertions/check-blank) enabled in this repo, this will fail lint. Recommend changing these helpers to either (1) avoid json.Marshal for constant keys, and (2) return errors instead of discarding them, and (3) avoid comma-ok assertions if you’re not going to handle the boolean.

Copilot uses AI. Check for mistakes.
Comment on lines +43 to +46
lineBytes, err := reader.ReadBytes('\n')
if err != nil && err != io.EOF {
return nil, err
}
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compact returns the raw ReadBytes error directly. wrapcheck is enabled and elsewhere in this package errors are wrapped with context (e.g., parse.go). Consider wrapping this with a message like “failed to read transcript” to satisfy linting and improve diagnosability.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants