
feat: add multiline log support #5

Merged
STRRL merged 4 commits into master from multiline-log on Feb 27, 2026

Conversation

@STRRL (Owner) commented Feb 27, 2026

Summary

  • Add pkg/multiline/ package with Datadog-style timestamp auto-detection to identify log entry boundaries (tokenizer → token graph → probability matching against 70+ known timestamp formats)
  • Multiline merger combines continuation lines (stack traces, tracebacks, goroutine dumps) into single logical entries before parsing
  • Wire multiline merging into all three CLI paths: ingest, analyze, and debug workspace
  • Add EndLineNumber field to store schema to track multiline entry spans
  • Detection code adapted from Datadog Agent (Apache 2.0, see NOTICE file); merger and CLI wiring are self-written

Test plan

  • 18 unit tests for tokenizer, token graph, detector, and merger
  • 5 integration tests covering Java stack traces, Python tracebacks, Go panics, mixed timestamp formats, and single-line passthrough
  • All existing tests pass unchanged (go test ./pkg/...)
  • Build succeeds (go build ./cmd/lapp/)
  • Manual test: go run ./cmd/lapp/ ingest pkg/multiline/testdata/java_stacktrace.log --db /tmp/test.duckdb
  • Verify stack trace stored as single entry, single-line logs unaffected

Note

Medium Risk
Moderate risk because it changes how logs are chunked and persisted (multiline merging + DB schema change), which can affect downstream parsing/pattern discovery and any existing DuckDB databases or queries.

Overview
Adds first-class multiline log entry detection and merging so stack traces/tracebacks/panic dumps are treated as a single logical entry instead of many lines.

All three CLI flows (ingest, analyze, debug workspace) now run logs through the new pkg/multiline detector+merger before parsing/analysis, and ingestion persists merged entries with a new end_line_number column/EndLineNumber field to record the original span. The PR also includes extensive unit/integration tests plus a NOTICE file documenting Apache-licensed Datadog-derived detector components.

Written by Cursor Bugbot for commit 87d79b6. This will update automatically on new commits.


@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ec9c20838


Comment on lines +65 to +66
if end-start < m.minimumTokenLength {
	return matchContext{}
}


P1: Handle short timestamp prefixes in token graph matching

matchProbability enforces end-start >= minimumTokenLength, but end-start counts token transitions, not tokens, so the gate effectively requires one extra token and makes short known prefixes impossible to match. This commit explicitly includes "11:42:35.173" and "11:42:35,173" in pkg/multiline/detector.go, yet those prefixes tokenize to only 7 tokens (6 transitions), so they never satisfy the gate: IsNewEntry stays false for logs that start with time-only timestamps, and those logs are over-merged into large multiline entries instead of preserving entry boundaries.


@cursor bot left a comment
Cursor Bugbot has reviewed your changes and found 1 potential issue.


Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Missing schema migration for existing databases
    • Added ALTER TABLE statement with IF NOT EXISTS to properly add the end_line_number column to existing databases that were created before the multiline feature.

Preview (31a71f14f4)
diff --git a/pkg/store/duckdb.go b/pkg/store/duckdb.go
--- a/pkg/store/duckdb.go
+++ b/pkg/store/duckdb.go
@@ -57,6 +57,11 @@
 		}
 	}
 
+	_, err = s.db.Exec(`ALTER TABLE log_entries ADD COLUMN IF NOT EXISTS end_line_number INTEGER`)
+	if err != nil {
+		return fmt.Errorf("add end_line_number column: %w", err)
+	}
+
 	return nil
 }

		return fmt.Errorf("create table: %w", err)
	}
}


Missing schema migration for existing databases

Medium Severity

The Init() comment claims it "migrates older schemas by adding missing columns," but the implementation only uses CREATE TABLE IF NOT EXISTS, which does nothing when the table already exists. If a user runs against an existing database created before this change, the end_line_number column won't be added, and all subsequent INSERT and SELECT statements referencing end_line_number will fail with SQL errors.


Add a multiline merger that detects log entry boundaries using
Datadog-style timestamp auto-detection (tokenizer + token graph),
so stack traces and other multiline entries are combined before parsing.

Detection code adapted from Datadog Agent (Apache 2.0, see NOTICE).
Merger and CLI wiring are self-written.

Pipeline change: Ingestor → multiline.Merge() → parse loop → Store
Covers the full path: ingestor → multiline.Merge → parser chain → DuckDB
store → query verification. Replaces the manual test steps in the PR.
…, and endLine tracking

- Fall back to line-by-line output when no timestamp is ever detected,
  preventing silent concatenation of unrelated log lines
- Recompute bufBytes from current line after overflow flush instead of
  carrying stale accumulated size
- Check overflow and flush before updating endLine to avoid overlapping
  line ranges in emitted entries
@STRRL STRRL merged commit cb39894 into master Feb 27, 2026
3 checks passed
@STRRL STRRL deleted the multiline-log branch February 27, 2026 20:21

@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


}

return true
}

Dead production code only used in tests

Low Severity

Both tokensToString and isMatch are defined in tokenizer.go (production code) but are never called from any production code path — the only callers are in tokenizer_test.go. Defining test helpers in non-test files means they are compiled into the production binary unnecessarily and can mislead developers into thinking these functions are part of the active logic. In Go, helpers needed only by tests are conventionally placed in _test.go files so they are excluded from production builds.

