Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2ec9c20838
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```go
if end-start < m.minimumTokenLength {
	return matchContext{}
```
Handle short timestamp prefixes in token graph matching
matchProbability enforces end-start >= minimumTokenLength, but end-start counts token transitions, not tokens, so the gate effectively requires one extra token and makes short known prefixes impossible to match. This commit explicitly adds "11:42:35.173" and "11:42:35,173" to pkg/multiline/detector.go, yet those prefixes tokenize to only 7 tokens (6 transitions), so they never satisfy the gate. As a result, IsNewEntry stays false for logs that start with time-only timestamps, and those logs are over-merged into large multiline entries instead of preserving entry boundaries.
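The off-by-one can be illustrated in a few lines. This is a hedged sketch of the arithmetic only: the tokenization of "11:42:35.173" shown here is illustrative, not the actual detector's, and `minimumTokenLength` is assumed to be 7 for the example.

```go
package main

import "fmt"

func main() {
	// "11:42:35.173" -> D : D : D . D  (7 tokens, 6 transitions)
	tokens := []string{"D", ":", "D", ":", "D", ".", "D"}
	const minimumTokenLength = 7

	start, end := 0, len(tokens)-1 // indices into the token sequence
	transitions := end - start     // counts edges between tokens: 6

	fmt.Println(transitions < minimumTokenLength)  // true: the gate rejects the prefix
	fmt.Println(len(tokens) >= minimumTokenLength) // true: counting tokens would accept it
}
```

Comparing a transition count against a token-count threshold is the mismatch: either the gate should compare `end-start+1`, or the threshold should be expressed in transitions.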
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Missing schema migration for existing databases
- Added ALTER TABLE statement with IF NOT EXISTS to properly add the end_line_number column to existing databases that were created before the multiline feature.
Or push these changes by commenting:
@cursor push 31a71f14f4
Preview (31a71f14f4)
```diff
diff --git a/pkg/store/duckdb.go b/pkg/store/duckdb.go
--- a/pkg/store/duckdb.go
+++ b/pkg/store/duckdb.go
@@ -57,6 +57,11 @@
 		}
 	}
+	_, err = s.db.Exec(`ALTER TABLE log_entries ADD COLUMN IF NOT EXISTS end_line_number INTEGER`)
+	if err != nil {
+		return fmt.Errorf("add end_line_number column: %w", err)
+	}
+
 	return nil
 }
```

```go
		return fmt.Errorf("create table: %w", err)
	}
}
```
Missing schema migration for existing databases
Medium Severity
The Init() comment claims it "migrates older schemas by adding missing columns," but the implementation only uses CREATE TABLE IF NOT EXISTS, which does nothing when the table already exists. If a user runs against an existing database created before this change, the end_line_number column won't be added, and all subsequent INSERT and SELECT statements referencing end_line_number will fail with SQL errors.
Add a multiline merger that detects log entry boundaries using Datadog-style timestamp auto-detection (tokenizer + token graph), so stack traces and other multiline entries are combined before parsing. Detection code adapted from Datadog Agent (Apache 2.0, see NOTICE). Merger and CLI wiring are self-written. Pipeline change: Ingestor → multiline.Merge() → parse loop → Store
Covers the full path: ingestor → multiline.Merge → parser chain → DuckDB store → query verification. Replaces the manual test steps in the PR.
…, and endLine tracking
- Fall back to line-by-line output when no timestamp is ever detected, preventing silent concatenation of unrelated log lines
- Recompute bufBytes from current line after overflow flush instead of carrying stale accumulated size
- Check overflow and flush before updating endLine to avoid overlapping line ranges in emitted entries
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
```go
}

	return true
}
```
Dead production code only used in tests
Low Severity
Both tokensToString and isMatch are defined in tokenizer.go (production code) but are never called from any production code path — the only callers are in tokenizer_test.go. Defining test helpers in non-test files means they are compiled into the production binary unnecessarily and can mislead developers into thinking these functions are part of the active logic. In Go, helpers needed only by tests are conventionally placed in _test.go files so they are excluded from production builds.
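The convention the review points to: files ending in `_test.go` are compiled only by `go test` and excluded from `go build`, so test-only helpers defined there never reach the production binary. A hedged sketch of what the relocated helpers might look like (in the real repo this would be `pkg/multiline/tokenizer_test.go` in package multiline; the bodies below are assumptions, only the names come from the review):

```go
package main

import (
	"fmt"
	"strings"
)

// tokensToString renders a token slice for test assertions (body assumed).
// Moved into a _test.go file, it would be dropped from production builds.
func tokensToString(tokens []string) string {
	return strings.Join(tokens, " ")
}

// isMatch reports whether two token sequences are identical (body assumed).
func isMatch(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(tokensToString([]string{"D", ":", "D"})) // D : D
	fmt.Println(isMatch([]string{"D"}, []string{"D"}))   // true
}
```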



Summary
- New `pkg/multiline` package with Datadog-style timestamp auto-detection to identify log entry boundaries (tokenizer → token graph → probability matching against 70+ known timestamp formats)
- The `ingest`, `analyze`, and `debug workspace` flows run logs through the merger
- New `EndLineNumber` field in the store schema to track multiline entry spans

Test plan
- Unit tests (`go test ./pkg/...`)
- Build (`go build ./cmd/lapp/`)
- Manual ingest: `go run ./cmd/lapp/ ingest pkg/multiline/testdata/java_stacktrace.log --db /tmp/test.duckdb`

Note
Medium Risk
Moderate risk because it changes how logs are chunked and persisted (multiline merging + DB schema change), which can affect downstream parsing/pattern discovery and any existing DuckDB databases or queries.
Overview
Adds first-class multiline log entry detection and merging so stack traces/tracebacks/panic dumps are treated as a single logical entry instead of many lines.
All three CLI flows (`ingest`, `analyze`, `debug workspace`) now run logs through the new `pkg/multiline` detector + merger before parsing/analysis, and ingestion persists merged entries with a new `end_line_number` column / `EndLineNumber` field to record the original span. The PR also includes extensive unit/integration tests plus a `NOTICE` file documenting Apache-licensed Datadog-derived detector components.

Written by Cursor Bugbot for commit 87d79b6. This will update automatically on new commits.