refactor: move loghub loader into integration_test and fix lint issues#9
Move pkg/test/loghub/ to integration_test/loghub/ since it is only used by integration tests. Unexport FileIngestor and Ingestor interface as internal implementation details behind the Ingest() function.
Fix 13 pre-existing golangci-lint issues:
- Replace fmt.Errorf with errors.Errorf in analyze/debug commands
- Add nolint directives for vendored Datadog multiline code
- Fix weakCond in tokengraph (nil check → len check)
- Remove unused parameter from truncate helper in tests
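The nil check → len check fix can be illustrated with a minimal sketch. The function names below are hypothetical and not taken from tokengraph; the point is only that a nil guard still lets a non-nil empty slice through to the index expression, while a length guard covers both cases.

```go
package main

import "fmt"

// firstTokenNilCheck guards with a nil check only (hypothetical example).
// A non-nil but empty slice passes the guard, so tokens[0] panics.
func firstTokenNilCheck(tokens []string) string {
	if tokens != nil {
		return tokens[0]
	}
	return ""
}

// firstTokenLenCheck guards with len, which is 0 for nil and empty slices alike.
func firstTokenLenCheck(tokens []string) string {
	if len(tokens) > 0 {
		return tokens[0]
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", firstTokenLenCheck(nil))
	fmt.Printf("%q\n", firstTokenLenCheck([]string{}))
	fmt.Printf("%q\n", firstTokenLenCheck([]string{"GET", "/health"}))
}
```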
Stdin piping via "-" is not needed. Simplify fileIngestor and readLines to always open a file path directly, and remove related docs/examples.
gosec was producing only false positives (file permissions on temp workspace files, safe integer interpolation, bounded array loops, etc.) requiring nolint comments on every call site. Disable it entirely.
…linters
Remove remaining nolint:gocritic, nolint:gocyclo, nolint:exhaustive, and nolint:gosec comments. Add path-based exclusion in .golangci.yml for pkg/multiline/ (vendored from Datadog agent) instead.
Delete the query CLI command and the pkg/querier thin wrapper. Callers now use store methods directly. Rename PatternInput.PatternID to PatternUUID for clarity. Update CLAUDE.md to reflect removed module.
Remove Parser interface, JSONParser, and ChainParser (only DrainParser exists). Replace per-line Parse() with batch Feed([]string) and Templates() returning DrainCluster with Count. Add MatchTemplate() helper for re-matching lines against discovered templates.
…ing, add extra delimiters
- Change DrainCluster.ID from string to uuid.UUID for type safety
- Rename PatternID to PatternUUIDString at DB/serialization boundaries
- Add pipe, equals, comma as extra Drain delimiters for better tokenization
- Update MatchTemplate to use same tokenization as Drain
- Fix integration tests to assign pattern IDs via MatchTemplate before storing
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5302a90920
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Replace the pattern_id VARCHAR column on log_entries with a labels JSON
column containing {"pattern": "<semantic-id>"}. Rewrite the ingest flow
to first discover and label patterns in memory, then store log entries
with matched pattern labels in a second pass. Update all SQL queries to
use json_extract_string for label access and JOIN via semantic_id.
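As a rough sketch of the label format described above, the snippet below encodes the labels value for one log entry and shows one possible query shape. The helper name `encodeLabels` and the exact query text are assumptions for illustration; only the `labels` column, the `log_entries` table, the `"pattern"` key, and the use of `json_extract_string` come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patternCountQuery is an assumed query shape, not the project's actual SQL.
// It reads the semantic ID out of the labels JSON column.
const patternCountQuery = `
SELECT json_extract_string(labels, '$.pattern') AS semantic_id, COUNT(*) AS n
FROM log_entries
GROUP BY semantic_id`

// encodeLabels builds the JSON stored in the labels column for one entry
// (hypothetical helper).
func encodeLabels(semanticID string) (string, error) {
	b, err := json.Marshal(map[string]string{"pattern": semanticID})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	labels, err := encodeLabels("disk-full")
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(labels)
	fmt.Println(patternCountQuery)
}
```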
…ler to semantic
- Move workspace builder from pkg/analyzer into pkg/analyzer/workspace subpackage
- Replace inline fmt.Fprintf formatting with embedded Go text/template files
- Add AGENTS.md.tmpl, summary.txt.tmpl, errors.txt.tmpl rendered via template.ExecuteTemplate
- Rename pkg/labeler to pkg/semantic to better reflect its purpose
…tmpl files
ParseFS strips the directory prefix when registering template names, so ExecuteTemplate should use bare filenames like "summary.txt.tmpl" instead of "templates/summary.txt.tmpl".
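The naming behavior can be reproduced in isolation. This sketch uses an in-memory `fstest.MapFS` as a stand-in for the embedded filesystem, and the template content and the `renderSummary` helper are made up for illustration; only the file name and directory come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
	"testing/fstest"
	"text/template"
)

// renderSummary parses every .tmpl file under templates/ and executes the
// template registered under name (hypothetical helper).
func renderSummary(name string, data any) (string, error) {
	// In-memory stand-in for an embed.FS; the template body is illustrative.
	fsys := fstest.MapFS{
		"templates/summary.txt.tmpl": {Data: []byte("patterns: {{.Count}}\n")},
	}
	tmpl, err := template.ParseFS(fsys, "templates/*.tmpl")
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.ExecuteTemplate(&sb, name, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	data := map[string]int{"Count": 3}

	// The bare filename is the registered template name.
	out, err := renderSummary("summary.txt.tmpl", data)
	fmt.Printf("%q %v\n", out, err)

	// The path with its directory prefix is not registered, so this fails.
	_, err = renderSummary("templates/summary.txt.tmpl", data)
	fmt.Println(err != nil)
}
```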
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 77955a5996
Store pattern_id (UUID) in log entry labels alongside semantic_id and join PatternSummaries on the UUID primary key. This prevents inflated counts when multiple patterns share the same LLM-generated semantic_id. Also add TestMain to integration tests to auto-load .env.test via godotenv, removing the need to pass env vars on the command line.
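A small in-memory sketch of why the join key matters. The types, the `"pattern_id"` label key, and the sample data are assumptions for illustration and do not mirror the real store code; the nested loops stand in for a SQL join.

```go
package main

import "fmt"

// Pattern and Entry are hypothetical stand-ins for stored rows.
type Pattern struct {
	UUID       string
	SemanticID string
}

type Entry struct {
	Labels map[string]string
}

// countBySemanticID joins entries to patterns on the semantic ID label.
// Every pattern sharing a semantic ID picks up all of that ID's entries.
func countBySemanticID(patterns []Pattern, entries []Entry) map[string]int {
	counts := map[string]int{}
	for _, p := range patterns {
		for _, e := range entries {
			if e.Labels["pattern"] == p.SemanticID {
				counts[p.UUID]++
			}
		}
	}
	return counts
}

// countByUUID joins on the UUID label, so each entry counts toward one pattern.
func countByUUID(patterns []Pattern, entries []Entry) map[string]int {
	counts := map[string]int{}
	for _, p := range patterns {
		for _, e := range entries {
			if e.Labels["pattern_id"] == p.UUID {
				counts[p.UUID]++
			}
		}
	}
	return counts
}

func main() {
	// Two distinct patterns that were given the same semantic ID.
	patterns := []Pattern{
		{UUID: "p1", SemanticID: "conn-timeout"},
		{UUID: "p2", SemanticID: "conn-timeout"},
	}
	entries := []Entry{
		{Labels: map[string]string{"pattern": "conn-timeout", "pattern_id": "p1"}},
		{Labels: map[string]string{"pattern": "conn-timeout", "pattern_id": "p1"}},
		{Labels: map[string]string{"pattern": "conn-timeout", "pattern_id": "p1"}},
		{Labels: map[string]string{"pattern": "conn-timeout", "pattern_id": "p2"}},
	}
	fmt.Println(countBySemanticID(patterns, entries)) // each pattern is credited with all 4 entries
	fmt.Println(countByUUID(patterns, entries))       // 3 for p1, 1 for p2
}
```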
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e9f4ff1764
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 117f919b13
    if tpl, ok := pattern.MatchTemplate(ml.Content, templates); ok {
        if sid, found := semanticIDMap[tpl.ID.String()]; found {
            entry.Labels = map[string]string{
Match against persisted templates before writing labels
discoverAndSavePatterns only records templates with Count > 1 in semanticIDMap, but storeLogsWithLabels still calls pattern.MatchTemplate over the full template set and only writes labels if that first match is present in the map. When overlapping templates exist, a line can first-match a singleton template (filtered out) while also matching a persisted template, which leaves entry.Labels empty and causes that log row to disappear from PatternSummaries despite a valid persisted pattern being available. This undercounts templates in round 2; matching should be restricted to persisted templates (or retry other matches) before skipping labels.
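One way to read the suggestion is to filter the template set down to persisted templates before matching. The sketch below uses hypothetical types and a simplified whitespace tokenizer, not the project's `pattern.MatchTemplate` or Drain tokenization, and it assumes that overlapping templates can occur as the review describes.

```go
package main

import (
	"fmt"
	"strings"
)

// Template is a hypothetical stand-in for a discovered cluster.
type Template struct {
	ID     string
	Tokens []string // "<*>" marks a wildcard position
	Count  int
}

// matches reports whether line fits the template token by token.
func matches(line string, t Template) bool {
	fields := strings.Fields(line)
	if len(fields) != len(t.Tokens) {
		return false
	}
	for i, tok := range t.Tokens {
		if tok != "<*>" && tok != fields[i] {
			return false
		}
	}
	return true
}

// firstMatch returns the first template, in slice order, that matches line.
func firstMatch(line string, templates []Template) (Template, bool) {
	for _, t := range templates {
		if matches(line, t) {
			return t, true
		}
	}
	return Template{}, false
}

// persistedOnly keeps the templates that have an entry in semanticIDs,
// i.e. the ones that were actually saved.
func persistedOnly(templates []Template, semanticIDs map[string]string) []Template {
	var kept []Template
	for _, t := range templates {
		if _, ok := semanticIDs[t.ID]; ok {
			kept = append(kept, t)
		}
	}
	return kept
}

// labelFor returns the semantic ID for line, matching persisted templates only.
func labelFor(line string, templates []Template, semanticIDs map[string]string) (string, bool) {
	t, ok := firstMatch(line, persistedOnly(templates, semanticIDs))
	if !ok {
		return "", false
	}
	return semanticIDs[t.ID], true
}

func main() {
	templates := []Template{
		{ID: "a", Tokens: []string{"disk", "full", "on", "node7"}, Count: 1},
		{ID: "b", Tokens: []string{"disk", "full", "on", "<*>"}, Count: 42},
	}
	semanticIDs := map[string]string{"b": "disk-full"} // only Count > 1 was saved
	line := "disk full on node7"

	// Matching the full set hits the singleton first, which has no semantic ID.
	first, _ := firstMatch(line, templates)
	_, labeled := semanticIDs[first.ID]
	fmt.Println(first.ID, labeled)

	// Matching persisted templates only finds the saved pattern.
	sid, ok := labelFor(line, templates, semanticIDs)
	fmt.Println(sid, ok)
}
```

Filtering once before the storage pass, rather than on every line as in this sketch, would avoid rebuilding the slice per log entry.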
Move pkg/test/loghub/ to integration_test/loghub/ since it is only used
by integration tests. Unexport FileIngestor and Ingestor interface as
internal implementation details behind the Ingest() function.
Fix 13 pre-existing golangci-lint issues: