Context: Why Logs, Why Now
Two tectonic shifts are reshaping how engineers understand production systems:
- LLMs can now process unstructured text at scale — what was previously logs' greatest weakness (no schema, no structure) is now a strength. AI excels at extracting meaning from natural language.
- The "Logs Are All You Need" thesis is gaining traction — the idea that logs are a superset of metrics and traces (metrics = aggregated events, traces = collections of start/end events, logs = raw events) is converging from multiple independent sources.
Intellectual Lineage
| Source | Core Argument |
|---|---|
| Honeycomb / Charity Majors — "Observability 2.0" | Build on "arbitrarily-wide structured log events" as a single source of truth. "The bridge from Observability 1.0 to 2.0 is made up of logs, not metrics." |
| Ivan Burmistrov — "All you need is Wide Events" | Traces, logs, and metrics are all special cases of "wide events." Unify on this primitive instead of maintaining three separate pipelines. |
| Sazabi Manifesto — "Logs Are All You Need" | "With logs, you can reconstruct metrics and traces, giving you three 'pillars' for the price of one." AI makes this viable now. Monitoring is dead; agentic anomaly detection is the future. |
| GreptimeDB | "Metrics, logs, and traces aren't separate storage systems but different query views of the same underlying data." |
What LAPP Is
LAPP (Log Auto Pattern Pipeline) is a CLI tool that automatically discovers log patterns, semantifies them with LLMs, and provides structured viewing of log streams.
Core Design Principles
- Cluster first, LLM second — Drain/Grok/JSON parsers discover templates cheaply (no API cost). LLM is used once per session to semantify templates, not per log line.
- Progressive, not perfect — Discover patterns, inspect leftover, iterate. Zero information loss.
- Composable — Unix-pipe friendly. Reads stdin, outputs structured results. Fits into existing workflows.
Architecture (Current)
stdin / log file
│
▼
Ingestor (streaming LogLine channel)
│
▼
Parser Chain (first match wins)
├─ JSONParser → detects JSON objects, extracts message/keys
├─ GrokParser → SYSLOG, Apache common/combined patterns
├─ DrainParser → online clustering via go-drain3
└─ (LLM stub) → placeholder for future direct LLM parsing
│
▼
DuckDB Store
├─ log_entries (line_number, raw, template_id, template)
└─ (planned) semantic_labels (template_id, semantic_id, description)
│
├──▶ Query CLI (templates, query by template/time)
├──▶ Analyzer Agent (eino ADK + OpenRouter, workspace-based)
└──▶ (planned) Log Viewer (color-coded templates, leftover)
This follows the IBM "Label Broadcasting" pattern: cluster first (90%+ volume reduction), apply LLM to cluster representatives, broadcast labels back. Result: 99.7% reduction in inference cost compared to per-line LLM calls.
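The first-match-wins chain above can be sketched as follows. This is a minimal illustration, not LAPP's actual Go API: the `Parser` interface, `Chain` type, and the stand-in `jsonParser` are all assumed names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ParseResult is an illustrative stand-in for a parser's output.
type ParseResult struct {
	TemplateID string
	Template   string
}

// Parser reports whether it recognizes a line; names are illustrative.
type Parser interface {
	Parse(line string) (ParseResult, bool)
}

// Chain tries each parser in order; the first match wins, and lines no
// parser matches fall into the leftover bucket.
type Chain struct{ parsers []Parser }

func (c *Chain) Parse(line string) (ParseResult, bool) {
	for _, p := range c.parsers {
		if r, ok := p.Parse(line); ok {
			return r, true
		}
	}
	return ParseResult{}, false // leftover: no parser matched
}

// jsonParser stands in for the real JSONParser: it matches any line
// that parses as a JSON object.
type jsonParser struct{}

func (jsonParser) Parse(line string) (ParseResult, bool) {
	var obj map[string]any
	if json.Unmarshal([]byte(line), &obj) == nil {
		return ParseResult{TemplateID: "json", Template: "JSON object"}, true
	}
	return ParseResult{}, false
}

func main() {
	chain := &Chain{parsers: []Parser{jsonParser{}}}
	r, ok := chain.Parse(`{"msg":"hello"}`)
	fmt.Println(ok, r.TemplateID) // true json
}
```

A real chain would hold the Grok and Drain parsers after the JSON one, so cheap structural checks run before online clustering.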
Progress
✅ Phase 0: Parser Pipeline (Done)
- Multi-strategy parser chain: JSON → Grok → Drain (first match wins)
- Streaming ingestor (file or stdin)
- DuckDB storage with batch insert
- Query layer: template summaries, filter by template ID / time range
- CLI: `ingest`, `templates`, `query`
- Unit tests for all packages
- Integration tests across all 14 Loghub-2.0 datasets
✅ Phase 0.5: Agentic Analyzer (Done)
- Workspace builder: generates summary.txt + errors.txt from parsed logs
- AI agent via cloudwego/eino ADK + OpenRouter
- Filesystem tools (grep, read_file, execute) for agent investigation
- CLI: `analyze`, `debug workspace`, `debug run`
🔲 Phase 1: LLM Semantic Labeling (Next)
Give every discovered template a human-readable semantic ID and description.
- Add `semantic_labels` table in DuckDB (template_id → semantic_id, description)
- After ingest, collect all templates + sample lines
- Single LLM call: input templates → output semantic IDs and descriptions
- Store labels in DuckDB, enrich query output
- Update `templates` CLI to show semantic info
Before: `D1: "Starting <*> on port <*>"` / `D2: "Connection to <*> timed out after <*> ms"`
After: `server-startup: "Server starting on a port"` / `connection-timeout: "DB connection timeout"`
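The labeling step can be sketched as follows. The struct fields mirror the planned semantic_labels columns, but the names are illustrative assumptions, not LAPP's actual types.

```go
package main

import "fmt"

// SemanticLabel is a hypothetical shape for one row of the planned
// semantic_labels table.
type SemanticLabel struct {
	TemplateID  string // e.g. "D1"
	SemanticID  string // e.g. "server-startup"
	Description string
}

// broadcast indexes LLM-produced labels by template ID so one label can
// be applied to every log line in that template's cluster — the
// "Label Broadcasting" step.
func broadcast(labels []SemanticLabel) map[string]SemanticLabel {
	byTemplate := make(map[string]SemanticLabel, len(labels))
	for _, l := range labels {
		byTemplate[l.TemplateID] = l
	}
	return byTemplate
}

func main() {
	labels := []SemanticLabel{
		{TemplateID: "D1", SemanticID: "server-startup", Description: "Server starting on a port"},
		{TemplateID: "D2", SemanticID: "connection-timeout", Description: "DB connection timeout"},
	}
	fmt.Println(broadcast(labels)["D2"].SemanticID) // connection-timeout
}
```

Because the LLM sees only the template list (not raw lines), the cost of this call stays constant as log volume grows.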
🔲 Phase 2: Log Viewer
Color-coded log viewer with template filtering and leftover highlighting.
- Each semantic template gets a distinct color
- Filter by template (show only "connection-timeout" logs)
- Unmatched/leftover logs shown in gray
- TUI (bubbletea/lipgloss) or simple web UI
🔲 Phase 3: Iterative Refinement
- Inspect leftover (unmatched) bucket
- Re-run discovery on leftover lines
- Add new patterns without losing existing ones
- Progressive refinement loop
🔲 Phase 4: Substream Intelligence
- Per-template rate/volume statistics over time
- Trend detection (is this error pattern increasing?)
- Anomaly detection (volume spike detection)
- Time-window queries ("what happened around 14:30?")
🔲 Phase 5: Pipeline & Integration
- Real-time streaming
- Pipeline-as-config (YAML/JSON)
- Integration with log sources (k8s, Docker, journald)
- Output to structured formats (JSONL, OpenTelemetry)
- MCP server for LLM agent access
Tech Stack
| Component | Technology |
|---|---|
| Language | Go |
| CLI | cobra |
| Log Parsing | go-drain3, trivago/grok, JSON detection |
| Storage | DuckDB (via duckdb-go/v2) |
| LLM Agent | cloudwego/eino ADK + OpenRouter |
| LLM Model | google/gemini-3-flash-preview (configurable) |
Evolution from Original Prototype
The original prototype (v0.1) used Bun + Vercel AI SDK + Zod with an "LLM generates regex" approach. The current implementation (v0.2+) switched to Go with a "cluster first, LLM second" approach:
| Aspect | v0.1 (prototype) | v0.2+ (current) |
|---|---|---|
| Pattern discovery | LLM generates regex directly | Drain/Grok/JSON clustering (free, fast) |
| LLM role | Generate regex patterns | Semantify discovered templates |
| LLM cost | Per-discovery session, all lines sampled | One call per session, templates only |
| Storage | JSON files | DuckDB (queryable, scalable) |
| Runtime | Bun (TypeScript) | Go |
The key insight from v0.1 — "use LLM once, then execute cheaply at scale" — is preserved. The LLM's role shifted from "generate regex" to "label templates", which is more cost-effective and reliable.
Differentiation
vs. Classical Log Parsers (Drain3, Logram)
Drain3 produces cryptic templates like `User <*> logged in from <*>`. LAPP adds an LLM semantification layer: each template gets a human-readable ID and description. Drain does the heavy lifting; the LLM provides the understanding.
vs. Commercial Tools (Datadog, Splunk, Elastic)
| Dimension | Commercial Tools | LAPP |
|---|---|---|
| Cost | $0.10–$1.80+/GB | Free, open source |
| Data residency | Logs sent to vendor cloud | Logs stay local |
| Lock-in | Patterns locked to vendor | DuckDB + portable labels |
| Setup | Full platform deployment | `cat logs.txt \| lapp ingest -` |
vs. k8sgpt
k8sgpt diagnoses Kubernetes resource issues. LAPP is log-format agnostic — works with any text log from any source.
References
- IBM: Scalable LLM-Based Log Analytics (arXiv 2511.14803) — "Label Broadcasting" technique
- LLMParser (ICSE 2024) — 0.96 average parsing accuracy
- SoK: LLM-based Log Parsing (2025) — systematic review of 29 papers
- Drain3 (logpai) — production streaming log template miner
- k8sgpt — LLM-powered Kubernetes diagnostics
- Sazabi Manifesto — "Logs Are All You Need"
- Honeycomb: Observability 2.0