fix: resolve all 18 docs-as-spec test failures #424
Conversation
Check result.success and compare fixed_sql against original_sql before displaying the auto-fix section. Previously, fixed_sql was always truthy (a string even when unchanged), causing the tool to show the original broken SQL under an "Auto-Fixed SQL" heading. Fixes TICKET-001 in #422. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
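The check can be sketched as follows; the FixResult shape and helper name are illustrative, not the tool's actual API:

```typescript
// Hypothetical result shape; field names mirror those mentioned in the fix.
interface FixResult {
  success: boolean
  fixed_sql: string
  original_sql: string
}

// Show the "Auto-Fixed SQL" section only when the fix succeeded AND the SQL
// actually changed; a truthy-but-identical fixed_sql no longer slips through.
function shouldShowAutoFix(result: FixResult): boolean {
  return result.success && result.fixed_sql.trim() !== result.original_sql.trim()
}
```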
The handler was returning {diff, equivalent} but the tool expected
{has_changes, unified_diff, additions, deletions, change_count, similarity}.
Since has_changes was always undefined (falsy), the tool always reported
"The two SQL queries are identical."
Now returns proper SqlDiffResult fields with token-aware comparison that
handles single-line SQL correctly.
Fixes TICKET-002 in #422.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
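The token-aware comparison can be sketched like this. Only the SqlDiffResult field names (has_changes, additions, deletions, change_count, similarity) come from the commit; the tokenizer and the Dice-coefficient helper are assumptions about the approach:

```typescript
// Lowercase and split on whitespace so single-line SQL compares correctly
// against its formatted multi-line equivalent.
function tokenize(sql: string): string[] {
  return sql.toLowerCase().split(/\s+/).filter(Boolean)
}

// Count character bigrams of a string.
function bigrams(s: string): Map<string, number> {
  const m = new Map<string, number>()
  for (let i = 0; i < s.length - 1; i++) {
    const b = s.slice(i, i + 2)
    m.set(b, (m.get(b) ?? 0) + 1)
  }
  return m
}

// Dice coefficient over character bigrams: 2 * |intersection| / (|A| + |B|).
function diceSimilarity(a: string, b: string): number {
  if (a === b) return 1
  const ba = bigrams(a)
  const bb = bigrams(b)
  let overlap = 0
  let total = 0
  for (const [bg, n] of ba) {
    overlap += Math.min(n, bb.get(bg) ?? 0)
    total += n
  }
  for (const n of bb.values()) total += n
  return total === 0 ? 0 : (2 * overlap) / total
}

function diffSql(oldSql: string, newSql: string) {
  const oldTokens = tokenize(oldSql)
  const newTokens = tokenize(newSql)
  const oldSet = new Set(oldTokens)
  const newSet = new Set(newTokens)
  const additions = newTokens.filter((t) => !oldSet.has(t)).length
  const deletions = oldTokens.filter((t) => !newSet.has(t)).length
  return {
    has_changes: oldTokens.join(" ") !== newTokens.join(" "),
    additions,
    deletions,
    change_count: additions + deletions,
    similarity: diceSimilarity(oldTokens.join(" "), newTokens.join(" ")),
  }
}
```

Normalizing whitespace before comparing is what keeps a purely reformatted query from being reported as changed.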
Added TypeScript-level detection for YEAR(col) = NNNN and MONTH(col) = N patterns that prevent index/partition pruning. The Rust core.rewrite() doesn't handle these cases, so we augment with TS-level rewrites that convert function-wrapped predicates to range predicates. Fixes TICKET-003 in #422. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
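A hedged sketch of what such a rewrite can look like for the YEAR case; the regex and the emitted range form are illustrative, not the PR's exact implementation:

```typescript
// Rewrites the non-sargable YEAR(col) = NNNN into a half-open date range.
// YEAR(created_at) = 2023  →  created_at >= '2023-01-01' AND created_at < '2024-01-01'
function rewriteYearPredicate(sql: string): string {
  return sql.replace(
    /\bYEAR\s*\(\s*([A-Za-z_][\w.]*)\s*\)\s*=\s*(\d{4})\b/gi,
    (_m, col: string, year: string) =>
      `${col} >= '${year}-01-01' AND ${col} < '${Number(year) + 1}-01-01'`,
  )
}
```

The range form leaves the column bare, so the engine can use indexes or prune partitions, which the function-wrapped predicate prevents.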
Integrated the TypeScript-level non-sargable predicate rewrites into sql_optimize so it produces actual optimized SQL instead of just listing anti-patterns with unchanged SQL. Fixes TICKET-004 in #422. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TRY_TO_NUMBER → SAFE_CAST(... AS NUMERIC)
- ARRAY_AGG(...) WITHIN GROUP (ORDER BY ...) → ARRAY_AGG(... ORDER BY ...)
- IFF(cond, a, b) → IF(cond, a, b) for BigQuery (not CASE WHEN)
- QUALIFY preserved as-is for BigQuery (native support since 2021)
- FLATTEN(input => expr) → UNNEST(expr)

BigQuery removed from QUALIFY_TARGETS since it supports QUALIFY natively.
Fixes TICKET-005 in #422.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When core.formatSql() returns single-line SQL, apply TypeScript-level pretty-printing that breaks before major clauses (SELECT, FROM, WHERE, etc.), indents sub-clauses (JOIN, AND, OR), and splits SELECT column lists onto separate lines. Fixes TICKET-006 in #422. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added TypeScript-level keyword fuzzy matching using Levenshtein distance before passing SQL to the Rust fix engine. Handles common keyword typos like SELCT→SELECT, WERE→WHERE, FORM→FROM that the Rust engine can't fix because it only does schema-based fuzzy matching. Fixes TICKET-007 in #422. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
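A minimal sketch of the approach, assuming a small keyword list and a distance cutoff of 2; the real implementation presumably carries a larger keyword set and more guards against correcting identifiers:

```typescript
// Illustrative keyword list; the repository's SQL_KEYWORDS is larger.
const SQL_KEYWORDS = ["SELECT", "FROM", "WHERE", "GROUP", "ORDER", "HAVING", "JOIN", "LIMIT"]

// Classic dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array(b.length).fill(0)])
  for (let j = 1; j <= b.length; j++) dp[0][j] = j
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,
        dp[i][j - 1] + 1,
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),
      )
  return dp[a.length][b.length]
}

// Replace tokens that sit within edit distance 2 of a known keyword.
// Only words of 4+ letters are considered, to keep short identifiers safe.
function fixKeywordTypos(sql: string): string {
  return sql.replace(/\b[A-Za-z]{4,}\b/g, (token) => {
    const upper = token.toUpperCase()
    if (SQL_KEYWORDS.includes(upper)) return token // already a correct keyword
    let best: string | null = null
    let bestDist = 3 // accept only distance 1 or 2
    for (const kw of SQL_KEYWORDS) {
      const d = levenshtein(upper, kw)
      if (d < bestDist) {
        bestDist = d
        best = kw
      }
    }
    return best ?? token
  })
}
```

Note that FORM → FROM is a transposition, which plain Levenshtein counts as distance 2, hence the cutoff of 2 rather than 1.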
…typos
1. Fixed serialization bug: CorrectionResult.iterations is an array of CorrectionIteration objects, not a scalar. Now properly formats each iteration step instead of printing [object Object].
2. Added keyword typo pre-correction (same as altimate_core_fix) so SELCT→SELECT, WERE→WHERE etc. are fixed before the Rust correction loop.
Fixes TICKET-008 in #422.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CompareResult from Rust uses {identical, diffs} not {differences}. The
tool was checking data.differences which was always undefined, causing
it to always report "Queries are structurally identical."
Fixes TICKET-009 in #422.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When Rust checkEquivalence reports queries as non-equivalent, retry with
normalized forms: convert = 'x' to IN ('x') and vice versa. This handles
the common case where WHERE col = 'active' and WHERE col IN ('active')
are semantically equivalent but the AST comparison misses it.
Fixes TICKET-010 in #422.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
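The normalization retry can be sketched as a rewrite of both inputs to a canonical IN form before re-comparing; the helper names are ours, not the PR's:

```typescript
// Rewrite single-value string equality to an IN-list: col = 'x' → col IN ('x').
// Comparing canonical forms covers both directions of the = / IN mismatch.
function normalizeEqToIn(sql: string): string {
  return sql.replace(
    /([A-Za-z_][\w.]*)\s*=\s*('(?:[^']|'')*')/g,
    "$1 IN ($2)",
  )
}

// Retry path: two queries are treated as equivalent if their normalized
// forms match textually (a sketch; the real check re-runs AST comparison).
function equivalentAfterNormalization(a: string, b: string): boolean {
  return normalizeEqToIn(a) === normalizeEqToIn(b)
}
```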
LineageResult stores edges in queries[].edges and cross-query links in impact_map, not at the root level. The tool was checking data.edges which was always undefined. Now extracts edges from all nested queries and impact_map entries. Fixes TICKET-011 in #422. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added TypeScript-level detection for comma-separated tables in the FROM clause (e.g., FROM orders o, customers c). The Rust checkSemantics doesn't flag this pattern. Now reports it as an implicit_cross_join warning with a suggestion to use explicit JOIN syntax. Fixes TICKET-012 in #422. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
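A sketch of such a detector; the regex captures the FROM clause up to the next major clause and flags any comma inside it. The finding/report shape is omitted:

```typescript
// Flags FROM-lists like "FROM orders o, customers c" as implicit cross joins.
// Subqueries (parentheses) and statement terminators end the captured span.
function detectImplicitCrossJoin(sql: string): boolean {
  const m = sql.match(/\bFROM\b([^;()]*?)(?=\bWHERE\b|\bGROUP\b|\bORDER\b|\bLIMIT\b|$)/i)
  return m !== null && m[1].includes(",")
}
```

Capturing only from FROM onward keeps commas in the SELECT list from triggering false positives.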
…Result
MigrationResult from Rust uses {findings, safe, overall_risk} not {risks}.
MigrationFinding uses {risk, operation, message, mitigation} not
{severity, type, recommendation}. The tool was checking data.risks which
was always undefined, so it always reported "safe."
Now correctly reads findings and maps MigrationFinding fields.
Fixes TICKET-013 in #422.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pass depth="full" to core.columnLineage() to enable multi-hop CTE lineage tracing. Previously the depth parameter was omitted, potentially causing the engine to only trace direct edges and miss columns flowing through CTE chains (e.g., users.id → active_users CTE → output). Fixes TICKET-014 in #422. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Rust PII classifier doesn't have a CreditCard category. Added TypeScript-level pattern matching for credit_card_number, card_last_four, card_expiry, cvv, cardholder, and similar column names, classified as "Financial" PII with 0.9 confidence. Fixes TICKET-015 in #422. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
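A sketch of the column-name classifier; the pattern list mirrors the names in the commit, while the return shape is a guess:

```typescript
// Column-name patterns for card-related PII, per the commit message.
const CARD_COLUMN_PATTERN = /credit_card|card_number|card_last_four|card_expiry|cvv|cardholder/i

// Returns a Financial classification at 0.9 confidence for matching columns,
// or null when the name does not look card-related.
function classifyFinancialPii(columnName: string): { category: string; confidence: number } | null {
  return CARD_COLUMN_PATTERN.test(columnName)
    ? { category: "Financial", confidence: 0.9 }
    : null
}
```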
…rmat
SchemaDiff.changes is a tagged union (column_added, column_removed,
column_type_changed, etc.) but the tool expected ColumnChange format
with {severity, change_type, message}. The raw changes had no severity
field, so has_breaking_changes was always false and the summary showed
"? → ? columns" with all zeros.
Now transforms each SchemaChange variant into ColumnChange with proper
severity, builds summary with actual column counts from the schemas.
Fixes TICKET-016 in #422.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added TypeScript-level detection for two documented rules that the Rust lint engine doesn't fire:
- WINDOW_WITHOUT_PARTITION: Flags window functions with OVER() or OVER(ORDER BY ...) but no PARTITION BY clause.
- LARGE_IN_LIST: Flags IN clauses with 20+ values, suggesting a CTE with VALUES for better readability and performance.
Fixes TICKET-017 in #422.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
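Both rules reduce to simple pattern scans; this sketch is illustrative, only the 20-value threshold comes from the commit, and the finding/report shape is ignored:

```typescript
// LARGE_IN_LIST: any IN (...) with 20+ comma-separated values.
function detectLargeInList(sql: string, threshold = 20): boolean {
  const inLists = sql.match(/\bIN\s*\(([^()]*)\)/gi) ?? []
  return inLists.some((list) => list.split(",").length >= threshold)
}

// WINDOW_WITHOUT_PARTITION: any OVER (...) clause lacking PARTITION BY.
function detectWindowWithoutPartition(sql: string): boolean {
  const overClauses = sql.match(/\bOVER\s*\(([^()]*)\)/gi) ?? []
  return overClauses.some((o) => !/PARTITION\s+BY/i.test(o))
}
```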
… policies
When no schema context is provided, the policy checker now extracts table names from the SQL via extractMetadata() and builds a minimal schema so DML statements (DELETE, UPDATE, INSERT, MERGE) pass validation. As a fallback, if validation still fails, the handler parses the policy JSON directly to check forbidden_operations rules against the statement type, ensuring policy violations are reported even when full validation isn't possible.
Fixes TICKET-018 in #422.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review.
📝 Walkthrough
This PR addresses multiple documented issues from the Docs-vs-Reality audit, introducing Snowflake→BigQuery SQL preprocessing, exported keyword typo correction via fuzzy matching, enhanced diff comparison using token normalization and Dice similarity, and improved output formatting and error handling across multiple tools.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant TypeScript as TS Preprocessor
    participant Rust as Rust Parser/Emitter
    Client->>TypeScript: translate(sql, "snowflake", "bigquery")
    TypeScript->>TypeScript: preprocessSnowflakeToBigQuery(sql)
    TypeScript->>TypeScript: TRY_TO_NUMBER → SAFE_CAST
    TypeScript->>TypeScript: ARRAY_AGG(...) WITHIN GROUP → rewrite
    TypeScript->>TypeScript: FLATTEN(...) → UNNEST(...)
    TypeScript->>TypeScript: IFF(cond,a,b) → IF(cond,a,b)
    TypeScript->>Rust: send preprocessed_sql
    Rust->>Rust: parse & emit BigQuery dialect
    Rust-->>TypeScript: translated_sql
    TypeScript-->>Client: return translated_sql
```

```mermaid
sequenceDiagram
    participant Client
    participant Analyzer as SQL Analyzer
    participant Tokenizer as Token Normalizer
    participant Comparator as Similarity Engine
    Client->>Analyzer: diff(old_sql, new_sql)
    Analyzer->>Tokenizer: normalize whitespace & tokenize
    Tokenizer->>Tokenizer: split into tokens
    Tokenizer-->>Analyzer: normalized_old, normalized_new
    Analyzer->>Comparator: compute Dice coefficient (char bigrams)
    Comparator->>Comparator: additions & deletions count
    Comparator-->>Analyzer: similarity, change_count, unified_diff
    Analyzer-->>Client: { has_changes, similarity, changes, unified_diff }
```

```mermaid
sequenceDiagram
    participant Client
    participant Handler as Fix Handler
    participant TypeScript as TS Keyword Fixer
    participant Rust as Rust Fixer
    Client->>Handler: fix(sql)
    Handler->>TypeScript: fixKeywordTypos(sql)
    TypeScript->>TypeScript: Levenshtein distance matching
    TypeScript-->>Handler: { sql: corrected_sql, corrections: [...] }
    Handler->>Rust: invoke rust_fix(corrected_sql)
    Rust-->>Handler: { fixed_sql, changes, ... }
    Handler->>Handler: merge keyword corrections + rust changes
    Handler-->>Client: { fixed_sql, corrections, changes, success }
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 4 passed, ❌ 1 failed (1 warning)
Actionable comments posted: 3
🧹 Nitpick comments (9)
packages/opencode/src/altimate/native/sql/register.ts (3)
739-757: Summary computation iterates schemas twice unnecessarily.
`oldSchema.tableNames()` and `newSchema.tableNames()` are called, then for each table, `columnNames()` is called. This is fine for small schemas but could be optimized. For large schemas, consider caching the column counts during the change iteration loop.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/altimate/native/sql/register.ts` around lines 739 - 757, The summary recomputes old_column_count and new_column_count by re-iterating oldSchema.tableNames() and newSchema.tableNames(); instead, compute and cache column counts during the primary change-collection loop (or build a table->columnCount map) so you don't call columnNames() twice. Modify the code that builds changes (the logic that produces changes, dropped, added, typeChanged, renamed) to increment two counters (oldColCount, newColCount) as you visit each table/column (or store column arrays in a map keyed by table name) and then use those cached counts/values in the summary object instead of re-iterating oldSchema and newSchema.
256-314: `prettyPrintSql` is a reasonable fallback formatter. The function handles major clauses, sub-clauses, and SELECT column splitting. However, it may produce unexpected results for complex nested queries or CASE expressions within SELECT lists.
Consider adding a comment noting this is a best-effort formatter and may not handle all edge cases perfectly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/altimate/native/sql/register.ts` around lines 256 - 314, The prettyPrintSql fallback formatter (function prettyPrintSql) is best-effort and can mis-handle complex nested queries and CASE expressions; add a short explanatory comment at the top of the prettyPrintSql function noting it is a non-fully-compliant, best-effort formatter that may not correctly format all edge cases (nested subqueries, CASE in SELECT lists, complex JOIN/ON nesting) so future maintainers know its limitations and when to use a proper SQL formatter library.
574-593: MONTH pattern rewrite is less useful without year context.
The comment `/* consider range predicate for partition pruning */` is accurate: `EXTRACT(MONTH FROM col) = N` is still non-sargable. This provides awareness but not an actual fix. Consider documenting that this is an advisory rewrite rather than a performance optimization, or skip emitting it if no actionable change is made.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/altimate/native/sql/register.ts` around lines 574 - 593, The current MONTH(...) rewrite replaces the original SQL with EXTRACT(MONTH FROM ...) yet offers no actionable optimization and remains non-sargable; update the logic around monthPattern, rewrittenSql, and the rewrites push so you either (A) skip performing the replacement in rewrittenSql and instead push an advisory entry (e.g., rule "ADVISORY_NON_SARGABLE" or set rewritten_fragment to the original and rewritten_sql to "") with a clear explanation that this is advisory only and lower confidence, or (B) only emit the replacement and rewrite entry when year context is present (detect a YEAR(...) or date range on the same column) so the change is actionable; adjust the replacement/explanation/confidence accordingly and ensure entries pushed to rewrites accurately reflect advisory vs. actionable changes (refer to monthPattern, replacement, rewrittenSql, and the "NON_SARGABLE_PREDICATE" entry).
packages/opencode/src/altimate/native/altimate-core.ts (4)
171-205: Keyword correction may produce false positives on identifier-like tokens.
The heuristic on line 179 skips tokens with underscores or lowercase starting chars longer than 3 characters, but tokens like `SELCT` (5 chars, no underscore, starts uppercase) would be corrected. However, user-defined functions or table aliases matching the pattern could also be incorrectly "fixed." Consider adding a blocklist for common identifier patterns or requiring the token to appear in a keyword-position context (e.g., at statement start or after specific punctuation).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/altimate/native/altimate-core.ts` around lines 171 - 205, The fixKeywordTypos function is producing false positives for identifier-like tokens (e.g., user-defined names like table aliases) — update fixKeywordTypos to be stricter: add a small blocklist of common identifier patterns/names (or a regex check for typical identifiers like camelCase, numeric suffixes, or known function names) and require keyword-position context before applying corrections (e.g., token appears at start of statement or immediately after punctuation/keywords such as "SELECT", "FROM", "WHERE", "(", "," ), using the existing SQL_KEYWORDS and levenshtein logic; narrow the replacement condition so only tokens that both pass the context check and are not in the identifier blocklist are considered for correction and record corrections in corrections[] as before.
711-727: Hardcoding `"full"` depth for CTE lineage tracing may have performance implications. Forcing full depth ensures multi-hop CTEs are traced, but could be slow for complex queries with deep nesting. Consider making this configurable via params.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/altimate/native/altimate-core.ts` around lines 711 - 727, The current registration for "altimate_core.column_lineage" hardcodes the CTE tracing depth to "full" which can cause performance issues for deeply nested queries; modify the handler so it reads an optional params.depth (or similar) value and passes that to core.columnLineage instead of always using "full", defaulting to "full" only when params.depth is undefined; update the parameter name (e.g., depth or cte_depth) and validation in the register callback, and ensure callers can pass this new param while preserving existing behavior when it's not provided.
385-397: Fallback schema creation for DML is a pragmatic workaround. Creating minimal `CREATE TABLE x (id INT)` stubs allows DML policy checks to proceed. However, if `extractMetadata` returns qualified names (e.g., `schema.table`), the generated DDL may be syntactically invalid for some dialects. Consider quoting table names for safety:

```diff
 if (meta.tables?.length) {
   const ddl = meta.tables
-    .map((t: string) => `CREATE TABLE ${t} (id INT);`)
+    .map((t: string) => `CREATE TABLE "${t}" (id INT);`)
     .join("\n")
   schema = core.Schema.fromDdl(ddl)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/altimate/native/altimate-core.ts` around lines 385 - 397, The fallback builds DDL like `CREATE TABLE ${t} (id INT);` which breaks when core.extractMetadata returns qualified names (e.g., schema.table); update the fallback in the block that checks params.schema_path and params.schema_context to safely quote or split identifiers from meta.tables before passing to core.Schema.fromDdl: for each table name from meta.tables, detect qualified names, split on the dot into segments, quote each segment with a safe identifier quote (e.g., double quotes) or escape characters, then reassemble into a properly quoted CREATE TABLE statement (CREATE TABLE "schema"."table" (id INT);) so the generated DDL is syntactically valid across dialects.
151-165: Levenshtein implementation is correct but allocates O(m×n) memory. For typical SQL keyword lengths (≤12 chars due to the filter at line 182), this is fine. However, the function is called for every token against all ~100 keywords, which could be slow for very large SQL strings.
For future optimization, consider early-exit when the length difference exceeds the max allowed distance, or use a BK-tree for keyword lookup.
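A bounded two-row variant along these lines, as a sketch rather than the repository's code:

```typescript
// Edit distance capped at maxDistance: returns maxDistance + 1 as soon as the
// true distance provably exceeds the bound, and keeps only two rolling rows.
function boundedLevenshtein(a: string, b: string, maxDistance: number): number {
  if (Math.abs(a.length - b.length) > maxDistance) return maxDistance + 1
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]
    let rowMin = i
    for (let j = 1; j <= b.length; j++) {
      curr[j] = Math.min(
        prev[j] + 1,
        curr[j - 1] + 1,
        prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),
      )
      rowMin = Math.min(rowMin, curr[j])
    }
    // Values never decrease along a path, so once every cell in a row exceeds
    // the bound, no completion can come back under it.
    if (rowMin > maxDistance) return maxDistance + 1
    prev = curr
  }
  return prev[b.length]
}
```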
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/altimate/native/altimate-core.ts` around lines 151 - 165, The levenshtein(a: string, b: string) function currently allocates O(m×n) memory and computes full distance every time; change it to an efficient bounded version that accepts a maxDistance parameter and early-exits when abs(a.length - b.length) > maxDistance or when all entries in the current DP row exceed maxDistance, and reduce memory to O(min(m,n)) using only two rolling rows; update callers (the token-vs-keyword checks) to pass the allowed max distance (the same threshold used for short SQL keywords) so far-away candidates are skipped quickly.
packages/opencode/src/altimate/tools/altimate-core-migration.ts (1)
47-56: MigrationFinding field mapping assumes undocumented structure.
The code expects `r.risk`, `r.operation`, `r.mitigation`, and `r.rollback_sql`, but the Rust handler doesn't transform fields. If the actual response uses different names (e.g., `severity` instead of `risk`, `type` instead of `operation`, `recommendation` instead of `mitigation`), the fallbacks handle some cases but `rollback_sql` has no fallback.
Consider adding a shaping layer in the `altimate_core.migration` handler (similar to other handlers in this PR) to normalize field names, rather than relying on fallbacks in the formatting function.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/altimate/tools/altimate-core-migration.ts` around lines 47 - 56, The MigrationFinding mapping is fragile: update the altimate_core.migration handler to add a shaping/normalization layer that canonicalizes incoming fields (e.g., map severity↔risk, type↔operation, recommendation↔mitigation and any rollback/rollback_sql variants to a single rollback_sql property) so the formatter loop in altimate-core-migration.ts (the for (const r of findings) block and its uses of r.risk, r.operation, r.mitigation, r.recommendation, r.rollback_sql) can rely on normalized names; then simplify the formatter to read the canonical fields (severity, operation, mitigation, rollback_sql) instead of relying on ad-hoc fallback expressions.
packages/opencode/src/altimate/tools/altimate-core-track-lineage.ts (1)
53-65: Synthetic cross-query edges from impact_map may duplicate existing edges.
If the same source→target relationship exists in both `queries[].edges` and `impact_map`, it will appear twice in `allEdges`. Consider deduplicating.
🔧 Optional: Add deduplication

```diff
+  // Deduplicate edges by source+target key
+  const seen = new Set<string>()
+  const dedupedEdges: Array<Record<string, any>> = []
+  for (const edge of allEdges) {
+    const key = `${formatEdgeRef(edge.source)}→${formatEdgeRef(edge.target)}`
+    if (!seen.has(key)) {
+      seen.add(key)
+      dedupedEdges.push(edge)
+    }
+  }
+  return dedupedEdges
-  return allEdges
```
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/altimate/tools/altimate-core-track-lineage.ts` around lines 53 - 65, The code that merges synthetic edges from data.impact_map into allEdges can create duplicates when the same source→target already exists in queries[].edges; modify the logic in the block that iterates data.impact_map (using entry.source and entry.affected) to perform deduplication before pushing: maintain a Set of unique keys (e.g. `${source}|${target}|${transform_type}`) built from existing allEdges (which includes edges from queries[].edges) and only push new cross_query entries if their key is not already present, or alternatively dedupe allEdges after concatenation by reducing to unique keys.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/opencode/src/altimate/native/altimate-core.ts`:
- Around line 67-72: The regex in preprocessTryToNumber currently captures
commas and thus mangles multi-argument forms like TRY_TO_NUMBER(expr, format);
update the matcher to only convert single-argument calls by forbidding commas
inside the capture (e.g. change the capture from [^()]+? to something like
[^(),]+?) so only TRY_TO_NUMBER(expr) becomes SAFE_CAST(expr AS NUMERIC),
leaving multi-argument variants unmodified for Rust transpile/failure; edit the
regex in preprocessTryToNumber accordingly.
- Around line 454-474: The current commaJoinPattern only catches the first two
tables and can mis-handle aliases; update the detection logic in
altimate-core.ts (where commaJoinPattern, params.sql, and findings are used) to
detect any comma-separated list in the FROM clause rather than just two tables —
for example, replace the regex with one that captures the full FROM clause up to
the next clause (e.g.
/\bFROM\b([^;()]*?)(?=\bWHERE\b|\bGROUP\b|\bORDER\b|\bLIMIT\b|$)/i) and then
check the captured text for a comma (/,/) to flag implicit cross joins; keep the
existing push into findings (rule "implicit_cross_join") and the data.valid
update, but ensure alias patterns like "AS alias" don't break detection.
- Around line 78-83: preprocessArrayAggWithinGroup's regex fails on nested
parentheses (e.g. ARRAY_AGG(COALESCE(x,0)) WITHIN GROUP ...); replace the regex
approach in preprocessArrayAggWithinGroup with a small parser that scans for
"ARRAY_AGG", finds the opening "(", consumes until the matching closing ")" by
tracking paren depth (so nested parentheses are handled), then checks for the
"WITHIN GROUP (ORDER BY ...)" suffix and rewrites to "ARRAY_AGG(<arg> ORDER BY
<orderExpr>)"; refer to the preprocessArrayAggWithinGroup function and implement
the index-based parenthesis matching and substring replacement instead of using
/[^()]+?/.
📒 Files selected for processing (7)
packages/opencode/src/altimate/native/altimate-core.ts
packages/opencode/src/altimate/native/sql/register.ts
packages/opencode/src/altimate/tools/altimate-core-compare.ts
packages/opencode/src/altimate/tools/altimate-core-correct.ts
packages/opencode/src/altimate/tools/altimate-core-migration.ts
packages/opencode/src/altimate/tools/altimate-core-track-lineage.ts
packages/opencode/src/altimate/tools/sql-fix.ts
```typescript
export function preprocessTryToNumber(sql: string): string {
  return sql.replace(
    /\bTRY_TO_(?:NUMBER|DECIMAL|NUMERIC)\s*\(\s*([^()]+?)\s*\)/gi,
    "SAFE_CAST($1 AS NUMERIC)",
  )
}
```
Regex doesn't handle multi-argument TRY_TO_NUMBER variants.
TRY_TO_NUMBER(expr, format) and TRY_TO_NUMBER(expr, precision, scale) are valid Snowflake syntax. The current regex with non-greedy [^()]+? will match the entire content including commas, producing invalid SAFE_CAST(expr, format AS NUMERIC).
🔧 Proposed fix to handle single-argument only

```diff
 export function preprocessTryToNumber(sql: string): string {
   return sql.replace(
-    /\bTRY_TO_(?:NUMBER|DECIMAL|NUMERIC)\s*\(\s*([^()]+?)\s*\)/gi,
+    /\bTRY_TO_(?:NUMBER|DECIMAL|NUMERIC)\s*\(\s*([^(),]+?)\s*\)/gi,
     "SAFE_CAST($1 AS NUMERIC)",
   )
 }
```

Alternatively, leave multi-argument forms unconverted and let Rust transpile handle them (or fail gracefully).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
export function preprocessTryToNumber(sql: string): string {
  return sql.replace(
    /\bTRY_TO_(?:NUMBER|DECIMAL|NUMERIC)\s*\(\s*([^(),]+?)\s*\)/gi,
    "SAFE_CAST($1 AS NUMERIC)",
  )
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/opencode/src/altimate/native/altimate-core.ts` around lines 67 - 72,
The regex in preprocessTryToNumber currently captures commas and thus mangles
multi-argument forms like TRY_TO_NUMBER(expr, format); update the matcher to
only convert single-argument calls by forbidding commas inside the capture (e.g.
change the capture from [^()]+? to something like [^(),]+?) so only
TRY_TO_NUMBER(expr) becomes SAFE_CAST(expr AS NUMERIC), leaving multi-argument
variants unmodified for Rust transpile/failure; edit the regex in
preprocessTryToNumber accordingly.
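For reference, the tightened regex can be sanity-checked in isolation. This is a minimal self-contained sketch using the proposed `[^(),]` variant; the function mirrors the one under review but is illustrative, not the repo's actual module:

```typescript
// Sketch of the proposed single-argument-only conversion. The comma-excluding
// character class means multi-argument TRY_TO_NUMBER calls simply fail to
// match and are passed through unchanged for the Rust transpile to handle.
function preprocessTryToNumber(sql: string): string {
  return sql.replace(
    /\bTRY_TO_(?:NUMBER|DECIMAL|NUMERIC)\s*\(\s*([^(),]+?)\s*\)/gi,
    "SAFE_CAST($1 AS NUMERIC)",
  )
}

// Single-argument form is converted:
//   "SELECT TRY_TO_NUMBER(amount) FROM t"
//   → "SELECT SAFE_CAST(amount AS NUMERIC) FROM t"
// Multi-argument forms are left untouched:
//   "SELECT TRY_TO_NUMBER(amount, '999.99') FROM t"  (unchanged)
```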
```typescript
export function preprocessArrayAggWithinGroup(sql: string): string {
  return sql.replace(
    /\bARRAY_AGG\s*\(\s*([^()]+?)\s*\)\s*WITHIN\s+GROUP\s*\(\s*ORDER\s+BY\s+([^()]+?)\s*\)/gi,
    "ARRAY_AGG($1 ORDER BY $2)",
  )
}
```
🧩 Analysis chain
🏁 Scripts executed against the AltimateAI/altimate-code repository (outputs elided):

```shell
# Search for ARRAY_AGG usage patterns in test files
rg -n "ARRAY_AGG.*WITHIN\s+GROUP" --type-add 'test:*.test.ts' --type test

# Find usages of preprocessArrayAggWithinGroup
rg -n "preprocessArrayAggWithinGroup"
rg -n "preprocessArrayAggWithinGroup" packages/opencode/test/

# Search for test files related to altimate-core
fd -e test.ts -e spec.ts "altimate-core"

# Search for ARRAY_AGG patterns in the entire codebase
rg -n "ARRAY_AGG" --type ts
rg -n "ARRAY_AGG\s*\(" --type ts -A 3
rg -i "array_agg" packages/opencode/ --type ts

# Search the test directories for ARRAY_AGG / WITHIN GROUP cases
rg -n "ARRAY_AGG" packages/opencode/test/altimate/
rg -n "ARRAY_AGG" packages/opencode/test/ -A 2 -B 2
rg -n "WITHIN.*GROUP" packages/opencode/test/ -A 2 -B 2

# Inspect the function and its surrounding context
sed -n '95,110p' packages/opencode/src/altimate/native/altimate-core.ts
sed -n '78,85p' packages/opencode/src/altimate/native/altimate-core.ts

# Inspect the test suites and fixtures
head -100 packages/opencode/test/altimate/altimate-core-native.test.ts
head -50 packages/opencode/test/altimate/altimate-core-e2e.test.ts
find packages/opencode/test -type f \( -name "*.sql" -o -name "*.json" \)
```

None of the searches surfaced a test case exercising ARRAY_AGG ... WITHIN GROUP with nested parentheses.
Regex cannot handle nested parentheses in ARRAY_AGG expressions.
The pattern [^()]+? explicitly excludes parentheses and will fail to match expressions like ARRAY_AGG(COALESCE(x, 0)) WITHIN GROUP (ORDER BY ...). No test cases in the codebase exercise nested parentheses, but this is a correctness issue that should be addressed if such expressions could appear in converted SQL.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/opencode/src/altimate/native/altimate-core.ts` around lines 78 - 83,
preprocessArrayAggWithinGroup's regex fails on nested parentheses (e.g.
ARRAY_AGG(COALESCE(x,0)) WITHIN GROUP ...); replace the regex approach in
preprocessArrayAggWithinGroup with a small parser that scans for "ARRAY_AGG",
finds the opening "(", consumes until the matching closing ")" by tracking paren
depth (so nested parentheses are handled), then checks for the "WITHIN GROUP
(ORDER BY ...)" suffix and rewrites to "ARRAY_AGG(<arg> ORDER BY <orderExpr>)";
refer to the preprocessArrayAggWithinGroup function and implement the
index-based parenthesis matching and substring replacement instead of using
/[^()]+?/.
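A depth-tracking rewrite along the lines the prompt describes might look like the following. This is an illustrative sketch, not the repository's actual code, and it assumes well-formed (balanced) SQL input:

```typescript
// Paren-depth implementation: scan for ARRAY_AGG, consume its argument by
// tracking parenthesis depth (so nested calls like COALESCE(x, 0) work),
// then rewrite a trailing WITHIN GROUP (ORDER BY ...) suffix inline.
function preprocessArrayAggWithinGroup(sql: string): string {
  let out = ""
  let i = 0
  while (i < sql.length) {
    const m = /\bARRAY_AGG\s*\(/i.exec(sql.slice(i))
    if (!m) {
      out += sql.slice(i)
      break
    }
    const start = i + m.index
    out += sql.slice(i, start)
    // Consume the ARRAY_AGG argument up to its matching ")".
    let j = start + m[0].length
    let depth = 1
    while (j < sql.length && depth > 0) {
      if (sql[j] === "(") depth++
      else if (sql[j] === ")") depth--
      j++
    }
    const arg = sql.slice(start + m[0].length, j - 1)
    // Check for a WITHIN GROUP (ORDER BY ...) suffix after the argument.
    const suffix = /^\s*WITHIN\s+GROUP\s*\(\s*ORDER\s+BY\s+/i.exec(sql.slice(j))
    if (suffix) {
      let k = j + suffix[0].length
      const orderStart = k
      let d = 1 // the suffix regex already consumed the opening "("
      while (k < sql.length && d > 0) {
        if (sql[k] === "(") d++
        else if (sql[k] === ")") d--
        k++
      }
      const orderExpr = sql.slice(orderStart, k - 1)
      out += `ARRAY_AGG(${arg.trim()} ORDER BY ${orderExpr.trim()})`
      i = k
    } else {
      // Plain ARRAY_AGG call: emit it unchanged and keep scanning.
      out += sql.slice(start, j)
      i = j
    }
  }
  return out
}
```

Unlike the `[^()]+?` regex, this handles `ARRAY_AGG(COALESCE(x, 0)) WITHIN GROUP (ORDER BY y)` correctly, producing `ARRAY_AGG(COALESCE(x, 0) ORDER BY y)`.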
```typescript
// Augment with TypeScript-level detection for comma-joins (implicit cross joins)
const findings = (data.findings as any[]) ?? []
const commaJoinPattern = /\bFROM\b\s+(\w+)\s+\w*\s*,\s*(\w+)/gi
const match = commaJoinPattern.exec(params.sql)
if (match) {
  const hasCommaJoinFinding = findings.some(
    (f) => f.rule === "implicit_cross_join" || f.rule === "comma_join",
  )
  if (!hasCommaJoinFinding) {
    findings.push({
      rule: "implicit_cross_join",
      severity: "warning",
      message: `Implicit cross join detected: comma-separated tables in FROM clause (${match[1]}, ${match[2]}). Use explicit JOIN syntax for clarity.`,
      explanation: "Comma-separated tables in the FROM clause create an implicit cross join. While a WHERE clause may filter the result, explicit JOIN...ON syntax is preferred for readability and to avoid accidental cartesian products.",
      suggestion: "Rewrite using explicit JOIN syntax: FROM table1 JOIN table2 ON ...",
      confidence: 0.8,
    })
    data.findings = findings
    if (findings.length > 0) data.valid = false
  }
}
```
Comma-join regex may miss multi-table scenarios.
The pattern /\bFROM\b\s+(\w+)\s+\w*\s*,\s*(\w+)/gi captures only the first two tables. A query like FROM a, b, c would trigger only one warning for (a, b). Additionally, \w* after the first table expects an optional alias but may consume unexpected characters.
🔧 Proposed fix to detect any comma after FROM

```diff
-    const commaJoinPattern = /\bFROM\b\s+(\w+)\s+\w*\s*,\s*(\w+)/gi
-    const match = commaJoinPattern.exec(params.sql)
-    if (match) {
+    // Simplified: detect any comma in FROM clause (before WHERE/JOIN/GROUP/ORDER/LIMIT)
+    const fromClause = params.sql.match(/\bFROM\b\s+([\s\S]+?)(?=\bWHERE\b|\bJOIN\b|\bGROUP\b|\bORDER\b|\bLIMIT\b|;|$)/i)
+    const hasCommaJoin = fromClause && fromClause[1].includes(",")
+    if (hasCommaJoin) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/opencode/src/altimate/native/altimate-core.ts` around lines 454 -
474, The current commaJoinPattern only catches the first two tables and can
mis-handle aliases; update the detection logic in altimate-core.ts (where
commaJoinPattern, params.sql, and findings are used) to detect any
comma-separated list in the FROM clause rather than just two tables — for
example, replace the regex with one that captures the full FROM clause up to the
next clause (e.g.
/\bFROM\b([^;()]*?)(?=\bWHERE\b|\bGROUP\b|\bORDER\b|\bLIMIT\b|$)/i) and then
check the captured text for a comma (/,/) to flag implicit cross joins; keep the
existing push into findings (rule "implicit_cross_join") and the data.valid
update, but ensure alias patterns like "AS alias" don't break detection.
Summary
Fixes all 18 tool issues discovered in the docs-as-spec test run (#422). Each issue has its own commit for easy review/revert.
P1 — Critical (3 fixes)
- `sql_diff` — Handler returned wrong field names (`diff` instead of `has_changes`/`unified_diff`), so the tool always said "identical". Added token-aware comparison.
- `altimate_core_compare` — Tool checked `data.differences` but `CompareResult` uses `diffs`. Fixed field mapping.
- `sql_fix` — Showed unfixed SQL as "Auto-Fixed" because `fixed_sql` is always a string. Now checks `result.success` and compares against original.

P2 — Medium (10 fixes)
- `sql_rewrite` — Added TypeScript-level detection for non-sargable predicates (`YEAR(col) = N` → range predicate)
- `sql_optimize` — Integrated the same rewrite logic so optimized SQL is actually returned
- `sql_translate` — Added 5 Snowflake→BigQuery transforms: `TRY_TO_NUMBER` → `SAFE_CAST`, `ARRAY_AGG WITHIN GROUP` → inline `ORDER BY`, `IFF` → `IF` (not `CASE WHEN`), QUALIFY preserved natively, `FLATTEN` → `UNNEST`
- `altimate_core_fix` — Added keyword typo correction (SELCT → SELECT, WERE → WHERE) via Levenshtein distance
- `altimate_core_correct` — Fixed `[object Object]` serialization bug (iterations is an array), added keyword correction
- `altimate_core_equivalence` — Added `= X` ↔ `IN (X)` normalization retry when Rust says "not equivalent"
- `altimate_core_track_lineage` — Fixed field access: edges are in `queries[].edges`, not root `edges`
- `altimate_core_migration` — Fixed field name: `findings` not `risks`, mapped `MigrationFinding` fields correctly
- `altimate_core_classify_pii` — Added credit card column patterns (credit_card_number, cvv, etc.) as "Financial" PII
- `schema_diff` — Transformed `SchemaChange` tagged union to `ColumnChange` format with proper severity/summary

P3 — Low (4 fixes)
- `sql_format` — Added TypeScript-level pretty-printing when core returns single-line SQL
- `altimate_core_semantics` — Added implicit cross join (comma-join) detection
- `sql_analyze` — Added `WINDOW_WITHOUT_PARTITION` and `LARGE_IN_LIST` rules
- `altimate_core_policy` — DML statements no longer fail validation; builds minimal schema from extracted table names

Root causes
Most issues fell into two categories:
Test plan
- `bun turbo typecheck` passes (all 5 packages)
- `bun test` — 4445 pass, 30 fail (same as main, zero regressions)

Closes #422
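As an illustration of the non-sargable predicate rewrite the sql_rewrite fix describes, a `YEAR(col) = N` predicate can be converted to a half-open date range that allows index use and partition pruning. This is a hedged sketch under assumed naming; the PR's actual TypeScript implementation may differ:

```typescript
// Hypothetical sketch: rewrite YEAR(col) = NNNN into a sargable range
// predicate. Matches a bare or dotted column name and a 4-digit year.
function rewriteYearPredicate(sql: string): string {
  return sql.replace(
    /\bYEAR\s*\(\s*(\w+(?:\.\w+)?)\s*\)\s*=\s*(\d{4})\b/gi,
    (_match: string, col: string, year: string) =>
      `${col} >= '${year}-01-01' AND ${col} < '${Number(year) + 1}-01-01'`,
  )
}

// "WHERE YEAR(created_at) = 2023"
// → "WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01'"
```

The half-open upper bound (`<` the next year's first day) avoids edge cases around timestamps on December 31.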
🤖 Generated with Claude Code