
fix: resolve all 18 docs-as-spec test failures#424

Open
mhallida wants to merge 18 commits into main from fix/docs-as-spec-18-issues

Conversation


@mhallida mhallida commented Mar 23, 2026

Summary

Fixes all 18 tool issues discovered in the docs-as-spec test run (#422). Each issue has its own commit for easy review/revert.

P1 — Critical (3 fixes)

  • sql_diff — Handler returned wrong field names (diff instead of has_changes/unified_diff), so the tool always said "identical". Added token-aware comparison.
  • altimate_core_compare — Tool checked data.differences but CompareResult uses diffs. Fixed field mapping.
  • sql_fix — Showed unfixed SQL as "Auto-Fixed" because fixed_sql is always a string. Now checks result.success and compares against original.

P2 — Medium (10 fixes)

  • sql_rewrite — Added TypeScript-level detection for non-sargable predicates (YEAR(col) = N → range predicate)
  • sql_optimize — Integrated the same rewrite logic so optimized SQL is actually returned
  • sql_translate — Added 5 Snowflake→BigQuery transforms: TRY_TO_NUMBER→SAFE_CAST, ARRAY_AGG WITHIN GROUP→inline ORDER BY, IFF→IF (not CASE WHEN), QUALIFY preserved natively, FLATTEN→UNNEST
  • altimate_core_fix — Added keyword typo correction (SELCT→SELECT, WERE→WHERE) via Levenshtein distance
  • altimate_core_correct — Fixed [object Object] serialization bug (iterations is an array), added keyword correction
  • altimate_core_equivalence — Added = X ↔ IN (X) normalization retry when Rust says "not equivalent"
  • altimate_core_track_lineage — Fixed field access: edges are in queries[].edges, not root edges
  • altimate_core_migration — Fixed field name: findings not risks, mapped MigrationFinding fields correctly
  • altimate_core_classify_pii — Added credit card column patterns (credit_card_number, cvv, etc.) as "Financial" PII
  • schema_diff — Transformed SchemaChange tagged union to ColumnChange format with proper severity/summary

P3 — Low (4 fixes)

  • sql_format — Added TypeScript-level pretty-printing when core returns single-line SQL
  • altimate_core_semantics — Added implicit cross join (comma-join) detection
  • sql_analyze — Added WINDOW_WITHOUT_PARTITION and LARGE_IN_LIST rules
  • altimate_core_policy — DML statements no longer fail validation; builds minimal schema from extracted table names

Root causes

Most issues fell into two categories:

  1. Field name mismatches between Rust NAPI return types and TypeScript tool code (compare, migration, track_lineage, schema_diff, sql_diff)
  2. Missing TypeScript-level augmentation where the Rust engine lacks certain features (keyword fixing, non-sargable rewrites, PII patterns, translate transforms)
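A thin shaping layer guards against the first category. This generic sketch (names are illustrative, not the actual handler code) copies a Rust-side field onto the name the TypeScript tool reads, without clobbering an already-correct field:

```typescript
// Hypothetical helper: normalize Rust NAPI field names to the names the
// TypeScript tools expect (e.g. diffs -> differences, findings -> risks).
function normalizeFields(
  data: Record<string, unknown>,
  aliases: Record<string, string>, // rustName -> tsName
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...data };
  for (const [rustName, tsName] of Object.entries(aliases)) {
    // Only fill in the TS-side name if it is missing and the Rust name exists
    if (out[tsName] === undefined && out[rustName] !== undefined) {
      out[tsName] = out[rustName];
    }
  }
  return out;
}
```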

Test plan

Closes #422

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Snowflake-to-BigQuery SQL conversion enhancements for improved dialect compatibility
    • Keyword typo correction using fuzzy matching for SQL queries
    • Credit-card pattern detection for PII classification
  • Improvements

    • Enhanced SQL policy validation with JSON parsing fallback
    • Better equivalence checking with normalization retry logic
    • Improved SQL optimization suggestions and token-aware diff reporting
    • Enhanced column lineage tracking and migration risk assessment

mhallida and others added 18 commits March 23, 2026 16:35
Check result.success and compare fixed_sql against original_sql before
displaying the auto-fix section. Previously, fixed_sql was always truthy
(a string even when unchanged), causing the tool to show the original
broken SQL under an "Auto-Fixed SQL" heading.

Fixes TICKET-001 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
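The gating can be sketched like this, using an illustrative FixResult shape rather than the real NAPI type:

```typescript
// Assumed result shape for illustration; real type lives in the Rust bindings.
interface FixResult {
  success: boolean;
  fixed_sql: string;
}

// Only show an "Auto-Fixed SQL" section when the engine actually succeeded
// AND the fixed SQL differs from the input (whitespace-insensitively).
function shouldShowAutoFix(result: FixResult, originalSql: string): boolean {
  const normalize = (s: string) => s.replace(/\s+/g, " ").trim();
  return result.success && normalize(result.fixed_sql) !== normalize(originalSql);
}
```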
The handler was returning {diff, equivalent} but the tool expected
{has_changes, unified_diff, additions, deletions, change_count, similarity}.
Since has_changes was always undefined (falsy), the tool always reported
"The two SQL queries are identical."

Now returns proper SqlDiffResult fields with token-aware comparison that
handles single-line SQL correctly.

Fixes TICKET-002 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
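A simplified version of the token-aware comparison with Dice-coefficient similarity. The field names follow the SqlDiffResult shape above; the whitespace tokenizer is a stand-in for the real one:

```typescript
// Whitespace tokenizer stand-in; the real implementation is token-aware.
function tokenize(sql: string): string[] {
  return sql.toUpperCase().split(/\s+/).filter(Boolean);
}

// Character-bigram set for Dice similarity.
function bigrams(s: string): Set<string> {
  const out = new Set<string>();
  for (let i = 0; i < s.length - 1; i++) out.add(s.slice(i, i + 2));
  return out;
}

function diceSimilarity(a: string, b: string): number {
  const ba = bigrams(a), bb = bigrams(b);
  if (ba.size === 0 && bb.size === 0) return 1;
  let overlap = 0;
  for (const g of ba) if (bb.has(g)) overlap++;
  return (2 * overlap) / (ba.size + bb.size);
}

function sqlDiff(oldSql: string, newSql: string) {
  const oldT = tokenize(oldSql), newT = tokenize(newSql);
  const oldSet = new Set(oldT), newSet = new Set(newT);
  const additions = newT.filter((t) => !oldSet.has(t)).length;
  const deletions = oldT.filter((t) => !newSet.has(t)).length;
  return {
    // Whitespace-only differences normalize away, so single-line vs
    // multi-line SQL no longer reports as changed.
    has_changes: oldT.join(" ") !== newT.join(" "),
    additions,
    deletions,
    change_count: additions + deletions,
    similarity: diceSimilarity(oldT.join(" "), newT.join(" ")),
  };
}
```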
Added TypeScript-level detection for YEAR(col) = NNNN and MONTH(col) = N
patterns that prevent index/partition pruning. The Rust core.rewrite()
doesn't handle these cases, so we augment with TS-level rewrites that
convert function-wrapped predicates to range predicates.

Fixes TICKET-003 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
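The transform can be illustrated with a single regex rewrite; the actual register.ts logic presumably covers more patterns and dialect quirks:

```typescript
// Illustrative rewrite of YEAR(col) = NNNN into a sargable range predicate
// so indexes/partitions can prune. Only a sketch of the TS-level augmentation.
function rewriteYearPredicate(sql: string): string {
  return sql.replace(
    /\bYEAR\(\s*([A-Za-z_][\w.]*)\s*\)\s*=\s*(\d{4})\b/gi,
    (_m, col: string, year: string) =>
      `${col} >= '${year}-01-01' AND ${col} < '${Number(year) + 1}-01-01'`,
  );
}
```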
Integrated the TypeScript-level non-sargable predicate rewrites into
sql_optimize so it produces actual optimized SQL instead of just listing
anti-patterns with unchanged SQL.

Fixes TICKET-004 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TRY_TO_NUMBER → SAFE_CAST(... AS NUMERIC)
- ARRAY_AGG(...) WITHIN GROUP (ORDER BY ...) → ARRAY_AGG(... ORDER BY ...)
- IFF(cond, a, b) → IF(cond, a, b) for BigQuery (not CASE WHEN)
- QUALIFY preserved as-is for BigQuery (native support since 2021)
- FLATTEN(input => expr) → UNNEST(expr)

BigQuery removed from QUALIFY_TARGETS since it supports QUALIFY natively.

Fixes TICKET-005 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
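Two of the transforms sketched as regexes (single-argument TRY_TO_NUMBER only; multi-argument and nested-parenthesis forms need a real parser):

```typescript
// Regex-level sketch of the Snowflake -> BigQuery preprocessing; the real
// preprocessSnowflakeToBigQuery handles more cases.
function preprocessSnowflakeToBigQuery(sql: string): string {
  let out = sql;
  // TRY_TO_NUMBER(expr) -> SAFE_CAST(expr AS NUMERIC); the capture forbids
  // commas so multi-argument TRY_TO_NUMBER(expr, fmt) is left untouched.
  out = out.replace(/\bTRY_TO_NUMBER\(\s*([^(),]+?)\s*\)/gi, "SAFE_CAST($1 AS NUMERIC)");
  // IFF(cond, a, b) -> IF(cond, a, b): BigQuery has a native IF().
  out = out.replace(/\bIFF\(/gi, "IF(");
  return out;
}
```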
When core.formatSql() returns single-line SQL, apply TypeScript-level
pretty-printing that breaks before major clauses (SELECT, FROM, WHERE,
etc.), indents sub-clauses (JOIN, AND, OR), and splits SELECT column
lists onto separate lines.

Fixes TICKET-006 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
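A minimal clause-break formatter in the spirit of that fallback (best-effort sketch, not the actual prettyPrintSql):

```typescript
// Best-effort pretty-printer: break before major clauses, indent AND/OR/JOIN.
// Not a full SQL formatter; complex nesting and CASE expressions may misformat.
function prettyPrint(sql: string): string {
  return sql
    .replace(/\s+/g, " ")
    .trim()
    .replace(/\s+(FROM|WHERE|GROUP BY|ORDER BY|HAVING|LIMIT)\b/gi, "\n$1")
    .replace(/\s+((?:LEFT |RIGHT |FULL |INNER |CROSS )?JOIN|AND|OR)\b/gi, "\n  $1");
}
```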
Added TypeScript-level keyword fuzzy matching using Levenshtein distance
before passing SQL to the Rust fix engine. Handles common keyword typos
like SELCT→SELECT, WERE→WHERE, FORM→FROM that the Rust engine can't fix
because it only does schema-based fuzzy matching.

Fixes TICKET-007 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
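The approach can be sketched as follows. SQL_KEYWORDS here is a tiny illustrative subset, and the real guards (skipping underscored identifiers, length limits, blocklists) are reduced to an uppercase-only check:

```typescript
// Illustrative subset of the keyword list; the real one is much larger.
const SQL_KEYWORDS = ["SELECT", "FROM", "WHERE", "GROUP", "ORDER", "BY", "JOIN"];

// Classic full-matrix Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      d[i][j] = Math.min(
        d[i - 1][j] + 1,
        d[i][j - 1] + 1,
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),
      );
  return d[a.length][b.length];
}

// Correct uppercase tokens within edit distance 2 of a keyword;
// lowercase tokens are assumed to be identifiers and left alone.
function fixKeywordTypos(sql: string): string {
  return sql.replace(/\b[A-Za-z]{3,}\b/g, (tok) => {
    if (tok !== tok.toUpperCase()) return tok;
    if (SQL_KEYWORDS.includes(tok)) return tok;
    const hit = SQL_KEYWORDS.find((k) => levenshtein(tok, k) <= 2);
    return hit ?? tok;
  });
}
```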

1. Fixed serialization bug: CorrectionResult.iterations is an array of
   CorrectionIteration objects, not a scalar. Now properly formats each
   iteration step instead of printing [object Object].

2. Added keyword typo pre-correction (same as altimate_core_fix) so
   SELCT→SELECT, WERE→WHERE etc. are fixed before the Rust correction loop.

Fixes TICKET-008 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CompareResult from Rust uses {identical, diffs} not {differences}. The
tool was checking data.differences which was always undefined, causing
it to always report "Queries are structurally identical."

Fixes TICKET-009 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When Rust checkEquivalence reports queries as non-equivalent, retry with
normalized forms: convert = 'x' to IN ('x') and vice versa. This handles
the common case where WHERE col = 'active' and WHERE col IN ('active')
are semantically equivalent but the AST comparison misses it.

Fixes TICKET-010 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
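The normalization step before the retry can be sketched as a rewrite of single-value IN lists back to equality (illustrative; the real handler also normalizes in the other direction):

```typescript
// Rewrite IN ('x') / IN (42) to = 'x' / = 42 before re-running the
// equivalence check; multi-value IN lists are deliberately untouched.
function normalizeSingleIn(sql: string): string {
  return sql.replace(/\bIN\s*\(\s*('[^']*'|\d+)\s*\)/gi, "= $1");
}
```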
LineageResult stores edges in queries[].edges and cross-query links in
impact_map, not at the root level. The tool was checking data.edges
which was always undefined. Now extracts edges from all nested queries
and impact_map entries.

Fixes TICKET-011 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added TypeScript-level detection for comma-separated tables in the FROM
clause (e.g., FROM orders o, customers c). The Rust checkSemantics
doesn't flag this pattern. Now reports it as an implicit_cross_join
warning with a suggestion to use explicit JOIN syntax.

Fixes TICKET-012 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
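One way to sketch the detection is to capture the whole FROM clause up to the next major clause and flag any comma inside it (a simplification; parenthesized subqueries and string literals need more care):

```typescript
// Flag implicit cross joins: a comma anywhere in the FROM clause,
// e.g. FROM orders o, customers c. Sketch only; no subquery handling.
function hasImplicitCrossJoin(sql: string): boolean {
  const m = /\bFROM\b([^;()]*?)(?=\bWHERE\b|\bGROUP\b|\bORDER\b|\bLIMIT\b|$)/i.exec(sql);
  return m !== null && m[1].includes(",");
}
```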

MigrationResult from Rust uses {findings, safe, overall_risk} not {risks}.
MigrationFinding uses {risk, operation, message, mitigation} not
{severity, type, recommendation}. The tool was checking data.risks which
was always undefined, so it always reported "safe."

Now correctly reads findings and maps MigrationFinding fields.

Fixes TICKET-013 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pass depth="full" to core.columnLineage() to enable multi-hop CTE
lineage tracing. Previously the depth parameter was omitted, potentially
causing the engine to only trace direct edges and miss columns flowing
through CTE chains (e.g., users.id → active_users CTE → output).

Fixes TICKET-014 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Rust PII classifier doesn't have a CreditCard category. Added
TypeScript-level pattern matching for credit_card_number, card_last_four,
card_expiry, cvv, cardholder, and similar column names, classified as
"Financial" PII with 0.9 confidence.

Fixes TICKET-015 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
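The pattern matching can be sketched like this; the pattern list and result shape are illustrative, not the exact ones in altimate-core.ts:

```typescript
// Hypothetical column-name patterns for card data, classified as Financial PII.
const CARD_PATTERNS = [/credit_card/i, /card_(number|last_four|expiry)/i, /\bcvv\b/i, /cardholder/i];

function classifyFinancialPii(column: string): { category: string; confidence: number } | null {
  return CARD_PATTERNS.some((p) => p.test(column))
    ? { category: "Financial", confidence: 0.9 }
    : null;
}
```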

SchemaDiff.changes is a tagged union (column_added, column_removed,
column_type_changed, etc.) but the tool expected ColumnChange format
with {severity, change_type, message}. The raw changes had no severity
field, so has_breaking_changes was always false and the summary showed
"? → ? columns" with all zeros.

Now transforms each SchemaChange variant into ColumnChange with proper
severity, builds summary with actual column counts from the schemas.

Fixes TICKET-016 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
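The variant-to-severity mapping can be sketched as a switch over the tagged union; variant and field names follow the commit text but are assumptions about the exact types:

```typescript
// Assumed tagged-union shape based on the commit description.
type SchemaChange =
  | { type: "column_added"; table: string; column: string }
  | { type: "column_removed"; table: string; column: string }
  | { type: "column_type_changed"; table: string; column: string; from: string; to: string };

interface ColumnChange {
  severity: "breaking" | "non_breaking";
  change_type: string;
  message: string;
}

// Removals and type changes break consumers; additions do not.
function toColumnChange(c: SchemaChange): ColumnChange {
  switch (c.type) {
    case "column_added":
      return { severity: "non_breaking", change_type: c.type, message: `${c.table}.${c.column} added` };
    case "column_removed":
      return { severity: "breaking", change_type: c.type, message: `${c.table}.${c.column} removed` };
    case "column_type_changed":
      return { severity: "breaking", change_type: c.type, message: `${c.table}.${c.column}: ${c.from} → ${c.to}` };
  }
}
```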
Added TypeScript-level detection for two documented rules that the Rust
lint engine doesn't fire:

- WINDOW_WITHOUT_PARTITION: Flags window functions with OVER() or
  OVER(ORDER BY ...) but no PARTITION BY clause.
- LARGE_IN_LIST: Flags IN clauses with 20+ values, suggesting a CTE
  with VALUES for better readability and performance.

Fixes TICKET-017 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
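Regex-level sketches of the two rules (the real register.ts logic is presumably more careful about nesting and string literals):

```typescript
// WINDOW_WITHOUT_PARTITION: OVER () or OVER (ORDER BY ...) with no PARTITION BY.
function windowWithoutPartition(sql: string): boolean {
  const m = /\bOVER\s*\(([^)]*)\)/i.exec(sql);
  return m !== null && !/PARTITION\s+BY/i.test(m[1]);
}

// LARGE_IN_LIST: IN clause with 20+ comma-separated values.
function largeInList(sql: string, threshold = 20): boolean {
  const m = /\bIN\s*\(([^)]*)\)/i.exec(sql);
  return m !== null && m[1].split(",").length >= threshold;
}
```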

When no schema context is provided, the policy checker now extracts table
names from the SQL via extractMetadata() and builds a minimal schema so
DML statements (DELETE, UPDATE, INSERT, MERGE) pass validation.

As a fallback, if validation still fails, the handler parses the policy
JSON directly to check forbidden_operations rules against the statement
type, ensuring policy violations are reported even when full validation
isn't possible.

Fixes TICKET-018 in #422.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@claude claude bot left a comment


Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review.



coderabbitai bot commented Mar 23, 2026

📝 Walkthrough

This PR addresses multiple documented issues from the Docs-vs-Reality audit, introducing Snowflake→BigQuery SQL preprocessing, exported keyword typo correction via fuzzy matching, enhanced diff comparison using token normalization and Dice similarity, and improved output formatting and error handling across multiple tools.

Changes

Cohort / File(s) Summary
Core Preprocessing & Translation
packages/opencode/src/altimate/native/altimate-core.ts
Added Snowflake→BigQuery converters (preprocessTryToNumber, preprocessArrayAggWithinGroup, preprocessFlatten, preprocessSnowflakeToBigQuery), updated preprocessIff to emit BigQuery-specific IF(), introduced fuzzy-matching keyword typo correction (fixKeywordTypos), and enhanced handlers (fix, correct, policy, semantics, equivalence, classify_pii, column_lineage) with new detection logic and error recovery.
SQL Analysis & Transformation
packages/opencode/src/altimate/native/sql/register.ts
Added TypeScript-level linting for window partitioning and large IN lists, introduced dialect-specific preprocessing routing for Snowflake→BigQuery, added non-sargable predicate detection, replaced diff logic with token-aware comparison (Dice coefficient similarity), added pretty-print fallback, and updated schema_diff summary computation.
Tool Output Formatters
packages/opencode/src/altimate/tools/altimate-core-compare.ts, altimate-core-correct.ts, altimate-core-migration.ts, altimate-core-track-lineage.ts
Updated compare to prioritize diffs field, enhanced correct formatter for nested iterations and fallback descriptions, updated migration to use findings as primary collection, and augmented track-lineage with aggregated edge extraction and normalized endpoint formatting.
Fix Output Control
packages/opencode/src/altimate/tools/sql-fix.ts
Tightened conditional logic to suppress auto-fix section display when result is unsuccessful or fixed SQL is identical to original.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant TypeScript as TS Preprocessor
    participant Rust as Rust Parser/Emitter
    
    Client->>TypeScript: translate(sql, "snowflake", "bigquery")
    TypeScript->>TypeScript: preprocessSnowflakeToBigQuery(sql)
    TypeScript->>TypeScript: TRY_TO_NUMBER → SAFE_CAST
    TypeScript->>TypeScript: ARRAY_AGG(...) WITHIN GROUP → rewrite
    TypeScript->>TypeScript: FLATTEN(...) → UNNEST(...)
    TypeScript->>TypeScript: IFF(cond,a,b) → IF(cond,a,b)
    TypeScript->>Rust: send preprocessed_sql
    Rust->>Rust: parse & emit BigQuery dialect
    Rust-->>TypeScript: translated_sql
    TypeScript-->>Client: return translated_sql
sequenceDiagram
    participant Client
    participant Analyzer as SQL Analyzer
    participant Tokenizer as Token Normalizer
    participant Comparator as Similarity Engine
    
    Client->>Analyzer: diff(old_sql, new_sql)
    Analyzer->>Tokenizer: normalize whitespace & tokenize
    Tokenizer->>Tokenizer: split into tokens
    Tokenizer-->>Analyzer: normalized_old, normalized_new
    Analyzer->>Comparator: compute Dice coefficient (char bigrams)
    Comparator->>Comparator: additions & deletions count
    Comparator-->>Analyzer: similarity, change_count, unified_diff
    Analyzer-->>Client: { has_changes, similarity, changes, unified_diff }
sequenceDiagram
    participant Client
    participant Handler as Fix Handler
    participant TypeScript as TS Keyword Fixer
    participant Rust as Rust Fixer
    
    Client->>Handler: fix(sql)
    Handler->>TypeScript: fixKeywordTypos(sql)
    TypeScript->>TypeScript: Levenshtein distance matching
    TypeScript-->>Handler: { sql: corrected_sql, corrections: [...] }
    Handler->>Rust: invoke rust_fix(corrected_sql)
    Rust-->>Handler: { fixed_sql, changes, ... }
    Handler->>Handler: merge keyword corrections + rust changes
    Handler-->>Client: { fixed_sql, corrections, changes, success }

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested labels

contributor

Suggested reviewers

  • mdesmet


🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check ✅ Passed: the title directly summarizes the main objective, resolving all 18 docs-as-spec test failures from the linked issue #422.
  • Description check ✅ Passed: the description includes all required sections: Summary explains what changed and why, Test Plan documents the testing approach, and the Checklist is completed with relevant items marked.
  • Linked Issues check ✅ Passed: the PR addresses all 18 issues from #422: 3 P1 critical fixes (sql_diff, altimate_core_compare, sql_fix), 10 P2 medium fixes (rewrites, translates, core tools), and 4 P3 low fixes (format, semantics, analyze, policy).
  • Out of Scope Changes check ✅ Passed: all changes directly address the 18 documented failures in #422. No out-of-scope modifications; scope includes TypeScript augmentations, field-name corrections, and Snowflake→BigQuery transforms as specified.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (9)
packages/opencode/src/altimate/native/sql/register.ts (3)

739-757: Summary computation iterates schemas twice unnecessarily.

oldSchema.tableNames() and newSchema.tableNames() are called, then for each table, columnNames() is called. This is fine for small schemas but could be optimized.

For large schemas, consider caching the column counts during the change iteration loop.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/sql/register.ts` around lines 739 -
757, The summary recomputes old_column_count and new_column_count by
re-iterating oldSchema.tableNames() and newSchema.tableNames(); instead, compute
and cache column counts during the primary change-collection loop (or build a
table->columnCount map) so you don't call columnNames() twice. Modify the code
that builds changes (the logic that produces changes, dropped, added,
typeChanged, renamed) to increment two counters (oldColCount, newColCount) as
you visit each table/column (or store column arrays in a map keyed by table
name) and then use those cached counts/values in the summary object instead of
re-iterating oldSchema and newSchema.

256-314: prettyPrintSql is a reasonable fallback formatter.

The function handles major clauses, sub-clauses, and SELECT column splitting. However, it may produce unexpected results for complex nested queries or CASE expressions within SELECT lists.

Consider adding a comment noting this is a best-effort formatter and may not handle all edge cases perfectly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/sql/register.ts` around lines 256 -
314, The prettyPrintSql fallback formatter (function prettyPrintSql) is
best-effort and can mis-handle complex nested queries and CASE expressions; add
a short explanatory comment at the top of the prettyPrintSql function noting it
is a non-fully-compliant, best-effort formatter that may not correctly format
all edge cases (nested subqueries, CASE in SELECT lists, complex JOIN/ON
nesting) so future maintainers know its limitations and when to use a proper SQL
formatter library.

574-593: MONTH pattern rewrite is less useful without year context.

The comment /* consider range predicate for partition pruning */ is accurate—EXTRACT(MONTH FROM col) = N is still non-sargable. This provides awareness but not an actual fix.

Consider documenting that this is an advisory rewrite rather than a performance optimization, or skip emitting it if no actionable change is made.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/sql/register.ts` around lines 574 -
593, The current MONTH(...) rewrite replaces the original SQL with EXTRACT(MONTH
FROM ...) yet offers no actionable optimization and remains non-sargable; update
the logic around monthPattern, rewrittenSql, and the rewrites push so you either
(A) skip performing the replacement in rewrittenSql and instead push an advisory
entry (e.g., rule "ADVISORY_NON_SARGABLE" or set rewritten_fragment to the
original and rewritten_sql to "") with a clear explanation that this is advisory
only and lower confidence, or (B) only emit the replacement and rewrite entry
when year context is present (detect a YEAR(...) or date range on the same
column) so the change is actionable; adjust the
replacement/explanation/confidence accordingly and ensure entries pushed to
rewrites accurately reflect advisory vs. actionable changes (refer to
monthPattern, replacement, rewrittenSql, and the "NON_SARGABLE_PREDICATE"
entry).
packages/opencode/src/altimate/native/altimate-core.ts (4)

171-205: Keyword correction may produce false positives on identifier-like tokens.

The heuristic on line 179 skips tokens with underscores or lowercase starting chars longer than 3 characters, but tokens like SELCT (5 chars, no underscore, starts uppercase) would be corrected. However, user-defined functions or table aliases matching the pattern could also be incorrectly "fixed."

Consider adding a blocklist for common identifier patterns or requiring the token to appear in a keyword-position context (e.g., at statement start or after specific punctuation).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/altimate-core.ts` around lines 171 -
205, The fixKeywordTypos function is producing false positives for
identifier-like tokens (e.g., user-defined names like table aliases) — update
fixKeywordTypos to be stricter: add a small blocklist of common identifier
patterns/names (or a regex check for typical identifiers like camelCase, numeric
suffixes, or known function names) and require keyword-position context before
applying corrections (e.g., token appears at start of statement or immediately
after punctuation/keywords such as "SELECT", "FROM", "WHERE", "(", "," ), using
the existing SQL_KEYWORDS and levenshtein logic; narrow the replacement
condition so only tokens that both pass the context check and are not in the
identifier blocklist are considered for correction and record corrections in
corrections[] as before.

711-727: Hardcoding "full" depth for CTE lineage tracing may have performance implications.

Forcing full depth ensures multi-hop CTEs are traced, but could be slow for complex queries with deep nesting. Consider making this configurable via params.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/altimate-core.ts` around lines 711 -
727, The current registration for "altimate_core.column_lineage" hardcodes the
CTE tracing depth to "full" which can cause performance issues for deeply nested
queries; modify the handler so it reads an optional params.depth (or similar)
value and passes that to core.columnLineage instead of always using "full",
defaulting to "full" only when params.depth is undefined; update the parameter
name (e.g., depth or cte_depth) and validation in the register callback, and
ensure callers can pass this new param while preserving existing behavior when
it's not provided.

385-397: Fallback schema creation for DML is a pragmatic workaround.

Creating minimal CREATE TABLE x (id INT) stubs allows DML policy checks to proceed. However, if extractMetadata returns qualified names (e.g., schema.table), the generated DDL may be syntactically invalid for some dialects.

Consider quoting table names for safety
         if (meta.tables?.length) {
           const ddl = meta.tables
-            .map((t: string) => `CREATE TABLE ${t} (id INT);`)
+            .map((t: string) => `CREATE TABLE "${t}" (id INT);`)
             .join("\n")
           schema = core.Schema.fromDdl(ddl)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/altimate-core.ts` around lines 385 -
397, The fallback builds DDL like `CREATE TABLE ${t} (id INT);` which breaks
when core.extractMetadata returns qualified names (e.g., schema.table); update
the fallback in the block that checks params.schema_path and
params.schema_context to safely quote or split identifiers from meta.tables
before passing to core.Schema.fromDdl: for each table name from meta.tables,
detect qualified names, split on the dot into segments, quote each segment with
a safe identifier quote (e.g., double quotes) or escape characters, then
reassemble into a properly quoted CREATE TABLE statement (CREATE TABLE
"schema"."table" (id INT);) so the generated DDL is syntactically valid across
dialects.

151-165: Levenshtein implementation is correct but allocates O(m×n) memory.

For typical SQL keyword lengths (≤12 chars due to the filter at line 182), this is fine. However, the function is called for every token against all ~100 keywords, which could be slow for very large SQL strings.

For future optimization, consider early-exit when the length difference exceeds the max allowed distance, or use a BK-tree for keyword lookup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/altimate-core.ts` around lines 151 -
165, The levenshtein(a: string, b: string) function currently allocates O(m×n)
memory and computes full distance every time; change it to an efficient bounded
version that accepts a maxDistance parameter and early-exits when abs(a.length -
b.length) > maxDistance or when all entries in the current DP row exceed
maxDistance, and reduce memory to O(min(m,n)) using only two rolling rows;
update callers (the token-vs-keyword checks) to pass the allowed max distance
(the same threshold used for short SQL keywords) so far-away candidates are
skipped quickly.
packages/opencode/src/altimate/tools/altimate-core-migration.ts (1)

47-56: MigrationFinding field mapping assumes undocumented structure.

The code expects r.risk, r.operation, r.mitigation, and r.rollback_sql, but the Rust handler doesn't transform fields. If the actual response uses different names (e.g., severity instead of risk, type instead of operation, recommendation instead of mitigation), the fallbacks handle some cases but rollback_sql has no fallback.

Consider adding a shaping layer in the altimate_core.migration handler (similar to other handlers in this PR) to normalize field names, rather than relying on fallbacks in the formatting function.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/tools/altimate-core-migration.ts` around lines
47 - 56, The MigrationFinding mapping is fragile: update the
altimate_core.migration handler to add a shaping/normalization layer that
canonicalizes incoming fields (e.g., map severity↔risk, type↔operation,
recommendation↔mitigation and any rollback/rollback_sql variants to a single
rollback_sql property) so the formatter loop in altimate-core-migration.ts (the
for (const r of findings) block and its uses of r.risk, r.operation,
r.mitigation, r.recommendation, r.rollback_sql) can rely on normalized names;
then simplify the formatter to read the canonical fields (severity, operation,
mitigation, rollback_sql) instead of relying on ad-hoc fallback expressions.
packages/opencode/src/altimate/tools/altimate-core-track-lineage.ts (1)

53-65: Synthetic cross-query edges from impact_map may duplicate existing edges.

If the same source→target relationship exists in both queries[].edges and impact_map, it will appear twice in allEdges. Consider deduplicating.

🔧 Optional: Add deduplication
+  // Deduplicate edges by source+target key
+  const seen = new Set<string>()
+  const dedupedEdges: Array<Record<string, any>> = []
+  for (const edge of allEdges) {
+    const key = `${formatEdgeRef(edge.source)}→${formatEdgeRef(edge.target)}`
+    if (!seen.has(key)) {
+      seen.add(key)
+      dedupedEdges.push(edge)
+    }
+  }
+  return dedupedEdges
-  return allEdges
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/tools/altimate-core-track-lineage.ts` around
lines 53 - 65, The code that merges synthetic edges from data.impact_map into
allEdges can create duplicates when the same source→target already exists in
queries[].edges; modify the logic in the block that iterates data.impact_map
(using entry.source and entry.affected) to perform deduplication before pushing:
maintain a Set of unique keys (e.g. `${source}|${target}|${transform_type}`)
built from existing allEdges (which includes edges from queries[].edges) and
only push new cross_query entries if their key is not already present, or
alternatively dedupe allEdges after concatenation by reducing to unique keys.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/opencode/src/altimate/native/altimate-core.ts`:
- Around line 67-72: The regex in preprocessTryToNumber currently captures
commas and thus mangles multi-argument forms like TRY_TO_NUMBER(expr, format);
update the matcher to only convert single-argument calls by forbidding commas
inside the capture (e.g. change the capture from [^()]+? to something like
[^(),]+?) so only TRY_TO_NUMBER(expr) becomes SAFE_CAST(expr AS NUMERIC),
leaving multi-argument variants unmodified for Rust transpile/failure; edit the
regex in preprocessTryToNumber accordingly.
- Around line 454-474: The current commaJoinPattern only catches the first two
tables and can mis-handle aliases; update the detection logic in
altimate-core.ts (where commaJoinPattern, params.sql, and findings are used) to
detect any comma-separated list in the FROM clause rather than just two tables —
for example, replace the regex with one that captures the full FROM clause up to
the next clause (e.g.
/\bFROM\b([^;()]*?)(?=\bWHERE\b|\bGROUP\b|\bORDER\b|\bLIMIT\b|$)/i) and then
check the captured text for a comma (/,/) to flag implicit cross joins; keep the
existing push into findings (rule "implicit_cross_join") and the data.valid
update, but ensure alias patterns like "AS alias" don't break detection.
- Around line 78-83: preprocessArrayAggWithinGroup's regex fails on nested
parentheses (e.g. ARRAY_AGG(COALESCE(x,0)) WITHIN GROUP ...); replace the regex
approach in preprocessArrayAggWithinGroup with a small parser that scans for
"ARRAY_AGG", finds the opening "(", consumes until the matching closing ")" by
tracking paren depth (so nested parentheses are handled), then checks for the
"WITHIN GROUP (ORDER BY ...)" suffix and rewrites to "ARRAY_AGG(<arg> ORDER BY
<orderExpr>)"; refer to the preprocessArrayAggWithinGroup function and implement
the index-based parenthesis matching and substring replacement instead of using
/[^()]+?/.

---

Nitpick comments:
In `@packages/opencode/src/altimate/native/altimate-core.ts`:
- Around line 171-205: The fixKeywordTypos function is producing false positives
for identifier-like tokens (e.g., user-defined names like table aliases) —
update fixKeywordTypos to be stricter: add a small blocklist of common
identifier patterns/names (or a regex check for typical identifiers like
camelCase, numeric suffixes, or known function names) and require
keyword-position context before applying corrections (e.g., token appears at
start of statement or immediately after punctuation/keywords such as "SELECT",
"FROM", "WHERE", "(", "," ), using the existing SQL_KEYWORDS and levenshtein
logic; narrow the replacement condition so only tokens that both pass the
context check and are not in the identifier blocklist are considered for
correction and record corrections in corrections[] as before.
- Around line 711-727: The current registration for
"altimate_core.column_lineage" hardcodes the CTE tracing depth to "full" which
can cause performance issues for deeply nested queries; modify the handler so it
reads an optional params.depth (or similar) value and passes that to
core.columnLineage instead of always using "full", defaulting to "full" only
when params.depth is undefined; update the parameter name (e.g., depth or
cte_depth) and validation in the register callback, and ensure callers can pass
this new param while preserving existing behavior when it's not provided.
- Around line 385-397: The fallback builds DDL like `CREATE TABLE ${t} (id
INT);` which breaks when core.extractMetadata returns qualified names (e.g.,
schema.table); update the fallback in the block that checks params.schema_path
and params.schema_context to safely quote or split identifiers from meta.tables
before passing to core.Schema.fromDdl: for each table name from meta.tables,
detect qualified names, split on the dot into segments, quote each segment with
a safe identifier quote (e.g., double quotes) or escape characters, then
reassemble into a properly quoted CREATE TABLE statement (CREATE TABLE
"schema"."table" (id INT);) so the generated DDL is syntactically valid across
dialects.
- Around line 151-165: The levenshtein(a: string, b: string) function currently
allocates O(m×n) memory and computes full distance every time; change it to an
efficient bounded version that accepts a maxDistance parameter and early-exits
when abs(a.length - b.length) > maxDistance or when all entries in the current
DP row exceed maxDistance, and reduce memory to O(min(m,n)) using only two
rolling rows; update callers (the token-vs-keyword checks) to pass the allowed
max distance (the same threshold used for short SQL keywords) so far-away
candidates are skipped quickly.
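The bounded, two-rolling-row variant described above can be sketched as follows (names are illustrative, not the actual `levenshtein` in altimate-core.ts):

```typescript
// Bounded Levenshtein distance: O(min(m, n)) memory via two rolling rows,
// and early exit returning maxDistance + 1 once the true distance is known
// to exceed the bound.
function boundedLevenshtein(a: string, b: string, maxDistance: number): number {
  // Cheap early exit: the length gap alone already exceeds the bound.
  if (Math.abs(a.length - b.length) > maxDistance) return maxDistance + 1
  // Keep the shorter string as the column dimension to minimize memory.
  if (a.length < b.length) [a, b] = [b, a]
  let prev = Array.from({ length: b.length + 1 }, (_, i) => i)
  let curr = new Array<number>(b.length + 1)
  for (let i = 1; i <= a.length; i++) {
    curr[0] = i
    let rowMin = curr[0]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
      rowMin = Math.min(rowMin, curr[j])
    }
    // Every cell in this row exceeds the bound: distance can only grow.
    if (rowMin > maxDistance) return maxDistance + 1
    ;[prev, curr] = [curr, prev]
  }
  return prev[b.length]
}
```

Callers doing token-vs-keyword checks would pass the same threshold used for short SQL keywords (e.g. 2), so far-away candidates like long identifiers are rejected by the length check before any DP work happens.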

In `@packages/opencode/src/altimate/native/sql/register.ts`:
- Around line 739-757: The summary recomputes old_column_count and
new_column_count by re-iterating oldSchema.tableNames() and
newSchema.tableNames(); instead, compute and cache column counts during the
primary change-collection loop (or build a table->columnCount map) so you don't
call columnNames() twice. Modify the code that builds changes (the logic that
produces changes, dropped, added, typeChanged, renamed) to increment two
counters (oldColCount, newColCount) as you visit each table/column (or store
column arrays in a map keyed by table name) and then use those cached
counts/values in the summary object instead of re-iterating oldSchema and
newSchema.
- Around line 256-314: The prettyPrintSql fallback formatter (function
prettyPrintSql) is best-effort and can mis-handle complex nested queries and
CASE expressions; add a short explanatory comment at the top of the
prettyPrintSql function noting it is a non-fully-compliant, best-effort
formatter that may not correctly format all edge cases (nested subqueries, CASE
in SELECT lists, complex JOIN/ON nesting) so future maintainers know its
limitations and when to use a proper SQL formatter library.
- Around line 574-593: The current MONTH(...) rewrite replaces the original SQL
with EXTRACT(MONTH FROM ...) yet offers no actionable optimization and remains
non-sargable; update the logic around monthPattern, rewrittenSql, and the
rewrites push so you either (A) skip performing the replacement in rewrittenSql
and instead push an advisory entry (e.g., rule "ADVISORY_NON_SARGABLE" or set
rewritten_fragment to the original and rewritten_sql to "") with a clear
explanation that this is advisory only and lower confidence, or (B) only emit
the replacement and rewrite entry when year context is present (detect a
YEAR(...) or date range on the same column) so the change is actionable; adjust
the replacement/explanation/confidence accordingly and ensure entries pushed to
rewrites accurately reflect advisory vs. actionable changes (refer to
monthPattern, replacement, rewrittenSql, and the "NON_SARGABLE_PREDICATE"
entry).
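Option (B) above — emitting the MONTH rewrite only when year context is present — could be gated with a check like this. The helper name and its string-level heuristic are hypothetical, not the actual register.ts logic:

```typescript
// Only treat MONTH(col) = n as rewritable when the same column is also
// constrained by YEAR(...), so the combined predicate can become a
// sargable date range. A lone MONTH(col) = n stays advisory-only.
function shouldRewriteMonthPredicate(sql: string, column: string): boolean {
  // Escape regex metacharacters in the column name before interpolation.
  const col = column.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
  const monthPattern = new RegExp(`\\bMONTH\\s*\\(\\s*${col}\\s*\\)\\s*=`, "i")
  const yearPattern = new RegExp(`\\bYEAR\\s*\\(\\s*${col}\\s*\\)\\s*=`, "i")
  return monthPattern.test(sql) && yearPattern.test(sql)
}
```

When this returns false, the entry pushed to `rewrites` would be the advisory variant (original fragment preserved, lower confidence) rather than a replacement.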

In `@packages/opencode/src/altimate/tools/altimate-core-migration.ts`:
- Around line 47-56: The MigrationFinding mapping is fragile: update the
altimate_core.migration handler to add a shaping/normalization layer that
canonicalizes incoming fields (e.g., map severity↔risk, type↔operation,
recommendation↔mitigation and any rollback/rollback_sql variants to a single
rollback_sql property) so the formatter loop in altimate-core-migration.ts (the
for (const r of findings) block and its uses of r.risk, r.operation,
r.mitigation, r.recommendation, r.rollback_sql) can rely on normalized names;
then simplify the formatter to read the canonical fields (severity, operation,
mitigation, rollback_sql) instead of relying on ad-hoc fallback expressions.
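The shaping layer suggested above could look like this. The field fallbacks mirror the variants named in the comment; the interface and helper are illustrative, not the actual MigrationFinding types:

```typescript
// Canonicalize loosely typed finding objects so the formatter loop can
// rely on one set of names: severity, operation, mitigation, rollback_sql.
interface NormalizedFinding {
  severity: string
  operation: string
  mitigation: string
  rollback_sql: string
}

function normalizeFinding(r: Record<string, unknown>): NormalizedFinding {
  // Return the first non-empty string among the candidate field names.
  const pick = (...keys: string[]): string => {
    for (const k of keys) {
      const v = r[k]
      if (typeof v === "string" && v.length > 0) return v
    }
    return ""
  }
  return {
    severity: pick("severity", "risk"),
    operation: pick("operation", "type"),
    mitigation: pick("mitigation", "recommendation"),
    rollback_sql: pick("rollback_sql", "rollback"),
  }
}
```

With findings mapped through `normalizeFinding` once, the `for (const r of findings)` formatter loop drops all of its ad-hoc `r.risk ?? r.severity`-style fallbacks.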

In `@packages/opencode/src/altimate/tools/altimate-core-track-lineage.ts`:
- Around line 53-65: The code that merges synthetic edges from data.impact_map
into allEdges can create duplicates when the same source→target already exists
in queries[].edges; modify the logic in the block that iterates data.impact_map
(using entry.source and entry.affected) to perform deduplication before pushing:
maintain a Set of unique keys (e.g. `${source}|${target}|${transform_type}`)
built from existing allEdges (which includes edges from queries[].edges) and
only push new cross_query entries if their key is not already present, or
alternatively dedupe allEdges after concatenation by reducing to unique keys.
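The Set-of-keys deduplication suggested above can be sketched as follows (the Edge shape and function name are illustrative, not the actual track-lineage types):

```typescript
// Merge synthetic cross_query edges into allEdges, skipping any
// source|target|transform_type combination that already exists.
interface Edge {
  source: string
  target: string
  transform_type: string
}

function mergeEdges(allEdges: Edge[], synthetic: Edge[]): Edge[] {
  const seen = new Set(
    allEdges.map((e) => `${e.source}|${e.target}|${e.transform_type}`),
  )
  for (const e of synthetic) {
    const key = `${e.source}|${e.target}|${e.transform_type}`
    if (!seen.has(key)) {
      seen.add(key)
      allEdges.push(e)
    }
  }
  return allEdges
}
```

The same key format also works for the alternative approach: concatenate first, then reduce `allEdges` to unique keys in one pass.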

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 286f03ca-42cd-472e-8336-a2b4d7fcb872

📥 Commits

Reviewing files that changed from the base of the PR and between 528af75 and 0f91f3c.

📒 Files selected for processing (7)
  • packages/opencode/src/altimate/native/altimate-core.ts
  • packages/opencode/src/altimate/native/sql/register.ts
  • packages/opencode/src/altimate/tools/altimate-core-compare.ts
  • packages/opencode/src/altimate/tools/altimate-core-correct.ts
  • packages/opencode/src/altimate/tools/altimate-core-migration.ts
  • packages/opencode/src/altimate/tools/altimate-core-track-lineage.ts
  • packages/opencode/src/altimate/tools/sql-fix.ts

Comment on lines +67 to +72
export function preprocessTryToNumber(sql: string): string {
  return sql.replace(
    /\bTRY_TO_(?:NUMBER|DECIMAL|NUMERIC)\s*\(\s*([^()]+?)\s*\)/gi,
    "SAFE_CAST($1 AS NUMERIC)",
  )
}
⚠️ Potential issue | 🟡 Minor

Regex doesn't handle multi-argument TRY_TO_NUMBER variants.

TRY_TO_NUMBER(expr, format) and TRY_TO_NUMBER(expr, precision, scale) are valid Snowflake syntax. The current regex with non-greedy [^()]+? will match the entire content including commas, producing invalid SAFE_CAST(expr, format AS NUMERIC).

🔧 Proposed fix to handle single-argument only
 export function preprocessTryToNumber(sql: string): string {
   return sql.replace(
-    /\bTRY_TO_(?:NUMBER|DECIMAL|NUMERIC)\s*\(\s*([^()]+?)\s*\)/gi,
+    /\bTRY_TO_(?:NUMBER|DECIMAL|NUMERIC)\s*\(\s*([^(),]+?)\s*\)/gi,
     "SAFE_CAST($1 AS NUMERIC)",
   )
 }

Alternatively, leave multi-argument forms unconverted and let Rust transpile handle them (or fail gracefully).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/altimate-core.ts` around lines 67 - 72,
The regex in preprocessTryToNumber currently captures commas and thus mangles
multi-argument forms like TRY_TO_NUMBER(expr, format); update the matcher to
only convert single-argument calls by forbidding commas inside the capture (e.g.
change the capture from [^()]+? to something like [^(),]+?) so only
TRY_TO_NUMBER(expr) becomes SAFE_CAST(expr AS NUMERIC), leaving multi-argument
variants unmodified for Rust transpile/failure; edit the regex in
preprocessTryToNumber accordingly.
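For reference, the comma-excluding capture from the proposed fix behaves like this as a standalone sketch: single-argument calls convert, multi-argument forms pass through unchanged for the Rust transpiler to handle.

```typescript
// Convert only single-argument TRY_TO_NUMBER/DECIMAL/NUMERIC calls;
// [^(),] forbids commas in the capture, so multi-argument variants
// like TRY_TO_NUMBER(expr, format) are left untouched.
function preprocessTryToNumber(sql: string): string {
  return sql.replace(
    /\bTRY_TO_(?:NUMBER|DECIMAL|NUMERIC)\s*\(\s*([^(),]+?)\s*\)/gi,
    "SAFE_CAST($1 AS NUMERIC)",
  )
}
```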

Comment on lines +78 to +83
export function preprocessArrayAggWithinGroup(sql: string): string {
  return sql.replace(
    /\bARRAY_AGG\s*\(\s*([^()]+?)\s*\)\s*WITHIN\s+GROUP\s*\(\s*ORDER\s+BY\s+([^()]+?)\s*\)/gi,
    "ARRAY_AGG($1 ORDER BY $2)",
  )
}
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for ARRAY_AGG usage patterns in test files to assess complexity
rg -n "ARRAY_AGG.*WITHIN\s+GROUP" --type-add 'test:*.test.ts' --type test

Repository: AltimateAI/altimate-code

Length of output: 50


🏁 Script executed:

# Find usages of preprocessArrayAggWithinGroup function
rg -n "preprocessArrayAggWithinGroup"

Repository: AltimateAI/altimate-code

Length of output: 302


🏁 Script executed:

# Search for test files related to altimate-core
fd -e test.ts -e spec.ts "altimate-core"

Repository: AltimateAI/altimate-code

Length of output: 251


🏁 Script executed:

# Search for ARRAY_AGG patterns in the entire codebase
rg -n "ARRAY_AGG" --type ts --type tsx

Repository: AltimateAI/altimate-code

Length of output: 95


🏁 Script executed:

# Look for SQL strings that might contain ARRAY_AGG with parentheses
rg -n "ARRAY_AGG\s*\(" --type ts --type tsx -A 3

Repository: AltimateAI/altimate-code

Length of output: 95


🏁 Script executed:

# Search for ARRAY_AGG tests in the test files
rg -n "ARRAY_AGG" packages/opencode/test/altimate/

Repository: AltimateAI/altimate-code

Length of output: 50


🏁 Script executed:

# Check the test file for preprocessArrayAggWithinGroup
rg -n "preprocessArrayAggWithinGroup" packages/opencode/test/

Repository: AltimateAI/altimate-code

Length of output: 50


🏁 Script executed:

# Look at the context around line 101 in altimate-core.ts
sed -n '95,110p' packages/opencode/src/altimate/native/altimate-core.ts

Repository: AltimateAI/altimate-code

Length of output: 566


🏁 Script executed:

# View the entire function and surrounding context
sed -n '78,85p' packages/opencode/src/altimate/native/altimate-core.ts

Repository: AltimateAI/altimate-code

Length of output: 297


🏁 Script executed:

# View the altimate-core test files to find test cases
head -100 packages/opencode/test/altimate/altimate-core-native.test.ts

Repository: AltimateAI/altimate-code

Length of output: 3608


🏁 Script executed:

# Search for test patterns with ARRAY_AGG
rg -n "ARRAY_AGG" packages/opencode/test/ -A 2 -B 2

Repository: AltimateAI/altimate-code

Length of output: 50


🏁 Script executed:

# Look for any mention of WITHIN GROUP in tests
rg -n "WITHIN.*GROUP" packages/opencode/test/ -A 2 -B 2

Repository: AltimateAI/altimate-code

Length of output: 50


🏁 Script executed:

# Search for .sql files or SQL strings in test directories
find packages/opencode/test -type f \( -name "*.sql" -o -name "*.json" \)

Repository: AltimateAI/altimate-code

Length of output: 169


🏁 Script executed:

# Check what's in the e2e test file
head -50 packages/opencode/test/altimate/altimate-core-e2e.test.ts

Repository: AltimateAI/altimate-code

Length of output: 1529


🏁 Script executed:

# Search entire codebase for any ARRAY_AGG patterns (case insensitive)
rg -i "array_agg" packages/opencode/ --type ts

Repository: AltimateAI/altimate-code

Length of output: 529


Regex cannot handle nested parentheses in ARRAY_AGG expressions.

The pattern [^()]+? explicitly excludes parentheses and will fail to match expressions like ARRAY_AGG(COALESCE(x, 0)) WITHIN GROUP (ORDER BY ...). No test cases in the codebase exercise nested parentheses, but this is a correctness issue that should be addressed if such expressions could appear in converted SQL.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/altimate-core.ts` around lines 78 - 83,
preprocessArrayAggWithinGroup's regex fails on nested parentheses (e.g.
ARRAY_AGG(COALESCE(x,0)) WITHIN GROUP ...); replace the regex approach in
preprocessArrayAggWithinGroup with a small parser that scans for "ARRAY_AGG",
finds the opening "(", consumes until the matching closing ")" by tracking paren
depth (so nested parentheses are handled), then checks for the "WITHIN GROUP
(ORDER BY ...)" suffix and rewrites to "ARRAY_AGG(<arg> ORDER BY <orderExpr>)";
refer to the preprocessArrayAggWithinGroup function and implement the
index-based parenthesis matching and substring replacement instead of using
/[^()]+?/.

Comment on lines +454 to +474
// Augment with TypeScript-level detection for comma-joins (implicit cross joins)
const findings = (data.findings as any[]) ?? []
const commaJoinPattern = /\bFROM\b\s+(\w+)\s+\w*\s*,\s*(\w+)/gi
const match = commaJoinPattern.exec(params.sql)
if (match) {
  const hasCommaJoinFinding = findings.some(
    (f) => f.rule === "implicit_cross_join" || f.rule === "comma_join",
  )
  if (!hasCommaJoinFinding) {
    findings.push({
      rule: "implicit_cross_join",
      severity: "warning",
      message: `Implicit cross join detected: comma-separated tables in FROM clause (${match[1]}, ${match[2]}). Use explicit JOIN syntax for clarity.`,
      explanation: "Comma-separated tables in the FROM clause create an implicit cross join. While a WHERE clause may filter the result, explicit JOIN...ON syntax is preferred for readability and to avoid accidental cartesian products.",
      suggestion: "Rewrite using explicit JOIN syntax: FROM table1 JOIN table2 ON ...",
      confidence: 0.8,
    })
    data.findings = findings
    if (findings.length > 0) data.valid = false
  }
}
⚠️ Potential issue | 🟡 Minor

Comma-join regex may miss multi-table scenarios.

The pattern /\bFROM\b\s+(\w+)\s+\w*\s*,\s*(\w+)/gi captures only the first two tables. A query like FROM a, b, c would trigger only one warning for (a, b). Additionally, \w* after the first table expects an optional alias but may consume unexpected characters.

🔧 Proposed fix to detect any comma after FROM
-    const commaJoinPattern = /\bFROM\b\s+(\w+)\s+\w*\s*,\s*(\w+)/gi
-    const match = commaJoinPattern.exec(params.sql)
-    if (match) {
+    // Simplified: detect any comma in FROM clause (before WHERE/JOIN/GROUP/ORDER/LIMIT)
+    const fromClause = params.sql.match(/\bFROM\b\s+([\s\S]+?)(?=\bWHERE\b|\bJOIN\b|\bGROUP\b|\bORDER\b|\bLIMIT\b|;|$)/i)
+    const hasCommaJoin = fromClause && fromClause[1].includes(",")
+    if (hasCommaJoin) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/native/altimate-core.ts` around lines 454 -
474, The current commaJoinPattern only catches the first two tables and can
mis-handle aliases; update the detection logic in altimate-core.ts (where
commaJoinPattern, params.sql, and findings are used) to detect any
comma-separated list in the FROM clause rather than just two tables — for
example, replace the regex with one that captures the full FROM clause up to the
next clause (e.g.
/\bFROM\b([^;()]*?)(?=\bWHERE\b|\bGROUP\b|\bORDER\b|\bLIMIT\b|$)/i) and then
check the captured text for a comma (/,/) to flag implicit cross joins; keep the
existing push into findings (rule "implicit_cross_join") and the data.valid
update, but ensure alias patterns like "AS alias" don't break detection.
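The whole-FROM-clause check from the proposed fix can be sketched as a standalone predicate (the function name is illustrative):

```typescript
// Capture everything after FROM up to the next clause keyword (or end of
// statement) and flag any comma, so FROM a, b, c and aliased forms like
// FROM orders AS o, users u are all caught with a single check.
function hasImplicitCrossJoin(sql: string): boolean {
  const fromClause = sql.match(
    /\bFROM\b\s+([\s\S]+?)(?=\bWHERE\b|\bJOIN\b|\bGROUP\b|\bORDER\b|\bLIMIT\b|;|$)/i,
  )
  return fromClause !== null && fromClause[1].includes(",")
}
```

Because the comma test runs over the entire captured clause, three-plus-table lists and `AS alias` patterns no longer slip past the two-table regex.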



Development

Successfully merging this pull request may close these issues.

18 tool issues found across SQL Tools, Core Tools, Schema Tools, and Lineage

1 participant