Area
- Specification (RFCs)
- Package & tests
- Documentation
Summary
RFC 012 proposes a single canonical scalar expression model for row-level relational meaning in InQL. Filters, computed projection values, grouping keys, and aggregate arguments should all lower through the same scalar-expression contract, while aggregate outputs remain a distinct aggregate-measure layer. This matters because the current direction invites split mini-DSLs for predicates, literals, and projection expressions, which makes the package surface harder to learn, duplicates semantics across planning/lowering layers, and creates room for silent degradation when one public surface accepts expression shapes it cannot actually represent faithfully.
Motivation
The current design pressure is clear: filter(...), with_column(...), grouping keys, and aggregate inputs are all expressing row-level meaning, but they are easy to model as separate builder families because features have been landing incrementally. That split is the wrong end state. It forces authors to remember which helper family belongs to which surface, encourages duplicated semantics across the package/Prism/Substrait boundary, and makes future concise DSL sugar harder because there is no single lowering target.
The most important technical motivation is correctness. InQL should not accept a broad expression shape in a public API and then quietly reinterpret or drop that shape downstream. Unsupported expressions must fail explicitly. RFC 012 is the design step that makes that rule coherent across the whole relational surface instead of one method at a time.
The timing matters too: the package has real filter, aggregate, and computed-column slices in flight. If the expression model is not unified soon, more surface area will accrete around the current split.
RFC document path: docs/rfcs/012_unified_scalar_expression_surface.md
Proposal sketch
Define one canonical scalar expression model for row-level relational authoring in InQL.
The contract is:
- row-level filters consume scalar expressions
- computed projection values consume scalar expressions
- grouping keys consume scalar expressions
- aggregate inputs consume scalar expressions
- aggregate outputs are not row-level scalar expressions; they remain a distinct aggregate-measure layer
Illustrative author-facing shape:
from pub::inql import LazyFrame
from pub::inql.functions import col, lit, gt, add, sum, count
from models import Order, OrderSummary

def enrich_orders(orders: LazyFrame[Order]) -> LazyFrame[Order]:
    return (
        orders
            .filter(gt(col("amount"), lit(100)))
            .with_column("amount_plus_fee", add(col("amount"), lit(5)))
    )

def summarize_orders(orders: LazyFrame[Order]) -> LazyFrame[OrderSummary]:
    return (
        orders
            .group_by([col("customer_id")])
            .agg([sum(col("amount")), count()])
    )
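To make the contract concrete, here is a minimal Python sketch of how the two layers could be modeled. It is an illustration only: the class names (`Col`, `Lit`, `Call`, `AggregateMeasure`) and the `sum_` spelling are assumptions made for this sketch, not the InQL package's actual types. The point is that every row-level position accepts the same `ScalarExpr` union, while an aggregate measure is a separate type that merely takes a scalar expression as input.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


# Hypothetical node types; these names are not taken from the InQL package.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Call:
    op: str
    args: Tuple["ScalarExpr", ...]


# Every row-level position (filter, projection, grouping key, aggregate input)
# would accept exactly this union.
ScalarExpr = Union[Col, Lit, Call]


@dataclass(frozen=True)
class AggregateMeasure:
    """Group-level value; deliberately not a member of ScalarExpr."""
    func: str
    arg: Optional[ScalarExpr] = None  # the aggregate *input* is a scalar expression


def col(name: str) -> Col:
    return Col(name)


def lit(value: object) -> Lit:
    return Lit(value)


def gt(left: ScalarExpr, right: ScalarExpr) -> Call:
    return Call("gt", (left, right))


def add(left: ScalarExpr, right: ScalarExpr) -> Call:
    return Call("add", (left, right))


def sum_(arg: ScalarExpr) -> AggregateMeasure:
    # Trailing underscore only avoids shadowing Python's built-in sum().
    return AggregateMeasure("sum", arg)


def count() -> AggregateMeasure:
    return AggregateMeasure("count")


# The same builders serve a filter predicate, a computed column, and aggregate inputs.
predicate = gt(col("amount"), lit(100))
projection = add(col("amount"), lit(5))
measures = [sum_(col("amount")), count()]
```

Because `AggregateMeasure` is not part of `ScalarExpr`, a type checker or a runtime check can reject an aggregate in a row-level position without any per-surface special casing.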
Key design constraints:
- unsupported expression shapes must fail explicitly; silent degradation is forbidden
- future concise surfaces such as .amount > 100 or sum(.amount) should lower into this same model rather than creating a separate semantic path
- the RFC is about the InQL contract, not about introducing new Incan parser syntax directly
- aggregate outputs remain distinct from row-level scalar expressions unless a later RFC says otherwise
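The first constraint can be sketched as a lowering function that raises on anything it cannot represent faithfully. This is a hedged Python illustration, not the Prism or Substrait lowering: `UnsupportedExpressionError`, `SUPPORTED_OPS`, and the dictionary output format are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Tuple


class UnsupportedExpressionError(Exception):
    """Raised when an expression shape cannot be represented faithfully."""


# Hypothetical node types, redefined here so the sketch is self-contained.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Call:
    op: str
    args: Tuple[object, ...]


SUPPORTED_OPS = {"gt", "add"}  # illustrative subset, not the real operator set
SUPPORTED_LITERAL_TYPES = (bool, int, float, str)


def lower(expr: object) -> dict:
    """Lower a scalar expression to a plain dict, or fail explicitly."""
    if isinstance(expr, Col):
        return {"column": expr.name}
    if isinstance(expr, Lit):
        if not isinstance(expr.value, SUPPORTED_LITERAL_TYPES):
            raise UnsupportedExpressionError(
                f"literal of type {type(expr.value).__name__} is not supported"
            )
        return {"literal": expr.value}
    if isinstance(expr, Call):
        if expr.op not in SUPPORTED_OPS:
            raise UnsupportedExpressionError(f"operator {expr.op!r} is not supported")
        return {"call": expr.op, "args": [lower(arg) for arg in expr.args]}
    # No fallback branch: an unknown shape is never dropped or reinterpreted.
    raise UnsupportedExpressionError(f"unknown expression shape: {type(expr).__name__}")
```

The design choice worth noting is the absence of a default case that returns something "close enough". Every path either produces a faithful representation or raises.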
The current draft RFC is here:
docs/rfcs/012_unified_scalar_expression_surface.md
Alternatives considered
Keep predicate, literal, and projection surfaces separate.
This preserves the current incremental direction, but it duplicates concepts that are semantically the same and makes drift between authoring surfaces more likely.
Unify only literals.
This is too small. The real problem is not helper naming alone; it is that row-level meaning is being expressed through multiple semantic systems.
Treat aggregate calls as ordinary scalar expressions everywhere.
This collapses a real semantic boundary. Aggregate outputs are group-level values, not row-level values, and blurring that distinction makes typing and position rules less coherent.
Wait for concise DSL syntax first.
That would postpone the contract question until after more syntax work lands, which is backwards. Concise syntax needs a stable lowering target first.
Impact / compatibility
The RFC itself is additive, but adopting it will likely require cleanup of existing public builder families in the InQL package.
Likely compatibility consequences:
- legacy typed literal helpers may need to become compatibility shims
- legacy predicate-specific wrappers may need deprecation if they survive at all
- docs and examples will need to present one canonical row-level expression model
- Prism and Substrait lowering will need one shared contract for scalar expressions and one shared contract for aggregate measures
This should improve migration quality, not hurt it, because it gives a clear north star instead of letting surface drift continue.
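As an illustration of the compatibility-shim idea, a legacy typed literal helper could delegate to the canonical builder and emit a deprecation warning. The sketch below is Python, and `int_lit` is a made-up name standing in for whatever legacy helpers exist; it is not a claim about the current package surface.

```python
import warnings
from dataclasses import dataclass


# Hypothetical canonical literal node and builder.
@dataclass(frozen=True)
class Lit:
    value: object


def lit(value: object) -> Lit:
    return Lit(value)


def int_lit(value: int) -> Lit:
    """Hypothetical legacy helper kept as a thin shim over lit()."""
    warnings.warn(
        "int_lit() is deprecated; use lit() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not the shim
    )
    return lit(value)
```

Under this approach the shim produces exactly the same node as the canonical builder, so downstream planning and lowering never see a second literal representation.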
Implementation notes (optional)
The relevant design record already exists:
docs/rfcs/012_unified_scalar_expression_surface.md
Likely touch points if accepted:
- this repo:
  - RFCs 001, 003, 004, and 007 for cross-RFC coherence
  - package builder surfaces and tests
  - Prism logical representation
  - Substrait lowering and validation
  - docs/reference/examples
- Incan:
  - only insofar as future scoped DSL sugar from RFC 040 / RFC 045 needs to lower into the InQL-defined scalar-expression contract
Testing should focus on semantic consistency:
- filter, computed projection, grouping keys, and aggregate inputs should all share one expression contract
- unsupported shapes should fail explicitly instead of degrading silently
- package, planning, and lowering layers should agree on the scalar-versus-aggregate boundary
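A rough sketch of what such consistency checks could exercise, written in Python with hypothetical names (`Scalar`, `Measure`, `Plan`, `ExpressionPositionError`): every row-level entry point routes through one `require_scalar` check, and `agg` routes through `require_measure`, so the scalar-versus-aggregate boundary is enforced in one place rather than per method.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Scalar:
    """Stand-in for any row-level scalar expression."""
    op: str
    args: Tuple[object, ...] = ()


@dataclass(frozen=True)
class Measure:
    """Stand-in for a group-level aggregate measure."""
    func: str
    arg: Optional[Scalar] = None


class ExpressionPositionError(TypeError):
    """Raised when an expression appears in a position it is not valid for."""


def require_scalar(expr: object, position: str) -> Scalar:
    if not isinstance(expr, Scalar):
        raise ExpressionPositionError(
            f"{position} expects a row-level scalar expression, got {type(expr).__name__}"
        )
    return expr


def require_measure(expr: object) -> Measure:
    if not isinstance(expr, Measure):
        raise ExpressionPositionError(
            f"agg expects an aggregate measure, got {type(expr).__name__}"
        )
    if expr.arg is not None:
        require_scalar(expr.arg, "aggregate input")
    return expr


@dataclass
class Plan:
    """Toy plan recorder; each method validates through the shared checks."""
    steps: List[tuple] = field(default_factory=list)

    def filter(self, predicate: object) -> "Plan":
        self.steps.append(("filter", require_scalar(predicate, "filter")))
        return self

    def with_column(self, name: str, expr: object) -> "Plan":
        self.steps.append(("with_column", name, require_scalar(expr, "with_column")))
        return self

    def group_by(self, keys: list) -> "Plan":
        self.steps.append(("group_by", [require_scalar(k, "group_by") for k in keys]))
        return self

    def agg(self, measures: list) -> "Plan":
        self.steps.append(("agg", [require_measure(m) for m in measures]))
        return self
```

Tests against a structure like this would assert that the same scalar expression is accepted in all four row-level positions, and that a measure passed to `filter` or a bare scalar passed to `agg` raises rather than being coerced.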
I checked current open InQL issues before drafting this. There are open RFC issues for 000, 003, 004, 005, 006, and 007, plus feature work, but nothing open that already tracks RFC 012 directly.
Checklist