feature - add fallible reader chunk streams #579

@dannymeijer

Area

  • Runtime / Core crates (stdlib/core/derive)
  • Incan Language (syntax/semantics)
  • Documentation

Problem statement

Streaming IO currently makes users write bounded-read loops by hand:

loop:
    let chunk = input.read_bytes(chunk_size).map_err(map_io_error)?
    if len(chunk) == 0:
        break
    hasher.update(chunk)

That pattern is correct, but it is not the right readability target for ordinary stdlib code. It leaks the EOF-as-empty-chunk convention into every caller, repeats the same loop + read + empty-check scaffold across hashing, encoding, file copying, parsers, and uploads/downloads, and makes Incan code read worse than the equivalent Python shape.

Python has a compact chunk-read idiom; the expression reference uses while chunk := file.read(9000):, and Python truth-value testing treats "empty sequences and collections" as false. Sources: https://docs.python.org/3.10/reference/expressions.html#assignment-expressions and https://docs.python.org/3/library/stdtypes.html#truth-value-testing.
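For comparison, here is a minimal runnable sketch of the Python idiom referenced above, applied to hash feeding. The function name `sha256_of` and the sample payload are illustrative assumptions. The `while chunk := ...` form requires Python 3.8 or newer.

```python
import hashlib
import io


def sha256_of(stream, chunk_size=9000):
    """Hash a binary stream in bounded chunks using the walrus idiom."""
    hasher = hashlib.sha256()
    # read() returns b"" at EOF; empty bytes are falsy, so the loop ends there.
    while chunk := stream.read(chunk_size):
        hasher.update(chunk)
    return hasher.hexdigest()


# Illustrative input: 20000 bytes, so the loop sees two full chunks and a short one.
digest = sha256_of(io.BytesIO(b"x" * 20000))
```

The loop is compact only because Python combines assignment expressions with truthiness of empty bytes, the two language features this issue argues Incan should not adopt for this purpose.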

Incan should not copy that directly by adding general truthiness or assignment expressions just to make this one IO pattern shorter. The better target is a typed chunk-stream API that represents EOF as iterator exhaustion, rather than as an empty byte value every caller must remember to test.

Proposed solution

Add a stdlib chunk stream abstraction for binary readers, likely centered on BinaryReader.chunks(chunk_size) or an equivalent read_chunks(reader, chunk_size) helper.

The desired user-facing shape is:

for chunk in input.chunks(chunk_size):
    hasher.update(chunk)

return hasher.finalize()

For callers that need domain error mapping:

for chunk in input.chunks(chunk_size).map_err(HashError.from_io):
    hasher.update(chunk)

return hasher.finalize()

The stream should yield non-empty bytes chunks and treat EOF as iterator completion. Zero or negative chunk sizes should be rejected up front with a clear error rather than becoming odd loop behavior.
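The semantics described above can be sketched in Python as a rough model. This is not Incan code and not a proposed implementation: the `chunks` function is a hypothetical stand-in for `input.chunks(chunk_size)`, and the choice of `ValueError` for invalid sizes is an assumption.

```python
import io


def chunks(reader, chunk_size):
    """Hypothetical Python stand-in for the proposed input.chunks(chunk_size)."""
    # Validate eagerly: this is a plain function, so the check runs at call
    # time instead of being deferred to the first iteration.
    if chunk_size <= 0:
        raise ValueError(f"chunk_size must be positive, got {chunk_size}")

    def stream():
        while True:
            chunk = reader.read(chunk_size)
            if not chunk:  # EOF becomes iterator exhaustion, never an emitted b""
                return
            yield chunk

    return stream()


# Seven bytes in chunks of three: two full chunks, then a final short chunk.
pieces = list(chunks(io.BytesIO(b"abcdefg"), 3))
```

The size check sits outside the inner generator on purpose. If it lived inside the generator body, an invalid size would surface only when the first chunk was requested, which is the "odd loop behavior" the proposal wants to avoid.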

This likely depends on native associated types from RFC 098 or a nearby trait design, because a clean API needs to preserve projected item/error types through chunk streams and adapters. Design options to settle:

  • Iterator with type Item = Result[bytes, IoError]
  • separate FallibleIterator with type Item and type Error
  • named concrete chunk stream type returned by BinaryReader.chunks(size)
  • opaque stream return type if/when Incan supports that shape
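To make the first option concrete, the sketch below models an iterator whose items are results, plus a `map_err` adapter that rewrites only the error side. It is a Python approximation under stated assumptions: `Ok`, `Err`, `map_err`, and `HashError` are hypothetical names, and whether a real stream keeps yielding after an error is a design question this sketch leaves open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ok:
    value: bytes


@dataclass(frozen=True)
class Err:
    error: object


class HashError(Exception):
    """Hypothetical domain error that wraps an IO failure."""


def map_err(results, convert):
    """Convert the error side of each item; Ok items pass through unchanged."""
    for item in results:
        if isinstance(item, Err):
            yield Err(convert(item.error))
        else:
            yield item


# Illustrative stream: one good chunk followed by a read failure.
source = [Ok(b"ab"), Err(OSError("simulated read failure"))]
mapped = list(map_err(source, lambda e: HashError(str(e))))
```

The adapter changes the error type while leaving the chunk type alone. A statically typed version must carry both types through every adapter, which is why the choice between a result-valued `Item` and a separate `Error` associated type matters.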

Rust's for model is useful prior art here: the Rust Reference says that when an iterator is empty, "the for expression completes." Source: https://doc.rust-lang.org/reference/expressions/loop-expr.html#iterator-loops.

Alternatives considered

Keep the current explicit loop pattern. This is implementable today, but it keeps repeating a low-level EOF convention in user code and stdlib source.

Add Python-style assignment expressions and truthiness. That would make the local example shorter, but it broadens the language for a problem that is better solved as a typed streaming abstraction.

Add one-off helpers in std.hash or std.encoding. That would reduce local duplication, but it would not give file copy, parsers, upload/download code, or future stream consumers a shared vocabulary.

Expose only whole-file read_bytes() helpers. That is explicitly not enough for large-file paths and pushes users toward whole-file materialization.

Scope / acceptance criteria

  • In scope:

    • Define a reader chunk-stream API for bounded binary reads.
    • Define EOF as stream exhaustion, not an emitted empty chunk.
    • Define how stream read errors are represented and mapped.
    • Implement the API for stdlib file/binary reader types.
    • Add examples and docs showing chunked file copy and hash feeding.
    • Migrate stdlib code that currently hand-rolls loop + read_bytes + len(chunk) == 0 where the new API applies.
    • Add tests for normal chunking, empty files, final short chunks, read errors, and invalid chunk sizes.
  • Out of scope:

    • General Python-style truthiness.
    • General assignment expressions / walrus syntax.
    • Async streams unless a follow-up RFC explicitly pulls them in.
    • Generic associated types unless the chosen stream trait requires them.
  • Done when:

    • Incan users can express bounded binary stream consumption with a clear for chunk in ... shape.
    • EOF and error behavior are documented and tested.
    • std.hash and other relevant stdlib surfaces no longer need repeated manual chunk-loop scaffolding for ordinary reader draining.
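The acceptance criteria above can be exercised against a small model. The Python sketch below shows a chunked copy and the edge cases the test list names. The names `chunks`, `copy_stream`, and `FailingReader` are hypothetical, and letting read errors propagate as exceptions is an assumption of this model, not a decision about the Incan API.

```python
import io


def chunks(reader, chunk_size):
    """Hypothetical stand-in for the proposed chunk-stream API."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # iter(callable, sentinel) stops when read() returns b"", i.e. at EOF.
    return iter(lambda: reader.read(chunk_size), b"")


def copy_stream(src, dst, chunk_size=4):
    """Copy src to dst in bounded chunks; returns the number of bytes written."""
    total = 0
    for chunk in chunks(src, chunk_size):
        total += dst.write(chunk)
    return total


class FailingReader:
    """Reader whose second read fails, to exercise the error path."""

    def __init__(self):
        self.calls = 0

    def read(self, size):
        self.calls += 1
        if self.calls > 1:
            raise OSError("simulated read failure")
        return b"x" * size


# Ten bytes in chunks of four: two full chunks and a final short chunk.
out = io.BytesIO()
copied = copy_stream(io.BytesIO(b"0123456789"), out)
```

An empty source yields no chunks at all, so the loop body never runs, and a failing reader surfaces its error at the iteration where the read happens rather than as a silently truncated copy.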

Metadata

    Labels

      • documentation: Improvements or additions to documentation
      • feature: New feature or request
      • incan language semantics: Suggestions, features, or bugs related to the Incan Language itself (syntax and semantics)
      • runtime / core crates: Suggestions, features, or bugs related to the `incan-core`, `incan-stdlib`, `incan-derive` crates