Add chunked reader fast path by jsoizo · Pull Request #177 · jsoizo/kotlin-csv

jsoizo · 2026-05-22T00:58:36Z

Summary

- keep the public lazy `Sequence<Char>` reader API intact
- add a chunked reader fast path that walks an 8 KB `CharArray` buffer directly, with double buffering for next-char lookahead
- route JVM `read(InputStream)` / `readFromFile(File)` and kotlinx-io `read(Source)` through the chunked path
- add `ParseStateMachine.reset()` so the state machine, `StringBuilder`, and fields `ArrayList` are reused across rows
- extract `CsvReader.applyPipeline` so I/O wrappers can attach skipEmptyLine + field-count policy without rerouting chars through a `Sequence<Char>`
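
For readers unfamiliar with the reuse pattern behind `ParseStateMachine.reset()`, a minimal sketch follows. `RowAccumulator` and `parseSimple` are hypothetical names, not the PR's code, and the toy parser ignores quoting and escapes. It only illustrates clearing one `StringBuilder` and one `ArrayList` between rows instead of allocating new ones per row.

```kotlin
// Hypothetical stand-in for a reusable parser state; not the PR's ParseStateMachine.
class RowAccumulator {
    private val field = StringBuilder()
    private val fields = ArrayList<String>()

    fun append(c: Char) {
        field.append(c)
    }

    fun endField() {
        fields.add(field.toString())
        field.setLength(0) // drop the contents, keep the backing array
    }

    /** Closes the last field and returns a copy of the row. */
    fun finishRow(): List<String> {
        endField()
        return fields.toList() // the copy is the remaining per-row list allocation
    }

    /** Clears state for the next row without reallocating the builder or the list. */
    fun reset() {
        field.setLength(0)
        fields.clear()
    }
}

// Toy comma/LF parser (no quotes, no escapes) that drives one accumulator for all rows.
fun parseSimple(input: String): List<List<String>> {
    val acc = RowAccumulator()
    val rows = ArrayList<List<String>>()
    for (c in input) {
        when (c) {
            ',' -> acc.endField()
            '\n' -> {
                rows.add(acc.finishRow())
                acc.reset()
            }
            else -> acc.append(c)
        }
    }
    return rows
}

fun main() {
    println(parseSimple("a,b\nc,d\n")) // prints [[a, b], [c, d]]
}
```

The `toList()` copy in `finishRow` is kept on purpose in this sketch: returned rows must not alias the reused `fields` list.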

Verification

- `./gradlew check`
- benchmark/parity row-by-row HARD equivalence test passes

Benchmark results

Measured on the same machine as issue #172 (Apple M4 / 32 GB / JDK 21.0.6 / JMH 1.36, seed=42), primary profile (warmup=5 / iter=5 / fork=2 / 10s).

Before = v2.0.0 primary numbers from issue #172 primary comment (commit 7fe79b4). Reader sources between 7fe79b4 and this branch are touched only by this PR (ParseStateMachine.kt, SequenceParser.kt, CsvReader.kt, ReaderIo.kt, ReaderIoJvm.kt), so the issue numbers are a valid Before for the reader workloads.
After = this PR (cdf3284), measured 2026-05-22.

Throughput (ops/ms, higher is better)

I/O paths (the workloads that drove the issue's long-tail divergence):

| Workload | Dataset | Before | After | Delta | vs v1.10.0 |
|---|---|---|---|---|---|
| `readAll(InputStream)` | SMALL | 0.7669 | 1.5104 ± 0.0121 | +97% (1.97×) | 1.82× faster |
| `readAll(InputStream)` | MEDIUM | 0.00407 | 0.00772 ± 0.00005 | +90% (1.90×) | 1.82× faster |
| `readAll(InputStream)` | HARD | 0.0538 | 0.0979 ± 0.0016 | +82% (1.82×) | 1.97× faster |
| `readAll(File)` | SMALL | 0.7372 | 1.3543 ± 0.0280 | +84% (1.84×) | 1.68× faster |
| `readAll(File)` | MEDIUM | 0.00377 | 0.00740 ± 0.00010 | +96% (1.96×) | 1.85× faster |
| `readAll(File)` | HARD | 0.0554 | 0.0922 ± 0.0009 | +66% (1.66×) | 1.91× faster |
| `open(file){ asSequence().count() }` | SMALL | 0.7427 | 1.4280 ± 0.0134 | +92% (1.92×) | 1.76× faster |
| `open(file){ asSequence().count() }` | MEDIUM | 0.00375 | 0.00756 ± 0.00011 | +102% (2.02×) | 1.84× faster |
| `open(file){ asSequence().count() }` | HARD | 0.0544 | 0.0872 ± 0.0058 | +60% (1.60×) | 1.78× faster |

String paths (already v2-favoured; included for completeness):

| Workload | Dataset | Before | After | Delta | vs v1.10.0 |
|---|---|---|---|---|---|
| `readAll(String)` | SMALL | 1.6116 | 1.7019 ± 0.0125 | +6% | 2.04× faster |
| `readAll(String)` | MEDIUM | 0.00819 | 0.00855 ± 0.00021 | +4% | 2.10× faster |
| `readAll(String)` | HARD | 0.1115 | 0.1154 ± 0.0011 | +3% | 2.45× faster |
| `readAllWithHeader(String)` | SMALL | 1.4490 | 1.4944 ± 0.0665 | +3% | 1.94× faster |
| `readAllWithHeader(String)` | MEDIUM | 0.00654 | 0.00708 ± 0.00012 | +8% | 1.85× faster |
| `readAllWithHeader(String)` | HARD | 0.0969 | 0.1019 ± 0.0054 | +5% | 2.28× faster |

Average time (ms/op, lower is better)

I/O paths (Before suffered the long-tail outliers — thrpt × avgt ≈ 3):

| Workload | Dataset | Before | After | Delta |
|---|---|---|---|---|
| `readAll(InputStream)` | SMALL | 5.12 | 0.676 ± 0.017 | 7.6× faster |
| `readAll(InputStream)` | MEDIUM | 947.30 | 129.59 ± 2.69 | 7.3× faster |
| `readAll(InputStream)` | HARD | 57.56 | 10.23 ± 0.25 | 5.6× faster |
| `readAll(File)` | SMALL | 4.53 | 0.718 ± 0.020 | 6.3× faster |
| `readAll(File)` | MEDIUM | 876.29 | 134.31 ± 2.64 | 6.5× faster |
| `readAll(File)` | HARD | 71.47 | 11.09 ± 0.30 | 6.4× faster |
| `open(file){ asSequence().count() }` | SMALL | 3.56 | 0.736 ± 0.048 | 4.8× faster |
| `open(file){ asSequence().count() }` | MEDIUM | 557.28 | 143.61 ± 8.90 | 3.9× faster |
| `open(file){ asSequence().count() }` | HARD | 27.93 | 11.25 ± 0.36 | 2.5× faster |

Notes

- The avgt × thrpt divergence flagged in the issue's primary comment for reader I/O workloads (thrpt × avgt ≈ 3 on Before) is resolved on After. Examples: `readAll(InputStream)` MEDIUM 0.00772 × 129.59 ≈ 1.00, `readAll(File)` MEDIUM 0.00740 × 134.31 ≈ 0.99. Long-tail Continuation allocations from the per-char `sequence { yield(c) }` are gone.
- Score errors shrank substantially (e.g. `readAll(File)` MEDIUM: Before 876.29 ± 158.70 ms/op ≈ 18% noise → After 134.31 ± 2.64 ms/op ≈ 2%), consistent with the parser no longer producing GC-driven outliers.
- String paths were already v2-favoured: there is no per-char yield, because the iterator behind Kotlin's `String.asSequence()` does not allocate Continuations. The chunked path is a small additional win (+3–8%), but there was no long tail there to begin with.
- `gc.alloc.rate.norm` (B/op) still sits at 1.30–1.36× the v1 figure on I/O paths after this PR. This is structural per-row allocation (e.g. `ArrayList.toList()` copies, the `InputStreamReader` internal char buffer) that does not produce long tails. Tracked as a follow-up for v2.0.x; not a blocker for the long-tail goal of this PR.
- kotlinx-io vs java.io reader paths sit at ~7–13% CPU overhead with alloc/op parity after this PR. Tracked as a follow-up; not addressed here.
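
The consistency check used in these notes can be reproduced with a few lines of arithmetic. The sketch below is not part of the benchmark harness and its function names are made up for illustration. It relies only on the units: throughput is in ops/ms and average time in ms/op, so their product is dimensionless and should land near 1 when the two modes agree.

```kotlin
import java.util.Locale

/** Product of throughput (ops/ms) and average time (ms/op); near 1.0 when both modes agree. */
fun thrptTimesAvgt(thrptOpsPerMs: Double, avgtMsPerOp: Double): Double =
    thrptOpsPerMs * avgtMsPerOp

/** Score error as a fraction of the score, e.g. 0.18 for 18% noise. */
fun relativeError(score: Double, error: Double): Double = error / score

private fun fmt(pattern: String, value: Double): String = pattern.format(Locale.ROOT, value)

fun main() {
    // All figures are copied from the tables and notes in this PR description.
    println(fmt("%.2f", thrptTimesAvgt(0.00772, 129.59))) // After, readAll(InputStream) MEDIUM: 1.00
    println(fmt("%.2f", thrptTimesAvgt(0.00740, 134.31))) // After, readAll(File) MEDIUM: 0.99
    println(fmt("%.2f", thrptTimesAvgt(0.00377, 876.29))) // Before, readAll(File) MEDIUM: 3.30
    println(fmt("%.1f", 100 * relativeError(876.29, 158.70))) // Before noise in percent: 18.1
    println(fmt("%.1f", 100 * relativeError(134.31, 2.64)))   // After noise in percent: 2.0
}
```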

Reproduction

```bash
./gradlew :benchmark:v2:jmh -Pbench.profile=primary -Pjmh.include='.ReadBenchmarksV2.'
```

Apply jmh.warmupIterations/iterations/fork/timeOnIteration/warmup property overrides after the bench.profile when block so short-form CLI flags can override profile defaults during ad-hoc gcprof/stackprof runs.

The v2 reader I/O paths routed every char through a coroutine sequence builder (`BufferedReader.toCharSequence` and `Source.toCharSequence`), which allocated a Continuation per character and produced the avgt × thrpt ≈ 3 long-tail divergence flagged in issue #172's primary profile.

Add an eager chunked parser that fills a CharArray buffer and walks it directly, using double buffering to carry the next-char lookahead across chunk boundaries. The public lazy `Sequence<List<String>>` API is unchanged; only the I/O wrappers and an internal pipeline helper are rewired.

- `ParseStateMachine.reset()` lets `SequenceParser` reuse one machine instance across rows (no per-row alloc of machine / StringBuilder / fields ArrayList).
- `parseRowsFromChunks((CharArray) -> Int, dialect, stripBom)` is the new internal entry point; per-char `sequence { yield(c) }` is gone.
- `CsvReader.applyPipeline` exposes the skipEmptyLine + field-count policy stages for I/O wrappers to drive directly.
- JVM `read(InputStream)` / `readFromFile(File)` wrap `BufferedReader.read(CharArray)`; kotlinx-io `read(Source)` writes decoded code points straight into the chunk buffer.
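
To make the double-buffering idea concrete, here is a rough sketch of a cursor over a `(CharArray) -> Int` source that can look one char ahead across a chunk boundary. It is not the PR's `parseRowsFromChunks`: `ChunkCursor`, `splitLines`, and `stringSource` are hypothetical names, and the sketch assumes the source returns the number of chars written, with any value of 0 or less meaning end of input (which matches how `Reader.read(CharArray)` signals EOF for a non-empty array).

```kotlin
// Hypothetical cursor: keeps the next chunk pre-filled so lookahead can cross the boundary.
class ChunkCursor(private val source: (CharArray) -> Int, bufferSize: Int = 8192) {
    private var current = CharArray(bufferSize)
    private var next = CharArray(bufferSize)
    private var currentLen = fill(current)
    private var nextLen = fill(next)
    private var index = 0

    // Assumption: a result <= 0 means end of input.
    private fun fill(buf: CharArray): Int = maxOf(source(buf), 0)

    /** Current char code, or -1 at end of input. */
    fun peek(): Int = if (index < currentLen) current[index].code else -1

    /** Char code after the current one; read from next[0] at a chunk boundary. */
    fun peekNext(): Int = when {
        index + 1 < currentLen -> current[index + 1].code
        index < currentLen && nextLen > 0 -> next[0].code
        else -> -1
    }

    /** Moves forward one char, swapping buffers and refilling when the chunk is used up. */
    fun advance() {
        index++
        if (index >= currentLen && nextLen > 0) {
            val spent = current
            current = next
            currentLen = nextLen
            next = spent
            nextLen = fill(next)
            index = 0
        }
    }
}

// Demo consumer: splits on LF, CRLF, or lone CR, using the lookahead for the CRLF case.
fun splitLines(source: (CharArray) -> Int, bufferSize: Int): List<String> {
    val cursor = ChunkCursor(source, bufferSize)
    val lines = ArrayList<String>()
    val sb = StringBuilder()
    while (true) {
        val c = cursor.peek()
        if (c == -1) break
        when (c.toChar()) {
            '\r' -> {
                if (cursor.peekNext() == '\n'.code) cursor.advance() // step onto the LF of a CRLF
                lines.add(sb.toString())
                sb.setLength(0)
            }
            '\n' -> {
                lines.add(sb.toString())
                sb.setLength(0)
            }
            else -> sb.append(c.toChar())
        }
        cursor.advance()
    }
    if (sb.isNotEmpty()) lines.add(sb.toString())
    return lines
}

// Test helper: serves a string in chunks of at most buf.size chars, then -1.
fun stringSource(text: String): (CharArray) -> Int {
    var pos = 0
    return { buf ->
        if (pos >= text.length) {
            -1
        } else {
            val n = minOf(buf.size, text.length - pos)
            for (i in 0 until n) buf[i] = text[pos + i]
            pos += n
            n
        }
    }
}

fun main() {
    // With bufferSize = 2 the CRLF is split across chunks: "a\r" | "\nb" | "\rc".
    println(splitLines(stringSource("a\r\nb\rc"), 2)) // prints [a, b, c]
}
```

Pre-filling the second buffer costs one extra read up front, but it means the boundary case is an array index into `next` rather than a special code path with its own buffering.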

codecov · 2026-05-22T01:13:10Z

Codecov Report

❌ Patch coverage is 98.59155% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 94.87%. Comparing base (3decc39) to head (40d41d6).
⚠️ Report is 1 commit behind head on version_2_0_0.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...jsoizo/kotlincsv/reader/internal/SequenceParser.kt | 98.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@                Coverage Diff                @@
##           version_2_0_0     #177      +/-   ##
=================================================
+ Coverage          93.92%   94.87%   +0.94%     
=================================================
  Files                 22       22              
  Lines                461      507      +46     
  Branches             107      116       +9     
=================================================
+ Hits                 433      481      +48     
+ Misses                16       13       -3     
- Partials              12       13       +1


Add coverage for the new chunked reader fast path so the PR diff is no longer below the codecov threshold:

- SequenceParserTest: drive `parseRowsFromChunks` directly with a custom `(CharArray) -> Int` source, including small-buffer chunk boundary swaps, CR/LF on a chunk boundary, the `require(bufferSize >= 2)` guard, BOM strip on/off (default), and an unterminated quote whose tail-flush takes the null-result branch.
- CsvReaderJvmIoTest: exercise the I/O pipeline with skipEmptyLine and with an input larger than the default 8 KB chunk so the double buffer swap runs end-to-end.
- CsvReaderPathSmokeTest: call the kotlinx-io Path overloads of readFromFile/readAllFromFile with default options, and read a multi-chunk file so the kotlinx-io chunk reader exits via its `index >= limit` branch.

Add follow-up coverage for the chunked reader fast path introduced in PR #177:

- doubled quote `""` straddling a chunk boundary, so the cross-buffer next-char lookahead has to find the second `"` at nextBuffer[0] for skipCount=1 to do the right thing
- explicit-escape `\\<target>` straddling a chunk boundary, same cross-buffer next-char path with a non-self escape char
- lone CR at a chunk end followed by a non-LF char in the next chunk, so the CR terminator must not consume the next field char as part of CRLF
- supplementary code point (U+1F600) at the parseRowsFromChunks layer where it just passes through as ordinary chars, and at the kotlinx-io Source layer with the 😀 high surrogate at index `buffer.size - 2` so the low surrogate must land on the reserved last slot; a regression in `limit = buffer.size - 1` would overflow

Also rewrite the existing chunked-path test comments to lead with why-the-test-exists instead of restating the parser branch.
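
The reserved-slot invariant behind the surrogate test can be shown in isolation. The sketch below is an assumption-laden illustration, not the kotlinx-io reader from this PR: `fillFromCodePoints` is a hypothetical helper that takes code points from a plain iterator and uses the JDK's `Character.toChars(codePoint, dst, dstIndex)`. It stops accepting new code points once `index` reaches `limit = buffer.size - 1`, so a two-char code point that starts at `buffer.size - 2` still fits.

```kotlin
/**
 * Writes code points into [buffer] as UTF-16 chars and returns the number of chars
 * written, or -1 if the iterator was already exhausted. (Hypothetical helper.)
 */
fun fillFromCodePoints(codePoints: Iterator<Int>, buffer: CharArray): Int {
    require(buffer.size >= 2) { "need room for one surrogate pair" }
    val limit = buffer.size - 1 // last slot is reserved for a trailing low surrogate
    var index = 0
    while (index < limit && codePoints.hasNext()) {
        // Character.toChars returns 1 for BMP code points and 2 for supplementary ones.
        index += Character.toChars(codePoints.next(), buffer, index)
    }
    return if (index == 0) -1 else index
}

fun main() {
    // 'a', 'b', U+1F600, 'c' into a 4-char buffer: the high surrogate lands at index 2
    // (buffer.size - 2) and the low surrogate on the reserved last slot, index 3.
    val input = listOf(0x61, 0x62, 0x1F600, 0x63).iterator()
    val buffer = CharArray(4)
    val first = fillFromCodePoints(input, buffer)
    println(first)                                 // prints 4
    println(Character.isHighSurrogate(buffer[2]))  // prints true
    println(Character.isLowSurrogate(buffer[3]))   // prints true
    println(fillFromCodePoints(input, buffer))     // prints 1 (the remaining 'c')
    println(fillFromCodePoints(input, buffer))     // prints -1
}
```

If the loop condition were `index < buffer.size` instead, the same input with the pair starting at the last slot would throw an index-out-of-bounds error from `Character.toChars`, which is the overflow the test guards against.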

jsoizo added 2 commits May 22, 2026 08:00

chore(benchmark): let -P CLI overrides win over bench.profile defaults

a492e7d


jsoizo mentioned this pull request May 22, 2026

Performance regression test #172

Closed

6 tasks

jsoizo merged commit 3a5dc3e into version_2_0_0 May 22, 2026
5 checks passed

jsoizo deleted the perf-reader-alloc branch May 22, 2026 13:29

jsoizo mentioned this pull request May 22, 2026

Add cross-chunk boundary regression tests for chunked reader #178

Merged

1 task

jsoizo merged 3 commits into version_2_0_0 from perf-reader-alloc
