Version 2.0 🎉 by jsoizo · Pull Request #180 · jsoizo/kotlin-csv

jsoizo · 2026-05-23T12:05:53Z

Closes #149

Summary

Introduces the 2.0 API under com.jsoizo.kotlincsv: CsvDialect, immutable reader/writer configs, sequence-first read/write, header helpers, and v2 exception types.
Adds kotlinx-io based common/JVM/JS/Native I/O, JS file I/O, Native targets, and wasmWasi main compilation support.
Removes the legacy 1.x API surface and refreshes README, the migration guide, CI, benchmarks, parity tests, and property tests for 2.0.0.

Verification

./gradlew checkKotlinAbi jvmTest jsNodeTest macosArm64Test iosSimulatorArm64Test compileKotlinIosArm64 compileTestKotlinIosArm64 compileKotlinLinuxArm64 compileTestKotlinLinuxArm64 compileKotlinWasmWasi koverXmlReport

Change namespace to com.jsoizo

version up KMP to 2.1

Bump up gradle and libraries

Change publishing plugin

Use version catalog file

…Exception, CsvFieldNumDifferentException)

Add com.jsoizo.kotlincsv.CsvDialect as a data class capturing the four CSV format fields (delimiter / quoteChar / escapeChar / lineTerminator) shared by reader and writer, plus RFC4180 and TSV presets. Construction is validated via require(): delimiter must differ from quoteChar and escapeChar, and lineTerminator must not be empty. Existing reader / writer code paths are unchanged -- wiring CsvDialect into them is the next phase.
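
The validation rules above can be illustrated with a minimal sketch. `CsvDialectSketch` is a hypothetical stand-in, not the library's `CsvDialect`: the four field names come from the commit message, while the default values and the shape of the presets are assumptions made for illustration.

```kotlin
// Hypothetical stand-in for the CsvDialect described above.
// Field names follow the commit message; defaults and presets are assumed.
data class CsvDialectSketch(
    val delimiter: Char = ',',
    val quoteChar: Char = '"',
    val escapeChar: Char = '"',
    val lineTerminator: String = "\r\n",
) {
    init {
        // Construction-time validation via require(), as the commit describes.
        require(delimiter != quoteChar) { "delimiter must differ from quoteChar" }
        require(delimiter != escapeChar) { "delimiter must differ from escapeChar" }
        require(lineTerminator.isNotEmpty()) { "lineTerminator must not be empty" }
    }

    companion object {
        // Preset values are assumptions; only the preset names come from the source.
        val RFC4180 = CsvDialectSketch()
        val TSV = CsvDialectSketch(delimiter = '\t')
    }
}

fun main() {
    println(CsvDialectSketch.RFC4180)
    // An invalid combination fails at construction with IllegalArgumentException.
    val failure = runCatching { CsvDialectSketch(delimiter = '"') }.exceptionOrNull()
    println(failure?.message)
}
```

Validating in `init` means an invalid dialect can never exist as a value, so reader and writer code that receives one does not need to re-check it.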

Add kotest-assertions-core (multiplatform) to commonTest dependencies and rewrite CsvDialectTest using kotest assertion style (shouldThrow, shouldNotBeNull, shouldContain). The kotest runner is intentionally not added; tests still run via kotlin.test's @Test annotation.

Align the existing exception tests with the new commonTest convention (kotlin.test runner + kotest-assertions-core assertions). Behaviour unchanged; only assertion call sites are migrated to shouldBe / shouldContain / shouldBeInstanceOf / shouldNotBeNull.

Implements Phase 3 of the v2 migration: a Sequence<Char> -> Sequence<List<String>> core reader living under com.jsoizo.kotlincsv.reader, with the immutable CsvReaderConfig (data class) + CsvReaderConfigBuilder (DSL receiver) split, top-level csvReader { } / csvReader(config) entry points, and the Sequence<List<String>>.withHeader() extension that yields LinkedHashMap rows. The legacy ParseStateMachine is reused unchanged except for an internal isLineComplete() observer added so the new SequenceParser can detect row boundaries while driving the state machine character-by-character. Legacy util.CSVParseFormatException raised by ParseStateMachine is converted into the new exceptions.CsvParseFormatException at the parser boundary. The old client / dsl / util packages are left untouched and continue to work alongside the new API; they will be removed in a later phase.
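
As a rough illustration of the header helper described above, the sketch below maps a `Sequence<List<String>>` to insertion-ordered maps keyed by the first row. `withHeaderSketch` is a hypothetical name, and its handling of mismatched row sizes (a plain `require`) is an assumption; the library's own `withHeader()` may behave differently.

```kotlin
// Hypothetical sketch of a header helper; not the library's withHeader().
fun Sequence<List<String>>.withHeaderSketch(): Sequence<Map<String, String>> = sequence {
    val rows = this@withHeaderSketch.iterator()
    if (!rows.hasNext()) return@sequence      // empty input yields no rows
    val header = rows.next()                  // first row supplies the keys
    while (rows.hasNext()) {
        val row = rows.next()
        // Assumption: a size mismatch is rejected outright.
        require(row.size == header.size) {
            "expected ${header.size} fields but got ${row.size}"
        }
        // LinkedHashMap keeps the columns in header order.
        val record = LinkedHashMap<String, String>(header.size)
        for (i in header.indices) record[header[i]] = row[i]
        yield(record)
    }
}

fun main() {
    val rows = sequenceOf(
        listOf("id", "name"),
        listOf("1", "Ann"),
        listOf("2", "Bob"),
    )
    rows.withHeaderSketch().forEach(::println)
}
```

Because the helper is itself a lazy sequence, the header is only read when the first record is requested.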

Add Kotlin/Native targets

- Declare wasmWasi(nodejs) in build.gradle.kts so commonMain (kotlinx-io based reader/writer) compiles for WASI runtimes.
- Disable compileTestKotlinWasmWasi / wasmWasiTest / wasmWasiNodeTest until Kotest publishes wasmWasi artifacts; commonTest cannot be compiled for this target otherwise.
- Add compileKotlinWasmWasi to the Linux CI job to guard against regressions.

Add wasmWasi target

…ialects

For dialects where `escapeChar != quoteChar` (e.g. `CsvDialect(escapeChar = '\\')`), the reader previously treated the escape character literally in the START and DELIMITER states, and accepted only `escapeChar` (not `quoteChar`) as the escaped character in the FIELD state. This rejected CSV produced by other libraries (Python `csv`, Apache Commons CSV, OpenCSV) that emit unquoted escape sequences such as `a\\b` or `a\"b`.

Introduce a `handleUnquotedEscape` helper and route START / DELIMITER / FIELD through it so all three states share the same `{escapeChar, quoteChar}` acceptance set. When `escapeChar == quoteChar` (RFC 4180 default) the helper degenerates to a single-element accept set, preserving the existing strict behaviour (e.g. `a"b` under the default dialect still throws).

Closes #168
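
The shared acceptance set can be sketched on a single unquoted field. `decodeUnquotedField` is a hypothetical helper written for illustration, not the library's `handleUnquotedEscape`; in particular, rejecting a bare quote character inside an unquoted field is an assumption of this sketch.

```kotlin
// Hypothetical sketch: decode one unquoted field with the {escapeChar, quoteChar} accept set.
fun decodeUnquotedField(raw: String, escapeChar: Char, quoteChar: Char): String {
    // When escapeChar == quoteChar this set collapses to a single element.
    val accepted = setOf(escapeChar, quoteChar)
    val out = StringBuilder(raw.length)
    var i = 0
    while (i < raw.length) {
        val c = raw[i]
        if (c == escapeChar) {
            // The char after an escape must be one of the accepted targets.
            val next = raw.getOrNull(i + 1)
            require(next != null && next in accepted) { "invalid escape at index $i" }
            out.append(next)
            i += 2
        } else {
            // Assumption of this sketch: a bare quote in an unquoted field is an error.
            require(c != quoteChar) { "bare quote at index $i" }
            out.append(c)
            i++
        }
    }
    return out.toString()
}

fun main() {
    // Backslash dialect: both escaped targets decode.
    println(decodeUnquotedField("""a\\b""", '\\', '"'))
    println(decodeUnquotedField("""a\"b""", '\\', '"'))
    // Default dialect: a lone quote followed by an ordinary char is rejected.
    println(runCatching { decodeUnquotedField("a\"b", '"', '"') }.isFailure)
}
```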

Add a property-based test that generates arbitrary unquoted-safe fields under `CsvDialect(escapeChar = '\\')`, serialises them with the minimal Python-csv style backslash escaping (`\` -> `\\`, `"` -> `\"`), and asserts the reader round-trips them. 500 iterations, seed 0L, follows the project PBT conventions (checkAll + runTest, generators reused from PbtArbs).

…cape feat: parse escape sequences in unquoted fields (closes #168)

Scaffolds four JVM-only subprojects under benchmark/ to visualize the performance characteristics of v1.10.0 (Maven artifact, com.github.doyaaaaaken.*) against v2.0.0 (this branch, com.jsoizo.*) on identical workloads.

Subprojects:
- benchmark/shared: deterministic data generation (CsvDataGen / DatasetSpec / DataStats) and environment probe. Depends only on kotlin-stdlib so it stays out of every benchmark classpath as a library.
- benchmark/v1: JMH source set whose only kotlin-csv on classpath is com.jsoizo:kotlin-csv-jvm:1.10.0. Covers readAll(String/InputStream/File), iterative Sequence over File, readAllWithHeader, writeAll(OutputStream/File).
- benchmark/v2: JMH source set whose only kotlin-csv on classpath is the current project. Mirrors the v1 workloads on the v2 API and adds V2BackendBenchmarks comparing java.io vs kotlinx-io paths.
- benchmark/parity: JUnit subproject that intentionally puts both v1 and v2 on the test classpath (FQCNs do not collide) and asserts row-by-row equality on the HARD dataset for readAll/readAllWithHeader/writeAll.

Classpath isolation is achieved by separating v1 and v2 into different Gradle resolution scopes; this stops Gradle from collapsing kotlinx-coroutines (and other transitive deps) to a single version across the two artifacts. Resolved jmhRuntimeClasspath was verified to contain v1 only on the v1 side and the v2 project only on the v2 side.

JMH defaults: warmup=5, iterations=5, fork=2, modes=throughput+avgt, jvmArgs=[-Xms2g,-Xmx2g], JDK 21 toolchain. -Pbench.profile=large|gcprof|stackprof overrides the defaults for the long-running LARGE dataset and profiler runs. -Pjmh.include / -Pjmh.warmupIterations / -Pjmh.iterations / -Pjmh.fork allow per-invocation overrides for smoke runs.

Sets warmup=3, iter=3, fork=1, time=5s and restricts dataset @Param to SMALL and HARD via JMH '-p dataset' equivalent. Intended for the first issue #172 comment so readers see numbers before the full primary run finishes.

The MapProperty<String, ListProperty<String>> setter does not accept a plain List<String>. Wrap the value in objects.listProperty(...).set(...).

Restricts dataset @Param to SMALL/MEDIUM/HARD; the LARGE dataset is covered by the separate 'large' profile per the methodology in #172.

add v2 benchmark

Add direct writer fast path

Apply jmh.warmupIterations/iterations/fork/timeOnIteration/warmup property overrides after the bench.profile when block so short-form CLI flags can override profile defaults during ad-hoc gcprof/stackprof runs.

The v2 reader I/O paths routed every char through a coroutine sequence builder (`BufferedReader.toCharSequence` and `Source.toCharSequence`), which allocated a Continuation per character and produced the avgt × thrpt ≈ 3 long-tail divergence flagged in issue #172's primary profile.

Add an eager chunked parser that fills a CharArray buffer and walks it directly, using double buffering to carry the next-char lookahead across chunk boundaries. The public lazy `Sequence<List<String>>` API is unchanged; only the I/O wrappers and an internal pipeline helper are rewired.

- `ParseStateMachine.reset()` lets `SequenceParser` reuse one machine instance across rows (no per-row alloc of machine / StringBuilder / fields ArrayList).
- `parseRowsFromChunks((CharArray) -> Int, dialect, stripBom)` is the new internal entry point; per-char `sequence { yield(c) }` is gone.
- `CsvReader.applyPipeline` exposes the skipEmptyLine + field-count policy stages for I/O wrappers to drive directly.
- JVM `read(InputStream)` / `readFromFile(File)` wrap `BufferedReader.read(CharArray)`; kotlinx-io `read(Source)` writes decoded code points straight into the chunk buffer.
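
The double-buffered lookahead can be sketched roughly as follows. `ChunkedChars`, `splitLines`, and `stringSource` are hypothetical names for illustration and are not the library's `SequenceParser`. The source function is assumed to return the number of chars written, or -1 at end of input, mirroring the `(CharArray) -> Int` shape mentioned above. The sketch only splits lines and does no CSV quoting.

```kotlin
// Hypothetical sketch of chunked reading with a one-char lookahead across chunk boundaries.
class ChunkedChars(bufferSize: Int, private val fill: (CharArray) -> Int) {
    init { require(bufferSize >= 2) { "bufferSize must be at least 2" } }

    private var current = CharArray(bufferSize)
    private var next = CharArray(bufferSize)
    private var limit = readChunk(current) // valid chars in current; 0 means end of input
    private var nextLimit = -1             // -1 means the next buffer is not loaded yet
    private var index = 0

    private fun readChunk(buffer: CharArray): Int = fill(buffer).coerceAtLeast(0)

    private fun ensureNextLoaded() {
        if (nextLimit < 0) nextLimit = readChunk(next)
    }

    /** Returns the next char code, or -1 at end of input. */
    fun read(): Int {
        if (index >= limit) {
            if (limit == 0) return -1
            ensureNextLoaded()
            // Swap buffers instead of allocating a new chunk.
            val tmp = current; current = next; next = tmp
            limit = nextLimit; nextLimit = -1; index = 0
            if (limit == 0) return -1
        }
        return current[index++].code
    }

    /** Looks at the next char without consuming it, loading the other buffer if needed. */
    fun peek(): Int {
        if (index < limit) return current[index].code
        if (limit == 0) return -1
        ensureNextLoaded()
        return if (nextLimit > 0) next[0].code else -1
    }
}

// Splits on LF, CR, or CRLF; the CRLF check relies on peek() across chunk boundaries.
fun splitLines(chars: ChunkedChars): List<String> {
    val lines = mutableListOf<String>()
    val sb = StringBuilder()
    var c = chars.read()
    while (c != -1) {
        when (c) {
            '\r'.code -> {
                if (chars.peek() == '\n'.code) chars.read() // consume the LF of a CRLF pair
                lines += sb.toString(); sb.clear()
            }
            '\n'.code -> { lines += sb.toString(); sb.clear() }
            else -> sb.append(c.toChar())
        }
        c = chars.read()
    }
    if (sb.isNotEmpty()) lines += sb.toString()
    return lines
}

// Test-style source that hands out a string in buffer-sized pieces.
fun stringSource(text: String): (CharArray) -> Int {
    var pos = 0
    return { buffer ->
        if (pos >= text.length) -1 else {
            val n = minOf(buffer.size, text.length - pos)
            for (i in 0 until n) buffer[i] = text[pos + i]
            pos += n
            n
        }
    }
}

fun main() {
    // With bufferSize = 4 the first CR is the last char of a chunk and its LF starts the next.
    println(splitLines(ChunkedChars(4, stringSource("a,b\r\nc,d\re\n"))))
}
```

Keeping two fixed buffers and swapping them avoids a per-character allocation while still letting the parser ask "what comes next?" at the very end of a chunk.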

Add coverage for the new chunked reader fast path so the PR diff is no longer below the codecov threshold:
- SequenceParserTest: drive `parseRowsFromChunks` directly with a custom `(CharArray) -> Int` source, including small-buffer chunk boundary swaps, CR/LF on a chunk boundary, the `require(bufferSize >= 2)` guard, BOM strip on/off (default), and an unterminated quote whose tail-flush takes the null-result branch.
- CsvReaderJvmIoTest: exercise the I/O pipeline with skipEmptyLine and with an input larger than the default 8 KB chunk so the double buffer swap runs end-to-end.
- CsvReaderPathSmokeTest: call the kotlinx-io Path overloads of readFromFile/readAllFromFile with default options, and read a multi-chunk file so the kotlinx-io chunk reader exits via its `index >= limit` branch.

Add chunked reader fast path

Add follow-up coverage for the chunked reader fast path introduced in PR #177:
- doubled quote `""` straddling a chunk boundary, so the cross-buffer next-char lookahead has to find the second `"` at nextBuffer[0] for skipCount=1 to do the right thing
- explicit-escape `\\<target>` straddling a chunk boundary, same cross-buffer next-char path with a non-self escape char
- lone CR at a chunk end followed by a non-LF char in the next chunk, so the CR terminator must not consume the next field char as part of CRLF
- supplementary code point (U+1F600) at the parseRowsFromChunks layer where it just passes through as ordinary chars, and at the kotlinx-io Source layer with the 😀 high surrogate at index `buffer.size - 2` so the low surrogate must land on the reserved last slot; a regression in `limit = buffer.size - 1` would overflow

Also rewrite the existing chunked-path test comments to lead with why-the-test-exists instead of restating the parser branch.
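
The reserved-slot rule in the last case can be sketched as below. `fillChunk` is a hypothetical function and the code-point iterator stands in for a decoding source; the real kotlinx-io reader is not reproduced here. The point is only that stopping at `buffer.size - 1` leaves room for a low surrogate whenever a high surrogate lands on the last regular slot.

```kotlin
// Hypothetical sketch: write decoded code points into a chunk buffer,
// reserving the last slot so a surrogate pair never overflows.
fun fillChunk(codePoints: Iterator<Int>, buffer: CharArray): Int {
    require(buffer.size >= 2) { "buffer must hold at least 2 chars" }
    val limit = buffer.size - 1 // last slot is reserved for a trailing low surrogate
    var index = 0
    while (index < limit && codePoints.hasNext()) {
        val cp = codePoints.next()
        if (cp < 0x10000) {
            buffer[index++] = cp.toChar()
        } else {
            // Supplementary code point: encode as a UTF-16 surrogate pair.
            val offset = cp - 0x10000
            buffer[index++] = (0xD800 + (offset shr 10)).toChar()
            buffer[index++] = (0xDC00 + (offset and 0x3FF)).toChar()
        }
    }
    return if (index == 0) -1 else index // -1 signals end of input
}

fun main() {
    val source = listOf('a'.code, 'b'.code, 0x1F600, 'c'.code).iterator()
    val buffer = CharArray(4)
    // The high surrogate lands at index buffer.size - 2, the low one in the reserved slot.
    val n = fillChunk(source, buffer)
    println(buffer.concatToString(0, n))
}
```

If the loop instead ran while `index < buffer.size`, a supplementary code point starting at the final index would write one char past the end of the array.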

Add cross-chunk boundary regression tests for chunked reader

fix some logics & docs

codecov · 2026-05-23T12:13:59Z

Codecov Report

❌ Patch coverage is 92.94118% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.37%. Comparing base (19c2883) to head (ecaf608).

Files with missing lines	Patch %	Lines
...ain/kotlin/com/jsoizo/kotlincsv/writer/WriterIo.kt	83.33%	4 Missing and 2 partials ⚠️
...soizo/kotlincsv/writer/internal/SequenceEncoder.kt	92.94%	2 Missing and 4 partials ⚠️
...mmonMain/kotlin/com/jsoizo/kotlincsv/CsvDialect.kt	82.75%	3 Missing and 2 partials ⚠️
...jsoizo/kotlincsv/reader/internal/SequenceParser.kt	94.73%	0 Missing and 4 partials ⚠️
.../jsoizo/kotlincsv/reader/CsvReaderConfigBuilder.kt	83.33%	2 Missing ⚠️
...ain/kotlin/com/jsoizo/kotlincsv/reader/ReaderIo.kt	92.00%	2 Missing ⚠️
.../kotlin/com/jsoizo/kotlincsv/writer/WriterIoJvm.kt	88.23%	2 Missing ⚠️
...in/kotlin/com/jsoizo/kotlincsv/reader/CsvReader.kt	96.00%	0 Missing and 1 partial ⚠️
...izo/kotlincsv/reader/internal/ParseStateMachine.kt	95.83%	1 Missing ⚠️
.../kotlin/com/jsoizo/kotlincsv/reader/ReaderIoJvm.kt	94.44%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main     #180      +/-   ##
============================================
+ Coverage     86.66%   93.37%   +6.70%     
============================================
  Files            21       22       +1     
  Lines          1282      528     -754     
  Branches        192      122      -70     
============================================
- Hits           1111      493     -618     
+ Misses           32       17      -15     
+ Partials        139       18     -121

jsoizo added 30 commits August 12, 2024 22:39

rename namespace to com.jsoizo

7198928

Merge pull request #150 from jsoizo/change_namespace

823c4d0

Change namespace to com.jsoizo

KMP 2.1

3ce34a5

remove Gradle java plugin & jacoco

0c842e4

Merge pull request #153 from jsoizo/kmp_2

6512bb7

version up KMP to 2.1

update Gradle version to 8.8

e7efdca

bump plugins & libraries

654f3cb

package dokka HTML to Javadoc Jar

421a87c

refactor deprecated syntax in build.gradle.kts

98e2904

Merge pull request #154 from jsoizo/bump_up_gradle_and_libraries

09fcaa2

Bump up gradle and libraries

[skip ci] change publishing plugin

6e78b2f

Merge pull request #155 from jsoizo/change_publishing_plugin

5793042

Change publishing plugin

[skip ci] use Version catalog

eaf318b

bundle kotest libraries

d1f52a0

bundle kotest libraries

748198e

change CI setup-java version

a2a1c42

change CI setup-java step name

c42d82e

Merge pull request #156 from jsoizo/versions_toml

45dc338

Use version catalog file

chore: add kotlin.test alongside kotest

a29fc34

feat: add v2 exception classes (MalformedCsvException, CsvParseFormat…

87f760d

…Exception, CsvFieldNumDifferentException)

refactor: inline Const constants and remove Const.kt

f7117fa

refactor: drop Logger injection mechanism

e2dbe2e

refactor: drop autoRenameDuplicateHeaders option

c4180ed

refactor: drop deprecated skipMissMatchedRow option

1043d69

refactor: remove deprecated readNext() method

c38a148

refactor: drop @CsvDslMarker annotation

50a717a

jsoizo added 27 commits May 13, 2026 15:16

Split CI by native host

7d720c5

Merge pull request #170 from jsoizo/modernize-gradle-tooling

3fc8633

Add Kotlin/Native targets

Merge pull request #173 from jsoizo/support_wasm_wasi

3fb0236

Add wasmWasi target

Merge pull request #174 from jsoizo/feat/issue-168-reader-explicit-es…

39cf6c6

…cape feat: parse escape sequences in unquoted fields (closes #168)

Add 'quick' JMH bench profile for short-duration first-cut runs

6933d2f

Sets warmup=3, iter=3, fork=1, time=5s and restricts dataset @Param to SMALL and HARD via JMH '-p dataset' equivalent. Intended for the first issue #172 comment so readers see numbers before the full primary run finishes.

Fix benchmarkParameters DSL: wrap in ListProperty for jmh plugin 0.7.2

35bdc8c

The MapProperty<String, ListProperty<String>> setter does not accept a plain List<String>. Wrap the value in objects.listProperty(...).set(...).

Add 'primary' JMH bench profile (warmup=5, iter=5, fork=2, 10s)

7fe79b4

Restricts dataset @Param to SMALL/MEDIUM/HARD; the LARGE dataset is covered by the separate 'large' profile per the methodology in #172.

Merge branch 'version_2_0_0' into version_2_0_0_benchmark

6f3a2cf

Merge pull request #175 from jsoizo/version_2_0_0_benchmark

5a48524

add v2 benchmark

perf: add direct writer fast path

3decc39

Merge pull request #176 from jsoizo/perf-writer-fast-path

96c76c7

Add direct writer fast path

chore(benchmark): let -P CLI overrides win over bench.profile defaults

a492e7d

Apply jmh.warmupIterations/iterations/fork/timeOnIteration/warmup property overrides after the bench.profile when block so short-form CLI flags can override profile defaults during ad-hoc gcprof/stackprof runs.

Merge pull request #177 from jsoizo/perf-reader-alloc

3a5dc3e

Add chunked reader fast path

Merge pull request #178 from jsoizo/reader-chunked-followups

9c83cf8

Add cross-chunk boundary regression tests for chunked reader

add .env to gitignore

e4253d9

fix: tighten v2 parsing and release docs

12f3115

fix: clarify CSV row and dialect semantics

609aff3

chore: tidy closing audit follow-ups

d9f642b

remove temporary files

9aba37b

Merge pull request #179 from jsoizo/v2-closing-audit-p0-p1

ecaf608

fix some logics & docs

jsoizo merged commit d1c9c81 into main May 23, 2026
5 checks passed

jsoizo deleted the version_2_0_0 branch May 23, 2026 12:19

jsoizo merged 120 commits into main from version_2_0_0
