14 changes: 10 additions & 4 deletions CHANGELOG.md
@@ -1,11 +1,9 @@

# SmarterJSON Change Log

> 🚧 Getting ready for the 1.0.0 release - sorry for the interface changes - thank you for your patience! 🚧

> ⚠️ **New Interface (since 0.9.7):**
>
> SmarterJSON **always return an `Array`** of documents:
> SmarterJSON **always returns an `Array`** of documents.
>
> `SmarterJSON.process` / `SmarterJSON.process_file` return:
>
@@ -16,9 +14,17 @@
> ⚠️ We discourage the use of `process(input).first` / `process(input)[0]` because it silently drops potential additional documents.
> Please use `process_one` if you are expecting only one JSON doc, e.g. in API payloads.

## 0.9.10 (unreleased)
## 1.0.0 (2026-06-08)

RSpec tests: 1,034

- **The public interface is now stable** — `process`, `process_one`, `process_file`, `generate`, and the documented options; semantic versioning from here on.
- Unknown or wrongly-typed options now raise `ArgumentError` instead of being silently ignored, so a typo (e.g. `symbolize_names:` instead of `symbolize_keys:`) is caught immediately.
- Input tagged `ASCII-8BIT` whose bytes are valid UTF-8 (e.g. a `Net::HTTP` `response.body`) is now read as UTF-8, so its string values compare equal to UTF-8 literals; ASCII-8BIT input that is not valid UTF-8 raises `SmarterJSON::EncodingError` (pass an explicit `encoding:` for legacy encodings).
- Object keys may now use smart/curly quotes too (e.g. JSON pasted from a word processor), not just string values.
- `SmarterJSON.generate` accepts `allow_nan: true` to emit `NaN` / `Infinity` / `-Infinity` (JSON5-style) instead of raising, so non-finite numbers round-trip; the default still raises.
- A numeric literal that overflows `Float` range (e.g. `1e400`) now reports a `:number_overflow` warning via `on_warning` instead of silently becoming `Infinity`.
- `SmarterJSON.generate` is now iterative (like the parser), so serializing a deeply nested structure no longer risks `SystemStackError` — reading and writing are both depth-safe.
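
The ASCII-8BIT rule in the list above can be sketched with core `String` methods alone. This is a minimal illustration, not SmarterJSON's actual code: `relabel_binary_as_utf8` and `BinaryInputError` are hypothetical names chosen for the example (the changelog says the library itself raises `SmarterJSON::EncodingError`).

```ruby
# Hypothetical error class for this sketch only.
class BinaryInputError < StandardError; end

# Relabel binary-tagged input as UTF-8 when its bytes are valid UTF-8;
# otherwise raise. Input with any other encoding tag is returned untouched.
def relabel_binary_as_utf8(input)
  return input unless input.encoding == Encoding::ASCII_8BIT

  # dup first, so the caller's string keeps its original encoding tag
  candidate = input.dup.force_encoding(Encoding::UTF_8)
  unless candidate.valid_encoding?
    raise BinaryInputError, "input is not valid UTF-8; pass an explicit encoding"
  end

  candidate
end

body = "caf\u00E9".b                          # same bytes, tagged ASCII-8BIT
body == "caf\u00E9"                           # => false (encoding tags differ)
relabel_binary_as_utf8(body) == "caf\u00E9"   # => true
```

No bytes are transcoded here; only the encoding label changes, which is why the relabelled string compares equal to a UTF-8 literal.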

## 0.9.9 (2026-06-07)
- Much faster pure-Ruby parsing (the path used without the C extension) — roughly 3× on string-heavy data, ~2× on number-heavy, ~1.7× on object-heavy (on a YJIT-enabled Ruby). Parsed values are unchanged.
8 changes: 4 additions & 4 deletions README.md
@@ -62,7 +62,7 @@ Three things set it apart:
- Trailing commas; unquoted keys (`{host: localhost}`); single-quoted, triple-quoted (`'''…'''`), and quoteless string values
- Implicit root object — a config file that starts with `key: value`, no outer `{}`
- `NaN`, `Infinity`, hex (`0xFF`), leading `+` / `.`, underscores in numbers (`1_000_000`)
- UTF-8 BOM, smart/curly quotes, Python literals (`True` / `False` / `None`), JavaScript `undefined`
- UTF-8 BOM, smart/curly quotes (in keys and values), Python literals (`True` / `False` / `None`), JavaScript `undefined`
- Mixed CR / LF / CRLF line endings, and any Ruby-supported input encoding (via `encoding:`)
- Duplicate keys (last value wins by default; configurable)

@@ -176,7 +176,7 @@ Where a like-for-like comparison exists, here is SmarterJSON's C path against ea
| config.jsonc | **1.1× faster** | 1.2× slower | **3.6× faster** |
| deeply_nested | **1.2× faster** | **can't parse** <sup>‡</sup> | **4.1× faster** |
| github_events | ≈ tied | 1.1× slower | **2.7× faster** |
| string_array | **1.1× faster** | ≈ tied | **1.7× faster** |
| string_array | ≈ tied | ≈ tied | **1.6× faster** |
| twitter | **1.3× faster** | 1.2× slower | **3.2× faster** |
| usgs_earthquakes <sup>≠</sup> | **1.4× faster** | 1.1× slower | **3.4× faster** |
| weather_berlin | **1.8× faster** | **1.1× faster** | **3.2× faster** |
@@ -201,7 +201,7 @@ In short: **SmarterJSON's C path matches or beats Oj/strict on every file** (app
| `decimal_precision` | `:auto` | `:auto` keeps high-precision decimals as `BigDecimal`; `:float` forces `Float`; `:bigdecimal` forces `BigDecimal` |
| `acceleration` | `true` | `true` uses the C extension when compiled and loadable; `false` forces pure Ruby (identical results) |
| `encoding` | `nil` | labels the input's encoding; `nil` keeps the input's own (no transcoding pass; see below) |
| `on_warning` | `nil` | a callable invoked once per lenient fix applied (`:empty_slot`, `:empty_value`, `:duplicate_key`), passed a `SmarterJSON::Warning`; the return value is never changed. See below. |
| `on_warning` | `nil` | a callable invoked once per lenient fix applied (`:empty_slot`, `:empty_value`, `:duplicate_key`, `:number_overflow`), passed a `SmarterJSON::Warning`; the return value is never changed. See below. |

## Examples

@@ -299,7 +299,7 @@ TEXT

## Nesting & untrusted input

Both the C extension and the pure-Ruby engine are **iterative, not recursive** — they track nesting on an explicit, heap-allocated stack rather than the call stack. So deeply nested input **cannot overflow the call stack or segfault**: nesting is bounded only by available memory, the same posture as Oj (which also ships no nesting limit; the stdlib `json` caps at 100). The `deeply_nested.json` benchmark (212 MB of nesting) is handled without issue.
Both the C extension and the pure-Ruby engine are **iterative, not recursive** — they track nesting on an explicit, heap-allocated stack rather than the call stack. So deeply nested input **cannot overflow the call stack or segfault**: nesting is bounded only by available memory, the same posture as Oj (which also ships no nesting limit; the stdlib `json` caps at 100). The `deeply_nested.json` benchmark (212 MB of nesting) is handled without issue. **`generate` is iterative too**, so serializing a deeply nested Ruby structure can't overflow the stack either — reading *and* writing are both depth-safe.
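
To make the "explicit stack instead of the call stack" idea concrete, here is a minimal sketch of an iterative serializer. It is not SmarterJSON's implementation: `generate_iteratively` and `RawFragment` are hypothetical names, only a handful of types are handled, and the stdlib `json` library is used solely to escape scalars and keys.

```ruby
require "json" # stdlib, used here only to escape scalars and keys

# Wrapper for literal output fragments, so they are not mistaken for String values.
RawFragment = Struct.new(:text)

# Hypothetical sketch: serialize nested Arrays/Hashes with an explicit work stack
# instead of recursion, so depth is bounded by memory rather than the call stack.
def generate_iteratively(root)
  out = +""
  stack = [root]
  until stack.empty?
    node = stack.pop
    case node
    when RawFragment
      out << node.text
    when Array
      # Push in reverse so fragments pop in document order: "[", items, "]".
      stack << RawFragment.new("]")
      last = node.size - 1
      node.reverse_each.with_index do |child, i|
        stack << child
        stack << RawFragment.new(",") unless i == last
      end
      stack << RawFragment.new("[")
    when Hash
      stack << RawFragment.new("}")
      last = node.size - 1
      node.to_a.reverse_each.with_index do |(key, value), i|
        stack << value
        stack << RawFragment.new("#{key.to_s.to_json}:")
        stack << RawFragment.new(",") unless i == last
      end
      stack << RawFragment.new("{")
    when String, Integer, true, false, nil
      out << node.to_json
    else
      raise ArgumentError, "unsupported type: #{node.class}"
    end
  end
  out
end

generate_iteratively({ "a" => [1, "x", nil], "b" => {} })
# => "{\"a\":[1,\"x\",null],\"b\":{}}"
```

Because every pending piece of work lives in the heap-allocated `stack` array, a structure nested 100,000 levels deep costs memory but never adds Ruby call frames.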

The trade-off: there is currently **no fixed nesting or input-size limit**, so extremely large or adversarially-nested untrusted input is bounded by memory (it can exhaust RAM), not by a crash. If you process untrusted input and want a hard cap, that's a planned opt-in guard — for now, size-limit upstream.

2 changes: 1 addition & 1 deletion docs/basic_write_api.md
@@ -58,7 +58,7 @@ SmarterJSON.generate(Float::INFINITY) # raises SmarterJSON::GenerateError —
SmarterJSON.generate(Float::NAN) # raises SmarterJSON::GenerateError — non-finite Float
```

(`GenerateError` is a kind of `SmarterJSON::Error`, so `rescue SmarterJSON::Error` catches it. `Infinity` and `NaN` are accepted on the *read* side as a leniency, but they are not valid JSON to *write*.)
(`GenerateError` is a kind of `SmarterJSON::Error`, so `rescue SmarterJSON::Error` catches it. `Infinity` and `NaN` are accepted on the *read* side as a leniency; to *write* them, pass `allow_nan: true` and they're emitted as `NaN` / `Infinity` / `-Infinity` (JSON5-style, so SmarterJSON reads them back) — otherwise non-finite values raise, since they aren't valid strict JSON.)
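
The strict-by-default behaviour described above can be illustrated with a small stand-alone sketch. `emit_float` and `NonFiniteError` are hypothetical names for this example and are not part of SmarterJSON; the sketch only mirrors the documented rule (raise unless `allow_nan` is set, otherwise emit the JSON5-style barewords).

```ruby
# Hypothetical error class for this sketch only.
class NonFiniteError < StandardError; end

# Return the text to write for a Float. Finite values use Float#to_s;
# non-finite values raise unless allow_nan is true.
def emit_float(value, allow_nan: false)
  return value.to_s if value.finite?
  raise NonFiniteError, "non-finite Float" unless allow_nan
  return "NaN" if value.nan?

  value.positive? ? "Infinity" : "-Infinity"
end

emit_float(1.5)                               # => "1.5"
emit_float(Float::NAN, allow_nan: true)       # => "NaN"
emit_float(-Float::INFINITY, allow_nan: true) # => "-Infinity"
```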

By default `generate` is strict: it only writes the types above and raises on anything else. To serialize `Time`, `Date`, or your own objects, pass `coerce: true` — an unsupported value is then converted by its own `as_json` (whose result is re-emitted, so escaping/`indent`/`sort_keys` still apply) or, failing that, `to_json` (spliced verbatim):

9 changes: 5 additions & 4 deletions docs/options.md
@@ -43,7 +43,7 @@ warns.map(&:type) # => [:empty_slot]
warns.first.to_s # => "extra comma, collapsed an empty slot at line 1, col 4"
```

The warning types are `:empty_slot` (a collapsed empty comma slot, e.g. `[1,,2]`), `:empty_value` (a key with no value, read as `null`, e.g. `{a:}`), and `:duplicate_key` (a repeated key that was dropped), plus wrapper-recovery warnings such as `:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, and `:wrapper_tag_stripped`. Clean input never invokes the handler. Warnings work on both the C and pure-Ruby paths, so `acceleration:` doesn't change them.
The warning types are `:empty_slot` (a collapsed empty comma slot, e.g. `[1,,2]`), `:empty_value` (a key with no value, read as `null`, e.g. `{a:}`), `:duplicate_key` (a repeated key that was dropped), and `:number_overflow` (a numeric literal too large for `Float`, e.g. `1e400`, collapsed to `Infinity`), plus wrapper-recovery warnings such as `:code_fence_stripped`, `:prefix_text_ignored`, `:suffix_text_ignored`, and `:wrapper_tag_stripped`. Clean input never invokes the handler. Warnings work on both the C and pure-Ruby paths, so `acceleration:` doesn't change them.
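
As a rough illustration of the `:number_overflow` case, the sketch below converts a numeric token and reports to a handler when the result collapses to infinity. It assumes nothing about SmarterJSON's internals: `parse_float_with_warning` and `OverflowWarning` are hypothetical stand-ins (the library passes a `SmarterJSON::Warning`), and `Kernel#Float` does the conversion.

```ruby
# Hypothetical stand-in for the library's warning object.
OverflowWarning = Struct.new(:type, :message)

# Convert a numeric token to Float. When a handler is listening and the literal
# overflowed to +/-Infinity, report it; the value is still returned.
def parse_float_with_warning(token, on_warning: nil)
  value = Float(token)
  if on_warning && value.infinite?
    on_warning.call(OverflowWarning.new(:number_overflow, "number literal out of Float range: #{token}"))
  end
  value
end

warns = []
parse_float_with_warning("1e400", on_warning: ->(w) { warns << w }) # => Infinity
warns.map(&:type)                                                   # => [:number_overflow]
```

Checking for a handler before testing `infinite?` mirrors the gating described in the C change in this PR: with no listener there is nothing to report, so the check is skipped.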

### A note on `:encoding`

@@ -59,12 +59,13 @@ These options are passed to [`SmarterJSON.generate`](./basic_write_api.md) as th

| Option | Default | Explanation |
|------------|---------|-----------------------------------------------------------------------------------------------------------------------------|
| `:allow_nan` | `false` | When `true`, non-finite `Float`/`BigDecimal` values emit the JSON5 barewords `NaN` / `Infinity` / `-Infinity` (which SmarterJSON reads back, so they round-trip). When `false` (the default), a non-finite number raises `SmarterJSON::GenerateError` — they aren't valid strict JSON. |
| `:ascii_only` | `false` | Escape every non-ASCII character as `\uXXXX` (astral characters as a UTF-16 surrogate pair). The default emits raw UTF-8. |
| `:coerce` | `false` | When `true`, a value that isn't natively supported is converted by its own `as_json` (the result is re-emitted, so the other options still apply) or, failing that, `to_json` (spliced verbatim). When `false` (the default), such a value raises `SmarterJSON::GenerateError`. |
| `:format` | `:json` | `:json` writes standard JSON (Hash → object, Array → array, scalar → scalar). `:ndjson` writes newline-delimited JSON: an Array becomes one element per line, any other value becomes a single line. |
| `:indent` | `0` | Spaces per nesting level for pretty-printing. `0` (the default) is compact output. Empty objects/arrays stay inline. Not allowed with `:ndjson` (a record must be a single line). |
| `:sort_keys` | `false` | Emit object keys in sorted order (Symbol keys sorted by their string form). Useful for canonical, diff-friendly output. |
| `:ascii_only` | `false` | Escape every non-ASCII character as `\uXXXX` (astral characters as a UTF-16 surrogate pair). The default emits raw UTF-8. |
| `:script_safe` | `false` | Escape the `/` in `</` and the JS line separators U+2028 / U+2029, so output is safe to embed in an HTML `<script>` tag. |
| `:coerce` | `false` | When `true`, a value that isn't natively supported is converted by its own `as_json` (the result is re-emitted, so the other options still apply) or, failing that, `to_json` (spliced verbatim). When `false` (the default), such a value raises `SmarterJSON::GenerateError`. |
| `:sort_keys` | `false` | Emit object keys in sorted order (Symbol keys sorted by their string form). Useful for canonical, diff-friendly output. |
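
The effect of `:script_safe` in the table above can be approximated as a post-processing pass over already-generated JSON. This is only a sketch of the escaping rule as documented, not how SmarterJSON applies it; `escape_for_script` and the two constants are hypothetical names, and the stdlib `json` library produces the input.

```ruby
require "json" # stdlib, used to produce and re-read the sample JSON

SCRIPT_UNSAFE = /<\/|\u2028|\u2029/
SCRIPT_ESCAPES = { "</" => "<\\/", "\u2028" => "\\u2028", "\u2029" => "\\u2029" }.freeze

# Escape the "/" in "</" and the two JS line separators. A Hash replacement is
# used so the backslashes in the replacement text are taken literally.
def escape_for_script(json)
  json.gsub(SCRIPT_UNSAFE, SCRIPT_ESCAPES)
end

data = { "html" => "</script>", "sep" => "a\u2028b" }
safe = escape_for_script(JSON.generate(data))
JSON.parse(safe) == data # => true
```

Both `\/` and `\u2028` are ordinary JSON string escapes, so the escaped text parses back to the same data.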

Configuration is validated up front: an unknown option key, a known key with the wrong type or value (a non-Symbol `:format`, a negative/non-Integer `:indent`, a non-boolean flag), or combining `:indent` with `:ndjson`, raises `ArgumentError`.
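
A minimal sketch of up-front validation along these lines follows. The option table covers only a few of the documented keys, `validate_generate_options!` is a hypothetical name, and treating only a positive `:indent` as conflicting with `:ndjson` is an assumption of this example rather than something the docs state.

```ruby
# Hypothetical subset of the documented options, each with a type/value check.
OPTION_CHECKS = {
  format:    ->(v) { %i[json ndjson].include?(v) },
  indent:    ->(v) { v.is_a?(Integer) && v >= 0 },
  sort_keys: ->(v) { v == true || v == false },
  allow_nan: ->(v) { v == true || v == false }
}.freeze

# Reject unknown keys and bad values before any output is produced.
def validate_generate_options!(options)
  options.each do |key, value|
    check = OPTION_CHECKS[key]
    raise ArgumentError, "unknown option #{key.inspect}" unless check
    raise ArgumentError, "invalid value #{value.inspect} for #{key.inspect}" unless check.call(value)
  end
  # Assumption: only a positive indent conflicts with single-line ndjson records.
  if options[:format] == :ndjson && options.fetch(:indent, 0) > 0
    raise ArgumentError, ":indent is not allowed with format: :ndjson"
  end
  options
end

validate_generate_options!(indent: 2, sort_keys: true) # returns the options hash
# validate_generate_options!(sort_key: true)           # raises ArgumentError (typo caught)
```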

34 changes: 27 additions & 7 deletions ext/smarter_json/smarter_json.c
@@ -40,6 +40,7 @@ static ID fj_call_id; /* cached :call (invoking the on_warning handler) */
static VALUE fj_sym_empty_slot;
static VALUE fj_sym_empty_value;
static VALUE fj_sym_duplicate_key;
static VALUE fj_sym_number_overflow;
static ID fj_bigdecimal_id; /* cached BigDecimal() method id (set in Init) */
static ID fj_to_sym_id; /* cached :to_sym (symbolize_keys) */
static ID fj_key_p_id; /* cached :key? (non-default duplicate_key modes) */
@@ -262,6 +263,8 @@ static inline int fj_needs_ws_skip(int b) {
/* forward declarations (mutual recursion) */
static VALUE fj_parse_value(fj_state *st);
static VALUE fj_parse_member_value(fj_state *st);
static int fj_smart_quote_kind(fj_state *st);
static VALUE fj_parse_smart_string(fj_state *st, int kind);

static void fj_append_utf8(VALUE buf, unsigned long cp) {
char tmp[4];
@@ -579,7 +582,8 @@ static VALUE fj_float_strtod(const char *p, long n) {
}

/* e10 is the final base-10 exponent (already adjusted by the fraction length). */
static FJ_ALWAYS_INLINE VALUE fj_float_from_parts(uint64_t m10, int m10digits, int64_t e10, int neg, int overflow, const char *p, long n) {
static FJ_ALWAYS_INLINE VALUE fj_float_from_parts(fj_state *st, uint64_t m10, int m10digits, int64_t e10, int neg, int overflow, const char *p, long n) {
double d;
/* Fast path by mantissa width (our scanner accumulates m10 exactly up to 18
digits, flagging overflow beyond):
1..18 digits -> Eisel-Lemire, correctly-rounded for any exact uint64 mantissa
@@ -589,10 +593,18 @@ static FJ_ALWAYS_INLINE VALUE fj_float_from_parts(uint64_t m10, int m10digits, i
>18 digits / overflow / extreme exponent -> strtod (round-to-odd). */
if (!overflow && m10digits >= 1 && m10digits <= 18 && (long)m10digits + e10 >= -307) {
if (m10 == 0) return rb_float_new(neg ? -0.0 : 0.0);
return rb_float_new(fj_eisel_lemire_s2d(e10, m10, neg));
d = fj_eisel_lemire_s2d(e10, m10, neg);
} else {
/* Fallback for >18 digits / extreme or subnormal exponents. */
d = RFLOAT_VALUE(fj_float_strtod(p, n));
}
/* Fallback for >18 digits / extreme or subnormal exponents. */
return fj_float_strtod(p, n);
/* A finite literal whose magnitude exceeds Float range (e.g. 1e400) becomes
±Infinity — a silent data change. Report it via :number_overflow (the value is
still returned). The Infinity/NaN keywords take separate paths and never get here.
Gate isinf on a listening handler (matches the Ruby float_or_warn): no handler ->
no point detecting, and it keeps the test off the hot number path. */
if (st->on_warning != Qnil && isinf(d)) fj_warn(st, fj_sym_number_overflow, "number literal out of Float range — collapsed to Infinity");
return rb_float_new(d);
}

/* Scan an already-bounded quoteless token [p, p+n) exactly once: validate it as a
@@ -677,7 +689,7 @@ static int fj_try_decimal(fj_state *st, const char *p, long n, VALUE *out) {
(st->decimal_precision == 1 && m10digits > 16 && fj_sig_digits(p, n) > 16)) {
*out = fj_to_bigdecimal_token(p, n);
} else {
*out = fj_float_from_parts(m10, m10digits, e10, neg, overflow, p, n);
*out = fj_float_from_parts(st, m10, m10digits, e10, neg, overflow, p, n);
}
return 1;
}
@@ -789,7 +801,7 @@ static VALUE fj_parse_number(fj_state *st) {
(st->decimal_precision == 1 && m10digits > 16 && fj_sig_digits(np, nlen) > 16)) {
return fj_to_bigdecimal_token(np, nlen);
}
return fj_float_from_parts(m10, m10digits, e10, neg, overflow, np, nlen);
return fj_float_from_parts(st, m10, m10digits, e10, neg, overflow, np, nlen);
}

static VALUE fj_parse_literal(fj_state *st, const char *word, VALUE value) {
@@ -842,6 +854,7 @@ static VALUE fj_parse_identifier_key(fj_state *st) {

static VALUE fj_parse_object_key(fj_state *st) {
int b = fj_byte(st);
int kind;

/* Quoted key. The common case has no escapes: intern straight from the buffer
* with no throwaway allocation. An escaped key (rare) falls through to the
@@ -862,6 +875,12 @@ static VALUE fj_parse_object_key(fj_state *st) {
return fj_parse_string(st, b);
}

/* A key may open with a smart/curly quote too (a word-processor paste curls the
* keys, not just the values) — route to the same reader the value path uses.
* Mirrors the Ruby fallback's parse_object_key; Hash#[]= dedups the key on store. */
kind = fj_smart_quote_kind(st);
if (kind) return fj_parse_smart_string(st, kind);

if (fj_is_key_start(b)) return fj_parse_identifier_key(st);

fj_error(st, "expected a key");
@@ -1197,7 +1216,7 @@ static int fj_try_member_number(fj_state *st, VALUE *out) {
(st->decimal_precision == 1 && m10digits > 16 && fj_sig_digits(np, nlen) > 16)) {
*out = fj_to_bigdecimal_token(np, nlen);
} else {
*out = fj_float_from_parts(m10, m10digits, e10, neg, overflow, np, nlen);
*out = fj_float_from_parts(st, m10, m10digits, e10, neg, overflow, np, nlen);
}
return 1;
}
@@ -1625,6 +1644,7 @@ void Init_smarter_json(void) {
fj_sym_empty_slot = ID2SYM(rb_intern("empty_slot"));
fj_sym_empty_value = ID2SYM(rb_intern("empty_value"));
fj_sym_duplicate_key = ID2SYM(rb_intern("duplicate_key"));
fj_sym_number_overflow = ID2SYM(rb_intern("number_overflow"));
fj_sym_encoding = ID2SYM(rb_intern("encoding"));
fj_sym_symbolize_keys = ID2SYM(rb_intern("symbolize_keys"));
fj_sym_first_wins = ID2SYM(rb_intern("first_wins"));