SmarterJSON Basic Read API

Reading JSON has one entry point for content and one for files. Both accept the same options, and both take an optional block for streaming.

`SmarterJSON.process` — read a String or an IO

require "smarter_json"

SmarterJSON.process('{"a": 1, "b": [2, 3]}')          # => [{"a"=>1, "b"=>[2, 3]}]   (always an Array of documents)
SmarterJSON.process_one('{"a": 1, "b": [2, 3]}')      # => {"a"=>1, "b"=>[2, 3]}     (the single document's value)
SmarterJSON.process_one("host: localhost\nport: 5432") # => {"host"=>"localhost", "port"=>5432}  (no braces needed)

process accepts either a String of JSON content or an IO to read from as its first argument. A String is always treated as content, never as a filename — use process_file for paths. When the input wraps the payload in obvious markdown / prose / tags, process strips that wrapper first and then reads the recovered payload(s).

SmarterJSON.process_one(<<~TEXT)
  Here is the JSON:

  ```json
  {
    "a": 1
  }
  ```
TEXT
# => {"a"=>1}

SmarterJSON.process_one(<<~TEXT)
  Here is the result:

  {
    "a": 1
  }

  Hope this helps.
TEXT
# => {"a"=>1}

SmarterJSON.process(<<~TEXT)
  first attempt:
  {"a":1}

  corrected payload:
  {"b":2}
TEXT
# => [{"a"=>1}, {"b"=>2}]

SmarterJSON.process(io)         # an open IO (File, StringIO, socket, …) — reads it and extracts the data
SmarterJSON.process(some_string) # JSON content

`process` always returns an Array of documents

This is the distinguishing feature: process reads multi-document input (NDJSON / JSONL / concatenated) automatically, with no block and no special method, and always returns an Array of the documents it found:

SmarterJSON.process("")                                 # => []            (zero documents)
SmarterJSON.process('{"id":1}')                          # => [{"id"=>1}]   (one document, still an Array)
SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3}))     # => [{"id"=>1}, {"id"=>2}, {"id"=>3}]

For the common single-document case, process_one returns the one value directly — and warns (never raises) if there was more than one, so a stray extra document is never dropped silently:

SmarterJSON.process_one('{"id":1}')   # => {"id"=>1}
SmarterJSON.process_one("")           # => nil

Type-checking the result? Use result.is_a?(Array), not result.class == Array — idiomatic, and future-proof if the return ever becomes a specialized Array subclass.

Documents are separated by newlines, commas, RS (0x1E), or simple concatenation (self-delimiting values) — never by a space. A top-level value must be a recognized JSON value (number / true / false / null / quoted string / object / array) or an implicit-root object (host: localhost); a bare top-level run such as localhost or 1 2 3 raises ParseError. (Quoteless string values inside objects and arrays are unchanged.) If wrapper noise is stripped and several payloads are recovered, they come back by the same rule — an Array of payloads (process_one returns the first). Only SmarterJSON reads multi-document input via plain process: Oj and the stdlib json library raise without a block.

`SmarterJSON.process_file` — read a file by path

SmarterJSON.process_file("config.json5")     # read the file, then process — same return-value rules as process

process_file opens the file, reads it with the labeled encoding: (default "UTF-8", no transcoding pass), and processes it.

Streaming with a block (bounded memory)

For input larger than memory, pass a block. Each recovered top-level document is yielded as it is framed, and the method returns the document count instead of collecting the documents into an Array. Both process and process_file forward the block.

SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
SmarterJSON.process(io) { |doc| handle(doc) }

The streaming path now frames whole top-level documents, not just one line at a time. That means NDJSON / JSONL still work, but pretty-printed multi-line objects and arrays work too, as do mixed \n / \r\n / \r line endings and comment-only separators between documents.

`SmarterJSON.foreach` — stream a file or IO, composably

foreach is the composable sibling of process_file. Its argument is a file path or any IO (a socket, a StringIO, an open File); a String is always a path, never content.

With a block it behaves exactly like the block form above — streams each document, returns the document count. Without a block it returns a plain Enumerator (like CSV.foreach — not an Enumerator::Lazy), so .map / .select return Arrays the usual way, and you can chain over the stream:

SmarterJSON.foreach("events.ndjson").each { |event| EventJob.perform_async(event) }   # like the block form
SmarterJSON.foreach("events.ndjson").select { |e| e["level"] == "error" }              # => an Array of the matches

It reads one document at a time, so foreach(path).first(3) only reads ~3 documents off disk, and .next pulls them one by one. .map / .select read the source lazily but still build an Array of their result; to keep a whole pipeline bounded end to end (a large filtered set off a fat file), add .lazy at the call site:

SmarterJSON.foreach("session.jsonl", symbolize_keys: true)
           .lazy
           .select { |doc| %w[user assistant].include?(doc[:type]) }
           .each   { |doc| puts doc[:text] }

Options are validated eagerly — a bad option key or value raises immediately, before any iteration. An IO source is single-pass (an IO can only be read once), so iterating the returned Enumerator a second time over the same IO yields nothing; a path-backed foreach re-opens the file and is re-iterable.

The C extension and the pure-Ruby fallback

By default (acceleration: true) the C extension is used when it is compiled and loadable (SmarterJSON::HAS_ACCELERATION is then true); otherwise the pure-Ruby implementation runs and produces identical results. Pass acceleration: false to force the pure-Ruby path. See Configuration Options.

Seeing what was fixed: `on_warning:`

process and process_file are lenient — they salvage your data rather than reject a whole document over a stray comma. Pass an on_warning: callable to also get a record of what was adjusted, so the leniency is transparent instead of silent. It is invoked once per fix and never changes the return value:

warns = []
result = SmarterJSON.process("[1,,2]", on_warning: ->(w) { warns << w })
result            # => [[1, 2]]   (one document: the array [1, 2] with the empty slot collapsed; process always returns an Array of documents)
warns.map(&:type) # => [:empty_slot]
warns.first.to_s  # => "extra comma, collapsed an empty slot at line 1, col 4"

Each warning is a SmarterJSON::Warning with type, message, line, and col. The types are :empty_slot (a collapsed empty comma slot), :empty_value (a key with no value, read as null), :duplicate_key (a repeated key that was dropped), plus wrapper-recovery warnings such as :code_fence_stripped, :prefix_text_ignored, :suffix_text_ignored, and :wrapper_tag_stripped. Clean input never invokes the handler. It fires on every path — including the streaming block form — and works the same on the C and pure-Ruby paths. See Configuration Options.

PREVIOUS: Introduction | NEXT: The Basic Write API | UP: README

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Contents

SmarterJSON Basic Read API

`SmarterJSON.process` — read a String or an IO

`process` always returns an Array of documents

`SmarterJSON.process_file` — read a file by path

Streaming with a block (bounded memory)

`SmarterJSON.foreach` — stream a file or IO, composably

The C extension and the pure-Ruby fallback

Seeing what was fixed: `on_warning:`

FilesExpand file tree

basic_read_api.md

Latest commit

History

basic_read_api.md

File metadata and controls

Contents

SmarterJSON Basic Read API

SmarterJSON.process — read a String or an IO

process always returns an Array of documents

SmarterJSON.process_file — read a file by path

Streaming with a block (bounded memory)

SmarterJSON.foreach — stream a file or IO, composably

The C extension and the pure-Ruby fallback

Seeing what was fixed: on_warning:

`SmarterJSON.process` — read a String or an IO

`process` always returns an Array of documents

`SmarterJSON.process_file` — read a file by path

`SmarterJSON.foreach` — stream a file or IO, composably

Seeing what was fixed: `on_warning:`