
asunLab/asun-java


ASUN Java — High-Performance Array-Schema Unified Notation

Why ASUN?


Standard JSON repeats every field name in every record. When you send structured data to an LLM, over an API, or across services, that repetition wastes tokens, bytes, and attention:

[
  { "id": 1, "name": "Alice", "active": true },
  { "id": 2, "name": "Bob", "active": false },
  { "id": 3, "name": "Carol", "active": true }
]


ASUN declares the schema once and streams data as compact tuples:

[{id, name, active}]:
  (1,Alice,true),
  (2,Bob,false),
  (3,Carol,true)

Fewer tokens. Smaller payloads. Clearer structure. Faster parsing than repeated-object JSON.
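To make the layout concrete, here is a minimal sketch that produces the compact one-line form of the records above. It is a hypothetical hand-written encoder (SchemaOnceSketch and User are illustrative names), not the io.asun encoder: it does no quoting or escaping and handles only simple scalar values.

```java
import java.util.List;

// Hypothetical illustration of "schema once, then tuples".
// Not the io.asun encoder: no quoting/escaping, simple scalars only.
final class SchemaOnceSketch {

    record User(long id, String name, boolean active) {}

    // Produces e.g. [{id,name,active}]:(1,Alice,true),(2,Bob,false)
    static String encodeUsers(List<User> users) {
        StringBuilder sb = new StringBuilder("[{id,name,active}]:"); // schema written once
        for (int i = 0; i < users.size(); i++) {
            User u = users.get(i);
            if (i > 0) sb.append(',');
            // Each record contributes only its values, in schema order.
            sb.append('(').append(u.id()).append(',')
              .append(u.name()).append(',')
              .append(u.active()).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User(1, "Alice", true),
                new User(2, "Bob", false),
                new User(3, "Carol", true));
        // prints [{id,name,active}]:(1,Alice,true),(2,Bob,false),(3,Carol,true)
        System.out.println(encodeUsers(users));
    }
}
```

Every key appears exactly once regardless of how many records follow, which is where the size savings over repeated-object JSON come from.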


Features

  • ClassMeta + MethodHandle invokeExact: Pre-computed per-class metadata with pre-adapted MethodHandles. Primitive fields use type-specific invokeExact (e.g., (Object)->int) with zero boxing/adaptation overhead
  • SIMD Acceleration: Uses jdk.incubator.vector (ByteVector 256-bit/128-bit) for fast character scanning — special char detection, escape scanning, delimiter search
  • ThreadLocal Buffer Pool: The project's own ByteBuffer class (not java.nio.ByteBuffer), reused per thread via ThreadLocal (up to 1MB) with a 2× growth factor, avoids allocation on repeated encode calls
  • ISO-8859-1 Fast String Construction: Both encoder and decoder detect ASCII-only content and use ISO_8859_1 charset for direct byte-copy String construction (vs UTF-8 validation overhead)
  • Single-Pass Fused Write: writeString combines classify + write in one pass with rollback on special chars — no separate analysis pass
  • CHAR_CLASS Lookup Table: 128-byte classification table replaces 6+ per-character comparisons for ASUN delimiter detection
  • POW10 Direct Double Parsing: Decoder parses intPart + fracVal / POW10[fracDigits] avoiding String allocation for simple decimals
  • Type-Tag Switch Dispatch: Integer-tagged field types (T_BOOLEAN=0 .. T_STRUCT=11) for O(1) dispatch in both encode and decode
  • DEC_DIGITS Fast Formatting: 200-byte lookup table for two-digit integer formatting, eliminates division-per-digit overhead
  • Binary Format: Little-endian wire format with minimal parsing overhead; primitives are read directly from the buffer
  • Schema-Driven: {field1,field2}:(val1,val2) format eliminates redundant key repetition in arrays
  • No Native Map Type: keyed collections are modeled explicitly as entry lists such as attrs@[{key@str,value@int}]
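The ClassMeta + invokeExact idea can be illustrated with the standard java.lang.invoke API. The sketch below, assuming a simple class with a package-private int field, looks up a getter once, adapts it to the (Object)int type, and calls it with invokeExact so the int is never boxed. FieldReader and Point are hypothetical names; this shows the general technique, not ClassMeta's actual code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class FieldReader {

    // Hypothetical target class with primitive fields.
    static final class Point {
        int x;
        int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Looked up once per field; after asType its type is (Object)int.
    private static final MethodHandle X_GETTER;

    static {
        try {
            MethodHandle raw = MethodHandles.lookup()
                    .findGetter(Point.class, "x", int.class);                      // (Point)int
            X_GETTER = raw.asType(MethodType.methodType(int.class, Object.class)); // (Object)int
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Hot path: the call site's symbolic type is (Object)int, matching the handle exactly,
    // so no boxing and no per-call type adaptation happen here.
    static int readX(Object target) {
        try {
            return (int) X_GETTER.invokeExact(target);
        } catch (RuntimeException | Error e) {
            throw e; // e.g. ClassCastException if target is not a Point
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(readX(new Point(7, 9))); // 7
    }
}
```

The Object-typed argument and the (int) cast on the result are required: invokeExact throws WrongMethodTypeException if the call site's symbolic type differs from the handle's type.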

API

import io.asun.Asun;
import java.util.List;

// Text encode/decode
String text = Asun.encode(obj);                               // schema without scalar type hints
String typed = Asun.encodeTyped(obj);                         // schema with scalar type hints
MyClass one = Asun.decode(text, MyClass.class);               // single struct
List<MyClass> many = Asun.decodeList(text, MyClass.class);    // list of structs

// byte[] decode (avoids String→byte[] conversion)
MyClass oneFromBytes = Asun.decode(bytes, MyClass.class);
List<MyClass> manyFromBytes = Asun.decodeList(bytes, MyClass.class);

// Binary encode/decode
byte[] bin = Asun.encodeBinary(obj);
MyClass oneFromBin = Asun.decodeBinary(bin, MyClass.class);
List<MyClass> manyFromBin = Asun.decodeBinaryList(bin, MyClass.class);

Kotlin Support

Bundled Kotlin extensions provide inline reified helpers for idiomatic Kotlin usage:

import io.asun.decode
import io.asun.decodeList

val user: User = decode(text)
val users: List<User> = decodeList(text)

Packaging note:

  • The main asun-java artifact includes the Kotlin helper layer.
  • Because of that, the published runtime dependency set includes kotlin-stdlib-jdk8.
  • Java users can ignore the Kotlin APIs; they do not change the Java-facing API surface.

Note on Kotlin Data Classes: ASUN currently relies on raw reflection for maximum performance, so a Kotlin data class must meet two requirements:

  1. Use var for properties instead of val.
  2. Provide default values for all properties (this makes Kotlin generate the no-arg constructor ASUN needs behind the scenes).

Plain nullable types (e.g., String?) are supported natively and do not require Optional<T>.

Example compatible data class:

data class User(
    var id: Long = 0,
    var name: String? = null,
    var active: Boolean = false
)

ASUN Format

Single struct:

{name,age,active}:(Alice,30,true)

Array (schema-driven — schema written once):

[{name,age,active}]:(Alice,30,true),(Bob,25,false)

Typed schema:

{name@str,age@int,active@bool}:(Alice,30,true)
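A typed header like the one above can be derived mechanically from a type's fields. This sketch uses Java record components and maps Java types onto the hints from the Supported Types table (bool, int, float, char, str). The mapping and the TypedHeader/Person names are assumptions for illustration, not how encodeTyped is implemented.

```java
import java.lang.reflect.RecordComponent;
import java.util.StringJoiner;

// Hypothetical helper: build {name@str,age@int,active@bool} from a record.
final class TypedHeader {

    record Person(String name, int age, boolean active) {}

    // Assumed Java-type -> ASUN hint mapping, following the Supported Types table.
    static String hint(Class<?> t) {
        if (t == boolean.class || t == Boolean.class) return "bool";
        if (t == byte.class || t == short.class || t == int.class || t == long.class
                || t == Byte.class || t == Short.class || t == Integer.class || t == Long.class) return "int";
        if (t == float.class || t == double.class || t == Float.class || t == Double.class) return "float";
        if (t == char.class || t == Character.class) return "char";
        if (t == String.class) return "str";
        throw new IllegalArgumentException("no scalar hint for " + t);
    }

    // Record components are returned in declaration order, so the header matches field order.
    static String header(Class<? extends Record> type) {
        StringJoiner j = new StringJoiner(",", "{", "}");
        for (RecordComponent c : type.getRecordComponents()) {
            j.add(c.getName() + "@" + hint(c.getType()));
        }
        return j.toString();
    }

    public static void main(String[] args) {
        System.out.println(header(Person.class)); // {name@str,age@int,active@bool}
    }
}
```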

Nested struct:

{id,dept}:(1,(Engineering))

Entry-list collection:

{attrs@[{key@str,value@int}]}:([(age,30),(score,95)])

Native Map<K,V> fields are intentionally unsupported in the current format. Use a List<Entry>-style model instead.
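In practice, a Map in a domain model can be converted to and from such an entry list at the boundary. The sketch below uses hypothetical Attr and Item classes, written as mutable classes with public no-arg constructors on the assumption (suggested by the Kotlin note) that the reflective decoder needs one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EntryListModel {

    // Hypothetical entry type matching {key@str,value@int}.
    static class Attr {
        public String key;
        public int value;
        public Attr() {}
        public Attr(String key, int value) { this.key = key; this.value = value; }
    }

    // Hypothetical owner type matching {attrs@[{key@str,value@int}]}.
    static class Item {
        public List<Attr> attrs = new ArrayList<>();
    }

    // Map -> entry list, preserving the map's iteration order.
    static List<Attr> toEntries(Map<String, Integer> map) {
        List<Attr> out = new ArrayList<>(map.size());
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            out.add(new Attr(e.getKey(), e.getValue()));
        }
        return out;
    }

    // Entry list -> Map after decoding; a later duplicate key overwrites an earlier one.
    static Map<String, Integer> toMap(List<Attr> attrs) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (Attr a : attrs) map.put(a.key, a.value);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> src = new LinkedHashMap<>();
        src.put("age", 30);
        src.put("score", 95);
        Item item = new Item();
        item.attrs = toEntries(src);   // this is the shape that gets encoded
        System.out.println(toMap(item.attrs)); // {age=30, score=95}
    }
}
```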

Benchmark Output

Run the bundled benchmark example:

./gradlew runBenchExample

The Java benchmark now uses the same JSON/ASUN/BIN output style as the Go benchmark:

  Flat struct × 500 (8 fields, vec)
    Serialize:   JSON 16.22ms/60784B | ASUN 10.11ms(1.6x)/28327B(46.6%) | BIN 4.92ms(3.3x)/37230B(61.2%)
    Deserialize: JSON    22.09ms | ASUN     5.70ms(3.9x) | BIN     2.11ms(10.5x)

(46.6%) means the ASUN payload is 46.6% of the JSON size, not “saved 46.6%”.
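The figures in the sample line follow directly from the raw numbers: speedup is JSON time divided by the other format's time, and the percentage is the other format's bytes divided by JSON bytes. A small sketch reproducing them (BenchRatios is a hypothetical helper, not part of the benchmark):

```java
import java.util.Locale;

final class BenchRatios {

    // JSON time / other time, e.g. 16.22 / 10.11 -> "1.6x".
    static String speedup(double jsonMs, double otherMs) {
        return String.format(Locale.ROOT, "%.1fx", jsonMs / otherMs);
    }

    // Other bytes as a percentage of JSON bytes, e.g. 28327 / 60784 -> "46.6%".
    static String sizePercent(long jsonBytes, long otherBytes) {
        return String.format(Locale.ROOT, "%.1f%%", 100.0 * otherBytes / jsonBytes);
    }

    public static void main(String[] args) {
        System.out.println(speedup(16.22, 10.11));     // 1.6x   (ASUN serialize)
        System.out.println(speedup(22.09, 2.11));      // 10.5x  (BIN deserialize)
        System.out.println(sizePercent(60784, 28327)); // 46.6%  (ASUN size vs JSON)
        System.out.println(sizePercent(60784, 37230)); // 61.2%  (BIN size vs JSON)
    }
}
```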

Why ASUN Can Beat JSON (Gson)

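Gson maps objects through reflection and has to read and write every repeated key. In the sample run above, ASUN text serialized about 1.6x and deserialized about 3.9x faster than JSON, and the binary format went further (3.3x and 10.5x). The gains come from format advantages and a JIT-friendly design: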

  1. No Key Repetition: JSON repeats every key for every object. ASUN writes the schema once, then only values — 53% smaller output means less memory bandwidth.
  2. No Quoting Overhead: ASUN quotes only strings that contain special characters, so most strings are emitted raw, without the surrounding " characters.
  3. MethodHandle invokeExact: Once JIT-compiled, pre-adapted handles perform close to direct field access. No boxing for primitives and no type adaptation at the call site, vs Gson's reflection-based Field.get()/Field.set().
  4. SIMD Scanning: Character classification uses 256-bit (or 128-bit) vector operations, examining up to 32 bytes per vector comparison instead of one byte at a time.
  5. ThreadLocal Buffer Pool: Reuses up to 1MB byte buffers, eliminating allocation pressure in the encode hot path.
  6. ISO-8859-1 String Fast Path: ASCII-only strings (the common case) use ISO_8859_1 charset for direct byte-copy construction — 2-3x faster than UTF-8 validation.
  7. Direct Double Parsing: POW10 lookup table parses intPart + fracVal / 10^n without String allocation — avoids Double.parseDouble() for simple decimals.
  8. No Intermediate Tree: ASUN decodes directly into target objects without building an intermediate tree representation.
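Point 7 can be sketched as follows. This is a hypothetical helper, not AsunDecoder's code: it handles only [-]digits[.digits] with at most 15 digits per part (so every accumulated value and power of ten is exact as a double) and falls back to Double.parseDouble for anything else. Because the fraction is rounded once by the division and again by the addition, this shortcut is not guaranteed to be correctly rounded and can differ from Double.parseDouble in the last bit for some inputs.

```java
final class SimpleDoubleParser {

    // 10^0 .. 10^15, all exactly representable as doubles.
    private static final double[] POW10 = {
        1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7,
        1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15
    };
    private static final int MAX_DIGITS = 15; // keeps longs below 2^53, so (double) is exact

    static double parse(String s) {
        final int n = s.length();
        int i = 0;
        boolean neg = n > 0 && s.charAt(0) == '-';
        if (neg) i++;

        long intPart = 0;
        int intDigits = 0;
        for (; i < n; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') break;
            intPart = intPart * 10 + (c - '0');
            intDigits++;
        }

        long fracVal = 0;
        int fracDigits = 0;
        if (i < n && s.charAt(i) == '.') {
            for (i++; i < n; i++) {
                char c = s.charAt(i);
                if (c < '0' || c > '9') break;
                fracVal = fracVal * 10 + (c - '0');
                fracDigits++;
            }
        }

        // Exponents, trailing characters, missing digits, or too many digits: use the JDK parser.
        if (i != n || intDigits == 0 || intDigits > MAX_DIGITS || fracDigits > MAX_DIGITS) {
            return Double.parseDouble(s);
        }

        // No String allocation on the fast path: intPart + fracVal / 10^fracDigits.
        double v = intPart + fracVal / POW10[fracDigits];
        return neg ? -v : v;
    }

    public static void main(String[] args) {
        System.out.println(parse("-0.5")); // -0.5
        System.out.println(parse("1e3"));  // 1000.0 (fallback path)
        System.out.println(parse("3.14"));
    }
}
```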

Supported Types

ASUN Type              ASUN Text             ASUN Binary
bool                   true/false            1 byte (0/1)
int                    decimal               1/2/4/8 bytes LE
float                  decimal               4/8 bytes IEEE 754 LE
str                    plain or "quoted"     u32 length + UTF-8
char                   single-char string    2 bytes LE
Optional<T>            value or empty        u8 tag + payload
List<T>                [v1,v2,...]           u32 count + elements
Entry-list collection  [(k1,v1),(k2,v2)]     u32 count + elements
Nested struct          (f1,f2,...)           fields in order

Native Map<K,V> fields are intentionally unsupported. Model keyed data as an explicit entry struct list instead.
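To make the binary column concrete, the sketch below lays out a struct {id, name, active} with java.nio.ByteBuffer (the JDK class, not the project's own ByteBuffer) in little-endian order: the int as 8 bytes, the string as a u32 length followed by UTF-8 bytes, and the bool as one byte. The 8-byte int width and the BinaryLayoutSketch name are assumptions; the real AsunBinary layout may differ in details.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class BinaryLayoutSketch {

    record Decoded(long id, String name, boolean active) {}

    static byte[] encode(long id, String name, boolean active) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 + utf8.length + 1)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(id);                    // int field as 8 bytes LE (assumed width)
        buf.putInt(utf8.length);            // u32 length prefix, LE
        buf.put(utf8);                      // UTF-8 payload
        buf.put((byte) (active ? 1 : 0));   // bool as 1 byte (0/1)
        return buf.array();
    }

    // Fields are read back in the same order they were written.
    static Decoded decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long id = buf.getLong();
        byte[] utf8 = new byte[buf.getInt()];
        buf.get(utf8);
        boolean active = buf.get() != 0;
        return new Decoded(id, new String(utf8, StandardCharsets.UTF_8), active);
    }

    public static void main(String[] args) {
        byte[] bin = encode(1, "Alice", true);
        System.out.println(bin.length);  // 18 = 8 + 4 + 5 + 1
        System.out.println(decode(bin)); // Decoded[id=1, name=Alice, active=true]
    }
}
```

No field names or delimiters appear in the bytes at all; the decoder relies entirely on knowing the schema and field order.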

Build & Run

# Requirements: JDK 21+, Gradle 9.1+ (wrapper included)
./gradlew test
./gradlew runBasicExample
./gradlew runComplexExample
./gradlew runBenchExample

Project Structure

src/main/java/io/asun/
├── Asun.java          — Public API + text encoder (CHAR_CLASS, DEC_DIGITS, single-pass writeString)
├── ClassMeta.java     — Per-class metadata cache (FieldMeta, MethodHandle invokeExact, type tags)
├── AsunDecoder.java   — SIMD-accelerated text decoder (POW10, skipWs, ISO-8859-1 fast path)
├── AsunBinary.java    — Binary codec (LE wire format)
├── ByteBuffer.java    — ThreadLocal byte buffer pool (1MB max, 2× growth, ASCII tracking)
├── SimdUtils.java     — SIMD utilities (ByteVector 256/128)
├── AsunException.java — Runtime exception
└── examples/
    ├── BasicExample.java    — 12 basic examples
    ├── ComplexExample.java  — 14 complex examples
    └── BenchExample.java    — Full benchmark suite (JSON / ASUN / BIN)
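Asun.java is listed as using a DEC_DIGITS table. The idea, sketched below as a hypothetical standalone helper rather than the encoder's actual code, is that a 200-byte table holding "00" to "99" lets each division by 100 emit two digits at once, roughly halving the number of divisions compared with producing one digit per step.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class TwoDigitFormatter {

    // 100 pairs x 2 bytes: DEC_DIGITS[2*i], DEC_DIGITS[2*i+1] are the digits of i (00..99).
    private static final byte[] DEC_DIGITS = new byte[200];
    static {
        for (int i = 0; i < 100; i++) {
            DEC_DIGITS[2 * i] = (byte) ('0' + i / 10);
            DEC_DIGITS[2 * i + 1] = (byte) ('0' + i % 10);
        }
    }

    // Returns the ASCII decimal form of v, filling a scratch buffer from the right.
    static byte[] format(long v) {
        if (v == Long.MIN_VALUE) { // -v would overflow; rare case, delegate
            return Long.toString(v).getBytes(StandardCharsets.US_ASCII);
        }
        byte[] buf = new byte[20]; // up to 19 digits + sign
        int pos = buf.length;
        boolean neg = v < 0;
        long x = neg ? -v : v;
        while (x >= 100) {          // two digits per iteration
            int r = (int) (x % 100);
            x /= 100;
            buf[--pos] = DEC_DIGITS[2 * r + 1];
            buf[--pos] = DEC_DIGITS[2 * r];
        }
        if (x >= 10) {              // final two digits
            int r = (int) x;
            buf[--pos] = DEC_DIGITS[2 * r + 1];
            buf[--pos] = DEC_DIGITS[2 * r];
        } else {                    // final single digit
            buf[--pos] = (byte) ('0' + x);
        }
        if (neg) buf[--pos] = '-';
        return Arrays.copyOfRange(buf, pos, buf.length);
    }

    public static void main(String[] args) {
        System.out.println(new String(format(1234567), StandardCharsets.US_ASCII)); // 1234567
        System.out.println(new String(format(-905), StandardCharsets.US_ASCII));    // -905
    }
}
```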


Latest Benchmarks

Benchmark results vary by JDK, CPU, and heap settings. To measure on your own machine, run:

./gradlew runBenchExample

That example prints the canonical JSON/ASUN/BIN comparison format used across the language implementations in this repository.

About

Java / Kotlin version for ASUN, high performance, replace json, LLM, save tokens
