When producing data, developers often want to use the language they're fluent in to define how their data looks. Asking a developer to create a schema in the language of the protocol is friction they'd rather not face.
In our case, developers building in TypeScript who want to produce Avro can define their schemas with TypeScript types and interfaces.
This tool lets them generate an Avro Schema (avsc) out of their TypeScript file.
- TypeScript to Avro Schema
Use this to get a schema you can share (via a schema registry or directly with your consumers).
input.ts
export interface MyInterface {
    optionalBool?: boolean;
    requiredBytes: Buffer;
    optionalString?: string;
    requiredDouble: number;
}

Note that the interface must be exported; this is a requirement.
MyInterface.avsc
{
"name": "MyInterface",
"fields": [
{
"name": "optionalBool",
"type": [
"null",
"boolean"
]
},
{
"name": "requiredBytes",
"type": "bytes"
},
{
"name": "optionalString",
"type": [
"null",
"string"
]
},
{
"name": "requiredDouble",
"type": "double"
}
],
"type": "record"
}
Use this when you want to produce Avro using a TypeScript interface. It generates a typed serializer using the avsc library.
input.ts
export interface MyInterface {
    someField?: string;
}

MyInterface.serializer.ts
import avro from 'avsc';
import {MyInterface} from './input';

const exactType = avro.Type.forSchema({"name": "MyInterface","fields": [{"name": "someField", "type": ["null", "string"]}],"type": "record"});

export default function serialize(value: MyInterface): Buffer {
    return exactType.toBuffer({
        someField: value.someField === undefined ? null : value.someField
    });
}

Why does the serializer manually convert undefined to null?
In TypeScript, the idiomatic way to denote optionality is using the ? modifier.
When an optional field is empty, it is 'set' to undefined.
However, in Avro emptiness of optional fields is always denoted using null.
The "manual" conversions are a type-safe way of converting between these two idioms.
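The same conversion can be written as a small generic helper. The sketch below is illustrative only: `nullable`, `toAvroValue` and `MyInterfaceAvro` are hypothetical names, not something this tool emits (the generated serializer inlines the conversion per field, as shown above).

```typescript
// Hypothetical helper (not emitted by the tool): maps TypeScript's "empty"
// value (undefined) to Avro's "empty" value (null) and leaves any other
// value, including falsy ones such as 0 or '', untouched.
function nullable<T>(value: T | undefined): T | null {
    return value === undefined ? null : value;
}

interface MyInterface {
    someField?: string;
}

// The shape handed to the Avro encoder: each optional field becomes `T | null`.
interface MyInterfaceAvro {
    someField: string | null;
}

function toAvroValue(value: MyInterface): MyInterfaceAvro {
    return { someField: nullable(value.someField) };
}

const empty = toAvroValue({});                     // someField is null
const filled = toAvroValue({ someField: 'hi' });   // someField is 'hi'
```

Because the return type spells out `string | null`, the compiler rejects a serializer that forgets the conversion and passes `undefined` through.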
Avro has types far richer than the default JSON type system, including logical types. However, it is a conscious choice not to create types that complicate users' domain.
Naturally supported types are:
| TypeScript | Avro |
|---|---|
| boolean | boolean |
| Buffer | bytes |
| string | string |
| number | double |
| null | null |
- Literal types are also automatically translated to their respective Avro types, except for strings (see string literals).
- Optional fields (e.g. f?: number) will produce nullable types (["null", "double"]).
- Arrays are translated to Avro arrays, and their item types are converted recursively (also considering type narrowing, if it exists).
- Unions that qualify as Avro enums (string literals matching the regular expression [A-Za-z_][A-Za-z0-9_]*) will be translated into those enums.
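To make the enum rule concrete, here is a minimal sketch of how such a qualification check could look. It is not the tool's actual implementation; `AVRO_NAME` and `qualifiesAsAvroEnum` are hypothetical names, and the check assumes the union's string literals have already been collected into an array.

```typescript
// The pattern from the rule above, anchored so the whole literal must match.
const AVRO_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Hypothetical check: a union qualifies when it has at least one member
// and every member is a valid Avro enum symbol.
function qualifiesAsAvroEnum(literals: readonly string[]): boolean {
    return literals.length > 0 && literals.every((literal) => AVRO_NAME.test(literal));
}

// type Suit = 'HEARTS' | 'SPADES' would qualify:
const suitQualifies = qualifiesAsAvroEnum(['HEARTS', 'SPADES']);

// type Status = 'in-progress' | 'done' would not, because of the hyphen:
const statusQualifies = qualifiesAsAvroEnum(['in-progress', 'done']);
```

The anchors matter: without `^` and `$`, a literal such as 'in-progress' would pass because the pattern matches its prefix.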
When a string literal is used as a type in TypeScript, it is converted to an Avro enum with a single symbol. This increases fidelity and reduces the size of the serialized value to 1 byte, e.g.:
export interface WithLit {
    str: 'foo';
}

This results in the Avro schema:
{
"fields": [
{
"name": "str",
"type": {
"name": "foo",
"symbols": [
"foo"
],
"type": "enum"
}
}
],
"name": "WithLit",
"type": "record"
}
When serialized, the output is a single 0x00 byte.
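The single byte follows from Avro's binary encoding: an enum value is written as the zero-based index of its symbol, encoded as a zig-zag variable-length int. The sketch below reproduces that encoding without the avsc library; `encodeAvroInt` and `encodeEnum` are illustrative names, not part of this tool.

```typescript
// Zig-zag + variable-length encoding of a 32-bit int, as used by Avro's
// binary format. Small magnitudes (positive or negative) take one byte.
function encodeAvroInt(n: number): number[] {
    // Zig-zag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    let z = ((n << 1) ^ (n >> 31)) >>> 0;
    const bytes: number[] = [];
    // Emit 7 bits at a time, setting the high bit while more bytes follow.
    while (z > 0x7f) {
        bytes.push((z & 0x7f) | 0x80);
        z >>>= 7;
    }
    bytes.push(z);
    return bytes;
}

// An enum value is encoded as the index of its symbol.
function encodeEnum(symbols: readonly string[], value: string): number[] {
    const index = symbols.indexOf(value);
    if (index < 0) {
        throw new Error(`'${value}' is not one of the enum's symbols`);
    }
    return encodeAvroInt(index);
}

// The WithLit record has one field, an enum whose only symbol is 'foo',
// so the whole record serializes to the encoding of index 0.
const withLitBytes = encodeEnum(['foo'], 'foo'); // [0x00]
```

A plain Avro string holding "foo" would take four bytes (a length prefix plus three characters), which is where the size reduction comes from.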
If you're looking to further narrow the types you'll be producing, there are two ways to get there:
By using an annotation, you can let the tool know that you intend to produce a narrower type, e.g.:
// @avro int
thisWillBeAnInt: number;

Supported types:

| Avro Type Annotation | Field Type |
|---|---|
| @avro int | number |
| @avro float | number |
| @avro double | number |
| @avro long | number |
| @avro date | number |
| @avro time-millis | number |
| @avro time-micros | number |
| @avro timestamp-millis | number |
| @avro timestamp-micros | number |
| @avro local-timestamp-millis | number |
| @avro local-timestamp-micros | number |
| @avro uuid | string |
You can see an example of all these conversions from TypeScript to Avro in the tests.
By using a type from this tool, you can let it know that you intend to produce a narrower type, e.g.:
thisWillBeAnInt: AvroInt;

Each library type is only a type alias of its base JSON type, which reduces friction when working with the interfaces.
Supported types:

| Library Type | Field Type |
|---|---|
| AvroInt | number |
| AvroFloat | number |
| AvroDouble | number |
| AvroLong | number |
| AvroDate | number |
| AvroTimeMillis | number |
| AvroTimeMicros | number |
| AvroTimestampMillis | number |
| AvroTimestampMicros | number |
| AvroLocalTimestampMillis | number |
| AvroLocalTimestampMicros | number |
| AvroUuid | string |
You can see an example of all these conversions from TypeScript to Avro in the tests.
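Both approaches can be mixed within one interface. In the sketch below the aliases are declared locally as stand-ins for the library's types (an assumption made so the snippet is self-contained; in real use they would come from this tool), and the `Measurement` interface is hypothetical.

```typescript
// Local stand-ins for the library's aliases, assumed to be plain aliases
// of the base types, as described above.
type AvroInt = number;
type AvroTimestampMillis = number;
type AvroUuid = string;

// In a real input file this interface would need to be exported.
interface Measurement {
    id: AvroUuid;
    // @avro float
    reading: number;
    sampleCount: AvroInt;
    takenAt: AvroTimestampMillis;
}

const measurement: Measurement = {
    id: '123e4567-e89b-12d3-a456-426614174000',
    reading: 21.5,
    sampleCount: 3,
    takenAt: 1700000000000,
};

// Because the aliases are ordinary numbers and strings, values can be used
// without wrapping or casting.
const nextCount: number = measurement.sampleCount + 1;
const takenAtDate: Date = new Date(measurement.takenAt);
```

The flip side of plain aliases is that the compiler cannot enforce ranges: assigning 1.5 to an `AvroInt` field still type-checks, so the narrowing only affects the generated schema.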
Don't... but if you have to:
- Clone the repository.
- Make sure to use the Node version in the .nvmrc file (I recommend using nvm install).
- Run npm run bootstrap to bootstrap the tests.
- Run npm run build to build the TypeScript.
- Run npm run test to make sure it all worked.
- Turn back.
First you'll need to install the command line utility by running npm install -g . from the repository root.
Run ts2avsc --help to see the following:
Usage: ts2avsc [options] <source.ts> [target-directory]
Convert a TypeScript file to a set of Avro Schemas and/or Serializers
Arguments:
source.ts The typescript file containing the type definitions
target-directory The directory in which to place the output files (default: ".")
Options:
-V, --version output the version number
--no-schemas Generate schemas
--pretty Pretty print Schema files
--serializers Generate serializers
-h, --help display help for command
Example usage:
🦆 ts2avsc --serializers --pretty tests/cases/007-two-base-types/input.ts ./tests/cases/007-two-base-types
- Writing schemas...
+ Writing Interface7.avsc...
+ Writing Type7.avsc...
- Writing serializers...
+ Writing Interface7.serializer.ts...
+ Writing Type7.serializer.ts...
All done!
You can see the input and outputs for this call in ./tests/cases/007-two-base-types.
When you've come to regret your decision, you can get rid of the command line utility by running npm uninstall -g . from the repository root.
Sure, I mean, you could, why not :)
The design is compositional, which can be seen by reading src/generator/typescript-to-avsc.ts:
Translates an input TypeScript interface into an Avro schema
graph LR
interface.ts-- toAst -->ParsedAst
ParsedAst-- toAvroSchema -->Schemas
Schemas-- writeAvsc -->schema.avsc
Translates an input TypeScript interface into a typed Avro serializer
graph LR
interface.ts-- toAst -->ParsedAst
ParsedAst-- toAvroSchema -->Schemas
Schemas-- toAvroSerializer -->serializer.ts
- Files in src/generator/typescript are responsible for parsing the input to an intermediate model.
- Files in src/generator/avsc are responsible for converting the above model to a model of the Avro schemas and serializing each one.
- Files in src/generator/avsc-lib are responsible for using the above model of the Avro schema and creating a typed serializer for each supported library.
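The composition can be illustrated with a toy version of the schema pipeline. This is a sketch under stated assumptions, not the code in the repository: the stage names `toAvroSchema` and `writeAvsc` are taken from the diagrams above, but their signatures, the `ParsedAst` shape and every other type here are hypothetical simplifications. The parsing stage (`toAst`) is skipped and the intermediate model is built by hand.

```typescript
// Hypothetical, heavily simplified stand-in for the parsed intermediate model.
type TsPrimitive = 'boolean' | 'Buffer' | 'string' | 'number';
interface ParsedField { name: string; type: TsPrimitive; optional: boolean; }
interface ParsedAst { name: string; fields: ParsedField[]; }

// Simplified model of an Avro record schema.
type AvroPrimitive = 'boolean' | 'bytes' | 'string' | 'double';
interface AvroField { name: string; type: AvroPrimitive | ['null', AvroPrimitive]; }
interface AvroRecord { name: string; fields: AvroField[]; type: 'record'; }

// The naturally supported primitive mappings.
const PRIMITIVES: Record<TsPrimitive, AvroPrimitive> = {
    boolean: 'boolean',
    Buffer: 'bytes',
    string: 'string',
    number: 'double',
};

// Stage: intermediate model -> schema model. Optional fields become nullable unions.
function toAvroSchema(ast: ParsedAst): AvroRecord {
    return {
        name: ast.name,
        fields: ast.fields.map((field): AvroField => {
            const base = PRIMITIVES[field.type];
            return { name: field.name, type: field.optional ? ['null', base] : base };
        }),
        type: 'record',
    };
}

// Stage: schema model -> .avsc text (returned here rather than written to disk).
function writeAvsc(schema: AvroRecord, pretty: boolean): string {
    return JSON.stringify(schema, null, pretty ? 4 : undefined);
}

// Hand-built model standing in for the output of the parsing stage.
const parsed: ParsedAst = {
    name: 'MyInterface',
    fields: [
        { name: 'optionalBool', type: 'boolean', optional: true },
        { name: 'requiredDouble', type: 'number', optional: false },
    ],
};

// The stages compose as plain function calls.
const avsc = writeAvsc(toAvroSchema(parsed), false);
```

Keeping each stage a pure function of the previous stage's output is what lets the serializer pipeline reuse the first two stages and swap only the last one.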
- Test errors and make sure coverage is decent
- Add language features:
  - long types
  - decimal
  - duration
  - aliases
  - default values
  - order (fields)
  - maps
  - unions as fields (ts -> avro)
  - unions as the type itself
  - fixed
  - type references outside the file's scope
    - not just the interface in the same file
- Run tests on the serializer/deserializer to make sure they do what they're supposed to
- Document multiple root types (schema and serializer outputs)
- Split the command line tool from the types library
- Consider using actual newtypes for types library
Copyright 2022 Omer van Kloeten
(I'll remove the copyright once this is no longer a WIP project)
Prior art: https://github.com/lbovet/typson