294 changes: 2 additions & 292 deletions README.md

Minimal HTTP [ClickHouse](https://clickhouse.com) client for Elixir.

Used in [Ecto ClickHouse adapter.](https://github.com/plausible/ecto_ch)

### Key features

- RowBinary
- Native query parameters
- Per query settings
- Minimal API

Your ideas are welcome [here.](https://github.com/plausible/ch/issues/82)

## Installation

```elixir
defp deps do
[
    {:ch, "~> 0.9.0"}
]
end
```

## Usage

#### Start [DBConnection](https://github.com/elixir-ecto/db_connection) pool

```elixir
defaults = [
scheme: "http",
hostname: "localhost",
port: 8123,
database: "default",
settings: [],
pool_size: 1,
timeout: :timer.seconds(15)
]

# note that starting in ClickHouse 25.1.3.23 `default` user doesn't have
# network access by default in the official Docker images
# see https://github.com/ClickHouse/ClickHouse/pull/75259
{:ok, pid} = Ch.start_link(defaults)
```

#### Select rows

```elixir
{:ok, pid} = Ch.start_link()

{:ok, %Ch.Result{rows: [[0], [1], [2]]}} =
Ch.query(pid, "SELECT * FROM system.numbers LIMIT 3")

{:ok, %Ch.Result{rows: [[0], [1], [2]]}} =
Ch.query(pid, "SELECT * FROM system.numbers LIMIT {$0:UInt8}", [3])

{:ok, %Ch.Result{rows: [[0], [1], [2]]}} =
Ch.query(pid, "SELECT * FROM system.numbers LIMIT {limit:UInt8}", %{"limit" => 3})
```

Note on datetime encoding in query parameters:

- `%NaiveDateTime{}` is encoded as text, so it assumes the column's or the ClickHouse server's timezone
- `%DateTime{}` is encoded as a unix timestamp and is treated as a UTC timestamp by ClickHouse
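As a sketch of the difference (the parameter names here are illustrative, and how the naive value is interpreted depends on the server's timezone):

```elixir
{:ok, pid} = Ch.start_link()

# encoded as text, interpreted in the server's (or column's) timezone
Ch.query!(pid, "SELECT {naive:DateTime}", %{"naive" => ~N[2023-04-25 17:45:09]})

# encoded as a unix timestamp, treated as UTC
Ch.query!(pid, "SELECT {utc:DateTime}", %{"utc" => ~U[2023-04-25 17:45:09Z]})
```

On a server whose timezone is not UTC, the two queries above return different `DateTime` values even though the wall-clock components match.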

#### Select rows (lots of params, reverse proxy)

> [!NOTE]
>
> Support for multipart requests was added in `v0.6.2`

For queries with many parameters the resulting URL can become too long for some reverse proxies, resulting in a `414 Request-URI Too Large` error.

To avoid this, you can use the `multipart: true` option to send the query and parameters in the request body.

```elixir
{:ok, pid} = Ch.start_link()

# Moves parameters from the URL to a multipart/form-data body
%Ch.Result{rows: [[[1, 2, 3 | _rest]]]} =
Ch.query!(pid, "SELECT {ids:Array(UInt64)}", %{"ids" => Enum.to_list(1..10_000)}, multipart: true)
```

> [!NOTE]
>
> `multipart: true` is currently required on each individual query. Support for pool-wide configuration is planned for a future release.

#### Insert rows

```elixir
{:ok, pid} = Ch.start_link()

Ch.query!(pid, "CREATE TABLE IF NOT EXISTS ch_demo(id UInt64) ENGINE Null")

%Ch.Result{num_rows: 2} =
Ch.query!(pid, "INSERT INTO ch_demo(id) VALUES (0), (1)")

%Ch.Result{num_rows: 2} =
Ch.query!(pid, "INSERT INTO ch_demo(id) VALUES ({$0:UInt8}), ({$1:UInt32})", [0, 1])

%Ch.Result{num_rows: 2} =
Ch.query!(pid, "INSERT INTO ch_demo(id) VALUES ({a:UInt16}), ({b:UInt64})", %{"a" => 0, "b" => 1})

%Ch.Result{num_rows: 2} =
Ch.query!(pid, "INSERT INTO ch_demo(id) SELECT number FROM system.numbers LIMIT {limit:UInt8}", %{"limit" => 2})
```

#### Insert rows as [RowBinary](https://clickhouse.com/docs/en/interfaces/formats/RowBinary) (efficient)

```elixir
{:ok, pid} = Ch.start_link()

Ch.query!(pid, "CREATE TABLE IF NOT EXISTS ch_demo(id UInt64) ENGINE Null")

types = ["UInt64"]
# or
types = [Ch.Types.u64()]
# or
types = [:u64]

%Ch.Result{num_rows: 2} =
Ch.query!(pid, "INSERT INTO ch_demo(id) FORMAT RowBinary", [[0], [1]], types: types)
```

Note that RowBinary encoding requires the `:types` option to be provided.

Similarly, you can use [RowBinaryWithNamesAndTypes,](https://clickhouse.com/docs/en/interfaces/formats/RowBinaryWithNamesAndTypes) which additionally sends the column names and types so ClickHouse can check them against the table schema.

```elixir
sql = "INSERT INTO ch_demo FORMAT RowBinaryWithNamesAndTypes"
opts = [names: ["id"], types: ["UInt64"]]
rows = [[0], [1]]

%Ch.Result{num_rows: 2} = Ch.query!(pid, sql, rows, opts)
```

#### Insert rows in custom [format](https://clickhouse.com/docs/en/interfaces/formats)

```elixir
{:ok, pid} = Ch.start_link()

Ch.query!(pid, "CREATE TABLE IF NOT EXISTS ch_demo(id UInt64) ENGINE Null")

csv = [0, 1] |> Enum.map(&to_string/1) |> Enum.intersperse(?\n)

%Ch.Result{num_rows: 2} =
Ch.query!(pid, "INSERT INTO ch_demo(id) FORMAT CSV", csv, encode: false)
```

#### Insert rows as chunked RowBinary stream

```elixir
{:ok, pid} = Ch.start_link()

Ch.query!(pid, "CREATE TABLE IF NOT EXISTS ch_demo(id UInt64) ENGINE Null")

stream = Stream.repeatedly(fn -> [:rand.uniform(100)] end)
chunked = Stream.chunk_every(stream, 100)
encoded = Stream.map(chunked, fn chunk -> Ch.RowBinary.encode_rows(chunk, _types = ["UInt64"]) end)
ten_encoded_chunks = Stream.take(encoded, 10)

%Ch.Result{num_rows: 1000} =
Ch.query(pid, "INSERT INTO ch_demo(id) FORMAT RowBinary", ten_encoded_chunks, encode: false)
```

This query makes a [`transfer-encoding: chunked`](https://en.wikipedia.org/wiki/Chunked_transfer_encoding) HTTP request while unfolding the stream, resulting in lower memory usage.

#### Query with custom [settings](https://clickhouse.com/docs/en/operations/settings/settings)

```elixir
{:ok, pid} = Ch.start_link()

settings = [async_insert: 1]

%Ch.Result{rows: [["async_insert", "Bool", "0"]]} =
Ch.query!(pid, "SHOW SETTINGS LIKE 'async_insert'")

%Ch.Result{rows: [["async_insert", "Bool", "1"]]} =
Ch.query!(pid, "SHOW SETTINGS LIKE 'async_insert'", [], settings: settings)
```
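Since the pool defaults above include a `settings` option, settings can presumably also be applied pool-wide, with per-query settings taking precedence. A minimal sketch, reusing the `async_insert` setting from the example:

```elixir
# pool-wide setting, applied to every query on this connection
{:ok, pid} = Ch.start_link(settings: [async_insert: 1])

%Ch.Result{rows: [["async_insert", "Bool", "1"]]} =
  Ch.query!(pid, "SHOW SETTINGS LIKE 'async_insert'")
```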

## Caveats

#### NULL in RowBinary

The behavior is the same as in [ch-go:](https://clickhouse.com/docs/en/integrations/go#nullable)

> At insert time, Nil can be passed for both the normal and Nullable version of a column. For the former, the default value for the type will be persisted, e.g., an empty string for string. For the nullable version, a NULL value will be stored in ClickHouse.

```elixir
{:ok, pid} = Ch.start_link()

Ch.query!(pid, """
CREATE TABLE ch_nulls (
a UInt8 NULL,
b UInt8 DEFAULT 10,
c UInt8 NOT NULL
) ENGINE Memory
""")

types = ["Nullable(UInt8)", "UInt8", "UInt8"]
inserted_rows = [[nil, nil, nil]]
selected_rows = [[nil, 0, 0]]

%Ch.Result{num_rows: 1} =
Ch.query!(pid, "INSERT INTO ch_nulls(a, b, c) FORMAT RowBinary", inserted_rows, types: types)

%Ch.Result{rows: ^selected_rows} =
Ch.query!(pid, "SELECT * FROM ch_nulls")
```

Note that in this example `DEFAULT 10` is ignored and `0` (the default value for `UInt8`) is persisted instead.

However, [`input()`](https://clickhouse.com/docs/en/sql-reference/table-functions/input) can be used as a workaround:

```elixir
sql = """
INSERT INTO ch_nulls
SELECT * FROM input('a Nullable(UInt8), b Nullable(UInt8), c UInt8')
FORMAT RowBinary\
"""

Ch.query!(pid, sql, inserted_rows, types: ["Nullable(UInt8)", "Nullable(UInt8)", "UInt8"])

%Ch.Result{rows: [[0], [10]]} =
Ch.query!(pid, "SELECT b FROM ch_nulls ORDER BY b")
```

#### UTF-8 in RowBinary

When decoding [`String`](https://clickhouse.com/docs/en/sql-reference/data-types/string) columns, non-UTF-8 characters are replaced with `�` (U+FFFD). This behavior is similar to [`toValidUTF8`](https://clickhouse.com/docs/en/sql-reference/functions/string-functions#tovalidutf8) and the [JSON format.](https://clickhouse.com/docs/en/interfaces/formats#json)

```elixir
{:ok, pid} = Ch.start_link()

Ch.query!(pid, "CREATE TABLE ch_utf8(str String) ENGINE Memory")

bin = "\x61\xF0\x80\x80\x80b"
utf8 = "a�b"

%Ch.Result{num_rows: 1} =
Ch.query!(pid, "INSERT INTO ch_utf8(str) FORMAT RowBinary", [[bin]], types: ["String"])

%Ch.Result{rows: [[^utf8]]} =
Ch.query!(pid, "SELECT * FROM ch_utf8")

%Ch.Result{rows: %{"data" => [[^utf8]]}} =
pid |> Ch.query!("SELECT * FROM ch_utf8 FORMAT JSONCompact") |> Map.update!(:rows, &Jason.decode!/1)
```

To get the raw binary from `String` columns, use the `:binary` type, which skips UTF-8 checks.

```elixir
%Ch.Result{rows: [[^bin]]} =
Ch.query!(pid, "SELECT * FROM ch_utf8", [], types: [:binary])
```

#### Timezones in RowBinary

Decoding non-UTC datetimes like `DateTime('Asia/Taipei')` requires a [timezone database.](https://hexdocs.pm/elixir/DateTime.html#module-time-zone-database)

```elixir
Mix.install([:ch, :tz])

:ok = Calendar.put_time_zone_database(Tz.TimeZoneDatabase)

{:ok, pid} = Ch.start_link()

%Ch.Result{rows: [[~N[2023-04-25 17:45:09]]]} =
Ch.query!(pid, "SELECT CAST(now() as DateTime)")

%Ch.Result{rows: [[~U[2023-04-25 17:45:11Z]]]} =
Ch.query!(pid, "SELECT CAST(now() as DateTime('UTC'))")

%Ch.Result{rows: [[%DateTime{time_zone: "Asia/Taipei"} = taipei]]} =
Ch.query!(pid, "SELECT CAST(now() as DateTime('Asia/Taipei'))")

"2023-04-26 01:45:12+08:00 CST Asia/Taipei" = to_string(taipei)
```

Encoding non-UTC datetimes works but might be slow due to timezone conversion:

```elixir
Mix.install([:ch, :tz])

:ok = Calendar.put_time_zone_database(Tz.TimeZoneDatabase)

{:ok, pid} = Ch.start_link()

Ch.query!(pid, "CREATE TABLE ch_datetimes(name String, datetime DateTime) ENGINE Memory")

naive = NaiveDateTime.utc_now()
utc = DateTime.utc_now()
taipei = DateTime.shift_zone!(utc, "Asia/Taipei")

rows = [["naive", naive], ["utc", utc], ["taipei", taipei]]

Ch.query!(pid, "INSERT INTO ch_datetimes(name, datetime) FORMAT RowBinary", rows, types: ["String", "DateTime"])

%Ch.Result{
rows: [
["naive", ~U[2024-12-21 05:24:40Z]],
["utc", ~U[2024-12-21 05:24:40Z]],
["taipei", ~U[2024-12-21 05:24:40Z]]
]
} =
Ch.query!(pid, "SELECT name, CAST(datetime as DateTime('UTC')) FROM ch_datetimes")
```
See guides and tests for examples.

## [Benchmarks](./bench)

19 changes: 19 additions & 0 deletions bench/compress.exs
```elixir
rowbinary = fn count ->
  Enum.map(1..count, fn i ->
    row = [i, "Golang SQL database driver", [1, 2, 3, 4, 5, 6, 7, 8, 9], DateTime.utc_now()]
    Ch.RowBinary.encode_row(row, ["UInt64", "String", "Array(UInt8)", "DateTime"])
  end)
end

Benchee.run(
  %{
    "zstd once" => fn input -> :zstd.compress(input) end,
    "zstd stream" => fn input -> Compress.zstd_stream(input) end,
    "nimble_lz4 once" => fn input -> NimbleLZ4.compress(input) end
  },
  inputs: %{
    "1 rows" => rowbinary.(1),
    "1000 rows" => rowbinary.(1000),
    "100,000 rows" => rowbinary.(100_000)
  }
)
```

**@ruslandoga** (Collaborator, Author) commented on Apr 8, 2026:

> At first I thought about streaming compression each time we `Ch.Buffer.add_row` but it seems to be slower than doing it once in the end. And it complicates the API.
>
> Results:

```
Operating System: macOS
CPU Information: Apple M2
Number of Available Cores: 8
Available memory: 8 GB
Elixir 1.19.5
Erlang 28.3
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: 1 rows, 100,000 rows, 1000 rows
Estimated total run time: 1 min 3 s
Excluding outliers: false

##### With input 1 rows #####
Name                      ips        average  deviation         median         99th %
nimble_lz4 once      409.21 K        2.44 μs   ±323.54%        2.33 μs        3.04 μs
zstd once            378.85 K        2.64 μs   ±625.79%        2.21 μs        7.08 μs
zstd stream           10.25 K       97.54 μs   ±357.19%       83.13 μs      203.34 μs

Comparison:
nimble_lz4 once      409.21 K
zstd once            378.85 K - 1.08x slower +0.196 μs
zstd stream           10.25 K - 39.91x slower +95.09 μs

##### With input 100,000 rows #####
Name                      ips        average  deviation         median         99th %
nimble_lz4 once         76.57       13.06 ms     ±3.87%       13.02 ms       13.38 ms
zstd once               72.66       13.76 ms     ±3.34%       13.72 ms       14.45 ms
zstd stream             14.45       69.20 ms     ±8.87%       65.38 ms       81.65 ms

Comparison:
nimble_lz4 once         76.57
zstd once               72.66 - 1.05x slower +0.70 ms
zstd stream             14.45 - 5.30x slower +56.14 ms

##### With input 1000 rows #####
Name                      ips        average  deviation         median         99th %
nimble_lz4 once        7.84 K      127.53 μs     ±4.95%      126.25 μs      147.81 μs
zstd once              7.46 K      134.09 μs     ±2.88%      133.58 μs      148.21 μs
zstd stream            1.91 K      524.74 μs    ±16.48%      534.88 μs      809.10 μs

Comparison:
nimble_lz4 once        7.84 K
zstd once              7.46 K - 1.05x slower +6.55 μs
zstd stream            1.91 K - 4.11x slower +397.20 μs
```
16 changes: 16 additions & 0 deletions bench/support/compress.ex
```elixir
defmodule Compress do
  def zstd_stream(input) when is_list(input) do
    {:ok, ctx} = :zstd.context(:compress)
    zstd_stream_continue(input, ctx)
  end

  defp zstd_stream_continue([value | rest], ctx) do
    {:continue, c} = :zstd.stream(ctx, value)
    [c | zstd_stream_continue(rest, ctx)]
  end

  defp zstd_stream_continue([], ctx) do
    {:done, c} = :zstd.finish(ctx, [])
    c
  end
end
```