Replicon v1.0.2

Replicon is a high-throughput, lightweight, single-node per datacenter (DC) JSON ingest + replication engine over HTTP/HTTPS.

It accepts JSON payloads via POST /ingress/{channel}/add (a single object or an array of objects), buffers them durably (file queue or Redis/Valkey Streams), writes to local storage in the background (file / PostgreSQL / MySQL), and asynchronously replicates events between DC nodes without cross-DC quorum, so local writes never block on network partitions.

This repository is binary-only and contains deployable runtime assets (binaries, configs, install/uninstall scripts).

At a glance

  • HTTP and HTTPS ingest endpoints
  • Dynamic channels via channels.yml (per-channel API keys + validation, optional schema mapping)
  • Async pipeline: ingest -> durable queue -> writer workers -> storage
  • Queue backends: file, redis (Streams + consumer group + DLQ)
  • Storage backends: file, postgres, mysql, noop
  • Pull-based replication over HTTP or HTTPS (per-peer + per-channel cursors)
  • SQL schema auto-create on startup (tables + optional indexes from channels.yml)

Throughput note:

  • In lab tests with HTTP keep-alive and small JSON payloads, ingest acceptance has reached ~5000-8000 req/s on small 1-2 vCPU nodes. Actual throughput depends on payload size, validation, TLS, queue/storage backend, and writer batch settings.

Repository layout (published release repo)

.
├── README.md
├── INSTALL.md
├── CHANGELOG.md
├── install.sh
├── uninstall.sh
├── SHA256SUMS.txt
├── bin/
│   ├── replicon-linux-amd64
│   ├── replicon-linux-arm64
│   ├── replicon-linux-arm
│   ├── replicon-darwin-amd64
│   └── replicon-darwin-arm64
└── conf/
    ├── replicon.conf
    ├── channels.yml
    └── replicon.service

Quick install

Linux (default backend: PostgreSQL):

curl -fsSL https://raw.githubusercontent.com/jwillberg/replicon/main/install.sh -o install.sh
chmod +x install.sh
./install.sh --repo jwillberg/replicon --version v1.0.2 --provision

MySQL backend:

./install.sh --repo jwillberg/replicon --version v1.0.2 --db-backend mysql --provision

More install options are in INSTALL.md.

Installed paths (Linux installer defaults)

  • binary: /usr/local/bin/replicon
  • config: /etc/replicon/replicon.conf
  • channels: /etc/replicon/channels.yml
  • systemd unit: /etc/systemd/system/replicon.service
  • runtime data dir: /var/lib/replicon

The default systemd unit sets WorkingDirectory=/var/lib/replicon, so relative paths like data/queue/... and data/db/... in replicon.conf resolve under /var/lib/replicon. It also sets StartLimitIntervalSec=60 and StartLimitBurst=5 to avoid tight restart loops on startup errors.

Tested platforms (2026-02-19)

Smoke-tested end-to-end with:

  • install.sh --provision
  • file backend ingest test
  • redis + postgres ingest test
  • redis + mysql ingest test
  • uninstall.sh --purge

Tested operating systems:

  • Ubuntu 24.04
  • Ubuntu 22.04
  • Debian 13 (trixie)
  • Debian 12 (bookworm)
  • Fedora
  • CentOS
  • Rocky Linux 10
  • AlmaLinux 10
  • openSUSE Leap 16.0

Supported by installer package-manager logic:

  • apt-get (Ubuntu, Debian)
  • dnf / yum (Fedora, Rocky, Alma, CentOS)
  • zypper (openSUSE, SLES)

Key goals

  • Always writable per DC: local writes succeed even if inter-DC links are down.
  • Eventual consistency between DCs via pull-based replication.
  • Simple ops: no Kafka, no DB clustering required.
  • Idempotent replication: safe to retry, no duplicates on replication apply.

Architecture (per DC)

  • HTTP ingest API: POST /ingress/{channel}/add
  • Durable queue:
    • file backend (append-only log + ack cursor), or
    • redis backend (Redis/Valkey Streams + consumer group + DLQ)
  • Writer: consumes queue asynchronously and writes to configured storage backend
  • Replicator: pulls missing events from peers and applies locally

Any load balancer can route traffic to any healthy DC node.

Data model (minimal)

Replicon works best with a canonical append-only event log.

Canonical event identity

Each DC produces its own monotonic sequence:

  • origin_dc (string, e.g. dc1) is node_name
  • origin_seq (monotonic integer per origin_dc)

Primary key: (origin_dc, origin_seq)

This avoids global counters and makes replication trivial:

  • "Give me events after origin_seq=N for origin_dc=dc2"

Canonical raw log table (events_raw)

When canonical raw storage is needed (any raw_only channel, or any structured channel with store_raw=true), the SQL backends use a fixed canonical table name: events_raw.

Minimal schema (PostgreSQL):

CREATE TABLE IF NOT EXISTS events_raw (
  origin_dc    TEXT        NOT NULL,
  origin_seq   BIGINT      NOT NULL DEFAULT nextval('replicon_origin_seq'),
  node_name    TEXT        NOT NULL,
  channel      TEXT        NOT NULL,
  received_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  json_data    JSONB       NOT NULL,
  PRIMARY KEY (origin_dc, origin_seq)
);

Minimal schema (MySQL/MariaDB):

CREATE TABLE IF NOT EXISTS events_raw (
  origin_dc    VARCHAR(191) NOT NULL,
  origin_seq   BIGINT       NOT NULL,
  node_name    VARCHAR(191) NOT NULL,
  channel      VARCHAR(191) NOT NULL,
  received_at  DATETIME(6)  NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
  json_data    JSON         NOT NULL,
  PRIMARY KEY (origin_dc, origin_seq)
) ENGINE=InnoDB;

Replication cursor table (replication_progress)

Replication progress is tracked per peer and per channel:

CREATE TABLE IF NOT EXISTS replication_progress (
  peer_dc                  TEXT/VARCHAR NOT NULL,
  channel                  TEXT/VARCHAR NOT NULL,
  last_origin_seq_received BIGINT       NOT NULL DEFAULT 0,
  updated_at               TIMESTAMPTZ/DATETIME NOT NULL,
  PRIMARY KEY (peer_dc, channel)
);

The schema is auto-created on startup when storage_backend=postgres or storage_backend=mysql.

Runtime flow

  1. Client sends JSON object (single) or JSON array of objects (batch) to POST /ingress/{channel}/add
  2. Replicon validates auth + channel rules
  3. Event is accepted to queue (202 Accepted)
  4. Writer workers flush queued events to storage in batches
  5. Replicator pulls missing events from peers and applies locally

HTTP API

Health

  • GET /healthz

Returns node and ingest/writer status.

It includes:

  • node (node_name)
  • gomaxprocs (effective CPU parallelism)
  • ingest stats: queue_backend, queue_depth, storage_backend, writer_workers, counters, last_error

Public ingest

  • POST /ingress/{channel}/add
  • Headers:
    • Authorization: Bearer <channel_api_key>
    • Content-Type: application/json (recommended)

Body:

  • may be:
    • one JSON object
    • one JSON array of JSON objects
  • for array input, item count is limited by max_batch_items
  • max size is enforced by max_body_bytes
  • structured channels enforce per-channel schema/validation rules from channels.yml (per item)

Response codes:

  • 202 accepted to queue (accepted_count included for batch requests)
  • 400 invalid JSON / schema validation error / invalid batch shape
  • 401 auth error (missing/invalid API key)
  • 404 unknown/disabled channel
  • 405 method not allowed
  • 413 payload too large
  • 429 queue is full (only with queue_backend=file when queue_buffer is reached)
  • 503 enqueue failed / timeout

Batch enqueue note:

  • if enqueue fails after some items were already accepted, response is JSON with:
    • status: "partial"
    • accepted_count
    • failed_index
    • error
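
The partial-status fields above suggest a simple client-side resume rule: everything from failed_index onward was not queued and should be resent. The helper below is an illustrative sketch, not part of Replicon; only the response field names come from this README.

```python
# Sketch: resuming a batch after a "partial" enqueue response.
# Field names (status, accepted_count, failed_index) are from the README;
# the helper itself is hypothetical client-side code.

def remaining_items(batch, response):
    """Return the slice of `batch` to retry after a partial enqueue.

    The first `accepted_count` items were durably queued; the item at
    `failed_index` (and everything after it) was not, so resend from there.
    """
    if response.get("status") != "partial":
        return []
    return batch[response["failed_index"]:]

batch = [{"event": "a"}, {"event": "b"}, {"event": "c"}, {"event": "d"}]
resp = {"status": "partial", "accepted_count": 2, "failed_index": 2,
        "error": "enqueue timeout"}
retry = remaining_items(batch, resp)  # the two items that were not queued
```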

Example:

curl -X POST "http://127.0.0.1:8080/ingress/events_raw/add" \
  -H "Authorization: Bearer key-abc123" \
  -H "Content-Type: application/json" \
  -d '{"source":"collector-1","event":"login_fail","ts":"2026-02-19T12:00:00Z"}'

Structured channel example (abuse_reports after enabling it in channels.yml):

curl -X POST "http://127.0.0.1:8080/ingress/abuse_reports/add" \
  -H "Authorization: Bearer key-abuse-001" \
  -H "Content-Type: application/json" \
  -d '{"report_time":"2026-02-19T12:00:00Z","attacker_ip":"1.2.3.4","reporter_ip":"5.6.7.8","service":"ssh","comment":"blocked by firewall"}'

Batch example (same endpoint):

curl -X POST "http://127.0.0.1:8080/ingress/events_raw/add" \
  -H "Authorization: Bearer key-abc123" \
  -H "Content-Type: application/json" \
  -d '[{"source":"collector-1","event":"login_fail","ts":"2026-02-19T12:00:00Z"},{"source":"collector-1","event":"portscan","ts":"2026-02-19T12:00:01Z"}]'

Expected client-visible failures:

# wrong API key -> 401
curl -i -X POST "http://127.0.0.1:8080/ingress/events_raw/add" \
  -H "Authorization: Bearer wrong-key" \
  -H "Content-Type: application/json" \
  -d '{"source":"collector-1"}'

# disabled example channel (default conf) -> 404
curl -i -X POST "http://127.0.0.1:8080/ingress/abuse_reports/add" \
  -H "Authorization: Bearer key-abuse-001" \
  -H "Content-Type: application/json" \
  -d '{"report_time":"2026-02-19T12:00:00Z","attacker_ip":"1.2.3.4","reporter_ip":"5.6.7.8","service":"ssh"}'

# invalid batch item -> 400
curl -i -X POST "http://127.0.0.1:8080/ingress/events_raw/add" \
  -H "Authorization: Bearer key-abc123" \
  -H "Content-Type: application/json" \
  -d '[{"source":"collector-1"}, 123]'

Client implementation checklist:

  • Reuse connections (HTTP keep-alive).
  • Keep batch item count <= max_batch_items when using array payloads.
  • Retry on 429 and 503 with exponential backoff + jitter.
  • Treat 202 as accepted-to-queue (asynchronous write).
  • Treat 4xx as client/config issue (fix payload/key/channel before retrying).

Client connection behavior (important)

  • Reuse HTTP connections (keep-alive on). Do not open/close a new TCP connection for every event.
  • Keep one long-lived HTTP client instance per process and use its connection pool.
  • For HTTPS, connection reuse matters even more because TLS handshakes are expensive.
  • For load tests, ab without -k usually underestimates throughput because of connect/TIME_WAIT overhead.
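
The retry guidance above (429/503 with exponential backoff plus jitter) can be sketched as pure client-side logic. The constants here (base delay, cap, attempt limit) are illustrative defaults, not Replicon settings.

```python
import random

# Sketch of the client retry rule described above: retry only 429/503,
# with full-jitter exponential backoff. All numbers are illustrative.

RETRYABLE = {429, 503}

def backoff_delay(attempt, base=0.2, cap=30.0, rng=random.random):
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))

def should_retry(status, attempt, max_attempts=8):
    # 202 is success (accepted-to-queue); other 2xx and all 4xx except 429
    # are treated as client/config issues and not retried.
    return status in RETRYABLE and attempt < max_attempts
```

Combined with a single long-lived HTTP client per process (keep-alive on), this avoids both retry storms and per-event TCP/TLS handshakes.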

Internal replication endpoint

  • GET /internal/events?channel=<name>&from_seq=<n>&limit=<k>
  • Header: X-Replicon-Token: <replication_token>

Used by Replicon peers only.

Notes:

  • limit is capped at 5000.
  • Only local-origin events are returned (where origin_dc == node_name).

Channel modes

Replicon supports two ingest/write behaviors per channel:

  1. raw_only
  • Payload accepts one JSON object or an array of JSON objects.
  • With SQL storage (Postgres/MySQL): payload is stored in canonical raw log (events_raw.json_data).
  • With file storage: payload is stored as-is in JSONL file (no SQL schema mapping).
  2. structured
  • Payload is validated against an allowlist of columns and types.
  • With SQL storage (Postgres/MySQL): payload is mapped into a structured table (created at startup if missing).
  • store_raw=true: structured row + canonical raw copy in events_raw
  • store_raw=false: structured-only write (no events_raw copy)
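
Structured-mode validation (column allowlist, required fields, and the on_unknown_field policy) can be sketched as follows. The function and error strings are hypothetical; only the rules themselves come from this README.

```python
# Illustrative sketch of structured-mode validation: allowlisted columns,
# required fields, and on_unknown_field = reject|drop. Not Replicon's code.

def validate_item(item, columns, required, on_unknown_field="reject"):
    """Return (cleaned_item, error); error is None when the item passes."""
    unknown = [k for k in item if k not in columns]
    if unknown and on_unknown_field == "reject":
        return None, f"unknown field: {unknown[0]}"
    missing = [c for c in required if c not in item]
    if missing:
        return None, f"missing required field: {missing[0]}"
    # drop mode: silently discard unknown keys, keep allowlisted ones
    cleaned = {k: v for k, v in item.items() if k in columns}
    return cleaned, None

cols = ["report_time", "attacker_ip", "service", "comment"]
req = ["report_time", "attacker_ip", "service"]
```

For array payloads this check runs per item, so a single bad item fails the whole request with 400 (see the batch example below the response-code list).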

channels.yml (dynamic channels)

Replicon channels are defined in channels.yml.

  • Top-level general: shared defaults (optional)
  • channels.<name>: per-channel overrides

Default template contains:

  • events_raw: baseline raw_only channel (enabled)
  • abuse_reports: structured example (disabled by default)

Recommended model

  • Put shared defaults in top-level general.
  • Add only channel-specific differences under each channel key.
  • Keep optional examples disabled with enabled: false.

channels.yml schema reference

Top-level:

  • channels is required (must contain at least one channel key)
  • general is optional (shared defaults for all channels)

general keys (all optional):

  • enabled (true|false, default true)
  • api_keys (list of bearer keys)
  • mode (raw_only|structured, default raw_only)
  • on_unknown_field (reject|drop, default reject, structured mode only)
  • store_raw (true|false, default true, structured mode only)

Per-channel keys:

  • enabled (true|false, default from general.enabled, otherwise true)
    • enables is accepted as a typo-friendly alias for enabled
  • api_keys (required for enabled channel unless inherited from general.api_keys)
  • mode (raw_only|structured, default from general.mode, otherwise raw_only)
  • on_unknown_field (reject|drop, default from general.on_unknown_field, otherwise reject, structured mode only)
  • store_raw (true|false, default from general.store_raw, otherwise true, structured mode only)
  • table (required only when mode=structured)
  • columns (required only when mode=structured)
  • required (optional; must be subset of columns)
  • field_types (optional map; default type is text for unspecified columns)
  • indexes (optional list; structured mode only)
  • indexes[].name (optional; auto-generated if omitted)
  • indexes[].unique (optional; default false)
  • indexes[].columns (required; list of column or column DESC)

Example channels.yml

general:
  enabled: true
  api_keys: [key-abc123]
  on_unknown_field: reject
  store_raw: true

channels:
  # baseline raw channel name (you can rename/remove this)
  events_raw:
    mode: raw_only

  # structured example (disabled by default)
  abuse_reports:
    enabled: false
    api_keys: [key-abuse-001]
    mode: structured
    store_raw: false
    table: abuse_reports
    columns: [report_time, attacker_ip, reporter_ip, service, comment]
    required: [report_time, attacker_ip, reporter_ip, service]
    field_types:
      report_time: timestamptz
      attacker_ip: inet
      reporter_ip: inet
      service: text
      comment: text

Structured field_types supported

  • text (varchar, character varying, char, character aliases)
  • timestamptz (timestamp with time zone)
  • timestamp (timestamp without time zone)
  • date
  • time (time without time zone)
  • timetz (time with time zone)
  • inet, cidr, macaddr, macaddr8
  • smallint, integer, bigint
  • boolean
  • real, double precision
  • numeric (decimal alias)
  • uuid
  • bytea
  • jsonb (json alias)
  • one-dimensional arrays with [] suffix (for supported base types)

Also accepted as aliases with modifiers:

  • varchar(n), character varying(n), char(n), character(n) -> text
  • numeric(p,s), decimal(p,s) -> numeric
  • timestamp(n) -> timestamp
  • timestamp(n) with time zone -> timestamptz
  • time(n) -> time
  • time(n) with time zone -> timetz

Also accepted as common SQL synonyms:

  • int2 -> smallint
  • int / int4 -> integer
  • int8 -> bigint
  • bool -> boolean
  • float4 -> real
  • float8 / float -> double precision
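
The alias and synonym tables above amount to a small normalization step: strip length/precision modifiers, fold the varchar family into text, and map common SQL synonyms to canonical names. This sketch mirrors the tables; the function itself is illustrative, not Replicon's implementation.

```python
import re

# Sketch of field_types normalization per the alias/synonym tables above.

SYNONYMS = {
    "int2": "smallint", "int": "integer", "int4": "integer", "int8": "bigint",
    "bool": "boolean", "float4": "real", "float8": "double precision",
    "float": "double precision", "decimal": "numeric", "json": "jsonb",
}

def normalize_type(t):
    """Strip (n)/(p,s) modifiers, fold varchar-family to text, apply synonyms."""
    t = t.strip().lower()
    t = re.sub(r"\(\s*\d+(\s*,\s*\d+)?\s*\)", "", t)  # drop (n) / (p,s)
    t = re.sub(r"\s+", " ", t).strip()
    if t in ("varchar", "character varying", "char", "character"):
        return "text"
    if t == "timestamp with time zone":
        return "timestamptz"
    if t == "time with time zone":
        return "timetz"
    return SYNONYMS.get(t, t)
```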

Dialect notes:

  • PostgreSQL uses native types
  • MySQL stores network/timezone-specific types as validated strings/UTC timestamps
  • MySQL stores jsonb and arrays as JSON-compatible storage

Index support (structured channels only)

Structured channel indexes are optional and created at startup.

Notes:

  • Index names must be unique across all enabled channels.
  • Index columns may include:
    • structured payload columns defined in columns
    • canonical structured columns: origin_dc, origin_seq, node_name, channel, received_at
  • DESC ordering is accepted (column DESC), but exact behavior depends on your MySQL/MariaDB version.

SQL schema management (auto)

When storage_backend=postgres or storage_backend=mysql, Replicon auto-creates:

  • per-DC origin sequence (replicon_origin_seq)
  • canonical raw table events_raw + indexes (only if needed by enabled channels)
  • structured channel tables from channels.yml (and optional indexes)
  • replication cursor table replication_progress (per peer + per channel)

replicon.conf essentials

Important keys:

  • Identity/runtime:
    • node_name
    • channels_file
    • cpu (-1, *, or auto = runtime default/all cores; N = fixed thread count)
    • max_body_bytes
    • max_batch_items (max JSON objects when ingest body is an array)
  • Security:
    • client_api_key (fallback if channels file missing)
    • replication_token
  • Listeners:
    • http_listen
    • https_enabled, https_listen, tls_cert_file, tls_key_file
  • Queue:
    • queue_backend=file|redis
    • Redis Streams settings (queue_redis_*)
  • Writer:
    • writer_enabled
    • writer_workers
    • writer_batch_size
    • writer_batch_flush_ms
  • Storage:
    • storage_backend=file|postgres|mysql|noop
    • storage_postgres_dsn / storage_mysql_dsn
  • Replication:
    • replication_enabled
    • replication_peer (repeatable or comma-separated on one line)
    • replication_poll_interval_ms
    • replication_batch_limit

Replication model

  • Pull-based (no cross-DC quorum)
  • Per-channel cursors tracked in replication_progress
  • Idempotent apply on canonical keys (origin_dc, origin_seq)
  • Works over HTTP or HTTPS depending on peer URL (http:// / https://)
  • Requires storage_backend=postgres or storage_backend=mysql (replication reads/writes via SQL)

Timeouts and recovery:

  • replication_http_timeout_ms is per request to a peer.
  • If a peer is down for minutes/hours, Replicon keeps polling and will catch up when connectivity returns.
  • Replication follows the enabled channel set from channels.yml (one cursor per peer + channel).

Example peers:

replication_enabled = true
replication_peer = dc2=http://10.0.1.2:8080
replication_peer = dc3=https://dc3.example.com

Replication loop (conceptually):

  1. read last seq for (peer, channel) from replication_progress
  2. call peer /internal/events?channel=<name>&from_seq=last+1&limit=batch_limit
  3. insert idempotently into local SQL storage
  4. update cursor to newest seq applied
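
The four steps above can be sketched with fetch and apply injected, so the control flow is visible. In Replicon the fetch is GET /internal/events and the apply is an idempotent SQL insert keyed on (origin_dc, origin_seq); the function names here are hypothetical.

```python
# Conceptual sketch of the per-(peer, channel) pull loop described above.
# fetch/apply are injected stand-ins for the HTTP call and SQL insert.

def replicate_channel(peer, channel, last_seq, fetch, apply, batch_limit=1000):
    """Advance one (peer, channel) cursor as far as the peer allows.

    Returns the newest origin_seq applied; the caller persists it to
    replication_progress. apply() must be idempotent on the primary key,
    so retries after a crash cannot create duplicates.
    """
    while True:
        events = fetch(peer, channel, from_seq=last_seq + 1, limit=batch_limit)
        if not events:
            return last_seq                  # caught up
        for ev in events:
            apply(ev)                        # idempotent on (origin_dc, origin_seq)
        last_seq = events[-1]["origin_seq"]  # newest seq applied
```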

Structured-only replication:

  • For structured channels with store_raw=false, Replicon replicates events by reading/writing the structured table directly (with the same (origin_dc, origin_seq) primary key).

Backend and queue recommendations

  • Small/simple node:
    • queue_backend=file, storage_backend=file
  • Production SQL ingest:
    • queue_backend=redis, storage_backend=postgres or mysql
  • Benchmark API overhead only:
    • storage_backend=noop

Native HTTPS

Enable in conf/replicon.conf (or /etc/replicon/replicon.conf):

https_enabled = true
https_listen = 0.0.0.0:8443
tls_cert_file = /etc/replicon/tls/server.crt
tls_key_file = /etc/replicon/tls/server.key

With HTTPS enabled, Replicon can serve both:

  • HTTP on http_listen (if set)
  • HTTPS on https_listen

To run HTTPS-only, leave http_listen empty and keep https_enabled=true.

CPU concurrency

Configure cpu in replicon.conf:

# Use runtime default (recommended)
cpu = -1

# Same as above:
# cpu = *
# cpu = auto

# Pin to 2 OS threads
# cpu = 2

Behavior:

  • cpu = -1, *, or auto: keep Go runtime default GOMAXPROCS
  • cpu = N (N > 0): force GOMAXPROCS=N

Check effective value:

curl -s http://127.0.0.1:8080/healthz

Queue + writer backends

Writer batching controls:

  • writer_batch_size: max events per storage write call.
  • writer_batch_flush_ms: max wait for filling a batch before flushing a smaller batch.

Start points for SQL backends:

  • writer_batch_size=200
  • writer_batch_flush_ms=20
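
The two controls interact as "flush on size or on age, whichever comes first". This is an illustrative sketch of that rule, not Replicon's writer; timestamps are passed in explicitly so the behavior is easy to follow.

```python
# Sketch of the writer batching rule: flush when the batch reaches
# writer_batch_size, or when writer_batch_flush_ms has elapsed since the
# first buffered event. Illustrative only.

class Batcher:
    def __init__(self, batch_size=200, flush_ms=20):
        self.batch_size = batch_size
        self.flush_ms = flush_ms
        self.buf = []
        self.first_at = None  # ms timestamp of the oldest buffered event

    def add(self, event, now_ms):
        """Buffer one event; return a full batch to write, or None."""
        if not self.buf:
            self.first_at = now_ms
        self.buf.append(event)
        if len(self.buf) >= self.batch_size:
            return self._flush()
        return None

    def poll(self, now_ms):
        """Timer tick: flush a partial batch once flush_ms has elapsed."""
        if self.buf and now_ms - self.first_at >= self.flush_ms:
            return self._flush()
        return None

    def _flush(self):
        batch, self.buf, self.first_at = self.buf, [], None
        return batch
```

Larger batch_size amortizes per-statement SQL overhead; smaller flush_ms bounds the extra latency a lone event waits before reaching storage.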

File queue (no Redis required)

queue_backend = file
queue_file_path = data/queue/ingest.log
queue_file_ack_path = data/queue/ingest.ack
queue_file_sync_ms = 200
queue_buffer = 10000

writer_enabled = true
writer_workers = 1
writer_batch_size = 200
writer_batch_flush_ms = 20

storage_backend = file
storage_file_path = data/db/events_raw.jsonl
storage_file_sync_ms = 200

Notes:

  • File queue is durable with periodic fsync (queue_file_sync_ms).
  • Backpressure is enforced via queue_buffer:
    • if backlog reaches queue_buffer, ingest returns 429.

Redis/Valkey queue (Streams)

queue_backend = redis

# Option A: TCP
# queue_redis_network = tcp
# queue_redis_addr = 127.0.0.1:6379

# Option B: Unix socket
# queue_redis_network = unix
# queue_redis_addr = /var/run/redis/redis.sock

queue_redis_network = tcp
queue_redis_addr = 127.0.0.1:6379
queue_redis_stream = replicon:ingest
queue_redis_group = writer
queue_redis_consumer = replicon
queue_redis_dlq_stream = replicon:ingest:dlq
queue_redis_maxlen = 5000000
queue_redis_max_retries = 5
queue_redis_timeout_ms = 1500

Notes:

  • Redis backend uses XADD + XREADGROUP + XACK (consumer group).
  • Failed events are re-queued with incremented retry counter up to queue_redis_max_retries, then moved to queue_redis_dlq_stream.
  • If you set queue_redis_maxlen, Redis will trim the stream approximately: if writers cannot keep up, old queue entries may be dropped (data loss).
  • For durability, enable AOF in Redis/Valkey (appendonly yes, appendfsync everysec).

Storage backends

storage_backend=file

  • Writes append-only JSONL to storage_file_path.
  • Intended for simple setups and debugging, not for fast querying.
  • Structured table projection is not available (structured mode still validates payloads).

storage_backend=postgres

storage_backend = postgres
storage_postgres_dsn = postgres://replicon:replicon-secret@127.0.0.1:5432/replicon?sslmode=disable
storage_postgres_connect_timeout_ms = 3000
storage_postgres_max_open_conns = 16
storage_postgres_max_idle_conns = 16
storage_postgres_conn_max_lifetime_min = 30

storage_backend=mysql

storage_backend = mysql
storage_mysql_dsn = replicon:replicon-secret@tcp(127.0.0.1:3306)/replicon?parseTime=true
storage_mysql_connect_timeout_ms = 3000
storage_mysql_max_open_conns = 16
storage_mysql_max_idle_conns = 16
storage_mysql_conn_max_lifetime_min = 30

storage_backend=noop

  • Discards writes (benchmark/testing).

Client best practices

  • Reuse connections (HTTP keep-alive)
  • Prefer persistent client pools (do not reconnect every event)
  • Retry 429/503 with exponential backoff + jitter
  • Treat 202 as accepted-to-queue (async write)

Verification commands

Service + health:

systemctl status replicon --no-pager
curl -s http://127.0.0.1:8080/healthz

Quick ingest test:

curl -X POST "http://127.0.0.1:8080/ingress/events_raw/add" \
  -H "Authorization: Bearer key-abc123" \
  -H "Content-Type: application/json" \
  -d '{"source":"smoke","ts":"2026-02-19T12:00:00Z"}'

SQL verification (optional):

PostgreSQL:

psql "postgres://replicon:replicon-secret@127.0.0.1:5432/replicon?sslmode=disable" -c "SELECT COUNT(*) FROM events_raw;"

MySQL/MariaDB:

mysql -u replicon -p -h 127.0.0.1 -P 3306 replicon -e "SELECT COUNT(*) FROM events_raw;"

Notes for Alma/RHEL-family

Installer v1.0.2 supports Redis-compatible package fallback on newer RHEL-family systems (for example valkey where redis package is not available).

Uninstall

./uninstall.sh
./uninstall.sh --purge

More docs

  • INSTALL.md for install/provision options
  • CHANGELOG.md for release history

About

Replicon is a high-throughput, multi-DC event ingest and replication service with dynamic channel schemas, PostgreSQL / MySQL storage, and optional Redis/file queue backends.
