Replicon is a high-throughput, lightweight, single-node per datacenter (DC) JSON ingest + replication engine over HTTP/HTTPS.
It accepts JSON payloads via `POST /ingress/{channel}/add` (single object or array of objects), buffers them durably (file queue or Redis/Valkey Streams),
writes to local storage in background (file / PostgreSQL / MySQL), and asynchronously replicates events between DC nodes
without cross-DC quorum (no write blocking on network partitions).
This repository is binary-only and contains deployable runtime assets (binaries, configs, install/uninstall scripts).
- HTTP and HTTPS ingest endpoints
- Dynamic channels via `channels.yml` (per-channel API keys + validation, optional schema mapping)
- Async pipeline: ingest -> durable queue -> writer workers -> storage
- Queue backends: `file`, `redis` (Streams + consumer group + DLQ)
- Storage backends: `file`, `postgres`, `mysql`, `noop`
- Pull-based replication over HTTP or HTTPS (per-peer + per-channel cursors)
- SQL schema auto-create on startup (tables + optional indexes from `channels.yml`)
Throughput note:
- In lab tests with HTTP keep-alive and small JSON payloads, ingest acceptance has reached ~5000-8000 req/s on small 1-2 vCPU nodes. Actual throughput depends on payload size, validation, TLS, queue/storage backend, and writer batch settings.
```
.
├── README.md
├── INSTALL.md
├── CHANGELOG.md
├── install.sh
├── uninstall.sh
├── SHA256SUMS.txt
├── bin/
│   ├── replicon-linux-amd64
│   ├── replicon-linux-arm64
│   ├── replicon-linux-arm
│   ├── replicon-darwin-amd64
│   └── replicon-darwin-arm64
└── conf/
    ├── replicon.conf
    ├── channels.yml
    └── replicon.service
```
Linux (default backend: PostgreSQL):

```shell
curl -fsSL https://raw.githubusercontent.com/jwillberg/replicon/main/install.sh -o install.sh
chmod +x install.sh
./install.sh --repo jwillberg/replicon --version v1.0.2 --provision
```

MySQL backend:

```shell
./install.sh --repo jwillberg/replicon --version v1.0.2 --db-backend mysql --provision
```

More install options are in INSTALL.md.
- binary: `/usr/local/bin/replicon`
- config: `/etc/replicon/replicon.conf`
- channels: `/etc/replicon/channels.yml`
- systemd unit: `/etc/systemd/system/replicon.service`
- runtime data dir: `/var/lib/replicon`
The default systemd unit sets `WorkingDirectory=/var/lib/replicon`, so relative paths like `data/queue/...`
and `data/db/...` in replicon.conf resolve under `/var/lib/replicon`.
It also sets `StartLimitIntervalSec=60` and `StartLimitBurst=5` to avoid tight restart loops on startup errors.
Smoke-tested end-to-end with:

- `install.sh --provision`
- file backend ingest test
- redis + postgres ingest test
- redis + mysql ingest test
- `uninstall.sh --purge`
Tested operating systems:
- Ubuntu 24.04
- Ubuntu 22.04
- Debian 13 (trixie)
- Debian 12 (bookworm)
- Fedora
- CentOS
- Rocky Linux 10
- AlmaLinux 10
- openSUSE Leap 16.0
Supported by installer package-manager logic:

- `apt-get` (Ubuntu, Debian)
- `dnf`/`yum` (Fedora, Rocky, Alma, CentOS)
- `zypper` (openSUSE, SLES)
- Always writable per DC: local writes succeed even if inter-DC links are down.
- Eventual consistency between DCs via pull-based replication.
- Simple ops: no Kafka, no DB clustering required.
- Idempotent replication: safe to retry, no duplicates on replication apply.
- HTTP ingest API: `POST /ingress/{channel}/add`
- Durable queue: `file` backend (append-only log + ack cursor), or `redis` backend (Redis/Valkey Streams + consumer group + DLQ)
- Writer: consumes queue asynchronously and writes to configured storage backend
- Replicator: pulls missing events from peers and applies locally
Any load balancer can route traffic to any healthy DC node.
Replicon works best with a canonical append-only event log.
Each DC produces its own monotonic sequence:
- `origin_dc` (string, e.g. `dc1`) is `node_name`
- `origin_seq` (monotonic integer per `origin_dc`)

Primary key: `(origin_dc, origin_seq)`
This avoids global counters and makes replication trivial:
- "Give me events after `origin_seq=N` for `origin_dc=dc2`"
When canonical raw storage is needed (any `raw_only` channel, or any structured channel with `store_raw=true`),
the SQL backends use a fixed canonical table name: `events_raw`.
Minimal schema (PostgreSQL):

```sql
CREATE TABLE IF NOT EXISTS events_raw (
    origin_dc   TEXT NOT NULL,
    origin_seq  BIGINT NOT NULL DEFAULT nextval('replicon_origin_seq'),
    node_name   TEXT NOT NULL,
    channel     TEXT NOT NULL,
    received_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    json_data   JSONB NOT NULL,
    PRIMARY KEY (origin_dc, origin_seq)
);
```

Minimal schema (MySQL/MariaDB):
```sql
CREATE TABLE IF NOT EXISTS events_raw (
    origin_dc   VARCHAR(191) NOT NULL,
    origin_seq  BIGINT NOT NULL,
    node_name   VARCHAR(191) NOT NULL,
    channel     VARCHAR(191) NOT NULL,
    received_at DATETIME(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
    json_data   JSON NOT NULL,
    PRIMARY KEY (origin_dc, origin_seq)
) ENGINE=InnoDB;
```

Replication progress is tracked per peer and per channel:
```sql
CREATE TABLE IF NOT EXISTS replication_progress (
    peer_dc                  TEXT/VARCHAR NOT NULL,
    channel                  TEXT/VARCHAR NOT NULL,
    last_origin_seq_received BIGINT NOT NULL DEFAULT 0,
    updated_at               TIMESTAMPTZ/DATETIME NOT NULL,
    PRIMARY KEY (peer_dc, channel)
);
```

The schema is auto-created on startup when `storage_backend=postgres` or `storage_backend=mysql`.
- Client sends JSON object (single) or JSON array of objects (batch) to `POST /ingress/{channel}/add`
- Replicon validates auth + channel rules
- Event is accepted to queue (`202 Accepted`)
- Writer workers flush queued events to storage in batches
- Replicator pulls missing events from peers and applies locally
GET /healthz
Returns node and ingest/writer status.
It includes:
- `node` (node_name)
- `gomaxprocs` (effective CPU parallelism)
- ingest stats: `queue_backend`, `queue_depth`, `storage_backend`, `writer_workers`, counters, `last_error`
`POST /ingress/{channel}/add`

Headers:

- `Authorization: Bearer <channel_api_key>`
- `Content-Type: application/json` (recommended)
Body:
- may be:
- one JSON object
- one JSON array of JSON objects
- for array input, item count is limited by `max_batch_items`
- max size is enforced by `max_body_bytes`
- structured channels enforce per-channel schema/validation rules from `channels.yml` (per item)
Response codes:

- `202` accepted to queue (`accepted_count` included for batch requests)
- `400` invalid JSON / schema validation error / invalid batch shape
- `401` auth error (missing/invalid API key)
- `404` unknown/disabled channel
- `405` method not allowed
- `413` payload too large
- `429` queue is full (only with `queue_backend=file` when `queue_buffer` is reached)
- `503` enqueue failed / timeout
Batch enqueue note:
- if enqueue fails after some items were already accepted, the response is JSON with:
  - `status: "partial"`
  - `accepted_count`
  - `failed_index`
  - `error`
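An illustrative response body for that case might look like this (the values are hypothetical; only the field names come from the list above):

```json
{
  "status": "partial",
  "accepted_count": 7,
  "failed_index": 7,
  "error": "enqueue timeout"
}
```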
Example:

```shell
curl -X POST "http://127.0.0.1:8080/ingress/events_raw/add" \
  -H "Authorization: Bearer key-abc123" \
  -H "Content-Type: application/json" \
  -d '{"source":"collector-1","event":"login_fail","ts":"2026-02-19T12:00:00Z"}'
```

Structured channel example (`abuse_reports` after enabling it in channels.yml):
```shell
curl -X POST "http://127.0.0.1:8080/ingress/abuse_reports/add" \
  -H "Authorization: Bearer key-abuse-001" \
  -H "Content-Type: application/json" \
  -d '{"report_time":"2026-02-19T12:00:00Z","attacker_ip":"1.2.3.4","reporter_ip":"5.6.7.8","service":"ssh","comment":"blocked by firewall"}'
```

Batch example (same endpoint):
```shell
curl -X POST "http://127.0.0.1:8080/ingress/events_raw/add" \
  -H "Authorization: Bearer key-abc123" \
  -H "Content-Type: application/json" \
  -d '[{"source":"collector-1","event":"login_fail","ts":"2026-02-19T12:00:00Z"},{"source":"collector-1","event":"portscan","ts":"2026-02-19T12:00:01Z"}]'
```

Expected client-visible failures:
```shell
# wrong API key -> 401
curl -i -X POST "http://127.0.0.1:8080/ingress/events_raw/add" \
  -H "Authorization: Bearer wrong-key" \
  -H "Content-Type: application/json" \
  -d '{"source":"collector-1"}'

# disabled example channel (default conf) -> 404
curl -i -X POST "http://127.0.0.1:8080/ingress/abuse_reports/add" \
  -H "Authorization: Bearer key-abuse-001" \
  -H "Content-Type: application/json" \
  -d '{"report_time":"2026-02-19T12:00:00Z","attacker_ip":"1.2.3.4","reporter_ip":"5.6.7.8","service":"ssh"}'

# invalid batch item -> 400
curl -i -X POST "http://127.0.0.1:8080/ingress/events_raw/add" \
  -H "Authorization: Bearer key-abc123" \
  -H "Content-Type: application/json" \
  -d '[{"source":"collector-1"}, 123]'
```

Client implementation checklist:
- Reuse connections (HTTP keep-alive).
- Keep batch item count <= `max_batch_items` when using array payloads.
- Retry on `429` and `503` with exponential backoff + jitter.
- Treat `202` as accepted-to-queue (asynchronous write).
- Treat `4xx` as a client/config issue (fix payload/key/channel before retrying).
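A minimal bash sketch of that retry policy (the endpoint, key, retry limit, and backoff cap are assumptions; adapt them to your deployment):

```shell
#!/usr/bin/env bash
# Hypothetical ingest client with retry on 429/503 and network errors.
URL="http://127.0.0.1:8080/ingress/events_raw/add"
KEY="key-abc123"

backoff_delay() {                      # exponential backoff + jitter, capped at 30s
  local attempt=$1 base=1 cap=30
  local d=$(( base * (1 << attempt) ))
  (( d > cap )) && d=$cap
  echo $(( RANDOM % d + 1 ))           # 1..d seconds
}

send_event() {                         # retries 429/503/network; 4xx means fix the request
  local payload=$1 attempt code
  for attempt in 0 1 2 3 4; do
    code=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$URL" \
      -H "Authorization: Bearer $KEY" \
      -H "Content-Type: application/json" \
      -d "$payload") || code="000"     # 000 = connection-level failure
    case "$code" in
      202) return 0 ;;                 # accepted to queue (asynchronous write)
      429|503|000) sleep "$(backoff_delay "$attempt")" ;;
      *) echo "client error $code, not retrying" >&2; return 1 ;;
    esac
  done
  return 1
}

# Usage: send_event '{"source":"collector-1","event":"login_fail"}'
```

For real traffic you would keep one long-lived process around these helpers so curl's connection reuse (or a pooled HTTP client) applies.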
- Reuse HTTP connections (keep-alive on). Do not open/close a new TCP connection for every event.
- Keep one long-lived HTTP client instance per process and use its connection pool.
- For HTTPS, connection reuse matters even more because TLS handshakes are expensive.
- For load tests, `ab` without `-k` usually underestimates throughput because of connect/TIME_WAIT overhead.
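For example, a keep-alive benchmark with ApacheBench might look like this (request count and concurrency are arbitrary; assumes the default channel and API key from channels.yml, and that a node is listening locally):

```shell
# Write a small test payload, then POST it repeatedly with keep-alive (-k).
printf '%s' '{"source":"bench","event":"ping"}' > payload.json
ab -k -n 100000 -c 50 -p payload.json -T application/json \
  -H "Authorization: Bearer key-abc123" \
  "http://127.0.0.1:8080/ingress/events_raw/add" || true  # ignore failure if no server is up
```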
`GET /internal/events?channel=<name>&from_seq=<n>&limit=<k>`

- Header: `X-Replicon-Token: <replication_token>`
Used by Replicon peers only.
Notes:

- `limit` is capped to `5000`.
- Only local-origin events are returned (where `origin_dc == node_name`).
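As a sketch, a peer's pull request looks like this (the peer address, token, and cursor value are placeholders; `from_seq` is the last applied seq plus one):

```shell
PEER="http://10.0.1.2:8080"             # hypothetical peer node
TOKEN="your-replication-token"          # replication_token from replicon.conf
URL="${PEER}/internal/events?channel=events_raw&from_seq=43&limit=1000"
curl -s --connect-timeout 2 -H "X-Replicon-Token: ${TOKEN}" "$URL" || true  # fails unless the peer is reachable
```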
Replicon supports two ingest/write behaviors per channel:
raw_only
- Payload accepts one JSON object or an array of JSON objects.
- With SQL storage (Postgres/MySQL): payload is stored in the canonical raw log (`events_raw.json_data`).
- With file storage: payload is stored as-is in a JSONL file (no SQL schema mapping).
structured
- Payload is validated against an allowlist of columns and types.
- With SQL storage (Postgres/MySQL): payload is mapped into a structured table (created at startup if missing).
- `store_raw=true`: structured row + canonical raw copy in `events_raw`
- `store_raw=false`: structured-only write (no `events_raw` copy)
Replicon channels are defined in channels.yml.
- Top-level `general`: shared defaults (optional)
- `channels.<name>`: per-channel overrides
Default template contains:
- `events_raw`: baseline `raw_only` channel (enabled)
- `abuse_reports`: structured example (disabled by default)
- Put shared defaults in top-level `general`.
- Add only channel-specific differences under each channel key.
- Keep optional examples disabled with `enabled: false`.
Top-level:
- `channels` is required (must contain at least one channel key)
- `general` is optional (shared defaults for all channels)
general keys (all optional):
- `enabled` (true|false, default `true`)
- `api_keys` (list of bearer keys)
- `mode` (raw_only|structured, default `raw_only`)
- `on_unknown_field` (reject|drop, default `reject`, structured mode only)
- `store_raw` (true|false, default `true`, structured mode only)
Per-channel keys:
- `enabled` (true|false, default from `general.enabled`, otherwise `true`)
  - `enables` is accepted as a typo-friendly alias for `enabled`
- `api_keys` (required for an enabled channel unless inherited from `general.api_keys`)
- `mode` (raw_only|structured, default from `general.mode`, otherwise `raw_only`)
- `on_unknown_field` (reject|drop, default from `general.on_unknown_field`, otherwise `reject`, structured mode only)
- `store_raw` (true|false, default from `general.store_raw`, otherwise `true`, structured mode only)
- `table` (required only when `mode=structured`)
- `columns` (required only when `mode=structured`)
- `required` (optional; must be a subset of `columns`)
- `field_types` (optional map; default type is `text` for unspecified columns)
- `indexes` (optional list; structured mode only)
  - `indexes[].name` (optional; auto-generated if omitted)
  - `indexes[].unique` (optional; default `false`)
  - `indexes[].columns` (required; list of `column` or `column DESC`)
```yaml
general:
  enabled: true
  api_keys: [key-abc123]
  on_unknown_field: reject
  store_raw: true

channels:
  # baseline raw channel name (you can rename/remove this)
  events_raw:
    mode: raw_only

  # structured example (disabled by default)
  abuse_reports:
    enabled: false
    api_keys: [key-abuse-001]
    mode: structured
    store_raw: false
    table: abuse_reports
    columns: [report_time, attacker_ip, reporter_ip, service, comment]
    required: [report_time, attacker_ip, reporter_ip, service]
    field_types:
      report_time: timestamptz
      attacker_ip: inet
      reporter_ip: inet
      service: text
      comment: text
```

Supported field types:

- `text` (`varchar`, `character varying`, `char`, `character` aliases)
- `timestamptz` (timestamp with time zone)
- `timestamp` (timestamp without time zone)
- `time` (time without time zone)
- `timetz` (time with time zone)
- `inet`, `cidr`, `macaddr`, `macaddr8`
- `smallint`, `integer`, `bigint`
- `boolean`
- `real`, `double precision`
- `numeric` (`decimal` alias)
- `uuid`
- `bytea`
- `jsonb` (`json` alias)
- one-dimensional arrays with `[]` suffix (for supported base types)
Also accepted as aliases with modifiers:
- `varchar(n)`, `character varying(n)`, `char(n)`, `character(n)` -> `text`
- `numeric(p,s)`, `decimal(p,s)` -> `numeric`
- `timestamp(n)` -> `timestamp`
- `timestamp(n) with time zone` -> `timestamptz`
- `time(n)` -> `time`
- `time(n) with time zone` -> `timetz`
Also accepted as common SQL synonyms:
- `int2` -> `smallint`
- `int`/`int4` -> `integer`
- `int8` -> `bigint`
- `bool` -> `boolean`
- `float4` -> `real`
- `float8`/`float` -> `double precision`
Dialect notes:
- PostgreSQL uses native types
- MySQL stores network/timezone-specific types as validated strings/UTC timestamps
- MySQL stores `jsonb` and arrays as JSON-compatible storage
Structured channel indexes are optional and created at startup.
Notes:
- Index names must be unique across all enabled channels.
- Index columns may include:
  - structured payload columns defined in `columns`
  - canonical structured columns: `origin_dc`, `origin_seq`, `node_name`, `channel`, `received_at`
- `DESC` ordering is accepted (`column DESC`), but exact behavior depends on your MySQL/MariaDB version.
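A hypothetical `indexes` block, nested under a structured channel in channels.yml, might look like this (the index names are made up; only the keys listed above are meaningful):

```yaml
    indexes:
      - name: idx_abuse_time           # optional; auto-generated if omitted
        columns: [report_time DESC]
      - unique: false
        columns: [attacker_ip, received_at]
```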
When `storage_backend=postgres` or `storage_backend=mysql`, Replicon auto-creates:

- per-DC origin sequence (`replicon_origin_seq`)
- canonical raw table `events_raw` + indexes (only if needed by enabled channels)
- structured channel tables from `channels.yml` (and optional `indexes`)
- replication cursor table `replication_progress` (per peer + per channel)
Important keys:
- Identity/runtime:
  - `node_name`
  - `channels_file`
  - `cpu` (`-1`, `*`, or `auto` = runtime default/all cores; `N` = fixed)
  - `max_body_bytes`
  - `max_batch_items` (max JSON objects when ingest body is an array)
- Security:
  - `client_api_key` (fallback if channels file missing)
  - `replication_token`
- Listeners:
  - `http_listen`
  - `https_enabled`, `https_listen`, `tls_cert_file`, `tls_key_file`
- Queue:
  - `queue_backend=file|redis`
  - Redis Streams settings (`queue_redis_*`)
- Writer:
  - `writer_enabled`, `writer_workers`, `writer_batch_size`, `writer_batch_flush_ms`
- Storage:
  - `storage_backend=file|postgres|mysql|noop`
  - `storage_postgres_dsn` / `storage_mysql_dsn`
- Replication:
  - `replication_enabled`
  - `replication_peer` (repeatable or comma-separated on one line)
  - `replication_poll_interval_ms`
  - `replication_batch_limit`
- Pull-based (no cross-DC quorum)
- Per-channel cursors tracked in `replication_progress`
- Idempotent apply on canonical keys (`origin_dc`, `origin_seq`)
- Works over HTTP or HTTPS depending on peer URL (`http://` / `https://`)
- Requires `storage_backend=postgres` or `storage_backend=mysql` (replication reads/writes via SQL)
Timeouts and recovery:
- `replication_http_timeout_ms` is per request to a peer.
- If a peer is down for minutes/hours, Replicon keeps polling and will catch up when connectivity returns.
- Replication follows the enabled channel set from `channels.yml` (one cursor per peer + channel).
Example peers:

```
replication_enabled = true
replication_peer = dc2=http://10.0.1.2:8080
replication_peer = dc3=https://dc3.example.com
```
Replication loop (conceptually):
- read last seq for `(peer, channel)` from `replication_progress`
- call peer `/internal/events?channel=<name>&from_seq=last+1&limit=batch_limit`
- insert idempotently into local SQL storage
- update cursor to the newest seq applied
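The idempotent apply step can be sketched in SQL (PostgreSQL syntax; the values are illustrative):

```sql
-- A duplicate delivery hits the (origin_dc, origin_seq) primary key and is skipped.
INSERT INTO events_raw (origin_dc, origin_seq, node_name, channel, received_at, json_data)
VALUES ('dc2', 44, 'dc2', 'events_raw', now(), '{"source":"collector-1"}')
ON CONFLICT (origin_dc, origin_seq) DO NOTHING;
```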
Structured-only replication:
- For structured channels with `store_raw=false`, Replicon replicates events by reading/writing the structured table directly (with the same `(origin_dc, origin_seq)` primary key).
- Small/simple node: `queue_backend=file`, `storage_backend=file`
- Production SQL ingest: `queue_backend=redis`, `storage_backend=postgres` or `mysql`
- Benchmark API overhead only: `storage_backend=noop`
Enable in conf/replicon.conf (or /etc/replicon/replicon.conf):

```
https_enabled = true
https_listen = 0.0.0.0:8443
tls_cert_file = /etc/replicon/tls/server.crt
tls_key_file = /etc/replicon/tls/server.key
```
With HTTPS enabled, Replicon can serve both:
- HTTP on `http_listen` (if set)
- HTTPS on `https_listen`
To run HTTPS-only, set `http_listen =` (empty) and keep `https_enabled=true`.
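An HTTPS-only listener section would then look like (cert/key paths as in the example above):

```
http_listen =
https_enabled = true
https_listen = 0.0.0.0:8443
tls_cert_file = /etc/replicon/tls/server.crt
tls_key_file = /etc/replicon/tls/server.key
```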
Configure `cpu` in replicon.conf:

```
# Use runtime default (recommended)
cpu = -1

# Same as above:
# cpu = *
# cpu = auto

# Pin to 2 OS threads
# cpu = 2
```
Behavior:
- `cpu = -1`, `*`, or `auto`: keep the Go runtime default `GOMAXPROCS`
- `cpu = N` (N > 0): force `GOMAXPROCS=N`
Check effective value:

```shell
curl -s http://127.0.0.1:8080/healthz
```

Writer batching controls:

- `writer_batch_size`: max events per storage write call.
- `writer_batch_flush_ms`: max wait for filling a batch before flushing a smaller batch.
Starting points for SQL backends:

- `writer_batch_size=200`
- `writer_batch_flush_ms=20`
```
queue_backend = file
queue_file_path = data/queue/ingest.log
queue_file_ack_path = data/queue/ingest.ack
queue_file_sync_ms = 200
queue_buffer = 10000

writer_enabled = true
writer_workers = 1
writer_batch_size = 200
writer_batch_flush_ms = 20

storage_backend = file
storage_file_path = data/db/events_raw.jsonl
storage_file_sync_ms = 200
```
Notes:
- File queue is durable with periodic fsync (`queue_file_sync_ms`).
- Backpressure is enforced via `queue_buffer`: if the backlog reaches `queue_buffer`, ingest returns `429`.
```
queue_backend = redis

# Option A: TCP
# queue_redis_network = tcp
# queue_redis_addr = 127.0.0.1:6379

# Option B: Unix socket
# queue_redis_network = unix
# queue_redis_addr = /var/run/redis/redis.sock

queue_redis_network = tcp
queue_redis_addr = 127.0.0.1:6379
queue_redis_stream = replicon:ingest
queue_redis_group = writer
queue_redis_consumer = replicon
queue_redis_dlq_stream = replicon:ingest:dlq
queue_redis_maxlen = 5000000
queue_redis_max_retries = 5
queue_redis_timeout_ms = 1500
```
Notes:
- Redis backend uses `XADD` + `XREADGROUP` + `XACK` (consumer group).
- Failed events are re-queued with an incremented retry counter up to `queue_redis_max_retries`, then moved to `queue_redis_dlq_stream`.
- If you set `queue_redis_maxlen`, Redis will trim the stream approximately: if writers cannot keep up, old queue entries may be dropped (data loss).
- For durability, enable AOF in Redis/Valkey (`appendonly yes`, `appendfsync everysec`).
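The corresponding redis.conf / valkey.conf lines for that AOF setting are:

```
appendonly yes
appendfsync everysec
```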
- Writes append-only JSONL to `storage_file_path`.
- Intended for simple setups and debugging, not for fast querying.
- Structured table projection is not available (structured mode still validates payloads).
```
storage_backend = postgres
storage_postgres_dsn = postgres://replicon:replicon-secret@127.0.0.1:5432/replicon?sslmode=disable
storage_postgres_connect_timeout_ms = 3000
storage_postgres_max_open_conns = 16
storage_postgres_max_idle_conns = 16
storage_postgres_conn_max_lifetime_min = 30
```
```
storage_backend = mysql
storage_mysql_dsn = replicon:replicon-secret@tcp(127.0.0.1:3306)/replicon?parseTime=true
storage_mysql_connect_timeout_ms = 3000
storage_mysql_max_open_conns = 16
storage_mysql_max_idle_conns = 16
storage_mysql_conn_max_lifetime_min = 30
```
- Discards writes (benchmark/testing).
- Reuse connections (HTTP keep-alive)
- Prefer persistent client pools (do not reconnect every event)
- Retry `429`/`503` with exponential backoff + jitter
- Treat `202` as accepted-to-queue (async write)
Service + health:

```shell
systemctl status replicon --no-pager
curl -s http://127.0.0.1:8080/healthz
```

Quick ingest test:

```shell
curl -X POST "http://127.0.0.1:8080/ingress/events_raw/add" \
  -H "Authorization: Bearer key-abc123" \
  -H "Content-Type: application/json" \
  -d '{"source":"smoke","ts":"2026-02-19T12:00:00Z"}'
```

SQL verification (optional):

PostgreSQL:

```shell
psql "postgres://replicon:replicon-secret@127.0.0.1:5432/replicon?sslmode=disable" -c "SELECT COUNT(*) FROM events_raw;"
```

MySQL/MariaDB:

```shell
mysql -u replicon -p -h 127.0.0.1 -P 3306 replicon -e "SELECT COUNT(*) FROM events_raw;"
```

Installer v1.0.2 supports a Redis-compatible package fallback on newer RHEL-family systems (for example valkey, where the redis package is not available).
```shell
./uninstall.sh
./uninstall.sh --purge
```

See `INSTALL.md` for install/provision options and `CHANGELOG.md` for release history.