Skip to content

semilayer/demo-join-databases

Repository files navigation

Semilayer — multi-bridge airgap, native joins

image

A single SemiLayer runner reaches a local Postgres, MySQL, MongoDB, Redis, and ClickHouse, exposes them as five lenses with declared relations, and a React UI calls query, search, and feed on them — every operation with cross-bridge include joins planned server-side, a 4-feed selector showing how the same parent lens can be ranked by signals from four different child bridges, and semantic search over two different lenses (orders in MySQL, sessions in ClickHouse). SemiLayer never opens any of the five databases.

Join Search Feeds

What this proves

Property How it's enforced
Cross-DB joins via one Beam call relations declared on each lens; the planner walks them through each lens's bridge
SemiLayer never holds DB credentials credentialsLocation: 'runner-local' on each source — service stores config: {}
All three operations work over the join query include, search include, feed include exercised in the UI and in pnpm smoke
QUERY hydrates every relation in one call customers.query({ include: { orders, events, carts, sessions } }) — four child bridges, one Beam call. The "ultimate join" panel.
Engagement ranking spans bridges feed.rank.engagement aggregates across mysql, redis, mongodb, and clickhouse — one parent lens, four child stores
One parent, many feeds, many signals top_spenders (MySQL sum), hot_carts (Redis sum), most_engaged (MongoDB recent_count w/ window), marathon_sessions (ClickHouse sum) — same candidate pool, four different stories
Search works on non-customer lenses too sessions.summary (ClickHouse) is embedded; semantic queries like "abandoned cart at checkout" hit ClickHouse and hydrate the parent customer from Postgres
Disconnect = clean failure Stop the runner → API returns 503 no_runners_online
Document-shaped data joins natively customer_events (MongoDB) joins to customers via customerId — first NoSQL bridge in the demo
In-memory K/V joins natively cart_items (Redis) joins to customers via line-level keys (cart_items:{cid}:{lid}) — first non-disk bridge
Columnar embedded text works too sessions (ClickHouse) is the first non-customer searchable lens and the first columnar bridge — proves embeddings aren't a Postgres-only luxury

Architecture — one Beam call, five bridges

flowchart LR
    UI["React UI<br/>generated/semilayer Beam client"]

    subgraph SaaS["api.semilayer.com (control plane)"]
      API["query · search · feed<br/>plans cross-bridge joins<br/><i>holds zero DB creds</i>"]
    end

    subgraph Edge["YOUR runner (docker)"]
      RUN["reads creds from<br/>SEMILAYER_SOURCE_*"]
    end

    subgraph Data["five runner-local databases"]
      PG["Postgres :5441<br/><b>customers</b>"]
      MY["MySQL :3307<br/><b>orders</b>"]
      MO["MongoDB :27018<br/><b>customer_events</b>"]
      RD["Redis :6380<br/><b>cart_items</b>"]
      CH["ClickHouse :8124<br/><b>sessions</b>"]
    end

    UI -->|"customers.query<br/>include orders+events+carts+sessions"| API
    API -->|"plan: 1× parent + 4× child reads"| RUN
    RUN -->|"SELECT * FROM customers"| PG
    RUN -->|"WHERE customer_id IN (…)"| MY
    RUN -->|"find customerId IN (…)"| MO
    RUN -->|"MGET cart_items:{cid}:*"| RD
    RUN -->|"WHERE customer_id IN (…)"| CH

    classDef saas fill:#1e1b4b,stroke:#8B5CF6,color:#fff
    classDef edge fill:#0f172a,stroke:#3B82F6,color:#fff
    classDef pg fill:#0c4a6e,stroke:#0ea5e9,color:#fff
    classDef my fill:#4c1d95,stroke:#8B5CF6,color:#fff
    classDef mo fill:#064e3b,stroke:#10B981,color:#fff
    classDef rd fill:#78350f,stroke:#f59e0b,color:#fff
    classDef ch fill:#155e75,stroke:#06b6d4,color:#fff
    class API saas
    class RUN edge
    class PG pg
    class MY my
    class MO mo
    class RD rd
    class CH ch
Loading

The data plane stays inside the dotted edge. SemiLayer holds the lens schema + bridge package name; every byte read from a database flows through the runner. Switching to enterprise self-hosted or air-gapped deployments changes nothing about the planner — only where the runner runs.

Prerequisites

  • Docker + docker compose
  • pnpm + Node 22+
  • A SemiLayer account with at least one project + environment

One-time setup (~3 minutes)

1. Bootstrap project context

cp .semilayerrc.example .semilayerrc
# edit .semilayerrc with your org / project / env slugs

Or run semilayer init to do it interactively.

2. Mint a runner

semilayer runners create my-demo-runner

The output prints both the runner UUID and the rk_ token exactly once. Save them — you can't retrieve the token later.

3. Mint a public key

semilayer keys create --type public --label "demo-join"

Copy the pk_… value.

4. Fill in .env

cp .env.example .env

Then edit .env:

VITE_SEMILAYER_PK=pk_…                   # from step 3
SEMILAYER_RUNNER_ID=ea13ded3-…           # UUID from step 2
SEMILAYER_RUNNER_TOKEN=rk_…              # token from step 2

5. Push config + assign the runner + generate the Beam client

pnpm install
semilayer push --resume-ingest                        # creates 5 sources + 5 lenses, starts ingest
semilayer runners assign my-demo-runner pg-customers
semilayer runners assign my-demo-runner mysql-orders
semilayer runners assign my-demo-runner mongo-events
semilayer runners assign my-demo-runner redis-carts
semilayer runners assign my-demo-runner clickhouse-sessions
semilayer generate                                    # ./generated/semilayer/
image

Check that all five lenses are ready:

semilayer status
# customers         ready   50 vectors
# orders            ready   50 vectors
# customer_events   ready   0 vectors    # query-only, no embeddings
# cart_items        ready   0 vectors    # query-only, no embeddings
# sessions          ready   120 vectors  # searchable on `summary`

Run the demo

pnpm seed                        # 50 customers + 50 orders + 200 events + 80 cart lines + 120 sessions
docker compose up -d             # postgres + mysql + mongo + redis + clickhouse + runner
./scripts/check-health.sh --wait 30
pnpm dev                         # opens http://localhost:5180

Three sections, each one Beam call with cross-bridge include:

  1. query customers · include orders + events + carts + sessions — the ultimate join panel. List customers (Postgres) with orders (MySQL), page views (MongoDB), live cart lines (Redis), and browsing sessions (ClickHouse) all hydrated in one call. The planner walks every relation through the runner; SemiLayer never opens any of the five databases.

  2. search orders ↔ sessions · include customer — toggle between the two searchable lenses. Orders (MySQL) embeds the item field, so queries like "wireless audio gear" land. Sessions (ClickHouse) embeds the summary narrative, so queries like "abandoned cart at checkout" or "compared headphones" land. Each hit hydrates the parent customer from Postgres in the same round-trip.

  3. feed customers · 4-feed selector — same parent lens, four different stories:

    • top_spenders ranks by sum(orders.amountCents) from MySQL (historical money).
    • hot_carts ranks by sum(cart_items.subtotalCents) from Redis (live intent — what's in the cart right now).
    • most_engaged ranks by recent_count of customer_events from MongoDB over a 7-day window (recent activity, no money).
    • marathon_sessions ranks by sum(sessions.durationSeconds) from ClickHouse (depth of engagement — how long people stay, not how often they click).

    Each feed exposes a left-rail card with its description, signal, and bridge — click to switch. Each ranked item hydrates the relation that drove the score.

Headless smoke + e2e

pnpm smoke                       # asserts all 3 ops return populated joins
pnpm e2e                         # full headless run including airgap-disconnect proof

pnpm e2e boots both DBs, seeds, starts the runner, runs the smoke script, kills the runner and expects 503, restarts and re-runs smoke. Tears the stack down on exit.

Adding more bridges

Each subfolder under bridges/ is one database. To add a sixth, drop in bridges/<name>/init.sql + bridges/<name>/seed.ts, add a service to docker-compose.yml, register the source + lens in sl.config.ts with credentialsLocation: 'runner-local', and:

semilayer push --resume-ingest
semilayer runners assign my-demo-runner my-new-source
semilayer generate              # types update

Detailed steps in bridges/README.md.

Footgun for structured-config bridges (ClickHouse, BigQuery, Snowflake, DynamoDB, …): their bridge manifests don't accept a connection URL. Instead, the runner reads one env var per config field — SEMILAYER_SOURCE_<NAME>_<KEY> — and assembles the config object. For the ClickHouse source in this demo (clickhouse-sessions): _HOST, _PORT, _DATABASE, _USERNAME, _PASSWORD. Pure-digit values are coerced to integers. Mixing _URL and per-key vars on the same source is allowed, but URL-friendly bridges (Postgres, MySQL, MongoDB, Redis) generally use _URL for brevity. Requires runner 0.1.21+.

Prove the airgap is real

docker compose stop runner

curl -sS -X POST https://api.semilayer.com/v1/query/customers \
  -H "authorization: Bearer $VITE_SEMILAYER_PK" \
  -d '{"limit":1}'
# → 503, "no_runners_online"

docker compose start runner
# Same curl now returns rows again.
image

You can also verify directly:

docker compose exec postgres psql -U semilayer -d customers \
  -c "SELECT client_addr FROM pg_stat_activity"
# Connections only from the runner container — never from outside the laptop.

What's in the box

File Purpose
sl.config.ts Five runner-local sources, five lenses with declared relations, four discover feeds (top_spenders, hot_carts, most_engaged, marathon_sessions) each ranked by a different cross-bridge engagement signal
bridges/postgres/, bridges/mysql/, bridges/mongodb/, bridges/redis/, bridges/clickhouse/ Per-bridge schema + seeder. Add a folder per new DB.
docker-compose.yml Postgres :5441, MySQL :3307, MongoDB :27018, Redis :6380, ClickHouse :8124 (HTTP), runner connects to all five inside the network
generated/semilayer/ Output of semilayer generate — typed Beam client (beam.customers.feed.marathon_sessions(...), beam.sessions.search(...) etc)
src/ Vite + React UI calling the generated client. Three sections, three operations; QUERY hydrates all relations, SEARCH toggles between two lenses, FEED has a 4-feed selector
scripts/smoke.ts Asserts every native join + every feed + both searchable lenses return populated data
scripts/run-e2e.sh Headless boot → seed → smoke → airgap-disconnect proof → restart → smoke

Environment variables

Variable Used by
VITE_SEMILAYER_PK The Vite UI (browser) and the smoke script
VITE_SEMILAYER_API API base URL — defaults to https://api.semilayer.com
SEMILAYER_RUNNER_ID The runner UUID (not the runner name) — gateway looks up by id
SEMILAYER_RUNNER_TOKEN The rk_… token shown once when the runner was minted
SEMILAYER_GATEWAY_URL wss://runner.semilayer.com/connect (override for self-hosted)
SEMILAYER_SOURCE_PG_CUSTOMERS_URL Postgres URL the runner uses inside the docker network
SEMILAYER_SOURCE_MYSQL_ORDERS_URL MySQL URL the runner uses inside the docker network
SEMILAYER_SOURCE_MONGO_EVENTS_URL MongoDB URL the runner uses inside the docker network
SEMILAYER_SOURCE_REDIS_CARTS_URL Redis URL the runner uses inside the docker network
SEMILAYER_SOURCE_CLICKHOUSE_SESSIONS_{HOST,PORT,DATABASE,USERNAME,PASSWORD} ClickHouse — structured config (one env var per field, no _URL). See "Adding more bridges" footgun above.
PG_HOST_URL, MYSQL_HOST_URL, MONGO_EVENTS_HOST_URL, REDIS_CARTS_HOST_URL, CLICKHOUSE_HOST_URL Host-side URLs for the seed scripts (pnpm seed runs on your laptop)

The runner-local URL convention is SEMILAYER_SOURCE_<UPPER_SNAKE(name)>_URL for URL-shaped bridges, and SEMILAYER_SOURCE_<UPPER_SNAKE(name)>_<KEY> for structured-config bridges. Source pg-customersSEMILAYER_SOURCE_PG_CUSTOMERS_URL. Source clickhouse-sessionsSEMILAYER_SOURCE_CLICKHOUSE_SESSIONS_HOST, _PORT, etc. The runner sees them; the SaaS plane never does.

Troubleshooting

Symptom Likely cause
Runner immediately disconnects (1006/4401) SEMILAYER_RUNNER_ID should be the UUID from runners create, not the runner name
runners assign says Source not found You haven't run semilayer push yet, or you're in the wrong env (semilayer config show)
Joins return empty arrays + includeErrors in meta The runner image is older than 0.1.18 — bump in docker-compose.yml
most_engaged feed shows everyone with score: 0 The runner image is older than 0.1.18 — the recent_count window emitted an ISO-string operand that didn't match BSON Date fields in the older mongodb bridge
clickhouse-sessions ingest errors with requires a "host" config value The runner image is older than 0.1.21 — earlier versions only resolved _URL. Bump to 0.1.21+ for per-key env vars.
ClickHouse healthcheck stuck on starting Use 127.0.0.1 (not localhost) inside the alpine image — localhost resolves to ::1 and ClickHouse only listens on IPv4
pnpm smoke fails with 503 The runner container isn't online, or isn't assigned to one of the sources. Check semilayer runners list and semilayer runners assignments <runner>.
Vite UI shows "loading…" forever Check the browser console — usually a missing/expired VITE_SEMILAYER_PK

See docs.semilayer.dev/runners/troubleshooting for more.

About

An example of airgapp-ed runners joining [everything]

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors