36 changes: 36 additions & 0 deletions .github/workflows/main.yml
@@ -625,6 +625,42 @@ jobs:
asset_name: OctoBot_macos_arm64
asset_content_type: application/x-binary

  sync-deploy:
    name: Deploy Sync Server
    needs: [docker]
    if: false # Temporarily skipped
    runs-on: ubuntu-latest

    env:
      DEPLOY_ENV: ${{ startsWith(github.ref, 'refs/tags/') && 'production' || github.ref == 'refs/heads/master' && 'staging' || 'development' }}

    steps:
      - uses: actions/checkout@v6

      - name: Install Ansible
        run: pip install ansible

      - name: Install Galaxy requirements
        working-directory: infra/sync/ansible
        run: ansible-galaxy install -r requirements.yml

      - name: Set up SSH key
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.SYNC_DEPLOY_SSH_KEY }}" > ~/.ssh/id_ed25519
          chmod 600 ~/.ssh/id_ed25519
          for ip in ${{ secrets.SYNC_NODE_IPS }}; do
            ssh-keyscan -H "$ip" >> ~/.ssh/known_hosts 2>/dev/null
          done

      - name: Deploy to ${{ env.DEPLOY_ENV }}
        working-directory: infra/sync/ansible
        env:
          ANSIBLE_VAULT_PASSWORD: ${{ secrets.SYNC_ANSIBLE_VAULT_PASSWORD }}
        run: |
          ansible-playbook playbooks/deploy-octobot-sync.yml \
            -i inventories/${{ env.DEPLOY_ENV }}

  notify:
    if: ${{ failure() }}
    needs:
7 changes: 7 additions & 0 deletions .gitignore
@@ -141,6 +141,13 @@ letsencrypt/
# dev env
.env

# Ansible decrypted temp files and per-environment SSH keys
infra/**/*.dec.yml
infra/**/.ssh/

# Ansible Galaxy installed roles (installed via requirements.yml)
infra/**/roles/geerlingguy.*/

# Pants build system
/.pants.d/
/dist/
246 changes: 246 additions & 0 deletions infra/sync/README.md
@@ -0,0 +1,246 @@
# OctoBot Sync Server — Infrastructure

Deploys the OctoBot sync server stack across multiple VPS nodes with zero-downtime rolling updates.

**Stack per node:** Garage (S3 storage) + OctoBot sync server + Nginx (reverse proxy with caching)

## Prerequisites

- Python 3.10+ with pip (`pip install -r infra/sync/requirements.txt` installs `ansible-core`)
- SSH access to target nodes (key-based, user `deploy` with sudo)
- OctoBot Docker image (`drakkarsoftware/octobot`). The sync server runs via the `OctoBot sync` CLI command, so no separate image is needed.

## Quick start

```bash
# 1. Install Ansible
pip install -r infra/sync/requirements.txt
cd infra/sync/ansible

# 2. Install Ansible Galaxy roles
ansible-galaxy install -r requirements.yml

# 3. Set up credentials for your environment
cp vault.yml.example inventories/development/group_vars/all/vault.yml
cp hosts.yml.example inventories/development/hosts.yml

# 4. Set up SSH key
mkdir -p inventories/development/.ssh
cp ~/.ssh/id_rsa inventories/development/.ssh/id_rsa
chmod 600 inventories/development/.ssh/id_rsa

# 5. Fill in real values
vim inventories/development/hosts.yml # node IPs, zones, capacity
vim inventories/development/group_vars/all/vault.yml # secrets

# 6. Encrypt sensitive files
ansible-vault encrypt inventories/development/group_vars/all/vault.yml
ansible-vault encrypt inventories/development/hosts.yml

# 7. Deploy
ansible-playbook playbooks/site.yml -i inventories/development
```

## Environments

| Environment | Branch/Trigger | Image Tag | Inventory |
|---|---|---|---|
| development | push to `dev` | `latest` | `inventories/development` |
| staging | push to `master` | `stable` | `inventories/staging` |
| production | git tag | version | `inventories/production` |

Deploy to a specific environment:

```bash
ansible-playbook playbooks/site.yml -i inventories/staging
ansible-playbook playbooks/site.yml -i inventories/production
```

Running `ansible-playbook` without `-i` uses the development inventory (the `inventory` default in `ansible.cfg`).
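
The trigger-to-inventory mapping in the table above can be sketched in Python as follows. This mirrors the `DEPLOY_ENV` expression in the workflow, where any ref that is neither a tag nor `master` falls back to development. The function names `deploy_env` and `inventory_path` are illustrative and do not exist in the repository.

```python
def deploy_env(ref: str) -> str:
    """Map a Git ref to an environment name, following the table above."""
    if ref.startswith("refs/tags/"):
        return "production"
    if ref == "refs/heads/master":
        return "staging"
    # Every other ref (including refs/heads/dev) falls through to development.
    return "development"


def inventory_path(ref: str) -> str:
    """Return the inventory directory to pass to `ansible-playbook -i`."""
    return f"inventories/{deploy_env(ref)}"
```

For example, `inventory_path("refs/heads/master")` gives `inventories/staging`.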

## Playbooks

| Playbook | Purpose | When to use |
|---|---|---|
| `site.yml` | Full stack rolling deploy | First deploy, infra changes, Garage config changes |
| `deploy-octobot-sync.yml` | App-only rolling update | New app version (fast — only restarts OctoBot Sync) |
| `setup-garage.yml` | Cluster bootstrap | Once after first `site.yml` — creates bucket + API key |

### First-time setup

```bash
# 1. Deploy the full stack (Garage + OctoBot Sync + Nginx)
ansible-playbook playbooks/site.yml -i inventories/production

# 2. Bootstrap the Garage cluster (connects nodes, assigns layout, creates bucket + key)
ansible-playbook playbooks/setup-garage.yml -i inventories/production

# 3. Save the S3 credentials output by step 2 into vault.yml
ansible-vault edit inventories/production/group_vars/all/vault.yml

# 4. Save the node IDs into hosts.yml (garage_node_id per host)
ansible-vault edit inventories/production/hosts.yml

# 5. Re-deploy with real S3 credentials
ansible-playbook playbooks/site.yml -i inventories/production
```

### Routine app deploy

```bash
ansible-playbook playbooks/deploy-octobot-sync.yml -i inventories/production
```

## Credentials

All secrets are managed via [Ansible Vault](https://docs.ansible.com/ansible/latest/vault_guide/).

### SSH keys per environment

Each environment has its own SSH key at `inventories/<env>/.ssh/id_rsa` (gitignored):

```bash
mkdir -p inventories/production/.ssh
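# The file keeps the name id_rsa (the path convention in ansible.cfg) even though the key type is Ed25519
ssh-keygen -t ed25519 -f inventories/production/.ssh/id_rsa -N ""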
# Copy the public key to your nodes:
ssh-copy-id -i inventories/production/.ssh/id_rsa.pub deploy@node-ip
```

`ansible.cfg` sets `private_key_file` to the development key, so pass the key explicitly when deploying to any other environment:

```bash
ansible-playbook playbooks/site.yml -i inventories/production \
--private-key inventories/production/.ssh/id_rsa
```

### Encrypted files per environment

| File | Contents |
|---|---|
| `inventories/<env>/hosts.yml` | Node IPs, garage node IDs |
| `inventories/<env>/group_vars/all/vault.yml` | S3 keys, encryption secrets, Garage tokens |
| `inventories/<env>/.ssh/` | SSH private key for the `deploy` user (gitignored, not vault-encrypted) |

### Editing encrypted files

```bash
# Edit in-place (opens $EDITOR)
ansible-vault edit inventories/production/group_vars/all/vault.yml

# Or decrypt to a gitignored temp file, edit, then re-encrypt
ansible-vault decrypt inventories/production/group_vars/all/vault.yml \
--output inventories/production/group_vars/all/vault.dec.yml
vim inventories/production/group_vars/all/vault.dec.yml
ansible-vault encrypt inventories/production/group_vars/all/vault.dec.yml \
--output inventories/production/group_vars/all/vault.yml
rm inventories/production/group_vars/all/vault.dec.yml

# Same for hosts
ansible-vault decrypt inventories/production/hosts.yml \
--output inventories/production/hosts.dec.yml
vim inventories/production/hosts.dec.yml
ansible-vault encrypt inventories/production/hosts.dec.yml \
--output inventories/production/hosts.yml
rm inventories/production/hosts.dec.yml

# Re-encrypt with a new password
ansible-vault rekey inventories/production/group_vars/all/vault.yml
```
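
The decrypt, edit, re-encrypt cycle above follows one pattern for both files: the temp file takes a `.dec.yml` name so that the `infra/**/*.dec.yml` gitignore rule covers it. A minimal Python sketch of that pattern, which only builds the command lists and does not run them (the helper names are hypothetical):

```python
from pathlib import PurePosixPath


def decrypted_path(encrypted: str) -> str:
    """vault.yml -> vault.dec.yml, in the same directory."""
    p = PurePosixPath(encrypted)
    return str(p.with_name(p.stem + ".dec" + p.suffix))


def edit_cycle_commands(encrypted: str, editor: str = "vim") -> list[list[str]]:
    """Commands for one decrypt / edit / re-encrypt / clean-up cycle."""
    tmp = decrypted_path(encrypted)
    return [
        ["ansible-vault", "decrypt", encrypted, "--output", tmp],
        [editor, tmp],
        ["ansible-vault", "encrypt", tmp, "--output", encrypted],
        ["rm", tmp],
    ]
```

Each inner list could be passed to `subprocess.run(..., check=True)` in order, so that a failed step stops the cycle before the temp file is removed.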

### Pre-commit hook

Prevents accidentally committing unencrypted `vault.yml` or `hosts.yml`:

```bash
# Unix / macOS
cp infra/sync/ansible/scripts/pre-commit-vault-check.py .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

# Windows (Git Bash)
cp infra/sync/ansible/scripts/pre-commit-vault-check.py .git/hooks/pre-commit
```
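
The contents of `pre-commit-vault-check.py` are not shown in this README. As a rough sketch of how such a check could work, assuming it relies on the `$ANSIBLE_VAULT;` header that Ansible Vault writes at the start of every encrypted file (the function names and the `staged` mapping are illustrative):

```python
VAULT_HEADER = "$ANSIBLE_VAULT;"
PROTECTED_NAMES = ("vault.yml", "hosts.yml")


def is_vault_encrypted(text: str) -> bool:
    """True if the content starts with the Ansible Vault header."""
    return text.startswith(VAULT_HEADER)


def unencrypted_files(staged: dict[str, str]) -> list[str]:
    """Given {path: content} of staged files, list protected files in plaintext."""
    return sorted(
        path
        for path, content in staged.items()
        if path.rsplit("/", 1)[-1] in PROTECTED_NAMES
        and not is_vault_encrypted(content)
    )
```

A hook built on this would exit with a non-zero status whenever `unencrypted_files` returns a non-empty list, which makes Git abort the commit.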

### Vault password

The vault password is read from the `ANSIBLE_VAULT_PASSWORD` environment variable (via `scripts/vault-password.sh`). Set it before running playbooks:

```bash
export ANSIBLE_VAULT_PASSWORD="your-vault-password"
```

Or pass it interactively:

```bash
ansible-playbook playbooks/site.yml -i inventories/production --ask-vault-pass
```
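
When `vault_password_file` points at an executable, Ansible runs it and uses its standard output as the password. `scripts/vault-password.sh` itself is not reproduced here; a Python sketch of an equivalent helper, assuming it does nothing more than echo the environment variable and fail when it is unset:

```python
import os
import sys


def vault_password(environ=os.environ) -> str:
    """Return the vault password from the environment, or exit with an error."""
    password = environ.get("ANSIBLE_VAULT_PASSWORD", "")
    if not password:
        raise SystemExit("ANSIBLE_VAULT_PASSWORD is not set")
    return password


def main() -> None:
    # Ansible reads the password from this script's standard output.
    sys.stdout.write(vault_password() + "\n")
```

To use it as a password script, call `main()` at the bottom of the file and mark the file executable. Failing loudly on an unset variable avoids Ansible attempting decryption with an empty password.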

### Generating secrets

```bash
# Garage RPC secret
openssl rand -hex 32

# Garage admin/metrics tokens
openssl rand -base64 32

# Encryption secrets
openssl rand -base64 48
```
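
If `openssl` is not available, Python's standard `secrets` module can produce values of the same shape. This is a sketch with illustrative function names:

```python
import base64
import secrets


def rpc_secret() -> str:
    """32 random bytes as 64 hex characters, like `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def token(n_bytes: int = 32) -> str:
    """Base64 of n random bytes, like `openssl rand -base64 <n>`."""
    return base64.b64encode(secrets.token_bytes(n_bytes)).decode("ascii")
```

Use `token(32)` for the Garage admin and metrics tokens and `token(48)` for the encryption secrets, matching the byte counts above.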

### Required vault variables

See `vault.yml.example` for the full list:

| Variable | Purpose |
|---|---|
| `vault_garage_rpc_secret` | Shared secret for Garage inter-node RPC |
| `vault_garage_admin_token` | Garage admin API authentication |
| `vault_garage_metrics_token` | Garage metrics endpoint authentication |
| `vault_s3_access_key` | S3 API access key (from `setup-garage.yml`) |
| `vault_s3_secret_key` | S3 API secret key (from `setup-garage.yml`) |
| `vault_platform_pubkey_evm` | Platform EVM address (identity) |
| `vault_encryption_secret` | User data encryption key |
| `vault_platform_encryption_secret` | Platform data encryption key |

## Adding a new node

1. Edit the environment's `hosts.yml` — add a new entry under `sync_nodes`
2. Run `site.yml` with `--limit` to deploy only to the new node:
```bash
ansible-playbook playbooks/site.yml -i inventories/production --limit new-node.example.com
```
3. Run `setup-garage.yml` to assign the new node in the Garage layout (bucket/key creation is skipped — they replicate automatically)

## Zero-downtime guarantee

- `serial: 1` — one node updated at a time
- Garage `replication_factor=3` — quorum needs 2/3, losing 1 is safe
- OctoBot sync is stateless — restart loses nothing
- Health checks must pass before moving to next node
- 10s pause between nodes for data re-sync
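
The quorum arithmetic behind the second bullet can be written out as a small Python sketch. It assumes a simple majority quorum, which matches the "2/3" figure above. Garage's actual quorum rules depend on its configuration and are not modelled here.

```python
def quorum(replication_factor: int) -> int:
    """Majority quorum: more than half of the replicas must respond."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """How many replicas can be offline while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)
```

With a replication factor of 3 this gives a quorum of 2 and one tolerated failure, which is what makes `serial: 1` safe. With `garage_replication_factor: 1`, as in the development group vars, no failure is tolerated, so a single-node restart briefly interrupts storage.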

## CI/CD

Automated via GitHub Actions (`.github/workflows/main.yml`):

1. **`docker`** (existing) — builds the OctoBot image (`drakkarsoftware/octobot`), which includes the sync server
2. **`sync-deploy`** — after `docker` succeeds, runs Ansible `deploy-octobot-sync.yml` against the right environment

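The sync server uses the same OctoBot image with `OctoBot sync` as the entry point, so no separate build step is needed. The `sync-deploy` job is currently disabled in the workflow (`if: false`, marked as temporarily skipped) and does not run until that condition is removed.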

Required GitHub secrets:

| Secret | Purpose |
|---|---|
| `SYNC_DEPLOY_SSH_KEY` | Ed25519 private key for the `deploy` user on VPS nodes |
| `SYNC_ANSIBLE_VAULT_PASSWORD` | Vault password for decrypting secrets |
| `SYNC_NODE_IPS` | Space-separated list of node IPs (for ssh-keyscan) |

## Nginx caching

Nginx config is auto-generated from `collections.json` (via `generate_nginx_conf.py`):

- **Public + pull_only** collections — cached 1h
- **Public + writable** collections — cached 30s
- **Private** collections — no cache, proxied directly
- `X-Cache-Status` header on cached routes for debugging
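
The caching rules above reduce to a small decision function. The sketch below is not taken from `generate_nginx_conf.py`; the parameter names `public` and `pull_only` are assumptions about how a collection is described in `collections.json`.

```python
from typing import Optional


def cache_ttl_seconds(public: bool, pull_only: bool) -> Optional[int]:
    """Return the cache lifetime in seconds, or None when caching is disabled."""
    if not public:
        return None  # private collections are proxied directly, no cache
    if pull_only:
        return 3600  # public + pull_only: cached for 1 hour
    return 30  # public + writable: cached for 30 seconds
```

A generator could emit a cached `location` block when the result is an integer and a plain proxy block when it is `None`.
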
16 changes: 16 additions & 0 deletions infra/sync/ansible/ansible.cfg
@@ -0,0 +1,16 @@
[defaults]
inventory = inventories/development
roles_path = roles
vault_password_file = scripts/vault-password.sh
# SSH key per environment: inventories/<env>/.ssh/id_rsa
private_key_file = inventories/development/.ssh/id_rsa
remote_tmp = /tmp/.ansible/tmp
host_key_checking = False
retry_files_enabled = False
# Ignore .example files so Ansible doesn't try to parse them as inventory
inventory_ignore_extensions = ~, .orig, .bak, .ini, .cfg, .retry, .pyc, .pyo, .example
deprecation_warnings = False

[privilege_escalation]
become = True
become_method = sudo
28 changes: 28 additions & 0 deletions infra/sync/ansible/hosts.yml.example
@@ -0,0 +1,28 @@
# Copy to inventories/<env>/hosts.yml and fill in real values
# Then encrypt: ansible-vault encrypt inventories/<env>/hosts.yml
#
# For a single-node dev setup, one host is enough.
# For staging/production, use 3+ nodes across different zones for redundancy.

all:
  children:
    sync_nodes:
      hosts:
        sync-1.example.com:
          ansible_host: 203.0.113.10
          ansible_user: deploy
          garage_rpc_public_addr: "203.0.113.10:3901"
          garage_capacity: 100  # GB of storage to allocate
          garage_zone: "dc1"
        # sync-2.example.com:
        #   ansible_host: 203.0.113.11
        #   ansible_user: deploy
        #   garage_rpc_public_addr: "203.0.113.11:3901"
        #   garage_capacity: 100
        #   garage_zone: "dc2"
        # sync-3.example.com:
        #   ansible_host: 203.0.113.12
        #   ansible_user: deploy
        #   garage_rpc_public_addr: "203.0.113.12:3901"
        #   garage_capacity: 100
        #   garage_zone: "dc3"
@@ -0,0 +1,36 @@
# Docker images
octobot_sync_image: "drakkarsoftware/octobot"
octobot_image_tag: "latest"
garage_image: "dxflrs/garage:v2.2.0"
nginx_image: "nginx:1-alpine"

# Deployment
stack_deploy_dir: "/opt/octobot-sync"
s3_bucket: "octobot-sync-dev"
s3_region: "garage"
octobot_sync_port: 3000
nginx_port: 8080
garage_replication_factor: 1

# Map vault → app vars
garage_rpc_secret: "{{ vault_garage_rpc_secret }}"
garage_admin_token: "{{ vault_garage_admin_token }}"
garage_metrics_token: "{{ vault_garage_metrics_token }}"
s3_access_key: "{{ vault_s3_access_key }}"
s3_secret_key: "{{ vault_s3_secret_key }}"
platform_pubkey_evm: "{{ vault_platform_pubkey_evm }}"
encryption_secret: "{{ vault_encryption_secret }}"
platform_encryption_secret: "{{ vault_platform_encryption_secret }}"
evm_base_rpc: "{{ vault_evm_base_rpc | default('') }}"
evm_contract_base: "{{ vault_evm_contract_base | default('') }}"

# Firewall (geerlingguy.firewall)
firewall_allowed_tcp_ports:
- "22"
- "8080"
# Port 3901 (Garage RPC) restricted to peer IPs only — see sync_nodes group vars

# Docker (geerlingguy.docker)
docker_install_compose_plugin: true
docker_users:
- deploy