[Sync] Add infra #3316
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,246 @@ | ||
| # OctoBot Sync Server — Infrastructure | ||
|
|
||
| Deploys the OctoBot sync server stack across multiple VPS nodes with zero-downtime rolling updates. | ||
|
|
||
| **Stack per node:** Garage (S3 storage) + OctoBot sync server + Nginx (reverse proxy with caching) | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - Python 3.10+ with pip (`pip install -r infra/sync/requirements.txt` installs `ansible-core`) | ||
| - SSH access to target nodes (key-based, user `deploy` with sudo) | ||
| - OctoBot Docker image (`drakkarsoftware/octobot`) — the sync server runs via `OctoBot sync` CLI command (no separate image needed) | ||
|
|
||
| ## Quick start | ||
|
|
||
| ```bash | ||
| # 1. Install Ansible | ||
| pip install -r infra/sync/requirements.txt | ||
| cd infra/sync/ansible | ||
|
|
||
| # 2. Install Ansible Galaxy roles | ||
| ansible-galaxy install -r requirements.yml | ||
|
|
||
| # 3. Set up credentials for your environment | ||
| cp vault.yml.example inventories/development/group_vars/all/vault.yml | ||
| cp hosts.yml.example inventories/development/hosts.yml | ||
|
|
||
| # 4. Set up SSH key | ||
| mkdir -p inventories/development/.ssh | ||
| cp ~/.ssh/id_rsa inventories/development/.ssh/id_rsa | ||
| chmod 600 inventories/development/.ssh/id_rsa | ||
|
|
||
| # 5. Fill in real values | ||
| vim inventories/development/hosts.yml # node IPs, zones, capacity | ||
| vim inventories/development/group_vars/all/vault.yml # secrets | ||
|
|
||
| # 6. Encrypt sensitive files | ||
| ansible-vault encrypt inventories/development/group_vars/all/vault.yml | ||
| ansible-vault encrypt inventories/development/hosts.yml | ||
|
|
||
| # 7. Deploy | ||
| ansible-playbook playbooks/site.yml -i inventories/development | ||
| ``` | ||
|
|
||
| ## Environments | ||
|
|
||
| | Environment | Branch/Trigger | Image Tag | Inventory | | ||
| |---|---|---|---| | ||
| | development | push to `dev` | `latest` | `inventories/development` | | ||
| | staging | push to `master` | `stable` | `inventories/staging` | | ||
| | production | git tag | version | `inventories/production` | | ||
|
|
||
| Deploy to a specific environment: | ||
|
|
||
| ```bash | ||
| ansible-playbook playbooks/site.yml -i inventories/staging | ||
| ansible-playbook playbooks/site.yml -i inventories/production | ||
| ``` | ||
|
|
||
| Running `ansible-playbook` without `-i` targets the development inventory (the `inventory` default in `ansible.cfg`). | ||
|
|
||
| ## Playbooks | ||
|
|
||
| | Playbook | Purpose | When to use | | ||
| |---|---|---| | ||
| | `site.yml` | Full stack rolling deploy | First deploy, infra changes, Garage config changes | | ||
| | `deploy-octobot-sync.yml` | App-only rolling update | New app version (fast — only restarts OctoBot Sync) | | ||
| | `setup-garage.yml` | Cluster bootstrap | After the first `site.yml` run (connects nodes, assigns the layout, creates bucket + API key) and again after adding a node | | ||
|
|
||
| ### First-time setup | ||
|
|
||
| ```bash | ||
| # 1. Deploy the full stack (Garage + OctoBot Sync + Nginx) | ||
| ansible-playbook playbooks/site.yml -i inventories/production | ||
|
|
||
| # 2. Bootstrap the Garage cluster (connects nodes, assigns layout, creates bucket + key) | ||
| ansible-playbook playbooks/setup-garage.yml -i inventories/production | ||
|
|
||
| # 3. Save the S3 credentials output by step 2 into vault.yml | ||
| ansible-vault edit inventories/production/group_vars/all/vault.yml | ||
|
|
||
| # 4. Save the node IDs into hosts.yml (garage_node_id per host) | ||
| ansible-vault edit inventories/production/hosts.yml | ||
|
|
||
| # 5. Re-deploy with real S3 credentials | ||
| ansible-playbook playbooks/site.yml -i inventories/production | ||
| ``` | ||
|
|
||
| ### Routine app deploy | ||
|
|
||
| ```bash | ||
| ansible-playbook playbooks/deploy-octobot-sync.yml -i inventories/production | ||
| ``` | ||
|
|
||
| ## Credentials | ||
|
|
||
| All secrets are managed via [Ansible Vault](https://docs.ansible.com/ansible/latest/vault_guide/). | ||
|
|
||
| ### SSH keys per environment | ||
|
|
||
| Each environment has its own SSH key at `inventories/<env>/.ssh/id_rsa` (gitignored). The file keeps the name `id_rsa` even when it holds an Ed25519 key, as in this example: | ||
|
|
||
| ```bash | ||
| mkdir -p inventories/production/.ssh | ||
| ssh-keygen -t ed25519 -f inventories/production/.ssh/id_rsa -N "" | ||
| # Copy the public key to your nodes: | ||
| ssh-copy-id -i inventories/production/.ssh/id_rsa.pub deploy@node-ip | ||
| ``` | ||
|
|
||
| `ansible.cfg` sets `private_key_file` to the development key, so pass the key explicitly when deploying to staging or production: | ||
|
|
||
| ```bash | ||
| ansible-playbook playbooks/site.yml -i inventories/production \ | ||
| --private-key inventories/production/.ssh/id_rsa | ||
| ``` | ||
|
|
||
| ### Encrypted files per environment | ||
|
|
||
| | File | Contents | | ||
| |---|---| | ||
| | `inventories/<env>/hosts.yml` | Node IPs, garage node IDs | | ||
| | `inventories/<env>/group_vars/all/vault.yml` | S3 keys, encryption secrets, Garage tokens | | ||
| | `inventories/<env>/.ssh/` | SSH private key for the `deploy` user (gitignored, not vault-encrypted) | | ||
|
|
||
| ### Editing encrypted files | ||
|
|
||
| ```bash | ||
| # Edit in-place (opens $EDITOR) | ||
| ansible-vault edit inventories/production/group_vars/all/vault.yml | ||
|
|
||
| # Or decrypt to a gitignored temp file, edit, then re-encrypt | ||
| ansible-vault decrypt inventories/production/group_vars/all/vault.yml \ | ||
| --output inventories/production/group_vars/all/vault.dec.yml | ||
| vim inventories/production/group_vars/all/vault.dec.yml | ||
| ansible-vault encrypt inventories/production/group_vars/all/vault.dec.yml \ | ||
| --output inventories/production/group_vars/all/vault.yml | ||
| rm inventories/production/group_vars/all/vault.dec.yml | ||
|
|
||
| # Same for hosts | ||
| ansible-vault decrypt inventories/production/hosts.yml \ | ||
| --output inventories/production/hosts.dec.yml | ||
| vim inventories/production/hosts.dec.yml | ||
| ansible-vault encrypt inventories/production/hosts.dec.yml \ | ||
| --output inventories/production/hosts.yml | ||
| rm inventories/production/hosts.dec.yml | ||
|
|
||
| # Re-encrypt with a new password | ||
| ansible-vault rekey inventories/production/group_vars/all/vault.yml | ||
| ``` | ||
|
|
||
| ### Pre-commit hook | ||
|
|
||
| The hook blocks commits that contain an unencrypted `vault.yml` or `hosts.yml`: | ||
|
|
||
| ```bash | ||
| # Unix / macOS | ||
| cp infra/sync/ansible/scripts/pre-commit-vault-check.py .git/hooks/pre-commit | ||
| chmod +x .git/hooks/pre-commit | ||
|
|
||
| # Windows (Git Bash) | ||
| cp infra/sync/ansible/scripts/pre-commit-vault-check.py .git/hooks/pre-commit | ||
| ``` | ||
|
|
||
| ### Vault password | ||
|
|
||
| The vault password is read from the `ANSIBLE_VAULT_PASSWORD` environment variable (via `scripts/vault-password.sh`). Set it before running playbooks: | ||
|
|
||
| ```bash | ||
| export ANSIBLE_VAULT_PASSWORD="your-vault-password" | ||
| ``` | ||
|
|
||
| Or pass it interactively: | ||
|
|
||
| ```bash | ||
| ansible-playbook playbooks/site.yml -i inventories/production --ask-vault-pass | ||
| ``` | ||
|
|
||
| ### Generating secrets | ||
|
|
||
| ```bash | ||
| # Garage RPC secret | ||
| openssl rand -hex 32 | ||
|
|
||
| # Garage admin/metrics tokens | ||
| openssl rand -base64 32 | ||
|
|
||
| # Encryption secrets | ||
| openssl rand -base64 48 | ||
| ``` | ||
|
|
||
| ### Required vault variables | ||
|
|
||
| See `vault.yml.example` for the full list: | ||
|
|
||
| | Variable | Purpose | | ||
| |---|---| | ||
| | `vault_garage_rpc_secret` | Shared secret for Garage inter-node RPC | | ||
| | `vault_garage_admin_token` | Garage admin API authentication | | ||
| | `vault_garage_metrics_token` | Garage metrics endpoint authentication | | ||
| | `vault_s3_access_key` | S3 API access key (from `setup-garage.yml`) | | ||
| | `vault_s3_secret_key` | S3 API secret key (from `setup-garage.yml`) | | ||
| | `vault_platform_pubkey_evm` | Platform EVM address (identity) | | ||
| | `vault_encryption_secret` | User data encryption key | | ||
| | `vault_platform_encryption_secret` | Platform data encryption key | | ||
|
|
||
| ## Adding a new node | ||
|
|
||
| 1. Edit the environment's `hosts.yml` — add a new entry under `sync_nodes` | ||
| 2. Run `site.yml` with `--limit` to deploy only to the new node: | ||
| ```bash | ||
| ansible-playbook playbooks/site.yml -i inventories/production --limit new-node.example.com | ||
| ``` | ||
| 3. Run `setup-garage.yml` to assign the new node in the Garage layout (bucket/key creation is skipped — they replicate automatically) | ||
|
|
||
| ## Zero-downtime guarantee | ||
|
|
||
| - `serial: 1` — one node updated at a time | ||
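| - Garage `replication_factor=3`: quorum needs 2 of 3 replicas, so losing one node is safe (the development defaults set `garage_replication_factor: 1`, which tolerates no node loss) | ||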
| - OctoBot sync is stateless — restart loses nothing | ||
| - Health checks must pass before moving to next node | ||
| - 10s pause between nodes for data re-sync | ||
|
|
||
| ## CI/CD | ||
|
|
||
| Automated via GitHub Actions (`.github/workflows/main.yml`): | ||
|
|
||
| 1. **`docker`** (existing) — builds the OctoBot image (`drakkarsoftware/octobot`), which includes the sync server | ||
| 2. **`sync-deploy`** — after `docker` succeeds, runs Ansible `deploy-octobot-sync.yml` against the right environment | ||
|
|
||
| The sync server uses the same OctoBot image with `OctoBot sync` as the entry point — no separate build step needed. | ||
|
|
||
| Required GitHub secrets: | ||
|
|
||
| | Secret | Purpose | | ||
| |---|---| | ||
| | `SYNC_DEPLOY_SSH_KEY` | Ed25519 private key for the `deploy` user on VPS nodes | | ||
| | `SYNC_ANSIBLE_VAULT_PASSWORD` | Vault password for decrypting secrets | | ||
| | `SYNC_NODE_IPS` | Space-separated list of node IPs (for ssh-keyscan) | | ||
|
|
||
| ## Nginx caching | ||
|
|
||
| Nginx config is auto-generated from `collections.json` (via `generate_nginx_conf.py`): | ||
|
|
||
| - **Public + pull_only** collections — cached 1h | ||
| - **Public + writable** collections — cached 30s | ||
| - **Private** collections — no cache, proxied directly | ||
| - `X-Cache-Status` header on cached routes for debugging |
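
The caching rules listed above reduce to a small decision function. The sketch below is not the project's `generate_nginx_conf.py`; it only illustrates the mapping. The field names `public` and `pull_only` and the helper `cache_ttl_seconds` are assumptions, because the `collections.json` schema is not part of this diff.

```python
from typing import Optional

# TTLs taken from the README: 1h for public pull-only, 30s for public writable.
PULL_ONLY_TTL_SECONDS = 3600
WRITABLE_TTL_SECONDS = 30


def cache_ttl_seconds(collection: dict) -> Optional[int]:
    """Return the cache TTL for a collection, or None when it must not be cached.

    The keys "public" and "pull_only" are hypothetical field names.
    """
    if not collection.get("public", False):
        return None  # private: proxied directly, never cached
    if collection.get("pull_only", False):
        return PULL_ONLY_TTL_SECONDS
    return WRITABLE_TTL_SECONDS
```

Treating a missing `public` flag as private keeps the default on the safe side: an unlabeled collection is never served from cache.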
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| [defaults] | ||
| inventory = inventories/development | ||
| roles_path = roles | ||
| vault_password_file = scripts/vault-password.sh | ||
| # SSH key per environment: inventories/<env>/.ssh/id_rsa | ||
| private_key_file = inventories/development/.ssh/id_rsa | ||
| remote_tmp = /tmp/.ansible/tmp | ||
| host_key_checking = False | ||
| retry_files_enabled = False | ||
| # Ignore .example files so Ansible doesn't try to parse them as inventory | ||
| inventory_ignore_extensions = ~, .orig, .bak, .ini, .cfg, .retry, .pyc, .pyo, .example | ||
| deprecation_warnings = False | ||
|
|
||
| [privilege_escalation] | ||
| become = True | ||
| become_method = sudo |
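
When `vault_password_file` points at an executable, Ansible runs it and uses its standard output as the vault password. The contents of `scripts/vault-password.sh` are not shown in this diff, so the following is only a Python sketch of the same idea under that assumption; `read_vault_password` and `main` are hypothetical names.

```python
import os
import sys


def read_vault_password(environ=os.environ) -> str:
    """Return the vault password from ANSIBLE_VAULT_PASSWORD, or raise if unset."""
    password = environ.get("ANSIBLE_VAULT_PASSWORD", "")
    if not password:
        raise RuntimeError("ANSIBLE_VAULT_PASSWORD is not set")
    return password


def main() -> int:
    """Print the password on stdout; report a missing variable on stderr."""
    try:
        print(read_vault_password())
    except RuntimeError as exc:
        print(exc, file=sys.stderr)
        return 1
    return 0
```

Used as a standalone script, the file would end with `sys.exit(main())` so that a missing variable produces a non-zero exit status instead of an empty password.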
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| # Copy to inventories/<env>/hosts.yml and fill in real values | ||
| # Then encrypt: ansible-vault encrypt inventories/<env>/hosts.yml | ||
| # | ||
| # For a single-node dev setup, one host is enough. | ||
| # For staging/production, use 3+ nodes across different zones for redundancy. | ||
|
|
||
| all: | ||
| children: | ||
| sync_nodes: | ||
| hosts: | ||
| sync-1.example.com: | ||
| ansible_host: 203.0.113.10 | ||
| ansible_user: deploy | ||
| garage_rpc_public_addr: "203.0.113.10:3901" | ||
| garage_capacity: 100 # GB of storage to allocate | ||
| garage_zone: "dc1" | ||
| # sync-2.example.com: | ||
| # ansible_host: 203.0.113.11 | ||
| # ansible_user: deploy | ||
| # garage_rpc_public_addr: "203.0.113.11:3901" | ||
| # garage_capacity: 100 | ||
| # garage_zone: "dc2" | ||
| # sync-3.example.com: | ||
| # ansible_host: 203.0.113.12 | ||
| # ansible_user: deploy | ||
| # garage_rpc_public_addr: "203.0.113.12:3901" | ||
| # garage_capacity: 100 | ||
| # garage_zone: "dc3" | ||
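
The comment in this example recommends 3+ nodes across different zones. A rough way to reason about that is sketched below, assuming a simple majority quorum over replicas (Garage's actual consistency modes may differ). `tolerated_failures`, `zone_summary`, and `EXAMPLE_HOSTS` are illustrative names, and the dictionary mirrors the three example hosts with the commented-out entries enabled.

```python
def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be unavailable while a majority quorum is still reachable."""
    quorum = replication_factor // 2 + 1
    return replication_factor - quorum


def zone_summary(hosts: dict) -> dict:
    """Count hosts per garage_zone in a sync_nodes 'hosts' mapping."""
    counts = {}
    for host_vars in hosts.values():
        zone = host_vars["garage_zone"]
        counts[zone] = counts.get(zone, 0) + 1
    return counts


# Mirrors hosts.yml.example with all three hosts uncommented.
EXAMPLE_HOSTS = {
    "sync-1.example.com": {"ansible_host": "203.0.113.10", "garage_zone": "dc1"},
    "sync-2.example.com": {"ansible_host": "203.0.113.11", "garage_zone": "dc2"},
    "sync-3.example.com": {"ansible_host": "203.0.113.12", "garage_zone": "dc3"},
}
```

Under this assumption a replication factor of 3 tolerates one unavailable replica, while the single-node development setup tolerates none.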
infra/sync/ansible/inventories/development/group_vars/all/vars.yml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| # Docker images | ||
| octobot_sync_image: "drakkarsoftware/octobot" | ||
| octobot_image_tag: "latest" | ||
| garage_image: "dxflrs/garage:v2.2.0" | ||
| nginx_image: "nginx:1-alpine" | ||
|
|
||
| # Deployment | ||
| stack_deploy_dir: "/opt/octobot-sync" | ||
| s3_bucket: "octobot-sync-dev" | ||
| s3_region: "garage" | ||
| octobot_sync_port: 3000 | ||
| nginx_port: 8080 | ||
| garage_replication_factor: 1 | ||
|
|
||
| # Map vault → app vars | ||
| garage_rpc_secret: "{{ vault_garage_rpc_secret }}" | ||
| garage_admin_token: "{{ vault_garage_admin_token }}" | ||
| garage_metrics_token: "{{ vault_garage_metrics_token }}" | ||
| s3_access_key: "{{ vault_s3_access_key }}" | ||
| s3_secret_key: "{{ vault_s3_secret_key }}" | ||
| platform_pubkey_evm: "{{ vault_platform_pubkey_evm }}" | ||
| encryption_secret: "{{ vault_encryption_secret }}" | ||
| platform_encryption_secret: "{{ vault_platform_encryption_secret }}" | ||
| evm_base_rpc: "{{ vault_evm_base_rpc | default('') }}" | ||
| evm_contract_base: "{{ vault_evm_contract_base | default('') }}" | ||
|
|
||
| # Firewall (geerlingguy.firewall) | ||
| firewall_allowed_tcp_ports: | ||
| - "22" | ||
| - "8080" | ||
| # Port 3901 (Garage RPC) restricted to peer IPs only — see sync_nodes group vars | ||
|
|
||
| # Docker (geerlingguy.docker) | ||
| docker_install_compose_plugin: true | ||
| docker_users: | ||
| - deploy |
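
Because `vars.yml` maps each `vault_*` value to an app variable, a missing vault entry only surfaces when a template first uses it. A pre-flight check along these lines could catch that earlier. It is a sketch, not part of this PR: it assumes the vault has already been decrypted and parsed into a dictionary, and `missing_vault_vars` is a hypothetical helper. The two `vault_evm_*` variables are left out because `vars.yml` gives them a `default('')`.

```python
# Names taken from the "Required vault variables" table in the README.
REQUIRED_VAULT_VARS = (
    "vault_garage_rpc_secret",
    "vault_garage_admin_token",
    "vault_garage_metrics_token",
    "vault_s3_access_key",
    "vault_s3_secret_key",
    "vault_platform_pubkey_evm",
    "vault_encryption_secret",
    "vault_platform_encryption_secret",
)


def missing_vault_vars(vault: dict) -> list:
    """Return the required variable names that are absent or empty in `vault`."""
    return [name for name in REQUIRED_VAULT_VARS if not vault.get(name)]
```

An empty list means every required variable has a non-empty value; anything else names the entries still to be filled in.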