# OLTP Hospital Simulator

Continuous data streaming for CDC testing with Debezium or another CDC ingestion engine.
Alimentador-BD is a production-ready Python simulator that generates realistic hospital data in PostgreSQL with continuous INSERT/UPDATE operations. Perfect for testing CDC (Change Data Capture) pipelines with Debezium, validating data consistency, and developing ETL/ELT systems.
### ✨ Continuous Data Streaming
- 70% INSERT operations (new records)
- 30% UPDATE operations (realistic modifications)
- ~1 operation per 2 seconds (configurable)
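The pacing above can be sketched in a few lines. This is an illustrative helper, not the project's actual `stream.py`; it assumes the documented semantics of `STREAM_INTERVAL_SECONDS` and `MAX_JITTER_MS`:

```python
import random


def next_delay(interval_s: float = 2.0, max_jitter_ms: int = 400) -> float:
    """Seconds to wait before the next operation: base interval plus random jitter."""
    return interval_s + random.uniform(0, max_jitter_ms) / 1000.0
```

With the defaults above, each delay falls between 2.0 s and 2.4 s, which matches the documented rate of roughly one operation every 2 seconds.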
### 🏥 Realistic Hospital Schema
- 7 OLTP tables (patients, doctors, appointments, exams, admissions, etc.)
- ~13k initial seed records
- Proper foreign keys and constraints
- CDC-compatible triggers and indexes
### 🚀 Production-Ready Code
- Type hints, docstrings, PEP 8 compliance
- Error handling with exponential backoff
- Batch operations with transaction support
- Comprehensive logging with rotation
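The retry-with-exponential-backoff behavior can be illustrated as follows. This is a hedged sketch, not the project's implementation; `max_retries`, `base`, and `cap` are illustrative parameters:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(op: Callable[[], T], max_retries: int = 5,
                 base: float = 1.0, cap: float = 30.0) -> T:
    """Retry `op`, doubling the delay after each failure (capped at `cap` seconds)."""
    for attempt in range(max_retries):
        try:
            return op()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(min(cap, base * 2 ** attempt))
    raise RuntimeError("unreachable")
```

A transient failure (e.g. a dropped PostgreSQL connection) is retried after 1 s, 2 s, 4 s, … up to the cap; a persistent failure eventually re-raises.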
### 🐳 Multiple Deployment Options
- Local development (Docker Compose)
- AWS EC2 / RDS
- Kubernetes
- Standalone Python
### 📚 Comprehensive Documentation
- Quick start (5 minutes)
- Complete user guide (Portuguese)
- Technical architecture
- Production deployment guide
- Developer contribution guide
- Python 3.11+
- PostgreSQL 14+
- Docker & Docker Compose (optional)
```bash
# Clone repository
git clone https://github.com/Hycky/oltp-simulator.git
cd alimentador-bd

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure (copy example and edit credentials)
cp config/.env.example config/.env
# Edit config/.env with your PostgreSQL connection
```

```bash
make init    # Create schema and indexes
make seed    # Populate ~13k initial records
make stream  # Continuous INSERT/UPDATE operations

# In another terminal
make counts           # Show record counts
tail -f logs/app.log  # View live logs
```

```bash
# Start PostgreSQL + PgAdmin
docker-compose up -d postgres

# From host machine, initialize
make init
make seed

# Stream
make stream
```

| Command | Description |
|---|---|
| `make install` | Create venv and install dependencies |
| `make init` | Create schema, indexes, lookup data |
| `make seed` | Populate ~13k initial records |
| `make stream` | Start continuous streaming |
| `make reset` | Drop + recreate + seed all |
| `make counts` | Display table record counts |
| `make fmt` | Format code with Black |
| `make lint` | Check code with Ruff |
| `make clean` | Remove cache and temp files |
```
pacientes (2,000)
├── id, nome, nascimento, cpf, telefone, endereco
├── created_at, updated_at (automatic)
└── PRIMARY KEY, UNIQUE(cpf), INDEX(cpf)

medicos (200)
├── id, nome, crm, especialidade, telefone
└── PRIMARY KEY, UNIQUE(crm), INDEX(crm)

convenios (12)
├── id, nome, cnpj, tipo, cobertura
└── PRIMARY KEY, UNIQUE(cnpj)

pacientes_convenios (2,500+)
├── id, paciente_id → pacientes
├── convenio_id → convenios
└── UNIQUE(paciente_id, convenio_id)

consultas (4,000+)
├── id, paciente_id → pacientes
├── medico_id → medicos
├── data, motivo, status (agendada|realizada|cancelada|faltou)
└── INDEX(paciente_id, medico_id, data)

exames (3,500+)
├── id, paciente_id → pacientes
├── tipo_exame, data, resultado
└── INDEX(paciente_id, data)

internacoes (1,200+)
├── id, paciente_id → pacientes
├── data_entrada, data_saida, motivo, quarto
└── CHECK(data_saida >= data_entrada)
```

- ✅ BIGSERIAL primary keys on all tables
- ✅ Unique constraints on natural keys (CPF, CRM, CNPJ)
- ✅ Cascading foreign keys (ON UPDATE CASCADE, ON DELETE RESTRICT)
- ✅ Automatic timestamps with triggers (`created_at`, `updated_at`)
- ✅ 9 strategic indexes for performance
- ✅ CDC-compatible schema for Debezium
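Uniqueness on the CPF natural key presumes valid-looking values. The standard Brazilian CPF check-digit algorithm (two verification digits from weighted sums mod 11) can be sketched as follows; this is an illustrative helper, and the project's own generator in `data_gen.py` may differ:

```python
def cpf_check_digits(first_nine: str) -> str:
    """Compute the two CPF verification digits for a 9-digit base."""
    digits = [int(c) for c in first_nine]
    for length in (9, 10):
        # Weights run from length+1 down to 2 over the digits so far.
        weights = range(length + 1, 1, -1)
        r = sum(d * w for d, w in zip(digits, weights)) % 11
        digits.append(0 if r < 2 else 11 - r)
    return f"{digits[9]}{digits[10]}"


def format_cpf(first_nine: str) -> str:
    """Format a 9-digit base plus check digits as XXX.XXX.XXX-XX."""
    full = first_nine + cpf_check_digits(first_nine)
    return f"{full[:3]}.{full[3:6]}.{full[6:9]}-{full[9:]}"
```

For example, `format_cpf("529982247")` yields `529.982.247-25`, a structurally valid CPF.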
```env
# PostgreSQL Connection
PG_HOST=localhost
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=postgres
PG_DATABASE=teste_pacientes

# Streaming Configuration
STREAM_INTERVAL_SECONDS=2   # Delay between operations (seconds)
BATCH_SIZE=50               # Records per batch
MAX_JITTER_MS=400           # Random delay variation (ms)

# Seeding Configuration
SEED_PACIENTES=2000
SEED_MEDICOS=200
SEED_CONVENIOS=12
SEED_CONSULTAS=4000
SEED_EXAMES=3500
SEED_INTERNACOES=1200
SEED_PACIENTES_CONVENIOS=2500

# Logging
LOG_LEVEL=INFO              # DEBUG, INFO, WARNING, ERROR
```

```toml
[db]
search_path = "public"
connect_timeout = 10

[stream]
interval_seconds = 2
batch_size = 50
max_jitter_ms = 400
fail_fast_on_critical = true

[logging]
level = "INFO"
rotate_when = "midnight"
backup_count = 7
```

The simulator executes 8 realistic operations:
- `insert_paciente` - Register new patient
- `insert_consulta` - Schedule new appointment
- `insert_exame` - Request new lab test
- `insert_internacao` - Admit patient to hospital
- `update_paciente` - Modify contact info
- `update_consulta` - Change appointment status
- `update_exame` - Record lab results
- `update_internacao` - Discharge patient
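One way to realize the 70/30 split across these eight operations is a weighted two-stage pick. The names below are stand-ins; the real handlers live in `stream.py`:

```python
import random

# Stand-in operation names; in the simulator each maps to a DB handler.
INSERT_OPS = ["insert_paciente", "insert_consulta",
              "insert_exame", "insert_internacao"]
UPDATE_OPS = ["update_paciente", "update_consulta",
              "update_exame", "update_internacao"]


def choose_operation(rng: random.Random) -> str:
    """Pick an INSERT with probability 0.70, otherwise an UPDATE."""
    pool = INSERT_OPS if rng.random() < 0.70 else UPDATE_OPS
    return rng.choice(pool)
```

Over many draws, roughly 70% of chosen operations are inserts, evenly spread within each pool.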
Each operation:
- ✅ Validates foreign keys before execution
- ✅ Commits in batches for performance
- ✅ Logs operation type and counts
- ✅ Handles errors gracefully (continues on non-critical failures)
- ✅ Reconnects automatically with exponential backoff
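The batch-commit pattern can be sketched independently of the database driver. Here `flush` is a stand-in for an executemany-plus-commit call; the helper names are illustrative, not the project's API:

```python
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive batches of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def stream_in_batches(rows: Iterable[T], batch_size: int,
                      flush: Callable[[Sequence[T]], None]) -> int:
    """Write rows in batches; one transaction (one `flush`) per batch."""
    written = 0
    for batch in chunked(rows, batch_size):
        flush(batch)  # e.g. cursor.executemany(...) + conn.commit() in real code
        written += len(batch)
    return written
```

With `BATCH_SIZE=50`, 120 pending rows commit as three transactions of 50, 50, and 20 rows.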
```sql
-- Check for orphaned records (should return 0)
SELECT COUNT(*) FROM consultas
WHERE paciente_id NOT IN (SELECT id FROM pacientes);

-- Verify unique CPFs
SELECT cpf, COUNT(*) FROM pacientes
GROUP BY cpf HAVING COUNT(*) > 1;

-- Check timestamp coherence
SELECT COUNT(*) FROM consultas
WHERE created_at > now();
```

```bash
# Terminal 1 - Stream for 5 minutes
timeout 300 make stream

# Terminal 2 - Check growth every 10 seconds
while true; do make counts; sleep 10; done
```

```bash
# Stress test: high throughput
STREAM_INTERVAL_SECONDS=0 timeout 60 make stream
# Measure: ~200 ops/minute
```

Alimentador-BD generates CDC-compatible changes for Debezium capture.
```json
{
  "name": "postgres-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "10.42.88.67",
    "database.port": 5441,
    "database.user": "app",
    "database.password": "app123",
    "database.dbname": "teste_pacientes",
    "database.server.name": "alimentador-bd",
    "plugin.name": "pgoutput",
    "publication.name": "alimentador_pub",
    "table.include.list": "public.*",
    "publication.autocreate.mode": "filtered",
    "slot.name": "alimentador_slot"
  }
}
```

- ✅ All INSERT operations → `{before: null, after: {patient data}}`
- ✅ All UPDATE operations → `{before: {old data}, after: {new data}}`
- ✅ `updated_at` field automatically populated by triggers
- ✅ Natural keys (CPF, CRM, CNPJ) for deduplication
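A downstream consumer can route Debezium envelopes by the `op` code (`c` = create, `u` = update, `d` = delete, `r` = snapshot read). A minimal sketch of that dispatch, using plain dicts in place of real Kafka records:

```python
from typing import Any, Dict, Optional, Tuple


def route_change(payload: Dict[str, Any]) -> Tuple[str, Optional[dict], Optional[dict]]:
    """Map a Debezium payload to (action, before-image, after-image)."""
    actions = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}
    op = payload["op"]
    return actions.get(op, "unknown"), payload.get("before"), payload.get("after")
```

An INSERT from the simulator arrives with `before` null and `after` populated; an UPDATE carries both images, which is what makes consistency validation downstream possible.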
```json
{
  "schema": {...},
  "payload": {
    "before": null,
    "after": {
      "id": 2045,
      "nome": "João Silva",
      "cpf": "123.456.789-00",
      "created_at": 1705255200000,
      "updated_at": 1705255200000
    },
    "source": {
      "version": "2.4.0.Final",
      "connector": "postgresql",
      "name": "alimentador-bd",
      "ts_ms": 1705255200123,
      "txId": 12345,
      "lsn": 12345678,
      "xmin": null
    },
    "op": "c",
    "ts_ms": 1705255200123,
    "transaction": null
  }
}
```

```
alimentador_bd/
├── config/
│   ├── .env.example        # Template for credentials
│   └── settings.toml       # Configuration
├── scripts/                # Python modules
│   ├── cli.py              # CLI interface (Typer)
│   ├── stream.py           # Streaming engine
│   ├── seed.py             # Initial data population
│   ├── db_init.py          # Database connection
│   ├── data_gen.py         # Data generation (Faker)
│   ├── validators.py       # FK validation cache
│   └── reset.py            # Reset orchestration
├── sql/                    # SQL scripts
│   ├── 01_schema.sql       # Table definitions
│   ├── 02_indexes.sql      # Indexes
│   ├── 03_seed-lookups.sql # Initial data
│   └── 99_drop_all.sql     # Cleanup
├── logs/                   # Runtime logs
├── Makefile                # Build automation
├── Dockerfile              # Container image
├── docker-compose.yml      # Local stack
├── pyproject.toml          # Python config
├── requirements.txt        # Dependencies
├── README.md               # This file
├── GUIDE.md                # User guide (Portuguese)
├── ARCHITECTURE.md         # Technical design
├── DEPLOYMENT.md           # Production setup
├── CONTRIBUTING.md         # Contribution guide
├── CHANGELOG.md            # Version history
└── LICENSE                 # MIT license
```
```bash
# Start PostgreSQL (Docker)
docker-compose up -d postgres

# Initialize from host
make init
make seed

# Stream
make stream
```

```bash
docker build -t alimentador-bd:1.0.0 .
docker run --rm \
  -e PG_HOST=localhost \
  -e PG_USER=app \
  -e PG_PASSWORD=app123 \
  -e PG_DATABASE=teste_pacientes \
  -v ./logs:/app/logs \
  alimentador-bd:1.0.0 \
  python -m scripts.cli stream
```

See DEPLOYMENT.md for detailed guides:
- ✅ AWS EC2 + RDS setup
- ✅ Kubernetes deployment
- ✅ Monitoring and scaling
- ✅ Backup and recovery
- ✅ Security best practices
| Document | Purpose |
|---|---|
| README.md | Overview, quick start, schema (this file) |
| GUIDE.md | Complete user manual in Portuguese 🇧🇷 |
| ARCHITECTURE.md | Technical design and data flow |
| DEPLOYMENT.md | Production setup (AWS, K8s, Docker) |
| CONTRIBUTING.md | How to contribute, dev setup |
| CHANGELOG.md | Version history and roadmap |
```bash
# Check PostgreSQL is running
psql -U postgres -h localhost -c "SELECT 1"

# Verify credentials in config/.env
cat config/.env | grep PG_
```

This is expected and handled gracefully. The simulator skips duplicates and logs them:

```bash
grep "IntegrityError" logs/app.log
```

```bash
# Verify database is initialized
make init
make seed
make counts

# Check logs
tail -20 logs/app.log
```

```bash
# Check disk space and PostgreSQL performance
df -h
psql -U app -d teste_pacientes -c "SELECT * FROM pg_stat_user_tables"

# Reduce batch size if needed
BATCH_SIZE=25 make stream
```

We welcome contributions! See CONTRIBUTING.md for:
- Development setup
- Code style guidelines
- Testing procedures
- Pull request workflow
Quick start for contributors:
```bash
git clone https://github.com/yourusername/alimentador-bd.git
cd alimentador-bd
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
make init && make seed
make stream  # Test it works
```

| Metric | Value |
|---|---|
| Seed time | <2 seconds |
| Initial records | ~13,000 |
| Stream rate | 1 op / 2s |
| Batch size | 50 records |
| Insert ops | 70% |
| Update ops | 30% |
| Throughput | 200+ ops/min |
| Memory usage | ~256 MB |
| CPU usage | Low (<1 core) |
MIT License - See LICENSE for details
Copyright Β© 2025 Henrique Ferreira
- Documentation: See GUIDE.md (Portuguese) or ARCHITECTURE.md (English)
- Issues: Report bugs using GitHub issue templates
- Discussions: Ask questions in GitHub Discussions
- Email: [your-email@example.com]
- Read GUIDE.md (Portuguese user guide) or this README
- Setup with `make install && make init && make seed`
- Run with `make stream`
- Monitor with `make counts` and `tail -f logs/app.log`
- Deploy using DEPLOYMENT.md for production
Version: 1.0.0 | Status: Production Ready ✅ | License: MIT