Merged
204 changes: 204 additions & 0 deletions README.md
@@ -0,0 +1,204 @@
# AlertFlow - InfoDengue ETL Pipeline

Airflow-based ETL system for the InfoDengue epidemiological surveillance project.

## Prerequisites

- Docker & Docker Compose
- GNU Make
- Python 3.14 (for local development)
- Conda/Mamba

## Quick Start

### 1. Environment Setup

Create a `.env` file in the project root from the template, then fill in any variables that are still empty:

```bash
envsubst < env.tpl > .env
```
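GNU `envsubst` is documented as expanding only `$VAR` and `${VAR}` references, so `${VAR:-default}` fallbacks in the template may pass through unexpanded. If that happens, a small renderer can fill them in. The sketch below is a hypothetical helper (`render_template` is not part of this repository) and assumes the template uses only the `${NAME}` and `${NAME:-default}` forms:

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}; NAME follows shell identifier rules.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def render_template(text, env=None):
    """Expand ${NAME} and ${NAME:-default} using `env` (defaults to os.environ)."""
    env = os.environ if env is None else env

    def expand(match):
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        # Shell `:-` semantics: fall back when the variable is unset OR empty.
        if value:
            return value
        return default if default is not None else ""

    return _VAR.sub(expand, text)


# Example line taken from the template; with an empty environment the default wins.
sample = "AIRFLOW__LOGGING__LOGGING_LEVEL=${AIRFLOW__LOGGING__LOGGING_LEVEL:-INFO}\n"
print(render_template(sample, env={}), end="")
```

Variables without a default render as empty strings, which makes it easy to spot the ones that still need a value.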

Generate a Fernet key:
```bash
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
```
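If the `cryptography` package is not installed locally, an equivalent key can be produced with the standard library alone. A Fernet key is 32 random bytes encoded as URL-safe base64; the sketch below relies on that format, and `generate_fernet_key` is an illustrative name rather than a function from this project:

```python
import base64
import os


def generate_fernet_key():
    """Return 32 random bytes as URL-safe base64 text (the Fernet key format)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode()


print(generate_fernet_key())
```

Treat the printed value as a secret and keep it out of version control.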

### 2. Build Containers

```bash
docker compose build
```

The build process:
- Uses multi-stage Dockerfile
- Installs system dependencies (git, postgres-client, build tools)
- Installs Python dependencies via Poetry
- Configures Airflow with Celery executor

### 3. Start Services

```bash
docker compose up -d
```

Services started:
- **postgres** - Metadata database (port 5432)
- **redis** - Message broker (port 6379)
- **airflow-webserver** - API server (port 8080)
- **airflow-scheduler** - DAG scheduler
- **airflow-worker** - Celery worker
- **airflow-dag-processor** - DAG file processor
- **airflow-triggerer** - Deferrable task triggerer

### 4. Initialize Airflow

On first run, the `airflow-init` service:
1. Runs database migrations
2. Creates the admin user
3. Sets up connections and variables

Check initialization:
```bash
docker compose logs airflow-init
```

### 5. Access Airflow UI

Open: http://localhost:${AIRFLOW_PORT}/alertflow

Default credentials (set in `.env`):
- Username: (set via `_AIRFLOW_WWW_USER_USERNAME`)
- Password: (set via `_AIRFLOW_WWW_USER_PASSWORD`)

## Development Workflow

### Project Structure

```
AlertFlow/
├── alertflow/
│   ├── dags/             # Airflow DAG definitions
│   ├── plugins/          # Custom plugins
│   └── logs/             # Task logs
├── docker/
│   ├── compose.yaml      # Main compose file
│   ├── compose-dev.yaml  # Development overrides
│   └── Dockerfile        # Container image definition
├── pyproject.toml        # Python dependencies (Poetry)
├── Makefile              # Build/test automation
└── .env                  # Environment variables (not committed)
```

### Adding DAGs

Place DAG files in `alertflow/dags/`:
- Changes detected automatically (30s interval)
- No container restart needed
- View parsed DAGs at http://localhost:${AIRFLOW_PORT}/alertflow

### Running CLI Commands

```bash
./airflow.sh dags list
```

### Viewing Logs

```bash
# All services
docker compose logs -f

# Specific service
docker compose logs -f airflow-webserver
```

### Restarting Services

```bash
# All services
docker compose restart
# or:
docker compose down && docker compose up -d
```

## Configuration

### Airflow Config

Override any config via environment variables in `docker-compose.yaml`:

```yaml
environment:
  AIRFLOW__CORE__EXECUTOR: CeleryExecutor
  AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
```
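Airflow reads these overrides through the `AIRFLOW__{SECTION}__{KEY}` naming convention, where double underscores separate the parts. As a rough illustration of that mapping (the helper name `split_airflow_env` is hypothetical, not an Airflow API):

```python
def split_airflow_env(name):
    """Split 'AIRFLOW__SECTION__KEY' into ('section', 'key'); return None if it does not match."""
    parts = name.split("__", 2)
    if len(parts) != 3 or parts[0] != "AIRFLOW" or not parts[1] or not parts[2]:
        return None
    # Airflow config sections and keys are lowercase in airflow.cfg.
    return parts[1].lower(), parts[2].lower()


for var in ("AIRFLOW__CORE__EXECUTOR", "AIRFLOW__CORE__LOAD_EXAMPLES", "AIRFLOW_PORT"):
    print(var, "->", split_airflow_env(var))
```

A name such as `AIRFLOW_PORT`, with single underscores, does not follow the convention; it is a plain variable consumed by the compose file rather than an Airflow setting.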

## Common Tasks

### Database Migrations

```bash
docker compose run --rm airflow-cli db migrate
```

### Create Admin User

```bash
docker compose run --rm airflow-cli users create \
  --username admin \
  --firstname Admin \
  --lastname User \
  --role Admin \
  --email admin@example.com \
  --password admin

## Troubleshooting

### Port Already in Use

```bash
# Change port in .env
echo "AIRFLOW_PORT=8081" >> .env
docker compose -f docker-compose.yaml down
docker compose -f docker-compose.yaml up -d
```
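Before choosing a replacement port, it can help to confirm which ports already have a listener. A minimal sketch using only the standard library (`port_in_use` is an illustrative helper, and it only checks the local loopback address):

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


for candidate in (8080, 8081):
    print(candidate, "in use" if port_in_use(candidate) else "free")
```

A port reported as free here can still be claimed by another process before the containers start, so treat the result as a hint rather than a guarantee.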

### Database Connection Issues

```bash
# Check postgres health
docker compose exec postgres pg_isready -U airflow

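# Reset database (WARNING: destroys data)
# `down -v` already removes the named volumes declared in the compose file
docker compose down -v
# Only needed if the volume survived (e.g. it was declared as external)
docker volume rm alertflow_postgres-db-volume 2>/dev/null || true
docker compose up -d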
```

### View Container Resource Usage

```bash
docker stats
```

## Make Targets

| Target | Description |
|--------|-------------|
| `make build` | Build all container images |
| `make up` | Start all services in background |
| `make down` | Stop all services |
| `make restart` | Restart all services |
| `make logs` | Tail logs from all services |

## InfoDengue Specific

This setup is tailored for the InfoDengue ETL pipeline:
- Processes epidemiological data for dengue, Zika, and chikungunya
- Integrates with climate data APIs (Copernicus)
- Uses geospatial analysis for disease mapping
- Outputs feed the InfoDengue dashboard

For more details on the ETL pipeline, see `alertflow/dags/`.
8 changes: 4 additions & 4 deletions docker-compose.yaml
@@ -48,12 +48,12 @@ x-airflow-common:
  AIRFLOW__CORE__AUTH_MANAGER: airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager
  AIRFLOW__CORE__LOAD_EXAMPLES: false
  AIRFLOW__CORE__DEFAULT_TIMEZONE: America/Sao_Paulo
- AIRFLOW__CORE__INTERNAL_API_URL: http://localhost:${AIRFLOW_PORT}
+ AIRFLOW__CORE__INTERNAL_API_URL: http://alertflow_webserver:${AIRFLOW_PORT}/alertflow
  AIRFLOW__CORE__INTERNAL_API_SECRET_KEY: ${AIRFLOW__CORE__INTERNAL_API_SECRET_KEY}
- AIRFLOW__CORE__EXECUTION_API_SERVER_URL: 'http://airflow-apiserver:${AIRFLOW_PORT}/execution/'
+ AIRFLOW__CORE__EXECUTION_API_SERVER_URL: http://alertflow_webserver:${AIRFLOW_PORT}/alertflow/execution
  AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: ${AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION:-true}
- AIRFLOW__CLI__ENDPOINT_URL: http://localhost:${AIRFLOW_PORT}/alertflow
- AIRFLOW__WEBSERVER__BASE_URL: http://localhost:${AIRFLOW_PORT}/alertflow
+ AIRFLOW__CLI__ENDPOINT_URL: http://alertflow_webserver:${AIRFLOW_PORT}/alertflow
+ AIRFLOW__WEBSERVER__BASE_URL: ${AIRFLOW__WEBSERVER__BASE_URL}
  AIRFLOW__WEBSERVER__SECRET_KEY: ${AIRFLOW__WEBSERVER__SECRET_KEY}
  AIRFLOW__WEBSERVER__WEB_SERVER_HOST: 0.0.0.0
  AIRFLOW__WEBSERVER__WEB_SERVER_PORT: ${AIRFLOW_PORT}
1 change: 1 addition & 0 deletions env.tpl
@@ -39,6 +39,7 @@ AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION=${AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_C
  AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK=${AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK:-false}
  AIRFLOW__API_AUTH__JWT_SECRET=${AIRFLOW__API_AUTH__JWT_SECRET}
  AIRFLOW__API_AUTH__JWT_ISSUER=${AIRFLOW__API_AUTH__JWT_ISSUER:-airflow}
+ AIRFLOW__WEBSERVER__BASE_URL=${AIRFLOW__WEBSERVER__BASE_URL:-http://localhost:8081/alertflow}
  AIRFLOW__WEBSERVER__SECRET_KEY=${AIRFLOW__WEBSERVER__SECRET_KEY}
  AIRFLOW__LOGGING__LOGGING_LEVEL=${AIRFLOW__LOGGING__LOGGING_LEVEL:-INFO}
  AIRFLOW__SENTRY__SENTRY_ON=${AIRFLOW__SENTRY__SENTRY_ON:-true}