Real-time fault intelligence for HVAC actuators. Flox ingests live telemetry from Belimo actuators — torque, motor position, temperature, signal quality — runs continuous fault classification, and surfaces actionable insights through a facility dashboard and a conversational AI operations agent.
Belimo actuators collect rich internal signals during operation — torque demand, motor position feedback, internal temperature, and control signal quality — but this data is rarely used beyond basic device status.
Flox closes that gap:
- Telemetry ingest — actuator signals are ingested in real time and persisted with full history per variable per device.
- Fault classification — a Celery worker runs a continuous diagnosis cycle. Heuristic rules detect known failure modes (stiction, high-torque anomaly, temperature drift, signal loss). An optional ML inference server extends this with trained classifiers.
- Fault propagation — device-level faults roll up through the node hierarchy (actuator → AHU → plant), so system-level health reflects the worst downstream condition.
- Facility dashboard — a live map view shows zone health, device positions, and active faults across the building. An issues panel lists all open faults ranked by severity with diagnosis context and recommended actions.
- Operations agent — a Claude-powered agent answers natural-language questions about faults, runs diagnosis on demand, retrieves fault history, and can execute corrective actions with explicit operator approval before any write is committed.
| Kind | Severity |
|---|---|
stiction_suspected |
Critical |
high_torque_anomaly |
Warning |
temperature_drift |
Warning |
signal_loss |
Critical |
weak_signal |
Warning |
ML-based classifiers (when enabled) extend coverage beyond rule thresholds.
cp .env.example .env # set NAME, ANTHROPIC_API_KEY, and database credentials
make init # create venv, sync Python deps, link env files
make up # start postgres, redis, fastapi backend, classifier worker
make dev # start Vite frontend at http://localhost:3000The frontend connects to the FastAPI backend at /api/status. If the backend is not running the dashboard will show a connection error.
make doctor # verify toolchain
make help # list all targetsflowchart TD
A([Actuator]) -->|telemetry stream| B[POST /api/ingest\nFastAPI]
B --> C[(PostgreSQL\ntelemetry history\n+ latest values)]
C --> D[Celery beat worker\nrun_diagnosis_cycle]
D -->|heuristic + ML classifier| E{Fault?}
E -->|yes| F[Attach / update fault\nset node status]
E -->|no| G[Clear fault\nmark healthy]
F --> H[Propagate status\nup node hierarchy]
G --> H
H --> C
C --> I[GET /api/status\nFastAPI]
I --> J[React dashboard\nmap · issues · telemetry charts]
mindmap
root((Flox))
apps
webapp
Facility map
Issues dashboard
Device telemetry charts
AI agent panel
backend
fastapi
Telemetry ingest
Status endpoint
Fault resolution
Agent chat
Document upload
worker
Celery beat
Classification loop
ml
models
Architecture
Training loop
data
ETL pipeline
Processed artifacts
inference.py
ML inference server
configs
Hyperparameter YAML
shacklib
diagnosis_engine.py
Fault classification
State management
Payload builders
agent.py
Claude integration
backend_state.py
Postgres read/write
node_simulator.py
Actuator signal simulator
mock_facility.py
Seed data
logger.py
Structured JSON logging
database
SQL init files
docker
Per-service Dockerfiles
scripts
Seed and migration helpers
The agent is powered by Claude and has access to platform tools: querying live device status, fetching fault history for a specific node, running the diagnosis cycle, and resolving faults.
Destructive actions require explicit operator approval before execution. The frontend surfaces an approval prompt; the agent does not proceed until the operator confirms.
# The agent is exposed at POST /api/agent/chat
# The frontend sends the full conversation history on each turn.
# Tool events are returned alongside the reply so the UI can display what ran.To interact via the UI, open the Operations Agent panel and type a question. Use @NODE_ID to attach a specific device to your message. Quick prompts are generated automatically from the current top fault.
Example prompts:
Give me a live system overview and top active faults.Why is node BEL-VLV-003 reporting stiction_suspected?Show fault history for node BEL-AHU-001.Run diagnosis for BEL-VLV-003 now.Resolve fault fault-a3b2c1d0 with note "validated on site".
Copy .env.example to .env and fill in the values relevant to your deployment.
| Variable | Description |
|---|---|
NAME |
Project name, used as Docker container prefix |
ANTHROPIC_API_KEY |
Required for the operations agent |
BACKEND_PORT |
FastAPI listen port (default: 5000) |
POSTGRES_* |
Database connection settings |
REDIS_PORT |
Redis port |
ML_URL |
URL of the ML inference service |
CLASSIFIER_INTERVAL_SECONDS |
How often the classifier runs (default: 5) |
BACKEND_STARTUP_SEED_MODE |
Seed mode on startup: always or once |
VITE_REQUIRE_AUTH |
Enable Supabase session auth on the frontend |
LOKI_PORT / GRAFANA_PORT |
Enable remote log aggregation |
| Target | Description |
|---|---|
make init |
First-time setup: venv, deps, env linking |
make dev |
Start Vite frontend |
make up |
Start core services (postgres, redis, backend, classifier, worker) |
make down |
Stop all services |
make run.backend |
Start FastAPI backend only |
make run.worker |
Start Celery worker only |
make run.ml |
Start ML inference server |
make lift.ml |
Core services + ML inference |
make lift.sim |
Core services + node simulator |
make lift.logging |
Add Loki + Grafana log stack |
make lift.mlflow |
Add MLflow experiment tracking |
make etl |
Run ETL pipeline |
make train |
Run model training |
make fmt |
Format Python with black |
make lint |
Lint with ruff |
make type |
Type-check with mypy |
make test |
Run pytest |
make clean |
Remove caches and build artifacts |
make doctor |
Verify toolchain (Python, uv, Bun, Docker) |
| Profile | Services | Command |
|---|---|---|
| (default) | postgres, redis, backend, classifier, worker | make up |
ml |
+ ML inference | make lift.ml |
sim |
+ node simulator | make lift.sim |
minio |
+ MinIO object storage | make lift.minio |
tensorboard |
+ TensorBoard | make lift.tensorboard |
mlflow |
+ MLflow | make lift.mlflow |
logging |
+ Loki + Grafana | make lift.logging |
database |
+ MongoDB | make lift.database |
Application state is stored in normalized Postgres tables. A legacy JSONB snapshot in backend_state (id = 1) is maintained for backward compatibility. read_state() reconstructs the full JSON contract; update_state() writes both representations atomically.
erDiagram
backend_nodes {
TEXT id PK
TEXT label
TEXT type
TEXT status
DOUBLE position
TEXT latest_fault_id
TEXT updated_at
}
backend_node_latest_telemetry {
TEXT node_id FK
TEXT metric
JSONB value
}
backend_node_history {
TEXT node_id FK
TEXT metric
INTEGER ordinal
TEXT point_time
JSONB value
}
backend_faults {
TEXT id PK
TEXT node_id
TEXT state
TEXT kind
DOUBLE probability
TEXT summary
TEXT recommended_action
TEXT opened_at
TEXT resolved_by
TEXT note
}
backend_catalog_device_templates {
TEXT id PK
TEXT name
TEXT model
TEXT type
TEXT zone_id
}
backend_agent_audit_log {
BIGINT ordinal PK
JSONB payload
}
backend_nodes ||--o{ backend_node_latest_telemetry : "latest telemetry"
backend_nodes ||--o{ backend_node_history : "history"
backend_nodes ||--o{ backend_faults : "faults"
backend_catalog_device_templates ||--o| backend_catalog_fault_meta : "impact metadata"
backend_agent_meta ||--o{ backend_agent_audit_log : "audit log"
