Home

🧠 MLOps Platform Wiki

Welcome to the MLOps Platform wiki – your central guide to understanding, deploying, and extending a production‑grade machine learning infrastructure.

📖 What is this platform?

This repository implements a full MLOps stack that automates the entire machine learning lifecycle:

Feature Engineering using Feast (offline & online store)
Experiment Tracking & Model Registry with MLflow
Training Pipeline Orchestration via Kubeflow Pipelines
Advanced Drift Detection (KS test, Jensen‑Shannon divergence, PCA) to trigger automatic retraining
Model Serving with KServe and GPU‑optimised transformers
Infrastructure as Code (Terraform) on AWS EKS
CI/CD with GitHub Actions
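To show what the drift tests listed above measure, here is a minimal, dependency-free sketch of two of them: the two-sample Kolmogorov-Smirnov statistic and the Jensen-Shannon divergence between binned histograms. It is an illustration, not the platform's implementation. The function names, the 10-bin histogram, the 0.1 JS threshold, and the large-sample KS critical value at alpha = 0.05 are all assumptions. The PCA-based check is not covered.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # The supremum of |F_ref - F_cur| is reached at one of the observed values.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def _histogram(values: Sequence[float], lo: float, width: float, bins: int) -> List[float]:
    """Normalised histogram over shared bin edges (the maximum value goes in the last bin)."""
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]


def js_divergence(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Jensen-Shannon divergence in base 2 (so 0 <= JS <= 1) between two binned samples."""
    lo = min(min(reference), min(current))
    hi = max(max(reference), max(current))
    width = (hi - lo) / bins or 1.0  # guard against constant data
    p = _histogram(reference, lo, width, bins)
    q = _histogram(current, lo, width, bins)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: List[float], b: List[float]) -> float:
        # Terms with a_i == 0 contribute 0; m_i > 0 whenever a_i > 0, so no division by zero.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def should_retrain(reference: Sequence[float], current: Sequence[float],
                   alpha: float = 0.05, js_threshold: float = 0.1) -> bool:
    """Flag drift if KS exceeds its large-sample critical value or JS exceeds a fixed threshold."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    ks_critical = c_alpha * math.sqrt((n + m) / (n * m))
    return (ks_statistic(reference, current) > ks_critical
            or js_divergence(reference, current) > js_threshold)


if __name__ == "__main__":
    baseline = [i / 10 for i in range(100)]
    shifted = [x + 5 for x in baseline]
    print(should_retrain(baseline, baseline))  # False
    print(should_retrain(baseline, shifted))   # True
```

In a pipeline, a check like this would run per feature against a reference window. A positive result would then trigger the training pipeline, which corresponds to the "Drift Detection → Training Pipeline" edge in the diagram below.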

The provided Terraform targets AWS EKS, but the Kubernetes‑based components are designed to be cloud‑agnostic, scalable, and cost‑efficient, helping teams move models from experimentation to production with confidence.

🗺️ Architecture at a Glance

```mermaid
graph TD
    subgraph "Data"
        A[Data Lake / S3]
        B[Streaming Events]
        C[Feast Feature Store]
    end

    subgraph "ML Pipelines (Kubeflow)"
        D[Training Pipeline]
        E[Drift Detection]
    end

    subgraph "MLflow"
        F[Experiment Tracking]
        G[Model Registry]
    end

    subgraph "Serving (KServe)"
        H[InferenceService]
        I[Transformer]
        J[Predictor]
    end

    subgraph "Infrastructure (EKS)"
        K[Kubernetes]
        L[GPU Nodes]
        M[Prometheus Monitoring]
    end

    A --> C
    B --> C
    C --> D
    D --> F
    F --> G
    G --> H
    H --> I --> J
    E --> D
    K --> L
    M --> K
```

For a detailed walk‑through, see the Architecture Deep‑Dive page.
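In the Serving subgraph, clients call the InferenceService. Its transformer preprocesses the payload before the predictor runs the model. The sketch below builds such a call using only the standard library. It assumes the KServe V1 inference protocol (`POST /v1/models/<name>:predict` with an `"instances"` list). The base URL, model name, and service hostname are hypothetical. A deployment using the V2 protocol would call `/v2/models/<name>/infer` instead.

```python
import json
import urllib.request
from typing import Any, Optional, Sequence


def build_predict_request(
    base_url: str,
    model_name: str,
    instances: Sequence[Any],
    service_hostname: Optional[str] = None,
) -> urllib.request.Request:
    """Build a KServe V1-protocol predict request (POST /v1/models/<name>:predict)."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": list(instances)}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if service_hostname:
        # Behind the ingress gateway, the Host header selects which InferenceService receives the call.
        headers["Host"] = service_hostname
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def predict(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request; in the V1 protocol the response body carries a "predictions" list."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["predictions"]
```

For example, you could call `predict(build_predict_request("http://<ingress-host>", "my-model", [[1.0, 2.0]], "my-model.default.example.com"))`. Here `my-model` and the hostname are placeholders. Keeping request construction separate from sending makes the payload easy to inspect or test without a live cluster.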

🚀 Quick Start

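Clone the repository

```
git clone https://github.com/Awrsha/mlops-platform.git
cd mlops-platform
```

Provision infrastructure with Terraform

```
cd terraform
terraform init && terraform apply
```

Deploy cluster services (KServe, MLflow, Feast, Kubeflow)

Point kubectl at the new EKS cluster (for example with aws eks update-kubeconfig --region <region> --name <cluster-name>), then apply the manifests from the repository root:
```
cd ..
kubectl apply -f kubernetes/
```
Trigger a training pipeline
Submit the compiled pipeline to your Kubeflow Pipelines endpoint, or let the CI/CD workflow submit it automatically.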
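Submitting the compiled pipeline by hand can look roughly like the sketch below. It uses the Kubeflow Pipelines SDK's `kfp.Client` and `create_run_from_pipeline_package`. The helper name, package path, endpoint, and pipeline arguments are all hypothetical. The sketch also assumes the package was produced earlier with `kfp.compiler.Compiler().compile(...)`.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def submit_pipeline(
    package_path: str,
    host: str,
    arguments: Optional[Dict[str, Any]] = None,
    experiment_name: Optional[str] = None,
):
    """Hypothetical helper: submit a compiled pipeline package and return the run result."""
    path = Path(package_path)
    if not path.is_file():
        # Fail fast before contacting the cluster.
        raise FileNotFoundError(f"compiled pipeline package not found: {path}")
    # Imported lazily so this helper can be imported without the kfp SDK installed.
    import kfp

    client = kfp.Client(host=host)
    return client.create_run_from_pipeline_package(
        str(path),
        arguments=arguments or {},
        experiment_name=experiment_name,
    )
```

As a usage example, with the Pipelines UI port-forwarded locally (for instance `kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80`), you would call `submit_pipeline("pipeline.yaml", host="http://localhost:8080", arguments={"learning_rate": 0.01})`. Both `pipeline.yaml` and `learning_rate` are placeholders for whatever your pipeline actually defines.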

For a more detailed setup, read the Installation Guide.

📚 Wiki Pages

Installation Guide – Step‑by‑step instructions to deploy the platform.
Architecture Deep-Dive – Component design, data flow, and scaling strategies.
Drift Detection – How statistical tests trigger automatic retraining.
CI/CD Pipelines – GitHub Actions workflows explained.
Model Serving – KServe configuration, transformers, and GPU optimisation.
Monitoring & Alerting – Prometheus rules, Grafana dashboards, and alerts.
Troubleshooting – Common issues and solutions.

🤝 Contributing

We welcome contributions! See the main README or the Contributing Guide for details.

❓ Need Help?

Open a GitHub Issue or reach out to the team in GitHub Discussions.

Happy building! 🚀
