Real-time surface defect detection for manufacturing quality control. DefectScope inspects a bottle in about 40ms, roughly 100x faster than manual inspection, and reports zero false positives on the 83 production samples evaluated.
Try the system in action: https://defectscope.azurewebsites.net
Upload a bottle image or select from sample images to see live predictions with visual heatmaps explaining the model's decision.
- Problem Statement
- Technical Approach
- Key Results
- Getting Started
- Usage
- Architecture
- Performance
- Engineering Decisions
- Limitations
- Development
Manufacturing quality control relies on human visual inspection. A trained inspector processes roughly 12 bottles per minute, examining each for surface defects like cracks, scratches, or printing imperfections. The process is slow, labor-intensive, and prone to error—fatigue reduces detection rates after sustained inspection.
DefectScope automates this process using a dual-model approach. Two independent neural networks analyze each bottle image and cross-verify their predictions. If both models agree, their shared verdict (good or defective) is accepted without human review. If they disagree, the image is flagged for human review. This redundancy is designed to suppress false alarms while still catching genuine defects.
The system combines supervised classification and unsupervised anomaly detection:
The first model is a DenseNet-121 convolutional neural network fine-tuned on the bottle category of the MVTec Anomaly Detection dataset. Its backbone reuses visual patterns learned from 1.3 million ImageNet images, and a custom classification head outputs the probability that the bottle is defective.
- Training data: ~300 bottle images (200 good, ~100 defective)
- Threshold: 0.3363 (calibrated to the 95th percentile of defect probabilities on good samples)
- Latency: ~20ms per image
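The percentile calibration mentioned above can be sketched as follows. This is a minimal illustration, not the repository's `threshold_search` implementation: the helper name `calibrate_threshold` and the sample scores are hypothetical.

```python
import statistics

def calibrate_threshold(good_scores, percentile=95):
    """Return the given percentile of defect probabilities predicted for known-good images.

    Hypothetical helper: scores above the returned value are treated as defective.
    """
    if len(good_scores) < 2:
        raise ValueError("need at least two good-sample scores")
    # quantiles(n=100) returns the 1st..99th percentile cut points;
    # method="inclusive" interpolates linearly between observed values.
    cuts = statistics.quantiles(good_scores, n=100, method="inclusive")
    return cuts[percentile - 1]

# Made-up classifier outputs for good bottles (illustrative only).
good_scores = [0.02, 0.05, 0.08, 0.11, 0.15, 0.21, 0.27, 0.30, 0.33, 0.40]
threshold = calibrate_threshold(good_scores)

# A new image scoring above the calibrated threshold is flagged by the classifier.
is_defective = 0.62 > threshold
```

Calibrating on good samples ties the threshold to the score distribution actually observed, rather than to a fixed 0.5 boundary.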
The second model is an unsupervised convolutional autoencoder trained only on images of good bottles. It learns to reconstruct normal surfaces, so when it is shown a defect the reconstruction error spikes, flagging anomalies the classifier might miss.
- Training data: Good bottles only (~200 images)
- Architecture: Encoder → bottleneck → Decoder
- Anomaly threshold: 0.003 (tuned for zero false positives)
- Latency: ~10ms per image
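As a rough sketch of how a reconstruction error becomes an anomaly flag: the example below assumes the score is a mean squared error over pixel values, which the text does not state explicitly, and uses toy flattened "images". The function names are hypothetical.

```python
ANOMALY_THRESHOLD = 0.003  # value quoted above

def reconstruction_error(original, reconstructed):
    """Mean squared error between two equally sized sequences of pixel values.

    Assumption: MSE stands in for whatever error metric the real autoencoder uses.
    """
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)

def is_anomalous(original, reconstructed, threshold=ANOMALY_THRESHOLD):
    """Flag the image when its reconstruction error exceeds the threshold."""
    return reconstruction_error(original, reconstructed) > threshold

# Toy flattened images with pixel values in [0, 1] (illustrative only).
image = [0.50, 0.52, 0.48, 0.51]
good_recon = [0.51, 0.52, 0.47, 0.50]  # close reconstruction of a normal surface
bad_recon = [0.50, 0.52, 0.48, 0.81]   # one region reconstructed poorly

good_flag = is_anomalous(image, good_recon)
bad_flag = is_anomalous(image, bad_recon)
```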
- Both models agree → No human review needed
- Models disagree → Flag for human confirmation
- Both flag a defect → High confidence, forward as defective
This approach catches:
- Pattern-based defects the classifier learned (scratches, dents)
- Novel anomalies the autoencoder detects (unexpected variations)
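The cross-verification rules above can be sketched as a small decision function. The thresholds are the ones quoted earlier; the `Verdict` type, the function name, and the choice to report the classifier's label when the models disagree are assumptions made for illustration.

```python
from dataclasses import dataclass

CNN_THRESHOLD = 0.3363     # classifier threshold quoted above
ANOMALY_THRESHOLD = 0.003  # autoencoder threshold quoted above

@dataclass
class Verdict:
    prediction: str     # "good" or "defective"
    needs_review: bool  # True when the two models disagree

def cross_check(defect_probability, anomaly_score):
    """Combine the two model outputs into one verdict (hypothetical helper)."""
    cnn_flags = defect_probability > CNN_THRESHOLD
    ae_flags = anomaly_score > ANOMALY_THRESHOLD
    if cnn_flags and ae_flags:
        return Verdict("defective", needs_review=False)
    if not cnn_flags and not ae_flags:
        return Verdict("good", needs_review=False)
    # Disagreement: report the classifier's label (an assumption) and ask a human.
    return Verdict("defective" if cnn_flags else "good", needs_review=True)

both_good = cross_check(0.02, 0.0012)
both_bad = cross_check(0.91, 0.0200)
disagree = cross_check(0.10, 0.0200)
```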
Grad-CAM (Gradient-weighted Class Activation Mapping) generates heatmaps showing which image regions influenced the classification. Warm colors indicate regions that contributed most to a defect prediction. This provides traceability, which is essential for manufacturing audits.
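The core Grad-CAM computation is short enough to sketch with NumPy. The example assumes the feature-map activations of the last convolutional layer and the gradients of the defect score with respect to them have already been captured (in a real pipeline, typically via framework hooks); it is not the code in `utils/gradcam.py`.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a normalized Grad-CAM map.

    activations, gradients: arrays of shape (channels, height, width).
    Returns an array of shape (height, width) with values in [0, 1].
    """
    # Channel weights: global average of the gradients over the spatial axes.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)
    # Keep only regions with a positive influence on the score.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy input: channel 0 supports the defect score, channel 1 opposes it.
activations = np.zeros((2, 3, 3))
activations[0, 1, 1] = 2.0
activations[1, 0, 0] = 1.0
gradients = np.zeros((2, 3, 3))
gradients[0] = 1.0
gradients[1] = -1.0

heatmap = grad_cam(activations, gradients)
```

In practice the low-resolution map is upsampled to the input image size and overlaid as a color map.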
Evaluated on 83 production samples:
| Metric | Value |
|---|---|
| Detection Rate (Recall) | 100% |
| False Positive Rate | 0% |
| Precision | 93% |
| F1 Score | 0.96 |
| Latency per Bottle | 40ms |
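The F1 score in the table follows from the reported precision and recall through the harmonic mean. A quick check, using only the values in the table:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(precision=0.93, recall=1.00)
print(round(f1, 2))  # 0.96
```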
Performance comparison with baseline:
| Method | Speed | Accuracy |
|---|---|---|
| Manual Inspection | 12 bottles/min | ~92% |
| DefectScope | ~1,500 bottles/min | 96% |
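The throughput figure follows from the per-bottle latency, assuming bottles are processed one at a time with no batching or pipelining:

```python
def bottles_per_minute(latency_ms):
    """Sequential throughput implied by a per-bottle latency in milliseconds."""
    return 60_000 / latency_ms

automated = bottles_per_minute(40)  # 1500.0 bottles/min at 40ms each
manual = 12                         # bottles/min for a trained inspector
speedup = automated / manual        # 125.0
```

By this arithmetic the headline "100x" figure is slightly conservative.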
- Python 3.11+
- pip or conda
Clone the repository and install dependencies:
git clone https://github.com/pranjalts07/defectscope.git
cd defectscope
python -m venv env
source env/bin/activate # On Windows: env\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
Download the MVTec Dataset
python scripts/download_mvtec.py --data_dir data/raw
This downloads the MVTec Anomaly Detection dataset (~8 GB). The "bottle" category is used for training and evaluation.
Train the Models (Optional)
The repository includes pre-trained checkpoints. To retrain:
python -m training.train_cnn --category bottle --epochs 30
python -m training.train_autoencoder --category bottle --epochs 50
python -m evaluation.threshold_search --category bottle
Start the Web Server
uvicorn api.main:app --reload --port 8000
Open http://localhost:8000 in your browser.
docker-compose up --build
The service runs on port 8000 and is ready for production.
Upload a bottle image via drag-and-drop or file picker. Results include:
- Classification: Good or Defective
- Confidence: Model certainty (0-100%)
- Latency: Processing time in milliseconds
- Grad-CAM Heatmap: Visual explanation of the decision
- Anomaly Score: Reconstruction error from the autoencoder
curl -X POST http://localhost:8000/predict \
  -F "file=@bottle.jpg"
Response:
{
  "prediction": "good",
  "confidence": 0.976,
  "anomaly_score": 0.0012,
  "anomaly_threshold": 0.003,
  "needs_review": false,
  "latency_ms": 38.2
}
Full API documentation available at http://localhost:8000/docs (Swagger UI).
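A client can turn the JSON response shown above into a typed object before acting on it. The sketch below uses only the fields that appear in the sample response; the `Prediction` class and `parse_prediction` helper are hypothetical and not part of the project's `api/schemas.py`.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class Prediction:
    prediction: str
    confidence: float
    anomaly_score: float
    anomaly_threshold: float
    needs_review: bool
    latency_ms: float

def parse_prediction(body):
    """Parse a /predict response body; raises KeyError if a field is missing."""
    data = json.loads(body)
    return Prediction(**{f.name: data[f.name] for f in fields(Prediction)})

# The sample response from above.
sample = """{
  "prediction": "good",
  "confidence": 0.976,
  "anomaly_score": 0.0012,
  "anomaly_threshold": 0.003,
  "needs_review": false,
  "latency_ms": 38.2
}"""

result = parse_prediction(sample)
route_to_human = result.needs_review
```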
python -m inference.predict --image path/to/bottle.jpg --config configs/config.yaml
defectscope/
├── api/ # FastAPI web server and REST endpoints
│ ├── main.py # Application server, request handlers
│ ├── schemas.py # Pydantic models for request/response validation
│ └── static/ # Web UI (HTML, CSS, JavaScript)
├── models/ # Neural network implementations
│ ├── cnn_classifier.py # DenseNet-121 for classification
│ └── autoencoder.py # Convolutional autoencoder for anomaly detection
├── inference/ # Production prediction pipeline
│ └── predict.py # DefectPredictor class, Grad-CAM generation
├── training/ # Model training scripts
│ ├── train_cnn.py # CNN training loop with validation
│ └── train_autoencoder.py # Autoencoder training
├── evaluation/ # Metrics and threshold tuning
│ ├── evaluate.py # ROC curves, confusion matrices
│ ├── benchmark.py # Latency measurements
│ ├── threshold_search.py # Optimal threshold search
│ └── metrics.py # Evaluation utilities
├── utils/ # Shared utilities
│ ├── transforms.py # Image preprocessing
│ ├── gradcam.py # Grad-CAM implementation
│ ├── metrics.py # Evaluation functions
│ └── dataset.py # Data loading utilities
├── tests/ # Unit and integration tests
├── configs/ # Configuration files
│ └── config.yaml # Model paths, thresholds
├── scripts/ # Utility scripts
│ ├── download_mvtec.py # Dataset download
│ └── export_onnx.py # ONNX export for edge deployment
├── requirements.txt # Python dependencies
├── Dockerfile # Container specification
└── docker-compose.yml # Multi-container orchestration
Latency breakdown on M1 MacBook Pro (CPU mode):
| Component | Time |
|---|---|
| Image loading | 2ms |
| Preprocessing | 5ms |
| CNN inference | 20ms |
| Autoencoder inference | 10ms |
| Grad-CAM generation | 3ms |
| Total | 40ms |
On GPU hardware (NVIDIA A100), total latency drops to ~15ms. Throughput: ~1,500 bottles/minute on a single GPU.
- Development: M1 MacBook Pro, 16GB unified memory (CPU)
- Inference: NVIDIA A100 GPU (production deployment)
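Latency figures like those above are typically gathered by timing repeated calls after a short warm-up. The following is a generic sketch of that measurement loop, not the contents of `evaluation/benchmark.py`; the `benchmark` function and the stand-in workload are hypothetical.

```python
import statistics
import time

def benchmark(fn, n=100, warmup=5):
    """Time n calls to fn and return summary statistics in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not measured
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "mean_ms": statistics.fmean(samples),
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[min(n - 1, int(0.95 * n))],
    }

# Stand-in workload; a real run would call the prediction pipeline instead.
stats = benchmark(lambda: sum(range(1000)), n=50)
```

Reporting a high percentile alongside the mean shows whether occasional slow calls would stall a production line.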
A single DenseNet classifier shows good precision/recall on the test set but struggles with production edge cases—novel defect types not well-represented in training data. The autoencoder acts as a safety net:
- CNN: Catches pattern-based defects (learned from examples)
- Autoencoder: Catches novel anomalies (deviations from "normal")
- Cross-check: Eliminates overconfident false positives
Rather than using the model's default 0.5 decision boundary, we calibrate per-model thresholds to the production data distribution:
- CNN threshold: 0.3363 (p95 of good sample probabilities)
- Autoencoder threshold: 0.003 (tuned for zero false positives)
This prevents production line stoppages caused by false alarms while maintaining 100% defect detection.
Visual explanations matter in manufacturing:
- Auditors and operators need to understand why a bottle was flagged
- Grad-CAM shows attention regions without requiring model retraining
- Heatmaps help identify whether the model is looking at relevant features
If the autoencoder model fails to load, the CNN continues operating independently. The system reports degraded mode but remains functional. This prevents total service failure during model updates.
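One way to structure this fallback is to treat the classifier as required and the autoencoder as optional at load time. The sketch below is an assumption about how such behavior could be implemented; the `FallbackPredictor` class and its loader arguments are hypothetical and do not describe the project's `DefectPredictor`.

```python
class FallbackPredictor:
    """Runs the classifier alone when the autoencoder cannot be loaded."""

    def __init__(self, load_cnn, load_autoencoder):
        self.cnn = load_cnn()  # required: a failure here propagates
        try:
            self.autoencoder = load_autoencoder()
            self.degraded_reason = None
        except Exception as exc:
            # Optional model: record the failure and keep serving.
            self.autoencoder = None
            self.degraded_reason = str(exc)

    @property
    def degraded(self):
        return self.autoencoder is None

    def predict(self, image):
        result = {"defect_probability": self.cnn(image), "degraded": self.degraded}
        if self.autoencoder is not None:
            result["anomaly_score"] = self.autoencoder(image)
        return result

def broken_loader():
    raise FileNotFoundError("autoencoder checkpoint missing")

# Stub models standing in for the real networks.
healthy = FallbackPredictor(lambda: (lambda img: 0.1), lambda: (lambda img: 0.001))
fallback = FallbackPredictor(lambda: (lambda img: 0.1), broken_loader)

healthy_result = healthy.predict("bottle.jpg")
fallback_result = fallback.predict("bottle.jpg")
```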
- Training data domain: Model trained on overhead bottle photos under controlled lighting. Performance degrades on unusual angles, outdoor lighting, or different bottle shapes.
- Category-specific: Currently trained for bottles only. Multi-category detection requires retraining with mixed datasets.
- Novelty detection ceiling: The autoencoder catches ~86% of novel defects—some truly out-of-distribution anomalies will slip through.
- Labeling requirements: Effective performance requires ~200-300 labeled images per category.
pytest tests/ -v --cov=.
Tests cover:
- Model forward passes with mock inputs
- API endpoint behavior with various payloads
- Dataset loading and preprocessing
- Threshold calibration logic
python -m evaluation.evaluate --category bottle
python -m evaluation.benchmark --n 100
Generates:
- ROC/PR curves
- Confusion matrices
- Latency distributions
python scripts/export_onnx.py --model-path models/densenet.pth --output models/densenet.onnx
Creates ONNX-format models for deployment on edge devices without PyTorch dependency.
- ONNX export for edge/embedded deployment
- Multi-category detection (bottles, caps, labels, packaging)
- Pixel-level defect localization (segment the defective region of the bottle)
- Active learning: Online model improvement from production corrections
- Explainable failure modes: Confidence intervals around predictions
- MVTec Anomaly Detection Dataset
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
- Densely Connected Convolutional Networks
- Unsupervised Anomaly Segmentation with Convolutional Autoencoders
MIT License – See LICENSE for details.
