This document tracks key technical and architectural decisions made during the development of the AI-Powered Alcohol Label Verification App prototype.
| Decision # | Title | Link | Outcome |
|---|---|---|---|
| 001 | Programming Language Selection | Decision 001 | ✅ Decided |
| 002 | Web Framework Selection | Decision 002 | ✅ Decided |
| 003 | AI/OCR Service Selection | Decision 003 | ✅ Decided |
| 004 | Deployment Platform | Decision 004 | ✅ Decided |
| 005 | Sample Label Generator Approach | Decision 005 | ✅ Decided |
| 006 | Deterministic Generation vs AI Image Generation | Decision 006 | ✅ Decided |
| 007 | JPEG-Only Output Format | Decision 007 | ✅ Decided |
| 008 | CLI-First Development Approach | Decision 008 | ✅ Decided |
| 009 | Graceful Degradation Validation Strategy | Decision 009 | ✅ Decided |
| 010 | [To Be Determined] | Decision 010 | 🔄 Pending |
| 011 | Remove Pretty-Print Option - JSON-Only Output | Decision 011 | ✅ Decided |
| 012 | Docker Strategy with Separate Ollama Service | Decision 012 | ✅ Decided |
| 013 | Pytest Test Suite with 80% Coverage Target | Decision 013 | ✅ Decided |
| 014 | FastAPI with Open Access and Minimal Observability | Decision 014 | ✅ Decided |
| 015 | Container Registry - GitHub Container Registry (GHCR) | Decision 015 | ✅ Decided |
Date: 2026-02-14
Status: ✅ Decided
Decision: Use Python as the primary programming language
- Project requires AI/ML integration for OCR and text extraction
- Need rapid development within 6-day timeline
- Focus on rock-solid core features over ambitious scope
- Must integrate with modern AI services (OpenAI, Google Cloud Vision, or Azure)
Pros:
- Rich AI/ML ecosystem (OpenAI SDK, Google Cloud Vision, Azure CV)
- Excellent image processing libraries (Pillow, OpenCV)
- Strong text processing capabilities (difflib, fuzzywuzzy, regex)
- Rapid web development frameworks (FastAPI, Streamlit, Flask)
- Single language for entire stack possible (reduces context switching)
- Mature libraries for all required functionality
Cons:
- May require separate frontend if not using Streamlit
- Deployment may need more consideration than JS-based solutions
Pros:
- Single language for full-stack development
- Fast deployment to Vercel/Netlify
- Large ecosystem of web frameworks
Cons:
- AI/ML libraries less mature than Python
- OCR integration more complex
- Image processing libraries less robust
- Less natural fit for ML/AI workload
Pros:
- Aligns with TTB's current COLA system (.NET)
- Azure integration might be smoother
Cons:
- Slower development time
- Less mature AI/ML ecosystem
- Steeper learning curve for rapid prototyping
- Not ideal for 6-day timeline
Python selected because:
- AI/ML Best Fit: The core requirement is AI-powered text extraction. Python has the most mature, well-documented, and actively maintained libraries for:
- OpenAI GPT-4 Vision API
- Google Cloud Vision API
- Azure Computer Vision API
- Tesseract OCR
- Rich Ecosystem: Python libraries handle all project requirements (a short difflib sketch follows this rationale list):
- Image processing: Pillow, OpenCV
- Text comparison: difflib (built-in), fuzzywuzzy
- String validation: regex (built-in)
- Web frameworks: FastAPI, Streamlit, Flask
- Rapid Development: With 6 days for implementation:
- Python's concise syntax enables faster development
- Frameworks like Streamlit can create full UI in hours, not days
- Less boilerplate code than compiled languages
- Industry Standard: Most modern AI/ML work and research happens in Python
- Better documentation and community support
- More examples and tutorials for similar use cases
- Easier to find solutions to implementation challenges
- Focus on Business Logic: Python allows spending more time on:
- Label verification algorithms
- Comparison logic for field matching
- Government warning validation
- Less time on language/framework mechanics
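To make the text-comparison point above concrete, here is a minimal sketch of fuzzy field matching with the standard-library difflib. The function name and the 0.9 threshold are illustrative assumptions, not part of the project's actual code:

```python
from difflib import SequenceMatcher

def fields_match(extracted: str, expected: str, threshold: float = 0.9) -> bool:
    """Fuzzy-compare an OCR-extracted field against the value claimed in the application."""
    ratio = SequenceMatcher(None, extracted.lower().strip(), expected.lower().strip()).ratio()
    return ratio >= threshold

# Tolerates minor OCR noise: ratio is ~0.97 here, so the fields still match
print(fields_match("Stone's Throw Winery", "Stones Throw Winery"))  # True
```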
- Framework Selection: Next decision will be Python web framework (Streamlit vs FastAPI vs Flask)
- Deployment: Will need Python-friendly hosting (Railway, Render, Heroku, Streamlit Cloud, PythonAnywhere)
- Dependencies: Will use pip/poetry for dependency management
- Testing: Will use pytest for unit tests
- Code Quality: Will use black/ruff for formatting, mypy for type checking
- Successfully integrate with chosen AI service (OpenAI/Google/Azure)
- Process label images and extract text in < 5 seconds
- Clean, maintainable Python codebase
- Straightforward deployment process
- OpenAI Python SDK: https://github.com/openai/openai-python
- Pillow Documentation: https://pillow.readthedocs.io/
- FastAPI Documentation: https://fastapi.tiangolo.com/
- Streamlit Documentation: https://docs.streamlit.io/
Date: 2026-02-14
Status: ✅ Decided
Decision: Use FastAPI as the web framework
- Need API backend for label verification service
- Must support file uploads (images)
- Need to expose endpoints for frontend consumption
- Should be production-ready and well-documented
- Must support async operations for AI/OCR calls
Pros:
- Fastest development time (single Python file)
- Built-in UI components (file upload, forms, display)
- Perfect for internal tools
- Instant deployment to Streamlit Cloud
Cons:
- Less separation of concerns
- Limited API flexibility
- Not ideal for Lambda deployment
- Less "production-like" architecture
Pros:
- Modern, fast, production-ready
- Automatic API documentation (OpenAPI/Swagger)
- Native async/await support for AI API calls
- Type hints and validation built-in (Pydantic)
- Easy to deploy to AWS Lambda (with Mangum adapter)
- Clean separation of API and frontend
- Excellent for RESTful API design
Cons:
- Requires separate frontend implementation
- More initial setup than Streamlit
Pros:
- Mature, well-known framework
- Large ecosystem and community
Cons:
- Less modern than FastAPI
- No native async support (need Flask-Async)
- More boilerplate code
- Manual API documentation
FastAPI selected because:
- AWS Lambda Compatible: Works well with Lambda via Mangum adapter
- Async Support: Native async/await perfect for AI API calls that may take seconds
- Production Quality: Automatic validation, serialization, and API docs
- Type Safety: Pydantic models reduce bugs and improve maintainability
- Performance: Built on Starlette and Uvicorn (fast ASGI server)
- Developer Experience: Automatic interactive API docs at /docs endpoint
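As a hedged illustration of the Lambda-compatibility point, a minimal sketch of a FastAPI app wrapped with Mangum. Endpoint and module names are assumptions for this example (file uploads also require the python-multipart package):

```python
from fastapi import FastAPI, UploadFile
from mangum import Mangum

app = FastAPI(title="Label Verification API")

@app.post("/verify")
async def verify(label: UploadFile):
    # Placeholder body: the real endpoint would run OCR + validation on the upload
    contents = await label.read()
    return {"filename": label.filename, "size_bytes": len(contents)}

# Lambda entry point; unused when running locally under Uvicorn
handler = Mangum(app)
```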
- Need to build separate frontend (React/Vue/vanilla JS)
- API-first architecture with clear endpoints
- Can test API independently of frontend
- Documentation generated automatically
- Easy to add authentication later if needed
- API responds in < 5 seconds per label verification
- Clean API contract with proper request/response models
- Automatic API documentation available
- Successful deployment to AWS Lambda
- FastAPI Documentation: https://fastapi.tiangolo.com/
- Mangum (Lambda adapter): https://mangum.io/
Date: 2026-02-14
Status: ✅ Decided
Decision: Use Ollama with vision-capable model (llama3.2-vision or llava)
From interview with Marcus Williams (IT Systems Administrator):
"Our network blocks outbound traffic to a lot of domains... During the scanning vendor pilot, half their features didn't work because our firewall blocked connections to their ML endpoints."
Key Requirements:
- No external API dependencies that could be blocked by government firewalls
- Must work in isolated network environment
- < 5 second processing time (critical requirement)
- Must extract text from label images accurately
- Must handle structured data extraction (brand name, ABV, warnings, etc.)
Pros:
- State-of-the-art accuracy
- Fast processing (typically 1-3 seconds)
- Minimal setup and maintenance
- Strong structured data extraction
Cons:
- ❌ Requires external API calls (blocked by TTB firewall per Marcus)
- ❌ Won't work in production environment
- Ongoing API costs
- Network dependency
Pros:
- ✅ Fully self-contained, runs locally
- ✅ No external dependencies or API calls
- ✅ Works in isolated network
- Supports vision-capable models (llama3.2-vision, llava, etc.)
- Free to run
Cons:
- ⚠️ CRITICAL: Performance concern - may not meet 5-second requirement
- ⚠️ Lambda incompatible - models are too large (GBs) for Lambda
- Requires significant compute resources
- Accuracy may be lower than GPT-4 Vision
- Setup complexity for deployment
Pros:
- Tesseract is self-contained and lightweight
- Fast OCR processing (< 1 second)
- Lambda-compatible
- Could use smaller Ollama model for text structuring
Cons:
- Two-stage process (OCR then parsing)
- Tesseract accuracy varies with image quality
- More complex pipeline
Pros:
- ✅ Fast (< 1 second for OCR)
- ✅ Fully self-contained
- ✅ Lambda-compatible
- ✅ Predictable performance
- No AI model hosting needed
Cons:
- Less intelligent text extraction
- Requires manual parsing logic
- May miss edge cases
- Less impressive technically
Selected Ollama with EC2 deployment:
- Meets Core Requirement: Self-contained, no external APIs blocked by firewall
- Vision Capabilities: Can process images and extract structured data
- Realistic for TTB: Mirrors on-premise deployment scenario
- No Ongoing Costs: Free to run (unlike cloud APIs)
- Prototype Appropriate: Demonstrates AI capability in constrained environment
Resolution of Lambda Conflict:
- Moved to EC2 deployment (see Decision 004)
- EC2 can handle 4-8GB Ollama models
- Allows persistent model loading (no cold starts)
Ollama Model Selection:
- Primary: llama3.2-vision (11B parameters, vision + text)
- Alternative: llava (smaller, faster if performance issues)
- Will benchmark both on EC2 to meet 5-second requirement
Meeting the 5-Second Requirement:
- Pre-load model at container startup (not per request)
- Keep Ollama service warm
- Optimize prompts for structured output
- Use appropriate instance size (t3.xlarge minimum)
- Consider GPU instance (g4dn.xlarge) if CPU inference too slow
Fallback Plan: If Ollama cannot meet 5-second requirement:
- Switch to Tesseract OCR + rule-based extraction
- Document in README that this was a performance-driven decision
- Note that production deployment could use specialized OCR hardware
Technical:
- FastAPI will call Ollama API at http://localhost:11434
- Use ollama Python library for integration
- Design prompts for JSON-structured output extraction
- Handle Ollama errors gracefully
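A rough sketch of that integration using the ollama Python package (pip install ollama). The model name matches this decision, but the prompt wording and JSON-fallback handling are illustrative assumptions:

```python
import json
import ollama  # talks to the Ollama service (default http://localhost:11434, or OLLAMA_HOST)

PROMPT = (
    "Extract the brand name, alcohol content, net contents, and government warning "
    "text from this alcohol label image. Respond with JSON only."
)

def extract_fields(image_path: str) -> dict:
    response = ollama.chat(
        model="llama3.2-vision",
        messages=[{"role": "user", "content": PROMPT, "images": [image_path]}],
    )
    content = response["message"]["content"]
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        # Graceful handling if the model returns prose instead of strict JSON
        return {"raw_text": content}
```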
Deployment:
- Docker Compose with two services: ollama + api
- Model downloaded during container build or first run
- Persistent volume for model storage
Development:
- Can develop locally with Ollama installed
- Use same Docker Compose for local testing
- Mock Ollama responses for fast iteration if needed
- Image processing completes in < 5 seconds
- Accurate text extraction (>90% accuracy on clear labels)
- Works without external API calls
- Deployable to chosen AWS service
- Ollama: https://ollama.ai/
- Ollama Python SDK: https://github.com/ollama/ollama-python
- Tesseract OCR: https://github.com/tesseract-ocr/tesseract
- AWS Lambda Limits: https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html
Date: 2026-02-14
Status: ✅ Decided
Decision: AWS EC2 deployment (dropping Lambda approach)
- AWS is the chosen cloud provider
- Lambda was initial consideration
- Choice depends on AI/OCR service selection
Pros:
- Serverless, auto-scaling
- Pay per request
- Easy to deploy with SAM/CDK
- No server management
Cons:
- ❌ Incompatible with Ollama (model size limits)
- 15-minute timeout limit
- Cold start latency
- 10GB storage limit
Pros:
- ✅ Can run Ollama with vision models
- Full control over environment
- Persistent storage
- Can use GPU instances if needed
Cons:
- More expensive (running 24/7)
- Manual scaling
- More deployment complexity
Pros:
- Container-based deployment
- Better scaling than EC2
- No server management
- Can run Ollama in container
Cons:
- More expensive than Lambda
- Requires container registry setup
- More complex than Lambda
Pros:
- Easy container deployment
- Auto-scaling
- Built-in load balancing
Cons:
- Limited to 4 vCPU / 12GB RAM
- May not be sufficient for Ollama
EC2 selected to support self-contained Ollama deployment:
- Enables Ollama: EC2 can handle 4-8GB vision models that Lambda cannot
- Self-Contained Requirement: Meets TTB's firewall constraints (no external APIs)
- Performance Control: Can select instance type to meet 5-second requirement
- Flexibility: Can add GPU if needed for faster inference
- Prototype Realistic: Mirrors how TTB would actually deploy (on-premise or gov cloud)
Trade-offs Accepted:
- More complex deployment than Lambda
- Higher baseline cost (running instance vs. pay-per-request)
- Manual scaling (but acceptable for prototype)
- Requires instance management
Deployment Plan:
- Use EC2 instance (t3.xlarge or larger based on performance testing)
- Docker container with FastAPI + Ollama
- nginx reverse proxy for production serving
- Elastic IP for stable endpoint
- Security group for port 80/443 only
Instance Selection:
- Start with t3.xlarge (4 vCPU, 16GB RAM)
- Monitor Ollama performance with vision model
- Scale up if needed (t3.2xlarge or g4dn for GPU)
Docker Setup:
# docker-compose.yml structure
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama_data:/root/.ollama
  api:
    build: ./backend
    depends_on:
      - ollama
    environment:
      - OLLAMA_HOST=ollama:11434

Frontend Hosting:
- Option A: Serve static frontend from same EC2 (nginx)
- Option B: Host frontend on S3 + CloudFront (recommended for separation)
- Set up EC2 instance with Docker
- Install and test Ollama with llama3.2-vision or llava
- Benchmark performance with sample label images
- Optimize instance size if needed
- Document deployment process
Date: 2026-02-14
Status: ✅ Decided
Decision: Build a Python script (gen_samples.py) to generate realistic synthetic alcohol beverage labels for testing
- Need test data to validate the label verification system
- Can't use real TTB COLA data (privacy, permissions)
- Need both GOOD (compliant) and BAD (non-compliant) labels
- Labels must reflect actual 27 CFR regulatory requirements
- Need ground truth data (JSON metadata) for validation testing
Pros:
- Authentic real-world labels
- Already approved/verified by TTB
- No generation needed
Cons:
- Can't create BAD examples (all are approved)
- No ground truth metadata
- May have copyright/usage restrictions
- Limited control over test scenarios
- Can't test specific violation types
Pros:
- Full creative control
- Can create specific test cases
- Professional appearance
Cons:
- Time-consuming (would take days for 100+ labels)
- Not scalable
- Hard to ensure regulatory accuracy
- Manual metadata creation error-prone
Pros:
- Scalable - generate hundreds of labels quickly
- Fully automated with command-line control
- Can create specific violation types systematically
- Generates ground truth metadata automatically
- Ensures regulatory accuracy (based on CFR analysis)
- Randomization creates realistic variety
- Reproducible test data
Cons:
- Initial development time (~6-7 hours)
- May not look as polished as manual design
- Limited to programmatic design elements
Selected synthetic generation approach because:
-
Testing Requirements: Need systematic test coverage of both compliant and non-compliant cases
- GOOD labels test baseline functionality
- BAD labels test specific violation detection (15+ types)
- Need many examples to test robustness
-
Ground Truth Essential: For validation testing, need to know:
- What text SHOULD be extracted from each label
- Which regulations each label violates
- Expected validation outcome (COMPLIANT/NON_COMPLIANT)
- Manual labels would require manual metadata creation (error-prone)
-
Scalability: Command-line generation allows:
- Quick iteration: regenerate all samples if requirements change
- Large test sets: easily create 100+ labels
- Specific scenarios: target specific violation types
- Batch processing testing: generate many labels at once
-
Regulatory Accuracy: Script enforces requirements from TTB_REGULATORY_SUMMARY.md:
- Type size calculations based on container size
- Exact government warning text
- Proper format requirements (bold, all caps, etc.)
- Product-specific requirements (spirits vs wine vs malt)
-
Development Time Acceptable: ~6-7 hours to build generator is worthwhile because:
- Saves days of manual label creation
- Enables unlimited regeneration
- Provides foundation for future test expansion
Architecture:
- Single Python file: gen_samples.py
- Minimal dependencies: Pillow + standard library
- Component-based design:
  - FieldRandomizer - generate valid field values
  - Label - data structure
  - ViolationGenerator - apply specific violations
  - LabelRenderer - draw label to image (PIL)
  - LabelGenerator - orchestrate + CLI
Design Philosophy: Keep it simple!
- Randomize all parameters within regulatory bounds
- Realistic but not overly complex designs
- Standard canvas sizes based on actual bottle labels
- No intentional image degradation (labels are exports from design software, not photos)
Command Line Interface:
python scripts/gen_samples.py --good 50 --bad 50
python scripts/gen_samples.py --good 100 --bad 0
python scripts/gen_samples.py --seed 42   # optional reproducibility

(A minimal argparse sketch of this CLI follows the output list.)

Output per Label:
- label_good_001.jpg - JPEG image (< 750 KB per TTB requirement)
- label_good_001.tif - TIFF image (< 750 KB)
- label_good_001.json - Metadata with ground truth
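As noted above, a minimal argparse skeleton consistent with that command line. It is a sketch only; the real gen_samples.py wires these arguments into the renderer components:

```python
import argparse
import random

def main() -> None:
    parser = argparse.ArgumentParser(description="Generate synthetic alcohol label samples")
    parser.add_argument("--good", type=int, default=50, help="number of compliant labels")
    parser.add_argument("--bad", type=int, default=50, help="number of non-compliant labels")
    parser.add_argument("--seed", type=int, default=None, help="optional seed for reproducibility")
    args = parser.parse_args()

    if args.seed is not None:
        random.seed(args.seed)  # same seed => same labels

    # Placeholder loop: the real script renders JPEG/TIFF images and writes JSON ground truth
    for i in range(args.good):
        print(f"would generate label_good_{i + 1:03d}")
    for i in range(args.bad):
        print(f"would generate label_bad_{i + 1:03d}")

if __name__ == "__main__":
    main()
```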
Violation Types Supported:
- Missing required fields (warning, brand, ABV, net contents, etc.)
- Warning format violations (not all caps, body bold, wrong text)
- Value mismatches (ABV outside tolerance, wrong net contents)
- Format violations (type size too small, missing import phrase)
- Mix of single and multiple violations per label
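To make the ground-truth idea concrete, an illustrative metadata record for one BAD label. The field and violation names below are hypothetical; the actual schema is defined in SAMPLE_GENERATOR.md:

```json
{
  "filename": "label_bad_007",
  "expected_status": "NON_COMPLIANT",
  "fields": {
    "brand_name": "Ridge & Co.",
    "alcohol_content": "7.5% ABV",
    "net_contents": "750 mL"
  },
  "violations": ["warning_header_not_all_caps"]
}
```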
For Development:
- Separate work stream for sample generator
- Develop before main verification system
- Can test generator output manually before integration
For Testing:
- Comprehensive test coverage of regulations
- Ground truth enables automated validation testing
- Can regenerate samples if requirements change
- Supports batch processing testing
For Documentation:
- SAMPLE_GENERATOR.md provides complete specification
- Implementation plan with 8 phases (~6-7 hours)
- Clear testing strategy and success criteria
For Future:
- Can extend with more violation types
- Can add image quality variations if needed
- Foundation for continuous test data generation
Generator Functionality:
- [✅] Script runs without errors
- [✅] Generates specified number of GOOD and BAD labels
- [✅] All output files created (JPEG, TIFF, JSON)
- [✅] All files < 750 KB
Label Quality:
- [✅] GOOD labels comply with all 27 CFR requirements
- [✅] BAD labels have documented violations
- [✅] Labels are visually realistic
- [✅] Text is readable (proper contrast and size)
- [✅] Type sizes meet regulatory minimums (GOOD labels)
Testing Support:
- GOOD labels validate as COMPLIANT in main system (pending main system build)
- BAD labels validate as NON_COMPLIANT with correct violations (pending main system build)
- [✅] Ground truth metadata is accurate and complete
- [✅] Can generate 100+ labels in reasonable time
- SAMPLE_GENERATOR.md - Complete technical specification
- TTB_REGULATORY_SUMMARY.md - Regulatory requirements
- 27 CFR XML files in cfr_regulations/ directory
Date: 2026-02-15
Status: ✅ Decided
Decision: Use deterministic programmatic generation (PIL/Pillow) rather than AI image generation tools for creating sample labels
- External documentation suggested: "AI image generation tools work well for this"
- Need to choose between AI-generated images vs programmatic rendering
- Must have accurate ground truth for validation testing
- Need control over specific regulatory violations
- Want to generate labels that look realistic but are deterministic
Pros:
- More realistic/varied visual designs
- Can generate diverse artistic styles
- Less programming complexity
- Potentially more professional appearance
- Follows suggestion in external docs
Cons:
- ❌ No control over text accuracy - AI might misspell, alter text
- ❌ Ground truth uncertain - would need OCR to extract what AI actually wrote
- ❌ Violation generation challenging - hard to prompt for specific violations
- ❌ Expensive/slow - API costs and generation time per image
- ❌ Inconsistent - AI might not follow regulatory requirements exactly
- ❌ Two-step process - Generate image, then OCR it to get ground truth
- No guarantee of regulatory accuracy
Pros:
- ✅ Complete control over every element
- ✅ Guaranteed regulatory accuracy - we render exactly what we specify
- ✅ Known ground truth - we know precisely what text is on each label
- ✅ Deterministic output - same parameters = same result
- ✅ Fast generation - seconds per label, no API calls
- ✅ No API costs - free after Pillow installation
- ✅ Violation control - can create specific regulatory violations systematically
- ✅ Works offline - no network dependency
Cons:
- Labels may look somewhat "synthetic" or template-like
- Limited visual variety in design styles
- Requires careful PIL/Pillow programming
- Less impressive visually than AI-generated images
Pros:
- Use programmatic for guaranteed accuracy
- Use AI for additional visual variety
- Best of both worlds
Cons:
- More complex implementation
- Two different generation pipelines
- Still have ground truth issues with AI-generated labels
Selected programmatic generation (Option 2) because:
- Ideal vs. Practical Reality:
- In a perfect world, we would use REAL label images that had been validated by human experts in the 27 CFR regulations
- Real labels validated by TTB experts would provide:
- Authentic visual complexity
- Real-world edge cases and variations
- Confidence in regulatory compliance
- No risk of programmatic errors in interpretation
- However, this is not possible for this POC because:
- We don't have access to a corpus of pre-validated labels with ground truth
- Getting expert validation would take weeks/months (outside POC timeline)
- Real TTB COLA database only has approved labels (can't get BAD examples)
- Building such a dataset is beyond the scope of a 6-day prototype
- Therefore, deterministic generation was the pragmatic choice given constraints
- Ground Truth is Critical: For testing the verification system, we MUST know exactly what text should be extracted
- With AI generation: AI might write "Goverment Warning" or "45.2% ABV" when we wanted "45.0% ABV"
- With programmatic: We render exactly "GOVERNMENT WARNING:" and "45.0% ABV"
- Ground truth is essential for automated validation testing
- Violation Control: Need to systematically test specific violations
- Must create labels with "warning not all caps" violation
- Must create labels with "ABV outside tolerance" violation
- Very difficult to prompt AI to generate these specific regulatory violations
- Programmatic generation: simply set warning_header_all_caps = False
- Regulatory Accuracy: Labels must follow precise 27 CFR requirements
- Type sizes must be exactly 1mm, 2mm, or 3mm based on container size
- Warning text must be word-for-word exact
- AI models aren't trained on TTB regulations
- Programmatic: we implement exact requirements from CFR
- Human Expertise Requirements: Deterministic methods were necessary for sample generation so that we could KNOW with some degree of certainty that we were generating definitively GOOD or BAD samples
- With AI-generated samples, a human well-versed in the regulations would have to either: a) Train the AI to reliably generate GOOD and BAD samples (requires extensive prompt engineering and validation), OR b) Manually sort images created by an AI-based sample generator into GOOD and BAD categories
- Given the time constraints, it seemed unlikely for the human operator to become conversant enough in the 27 CFR regulations (spanning 1,400+ pages) to make accurate compliance determinations
- The regulations contain subtle requirements (e.g., "GOVERNMENT WARNING:" must be bold but body text must not be bold, ABV tolerances vary by product type) that are difficult for humans to quickly master
- Programmatic generation embeds regulatory knowledge directly in code, removing human judgment from the classification process
- Speed & Scale: Can generate 100+ labels in minutes
- Programmatic: ~2 seconds per label, no API latency
- AI generation: 10-30 seconds per label + API costs
- For testing, need ability to regenerate entire test set quickly
- Deterministic Testing: Same seed = same labels
- Critical for reproducible test results
- Debugging: can regenerate exact same problematic label
- AI generation is inherently non-deterministic
- Cost & Availability:
- Programmatic: Free after Pillow installation
- AI APIs: $0.02-$0.20+ per image, adds up for 100+ labels
- No API keys or accounts needed
Programmatic rendering with PIL/Pillow:
- Use ImageDraw to create canvas and draw text
- Calculate type sizes: mm * 300 DPI / 25.4 ≈ mm * 11.8 pixels (see the sketch after this list)
- Load system fonts with fallback to default
- Render text with bold/regular weights
- Add decorative elements (borders, lines, corners)
- Ensure contrast (light backgrounds, dark text)
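As referenced in the list above, a small sketch of the type-size conversion and text rendering with Pillow at 300 DPI. Font path, canvas size, and positions are illustrative:

```python
from PIL import Image, ImageDraw, ImageFont

DPI = 300

def mm_to_px(mm: float) -> int:
    """Convert a regulatory type size in millimetres to pixels at 300 DPI (2 mm -> ~24 px)."""
    return round(mm * DPI / 25.4)

canvas = Image.new("RGB", (mm_to_px(90), mm_to_px(120)), color="ivory")
draw = ImageDraw.Draw(canvas)

try:
    font = ImageFont.truetype("DejaVuSans-Bold.ttf", mm_to_px(2))
except OSError:
    font = ImageFont.load_default()  # fallback when the system font is unavailable

draw.text((mm_to_px(5), mm_to_px(5)), "GOVERNMENT WARNING:", fill="black", font=font)
canvas.save("demo_label.jpg", "JPEG", quality=85)
```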
Visual Quality Improvements:
- Randomize background colors from palette
- Randomize layout positions slightly
- Add decorative elements (borders, corner ornaments)
- Vary font sizes within regulatory bounds
- Different canvas sizes for different container sizes
Result: Labels look "designed" rather than "photographed" - appropriate for the use case (labels are design exports, not photos of bottles)
For Testing:
- 100% accurate ground truth for validation
- Can test specific violation types systematically
- Reproducible test sets with seeds
- Fast iteration when requirements change
For Development:
- More initial programming (vs AI prompting)
- Need to handle font loading, text rendering
- But: Complete control over output
For Future:
- Could add AI generation as supplementary tool later
- Could use AI for additional visual variety
- Foundation is solid programmatic generation with known ground truth
- [✅] Generated 40 labels (20 GOOD + 20 BAD) successfully
- [✅] All files < 750 KB
- [✅] Ground truth JSON matches rendered images exactly
- [✅] Specific violations render correctly (warning not caps, missing fields, etc.)
- [✅] Labels are visually acceptable (readable, contrasting, organized)
- [✅] Generation speed: ~2 seconds per label
- PIL/Pillow Documentation: https://pillow.readthedocs.io/
- gen_samples.py - Implementation (~1,100 lines)
- SAMPLE_GENERATOR.md - Technical specification
Date: 2026-02-16 (revised 2026-02-20)
Status: ✅ Revised
Decision: Generate both JPEG and TIFF files for sample labels; accept both JPEG and TIFF for label uploads
- TTB COLA submission requirements specify "JPEG or TIFF, < 750 KB"
- Initial implementation generated both JPEG and TIFF per label
- An intermediate revision briefly considered JPEG-only to simplify file management
- However, the verification API must accept both formats to match real TTB workflow (applicants may submit either)
Accept both JPEG and TIFF for label uploads throughout the system:
- POST /verify, POST /verify/async: accept .jpg, .jpeg, .tif, .tiff
- Sample generator: produces both .jpg and .tif per label for comprehensive testing
- Batch ZIP files: may contain either format
- TTB Compliance: Real COLA submissions are JPEG or TIFF — the verifier must handle both
- Test Coverage: Generating both formats from gen_samples.py exercises both upload paths
- No PNG: PNG is not an accepted TTB COLA format and is explicitly rejected by the API
# api.py — accepted MIME types and extensions
ALLOWED_CONTENT_TYPES = {"image/jpeg", "image/jpg", "image/tiff"}
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff"}

- [✅] API accepts JPEG and TIFF uploads; rejects PNG and other formats
- Sample generator produces both .jpg and .tif per label
- All files < 750 KB
- TTB COLA Requirements: JPEG or TIFF, < 750 KB
- app/api.py — validate_upload_file() function
Date: 2026-02-16
Status: ✅ Decided
Decision: Build verification engine as command-line tool first, wrap with FastAPI later
- Need to build label verification system with OCR + validation logic
- Could start with web app (FastAPI + frontend) or CLI tool
- Want to focus on core verification logic without UI distractions
- Need rapid iteration and testing capability
Pros:
- Matches final deliverable format
- User-friendly interface from start
- Demo-ready immediately
Cons:
- Frontend/backend coordination overhead
- Slower iteration on core logic
- Harder to test and debug
- UI concerns distract from verification accuracy
Pros:
- Focus purely on verification logic
- Fast iteration and testing (just run command)
- Easy to test against 40-label golden dataset
- Clean separation: build core, then wrap in API
- Follows Unix philosophy: do one thing well
- Same JSON output works for both CLI and API
Cons:
- Requires second step to add web interface
- Less immediately impressive for demos
Selected CLI-first approach because:
- Focus on Core Logic: Verification accuracy is the critical requirement
- Build OCR extraction properly
- Get validation logic correct
- Meet 5-second performance requirement
- No UI distractions
- Rapid Testing: With golden dataset ready:

  # Test immediately
  python verify_label.py samples/label_good_001.jpg

  # Batch test all 40 labels
  ./test_all_samples.sh
- Instant feedback on accuracy
- Easy to iterate on validation rules
- No need to click through web forms
- Clean Architecture:

  CLI (verify_label.py)
        ↓
  Core Logic (label_validator.py)   ← Returns JSON
        ↓
  FastAPI (app.py)                  ← Wraps same function

- Single source of truth for verification
- No duplication between CLI and API
- Test core once, both interfaces work
- JSON Output is Universal:
- CLI returns JSON to stdout
- FastAPI returns same JSON as HTTP response
- Consistent results regardless of interface
- Easy to test programmatically
- Matches Development Workflow:
- Build verifier (this phase)
- Test thoroughly with CLI
- Wrap in FastAPI (next phase)
- Add frontend (final phase)
CLI Interface:
# Basic usage - returns JSON to stdout
python verify_label.py samples/label_good_001.jpg
# With ground truth for full validation
python verify_label.py label.jpg --ground-truth application.json
# Batch mode
python verify_label.py samples/*.jpg
# Choose OCR backend
python verify_label.py label.jpg --ocr ollama # default
python verify_label.py label.jpg --ocr tesseract   # fallback

Output Format (JSON only):
{
"status": "COMPLIANT",
"validation_level": "full",
"extracted_fields": { ... },
"validation_results": { ... },
"violations": [],
"processing_time_seconds": 2.3
}

Future FastAPI Integration:
@app.post("/verify")
async def verify_label(image: UploadFile, ...):
    # Call same core function
    result = verify_label_file(image_path, expected_fields)
    return result  # Same JSON

For Development:
- Start with ocr_backends.py, label_extractor.py, label_validator.py
- CLI is thin wrapper: parse args, call validator, print JSON (a sketch follows this list)
- Can test immediately against golden dataset
- Iterate quickly without web server overhead
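As referenced above, a sketch of that thin wrapper. It assumes the shared core function verify_label_file() described in the graceful-degradation decision; argument names follow the CLI examples in this section:

```python
import argparse
import json

from label_validator import verify_label_file  # shared core used by both CLI and API

def main() -> None:
    parser = argparse.ArgumentParser(description="Verify an alcohol label image")
    parser.add_argument("image", help="path to a label image (JPEG/TIFF)")
    parser.add_argument("--ground-truth", help="optional JSON file with expected field values")
    parser.add_argument("--ocr", choices=["ollama", "tesseract"], default="ollama")
    args = parser.parse_args()

    expected = None
    if args.ground_truth:
        with open(args.ground_truth) as fh:
            expected = json.load(fh)

    result = verify_label_file(args.image, expected_fields=expected, ocr_backend=args.ocr)
    print(json.dumps(result))  # compact JSON to stdout (see Decision 011)

if __name__ == "__main__":
    main()
```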
For Testing:
- Simple bash script tests all 40 labels
- Easy to measure accuracy (20 GOOD should pass, 20 BAD should fail)
- Performance testing: just time the command
- No browser/HTTP testing needed yet
For FastAPI Migration:
- Core logic already returns JSON
- FastAPI just wraps: accept upload, call function, return JSON
- No rewrite needed - literally import and call
- Can build frontend against CLI JSON structure first
- [✅] CLI tool accepts image path and returns JSON
- [✅] Ollama installed with llama3.2-vision model
- Verifier correctly identifies 20 GOOD labels as compliant
- Verifier correctly identifies 20 BAD labels as non-compliant
- Processing time < 5 seconds per label
- JSON output structure documented
- Unix Philosophy: https://en.wikipedia.org/wiki/Unix_philosophy
- Golden dataset: samples/ directory (40 labels with ground truth)
Date: 2026-02-16
Status: ✅ Decided
Decision: Support two-tier validation: structural checks (always) + accuracy checks (optional, requires ground truth)
- Real TTB workflow: verify label matches COLA application data
- Need to compare extracted text against claimed values
- Users may not always have complete application data
- Should provide useful feedback even with partial information
Core Question: Can we validate a label from the image alone, or do we need application data?
Answer: Depends on what we're validating:
- Structural compliance (required fields present, warning formatted correctly) - YES, image only
- Accuracy verification (does brand match claim, is ABV correct) - NO, need ground truth
Pros:
- Complete validation always
- Matches TTB workflow exactly
- No ambiguous results
Cons:
- Can't validate without application data
- Not useful for quick structural checks
- Rigid - all or nothing
Pros:
- Always works regardless of input
- Fast structural checks
- No external data needed
Cons:
- Can't verify accuracy (main use case)
- Misleading - might say "compliant" when fields are wrong
- Doesn't match real TTB workflow
Pros:
- Works with any amount of ground truth (none, partial, full)
- Transparent about what was/wasn't validated
- Useful in all scenarios
- Guides users to provide more data
- Mirrors real-world flexibility
Cons:
- More complex output (three states: pass/fail/unknown)
- Need to track what was checked vs skipped
Selected graceful degradation because:
- Matches Real-World Usage:
- Sometimes user has full application data
- Sometimes only partial (just brand and ABV)
- Sometimes none (quick structural check)
- System should be useful in all cases
- Two-Tier Validation System:
Tier 1: Structural (Always Performed)
- Government warning present? ✅
- "GOVERNMENT WARNING:" in all caps? ✅
- Warning text matches exact wording? ✅
- Brand name present? ✅
- ABV present? ✅
- Net contents present? ✅
- Bottler info present? ✅
Tier 2: Accuracy (Requires Ground Truth)
- Brand matches application? (need expected brand)
- ABV matches within tolerance? (need expected ABV)
- Net contents matches? (need expected net contents)
- Bottler info matches? (need expected bottler)
- Country of origin matches? (need expected country)
- Transparent Communication:

  {
    "status": "PARTIAL_COMPLIANCE",
    "validation_level": "structural_only",
    "structural_checks": { "passed": 8, "failed": 0 },
    "accuracy_checks": { "skipped": 6, "reason": "no_ground_truth" },
    "warnings": [ "Cannot verify field accuracy without application data" ]
  }

- Guides Users:
- When ground truth missing: "Provide application data for complete validation"
- Shows what was checked vs what wasn't
- User understands limitations
- Flexible Workflow:

  # Quick structural check
  python verify_label.py label.jpg
  # → "Structure OK, but can't verify accuracy"

  # Full validation
  python verify_label.py label.jpg --ground-truth app.json
  # → "COMPLIANT - all fields match"

  # Partial validation
  python verify_label.py label.jpg --brand "X" --abv "13.5%"
  # → "Brand matches, ABV matches, other fields not verified"
Validation Function Signature:
def verify_label_file(
image_path: str,
expected_fields: Optional[Dict[str, str]] = None,
ocr_backend: str = "ollama"
) -> Dict[str, Any]:
"""
Verify label compliance.
Args:
image_path: Path to label image
expected_fields: Optional dict with expected values
{
'brand_name': 'Stone\'s Throw',
'alcohol_content': '13.5% alc./vol.',
'net_contents': '750 mL',
# ... any subset of fields
}
ocr_backend: 'ollama' or 'tesseract'
Returns:
{
'status': 'COMPLIANT' | 'NON_COMPLIANT' | 'PARTIAL_COMPLIANCE',
'validation_level': 'full' | 'partial' | 'structural_only',
'extracted_fields': {...},
'validation_results': {
'structural': {'passed': N, 'failed': M, 'checks': [...]},
'accuracy': {'passed': X, 'failed': Y, 'skipped': Z, 'checks': [...]}
},
'violations': [...],
'warnings': [...]
}
"""Three Validation Levels:
- full - All expected fields provided, complete validation
- partial - Some expected fields provided, validate what we can
- structural_only - No expected fields, only check structure
Status Determination:
- COMPLIANT - All checks passed (structural + accuracy where applicable)
- NON_COMPLIANT - Any checks failed (violations found)
- PARTIAL_COMPLIANCE - All performed checks passed, but some skipped
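A minimal sketch of how that status could be derived from the check counts, assuming the result structure shown in the docstring above (the function name is illustrative):

```python
def determine_status(structural: dict, accuracy: dict) -> str:
    """Map check results to COMPLIANT / NON_COMPLIANT / PARTIAL_COMPLIANCE."""
    if structural["failed"] > 0 or accuracy["failed"] > 0:
        return "NON_COMPLIANT"
    if accuracy.get("skipped", 0) > 0:
        return "PARTIAL_COMPLIANCE"
    return "COMPLIANT"

# Example: all structural checks pass, but no ground truth was supplied
print(determine_status(
    {"passed": 8, "failed": 0},
    {"passed": 0, "failed": 0, "skipped": 6},
))  # -> PARTIAL_COMPLIANCE
```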
For CLI:
# No ground truth
python verify_label.py label.jpg
# Returns: status=PARTIAL_COMPLIANCE, warnings about missing data
# Full ground truth
python verify_label.py label.jpg --ground-truth app.json
# Returns: status=COMPLIANT or NON_COMPLIANT
# Partial ground truth
python verify_label.py label.jpg --brand "X" --abv "13.5%"
# Returns: status varies, shows which checks ran

For FastAPI (Future):
<form>
<input type="file" name="label" required>
<!-- Optional fields with warning -->
<p class="warning">
💡 Providing application data enables full validation.
Without it, we can only check structural compliance.
</p>
<input name="brand_name" placeholder="Brand Name (optional)">
<input name="alcohol_content" placeholder="ABV (optional)">
<!-- ... -->
</form>

For Users:
- Always get useful feedback
- Understand what was/wasn't validated
- Encouraged to provide complete data
- No binary "works/doesn't work" frustration
- Structural validation works without ground truth
- Accuracy validation works with ground truth
- Partial ground truth correctly skips missing fields
- Output clearly indicates validation level
- Warnings guide users to provide more data
- Graceful degradation pattern: https://en.wikipedia.org/wiki/Graceful_degradation
- TTB workflow: REQUIREMENTS.md
Date: 2026-02-16 (revised 2026-02-20)
Status: ✅ Revised
Decision: Use Ollama (llama3.2-vision) as the sole OCR backend; Tesseract was evaluated and rejected due to unacceptable accuracy
Revision note: An earlier draft of this decision proposed a hybrid approach (Tesseract default, Ollama optional). After further implementation and evaluation, the hybrid was abandoned: Tesseract accuracy (~60-70%) was judged insufficient for regulatory compliance checking, and the async queue architecture (see deployment) makes the Ollama latency acceptable. The system now uses Ollama exclusively.
During implementation, discovered a fundamental tension between two critical requirements:
- 5-Second Performance Requirement:
- Sarah Chen (Deputy Director): "If we can't get results back in about 5 seconds, nobody's going to use it. We learned that the hard way."
- Based on failed vendor pilot (30-40 second processing times)
- Stated as critical dealbreaker
- AI-Powered Verification:
- Project title: "AI-Powered Alcohol Label Verification App"
- Marcus Williams mentioned previous vendor's "ML endpoints"
- Implies use of modern AI/ML technology
- Local Execution Constraint:
- Marcus Williams: "our network blocks outbound traffic... firewall blocked connections to their ML endpoints"
- Must run completely locally (no cloud APIs)
These requirements are in direct conflict with available technology.
Evaluated three OCR approaches on our golden dataset:
Test Results:
- Speed: ✅ ~1 second per label (meets 5-second requirement)
- Accuracy: ⚠️ Problematic
- Missing brand names (decorative fonts)
- Text corruption: "Black Brewing" → "Black ibealtl se"
- OCR errors: "ability" → "epulty"
- Overall: ~60-70% field accuracy estimate
- Technology: Traditional pattern-matching OCR (not AI)
- Local: ✅ Yes
Example output:
Hefeweizen
7.5% ABV
Imported by Black ibealtl se Francisco, CA ← OCR ERROR
Test Results:
- Speed: ❌ ~58 seconds per label (exceeds 5-second requirement by 12x)
- Accuracy: ✅ Excellent
- Correctly extracted all fields including decorative fonts
- Handled markdown formatting well
- Overall: ~95%+ field accuracy estimate
- Technology: Modern AI vision transformer (7.9 GB model)
- Local: ✅ Yes
Example output:
**Ridge & Co.**
**Hefeweizen**
**7.5% ABV**
**Imported by Black Brewing, San Francisco, CA** ← CORRECT
Test Results:
- Speed: ❌ Timed out after 60 seconds (or had errors)
- Accuracy: ❓ Could not complete testing
- Technology: Medium AI vision model (4.7 GB)
- Local: ✅ Yes
- Conclusion: Not viable - slower or error-prone
- Speed: ✅ ~2 seconds
- Accuracy: ✅ Excellent
- Local: ❌ NO - blocked by government firewall
- Conclusion: Not viable for TTB environment
Pros:
- Meets critical 5-second requirement
- Works in TTB's firewall-restricted environment
- Fast enough for batch processing (200-300 labels)
- Production-ready and stable
Cons:
- Misses "AI-Powered" aspect of project title
- Lower accuracy on decorative fonts (~60-70%)
- May produce false positives/negatives due to OCR errors
- Doesn't demonstrate modern AI capability
Pros:
- Truly "AI-Powered" using modern vision transformers
- Excellent accuracy (~95%+)
- Demonstrates cutting-edge ML capability
- Better handles complex label layouts
Cons:
- Fails critical 5-second requirement (58 seconds)
- Sarah Chen explicitly said this is a dealbreaker
- Mirrors failed vendor pilot (30-40 seconds)
- Impractical for batch processing (200 labels = 3+ hours)
Pros:
- ✅ Meets 5-second requirement (Tesseract default)
- ✅ Demonstrates AI capability (Ollama optional)
- ✅ User choice: speed vs accuracy tradeoff
- ✅ Honest about technological constraints
- ✅ Shows good engineering judgment
- ✅ Practical for different use cases
Cons:
- More complex implementation (two backends)
- Documentation must explain tradeoff clearly
- User must understand speed/accuracy choice
Selected hybrid approach because:
- Acknowledges Reality of Conflicting Requirements:
- Both requirements (5-second + AI) are legitimate
- Current local AI technology cannot meet both simultaneously
- Providing both options respects both requirements
- Matches Real-World Use Cases:
- Routine processing (90% of use): Tesseract (fast, good enough)
- Complex/disputed cases (10%): Ollama (slow, highly accurate)
- Dave Morrison's quote: "there's nuance... you need judgment"
- Agents can choose appropriate tool for the situation
- Demonstrates Engineering Maturity:
- Recognizes when requirements conflict
- Makes reasoned tradeoffs with data
- Documents decision transparently
- Provides flexibility rather than rigid solution
- Respects Critical Requirements:
- Default = Tesseract: Respects Sarah Chen's 5-second dealbreaker
- Optional = Ollama: Honors "AI-Powered" project title
- Both local: Respects Marcus Williams' firewall constraint
- Production-Ready Architecture:
- Easy to add new OCR backends later
- Can swap defaults as technology improves
- Users vote with their usage patterns
CLI Interface:
# Default: Fast mode with Tesseract (~1 second)
python verify_label.py label.jpg --ground-truth app.json
# Accurate mode: Use AI for better accuracy (~60 seconds)
python verify_label.py label.jpg --ground-truth app.json --accurate
# Explicit backend selection
python verify_label.py label.jpg --ocr tesseract # fast
python verify_label.py label.jpg --ocr ollama   # accurate

Web UI (Future FastAPI):
<form>
<input type="file" name="label" required>
<!-- Speed vs Accuracy Choice -->
<label>
<input type="checkbox" name="use_ai" value="true">
Use AI for higher accuracy (slower, ~60 seconds)
</label>
<p class="help-text">
Default uses fast OCR (~1 second).
AI mode provides better accuracy but takes longer.
</p>
<!-- Application data fields... -->
</form>

Metadata Output:
{
"ocr_backend": "tesseract",
"processing_time_seconds": 0.98,
"confidence": 0.91,
"note": "For higher accuracy, use --accurate flag (AI mode)"
}

| Backend | Speed | Accuracy | Use Case | Default? |
|---|---|---|---|---|
| Tesseract | ~1 sec | ~70% | Routine screening, batch processing | ✅ YES |
| Ollama | ~58 sec | ~95% | Complex cases, disputed labels | Optional |
In README.md:
## OCR Technology Choice
This verifier uses a **hybrid OCR approach** to balance speed and accuracy:
### Default: Tesseract OCR (Fast)
- Processing time: ~1 second per label
- Meets critical 5-second requirement
- Good accuracy on clean text
- May struggle with decorative/script fonts
### Optional: Ollama AI (Accurate)
- Processing time: ~60 seconds per label
- Superior accuracy on complex labels
- Handles decorative fonts better
- Use with `--accurate` flag
### Why Hybrid?
The project requirements contain a fundamental tension:
- **5-second performance requirement** (critical per stakeholder interview)
- **AI-powered verification** (project title)
- **Local execution** (firewall constraints)
Current AI vision models running locally cannot meet the 5-second requirement
(~60 seconds observed). Rather than fail one requirement to meet another, we
provide both options and let users choose based on their needs.
**Recommendation:** Use Tesseract for routine processing. Use AI mode for
complex or disputed cases where accuracy is more important than speed.

Near-term (6-12 months):
- GPU Acceleration: Deploy on GPU instances (g4dn.xlarge)
- Could reduce Ollama time from 60s → 10-15s
- Still unlikely to reach 5-second target with full models
- Model Optimization:
- Quantize llama3.2-vision (8-bit or 4-bit)
- Try smaller models (moondream, etc.)
- Custom fine-tuned model for label-specific task
- Hybrid Workflow:
- Tesseract for initial screening
- AI verification only for flagged cases
- Reduces overall processing time
Long-term (12+ months):
- Custom Trained Model:
- Purpose-built lightweight model for alcohol labels
- Train on 10,000+ real TTB labels
- Could achieve 2-5 second inference with high accuracy
- Hardware Acceleration:
- Dedicated inference hardware (NVIDIA T4, Coral TPU)
- Batch optimization for 200-300 label imports
- Could make AI mode practical for routine use
- Standardized Labels:
- Work with industry to standardize label formats
- Make OCR easier with consistent fonts/layouts
- Could improve Tesseract accuracy to 90%+
For Development:
- Build OCR abstraction layer (already done in ocr_backends.py; a sketch follows this list)
- Both backends working and tested
- Easy to add new backends as technology improves
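As noted in the list above, a sketch of what that abstraction layer might look like. Class and method names are illustrative; the real interface lives in ocr_backends.py:

```python
from abc import ABC, abstractmethod

class OCRBackend(ABC):
    """Common interface so new OCR engines can be added without touching the validator."""

    @abstractmethod
    def extract_text(self, image_path: str) -> str:
        ...

class TesseractBackend(OCRBackend):
    def extract_text(self, image_path: str) -> str:
        import pytesseract
        from PIL import Image
        return pytesseract.image_to_string(Image.open(image_path))

class OllamaBackend(OCRBackend):
    def extract_text(self, image_path: str) -> str:
        import ollama
        response = ollama.chat(
            model="llama3.2-vision",
            messages=[{"role": "user",
                       "content": "Transcribe all text on this alcohol label.",
                       "images": [image_path]}],
        )
        return response["message"]["content"]

def get_backend(name: str) -> OCRBackend:
    """Select a backend by name, e.g. from the --ocr / --accurate CLI options."""
    return {"tesseract": TesseractBackend, "ollama": OllamaBackend}[name]()
```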
For Users:
- Clear choice: fast vs accurate
- Documented tradeoffs
- Flexibility for different use cases
For Evaluation:
- Demonstrates understanding of conflicting requirements
- Shows engineering judgment about tradeoffs
- Provides working solution that respects all constraints
- Honest about technological limitations
For TTB Deployment:
- Start with Tesseract default (meets speed requirement)
- Collect usage data on accuracy issues
- Invest in GPU infrastructure if AI mode proves valuable
- Clear upgrade path as technology improves
- [✅] Tesseract backend functional and < 5 seconds
- [✅] Ollama backend functional (tested with llama3.2-vision)
- [✅] OCR abstraction layer supports both backends
- CLI supports
--accurateflag for backend selection - Web UI includes "Use AI for accuracy" checkbox
- Documentation clearly explains tradeoff
- Testing on 40-label golden dataset quantifies accuracy difference
- Testing results:
OCR_ANALYSIS.md(to be created) - Stakeholder interviews:
Take-Home Project_ AI-Powered Alcohol Label Verification App.docx - OCR backends implementation:
ocr_backends.py - Sarah Chen quote (5-second requirement): Lines 66-67 of .docx
- Marcus Williams quote (firewall): Lines 85-87 of .docx
Date: 2026-02-16
Status: ✅ Decided
Decision: Remove --pretty flag and TTY detection from CLI. Always output compact JSON with no indentation.
- Original CLI had --pretty flag that auto-detected TTY and pretty-printed JSON
- Moving to API-first architecture with FastAPI
- APIs should output consistent, compact JSON regardless of client
- Pretty-printing adds complexity and inconsistency between CLI and API usage
Pros:
- Helpful for manual debugging
- Backward compatible with existing scripts
Cons:
- Inconsistent behavior between CLI and API
- Additional code complexity
- Users can pipe to jq or python -m json.tool for pretty-printing
Pros:
- Consistent output format (matches API behavior)
- Simpler code (remove 4+ lines of logic)
- Smaller output size for pipelines
- Industry standard for APIs
- Users can pretty-print externally if needed
Cons:
- Less readable for human inspection (mitigated by external tools)
Selected Option 2 because:
- API-First Design: CLI will be wrapper around core validation logic used by API
- Consistency: Same JSON format whether called via CLI, API, or programmatically
- Simplicity: Removes conditional logic and flags
- Best Practice: APIs return compact JSON; clients handle formatting
- External Tools: jq, python -m json.tool, IDE formatters available
- Code Changes: Remove --pretty argument, TTY detection, conditional formatting
- Test Updates: Update run_tests.sh TEST 7 to check for compact JSON
- Documentation: Update CLI docs to note JSON-only output
- User Impact: Users wanting pretty output must use external tools (one-liner: | python -m json.tool)
- ✅ CLI outputs identical JSON format as API
- ✅ All tests pass with compact JSON
- ✅ Code simplified (fewer conditionals)
- ✅ Documentation updated
- verify_label.py lines 249-250, 254-256, 276, 287
- API design best practices: compact JSON for machine consumption
Date: 2026-02-16
Status: ✅ Decided
Decision: Use multi-stage Docker build for main app (~500MB) with Ollama as separate service in docker-compose
- Need to containerize application for deployment
- Ollama AI vision models are ~7.9GB (llama3.2-vision)
- Want to support both fast (Tesseract) and accurate (Ollama) OCR options
- All-in-one image would be ~10GB (slow build/push, violates Docker best practices)
- Must support local development and production deployment
Architecture:
services:
  ollama:
    image: ollama/ollama:latest
    ports: ["11434:11434"]
  verifier:
    build: .
    environment:
      - OLLAMA_HOST=http://ollama:11434

Pros:
- ✅ Main app image stays lean (~500MB)
- ✅ Uses official Ollama container (well-maintained)
- ✅ Services can scale independently
- ✅ Easy GPU passthrough for Ollama
- ✅ Models cached in Docker volume (survive restarts)
- ✅ Clear separation of concerns
Cons:
- ⚠️ Requires docker-compose (acceptable for deployment)
- ⚠️ Minimal network overhead between containers
Pros:
- Simplest Docker setup
- Uses host GPU directly
Cons:
- ❌ Not fully containerized
- ❌ Manual installation required on each host
- ❌ Host dependency reduces portability
Cons:
- ❌ Image size ~10GB (7.9GB model + base)
- ❌ Slow build times (30+ minutes)
- ❌ Slow push/pull to registry
- ❌ Violates Docker best practices (one concern per container)
- ❌ Wastes bandwidth for users not using Ollama
Selected Option 1 because:
- Lean Images: Main app stays ~500MB (Tesseract + Python deps)
- Official Images: Ollama maintained by Ollama team, always up-to-date
- Flexibility: Can run verifier without Ollama for Tesseract-only deployments
- Scalability: Can scale Ollama separately if it becomes bottleneck
- Best Practices: Each service has single responsibility
- GPU Support: Official Ollama image handles GPU passthrough correctly
- Model Management: Ollama handles model downloads/updates
Multi-Stage Dockerfile:
Stage 1: base (Python 3.12-slim + Tesseract)
Stage 2: builder (install Python dependencies)
Stage 3: test (run pytest with 80% coverage requirement)
Stage 4: production (minimal runtime)
docker-compose.yml:
- Ollama service with GPU support and health checks
- Verifier service depends on Ollama (can override for Tesseract-only)
- Shared network for service communication
- Volume for Ollama models (persist across restarts)
Environment Configuration:
OLLAMA_HOSTconfigurable via .env- Defaults to docker-compose service name
- Can override for external Ollama instance
- Development: Requires Docker Compose for local dev with Ollama
- Testing: Multi-stage build runs pytest automatically (fails if <80% coverage)
- Deployment: Two containers deployed together via docker-compose or orchestrator
- Image Size: Main app ~500MB, Ollama ~4GB (one-time pull)
- Startup Time: Ollama takes ~10-30s to start, verifier waits via health check
- Graceful Degradation: If Ollama unavailable, verifier falls back to Tesseract
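A sketch of that graceful-degradation check, assuming the verifier probes Ollama's /api/tags route before choosing a backend (the timeout and default host are illustrative):

```python
import os
import requests

def ollama_available(timeout: float = 2.0) -> bool:
    """Return True if the Ollama service responds; otherwise the verifier falls back to Tesseract."""
    host = os.environ.get("OLLAMA_HOST", "http://ollama:11434")
    try:
        return requests.get(f"{host}/api/tags", timeout=timeout).ok
    except requests.RequestException:
        return False

backend = "ollama" if ollama_available() else "tesseract"
print(f"Using OCR backend: {backend}")
```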
- ✅ Main image size <600MB
- ✅ Docker build completes in <5 minutes (excluding model pull)
- ✅ Both services start successfully via docker-compose
- ✅ Health checks pass for both services
- ✅ Can run verifier with or without Ollama
- ✅ GPU passthrough works for Ollama (when available)
- Ollama Docker image: https://hub.docker.com/r/ollama/ollama
- Docker multi-stage builds: https://docs.docker.com/build/building/multi-stage/
- Docker Compose health checks: https://docs.docker.com/compose/compose-file/compose-file-v3/#healthcheck
Date: 2026-02-16
Status: ✅ Decided
Decision: Implement comprehensive pytest suite with 80% minimum coverage, integrated into Docker build. Keep existing bash tests for smoke testing.
- Currently have bash test script (run_tests.sh) with 24 tests across 8 categories
- Moving to containerized deployment requires proper CI/CD testing
- Need unit tests, integration tests, and API tests
- Docker multi-stage build should fail if tests fail or coverage too low
- Want to maintain both developer-friendly smoke tests and rigorous CI/CD tests
Bash Tests:
- Quick smoke tests for local development
- Human-readable colored output
- Direct CLI testing
- Fast feedback (~30 seconds with --quick)
Pytest Suite:
- Comprehensive unit + integration + API tests
- Code coverage metrics
- Runs in Docker build (fails build if tests fail)
- CI/CD integration ready
- 80% minimum coverage requirement
Pros:
- ✅ Best of both worlds
- ✅ Bash tests remain useful for quick local checks
- ✅ Pytest provides rigor for CI/CD
- ✅ Different tools for different purposes
Cons:
- ⚠️ Two test suites to maintain (acceptable - different purposes)
Cons:
- ❌ Loses convenient colored CLI output for developers
- ❌ Bash tests already written and working
Cons:
- ❌ Not suitable for CI/CD
- ❌ No code coverage metrics
- ❌ Harder to run in Docker
Selected Option 1 because:
- Complementary Tools: Bash for quick dev feedback, pytest for thorough testing
- Developer Experience: Bash tests provide instant visual feedback
- CI/CD Ready: Pytest integrates with Docker, generates coverage reports
- Coverage Enforcement: 80% minimum prevents regressions
- API Testing: pytest + httpx ideal for FastAPI testing
- Existing Work: Bash tests already provide value, no reason to discard
tests/
├── conftest.py (shared fixtures)
├── test_unit/ (90% coverage target)
│ ├── test_field_validators.py
│ ├── test_label_extractor.py
│ ├── test_ocr_backends.py
│ └── test_label_validator.py
├── test_integration/ (70% coverage target)
│ ├── test_cli.py
│ └── test_end_to_end.py
└── test_api/ (95% coverage target)
└── test_fastapi_endpoints.py
- field_validators.py: 90% (fuzzy matching, ABV tolerance, edge cases)
- label_extractor.py: 85% (regex patterns, field extraction)
- ocr_backends.py: 70% (mock Ollama, test Tesseract)
- label_validator.py: 90% (Tier 1/2 validation, graceful degradation)
- verify_label.py: 60% (CLI integration via subprocess)
- api.py: 95% (all endpoints, error handling)
- Overall Target: 80%+
- Use existing 40 golden samples (20 GOOD + 20 BAD)
- 4.9MB dataset included in Docker image
- Tests verify both Tesseract and Ollama backends
- Users can replace samples with their own (document in docs/GOLDEN_SAMPLES.md)
- Docker Build: Test stage runs pytest automatically
- Build Failures: Build fails if tests fail or coverage <80%
- Development Workflow: Developers run bash tests locally, pytest before push
- CI/CD: GitHub Actions runs Docker build (includes tests)
- Coverage Reports: Generated in Docker, can be exported as artifacts
- Test Fixtures: Shared fixtures in conftest.py (golden sample paths, mock OCR)
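A small sketch of the shared-fixture idea. Paths and names follow the golden-sample conventions used elsewhere in this document, but the actual fixtures live in tests/conftest.py:

```python
# tests/conftest.py (sketch)
from pathlib import Path
import pytest

SAMPLES_DIR = Path(__file__).resolve().parent.parent / "samples"

@pytest.fixture
def good_labels() -> list:
    """Paths to the known-compliant golden samples."""
    return sorted(SAMPLES_DIR.glob("label_good_*.jpg"))

# tests/test_unit/test_label_validator.py (sketch)
def test_good_labels_are_structurally_compliant(good_labels):
    from label_validator import verify_label_file
    for image in good_labels[:3]:  # keep the unit run fast; the full set runs in integration tests
        result = verify_label_file(str(image), ocr_backend="tesseract")
        assert result["status"] in {"COMPLIANT", "PARTIAL_COMPLIANCE"}
```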
- ✅ 80%+ code coverage achieved
- ✅ All tests pass in Docker build
- ✅ Tests complete in <2 minutes (excluding Ollama tests)
- ✅ Bash tests remain functional for quick dev checks
- ✅ API tests cover all endpoints and error conditions
- ✅ Tests catch regressions (e.g., graceful degradation working)
- pytest: https://docs.pytest.org/
- pytest-cov: https://pytest-cov.readthedocs.io/
- FastAPI testing: https://fastapi.tiangolo.com/tutorial/testing/
Date: 2026-02-16
Status: ✅ Decided
Decision: Implement FastAPI with open access (no authentication), CORS allow-all, logging to stdout only. No metrics, tracing, or API versioning at this stage.
- Building prototype to demonstrate label verification capability
- Need REST API with file upload for web UI integration
- May eventually use AWS API Gateway for production (handles auth, rate limiting, etc.)
- Want to keep prototype simple and focused on core functionality
- Containerized deployment captures stdout logs automatically
Endpoints:
- POST /verify - Single label verification (synchronous, image + optional ground truth)
- POST /verify/async - Single label verification (async, queue-based, CloudFront-safe)
- GET /verify/async/{job_id} - Poll async verify job status
- POST /verify/retry/{job_id} - Re-enqueue a failed async verify job
- POST /verify/batch - Batch verification (ZIP file, async)
- GET /verify/batch/{job_id} - Poll batch job status
- GET /health - Health check
- GET /docs - Auto-generated Swagger UI (/redoc is disabled)
Request Limits:
- Max file size: 10MB per image
- Max batch size: 50 images
- Allowed formats: .jpg, .jpeg, .tif, .tiff (PNG is not accepted — not a TTB COLA format)
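As a rough sketch (not the actual api.py implementation), the synchronous /verify endpoint enforcing these limits could look like the following; verify_image() is a hypothetical stand-in for the real OCR and validation pipeline:

```python
# Hedged sketch of POST /verify with the size and format limits listed above.
import os

from fastapi import FastAPI, File, HTTPException, UploadFile

app = FastAPI()

MAX_FILE_SIZE = 10 * 1024 * 1024                          # 10MB per image
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff"}   # PNG rejected


def verify_image(data: bytes) -> dict:
    """Hypothetical stand-in for the real OCR + validation pipeline."""
    return {"status": "COMPLIANT", "fields": {}}


@app.post("/verify")
async def verify(image: UploadFile = File(...)):
    suffix = os.path.splitext(image.filename or "")[1].lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise HTTPException(status_code=400, detail="Invalid file format")
    data = await image.read()
    if len(data) > MAX_FILE_SIZE:
        raise HTTPException(status_code=413, detail="Maximum file size is 10MB")
    return verify_image(data)
```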
Batch ZIP Format:
batch.zip
├── label_001.jpg
├── label_001.json (ground truth, optional)
├── label_002.tif
├── label_002.json
└── ...
JSON files must match image filenames (e.g., label_001.jpg → label_001.json)
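A small illustrative helper for pairing images with their optional ground-truth JSON inside the uploaded ZIP might look like this (a sketch only, not the project's batch handler):

```python
# Pair each image in the batch ZIP with its optional ground-truth JSON file,
# following the filename convention described above.
import io
import json
import zipfile

IMAGE_EXTS = (".jpg", ".jpeg", ".tif", ".tiff")


def pair_batch_entries(zip_bytes: bytes):
    """Return (image_name, ground_truth_or_None) tuples for a batch ZIP."""
    pairs = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = set(zf.namelist())
        for name in sorted(names):
            if not name.lower().endswith(IMAGE_EXTS):
                continue
            truth_name = name.rsplit(".", 1)[0] + ".json"
            truth = json.loads(zf.read(truth_name)) if truth_name in names else None
            pairs.append((name, truth))
    return pairs
```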
Rationale:
- Session cookie authentication added (4-hour sessions, httponly cookies)
- Credentials stored in AWS Secrets Manager
- Host-header middleware enforces domain-only access (rejects direct ALB/IP access); see the sketch below
- Login endpoint: POST /ui/login
Note: The original design called for open access with authentication deferred to API Gateway. This was revised — a lightweight built-in auth layer was implemented to prevent unauthorized access to the prototype.
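The host-header check can be as small as the following sketch; the allowed domain shown is a placeholder, and the real value would come from configuration rather than a hard-coded constant:

```python
# Sketch of a host-header middleware that rejects direct ALB/IP access.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

ALLOWED_HOSTS = {"label-verify.example.gov"}  # placeholder, not the real domain


@app.middleware("http")
async def enforce_host_header(request: Request, call_next):
    # Only requests addressed to the public domain pass through.
    host = request.headers.get("host", "").split(":")[0]
    if host not in ALLOWED_HOSTS:
        return JSONResponse(status_code=403, content={"error": "Forbidden"})
    return await call_next(request)
```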
Configuration: CORS_ORIGINS=["*"]
Rationale:
- Prototype needs to work from any dev environment
- Frontend may be served from localhost, different ports, etc.
- Production will restrict to specific domains
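For reference, the allow-all CORS setup described above amounts to a few lines of FastAPI middleware wiring; the CORS_ORIGINS environment variable handling shown here is a sketch and may differ from the actual api.py:

```python
# Sketch of the prototype's allow-all CORS configuration.
import json
import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Prototype default: allow any origin. Production would restrict this list.
origins = json.loads(os.getenv("CORS_ORIGINS", '["*"]'))

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_methods=["*"],
    allow_headers=["*"],
)
```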
Future:
# Production CORS example (commented in code)
CORS_ORIGINS=[
"https://ttb.gov",
"https://cola.ttb.gov",
"https://label-verify.ttb.gov"
]
Implementation:
- Structured logs to stdout
- Docker captures and forwards to log aggregator
- Format: %(asctime)s - %(name)s - %(levelname)s - %(message)s
- Log level configurable via env var (default: INFO)
Rationale:
- Docker best practice: log to stdout
- Syslog, CloudWatch, or other aggregators can capture stdout
- No file I/O reduces complexity
- Container-native approach
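A minimal stdout-only logging setup matching the format above might look like this; the LOG_LEVEL variable name is an assumption:

```python
# Sketch of container-friendly logging: stdout only, level taken from the env.
import logging
import os
import sys

logging.basicConfig(
    stream=sys.stdout,
    level=os.getenv("LOG_LEVEL", "INFO"),
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)

logger = logging.getLogger("label_verify.api")
logger.info("API starting up")
```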
Decision: No Prometheus, StatsD, or custom metrics
Rationale:
- Prototype focuses on functionality
- AWS API Gateway provides metrics in production
- Can add /metrics endpoint later if needed (comment in code)
Future Options (Commented):
# Option 1: Prometheus metrics
# from prometheus_fastapi_instrumentator import Instrumentator
# Instrumentator().instrument(app).expose(app)
# Option 2: CloudWatch via boto3
# Option 3: Custom metrics to external service
Decision: No distributed tracing (OpenTelemetry, Jaeger, etc.)
Rationale:
- Single service (verifier + Ollama)
- No microservices requiring tracing
- Adds complexity without current benefit
Decision: No /v1/ or /v2/ prefix
Rationale:
- First version of API
- Breaking changes unlikely in prototype
- Can add versioning when needed
Future Path:
# When versioning needed:
# app.include_router(v1_router, prefix="/v1")
# app.include_router(v2_router, prefix="/v2")
Decision: No rate limiting in app code
Rationale:
- AWS API Gateway handles rate limiting in production
- Prototype deployed in controlled environment
- Can add slowapi if needed (comment in code)
Future Implementation (Commented):
# from slowapi import Limiter
# limiter = Limiter(key_func=get_remote_address)
# @limiter.limit("100/hour")
# async def verify_label(...):
HTTP Status Codes:
- 200 OK - Successful validation (includes NON_COMPLIANT status in JSON)
- 400 Bad Request - Invalid file format, malformed JSON
- 413 Payload Too Large - File >10MB or batch >50 images
- 422 Unprocessable Entity - Validation error (Pydantic)
- 500 Internal Server Error - OCR failure, unexpected errors
Error Response Format:
{
"error": "File too large",
"detail": "Maximum file size is 10MB",
"timestamp": "2026-02-16T12:00:00Z"
}
- Development: Simplified API without auth complexity
- Testing: Easy to test with curl, no tokens needed
- Deployment: Docker logs go to stdout (captured by orchestrator)
- Production Path: Clear migration to API Gateway for enterprise features
- Documentation: Must document prototype nature and production considerations
- Future-Proofing: Comments in code show where to add auth/metrics/versioning
- ✅ API endpoints work without authentication
- ✅ CORS allows requests from any origin
- ✅ Logs appear in Docker stdout
- ✅ File upload handles 10MB images
- ✅ Batch processing handles 50 images
- ✅ Error responses have consistent format
- ✅ Swagger UI auto-generated and functional
- FastAPI CORS: https://fastapi.tiangolo.com/tutorial/cors/
- FastAPI file uploads: https://fastapi.tiangolo.com/tutorial/request-files/
- Docker logging: https://docs.docker.com/config/containers/logging/
- AWS API Gateway: https://aws.amazon.com/api-gateway/
Date: 2026-02-16
Status: ✅ Decided
Decision: Use GitHub Container Registry (GHCR) for container image storage and distribution
- Need to deploy Docker images from CI/CD pipeline to AWS EC2 instances
- Must choose between GitHub Container Registry (GHCR) and AWS Elastic Container Registry (ECR)
- Project is in prototype/POC phase, deploying to AWS infrastructure
- Already using GitHub for source control
- Budget is not a constraint but simplicity is valued
Pros:
- Native GitHub integration - single platform for code and containers
- GitHub Actions can push without credential configuration (uses GITHUB_TOKEN)
- Free for private repositories (500MB storage, unlimited bandwidth)
- Simpler CI/CD setup (5-10 minutes vs 20-30 minutes)
- Unified access control with GitHub repository permissions
- Packages linked directly to repository for easy discovery
- No separate billing or AWS account coordination needed
Cons:
- Not AWS-native - EC2 pulls over internet (slight latency vs VPC)
- Limited vulnerability scanning (basic only, not Inspector-grade)
- Fewer advanced registry features (lifecycle policies, immutable tags)
- Cross-region bandwidth charges (GHCR → AWS)
Pros:
- AWS-native integration with EC2 (same network, VPC endpoints)
- Sub-second image pulls within same AWS region
- Native IAM integration (EC2 instance profiles)
- Advanced vulnerability scanning (Amazon Inspector)
- Image lifecycle policies for cost optimization
- Encryption at rest with AWS KMS
- Better for government compliance (FedRAMP, etc.)
Cons:
- Cost: $0.10 per GB/month storage + bandwidth charges ($2-5/month)
- More complex CI/CD setup (AWS credentials in GitHub Secrets)
- Additional authentication steps (aws ecr get-login-password)
- More moving parts (IAM policies, ECR policies, credentials rotation)
GHCR was chosen for the following reasons:
- Rapid Iteration Priority: Currently in prototype/POC phase where speed of implementation matters more than production optimization
- Simplicity: Unified GitHub workflow reduces complexity and maintenance burden
- Zero Additional Cost: Free for private repositories eliminates budget approval delays
- Sufficient for Current Needs: Basic vulnerability scanning and unlimited bandwidth meet prototype requirements
- Easy Migration Path: Can migrate to ECR later if production deployment requires AWS-native features
- One Less Service: Reduces operational overhead during development phase
Immediate:
- CI/CD pipeline will be simpler and faster to implement
- No AWS credential management needed for container registry
- EC2 instances will pull images over internet (acceptable latency for prototype)
Future Migration Path:
- If production deployment requires ECR (compliance, performance, VPC isolation):
- Migration is straightforward: re-tag and push existing images
- GitHub Actions workflow changes are minimal (swap registry endpoint)
- No code changes required in application
Best Practice:
- Use this prototype phase to validate the full deployment process
- Re-evaluate before production launch if:
- Government compliance mandates AWS-only infrastructure
- Image pull performance becomes critical (multi-region, high frequency)
- Advanced registry features are needed (lifecycle policies, cross-region replication)
- ✅ CI/CD pipeline setup completed in <30 minutes
- ✅ Image build and push time <10 minutes
- ✅ EC2 image pull time <60 seconds (acceptable for prototype)
- ✅ Zero registry-related costs during development
- GHCR Documentation
- AWS ECR Pricing
- docs/DOCKER_DEPLOYMENT.md - Updated with GHCR instructions
Date: YYYY-MM-DD
Status: 🔄 Pending | ✅ Decided | ❌ Rejected | 🔄 Revised
Decision: [Clear statement of the decision]
[Why is this decision needed? What problem does it solve?]
[List alternatives with pros/cons]
[Why was this option chosen?]
[What does this mean for the project?]
[How will we know this was the right choice?]
[Links to relevant documentation or resources]
Document Maintained By: Project Team
Last Updated: 2026-02-16
Recent Additions:
- Decision 012: Docker Strategy with Separate Ollama Service
- Decision 013: Pytest Test Suite with 80% Coverage Target
- Decision 014: FastAPI with Open Access and Minimal Observability
- Decision 015: Container Registry - GitHub Container Registry (GHCR)