A collection of AI agents built on Databricks demonstrating different patterns for intelligent document processing and analysis.
| Agent | Description | Key Capabilities | Databricks Features |
|---|---|---|---|
| Finance Research Agent | Multi-agent system for SEC document analysis using a supervisor-orchestrated architecture. Answers questions about company financials, risks, and strategy from 10-K, 10-Q, 8-K filings and earnings reports. | • LangGraph supervisor pattern with intelligent query routing • Self-querying retrieval with metadata filtering • Parallel async execution for complex queries • Stateful conversation memory • Support for 40+ public companies | • Vector Search (hybrid semantic + keyword) • Unity Catalog (data governance) • Model Serving (Claude Sonnet 4.5) • MLflow (tracing & deployment) • Lakebase (conversation checkpointing) |
| Loan Automation Agent | Document processing pipeline for mortgage loan automation. Extracts structured data from loan limit PDFs and product documentation using AI-powered parsing. | • Multimodal extraction from PDF images • Structured schema-driven data extraction • Document parsing with element classification • Natural language query interface via Genie | • AI Functions (ai_query, ai_parse_document) • Vector Search (semantic retrieval) • Unity Catalog (volumes & tables) • Genie Spaces (NL query interface) • Delta Lake (CDC-enabled tables) |
hackathon/
├── databricks.yml                  # Databricks Asset Bundle configuration
├── requirements.txt                # Python dependencies
├── finance_research_agent/
│   ├── 00_document_ingestion/      # SEC document processing
│   ├── 01_research_agent/          # Multi-agent implementation
│   │   └── 00_agent.py             # LangGraph agent definitions
│   ├── data/                       # Sample data
│   └── setup/                      # Setup notebooks
├── loan_automation_agent/
│   └── 00_document_ingestion/      # Loan document processing pipeline
│       ├── 01_ingest_loan_limits.py             # Loan limits extraction
│       ├── 02_ingest_product_documentation.py   # Product docs parsing
│       ├── 03_create_vector_search_index.py     # VS index creation
│       └── config.yaml                          # Pipeline configuration
└── .venv/                          # Local virtual environment
A multi-agent system for SEC document research using LangGraph and LangChain.
┌─────────────────────────────────────────────────────────────┐
│                      Supervisor Agent                       │
│  Routes queries based on complexity and information needs  │
└─────────────────────┬───────────────────────────────────────┘
                      │
          ┌───────────┴───────────┐
          │                       │
          ▼                       ▼
┌─────────────────┐     ┌─────────────────────┐
│ Document Agent  │     │ Deep Research Agent │
│                 │     │                     │
│ Simple queries  │     │ Complex queries     │
│ Single metrics  │     │ Parallel execution  │
│ Direct retrieval│     │ Multi-angle analysis│
└────────┬────────┘     └──────────┬──────────┘
         │                         │
         └───────────┬─────────────┘
                     │
                     ▼
            ┌─────────────────┐
            │  Final Answer   │
            │   Synthesis     │
            └─────────────────┘
- Supervisor Agent - Routes queries based on complexity, injects temporal context (fiscal year/quarter)
- Document Agent - Self-querying retrieval for straightforward questions
- Deep Research Agent - Parallel execution of 2-5 subqueries for complex analysis
- Final Answer - Synthesizes results into coherent responses
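
The routing and parallel execution described above can be illustrated with a minimal, framework-free sketch. It is not the project's LangGraph implementation: the keyword heuristic, the `Route` type, the fixed research angles, and the `retrieve` stub are all hypothetical stand-ins, and the real supervisor presumably uses an LLM to judge complexity.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical keyword heuristic standing in for an LLM-based complexity judgment.
COMPLEX_HINTS = ("compare", "trend", "versus", "why", "across")


@dataclass
class Route:
    agent: str              # "document" or "deep_research"
    subqueries: list[str]   # one entry for simple queries, several for complex ones


def route_query(query: str, fiscal_year: int, fiscal_quarter: int) -> Route:
    """Inject temporal context, then pick an agent with a crude complexity check."""
    contextual = f"[FY{fiscal_year} Q{fiscal_quarter}] {query}"
    if any(hint in query.lower() for hint in COMPLEX_HINTS):
        # Hypothetical fixed angles; the count stays within the 2-5 subquery range.
        angles = ("financial results", "risk factors", "management strategy")
        return Route("deep_research", [f"{contextual} (focus: {a})" for a in angles])
    return Route("document", [contextual])


async def retrieve(subquery: str) -> str:
    """Stub for a retrieval call; a real agent would query the vector index here."""
    await asyncio.sleep(0)
    return f"findings for: {subquery}"


async def answer(query: str, fiscal_year: int, fiscal_quarter: int) -> str:
    """Run all subqueries concurrently and join the results for final synthesis."""
    route = route_query(query, fiscal_year, fiscal_quarter)
    results = await asyncio.gather(*(retrieve(s) for s in route.subqueries))
    return "\n".join(results)
```

Calling `asyncio.run(answer("Compare Apple and Microsoft margins", 2024, 3))` fans out into three concurrent subqueries, while a single-metric question goes to the document route as one query.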
- 10-K (Annual reports)
- 10-Q (Quarterly reports)
- 8-K (Current reports)
- Earnings Reports
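
Self-querying retrieval turns a question into a search query plus metadata filters over fields such as filing type. The sketch below shows the idea with simple rules; in practice an LLM usually produces the filter. The field names `filing_type` and `fiscal_year` and the label `"Earnings Report"` are assumptions, not the project's actual index schema.

```python
import re

# Filing types from the list above; matched case-insensitively in the question.
FILING_TYPES = ("10-K", "10-Q", "8-K")


def build_filter(question: str) -> dict:
    """Derive a metadata filter from a question (rule-based stand-in for an LLM)."""
    filters: dict = {}
    upper = question.upper()
    for filing_type in FILING_TYPES:
        if filing_type in upper:
            filters["filing_type"] = filing_type  # hypothetical metadata field
            break
    if "filing_type" not in filters and "EARNINGS" in upper:
        filters["filing_type"] = "Earnings Report"  # hypothetical label
    year = re.search(r"\b(20\d{2})\b", question)
    if year:
        filters["fiscal_year"] = int(year.group(1))  # hypothetical metadata field
    return filters
```

The resulting dictionary would be passed alongside the query text to the retriever, so that only matching documents are searched.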
A document processing pipeline for mortgage loan automation.
PDF Documents (Loan Limits / Product Docs)
↓
AI-Powered Extraction (ai_query / ai_parse_document)
↓
Structured Delta Tables
↓
Vector Search Index
↓
Query Interface (Genie Spaces)
- Loan Limits Extraction - Converts PDF pages to images, extracts structured loan limit data using multimodal AI
- Product Documentation Parsing - Parses PDFs into elements (text, tables, figures) with page-level grouping
- Vector Search Index - Enables semantic search over product documentation
- Genie Space - Natural language interface for querying loan limits data
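
The page-level grouping step can be sketched as follows. The element shape (`page`, `type`, `content`) is a simplified assumption for illustration and does not reproduce the actual output schema of `ai_parse_document`.

```python
from collections import defaultdict


def group_by_page(elements: list[dict]) -> dict[int, str]:
    """Concatenate parsed elements into one text chunk per page.

    Each element is assumed to look like
    {"page": 1, "type": "text" | "table" | "figure", "content": "..."}.
    """
    pages: dict[int, list[str]] = defaultdict(list)
    for element in elements:
        content = element["content"]
        if element["type"] != "text":
            # Tag non-text elements so downstream search can tell them apart.
            content = f"[{element['type']}]\n{content}"
        pages[element["page"]].append(content)
    # Return pages in ascending order, elements in their original order.
    return {page: "\n".join(parts) for page, parts in sorted(pages.items())}
```

One chunk per page keeps a table next to the text that explains it, which tends to help retrieval over product documentation.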
- Python 3.11+
- Databricks workspace with:
  - Vector Search endpoint configured
  - Unity Catalog enabled
  - Model Serving endpoint for Claude
# Activate virtual environment
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt

- LangGraph - Multi-agent orchestration
- LangChain - LLM framework and tools
- Databricks Vector Search - Semantic retrieval
- Databricks AI Functions - Document parsing and extraction
- Claude Sonnet 4.5 - LLM via Model Serving
- MLflow - Tracking and deployment
- Unity Catalog - Data governance
- Pydantic - Schema validation
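
To illustrate the schema validation role listed above, here is a minimal sketch of checking extracted loan limit rows before they are written to a table. It uses a standard-library dataclass rather than Pydantic so that it runs without extra dependencies, and the fields `county`, `state`, and `one_unit_limit` are hypothetical, not the project's actual schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LoanLimit:
    """Hypothetical record shape for one extracted loan limit row."""
    county: str
    state: str
    one_unit_limit: int

    def __post_init__(self) -> None:
        # Reject rows the model is likely to have misread.
        if len(self.state) != 2 or not self.state.isalpha():
            raise ValueError(f"invalid state code: {self.state!r}")
        if self.one_unit_limit <= 0:
            raise ValueError(f"limit must be positive: {self.one_unit_limit}")


def parse_loan_limits(raw_json: str) -> list[LoanLimit]:
    """Parse a JSON array returned by the extraction model into validated rows."""
    rows = json.loads(raw_json)
    return [
        LoanLimit(
            county=str(row["county"]),
            state=str(row["state"]).upper(),
            one_unit_limit=int(row["one_unit_limit"]),
        )
        for row in rows
    ]
```

Validating at this boundary means a malformed model response fails loudly instead of landing in a Delta table.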