bendoan-db/databricks-agents-hackathon

Databricks AI Agents Hackathon

A collection of AI agents built on Databricks demonstrating different patterns for intelligent document processing and analysis.

Agent Summary

  • Finance Research Agent
    • Description: Multi-agent system for SEC document analysis using a supervisor-orchestrated architecture. Answers questions about company financials, risks, and strategy from 10-K, 10-Q, 8-K filings and earnings reports.
    • Key Capabilities:
      • LangGraph supervisor pattern with intelligent query routing
      • Self-querying retrieval with metadata filtering
      • Parallel async execution for complex queries
      • Stateful conversation memory
      • Support for 40+ public companies
    • Databricks Features:
      • Vector Search (hybrid semantic + keyword)
      • Unity Catalog (data governance)
      • Model Serving (Claude Sonnet 4.5)
      • MLflow (tracing & deployment)
      • Lakebase (conversation checkpointing)
  • Loan Automation Agent
    • Description: Document processing pipeline for mortgage loan automation. Extracts structured data from loan limit PDFs and product documentation using AI-powered parsing.
    • Key Capabilities:
      • Multimodal extraction from PDF images
      • Structured schema-driven data extraction
      • Document parsing with element classification
      • Natural language query interface via Genie
    • Databricks Features:
      • AI Functions (ai_query, ai_parse_document)
      • Vector Search (semantic retrieval)
      • Unity Catalog (volumes & tables)
      • Genie Spaces (NL query interface)
      • Delta Lake (CDC-enabled tables)

Project Structure

hackathon/
├── databricks.yml                    # Databricks Asset Bundle configuration
├── requirements.txt                  # Python dependencies
├── finance_research_agent/
│   ├── 00_document_ingestion/        # SEC document processing
│   ├── 01_research_agent/            # Multi-agent implementation
│   │   └── 00_agent.py               # LangGraph agent definitions
│   ├── data/                         # Sample data
│   └── setup/                        # Setup notebooks
├── loan_automation_agent/
│   └── 00_document_ingestion/        # Loan document processing pipeline
│       ├── 01_ingest_loan_limits.py  # Loan limits extraction
│       ├── 02_ingest_product_documentation.py  # Product docs parsing
│       ├── 03_create_vector_search_index.py    # VS index creation
│       └── config.yaml               # Pipeline configuration
└── .venv/                            # Local virtual environment

Finance Research Agent

A multi-agent system for SEC document research using LangGraph and LangChain.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Supervisor Agent                         │
│  Routes queries based on complexity and information needs    │
└─────────────────────┬───────────────────────────────────────┘
                      │
          ┌───────────┴───────────┐
          │                       │
          ▼                       ▼
┌─────────────────┐     ┌─────────────────────┐
│  Document Agent │     │ Deep Research Agent │
│                 │     │                     │
│ Simple queries  │     │ Complex queries     │
│ Single metrics  │     │ Parallel execution  │
│ Direct retrieval│     │ Multi-angle analysis│
└────────┬────────┘     └──────────┬──────────┘
         │                         │
         └───────────┬─────────────┘
                     │
                     ▼
          ┌─────────────────┐
          │  Final Answer   │
          │   Synthesis     │
          └─────────────────┘
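
The routing step in the diagram above can be sketched in plain Python. This is a minimal illustration only, not the repository's LangGraph implementation: `Route`, `route_query`, and the keyword list `COMPLEX_HINTS` are hypothetical names, and the keyword heuristic is a stand-in for whatever LLM-driven decision the supervisor actually makes.

```python
from dataclasses import dataclass

# Hypothetical markers of a "complex" question. The real supervisor
# presumably asks an LLM to classify the query instead.
COMPLEX_HINTS = ("compare", "versus", "trend", "why", "across")


@dataclass
class Route:
    agent: str   # "document_agent" or "deep_research_agent"
    reason: str  # short explanation, useful when tracing decisions


def route_query(question: str) -> Route:
    """Pick an agent for the question using a simple substring heuristic."""
    lowered = question.lower()
    for hint in COMPLEX_HINTS:
        if hint in lowered:
            return Route("deep_research_agent", f"matched hint '{hint}'")
    return Route("document_agent", "no complexity hints found")
```

Calling `route_query("Compare risk factors for two companies")` selects the deep research path, while a single-metric question falls through to the document agent.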

Agent Components

  • Supervisor Agent - Routes queries based on complexity and injects temporal context (fiscal year/quarter)
  • Document Agent - Self-querying retrieval for straightforward questions
  • Deep Research Agent - Parallel execution of 2-5 subqueries for complex analysis
  • Final Answer - Synthesizes results into coherent responses
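
The parallel execution described for the Deep Research Agent might look roughly like the following, assuming each subquery is an independent async call. The names `run_subquery`, `deep_research`, and `synthesize` are hypothetical, and `run_subquery` is a placeholder for a real retrieval plus LLM call.

```python
import asyncio


async def run_subquery(subquery: str) -> str:
    """Placeholder for one retrieval + LLM call (hypothetical)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"findings for: {subquery}"


async def deep_research(subqueries: list[str]) -> list[str]:
    """Run 2-5 subqueries concurrently; results keep the input order."""
    if not 2 <= len(subqueries) <= 5:
        raise ValueError("expected between 2 and 5 subqueries")
    return await asyncio.gather(*(run_subquery(q) for q in subqueries))


def synthesize(results: list[str]) -> str:
    """Stand-in for the Final Answer step: join the partial findings."""
    return "\n".join(results)
```

For example, `synthesize(asyncio.run(deep_research(["revenue growth", "risk factors"])))` produces one combined string. `asyncio.gather` returns results in the order the awaitables were passed, so the synthesis step can rely on that ordering.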

Supported Document Types

  • 10-K (Annual reports)
  • 10-Q (Quarterly reports)
  • 8-K (Current reports)
  • Earnings Reports
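
Self-querying retrieval turns parts of the question into metadata filters before searching. A minimal sketch of that idea for the document types above, assuming hypothetical metadata fields `doc_type` and `fiscal_year` and using regular expressions in place of the LLM that would normally generate the filter:

```python
import re

# Phrases a user might type, mapped to the filing types listed above.
DOC_TYPE_PATTERNS = {
    "10-K": re.compile(r"\b(10-k|annual report)\b", re.IGNORECASE),
    "10-Q": re.compile(r"\b(10-q|quarterly report)\b", re.IGNORECASE),
    "8-K": re.compile(r"\b(8-k|current report)\b", re.IGNORECASE),
    "Earnings Report": re.compile(r"\bearnings\b", re.IGNORECASE),
}
YEAR_PATTERN = re.compile(r"\b(20\d{2})\b")


def build_filters(question: str) -> dict:
    """Extract metadata filters (hypothetical field names) from a question."""
    filters: dict = {}
    doc_types = [
        name for name, pattern in DOC_TYPE_PATTERNS.items()
        if pattern.search(question)
    ]
    if doc_types:
        filters["doc_type"] = doc_types
    year = YEAR_PATTERN.search(question)
    if year:
        filters["fiscal_year"] = int(year.group(1))
    return filters
```

The resulting dictionary would then be passed to the vector search call as a filter alongside the semantic query. How filters are expressed depends on the retriever in use.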

Loan Automation Agent

A document processing pipeline for mortgage loan automation.

Pipeline Stages

PDF Documents (Loan Limits / Product Docs)
    ↓
AI-Powered Extraction (ai_query / ai_parse_document)
    ↓
Structured Delta Tables
    ↓
Vector Search Index
    ↓
Query Interface (Genie Spaces)
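
Between the extraction and Delta table stages, model output has to be checked against a schema. The sketch below shows one way to do that, assuming the model returns a JSON array. The `LoanLimit` fields are hypothetical, and standard-library dataclasses are used here to keep the example dependency-free, whereas the project lists Pydantic for schema validation.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LoanLimit:
    # Hypothetical schema; the real column names may differ.
    county: str
    state: str
    units: int
    limit_usd: int


def parse_loan_limits(model_output: str) -> list[LoanLimit]:
    """Validate a JSON array returned by the extraction model."""
    rows = json.loads(model_output)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of loan limit rows")
    limits = []
    for row in rows:
        # A missing key raises KeyError; a non-numeric value raises ValueError.
        limits.append(
            LoanLimit(
                county=str(row["county"]),
                state=str(row["state"]),
                units=int(row["units"]),
                limit_usd=int(row["limit_usd"]),
            )
        )
    return limits
```

Rows that fail validation raise an exception rather than being written, which keeps malformed model output out of the downstream tables.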

Components

  • Loan Limits Extraction - Converts PDF pages to images, extracts structured loan limit data using multimodal AI
  • Product Documentation Parsing - Parses PDFs into elements (text, tables, figures) with page-level grouping
  • Vector Search Index - Enables semantic search over product documentation
  • Genie Space - Natural language interface for querying loan limits data
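
The page-level grouping mentioned for product documentation can be illustrated as follows. This is a sketch under assumptions: the `Element` shape and the `group_by_page` helper are hypothetical and do not reflect the actual output format of `ai_parse_document`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Element:
    page: int      # 1-based page number (assumed)
    kind: str      # "text", "table", or "figure"
    content: str


def group_by_page(elements: list[Element]) -> dict[int, str]:
    """Concatenate element contents per page, preserving input order."""
    pages: dict[int, list[str]] = defaultdict(list)
    for element in elements:
        pages[element.page].append(f"[{element.kind}] {element.content}")
    # Sort by page number so chunks are emitted in reading order.
    return {page: "\n".join(parts) for page, parts in sorted(pages.items())}
```

Each page-level string could then serve as one chunk for the vector search index, with the page number kept as metadata.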

Getting Started

Prerequisites

  • Python 3.11+
  • Databricks workspace with:
    • Vector Search endpoint configured
    • Unity Catalog enabled
    • Model Serving endpoint for Claude

Local Development

# Activate virtual environment
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Tech Stack

  • LangGraph - Multi-agent orchestration
  • LangChain - LLM framework and tools
  • Databricks Vector Search - Semantic retrieval
  • Databricks AI Functions - Document parsing and extraction
  • Claude Sonnet 4.5 - LLM via Model Serving
  • MLflow - Tracking and deployment
  • Unity Catalog - Data governance
  • Pydantic - Schema validation

About

Collection of artifacts to support the Databricks Agents Hackathon, including advanced document parsing techniques, research agent templates, and an evaluation harness.
