A data pipeline that automates the extraction, transformation, and analysis of SEC financial reports. It uses Python design patterns to process financial data and writes investment summaries directly to Google Sheets.
- Automated Data Pipeline: Scrapes and processes SEC financial reports.
- Company Screener:
  - Ingests financial report files (Income Statement, Balance Sheet, Cash Flow) from local storage.
  - Generates custom analysis reports on Google Sheets.
- Advanced Financial Analysis:
  - Calculates key valuation metrics, including DCF (discounted cash flow), Graham Number, and DDM (dividend discount model).
  - Provides Buy/Sell decision support based on configurable strategies.
- Google Sheets Integration:
  - Automatically updates a central "Stock" spreadsheet with new analysis data.
  - Fetches existing portfolio data for context-aware recommendations.
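
As a rough illustration of the valuation metrics listed above, the sketch below uses the textbook forms of the Graham Number, the Gordon growth variant of the DDM, and a simple DCF with a terminal value. The function names, parameters, and input choices are assumptions made for this example; the project's own implementation may use different inputs or model variants.

```python
import math
from typing import Sequence


def graham_number(eps: float, book_value_per_share: float) -> float:
    """Textbook Graham Number: sqrt(22.5 * EPS * book value per share)."""
    if eps <= 0 or book_value_per_share <= 0:
        raise ValueError("EPS and book value per share must be positive")
    return math.sqrt(22.5 * eps * book_value_per_share)


def gordon_ddm(next_dividend: float, discount_rate: float, growth_rate: float) -> float:
    """Gordon growth form of the DDM: D1 / (r - g), valid only when r > g."""
    if discount_rate <= growth_rate:
        raise ValueError("discount rate must exceed growth rate")
    return next_dividend / (discount_rate - growth_rate)


def dcf_value(cash_flows: Sequence[float], discount_rate: float,
              terminal_growth: float) -> float:
    """Present value of projected cash flows plus a discounted terminal value."""
    if not cash_flows:
        raise ValueError("at least one projected cash flow is required")
    if discount_rate <= terminal_growth:
        raise ValueError("discount rate must exceed terminal growth")
    # Discount each projected cash flow back to today (year 1, 2, ...).
    present_value = sum(
        cf / (1 + discount_rate) ** year
        for year, cf in enumerate(cash_flows, start=1)
    )
    # Terminal value at the end of the projection, then discounted to today.
    terminal = cash_flows[-1] * (1 + terminal_growth) / (discount_rate - terminal_growth)
    return present_value + terminal / (1 + discount_rate) ** len(cash_flows)
```

For instance, `graham_number(4.0, 10.0)` evaluates `sqrt(22.5 * 4 * 10)`, which is 30.0. Each function rejects inputs for which the formula is undefined rather than returning a misleading number.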
This project implements an ETL (Extract, Transform, Load) architecture. Each stage applies established design patterns to keep the code modular and extensible.
- Extract (Data Ingestion)
  - Responsibility: Ingests raw financial data from hybrid sources (local Excel files & external APIs).
  - Design Patterns:
    - Template Method: Defines the standard data loading skeleton in `InputTemplate`.
    - Mediator: `APIMediator` coordinates complex interactions between different API services.
    - Command: Encapsulates API requests into objects, decoupling execution logic.
- Transform (Data Processing & Analysis)
  - Responsibility: Cleans raw data, calculates financial indicators, and executes valuation models.
  - Design Patterns:
    - Chain of Responsibility: `PipelineHandler` creates a processing pipeline for data cleaning (e.g., stripping whitespace -> type conversion).
    - Abstract Factory: `TableAbstractFactory` provides an interface to create families of related financial tables (Price, Score, Decision).
    - Builder: Constructs complex analysis objects step-by-step (e.g., `ParsTableBuilder`).
    - Strategy: Encapsulates interchangeable algorithms for scoring and buy/sell decisions (`ScoreTableStrategy`, `BuyDecisionTableStrategy`).
- Load (Data Output)
  - Responsibility: Dispatches analysis results to destination systems.
  - Design Patterns:
    - Observer: `OutputSubject` notifies subscribed observers (Google Sheets, Databases) when new analysis data is ready, decoupling the analysis engine from the reporting layer.
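
To make the Transform-to-Load handoff described above more concrete, here is a minimal sketch of a Chain of Responsibility cleaning pipeline (strip whitespace, then type conversion) whose result is pushed to observers. All class and method names below (`CleaningHandler`, `ResultSubject`, `InMemorySink`, and so on) are hypothetical and only echo the roles of `PipelineHandler` and `OutputSubject`; the project's real classes likely operate on DataFrames and have different signatures.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

Row = Dict[str, object]


class CleaningHandler(ABC):
    """One cleaning step; forwards its result to the next handler, if any."""

    def __init__(self) -> None:
        self._next: Optional["CleaningHandler"] = None

    def set_next(self, handler: "CleaningHandler") -> "CleaningHandler":
        self._next = handler
        return handler  # returning the next handler allows chained set_next calls

    def handle(self, row: Row) -> Row:
        row = self.process(row)
        return self._next.handle(row) if self._next else row

    @abstractmethod
    def process(self, row: Row) -> Row:
        ...


class StripWhitespace(CleaningHandler):
    def process(self, row: Row) -> Row:
        return {k.strip(): v.strip() if isinstance(v, str) else v
                for k, v in row.items()}


class ToFloat(CleaningHandler):
    def process(self, row: Row) -> Row:
        converted: Row = {}
        for key, value in row.items():
            try:
                # Drop thousands separators before converting.
                converted[key] = float(str(value).replace(",", ""))
            except ValueError:
                converted[key] = value  # keep non-numeric values such as tickers
        return converted


class ResultObserver(ABC):
    @abstractmethod
    def update(self, table: List[Row]) -> None:
        ...


class ResultSubject:
    """Notifies every attached observer when a cleaned table is ready."""

    def __init__(self) -> None:
        self._observers: List[ResultObserver] = []

    def attach(self, observer: ResultObserver) -> None:
        self._observers.append(observer)

    def notify(self, table: List[Row]) -> None:
        for observer in self._observers:
            observer.update(table)


class InMemorySink(ResultObserver):
    """Stand-in for a Google Sheets or database writer."""

    def __init__(self) -> None:
        self.received: List[List[Row]] = []

    def update(self, table: List[Row]) -> None:
        self.received.append(table)


chain = StripWhitespace()
chain.set_next(ToFloat())

subject = ResultSubject()
sink = InMemorySink()
subject.attach(sink)

raw_rows = [{" ticker ": " AAPL ", "revenue": " 1,200.5 "}]
subject.notify([chain.handle(row) for row in raw_rows])
```

The analysis side only calls `notify`, so adding another destination means attaching one more observer rather than changing the cleaning code.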
- Robust Data Handling: Heavy use of `Pandas` and `NumPy` for efficient vectorization and manipulation of large financial datasets.
- Environment Management: Fully integrated with `python-dotenv` for secure API key management and `venv` for isolated development environments.
- Extensible Architecture: The use of Abstract Base Classes (ABCs) ensures that new indicators or data sources can be added by simply implementing a predefined interface.
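
The extensibility point above can be sketched with Python's `abc` module. The `Indicator` interface, the two example indicators, and the report field names are all assumptions made for illustration; the project's actual base classes are not shown in this README.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Indicator(ABC):
    """Hypothetical interface: turns report fields into a single number."""

    name: str

    @abstractmethod
    def compute(self, report: Dict[str, float]) -> float:
        ...


class CurrentRatio(Indicator):
    name = "current_ratio"

    def compute(self, report: Dict[str, float]) -> float:
        return report["current_assets"] / report["current_liabilities"]


class DebtToEquity(Indicator):
    name = "debt_to_equity"

    def compute(self, report: Dict[str, float]) -> float:
        return report["total_liabilities"] / report["shareholders_equity"]


def run_indicators(indicators: List[Indicator],
                   report: Dict[str, float]) -> Dict[str, float]:
    """Evaluate every indicator against the same report."""
    return {indicator.name: indicator.compute(report) for indicator in indicators}
```

Adding a new indicator then amounts to writing one subclass with a `name` and a `compute` method. A subclass that forgets `compute` cannot be instantiated, so the omission surfaces early.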
- Python 3.8 or higher
- Google Cloud Platform Service Account (`creds.json`)
- API Keys for data sources (Quandl, AlphaVantage)
- Google Sheets Credentials: Place your Google Service Account JSON key file in the project root and rename it to `creds.json`.
  - Ensure the service account has access to a Google Sheet named "Stock".
- Environment Variables: Create a `.env` file in the project root with your API keys:

      QUANDL_API_KEY=your_quandl_key
      ALPHA_API_KEY=your_alphavantage_key
- Create and Activate Virtual Environment (Recommended)

      # Create virtual environment
      python3 -m venv venv
      # Activate virtual environment (macOS/Linux)
      source venv/bin/activate
      # Activate virtual environment (Windows)
      # venv\Scripts\activate
- Install Project

      # Install project and dependencies in editable mode
      pip install -e .
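
Before the first run, it can help to confirm that both API keys are actually visible to the process. The helper below is a small sketch using only the standard library; `missing_keys` is a hypothetical name, and the snippet assumes the values have already been loaded into the environment (for example by `load_dotenv()` from `python-dotenv`, which the project lists as a dependency).

```python
import os
from typing import List, Mapping

# Key names taken from the .env example in this README.
REQUIRED_KEYS = ("QUANDL_API_KEY", "ALPHA_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are absent or empty in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling `missing_keys()` with no arguments checks the current process environment; an empty list means both keys were found.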
Once installed, you can run the main analysis pipeline using the command line interface:

    financial-report

The system will:
- Load local financial reports from `~/FinancialData/{ticker}`.
- Fetch supplementary market data via APIs.
- Execute the analysis pipeline.
- Upload the results to the configured Google Sheet.
.
├── src/
│ └── company_screener/ # Main package
│ ├── API/ # Data fetching layer (Command Pattern)
│ ├── Config/ # Configuration management
│ ├── CreateTables/ # Table generation logic (Factory Pattern)
│ ├── Input/ # Data ingestion strategies
│ ├── Output/ # Data export handlers (Observer Pattern)
│ ├── Worker/ # Utility workers and loggers
│ ├── main.py # Application entry point
│ └── mainFactory.py # Dependency Injection root
├── pyproject.toml # Project configuration & dependencies
├── requirements.txt # Legacy dependency file
└── README.md # Documentation
This project is licensed under the MIT License - see the pyproject.toml file for details.

