A comprehensive algorithmic trading framework implementing machine learning and technical analysis strategies for quantitative trading.
- Data Collection: Automated data collection from APIs with checkpoint/resume capability
- Data Processing: Cleaning, normalization, and feature engineering
- Trading Strategies: Technical indicators and ML-based strategies (XGBoost)
- Backtesting: Comprehensive backtesting framework with performance metrics
- Performance Analytics: Sharpe ratio, Sortino ratio, drawdown analysis, and more
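The feature list mentions technical indicators without naming them. As an illustration only, the sketch below computes one widely used indicator over a list of closing prices: the Relative Strength Index (RSI) with Wilder's smoothing. Whether the project computes RSI, and with which period, is an assumption. `rsi` is a hypothetical helper, not a function from src/strategies/technical_indicators.py.

```python
from typing import List


def _rsi_from_averages(avg_gain: float, avg_loss: float) -> float:
    """Convert smoothed average gain/loss into an RSI value in [0, 100]."""
    if avg_loss == 0:
        # No losses: RSI is 100 if there were gains, 50 by convention if flat.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def rsi(closes: List[float], period: int = 14) -> List[float]:
    """Wilder's RSI; the first value lines up with closes[period]."""
    if len(closes) <= period:
        return []  # not enough price changes to seed the averages
    gains, losses = [], []
    for prev, cur in zip(closes, closes[1:]):
        change = cur - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    # Seed with simple averages over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    values = [_rsi_from_averages(avg_gain, avg_loss)]
    # Wilder's smoothing: new_avg = (prev_avg * (period - 1) + current) / period.
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
        values.append(_rsi_from_averages(avg_gain, avg_loss))
    return values


if __name__ == "__main__":
    print(rsi([1.0, 2.0, 1.0, 2.0, 1.0], period=2))
```

The output list has `len(closes) - period` entries. The neutral value returned for a perfectly flat series is a convention, not a standard.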
The system follows a modular architecture with the Signal → Order → Execute flow:
Data Layer → Signal Engine → Order Management → Execution → Analytics
- Data Layer (src/data_loader.py): Data loading, cleaning, and preprocessing
- Signal Engine (src/strategies/): Strategy implementations (technical indicators, ML models)
- Mock Environment (src/simulator.py): Portfolio and trade execution simulation
- Analytics (src/evaluator.py): Performance metrics and evaluation
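The sketch below follows the Data Layer → Signal Engine → Order Management → Execution → Analytics flow end to end, under stated assumptions:

- A toy trailing-mean signal stands in for the project's technical and ML strategies.
- A hard-coded price list stands in for the Data Layer.
- Execution is a mock fill with no fees or slippage.

All names here (`Order`, `generate_signals`, `signals_to_orders`, `execute`, `total_return_pct`) are hypothetical and are not the classes in src/.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Order:
    step: int   # index of the bar at which the order is filled
    side: str   # "BUY" or "SELL"
    qty: int


def generate_signals(prices: List[float], window: int = 3) -> List[int]:
    """Signal Engine (toy): 1 = be long when price is above its trailing mean, 0 = be flat."""
    signals = []
    for i, price in enumerate(prices):
        if i + 1 < window:
            signals.append(0)  # not enough history yet
        else:
            trailing_mean = sum(prices[i + 1 - window:i + 1]) / window
            signals.append(1 if price > trailing_mean else 0)
    return signals


def signals_to_orders(signals: List[int], qty: int) -> List[Order]:
    """Order Management: emit an order only when the target position changes."""
    orders: List[Order] = []
    position = 0
    for i, target in enumerate(signals):
        if target != position:
            orders.append(Order(i, "BUY" if target > position else "SELL", qty))
            position = target
    return orders


def execute(prices: List[float], orders: List[Order], cash: float) -> List[float]:
    """Execution (mock): fill each order at that bar's price, no fees or slippage."""
    by_step: Dict[int, Order] = {o.step: o for o in orders}
    shares = 0
    equity = []
    for i, price in enumerate(prices):
        order = by_step.get(i)
        if order is not None:
            sign = 1 if order.side == "BUY" else -1
            cash -= sign * order.qty * price
            shares += sign * order.qty
        equity.append(cash + shares * price)  # mark the portfolio to market
    return equity


def total_return_pct(equity: List[float]) -> float:
    """Analytics: overall return of the equity curve, in percent."""
    return (equity[-1] / equity[0] - 1.0) * 100.0


if __name__ == "__main__":
    prices = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0]  # stand-in for the Data Layer
    signals = generate_signals(prices)
    orders = signals_to_orders(signals, qty=10)
    equity = execute(prices, orders, cash=1000.0)
    print(signals, [o.side for o in orders], equity, total_return_pct(equity))
```

For simplicity, orders fill at the same bar whose price produced the signal. A realistic backtest would usually fill on the next bar to avoid look-ahead bias.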
- Python 3.8 or higher
- pip package manager
- Clone the repository:
```bash
git clone <repository-url>
cd LAB4-Algorithmic-Trading
```
- Create and activate a virtual environment:

Windows (PowerShell):
```powershell
python -m venv venv
.\venv\Scripts\Activate.ps1
```
Linux/Mac:
```bash
python3 -m venv venv
source venv/bin/activate
```
- Install dependencies:
```bash
pip install --upgrade pip
pip install -r requirements.txt
```

Select stocks from the S&P 500 with equal sector representation:
```bash
python src/stock_selector.py --n-stocks 20 --output-dir data/selections
```

Collect raw market data (requires API credentials):
```bash
# Set API credentials
export API_BASE=https://api.polygon.io
export API_KEY=YOUR_API_KEY

# Collect raw data
python run_data_collection.py --collect-raw \
    --selection-file data/selections/selected_stocks_20_*.json \
    --n-weekdays 60 --bar-minutes 1
```

Process raw data to add technical indicators:
```bash
python run_data_collection.py --process
```

Train XGBoost models on processed data:
```bash
python run_ml_training.py --processed-file "data/processed/processed_*.csv" --mode train-all
```

Run backtests using trained models:
```bash
python run_backtest.py --processed-file "data/processed/processed_*.csv" \
    --ticker AVGO --initial-cash 1000000 --plot
```

Select stocks with equal sector representation from the S&P 500:
```bash
python src/stock_selector.py --n-stocks 20 --output-dir data/selections
```

The selection is saved as a JSON file in the specified directory.
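The README does not describe how stock_selector.py achieves equal sector representation. The sketch below shows one common approach: a seeded shuffle within each sector, then round-robin picks across sectors. It assumes a ticker → sector mapping is already available (for example, from S&P 500 constituent data).

When `n_stocks` is not a multiple of the number of sectors, per-sector counts can only be as equal as possible, differing by at most one. `select_equal_sector` is a hypothetical name.

```python
import random
from collections import defaultdict
from typing import Dict, List


def select_equal_sector(universe: Dict[str, str], n_stocks: int, seed: int = 0) -> List[str]:
    """Pick up to n_stocks tickers by cycling through sectors in a fixed order.

    Per-sector counts differ by at most one, as long as every sector
    has enough tickers to keep up.
    """
    rng = random.Random(seed)
    by_sector: Dict[str, List[str]] = defaultdict(list)
    for ticker, sector in universe.items():
        by_sector[sector].append(ticker)
    # Shuffle within each sector; sorting first makes the result depend only on the seed.
    pools = {s: rng.sample(sorted(ts), len(ts)) for s, ts in sorted(by_sector.items())}
    selected: List[str] = []
    while len(selected) < n_stocks and any(pools.values()):
        for sector in sorted(pools):
            if pools[sector] and len(selected) < n_stocks:
                selected.append(pools[sector].pop())
    return selected


if __name__ == "__main__":
    demo = {"A1": "Tech", "A2": "Tech", "B1": "Health", "B2": "Health", "C1": "Energy"}
    print(select_equal_sector(demo, 4, seed=42))
```

Seeding the generator keeps a selection reproducible across runs, which matters when later steps refer to a saved selection file.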
The data collection system supports checkpoint/resume functionality:
```bash
# Collect raw data with checkpoint support
python run_data_collection.py --collect-raw \
    --selection-file data/selections/selected_stocks_20_*.json \
    --n-weekdays 60 --bar-minutes 1 \
    --api-base https://api.polygon.io --api-key YOUR_API_KEY

# Process raw data
python run_data_collection.py --process

# Check status
python run_data_collection.py --status

# Clear checkpoints
python run_data_collection.py --clear-checkpoints
```

API credentials can be set either via the --api-base and --api-key command-line arguments shown above or via environment variables:
```bash
export API_BASE=https://api.polygon.io
export API_KEY=YOUR_API_KEY
```

Train XGBoost models on processed data:
```bash
# Train models for all tickers
python run_ml_training.py --processed-file "data/processed/processed_*.csv" --mode train-all

# Train model for specific ticker
python run_ml_training.py --processed-file "data/processed/processed_*.csv" \
    --mode train --ticker AVGO

# Generate predictions
python run_ml_training.py --processed-file "data/processed/processed_*.csv" \
    --mode predict --ticker AVGO
```

Run backtests with various configurations:
```bash
# Single ticker backtest
python run_backtest.py --processed-file "data/processed/processed_*.csv" \
    --ticker AVGO --initial-cash 1000000

# Multiple tickers
python run_backtest.py --processed-file "data/processed/processed_*.csv" \
    --tickers AVGO HRL GL --initial-cash 1000000

# With date range and custom parameters
python run_backtest.py --processed-file "data/processed/processed_*.csv" \
    --ticker AVGO --start-date 2024-10-01 --end-date 2024-12-31 \
    --min-confidence 0.7 --lookback-window 20 --plot
```

Results are saved in JSON format to the backtest_results/ directory.
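The --min-confidence and --lookback-window options are not documented further in this README. The sketch below shows one plausible interpretation, which is an assumption rather than the project's confirmed behavior:

- The lookback window sets how many past bars form each feature row.
- The confidence threshold suppresses a trade unless the model's predicted probability of an up move (or of a down move) reaches min_confidence.

`make_lookback_features` and `gate_signals` are hypothetical helpers, not functions from run_backtest.py.

```python
from typing import List


def make_lookback_features(returns: List[float], lookback: int) -> List[List[float]]:
    """Row k holds returns[t - lookback : t] for t = lookback + k.

    Each row uses only bars strictly before the bar being predicted (no look-ahead).
    """
    return [returns[t - lookback:t] for t in range(lookback, len(returns))]


def gate_signals(prob_up: List[float], min_confidence: float) -> List[int]:
    """Map predicted up-move probabilities to 1 (bullish), -1 (bearish) or 0 (no trade)."""
    signals = []
    for p in prob_up:
        if p >= min_confidence:
            signals.append(1)    # confident the next bar goes up
        elif 1.0 - p >= min_confidence:
            signals.append(-1)   # confident the next bar goes down (short or exit)
        else:
            signals.append(0)    # below the threshold either way: stay out
    return signals


if __name__ == "__main__":
    print(make_lookback_features([0.01, -0.02, 0.03, 0.0, 0.01], lookback=2))
    print(gate_signals([0.9, 0.55, 0.2, 0.75], min_confidence=0.7))
```

Raising the threshold trades less often but only on stronger predictions. Whether -1 means "short" or merely "exit" depends on the strategy.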
```python
from run_data_collection import DataCollectionRunner

runner = DataCollectionRunner(
    api_base="https://api.polygon.io",
    api_key="your_api_key"
)

# Collect raw data
raw_data = runner.collect_raw_data(
    tickers=['AAPL', 'MSFT', 'GOOGL'],
    n_weekdays=60,
    bar_minutes=1,
    resume=True
)

# Process data
processed_data = runner.process_data(resume=True)
```

```python
from src.strategies.ml_models import train_model_on_processed_data

# Train an XGBoost model for one ticker and save it to disk
model, metrics = train_model_on_processed_data(
    processed_file='data/processed/processed_*.csv',
    ticker='AVGO',
    lookback_window=10,
    save_model_path='models/model_AVGO.pkl'
)
```

```python
from run_backtest import run_backtest_ml

# Backtest AVGO using the trained models in models/
result = run_backtest_ml(
    processed_file='data/processed/processed_*.csv',
    ticker='AVGO',
    initial_cash=1000000.0,
    model_dir='models',
    min_confidence=0.6
)

report = result['report']
print(f"Total Return: {report['total_return']:.2f}%")
print(f"Sharpe Ratio: {report['sharpe_ratio']:.4f}")
```

```
LAB4-Algorithmic-Trading/
├── data/
│   ├── raw/                     # Raw market data
│   ├── processed/               # Processed data with features
│   └── selections/              # Stock selection files
├── checkpoints/                 # Checkpoint files for resume
├── models/                      # Trained ML models
├── backtest_results/            # Backtest results and plots
├── src/
│   ├── data_loader.py           # Data loading and preprocessing
│   ├── simulator.py             # Mock trading simulator
│   ├── evaluator.py             # Performance metrics
│   ├── stock_selector.py        # Stock selection from S&P 500
│   └── strategies/
│       ├── base_strategy.py          # Base strategy class
│       ├── technical_indicators.py   # Technical analysis indicators
│       └── ml_models.py              # XGBoost ML models
├── run_data_collection.py       # Data collection runner
├── run_ml_training.py           # ML model training script
├── run_backtest.py              # Backtesting script
├── main.py                      # Main entry point
├── requirements.txt             # Python dependencies
└── README.md                    # This file
```
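The checkpoints/ directory holds the state that lets data collection resume, as in the resume=True arguments of the Python examples. The project's checkpoint format isn't documented here. The following is a minimal sketch under two assumptions:

- A JSON file records which tickers are finished.
- The file is written through a temporary file and `os.replace`, so an interrupted write cannot leave a half-written checkpoint behind.

`load_checkpoint`, `save_checkpoint`, and `collect_with_resume` are hypothetical names.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def load_checkpoint(path: str) -> Dict[str, List[str]]:
    """Return saved progress, or an empty state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"completed": []}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_checkpoint(path: str, state: Dict[str, List[str]]) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic on POSIX: the file is either the old or the new version


def collect_with_resume(tickers: List[str], fetch: Callable[[str], object],
                        checkpoint_path: str, resume: bool = True) -> Dict[str, object]:
    """Fetch each ticker once; rerunning after an interruption skips finished tickers."""
    state = load_checkpoint(checkpoint_path) if resume else {"completed": []}
    done = set(state["completed"])
    results: Dict[str, object] = {}
    for ticker in tickers:
        if ticker in done:
            continue  # finished (and assumed already saved to disk) in an earlier run
        results[ticker] = fetch(ticker)
        state["completed"].append(ticker)
        save_checkpoint(checkpoint_path, state)  # persist progress after every ticker
    return results
```

Checkpointing after every ticker costs one small file write per ticker. In exchange, a crash loses at most the ticker that was in progress.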
The system calculates comprehensive performance metrics:
- Total Return: Overall portfolio return percentage
- Annualized Return: Annualized return rate
- Sharpe Ratio: Risk-adjusted return metric
- Sortino Ratio: Downside risk-adjusted return
- Maximum Drawdown: Largest peak-to-trough decline
- Volatility: Annualized standard deviation of returns
- Win Rate: Percentage of profitable trades
- Profit Factor: Ratio of gross profit to gross loss
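A minimal sketch of how these metrics are commonly computed, assuming:

- a list of per-period portfolio returns, as fractions rather than percent;
- a zero risk-free rate;
- 252 periods per year (daily bars; intraday bars would need a different annualization factor);
- a separate list of per-trade profits and losses.

The definitions follow common conventions and may differ in detail from src/evaluator.py. `performance_metrics` and `trade_metrics` are hypothetical names.

```python
import math
import statistics
from typing import Dict, List


def performance_metrics(returns: List[float], periods_per_year: int = 252) -> Dict[str, float]:
    """Return-based metrics from per-period returns (fractions); risk-free rate assumed 0."""
    # Compound the returns into an equity curve that starts at 1.0.
    equity = [1.0]
    for r in returns:
        equity.append(equity[-1] * (1.0 + r))
    total_return = equity[-1] - 1.0
    years = len(returns) / periods_per_year
    annualized_return = (1.0 + total_return) ** (1.0 / years) - 1.0

    mean = statistics.mean(returns)
    std = statistics.stdev(returns)  # sample standard deviation (needs >= 2 returns)
    scale = math.sqrt(periods_per_year)
    volatility = std * scale
    sharpe = mean / std * scale if std > 0 else float("nan")
    # Downside deviation: root-mean-square of negative returns over all periods (target 0).
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))
    sortino = mean / downside * scale if downside > 0 else float("nan")

    # Maximum drawdown: largest peak-to-trough decline, as a negative fraction.
    peak, max_drawdown = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        max_drawdown = min(max_drawdown, value / peak - 1.0)

    return {
        "total_return": total_return,
        "annualized_return": annualized_return,
        "volatility": volatility,
        "sharpe": sharpe,
        "sortino": sortino,
        "max_drawdown": max_drawdown,
    }


def trade_metrics(trade_pnls: List[float]) -> Dict[str, float]:
    """Win rate and profit factor from per-trade profit/loss values."""
    if not trade_pnls:
        return {"win_rate": float("nan"), "profit_factor": float("nan")}
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    win_rate = sum(1 for p in trade_pnls if p > 0) / len(trade_pnls)
    profit_factor = gross_profit / gross_loss if gross_loss > 0 else float("inf")
    return {"win_rate": win_rate, "profit_factor": profit_factor}


if __name__ == "__main__":
    print(performance_metrics([0.1, -0.05, 0.02, -0.1, 0.05]))
    print(trade_metrics([100.0, -50.0, 30.0, -20.0]))
```

Sharpe divides by total volatility, while Sortino divides only by downside deviation. As a result, a strategy whose variability comes mostly from gains scores noticeably better on Sortino.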