Cryptocurrency trading research system using PyTorch Double DQN with LSTM, multi-timeframe analysis, and a real-time dashboard. Research project, not trading advice - see the warnings at the bottom before doing anything with real money.
QTrader is a deep reinforcement learning system that learns to trade cryptocurrencies by:
- Analyzing multiple timeframes simultaneously (1m to 4h)
- Learning from historical price data and technical indicators
- Optimizing for portfolio growth with intelligent risk management
- Adapting strategy based on market conditions
Unlike traditional RL trading agents that try to time every move, QTrader implements a HODL-based approach where it:
- Learns when to enter positions (buying the dip)
- Learns position sizing based on market volatility (10%, 25%, 50%, 100%)
- Learns when to exit (taking profits or cutting losses)
- Uses trailing stops and take-profit levels automatically
The intent is to match real-world crypto trading more closely than high-frequency approaches do.
┌─────────────────┐         HTTP/WS          ┌─────────────────┐
│                 │ ◄────────────────────────┤                 │
│ Training Agent  │    Real-time Events      │    Dashboard    │
│  (PyTorch DQN)  │                          │    (Web UI)     │
│                 │ ───────────────────────► │                 │
└────────┬────────┘                          └─────────────────┘
         │
         ├─ Double DQN + LSTM
         ├─ Prioritized Experience Replay
         ├─ Multi-Timeframe State (5m, 15m, 30m, 1h, 4h)
         ├─ Advanced Technical Indicators (RSI, MACD, BB, etc.)
         └─ Risk Management (Position Sizing, Stop Loss, Take Profit)
Fully Decoupled: Agent and Dashboard can run independently on different machines!
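To make the "Double DQN" label concrete, here is a minimal, framework-free sketch of how a Double DQN bootstrap target is usually computed: the online network chooses the next action and the target network scores it. The function name, the plain-list inputs, and the default `gamma` are illustrative assumptions; QTrader's own implementation works on PyTorch tensors and may differ in detail.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not taken from the repo
) -> float:
    """Return the Double DQN target value for a single transition."""
    if done:
        # Terminal transitions do not bootstrap from the next state.
        return reward
    # Action selection uses the online network's Q-values...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...while action evaluation uses the target network's Q-values.
    return reward + gamma * q_target_next[best_action]
```

Separating selection from evaluation is what reduces the over-estimation bias of plain DQN, where a single network does both.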
- Double DQN with Dueling Architecture - Stable, efficient learning
- LSTM Networks - Captures temporal patterns in price movements
- Prioritized Experience Replay - Learns from important transitions
- Multi-Timeframe Analysis - Sees market from multiple perspectives
- Advanced Risk Management - Dynamic position sizing, stop-loss, take-profit
- GPU Accelerated - CUDA support with mixed precision training
- Automatic Checkpointing - Saves best models during training
- Config-Driven - No code changes for different strategies
- Live Portfolio Tracking - Balance, P&L, ROI updated every second
- Performance Metrics - Win rate, Sharpe ratio, max drawdown
- Trading Exchange UI - Professional Bootstrap 5 interface
- Real-Time Charts - Price and reward visualization
- Trade History Feed - Live buy/sell activity
- WebSocket Streaming - Sub-second latency
- Remote Monitoring - Deploy dashboard on separate machine/cloud
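As a rough illustration of what "learns from important transitions" means, the sketch below shows proportional prioritized sampling: transitions with larger TD error are drawn more often, and importance-sampling weights correct for the resulting bias. The function names and the `alpha`, `beta`, and `eps` defaults are assumptions chosen for illustration, not values read from QTrader's configs.

```python
import random
from typing import List, Optional, Sequence, Tuple


def per_probabilities(
    td_errors: Sequence[float], alpha: float = 0.6, eps: float = 1e-6
) -> List[float]:
    """Turn absolute TD errors into sampling probabilities."""
    # eps keeps zero-error transitions sampleable; alpha controls how sharp
    # the prioritization is (alpha=0 would be uniform sampling).
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def per_sample(
    td_errors: Sequence[float],
    batch_size: int,
    beta: float = 0.4,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[float]]:
    """Sample indices by priority and return normalized importance weights."""
    rng = rng or random.Random()
    probs = per_probabilities(td_errors)
    n = len(probs)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    # Importance-sampling weights, scaled so the largest weight in the batch is 1.
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(weights)
    return indices, [w / max_w for w in weights]
```

A production buffer would normally use a sum-tree for O(log n) sampling; the flat list here only shows the arithmetic.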
git clone https://github.com/boxsie/qtrader.git
cd qtrader
# GPU build - installs torch 2.5.1+cu121 from the PyTorch index.
pip install torch==2.5.1+cu121 torchvision==0.20.1+cu121 torchaudio==2.5.1+cu121 \
--index-url https://download.pytorch.org/whl/cu121
pip install -r agent/requirements.txt
# CPU-only (works for tests + smoke runs):
# pip install torch --index-url https://download.pytorch.org/whl/cpu
# pip install -r agent/requirements.txt
The BTC/USD 1-minute CSV is not committed. Download it from Binance Vision (free, no API key) into data/:
python scripts/fetch_btc_data.py --start 2017-08 --end 2025-12
See docs/DATA.md for CLI options and the output schema.
cd agent
# Basic training (50 episodes, fast config)
python train.py --config fast --profile day_trader
# Full training with dashboard monitoring
python train.py --config full --profile day_trader --dashboard http://localhost:8765
# Custom episode count
python train.py --config fast --profile scalper --episodes 200
# In a new terminal
cd dashboard
pip install -r requirements.txt
python server.py
Open a browser at http://localhost:8765 and watch your agent train in real time!
qtrader/
├── agent/                      # Training Agent (Self-Contained)
│   ├── config/                 # Training configurations
│   │   ├── fast.yaml           # Quick training (50 episodes)
│   │   └── full.yaml           # Full training (1000 episodes)
│   ├── profiles/               # Trading strategies
│   │   ├── scalper.yaml        # 1m-15m timeframes
│   │   ├── day_trader.yaml     # 15m-4h timeframes
│   │   └── swing_trader.yaml   # 1h-1d timeframes
│   ├── qlearn/                 # Deep RL models
│   ├── train.py                # Main training script
│   └── requirements.txt        # Agent dependencies
│
├── dashboard/                  # Dashboard Server (Standalone)
│   ├── server.py               # FastAPI WebSocket server
│   ├── index.html              # Web UI with playback controls
│   ├── history/                # Saved run recordings (auto-generated)
│   └── requirements.txt        # Dashboard dependencies
│
├── dataservice/                # Experimental: HTTP API for OHLCV + sentiment
│   └── README.md               # (not required for training)
│
├── shared/                     # Shared event contracts
│   └── events.py               # Event models
│
├── scripts/
│   └── fetch_btc_data.py       # Download BTCUSDT 1-min data from Binance Vision
│
├── tests/                      # pytest suite
├── docs/                       # DATA.md, REWARDS.md, DEPLOYMENT.md
├── examples/                   # Reward configuration examples
│
├── data/                       # Market data (gitignored, populated by fetch script)
│   └── btcusd_1-min_data.csv
│
├── runs/                       # Training outputs (gitignored, auto-generated)
│   └── [config]/[hash]/[profile]/[hash]/[timestamp]/
│       ├── best_model.pt
│       ├── run_log.jsonl
│       └── run_summary.txt
│
└── README.md                   # This file
fast.yaml - Quick experimentation (50 episodes)
training:
  num_episodes: 50
  batch_size: 256
memory:
  capacity: 10000
model:
  hidden_size: 128
  lstm_layers: 2
full.yaml - Production training (1000 episodes)
training:
  num_episodes: 1000
  batch_size: 512
memory:
  capacity: 100000
model:
  hidden_size: 256
  lstm_layers: 3
day_trader.yaml - Intraday swings (recommended)
timeframe_weights:
  5m: 0.15
  15m: 0.45  # Primary
  30m: 0.15
  1h: 0.20
  4h: 0.05
scalper.yaml - Short-term trades
timeframe_weights:
  1m: 0.30
  5m: 0.40  # Primary
  15m: 0.20
  30m: 0.10
swing_trader.yaml - Multi-day holds
timeframe_weights:
  1h: 0.20
  4h: 0.45  # Primary
  1d: 0.35
The agent can take 9 discrete actions per step:
| Action | Description |
|---|---|
| 0 | HOLD - Do nothing |
| 1-4 | BUY 10% / 25% / 50% / 100% of cash |
| 5-8 | SELL 10% / 25% / 50% / 100% of position |
Position sizing is dynamic based on volatility (higher vol = smaller positions).
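The action table above can be read as a simple lookup from action index to a side and a fraction. The sketch below shows one way to decode it; `decode_action` and `FRACTIONS` are hypothetical names for illustration, and the volatility adjustment mentioned above is deliberately left out.

```python
from typing import Tuple

# Fractions in the order given by the action table (10%, 25%, 50%, 100%).
FRACTIONS = (0.10, 0.25, 0.50, 1.00)


def decode_action(action: int) -> Tuple[str, float]:
    """Map a discrete action index (0-8) to (side, fraction)."""
    if action == 0:
        return ("HOLD", 0.0)
    if 1 <= action <= 4:
        # BUY spends this fraction of available cash.
        return ("BUY", FRACTIONS[action - 1])
    if 5 <= action <= 8:
        # SELL liquidates this fraction of the open position.
        return ("SELL", FRACTIONS[action - 5])
    raise ValueError(f"action must be in 0..8, got {action}")
```

For example, `decode_action(3)` yields a BUY of 50% of cash and `decode_action(8)` a SELL of the full position.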
QTrader uses a sophisticated multi-component reward system:
- Immediate Rewards (30%)
  - Trading fees (small penalty)
  - Realized P&L on position closes
- Timing Rewards (20%)
  - Retrospective evaluation of past decisions
  - "Did buying 15 steps ago turn out good?"
- Portfolio Growth Rewards (50%)
  - New portfolio highs = BIG rewards
  - Encourages long-term growth over quick wins
This discourages overfitting to individual trades and keeps the focus on total portfolio performance.
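One plausible reading of the 30/20/50 split is a weighted sum of the three components. The sketch below assumes exactly that, plus a growth term that pays out only when the portfolio sets a new high; both are illustrative assumptions, and docs/REWARDS.md remains the reference for how the components are actually combined.

```python
from typing import Mapping

# Weights taken from the percentages listed above.
REWARD_WEIGHTS = {"immediate": 0.30, "timing": 0.20, "growth": 0.50}


def growth_component(portfolio_value: float, previous_peak: float) -> float:
    """Relative gain over the previous peak; zero unless a new high is set."""
    if portfolio_value <= previous_peak:
        return 0.0
    return (portfolio_value - previous_peak) / previous_peak


def combine_rewards(
    immediate: float,
    timing: float,
    growth: float,
    weights: Mapping[str, float] = REWARD_WEIGHTS,
) -> float:
    """Blend the three reward components into one scalar (assumed linear mix)."""
    return (
        weights["immediate"] * immediate
        + weights["timing"] * timing
        + weights["growth"] * growth
    )
```

With these weights, a step that only sets a new portfolio high still contributes half of its growth signal to the total reward, which is what pushes the agent toward long-term growth.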
Built-in risk controls:
- Position Sizing: Volatility-adjusted (max 15% of portfolio)
- Stop Loss: Dynamic 2% trailing stop
- Take Profit: Automatic 4% profit taking
- Max Drawdown: Training stops at 20% drawdown
- Fee Modeling: 0.15% per trade (Coinbase standard)
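The stop-loss, take-profit, and fee rules above can be sketched in a few lines. The `Position` class and function names are hypothetical, and the order in which the checks run is an assumption; only the 2%, 4%, and 0.15% figures come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Position:
    entry_price: float
    peak_price: float  # highest price seen since entry, used by the trailing stop


def check_exit(
    position: Position,
    price: float,
    trailing_stop: float = 0.02,
    take_profit: float = 0.04,
) -> str:
    """Return "TAKE_PROFIT", "STOP_LOSS", or "HOLD" for the latest price."""
    # The trailing stop follows the best price reached so far.
    position.peak_price = max(position.peak_price, price)
    if price >= position.entry_price * (1.0 + take_profit):
        return "TAKE_PROFIT"
    if price <= position.peak_price * (1.0 - trailing_stop):
        return "STOP_LOSS"
    return "HOLD"


def trade_fee(notional: float, fee_rate: float = 0.0015) -> float:
    """Fee charged on a trade of the given notional value (0.15% by default)."""
    return notional * fee_rate
```

Because the stop trails the peak rather than the entry, a position that rallies 1% and then falls back about 2% from that high is closed even though it never dropped 2% below its entry price.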
The web dashboard provides real-time monitoring with full playback capabilities:
Portfolio Panel:
- Portfolio value with color-coded P&L
- Cash vs crypto allocation
- ROI percentage
- Unrealized P&L on open positions
Performance Metrics:
- Total trades (buys/sells breakdown)
- Win rate (decision quality, not just profitable trades)
- Average profit per trade
- Total fees paid
- Current market price
Training Metrics:
- Episode progress bar
- Epsilon decay (exploration → exploitation)
- Neural network loss
- Average reward per step
Live Charts:
- Price chart with real-time updates
- Reward chart showing agent performance
- Smooth animations, auto-scaling
Trade Feed:
- Live trade history (last 50 trades)
- Color-coded BUY (green) / SELL (red)
- Price, quantity, and total value
Record, replay, and analyze training runs:
Auto-Recording:
- Every training run automatically recorded with unique ID
- All events saved to dashboard/history/ as JSON
- Auto-saves when new run starts (run_id change detected)
- User notified with toast message on save
History Browser:
- Click History button to browse all saved runs
- See metadata: timestamp, duration, episodes, ROI, event count
- Load any run for instant replay
Playback Controls:
- Play/Pause - Control playback speed
- Stop - Exit playback, return to live mode
- Timeline Slider - Seek to any event (rebuilds state)
- Speed Control - 0.5x to 10x playback speed
- Position Display - Current event / total events
Mode Indicators:
- LIVE mode - Green dot, WebSocket connected
- PLAYBACK mode - Yellow indicator, disconnected from live
- Clear visual distinction between modes
Smart Features:
- Non-blocking recording (designed to have negligible impact on training speed)
- State reconstruction on timeline seek (accurate historical replay)
- Auto-save on run change (no manual saves needed)
- Handles corrupted files and edge cases gracefully
Example Workflow:
# 1. Start training with dashboard
python train.py --dashboard http://localhost:8765
# 2. Training auto-records all events
# 3. Start new run → previous run auto-saves
python train.py --dashboard http://localhost:8765
# 4. Click History → Browse → Load → Play!
# Replay entire run at any speed, seek anywhere
Storage Format:
{
  "run_id": "20251005_143022",
  "metadata": {
    "start_time": "2025-10-05T14:30:22",
    "duration_seconds": 2725,
    "total_episodes": 100,
    "final_stats": {"roi": 15.5, ...}
  },
  "events": [...],  // All events with timestamps
  "saved_at": "2025-10-05T15:15:47"
}
Files saved to: dashboard/history/run_{run_id}_{timestamp}.json
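Since recordings are plain JSON, they can be inspected outside the dashboard. The sketch below reads one file and pulls out the fields shown in the storage format; `summarize_run` is a hypothetical helper, and it assumes the files on disk are strict JSON (the `...` and `//` comment in the sample above are shorthand, not valid JSON).

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarize_run(path: Path) -> Dict[str, Any]:
    """Load one saved run and return a compact summary of it."""
    with path.open("r", encoding="utf-8") as fh:
        run = json.load(fh)
    meta = run.get("metadata", {})
    return {
        "run_id": run.get("run_id"),
        "duration_seconds": meta.get("duration_seconds"),
        "total_episodes": meta.get("total_episodes"),
        # final_stats may be missing for runs that ended early.
        "roi": meta.get("final_stats", {}).get("roi"),
        "event_count": len(run.get("events", [])),
    }
```

Looping over `Path("dashboard/history").glob("run_*.json")` and calling `summarize_run` on each path gives roughly the same overview as the History browser.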
python train.py --resume runs/fast/.../20251005_123456/best_model.pt
python train.py --mode eval --model runs/.../best_model.pt --episodes 10
# On the monitoring machine
cd dashboard
python server.py --host 0.0.0.0 --port 8765
# On the GPU training machine
cd agent
python train.py --dashboard http://<dashboard-host>:8765
Deploy dashboard to Heroku/Railway, then:
python train.py --dashboard https://your-dashboard.railway.app
- docs/DATA.md - fetching market data and the CSV schema
- docs/REWARDS.md - modular reward system and trader profiles
- docs/DEPLOYMENT.md - Docker Compose / Kubernetes / bare-metal
- dataservice/README.md - experimental HTTP data API
pytest tests/ -v # reward + profile sanity checks
python agent/test_data.py # validate your CSV's columns
Latest Training Run (Day Trader, 41 episodes):
- ROI: 11.45%
- Win Rate: 65%
- Trades: 19 total
- Sharpe Ratio: 1.82
- Max Drawdown: -4.2%
See runs/ directory for detailed logs and training metrics. Monitor training in real-time via the web dashboard at http://localhost:8765.
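For readers who want to recompute the headline numbers from a run log, the sketch below shows the textbook definitions of max drawdown and Sharpe ratio. How QTrader itself samples returns or annualizes the Sharpe ratio is not stated here, so `periods_per_year` and the zero risk-free rate are assumptions.

```python
import math
from typing import Sequence


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline as a negative fraction (0.0 if none)."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, (value - peak) / peak)
    return worst


def sharpe_ratio(
    returns: Sequence[float],
    risk_free: float = 0.0,
    periods_per_year: int = 1,
) -> float:
    """Mean excess return divided by its sample standard deviation."""
    n = len(returns)
    if n < 2:
        return 0.0
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / n
    variance = sum((r - mean) ** 2 for r in excess) / (n - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        return 0.0
    # Scale by sqrt(periods) to annualize when returns are per-period.
    return mean / std * math.sqrt(periods_per_year)
```

With only 19 trades in the run above, both statistics carry wide error bars, which is one reason the checklist below asks for longer training and out-of-sample testing.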
DO NOT TRADE LIVE WITHOUT:
- Extended Training - Run 200+ episodes minimum
- Out-of-Sample Testing - Test on 2023, 2024 data (different market regimes)
- Paper Trading - 30 days on real-time data
- Risk Metrics - Verify Sharpe ratio > 1.5, max drawdown < 10%
- Baseline Comparison - Must beat buy-and-hold
- Stress Testing - Flash crashes, high volatility periods
- Fee Reconciliation - Verify exchange fees match model (0.15%)
- Position Limits - Set max exposure per trade
- Circuit Breakers - Auto-stop on excessive drawdown
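The baseline comparison in the checklist is easy to automate. The sketch below computes the ROI of buying at the first price of an evaluation window and selling at the last, paying the modelled 0.15% fee on both legs, and compares it with the agent's ROI. The function names are hypothetical and slippage is ignored.

```python
def buy_and_hold_roi(
    first_price: float, last_price: float, fee_rate: float = 0.0015
) -> float:
    """ROI in percent for buying at first_price and selling at last_price."""
    # Start from 1 unit of cash; the entry fee reduces the units bought.
    units = (1.0 - fee_rate) / first_price
    # The exit fee reduces the cash received on the final sale.
    final_cash = units * last_price * (1.0 - fee_rate)
    return (final_cash - 1.0) * 100.0


def beats_baseline(
    agent_roi_pct: float,
    first_price: float,
    last_price: float,
    fee_rate: float = 0.0015,
) -> bool:
    """True if the agent's ROI exceeds buy-and-hold over the same window."""
    return agent_roi_pct > buy_and_hold_roi(first_price, last_price, fee_rate)
```

The comparison is only meaningful when both ROIs are measured over the same start and end timestamps.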
- Sample Size: Most experiments to date are small (need 200+ episodes for stability).
- Slippage Modeling: Configurable but conservative defaults; doesn't capture book impact at scale.
- No Liquidity Constraints: Assumes orders always fill at the modelled price.
- Single Asset: BTC/USD only.
- Training Data: As far back as the fetch script can pull from Binance Vision (BTCUSDT listed 2017-08).
Open an issue if you'd like to help with any of these.
Contributions welcome! Areas needing work:
- Multi-asset portfolio support
- Additional technical indicators
- Alternative reward functions
- Hyperparameter optimization
- Live trading connectors (Coinbase, Binance)
- Ensemble methods
MIT - see LICENSE.
Built with:
- PyTorch - Deep learning framework
- FastAPI - Dashboard backend
- Bootstrap 5 - UI framework
- Chart.js - Charting library
- TA-Lib - Technical indicators
Start training: cd agent && python train.py --config fast --profile day_trader
Monitor progress: Open http://localhost:8765 in your browser
Happy trading!