A powerful Python application for fetching and storing messages from Telegram channels. Built with modern best practices and a clean architecture. 🚀
- 📥 Fetch messages from Telegram channels
- 🔍 Filter messages by keywords
- 🖼️ Download and store media files
- 💾 Store messages in SQLite database
- 🖥️ Command-line interface with rich formatting
- ⚡ Async support for better performance
- 📝 Comprehensive logging
- 🎯 Type hints and documentation
- 🛡️ Smart rate limit and ban handling
- 🐍 Python 3.8 or higher
- 🔑 Telegram API credentials (API ID and Hash)
- 🔐 Access to the target Telegram channel
- Clone the repository:
git clone https://github.com/yourusername/telegram-fetcher.git
cd telegram-fetcher
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
cp .env.example .env
Edit .env with your Telegram API credentials and other settings.
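As a rough illustration of how the credentials from .env might be consumed, here is a minimal sketch. It assumes the values have already been loaded into the process environment, and the variable names (`TELEGRAM_API_ID`, `TELEGRAM_API_HASH`, `TELEGRAM_CHANNEL`) and the `TelegramSettings` class are hypothetical; check .env.example for the names the project actually uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TelegramSettings:
    """Hypothetical container for the credentials described above."""
    api_id: int
    api_hash: str
    channel: str


def load_settings(env: Mapping[str, str] = os.environ) -> TelegramSettings:
    """Build settings from environment variables, failing early if any are missing."""
    # These variable names are assumptions, not the project's documented names.
    required = ("TELEGRAM_API_ID", "TELEGRAM_API_HASH", "TELEGRAM_CHANNEL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return TelegramSettings(
        api_id=int(env["TELEGRAM_API_ID"]),  # the API ID is numeric
        api_hash=env["TELEGRAM_API_HASH"],
        channel=env["TELEGRAM_CHANNEL"],
    )
```

Failing at startup with a list of the missing names tends to be easier to debug than a connection error later in the fetch.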
The application provides a command-line interface with the following commands:
To set up the project for first use:
python -m src.cli init
This command will:
- Create necessary directories
- Initialize the database
- Start required Docker services
To fetch messages from the configured channel:
python -m src.cli fetch
Options:
- --limit INTEGER: Limit the number of messages to fetch
- --no-media: Skip downloading media files
- --keywords LIST: Filter messages by keywords (comma-separated)
- --date STRING: Filter messages by date (format: dd-MM-yyyy)
  ⚠️ Note: Currently, the date filter fetches all messages first and then filters them locally. This means the initial fetch may take longer than expected as it doesn't utilize Telegram's API date filtering.
- --verbose: Enable verbose logging
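The local date filtering described in the note can be sketched roughly as follows. This is an illustration, not the project's code: the `Message` class and both function names are hypothetical, the filter is assumed to keep messages sent on the given calendar day, and time zones are ignored.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, List


@dataclass
class Message:
    """Hypothetical stand-in for a fetched message."""
    id: int
    text: str
    sent_at: datetime


def parse_cli_date(value: str) -> date:
    """Parse the dd-MM-yyyy format accepted by --date (e.g. 05-03-2024)."""
    return datetime.strptime(value, "%d-%m-%Y").date()


def filter_by_date(messages: Iterable[Message], day: date) -> List[Message]:
    """Keep only messages sent on the given day.

    Every message must already have been fetched before this runs,
    which is why the fetch itself is not shortened by the filter.
    """
    return [m for m in messages if m.sent_at.date() == day]
```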
To list stored messages:
python -m src.cli list
Options:
- --limit INTEGER: Number of messages to display (default: 100)
- --skip INTEGER: Number of messages to skip (default: 0)
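One plausible way for --limit and --skip to map onto the SQLite store is a `LIMIT ... OFFSET ...` query. The sketch below assumes a hypothetical `messages` table with `id` and `text` columns; the real schema lives in src/models.py and may differ.

```python
import sqlite3
from typing import List, Tuple


def list_messages(
    conn: sqlite3.Connection, limit: int = 100, skip: int = 0
) -> List[Tuple[int, str]]:
    """Return one page of messages, ordered by id for stable paging."""
    # Table and column names are assumptions made for this example.
    cursor = conn.execute(
        "SELECT id, text FROM messages ORDER BY id LIMIT ? OFFSET ?",
        (limit, skip),
    )
    return cursor.fetchall()
```

With this shape, `--limit 100 --skip 200` would correspond to `list_messages(conn, limit=100, skip=200)`, i.e. the third page of 100 messages.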
To normalize stored messages:
python -m src.cli normalize
Options:
- --limit INTEGER: Maximum number of messages to normalize in each batch
- --skip-empty: Skip messages with empty text content
- --verbose: Show detailed progress for each message
To filter normalized messages based on keywords:
python -m src.cli filter --keywords "keyword1,keyword2"
Options:
- --keywords: Comma-separated list of keywords to filter messages (required)
- --batch-size: Number of messages to process in each batch (default: 100)
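To make the interaction between --keywords and --batch-size concrete, here is a minimal sketch of batched keyword filtering. The function names are hypothetical, and the matching rule (case-insensitive substring match, where any one keyword is enough) is an assumption; the project's actual matching logic may be stricter.

```python
from typing import Iterator, List, Sequence


def parse_keywords(raw: str) -> List[str]:
    """Split a comma-separated keyword string, dropping blanks and lowercasing."""
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


def batched(items: Sequence[str], batch_size: int = 100) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def filter_messages(
    texts: Sequence[str], raw_keywords: str, batch_size: int = 100
) -> List[str]:
    """Return the texts that contain at least one keyword, processed batch by batch."""
    keywords = parse_keywords(raw_keywords)
    matches: List[str] = []
    for batch in batched(texts, batch_size):
        matches.extend(
            text for text in batch if any(k in text.lower() for k in keywords)
        )
    return matches
```

Processing in batches keeps memory use bounded when the input comes from a database cursor rather than an in-memory list, as it does here.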
To clean up stored messages and media files:
python -m src.cli cleanup
Options:
- --force, -f: Skip confirmation prompt before cleanup
- --database-only: Clean up only the database records
- --message-type: Type of messages to clean ('messages' or 'normalized')
  - 'messages': Clean up raw message records
  - 'normalized': Clean up only normalized message records
- --media-only: Clean up only the downloaded media files
Examples:
# Clean everything with confirmation
python -m src.cli cleanup
# Clean everything without confirmation
python -m src.cli cleanup --force
# Clean only normalized messages
python -m src.cli cleanup --database-only --message-type normalized
# Clean only media files
python -m src.cli cleanup --media-only
To stop running Docker services:
python -m src.cli stop
Options:
- --clear-database: Clear database before stopping
- --clear-media: Clear media files before stopping
The application implements robust error handling for various Telegram API restrictions:
- Automatically handles Telegram's FloodWaitError
- Smart retry mechanism with exponential backoff
- Continues operation after waiting the required time
- Graceful handling of media download failures
- Automatic retries for temporary errors
- Skips problematic media files to continue operation
- Respects Telegram API's rate limiting
- Implements safe error recovery
- Reduces the risk of account bans by throttling requests
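The retry behaviour listed above could look roughly like the following sketch. It is not the project's implementation: `FloodWait` is a local stand-in for a rate-limit error that carries a wait time in seconds, `call_with_retry` and its parameters are hypothetical, and `ConnectionError` is used as an example of a temporary failure.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class FloodWait(Exception):
    """Stand-in for a rate-limit error that reports how long to wait."""

    def __init__(self, seconds: float) -> None:
        super().__init__(f"flood wait: {seconds}s")
        self.seconds = seconds


async def call_with_retry(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run operation, waiting and retrying on rate limits and temporary errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except FloodWait as exc:
            # The server dictates the wait, so use exactly that duration.
            delay = exc.seconds
        except ConnectionError:
            # Temporary failure: back off exponentially (1x, 2x, 4x, ...).
            delay = base_delay * 2 ** (attempt - 1)
        if attempt == max_attempts:
            raise RuntimeError(f"gave up after {max_attempts} attempts")
        await sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without actually waiting.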
telegram-fetcher/
├── 📂 src/
│ ├── __init__.py
│ ├── cli.py # CLI interface
│ ├── config.py # Configuration management
│ ├── models.py # Database models
│ ├── service.py # Business logic
│ └── telegram_client.py # Telegram client wrapper
├── 📂 data/
│ ├── media/ # Downloaded media files
│ └── telegram.db # SQLite database
├── 📂 tests/ # Test suite
├── 📄 requirements.txt # Dependencies
└── 📄 README.md # This file
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.