Skip to content

Rantoniaina/telegram-fetcher

Repository files navigation

📱 Telegram Message Fetcher

Python Telegram License Code style: black

A powerful Python application for fetching and storing messages from Telegram channels. Built with modern best practices and a clean architecture. 🚀

✨ Features

  • 📥 Fetch messages from Telegram channels
  • 🔍 Filter messages by keywords
  • 🖼️ Download and store media files
  • 💾 Store messages in SQLite database
  • 🖥️ CLI interface with rich formatting
  • ⚡ Async support for better performance
  • 📝 Comprehensive logging
  • 🎯 Type hints and documentation
  • 🛡️ Smart rate limit and ban handling

🛠️ Prerequisites

  • 🐍 Python 3.8 or higher
  • 🔑 Telegram API credentials (API ID and Hash)
  • 🔐 Access to the target Telegram channel

🚀 Installation

  1. Clone the repository:
git clone https://github.com/yourusername/telegram-fetcher.git
cd telegram-fetcher
  1. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  1. Install dependencies:
pip install -r requirements.txt
  1. Set up environment variables:
cp .env.example .env

Edit .env with your Telegram API credentials and other settings.

🎮 Usage

The application provides a CLI interface with the following commands:

🚀 Initialize Project

To set up the project for first use:

python -m src.cli init

This command will:

  • Create necessary directories
  • Initialize the database
  • Start required Docker services

📥 Fetch Messages

To fetch messages from the configured channel:

python -m src.cli fetch

Options:

  • --limit INTEGER: Limit the number of messages to fetch
  • --no-media: Skip downloading media files
  • --keywords LIST: Filter messages by keywords (comma-separated)
  • --date STRING: Filter messages by date (format: dd-MM-yyyy)

    ⚠️ Note: Currently, the date filter fetches all messages first and then filters them locally. This means the initial fetch may take longer than expected as it doesn't utilize Telegram's API date filtering.

  • --verbose: Enable verbose logging

📋 List Messages

To list stored messages:

python -m src.cli list

Options:

  • --limit INTEGER: Number of messages to display (default: 100)
  • --skip INTEGER: Number of messages to skip (default: 0)

🔄 Normalize Messages

To normalize stored messages:

python -m src.cli normalize

Options:

  • --limit INTEGER: Maximum number of messages to normalize in each batch
  • --skip-empty: Skip messages with empty text content
  • --verbose: Show detailed progress for each message

🔍 Filter Messages

To filter normalized messages based on keywords:

python -m src.cli filter --keywords "keyword1,keyword2"

Options:

  • --keywords: Comma-separated list of keywords to filter messages (required)
  • --batch-size: Number of messages to process in each batch (default: 100)

🧹 Cleanup

To clean up stored messages and media files:

python -m src.cli cleanup

Options:

  • --force, -f: Skip confirmation prompt before cleanup
  • --database-only: Clean up only the database records
    • --message-type: Type of messages to clean ('messages' or 'normalized')
      • 'messages': Clean up raw message records
      • 'normalized': Clean up only normalized message records
  • --media-only: Clean up only the downloaded media files

Examples:

# Clean everything with confirmation
python -m src.cli cleanup

# Clean everything without confirmation
python -m src.cli cleanup --force

# Clean only normalized messages
python -m src.cli cleanup --database-only --message-type normalized

# Clean only media files
python -m src.cli cleanup --media-only

🛑 Stop Services

To stop running Docker services:

python -m src.cli stop

Options:

  • --clear-database: Clear database before stopping
  • --clear-media: Clear media files before stopping

🛡️ Rate Limits and Error Handling

The application implements robust error handling for various Telegram API restrictions:

Rate Limits

  • Automatically handles Telegram's FloodWaitError
  • Smart retry mechanism with exponential backoff
  • Continues operation after waiting the required time

Media Downloads

  • Graceful handling of media download failures
  • Automatic retries for temporary errors
  • Skips problematic media files to continue operation

Best Practices

  • Respects Telegram API's rate limiting
  • Implements safe error recovery
  • Prevents account bans through smart throttling

📁 Project Structure

telegram-fetcher/
├── 📂 src/
│   ├── __init__.py
│   ├── cli.py           # CLI interface
│   ├── config.py        # Configuration management
│   ├── models.py        # Database models
│   ├── service.py       # Business logic
│   └── telegram_client.py # Telegram client wrapper
├── 📂 data/
│   ├── media/          # Downloaded media files
│   └── telegram.db     # SQLite database
├── 📂 tests/           # Test suite
├── 📄 requirements.txt # Dependencies
└── 📄 README.md       # This file

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

About

A tool for fetching and managing Telegram messages with batch processing capabilities

Topics

Resources

License

Stars

Watchers

Forks