diff --git a/README.md b/README.md
index 3d0885e..d4833e6 100644
--- a/README.md
+++ b/README.md
@@ -2,279 +2,171 @@
 # hype
-This Mastodon bot pulls trending posts from chosen instances, ranks them, and boosts the top results to your timeline. You decide which instances it fetches and how much you want to see per instance.
+A Mastodon bot that boosts trending posts from other instances into your timeline, helping you discover content across the federated social web.
 ## Why
-For smaller instances the local timeline is rather empty. This is why trends simply do not work on those instances: There is just not enough activity. Instead of manually checking out other instances this bot allows to subscribe to a multitude of different mastodon compatible servers to fetch trending posts and repost them to your current server helping discoverability of accounts, people and topics within the federated social web.
+For smaller Mastodon instances, the local timeline can be quite empty and trends often don't work due to limited activity. Rather than manually checking other instances, this bot lets you subscribe to multiple Mastodon-compatible servers to fetch trending posts and boost them to your timeline—enhancing discoverability of accounts, people, and topics across the Fediverse.
 ## Installation
-Deploy with docker-compose
+### Docker Compose
 ```yaml
 version: "3"
 services:
   hype:
-    image: ghcr.io/goingdark-social/hypebot:v0.1.0
+    image: ghcr.io/goingdark-social/hypebot:latest
     volumes:
       - ./config:/app/config
 ```
-Replace `v0.1.0` with the release you want to run.
-Pull requests publish images tagged with the PR number and commit. For testing you can pull them from the registry, for example:
+Replace `latest` with a specific version (e.g., `v0.4.0`).
-```
+Pull requests publish images tagged with the PR number and commit SHA:
+
+```bash
 docker pull ghcr.io/goingdark-social/hypebot:pr-123
 docker pull ghcr.io/goingdark-social/hypebot:sha-abcdef1
 ```
-### Custom User ID/Group ID
+### Custom UID/GID
-The Docker image supports customizable UID/GID for security and compatibility with different deployment environments:
+The Docker image supports customizable UID/GID for security and compatibility:
 ```bash
-# Build with custom UID/GID (useful for matching host user permissions)
 docker build --build-arg USER_UID=2000 --build-arg USER_GID=3000 -t hypebot-custom .
-
-# Build with custom user name (default is 'hype')
 docker build --build-arg USER_NAME=mybot --build-arg USER_UID=1500 -t hypebot-named .
 ```
-**Build Arguments:**
+**Arguments:**
 - `USER_UID` - User ID (default: 1000)
-- `USER_GID` - Group ID (default: 1000)
+- `USER_GID` - Group ID (default: 1000)
 - `USER_NAME` - Username (default: hype)
-### Kubernetes Deployment
-
-For Kubernetes deployments with Pod Security Standards, use the provided `deploy.yaml`:
+### Kubernetes
 ```bash
 kubectl apply -f deploy.yaml
 ```
-The deployment includes:
-- `runAsNonRoot: true` for security
-- Numeric UID/GID specification for compatibility
-- Security context with dropped capabilities
-- Resource limits and requests
-- Proper volume mounts for config, secrets, and logs
-
-**Important for Kubernetes:** The image uses numeric UID:GID format (`USER 1000:1000`) instead of named users to ensure compatibility with `runAsNonRoot: true` security policies.
+Includes `runAsNonRoot: true`, security context with dropped capabilities, resource limits, and proper volume mounts.
 ## Configuration
-Create a `config.yaml` and a `auth.yaml` file in `./config/`. Enter the credentials of your bot-account into `auth.yaml`. You can define which servers to follow and how often to fetch new posts as well as how to automatically change your profile in config.yaml. See the examples below:
+Create `config.yaml` and `auth.yaml` in `./config/`:
 `auth.yaml`:
-
 ```yaml
-# Credentials for your bot account
 bot_account:
   server: "mastodon.example.com"
   access_token: "Create a new application in your bot account at Preferences -> Development"
 ```
-`config.yaml`
-
+`config.yaml`:
 ```yaml
-# Refresh interval in minutes (default: 15)
-interval: 15
+interval: 30
-# Text to add to the bot profile befor the list of subscribed servers
-profile_prefix: "I am boosting trending posts from:"
+profile_prefix: "Boosting trending posts from:"
-# profile fields to fill in
 fields:
-  code: https://github.com/goingdark-social/hypebot
-  operator: "YOUR HANDLE HERE"
+  instance: https://mastodon.example.com
+  code: "https://github.com/goingdark-social/hypebot"
+  automation: "Runs every 30 minutes"
+  about: "Boosts trending posts from curated instances"
-# Define subscribed instances with fetch and boost limits
-# fetch_limit: how many trending posts to fetch from the instance (max 20)
-# boost_limit: how many of those to actually boost per run
-# Legacy format (single limit) still supported for backward compatibility
 subscribed_instances:
   chaos.social:
-    fetch_limit: 20 # Fetch top 20 trending posts
-    boost_limit: 4 # But only boost up to 4 best posts
+    fetch_limit: 20
+    boost_limit: 4
   mastodon.social:
-    fetch_limit: 15 # Fetch top 15 trending posts
-    boost_limit: 3 # But only boost up to 3 best posts
-  # Legacy format still works:
+    fetch_limit: 15
+    boost_limit: 3
   fosstodon.org:
-    limit: 5 # Fetch and boost up to 5 posts
-
-# Local Timeline Configuration
-# Enable boosting from your own instance's local timeline
-local_timeline_enabled: true # Enabled by default (set to false to disable)
-local_timeline_fetch_limit: 20 # How many posts to fetch from local timeline
-local_timeline_boost_limit: 2 # Max boosts from local timeline per run
-local_timeline_min_engagement: 1 # Minimum total engagement (boosts + stars + comments) required
+    limit: 5
-# Filter posts from specific instances
 filtered_instances:
   - example.com
-daily_public_cap: 48
+daily_public_cap: 96
 per_hour_public_cap: 6
 max_boosts_per_run: 8
 max_boosts_per_author_per_day: 1
 author_diversity_enforced: true
+
 prefer_media: 1
 require_media: true
-skip_sensitive_without_cw: true
-min_reblogs: 0
-min_favourites: 0
+min_reblogs: 10
+min_favourites: 10
+
 languages_allowlist:
   - en
-state_path: "/app/secrets/state.json"
-seen_cache_size: 6000
+
 hashtag_scores:
   python: 10
   rust: 5
-# Spam Detection Options
-spam_emoji_penalty: 1.0 # Points to reduce per emoji over the threshold
-spam_emoji_threshold: 2 # Number of emojis before penalty applies
-spam_link_penalty: 0.5 # Points to reduce when links are present
-
-# Debug and Logging Options
-log_level: "INFO" # Set to "DEBUG" for detailed logging
-debug_decisions: false # Enable detailed decision tracing and reasoning
-logfile_path: "" # Path to log file for persistent logging (e.g., "/app/logs/hypebot.log")
-```
-
-`min_reblogs` and `min_favourites` let you ignore posts that haven't gained enough traction yet.
-`seen_cache_size` sets how many posts the bot keeps in memory to avoid boosting the same thing twice. A bigger cache catches more duplicates but uses more RAM and takes longer to search.
-`hashtag_scores` lets you push posts with certain hashtags to the front by assigning weights.
-`prefer_media` adds the given bonus to posts with attachments; set to `true` for a default of `1`.
-`author_diversity_enforced` respects `max_boosts_per_author_per_day` when enabled. When enabled, the bot enforces a 24-hour rolling window where the same author cannot be boosted more than once within any 24-hour period, preventing the same authors from dominating the feed.
-`max_boosts_per_run` limits how many posts get boosted in each run.
-`max_boosts_per_author_per_day` prevents the same author from being boosted multiple times within a 24-hour rolling window. This ensures diverse content and prevents any single author from dominating your timeline.
-
-### Language Filtering
+local_timeline_enabled: true
+local_timeline_fetch_limit: 20
+local_timeline_boost_limit: 4
+local_timeline_min_engagement: 1
-The bot can filter posts based on language to ensure you only see content in languages you understand:
+spam_emoji_penalty: 0.5
+spam_emoji_threshold: 2
+spam_link_penalty: 0.3
-```yaml
-languages_allowlist:
-  - en # English only
-
-# Optional: Choose language detection method
-use_mastodon_language_detection: false # Default: use langdetect for content-based detection
-# Set to true to trust Mastodon's language field (faster but less accurate)
+debug_decisions: true
+log_level: "DEBUG"
 ```
-**How it works:**
-- When `languages_allowlist` is configured, only posts in the specified languages will be boosted
-- **By default (`use_mastodon_language_detection: false`)**, the bot detects language from post content using the `langdetect` library to verify the actual language, ignoring what Mastodon reports
-  - Mastodon's language detection can be incorrect, so analyzing actual content is more reliable
-  - Content is analyzed after removing HTML tags, URLs, mentions, and hashtags
-  - Very short posts (less than 10 characters) that can't be reliably detected are skipped
-- **Alternatively (`use_mastodon_language_detection: true`)**, the bot trusts Mastodon's `language` field
-  - Faster but less accurate, as Mastodon can misidentify languages
-  - Recommended only if you trust your instance's language detection or need better performance
-- Posts with undetectable or non-allowed languages are skipped
-- Leave the list empty (`languages_allowlist: []`) to disable language filtering
-
-**Language Detection:**
-- Detection results are logged when `debug_decisions: true` is enabled
-  - With `use_mastodon_language_detection: false`: Shows both detected and Mastodon-reported languages for comparison
-  - With `use_mastodon_language_detection: true`: Shows Mastodon's reported language
-
-
-### Spam Detection
-
-The bot includes configurable spam detection to reduce scores for potentially promotional content:
-
-- `spam_emoji_penalty` - Points to reduce per emoji over the threshold (default: 0, disabled)
-- `spam_emoji_threshold` - Number of emojis before penalty applies (default: 2)
-- `spam_link_penalty` - Points to reduce when links are detected in posts (default: 0, disabled)
+## Features
-When enabled, posts with excessive emojis or links receive score penalties to reduce their boost priority. This helps avoid promoting content that may be spam-like or overly promotional.
+### Multi-Instance Trending Posts
+- Boost trending posts from multiple Mastodon instances
+- Configure separate fetch and boost limits per instance
+- Filter posts from specific instances entirely
 ### Local Timeline Boosting
+- Optionally boost posts from your own instance's local timeline
+- Only boosts posts from the same day with minimum engagement
+- Great for promoting local community content on smaller instances
-The bot can optionally boost posts from your own instance's local timeline, in addition to trending posts from remote instances. This feature helps promote local content that has already gained some community engagement.
-
-**How it works:**
-- When `local_timeline_enabled: true`, the bot fetches posts from your instance's local timeline
-- Posts are filtered to include only those from the current day (same day as the boost run)
-- Posts must have minimum engagement: at least `local_timeline_min_engagement` total interactions (reblogs + favorites + replies)
-- The bot respects `local_timeline_boost_limit` to avoid over-promoting local content
-- Local posts are scored and ranked alongside trending posts from remote instances
-
-**Configuration:**
-```yaml
-local_timeline_enabled: true # Enabled by default (set to false to disable)
-local_timeline_fetch_limit: 20 # How many local posts to fetch
-local_timeline_boost_limit: 2 # Max boosts from local timeline per run
-local_timeline_min_engagement: 1 # Minimum total engagement required
-```
-
-This feature is enabled by default and is useful for smaller instances where local content may not trend globally, but deserves visibility within the community.
+### Language Filtering
+- Filter posts by language using Mastodon metadata
+- Or use automatic content-based detection with langdetect
+- Skips posts with undetectable or non-allowed languages
+
+### Quality Controls
+- **Hashtag scoring**: Assign weights to prioritize certain hashtags
+- **Media preferences**: Prefer or require posts with media attachments
+- **Engagement thresholds**: Skip posts with too few reblogs or favorites
+- **Age decay**: Reduce score of older posts
+- **Minimum score threshold**: Skip posts below a score cutoff
+
+### Spam & Duplicate Detection
+- **Spam detection**: Penalize posts with excessive emojis or links
+- **Duplicate avoidance**: Track canonical URLs with configurable cache
+- **Author diversity**: Prevent same author from dominating your timeline (24h rolling window)
 ### Proactive Federation
-
-The bot automatically federates unfederated trending posts from remote instances. When a trending post is not yet in your local instance's database, the bot will actively federate it using `search_v2(resolve=True)` before boosting. This helps seed federation by bringing trending content from other instances into your local timeline, which is especially useful for smaller instances that want to increase content discovery for their users.
+- Automatically federates unfederated trending posts
+- Uses `search_v2(resolve=True)` before boosting to seed federation
 ### Debug Logging
+- Comprehensive decision tracing with `debug_decisions: true`
+- Detailed scoring breakdown: hashtag scores, engagement, media bonus
+- Filtering decisions with specific reasons
+- Persistent logging to file with `logfile_path`
-For detailed traceability of the bot's decisions, you can enable debug logging:
-
-- `debug_decisions: true` - Enables comprehensive logging of each decision made during the boost process
-- `logfile_path: "/path/to/logfile.log"` - Enables persistent logging to a file in addition to console output
-- `log_level: "DEBUG"` - Sets the logging level to show debug messages
-
-When `debug_decisions` is enabled, the bot will log:
-- Detailed scoring breakdown for each post (hashtag scores, engagement, media bonus)
-- Filtering decisions with specific reasons (content rules, seen status, instance filters)
-- Instance fetching results and counts
-- Complete boost cycle summaries with statistics
-
-Example debug output:
-```
-STATUS 12345678... | SCORING: 21.48
-  Hashtags: ['python', 'mastodon']
-  Tag scores: [10, 5] = 15
-  Reblogs: 5 -> 3.58
-  Favourites: 10 -> 2.40
-  Media bonus: 0.5 (has_media: True)
-  Total: 15 + 3.58 + 2.40 + 0.5 = 21.48
-STATUS 12345678... | FILTER CHECK: KEEP
-  Media attachments: 1
-  Skip no media: False (require_media: False)
-  Language: 'en', allowlist: ['en']
-  Skip language: False
-DECISION: BOOST - Status passes all checks
-```
+### Flexible Deployment
+- Configurable refresh interval (default: 15 minutes)
+- Hourly and daily public boost caps
+- State persistence for continuity across restarts
+- Docker and Kubernetes support
-**Migration note**: The `rotate_instances` option has been removed. The bot now checks every subscribed instance each run, so older configs should drop this field.
+## Credits
-## Features
-
-- Boost trending posts from other Mastodon instances
-- Optionally boost from your own instance's local timeline with engagement filtering
-- Fetch larger candidate pools (up to 20 per instance) while boosting fewer posts for diversity
-- Separate fetch_limit and boost_limit per instance for fine-grained control
-- Update bot profile with list of subscribed instances
-- Rank collected posts using hashtags, engagement, and optional media preference
-- Normalize scores on a 0–100 scale and favor newer posts when scores tie
-- Skip duplicates across instances by tracking canonical URLs with a configurable cache
-- Enforce hourly and daily caps on public boosts
-- Limit boosts per instance per run and for any single author per day (using a 24-hour rolling window)
-- Skip reposts and filter posts without media or missing content warnings
-- Skip posts with too few reblogs or favourites
-- **Language filtering with automatic detection**: Filter posts by language using Mastodon metadata or automatic content-based detection
-- Prioritize posts containing weighted hashtags
-- Read timestamps whether they're strings or Python datetimes
-- Default 15-minute interval for frequent, smaller boost cycles
-- Local timeline filtering: only boost posts from the same day with minimum engagement
-
-## Branches
-
-Work starts on `develop`. When it's merged into `main` and deleted, a workflow recreates `develop` from `main`. If that job fails, create the branch manually.
+This project is a fork of [v411e/hype](https://github.com/v411e/hype). Significant enhancements have been made including local timeline boosting, language filtering, hashtag scoring, spam detection, quality controls, author diversity enforcement, and debug logging.
 ---
-
+
diff --git a/config/config.example.yaml b/config/config.example.yaml
index 52cb82f..1382c2e 100644
--- a/config/config.example.yaml
+++ b/config/config.example.yaml
@@ -13,6 +13,7 @@ profile_prefix: "I am boosting trending posts from:"
 fields:
   code: https://github.com/goingdark-social/hypebot
   operator: "@yourhandle@yourdomain.example"
+  feedback: "https://yourinstance.social/@yourbot"
 
 # Instance configuration with fetch/boost limits
 # fetch_limit: How many trending posts to fetch (max 20, API limit)
diff --git a/config/config.yaml b/config/config.yaml
index 899c6d0..4d8f405 100644
--- a/config/config.yaml
+++ b/config/config.yaml
@@ -2,7 +2,7 @@
 interval: 30
 
 # Text to add to the bot profile before the list of subscribed servers
-profile_prefix: "Official hypebot for goingdark.social, automatically boosting top trending posts from selected Mastodon instances:"
+profile_prefix: "Official hypebot for goingdark.social, automatically boosting top trending posts from selected Mastodon instances. Feedback: https://goingdark.social/@fanfare"
 
 # profile fields to fill in
 fields:
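
Note on the README hunk: the new `config.yaml` example keeps a legacy `limit: 5` entry for `fosstodon.org` next to the `fetch_limit`/`boost_limit` form, while the comments that explained the difference are removed. The relationship between the two shapes might be sketched roughly as below. This is only an illustration, assuming (per the removed comments) that `limit` means "fetch and boost up to that many posts" and that fetches are capped at 20. `InstanceLimits` and `normalize_limits` are hypothetical names, not taken from the hypebot code, and the fallbacks for missing keys are guesses.

```python
from dataclasses import dataclass

# Assumption: taken from the removed README comment "(max 20)".
MAX_FETCH = 20


@dataclass(frozen=True)
class InstanceLimits:
    """Hypothetical container for one subscribed instance's limits."""
    fetch_limit: int
    boost_limit: int


def normalize_limits(entry: dict) -> InstanceLimits:
    """Map either config shape to explicit fetch/boost limits (illustrative only)."""
    has_new_keys = "fetch_limit" in entry or "boost_limit" in entry
    if "limit" in entry and not has_new_keys:
        # Legacy shape: a single number used for both fetching and boosting.
        fetch = boost = int(entry["limit"])
    else:
        # Assumed fallbacks: fetch the maximum, boost everything fetched.
        fetch = int(entry.get("fetch_limit", MAX_FETCH))
        boost = int(entry.get("boost_limit", fetch))
    fetch = max(0, min(fetch, MAX_FETCH))  # clamp to the documented maximum
    boost = max(0, min(boost, fetch))      # cannot boost more than was fetched
    return InstanceLimits(fetch, boost)


if __name__ == "__main__":
    subscribed = {
        "chaos.social": {"fetch_limit": 20, "boost_limit": 4},
        "fosstodon.org": {"limit": 5},
    }
    for name, entry in subscribed.items():
        print(name, normalize_limits(entry))
```

If the legacy form is meant to stay supported, a one-line comment next to `limit: 5` in the new README example would preserve what the removed comments said.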