
mst/rss-dedupe


rss-dedupe

A lightweight RSS feed proxy that rewrites unstable GUIDs into stable ones, so RSS readers stop showing duplicate articles when a publisher changes the date component of its article URLs.

Problem

Some publishers (e.g. ka-news.de) change the date part of article URLs daily:

Day 1: .../23-3-26-106676515
Day 2: .../24-3-26-106676515

RSS readers use the <guid> to identify entries, and in these feeds the <guid> changes along with the URL, so the same article reappears as new every day. This proxy extracts a stable ID from each URL with a configurable regex and uses it as the GUID instead.
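The extraction can be checked directly in Python against the two URLs above, using the `guid_pattern` from the example configuration. Only the final path segments are used because the full URLs are elided; `stable_id` is a hypothetical helper, not a function from this project.

```python
import re
from typing import Optional

# Pattern from the example configuration: a hyphen followed by 7+ digits at the end.
GUID_PATTERN = re.compile(r"-([0-9]{7,})$")


def stable_id(link: str) -> Optional[str]:
    """Return the captured stable ID, or None if the pattern does not match."""
    match = GUID_PATTERN.search(link)
    return match.group(1) if match else None


day1 = "23-3-26-106676515"
day2 = "24-3-26-106676515"

# The date prefix differs, but the extracted ID is identical.
print(stable_id(day1), stable_id(day2))  # 106676515 106676515
```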

How it works

  1. Fetches each upstream feed on a configurable interval
  2. Applies a regex to each entry's <link> to extract a stable ID
  3. Rewrites the <guid> to {feed-name}:{stable-id}
  4. Serves the rewritten feed over HTTP

Nothing is persisted; all state is kept in memory.
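The fetch, serve, and in-memory steps could be sketched as a small per-feed cache like the one below. This is an illustration under assumptions, not the project's code: `FeedCache` and the `fetch_and_rewrite` callable are hypothetical, and this version refreshes lazily (when a request arrives after the interval has elapsed), whereas the project may refresh on a background timer instead.

```python
import time
from typing import Callable, Optional


class FeedCache:
    """Hypothetical in-memory cache for one rewritten feed.

    `fetch_and_rewrite` stands in for downloading the upstream feed and
    rewriting its GUIDs. Nothing is written to disk.
    """

    def __init__(
        self,
        fetch_and_rewrite: Callable[[], bytes],
        interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_and_rewrite
        self._interval = interval  # seconds, like fetch_interval in the config
        self._clock = clock  # injectable so the timing can be tested
        self._body: Optional[bytes] = None
        self._fetched_at = 0.0

    def get(self) -> bytes:
        now = self._clock()
        # Fetch on first use, or when the cached copy is older than the interval.
        if self._body is None or now - self._fetched_at >= self._interval:
            self._body = self._fetch()
            self._fetched_at = now
        return self._body
```

Passing the clock in as a parameter keeps the refresh logic testable without sleeping.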

Setup

uv pip install -r requirements.txt
python -m rss_dedupe

Configuration

feeds:
  - name: ksc                        # used in the served path: /feed/ksc.xml
    url: https://www.ka-news.de/ksc/rss
    guid_pattern: "-([0-9]{7,})$"    # regex capture group → stable GUID
    fetch_interval: 1800             # seconds between fetches

server:
  host: 0.0.0.0
  port: 8090
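Assuming the YAML above has already been loaded into a plain dict (for instance with PyYAML's `yaml.safe_load`; the README does not say which loader or config file path the project uses), turning it into typed settings could look like the sketch below. `FeedConfig`, `ServerConfig`, and `parse_config` are hypothetical names, and rejecting a `guid_pattern` without exactly one capture group is an assumed validation step based on the description of that option.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class FeedConfig:
    name: str
    url: str
    guid_pattern: re.Pattern
    fetch_interval: int  # seconds between fetches

    @property
    def path(self) -> str:
        # The feed name determines the served path, e.g. "ksc" -> /feed/ksc.xml
        return f"/feed/{self.name}.xml"


@dataclass
class ServerConfig:
    host: str
    port: int


def parse_config(raw: Dict[str, Any]) -> Tuple[List[FeedConfig], ServerConfig]:
    feeds = []
    for entry in raw["feeds"]:
        pattern = re.compile(entry["guid_pattern"])
        # Assumed check: the pattern must expose exactly one capture group.
        if pattern.groups != 1:
            raise ValueError(f"{entry['name']}: guid_pattern needs exactly one capture group")
        feeds.append(FeedConfig(
            name=entry["name"],
            url=entry["url"],
            guid_pattern=pattern,
            fetch_interval=int(entry["fetch_interval"]),
        ))
    server = ServerConfig(host=raw["server"]["host"], port=int(raw["server"]["port"]))
    return feeds, server
```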

guid_pattern

A regex with one capture group, applied to each entry's <link>. The captured value, prefixed with the feed name as {feed-name}:{stable-id}, becomes the new GUID. If the pattern doesn't match, the original GUID is kept.
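A minimal sketch of this rewrite over a whole RSS 2.0 document, using the standard library's `xml.etree.ElementTree`. The function name `rewrite_guids` is hypothetical, and two details are assumptions rather than documented behavior: the new GUID uses the {feed-name}:{stable-id} form from "How it works", and it is marked `isPermaLink="false"` because it is no longer a URL (RSS 2.0 treats a <guid> without that attribute as a permalink).

```python
import re
import xml.etree.ElementTree as ET


def rewrite_guids(feed_xml: bytes, feed_name: str, pattern: re.Pattern) -> bytes:
    """Replace each item's <guid> with feed_name:stable_id where the pattern matches."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        match = pattern.search(link)
        if match is None:
            continue  # No match: keep the original <guid> untouched.
        guid = item.find("guid")
        if guid is None:
            guid = ET.SubElement(item, "guid")
        guid.text = f"{feed_name}:{match.group(1)}"
        # The new value is an opaque identifier, not a URL.
        guid.set("isPermaLink", "false")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

One caveat of this approach: ElementTree does not keep namespace prefixes it has not been told about, so extension elements such as content:encoded may come back with generated prefixes like ns0; `ET.register_namespace` can be used to keep the original ones.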

Endpoints

Method  Path                Description
GET     /feed/{name}.xml    Deduplicated RSS 2.0 feed
GET     /health             Returns {"status": "ok"}
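A minimal sketch of these two endpoints using only the standard library's `http.server`. The README does not say which web framework the project uses, so this is an assumption; `route`, `make_handler`, and the `feeds` mapping (feed name to a function returning the rewritten XML) are hypothetical names, and the `application/rss+xml` content type is a choice made here.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit

# (status code, Content-Type, body)
Response = Tuple[int, str, bytes]


def route(raw_path: str, feeds: Dict[str, Callable[[], bytes]]) -> Response:
    """Map a request path to a response; `feeds` maps feed name -> XML provider."""
    path = urlsplit(raw_path).path  # ignore any query string
    if path == "/health":
        return 200, "application/json", json.dumps({"status": "ok"}).encode()
    if path.startswith("/feed/") and path.endswith(".xml"):
        name = path[len("/feed/"):-len(".xml")]
        if name in feeds:
            return 200, "application/rss+xml", feeds[name]()
    return 404, "text/plain", b"not found"


def make_handler(feeds: Dict[str, Callable[[], bytes]]) -> type:
    """Build a request handler class bound to the given feeds."""
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            status, content_type, body = route(self.path, feeds)
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return Handler

# Hypothetical usage with the example server settings:
#   from http.server import ThreadingHTTPServer
#   ThreadingHTTPServer(("0.0.0.0", 8090), make_handler(feeds)).serve_forever()
```

Keeping the routing in a pure `route` function separates it from the socket handling, so it can be checked without starting a server.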

Deployment

On every push to main, GitHub Actions builds the image and pushes it to ghcr.io/mst/rss-dedupe:latest. It is deployed on Kubernetes via GitOps.
