Personal reading and video library with RSS sync, clean extraction, embeddings, and recommendations.
Most reading apps are good at collecting links and bad at helping you actually read. Feeds get noisy, article previews are incomplete, recommendations are generic, and video subscriptions live in a separate stack from everything else.
Library turns RSS feeds, longform articles, and YouTube channels into one personal reading system. It syncs feeds, crawls full article content when RSS is thin, learns site-specific cleanup rules, generates embeddings for both articles and subtitles, and ranks your next reads based on what you actually spend time with.
It is not a public content platform. It is a private, opinionated reader for one account or a small set of authenticated users who want a cleaner queue and better retrieval than a normal RSS app.
You can add a normal RSS feed, or hand the app a YouTube handle or channel URL and let it resolve the feed automatically. New entries are stored immediately, then split into follow-up jobs:
- Article feeds get queued for full-page crawling when the RSS body looks too thin
- Video feeds get queued for YouTube download, subtitle extraction, HLS packaging, and upload to R2
- Tweet feeds skip crawling because the feed payload is already the content
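The dispatch step above can be sketched as a small pure function. This is an illustrative sketch, not the app's actual API: the names `FeedKind`, `planFollowUp`, and the 500-character "thin body" threshold are assumptions.

```typescript
// Hypothetical sketch of the post-sync dispatch described above.
type FeedKind = "article" | "video" | "tweet";
type FollowUpJob = "crawl" | "video-ingest" | null;

const MIN_BODY_CHARS = 500; // assumed threshold for a "thin" RSS body

function planFollowUp(kind: FeedKind, rssBody: string): FollowUpJob {
  if (kind === "video") return "video-ingest"; // yt-dlp → subtitles → HLS → R2
  if (kind === "tweet") return null;           // feed payload is already the content
  // Article feeds: only queue a crawl when the RSS body looks truncated
  return rssBody.trim().length < MIN_BODY_CHARS ? "crawl" : null;
}
```

Keeping the decision separate from the queueing makes it trivial to unit-test which entries generate background work.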
Every user is subscribed to feeds through the database, and new accounts can be backfilled across the existing catalog with the included user bootstrap script.
RSS content is often messy, truncated, or wrapped in subscription chrome. Library uses Mozilla Readability for the first pass, then stores hostname-specific sanitization rules to strip repeated junk more aggressively on future syncs.
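One way to picture the stored rules is as per-hostname removal patterns applied after the Readability pass. A minimal sketch, assuming rules are kept as removable HTML patterns keyed by hostname (the `siteRules` shape and `sanitize` name are illustrative, not the real schema):

```typescript
// Illustrative hostname-scoped cleanup applied after Readability.
const siteRules: Record<string, RegExp[]> = {
  // assumed example: strip recurring asides and newsletter chrome
  "example.com": [/<aside[\s\S]*?<\/aside>/g],
};

function sanitize(hostname: string, readabilityHtml: string): string {
  // Unknown hosts pass through untouched; known hosts get their learned rules.
  return (siteRules[hostname] ?? []).reduce(
    (html, rule) => html.replace(rule, ""),
    readabilityHtml,
  );
}
```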
For articles, the cleaned HTML is chunked into overlapping text windows and embedded with OpenAI. For videos, subtitles are converted to plain text, chunked the same way, and indexed into pgvector. That gives the app one searchable semantic layer across both written and spoken content.
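The overlapping-window chunking can be sketched as below. Window and overlap sizes here are assumptions for illustration, not the app's actual parameters:

```typescript
// Minimal sketch of overlapping-window chunking before embedding.
function chunkText(text: string, windowSize = 1000, overlap = 200): string[] {
  const step = windowSize - overlap; // each window starts `step` chars after the last
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + windowSize));
    if (start + windowSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap means a sentence split across a window boundary still appears whole in at least one chunk, which keeps nearest-neighbor search from missing it.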
The recommendation system is engagement-driven. It builds preference vectors from the entries you open, finish, bookmark, or dismiss, then ranks candidates using:
- embedding similarity to your positive history
- penalty from explicitly disliked content
- feed-level affinity and dislike signals
- freshness so new items still surface
The result is closer to a personal knowledge queue than a chronological firehose.
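The four signals above can be combined into a single score. This is a hedged sketch of the idea, not the app's actual model; the weights, field names, and the 48-hour freshness decay are all assumptions:

```typescript
// Illustrative combination of the ranking signals listed above.
interface Candidate {
  simToLiked: number;    // embedding similarity to positive history (0..1)
  simToDisliked: number; // similarity to explicitly disliked content (0..1)
  feedAffinity: number;  // feed-level like/dislike signal (-1..1)
  ageHours: number;      // time since the entry was published
}

function score(c: Candidate): number {
  const freshness = Math.exp(-c.ageHours / 48); // assumed decay constant
  return c.simToLiked - 0.8 * c.simToDisliked + 0.3 * c.feedAffinity + 0.2 * freshness;
}
```

A fresh entry similar to your positive history outranks a stale one that resembles disliked content, which is exactly the "knowledge queue, not firehose" behavior described above.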
library/
├── src/
│   ├── app/
│   │   ├── page.tsx                     # Auth-gated reader shell
│   │   ├── article/[id]/                # Full reading view
│   │   ├── login/                       # Email/password auth
│   │   └── api/
│   │       ├── feeds/route.ts           # Feed creation + listing
│   │       ├── feed/sync/route.ts       # Sync feed items and queue work
│   │       ├── entries/route.ts         # Entry list for the reader
│   │       ├── recommendations/route.ts # Personalized ranking
│   │       ├── search/route.ts          # Semantic + title search
│   │       ├── video/process/route.ts   # Video ingestion trigger
│   │       └── engagement/route.ts      # Reading session tracking
│   ├── components/
│   │   ├── home-client.tsx              # Reader UI, filters, swipe actions
│   │   ├── use-entry-engagement.ts      # Scroll/time tracking
│   │   └── VideoPlayer.tsx              # HLS playback
│   └── lib/
│       ├── rss.ts                       # Feed parsing and reading time
│       ├── crawl.ts                     # Article fetch + Readability extraction
│       ├── embeddings.ts                # Chunking, embeddings, semantic search
│       ├── recommendations.ts           # Personalized scoring model
│       ├── video.ts                     # yt-dlp + ffmpeg + R2 pipeline
│       ├── queue.ts                     # BullMQ jobs for crawl/video work
│       └── site-sanitization.ts         # Hostname-specific cleanup rules
├── prisma/
│   ├── schema.prisma                    # Users, feeds, entries, videos, embeddings
│   └── migrations/                      # Postgres schema history
├── scripts/
│   ├── create-user.js                   # Create/update a user and subscribe feeds
│   ├── backfill-site-rules.ts           # Learn cleanup rules after the fact
│   └── resegment-videos.ts              # Rebuild HLS output when needed
├── docker-compose.yml                   # App, Postgres, Redis, RSSHub
└── README.md
| Layer | What | Why |
|---|---|---|
| Framework | Next.js 14 (App Router) | Server-rendered reader UI and API routes in one app |
| Language | TypeScript 5 | Shared types across UI, jobs, and API handlers |
| Database | Postgres + pgvector + Prisma | Structured feed data plus vector search |
| Queue | Redis + BullMQ | Background crawl and video processing |
| Extraction | Mozilla Readability + linkedom | Full-article parsing from noisy pages |
| Search / ranking | OpenAI embeddings | Semantic indexing and recommendation signals |
| Video pipeline | yt-dlp + ffmpeg + Cloudflare R2 | Turn YouTube entries into HLS playback with subtitles |
| UI | Tailwind CSS + SWR | Fast reader interactions and live refreshes |
users — authenticated readers and their preference vector
master_feed — canonical feed catalog
subscribed_feed — user-to-feed subscriptions
entries — synced article, tweet, and video entries
entry_embeddings — chunked vector index for semantic search
reader_sessions — per-device reader identity
entry_engagements — open time, scroll depth, bookmark/dismiss signals
videos — YouTube metadata, HLS path, subtitles, playback state
site_sanitization_rules — hostname-specific cleanup rules learned over time
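Querying `entry_embeddings` typically means formatting the query vector as a pgvector text literal and ordering by a distance operator. The literal format and the `<=>` cosine-distance operator are real pgvector syntax; the exact query shape below is an assumption, not the app's code:

```typescript
// pgvector accepts vectors as '[v1,v2,...]' text literals.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Example raw query shape (not executed here; table/column names from the schema above):
// await prisma.$queryRawUnsafe(
//   `SELECT entry_id, chunk_text
//      FROM entry_embeddings
//     ORDER BY embedding <=> $1::vector
//     LIMIT 10`,
//   toVectorLiteral(queryEmbedding),
// );
```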
# Clone
git clone git@github.com:Cesarioo/library.git
cd library
# Install dependencies
npm install
# Configure local secrets
cp .env.example .env
# Start Postgres, Redis, and RSSHub
docker compose up -d db redis rsshub
# Apply schema and generate Prisma client
npx prisma migrate deploy
npx prisma generate
# Create a reader account
node scripts/create-user.js you@example.com your-password
# Run the app
npm run dev

Open http://localhost:3000.
If you want to run the full app container instead of local `next dev`, use `docker compose up --build library`.
# Core services
DATABASE_URL=postgresql://read:read@localhost:5433/read
REDIS_URL=redis://localhost:6379
AUTH_SECRET=
OPENAI_API_KEY=
# Optional: bind crawler requests to a specific IPv4 address
CRAWL_IP=
SITE_RULE_MODEL=gpt-5.4
# Cloudflare R2 (video HLS + subtitles)
R2_ENDPOINT=
R2_ACCESS_KEY_ID=
R2_SECRET_ACCESS_KEY=
R2_BUCKET=library
# Optional: used by scripts/resegment-videos.ts
MEDIA_BASE_URL=https://media.library.oscarmairey.com
VERSION_LABEL=
# Optional: only needed for authenticated Twitter sources in RSSHub
RSSHUB_TWITTER_AUTH_TOKEN=
RSSHUB_TWITTER_CT0=

- `cookies.txt` and any `cookies-*.txt` files are intentionally gitignored. They are optional private cookie jars for sites or videos that require authenticated fetching.
- The Docker image now boots without cookie files present, but authenticated YouTube downloads may still need you to provide your own cookies locally.
- `docker-compose.yml` is safe to publish: secrets come from `.env`, not from committed values.
Built by Oscar Mairey