
Library

Personal reading and video library with RSS sync, clean extraction, embeddings, and recommendations.



The idea

Most reading apps are good at collecting links and bad at helping you actually read. Feeds get noisy, article previews are incomplete, recommendations are generic, and video subscriptions live in a separate stack from everything else.

Library turns RSS feeds, longform articles, and YouTube channels into one personal reading system. It syncs feeds, crawls full article content when RSS is thin, learns site-specific cleanup rules, generates embeddings for both articles and subtitles, and ranks your next reads based on what you actually spend time with.

It is not a public content platform. It is a private, opinionated reader for one account or a small set of authenticated users who want a cleaner queue and better retrieval than a normal RSS app.

How it works

Adding feeds

You can add a normal RSS feed, or hand the app a YouTube handle or channel URL and let it resolve the feed automatically. New entries are stored immediately, then split into follow-up jobs:

  • Article feeds get queued for full-page crawling when the RSS body looks too thin
  • Video feeds get queued for YouTube download, subtitle extraction, HLS packaging, and upload to R2
  • Tweet feeds skip crawling because the feed payload is already the content
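The routing above can be sketched as a small decision function. This is an illustrative sketch, not the app's actual code: the type names, job names, and the "thin body" threshold are all assumptions.

```typescript
// Sketch of the follow-up-job routing described above.
// Types, job names, and the thin-body threshold are assumptions.
type FeedKind = "article" | "video" | "tweet";

interface NewEntry {
  feedKind: FeedKind;
  rssBody: string; // HTML/text body as delivered in the RSS item
}

type FollowUpJob = "crawl-article" | "process-video" | null;

const THIN_BODY_CHARS = 500; // below this, treat the RSS body as truncated

export function planFollowUpJob(entry: NewEntry): FollowUpJob {
  switch (entry.feedKind) {
    case "video":
      // yt-dlp download, subtitle extraction, HLS packaging, R2 upload
      return "process-video";
    case "tweet":
      // the feed payload already is the content
      return null;
    case "article":
      // only crawl the full page when the RSS body looks too thin
      return entry.rssBody.trim().length < THIN_BODY_CHARS
        ? "crawl-article"
        : null;
  }
}
```

In the real app the returned job would be enqueued with BullMQ rather than acted on inline.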

Subscriptions are stored per user in the database, and new accounts can be backfilled across the existing feed catalog with the included user bootstrap script (scripts/create-user.js).

Cleaning and understanding content

RSS content is often messy, truncated, or wrapped in subscription chrome. Library uses Mozilla Readability for the first pass, then stores hostname-specific sanitization rules to strip repeated junk more aggressively on future syncs.
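Applying those stored rules after the Readability pass might look like the sketch below. The rule shape (regex sources keyed by hostname) is a guess at what site_sanitization_rules could hold, not the actual schema.

```typescript
// Sketch of applying learned, hostname-specific cleanup rules to
// extracted HTML. The SiteRule shape is an assumption about what
// site_sanitization_rules might store.
interface SiteRule {
  hostname: string;
  stripPatterns: string[]; // regex sources matching repeated junk blocks
}

export function applySiteRules(
  html: string,
  hostname: string,
  rules: SiteRule[],
): string {
  const matching = rules.filter((r) => r.hostname === hostname);
  let cleaned = html;
  for (const rule of matching) {
    for (const source of rule.stripPatterns) {
      // "gs" flags: strip every occurrence, even across newlines
      cleaned = cleaned.replace(new RegExp(source, "gs"), "");
    }
  }
  return cleaned.trim();
}
```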

For articles, the cleaned HTML is chunked into overlapping text windows and embedded with OpenAI. For videos, subtitles are converted to plain text, chunked the same way, and indexed into pgvector. That gives the app one searchable semantic layer across both written and spoken content.
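The overlapping-window split is simple to sketch; the window and overlap sizes here are illustrative defaults, not the app's actual configuration.

```typescript
// Sketch of splitting cleaned text into overlapping windows before
// embedding. Sizes are illustrative, measured in characters.
export function chunkText(
  text: string,
  windowSize = 1000, // characters per chunk
  overlap = 200,     // characters shared with the previous chunk
): string[] {
  if (overlap >= windowSize) throw new Error("overlap must be < windowSize");
  const chunks: string[] = [];
  const step = windowSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + windowSize));
    if (start + windowSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk would then be embedded and stored in pgvector alongside its entry id, so article text and subtitle text land in the same index.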

Ranking what to read next

The recommendation system is engagement-driven. It builds preference vectors from the entries you open, finish, bookmark, or dismiss, then ranks candidates using:

  • embedding similarity to your positive history
  • penalty from explicitly disliked content
  • feed-level affinity and dislike signals
  • freshness so new items still surface

The result is closer to a personal knowledge queue than a chronological firehose.
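A minimal version of that blend might look like the following. The weights, the freshness decay constant, and the signal names are made-up assumptions; the real scoring model lives in src/lib/recommendations.ts.

```typescript
// Sketch of blending the ranking signals listed above.
// Weights and the decay constant are illustrative assumptions.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / (Math.sqrt(na) * Math.sqrt(nb)) : 0;
}

interface Candidate {
  embedding: number[];
  feedAffinity: number; // -1..1, learned per feed
  ageHours: number;
}

export function scoreCandidate(
  c: Candidate,
  likedVec: number[],
  dislikedVec: number[],
): number {
  const similarity = cosine(c.embedding, likedVec);
  const penalty = Math.max(0, cosine(c.embedding, dislikedVec));
  const freshness = Math.exp(-c.ageHours / 72); // decays over a few days
  return 0.6 * similarity - 0.3 * penalty + 0.2 * c.feedAffinity + 0.2 * freshness;
}
```

Candidates are then sorted by score, so a fresh entry close to your liked history outranks an old one that resembles dismissed content.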

Architecture

library/
├── src/
│   ├── app/
│   │   ├── page.tsx                    # Auth-gated reader shell
│   │   ├── article/[id]/              # Full reading view
│   │   ├── login/                     # Email/password auth
│   │   └── api/
│   │       ├── feeds/route.ts         # Feed creation + listing
│   │       ├── feed/sync/route.ts     # Sync feed items and queue work
│   │       ├── entries/route.ts       # Entry list for the reader
│   │       ├── recommendations/route.ts # Personalized ranking
│   │       ├── search/route.ts        # Semantic + title search
│   │       ├── video/process/route.ts # Video ingestion trigger
│   │       └── engagement/route.ts    # Reading session tracking
│   ├── components/
│   │   ├── home-client.tsx            # Reader UI, filters, swipe actions
│   │   ├── use-entry-engagement.ts    # Scroll/time tracking
│   │   └── VideoPlayer.tsx            # HLS playback
│   └── lib/
│       ├── rss.ts                     # Feed parsing and reading time
│       ├── crawl.ts                   # Article fetch + Readability extraction
│       ├── embeddings.ts              # Chunking, embeddings, semantic search
│       ├── recommendations.ts         # Personalized scoring model
│       ├── video.ts                   # yt-dlp + ffmpeg + R2 pipeline
│       ├── queue.ts                   # BullMQ jobs for crawl/video work
│       └── site-sanitization.ts       # Hostname-specific cleanup rules
├── prisma/
│   ├── schema.prisma                  # Users, feeds, entries, videos, embeddings
│   └── migrations/                    # Postgres schema history
├── scripts/
│   ├── create-user.js                 # Create/update a user and subscribe feeds
│   ├── backfill-site-rules.ts         # Learn cleanup rules after the fact
│   └── resegment-videos.ts            # Rebuild HLS output when needed
├── docker-compose.yml                 # App, Postgres, Redis, RSSHub
└── README.md

Tech stack

Layer             What                              Why
Framework         Next.js 14 (App Router)           Server-rendered reader UI and API routes in one app
Language          TypeScript 5                      Shared types across UI, jobs, and API handlers
Database          Postgres + pgvector + Prisma      Structured feed data plus vector search
Queue             Redis + BullMQ                    Background crawl and video processing
Extraction        Mozilla Readability + linkedom    Full-article parsing from noisy pages
Search / ranking  OpenAI embeddings                 Semantic indexing and recommendation signals
Video pipeline    yt-dlp + ffmpeg + Cloudflare R2   Turn YouTube entries into HLS playback with subtitles
UI                Tailwind CSS + SWR                Fast reader interactions and live refreshes

Database model

users                  — authenticated readers and their preference vector
master_feed            — canonical feed catalog
subscribed_feed        — user-to-feed subscriptions
entries                — synced article, tweet, and video entries
entry_embeddings       — chunked vector index for semantic search
reader_sessions        — per-device reader identity
entry_engagements      — open time, scroll depth, bookmark/dismiss signals
videos                 — YouTube metadata, HLS path, subtitles, playback state
site_sanitization_rules — hostname-specific cleanup rules learned over time
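One plausible way entry_engagements could feed the users table's preference vector is an exponential moving average over the embeddings of positively engaged entries. The decay factor here is an assumption for illustration.

```typescript
// Sketch of folding a newly finished/bookmarked entry into the user's
// stored preference vector via an exponential moving average.
// The decay factor is an illustrative assumption.
export function updatePreferenceVector(
  current: number[] | null,
  entryEmbedding: number[],
  decay = 0.9,
): number[] {
  if (!current) return [...entryEmbedding]; // first signal seeds the vector
  return current.map((v, i) => decay * v + (1 - decay) * entryEmbedding[i]);
}
```

Negative signals (dismissals) could maintain a second, disliked vector the same way, which the ranking step then subtracts from.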

Getting started

# Clone
git clone git@github.com:Cesarioo/library.git
cd library

# Install dependencies
npm install

# Configure local secrets
cp .env.example .env

# Start Postgres, Redis, and RSSHub
docker compose up -d db redis rsshub

# Apply schema and generate Prisma client
npx prisma migrate deploy
npx prisma generate

# Create a reader account
node scripts/create-user.js you@example.com your-password

# Run the app
npm run dev

Open http://localhost:3000.

To run the app in its Docker container instead of a local next dev server, use docker compose up --build library.

Environment variables

# Core services
DATABASE_URL=postgresql://read:read@localhost:5433/read
REDIS_URL=redis://localhost:6379
AUTH_SECRET=
OPENAI_API_KEY=

# Optional: bind crawler requests to a specific IPv4 address
CRAWL_IP=
SITE_RULE_MODEL=gpt-5.4

# Cloudflare R2 (video HLS + subtitles)
R2_ENDPOINT=
R2_ACCESS_KEY_ID=
R2_SECRET_ACCESS_KEY=
R2_BUCKET=library

# Optional: used by scripts/resegment-videos.ts
MEDIA_BASE_URL=https://media.library.oscarmairey.com
VERSION_LABEL=

# Optional: only needed for authenticated Twitter sources in RSSHub
RSSHUB_TWITTER_AUTH_TOKEN=
RSSHUB_TWITTER_CT0=

Notes

  • cookies.txt and any cookies-*.txt files are intentionally gitignored. They are optional private cookie jars for sites or videos that require authenticated fetching.
  • The Docker image now boots without cookie files present, but authenticated YouTube downloads may still need you to provide your own cookies locally.
  • docker-compose.yml is safe to publish: secrets come from .env, not from committed values.

Built by Oscar Mairey
