
Library

Personal reading and video library with RSS sync, clean extraction, embeddings, and recommendations.



The idea

Most reading apps are good at collecting links and bad at helping you actually read. Feeds get noisy, article previews are incomplete, recommendations are generic, and video subscriptions live in a separate stack from everything else.

Library turns RSS feeds, longform articles, and YouTube channels into one personal reading system. It syncs feeds, crawls full article content when RSS is thin, learns site-specific cleanup rules, generates embeddings for both articles and subtitles, and ranks your next reads based on what you actually spend time with.

It is not a public content platform. It is a private, opinionated reader for one account or a small set of authenticated users who want a cleaner queue and better retrieval than a normal RSS app.

How it works

Adding feeds

You can add a normal RSS feed, or hand the app a YouTube handle or channel URL and let it resolve the feed automatically. New entries are stored immediately, then split into follow-up jobs:

  • Article feeds get queued for full-page crawling when the RSS body looks too thin
  • Video feeds get queued for YouTube download, subtitle extraction, HLS packaging, and upload to R2
  • Tweet feeds skip crawling because the feed payload is already the content
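The routing above can be sketched as a small decision function. This is an illustrative sketch, not the app's actual code: the type names, job names, and the "thin body" threshold are all assumptions.

```typescript
// Sketch of the follow-up-job routing described above.
// Types, job names, and the thin-body threshold are assumptions.
type FeedKind = "article" | "video" | "tweet";

interface NewEntry {
  feedKind: FeedKind;
  rssBody: string; // HTML/text body as delivered in the RSS item
}

type FollowUpJob = "crawl-article" | "process-video" | null;

const THIN_BODY_CHARS = 500; // below this, treat the RSS body as truncated

export function planFollowUpJob(entry: NewEntry): FollowUpJob {
  switch (entry.feedKind) {
    case "video":
      // yt-dlp download, subtitle extraction, HLS packaging, R2 upload
      return "process-video";
    case "tweet":
      // the feed payload already is the content
      return null;
    case "article":
      // only crawl the full page when the RSS body looks too thin
      return entry.rssBody.trim().length < THIN_BODY_CHARS
        ? "crawl-article"
        : null;
  }
}
```

In the real app the returned job would be enqueued with BullMQ rather than acted on inline.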

Subscriptions are stored per user in the database, and new accounts can be backfilled across the existing feed catalog with the included user bootstrap script (scripts/create-user.js).

Cleaning and understanding content

RSS content is often messy, truncated, or wrapped in subscription chrome. Library uses Mozilla Readability for the first pass, then stores hostname-specific sanitization rules to strip repeated junk more aggressively on future syncs.
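Applying those stored rules after the Readability pass might look like the sketch below. The rule shape (regex sources keyed by hostname) is a guess at what site_sanitization_rules could hold, not the actual schema.

```typescript
// Sketch of applying learned, hostname-specific cleanup rules to
// extracted HTML. The SiteRule shape is an assumption about what
// site_sanitization_rules might store.
interface SiteRule {
  hostname: string;
  stripPatterns: string[]; // regex sources matching repeated junk blocks
}

export function applySiteRules(
  html: string,
  hostname: string,
  rules: SiteRule[],
): string {
  const matching = rules.filter((r) => r.hostname === hostname);
  let cleaned = html;
  for (const rule of matching) {
    for (const source of rule.stripPatterns) {
      // "gs" flags: strip every occurrence, even across newlines
      cleaned = cleaned.replace(new RegExp(source, "gs"), "");
    }
  }
  return cleaned.trim();
}
```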

For articles, the cleaned HTML is chunked into overlapping text windows and embedded with OpenAI. For videos, subtitles are converted to plain text, chunked the same way, and indexed into pgvector. That gives the app one searchable semantic layer across both written and spoken content.
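The overlapping-window split is simple to sketch; the window and overlap sizes here are illustrative defaults, not the app's actual configuration.

```typescript
// Sketch of splitting cleaned text into overlapping windows before
// embedding. Sizes are illustrative, measured in characters.
export function chunkText(
  text: string,
  windowSize = 1000, // characters per chunk
  overlap = 200,     // characters shared with the previous chunk
): string[] {
  if (overlap >= windowSize) throw new Error("overlap must be < windowSize");
  const chunks: string[] = [];
  const step = windowSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + windowSize));
    if (start + windowSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk would then be embedded and stored in pgvector alongside its entry id, so article text and subtitle text land in the same index.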

Ranking what to read next

The recommendation system is engagement-driven. It builds preference vectors from the entries you open, finish, bookmark, or dismiss, then ranks candidates using:

  • embedding similarity to your positive history
  • penalty from explicitly disliked content
  • feed-level affinity and dislike signals
  • freshness so new items still surface

The result is closer to a personal knowledge queue than a chronological firehose.
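A minimal version of that blend might look like the following. The weights, the freshness decay constant, and the signal names are made-up assumptions; the real scoring model lives in src/lib/recommendations.ts.

```typescript
// Sketch of blending the ranking signals listed above.
// Weights and the decay constant are illustrative assumptions.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / (Math.sqrt(na) * Math.sqrt(nb)) : 0;
}

interface Candidate {
  embedding: number[];
  feedAffinity: number; // -1..1, learned per feed
  ageHours: number;
}

export function scoreCandidate(
  c: Candidate,
  likedVec: number[],
  dislikedVec: number[],
): number {
  const similarity = cosine(c.embedding, likedVec);
  const penalty = Math.max(0, cosine(c.embedding, dislikedVec));
  const freshness = Math.exp(-c.ageHours / 72); // decays over a few days
  return 0.6 * similarity - 0.3 * penalty + 0.2 * c.feedAffinity + 0.2 * freshness;
}
```

Candidates are then sorted by score, so a fresh entry close to your liked history outranks an old one that resembles dismissed content.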

Architecture

library/
├── src/
│   ├── app/
│   │   ├── page.tsx                    # Auth-gated reader shell
│   │   ├── article/[id]/              # Full reading view
│   │   ├── login/                     # Email/password auth
│   │   └── api/
│   │       ├── feeds/route.ts         # Feed creation + listing
│   │       ├── feed/sync/route.ts     # Sync feed items and queue work
│   │       ├── entries/route.ts       # Entry list for the reader
│   │       ├── recommendations/route.ts # Personalized ranking
│   │       ├── search/route.ts        # Semantic + title search
│   │       ├── video/process/route.ts # Video ingestion trigger
│   │       └── engagement/route.ts    # Reading session tracking
│   ├── components/
│   │   ├── home-client.tsx            # Reader UI, filters, swipe actions
│   │   ├── use-entry-engagement.ts    # Scroll/time tracking
│   │   └── VideoPlayer.tsx            # HLS playback
│   └── lib/
│       ├── rss.ts                     # Feed parsing and reading time
│       ├── crawl.ts                   # Article fetch + Readability extraction
│       ├── embeddings.ts              # Chunking, embeddings, semantic search
│       ├── recommendations.ts         # Personalized scoring model
│       ├── video.ts                   # yt-dlp + ffmpeg + R2 pipeline
│       ├── queue.ts                   # BullMQ jobs for crawl/video work
│       └── site-sanitization.ts       # Hostname-specific cleanup rules
├── prisma/
│   ├── schema.prisma                  # Users, feeds, entries, videos, embeddings
│   └── migrations/                    # Postgres schema history
├── scripts/
│   ├── create-user.js                 # Create/update a user and subscribe feeds
│   ├── backfill-site-rules.ts         # Learn cleanup rules after the fact
│   └── resegment-videos.ts            # Rebuild HLS output when needed
├── docker-compose.yml                 # App, Postgres, Redis, RSSHub
└── README.md

Tech stack

Layer             What                              Why
Framework         Next.js 14 (App Router)           Server-rendered reader UI and API routes in one app
Language          TypeScript 5                      Shared types across UI, jobs, and API handlers
Database          Postgres + pgvector + Prisma      Structured feed data plus vector search
Queue             Redis + BullMQ                    Background crawl and video processing
Extraction        Mozilla Readability + linkedom    Full-article parsing from noisy pages
Search / ranking  OpenAI embeddings                 Semantic indexing and recommendation signals
Video pipeline    yt-dlp + ffmpeg + Cloudflare R2   Turn YouTube entries into HLS playback with subtitles
UI                Tailwind CSS + SWR                Fast reader interactions and live refreshes

Database model

users                  — authenticated readers and their preference vector
master_feed            — canonical feed catalog
subscribed_feed        — user-to-feed subscriptions
entries                — synced article, tweet, and video entries
entry_embeddings       — chunked vector index for semantic search
reader_sessions        — per-device reader identity
entry_engagements      — open time, scroll depth, bookmark/dismiss signals
videos                 — YouTube metadata, HLS path, subtitles, playback state
site_sanitization_rules — hostname-specific cleanup rules learned over time
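One plausible way entry_engagements could feed the users table's preference vector is an exponential moving average over the embeddings of positively engaged entries. The decay factor here is an assumption for illustration.

```typescript
// Sketch of folding a newly finished/bookmarked entry into the user's
// stored preference vector via an exponential moving average.
// The decay factor is an illustrative assumption.
export function updatePreferenceVector(
  current: number[] | null,
  entryEmbedding: number[],
  decay = 0.9,
): number[] {
  if (!current) return [...entryEmbedding]; // first signal seeds the vector
  return current.map((v, i) => decay * v + (1 - decay) * entryEmbedding[i]);
}
```

Negative signals (dismissals) could maintain a second, disliked vector the same way, which the ranking step then subtracts from.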

Getting started

# Clone
git clone git@github.com:Cesarioo/library.git
cd library

# Install dependencies
npm install

# Configure local secrets
cp .env.example .env

# Start Postgres, Redis, and RSSHub
docker compose up -d db redis rsshub

# Apply schema and generate Prisma client
npx prisma migrate deploy
npx prisma generate

# Create a reader account
node scripts/create-user.js you@example.com your-password

# Run the app
npm run dev

Open http://localhost:3000.

To run the app in its Docker container instead of a local next dev server, use docker compose up --build library.

Environment variables

# Core services
DATABASE_URL=postgresql://read:read@localhost:5433/read
REDIS_URL=redis://localhost:6379
AUTH_SECRET=
OPENAI_API_KEY=

# Optional: bind crawler requests to a specific IPv4 address
CRAWL_IP=
SITE_RULE_MODEL=gpt-5.4

# Cloudflare R2 (video HLS + subtitles)
R2_ENDPOINT=
R2_ACCESS_KEY_ID=
R2_SECRET_ACCESS_KEY=
R2_BUCKET=library

# Optional: used by scripts/resegment-videos.ts
MEDIA_BASE_URL=https://media.library.oscarmairey.com
VERSION_LABEL=

# Optional: only needed for authenticated Twitter sources in RSSHub
RSSHUB_TWITTER_AUTH_TOKEN=
RSSHUB_TWITTER_CT0=

Notes

  • cookies.txt and any cookies-*.txt files are intentionally gitignored. They are optional private cookie jars for sites or videos that require authenticated fetching.
  • The Docker image now boots without cookie files present, but authenticated YouTube downloads may still need you to provide your own cookies locally.
  • docker-compose.yml is safe to publish: secrets come from .env, not from committed values.

Built by Oscar Mairey
