Draft
21 commits
af75265
feat: implement Open Library/Internet Archive authentication with S3 …
ronibhakta1 Apr 29, 2026
cb5bd66
feat: implement ol-logout command and rename ol-configure to ol-login
ronibhakta1 Apr 29, 2026
613270f
refactor: add lending requirement checks, improve Open Library error …
ronibhakta1 Apr 29, 2026
ee30484
refactor: implement server-side lending configuration checks and robu…
ronibhakta1 Apr 29, 2026
9bf98cb
feat: add item deletion support, improve email validation, and implem…
ronibhakta1 May 1, 2026
756130f
test: add lending configuration support and integrate mock_lending fi…
ronibhakta1 May 1, 2026
0c3019c
feat(catalog): add catalog package with types, enums, and exceptions
ronibhakta1 May 3, 2026
360292d
test(catalog): extend enum string-subclass coverage; fix trailing new…
ronibhakta1 May 3, 2026
884cfcb
fix(catalog): use timezone-aware datetime, fix BigInt FK variants, ad…
ronibhakta1 May 3, 2026
abb6762
feat(catalog): add migration for import_jobs and import_items tables
ronibhakta1 May 3, 2026
43a869b
chore: add rapidfuzz for fuzzy title/author matching
ronibhakta1 May 3, 2026
79d805d
feat(catalog): add OLResolver protocol and APIResolver with full look…
ronibhakta1 May 3, 2026
26d2e50
fix(catalog): guard 409 None return, regex _parse_olid, remove dead code
ronibhakta1 May 3, 2026
b71a582
test(catalog): add full resolver test suite — cascade, Google Books, …
ronibhakta1 May 3, 2026
fbb4e75
fix(catalog): use values_callable on SAEnum to send .value not member…
ronibhakta1 May 3, 2026
509dd56
feat(catalog): add Pydantic schemas for catalog API
ronibhakta1 May 4, 2026
f581645
Merge remote-tracking branch 'origin/feature/IA-Xauth' into feature/m…
ronibhakta1 May 4, 2026
c7619c6
feat(catalog): add catalog import pipeline API layer
ronibhakta1 May 4, 2026
4a78b8d
fix(catalog): address code review findings
ronibhakta1 May 4, 2026
5b2f3d8
feat: introduce catalog foundation and metadata reconciliation tool d…
ronibhakta1 May 5, 2026
4313f4a
feat: implement catalog API router foundation with admin auth and doc…
ronibhakta1 May 5, 2026
1 change: 1 addition & 0 deletions .gitignore
@@ -178,3 +178,4 @@ cython_debug/
pyopds2_lenny
.lenny-version
backups/
.worktrees/
44 changes: 43 additions & 1 deletion Makefile
@@ -109,6 +109,18 @@ url:
update:
	@bash docker/utils/update.sh

# Log in to archive.org/openlibrary.org and store IA S3 keys in .env.
# Idempotent — safe to re-run. Use to log in, re-login with a different account,
# or recover from a failed lending setup.
.PHONY: ol-login
ol-login: ifup
	@bash docker/utils/ol_configure.sh

# Log out of archive.org — clears IA S3 keys from .env and disables lending.
.PHONY: ol-logout
ol-logout: ifup
	@bash docker/utils/ol_logout.sh

# Run environment diagnostics
.PHONY: doctor
doctor:
@@ -161,4 +173,34 @@ squash-migrations: ifup
	@read _
	@rm -f alembic/versions/*.py
	@docker exec $(container) alembic revision --autogenerate -m "squashed baseline"
	@echo "New baseline created. Existing databases must run: make migrate-stamp"

# Catalog Worker

.PHONY: catalog-worker-start
catalog-worker-start:
	@docker compose up -d catalog_worker

.PHONY: catalog-worker-stop
catalog-worker-stop:
	@docker compose stop catalog_worker

.PHONY: catalog-worker-logs
catalog-worker-logs:
	@docker compose logs -f catalog_worker

# Run catalog migrations. Same effect as `make migrate`: `alembic upgrade head`
# applies every pending migration, not only the catalog ones.
.PHONY: catalog-migrate
catalog-migrate: ifup
	@docker exec $(container) alembic upgrade head

# Show catalog worker container status
.PHONY: catalog-status
catalog-status:
	@docker compose ps catalog_worker

# Scale the catalog worker to N replicas (default: 1).
# Usage: make catalog-worker-scale replicas=3
.PHONY: catalog-worker-scale
catalog-worker-scale:
	@docker compose up -d --scale catalog_worker=$(or $(replicas),1) --no-recreate catalog_worker
79 changes: 79 additions & 0 deletions README.md
@@ -38,7 +38,9 @@
- [Endpoints](#endpoints)
- [Getting Started](#getting-started)
- [Development Setup](#development-setup)
- [Open Library / Internet Archive Auth](#open-library--internet-archive-auth) — enable lending via Admin UI or CLI
- [Updating](#updating)
- [Catalog Import Worker Configuration](#catalog-import-worker-configuration)
- [Database Migrations](#database-migrations)
- [Health Check](#health-check)
- [Testing Readium Server](#testing-readium-server)
@@ -246,6 +248,38 @@ curl "http://localhost:15080/$BOOK/manifest.json"

---

## Open Library / Internet Archive Auth

Lenny must be connected to an [Internet Archive](https://archive.org) account to enable lending. You can do this in two ways: through the **Admin UI** or the **CLI**.

### Option 1 — Admin UI (recommended)

Open the admin dashboard at `/admin`, sign in, and navigate to **Settings → Open Library**. Enter your Internet Archive email and password and click **Log in**. Lending is enabled immediately — no restart required.

To disconnect, click **Log out** on the same page. Lending is disabled immediately.

### Option 2 — CLI

```sh
# Log in (interactive — prompts for email and password)
make ol-login

# Log out — clears IA S3 keys from .env and disables lending
make ol-logout
```

**Scripted / non-interactive login** (e.g. CI):
```sh
OL_EMAIL=you@example.com LENNY_NONINTERACTIVE=1 make ol-login
```
> `LENNY_NONINTERACTIVE=1` suppresses all "are you sure?" confirmation prompts so the command can run unattended in scripts or CI pipelines.

> **Security:** avoid passing `OL_PASSWORD` as an environment variable in scripts — it will appear in shell history and `ps` output. Instead, let the interactive prompt handle the password, or pipe it via stdin using a secrets manager.

After a CLI login, lending is enabled automatically and the API container is restarted so the credentials take effect. After a CLI logout, lending is disabled and the container is restarted immediately.
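
A scripted login along the lines of the security note might look like the sketch below. It assumes that `make ol-login` reads the password from standard input when `LENNY_NONINTERACTIVE=1` is set, which the note implies but this README does not spell out. The helper names `build_login_env` and `run_login` are hypothetical, and the password would come from whatever secrets manager you use.

```python
import os
import subprocess


def build_login_env(email: str) -> dict[str, str]:
    """Return the environment for an unattended `make ol-login` run."""
    env = dict(os.environ)
    env["OL_EMAIL"] = email
    env["LENNY_NONINTERACTIVE"] = "1"
    # Deliberately keep the password out of the environment.
    env.pop("OL_PASSWORD", None)
    return env


def run_login(email: str, password: str) -> int:
    """Run `make ol-login`, feeding the password on stdin (assumed behaviour)."""
    result = subprocess.run(
        ["make", "ol-login"],
        env=build_login_env(email),
        input=password + "\n",
        text=True,
        check=False,
    )
    return result.returncode
```

Passing the password through `input=` keeps it out of the process arguments and the environment, which is the point of the security note above.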

---

## Updating

To update an existing Lenny installation to the latest version:
@@ -281,6 +315,51 @@ For details on the update engine architecture, see [docs/plans/update-engine.md]

---

## Catalog Import Worker Configuration

The catalog import worker processes book imports in the background. Four settings control its capacity; all are set in `.env` and take effect after `make redeploy`.

| Variable | Default | Controls |
|---|---|---|
| `CATALOG_CONCURRENCY` | `10` | Thread-pool size **per worker container**. Each thread handles one item at a time (API lookup → S3 upload → DB write). |
| `CATALOG_WORKER_REPLICAS` | `1` | Number of worker **containers** to run in parallel. Replicas use `SELECT FOR UPDATE SKIP LOCKED` so they never process the same item twice. |
| `CATALOG_WORKER_CPU_LIMIT` | `2.0` | CPU cap per worker container (Docker). |
| `CATALOG_WORKER_MEM_LIMIT` | `1G` | Memory cap per worker container (Docker). |

> `LENNY_WORKERS` (default `3`) controls the API server's uvicorn process count — unrelated to catalog imports.
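
The `SELECT FOR UPDATE SKIP LOCKED` behaviour mentioned in the table can be sketched as follows. This is a minimal illustration, not the worker's actual query: the `claim_items` helper, the `%(name)s` parameter style (as used by psycopg2), and the ordering by `id` are assumptions. The `import_items` table and its `pipeline_stage` column come from the catalog migration.

```python
# Rows already locked by another worker's transaction are skipped
# instead of being waited on, so two replicas never claim the same item.
CLAIM_SQL = """
SELECT id
FROM import_items
WHERE pipeline_stage = %(stage)s
ORDER BY id
LIMIT %(limit)s
FOR UPDATE SKIP LOCKED
"""


def claim_items(cursor, stage: str = "pending", limit: int = 10) -> list[int]:
    """Lock and return up to `limit` item ids that no other worker holds.

    `cursor` is any DB-API cursor opened inside a transaction.
    """
    cursor.execute(CLAIM_SQL, {"stage": stage, "limit": limit})
    return [row[0] for row in cursor.fetchall()]
```

The row locks last until the surrounding transaction commits or rolls back, so a worker would typically update `pipeline_stage` in the same transaction before releasing the rows.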

### When to tune

- **Small library (< 5 000 books):** defaults are fine.
- **Medium library (5 000 – 50 000 books):** raise `CATALOG_CONCURRENCY` to `20` and/or set `CATALOG_WORKER_REPLICAS=2`.
- **Large library (> 50 000 books):** run multiple replicas (`CATALOG_WORKER_REPLICAS=4`) with a moderate concurrency (`CATALOG_CONCURRENCY=10`) to spread load across containers.
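
As a rough guide to how the two main settings combine, the upper bound on items processed at once is the thread-pool size multiplied by the number of containers. The helper below is only an illustration of that arithmetic; `max_in_flight` is a hypothetical name, and real throughput also depends on API latency, S3 bandwidth, and the CPU and memory limits.

```python
def max_in_flight(concurrency: int = 10, replicas: int = 1) -> int:
    """Upper bound on items processed at once across all worker containers."""
    if concurrency < 1 or replicas < 1:
        raise ValueError("concurrency and replicas must both be at least 1")
    return concurrency * replicas


# The three sizing suggestions above, expressed as (concurrency, replicas).
for label, concurrency, replicas in [
    ("defaults", 10, 1),
    ("medium", 20, 2),
    ("large", 10, 4),
]:
    print(f"{label}: up to {max_in_flight(concurrency, replicas)} items in flight")
```

Note that the medium example (both settings raised) and the large example reach the same ceiling of 40. The large setup spreads that work over four containers, each with its own CPU and memory cap.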

### How to apply

```sh
# In .env
CATALOG_CONCURRENCY=20
CATALOG_WORKER_REPLICAS=2
CATALOG_WORKER_CPU_LIMIT=2.0
CATALOG_WORKER_MEM_LIMIT=2G

make redeploy
```

Or scale replicas without a full redeploy:

```sh
make catalog-worker-scale replicas=3
```

Check running workers:

```sh
make catalog-status
```

---

## Database Migrations

Lenny uses [Alembic](https://alembic.sqlalchemy.org/) for database migrations. Migrations run automatically on container startup — no manual steps needed during normal use.
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-0.2.1
+0.2.2
1 change: 1 addition & 0 deletions alembic/env.py
@@ -16,6 +16,7 @@
# Import models so Base.metadata has all table definitions registered
import lenny.core.models # noqa: F401
import lenny.core.cache # noqa: F401
import lenny.catalog.models # noqa: F401

# Alembic Config object — access to alembic.ini values
config = context.config
125 changes: 125 additions & 0 deletions alembic/versions/002_add_catalog_tables.py
@@ -0,0 +1,125 @@
"""Add catalog import_jobs and import_items tables.

Revision ID: 002_catalog
Revises: c6b7da6debc2
Create Date: 2026-05-03
"""
import re
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

revision = "002_catalog"
down_revision = "c6b7da6debc2"
branch_labels = None
depends_on = None

_SAFE_IDENT = re.compile(r'^[a-z][a-z0-9_]*$')


def _create_enum(name: str, *values: str) -> None:
    if not _SAFE_IDENT.match(name):
        raise ValueError(f"Unsafe enum type name: {name!r}")
    quoted = ", ".join(f"'{v}'" for v in values)
    op.execute(sa.text(f"CREATE TYPE {name} AS ENUM ({quoted})"))


def upgrade() -> None:
    # --- Enums (raw SQL: avoids SQLAlchemy auto-create unreliability) ---
    _create_enum("jobstatus",
                 "pending", "running", "awaiting_review", "paused",
                 "completed", "cancelled", "error")
    _create_enum("jobmode", "metadata_sync", "full_import")
    _create_enum("persona", "publisher", "library", "author")
    _create_enum("resolvertype", "api", "dump")
    _create_enum("inputmethod",
                 "epub_folder", "epub_sidecar", "csv", "marc",
                 "opds", "onix", "vendor_api")
    _create_enum("encryptionpolicy",
                 "all_encrypted", "all_open", "mixed_auto", "mixed_manual")
    _create_enum("pipelinestage",
                 "pending", "extracting", "extracted", "resolving",
                 "resolved", "ol_writing", "ol_done", "uploading",
                 "done", "error", "needs_review", "skipped")
    _create_enum("olstatus",
                 "OL_MATCH_CLEAN", "OL_MATCH_FUZZY", "OL_WORK_ONLY",
                 "OL_NOT_FOUND", "INSUFFICIENT_METADATA")
    _create_enum("actiontaken",
                 "LINK_ONLY", "CREATE_FULL", "SKIPPED_OL", "NEEDS_REVIEW")

    # --- import_jobs ---
    op.create_table(
        "import_jobs",
        sa.Column("id", sa.BigInteger, primary_key=True, autoincrement=True),
        sa.Column("status", postgresql.ENUM(name="jobstatus", create_type=False), nullable=False, server_default="pending"),
        sa.Column("mode", postgresql.ENUM(name="jobmode", create_type=False), nullable=False),
        sa.Column("persona", postgresql.ENUM(name="persona", create_type=False), nullable=False),
        sa.Column("resolver_type", postgresql.ENUM(name="resolvertype", create_type=False), nullable=False, server_default="api"),
        sa.Column("input_method", postgresql.ENUM(name="inputmethod", create_type=False), nullable=False),
        sa.Column("encryption_policy", postgresql.ENUM(name="encryptionpolicy", create_type=False), nullable=False),
        sa.Column("dry_run", sa.Boolean, nullable=False, server_default=sa.text("false")),
        sa.Column("gate_a_enabled", sa.Boolean, nullable=False, server_default=sa.text("false")),
        sa.Column("gate_b_enabled", sa.Boolean, nullable=False, server_default=sa.text("false")),
        sa.Column("skip_ol", sa.Boolean, nullable=False, server_default=sa.text("false")),
        sa.Column("total", sa.Integer, nullable=False, server_default="0"),
        sa.Column("processed", sa.Integer, nullable=False, server_default="0"),
        sa.Column("linked", sa.Integer, nullable=False, server_default="0"),
        sa.Column("created_ol", sa.Integer, nullable=False, server_default="0"),
        sa.Column("needs_review", sa.Integer, nullable=False, server_default="0"),
        sa.Column("errors", sa.Integer, nullable=False, server_default="0"),
        sa.Column("skipped", sa.Integer, nullable=False, server_default="0"),
        sa.Column("created_at", sa.DateTime(timezone=True), server_default=sa.text("now()")),
        sa.Column("started_at", sa.DateTime(timezone=True), nullable=True),
        sa.Column("completed_at", sa.DateTime(timezone=True), nullable=True),
    )

    # --- import_items ---
    op.create_table(
        "import_items",
        sa.Column("id", sa.BigInteger, primary_key=True, autoincrement=True),
        sa.Column("job_id", sa.BigInteger, sa.ForeignKey("import_jobs.id"), nullable=False),
        sa.Column("pipeline_stage", postgresql.ENUM(name="pipelinestage", create_type=False), nullable=False, server_default="pending"),
        sa.Column("stage_updated_at", sa.DateTime(timezone=True), server_default=sa.text("now()")),
        sa.Column("retry_count", sa.Integer, nullable=False, server_default="0"),
        sa.Column("source_path", sa.String, nullable=True),
        sa.Column("sha256", sa.String(64), nullable=True),
        # Extracted metadata
        sa.Column("extracted_title", sa.String, nullable=True),
        sa.Column("extracted_author", sa.String, nullable=True),
        sa.Column("extracted_isbn", sa.String, nullable=True),
        sa.Column("extracted_metadata", postgresql.JSONB, nullable=True),
        # OL resolution
        sa.Column("ol_status", postgresql.ENUM(name="olstatus", create_type=False), nullable=True),
        sa.Column("confidence", sa.Float, nullable=True),
        sa.Column("olid", sa.BigInteger, nullable=True),
        sa.Column("action_taken", postgresql.ENUM(name="actiontaken", create_type=False), nullable=True),
        # Config
        sa.Column("encrypted", sa.Boolean, nullable=True),
        sa.Column("skip_ol", sa.Boolean, nullable=False, server_default=sa.text("false")),
        sa.Column("review_candidates", postgresql.JSONB, nullable=True),
        # Results
        sa.Column("minio_key", sa.String, nullable=True),
        sa.Column("item_id", sa.BigInteger, sa.ForeignKey("items.id"), nullable=True),
        sa.Column("error_message", sa.String, nullable=True),
        sa.Column("action_log", postgresql.JSONB, nullable=False, server_default="[]"),
        sa.Column("created_at", sa.DateTime(timezone=True), server_default=sa.text("now()")),
        sa.Column("updated_at", sa.DateTime(timezone=True), server_default=sa.text("now()")),
    )

    # Indexes: critical for worker performance
    op.create_index("idx_import_items_job_stage", "import_items", ["job_id", "pipeline_stage"])
    op.create_index("idx_import_items_sha256", "import_items", ["sha256"])
    op.create_index("idx_import_items_stage_updated", "import_items", ["pipeline_stage", "stage_updated_at"])
    op.create_index("idx_import_items_olid", "import_items", ["olid"])


def downgrade() -> None:
    op.drop_index("idx_import_items_olid", table_name="import_items")
    op.drop_index("idx_import_items_stage_updated", table_name="import_items")
    op.drop_index("idx_import_items_sha256", table_name="import_items")
    op.drop_index("idx_import_items_job_stage", table_name="import_items")
    op.drop_table("import_items")
    op.drop_table("import_jobs")
    for name in ("actiontaken", "olstatus", "pipelinestage", "encryptionpolicy",
                 "inputmethod", "resolvertype", "persona", "jobmode", "jobstatus"):
        op.execute(sa.text(f"DROP TYPE IF EXISTS {name}"))
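
The identifier guard in `_create_enum` can be exercised on its own with a sketch like the one below. It reuses the migration's pattern; the function names `is_safe_ident` and `is_safe_ident_strict` are hypothetical. One detail worth knowing: with `re.match`, a trailing `$` also matches just before a final newline, so a name ending in `"\n"` passes the original check. That is harmless here because every enum name is a hard-coded literal, but `fullmatch` would be the stricter choice if names ever came from outside.

```python
import re

# Same pattern as the migration's _SAFE_IDENT.
SAFE_IDENT = re.compile(r"^[a-z][a-z0-9_]*$")


def is_safe_ident(name: str) -> bool:
    """Mirror the migration's check: a lowercase letter, then [a-z0-9_]*."""
    return SAFE_IDENT.match(name) is not None


def is_safe_ident_strict(name: str) -> bool:
    """Hypothetical stricter variant: fullmatch also rejects a trailing newline."""
    return SAFE_IDENT.fullmatch(name) is not None


for candidate in ["jobstatus", "JobStatus", "x; DROP TABLE items", "jobstatus\n"]:
    print(repr(candidate), is_safe_ident(candidate), is_safe_ident_strict(candidate))
```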
28 changes: 28 additions & 0 deletions compose.yaml
@@ -133,6 +133,33 @@ services:
    networks:
      - lenny_network

  catalog_worker:
    build:
      context: .
      dockerfile: docker/api/Dockerfile
    command: python -m lenny.catalog.worker
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
      s3:
        condition: service_healthy
    env_file: .env
    environment:
      - DB_HOST=db
      - S3_ENDPOINT=s3:9000
    volumes:
      - .:/app
      - catalog_dump:/data
    deploy:
      replicas: ${CATALOG_WORKER_REPLICAS:-1}
      resources:
        limits:
          cpus: "${CATALOG_WORKER_CPU_LIMIT:-2.0}"
          memory: ${CATALOG_WORKER_MEM_LIMIT:-1G}
    networks:
      - lenny_network

networks:
  lenny_network:
    driver: bridge
@@ -141,3 +168,4 @@ volumes:
  db_data:
  s3_data:
  readium_data:
  catalog_dump: