Feature/magic tool #182

Draft
ronibhakta1 wants to merge 21 commits into ArchiveLabs:main from ronibhakta1:feature/magic-tool

@ronibhakta1
Collaborator

This pull request introduces Open Library (Internet Archive) authentication and catalog import worker support to Lenny, along with associated documentation, configuration, and database changes. The most important updates are the ability to log in to archive.org for lending, a new catalog import worker container, and the new import_jobs and import_items tables to support catalog imports.

Open Library / Internet Archive Authentication:

  • Added CLI and Admin UI support for logging in to and out of archive.org/Open Library, storing IA S3 keys in .env, and enabling or disabling lending. The ol-login/ol-logout flow is idempotent, and a Bash script handles credentials and container restarts. [1] [2] [3]
  • Updated README.md with clear instructions for enabling/disabling lending via Admin UI or CLI, including security advice for scripted logins. [1] [2]
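
The idempotent key storage described above can be illustrated with a minimal Python sketch. This is not the PR's Bash script: the function name `upsert_env` and the variable names `IA_S3_ACCESS_KEY` / `IA_S3_SECRET_KEY` are hypothetical, chosen only to show how rewriting keys in place keeps repeated logins from appending duplicate lines.

```python
import os
from pathlib import Path


def upsert_env(path: Path, updates: dict[str, str]) -> None:
    """Set KEY=value lines in a .env file, replacing existing keys in place."""
    lines = path.read_text().splitlines() if path.exists() else []
    remaining = dict(updates)
    out = []
    for line in lines:
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        key = line.split("=", 1)[0].strip()
        if is_assignment and key in updates:
            if key in remaining:
                # First occurrence: overwrite with the new value.
                out.append(f"{key}={remaining.pop(key)}")
            # Later duplicates of the same key are dropped.
            continue
        out.append(line)
    # Keys that were not present yet are appended at the end.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    path.write_text("\n".join(out) + "\n")
    os.chmod(path, 0o600)  # restrict access, since the file holds secrets
```

Calling `upsert_env` twice with the same values leaves the file byte-for-byte unchanged, which is the property that makes a login command safe to re-run.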

Catalog Import Worker:

  • Added a new catalog_worker service to compose.yaml with resource limits, environment variables, and volume mounts for catalog imports. [1] [2]
  • Added Makefile targets for starting, stopping, logging, and checking the status of the catalog worker, as well as running catalog migrations.
  • Added catalog worker configuration variables to .env generation and documentation. [1] [2]

Database Schema:

  • Added an Alembic migration to create import_jobs and import_items tables, with supporting enums and indexes for catalog import tracking and worker performance.
  • Registered catalog models in Alembic for migration autogeneration.
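
To make the shape of the two tables concrete, here is a rough sketch using the standard-library `sqlite3` module instead of Alembic. Only the table names `import_jobs` and `import_items` come from this PR; every column, the `RUNNING` status, and the index name are assumptions for illustration, and the real migration defines its own enums and indexes.

```python
import sqlite3

# Hypothetical columns: the actual migration may differ substantially.
SCHEMA = """
CREATE TABLE import_jobs (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'PENDING'
        CHECK (status IN ('PENDING', 'RUNNING', 'AWAITING_REVIEW', 'COMPLETED'))
);
CREATE TABLE import_items (
    id INTEGER PRIMARY KEY,
    job_id INTEGER NOT NULL REFERENCES import_jobs(id),
    stage TEXT NOT NULL DEFAULT 'PENDING'
);
-- A composite index lets a worker find claimable items for a job quickly.
CREATE INDEX ix_import_items_job_stage ON import_items (job_id, stage);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO import_jobs DEFAULT VALUES")
conn.executemany(
    "INSERT INTO import_items (job_id, stage) VALUES (1, ?)",
    [("PENDING",), ("DONE",)],
)
pending_count = conn.execute(
    "SELECT COUNT(*) FROM import_items WHERE job_id = 1 AND stage = 'PENDING'"
).fetchone()[0]
```

One job row owns many item rows, and the worker's per-job, per-stage queries are the access pattern the index is meant to serve.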

Other:

  • Bumped version to 0.2.2.

These changes together enable secure, operator-friendly setup for lending and catalog imports, laying the groundwork for scalable, automated metadata ingestion and lending support.

- Add catalog package: types/enums/exceptions, BookMetadata, OLResult,
  FSM stage transitions, pipeline stages and actions
- Add APIResolver with full OL lookup cascade: ISBN → title/author search
  → Google Books fallback → CREATE_FULL; fuzzy scoring via rapidfuzz
- Add CatalogWorker with ThreadPoolExecutor, claim/dispatch loop,
  stale-item reset on startup, and graceful SIGTERM shutdown
- Add 22 FastAPI routes under /v1/api/catalog: job CRUD, lifecycle
  (pause/resume/cancel), SSE progress stream, review gates A/B/C,
  fuzzy resolution, manual search/link/create, OL auth status
- Add extractor and pipeline stages (extract → resolve → OL write →
  upload → done), gate guards, dry-run support, encryption policy
- Add Alembic migration 002 for import_jobs and import_items tables
- Register catalog router in lenny/app.py
- Add docker/compose.yaml catalog-worker service and Makefile targets
- Add test suites: 26 route tests, resolver cascade tests, extractor
  and pipeline unit tests, conftest with in-memory SQLite fixture
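
The claim/dispatch loop with graceful shutdown can be sketched roughly as follows. This is an illustrative stand-in, not the PR's `CatalogWorker`: the class `SketchWorker`, its in-memory queue (standing in for database claims), and the `exit_when_idle` flag are all hypothetical.

```python
import queue
import signal
import threading
from concurrent.futures import ThreadPoolExecutor


class SketchWorker:
    """Hypothetical worker: claims items from a queue and dispatches to a pool."""

    def __init__(self, items, handler, max_workers=4):
        self._queue = queue.Queue()
        for item in items:
            self._queue.put(item)
        self._handler = handler
        self._max_workers = max_workers
        self.stop_event = threading.Event()

    def request_stop(self, *_signal_args):
        # Safe to call from a signal handler: it only sets a flag.
        self.stop_event.set()

    def run(self, exit_when_idle=False):
        with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
            while not self.stop_event.is_set():
                try:
                    item = self._queue.get(timeout=0.1)  # "claim" step
                except queue.Empty:
                    if exit_when_idle:
                        break
                    continue
                pool.submit(self._handler, item)  # "dispatch" step
        # Leaving the with-block waits for in-flight handlers to finish.


def install_sigterm(worker):
    """Route SIGTERM to a graceful stop (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, worker.request_stop)
```

Because the signal handler only sets an event, the loop stops claiming new items while already-dispatched work is allowed to complete before the process exits.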

Critical:
- worker: _check_job_completion counted only PENDING items, causing jobs
  with NEEDS_REVIEW items to be falsely marked COMPLETED; now counts all
  non-terminal stages and transitions to AWAITING_REVIEW when all remaining
  items are gated for human review; commit wrapped in try/except rollback
- resolver: correct lookup() docstring — OLRateLimited can propagate
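
The corrected completion rule can be expressed as a small pure function. This is a sketch of the logic as described in the commit message, not the code in `_check_job_completion`; the `DONE` and `FAILED` stage names and the function `next_job_status` are assumptions.

```python
from enum import Enum


class Stage(str, Enum):
    PENDING = "PENDING"
    NEEDS_REVIEW = "NEEDS_REVIEW"
    DONE = "DONE"      # assumed terminal stage name
    FAILED = "FAILED"  # assumed terminal stage name


TERMINAL = {Stage.DONE, Stage.FAILED}
GATED = {Stage.NEEDS_REVIEW}


def next_job_status(stages):
    """Return the job status implied by its items, or None to leave it as-is."""
    remaining = [s for s in stages if s not in TERMINAL]
    if not remaining:
        return "COMPLETED"
    if all(s in GATED for s in remaining):
        return "AWAITING_REVIEW"
    return None  # work is still in progress
```

Counting only `PENDING` items would treat a job with `[DONE, NEEDS_REVIEW]` as finished; counting every non-terminal stage yields `AWAITING_REVIEW` for that job instead.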

Important:
- routes: add ManualCreateRequest schema; manual_create now uses a typed body
  instead of a bare dict, which had no request validation and no OpenAPI schema
- routes: manual_create returns 503 (server unconfigured) not 401 (caller
  unauthenticated) when OL credentials are missing
- routes: consolidate manual_create except clauses; unexpected exceptions
  are logged and returned as 500 instead of swallowed as 502
- routes: SSE generator acquires a fresh _scoped_session per poll and
  releases it immediately after reading, avoiding connection pool exhaustion
  from long-lived streams
- conftest: patch _scoped_session in routes module so SSE tests use the
  test session; create items table with SQLite-compatible DDL
- tests: manual_link is now a real DB integration test; duplicate OLID
  test added; Gate C test uses MIXED_MANUAL policy and asserts item count
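
The per-poll session pattern for the SSE stream might look like the sketch below. It uses a synchronous generator and injected callables for simplicity; `progress_events`, `session_factory`, and `read_progress` are hypothetical names, not the PR's `_scoped_session` usage.

```python
import json
import time


def progress_events(session_factory, read_progress, poll_interval=1.0, sleep=time.sleep):
    """Yield SSE-formatted progress events, holding a session only while reading."""
    while True:
        # Acquire a fresh session for this poll only.
        with session_factory() as session:
            snapshot = read_progress(session)
        # The session is released before the event is sent, so a slow or
        # long-lived client never pins a pooled connection.
        yield f"data: {json.dumps(snapshot)}\n\n"
        if snapshot.get("done"):
            return
        sleep(poll_interval)
```

The key design choice is that the `with` block closes before the `yield`: a generator suspended at `yield` can sit idle for as long as the client stays connected, and anything still open at that point stays open too.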

Minor:
- exceptions: remove dead OLAuthRequired and OLAuthError (session-cookie
  auth was removed in the IA auth migration)
- resolver: Google Books fallback also fires for ISBN-only records without
  a title
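
The lookup cascade, including the ISBN-only fallback, can be sketched as follows. Several things here are assumptions: the standard-library `difflib` stands in for rapidfuzz, the lookup callables are injected placeholders, the `MATCHED` outcome name and the 85-point threshold are invented, and treating Google Books as metadata enrichment before `CREATE_FULL` is one possible reading of the commit message.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity on a 0-100 scale (stand-in for rapidfuzz)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(record, by_isbn, by_title_author, google_books, threshold=85.0):
    """Return (outcome, data) after trying each lookup in order."""
    isbn, title = record.get("isbn"), record.get("title")

    # 1. An exact ISBN lookup is tried first.
    if isbn:
        hit = by_isbn(isbn)
        if hit:
            return ("MATCHED", hit)

    # 2. Title/author search, accepting the best candidate above the threshold.
    if title:
        candidates = by_title_author(title, record.get("author"))
        best = max(candidates, key=lambda c: similarity(title, c["title"]), default=None)
        if best and similarity(title, best["title"]) >= threshold:
            return ("MATCHED", best)

    # 3. Fallback also fires for ISBN-only records that have no title.
    if isbn or title:
        enriched = google_books(isbn=isbn, title=title)
        if enriched:
            return ("CREATE_FULL", {**record, **enriched})

    # 4. Nothing found: create a full record from what we have.
    return ("CREATE_FULL", record)
```

Gating step 3 on `isbn or title` rather than on `title` alone is what lets a record carrying only an ISBN still reach the fallback.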