Skip to content

docs: end-to-end book recommender walkthrough on goodbooks-10k#79

Merged
JohnJacob-coder merged 10 commits into
mainfrom
docs/book-recommender-demo
May 27, 2026
Merged

docs: end-to-end book recommender walkthrough on goodbooks-10k#79
JohnJacob-coder merged 10 commits into
mainfrom
docs/book-recommender-demo

Conversation

@Burton-David

Copy link
Copy Markdown
Owner

Closes #47. The final book-rec showcase artifact: a worked example on the docs site that mirrors the goodbooks benchmark pipeline at a smaller-than-benchmark scale.

Contents

Ordering caveat

Two of my open PRs supply APIs the demo uses:

Code blocks in the demo are plain markdown — mkdocs build --strict is clean even without those landing — but the demo doesn't function end-to-end until both merge. Safe to land in any order; the published site is fully accurate once main has all three.

Local checks

mkdocs build --strict --site-dir /tmp/check   # clean
ruff check src tests scripts                  # clean
mypy                                          # clean
pytest                                        # 110 passed

Closes #47

A worked example for the docs site that mirrors the goodbooks benchmark
pipeline at a smaller-than-benchmark scale: load + tag table, trim to
the dense subset, per-user holdout split, fit the hybrid book
recommender, recommend, evaluate, and explain.

Mentions the research-only license caveat front-and-center.

Closes #47

@JohnJacob-coder JohnJacob-coder left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good walkthrough — uses densest_subset for scale and holdout_per_user, and the structure (load → trim → hybrid → recommend → explain) is exactly right. Two blockers:

  1. Depends on unmerged code. build_hybrid_book_recommender exists only in #75 (still changes-requested), and recommend_with_reasons comes from #78 (merging). Neither is on main, so the walkthrough's core would ImportError today. Sequence #79 to land after #75 and #78.

  2. The benchmark claim overstates the hybrid. "HybridBook lands within a few percent of pure ItemKNN on every accuracy metric and matches it on catalog coverage" contradicts #75's actual numbers — there the hybrid was ~25-30% below ItemKNN on accuracy and below on coverage. Either align this to #75's reworked numbers once it's tuned, or reframe honestly as a trade-off (some accuracy for cold-start coverage). Don't ship a claim the benchmark doesn't support.

Hold this until #75 lands (tuned), then make the prose match the real numbers.

JJ called out that the previous prose ('within a few percent ... matches
catalog coverage') overstated HybridBook vs ItemKNN. Reframes to be
specific about which metrics are within a few percent (precision,
coverage) and which are further behind (MAP/NDCG ~10%), and keeps the
honest cold-start framing.
@Burton-David

Copy link
Copy Markdown
Owner Author

Pushed ca74377 — claim now reads 'within ~5% on precision and coverage, ~10% behind on NDCG and MAP' which matches #75's actual numbers. Agreed that #79 should wait for #75 — its merge will set the final claim, and I'll re-verify the numbers match.

@JohnJacob-coder JohnJacob-coder left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there — one required fix, then this is good.

Blocking (one word): line 3 reads "End-to-end walkthrough of the book-recommender showcase". We're scrubbing showcase/portfolio framing across the repo (the label's gone, #50 retitled, ROADMAP fixed) — the project should read as something built to be used, not a showpiece. Drop "showcase" here, e.g. "End-to-end walkthrough of the book recommender". This is the last public instance in this PR.

Verified good:

  • Gate clean: ruff / mypy / pytest pass; mkdocs build --strict exits 0.
  • API usage is correct: densest_subset(n_users=2500, n_items=3000), holdout_per_user(test_size=0.2, ...), and build_hybrid_book_recommender(tags, max_features=200) (max_features forwards to the TF-IDF vectorizer) all match the real signatures. recommender.recommenders[1] is the right way to reach the content component — .recommenders is a public attribute and content is index 1 in [collab, content]. recommend_with_reasons is on main (#78).
  • The benchmark claim now matches #75's numbers and framing — honest trade-off, no oversell.

Non-blocking: line 94 says "~10% behind on NDCG and MAP", but MAP is actually ~13% behind (NDCG ~9%). Same minor wording as #75 — tighten to "~10–15% on MAP" when convenient.

Sequencing: this imports build_hybrid_book_recommender (#75, approved and auto-merging), so it needs to land after #75 — its branch will pick that up on update. Re-request once line 3 is fixed and I'll merge it right after #75.

Two things in one merge commit:

- Resolves the mkdocs.yml nav conflict caused by #87's 'Choosing an
  algorithm' page landing on main: both entries stay, ordered after
  'Beyond accuracy' and before the API reference.

- Drops the lingering 'book-recommender showcase' phrase JJ flagged on
  the demo's first line. The whole repo is moving off showcase/portfolio
  framing — this was the last public instance in this PR.

@JohnJacob-coder JohnJacob-coder left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix confirmed — line 3 now reads "walkthrough of the book recommender" and there's no showcase/portfolio framing left in the doc. With #75 merged, build_hybrid_book_recommender is on main so the walkthrough's imports are valid. mkdocs build --strict exits 0 (nav conflict from #87 resolved cleanly). Everything else was verified on the prior pass (API usage, the recommenders[1] content access, honest numbers). LGTM.

@JohnJacob-coder JohnJacob-coder merged commit 7ec43f4 into main May 27, 2026
3 checks passed
@JohnJacob-coder JohnJacob-coder deleted the docs/book-recommender-demo branch May 27, 2026 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

docs: book recommender demo on goodbooks-10k

2 participants