docs: end-to-end book recommender walkthrough on goodbooks-10k #79
A worked example for the docs site that mirrors the goodbooks benchmark pipeline at a smaller-than-benchmark scale: load + tag table, trim to the dense subset, per-user holdout split, fit the hybrid book recommender, recommend, evaluate, and explain. Mentions the research-only license caveat front-and-center. Closes #47
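For anyone skimming the thread, the per-user holdout step can be sketched roughly as below. This is a minimal pure-Python illustration, not the library's `holdout_per_user`: the function name `holdout_per_user_sketch`, the `(user, item)` tuple format, and the rule of always leaving at least one item in train are assumptions made for the example.

```python
import random
from collections import defaultdict


def holdout_per_user_sketch(interactions, test_size=0.2, seed=0):
    """Split (user, item) pairs per user instead of globally.

    Hypothetical stand-in for illustration; not the library's implementation.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, item in interactions:
        by_user[user].append(item)

    train, test = [], []
    for user, items in by_user.items():
        items = items[:]  # copy so the grouped lists are not shuffled in place
        rng.shuffle(items)
        if len(items) < 2:
            # Assumption: users with a single interaction stay entirely in train.
            n_test = 0
        else:
            # Hold out roughly test_size of the user's items, never all of them.
            n_test = max(1, round(len(items) * test_size))
            n_test = min(n_test, len(items) - 1)
        test.extend((user, item) for item in items[:n_test])
        train.extend((user, item) for item in items[n_test:])
    return train, test


# Toy data: u1 has 5 interactions, u2 has 10, u3 has 1.
interactions = (
    [("u1", f"b{i}") for i in range(5)]
    + [("u2", f"b{i}") for i in range(10)]
    + [("u3", "b0")]
)
train, test = holdout_per_user_sketch(interactions, test_size=0.2)
```

Splitting per user rather than over the whole table means every user with at least two interactions appears on both sides, so top-N metrics are computed over the same population the model was fit on.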
JohnJacob-coder
left a comment
Good walkthrough — uses densest_subset for scale and holdout_per_user, and the structure (load → trim → hybrid → recommend → explain) is exactly right. Two blockers:
- Depends on unmerged code. build_hybrid_book_recommender exists only in #75 (still changes-requested), and recommend_with_reasons comes from #78 (merging). Neither is on main, so the walkthrough's core would ImportError today. Sequence #79 to land after #75 and #78.
- The benchmark claim overstates the hybrid. "HybridBook lands within a few percent of pure ItemKNN on every accuracy metric and matches it on catalog coverage" contradicts #75's actual numbers: there the hybrid was ~25-30% below ItemKNN on accuracy and below on coverage. Either align this to #75's reworked numbers once it's tuned, or reframe honestly as a trade-off (some accuracy for cold-start coverage). Don't ship a claim the benchmark doesn't support.
Hold this until #75 lands (tuned), then make the prose match the real numbers.
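For reference, the accuracy metrics under discussion can be sketched as follows. This is a generic binary-relevance version, assuming precision divides by `k` and NDCG normalizes by the best ranking achievable for the user; the `_sketch` function names are hypothetical, and the library's own `precision_at_k` / `ndcg_at_k` may differ in signature and edge-case handling.

```python
import math


def precision_at_k_sketch(recommended, relevant, k):
    """Fraction of the top-k recommendations that appear in the held-out set."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def ndcg_at_k_sketch(recommended, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets log2(2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy ranking: two of the four recommended books are in the held-out set.
recommended = ["b3", "b7", "b1", "b9"]
relevant = {"b3", "b1"}
p = precision_at_k_sketch(recommended, relevant, k=4)
n = ndcg_at_k_sketch(recommended, relevant, k=4)
```

Precision only counts hits, while NDCG also rewards placing them early, which is one reason two models can be close on one metric and further apart on the other.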
JJ called out that the previous prose ('within a few percent ... matches
catalog coverage') overstated HybridBook vs ItemKNN. Reframes to be
specific about which metrics are within a few percent (precision,
coverage) and which are further behind (MAP/NDCG ~10%), and keeps the
honest cold-start framing.
JohnJacob-coder
left a comment
Almost there — one required fix, then this is good.
Blocking (one word): line 3 reads "End-to-end walkthrough of the book-recommender showcase". We're scrubbing showcase/portfolio framing across the repo (the label's gone, #50 retitled, ROADMAP fixed) — the project should read as something built to be used, not a showpiece. Drop "showcase" here, e.g. "End-to-end walkthrough of the book recommender". This is the last public instance in this PR.
Verified good:
- Gate clean: ruff / mypy / pytest pass; mkdocs build --strict exits 0.
- API usage is correct: densest_subset(n_users=2500, n_items=3000), holdout_per_user(test_size=0.2, ...), and build_hybrid_book_recommender(tags, max_features=200) (max_features forwards to the TF-IDF vectorizer) all match the real signatures. recommender.recommenders[1] is the right way to reach the content component: .recommenders is a public attribute and content is index 1 in [collab, content]. recommend_with_reasons is on main (#78).
- The benchmark claim now matches #75's numbers and framing: honest trade-off, no oversell.
Non-blocking: line 94 says "~10% behind on NDCG and MAP", but MAP is actually ~13% behind (NDCG ~9%). Same minor wording as #75 — tighten to "~10–15% on MAP" when convenient.
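A quick way to check wording like this is to compute the relative gap directly. The sketch below assumes "behind" means relative to the ItemKNN score; the helper name `percent_behind` is hypothetical and the scores are made-up placeholders chosen only to illustrate the arithmetic, not the #75 benchmark values.

```python
def percent_behind(candidate, baseline):
    """How far `candidate` trails `baseline`, as a percentage of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - candidate) / baseline


# Made-up scores for illustration only; these are NOT the benchmark numbers.
itemknn_map, hybrid_map = 0.100, 0.087
itemknn_ndcg, hybrid_ndcg = 0.200, 0.182

map_gap = percent_behind(hybrid_map, itemknn_map)     # about 13
ndcg_gap = percent_behind(hybrid_ndcg, itemknn_ndcg)  # about 9
```

With gaps like these, a single "~10%" figure fits NDCG but understates MAP, which is why quoting a range for MAP reads more accurately.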
Sequencing: this imports build_hybrid_book_recommender (#75, approved and auto-merging), so it needs to land after #75 — its branch will pick that up on update. Re-request once line 3 is fixed and I'll merge it right after #75.
Two things in one merge commit:
- Resolves the mkdocs.yml nav conflict caused by #87's 'Choosing an algorithm' page landing on main: both entries stay, ordered after 'Beyond accuracy' and before the API reference.
- Drops the lingering 'book-recommender showcase' phrase JJ flagged on the demo's first line. The whole repo is moving off showcase/portfolio framing; this was the last public instance in this PR.
JohnJacob-coder
left a comment
Fix confirmed — line 3 now reads "walkthrough of the book recommender" and there's no showcase/portfolio framing left in the doc. With #75 merged, build_hybrid_book_recommender is on main so the walkthrough's imports are valid. mkdocs build --strict exits 0 (nav conflict from #87 resolved cleanly). Everything else was verified on the prior pass (API usage, the recommenders[1] content access, honest numbers). LGTM.
Closes #47. The final book-rec showcase artifact: a worked example on the docs site that mirrors the goodbooks benchmark pipeline at a smaller-than-benchmark scale.
Contents
- docs/book_recommender_demo.md walks through:
  - densest_subset(n_users=2500, n_items=3000)
  - holdout_per_user (new in Add holdout_per_user split for fair top-N evaluation #71)
  - build_hybrid_book_recommender(tags, max_features=200) (from feat: hybrid collaborative + content book recommender #75)
  - precision_at_k / ndcg_at_k
  - content.recommend_with_reasons(...) (from feat: explainable recommendations on ContentBased #78)
- mkdocs.yml adds the new page to the nav between Quickstart and API Reference.

Ordering caveat
Two of my open PRs supply APIs the demo uses:
- build_hybrid_book_recommender
- ContentBased.recommend_with_reasons and explain

Code blocks in the demo are plain markdown (mkdocs build --strict is clean even without those landing), but the demo doesn't function end-to-end until both merge. Safe to land in any order; the published site is fully accurate once main has all three.

Local checks
Closes #47