refactor(papers): use opencite for paper sync + declare dependency #307

Description

@neuromechanist

Goal

Refactor the homegrown paper fetch layer in src/knowledge/papers_sync.py to use neuromechanist/opencite (published on PyPI), and declare the dependency so GitHub's dependency graph attributes OSA as a downstream of opencite.

Why

papers_sync.py hand-rolls fetching from 3 sources (OpenAlex via pyalex, Semantic Scholar via httpx, PubMed via E-utilities XML). opencite is a maintained superset: 10+ deduplicated sources, a rich Paper model, citation-graph traversal, BibTeX, and PDF retrieval. opencite was inspired by this code and will be the maintained home for paper tooling.

Scope (this issue: sync/fetch layer only)

  • Replace the fetch logic in sync_openalex_papers / sync_semanticscholar_papers / sync_pubmed_papers with opencite's SearchOrchestrator.search(...).
  • Replace sync_citing_papers with opencite's CitationExplorer.citing_papers(...).
  • Keep the local SQLite + FTS store and upsert_paper(...) write path unchanged.
  • Keep the search_<community>_papers tool unchanged (its retrieval bug is fixed separately in #305, "fix(search): paper/knowledge search returns nothing for multi-word queries", and #306, "fix(search): match multi-word FTS queries").
  • Map OSA's configured API keys (OpenAlex/S2/PubMed) into opencite's Config.
  • Bridge opencite's async API to the existing synchronous call sites in the sync pipeline.
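
The async-to-sync bridge in the last bullet could look roughly like the sketch below. It is only an illustration: run_sync and fake_search are hypothetical names, and fake_search is a stand-in coroutine, not opencite's actual SearchOrchestrator.search(...), whose signature is not spelled out in this issue.

```python
import asyncio
import threading
from typing import Any, Coroutine, Dict, List, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code (hypothetical helper)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread: the usual case for a
        # synchronous pipeline, so asyncio.run() is enough.
        return asyncio.run(coro)

    # A loop is already running in this thread (e.g. we were called from an
    # async handler). asyncio.run() would raise here, so run the coroutine on
    # a fresh loop in a worker thread and wait for it.
    outcome: Dict[str, Any] = {}

    def _worker() -> None:
        try:
            outcome["value"] = asyncio.run(coro)
        except Exception as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    thread = threading.Thread(target=_worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


async def fake_search(query: str, limit: int) -> List[Dict[str, str]]:
    """Stand-in for an async search call; NOT the real opencite API."""
    await asyncio.sleep(0)
    return [{"title": f"{query} paper {i}"} for i in range(limit)]


# Synchronous call site, as a sync_* function might use it.
papers = run_sync(fake_search("HED", limit=2))
```

Keeping the bridge in one helper means the sync_* functions stay synchronous and the choice of event-loop strategy can change later without touching them.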

Attribution

  • Add opencite>=<latest> to the server optional-dependencies in pyproject.toml. GitHub's dependency graph reads pyproject manifests, so OSA shows up under opencite's "Dependents"/"Used by".

Out of scope (tracked separately)

  • Live on-demand "search most recent papers" feature.
  • Exposing citation-graph / canonical / BibTeX as new agent tools.

Acceptance

  • Paper sync produces equivalent-or-better coverage for existing community queries (HED, EEGLAB) with no schema change.
  • opencite declared as a dependency and resolvable via uv sync --extra server.
  • Real tests (no mocks) covering the opencite -> upsert_paper mapping.
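
A mock-free test of the mapping step might follow the shape sketched below, using a real in-memory SQLite database. Everything here is an assumption for illustration: PaperLike, paper_to_row, upsert_row_sketch, the papers table layout, and the sample DOI are all made up, since the issue shows neither opencite's Paper fields nor the actual upsert_paper(...) signature or schema. The FTS index is left out for brevity.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PaperLike:
    """Hypothetical subset of fields a fetched paper might carry."""
    title: str
    doi: Optional[str] = None
    year: Optional[int] = None
    authors: List[str] = field(default_factory=list)


def paper_to_row(paper: PaperLike, source: str) -> Dict[str, object]:
    """Map a fetched paper to a flat row keyed by a normalized DOI."""
    if paper.doi:
        # Normalize so the URL form and the bare form of a DOI collide.
        external_id = paper.doi.strip().lower().removeprefix("https://doi.org/")
    else:
        # Fallback key when no DOI is present (an assumption, not OSA's rule).
        external_id = "title:" + paper.title.strip().lower()
    return {
        "external_id": external_id,
        "source": source,
        "title": paper.title,
        "year": paper.year,
        "authors": "; ".join(paper.authors),
    }


def upsert_row_sketch(conn: sqlite3.Connection, row: Dict[str, object]) -> None:
    """Insert the row, or update it when external_id already exists."""
    conn.execute(
        """
        INSERT INTO papers (external_id, source, title, year, authors)
        VALUES (:external_id, :source, :title, :year, :authors)
        ON CONFLICT(external_id) DO UPDATE SET
            source = excluded.source,
            title = excluded.title,
            year = excluded.year,
            authors = excluded.authors
        """,
        row,
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE papers (external_id TEXT PRIMARY KEY, source TEXT, "
    "title TEXT, year INTEGER, authors TEXT)"
)

# Two records for the same (made-up) DOI, written in different forms.
first = PaperLike("Old title", "https://doi.org/10.1234/ABC", 2021, ["A. Author"])
second = PaperLike("New title", "10.1234/abc", 2021, ["A. Author", "B. Author"])
for paper in (first, second):
    upsert_row_sketch(conn, paper_to_row(paper, source="opencite"))

rows = conn.execute("SELECT external_id, title, authors FROM papers").fetchall()
```

A real test would call the project's own upsert_paper(...) against a temporary copy of the actual schema and feed it records fetched through opencite.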

Metadata

Assignees

No one assigned

    Labels

    P2 (Priority 2: Important, fix when possible), enhancement (New feature or request)
