feat(papers): live on-demand 'search recent papers' via opencite #310

Merged
neuromechanist merged 2 commits into develop from feature/issue-308-live-paper-search
Jun 5, 2026
@neuromechanist

Summary

Implements the live paper search idea (#308): a tool that fetches the most recent literature on demand via opencite, beyond what the batch sync has indexed. Builds directly on the opencite client added in #309.

Closes #308.

What's added

  • search_<community>_papers_live: a new agent tool. It queries opencite across the openalex, s2, and pubmed sources, sorted newest-first, returns the top results, and best-effort caches them into the community knowledge DB so future local searches find them too.
  • Opt-in & gated: controlled by a new citations.live_search config flag (default on wherever paper citations are configured). The tool description and EEGLAB prompt tell the agent to use it only when the user asks for recent / latest / new papers, or when the local index comes up short, so there's no surprise latency on ordinary questions.
  • Responsive: the opencite call is bounded by a timeout (default 20s); on timeout/error the tool returns a graceful message and the chat continues.
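
As a rough illustration of the "Responsive" point above, here is a minimal sketch of a timeout-bounded call that degrades to a message instead of raising. It is not the code from this PR: `fake_search`, `bounded_search`, and the message strings are hypothetical stand-ins, and only the `asyncio.wait_for` pattern and the 20 s default come from the description.

```python
import asyncio


async def fake_search(query: str, delay: float) -> list[str]:
    """Hypothetical stand-in for the opencite search call."""
    await asyncio.sleep(delay)  # simulate network latency
    return [f"paper about {query}"]


async def bounded_search(query: str, delay: float, timeout: float = 20.0) -> str:
    """Run the search under an overall cap; never raise to the caller."""
    try:
        results = await asyncio.wait_for(fake_search(query, delay), timeout=timeout)
    except asyncio.TimeoutError:
        # Graceful message so the chat can continue (wording is an assumption).
        return "Live paper search timed out; please try again later."
    except Exception as exc:
        return f"Live paper search failed: {exc}"
    return "\n".join(results)
```

Returning a string on every path means the agent always gets a usable tool result, whether the search succeeded, timed out, or failed.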

Design

  • search_papers_live() in papers_sync.py reuses the existing opencite client, the _run async bridge (safe under LangGraph's async ToolNode, which runs sync tools in a worker thread), _build_config, and _store_papers. API keys are read from the server's env vars (same ones OSA settings use).
  • Discovery-only, consistent with the local paper tool: results are presented as references, not used to formulate answers.
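
For readers unfamiliar with the sync-to-async bridge idea mentioned above, the following is a minimal sketch of one common way to run a coroutine from a synchronous tool function. It is an assumption about the general shape, not the actual `_run` helper; the name `run_sync` is hypothetical.

```python
import asyncio
import concurrent.futures


def run_sync(coro):
    """Run a coroutine to completion from synchronous code.

    If no event loop is running in this thread (e.g. a worker thread),
    asyncio.run() is safe to call directly. If a loop is already running,
    the coroutine is offloaded to a fresh thread with its own loop.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No running loop in this thread: start one for this call.
        return asyncio.run(coro)
    # A loop is running here: run the coroutine in a separate thread
    # and block until it finishes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

Note that the offload path blocks the calling thread until the coroutine finishes, which is acceptable for a sync tool but would stall an event loop if called from a coroutine on a hot path.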

Tests

  • _paper_to_result mapping (real Paper, offline).
  • Live network test: returns recent papers and verifies they're cached into a real SQLite DB (no mocks).
  • Tool registration: present with include_live_papers=True, absent by default.
  • All 6 community configs still validate with the new flag; 513 tests pass across tools/knowledge/scheduler; ruff clean.
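
The gating that the registration test exercises can be pictured roughly as follows. This is a sketch under assumptions: `CitationsConfig`, `live_search_enabled`, `build_tool_names`, and the local tool name `search_<community>_papers` are hypothetical; only the `live_search` flag, its default-on behavior, the `include_live_papers` parameter, and the `_papers_live` suffix come from this PR.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CitationsConfig:
    """Hypothetical stand-in for a community's citations config."""
    dois: list = field(default_factory=list)
    live_search: bool = True  # default on wherever citations are configured


def live_search_enabled(citations: Optional[CitationsConfig]) -> bool:
    """None-safe gate: no citations section means no live search."""
    return citations is not None and citations.live_search


def build_tool_names(community_id: str, include_live_papers: bool = False) -> list:
    """Return tool names; the live tool is absent unless requested."""
    names = [f"search_{community_id}_papers"]  # assumed local tool name
    if include_live_papers:
        names.append(f"search_{community_id}_papers_live")
    return names
```

A caller would typically pass `include_live_papers=live_search_enabled(config.citations)`, so a community without a citations section never gets the live tool.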

Verified locally

Live call returned 2026 papers, newest-first, e.g. for "EEGLAB independent component analysis".

Adds search_<community>_papers_live, an opt-in tool that queries opencite
for the most recent literature (newest first) when the user asks for
recent/new papers or the local index comes up short. Results are
best-effort cached into the community DB so future local searches find
them. Bounded by a timeout to keep chat responsive.

- search_papers_live() in papers_sync.py reuses the opencite client +
  _run async bridge; keys read from the server's env.
- Gated by citations.live_search (new config flag, default on) wherever
  paper citations are configured.
- EEGLAB prompt guides the agent to use it for recency.

Closes #308

Address PR review:
- cache live-search results in a background daemon thread so a chat
  response is never delayed (was synchronous; could block up to the
  SQLite busy timeout if the scheduler was mid-sync on the same DB).
- set opencite per-request timeout just under the overall cap so each
  source finishes/times out cleanly before wait_for cancels, avoiding
  orphaned-task error noise on the timeout path.
- test caching deterministically via _cache_papers_async (offline, real
  SQLite); live network test now just asserts result shape.
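
The per-request timeout change described above can be sketched as follows, assuming each source enforces its own timeout. `query_source`, `search_all`, and the margin value are hypothetical; the idea taken from the commit is only that the per-request timeout sits just under the overall `wait_for` cap.

```python
import asyncio


async def query_source(name: str, delay: float, per_request_timeout: float):
    """Hypothetical per-source query that enforces its own timeout."""
    try:
        await asyncio.wait_for(asyncio.sleep(delay), timeout=per_request_timeout)
    except asyncio.TimeoutError:
        return (name, "timeout")  # source gave up cleanly on its own
    return (name, "ok")


async def search_all(delays: dict, overall_timeout: float = 20.0, margin: float = 1.0):
    """Query all sources; each one times out before the overall cap fires."""
    # Per-request timeout sits just under the overall cap (margin is assumed).
    per_request = max(overall_timeout - margin, 0.01)
    tasks = [query_source(name, d, per_request) for name, d in delays.items()]
    # The outer wait_for is now a backstop rather than the usual exit path.
    return await asyncio.wait_for(asyncio.gather(*tasks), timeout=overall_timeout)
```

Because slow sources return on their own first, the outer cancellation rarely triggers, which is what avoids leaving in-flight tasks behind on the timeout path.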
@neuromechanist

neuromechanist commented Jun 5, 2026

Review (Sonnet, pr-review-toolkit) + resolutions

No critical issues. The reviewer confirmed the async/timeout reasoning is sound: __aexit__ does close all httpx clients on timeout (no leak); _run's thread-offload path runs wait_for correctly; no exception escapes to the agent; gating logic is None-safe; _paper_to_result empty-title risk is negligible (opencite filters); tests are real (no behavioral mocks).

Both 'important' findings addressed (no tech debt carried forward):

  1. Cache write could block the chat response up to ~5s if the scheduler was mid-sync on the same DB. Caching now runs in a background daemon thread (_cache_papers_async) with error logging, so it can neither delay nor fail the response. Tested deterministically offline via the returned joinable thread.
  2. Orphaned per-source tasks on the timeout path logged spurious errors. Mitigated downstream by setting opencite's per-request config.timeout just under the overall wait_for cap so sources finish or time out cleanly first. The proper fix belongs in opencite (cancel its own tasks on cancellation); filed as neuromechanist/opencite#40, "search(): cancel in-flight per-source tasks on cancellation/timeout".
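
To make resolution 1 concrete, here is a minimal sketch of fire-and-forget caching in a daemon thread that returns the thread so tests can join it. It is not the actual `_cache_papers_async`: the function names, table schema, and tuple shape are hypothetical, and the 5 s SQLite timeout is simply the `sqlite3.connect` default.

```python
import logging
import sqlite3
import threading

logger = logging.getLogger(__name__)


def store_papers(db_path: str, papers: list) -> None:
    """Write (doi, title) rows; may wait on a DB lock up to the timeout."""
    conn = sqlite3.connect(db_path, timeout=5.0)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS papers (doi TEXT PRIMARY KEY, title TEXT)"
            )
            conn.executemany(
                "INSERT OR IGNORE INTO papers (doi, title) VALUES (?, ?)", papers
            )
    finally:
        conn.close()


def cache_papers_in_background(db_path: str, papers: list) -> threading.Thread:
    """Start caching in a daemon thread and return it (joinable in tests)."""

    def worker() -> None:
        try:
            store_papers(db_path, papers)
        except Exception:
            # Log and swallow: a cache failure must not affect the response.
            logger.exception("Caching live-search results failed")

    thread = threading.Thread(target=worker, daemon=True, name="live-paper-cache")
    thread.start()
    return thread
```

In production the caller ignores the returned thread; a test can call `join()` on it and then inspect the database, which keeps the caching test deterministic and offline.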

26 offline + 1 live network test pass; ruff clean.

@neuromechanist neuromechanist merged commit da68367 into develop Jun 5, 2026
7 checks passed
@neuromechanist neuromechanist deleted the feature/issue-308-live-paper-search branch June 5, 2026 08:34