Release: public FAQ + citation dashboards #337
Merged
* Phase 1: FAQ JSON endpoint (#324)
* feat(api): public FAQ JSON feed gated by public_feeds config
Add a top-level public_feeds config block (faq/citations flags, off by default) and a read-only GET /{community_id}/faq endpoint that serves generated FAQ entries from the knowledge database.
- New PublicFeedsConfig model on CommunityConfig
- list_faq_entries browse helper (no FTS query required) with pagination
- Endpoint supports q/category/min_quality/limit/offset filters
- Email addresses redacted from public output (privacy mitigation)
- Returns 404 unless public_feeds.faq is enabled
Tests: list helper (ordering, filters, pagination) and endpoint (gate, fields, redaction, filters, validation) against real SQLite data.
* fix(faq): address PR review findings
- Unify browse + search in list_faq_entries via optional query param so total is the real pre-LIMIT count and offset is honored in both modes (fixes broken pagination on the ?q= path).
- Redact emails in tags, not just question/answer.
- Guard json.loads(tags) against malformed JSON (shared _parse_faq_tags helper, applied to search_faq_entries too) so a corrupt row degrades to empty tags instead of an unlogged 500.
- Add a broad logged 500 fallback in the endpoint alongside the 503 path.
- Set Cache-Control: public, max-age=3600, matching /metrics/public.
- Include limit/offset in the list_faq_entries sqlite error log.
Tests: project-consistent fixture, faq=False gate, 503 browse+search, redaction across question/answer/tags, list_name filter, real search total vs page size, Cache-Control header.
* Phase 2: Citation dashboard endpoint (#330)
* feat(api): public citations dashboard with cites_doi linkage
Record which canonical DOI each citing paper references and expose a per-year + stacked-by-paper citation feed, opt-in per community.
- papers.cites_doi column (CREATE TABLE + _migrate_db ALTER for existing DBs); index created in _migrate_db so init_db stays safe on databases predating the column.
- upsert_paper records cites_doi; on conflict COALESCE keeps the first link, so a keyword sync (None) never erases it and a re-sync backfills legacy NULL rows.
- sync_citing_papers threads the canonical DOI through _store_papers.
- get_citation_stats aggregates total/per_year/by_paper (4-digit-year GLOB guard drops undated rows).
- GET /{community_id}/citations gated by public_feeds.citations, returns per_year, stacked by_paper, and canonical_dois from config, with Cache-Control and 503/500 handling matching the FAQ feed.
Backfill on deploy: run a full citation re-sync to populate cites_doi on existing rows.
Tests: stats aggregation, COALESCE link semantics (backfill/first-wins/no-clobber), legacy-table migration, endpoint gate/content/cache/503.
* fix(citations): address PR review findings
- Narrow the _migrate_db try to the PRAGMA only so a DDL failure (locked DB, I/O error) on an existing papers table propagates instead of being swallowed at DEBUG with a misleading 'table not found' message.
- Document the single-column cross-DOI attribution limitation on upsert_paper.
- Cover _store_papers threading cites_doi onto each stored row.
- Cover the canonical_dois=[] branch (feed enabled, no citations config) and the unexpected-error 500 path; correct the test module docstring.
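The first-wins link semantics described for upsert_paper can be illustrated with a small SQLite sketch. The table layout, the doi primary key, and the function body below are assumptions for illustration, not the project's actual schema or implementation; only the COALESCE-on-conflict idea comes from the commit message.

```python
import sqlite3

# Hypothetical minimal schema; the real papers table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (doi TEXT PRIMARY KEY, title TEXT, cites_doi TEXT)")


def upsert_paper(conn, doi, title, cites_doi=None):
    """Insert or update a paper without ever overwriting an existing cites_doi."""
    conn.execute(
        """
        INSERT INTO papers (doi, title, cites_doi) VALUES (?, ?, ?)
        ON CONFLICT(doi) DO UPDATE SET
            title = excluded.title,
            -- keep the stored link if there is one, otherwise take the new one
            cites_doi = COALESCE(papers.cites_doi, excluded.cites_doi)
        """,
        (doi, title, cites_doi),
    )


upsert_paper(conn, "10.1/citing", "A citing paper")                    # legacy row, NULL link
upsert_paper(conn, "10.1/citing", "A citing paper", "10.9/canon-a")    # re-sync backfills
upsert_paper(conn, "10.1/citing", "A citing paper", "10.9/canon-b")    # first link wins
upsert_paper(conn, "10.1/citing", "A citing paper")                    # keyword sync: no clobber

link = conn.execute(
    "SELECT cites_doi FROM papers WHERE doi = ?", ("10.1/citing",)
).fetchone()[0]
print(link)  # 10.9/canon-a
```

The single cites_doi column is also why a paper citing two canonical DOIs is attributed only to the first one recorded, which matches the limitation the review fix documents.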
Turn on public_feeds.faq and public_feeds.citations for eeglab so GET /eeglab/faq and GET /eeglab/citations serve the generated FAQ and citation-dashboard data added in the public-feeds epic.
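As a rough sketch of how the opt-in gate behaves, the flags can be modeled as a small config object checked before serving a feed. The dataclasses and the feed_status helper below are hypothetical stand-ins (the project's PublicFeedsConfig lives on CommunityConfig, but its exact definition is not shown here); the flag values for eeglab and bids follow the commit messages.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class PublicFeedsConfig:
    # Both feeds are off unless a community opts in.
    faq: bool = False
    citations: bool = False


@dataclass
class CommunityConfig:
    id: str
    public_feeds: PublicFeedsConfig = field(default_factory=PublicFeedsConfig)


def feed_status(config: Optional[CommunityConfig], feed: str) -> int:
    """Return the HTTP status a feed request would get: 404 unless enabled."""
    if config is None or not getattr(config.public_feeds, feed, False):
        return 404
    return 200


eeglab = CommunityConfig("eeglab", PublicFeedsConfig(faq=True, citations=True))
bids = CommunityConfig("bids", PublicFeedsConfig(citations=True))  # FAQ stays off
other = CommunityConfig("other")  # hypothetical community with defaults

print(feed_status(eeglab, "faq"), feed_status(bids, "faq"), feed_status(other, "citations"))
```

Returning 404 instead of 403 for a disabled feed means a community that has not opted in looks the same as one that does not exist.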
* feat(citations): expose per-DOI labels in the citations feed
Add an optional paper_labels (DOI -> label) field to CitationConfig with
DOI-key normalization matching dois, and return a labels map from
GET /{community_id}/citations so consumers can show human-readable series
names instead of bare DOIs.
* feat(bids,eeglab): enable public citations feed with paper labels
- bids: turn on public_feeds.citations (FAQ stays off; no FAQ pipeline) and
add labels for all 14 canonical BIDS papers.
- eeglab: add the Lab Streaming Layer paper (Kothe 2025, 10.1162/IMAG.a.136)
and short labels for all canonical DOIs.
* feat(dashboard): stacked publication-citations-by-year chart
Add a Publication Citations card to the community view that renders a
stacked-by-canonical-paper bar chart from GET /{community_id}/citations,
using configured labels for the legend and an HSL fallback palette for
communities with many tracked papers (e.g. BIDS). Shown only when the
community exposes the feed and has citation data.
* fix: address PR review on citation labels + chart
- validate_paper_labels now rejects malformed DOI keys (same format check
as dois) so a typo fails at config load instead of silently dropping a
label; explicit last-wins dedup documented.
- dashboard: null citationsChartInstance after destroy; sort years numerically.
- tests: invalid-key raises, doi.org-prefix normalization, dedup last-wins,
and the LSL mixed-case DOI label round-trip.
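The label-key handling described above (prefix stripping, format check, last-wins dedup) might look roughly like the following. This is a sketch under assumptions: the function names, the regular expression, the accepted prefixes, and lowercasing as the canonical form are all illustrative choices, not the project's validate_paper_labels.

```python
import re

# Assumed DOI shape: "10.<registrant>/<suffix>" with no whitespace.
_DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Strip a doi.org-style prefix, lowercase, and reject malformed keys."""
    doi = raw.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    doi = doi.lower()  # assumption: DOIs are compared case-insensitively
    if not _DOI_RE.match(doi):
        raise ValueError(f"malformed DOI key: {raw!r}")
    return doi


def normalize_paper_labels(labels: dict) -> dict:
    """Normalize label keys; if two keys collapse to one DOI, the last wins."""
    normalized = {}
    for key, label in labels.items():
        normalized[normalize_doi(key)] = label
    return normalized


labels = normalize_paper_labels(
    {
        "https://doi.org/10.1162/IMAG.a.136": "LSL (long form)",
        "10.1162/IMAG.a.136": "LSL",
    }
)
print(labels)  # {'10.1162/imag.a.136': 'LSL'}
```

Raising on a bad key, instead of skipping it, is what turns a config typo into a load-time failure.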
Replace the rainbow HSL fallback with a curated Tableau-derived qualitative palette (10 saturated hues + 10 companion tones). Overflow beyond 20 series walks the HSL wheel by the golden angle so colors stay distinct and balanced instead of clustering.
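The overflow rule can be sketched as follows: use the curated palette first, then step the hue by the golden angle for each additional series. The dashboard itself is not written in Python, and the three hex colors, the saturation, and the lightness below are placeholders chosen for illustration; the real palette has 20 curated entries.

```python
# Golden angle in degrees: 360 * (2 - phi), roughly 137.5.
GOLDEN_ANGLE = 360 * (2 - (1 + 5 ** 0.5) / 2)

# Placeholder stand-in for the curated 20-color palette.
CURATED = ["#4e79a7", "#f28e2b", "#e15759"]


def series_color(index: int, curated=CURATED) -> str:
    """Return a CSS color for the series at `index` (0-based)."""
    if index < len(curated):
        return curated[index]
    # Past the curated list, walk the hue wheel by the golden angle so that
    # consecutive overflow series land far apart instead of next to each other.
    overflow = index - len(curated)
    hue = (overflow * GOLDEN_ANGLE) % 360
    return f"hsl({hue:.0f}, 65%, 50%)"


for i in range(6):
    print(i, series_color(i))
```

Because the golden angle is an irrational fraction of the circle, successive hues never repeat exactly and fill the wheel evenly as more series are added.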
* feat(citations): OpenAlex client + citation_counts table
Add a direct OpenAlex client (openalex_citations.py) that resolves a DOI to its work id, returns the complete per-year citation histogram via group_by (uncapped), and cursor-paginates the latest N citing papers sorted by publication date. Add a citation_counts(cites_doi, year, count) table and a replace_citation_counts helper that mirrors the histogram wholesale.
opencite caps citing-paper fetches at one page (<=200) with no pagination and no aggregation, which silently truncated recent citations and inverted the per-year curve; this is the foundation for fixing that.
* feat(citations): true uncapped per-year counts; store latest 2000 papers
sync_citing_papers now queries OpenAlex directly per canonical DOI: it stores the exact, complete per-year counts in citation_counts (source of truth for the dashboard) and upserts the latest 2000 citing papers (publication date desc) into the papers table for the search tool. get_citation_stats reads the counts table (empty, not error, before the first sync). The CLI decouples the citation storage cap from the query --limit.
* test(citations): OpenAlex client, counts-based stats, end-to-end sync
- OpenAlex client tests via httpx.MockTransport (resolve/404, group_by parsing, cursor pagination, limit, titleless skip, error propagation).
- get_citation_stats and the endpoint now assert against citation_counts; add replace-overwrites and missing-table-is-empty cases.
- End-to-end sync_citing_papers test (real client + real DB, mock transport): stores true counts and links recent papers; unresolved DOI skipped.
* fix(citations): address PR review findings
- Critical: never wipe stored counts on an empty histogram (likely a transient OpenAlex gap); skip the DOI with a warning instead.
- sync all no longer forwards --limit to citations (would re-cap the stored sample at 100); uses the 2000 default like sync papers.
- recent_citing_papers: bound page count and stop on an empty results page so a stuck/non-null cursor can't spin; build the stored URL from the normalized DOI for consistency.
- replace_citation_counts: explicit rollback so a DOI is never half-replaced.
- Per-DOI failure log includes the exception type.
Tests: empty-counts-does-not-wipe, empty-results stop, absent-meta stop, normalized-URL.
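Two of the review fixes above, the explicit rollback and the empty-histogram guard, can be sketched with plain sqlite3. The column constraints, the store_histogram wrapper, and the logger name are assumptions made for this example; only the table's column names and the helper name replace_citation_counts come from the commit messages.

```python
import logging
import sqlite3

logger = logging.getLogger("citations_sketch")  # hypothetical logger name

# Assumed constraints; the source only names the three columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS citation_counts (
    cites_doi TEXT NOT NULL,
    year INTEGER NOT NULL,
    count INTEGER NOT NULL,
    PRIMARY KEY (cites_doi, year)
)
"""


def replace_citation_counts(conn, cites_doi, counts):
    """Swap all per-year rows for one DOI in a single transaction."""
    try:
        conn.execute("DELETE FROM citation_counts WHERE cites_doi = ?", (cites_doi,))
        conn.executemany(
            "INSERT INTO citation_counts (cites_doi, year, count) VALUES (?, ?, ?)",
            [(cites_doi, year, n) for year, n in counts.items()],
        )
        conn.commit()
    except sqlite3.Error:
        conn.rollback()  # undo the DELETE too, so the DOI is never half-replaced
        raise


def store_histogram(conn, cites_doi, counts) -> bool:
    """Store a fetched histogram; an empty one is treated as a transient gap."""
    if not counts:
        logger.warning("empty histogram for %s; keeping stored counts", cites_doi)
        return False
    replace_citation_counts(conn, cites_doi, counts)
    return True


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store_histogram(conn, "10.9/canonical", {2023: 5, 2024: 7})
store_histogram(conn, "10.9/canonical", {})  # skipped; existing rows are kept
```

Skipping an empty result trades freshness for safety: a canonical paper whose citations really dropped to zero would keep stale counts, but that case is far less likely than an upstream gap.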
#336)
* feat(citations): version groups to merge preprint + published citations
OpenAlex splits citations across a paper's preprint and published records, so a canonical paper can undercount badly (LSL published=61 but its preprint holds 98). Add a citations.aliases config map (primary DOI -> version DOIs); the sync resolves every version to an OpenAlex work id and queries them as one OR-joined, deduplicated cites: filter, attributing the merged per-year counts to the primary DOI.
- CitationConfig.aliases (validated/normalized like dois).
- OpenAlexCitationClient.counts_by_year/recent_citing_papers accept a work-id group; _cites_filter OR-joins with '|'.
- sync_citing_papers builds the group from primary + aliases; CLI and scheduler pass community aliases.
- eeglab: LSL bioRxiv preprint; bids: BIDS Apps bioRxiv preprint. LSL combined rises 61 -> 157.
* fix(citations): fail loud on misconfigured aliases
- Raise on an empty alias version DOI instead of silently dropping it.
- Add a model validator: every alias primary DOI must be in dois, so a typo'd primary fails at config load rather than silently never merging.
- _cites_filter raises on an empty work-id list (explicit precondition).
- Tests for both new validations plus the empty-filter guard.
* style: ruff format community.py
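A minimal sketch of the version-group idea: collect the primary DOI with its aliases, then OR-join the resolved OpenAlex work ids into one cites: filter. The function names, the placeholder DOIs and work ids, and the choice to deduplicate ids before joining are assumptions for illustration; the '|' separator and the raise-on-empty precondition follow the commit messages.

```python
def version_group(primary: str, aliases: dict) -> list:
    """Return the primary DOI followed by its configured version DOIs."""
    versions = aliases.get(primary, [])
    if any(not v for v in versions):
        raise ValueError(f"empty alias version DOI for {primary}")
    return [primary, *versions]


def cites_filter(work_ids: list) -> str:
    """Build an OpenAlex filter matching works that cite any id in the group."""
    if not work_ids:
        raise ValueError("cites filter needs at least one OpenAlex work id")
    unique = list(dict.fromkeys(work_ids))  # drop repeats, keep order
    return "cites:" + "|".join(unique)


# Placeholder DOIs and work ids, not real records.
aliases = {"10.1000/published": ["10.1101/example-preprint"]}
group = version_group("10.1000/published", aliases)
resolved = {"10.1000/published": "W111", "10.1101/example-preprint": "W222"}

print(cites_filter([resolved[doi] for doi in group]))  # cites:W111|W222
```

Querying the group as one filter lets the API count each citing work once even when it cites both the preprint and the published version, which is why the combined LSL figure (157) is slightly below 61 + 98.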
Release summary
Promotes the public FAQ and citation dashboard work from develop to production. Version automation will strip the .dev suffix and tag the release on merge (do not bump manually).
What's included
- GET /{community_id}/faq, a read-only endpoint over generated FAQ entries, gated by public_feeds.faq, with filters, pagination, and email redaction.
- GET /{community_id}/citations and a stacked "Publication Citations by year" chart on the status dashboard.
- Per-year citation counts come from OpenAlex group_by (complete per-year histogram) and are stored in a new citation_counts table, instead of an oldest-first capped sample; the latest 2,000 citing papers are stored for the search corpus.
Schema / data notes
- citation_counts table (created via init_db; no destructive migration). The papers.cites_doi column was added earlier in this line of work.
- To populate citation_counts on prod: docker exec osa python -m src.cli.main sync papers --community <id> --citations (FAQ/citation feeds are config-gated and off by default; only eeglab and bids are enabled.)
Validation