
Release: public FAQ + citation dashboards #337

Merged: neuromechanist merged 9 commits into main from develop on Jun 10, 2026

@neuromechanist

Release summary

Promotes the public FAQ and citation dashboard work from develop to production. Version automation will strip the .dev suffix and tag the release on merge (do not bump manually).

What's included

Schema / data notes

  • New citation_counts table (created via init_db; no destructive migration). The papers.cites_doi column was added earlier in this line of work.
  • Post-deploy step required: after prod deploys, run the citation sync per community to populate citation_counts on prod:
    docker exec osa python -m src.cli.main sync papers --community <id> --citations
    (FAQ/citation feeds are config-gated and off by default; only eeglab (FAQ + citations) and bids (citations only) are enabled.)

Validation

  • All phases individually reviewed (code/tests/error-handling) with findings addressed; CI green across Python 3.11/3.12.
  • Verified end-to-end on dev: BIDS 14/14 series, EEGLAB incl. LSL; counts match OpenAlex.

github-actions Bot and others added 9 commits June 8, 2026 18:47
* Phase 1: FAQ JSON endpoint (#324)

* feat(api): public FAQ JSON feed gated by public_feeds config

Add a top-level public_feeds config block (faq/citations flags, off by
default) and a read-only GET /{community_id}/faq endpoint that serves
generated FAQ entries from the knowledge database.

- New PublicFeedsConfig model on CommunityConfig
- list_faq_entries browse helper (no FTS query required) with pagination
- Endpoint supports q/category/min_quality/limit/offset filters
- Email addresses redacted from public output (privacy mitigation)
- Returns 404 unless public_feeds.faq is enabled

Tests: list helper (ordering, filters, pagination) and endpoint
(gate, fields, redaction, filters, validation) against real SQLite data.
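
A minimal sketch of the email redaction, assuming a regex-based approach. The pattern, the `[email redacted]` placeholder, and the helper names `redact_emails` and `redact_entry` are hypothetical, not taken from the PR. The sketch already covers tags, which the review fix below adds on top of question/answer redaction.

```python
import re

# Hypothetical pattern: a pragmatic approximation of an email address,
# not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PLACEHOLDER = "[email redacted]"


def redact_emails(text: str) -> str:
    """Replace every email-like substring in text with the placeholder."""
    return EMAIL_RE.sub(PLACEHOLDER, text)


def redact_entry(entry: dict) -> dict:
    """Return a copy of a FAQ entry with emails removed from public fields."""
    redacted = dict(entry)
    redacted["question"] = redact_emails(entry.get("question", ""))
    redacted["answer"] = redact_emails(entry.get("answer", ""))
    redacted["tags"] = [redact_emails(tag) for tag in entry.get("tags", [])]
    return redacted
```

Returning a copy keeps the stored row untouched, so redaction only affects what the public endpoint serializes.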

* fix(faq): address PR review findings

- Unify browse + search in list_faq_entries via optional query param so
  total is the real pre-LIMIT count and offset is honored in both modes
  (fixes broken pagination on the ?q= path).
- Redact emails in tags, not just question/answer.
- Guard json.loads(tags) against malformed JSON (shared _parse_faq_tags
  helper, applied to search_faq_entries too) so a corrupt row degrades to
  empty tags instead of an unlogged 500.
- Add a broad logged 500 fallback in the endpoint alongside the 503 path.
- Set Cache-Control: public, max-age=3600, matching /metrics/public.
- Include limit/offset in the list_faq_entries sqlite error log.

Tests: project-consistent fixture, faq=False gate, 503 browse+search,
redaction across question/answer/tags, list_name filter, real search
total vs page size, Cache-Control header.
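
The pagination fix hinges on computing `total` with the same WHERE clause as the page query but without LIMIT/OFFSET. The sketch below shows that shape against an in-memory SQLite table. The `faq` schema, the ordering by id, and the LIKE match standing in for the real full-text search are assumptions; this is an illustration, not the project's `list_faq_entries` or `_parse_faq_tags`.

```python
import json
import sqlite3
from typing import Optional


def _parse_faq_tags(raw: Optional[str]) -> list:
    """Decode the JSON tags column; corrupt or non-list values become []."""
    if not raw:
        return []
    try:
        tags = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []
    return tags if isinstance(tags, list) else []


def list_faq_entries(conn: sqlite3.Connection, query: Optional[str] = None,
                     limit: int = 20, offset: int = 0) -> tuple[list, int]:
    """Return (page, total); total is counted before LIMIT/OFFSET in both modes."""
    where, params = "", []
    if query:
        # Assumption: a LIKE match stands in for the real full-text search.
        # The WHERE text is a constant; user input only travels as parameters.
        where = "WHERE question LIKE ? OR answer LIKE ?"
        params = [f"%{query}%", f"%{query}%"]
    total = conn.execute(f"SELECT COUNT(*) FROM faq {where}", params).fetchone()[0]
    rows = conn.execute(
        f"SELECT id, question, answer, tags FROM faq {where} ORDER BY id LIMIT ? OFFSET ?",
        [*params, limit, offset],
    ).fetchall()
    page = [
        {"id": r[0], "question": r[1], "answer": r[2], "tags": _parse_faq_tags(r[3])}
        for r in rows
    ]
    return page, total
```

Because browse and search share one code path, `offset` and the pre-LIMIT `total` behave identically with or without `q`.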

* Phase 2: Citation dashboard endpoint (#330)

* feat(api): public citations dashboard with cites_doi linkage

Record which canonical DOI each citing paper references and expose a
per-year + stacked-by-paper citation feed, opt-in per community.

- papers.cites_doi column (CREATE TABLE + _migrate_db ALTER for existing
  DBs); index created in _migrate_db so init_db stays safe on databases
  predating the column.
- upsert_paper records cites_doi; on conflict COALESCE keeps the first
  link, so a keyword sync (None) never erases it and a re-sync backfills
  legacy NULL rows.
- sync_citing_papers threads the canonical DOI through _store_papers.
- get_citation_stats aggregates total/per_year/by_paper (4-digit-year
  GLOB guard drops undated rows).
- GET /{community_id}/citations gated by public_feeds.citations, returns
  per_year, stacked by_paper, and canonical_dois from config, with
  Cache-Control and 503/500 handling matching the FAQ feed.

Backfill on deploy: run a full citation re-sync to populate cites_doi on
existing rows.
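
The COALESCE first-wins rule can be sketched as a SQLite upsert. The `papers(doi, title, cites_doi)` layout is assumed (only `cites_doi` is named in the PR), and `upsert_paper` here is an illustrative stand-in, not the project's helper. `ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer.

```python
import sqlite3
from typing import Optional


def upsert_paper(conn: sqlite3.Connection, doi: str, title: str,
                 cites_doi: Optional[str] = None) -> None:
    """Insert or update a paper while keeping the first recorded cites_doi."""
    conn.execute(
        """
        INSERT INTO papers (doi, title, cites_doi) VALUES (?, ?, ?)
        ON CONFLICT(doi) DO UPDATE SET
            title = excluded.title,
            -- First non-NULL link wins: a keyword sync passing NULL never erases it,
            -- while a citation re-sync fills in a legacy NULL row.
            cites_doi = COALESCE(papers.cites_doi, excluded.cites_doi)
        """,
        (doi, title, cites_doi),
    )
    conn.commit()
```

The same statement covers all three cases the tests name: backfill (stored NULL, new value), first-wins (stored value, different new value), and no-clobber (stored value, new NULL).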

Tests: stats aggregation, COALESCE link semantics (backfill/first-wins/
no-clobber), legacy-table migration, endpoint gate/content/cache/503.

* fix(citations): address PR review findings

- Narrow the _migrate_db try to the PRAGMA only so a DDL failure (locked
  DB, I/O error) on an existing papers table propagates instead of being
  swallowed at DEBUG with a misleading 'table not found' message.
- Document the single-column cross-DOI attribution limitation on
  upsert_paper.
- Cover _store_papers threading cites_doi onto each stored row.
- Cover the canonical_dois=[] branch (feed enabled, no citations config)
  and the unexpected-error 500 path; correct the test module docstring.
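
One way to keep the migration's try block narrow, as a sketch: only the schema probe is guarded, and the ALTER/CREATE INDEX DDL runs outside it so lock or I/O errors propagate. The function name `migrate_cites_doi` and the index name are hypothetical. Note that `PRAGMA table_info` returns no rows for a missing table rather than raising, so absence is detected by an empty result.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def migrate_cites_doi(conn: sqlite3.Connection) -> None:
    """Add papers.cites_doi and its index to databases that predate the column."""
    try:
        # Only the schema probe is guarded.
        columns = {row[1] for row in conn.execute("PRAGMA table_info(papers)")}
    except sqlite3.Error:
        logger.debug("could not inspect papers table; skipping cites_doi migration")
        return
    if not columns:
        # PRAGMA table_info yields no rows when the table does not exist.
        logger.debug("papers table not found; nothing to migrate")
        return
    # DDL runs outside the try, so a locked database or I/O error propagates.
    if "cites_doi" not in columns:
        conn.execute("ALTER TABLE papers ADD COLUMN cites_doi TEXT")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_papers_cites_doi ON papers(cites_doi)")
    conn.commit()
```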

Turn on public_feeds.faq and public_feeds.citations for eeglab so
GET /eeglab/faq and GET /eeglab/citations serve the generated FAQ and
citation-dashboard data added in the public-feeds epic.
* feat(citations): expose per-DOI labels in the citations feed

Add an optional paper_labels (DOI -> label) field to CitationConfig with
DOI-key normalization matching dois, and return a labels map from
GET /{community_id}/citations so consumers can show human-readable series
names instead of bare DOIs.
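
A sketch of DOI-key normalization and validation for `paper_labels`, reflecting the behavior described in the review fix below (malformed keys rejected, `doi.org` prefix stripped, last duplicate wins). The simplified DOI regex, the lowercasing, and the prefix list are assumptions rather than the project's actual format check, and `10.1234/...` keys are placeholders.

```python
import re

# Assumption: a simplified DOI shape check ("10.<registrant>/<suffix>").
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
# Assumption: URL/prefix forms accepted as keys and stripped before validation.
PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Lowercase, strip a known prefix, and reject anything not DOI-shaped."""
    doi = raw.strip().lower()
    for prefix in PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_RE.match(doi):
        raise ValueError(f"invalid DOI key in paper_labels: {raw!r}")
    return doi


def validate_paper_labels(labels: dict) -> dict:
    """Normalize label keys; if two keys normalize to the same DOI, the last wins."""
    normalized = {}
    for key, label in labels.items():
        normalized[normalize_doi(key)] = label
    return normalized
```

Raising at load time turns a typo'd key into a config error instead of a label that silently never matches.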

* feat(bids,eeglab): enable public citations feed with paper labels

- bids: turn on public_feeds.citations (FAQ stays off; no FAQ pipeline) and
  add labels for all 14 canonical BIDS papers.
- eeglab: add the Lab Streaming Layer paper (Kothe 2025, 10.1162/IMAG.a.136)
  and short labels for all canonical DOIs.

* feat(dashboard): stacked publication-citations-by-year chart

Add a Publication Citations card to the community view that renders a
stacked-by-canonical-paper bar chart from GET /{community_id}/citations,
using configured labels for the legend and an HSL fallback palette for
communities with many tracked papers (e.g. BIDS). Shown only when the
community exposes the feed and has citation data.

* fix: address PR review on citation labels + chart

- validate_paper_labels now rejects malformed DOI keys (same format check
  as dois) so a typo fails at config load instead of silently dropping a
  label; explicit last-wins dedup documented.
- dashboard: null citationsChartInstance after destroy; sort years numerically.
- tests: invalid-key raises, doi.org-prefix normalization, dedup last-wins,
  and the LSL mixed-case DOI label round-trip.

Replace the rainbow HSL fallback with a curated Tableau-derived
qualitative palette (10 saturated hues + 10 companion tones). Overflow
beyond 20 series walks the HSL wheel by the golden angle so colors stay
distinct and balanced instead of clustering.
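
The overflow rule can be sketched as follows: indices within the curated palette take its colors, and later series step the hue by the golden angle (about 137.5 degrees), which keeps successive hues far apart. The Tableau-derived values are not listed in the PR, so the function takes the palette as a parameter; the fixed 60% saturation and 50% lightness are assumptions. The dashboard code is not shown here, so this Python version only illustrates the arithmetic.

```python
# 360 * (1 - 1/phi), the golden angle in degrees.
GOLDEN_ANGLE = 137.50776405


def series_color(index: int, palette: list) -> str:
    """Curated color for the first len(palette) series, golden-angle HSL after."""
    if index < len(palette):
        return palette[index]
    # Each overflow series rotates the hue by the golden angle, so consecutive
    # series land far apart and hues spread evenly instead of clustering.
    step = index - len(palette)
    hue = (step * GOLDEN_ANGLE) % 360
    # Assumption: fixed saturation/lightness for every overflow color.
    return f"hsl({hue:.0f}, 60%, 50%)"
```

With a 20-color palette, series 21 onward would follow this HSL sequence.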

* feat(citations): OpenAlex client + citation_counts table

Add a direct OpenAlex client (openalex_citations.py) that resolves a DOI to
its work id, returns the complete per-year citation histogram via group_by
(uncapped), and cursor-paginates the latest N citing papers sorted by
publication date. Add a citation_counts(cites_doi, year, count) table and a
replace_citation_counts helper that mirrors the histogram wholesale.

opencite caps citing-paper fetches at one page (<=200) with no pagination and
no aggregation, which silently truncated recent citations and inverted the
per-year curve; this is the foundation for fixing that.
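
The per-year histogram comes from a single grouped query rather than from paging through citing works. A sketch of the request parameters and response parsing, assuming OpenAlex's `filter=cites:<work id>` and `group_by=publication_year` query parameters on `https://api.openalex.org/works` and a `group_by` list of `{key, count}` objects in the response. The helper names are hypothetical, the network call is left out, and `W123` plus the sample payload are made up.

```python
def histogram_params(work_id: str) -> dict:
    """Query parameters for one grouped OpenAlex /works request (assumed shape)."""
    return {"filter": f"cites:{work_id}", "group_by": "publication_year"}


def parse_year_histogram(payload: dict) -> dict:
    """Turn a group_by response into {year: count}, dropping non-year buckets."""
    counts = {}
    for group in payload.get("group_by", []):
        key = str(group.get("key", ""))
        # Assumption: keep only 4-digit year keys.
        if len(key) == 4 and key.isdigit():
            counts[int(key)] = int(group.get("count", 0))
    return counts
```

Since the server does the aggregation, the counts do not depend on how many citing papers a client is willing to page through.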

* feat(citations): true uncapped per-year counts; store latest 2000 papers

sync_citing_papers now queries OpenAlex directly per canonical DOI: it stores
the exact, complete per-year counts in citation_counts (source of truth for the
dashboard) and upserts the latest 2000 citing papers (publication date desc)
into the papers table for the search tool. get_citation_stats reads the counts
table (empty, not error, before the first sync). The CLI decouples the citation
storage cap from the query --limit.
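
A sketch of a wholesale per-DOI replace that also folds in two later review fixes below: an empty histogram is skipped instead of wiping stored counts, and a failure rolls back so a DOI is never half-replaced. The `citation_counts(cites_doi, year, count)` columns come from the PR; the primary key, NOT NULL constraints, and boolean return value are assumptions. It relies on Python's `sqlite3` opening a transaction implicitly before the DELETE (the default, legacy transaction control).

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def replace_citation_counts(conn: sqlite3.Connection, cites_doi: str,
                            counts: dict) -> bool:
    """Mirror a {year: count} histogram for one DOI; return False if skipped."""
    if not counts:
        # An empty histogram is more likely a transient gap than a real zero,
        # so the stored counts are left untouched.
        logger.warning("empty citation histogram for %s; keeping stored counts", cites_doi)
        return False
    try:
        # The DELETE opens an implicit transaction, so the delete and the
        # inserts commit or roll back together.
        conn.execute("DELETE FROM citation_counts WHERE cites_doi = ?", (cites_doi,))
        conn.executemany(
            "INSERT INTO citation_counts (cites_doi, year, count) VALUES (?, ?, ?)",
            [(cites_doi, year, n) for year, n in counts.items()],
        )
        conn.commit()
    except sqlite3.Error:
        conn.rollback()
        raise
    return True
```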

* test(citations): OpenAlex client, counts-based stats, end-to-end sync

- OpenAlex client tests via httpx.MockTransport (resolve/404, group_by parsing,
  cursor pagination, limit, titleless skip, error propagation).
- get_citation_stats and the endpoint now assert against citation_counts;
  add replace-overwrites and missing-table-is-empty cases.
- End-to-end sync_citing_papers test (real client + real DB, mock transport):
  stores true counts and links recent papers; unresolved DOI skipped.

* fix(citations): address PR review findings

- Critical: never wipe stored counts on an empty histogram (likely a
  transient OpenAlex gap) — skip the DOI with a warning instead.
- sync all no longer forwards --limit to citations (would re-cap the stored
  sample at 100); uses the 2000 default like sync papers.
- recent_citing_papers: bound page count and stop on an empty results page
  so a stuck/non-null cursor can't spin; build the stored URL from the
  normalized DOI for consistency.
- replace_citation_counts: explicit rollback so a DOI is never half-replaced.
- Per-DOI failure log includes the exception type.

Tests: empty-counts-does-not-wipe, empty-results stop, absent-meta stop,
normalized-URL.
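
The bounded cursor walk from the review fix can be sketched with the HTTP call injected as a function, so the loop logic is testable without a network. It assumes OpenAlex-style paging (`cursor=*` to start, `meta.next_cursor` for the next page). `fetch_page` and `max_pages` are hypothetical names, and the default bound of 50 pages is arbitrary.

```python
from typing import Callable, Optional


def recent_citing_papers(fetch_page: Callable[[str], dict], limit: int,
                         max_pages: int = 50) -> list:
    """Walk cursor pages until limit is reached, the cursor ends, or a page is empty."""
    papers: list = []
    cursor: Optional[str] = "*"  # OpenAlex-style: "*" requests the first page
    for _ in range(max_pages):  # hard bound so a stuck cursor cannot spin forever
        if cursor is None or len(papers) >= limit:
            break
        payload = fetch_page(cursor)
        results = payload.get("results") or []
        if not results:
            break  # an empty page ends the walk even if a cursor came back
        for work in results:
            if not work.get("title"):
                continue  # titleless works are skipped
            papers.append(work)
            if len(papers) >= limit:
                break
        # A missing meta block or a null next_cursor also ends the walk.
        cursor = (payload.get("meta") or {}).get("next_cursor")
    return papers
```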
#336)

* feat(citations): version groups to merge preprint + published citations

OpenAlex splits citations across a paper's preprint and published records, so
a canonical paper can undercount badly (LSL published=61 but its preprint
holds 98). Add a citations.aliases config map (primary DOI -> version DOIs);
the sync resolves every version to an OpenAlex work id and queries them as one
OR-joined, deduplicated cites: filter, attributing the merged per-year counts
to the primary DOI.

- CitationConfig.aliases (validated/normalized like dois).
- OpenAlexCitationClient.counts_by_year/recent_citing_papers accept a work-id
  group; _cites_filter OR-joins with '|'.
- sync_citing_papers builds the group from primary + aliases; CLI and
  scheduler pass community aliases.
- eeglab: LSL bioRxiv preprint; bids: BIDS Apps bioRxiv preprint.

LSL combined rises 61 -> 157.

* fix(citations): fail loud on misconfigured aliases

- Raise on an empty alias version DOI instead of silently dropping it.
- Add a model validator: every alias primary DOI must be in dois, so a
  typo'd primary fails at config load rather than silently never merging.
- _cites_filter raises on an empty work-id list (explicit precondition).
- Tests for both new validations plus the empty-filter guard.
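
A sketch of the two fail-loud checks using a plain dataclass in place of the project's config model (the PR mentions a model validator on `CitationConfig` but does not show it). `CitationConfigSketch` is a hypothetical name, normalization is reduced to strip-and-lowercase as an assumption, and the preprint DOI in the test is a placeholder.

```python
from dataclasses import dataclass, field


def _norm(doi: str) -> str:
    # Assumption: simplified normalization (the real one mirrors dois).
    return doi.strip().lower()


@dataclass
class CitationConfigSketch:
    dois: list[str]
    aliases: dict[str, list[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.dois = [_norm(d) for d in self.dois]
        normalized: dict[str, list[str]] = {}
        for primary, versions in self.aliases.items():
            key = _norm(primary)
            # A typo'd primary would otherwise never merge anything.
            if key not in self.dois:
                raise ValueError(f"alias primary {primary!r} is not listed in dois")
            cleaned = []
            for version in versions:
                # Raise instead of silently dropping an empty version DOI.
                if not version or not version.strip():
                    raise ValueError(f"empty version DOI under alias {primary!r}")
                cleaned.append(_norm(version))
            normalized[key] = cleaned
        self.aliases = normalized
```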

* style: ruff format community.py
@github-actions

Dashboard Preview

  • Preview URL: https://develop.osa-dash.pages.dev
  • Branch: develop
  • Commit: ba274a6

This preview will be updated automatically when you push new commits.

@neuromechanist neuromechanist merged commit 147c5b6 into main Jun 10, 2026
21 checks passed