Audience analytics — public-site usage telemetry (readers · reach · most-read) #329

@mmcky

Summary

Umbrella issue for a third plane of telemetry: audience analytics — how readers actually use our public sites (number of users, geography, and most-read resources), and the work to turn today's basic dashboard into a modern, interactive one that serves interested readers, grant providers, and authors.

This is the sibling to the two planes we already track. #321 tracks how each lecture series is configured (declarative, point-in-time, JSON-in-repo + Pages). #328 tracks how our builds perform and cost (ops time-series). This issue covers the third: how the sites are read. Detailed audit, research, and the working plan live in the (private) QuantEcon/project-analytics repo.

The three planes

| Plane | Question | Data | Source |
| --- | --- | --- | --- |
| Config / status (#321) | How is each series built? | declarative config | repo manifests + scrape |
| Server / build telemetry (#328) | How do builds perform and cost? | ops time-series | RunsOn / CloudWatch / Actions API |
| Audience analytics (this issue) | How do readers use the sites? | behavioural time-series | Google Analytics 4 (+ BigQuery) |

What the audit found

The current GA setup is fragmented and partly rotted:

  • There are ~13 separate GA4 properties (one per series + the website) with no native roll-up on the free tier, so we cannot answer "how many total readers does QuantEcon have?" inside GA4 today.
  • Some sites are tracking nothing: lecture-dp ships the placeholder G-XXXXXXXXXX, and lecture-econometrics-machine-learning and lecture-intro.zh-cn are commented out (see #121).
  • The Chinese sites reuse the English measurement IDs, so Chinese readership is not separable.
  • The live dashboard at quantecon.org/analytics-dashboard/ reports sessions mislabelled as users, over a trailing 6-month window, merged across all sites, and has no most-read-pages view at all, which is the single most-requested feature.
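The dead-tracker and shared-ID findings above are the kind of thing a small script can keep catching. The following is a minimal sketch, not the audit's actual tooling: it assumes each site's measurement ID has already been pulled out of its config into a plain mapping (the `audit_trackers` helper, the mapping, and every ID other than the placeholder `G-XXXXXXXXXX` are hypothetical), and it assumes GA4 IDs have the shape `G-` followed by uppercase letters and digits.

```python
import re
from collections import defaultdict

# Assumed shape of a GA4 measurement ID: "G-" plus uppercase letters/digits.
GA4_ID = re.compile(r"^G-[A-Z0-9]+$")
PLACEHOLDER = "G-XXXXXXXXXX"


def audit_trackers(configured_ids):
    """Classify each site's tracker.

    configured_ids maps a site name to its measurement ID, or to None when
    the tracker is absent or commented out. Returns a site -> status mapping
    where status is one of: ok, missing, placeholder, malformed, shared.
    """
    # Group sites by ID so reuse across sites (e.g. en + zh-cn) is visible.
    sites_by_id = defaultdict(list)
    for site, measurement_id in configured_ids.items():
        if measurement_id:
            sites_by_id[measurement_id].append(site)

    report = {}
    for site, measurement_id in configured_ids.items():
        if not measurement_id:
            report[site] = "missing"
        elif measurement_id == PLACEHOLDER:
            # Checked before the regex, because the placeholder matches it.
            report[site] = "placeholder"
        elif not GA4_ID.match(measurement_id):
            report[site] = "malformed"
        elif len(sites_by_id[measurement_id]) > 1:
            report[site] = "shared"
        else:
            report[site] = "ok"
    return report
```

Running this in CI against the collected IDs would flag a site such as lecture-dp as `placeholder` and any English/Chinese pair that reuses one ID as `shared`, before a broken tracker goes live.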

Two options (sequential, not exclusive)

Option 2 — first step (improve the current setup). Rides entirely on the existing update_plots.py → JSON → Plotly pipeline, no new infrastructure: fix sessions→users and the dead trackers, pin Plotly, add a most-read-resources table, add per-series breakdown + KPI tiles + engagement, bump the cron monthly→daily, and extend the history window. A few days of work that fixes the wrong numbers and adds the missing features on infrastructure we already own.
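As an illustration of the sessions→users fix and the most-read-resources table, here is a hedged sketch of the report request such a table could be built from. How update_plots.py actually queries GA is not shown in this issue, so this is not its code: `most_read_request` is a hypothetical helper, and the field names (`dateRanges`, `dimensions`, `metrics`, `orderBys`, `limit`) and the metric and dimension names (`screenPageViews`, `totalUsers`, `pagePath`) follow the GA4 Data API `runReport` request shape as I understand it.

```python
def most_read_request(start_date, end_date, limit=25):
    """Build a GA4 Data API runReport body for a most-read-pages table.

    Pages are ranked by views, and the user count is requested explicitly
    so the dashboard labels users as users rather than reusing sessions.
    """
    return {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "dimensions": [{"name": "pagePath"}],
        "metrics": [
            {"name": "screenPageViews"},
            {"name": "totalUsers"},  # users, not sessions
        ],
        # Most-read first.
        "orderBys": [
            {"metric": {"metricName": "screenPageViews"}, "desc": True}
        ],
        "limit": limit,
    }


# Example: top 10 pages over the last year (relative dates are an
# assumption about what the dashboard would want, not a stated requirement).
body = most_read_request("365daysAgo", "yesterday", limit=10)
```

Because there is one GA4 property per series, the same body would be sent once per property and the results tagged with the series name, which also gives the per-series breakdown. Whether the headline metric should be `totalUsers` or `activeUsers` is left open in this issue; the sketch picks `totalUsers` only for illustration.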

Option 1 — modern analytics (the leading-edge target). The 2025–2026 reference architecture for a fully static yet genuinely interactive dashboard: GA4 → BigQuery (single source of truth) → pre-aggregated Parquet artifacts → DuckDB-WASM + Observable Framework / Evidence on GitHub Pages, with ECharts maps and per-series cross-filtering. No server, ~$0 hosting, fully GitHub-native. One site, three lenses: global reach for readers, multi-year growth and reach for grant providers, per-lecture most/least-read for authors.
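The pre-aggregation step in that pipeline can be sketched in a few lines. This is an illustration under stated assumptions, not the planned implementation: `event_name` and `user_pseudo_id` are column names from the GA4 BigQuery export, while `series` and `page` are hypothetical fields assumed to have been derived upstream, and `summarise_page_views` is a made-up helper. Writing the result to Parquet is omitted so the sketch stays dependency-free.

```python
from collections import defaultdict


def summarise_page_views(events):
    """Roll raw page_view events up to one summary row per (series, page).

    Each event is a dict with event_name, series, page and user_pseudo_id.
    The output is small enough to publish as a static artifact and query
    in the browser.
    """
    views = defaultdict(int)
    users = defaultdict(set)
    for event in events:
        if event["event_name"] != "page_view":
            continue  # ignore scrolls, clicks, and other event types
        key = (event["series"], event["page"])
        views[key] += 1
        users[key].add(event["user_pseudo_id"])

    rows = [
        {
            "series": series,
            "page": page,
            "views": views[(series, page)],
            "users": len(users[(series, page)]),
        }
        for (series, page) in views
    ]
    # Most-read first; ties broken by series and page for a stable order.
    rows.sort(key=lambda r: (-r["views"], r["series"], r["page"]))
    return rows
```

One design point worth noting: per-page user counts are distinct counts, so they cannot simply be summed to get a per-series or org-wide total. Any roll-up of users has to be computed from the raw events at the level it will be displayed.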

Every Option 2 step is a foothold for Option 1 (BigQuery export is the warehouse foundation; the most-read queries become the summary-table logic; the KPI tiles re-skin into the modern front-end). So we ship Option 2 now for an immediate correct-and-useful win while BigQuery accumulates history, then swap the front-end to the modern stack.

The one time-sensitive action

GA4 → BigQuery export is forward-only — it cannot backfill raw events. Every week it stays off is history we can never recover. So we should enable it across all properties now (after fixing the dead trackers), independent of which dashboard we build. It is free at the platform level within the BigQuery free tier; we only need to confirm the busiest properties stay under the 1M-events/day batch cap.
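The headroom check mentioned above is simple arithmetic once daily event counts are available. A minimal sketch, assuming the counts have already been collected per property (the `export_headroom` helper, the property names, and the numbers in the usage note are all hypothetical; only the 1M-events/day figure comes from this issue):

```python
DAILY_EXPORT_CAP = 1_000_000  # events/day batch cap cited in this issue


def export_headroom(daily_events, cap=DAILY_EXPORT_CAP):
    """Compare each property's busiest day against the daily export cap.

    daily_events maps a property name to a list of daily event counts.
    Returns, per property, the peak day, the share of the cap it uses,
    and whether it fits within the cap.
    """
    report = {}
    for prop, counts in daily_events.items():
        peak = max(counts, default=0)  # a property with no data has peak 0
        report[prop] = {
            "peak": peak,
            "share_of_cap": peak / cap,
            "within_cap": peak <= cap,
        }
    return report
```

For example, a property whose busiest day had 250,000 events would report a `share_of_cap` of 0.25 and `within_cap` of `True`. The check uses the peak day rather than the average because a single day over the cap is enough to matter.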

Governance

Whether to consolidate the ~13 GA4 properties, whether to add a cookieless collector (PostHog / Umami), whether to split the zh-cn traffic, and how to handle the EU consent-banner posture are all cross-repo, org-wide, hard-to-reverse decisions. They are tracked in the "Decision:" issue QuantEcon/project-analytics#1 and are a candidate for the QEP process (#325). The dashboard build itself can proceed as ordinary implementation issues.

Workstreams

  1. Collection foundation and cleanup — enable BigQuery export everywhere (urgent); fix dead/commented-out trackers (#121, "[lecture reorganisation] Update Google Analytics Codes when going live"); GCP project + billing; confirm the 1M-events/day headroom.
  2. Option 2 — fix and enrich the current dashboard (website-dynamic + website); folds in website#147 ("[user info] Integrate aggregated map and User / Month Data"), website#196 ("Add prominent usage metrics to homepage metrics bar"), and QuantEcon/website-dynamic#6.
  3. Option 1 — modern dashboard rebuild; folds in QuantEcon/website-dynamic#3.
  4. Collector and consolidation strategy — the "Decision:" issue above (QuantEcon/project-analytics#1).

Open questions

  • GA4 property creation dates per series (bounds the aggregate backfill depth).
  • Daily event volume of the busiest properties vs the 1M-events/day BigQuery batch cap.
  • Owner of the GCP billing account and the analytics service account long-term.
  • totalUsers vs activeUsers as the org-wide headline definition.
  • Add a cookieless collector now, or stay GA4-only and add a consent banner?
