Summary
Umbrella issue for a third plane of telemetry: audience analytics — how readers actually use our public sites (number of users, geography, and most-read resources), and the work to turn today's basic dashboard into a modern, interactive one that serves interested readers, grant providers, and authors.
This is the sibling to the two planes we already track. #321 tracks how each lecture series is configured (declarative, point-in-time, JSON-in-repo + Pages). #328 tracks how our builds perform and cost (ops time-series). This issue covers the third: how the sites are read. Detailed audit, research, and the working plan live in the (private) QuantEcon/project-analytics repo.
The three planes
What the audit found
The current GA setup is fragmented and partly rotted. There are ~13 separate GA4 properties (one per series + the website) with no native roll-up on the free tier, so we cannot answer "how many total readers does QuantEcon have?" inside GA4 today. Some sites are tracking nothing: lecture-dp ships the placeholder G-XXXXXXXXXX, and lecture-econometrics-machine-learning and lecture-intro.zh-cn are commented out (see #121). The Chinese sites reuse the English measurement IDs, so Chinese readership is not separable. The live dashboard at quantecon.org/analytics-dashboard/ reports sessions mislabelled as users, over a trailing 6-month window, merged across all sites, with no most-read-pages view at all, which is the single most-requested feature.
Two options (sequential, not exclusive)
Option 2: first step (improve the current setup). Rides entirely on the existing update_plots.py → JSON → Plotly pipeline, no new infrastructure: fix sessions→users and the dead trackers, pin Plotly, add a most-read-resources table, add per-series breakdown + KPI tiles + engagement, bump the cron monthly→daily, and extend the history window. A few days of work that fixes the wrong numbers and adds the missing features on infrastructure we already own.
Option 1: modern analytics (the leading-edge target). The 2025–2026 reference architecture for a fully static yet genuinely interactive dashboard: GA4 → BigQuery (single source of truth) → pre-aggregated Parquet artifacts → DuckDB-WASM + Observable Framework / Evidence on GitHub Pages, with ECharts maps and per-series cross-filtering. No server, ~$0 hosting, fully GitHub-native. One site, three lenses: global reach for readers, multi-year growth and reach for grant providers, per-lecture most/least-read for authors.
Every Option 2 step is a foothold for Option 1 (BigQuery export is the warehouse foundation; the most-read queries become the summary-table logic; the KPI tiles re-skin into the modern front-end). So we ship Option 2 now for an immediate correct-and-useful win while BigQuery accumulates history, then swap the front-end to the modern stack.
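As a rough illustration of two of the Option 2 fixes (counting users rather than sessions, and building a most-read-resources table with a per-series breakdown), here is a minimal sketch in plain Python. The row layout, the sample data, and the function names `per_series_kpis` and `most_read` are hypothetical and are not taken from update_plots.py; real rows would come from whatever GA4 export the pipeline already reads.

```python
from collections import Counter

# Hypothetical, simplified page-view rows: (series, user_id, session_id, page_path).
# These values are made up for illustration only.
ROWS = [
    ("lecture-python", "u1", "s1", "/intro.html"),
    ("lecture-python", "u1", "s2", "/intro.html"),
    ("lecture-python", "u2", "s3", "/kalman.html"),
    ("lecture-jax", "u3", "s4", "/intro.html"),
    ("lecture-jax", "u3", "s4", "/newton.html"),
]


def per_series_kpis(rows):
    """Count distinct users and distinct sessions per series, as separate KPIs."""
    users, sessions = {}, {}
    for series, user_id, session_id, _path in rows:
        users.setdefault(series, set()).add(user_id)
        sessions.setdefault(series, set()).add(session_id)
    return {
        series: {"users": len(users[series]), "sessions": len(sessions[series])}
        for series in users
    }


def most_read(rows, top_n=10):
    """Return the top_n (series, page_path) pairs by page views, ties broken by name."""
    views = Counter((series, path) for series, _user, _session, path in rows)
    ranked = sorted(views.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    print(per_series_kpis(ROWS))
    for (series, path), count in most_read(ROWS, top_n=3):
        print(f"{series}{path}: {count} views")
```

In the sample data, one reader with two sessions counts as one user and two sessions, which is the distinction the current dashboard blurs.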
The one time-sensitive action
GA4 → BigQuery export is forward-only — it cannot backfill raw events. Every week it stays off is history we can never recover. So we should enable it across all properties now (after fixing the dead trackers), independent of which dashboard we build. It is free at the platform level within the BigQuery free tier; we only need to confirm the busiest properties stay under the 1M-events/day batch cap.
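The cap check mentioned above could be sketched as follows. This assumes we already have per-property daily event counts from some source (for example, copied out of the GA4 UI); the property names, the counts, the `headroom` parameter, and the function name `properties_near_cap` are all hypothetical. The only figure taken from this issue is the 1M-events/day cap.

```python
DAILY_EXPORT_CAP = 1_000_000  # events/day batch cap quoted in this issue


def properties_near_cap(daily_events, cap=DAILY_EXPORT_CAP, headroom=0.8):
    """Return {property: peak_daily_events} where the peak exceeds headroom * cap.

    daily_events maps a property name to a list of daily event counts.
    headroom is a hypothetical safety margin, not a GA4 or BigQuery setting.
    """
    threshold = cap * headroom
    flagged = {}
    for prop, counts in daily_events.items():
        peak = max(counts, default=0)
        if peak > threshold:
            flagged[prop] = peak
    return flagged


if __name__ == "__main__":
    # Made-up counts for illustration only.
    sample = {
        "busy-series": [120_000, 850_000, 300_000],
        "quiet-series": [4_000, 6_500],
    }
    print(properties_near_cap(sample))
```

Flagging at a margin below the cap rather than at the cap itself leaves room for traffic spikes; the 0.8 default is an arbitrary choice.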
Governance
Whether to consolidate the ~13 GA4 properties, add a cookieless collector (PostHog / Umami), split the zh-cn traffic, and how to handle the EU consent-banner posture are cross-repo, org-wide, hard-to-reverse decisions. These are tracked as the Decision: issue QuantEcon/project-analytics#1 and are a candidate for the QEP process (#325). The dashboard build itself is just normal implementation issues.
Workstreams
website-dynamic + website); folds in [user info] Integrate aggregated map and User / Month Data website#147, Add prominent usage metrics to homepage metrics bar website#196, QuantEcon/website-dynamic#6.
Decision: issue above.
Open questions
totalUsers vs activeUsers as the org-wide headline definition.
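To compare the two candidate definitions side by side, one option is to request both metrics in a single report. The sketch below only builds a request body in the shape the GA4 Data API `runReport` method accepts. It assumes the numbers are pulled through that API, which this issue does not state, and the function name and default date range are hypothetical. Sending the request (authentication, property ID) is deliberately left out.

```python
import json


def headline_users_request(start_date="180daysAgo", end_date="yesterday"):
    """Build a runReport-style body asking for totalUsers and activeUsers together.

    The default date range is an illustrative choice, not a project decision.
    """
    return {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "metrics": [{"name": "totalUsers"}, {"name": "activeUsers"}],
    }


if __name__ == "__main__":
    print(json.dumps(headline_users_request(), indent=2))
```

Running the same body against each property would show how far the two metrics diverge per series before one is fixed as the headline number.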