Audience analytics: public-site usage telemetry (readers · reach · most-read) #329

## Summary

Umbrella issue for a third plane of telemetry: **audience analytics** — how readers actually use our public sites (number of users, geography, and most-read resources), and the work to turn today's basic dashboard into a modern, interactive one that serves interested readers, grant providers, and authors.

This is the sibling to the two planes we already track. #321 tracks **how each lecture series is configured** (declarative, point-in-time, JSON-in-repo + Pages). #328 tracks **how our builds perform and cost** (ops time-series). This issue covers the third: **how the sites are read**. Detailed audit, research, and the working plan live in the (private) [QuantEcon/project-analytics](https://github.com/QuantEcon/project-analytics) repo.

## The three planes

| Plane | Question | Data | Source |
|---|---|---|---|
| Config / status (#321) | How is each series *built*? | declarative config | repo manifests + scrape |
| Server / build telemetry (#328) | How do builds *perform and cost*? | ops time-series | RunsOn / CloudWatch / Actions API |
| **Audience analytics (this issue)** | How do readers *use* the sites? | behavioural time-series | Google Analytics 4 (+ BigQuery) |

## What the audit found

The current GA setup is fragmented and partly rotted. There are ~13 separate GA4 properties (one per series + the website) with **no native roll-up** on the free tier, so we cannot answer "how many total readers does QuantEcon have?" inside GA4 today. Some sites are tracking nothing: `lecture-dp` ships the placeholder `G-XXXXXXXXXX`, and `lecture-econometrics-machine-learning` and `lecture-intro.zh-cn` are commented out (see #121). The Chinese sites reuse the English measurement IDs, so Chinese readership is not separable. The live dashboard at `quantecon.org/analytics-dashboard/` reports `sessions` mislabelled as users, over a trailing 6-month window, merged across all sites, with **no most-read-pages view at all** — the single most-requested feature.
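The dead-tracker findings above could be re-checked mechanically across repos. The following is a minimal sketch, not the audit's actual tooling: it assumes each site keeps its measurement ID in a text config where `#` starts a comment, and the key name `google_analytics_id` in the usage lines is a hypothetical stand-in for whatever each repo really uses.

```python
import re

# Assumed shape of a GA4 measurement ID: "G-" followed by upper-case letters/digits.
MEASUREMENT_ID = re.compile(r"\bG-[A-Z0-9]{6,}\b")
PLACEHOLDER = "G-XXXXXXXXXX"  # the placeholder value reported for lecture-dp


def classify_tracker(config_text: str) -> str:
    """Return 'live', 'placeholder', 'commented_out', or 'missing' for one config file."""
    saw_placeholder = False
    saw_commented = False
    for line in config_text.splitlines():
        match = MEASUREMENT_ID.search(line)
        if match is None:
            continue
        # Treat the ID as disabled when a '#' comment marker precedes it on the line.
        if "#" in line[: match.start()]:
            saw_commented = True
        elif match.group(0) == PLACEHOLDER:
            saw_placeholder = True
        else:
            return "live"  # at least one real, uncommented ID
    if saw_placeholder:
        return "placeholder"
    if saw_commented:
        return "commented_out"
    return "missing"


# Illustrative inputs only; the key name is an assumption.
print(classify_tracker("google_analytics_id: G-XXXXXXXXXX"))
print(classify_tracker("# google_analytics_id: G-ABC123DEF4"))
```

A check like this would not detect the shared English/Chinese measurement IDs; that needs a comparison of IDs across repos rather than a per-file scan.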

## Two options (sequential, not exclusive)

**Option 2 — first step (improve the current setup).** Rides entirely on the existing `update_plots.py` → JSON → Plotly pipeline, no new infrastructure: fix `sessions`→users and the dead trackers, pin Plotly, add a most-read-resources table, add per-series breakdown + KPI tiles + engagement, bump the cron monthly→daily, and extend the history window. A few days of work that fixes the wrong numbers and adds the missing features on infrastructure we already own.
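As a rough illustration of the `sessions`→users fix and the most-read table, the sketch below builds a request body for the GA4 Data API `runReport` method and flattens the response rows into a table. How `update_plots.py` actually calls the API is not shown in this issue, so the function names and the sample response are assumptions; only the request and response field names (`dateRanges`, `dimensions`, `metrics`, `orderBys`, `dimensionValues`, `metricValues`) follow the Data API.

```python
def most_read_request(start_date: str, end_date: str, limit: int = 20) -> dict:
    """Build a runReport body: top pages by views, with users rather than sessions."""
    return {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "dimensions": [{"name": "pagePath"}],
        # totalUsers is used here; the totalUsers-vs-activeUsers choice is still open.
        "metrics": [{"name": "screenPageViews"}, {"name": "totalUsers"}],
        "orderBys": [{"metric": {"metricName": "screenPageViews"}, "desc": True}],
        "limit": limit,
    }


def rows_to_table(response: dict) -> list:
    """Flatten runReport rows into dicts; the API returns metric values as strings."""
    table = []
    for row in response.get("rows", []):
        table.append(
            {
                "page": row["dimensionValues"][0]["value"],
                "views": int(row["metricValues"][0]["value"]),
                "users": int(row["metricValues"][1]["value"]),
            }
        )
    return table


# Made-up response in the runReport shape, for illustration only.
sample = {
    "rows": [
        {
            "dimensionValues": [{"value": "/intro.html"}],
            "metricValues": [{"value": "120"}, {"value": "45"}],
        }
    ]
}
print(rows_to_table(sample))
```

Because each series has its own property, a request like this would run once per property, which also yields the per-series breakdown.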

**Option 1 — modern analytics (the leading-edge target).** The 2025–2026 reference architecture for a fully static yet genuinely interactive dashboard: GA4 → BigQuery (single source of truth) → pre-aggregated Parquet artifacts → **DuckDB-WASM + Observable Framework / Evidence** on GitHub Pages, with ECharts maps and per-series cross-filtering. No server, ~$0 hosting, fully GitHub-native. One site, three lenses: global reach for readers, multi-year growth and reach for grant providers, per-lecture most/least-read for authors.

Every Option 2 step is a foothold for Option 1 (BigQuery export is the warehouse foundation; the most-read queries become the summary-table logic; the KPI tiles re-skin into the modern front-end). So we ship Option 2 now for an immediate correct-and-useful win while BigQuery accumulates history, then swap the front-end to the modern stack.
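To make the "most-read queries become the summary-table logic" step concrete, here is a hedged sketch of a helper that renders a most-read query against the GA4 BigQuery export. It relies on the standard export layout (`analytics_<property_id>` dataset, date-sharded `events_*` tables, `event_params` as a repeated key/value record); the function name, project ID, and property ID are hypothetical, and the real summary tables may well be shaped differently.

```python
import re


def most_read_sql(project: str, property_id: str, start: str, end: str, limit: int = 20) -> str:
    """Render a most-read-pages query for one property's GA4 export dataset."""
    # Validate inputs because they are interpolated into the SQL text.
    if not re.fullmatch(r"[a-z][a-z0-9-]*", project):
        raise ValueError(f"unexpected project id: {project!r}")
    if not property_id.isdigit():
        raise ValueError(f"unexpected property id: {property_id!r}")
    for suffix in (start, end):
        if not (len(suffix) == 8 and suffix.isdigit()):
            raise ValueError(f"dates must be YYYYMMDD, got {suffix!r}")
    table = f"`{project}.analytics_{property_id}.events_*`"
    return f"""
SELECT
  (SELECT value.string_value FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location,
  COUNT(*) AS views,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM {table}
WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'
  AND event_name = 'page_view'
GROUP BY page_location
ORDER BY views DESC
LIMIT {int(limit)}
""".strip()


# Hypothetical project and property IDs, for illustration only.
print(most_read_sql("example-project", "123456789", "20250101", "20250131"))
```

The result of a query like this is what would be written out as a pre-aggregated artifact, so the static front-end never has to touch raw events.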

## The one time-sensitive action

GA4 → BigQuery export is **forward-only** — it cannot backfill raw events. Every week it stays off is history we can never recover. So we should enable it across all properties now (after fixing the dead trackers), independent of which dashboard we build. It is free at the platform level within the BigQuery free tier; we only need to confirm the busiest properties stay under the 1M-events/day batch cap.
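The headroom check against the 1M-events/day cap is simple arithmetic once daily event counts are known. A minimal sketch, assuming peak daily event counts per property have already been pulled from GA4; the property names, the numbers, and the 80% warning threshold are all made up for illustration.

```python
DAILY_EXPORT_CAP = 1_000_000  # events/day batch cap cited above


def export_headroom(daily_events: dict, cap: int = DAILY_EXPORT_CAP, warn_ratio: float = 0.8) -> dict:
    """Label each property 'ok', 'near' (at or above warn_ratio of the cap), or 'over'."""
    status = {}
    for prop, events in daily_events.items():
        if events > cap:
            status[prop] = "over"
        elif events >= warn_ratio * cap:
            status[prop] = "near"
        else:
            status[prop] = "ok"
    return status


# Hypothetical peak daily event counts per property.
peaks = {"series-a": 40_000, "series-b": 850_000, "series-c": 1_200_000}
print(export_headroom(peaks))
```

Feeding in the busiest day over the lookback period, rather than the average, gives the more conservative answer.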

## Governance

Whether to **consolidate the ~13 GA4 properties**, whether to **add a cookieless collector** (PostHog / Umami), whether to split the `zh-cn` traffic, and how to handle the EU consent-banner posture are all cross-repo, org-wide, hard-to-reverse decisions. They are tracked in the `Decision:` issue [QuantEcon/project-analytics#1](https://github.com/QuantEcon/project-analytics/issues/1) and are a candidate for the QEP process (#325). The dashboard build itself needs no such process and is tracked as ordinary implementation issues.

## Workstreams

1. Collection foundation and cleanup — enable BigQuery export everywhere (urgent); fix dead/commented trackers (#121); GCP project + billing; confirm the 1M-events/day headroom.
2. Option 2 — fix and enrich the current dashboard (`website-dynamic` + `website`); folds in QuantEcon/website#147, QuantEcon/website#196, QuantEcon/website-dynamic#6.
3. Option 1 — modern dashboard rebuild; folds in QuantEcon/website-dynamic#3.
4. Collector and consolidation strategy — the `Decision:` issue above.

## Open questions

- GA4 property creation dates per series (bounds the aggregate backfill depth).
- Daily event volume of the busiest properties vs the 1M-events/day BigQuery batch cap.
- Owner of the GCP billing account and the analytics service account long-term.
- `totalUsers` vs `activeUsers` as the org-wide headline definition.
- Add a cookieless collector now, or stay GA4-only and add a consent banner?

