diff --git a/docs/data-analysis/data-topics/querying-experiment-data.md b/docs/data-analysis/data-topics/querying-experiment-data.md new file mode 100644 index 000000000..76efa7c09 --- /dev/null +++ b/docs/data-analysis/data-topics/querying-experiment-data.md @@ -0,0 +1,236 @@ +--- +id: querying-experiment-data +title: Querying Experiment Data in BigQuery +slug: /data-analysis/data-topics/querying-experiment-data +--- + +This guide provides SQL patterns for two common tasks: querying Jetstream's pre-computed results tables, and querying raw telemetry for clients enrolled in your experiment. + +## Querying Jetstream Results + +Jetstream writes statistical results and per-client enrollment data to BigQuery in the `mozanalysis` dataset within the `moz-fx-data-experiments` project. + +### Table Naming + +Experiment slug hyphens are converted to underscores. For an experiment with slug `my-experiment-slug`: + +| Table Pattern | Example | Contents | +|---|---|---| +| `statistics_{slug}_{period}` | `statistics_my_experiment_slug_weekly` | Statistical comparisons (point estimates, CIs) | +| `statistics_{slug}_day_{N}` | `statistics_my_experiment_slug_day_7` | Per-day statistical snapshots | +| `{slug}_enrollments_{period}` | `my_experiment_slug_enrollments_weekly` | Per-client raw metric data (enrollment basis) | +| `{slug}_exposures_{period}` | `my_experiment_slug_exposures_weekly` | Per-client raw metric data (exposure basis) | +| `enrollments_{slug}` | `enrollments_my_experiment_slug` | Base enrollment table | + +Available periods: `daily`, `weekly`, `overall_1`, `day_{N}`, `week_{N}`, `preenrollment_week_1`, `preenrollment_days28_1` + +### Statistics Table Schema + +The `statistics_*` tables contain the computed results that appear on the Experimenter results page: + +| Column | Description | +|--------|-------------| +| `metric` | Metric name (e.g., `active_hours`, `days_of_use`, `retained`) | +| `statistic` | Statistical method (`binomial`, `mean`, `deciles`, etc.) 
| +| `branch` | Branch name | +| `comparison` | `NULL` (absolute), `difference`, or `relative_uplift` | +| `comparison_to_branch` | Which branch is the baseline for this comparison | +| `point` | Point estimate | +| `lower` | Lower bound of confidence interval | +| `upper` | Upper bound of confidence interval | +| `ci_width` | Confidence interval width (typically 0.95) | +| `segment` | Segment name (`all` for the full population) | +| `analysis_basis` | `enrollments` or `exposures` | +| `window_index` | Analysis window index (for daily/weekly periods) | + +### Example: Pull Overall Results for a Metric + +```sql +SELECT + metric, + branch, + comparison, + comparison_to_branch, + point, + lower, + upper, + segment +FROM `moz-fx-data-experiments.mozanalysis.statistics_my_experiment_slug_overall_1` +WHERE metric = 'active_hours' + AND statistic = 'mean' + AND segment = 'all' +ORDER BY branch, comparison +``` + +### Example: Check Weekly Retention Across Windows + +```sql +SELECT + window_index, + branch, + point, + lower, + upper +FROM `moz-fx-data-experiments.mozanalysis.statistics_my_experiment_slug_weekly` +WHERE metric = 'retained' + AND statistic = 'binomial' + AND comparison = 'relative_uplift' + AND comparison_to_branch = 'control' + AND segment = 'all' +ORDER BY window_index, branch +``` + +### Example: List All Result Tables for an Experiment + +```sql +SELECT table_name +FROM `moz-fx-data-experiments.mozanalysis`.INFORMATION_SCHEMA.TABLES +WHERE table_name LIKE '%my_experiment_slug%' +ORDER BY table_name +``` + +## Querying Raw Telemetry for Experiment Users + +For ad-hoc analysis beyond what Jetstream computes, you can query raw telemetry tables and filter to clients enrolled in your experiment. + +### Filtering by Experiment Enrollment + +All Glean ping tables include a `ping_info.experiments` field — a map of experiment slug to branch assignment. 
Use this to filter telemetry to enrolled clients: + +```sql +SELECT * +FROM `mozdata.firefox_desktop.newtab` +WHERE DATE(submission_timestamp) = '2025-01-15' + AND ping_info.experiments['my-experiment-slug'].branch IS NOT NULL +``` + +To filter to a specific branch: + +```sql +WHERE ping_info.experiments['my-experiment-slug'].branch = 'treatment' +``` + +:::info +The `events_stream` derived table also has an `experiments` column with the same structure, accessible as `experiments['slug'].branch`. +::: + +### Example: Scalar Metric for Enrolled Users + +Query a boolean metric for clients in your experiment. `newtab.search_enabled` is sent in the `newtab` ping, so it is read from the `newtab` ping table: + +```sql +SELECT + ping_info.experiments['my-experiment-slug'].branch AS branch, + COUNTIF(metrics.boolean.newtab_search_enabled) AS search_enabled_count, + COUNT(*) AS total_pings +FROM `mozdata.firefox_desktop.newtab` +WHERE DATE(submission_timestamp) BETWEEN '2025-01-15' AND '2025-01-22' + AND ping_info.experiments['my-experiment-slug'].branch IS NOT NULL +GROUP BY 1 +``` + +### Example: Event from a Custom Ping (UNNEST Pattern) + +Events in custom pings live in that table's `events` array. 
Unnest to access them: + +```sql +SELECT + ping_info.experiments['my-experiment-slug'].branch AS branch, + e.name AS event_name, + (SELECT value FROM UNNEST(e.extra) WHERE key = 'is_sponsored') AS is_sponsored, + COUNT(*) AS event_count +FROM `mozdata.firefox_desktop.newtab`, +UNNEST(events) AS e +WHERE DATE(submission_timestamp) BETWEEN '2025-01-15' AND '2025-01-22' + AND ping_info.experiments['my-experiment-slug'].branch IS NOT NULL + AND e.category = 'pocket' + AND e.name = 'click' +GROUP BY 1, 2, 3 +``` + +### Example: Event from `events_stream` + +Events from the built-in `events` ping are pre-unnested in `events_stream`: + +```sql +SELECT + JSON_EXTRACT_SCALAR(event_extra, '$.experiment') AS experiment, + JSON_EXTRACT_SCALAR(event_extra, '$.branch') AS branch, + event_name, + COUNT(*) AS event_count +FROM `mozdata.firefox_desktop.events_stream` +WHERE DATE(submission_timestamp) BETWEEN '2025-01-15' AND '2025-01-22' + AND event_category = 'nimbus_events' + AND event_name = 'exposure' +GROUP BY 1, 2, 3 +``` + +:::caution +`events_stream` only contains events from the built-in `events` ping. If the event you're looking for is sent to a custom ping (like `newtab`), you need to query that ping's table using the UNNEST pattern above. Check the metric's `send_in_pings` field to know which table to query. See [Finding Telemetry in BigQuery](/data-analysis/data-topics/telemetry-discovery) for details. +::: + +## Common Patterns + +### Date Partitioning (Required) + +All BigQuery telemetry tables are partitioned by `submission_timestamp`. Every query **must** include a date filter for cost control: + +```sql +WHERE DATE(submission_timestamp) = '2025-01-15' +-- or for a range: +WHERE DATE(submission_timestamp) BETWEEN '2025-01-15' AND '2025-01-22' +``` + +### Extracting Values from Event Extras + +Event extra data is stored as key-value pairs. 
The syntax differs between custom ping events and `events_stream`: + +```sql +-- In custom ping tables (e.extra is an ARRAY of key/value STRUCTs): +(SELECT value FROM UNNEST(e.extra) WHERE key = 'my_key') AS my_value + +-- In events_stream (event_extra is JSON): +JSON_EXTRACT_SCALAR(event_extra, '$.my_key') AS my_value +``` + +### Using `sample_id` for Cheaper Dev Queries + +Every table has a `sample_id` column (0–99) derived from the client_id hash. Use it to run queries on a fraction of data while developing: + +```sql +WHERE DATE(submission_timestamp) = '2025-01-15' + AND sample_id = 0 -- ~1% of clients +``` + +### Filtering by Channel, Version, or OS + +```sql +WHERE normalized_channel = 'release' + AND client_info.app_display_version LIKE '134%' + AND normalized_os = 'Windows' +``` + +### Live Tables for Same-Day Data + +Each ping has a live table with streaming latency (seconds) but only 30-day retention. Useful for monitoring experiments in real time: + +```sql +SELECT COUNT(*) AS events_last_hour +FROM `mozdata.firefox_desktop.newtab_live`, +UNNEST(events) AS e +WHERE submission_timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) + AND e.category = 'pocket' + AND e.name = 'click' +``` + +:::info +Live tables have flattened column names: `client_id` instead of `client_info.client_id`. Not all columns from the stable view are available. +::: + +## Tips and Pitfalls + +- **Always include a date partition filter.** Queries without one scan the entire table history and can be very expensive. +- **`client_info.client_id` vs `client_id`**: In most stable tables, client_id is nested under `client_info`. In `events_stream` and live tables, it's a top-level column. Check the table schema if you get a column-not-found error. +- **`events_stream` only has `events` ping events.** This is the most common mistake: looking for custom-ping events in `events_stream` and getting zero results. Check `send_in_pings` in the metric definition. 
+- **Slug hyphens become underscores in table names.** `my-experiment` becomes `statistics_my_experiment_overall_1`. +- **Use `mozdata` views, not `_stable` tables.** The `mozdata.*` views add useful computed columns and normalize metadata. Only use `_stable` tables if you need raw data. diff --git a/docs/data-analysis/data-topics/telemetry-discovery.md b/docs/data-analysis/data-topics/telemetry-discovery.md new file mode 100644 index 000000000..01301ea20 --- /dev/null +++ b/docs/data-analysis/data-topics/telemetry-discovery.md @@ -0,0 +1,236 @@ +--- +id: telemetry-discovery +title: Finding Telemetry in BigQuery +slug: /data-analysis/data-topics/telemetry-discovery +--- + +When you need to measure something in an experiment, the first challenge is finding the right BigQuery table and column for the telemetry you care about. This guide explains how Firefox telemetry flows into BigQuery and how to trace any metric from its source definition to a queryable column. + +## How Telemetry Flows to BigQuery + +Firefox telemetry goes through a multi-stage pipeline before it reaches BigQuery: + +``` +metrics.yaml → Glean SDK → Pings → Ingestion Pipeline → BigQuery Tables +``` + +1. **`metrics.yaml`** files in the application source define what gets recorded (metric name, type, which ping carries it) +2. **`pings.yaml`** files define the ping types that carry metrics to the server +3. The **Glean SDK** serializes metrics into JSON payloads and sends them as pings +4. The **ingestion pipeline** validates payloads and writes them into BigQuery +5. **bigquery-etl** generates user-facing views and derived tables (like `events_stream` and `clients_daily`) + +## How Pings Become Tables + +Each ping type defined in `pings.yaml` becomes its own BigQuery table. The app ID determines the dataset name, and the ping name determines the table name within that dataset. 
+ +For Firefox Desktop (`firefox-desktop`): + +- **Stable tables:** `moz-fx-data-shared-prod.firefox_desktop_stable.<ping_name>_v1` +- **User-facing views:** `mozdata.firefox_desktop.<ping_name>` + +Ping names undergo hyphen-to-underscore conversion: `newtab-content` becomes `newtab_content`. + +### Built-in Glean Pings + +Every Glean application automatically gets these pings: + +| Ping | BigQuery Table | What It Carries | When Sent | +|------|----------------|-----------------|-----------| +| `baseline` | `<dataset>.baseline` | Library-managed metrics (duration, etc.) | App becomes active/inactive | +| `metrics` | `<dataset>.metrics` | All non-event metrics with default `send_in_pings` | Daily at 4 AM | +| `events` | `<dataset>.events` / `events_stream` | All event-type metrics with default `send_in_pings` | App inactive or 500 events batched | +| `deletion-request` | `<dataset>.deletion_request` | Signals: delete user data | User opts out of telemetry | + +### Custom Pings + +Applications can define additional custom pings. For example, Firefox Desktop defines ~33 custom pings including: + +| Custom Ping | BigQuery Table | Purpose | +|-------------|----------------|---------| +| `newtab` | `firefox_desktop.newtab` | Per-session New Tab instrumentation | +| `top-sites` | `firefox_desktop.top_sites` | Top Sites events (no client_id) | +| `urlbar-events` | `firefox_desktop.urlbar_events` | Address bar interaction events | + +You can find all custom pings for an app in the [Glean Dictionary](https://dictionary.telemetry.mozilla.org/). + +## How Metrics Become Columns + +The `send_in_pings` field in a metric's `metrics.yaml` definition determines which table the metric lands in. 
The column name follows a formula: + +``` +metrics.<type>.<category>_<name> +``` + +Where: + +- **`<type>`** is the Glean type: `boolean`, `string`, `counter`, `quantity`, `labeled_counter`, `timing_distribution`, `string_list`, `uuid`, `url`, `text`, `object` +- **`<category>`** is the YAML category key (e.g., `newtab`, `pocket`, `topsites`) +- **`<name>`** is the YAML metric key within the category +- Dots in category names become underscores: `newtab.search` becomes `newtab_search` + +### Examples + +Given this `metrics.yaml` definition: + +```yaml +newtab: + locale: + type: string + send_in_pings: + - newtab +``` + +The resulting BigQuery column is `metrics.string.newtab_locale` in the `firefox_desktop.newtab` table. + +Here are more examples showing how the mapping works: + +| metrics.yaml Definition | send_in_pings | BigQuery Table | BigQuery Column | +|---|---|---|---| +| `newtab.locale` (string) | `newtab` | `firefox_desktop.newtab` | `metrics.string.newtab_locale` | +| `newtab.search_enabled` (boolean) | `newtab` | `firefox_desktop.newtab` | `metrics.boolean.newtab_search_enabled` | +| `pocket.enabled` (boolean) | `newtab` | `firefox_desktop.newtab` | `metrics.boolean.pocket_enabled` | +| `topsites.rows` (quantity) | `newtab` | `firefox_desktop.newtab` | `metrics.quantity.topsites_rows` | +| `newtab.activity_stream_ctor_success` (boolean) | *(default)* | `firefox_desktop.metrics` | `metrics.boolean.newtab_activity_stream_ctor_success` | + +When `send_in_pings` is omitted, non-event metrics default to the built-in `metrics` ping and event metrics default to the built-in `events` ping. 
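
The naming rules in this section (Glean type, then category and metric name joined by an underscore, with dots in the category turned into underscores) can be sketched in Python. This is an illustrative sketch only, not part of any Mozilla tooling: the helper names `glean_column` and `glean_views` are hypothetical, and the `mozdata.firefox_desktop` prefix is simply the one used in this guide's examples.

```python
def glean_column(metric_type, category, name):
    """Build the column path for a non-event Glean metric.

    Hypothetical helper: joins the Glean type, the category (dots replaced
    by underscores) and the metric name, as described in this guide.
    """
    return f"metrics.{metric_type}.{category.replace('.', '_')}_{name}"


def glean_views(send_in_pings=None, is_event=False, dataset="mozdata.firefox_desktop"):
    """Return the view(s) a metric lands in, one per ping that carries it."""
    if not send_in_pings:
        # Default routing: events go to the built-in `events` ping,
        # everything else goes to the built-in `metrics` ping.
        send_in_pings = ["events" if is_event else "metrics"]
    views = []
    for ping in send_in_pings:
        if ping == "events":
            # Events from the built-in ping are queried via events_stream.
            views.append(f"{dataset}.events_stream")
        else:
            # Ping names undergo hyphen-to-underscore conversion.
            views.append(f"{dataset}.{ping.replace('-', '_')}")
    return views


print(glean_column("boolean", "newtab", "search_enabled"))
print(glean_views(["newtab"]))
print(glean_views(is_event=True))
```

A helper like this is mainly useful for sanity-checking a column path before running a query; the Glean Dictionary remains the authoritative place to confirm where a metric actually lands.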
+ +## How `send_in_pings` Determines the Table + +The routing rules are: + +| `send_in_pings` value | Metric type | Where it lands | +|---|---|---| +| Custom ping name (e.g., `newtab`) | Any | That ping's table (e.g., `firefox_desktop.newtab`) | +| `events` | Event | `firefox_desktop.events` raw table, or query via `events_stream` | +| `metrics` | Non-event | `firefox_desktop.metrics` | +| Not specified (default) | Event | Built-in `events` ping → `events_stream` | +| Not specified (default) | Non-event | Built-in `metrics` ping → `firefox_desktop.metrics` | + +## Events: Two Paths + +Event-type metrics can end up in two different places depending on their `send_in_pings` value. This is a common source of confusion. + +### Path 1: Events in custom pings + +Events sent to a custom ping land in that ping table's `events` ARRAY column. You query them by unnesting: + +```sql +SELECT + e.category, + e.name, + (SELECT value FROM UNNEST(e.extra) WHERE key = 'is_sponsored') AS is_sponsored +FROM `mozdata.firefox_desktop.newtab`, +UNNEST(events) AS e +WHERE DATE(submission_timestamp) = '2025-01-15' + AND e.category = 'pocket' + AND e.name = 'click' +``` + +### Path 2: Events in `events_stream` + +Events sent to the built-in `events` ping (or with no `send_in_pings` specified) land in the `events_stream` derived table, which pre-unnests the events into flat rows: + +```sql +SELECT + event_category, + event_name, + JSON_EXTRACT_SCALAR(event_extra, '$.experiment') AS experiment +FROM `mozdata.firefox_desktop.events_stream` +WHERE DATE(submission_timestamp) = '2025-01-15' + AND event_category = 'nimbus_events' + AND event_name = 'enrollment' +``` + +:::caution +`events_stream` only contains events from the built-in `events` ping. Events sent to custom pings (like `newtab`) are **not** in `events_stream` — they live in their respective ping tables. 
+::: + +## Standard Columns in Every Ping Table + +All Glean ping tables share a common set of columns: + +| Column | Description | +|--------|-------------| +| `client_info.client_id` | UUID identifying the client | +| `client_info.app_display_version` | Firefox version string | +| `client_info.app_channel` | Release channel (release, beta, nightly, esr) | +| `client_info.os` | Operating system (Windows, Darwin, Linux) | +| `client_info.locale` | App locale | +| `ping_info.experiments` | Map of active experiment slug → branch assignment | +| `submission_timestamp` | When the ping was received (use for date partitioning) | +| `sample_id` | Hash of client_id, values 0–99 (use for cheaper dev queries) | +| `normalized_channel` | Standardized channel name | +| `normalized_country_code` | ISO country code | + +:::tip +The `ping_info.experiments` field is how you filter telemetry to clients enrolled in a specific experiment. See [Querying Experiment Data](/data-analysis/data-topics/querying-experiment-data) for examples. +::: + +## Five-Step Workflow + +Given any metric or event you want to analyze, follow these steps: + +### Step 1: Find the definition + +Look up the metric in the [Glean Dictionary](https://dictionary.telemetry.mozilla.org/) or search the application's `metrics.yaml` files. Note the `type`, `send_in_pings`, and (for events) `extra_keys`. 
+ +### Step 2: Determine the table + +Use `send_in_pings` to identify the BigQuery table: + +- `send_in_pings: [newtab]` → `mozdata.firefox_desktop.newtab` +- `send_in_pings: [events]` → `mozdata.firefox_desktop.events_stream` +- No `send_in_pings` + event type → `mozdata.firefox_desktop.events_stream` +- No `send_in_pings` + non-event type → `mozdata.firefox_desktop.metrics` + +### Step 3: Build the column path + +For non-event metrics, use the formula: `metrics.<type>.<category>_<name>` + +For events, you query by `event_category` and `event_name` (in `events_stream`) or by `e.category` and `e.name` after unnesting (in custom ping tables). + +### Step 4: Validate with a sample query + +Run a quick query to confirm the column exists and has data: + +```sql +SELECT metrics.boolean.newtab_search_enabled, COUNT(*) +FROM `mozdata.firefox_desktop.newtab` +WHERE DATE(submission_timestamp) = CURRENT_DATE() - 1 +GROUP BY 1 +``` + +### Step 5: For real-time data, use live tables + +Each ping also has a live table with streaming latency (seconds) but only 30-day retention. Append the `_live` suffix to the table name: + +- Stable: `mozdata.firefox_desktop.newtab` +- Live: `mozdata.firefox_desktop.newtab_live` + +:::info +Live tables have flattened column names: `client_id` instead of `client_info.client_id`, `pocket_enabled` instead of `metrics.boolean.pocket_enabled`. Not all columns from the stable view are available in live views. 
+::: + +## Derived Tables + +Beyond raw ping tables, bigquery-etl generates derived tables that aggregate or reshape data for common analysis patterns: + +| Table | Source | Purpose | +|-------|--------|---------| +| `events_stream` | Built-in `events` ping | Flat event rows with `event_category`, `event_name`, `event_extra` | +| `baseline_clients_daily` | `baseline` ping | One row per client per day | +| `metrics_clients_daily` | `metrics` ping | One row per client per day | +| `clients_first_seen` | Various | First-seen date for each client | +| `clients_last_seen_joined` | Various | Last-seen tracking | +| `active_users_aggregates` | Various | DAU/WAU/MAU aggregates | + +Use derived tables when you need pre-aggregated data or when the raw ping tables would require complex joins. + +## Useful Resources + +- **[Glean Dictionary](https://dictionary.telemetry.mozilla.org/)** — Browse all metrics and pings for any Glean application +- **[probe-scraper](https://probeinfo.telemetry.mozilla.org/)** — API for metric/ping discovery across all apps +- **[metric-hub `definitions/`](https://github.com/mozilla/metric-hub/tree/main/definitions)** — Reusable metric SQL definitions used by Jetstream +- **[Mozilla Data Documentation](https://docs.telemetry.mozilla.org/)** — Comprehensive data platform documentation +- **[STMO (sql.telemetry.mozilla.org)](https://sql.telemetry.mozilla.org/)** — SQL query interface for BigQuery diff --git a/docs/data-analysis/jetstream/configuration.md b/docs/data-analysis/jetstream/configuration.md index b29d0f28d..b03a275d9 100644 --- a/docs/data-analysis/jetstream/configuration.md +++ b/docs/data-analysis/jetstream/configuration.md @@ -90,6 +90,10 @@ segments = ["is_regular_user_v3", "new_or_resurrected_v3"] # Nominal length of the enrollment period in days. # Mozanalysis will consider enrollment_period + 1 "dates" of enrollments. +# If not set here, Jetstream resolves the enrollment period in this order: +# 1. This TOML value (if set) +# 2. 
Computed from (enrollment_end_date - start_date + 1) if enrollment has ended +# 3. proposed_enrollment from the Experimenter API enrollment_period = 7 # The name of the control branch. @@ -276,6 +280,18 @@ data_source = "main" [metrics.ever_clicked_cows.statistics.binomial] ``` +### Common `select_expression` Patterns + +Here are the most common patterns for `select_expression`, depending on the type of metric you want to compute: + +- **Binary metric** (did the client do X at least once?): `COALESCE(LOGICAL_OR(condition), FALSE)` — use with `[statistics.binomial]` +- **Count metric** (how many times did X happen?): `COALESCE(COUNTIF(condition), 0)` — use with `[statistics.bootstrap_mean]` +- **Sum metric** (total value of X): `COALESCE(SUM(column), 0)` — use with `[statistics.bootstrap_mean]` + +:::tip +To find the right BigQuery table and column names for your `from_expression` when defining a custom data source, see [Finding Telemetry in BigQuery](/data-analysis/data-topics/telemetry-discovery). +::: + ### Defining Data Sources Most of the regular data sources are already defined in mozanalysis. @@ -290,7 +306,14 @@ Add a section that looks like: # FROM expression - often just a fully-qualified table name. Sometimes a subquery. from_expression = "(SELECT client_id, experiments, submission_date FROM my_cool_table)" -# See https://mozilla.github.io/mozanalysis/api/metrics.html#mozanalysis.metrics.DataSource for details. +# How experiment enrollment information is stored in this data source. +# "native" — the data source has an experiments column with enrollment info +# (e.g., ping_info.experiments in Glean tables). Jetstream uses +# this column to filter to enrolled clients. +# "none" — the data source does NOT contain experiment enrollment info. +# Jetstream will join this data source with its own enrollment +# table to filter to enrolled clients. Use this for most custom +# data sources built from subqueries. 
experiments_column_type = "native" # Data sources can support aggregations on client_id and/or profile_group_id. diff --git a/docs/data-analysis/jetstream/data-products.md b/docs/data-analysis/jetstream/data-products.md index d5db52bbe..6ac3d5a40 100644 --- a/docs/data-analysis/jetstream/data-products.md +++ b/docs/data-analysis/jetstream/data-products.md @@ -13,6 +13,31 @@ Jetstream writes analysis results and enrollments information to BigQuery. Stati The datasets that back the Experimenter results dashboards are available in BigQuery in the `mozanalysis` dataset in `moz-fx-data-experiments`. [Technical documentation][jetstream-dtmo] is available in the Mozilla data docs. +:::tip +For query examples and common patterns for working with these tables, see [Querying Experiment Data in BigQuery](/data-analysis/data-topics/querying-experiment-data). +::: + +#### Results Table Schema + +The `statistics_*` tables contain the computed experiment results: + +| Column | Description | +|--------|-------------| +| `metric` | Metric name (e.g., `active_hours`, `days_of_use`) | +| `statistic` | Statistical method (`binomial`, `mean`, `deciles`, etc.) | +| `branch` | Branch name | +| `comparison` | `NULL` (absolute), `difference`, or `relative_uplift` | +| `comparison_to_branch` | Baseline branch for this comparison | +| `point` | Point estimate | +| `lower` | Lower bound of confidence interval | +| `upper` | Upper bound of confidence interval | +| `ci_width` | Confidence interval width (typically 0.95) | +| `segment` | Segment name (`all` for full population) | +| `analysis_basis` | `enrollments` or `exposures` | +| `window_index` | Analysis window index (for daily/weekly periods) | + +The `enrollments_*` and `exposures_*` tables contain per-client raw metric data with columns: `client_id`, `branch`, `enrollment_date`, `exposure_date`, plus one column per metric and segment boolean columns. 
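
Table names in the `mozanalysis` dataset embed the experiment slug with hyphens converted to underscores. A minimal Python sketch of that naming follows, assuming the patterns `statistics_{slug}_{period}`, `{slug}_enrollments_{period}`, `{slug}_exposures_{period}` and `enrollments_{slug}` from the querying guide. The helper name `result_tables` is hypothetical, and it does not check that the tables exist.

```python
PROJECT = "moz-fx-data-experiments"
DATASET = "mozanalysis"


def result_tables(slug, period):
    """Return fully qualified Jetstream table names for one experiment.

    Hypothetical helper built from the documented naming patterns only.
    """
    table_slug = slug.replace("-", "_")  # slug hyphens become underscores
    prefix = f"{PROJECT}.{DATASET}"
    return {
        "statistics": f"{prefix}.statistics_{table_slug}_{period}",
        "enrollments_basis": f"{prefix}.{table_slug}_enrollments_{period}",
        "exposures_basis": f"{prefix}.{table_slug}_exposures_{period}",
        "base_enrollments": f"{prefix}.enrollments_{table_slug}",
    }


tables = result_tables("my-experiment-slug", "weekly")
for kind, table in tables.items():
    print(f"{kind}: {table}")
```

Listing the dataset's `INFORMATION_SCHEMA.TABLES` is still the reliable way to confirm which of these tables have actually been written for a given experiment.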
+ ### Monitoring Datasets Datasets used for monitoring the operation of Jetstream are part of the `monitoring` dataset in `moz-fx-data-experiments`. @@ -30,9 +55,13 @@ The `logs` table has the following schema: | `message` | `STRING` | Log message | | `log_level` | `STRING` | Log level: ERROR, WARNING | | `exception` | `STRING` | Raised exception object | -| `filename` | `STRING` | Name the Jetstream code file the exception was raised | -| `func_name` | `STRING` | Name the Jetstream function the exception was raised | -| `exception_type` | `STRING` | Class name the exception raised | +| `exception_type` | `STRING` | Class name of the exception raised (see [Troubleshooting](/data-analysis/jetstream/troubleshooting#common-exception-types) for a guide) | +| `filename` | `STRING` | Jetstream source file where the exception was raised | +| `func_name` | `STRING` | Jetstream function where the exception was raised | +| `metric` | `STRING` | Metric slug that failed (if applicable) | +| `statistic` | `STRING` | Statistic that failed (if applicable) | +| `analysis_basis` | `STRING` | `enrollments` or `exposures` | +| `segment` | `STRING` | Segment being computed when the error occurred | #### Query Cost @@ -42,16 +71,43 @@ The `query_cost_v1` table has the following schema: | Column name | Type | Description | | ----------------------- | ----------- | ----------------------------------------------------- | -| `submission_timestamp` | `TIMESTAMP` | Timestamp of when the query was executed | +| `submission_timestamp` | `TIMESTAMP` | Timestamp of when the query was executed (table is partitioned on this column) | | `destination_table` | `STRING` | Name of the table query was writing data to | | `query` | `STRING` | SQL of the executed query | | `total_bytes_processed` | `INT64` | Number of bytes the query processed | | `cost_usd` | `FLOAT` | Cost of the query in USD based on [BigQuery pricing] | +| `experiment_slug` | `STRING` | Experiment slug the query was run for | +| 
`total_slot_ms` | `INT64` | Total BigQuery slot milliseconds consumed | +| `duration_minutes` | `FLOAT` | Wall-clock duration of the query in minutes | +| `error_reason` | `STRING` | BigQuery error reason (if the query failed) | +| `error_message` | `STRING` | BigQuery error message (if the query failed) | + +:::tip +The `query_cost_v1` table is the best way to confirm whether Jetstream actually ran queries for a specific experiment on a given day. Query it by `experiment_slug` and `DATE(submission_timestamp)`. +::: #### Experimenter Experiments For monitoring Nimbus experiments, some common failure cases are exposed as part of the [Experiments Enrollments dashboard](https://mozilla.cloud.looker.com/dashboards-next/216). These monitoring rules will require access to collected experiments enrollment data which is available in `monitoring.experimenter_experiments_v1`. This dataset is part of [bigquery-etl](https://github.com/mozilla/bigquery-etl/tree/main/sql/moz-fx-data-experiments/monitoring/experimenter_experiments_v1) and updated every 10 minutes by fetching data from the Experimenter API. +Key columns in `experimenter_experiments_v1`: + +| Column name | Type | Description | +| ----------------------- | ----------- | ----------------------------------------------------- | +| `normandy_slug` | `STRING` | Experiment slug (matches the slug in Experimenter) | +| `status` | `STRING` | Experiment status (Live, Complete, etc.) 
| +| `start_date` | `DATE` | Experiment start date | +| `end_date` | `DATE` | Experiment end date | +| `enrollment_end_date` | `DATE` | When enrollment ended | +| `proposed_enrollment` | `INT64` | Proposed enrollment period in days | +| `reference_branch` | `STRING` | Name of the control/reference branch | +| `is_high_population` | `BOOL` | Whether the experiment is marked as high-population | +| `branches` | `STRING` | JSON array of branch definitions | +| `app_id` | `STRING` | Application identifier | +| `app_name` | `STRING` | Application name | +| `channel` | `STRING` | Release channel | +| `is_rollout` | `BOOL` | Whether this is a rollout (excluded from Jetstream) | + ## GCS Data Export Jetstream exports statistics data and metadata of analysed experiments to the `mozanalysis` GCS bucket. diff --git a/docs/data-analysis/jetstream/overview.md b/docs/data-analysis/jetstream/overview.md index 56605df43..f16c50905 100644 --- a/docs/data-analysis/jetstream/overview.md +++ b/docs/data-analysis/jetstream/overview.md @@ -30,6 +30,18 @@ a week after the enrollment period ends. Typically, that means results will begin to appear two weeks after the day the experiment launches. +## Which Experiments Does Jetstream Analyze? + +Jetstream runs daily and selects experiments to analyze based on these rules: + +1. Fetches all experiments from the Experimenter API +2. Includes experiments that are **Live** or that **ended within the last 90 days** +3. **Excludes rollouts** — rollouts are not analyzed by Jetstream +4. **Excludes experiments with `skip = true`** in their [custom configuration](./configuration) +5. Merges each experiment's config from three sources: Experimenter API data, custom TOML config (if any) in metric-hub, and platform defaults + +If your experiment ended more than 90 days ago, Jetstream will stop computing results for it. If you need results recomputed for an older experiment, you can trigger a manual rerun using the Jetstream CLI. 
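
The selection rules above can be sketched as a small Python predicate. This is a simplified illustration under stated assumptions, not Jetstream's actual code: the `Experiment` record and its field names are hypothetical stand-ins for what the Experimenter API and the custom config provide, and treating the 90-day boundary as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Experiment:
    """Hypothetical, simplified stand-in for an experiment record."""
    slug: str
    status: str                    # e.g. "Live" or "Complete"
    end_date: Optional[date]
    is_rollout: bool = False
    skip: bool = False             # `skip = true` in the custom config


def is_analyzed(exp, today, lookback_days=90):
    """Apply the documented selection rules to a single experiment."""
    if exp.is_rollout or exp.skip:
        return False               # rollouts and skipped experiments are excluded
    if exp.status == "Live":
        return True
    if exp.end_date is None:
        return False
    # Ended experiments stay eligible for `lookback_days` after their end date
    # (inclusive boundary assumed here).
    return 0 <= (today - exp.end_date).days <= lookback_days


today = date(2025, 4, 1)
candidates = [
    Experiment("still-live", "Live", None),
    Experiment("recently-ended", "Complete", date(2025, 3, 1)),
    Experiment("long-ended", "Complete", date(2024, 12, 1)),
    Experiment("a-rollout", "Live", None, is_rollout=True),
]
print([e.slug for e in candidates if is_analyzed(e, today)])
```

The rollout and `skip` checks come first so that a Live rollout is never selected, which matches the order in which the exclusions are listed above.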
+ ## Analysis Paradigm Experiments are analyzed using the concept of analysis windows. Analysis diff --git a/docs/data-analysis/jetstream/troubleshooting.md b/docs/data-analysis/jetstream/troubleshooting.md index 3c9a900cd..2c5bc3a39 100644 --- a/docs/data-analysis/jetstream/troubleshooting.md +++ b/docs/data-analysis/jetstream/troubleshooting.md @@ -29,6 +29,25 @@ Errors can be viewed on the [Jetstream error dashboard] in Looker. Additionally, alerts can be set up in Looker to check for errors daily and sent an email if failures have been detected. To subscribe to these alerts, go to the [Jetstream error dashboard], click on the _Alerts_ (bell) icon on the _Critical Errors Last Run_ tiles and follow the "Error Count" alert. +## Common Exception Types + +The `exception_type` field in the error logs (and the [Jetstream error dashboard]) indicates what went wrong. Here are the most common ones: + +| Exception | Meaning | What to Do | +|---|---|---| +| `EnrollmentNotCompleteException` | Enrollment hasn't ended yet | Normal — Jetstream retries daily until enrollment ends. No action needed. | +| `EndedException` | Experiment `end_date` is in the past | Normal — the experiment is complete. | +| `StatisticComputationException` | A specific statistic failed to compute | Check the `metric`, `statistic`, and `segment` fields in the log entry. This usually indicates a bug in the metric definition or incompatible data. | +| `Exception` (BadRequest / timeout) | BigQuery query timed out or failed | The query may be too expensive. Try simplifying the SQL in the custom config, or use source tables instead of derived views. | +| `ClassValidationError` | Invalid data from the Experimenter API for a slug | Usually caused by a stale or deleted experiment that still has a TOML config in metric-hub. Remove the orphaned config. | +| `NoEnrollmentPeriodException` | No enrollment period could be determined | Set `enrollment_period` explicitly in the TOML config. 
| +| `HighPopulationException` | Experiment is marked as high-population | Skipped by design — high-population experiments use a different analysis path. | +| `RolloutSkipException` | The experiment is a rollout | Rollouts are excluded from Jetstream analysis by design. | + +:::tip +`EnrollmentNotCompleteException` is the most common "error" and is **not a real problem** — it just means Jetstream checked the experiment and will try again tomorrow. Don't file a bug for it. +::: + ## Something Went Wrong, What Do I Do? 1. Check the [Jetstream error dashboard] for more details on the error that occurred. diff --git a/docs/data-analysis/telemetry.md b/docs/data-analysis/telemetry.md index 99d162779..6fcd6c4e6 100644 --- a/docs/data-analysis/telemetry.md +++ b/docs/data-analysis/telemetry.md @@ -6,6 +6,10 @@ slug: /data-analysis/telemetry This section is an overview of Nimbus Telemetry intended for the analysis of experiments. +:::tip Looking for product telemetry? +This page covers Nimbus SDK lifecycle events (enrollment, exposure, etc.). To find product telemetry metrics in BigQuery for your experiment analysis, see [Finding Telemetry in BigQuery](/data-analysis/data-topics/telemetry-discovery). +::: + ## Standard Events The following events are sent during an experiment's lifecycle. @@ -79,6 +83,47 @@ specific `reason` is included in the event's `extra` field. | ----------------------------------- | --------------------------------- | ------------------------------ | | `enroll_failed` | `enroll_failed` | `enroll_failed` | +### Enrollment Status + +A newer, richer form of enrollment telemetry that records the SDK's evaluation of **every recipe** each time it applies pending experiments. Unlike the older `enrollment`/`unenrollment` events (which only fire on state changes), `enrollment_status` gives a complete snapshot of why each recipe is or isn't enrolled. 
+ +:::info +`enrollment_status` is disabled by default and currently enabled via a rollout on Desktop and Fenix. Only clients enrolled in the enabling rollout emit these events. +::: + +These events are sent via the `nimbus-targeting-context` ping, so they live in the `nimbus_targeting_context` table (not in `events_stream`). + +**Extra keys:** + +| Key | Type | Description | +|-----|------|-------------| +| `slug` | string | Experiment/rollout slug | +| `status` | string | `Enrolled`, `NotEnrolled`, `Disqualified`, `WasEnrolled` | +| `reason` | string | Why this status was assigned (see below) | +| `branch` | string | Branch assigned (only when status is `Enrolled`) | +| `error_string` | string | Error message (when reason is `Error`) | +| `conflict_slug` | string | Conflicting experiment/rollout slug (when reason is `FeatureConflict`) | + +**Possible `reason` values:** `Qualified`, `NotTargeted`, `EnrollmentsPaused`, `NotSelected`, `Error`, `FeatureConflict`, `OptOut`, `OptIn`, `ChangedPref`, `UnenrolledInAnotherProfile`, `ForceEnrollment` + +**Example query** — check enrollment status distribution for a specific experiment: + +```sql +SELECT + (SELECT value FROM UNNEST(e.extra) WHERE key = 'slug') AS slug, + (SELECT value FROM UNNEST(e.extra) WHERE key = 'status') AS status, + (SELECT value FROM UNNEST(e.extra) WHERE key = 'reason') AS reason, + COUNT(*) AS cnt +FROM `mozdata.firefox_desktop.nimbus_targeting_context`, +UNNEST(events) AS e +WHERE DATE(submission_timestamp) = '2025-01-15' + AND e.category = 'nimbus_events' + AND e.name = 'enrollment_status' + AND (SELECT value FROM UNNEST(e.extra) WHERE key = 'slug') = 'my-experiment-slug' +GROUP BY 1, 2, 3 +ORDER BY cnt DESC +``` + ## Experiment Annotations In addition to the standard Nimbus events that are generated, Nimbus diff --git a/sidebars.js b/sidebars.js index e66a889ca..943cc998d 100644 --- a/sidebars.js +++ b/sidebars.js @@ -201,6 +201,8 @@ module.exports = { 
"data-analysis/data-topics/preenrollment_bias", "data-analysis/data-topics/population_representativeness", "data-analysis/telemetry", + "data-analysis/data-topics/telemetry-discovery", + "data-analysis/data-topics/querying-experiment-data", "data-analysis/validating-experiments" ] },