feat(DENG-11298): Drop parse_campaign_name parsing logic#9644
feat(DENG-11298): Drop parse_campaign_name parsing logic#9644phil-lee70 wants to merge 8 commits into
Conversation
There was a problem hiding this comment.
This PR replaces the in-repo parsing logic of mozfun.marketing.parse_campaign_name with a delegation to a new moz-fx-data-shared-prod.udf.parse_campaign_name (a stub returning typed NULL), drops the function's test suite down to a single NULL-input assertion, and adds the new UDF file to routine.publish.skip.
The mechanical rewrite of the mozfun function and its delegation are straightforward. My main question is around the publish.skip entry: it behaves differently from the existing main_summary_scalars skip (which is required because that function uses ANY TYPE), and it determines whether the new shared-prod UDF actually gets deployed for the runtime reference to resolve. I've left details inline. Note I couldn't observe the production deployment path from this repo alone, so if shared-prod routine deployment uses a path that ignores routine.publish.skip, that inline concern may not apply.
| gcs_path: "" | ||
| skip: | ||
| - sql/moz-fx-data-shared-prod/udf/main_summary_scalars/udf.sql | ||
| - sql/moz-fx-data-shared-prod/udf/parse_campaign_name/udf.sql |
There was a problem hiding this comment.
issue: This skip entry is categorically different from the existing main_summary_scalars one. main_summary_scalars takes processes ANY TYPE, so it cannot be created as a persistent UDF and is only ever inlined via SQL generation — skipping it from publishing is required. parse_campaign_name, by contrast, is a concrete, deployable UDF that mozfun.marketing.parse_campaign_name references at runtime.
publish_routines.publish() honors routine.publish.skip for every target project (the skip set is global, not mozfun-only — see skipped_routines() / lines 168-172, 256 in publish_routines.py). So adding this file here means bqetl routine publish --project-id=moz-fx-data-shared-prod will not deploy it. Since this is a brand-new function with no prior deployed version, the marketing.parse_campaign_name reference would then resolve to a non-existent routine — which contradicts the PR's stated goal of having the function return NULL so dependent queries keep resolving.
If the intent is for the stub to be the deployed definition (returning NULL), remove it from the skip list so it deploys. If moz-fx-data-shared-prod.udf.parse_campaign_name is instead owned/deployed by a system outside this repo and the skip exists to avoid clobbering it, add a comment here documenting that, since it's not inferable from the change.
| @@ -0,0 +1,5 @@ | |||
| -- Parses a campaign name into known segments. | |||
There was a problem hiding this comment.
nitpick: The comment here and the description in metadata.yaml ("Parses a campaign name into known segments.") describe behavior this body doesn't have — it returns a typed NULL and parses nothing. Since this is published as routine documentation, consider noting that it's currently a placeholder/no-op so readers and the generated docs aren't misled.
This comment has been minimized.
This comment has been minimized.
5cfd431 to
03da916
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
…ublish ordering get_routines() only scanned sql/mozfun when parsing a mozfun routine, so a mozfun UDF that delegates to a moz-fx-data-shared-prod UDF had no dependency edge recorded. The topological sorter then published the wrapper before its dependency, failing stage deploy with 'Function not found'. Include moz-fx-data-shared-prod in the candidate routine list (deduped via a set).
This comment has been minimized.
This comment has been minimized.
… order The --target routine publish groups routines into one _publish_to_target batch per source project and published them in arbitrary (insertion) order. A mozfun UDF that delegates to a moz-fx-data-shared-prod UDF spans two batches, so the dependent batch could be published before its dependency, failing stage deploy with 'Function not found'. Topologically order the source-project batches so a project is published after the projects providing routines it depends on.
This comment has been minimized.
This comment has been minimized.
parse_routines only loaded mozfun into the dependency pool, so a mozfun UDF that delegates to a moz-fx-data-shared-prod UDF could not have that dependency inlined as a temp function in its test SQL. The test then invoked the real persistent routine, failing in CI with a 403 when the test service account can't invoke the (private) routine. Always include moz-fx-data-shared-prod in the pool so the local (stubbed) definition is inlined instead.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| gcs_path: "" | ||
| skip: | ||
| - sql/moz-fx-data-shared-prod/udf/main_summary_scalars/udf.sql | ||
| - sql/moz-fx-data-shared-prod/udf/marketing_parse_campaign_name/udf.sql |
There was a problem hiding this comment.
I don't think this is needed
| if state.get(project) == "done": | ||
| return | ||
| if state.get(project) == "visiting": | ||
| # cyclic dependency between projects; fall back to best-effort order |
There was a problem hiding this comment.
This is fine as a best-effort attempt but there are a lot of existing cases of shared-prod udf referencing mozfun so it's not impossible that udfs that work in both directions are in the same PR. I recommend at least adding logging here to point this out in the rare event that this happens
| # cyclic dependency between projects; fall back to best-effort order | |
| log.warning("Routine ordering: cyclic dependency between projects; fall back to best-effort order") |
Alternatively you could do the ordering per udf but that's probably more complex
There was a problem hiding this comment.
will add the logging 👍
Integration report
|
feat(DENG-11298): Drop parse_campaign_name parsing logic