
fix: gerrit repositories filtering (CM-1079) #3977

Merged
joanagmaia merged 6 commits into main from fix/gerrit-repositories-filtering on Apr 2, 2026



@joanagmaia commented Mar 30, 2026

Problem

Since the bucketing architecture was introduced, the Active Contributors widget (and any widget backed by activities_filtered) showed no data for projects with Gerrit integrations, even when activity data existed.

Root cause: The enrichment/cleaning copy pipes (activityRelations_bucket_clean_enrich_copy_pipe_X) validate git-platform activities by checking:
(channel, segmentId) IN (SELECT r.url, r.segmentId FROM repositories ...)

Gerrit activities store their channel in /q/project: format (e.g. https://gerrit.example.com/r/q/project:myproject), but repositories.url stores the base URL (https://gerrit.example.com/r/myproject). The match always fails, so all Gerrit activities were silently dropped during the cleaning step and never made it into the cleaned bucket datasources that widgets query.
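
A minimal Python sketch of why the old check never matched. It models the repositories table and two activity rows as in-memory data. The Gerrit URLs are the examples above, while the segment IDs and the GitHub URL are hypothetical.

```python
# Hypothetical stand-in for the repositories table (url, segmentId).
repositories = [
    {"url": "https://gerrit.example.com/r/myproject", "segmentId": "seg-1"},
    {"url": "https://github.com/example/repo", "segmentId": "seg-2"},
]

# Old predicate: (channel, segmentId) IN (SELECT r.url, r.segmentId FROM repositories ...)
valid_pairs = {(r["url"], r["segmentId"]) for r in repositories}


def passes_old_check(activity: dict) -> bool:
    return (activity["channel"], activity["segmentId"]) in valid_pairs


gerrit_activity = {"channel": "https://gerrit.example.com/r/q/project:myproject", "segmentId": "seg-1"}
github_activity = {"channel": "https://github.com/example/repo", "segmentId": "seg-2"}

print(passes_old_check(gerrit_activity))  # False: the /q/project: channel is never a stored url
print(passes_old_check(github_activity))  # True: channel equals repositories.url
```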

The repos_to_channels pipe already handled this URL expansion correctly at query time — but the cleaning step wasn't using it.

Fix

Extended repos_to_channels.pipe to also output segmentId alongside channel. This is backward-compatible: all 21 existing consumers only SELECT channel and are unaffected.

The 10 cleaning pipes now delegate to repos_to_channels instead of maintaining an inline subquery. This means the Gerrit URL expansion logic lives in exactly one place — any future change to channel formats only needs updating in repos_to_channels.pipe.
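
A rough Python model of the fixed flow. It assumes that repos_to_channels emits every repository URL unchanged, plus a /q/project: variant for repositories linked to a Gerrit integration, and pairs each one with the repository's segmentId. The gerrit_channel helper is hypothetical. It treats the project name as the last path segment of the stored URL. That matches the example above but may not match the real pipe for nested Gerrit project names.

```python
def gerrit_channel(url: str) -> str:
    # Hypothetical rule: insert "q/project:" before the last path segment,
    # e.g. .../r/myproject -> .../r/q/project:myproject
    base, _, project = url.rstrip("/").rpartition("/")
    return f"{base}/q/project:{project}"


def repos_to_channels(repositories: list) -> set:
    """Return the (channel, segmentId) pairs an activity may legitimately carry."""
    pairs = set()
    for repo in repositories:
        pairs.add((repo["url"], repo["segmentId"]))
        if repo["platform"] == "gerrit":
            pairs.add((gerrit_channel(repo["url"]), repo["segmentId"]))
    return pairs


repositories = [
    {"url": "https://gerrit.example.com/r/myproject", "segmentId": "seg-1", "platform": "gerrit"},
    {"url": "https://github.com/example/repo", "segmentId": "seg-2", "platform": "github"},
]
valid_pairs = repos_to_channels(repositories)


def passes_new_check(activity: dict) -> bool:
    # Cleaning predicate: (channel, segmentId) IN (SELECT channel, segmentId FROM repos_to_channels)
    return (activity["channel"], activity["segmentId"]) in valid_pairs


print(passes_new_check({"channel": "https://gerrit.example.com/r/q/project:myproject", "segmentId": "seg-1"}))  # True
print(passes_new_check({"channel": "https://gerrit.example.com/r/q/project:myproject", "segmentId": "seg-2"}))  # False
```

Because the check matches on segmentId as well as channel, a Gerrit channel is still accepted only for the segment that owns the repository.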


Note

Medium Risk
Touches Tinybird ETL/cleaning queries and changes which activities are retained in cleaned bucket datasources; mistakes could affect analytics completeness across all git platforms.

Overview
Fixes Gerrit activities being dropped during the bucket cleaning/enrichment COPY step by switching the repository validation predicate in activityRelations_bucket_clean_enrich_copy_pipe_0..9.pipe from an inline repositories subquery to repos_to_channels.

Extends repos_to_channels.pipe to also emit segmentId (and to filter out deleted/disabled repos and deleted projects), allowing the cleaning pipes to match on (channel, segmentId) and correctly include Gerrit /q/project: channel variants while keeping non-Gerrit behavior consistent.
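
A short Python sketch of the repository filter described here. The field names are hypothetical stand-ins for repositories.deletedAt, repositories.enabled and insightsProjects.deletedAt. The condition mirrors the WHERE clause in the repos_to_channels excerpt quoted in the Copilot review below. The optional excluded filter is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepoRow:
    url: str
    segment_id: str
    deleted_at: Optional[str]          # repositories.deletedAt
    enabled: bool                      # repositories.enabled
    project_deleted_at: Optional[str]  # insightsProjects.deletedAt, joined via insightsProjectId


def is_active(repo: RepoRow) -> bool:
    # isNull(r.deletedAt) AND r.enabled = true AND isNull(i.deletedAt)
    return repo.deleted_at is None and repo.enabled and repo.project_deleted_at is None


rows = [
    RepoRow("https://github.com/example/live", "seg-1", None, True, None),
    RepoRow("https://github.com/example/deleted", "seg-1", "2026-01-01", True, None),
    RepoRow("https://github.com/example/disabled", "seg-1", None, False, None),
    RepoRow("https://github.com/example/orphaned", "seg-1", None, True, "2026-01-01"),
]

active_urls = [r.url for r in rows if is_active(r)]
print(active_urls)  # ['https://github.com/example/live']
```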

Written by Cursor Bugbot for commit 721fe4d.

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Copilot AI review requested due to automatic review settings March 30, 2026 13:12
Copilot AI left a comment

Pull request overview

Fixes Gerrit-backed projects showing no data in widgets that rely on activities_filtered by ensuring the bucket cleaning/enrichment COPY pipes validate git-platform activities against the expanded set of possible repository channel formats (including Gerrit /q/project: variants).

Changes:

  • Extend repos_to_channels.pipe to also return segmentId alongside channel and to filter out repos belonging to deleted insightsProjects.
  • Update the 10 activityRelations_bucket_clean_enrich_copy_pipe_{0..9}.pipe cleaning pipes to validate (channel, segmentId) via repos_to_channels instead of an inline repositories subquery.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Summary per file:

  • services/libs/tinybird/pipes/repos_to_channels.pipe: Adds segmentId output and keeps Gerrit channel expansion centralized for reuse by cleaning/validation logic.
  • services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_0.pipe: Switches the repo validation subquery to repos_to_channels to include Gerrit channel variants.
  • services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_1.pipe: Same change as bucket 0.
  • services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_2.pipe: Same change as bucket 0.
  • services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_3.pipe: Same change as bucket 0.
  • services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_4.pipe: Same change as bucket 0.
  • services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_5.pipe: Same change as bucket 0.
  • services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_6.pipe: Same change as bucket 0.
  • services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_7.pipe: Same change as bucket 0.
  • services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_8.pipe: Same change as bucket 0.
  • services/libs/tinybird/pipes/activityRelations_bucket_clean_enrich_copy_pipe_9.pipe: Same change as bucket 0.
Comments suppressed due to low confidence (1)

services/libs/tinybird/pipes/repos_to_channels.pipe:44

  • When repos is provided, repos_to_expand hard-codes segmentId to '' (lines 19-24), but gerrit_repos later rehydrates segmentId from repositories (lines 40-44). This makes the output inconsistent: non-Gerrit rows will have empty segmentId, while Gerrit variants can have a real segmentId, which also contradicts the response description. Consider making segmentId consistently empty for all outputs when repos is provided (e.g., carry segmentId from repos_to_expand into Gerrit variants), or update the contract/documentation and ensure all branches follow it.
    {% if defined(repos) %}
        SELECT
            arrayJoin(
                {{ Array(repos, 'String', description="Repository URLs to expand", required=False) }}
            ) AS url,
            '' AS segmentId
    {% else %}
        SELECT r.url, r.segmentId
        FROM repositories r FINAL
        INNER JOIN insightsProjects i FINAL ON r.insightsProjectId = i.id
        WHERE
            isNull (r.deletedAt) AND r.enabled = true AND isNull (i.deletedAt)
            {% if defined(excluded) and excluded %} AND r.excluded = true
            {% end %}
    {% end %}

NODE gerrit_repos
DESCRIPTION >
    Identify Gerrit repositories by joining with integrations table

SQL >
    SELECT r.url, r.segmentId
    FROM repositories r FINAL
    JOIN integrations i FINAL ON r.sourceIntegrationId = i.id
    WHERE i.platform = 'gerrit' AND isNull (r.deletedAt) AND r.url IN (SELECT url FROM repos_to_expand)
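
Copilot's point can be made concrete with a hypothetical Python model of the two branches when repos is passed. A dictionary stands in for the repositories/integrations join. gerrit_channel uses the same last-path-segment assumption as the example URLs in the description. The current behavior mixes an empty segmentId with a real one, and the suggested fix carries the empty value through to the Gerrit variants.

```python
# Stand-in for repositories joined with integrations: url -> (segmentId, platform).
REPOSITORIES = {
    "https://gerrit.example.com/r/myproject": ("seg-1", "gerrit"),
    "https://github.com/example/repo": ("seg-2", "github"),
}


def gerrit_channel(url: str) -> str:
    # Hypothetical rule: .../r/myproject -> .../r/q/project:myproject
    base, _, project = url.rstrip("/").rpartition("/")
    return f"{base}/q/project:{project}"


def expand_current(repos: list) -> list:
    repos_to_expand = [(url, "") for url in repos]  # segmentId hard-coded to ''
    rows = list(repos_to_expand)
    for url, _ in repos_to_expand:
        segment_id, platform = REPOSITORIES.get(url, ("", None))
        if platform == "gerrit":
            rows.append((gerrit_channel(url), segment_id))  # segmentId re-read from repositories
    return rows


def expand_consistent(repos: list) -> list:
    repos_to_expand = [(url, "") for url in repos]
    rows = list(repos_to_expand)
    for url, segment_id in repos_to_expand:
        if REPOSITORIES.get(url, ("", None))[1] == "gerrit":
            rows.append((gerrit_channel(url), segment_id))  # carries '' through
    return rows


repos = ["https://gerrit.example.com/r/myproject", "https://github.com/example/repo"]
print(sorted({seg for _, seg in expand_current(repos)}))     # ['', 'seg-1']
print(sorted({seg for _, seg in expand_consistent(repos)}))  # ['']
```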



@joanagmaia requested a review from epipav March 30, 2026 13:21

@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@joanagmaia requested a review from mbani01 March 30, 2026 13:21
mbani01 previously approved these changes Mar 30, 2026

@mbani01 left a comment


LGTM

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
@joanagmaia merged commit e885a25 into main Apr 2, 2026
16 checks passed
@joanagmaia deleted the fix/gerrit-repositories-filtering branch April 2, 2026 10:54

Labels

None yet

Projects

None yet


4 participants