Skip to content

Provide ClickHouse Peers validations as a library#4288

Open
pfcoperez wants to merge 21 commits into
mainfrom
DBI-65/shared-ch-validations
Open

Provide ClickHouse Peers validations as a library#4288
pfcoperez wants to merge 21 commits into
mainfrom
DBI-65/shared-ch-validations

Conversation

@pfcoperez
Copy link
Copy Markdown
Contributor

@pfcoperez pfcoperez commented May 8, 2026

This Pull Request introduces the possibility of performing ClickHouse peers validations outside PeerDB while
offering maximum re-utilization of the validation logic.

To achieve this, it offers four main changes:

  1. Staging bucket validation logic is abstracted out behind the StagingValidator type which is implemented as a function taking a context and potentially returning an error. The logic used at PeerDB to implement this function is shared at flow/pkg for external code to use through NewS3StagingValidator and NewGCSStagingValidator factory methods.

  2. In a similar vein, accepted ClickHouse hosts becomes a parameter for the ClickHouse PeerValidation logic.

  3. Which in turn is shared at flow/pkg through the ValidateClickHousePeer method. This method is also used within PeerDB when peer validation is not disabled.

  4. Hence, peer validation can be now disabled for any peer in the gRPC methods CreatePeer and ValidatePeer using a new optional parameter: disable_connector_validation We assume validation happens both upstream and at PeerDB provision side

  5. We have disabled bucket checks at ClickHouse validation time, this was actually testing static infrastructure.

Part of DBI-65

@pfcoperez pfcoperez requested a review from a team as a code owner May 8, 2026 11:30
@pfcoperez pfcoperez requested a review from Copilot May 8, 2026 11:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors ClickHouse peer validation so it can be reused as a standalone library (flow/pkg), and adds a gRPC option to skip connector-side validations while still performing basic connectivity checks.

Changes:

  • Added disable_connector_validation to ValidatePeer and CreatePeer requests and wired it through the Flow API handlers.
  • Extracted ClickHouse peer validation + staging-bucket smoke tests into flow/pkg/clickhouse as reusable helpers (ValidateClickHousePeer, NewS3StagingValidator, NewGCSStagingValidator).
  • Updated ClickHouse connector validation and added E2E/test coverage for the new staging validators and bypass behavior.

Reviewed changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
protos/route.proto Adds optional disable_connector_validation to peer-validation related RPC requests.
flow/shared/random.go Routes RandomString through flow/pkg/common to share logic across modules.
flow/pkg/go.mod Adds dependencies needed for reusable staging validators and ClickHouse validation library.
flow/pkg/go.sum Updates sums for new/updated flow/pkg dependencies.
flow/pkg/common/utils.go Introduces shared RandomString helper in the flow/pkg module.
flow/pkg/clickhouse/validation.go Adds reusable ClickHouse peer validation API and host allow-list validation logic.
flow/pkg/clickhouse/validation_test.go Minor test struct field ordering adjustment.
flow/pkg/clickhouse/staging_validators.go Adds reusable S3/GCS staging bucket smoke-test validators.
flow/pkg/clickhouse/staging_validators_test.go Adds unit tests for the new S3/GCS staging validators.
flow/go.mod Updates main Flow module dependencies (incl. versions aligned with flow/pkg).
flow/go.sum Updates sums for main Flow module dependency changes.
flow/e2e/api_test.go Adds E2E test ensuring connector validation can be bypassed for ClickHouse.
flow/connectors/clickhouse/staging_s3.go Switches S3 staging validation to the shared flow/pkg validator.
flow/connectors/clickhouse/staging_gcs.go Switches GCS staging validation to the shared flow/pkg validator.
flow/connectors/clickhouse/clickhouse.go Switches ClickHouse connector validation to the shared ValidateClickHousePeer.
flow/cmd/validate_peer.go Implements connector-validation bypass while keeping connectivity checks.
flow/cmd/handler.go Plumbs disable_connector_validation from CreatePeer into ValidatePeer.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread flow/pkg/clickhouse/validation.go
Comment thread flow/pkg/clickhouse/validation.go Outdated
Comment thread flow/pkg/clickhouse/validation.go Outdated
Comment thread flow/pkg/clickhouse/validation.go Outdated
Comment thread flow/pkg/clickhouse/validation.go
Comment thread flow/pkg/objectstore/validation.go
Comment thread flow/cmd/validate_peer.go Outdated
Comment thread flow/pkg/common/utils.go
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

❌ Test Failure

Analysis: TestClickHousePeerValidationBypass panics deterministically at api_test.go:1251 because it calls t.Setenv inside a parallel test suite (e2eshared.RunSuite uses t.Parallel), which Go's testing framework forbids — this is a real bug, not a flaky failure.
Confidence: 0.97

⚠️ This appears to be a real bug - manual intervention needed

View workflow run

@pfcoperez pfcoperez force-pushed the DBI-65/shared-ch-validations branch from f71810c to c213c28 Compare May 8, 2026 11:54
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🔄 Flaky Test Detected

Analysis: The e2e test TestGenericBQ/Test_Inheritance_Table_With_Dynamic_Setting timed out after 195s with an "UNEXPECTED TIMEOUT" waiting for BigQuery row counts to converge (destination had 6 rows while source had 3), a classic WaitFor polling timeout pattern in CDC streaming tests indicating a race condition or timing sensitivity rather than a deterministic code bug.
Confidence: 0.78

✅ Automatically retrying the workflow

View workflow run

@pfcoperez pfcoperez force-pushed the DBI-65/shared-ch-validations branch from 86ca2cd to 2eaba06 Compare May 8, 2026 12:55
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🔄 Flaky Test Detected

Analysis: The entire e2e test suite hit Go's global 900-second timeout (panic: test timed out after 15m0s), killing all remaining tests simultaneously — this is a CI infrastructure slowness issue, not a real code bug.
Confidence: 0.95

✅ Automatically retrying the workflow

View workflow run

@pfcoperez pfcoperez requested a review from jgao54 May 8, 2026 13:17
@pfcoperez
Copy link
Copy Markdown
Contributor Author

@copilot resolve the merge conflicts in this pull request

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented May 11, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown

Copilot AI commented May 11, 2026

@copilot resolve the merge conflicts in this pull request

Merge conflicts are resolved and pushed in commit 26d449b.

Comment thread flow/shared/random.go Outdated
Comment thread flow/pkg/objectstore/validation.go
Comment thread protos/route.proto Outdated
@pfcoperez pfcoperez force-pushed the DBI-65/shared-ch-validations branch from 26d449b to 4905fec Compare May 12, 2026 09:29
@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

❌ 2 Tests Failed:

Tests completed Failed Passed Skipped
2212 2 2210 189
View the top 2 failed test(s) by shortest run time
github.com/PeerDB-io/peerdb/flow/e2e::TestApiPg
Stack Traces | 0.01s run time
=== RUN   TestApiPg
=== PAUSE TestApiPg
=== CONT  TestApiPg
--- FAIL: TestApiPg (0.01s)
2026/05/12 11:12:11 INFO Received AWS credentials from peer for connector: ci x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN}
2026/05/12 11:12:11 INFO Received AWS credentials from peer for connector: clickhouse x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN}
github.com/PeerDB-io/peerdb/flow/e2e::TestApiPg/TestPostgresDestinationValidation_UserDefinedTypeMismatch
Stack Traces | 0.05s run time
=== RUN   TestApiPg/TestPostgresDestinationValidation_UserDefinedTypeMismatch
=== PAUSE TestApiPg/TestPostgresDestinationValidation_UserDefinedTypeMismatch
=== CONT  TestApiPg/TestPostgresDestinationValidation_UserDefinedTypeMismatch
2026/05/12 11:10:35 INFO Received AWS credentials from peer for connector: ci x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN}
2026/05/12 11:10:35 INFO Received AWS credentials from peer for connector: clickhouse x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN}
2026/05/12 11:10:35 INFO Received AWS credentials from peer for connector: ci x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN}
2026/05/12 11:10:35 ERROR Failed to get key x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} error="FATAL: terminating connection due to administrator command (SQLSTATE 57P01)"
2026/05/12 11:10:35 ERROR Failed to parse as int64 x-peerdb-additional-metadata={Operation:FLOW_OPERATION_UNKNOWN} key=PEERDB_CLICKHOUSE_MAX_INSERT_THREADS error="failed to get key: FATAL: terminating connection due to administrator command (SQLSTATE 57P01)"
    e2eshared.go:34: 
        	Error Trace:	.../flow/e2e/clickhouse.go:415
        	            				.../flow/e2e/api_test.go:233
        	            				.../flow/e2eshared/e2eshared.go:34
        	Error:      	Received unexpected error:
        	            	failed to open connection to ClickHouse peer: failed to load max_insert_threads config: failed to parse PEERDB_CLICKHOUSE_MAX_INSERT_THREADS as int64: failed to get key: FATAL: terminating connection due to administrator command (SQLSTATE 57P01)
        	Test:       	TestApiPg/TestPostgresDestinationValidation_UserDefinedTypeMismatch
--- FAIL: TestApiPg/TestPostgresDestinationValidation_UserDefinedTypeMismatch (0.05s)

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Test Failure

Analysis: Deterministic build failure: internal/env.go imports cloud.google.com/go/kms/apiv1 and cloud.google.com/go/kms/apiv1/kmspb but go.sum is missing the required entries — run go get github.com/PeerDB-io/peerdb/flow/internal to fix.
Confidence: 0.98

⚠️ This appears to be a real bug - manual intervention needed

View workflow run

it's redundant and checks a piece of infrastructure which is static and reliable.
@github-actions
Copy link
Copy Markdown
Contributor

❌ Test Failure

Analysis: Real build failure: missing go.sum entries for cloud.google.com/go/kms/apiv1 and cloud.google.com/go/kms/apiv1/kmspb imported in internal/env.go — go mod tidy was not run after adding these imports.
Confidence: 0.98

⚠️ This appears to be a real bug - manual intervention needed

View workflow run

@pfcoperez pfcoperez requested review from a team as code owners May 12, 2026 10:25
@PeerDB-io PeerDB-io deleted a comment from claude Bot May 12, 2026
@claude
Copy link
Copy Markdown

claude Bot commented May 12, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Test Failure

Analysis: Test_PartitionBy fails deterministically across all four ClickHouse test suites with a consistent assertion mismatch — expected "num" but got "(num)" — indicating a real regression in partition key expression formatting, not a flaky failure.
Confidence: 0.85

⚠️ This appears to be a real bug - manual intervention needed

View workflow run

@pfcoperez pfcoperez requested a review from jgao54 May 12, 2026 10:57
@github-actions
Copy link
Copy Markdown
Contributor

❌ Test Failure

Analysis: Real bug: Test_PartitionBy fails deterministically across all 4 ClickHouse test suites with expected: "num", actual: "(num)" at clickhouse_test.go:3029, indicating a code change that incorrectly wraps PARTITION BY column names in parentheses.
Confidence: 0.92

⚠️ This appears to be a real bug - manual intervention needed

View workflow run

@github-actions
Copy link
Copy Markdown
Contributor

🔄 Flaky Test Detected

Analysis: The e2e test suite timed out after exactly 900 seconds (the configured -timeout 900s limit), indicating tests ran too slowly rather than failing with a deterministic assertion error.
Confidence: 0.92

✅ Automatically retrying the workflow

View workflow run

@github-actions
Copy link
Copy Markdown
Contributor

❌ Test Failure

Analysis: Test_PartitionBy fails deterministically across all 4 test variants with a consistent assertion mismatch (expected "num" but got "(num)" from ClickHouse system.tables.partition_key), indicating a real behavioral change — likely from a ClickHouse version upgrade — rather than a flaky test.
Confidence: 0.9

⚠️ This appears to be a real bug - manual intervention needed

View workflow run

@pfcoperez pfcoperez changed the title Provide ClickHouse Peers validations as a library and allow skipping peer validations via CreatePeer and ValidatePeer request parameters Provide ClickHouse Peers validations as a library May 12, 2026
@github-actions
Copy link
Copy Markdown
Contributor

❌ Test Failure

Analysis: Test_PartitionBy fails deterministically across all 4 ClickHouse test suites with an identical assertion error (expected column name "num" but got "(num)"), indicating a real regression in how PARTITION BY expressions are formatted or parsed.
Confidence: 0.92

⚠️ This appears to be a real bug - manual intervention needed

View workflow run

Comment thread flow/pkg/go.mod
gopkg.in/natefinch/lumberjack.v2 v2.2.1 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)
) No newline at end of file
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Copy Markdown
Contributor

@jgao54 jgao54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

// To enable it back, just replace `objectstore.NoStagingValidator`
// with `c.staging.Validate`.
if err := peerdb_clickhouse.ValidateClickHousePeer(
ctx, c.logger, allowedDomains, c.Config.Host, c.database, objectstore.NoStagingValidator,
Copy link
Copy Markdown
Contributor

@jgao54 jgao54 May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, because of OSS validation flow via /api/v1/peers/validate, we probably want to set this to c.staging.Validate still; since with OSS user set up staging themselves and and validation is still useful.
(the side-effect of this is that CreatePeer will re-validate object store, which is fine for now as we have been running this pattern for a while).

The alternative is passing in an option to CreatePeerRequest to skip object storage validation and thread it thru, but for now I think it's okay to just validate during peer creation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants