Provide ClickHouse Peers validations as a library#4288
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors ClickHouse peer validation so it can be reused as a standalone library (flow/pkg), and adds a gRPC option to skip connector-side validations while still performing basic connectivity checks.
Changes:
- Added
disable_connector_validationtoValidatePeerandCreatePeerrequests and wired it through the Flow API handlers. - Extracted ClickHouse peer validation + staging-bucket smoke tests into
flow/pkg/clickhouseas reusable helpers (ValidateClickHousePeer,NewS3StagingValidator,NewGCSStagingValidator). - Updated ClickHouse connector validation and added E2E/test coverage for the new staging validators and bypass behavior.
Reviewed changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| protos/route.proto | Adds optional disable_connector_validation to peer-validation related RPC requests. |
| flow/shared/random.go | Routes RandomString through flow/pkg/common to share logic across modules. |
| flow/pkg/go.mod | Adds dependencies needed for reusable staging validators and ClickHouse validation library. |
| flow/pkg/go.sum | Updates sums for new/updated flow/pkg dependencies. |
| flow/pkg/common/utils.go | Introduces shared RandomString helper in the flow/pkg module. |
| flow/pkg/clickhouse/validation.go | Adds reusable ClickHouse peer validation API and host allow-list validation logic. |
| flow/pkg/clickhouse/validation_test.go | Minor test struct field ordering adjustment. |
| flow/pkg/clickhouse/staging_validators.go | Adds reusable S3/GCS staging bucket smoke-test validators. |
| flow/pkg/clickhouse/staging_validators_test.go | Adds unit tests for the new S3/GCS staging validators. |
| flow/go.mod | Updates main Flow module dependencies (incl. versions aligned with flow/pkg). |
| flow/go.sum | Updates sums for main Flow module dependency changes. |
| flow/e2e/api_test.go | Adds E2E test ensuring connector validation can be bypassed for ClickHouse. |
| flow/connectors/clickhouse/staging_s3.go | Switches S3 staging validation to the shared flow/pkg validator. |
| flow/connectors/clickhouse/staging_gcs.go | Switches GCS staging validation to the shared flow/pkg validator. |
| flow/connectors/clickhouse/clickhouse.go | Switches ClickHouse connector validation to the shared ValidateClickHousePeer. |
| flow/cmd/validate_peer.go | Implements connector-validation bypass while keeping connectivity checks. |
| flow/cmd/handler.go | Plumbs disable_connector_validation from CreatePeer into ValidatePeer. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
❌ Test FailureAnalysis: TestClickHousePeerValidationBypass panics deterministically at api_test.go:1251 because it calls t.Setenv inside a parallel test suite (e2eshared.RunSuite uses t.Parallel), which Go's testing framework forbids — this is a real bug, not a flaky failure. |
f71810c to
c213c28
Compare
🔄 Flaky Test DetectedAnalysis: The e2e test ✅ Automatically retrying the workflow |
86ca2cd to
2eaba06
Compare
🔄 Flaky Test DetectedAnalysis: The entire e2e test suite hit Go's global 900-second timeout ( ✅ Automatically retrying the workflow |
|
@copilot resolve the merge conflicts in this pull request |
…ed by common code.
…e used outside peerdb
…rnally at PeerDB and made available for external users of flow/pkg to use
…act. This allows skipping checks that can happen upstream in the API calls chain.
…ly required. Absent is not skipping is actually what we want, this commit simplifies the protocol
This reverts commit 2eaba06.
…nal on the boolean flag is not actually required. Absent is not skipping is actually what we want, this commit simplifies the protocol" This reverts commit a389643.
26d449b to
4905fec
Compare
❌ 2 Tests Failed:
View the top 2 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
❌ Test FailureAnalysis: Deterministic build failure: |
it's redundant and checks a piece of infrastructure which is static and reliable.
❌ Test FailureAnalysis: Real build failure: missing go.sum entries for cloud.google.com/go/kms/apiv1 and cloud.google.com/go/kms/apiv1/kmspb imported in internal/env.go — |
…y to limit the changeset
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
❌ Test FailureAnalysis: Test_PartitionBy fails deterministically across all four ClickHouse test suites with a consistent assertion mismatch — expected "num" but got "(num)" — indicating a real regression in partition key expression formatting, not a flaky failure. |
❌ Test FailureAnalysis: Real bug: |
🔄 Flaky Test DetectedAnalysis: The e2e test suite timed out after exactly 900 seconds (the configured ✅ Automatically retrying the workflow |
❌ Test FailureAnalysis: Test_PartitionBy fails deterministically across all 4 test variants with a consistent assertion mismatch (expected "num" but got "(num)" from ClickHouse system.tables.partition_key), indicating a real behavioral change — likely from a ClickHouse version upgrade — rather than a flaky test. |
❌ Test FailureAnalysis: Test_PartitionBy fails deterministically across all 4 ClickHouse test suites with an identical assertion error (expected column name "num" but got "(num)"), indicating a real regression in how PARTITION BY expressions are formatted or parsed. |
| gopkg.in/natefinch/lumberjack.v2 v2.2.1 // indirect | ||
| gopkg.in/yaml.v3 v3.0.1 // indirect | ||
| ) | ||
| ) No newline at end of file |
| // To enable it back, just replace `objectstore.NoStagingValidator` | ||
| // with `c.staging.Validate`. | ||
| if err := peerdb_clickhouse.ValidateClickHousePeer( | ||
| ctx, c.logger, allowedDomains, c.Config.Host, c.database, objectstore.NoStagingValidator, |
There was a problem hiding this comment.
actually, because of OSS validation flow via /api/v1/peers/validate, we probably want to set this to c.staging.Validate still; since with OSS user set up staging themselves and and validation is still useful.
(the side-effect of this is that CreatePeer will re-validate object store, which is fine for now as we have been running this pattern for a while).
The alternative is passing in an option to CreatePeerRequest to skip object storage validation and thread it thru, but for now I think it's okay to just validate during peer creation
This Pull Request introduces the possibility of performing ClickHouse peers validations outside PeerDB while
offering maximum re-utilization of the validation logic.
To achieve this, it offers four main changes:
Staging bucket validation logic is abstracted out behind the
StagingValidatortype which is implemented as a function taking a context and potentially returning an error. The logic used at PeerDB to implement this function is shared atflow/pkgfor external code to use throughNewS3StagingValidatorandNewGCSStagingValidatorfactory methods.In a similar vein, accepted ClickHouse hosts becomes a parameter for the ClickHouse PeerValidation logic.
Which in turn is shared at
flow/pkgthrough theValidateClickHousePeermethod. This method is also used within PeerDB when peer validation is not disabled.Hence, peer validation can be now disabled for any peer in the gRPC methodsWe assume validation happens both upstream and at PeerDB provision sideCreatePeerandValidatePeerusing a new optional parameter:disable_connector_validationWe have disabled bucket checks at ClickHouse validation time, this was actually testing static infrastructure.
Part of DBI-65