Skip to content

V143 json#10

Open
tolivedxb wants to merge 18 commits into
masterfrom
v143-json
Open

V143 json#10
tolivedxb wants to merge 18 commits into
masterfrom
v143-json

Conversation

@tolivedxb
Copy link
Copy Markdown

@tolivedxb tolivedxb commented Sep 4, 2025

Note

High Risk
High risk because it substantially refactors the BigQuery producer’s concurrency/batching/acknowledgement behavior and changes the Docker runtime base image and packaging, which can affect throughput, ordering, and operational stability.

Overview
BigQuery producer behavior is overhauled to support configurable bigquery_threads, multi-worker ingestion, and batch appends (up to BATCH_SIZE / approximate byte limits) with a scheduled flush; callbacks now complete per-row within a batch and retry at the batch level.

BigQuery output handling is extended by filtering bq_inserted_at from the write schema, adding support for BigQuery JSON field type conversion, skipping oversized records, and optionally clearing configured columns via cleared_columns env var.

Build/runtime changes update the default Kafka client version to 3.7.2 and switch the Docker runtime image to eclipse-temurin:23-jre-noble, adding debugging/profiling tooling (apt utilities + async-profiler) and changing how /REVISION is produced (no longer copied from builder).

Reviewed by Cursor Bugbot for commit 5a0cc82. Bugbot is set up for automated code reviews on this repo. Configure here.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 6 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 5a0cc82. Configure here.

int numWorkers = this.workers.size();
TableSchema tableSchema = getTableSchema(tableName);
// Create and start workers
for (int i = 0; i < Math.max(1, numWorkers); i++) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startWorkers always creates exactly one worker

High Severity

startWorkers reads numWorkers from this.workers.size(), but this.workers was just initialized as new ArrayList<>(bigqueryThreads) — which sets capacity, not size. So numWorkers is always 0, and Math.max(1, numWorkers) is always 1. The bigqueryThreads configuration is effectively ignored and only a single worker is ever started.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 5a0cc82. Configure here.

this.attemptBatch(this.appendContext);
this.appendContext = null;
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition on appendContext outside synchronized block

High Severity

this.appendContext is read and modified at lines 370–373 without holding the lock, while the scheduleAttempt scheduled task concurrently calls attemptBatch(this.appendContext) and sets this.appendContext = null inside the lock on a different thread. This creates a race where the worker thread can dereference a null appendContext, causing a NullPointerException, or concurrently mutate the same AppendContext (e.g., addRow vs append).

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 5a0cc82. Configure here.

return;
} catch (Exception e) {
System.out.format("Failed to retry append: %s\n", e);
System.out.format("Worker {} Failed to retry append: %s\n", parent.getWorkerId(), e);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.out.format uses wrong format placeholders

Low Severity

System.out.format("Worker {} Failed to retry append: %s\n", parent.getWorkerId(), e) mixes SLF4J-style {} with printf-style %s. Since System.out.format uses printf conventions, {} is literal text, %s consumes parent.getWorkerId(), and the exception e is silently ignored. The error message will be misleading and the exception details will be lost.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 5a0cc82. Configure here.

Comment thread Dockerfile
&& tar -xzf /tmp/async-profiler.tar.gz -C /opt \
&& rm /tmp/async-profiler.tar.gz
ENV ASYNC_PROFILER_HOME=/opt/async-profiler-${ASYNC_PROFILER_VERSION}-linux-x64
ENV PATH="$PATH:${ASYNC_PROFILER_HOME}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dockerfile runs as root with debug tools installed

Medium Severity

The non-root USER 1000 directive is commented out, and development/debug tools (htop, procps, python3-pip, wget, async-profiler) are installed in the production image. The MAXWELL_VERSION variable used in echo "$MAXWELL_VERSION" > /REVISION is not defined in this build stage (only in the builder stage), so /REVISION will be empty. These appear to be development-time changes not intended for production.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 5a0cc82. Configure here.

try {
synchronized (this.getLock()) {
this.attemptBatch(this.appendContext);
this.appendContext = null; // Nullify after attempting via scheduler
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scheduled task references field instead of captured parameter

High Severity

Inside scheduleAttempt, the scheduled lambda uses this.appendContext (the instance field) instead of the captured appendContext parameter. When the task fires after 1 minute, the field may have already been flushed and set to null (causing NPE in attemptBatch), or it may point to a completely different AppendContext that's still being built — causing a premature or duplicate flush of the wrong batch.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 5a0cc82. Configure here.

Comment thread Dockerfile

RUN useradd -u 1000 maxwell -d /app
RUN chown 1000:1000 /app
RUN echo "$MAXWELL_VERSION" > /REVISION
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dockerfile writes empty REVISION in second stage

Low Severity

RUN echo "$MAXWELL_VERSION" > /REVISION runs in the second build stage where MAXWELL_VERSION is not defined (it's an ENV only in the builder stage). This writes an empty string to /REVISION, whereas the original COPY --from=builder /REVISION /REVISION correctly propagated the version.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 5a0cc82. Configure here.

@cursor
Copy link
Copy Markdown

cursor Bot commented Apr 17, 2026

You have used all of your free Bugbot PR reviews.

To receive reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants