Skip to content

Load testing with new Kafka offset logic and update documentation#13

Merged
AndrewAct merged 1 commit into
mainfrom
dev
May 18, 2026
Merged

Load testing with new Kafka offset logic and update documentation#13
AndrewAct merged 1 commit into
mainfrom
dev

Conversation

@AndrewAct

Copy link
Copy Markdown
Owner

What was wrong

  • Kafka's offset did not work and the KafkaDeserializer did not function as expected

Fix

Replace the broken KafkaSource.builder() approach with Flink SQL metadata columns:

  CREATE TEMPORARY TABLE raw_orderbook_stream (
      payload         STRING,
      kafka_partition INT    METADATA FROM 'partition' VIRTUAL,
      kafka_offset    BIGINT METADATA FROM 'offset'    VIRTUAL
  ) WITH ('format' = 'raw', ...)

Bridged to DataStream via t_env.to_append_stream(). Real partition/offset still stored in Iceberg.

Also cleaned up:

  • Deleted flink/jobs/lib/kafka_schema.py (Phase 5 merge leftover, should've been removed)
  • Added flink/.dockerignore and root .dockerignore — host Python 3.13 pycache was being copied into Python
    3.10 Flink containers

Test plan

  • 196 tests pass
  • All 6 init containers exit 0; full stack healthy
  • Airflow + Spark E2E — pending

@AndrewAct AndrewAct merged commit 923dd08 into main May 18, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant