Add member ID 32-bit int type validation for Iceberg table schemas #475

Open
li-ukumar wants to merge 1 commit into linkedin:main from li-ukumar:ukumar/block-32bit-member-id

Conversation

@li-ukumar
Member

Summary

  • Adds validateMemberIdColumnTypes() to OpenHouseTablesApiValidator that rejects Iceberg table schemas using INTEGER type for member identity columns (memberId, actorId, profileId, customerId, sourceId, destId, etc.)
  • Member IDs will overflow a signed 32-bit int (max 2^31 - 1 = 2,147,483,647), so all member identity columns must use LONG
  • Integrated into both validateCreateTable() and validateUpdateTable()
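
The check described above could look roughly like the following. This is a minimal, dependency-free sketch and not the code in this PR: the class name `MemberIdTypeCheck`, the `ColumnType` enum, the map-based schema, and the choice of substring matching for everything except `mid` are all assumptions made for illustration. The pattern list contains only the names spelled out in the description; whatever "etc." covers is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for validateMemberIdColumnTypes(); not the PR's implementation.
final class MemberIdTypeCheck {
  // Simplified column types; a real Iceberg schema has many more.
  enum ColumnType { INTEGER, LONG, STRING }

  // Assumption: these names match anywhere inside a normalized column name.
  private static final Set<String> SUBSTRING_PATTERNS =
      Set.of("memberid", "actorid", "profileid", "customerid", "sourceid", "destid");

  // Assumption: short names such as "mid" only match the whole column name.
  private static final Set<String> EXACT_PATTERNS = Set.of("mid");

  // Lower-case and drop underscores so member_id and memberId compare equal.
  static String normalize(String columnName) {
    return columnName.toLowerCase(Locale.ROOT).replace("_", "");
  }

  static boolean looksLikeMemberId(String columnName) {
    String normalized = normalize(columnName);
    if (EXACT_PATTERNS.contains(normalized)) {
      return true;
    }
    for (String pattern : SUBSTRING_PATTERNS) {
      if (normalized.contains(pattern)) {
        return true;
      }
    }
    return false;
  }

  // Returns one failure message per INTEGER column whose name looks like a member ID.
  static List<String> validate(Map<String, ColumnType> columns) {
    List<String> failures = new ArrayList<>();
    for (Map.Entry<String, ColumnType> column : columns.entrySet()) {
      if (column.getValue() == ColumnType.INTEGER && looksLikeMemberId(column.getKey())) {
        failures.add("column '" + column.getKey() + "' must be LONG, not INTEGER");
      }
    }
    return failures;
  }
}
```

Under these assumptions, `validate(Map.of("member_id", ColumnType.INTEGER))` returns one failure, while `retryCount` as INTEGER and `memberId` as LONG return none. Substring matching can produce false positives on unrelated column names, which is one reason a log-only phase may be useful before enforcing.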

Testing Done

  • Unit tests: long memberId column passes
  • Unit tests: int memberId column rejected (create path)
  • Unit tests: non-member-ID int column (retryCount) passes
  • Unit tests: new patterns (customerId, sourceId, destId) rejected
  • Unit tests: exact pattern mid rejected
  • Unit tests: underscore normalization (member_id) rejected
  • Unit tests: int memberId rejected via update path
  • ./gradlew :services:tables:build passes

Adds validateMemberIdColumnTypes() to OpenHouseTablesApiValidator that
rejects Iceberg table schemas using INTEGER type for member identity
columns (memberId, actorId, profileId, customerId, sourceId, destId,
etc.). Member IDs will overflow 32-bit int — all must use LONG.

Integrated into both validateCreateTable() and validateUpdateTable().

See go/project-2b for details.
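
To make the overflow concern concrete, here is a small standalone illustration of what happens when a value just past the signed 32-bit range is stored in an `int`. The class and method names are hypothetical; only standard Java behavior is shown.

```java
// Illustration only: how a 64-bit ID behaves when forced into a 32-bit int.
final class IntOverflowDemo {
  // A narrowing cast keeps the low 32 bits, so the value silently wraps.
  static int narrow(long id) {
    return (int) id;
  }

  // Math.toIntExact throws ArithmeticException instead of wrapping.
  static boolean fitsInInt(long id) {
    try {
      Math.toIntExact(id);
      return true;
    } catch (ArithmeticException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    long firstIdPastIntRange = Integer.MAX_VALUE + 1L;    // 2147483648
    System.out.println(Integer.MAX_VALUE);                // prints 2147483647
    System.out.println(narrow(firstIdPastIntRange));      // prints -2147483648
    System.out.println(fitsInInt(firstIdPastIntRange));   // prints false
  }
}
```
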
    validationFailures.addAll(
        validateUpdateTimestampForReplicatedTable(createUpdateTableRequestBody));
    if (createUpdateTableRequestBody.getSchema() != null) {
      validateMemberIdColumnTypes(createUpdateTableRequestBody.getSchema(), validationFailures);
Member

@abhisheknath2011 abhisheknath2011 Mar 2, 2026

This seems specific to the LinkedIn memberId use case. Since this is the OSS codebase, can we keep this validation in the internal li repo instead? We would need to extend OpenHouseTablesApiValidator in the internal repo, and the extended validator could be enabled by adding something like the @Primary annotation.
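
The extension approach suggested here might be shaped like the sketch below. It is a simplified illustration, not the real API: `OssTablesValidator` and `InternalTablesValidator` are hypothetical stand-ins, the method takes a plain list of INTEGER column names rather than a request body, and the Spring wiring is only described in a comment so the example stays dependency-free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified stand-in for the OSS validator; the real class differs.
class OssTablesValidator {
  List<String> validateCreateTable(List<String> intColumnNames) {
    // Only generic, vendor-neutral checks would live here.
    return new ArrayList<>();
  }
}

// Hypothetical internal subclass. In a Spring application it could be registered as a
// bean and marked @Primary so it is injected in place of the OSS bean (assumption).
class InternalTablesValidator extends OssTablesValidator {
  @Override
  List<String> validateCreateTable(List<String> intColumnNames) {
    List<String> failures = super.validateCreateTable(intColumnNames);
    for (String name : intColumnNames) {
      String normalized = name.toLowerCase(Locale.ROOT).replace("_", "");
      if (normalized.contains("memberid")) {
        failures.add("column '" + name + "' must be LONG, not INTEGER");
      }
    }
    return failures;
  }
}
```

With this split, the OSS validator accepts an INTEGER `member_id` column, and only deployments that wire in the internal subclass reject it.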

@cbb330
Collaborator

cbb330 commented Mar 2, 2026

this doesn't belong in api layer. it should live next to the existing schema validation logic on the server. check if that exists at all in the linkedin implementation.

also, what is the rollout plan? just fail users? that's not acceptable. we need an answer for:

  1. existing tables
  2. new tables

and minimum shippable code would have debug / log info only.

also need to attach e2e tests with spark or create table curl requests
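
One possible shape for the "debug / log info only" first step is a mode switch that logs violations without failing the request. This is a sketch under assumptions, not something specified in the review: `MemberIdRollout`, `Mode`, and `failuresToReport` are hypothetical names, and how the mode would be chosen for existing versus new tables is left open, as it is in the comment above.

```java
import java.util.List;
import java.util.logging.Logger;

// Hypothetical rollout gate: decides whether violations fail the request or are only logged.
final class MemberIdRollout {
  enum Mode { LOG_ONLY, ENFORCE }

  private static final Logger LOG = Logger.getLogger(MemberIdRollout.class.getName());

  // Returns the failures that should actually be surfaced to the caller.
  static List<String> failuresToReport(Mode mode, String tableName, List<String> violations) {
    if (mode == Mode.LOG_ONLY) {
      for (String violation : violations) {
        LOG.warning("member ID type check (log-only) on " + tableName + ": " + violation);
      }
      return List.of(); // nothing is rejected in log-only mode
    }
    return List.copyOf(violations);
  }
}
```

Starting in `LOG_ONLY` would show how many existing and new tables would be affected before any request is rejected.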
