fix(ingest/teradata): correct nullable and autoincrement hydration for CHAR(N)-padded values by JohnRTurner · Pull Request #17603 · datahub-project/datahub

JohnRTurner · 2026-05-27T18:07:31Z

Summary

teradatasqlalchemy._get_column_info uses strict equality / membership checks that don't tolerate CHAR(N) space-padding:

'nullable': row['Nullable'] == 'Y' — Teradata returns 'Y ' / 'N ' (CHAR(1) padded), so the check evaluates False for every column. Genuinely nullable columns hydrate as nullable: false everywhere.
autoinc = row['IdColType'] in ('GA', 'GD') — Teradata returns 'GA ' / 'GD ' (CHAR(4) padded), so the check evaluates False for every identity column. Identity columns hydrate as autoincrement: false.

Both bugs reproduce on a stock Teradata install with teradatasqlalchemy==20.0.0.2, no special configuration. The nullable bug is what surfaced the report (customer ticket: every column appears non-nullable); the IdColType bug is a latent sibling found by auditing other CHAR field accesses in the same code path.

Fix

After self._get_column_info(row) in optimized_get_columns, re-derive nullable and autoincrement from the raw row fields with whitespace-stripping. Shared _read_padded_char_field helper handles SQLAlchemy Row attribute access and dict access. Both helpers are no-ops when the raw value isn't a string (e.g. NULL — most often a view column without QVCI, where the metadata is genuinely unknown and guessing either direction would be wrong).

Test plan

Unit: new TestCharPaddingFixes class with 7 focused tests covering padded 'Y '/'N ', clean 'Y', dict-row vs Row-attr access, padded 'GA ' and 'GD ', and IdColType=None no-op.
Full tests/unit/test_teradata_source.py: 112 passing.
./gradlew :metadata-ingestion:lintFix: clean.
End-to-end against a real local Teradata: padded 'Y ' now hydrates as nullable=True; padded 'GA ' now hydrates as autoincrement=True.

What this does and doesn't cover

✅ Padded 'Y ' / 'N ' for Nullable (the primary customer-reported bug)
✅ Padded 'GA ' / 'GD ' for IdColType (autoincrement detection)
❌ View columns where Nullable is genuinely None (e.g., views without QVCI enabled on Teradata) — the metadata isn't available, and the helper correctly does not guess. Addressed customer-side via QVCI on Teradata.

Out of scope

Other padded CHAR fields the library handles internally (ColumnType, ColumnName, etc.) — these flow through normalize_name or _resolve_type in the library, which already strip. Audited and confirmed no user-visible breakage from those.

🤖 Generated with Claude Code

github-actions · 2026-05-27T18:07:47Z

Linear: ING-2769

Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned.

…alues Teradata returns dbc.ColumnsV.Nullable and HELP COLUMN's Nullable column as CHAR(1) space-padded ('Y ' / 'N '). teradatasqlalchemy's _get_column_info checks `row['Nullable'] == 'Y'` strictly, which evaluates False for every column — so all genuinely nullable columns hydrate as nullable=False. Override col_info['nullable'] with a defensive parse of the raw field after _get_column_info. Leaves the value alone when the raw field is None or unparseable (e.g. view columns without QVCI, where the metadata is genuinely unknown and guessing would be wrong in either direction). Reproduced against a stock Teradata instance with teradatasqlalchemy 20.0.0.2: dbc.ColumnsV returns 'Y ' for nullable table columns, and HELP COLUMN returns 'N ' for non-nullable view columns. The latter happens to evaluate correctly to False, which is why the bug only manifests when the correct answer is True. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reduce the helper to a minimal _strip_padded_nullable that handles only the realistic row shapes (SQLAlchemy Row attribute access, dict) and stripped- equality check. Drop the speculative bytes/case handling — Teradata returns uppercase strings; bytes never happen in practice. Replace the 24 helper-level unit tests with 4 focused integration tests that exercise the fix through optimized_get_columns, matching the existing test pattern in this file and exercising the actual code path that the customer-reported bug runs through. Net: 16 lines removed (53 -> ~20 in source, 149 -> tests cleaned). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Same root cause as the Nullable fix in this PR. teradatasqlalchemy does `row['IdColType'] in ('GA', 'GD')` strictly, but Teradata returns IdColType space-padded as 'GA ' (CHAR(4)), so the check evaluates False for every identity column on a Teradata system — silently breaking autoincrement detection across the connector. Reproduced live: every identity column in dbc.ColumnsV on a stock Teradata returns raw IdColType='GA '. Before the fix, `optimized_get_columns` reports autoincrement=False on the column TDStats.StatsTbl.SCOID; after the fix, it correctly reports True. Factored the row-field read into a small _read_padded_char_field helper shared between the nullable and autoincrement strippers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot added ingestion PR or Issue related to the ingestion of metadata community-contribution PR or Issue raised by member(s) of DataHub Community labels May 27, 2026

vercel Bot deployed to Preview May 27, 2026 18:27 View deployment

JohnRTurner changed the title ~~fix(ingest/teradata): correct nullable hydration for CHAR(1)-padded values~~ fix(ingest/teradata): correct nullable and autoincrement hydration for CHAR(N)-padded values May 27, 2026

vercel Bot deployed to Preview May 27, 2026 18:59 View deployment

maggiehays added the needs-review Label for PRs that need review from a maintainer. label May 27, 2026

JohnRTurner and others added 3 commits June 2, 2026 08:32

treff7es force-pushed the fix/teradata-nullable-char-padding branch from 4419186 to a9b1cb2 Compare June 2, 2026 06:35

treff7es approved these changes Jun 2, 2026

View reviewed changes

treff7es enabled auto-merge (squash) June 2, 2026 06:37

vercel Bot deployed to Preview June 2, 2026 06:50 View deployment

treff7es merged commit 2eea7f2 into datahub-project:master Jun 2, 2026
49 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ingest/teradata): correct nullable and autoincrement hydration for CHAR(N)-padded values#17603

fix(ingest/teradata): correct nullable and autoincrement hydration for CHAR(N)-padded values#17603
treff7es merged 3 commits into
datahub-project:masterfrom
JohnRTurner:fix/teradata-nullable-char-padding

JohnRTurner commented May 27, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

JohnRTurner commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Fix

Test plan

What this does and doesn't cover

Out of scope

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

JohnRTurner commented May 27, 2026 •

edited

Loading