Skip to content

fix(ingest/teradata): correct nullable and autoincrement hydration for CHAR(N)-padded values#17603

Merged
treff7es merged 3 commits into
datahub-project:masterfrom
JohnRTurner:fix/teradata-nullable-char-padding
Jun 2, 2026
Merged

fix(ingest/teradata): correct nullable and autoincrement hydration for CHAR(N)-padded values#17603
treff7es merged 3 commits into
datahub-project:masterfrom
JohnRTurner:fix/teradata-nullable-char-padding

Conversation

@JohnRTurner
Copy link
Copy Markdown
Contributor

@JohnRTurner JohnRTurner commented May 27, 2026

Summary

teradatasqlalchemy._get_column_info uses strict equality / membership checks that don't tolerate CHAR(N) space-padding:

  • 'nullable': row['Nullable'] == 'Y' — Teradata returns 'Y ' / 'N ' (CHAR(1) padded), so the check evaluates False for every column. Genuinely nullable columns hydrate as nullable: false everywhere.
  • autoinc = row['IdColType'] in ('GA', 'GD') — Teradata returns 'GA ' / 'GD ' (CHAR(4) padded), so the check evaluates False for every identity column. Identity columns hydrate as autoincrement: false.

Both bugs reproduce on a stock Teradata install with teradatasqlalchemy==20.0.0.2, no special configuration. The nullable bug is what surfaced the report (customer ticket: every column appears non-nullable); the IdColType bug is a latent sibling found by auditing other CHAR field accesses in the same code path.

Fix

After self._get_column_info(row) in optimized_get_columns, re-derive nullable and autoincrement from the raw row fields with whitespace-stripping. Shared _read_padded_char_field helper handles SQLAlchemy Row attribute access and dict access. Both helpers are no-ops when the raw value isn't a string (e.g. NULL — most often a view column without QVCI, where the metadata is genuinely unknown and guessing either direction would be wrong).

Test plan

  • Unit: new TestCharPaddingFixes class with 7 focused tests covering padded 'Y '/'N ', clean 'Y', dict-row vs Row-attr access, padded 'GA ' and 'GD ', and IdColType=None no-op.
  • Full tests/unit/test_teradata_source.py: 112 passing.
  • ./gradlew :metadata-ingestion:lintFix: clean.
  • End-to-end against a real local Teradata: padded 'Y ' now hydrates as nullable=True; padded 'GA ' now hydrates as autoincrement=True.

What this does and doesn't cover

✅ Padded 'Y ' / 'N ' for Nullable (the primary customer-reported bug)
✅ Padded 'GA ' / 'GD ' for IdColType (autoincrement detection)
❌ View columns where Nullable is genuinely None (e.g., views without QVCI enabled on Teradata) — the metadata isn't available, and the helper correctly does not guess. Addressed customer-side via QVCI on Teradata.

Out of scope

Other padded CHAR fields the library handles internally (ColumnType, ColumnName, etc.) — these flow through normalize_name or _resolve_type in the library, which already strip. Audited and confirmed no user-visible breakage from those.

🤖 Generated with Claude Code

@github-actions github-actions Bot added ingestion PR or Issue related to the ingestion of metadata community-contribution PR or Issue raised by member(s) of DataHub Community labels May 27, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Linear: ING-2769

Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned.

@JohnRTurner JohnRTurner changed the title fix(ingest/teradata): correct nullable hydration for CHAR(1)-padded values fix(ingest/teradata): correct nullable and autoincrement hydration for CHAR(N)-padded values May 27, 2026
@maggiehays maggiehays added the needs-review Label for PRs that need review from a maintainer. label May 27, 2026
JohnRTurner and others added 3 commits June 2, 2026 08:32
…alues

Teradata returns dbc.ColumnsV.Nullable and HELP COLUMN's Nullable column as
CHAR(1) space-padded ('Y ' / 'N '). teradatasqlalchemy's _get_column_info
checks `row['Nullable'] == 'Y'` strictly, which evaluates False for every
column — so all genuinely nullable columns hydrate as nullable=False.

Override col_info['nullable'] with a defensive parse of the raw field after
_get_column_info. Leaves the value alone when the raw field is None or
unparseable (e.g. view columns without QVCI, where the metadata is genuinely
unknown and guessing would be wrong in either direction).

Reproduced against a stock Teradata instance with teradatasqlalchemy
20.0.0.2: dbc.ColumnsV returns 'Y ' for nullable table columns, and HELP
COLUMN returns 'N ' for non-nullable view columns. The latter happens to
evaluate correctly to False, which is why the bug only manifests when the
correct answer is True.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reduce the helper to a minimal _strip_padded_nullable that handles only the
realistic row shapes (SQLAlchemy Row attribute access, dict) and stripped-
equality check. Drop the speculative bytes/case handling — Teradata returns
uppercase strings; bytes never happen in practice.

Replace the 24 helper-level unit tests with 4 focused integration tests
that exercise the fix through optimized_get_columns, matching the existing
test pattern in this file and exercising the actual code path that the
customer-reported bug runs through.

Net: 16 lines removed (53 -> ~20 in source, 149 -> tests cleaned).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same root cause as the Nullable fix in this PR. teradatasqlalchemy does
`row['IdColType'] in ('GA', 'GD')` strictly, but Teradata returns IdColType
space-padded as 'GA  ' (CHAR(4)), so the check evaluates False for every
identity column on a Teradata system — silently breaking autoincrement
detection across the connector.

Reproduced live: every identity column in dbc.ColumnsV on a stock Teradata
returns raw IdColType='GA  '. Before the fix, `optimized_get_columns`
reports autoincrement=False on the column TDStats.StatsTbl.SCOID; after
the fix, it correctly reports True.

Factored the row-field read into a small _read_padded_char_field helper
shared between the nullable and autoincrement strippers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@treff7es treff7es force-pushed the fix/teradata-nullable-char-padding branch from 4419186 to a9b1cb2 Compare June 2, 2026 06:35
@treff7es treff7es enabled auto-merge (squash) June 2, 2026 06:37
@treff7es treff7es merged commit 2eea7f2 into datahub-project:master Jun 2, 2026
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata needs-review Label for PRs that need review from a maintainer.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants