fix(ingest/teradata): correct nullable and autoincrement hydration for CHAR(N)-padded values#17603
Merged
treff7es merged 3 commits intoJun 2, 2026
Conversation
Contributor
|
Linear: ING-2769 Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned. |
…alues
Teradata returns dbc.ColumnsV.Nullable and HELP COLUMN's Nullable column as
CHAR(1) space-padded ('Y ' / 'N '). teradatasqlalchemy's _get_column_info
checks `row['Nullable'] == 'Y'` strictly, which evaluates False for every
column — so all genuinely nullable columns hydrate as nullable=False.
Override col_info['nullable'] with a defensive parse of the raw field after
_get_column_info. Leaves the value alone when the raw field is None or
unparseable (e.g. view columns without QVCI, where the metadata is genuinely
unknown and guessing would be wrong in either direction).
Reproduced against a stock Teradata instance with teradatasqlalchemy
20.0.0.2: dbc.ColumnsV returns 'Y ' for nullable table columns, and HELP
COLUMN returns 'N ' for non-nullable view columns. The latter happens to
evaluate correctly to False, which is why the bug only manifests when the
correct answer is True.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reduce the helper to a minimal _strip_padded_nullable that handles only the realistic row shapes (SQLAlchemy Row attribute access, dict) and stripped- equality check. Drop the speculative bytes/case handling — Teradata returns uppercase strings; bytes never happen in practice. Replace the 24 helper-level unit tests with 4 focused integration tests that exercise the fix through optimized_get_columns, matching the existing test pattern in this file and exercising the actual code path that the customer-reported bug runs through. Net: 16 lines removed (53 -> ~20 in source, 149 -> tests cleaned). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same root cause as the Nullable fix in this PR. teradatasqlalchemy does
`row['IdColType'] in ('GA', 'GD')` strictly, but Teradata returns IdColType
space-padded as 'GA ' (CHAR(4)), so the check evaluates False for every
identity column on a Teradata system — silently breaking autoincrement
detection across the connector.
Reproduced live: every identity column in dbc.ColumnsV on a stock Teradata
returns raw IdColType='GA '. Before the fix, `optimized_get_columns`
reports autoincrement=False on the column TDStats.StatsTbl.SCOID; after
the fix, it correctly reports True.
Factored the row-field read into a small _read_padded_char_field helper
shared between the nullable and autoincrement strippers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4419186 to
a9b1cb2
Compare
treff7es
approved these changes
Jun 2, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
teradatasqlalchemy._get_column_infouses strict equality / membership checks that don't tolerate CHAR(N) space-padding:'nullable': row['Nullable'] == 'Y'— Teradata returns'Y '/'N '(CHAR(1) padded), so the check evaluatesFalsefor every column. Genuinely nullable columns hydrate asnullable: falseeverywhere.autoinc = row['IdColType'] in ('GA', 'GD')— Teradata returns'GA '/'GD '(CHAR(4) padded), so the check evaluatesFalsefor every identity column. Identity columns hydrate asautoincrement: false.Both bugs reproduce on a stock Teradata install with
teradatasqlalchemy==20.0.0.2, no special configuration. The nullable bug is what surfaced the report (customer ticket: every column appears non-nullable); the IdColType bug is a latent sibling found by auditing other CHAR field accesses in the same code path.Fix
After
self._get_column_info(row)inoptimized_get_columns, re-derivenullableandautoincrementfrom the raw row fields with whitespace-stripping. Shared_read_padded_char_fieldhelper handles SQLAlchemy Row attribute access and dict access. Both helpers are no-ops when the raw value isn't a string (e.g. NULL — most often a view column without QVCI, where the metadata is genuinely unknown and guessing either direction would be wrong).Test plan
TestCharPaddingFixesclass with 7 focused tests covering padded'Y '/'N ', clean'Y', dict-row vs Row-attr access, padded'GA 'and'GD ', andIdColType=Noneno-op.tests/unit/test_teradata_source.py: 112 passing../gradlew :metadata-ingestion:lintFix: clean.'Y 'now hydrates asnullable=True; padded'GA 'now hydrates asautoincrement=True.What this does and doesn't cover
✅ Padded
'Y '/'N 'forNullable(the primary customer-reported bug)✅ Padded
'GA '/'GD 'forIdColType(autoincrement detection)❌ View columns where
Nullableis genuinelyNone(e.g., views without QVCI enabled on Teradata) — the metadata isn't available, and the helper correctly does not guess. Addressed customer-side via QVCI on Teradata.Out of scope
Other padded CHAR fields the library handles internally (
ColumnType,ColumnName, etc.) — these flow throughnormalize_nameor_resolve_typein the library, which already strip. Audited and confirmed no user-visible breakage from those.🤖 Generated with Claude Code