Hi Team,
I would like to request support for loading historical CDC (Change Data Capture) data as part of a typical ingestion pattern from SAP DI sources.
Use Case
My use case involves:
- Initial load (historical backfill) of CDC data
- Followed by incremental CDC ingestion
Current Approach
I am currently implementing this using:
- Append flow for CDC ingestion → working as expected
- Append flow with once=True for the initial backfill, based on the official Databricks guidance:
https://learn.microsoft.com/en-us/azure/databricks/ldp/flows-backfill
This approach works successfully when using Spark Declarative Pipelines.
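For reference, a minimal sketch of the pattern I mean, not the exact code I run. It assumes the CDC files are Parquet and are read incrementally with Auto Loader; the function name `define_cdc_flows`, the table name, and the paths are hypothetical. The pipelines module is passed in as `dp` so the flow wiring can be checked outside a pipeline.

```python
def define_cdc_flows(dp, spark, target, cdc_path, history_path):
    """Register an incremental flow and a one-time backfill flow on one table.

    `dp` is the pipelines module (or a stand-in exposing the same decorators)
    and `spark` is the active SparkSession. All names and paths passed in are
    hypothetical placeholders.
    """
    # Streaming table that both flows append into.
    dp.create_streaming_table(name=target)

    # Continuous CDC ingestion: a streaming read processed on every update.
    # Assumption: CDC files land as Parquet and are read with Auto Loader.
    @dp.append_flow(target=target, name=f"{target}_incremental")
    def incremental():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "parquet")
            .load(cdc_path)
        )

    # Historical backfill: a batch read that runs only once (once=True).
    @dp.append_flow(target=target, name=f"{target}_backfill", once=True)
    def backfill():
        return spark.read.format("parquet").load(history_path)
```

Inside a pipeline source file this would be invoked after `from pyspark import pipelines as dp`, for example `define_cdc_flows(dp, spark, "sap_di_cdc_raw", cdc_path, history_path)`. The batch read in the `once=True` flow is the part I cannot currently express in DLT-META.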
Challenge in DLT-META
In DLT-META, I am facing limitations because:
- It currently does not support batch reads from Parquet or other file formats
- Supported batch sources appear to be limited to:
  - Delta
  - Snapshot-based ingestion
Due to this limitation:
- I am unable to implement the initial historical load using an append flow (once=True)
- This blocks a standard CDC ingestion pattern (Initial Load + Incremental CDC)
Expected Behavior / Feature Request
It would be very helpful if DLT-META could support:
- Batch ingestion from Parquet (and potentially other file formats)
- Compatibility with append flows using once=True for backfill scenarios
- A unified pattern to support:
  - Initial historical load
  - Continuous CDC ingestion
Questions
- Is this functionality currently supported in any way within DLT-META that I may have missed?
- Are there any recommended workarounds for implementing this pattern other than using apply changes?
- Is there any plan to include this capability in the DLT-META roadmap?
Additional Context
- Backfill guidance: https://learn.microsoft.com/en-us/azure/databricks/ldp/flows-backfill
Contribution
If this feature is not yet supported or on the roadmap, I would be happy to:
- Contribute to the implementation
- Collaborate on design or testing
Impact
This feature would enable:
- Standardized CDC ingestion patterns
- Better support for CDC sources such as SAP DI, Kafka, cloud sources, Event Hubs, and Kinesis
- Greater flexibility in handling historical data loads
Thanks in advance for your guidance and support!