Markdown Data Definition Language (MD-DDL)

CC BY 4.0

Version 0.9.2

Model once. Reuse everywhere.

MD-DDL is a Markdown-native standard for defining what data means, where it comes from, and how it is governed — then generating physical artifacts from a single source of truth that humans and AI agents share.

MD-DDL is: AI‑native · Human‑friendly · Version‑controlled · Semantically rich · Ready for automation

Read the spec: 1-Foundation.md or MD-DDL-Complete.md for single-file AI context


What MD-DDL covers

  • Domain layer — domains, entities, enums, relationships, events, and constraints
  • Source layer — source system declarations and column-level transformation rules (direct, derived, conditional, lookup, reconciliation, aggregation)
  • Data products — source-aligned, domain-aligned, and consumer-aligned products declaring scope, shape, consumers, SLA, governance, and masking — driving automated artifact generation
  • Governance — classification, PII, retention, regulatory scope, access roles, and masking strategies living with the model, not in a separate system
  • Physical artifacts — dimensional star schemas, normalized 3NF DDL, wide-column schemas, knowledge graph (Cypher), JSON Schema, Parquet contracts

Quick Start

Start a new project using the bootstrap script — it sets up git, adds MD-DDL as a submodule, and installs the agent wrappers for your AI tool in one step.

Bash (macOS / Linux / WSL):

bash <(curl -fsSL https://raw.githubusercontent.com/Semprini/md-ddl/main/start-project.sh)

PowerShell (Windows):

Invoke-Expression (Invoke-WebRequest https://raw.githubusercontent.com/Semprini/md-ddl/main/start-project.ps1).Content

Or download start-project.sh / start-project.ps1 and run them locally.
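If you would rather read the script before running it, the local route can be sketched roughly as follows. This is a minimal sketch, not part of MD-DDL: the helper name `download_bootstrap` and its two optional arguments (base URL and destination folder) are assumptions made for illustration. Only the default URL comes from the Quick Start command.

```shell
# Sketch: save the bootstrap script to disk so it can be reviewed before running.
# download_bootstrap and its arguments are illustrative names, not part of MD-DDL.
download_bootstrap() {
  local base_url="${1:-https://raw.githubusercontent.com/Semprini/md-ddl/main}"
  local dest_dir="${2:-.}"
  mkdir -p "$dest_dir" || return 1
  # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
  curl -fsSL "$base_url/start-project.sh" -o "$dest_dir/start-project.sh"
}

# Typical use (requires network access):
#   download_bootstrap
#   less start-project.sh      # review what the script will do
#   bash start-project.sh
```

Saving the file first trades one extra step for the chance to inspect what will run on your machine.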


Learn by conversation: MD-DDL includes Agent Guide, an AI learning companion available from the repo via Claude or Copilot in VS Code. It adapts to your role and goals, teaches through discussion rather than documentation, and routes you to the right specialist agent when you're ready to work.

Example prompts (Claude uses /agent-guide, Copilot uses @agent-guide):

/agent-guide I'm new to MD-DDL — walk me through the key concepts and help me get started.
@agent-guide I'm a data architect at a retail bank. We have 15+ legacy source systems and no canonical data model. Give me an overview of MD-DDL and help me decide where to start.
/agent-guide I need to model a Customer domain. We track individuals and business accounts. Walk me through the MD-DDL approach.

Workflow

MD-DDL is not rigid or dogmatic. A typical flow is:

  1. Position — discuss the architectural approach with Agent Architect: compare to alternatives, prepare material for governance councils or CIOs
  2. Discover — scope the domain with Agent Ontology: identify entities, relationships, events, and governance posture
  3. Model — write domain.md, entity files, enums, and events
  4. Map sources — declare source systems and column-level transforms
  5. Publish — declare data products with scope, shape, SLA, and masking
  6. Generate — produce physical artifacts with Agent Artifact
  7. Govern — audit standards conformance and regulatory posture with Agent Governance

Agent Guide helps you navigate between these stages and explains any concept along the way.


Using MD-DDL in your project

MD-DDL is designed to be used as a git submodule dependency. Your model files live in your own repository; MD-DDL provides the specification, agents, and examples.

Manual setup

If you prefer to set things up manually rather than use the scripts:

mkdir myproject
cd myproject
git init
git submodule add https://github.com/Semprini/md-ddl .md-ddl
git submodule update --init

Then copy the agent wrappers for your AI tool:

  • Copilot: .md-ddl/.github/agents/*.agent.md → .github/agents/
  • Claude: .md-ddl/.claude/commands/*.md → .claude/commands/

If you use Claude, update the copied .claude/commands/*.md files so that the agents/ path points to .md-ddl/agents.

Next, create your copilot-instructions.md or CLAUDE.md. See the start-project scripts for examples.
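The wrapper copy step could be scripted along these lines. This is a sketch under assumptions: `install_wrappers` is a hypothetical helper name, and it expects to run from the project root with the submodule already checked out at .md-ddl. It only copies files and does not edit the agents/ path inside the Claude command files.

```shell
# Sketch: copy the agent wrappers out of the submodule for one AI tool.
# install_wrappers is an illustrative helper name, not an MD-DDL command.
install_wrappers() {
  local tool="$1"
  case "$tool" in
    copilot)
      mkdir -p .github/agents &&
        cp .md-ddl/.github/agents/*.agent.md .github/agents/
      ;;
    claude)
      mkdir -p .claude/commands &&
        cp .md-ddl/.claude/commands/*.md .claude/commands/
      ;;
    *)
      echo "usage: install_wrappers copilot|claude" >&2
      return 2
      ;;
  esac
}

# Example: install_wrappers claude
```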

Update MD-DDL to a new release later:

git submodule update --remote .md-ddl
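After the update, the parent repository still points at the old MD-DDL commit until the new submodule pointer is committed. A minimal sketch of the full round trip, assuming it runs from the project root (`update_md_ddl` is a hypothetical helper name, and the commit message is only an example):

```shell
# Sketch: pull the latest MD-DDL commit and record it in the parent repository.
# update_md_ddl is an illustrative helper name, not an MD-DDL command.
update_md_ddl() {
  git submodule update --remote .md-ddl || return 1
  # The parent repo stores the submodule as a pinned commit; stage and commit
  # the new pointer only if git reports a difference for that path.
  if git diff --quiet -- .md-ddl; then
    echo "MD-DDL already up to date"
  else
    git add .md-ddl && git commit -m "Update MD-DDL submodule"
  fi
}
```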

Suggested project layout

your-project/
  .md-ddl/                   ← submodule (this repo)
  .github/agents/            ← Copilot agent wrappers  (Copilot users)
  .claude/commands/          ← Claude slash commands    (Claude users)
  domains/
    customer/
      domain.md
      entities/
      products/
  sources/
    salesforce-crm/
      source.md
      transforms/
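The domain and source folders in this layout can be created in one step. The sketch below is illustrative only: `scaffold_model` and its three arguments (project root, domain name, source system name) are assumed names, and the placeholder files it creates are empty.

```shell
# Sketch: create the suggested folders for one domain and one source system.
# scaffold_model and its arguments are illustrative, not part of MD-DDL.
scaffold_model() {
  local root="$1" domain="$2" source_system="$3"
  mkdir -p "$root/domains/$domain/entities" \
           "$root/domains/$domain/products" \
           "$root/sources/$source_system/transforms" || return 1
  # Create empty placeholder files only if they do not exist yet,
  # so re-running the helper never overwrites model content.
  [ -e "$root/domains/$domain/domain.md" ] || : > "$root/domains/$domain/domain.md"
  [ -e "$root/sources/$source_system/source.md" ] || : > "$root/sources/$source_system/source.md"
}

# Example: scaffold_model . customer salesforce-crm
```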

Examples

Five reference domains at increasing complexity:

| Example | Focus | Complexity |
| --- | --- | --- |
| Simple Customer | Minimal: one domain, three entities, one event | Starter |
| Financial Crime | AML/KYC/CTF: BIAN alignment, 15+ entities, sources, products, generated artifacts | Intermediate |
| Healthcare | FHIR R4: HIPAA governance, source transforms, knowledge-graph product | Intermediate |
| Telecom | TM Forum ODA: PCI-DSS, associative entities, new relationship types, dimensional product | Advanced |
| Retail Sales + Retail Service | Bounded Context: two greenfield domains defining Customer differently, cross-domain Customer 360 | Advanced |

The feature coverage matrix maps every spec feature to the example that demonstrates it.


Repository layout

md-ddl-specification/        Normative standard
  1-Foundation.md            Start here to understand the model
  2-Domains.md … 10-Adoption.md
  MD-DDL-Complete.md         Single-file version for AI context windows

agents/                      Canonical agent prompts and skills
  agent-guide/               Learning companion and navigator
  agent-ontology/            Domain modelling and source mapping
  agent-artifact/            Physical schema generation
  agent-architect/           Architecture philosophy, data product design, ODPS
  agent-governance/          Standards conformance and compliance auditing

examples/                    Reference examples
  Simple Customer/
  Financial Crime/
  Healthcare/
  Telecom/
  Retail Sales/
  Retail Service/

references/                  Architecture and industry reference data
  industry_standards/        BIAN, FHIR, TM Forum reference datasets
  architecture/              Data Autonomy blog series, external references, Mermaid diagrams

This work is licensed under a Creative Commons Attribution 4.0 International License.
