Data documentation using YAML #430

@jesper-friis

We need a simple way to populate the knowledge base with documentation of data resources (sources and sinks). Below is a suggested YAML format for documenting a set of resources in a single YAML document. The idea is to create a Python API and a tool that takes this YAML and a base IRI as input and populates the knowledge base with the documented data resources.

The keyword `partial_pipelines` is too implementation-specific. I would prefer `data_resources`.

```yaml
---
version: 1.0

partial_pipelines:  # Find a better name. `data_resources` would be ideal, except that it may create confusion with the `dataresource` strategy below.
  TEM-BF-image1:   # `base_iri` is prepended to this name when creating the IRI for the data resource in the knowledge base.
    dataresource:
      downloadURL: http://...
      mediaType: text/csv
      ...
    parse:
      parserType: dlite-parse
      ...
    mapping:
      mappingType: mappings
      prefixes:
        ...
      triples:
        ...

  TEM-BF-image2:
    dataresource:
      ...
    parse:
      ...
    mappings:
      ...

  my-data-sink:
    mappings:
      ...
    generate:
      ...
    dataresource:
      downloadURL: http://...
      mediaType: text/csv
```
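
To make the intended behaviour concrete, here is a minimal Python sketch of how such a tool might walk the document and build IRIs. It is not an existing API: the function names `resource_iris` and `iter_strategies` and the base IRI `http://example.com/kb/` are hypothetical, and the `document` dict stands in for the YAML after it has been loaded by a YAML parser. Actually writing to the knowledge base is left out.

```python
from typing import Any, Iterator

# Stand-in for the parsed YAML document (assumption: same layout as the
# example above, with elided fields simply left out).
document: dict[str, Any] = {
    "version": 1.0,
    "partial_pipelines": {
        "TEM-BF-image1": {
            "dataresource": {"mediaType": "text/csv"},
            "parse": {"parserType": "dlite-parse"},
        },
        "my-data-sink": {
            "generate": {},
            "dataresource": {"mediaType": "text/csv"},
        },
    },
}

# Hypothetical base IRI, for illustration only.
BASE_IRI = "http://example.com/kb/"


def resource_iris(doc: dict[str, Any], base_iri: str) -> dict[str, str]:
    """Map each resource name to the IRI formed by prepending `base_iri`."""
    return {name: base_iri + name for name in doc.get("partial_pipelines", {})}


def iter_strategies(
    doc: dict[str, Any], base_iri: str
) -> Iterator[tuple[str, str, dict[str, Any]]]:
    """Yield (resource IRI, strategy name, strategy config) for every entry."""
    for name, strategies in doc.get("partial_pipelines", {}).items():
        iri = base_iri + name
        for strategy, config in strategies.items():
            yield iri, strategy, config


if __name__ == "__main__":
    for iri, strategy, config in iter_strategies(document, BASE_IRI):
        print(iri, strategy, config)
```

Each yielded tuple is the unit a real implementation would turn into knowledge base entries; how those entries are represented is an open design question in this proposal.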
