Skip to content

ericlebail/tap-test-data-generator

Repository files navigation

tap-test-data-generator

This is a Singer tap that produces JSON-formatted test data following the Singer spec.

This tap generates test data complying with the JSON Schema passed as input. Useful for Data Driven Testing (DDT)

This tap:

  • Read the provided JSON schema
  • Create one stream per provided schema
  • Outputs the schema for each stream
  • Incrementally generate data based on the schema and send the generated Singer records to the data streams.

This tap uses JSON Schema Draft 7

Build status codecov Python 3.8 Python 3.9 Python 3.10.2

Sources on Github

Package on PyPI

Standard JSON schema has been extended to add required parameters for data generation

  • for "string" properties:

    • generate constant string

        "type": "string",
        "const": "constant Value"
      
    • generate empty string

        "$generator": "#/string-type/empty"
      
    • generate UUID4 using Faker UUID4

        "$generator": "#/string-type/uuid" 
      
    • generate a text with specified length ( "maxLength" is optional default value is 100) using Faker text

        "$generator": "#/string-type/text", "maxLength": 30
      
    • generate a title "Mr.","Miss", ... using Faker prefix

        "$generator": "#/string-type/title"
      
    • generate a person first name using Faker first_name

        "$generator": "#/string-type/firstName"
      
    • generate a person last name using Faker last_name

        "$generator": "#/string-type/lastName"
      
    • generate a phone number using Faker phone_number

        "$generator": "#/string-type/phone"
      
    • generate an Email address using Faker email

        "$generator": "#/string-type/email"
      
    • generate a city name using Faker city

        "$generator": "#/string-type/city"
      
    • generate a country name using Faker country

        "$generator": "#/string-type/country"
      
    • generate an ISO country code using Faker country_code

        "$generator": "#/string-type/countryCode"
      
    • generate an I18n language code using Faker language_code

        "$generator": "#/string-type/languageCode"
      
    • generate a date using Faker date_between_dates date format is YYYY-mm-dd

    minimum : the number of days from today for minimum date (default value is -30 years in days) MUST BE INTEGER (positive or negative)

    maximum : the number of days from today for maximum date (default is 0) MUST BE INTEGER (positive or negative)

          "type": "string",
          "format": "date",
          "minimum": -5,
          "maximum": 10
    
  • for "object" properties:

    • get one JSON object from the file "object-name.json" in the configured object_repository_dir directory

        "$generator": "#/object-repository/object-name"
      
    • generate empty object

        "$generator": "#/object-type/empty"
      
  • for "number" properties:

    • generate constant number

        "type": "number",
        "const": 25.00
      
    • generate null/None number

        "type": ["number", "null"],
        "const": null
      
    • generate number between

        "type": "number",
        "maximum": 1000.00,
        "minimum": 0.00
      
    • generate a random number or null/None (By default 5% of null are generated, this frequency can be configured)

        "type": ["number", "null"]
      
  • for "integer" properties:

    • generate constant integer

        "type": "integer",
        "const": 25
      
    • generate null/None integer

        "type": ["integer", "null"],
        "const": null
      
    • generate integer between

        "type": "integer",
        "maximum": 1000,
        "minimum": 0
      
    • generate a random integer or null/None (By default 5% of null are generated, this frequency can be configured)

        "type": ["integer", "null"]
      
  • Pair combination generation is available: to activate it you need to add on the property.

      "$pairwise": true
    

    this mode is available for:

    • boolean propeties
    • String properties with "Enum" or "pattern" (Warning pairwise generation on pattern can be very slow depending on your pattern complexity)
    • Object with "$generator": "#/object-repository/object-name"

Config file description:

Here is a sample config file:

    {
      "schema_dir": "schemas",
      "metadata_dir": "metadatas",
      "static_input_dir": "",
      "object_repository_dir": "object-repositories",
      "record_number": 1,
      "apply_record_number_on_pairwise": true,
      "generate_pairwise_hash": false,
      "data_locale_list": ["en_US","fr_FR"],
      "null_percent": 5,
      "stream_configs": {
        "sample": {
          "record_number": 100,
          "apply_record_number_on_pairwise": true,
          "generate_pairwise_hash": true,
          "data_locale_list": ["en_US","fr_FR"],
          "pair_generation_mode": "pairwise"
        }
      }
    }

First part is "global configuration":

  • "schema_dir" path to directory that contains JSON schema file(s).
  • "metadata_dir" path to directory that contains Singer Metadata file(s).
  • "static_input_dir" âth to directory that contains JSON static inputs file.

In those 3 directories we expect 1 file per stream, filename = .json

  • "object_repository_dir" path to the directory that contains repositories JSON files.

Second part is default configuration for all streams:

  • "record_number" : the default number of generated records (if not override)
  • "apply_record_number_on_pairwise" : boolean, if true the previous record number is generated ignoring the number of possible permutation number computed by pairwise algorithm
  • "generate_pairwise_hash" : boolean, if true a "pairwise_hash" property is added to the generated data to identify the Pair used by each record.
  • "data_locale_list" : list of locale for generated data Faker Documentation
  • "pair_generation_mode": Optional Possible values are "pairwise" (Default mode) "all_combinations" and "every_value_at_least_once"

This parameter defines the type of combination generated with the possible values of all properties marked with "$pairwise": true

- every_value_at_least_once : is the smallest combination, every value will be used at least once.
- pairwise : generates more combination compliant with [Pairwise Testing](http://pairwise.org/)
- all_combinations : is the biggest, is will generate all possible combinations of the provided values (cartesian product)
  • "null_percent": Optional frequency in percent of Null values generated.

Third part is stream specific configuration:

expected structure is:

    "stream_configs": {
        <stream-id1> : {},
        <stream-id2> : {}
    }

All values from second part (Default values) can be overridden for each stream.

Dependencies:

Example:

In order to generate the following JSON:

{
    "checked": false,
    "dimensions": {
        "width": 5,
        "height": 10
    },
    "id": 1,
    "name": "A green door",
    "color": "green",
    "price": 12.5,
    "tags": [
        "home",
        "green"
    ],
    "hour": "09:31:40 AM"
}

We first generate the JSON schema:

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "http://example.com/example.json",
  "type": "object",
  "required": [
    "checked",
    "dimensions",
    "id",
    "name",
    "color",
    "price",
    "tags",
    "hour"
  ],
  "properties": {
    "checked": {
      "$id": "#/properties/checked",
      "type": "boolean"
    },
    "dimensions": {
      "$id": "#/properties/dimensions",
      "type": "object",
      "required": [
        "width",
        "height"
      ],
      "properties": {
        "width": {
          "$id": "#/properties/dimensions/properties/width",
          "type": "integer"
        },
        "height": {
          "$id": "#/properties/dimensions/properties/height",
          "type": "integer"
        }
      },
      "additionalProperties": true
    },
    "id": {
      "$id": "#/properties/id",
      "type": "integer"
    },
    "name": {
      "$id": "#/properties/name",
      "type": "string"
    },
    "color": {
      "$id": "#/properties/color",
      "type": "string",
      "enum": ["green", "yellow", "red"]
    },
    "price": {
      "$id": "#/properties/price",
      "type": "number"
    },
    "tags": {
      "$id": "#/properties/tags",
      "type": "array",
      "additionalItems": true,
      "items": {
        "$id": "#/properties/tags/items",
        "type": "string"
      }
    },
    "hour": {
      "$id": "#/properties/hour",
      "type": "string",
      "pattern": "(1[0-2]|0[1-9])(:[0-5]\\d){2} (A|P)M"
    }
  },
  "additionalProperties": true
}

Then we add the data generation details

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "http://example.com/example.json",
  "type": "object",
  "required": [
    "checked",
    "dimensions",
    "id",
    "name",
    "color",
    "price",
    "tags",
    "hour"
  ],
  "properties": {
    "checked": {
      "$id": "#/properties/checked",
      "type": "boolean",
      "$pairwise": true
    },
    "dimensions": {
      "$id": "#/properties/dimensions",
      "type": "object",
      "required": [
        "width",
        "height"
      ],
      "properties": {
        "width": {
          "$id": "#/properties/dimensions/properties/width",
          "type": "integer"
        },
        "height": {
          "$id": "#/properties/dimensions/properties/height",
          "type": "integer"
        }
      },
      "additionalProperties": true,
      "$generator": "#/object-repository/dim-sample",
      "$pairwise": true
    },
    "id": {
      "$id": "#/properties/id",
      "type": "integer"
    },
    "name": {
      "$id": "#/properties/name",
      "type": "string",
      "$generator": "#/string-type/lastName"
    },
    "color": {
      "$id": "#/properties/color",
      "type": "string",
      "enum": ["green", "yellow", "red"],
      "$pairwise": true
    },
    "price": {
      "$id": "#/properties/price",
      "type": "number"
    },
    "tags": {
      "$id": "#/properties/tags",
      "type": "array",
      "additionalItems": true,
      "items": {
        "$id": "#/properties/tags/items",
        "type": "string"
      }
    },
    "hour": {
      "$id": "#/properties/hour",
      "type": "string",
      "pattern": "(1[0-2]|0[1-9])(:[0-5]\\d){2} (A|P)M"
    }
  },
  "additionalProperties": true
}

Then we setup the config file (we have 1 stream, no stream specific configuration):

{
  "schema_dir": "Path to schemas directory",
  "metadata_dir": "Path to metadatas directory",
  "object_repository_dir": "Path to object-repositories directory",
  "static_input_dir": "Path to static-input directory",
  "record_number": 100,
  "apply_record_number_on_pairwise": true,
  "generate_pairwise_hash": false,
  "data_locale_list": ["en_US","fr_FR"]
}

For local list see Faker Documentation


Copyright © 2020 Elebail

About

A Singer TAP that generate Fake tests data

Resources

License

Stars

Watchers

Forks

Contributors 2

  •  
  •  

Languages