This is a Singer tap that produces JSON-formatted test data following the Singer spec.
It generates test data that complies with the JSON Schema passed as input, which makes it useful for Data Driven Testing (DDT).
This tap:
- Reads the provided JSON schemas
- Creates one stream per provided schema
- Outputs the schema for each stream
- Incrementally generates data based on the schema and sends the generated Singer records to the data streams
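The flow above can be sketched with the standard library alone. This is only an illustration of the Singer message format (one SCHEMA message, then RECORD messages, each a JSON object on its own line), not this tap's actual code. The helper name `build_messages` and the empty `key_properties` list are assumptions made for the example.

```python
import json


def build_messages(stream, schema, records):
    """Return Singer messages as JSON lines: one SCHEMA, then one RECORD per record."""
    messages = [
        {
            "type": "SCHEMA",
            "stream": stream,
            "schema": schema,
            "key_properties": [],  # assumption: no primary key in this sketch
        }
    ]
    # Every generated record is wrapped in its own RECORD message.
    messages += [{"type": "RECORD", "stream": stream, "record": r} for r in records]
    return [json.dumps(message) for message in messages]


if __name__ == "__main__":
    sample_schema = {"type": "object", "properties": {"id": {"type": "integer"}}}
    for line in build_messages("sample", sample_schema, [{"id": 1}, {"id": 2}]):
        print(line)
```

A Singer target reading this output from standard input would receive the schema first and the records afterwards.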
This tap uses JSON Schema Draft 7. It supports the following generation rules:
For "string" properties:
- Generate a constant string:
  "type": "string", "const": "constant Value"
- Generate an empty string:
  "$generator": "#/string-type/empty"
- Generate a UUID4 using Faker uuid4:
  "$generator": "#/string-type/uuid"
- Generate a text with the specified maximum length using Faker text ("maxLength" is optional, default value is 100):
  "$generator": "#/string-type/text", "maxLength": 30
- Generate a title ("Mr.", "Miss", ...) using Faker prefix:
  "$generator": "#/string-type/title"
- Generate a person's first name using Faker first_name:
  "$generator": "#/string-type/firstName"
- Generate a person's last name using Faker last_name:
  "$generator": "#/string-type/lastName"
- Generate a phone number using Faker phone_number:
  "$generator": "#/string-type/phone"
- Generate an email address using Faker email:
  "$generator": "#/string-type/email"
- Generate a city name using Faker city:
  "$generator": "#/string-type/city"
- Generate a country name using Faker country:
  "$generator": "#/string-type/country"
- Generate an ISO country code using Faker country_code:
  "$generator": "#/string-type/countryCode"
- Generate an I18n language code using Faker language_code:
  "$generator": "#/string-type/languageCode"
- Generate a date using Faker date_between_dates (date format is YYYY-mm-dd):
  - minimum: the number of days from today for the minimum date (default value is -30 years in days). MUST BE AN INTEGER (positive or negative).
  - maximum: the number of days from today for the maximum date (default is 0). MUST BE AN INTEGER (positive or negative).
  "type": "string", "format": "date", "minimum": -5, "maximum": 10
For "object" properties:
- Get one JSON object from the file "object-name.json" in the configured object_repository_dir directory:
  "$generator": "#/object-repository/object-name"
- Generate an empty object:
  "$generator": "#/object-type/empty"
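The object-repository lookup can be pictured as follows. This is a sketch under assumptions, not the tap's implementation: it assumes the repository file holds a JSON array of candidate objects, and the function name `pick_repository_object` and the demo file content are hypothetical.

```python
import json
import os
import random
import tempfile

PREFIX = "#/object-repository/"


def pick_repository_object(generator, object_repository_dir, rng=random):
    """Load <object_repository_dir>/<object-name>.json and return one object."""
    if not generator.startswith(PREFIX):
        raise ValueError(f"not an object-repository generator: {generator}")
    object_name = generator[len(PREFIX):]
    path = os.path.join(object_repository_dir, object_name + ".json")
    with open(path, encoding="utf-8") as handle:
        candidates = json.load(handle)
    # Assumption: the repository file holds a JSON array of candidate objects.
    return rng.choice(candidates)


if __name__ == "__main__":
    # Hypothetical repository content, written to a temporary directory.
    with tempfile.TemporaryDirectory() as repo_dir:
        with open(os.path.join(repo_dir, "dim-sample.json"), "w", encoding="utf-8") as handle:
            json.dump([{"width": 5, "height": 10}, {"width": 2, "height": 4}], handle)
        print(pick_repository_object("#/object-repository/dim-sample", repo_dir))
```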
For "number" properties:
- Generate a constant number:
  "type": "number", "const": 25.00
- Generate a null/None number:
  "type": ["number", "null"], "const": null
- Generate a number within a range:
  "type": "number", "maximum": 1000.00, "minimum": 0.00
- Generate a random number or null/None (by default, 5% of the generated values are null; this frequency can be configured):
  "type": ["number", "null"]
For "integer" properties:
- Generate a constant integer:
  "type": "integer", "const": 25
- Generate a null/None integer:
  "type": ["integer", "null"], "const": null
- Generate an integer within a range:
  "type": "integer", "maximum": 1000, "minimum": 0
- Generate a random integer or null/None (by default, 5% of the generated values are null; this frequency can be configured):
  "type": ["integer", "null"]
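The number and integer rules share the same shape: a constant wins, a nullable type yields null with the configured frequency, and otherwise a value is drawn between the bounds. The sketch below illustrates that logic with the standard library. The function name `generate_number` and the fallback bounds of 0 and 1000 are assumptions, since the tap's own defaults for unbounded properties are not documented here.

```python
import random


def generate_number(prop, null_percent=5, rng=random):
    """Return a constant, a bounded random value, or None for nullable types."""
    types = prop["type"] if isinstance(prop["type"], list) else [prop["type"]]
    if "const" in prop:
        return prop["const"]  # covers constant values and "const": null alike
    # Nullable types produce None in roughly null_percent % of the calls.
    if "null" in types and rng.random() * 100 < null_percent:
        return None
    # Assumption: arbitrary fallback bounds when the schema gives none.
    minimum = prop.get("minimum", 0)
    maximum = prop.get("maximum", 1000)
    if "integer" in types:
        return rng.randint(minimum, maximum)  # inclusive on both ends
    return rng.uniform(minimum, maximum)


if __name__ == "__main__":
    print(generate_number({"type": "integer", "maximum": 1000, "minimum": 0}))
    print(generate_number({"type": ["number", "null"]}, null_percent=50))
```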
Pair combination generation is available. To activate it, add the following to the property:
"$pairwise": true
This mode is available for:
- Boolean properties
- String properties with "enum" or "pattern" (warning: pairwise generation on a pattern can be very slow, depending on the pattern's complexity)
- Object properties with "$generator": "#/object-repository/object-name"
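As an illustration of what marking properties with "$pairwise": true implies, the sketch below collects the candidate values of each flagged property from a schema. It covers only the boolean and "enum" cases. The function name `pairwise_candidates` is hypothetical, and the tap's real handling of "pattern" and object-repository properties is not shown.

```python
def pairwise_candidates(schema):
    """Map each property flagged "$pairwise": true to its candidate values."""
    candidates = {}
    for name, prop in schema.get("properties", {}).items():
        if prop.get("$pairwise") is not True:
            continue  # only explicitly flagged properties take part
        if prop.get("type") == "boolean":
            candidates[name] = [True, False]
        elif "enum" in prop:
            candidates[name] = list(prop["enum"])
        # "pattern" and object-repository properties are left out of this
        # sketch: they need a regex expander and the repository files.
    return candidates


if __name__ == "__main__":
    example = {
        "properties": {
            "checked": {"type": "boolean", "$pairwise": True},
            "color": {"type": "string", "enum": ["green", "yellow", "red"], "$pairwise": True},
            "id": {"type": "integer"},
        }
    }
    print(pairwise_candidates(example))
```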
Here is a sample config file:
{
  "schema_dir": "schemas",
  "metadata_dir": "metadatas",
  "static_input_dir": "",
  "object_repository_dir": "object-repositories",
  "record_number": 1,
  "apply_record_number_on_pairwise": true,
  "generate_pairwise_hash": false,
  "data_locale_list": ["en_US","fr_FR"],
  "null_percent": 5,
  "stream_configs": {
    "sample": {
      "record_number": 100,
      "apply_record_number_on_pairwise": true,
      "generate_pairwise_hash": true,
      "data_locale_list": ["en_US","fr_FR"],
      "pair_generation_mode": "pairwise"
    }
  }
}
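In the sample above, the "sample" entry under "stream_configs" repeats several top-level keys with different values. One plausible way to resolve the effective settings of a stream is a shallow merge of the defaults with the stream's own entry. The sketch below shows that idea. The function name `effective_config` is hypothetical and the merge strategy is an assumption, not a description of the tap's code.

```python
def effective_config(config, stream_id):
    """Return the top-level defaults overlaid with the stream's own settings."""
    # Everything except the per-stream section acts as a default value.
    defaults = {key: value for key, value in config.items() if key != "stream_configs"}
    overrides = config.get("stream_configs", {}).get(stream_id, {})
    return {**defaults, **overrides}


if __name__ == "__main__":
    config = {
        "record_number": 1,
        "generate_pairwise_hash": False,
        "null_percent": 5,
        "stream_configs": {"sample": {"record_number": 100, "generate_pairwise_hash": True}},
    }
    print(effective_config(config, "sample"))
```

Under this reading, the "sample" stream would produce 100 records while any other stream would fall back to 1.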
- "schema_dir": path to the directory that contains the JSON schema file(s).
- "metadata_dir": path to the directory that contains the Singer metadata file(s).
- "static_input_dir": path to the directory that contains the JSON static input file(s).
In these 3 directories we expect 1 file per stream, filename = .json
- "object_repository_dir": path to the directory that contains the object repository JSON files.
- "record_number": the default number of generated records (if not overridden).
- "apply_record_number_on_pairwise": boolean; if true, "record_number" records are generated regardless of the number of combinations computed by the pairwise algorithm.
- "generate_pairwise_hash": boolean; if true, a "pairwise_hash" property is added to the generated data to identify the pair used by each record.
- "data_locale_list": list of locales for the generated data (see the Faker documentation).
- "pair_generation_mode": optional. Possible values are "pairwise" (default mode), "all_combinations" and "every_value_at_least_once". This parameter defines the type of combination generated with the possible values of all properties marked with "$pairwise": true:
  - every_value_at_least_once: the smallest set of combinations; every value is used at least once.
  - pairwise: generates more combinations, compliant with [Pairwise Testing](http://pairwise.org/).
  - all_combinations: the biggest set; it generates all possible combinations of the provided values (cartesian product).
- "null_percent": optional; frequency, in percent, of generated null values.
- "stream_configs": the expected structure is:
  "stream_configs": {
    <stream-id1> : {},
    <stream-id2> : {}
  }
All default values can be overridden for each stream.
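The three values of "pair_generation_mode" differ in how many combinations they produce. The sketch below illustrates each of them on plain Python lists. It is not the tap's algorithm: the cycling strategy for every_value_at_least_once and the greedy selection for pairwise are assumptions chosen for brevity, and all function names are hypothetical.

```python
import itertools


def all_combinations(values):
    """Cartesian product of every property's candidate values."""
    names = list(values)
    return [dict(zip(names, combo)) for combo in itertools.product(*values.values())]


def every_value_at_least_once(values):
    """Smallest set: as many rows as the longest value list, cycling the others."""
    rows = max((len(v) for v in values.values()), default=0)
    return [{name: v[i % len(v)] for name, v in values.items()} for i in range(rows)]


def pairs_of(row):
    """Every combination of two (property, value) items contained in one row."""
    items = sorted(row.items(), key=lambda item: item[0])
    return set(itertools.combinations(items, 2))


def pairwise(values):
    """Greedy cover: keep picking the row that covers the most uncovered pairs."""
    if len(values) < 2:
        return every_value_at_least_once(values)  # no pairs to cover
    rows = all_combinations(values)
    uncovered = set().union(*(pairs_of(row) for row in rows))
    chosen = []
    while uncovered:
        best = max(rows, key=lambda row: len(pairs_of(row) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen


if __name__ == "__main__":
    candidates = {
        "checked": [True, False],
        "color": ["green", "yellow", "red"],
        "size": ["S", "L"],  # hypothetical third property
    }
    for mode in (every_value_at_least_once, pairwise, all_combinations):
        print(mode.__name__, len(mode(candidates)))
```

For these candidates the cartesian product has 2 * 3 * 2 = 12 rows and the smallest mode needs 3, with the pairwise result somewhere in between.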
In order to generate the following JSON:
{
  "checked": false,
  "dimensions": {
    "width": 5,
    "height": 10
  },
  "id": 1,
  "name": "A green door",
  "color": "green",
  "price": 12.5,
  "tags": [
    "home",
    "green"
  ],
  "hour": "09:31:40 AM"
}
We first generate the JSON schema:
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "http://example.com/example.json",
  "type": "object",
  "required": [
    "checked",
    "dimensions",
    "id",
    "name",
    "color",
    "price",
    "tags",
    "hour"
  ],
  "properties": {
    "checked": {
      "$id": "#/properties/checked",
      "type": "boolean"
    },
    "dimensions": {
      "$id": "#/properties/dimensions",
      "type": "object",
      "required": [
        "width",
        "height"
      ],
      "properties": {
        "width": {
          "$id": "#/properties/dimensions/properties/width",
          "type": "integer"
        },
        "height": {
          "$id": "#/properties/dimensions/properties/height",
          "type": "integer"
        }
      },
      "additionalProperties": true
    },
    "id": {
      "$id": "#/properties/id",
      "type": "integer"
    },
    "name": {
      "$id": "#/properties/name",
      "type": "string"
    },
    "color": {
      "$id": "#/properties/color",
      "type": "string",
      "enum": ["green", "yellow", "red"]
    },
    "price": {
      "$id": "#/properties/price",
      "type": "number"
    },
    "tags": {
      "$id": "#/properties/tags",
      "type": "array",
      "additionalItems": true,
      "items": {
        "$id": "#/properties/tags/items",
        "type": "string"
      }
    },
    "hour": {
      "$id": "#/properties/hour",
      "type": "string",
      "pattern": "(1[0-2]|0[1-9])(:[0-5]\\d){2} (A|P)M"
    }
  },
  "additionalProperties": true
}
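A quick way to sanity-check a "pattern" such as the one on "hour" is to try it against the target value. The sketch below does this with Python's `re` module, which handles this particular expression the same way a JSON Schema validator would. The helper name `matches_pattern` is hypothetical.

```python
import re

# The JSON string "\\d" decodes to the regex token \d.
HOUR_PATTERN = r"(1[0-2]|0[1-9])(:[0-5]\d){2} (A|P)M"


def matches_pattern(pattern, value):
    """JSON Schema "pattern" is not anchored, so a match anywhere in the value counts."""
    return re.search(pattern, value) is not None


if __name__ == "__main__":
    print(matches_pattern(HOUR_PATTERN, "09:31:40 AM"))
    print(matches_pattern(HOUR_PATTERN, "13:31:40 AM"))
```

The target value "09:31:40 AM" matches, while an hour of 13 does not, because the first group only accepts 01 to 12.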
Then we add the data generation details:
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "http://example.com/example.json",
  "type": "object",
  "required": [
    "checked",
    "dimensions",
    "id",
    "name",
    "color",
    "price",
    "tags",
    "hour"
  ],
  "properties": {
    "checked": {
      "$id": "#/properties/checked",
      "type": "boolean",
      "$pairwise": true
    },
    "dimensions": {
      "$id": "#/properties/dimensions",
      "type": "object",
      "required": [
        "width",
        "height"
      ],
      "properties": {
        "width": {
          "$id": "#/properties/dimensions/properties/width",
          "type": "integer"
        },
        "height": {
          "$id": "#/properties/dimensions/properties/height",
          "type": "integer"
        }
      },
      "additionalProperties": true,
      "$generator": "#/object-repository/dim-sample",
      "$pairwise": true
    },
    "id": {
      "$id": "#/properties/id",
      "type": "integer"
    },
    "name": {
      "$id": "#/properties/name",
      "type": "string",
      "$generator": "#/string-type/lastName"
    },
    "color": {
      "$id": "#/properties/color",
      "type": "string",
      "enum": ["green", "yellow", "red"],
      "$pairwise": true
    },
    "price": {
      "$id": "#/properties/price",
      "type": "number"
    },
    "tags": {
      "$id": "#/properties/tags",
      "type": "array",
      "additionalItems": true,
      "items": {
        "$id": "#/properties/tags/items",
        "type": "string"
      }
    },
    "hour": {
      "$id": "#/properties/hour",
      "type": "string",
      "pattern": "(1[0-2]|0[1-9])(:[0-5]\\d){2} (A|P)M"
    }
  },
  "additionalProperties": true
}
Then we set up the config file (we have 1 stream, so no stream-specific configuration):
{
  "schema_dir": "Path to schemas directory",
  "metadata_dir": "Path to metadatas directory",
  "object_repository_dir": "Path to object-repositories directory",
  "static_input_dir": "Path to static-input directory",
  "record_number": 100,
  "apply_record_number_on_pairwise": true,
  "generate_pairwise_hash": false,
  "data_locale_list": ["en_US","fr_FR"]
}
For the list of available locales, see the Faker documentation.
Copyright © 2020 Elebail