Skip to content

oscaro/ds-test-tools

Repository files navigation

ds-test-tools Clojure CI Clojars Project

ds-test-tools is a small library to help test Datasplash pipelines.

Usage

[com.oscaro/ds-test-tools "0.3.0"]

Then:

(ns your.project
  (:require [ds-test-tools.core :as dt]))

The only function you need is dt/run-pipeline.

It takes inputs as Clojure data, mapping of keys to result files, and a function to call in order to build the pipeline. It dumps the input data in the appropriate files; builds a configuration map; pass it to your function; run the pipeline; collect the results; and return them to you.

Simple Usage

;; Your pipeline
(defn my-job [conf p]
  (->> p
    (ds/read-edn-file (:numbers conf))
    (ds/map inc)
    (ds/write-edn-file (str (:output conf) "/higher.edn"))))


(let [{:keys [result]} (dt/run-pipeline
                         {:numbers [1 2 3 4]}
                         {:result "higher"}
                         my-job)]
  (println (sort result))) ; '(2 3 4 5)

Specifying inputs

The inputs config map uses the same format as your configuration map. Your build function should take a map of keywords to file paths:

(defn my-job [{:keys [people houses output]} p]
  (let [people (ds/read-edn-file people p)
        houses (ds/read-edn-file houses p)]
    (->> (ds/join-by (fn [p h] [p :lives-in h])
                     [[people :house-id {:type :required}]
                      [houses :id {:type :required}]])
         (ds/write-edn-file (tio/join-path output "housing.edn")))))

The pipeline above would use the following inputs config map:

{:people [{:name "John" :house-id 1} {:name "Jane" :house-id 2} ...]
 :houses [{:id 1 :name "Red House"} {:id 2 :name "Green House"} ...]}

Changing the input/output format

By default, it assumes you use EDN as inputs and outputs. You can change that by setting the :reader (used to read outputs) and :writer (used to write inputs) keys in the optional options map:

(dt/run-pipeline
    {:reader :jsons  ; or :edns (the default)
     :writer :jsons} ; or :edns (the default)
    inputs-config
    outputs-config)

If you have mixed formats or something else than EDN/JSONS, you can also provide a function of two (readers) or three (writers) arguments:

(dt/run-pipeline
    {:reader (fn [k filename] ; k is the key in your inputs-config map
               (case k
                 :my-jsons-output (tio/read-jsons-file filename)
                 :my-edns-output (tio/read-edns-file filename)))

     :writer (fn [k filename data]
               (case k
                 :my-jsons-input (tio/write-jsons-file filename data)
                 :my-csv-input (tio/write-csv-file filename data)
                 :my-text-input (tio/write-text-file filename data)))}
    inputs-config
    outputs-config)

License

Copyright © 2018-2025 Oscaro

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.

About

Small library to help test Datasplash pipelines.

Topics

Resources

License

Stars

Watchers

Forks

Contributors 5