This repository runs Differential Privacy (DP) queries from JSON configs over CSV data.
Supported dimensions:
- DP level: item, user
- Queries: mean, histogram, count
- Config modes: single-query and multi-query
Repository layout:
- DP/: core DP routing, pipelines, queries, and utilities
- config/: runtime configs consumed by run_from_config_dir.py
- all_config/: sample configs for manual testing
- data/: CSV datasets
- output/: runtime outputs (status.json, plots, JSON artifacts)
cd differential-privacy
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

python3 run_from_config_dir.py

Output is written to:
output/status.json

python3 run_all_configs.py

Outputs are written to:
output/results/*_output.json
python3 - <<'PY'
import json
from iudx_dp_main import main_process
with open("config/multi_query.json", "r") as f:
    cfg = json.load(f)
print(main_process(cfg))
PY

The preferred config format is:
{
  "operations": ["dp"],
  "data_type": "dp_test_dataset",
  "dp_test_dataset": {
    "level": "user",
    "query": "mean",
    "attribute": "age",
    "epsilon": 1.0,
    "user_column": "user_id"
  }
}

Notes:
- data_type is used to resolve a CSV in data/ when data.csv is not provided.
- The dataset key must match data_type exactly.
- insensitive_columns is optional metadata.
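
The notes above can be illustrated with a small sketch of how a config might be resolved. The helper name `resolve_dataset` and the `data/<data_type>.csv` naming rule are assumptions made for illustration; the repository's actual lookup logic may differ.

```python
import json
from pathlib import Path


def resolve_dataset(cfg: dict, data_dir: str = "data") -> tuple[dict, Path]:
    """Return the dataset block and the CSV path a config points at.

    Assumption (not confirmed by the repository): when no explicit CSV is
    given, the file is looked up as <data_dir>/<data_type>.csv.
    """
    data_type = cfg["data_type"]
    if data_type not in cfg:
        # The dataset key must match data_type exactly (case-sensitive).
        raise KeyError(f"config has no dataset block named {data_type!r}")
    return cfg[data_type], Path(data_dir) / f"{data_type}.csv"


cfg = json.loads("""
{
  "operations": ["dp"],
  "data_type": "dp_test_dataset",
  "dp_test_dataset": {
    "level": "user",
    "query": "mean",
    "attribute": "age",
    "epsilon": 1.0,
    "user_column": "user_id"
  }
}
""")

block, csv_path = resolve_dataset(cfg)
print(block["query"], csv_path.as_posix())
```

A config whose dataset key is spelled differently from data_type (for example a different case) would raise a KeyError in this sketch, which mirrors the exact-match rule.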
Use queries for multiple query executions in one request, each with its own epsilon:
{
  "operations": ["dp"],
  "data_type": "dp_test_dataset",
  "dp_test_dataset": {
    "level": "user",
    "user_column": "user_id",
    "queries": [
      {
        "query": "mean",
        "attribute": "age",
        "epsilon": 0.7,
        "min_value": 0,
        "max_value": 100
      },
      {
        "query": "count",
        "count_attribute": "age",
        "count_operator": ">",
        "count_value": 25,
        "epsilon": 0.3
      }
    ]
  }
}

Result includes:
- query_results (per query)
- cumulative_epsilon_budget (sum of selected query epsilons)
Recommended: keep this JSON as a local runtime file under config/ (for example config/multi_query.json) and do not commit it.
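
As a rough check on the budget a multi-query config will consume, the per-query epsilons can be summed before running it. This is a minimal sketch: `cumulative_epsilon` is a hypothetical helper, and it assumes every listed query counts toward the total, whereas the real result may only include the queries that were actually selected.

```python
def cumulative_epsilon(dataset_block: dict) -> float:
    """Sum per-query epsilons, or use the single-query 'epsilon' field.

    Assumption: all listed queries contribute to the budget.
    """
    queries = dataset_block.get("queries")
    if queries is None:
        return float(dataset_block["epsilon"])
    return float(sum(q["epsilon"] for q in queries))


multi_block = {
    "level": "user",
    "user_column": "user_id",
    "queries": [
        {"query": "mean", "attribute": "age", "epsilon": 0.7,
         "min_value": 0, "max_value": 100},
        {"query": "count", "count_attribute": "age", "count_operator": ">",
         "count_value": 25, "epsilon": 0.3},
    ],
}

total = cumulative_epsilon(multi_block)
print(f"planned epsilon budget: {total:.2f}")
```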

Mean query:
- Item level: noisy mean over records (with clipping bounds).
- User level: user-contribution-aware mean with clipping.
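
To make the item-level description concrete, here is a minimal sketch of a clipped noisy mean. It assumes the Laplace mechanism and a publicly known record count; neither is confirmed as what this repository does, and the function names are illustrative only. The user-level variant is not covered.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean_item_level(values, min_value, max_value, epsilon, rng=None):
    """Clipped mean plus Laplace noise (assumed mechanism, illustrative only)."""
    if not values:
        raise ValueError("values must be non-empty")
    rng = rng or random.Random()
    # Clip every record into [min_value, max_value] to bound its influence.
    clipped = [min(max(v, min_value), max_value) for v in values]
    n = len(clipped)
    true_mean = sum(clipped) / n
    # With n treated as public, replacing one record moves the clipped mean
    # by at most (max_value - min_value) / n.
    sensitivity = (max_value - min_value) / n
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


ages = [23, 35, 41, 150, 29]  # 150 is clipped down to 100
noisy = dp_mean_item_level(ages, min_value=0, max_value=100, epsilon=1.0,
                           rng=random.Random(0))
print(f"noisy mean: {noisy:.2f}")
```

Smaller epsilon values widen the noise scale, so repeated runs will spread further around the clipped mean of 45.6.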

Histogram query:
- Categorical mode: bins from categories or unique values.
- Numeric mode: bins from U/V and bin_width or bins.
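
The numeric mode can be sketched as follows. The `lower` and `upper` arguments are hypothetical stand-ins for the U/V bounds (which of the two is the lower bound is not stated here), only the bin_width variant is shown, and Laplace noise with scale 1/epsilon per bin is an assumption rather than the repository's confirmed mechanism.

```python
import bisect
import math
import random


def numeric_bin_edges(lower, upper, bin_width):
    """Edges of fixed-width bins covering [lower, upper]; last bin may be shorter."""
    n_bins = math.ceil((upper - lower) / bin_width)
    return [lower + i * bin_width for i in range(n_bins)] + [upper]


def bin_counts(values, edges):
    """Exact per-bin counts; values outside [edges[0], edges[-1]] are dropped."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        # Bins are half-open [a, b), except the last one, which includes upper.
        idx = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


def dp_histogram(values, edges, epsilon, rng=None):
    """Add assumed Laplace(1/epsilon) noise to every bin count."""
    rng = rng or random.Random()
    scale = 1.0 / epsilon  # adding or removing one record changes one bin by 1
    return [c + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
            for c in bin_counts(values, edges)]


edges = numeric_bin_edges(lower=0, upper=100, bin_width=25)
exact = bin_counts([5, 25, 30, 60, 99, 100, 120], edges)
noisy = dp_histogram([5, 25, 30, 60, 99, 100, 120], edges, epsilon=1.0,
                     rng=random.Random(0))
print(edges, exact, [round(x, 2) for x in noisy])
```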

Count query:
- Item level: count of records that satisfy predicate filter.
- User level: count of distinct users with at least one matching contribution.
- dp_count is rounded to an integer.

Predicate fields:
- count_attribute
- count_operator: >, >=, <, <=, ==, !=
- count_value
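
The difference between the two count levels is easiest to see on a tiny example. The sketch below evaluates the predicate fields with Python's `operator` module and returns the exact (pre-noise) counts; the function name `exact_counts` and the sample rows are illustrative, and the DP noise and rounding steps are deliberately left out.

```python
import operator

# Mapping from the supported count_operator strings to comparison functions.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def exact_counts(rows, count_attribute, count_operator, count_value, user_column):
    """Return (item_level, user_level) counts before any noise is added."""
    matches = OPERATORS[count_operator]
    matching = [r for r in rows if matches(r[count_attribute], count_value)]
    item_level = len(matching)                             # matching records
    user_level = len({r[user_column] for r in matching})   # distinct users
    return item_level, user_level


rows = [
    {"user_id": "u1", "age": 30},
    {"user_id": "u1", "age": 31},
    {"user_id": "u2", "age": 22},
    {"user_id": "u3", "age": 40},
]

item_count, user_count = exact_counts(rows, "age", ">", 25, "user_id")
print(item_count, user_count)  # 3 2
```

User u1 contributes two matching records but is counted once at user level, which is why the two numbers differ.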
Top-level result shape:
- Success:
{"status":"success","result":{...}}
- Partial multi-query failure:
{"status":"partial_error","result":{"query_results":[...], ...}}
- Error:
{"status":"error","error":{"code":"...","message":"..."}}
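
A caller can branch on the status field to handle the three shapes above. This is a hedged sketch: `summarize` is a hypothetical helper, and the error code and message in the usage lines are made-up placeholder values, not codes this project is known to emit.

```python
def summarize(response: dict) -> str:
    """Turn a top-level result dict into a one-line summary."""
    status = response.get("status")
    if status == "success":
        return "ok"
    if status == "partial_error":
        # Some queries in a multi-query request failed; the rest are returned.
        n = len(response["result"].get("query_results", []))
        return f"partial: {n} query result(s) returned"
    if status == "error":
        err = response["error"]
        return f"failed [{err['code']}]: {err['message']}"
    raise ValueError(f"unexpected status: {status!r}")


# Placeholder payloads for illustration only.
print(summarize({"status": "success", "result": {}}))
print(summarize({"status": "partial_error",
                 "result": {"query_results": [{}, {}]}}))
print(summarize({"status": "error",
                 "error": {"code": "EXAMPLE_CODE", "message": "example message"}}))
```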
run_from_config_dir.py aggregates all config runs into output/status.json and adds top-level cumulative_epsilon_budget.

python3 -m unittest discover -s tests -v

docker build -t dp-app .
docker run --rm \
-v "$(pwd)/config:/app/config" \
-v "$(pwd)/data:/app/data" \
-v "$(pwd)/output:/app/output" \
dp-app