Releases: NVIDIA-NeMo/Anonymizer
v0.2.0
What's Changed
- chore: fix data designer version by @lipikaramaswamy in #135
- docs: updating build notice by @alexahaushalter in #138
- fix: add visibility to silent failures in NDD adapter by @memadi-nv in #136
- fix(docs): publish release docs from release workflow by @lipikaramaswamy in #140
- feat: enhance repair by @asteier2026 in #137
- feat: re-export ModelProvider from anonymizer public API by @memadi-nv in #139
- chore: bump pytest to 9.0.3 by @kendrickb-nvidia in #147
- fix(replace): drop workflow-internal columns and use COL_* constants by @lipikaramaswamy in #154
- docs: add AGENTS.md, STYLEGUIDE.md, agent-assisted contribution (#114) by @lipikaramaswamy in #149
- feat: unify rewrite domain metadata into a single source by @memadi-nv in #143
- docs: add anonymizer Claude Code skill and supporting concept docs by @lipikaramaswamy in #153
- docs: add dataset license files and readme by @asteier2026 in #159
- feat: improve sensitivity disposition calibration by @asteier2026 in #150
- docs: define sensitivity and protection method by @asteier2026 in #151
- refactor: replace df.attrs with typed ResolvedInput context by @memadi-nv in #141
- feat: add anonymous run-level telemetry by @lipikaramaswamy in #155
- feat: enhance meaning units and utility by @asteier2026 in #156
- chore: upgrade data-designer to 0.6.0 by @lipikaramaswamy in #160
- fix(telemetry): flush events when called from a running event loop by @lipikaramaswamy in #161
- refactor: derive needs_protection from prot method and fix latent entity by @asteier2026 in #163
- docs(devnotes): add Introducing NeMo Anonymizer post by @lipikaramaswamy in #157
- refactor(display): consume structured rewrite columns in native shape by @memadi-nv in #144
- docs: update tutorials by @lipikaramaswamy in #166
New Contributors
- @kendrickb-nvidia made their first contribution in #147
Full Changelog: v0.1.1...v0.2.0
v0.1.1
What's Changed
- docs: update readme file by @memadi-nv in #113
- fix: Fix max_repair_iterations to be 3 by @asteier2026 in #108
- fix: register ipykernel before jupytext notebook execution by @lipikaramaswamy in #104
- feat: make preview input loading lightweight by @memadi-nv in #116
- feat: add strict_entity_protection by @asteier2026 in #123
- fix(cli): model config file paths and flag fixes by @lipikaramaswamy in #124
- fix: Input Error when duplicate columns in output data frame by @memadi-nv in #122
- feat: enable detection pipeline for async engine by @andreatgretel in #119
- docs: add strict_entity_protection by @asteier2026 in #131
- docs: replace placeholder patient_data.csv with real sample dataset by @lipikaramaswamy in #133
- feat(detect): chunked validation with validator pools by @lipikaramaswamy in #126
New Contributors
- @memadi-nv made their first contribution in #113
Full Changelog: v0.1.0...v0.1.1
v0.1.0
NeMo Anonymizer v0.1.0
First public release of NeMo Anonymizer — detect and anonymize sensitive entities in text using LLM-powered workflows.
Features
- Entity detection using GLiNER-PII with LLM-based augmentation and validation
- 4 replace strategies: Substitute (LLM-generated), Redact, Annotate, Hash
- Rewrite mode: transforms entire documents to reduce explicit and inferable identifiers, with customizable privacy goals, utility preservation, and automated repair
- Preview mode: inspect results on a small sample before full runs with display_record() visualization
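As a rough illustration of how the four replace strategies differ, the sketch below applies each one to the same sentence. It is a standalone toy that does not use the nemo-anonymizer API: the function names, the placeholder formats, the hard-coded entity list, and the lookup table standing in for LLM-generated substitutes are all assumptions made for this example, and the library's actual output format may differ.

```python
import hashlib

# Hypothetical sample text and pre-detected entities as (text, label) pairs.
# The library's real detection output format is not described in these notes.
TEXT = "Contact Alice Smith at alice@example.com."
ENTITIES = [("Alice Smith", "name"), ("alice@example.com", "email")]

# Stand-in for LLM-generated replacements (assumption: a fixed lookup table).
FAKE_SUBSTITUTES = {
    "Alice Smith": "Jordan Lee",
    "alice@example.com": "jordan@example.org",
}


def redact(value: str, label: str) -> str:
    """Replace the entity with a label placeholder."""
    return f"[REDACTED_{label.upper()}]"


def annotate(value: str, label: str) -> str:
    """Keep the original text but wrap it in a tag naming its label."""
    return f"<{label}>{value}</{label}>"


def hash_value(value: str, label: str) -> str:
    """Replace the entity with a short, deterministic digest."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"{label}_{digest}"


def substitute(value: str, label: str) -> str:
    """Swap in a realistic-looking surrogate; fall back to redaction."""
    return FAKE_SUBSTITUTES.get(value, redact(value, label))


def apply_strategy(text, entities, strategy):
    """Apply one strategy to every detected entity in the text."""
    for value, label in entities:
        text = text.replace(value, strategy(value, label))
    return text


if __name__ == "__main__":
    for strategy in (substitute, redact, annotate, hash_value):
        print(f"{strategy.__name__}: {apply_strategy(TEXT, ENTITIES, strategy)}")
```

In this toy, Redact and Hash remove the original value, Annotate keeps it visible alongside its label, and Substitute trades it for a plausible surrogate.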
Install
pip install nemo-anonymizer
Quick start
from anonymizer import Anonymizer, AnonymizerConfig, AnonymizerInput, Redact
anonymizer = Anonymizer()
# Use the Redact replace strategy
config = AnonymizerConfig(replace=Redact())
# Read input from a CSV file and name the column that holds the text
data = AnonymizerInput(source="data.csv", text_column="text")
# Preview on a small sample (3 records) before a full run
result = anonymizer.preview(config=config, data=data, num_records=3)
result.display_record()
Documentation
https://nvidia-nemo.github.io/Anonymizer/
Requirements
Python 3.11+
v0.1.0rc1
What's Changed
- chore: update github files by @lipikaramaswamy in #1
- chore: add repository boilerplate and contributor guidelines by @lipikaramaswamy in #2
- feat: implement replace mode (detection, replacement strategies, visualization) by @lipikaramaswamy in #3
- chore: add entity class examples to detection and augment prompts by @lipikaramaswamy in #11
- refactor: restructure AnonymizerConfig, rename strategies, add filter_labels by @lipikaramaswamy in #16
- chore: assign anonymizer-reviewers to docs/ in CODEOWNERS by @lipikaramaswamy in #19
- fix: resolve replaced highlight drift by @lipikaramaswamy in #17
- fix: harden io path validation and error handling by @lipikaramaswamy in #18
- fix: surface JSON parse failures for observability by @lipikaramaswamy in #20
- feat: improve validation id alignment with template-guided decisions by @lipikaramaswamy in #21
- fix: correct GLiNER model name and skip health check by @andreatgretel in #24
- chore: remove unnecessary skip_health_check for GLiNER by @andreatgretel in #25
- refactor: improve replacement mapping prompt by @lipikaramaswamy in #23
- refactor: update notebook source to use defaults by @lipikaramaswamy in #26
- feat: update latent entity detection prompt with demographic specifics by @lipikaramaswamy in #38
- feat: add preflight config validation for active workflows by @lipikaramaswamy in #27
- feat: add user-facing pipeline logging with progress signals by @andreatgretel in #22
- fix: filtered-label preview mismatch in replace mode by @lipikaramaswamy in #41
- fix: update detection prompts and entity examples by @lipikaramaswamy in #42
- feat: add rewrite foundation - schemas, constants, and model selection by @lipikaramaswamy in #40
- fix: align filtered display and harden detection prompt notation by @lipikaramaswamy in #43
- refactor: unify detection and replacement label scope by @lipikaramaswamy in #44
- fix: tighten substitute replacement-map generation by @lipikaramaswamy in #47
- feat: domain classification and sensitivity disposition workflows by @lipikaramaswamy in #45
- fix: drop age duration tags and first_name initials in validation prompt by @lipikaramaswamy in #50
- fix: cap preview_num_records to entity row count in replace workflow by @lipikaramaswamy in #59
- feat: add QA generation workflow for rewrite pipeline by @lipikaramaswamy in #48
- feat: add rewrite generation workflow by @lipikaramaswamy in #49
- fix: enforce entity_labels scope on augmented entities in detection output by @lipikaramaswamy in #57
- ci: upgrade GitHub Actions for Node.js 24 compatibility by @ko3n1g in #62
- feat: add evaluate and repair workflows by @lipikaramaswamy in #61
- chore: address a handful of cve versions with overrides/updates by @mckornfield in #69
- feat: add final judge and RewriteWorkflow orchestrator by @lipikaramaswamy in #64
- feat: port developer tooling and build from Safe-Synthesizer repo by @binaryaaron in #51
- feat: add Cyclopts-based CLI with run/preview/validate subcommands by @binaryaaron in #63
- chore: bump pygments and cryptography for security fixes by @lipikaramaswamy in #75
- feat: interface wiring + display by @lipikaramaswamy in #68
- chore: add lower-bound pins for pygments and cryptography by @lipikaramaswamy in #76
- docs: add tutorial datasets by @lipikaramaswamy in #81
- docs: Update README.md by @nina-xu in #85
- feat: composite actions, matrix testing, and release workflows by @binaryaaron in #54
- refactor: extract shared row-partitioning helpers by @andreatgretel in #79
- feat: improve privacy judge prompt by @asteier2026 in #86
- feat: add confidence-aware privacy re-answers and weighted leakage metric by @lipikaramaswamy in #88
- refactor: rename sensitivity disposition paraphrase+left_as_is actions by @lipikaramaswamy in #91
- refactor: simplify rewrite config and rename risk tolerance levels by @lipikaramaswamy in #92
- refactor: migrate prompt .replace() chains to substitute_placeholders by @andreatgretel in #89
- chore(ci): line up CI with fixes in safe-synthesizer repo by @mckornfield in #93
- docs: documentation site content by @lipikaramaswamy in #84
- docs: add step to activate venv by @nina-xu in #90
- feat: cli: expose rewrite mode on the CLI by @lipikaramaswamy in #78
- fix: remove stale substitution in detection workflow by @lipikaramaswamy in #97
- docs: update notebook csv inputs by @lipikaramaswamy in #96
- docs: add endpoint guidance and language coverage notes by @lipikaramaswamy in #99
- feat: Improve Sensitivity Disposition supplements by @asteier2026 in #94
- docs: clarify standard vs latent entity example by @lipikaramaswamy in #100
- fix: docs link fixes and notebook validation for release by @lipikaramaswamy in #102
- chore: fix stale package name in uv.lock by @lipikaramaswamy in #103
- test: clean up e2e smoke test and make target by @lipikaramaswamy in #101
New Contributors
- @lipikaramaswamy made their first contribution in #1
- @andreatgretel made their first contribution in #24
- @ko3n1g made their first contribution in #62
- @mckornfield made their first contribution in #69
- @nina-xu made their first contribution in #85
- @asteier2026 made their first contribution in #86
Full Changelog: https://github.com/NVIDIA-NeMo/Anonymizer/commits/v0.1.0rc1