Enterprise-grade PII detection and anonymization API
Fast Β· Accurate Β· GDPR/CCPA Ready Β· 31 Entity Types
Quick Start Β· Documentation Β· Use Cases Β· API Reference
PIICloak is a production-ready REST API service for detecting and anonymizing Personally Identifiable Information (PII) in text and documents. Built on Microsoft's Presidio with custom recognizers optimized for:
- π’ Salesforce data (Account/Contact/Case IDs)
- βοΈ Legal documents (Case numbers, contracts)
- π° Financial data (Bank accounts, tax IDs)
- π₯ Healthcare (Medical records, HIPAA compliance)
- π» Technical data (API keys, IP addresses)
| Feature | PIICloak | Alternatives |
|---|---|---|
| Entity Types | 31 (including custom business entities) | 10-15 standard types |
| Organization Detection | β NER-based (works with ANY company name) | β Pattern-only |
| Salesforce Support | β Native (Account/Contact/Case/Lead IDs) | β Not included |
| Legal Document Support | β Case numbers, contracts, dockets | β Not included |
| API Keys Detection | β OpenAI, Anthropic, OpenRouter, GitHub, GitLab, Stripe, Slack, Telegram, Sentry, generic | |
| SDK | β Python SDK included | β API only |
| One-Line Install | β
pip install piicloak |
|
| Docker Ready | β Production-grade image | |
| Metrics | β Prometheus built-in | β None |
| Auth | β Optional API key | β None |
# Install
pip install piicloak
# Run
python -m piicloakServer starts on http://localhost:8000 π
curl -X POST http://localhost:8000/anonymize \
-H "Content-Type: application/json" \
-d '{"text": "Email john@acme.com, SSN 123-45-6789"}'Response:
{
"anonymized": "Email <EMAIL_ADDRESS>, SSN <US_SSN>",
"entities_found": [
{"type": "EMAIL_ADDRESS", "text": "john@acme.com", "score": 1.0},
{"type": "US_SSN", "text": "123-45-6789", "score": 0.85}
]
}docker run -p 8000:8000 dimanjet/piicloakfrom piicloak import PIICloak
cloak = PIICloak()
result = cloak.anonymize("Contact John Smith at john@acme.com")
print(result.anonymized) # "Contact <PERSON> at <EMAIL_ADDRESS>"| Entity Type | Description | Example |
|---|---|---|
| π€ PERSONAL IDENTIFIABLE INFORMATION | ||
PERSON |
Names of individuals (NER-based) | "John Smith", "Jane Doe" |
EMAIL_ADDRESS |
Email addresses | "john@example.com" |
PHONE_NUMBER |
Phone numbers (multiple formats) | "+1-555-123-4567", "(555) 123-4567" |
US_SSN |
US Social Security Numbers | "123-45-6789" |
US_PASSPORT |
US Passport numbers | "123456789" |
US_DRIVER_LICENSE |
US Driver's License numbers | "D1234567" |
ADDRESS |
Physical addresses (NER + patterns) | "123 Main St, New York, NY 10001" |
| π³ FINANCIAL INFORMATION | ||
CREDIT_CARD |
Credit card numbers (all major brands) | "4532-1234-5678-9010" |
IBAN_CODE |
International Bank Account Numbers | "GB82 WEST 1234 5698 7654 32" |
US_BANK_NUMBER |
US bank account numbers | "123456789012" |
BANK_ACCOUNT |
Generic bank account patterns | "ACC-123456789" |
TAX_ID |
Tax IDs (EIN/TIN) | "12-3456789" |
CRYPTO |
Cryptocurrency addresses | "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa" |
| π’ ORGANIZATIONAL DATA | ||
ORGANIZATION |
Company names (NER-based) | "Acme Corp", "Tech Industries Inc" |
DOMAIN |
Internet domains | "example.com", "company.io" |
SALESFORCE_ID |
Salesforce record IDs (Account/Contact/Case/Lead) | "0015000000AbcDEF", "5005000000XyzABC" |
ACCOUNT_ID |
Generic account identifiers | "ACC-123456", "A-987654" |
| βοΈ LEGAL DOCUMENTS | ||
CASE_NUMBER |
Court case numbers (Federal/State) | "1:24-cv-12345", "CR-2024-001234" |
CONTRACT_NUMBER |
Contract and agreement numbers | "CONT-2024-001", "AGR-123456" |
| π» TECHNICAL & SECURITY | ||
USERNAME |
Usernames and login IDs | "john_smith123", "@johndoe", "admin" |
API_KEY |
API keys and secrets (OpenAI, Anthropic, OpenRouter, GitHub, GitLab, Hugging Face, Stripe, Slack, Telegram, ClickUp-labeled tokens, Sentry, JWT, generic) | "sk-1234567890abcdef...", "ghp_abc..." |
IP_ADDRESS |
IPv4 and IPv6 addresses | "192.168.1.1", "2001:0db8::1" |
URL |
Web URLs | "https://example.com/page" |
| π₯ HEALTHCARE & OTHER | ||
MEDICAL_LICENSE |
Medical license numbers | "MD-123456" |
UK_NHS |
UK NHS numbers | "123 456 7890" |
NRP |
NΓΊmero de Registro de Personas (Spanish ID) | "12345678A" |
LOCATION |
Geographic locations (NER-based) | "New York", "San Francisco" |
DATE_TIME |
Dates and timestamps | "2024-01-20", "January 20th, 2024" |
Total: 31 entity types covering personal, financial, organizational, legal, technical, and healthcare data.
# Replace with entity type (default)
{"mode": "replace"} β "Contact <PERSON> at <EMAIL_ADDRESS>"
# Mask with asterisks
{"mode": "mask"} β "Contact ******** at ****************"
# Redact (remove completely)
{"mode": "redact"} β "Contact at "
# Hash (SHA256)
{"mode": "hash"} β "Contact a1b2c3d4... at e5f6g7h8..."curl -X POST http://localhost:8000/anonymize \
-H "Content-Type: application/json" \
-d '{
"text": "Account: 0015000000AbcDEFG, Contact: Jane Doe (jane@company.com), Case: 5005000000XyzABC"
}'Output:
Account: <SALESFORCE_ID>, Contact: <PERSON> (<EMAIL_ADDRESS>), Case: <SALESFORCE_ID>
curl -X POST http://localhost:8000/anonymize \
-H "Content-Type: application/json" \
-d '{
"text": "Case No. 1:24-cv-12345 - Plaintiff John Doe (SSN: 123-45-6789) vs. Acme Corp (EIN: 12-3456789)"
}'Output:
Case No. <CASE_NUMBER> - Plaintiff <PERSON> (SSN: <US_SSN>) vs. <ORGANIZATION> (EIN: <TAX_ID>)
curl -X POST http://localhost:8000/anonymize \
-H "Content-Type: application/json" \
-d '{
"text": "OpenAI key: sk-1234567890abcdefghijklmnopqrstuv, GitHub: ghp_abcdefghijklmnopqrstuvwxyz1234567890"
}'Output:
OpenAI key: <API_KEY>, GitHub: <API_KEY>
Agent memory and coding-assistant tools often index chat transcripts for later recall. Use API_KEY
detection with safe_response to redact secret-shaped values without echoing raw matches in the API
response.
curl -X POST http://localhost:8000/anonymize \
-H "Content-Type: application/json" \
-d '{
"text": "Save commit 1eeb16dd but redact OpenRouter sk-or-v1-abcdefghijklmnopqrstuvwxyz123456",
"entities": ["API_KEY"],
"safe_response": true
}'Output:
{
"anonymized": "Save commit 1eeb16dd but redact OpenRouter <API_KEY>",
"entities_found": [
{"type": "API_KEY", "start": 43, "end": 84, "score": 0.95}
],
"safe_response": true
}For local transcript files, use the secrets profile CLI. This path preserves people,
organizations, domains, commit SHAs, UUIDs, and other useful recall context while redacting
technical secrets.
piicloak redact \
--profile secrets \
--input session.jsonl \
--output session.redacted.jsonlDry-run mode reports safe counts without writing a redacted file:
piicloak redact --profile secrets --input session.jsonl --dry-runcurl -X POST http://localhost:8000/anonymize/docx \
-F "document=@contract.docx" \
-F "mode=replace"# Basic installation
pip install piicloak
# Download NLP model (required for the full API/server Presidio backend)
python -m spacy download en_core_web_lg
# Or install everything at once
pip install piicloak && python -m spacy download en_core_web_lg
# Optional OpenAI Privacy Filter backend from the official OpenAI repository (Python 3.10+)
pip install "git+https://github.com/openai/privacy-filter.git@f7f00ca7fb869683eb732c010299d901457f19c3"piicloak redact --profile secrets is a lightweight regex-only file redaction path. It does not load
the spaCy model and does not require or download an OpenAI Privacy Filter checkpoint.
All settings use the PIICLOAK_ prefix and have sensible defaults:
| Environment Variable | Default | Description |
|---|---|---|
PIICLOAK_HOST |
0.0.0.0 |
Server host |
PIICLOAK_PORT |
8000 |
Server port (standard) |
PIICLOAK_DEBUG |
false |
Debug mode |
PIICLOAK_WORKERS |
4 |
Gunicorn workers |
PIICLOAK_LOG_LEVEL |
INFO |
Logging level |
PIICLOAK_SPACY_MODEL |
en_core_web_lg |
spaCy model |
PIICLOAK_DETECTOR_BACKEND |
presidio |
Detector backend: presidio or privacy-filter |
PIICLOAK_PRIVACY_FILTER_CHECKPOINT |
"" |
Privacy Filter checkpoint path |
PIICLOAK_PRIVACY_FILTER_ALLOW_DOWNLOAD |
false |
Allow Privacy Filter to download its default checkpoint |
PIICLOAK_PRIVACY_FILTER_DEVICE |
cpu |
Privacy Filter inference device |
PIICLOAK_SCORE_THRESHOLD |
0.4 |
Min confidence score (0-1) |
PIICLOAK_DEFAULT_MODE |
replace |
Default anonymization mode |
PIICLOAK_CORS_ORIGINS |
* |
CORS allowed origins |
PIICLOAK_API_KEY |
"" |
Optional API key (empty = no auth) |
PIICLOAK_RATE_LIMIT |
100/minute |
Rate limiting |
PIICLOAK_ENABLE_METRICS |
true |
Prometheus metrics |
Example:
export PIICLOAK_PORT=9000
export PIICLOAK_API_KEY=your-secret-key
python -m piicloakTo use the optional Privacy Filter backend on Python 3.10+, install OpenAI's official
openai/privacy-filter package source, not the unrelated privacy-filter package on PyPI. Then set an
explicit checkpoint path, or opt into the upstream default checkpoint download:
pip install "git+https://github.com/openai/privacy-filter.git@f7f00ca7fb869683eb732c010299d901457f19c3"
export PIICLOAK_DETECTOR_BACKEND=privacy-filter
export PIICLOAK_PRIVACY_FILTER_CHECKPOINT=/path/to/privacy_filter_checkpoint
python -m piicloakRequest:
{
"text": "Contact John at john@acme.com",
"entities": ["PERSON", "EMAIL_ADDRESS"], // optional
"mode": "replace", // optional
"language": "en", // optional
"score_threshold": 0.4 // optional
}Response:
{
"original": "Contact John at john@acme.com",
"anonymized": "Contact <PERSON> at <EMAIL_ADDRESS>",
"entities_found": [...]
}Set "safe_response": true to omit the raw input and raw matched entity text from the response.
curl -X POST http://localhost:8000/analyze \
-H "Content-Type: application/json" \
-d '{"text": "Contact john@example.com"}'curl http://localhost:8000/entitiescurl http://localhost:8000/metricscurl http://localhost:8000/health# Build
docker build -t piicloak .
# Run
docker run -p 8000:8000 piicloak
# With environment variables
docker run -p 8000:8000 \
-e PIICLOAK_API_KEY=your-key \
-e PIICLOAK_WORKERS=8 \
piicloakdocker-compose up -dpip install gunicorn
gunicorn -c gunicorn.conf.py "piicloak.app:create_application()"See docs/DEPLOYMENT.md for Kubernetes deployment guide.
# Clone repository
git clone https://github.com/dimanjet/piicloak.git
cd piicloak
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dev dependencies
pip install -e ".[dev]"
# Download spaCy model
python -m spacy download en_core_web_lg
# Run tests
pytest
# Run with coverage
pytest --cov=piicloak --cov-report=html
# Format code
black src/ tests/
# Lint
flake8 src/ tests/piicloak/
βββ src/piicloak/
β βββ __init__.py # PIICloak SDK class
β βββ __main__.py # CLI entry point
β βββ app.py # Application factory
β βββ api.py # REST API endpoints
β βββ config.py # Configuration
β βββ engine.py # Analyzer/Anonymizer setup
β βββ recognizers.py # Custom PII recognizers
β βββ middleware.py # Auth, CORS, logging
β βββ metrics.py # Prometheus metrics
βββ tests/ # Comprehensive test suite
βββ docs/ # Documentation
βββ Dockerfile # Production Docker image
βββ docker-compose.yml # Docker Compose config
βββ gunicorn.conf.py # Gunicorn configuration
βββ requirements.txt # Dependencies
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
To add a new PII recognizer:
- Add pattern(s) to
src/piicloak/recognizers.py - Create a factory function
- Add to
SUPPORTED_ENTITIES - Write tests in
tests/test_recognizers.py - Update README
Example:
def create_license_plate_recognizer() -> PatternRecognizer:
patterns = [
Pattern("US_PLATE", r"\b[A-Z]{2,3}[-\s]?\d{3,4}\b", 0.7),
]
return PatternRecognizer(
supported_entity="LICENSE_PLATE",
patterns=patterns
)- Throughput: ~100 requests/second (single worker)
- Latency: <100ms per request (average)
- Memory: ~500MB (with spaCy model loaded)
- Scalability: Stateless design, horizontally scalable
- Optional API key authentication
- CORS configuration
- Rate limiting support
- Security headers included
- No data retention
- Stateless operation
Report security vulnerabilities to: marinovdk@gmail.com
This project is licensed under the MIT License - see the LICENSE file for details.
PIICloak is built on top of these excellent open-source projects:
- Microsoft Presidio (MIT License)
- spaCy (MIT License)
- Flask (BSD-3-Clause License)
- python-docx (MIT License)
If you find PIICloak useful, please consider giving it a star β
- Author: Dmitry Marinov
- Email: marinovdk@gmail.com
- GitHub: @dimanjet
- Issues: GitHub Issues
Made with β€οΈ for the privacy-conscious developer community