Jermalk/docveil

docveil

Privacy-safe document intelligence pipeline. Analyses technical documents (BRD, RFC, proposal, specification) with local AI — sensitive names are masked before any model sees the text, and restored in the final output.

What it does

  1. You list sensitive terms (company names, system names, product names) in a rules file
  2. docveil replaces them with neutral placeholders — [SYSTEM_1], [COMPANY_1], etc.
  3. A local AI model extracts structured knowledge and runs your analysis task
  4. Real names are restored in the final output
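The mask and restore steps above can be sketched in a few lines of Python. This is a minimal illustration, not docveil's actual implementation: the function names `mask` and `restore`, the shape of the `rules` dictionary, and the sample terms are all assumptions made for the example.

```python
import re


def mask(text, rules):
    """Replace sensitive terms with numbered placeholders.

    `rules` (hypothetical shape) maps a category such as "SYSTEM" to a
    list of sensitive terms. Returns the masked text together with the
    placeholder -> original term mapping needed to restore it later.
    """
    term_to_placeholder = {}
    for category, terms in rules.items():
        for index, term in enumerate(terms, start=1):
            term_to_placeholder[term] = f"[{category}_{index}]"
    if not term_to_placeholder:
        return text, {}

    # Try longer terms first so "Orion Gateway" wins over "Orion".
    # A single regex pass also avoids re-matching inside placeholders.
    ordered = sorted(term_to_placeholder, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in ordered))
    masked = pattern.sub(lambda m: term_to_placeholder[m.group(0)], text)

    mapping = {ph: term for term, ph in term_to_placeholder.items()}
    return masked, mapping


def restore(text, mapping):
    """Put the real names back into text produced from masked input."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


# Illustrative terms only; they do not come from the docveil examples.
rules = {"COMPANY": ["Acme Corp"], "SYSTEM": ["Orion", "Orion Gateway"]}
masked, mapping = mask("Acme Corp runs Orion Gateway next to Orion.", rules)
print(masked)
print(restore(masked, mapping))
```

Matching here is literal and case-sensitive. Only `masked` would be passed to a model; `mapping` stays local and is applied to the model's output at the end.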

No sensitive data ever leaves your machine. After the one-time model download, the pipeline runs entirely offline.

Quick start

See deploy/README.md for full setup and usage instructions, including Ollama installation, model download, and a ready-to-run example.

# after setup (see deploy/README.md)
python scripts/run_pipeline.py \
  deploy/examples/customer_portal_brd.md \
  deploy/examples/nebulize_rules.yaml \
  deploy/examples/task_prompt.txt

Requirements

  • Python 3.10+
  • Ollama with qwen3:14b and qwen3:8b (~25 GB disk)
  • Linux, macOS, or Windows

Pipeline

document  →  normalise  →  mask  →  extract  →  analyse  →  restore  →  output
              (step 01)   (step 02)  (step 05)  (step 06)   (step 08)

The pipeline has eight steps, three of them optional, and runs as a single command via scripts/run_pipeline.py.
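One way to picture a numbered pipeline with optional steps is a small runner that applies each step in order and lets the caller skip only the optional ones. The sketch below is hypothetical: `Step`, `run_steps`, and the toy step functions are assumptions for illustration and do not reflect how scripts/run_pipeline.py is written or which steps it treats as optional.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Step:
    number: int                 # position in the pipeline, e.g. 1 for step 01
    name: str
    run: Callable[[str], str]   # takes the current text, returns the new text
    optional: bool = False


def run_steps(text: str, steps: Iterable[Step], skip=frozenset()) -> str:
    """Apply steps in numeric order, skipping optional steps listed in `skip`."""
    for step in sorted(steps, key=lambda s: s.number):
        if step.number in skip:
            if not step.optional:
                raise ValueError(
                    f"step {step.number:02d} ({step.name}) is required"
                )
            continue
        text = step.run(text)
    return text


# Toy stand-ins for two of the stages named in the diagram above.
steps = [
    Step(1, "normalise", lambda t: " ".join(t.split())),
    Step(2, "mask", lambda t: t.replace("Orion", "[SYSTEM_1]")),
]
print(run_steps("  Orion   is   up ", steps))
```

Keeping each step as a plain text-to-text function makes it easy to test a step in isolation and to leave optional ones out of a run.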

License

MIT
