File Integrity Verification Using SHA-256
This repository contains a lightweight Python script that calculates the SHA-256 cryptographic hash of every file in a given directory and exports the results to a CSV file.
Hash-based integrity verification is a foundational practice in information security, digital forensics, and data governance, enabling organizations to detect unauthorized modifications, corruption, or accidental changes to digital assets.
Why Hashing Matters in Organizations
Many institutions underestimate the importance of cryptographic hashing until a data integrity incident occurs.
In practice, hashing plays a critical role across multiple domains:
Digital forensics and investigations
Legal and regulatory compliance
Information security and auditability
Data governance and long-term preservation
Changing even a single bit of a file produces a different, seemingly unrelated hash, so comparing a file's current hash against a previously recorded one is a reliable way to detect that its contents have changed.
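The sensitivity described above can be observed directly with Python's standard `hashlib` module. The following is a minimal sketch, not part of the script itself; the sample bytes are made up for illustration. It flips one bit of the input and counts how many of the 256 output bits differ.

```python
import hashlib

original = b"quarterly-report-v1"  # made-up sample content
# Flip the lowest bit of the first byte to simulate a single-bit change.
tampered = bytes([original[0] ^ 0x01]) + original[1:]

digest_a = hashlib.sha256(original).hexdigest()
digest_b = hashlib.sha256(tampered).hexdigest()

# XOR the two digests as integers and count the set bits,
# which gives the number of output bits that differ.
differing_bits = bin(int(digest_a, 16) ^ int(digest_b, 16)).count("1")

print(digest_a)
print(digest_b)
print(f"{differing_bits} of 256 bits differ")
```

For a well-behaved hash function, one would expect roughly half of the output bits to differ on average, even though the inputs differ by only one bit.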
Common Institutional Use Cases
🔐 Data Integrity & Audit Trails
Organizations can generate hashes to:
Prove that a document has not been altered since a specific point in time
Maintain verifiable audit logs
Support legal or compliance requirements
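One way to back such a claim is to compare a hash manifest recorded at an earlier point in time with one generated today. The sketch below assumes each manifest has already been loaded into a dictionary mapping file path to SHA-256 hex digest; the function name `compare_manifests` and the sample file names are hypothetical and not taken from the script.

```python
import hashlib


def compare_manifests(baseline, current):
    """Classify paths as added, removed, or modified between two manifests."""
    baseline_paths, current_paths = set(baseline), set(current)
    return {
        "added": sorted(current_paths - baseline_paths),
        "removed": sorted(baseline_paths - current_paths),
        "modified": sorted(
            path
            for path in baseline_paths & current_paths
            if baseline[path] != current[path]
        ),
    }


# Illustrative manifests; in practice these would come from two CSV exports.
baseline = {
    "policy.pdf": hashlib.sha256(b"policy v1").hexdigest(),
    "ledger.csv": hashlib.sha256(b"2023 ledger").hexdigest(),
    "notes.txt": hashlib.sha256(b"meeting notes").hexdigest(),
}
current = {
    "policy.pdf": hashlib.sha256(b"policy v1").hexdigest(),
    "ledger.csv": hashlib.sha256(b"2023 ledger (edited)").hexdigest(),
    "summary.txt": hashlib.sha256(b"new summary").hexdigest(),
}

report = compare_manifests(baseline, current)
print(report)
```

An empty report for all three categories indicates that every file present at the baseline still has the same content. The comparison is only as trustworthy as the stored baseline, so the earlier manifest should itself be kept somewhere it cannot be silently edited.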
📁 Secure File Sharing
Before and after sharing sensitive files, hashes can be compared to confirm:
The file was not modified in transit
The recipient received the exact original version
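The before-and-after comparison can be sketched as follows. This is an illustrative example rather than the script's own code: the helper names `sha256_of_file` and `matches_expected`, the 64 KiB chunk size, and the throwaway file are all assumptions. The sender records the hash before sharing, and the recipient recomputes it on the received copy.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large files are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA-256 equals the hash published by the sender."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Demo with a throwaway file standing in for a shared document.
with tempfile.TemporaryDirectory() as workdir:
    shared = Path(workdir) / "contract.txt"
    shared.write_bytes(b"signed agreement, final version\n")
    published_hash = sha256_of_file(shared)  # recorded by the sender

    intact = matches_expected(shared, published_hash)

    # Simulate a modification in transit, then check again.
    shared.write_bytes(b"signed agreement, edited version\n")
    still_intact = matches_expected(shared, published_hash)

print(intact, still_intact)
```

For this check to be meaningful, the published hash should reach the recipient through a different channel than the file itself; otherwise anyone able to alter the file could alter the hash as well.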
⚖️ Forensic Chain of Custody
In investigative environments, hashes are essential to:
Preserve evidentiary value
Demonstrate technical validation of digital evidence
Ensure reproducibility and trust
Hashing vs. Anonymization (Important Distinction)
While hashing is also widely used for data anonymization and pseudonymization, this script focuses exclusively on file-level integrity, not on transforming individual data fields.
In anonymization workflows, organizations may:
Replace sensitive values (e.g., names, identifiers) with hashes
Retain a protected hash-to-value mapping internally
Share datasets externally without exposing personal data
That approach operates at the data field level and is aimed at confidentiality, whereas this script operates at the file level and is aimed at verifying integrity.
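To make the contrast concrete, here is a minimal sketch of the field-level workflow, which this script does not perform. It assumes a keyed hash (HMAC-SHA256) rather than a bare hash, because unkeyed hashes of low-entropy values such as names can be guessed by hashing candidate values; the key, the function name `pseudonymize`, and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in a real workflow this would come from a secrets store.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(value, key=SECRET_KEY):
    """Replace a sensitive string with a keyed SHA-256 token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


records = [
    {"name": "Alice Example", "amount": 120},
    {"name": "Bob Example", "amount": 75},
]

mapping = {}         # token -> original value, retained internally only
shared_records = []  # safe-to-share copy with names replaced by tokens
for record in records:
    token = pseudonymize(record["name"])
    mapping[token] = record["name"]
    shared_records.append({"name": token, "amount": record["amount"]})

print(shared_records)
```

Here the hash transforms individual values inside the dataset, while the file-level approach leaves content untouched and only records a fingerprint of each file.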
What This Script Does
✔ Recursively scans a directory
✔ Calculates a SHA-256 hash for each file
✔ Ignores hidden and system files
✔ Exports results to a structured CSV file
✔ Logs processing activity for traceability
This makes it suitable for:
Integrity validation
Audit documentation
Forensic preprocessing
Secure archival workflows
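The overall flow (scan, hash, skip hidden files, log, export) can be sketched roughly as below. This is not the repository's actual code. It makes several assumptions that should be read as such: it writes the CSV with the standard `csv` module instead of pandas so the example has no third-party dependency, it treats only dot-prefixed names as "hidden" and does not attempt to detect system files, and the function names, logger name, and column headers are hypothetical.

```python
import csv
import hashlib
import logging
import os
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("hash_inventory")  # hypothetical logger name


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks to keep memory use constant."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(root):
    """Return sorted (relative_path, sha256) rows, skipping dot-prefixed names."""
    rows = []
    for current, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.startswith("."):
                continue  # assumption: "hidden" means a leading dot
            full_path = Path(current) / name
            rows.append((str(full_path.relative_to(root)), sha256_of_file(full_path)))
            log.info("hashed %s", full_path)
    return sorted(rows)


def write_csv(rows, destination):
    """Write the rows to a CSV file with a header line."""
    with open(destination, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "sha256"])  # hypothetical column names
        writer.writerows(rows)


# Demo against a temporary directory tree.
with tempfile.TemporaryDirectory() as workdir:
    root = Path(workdir) / "evidence"
    (root / "sub").mkdir(parents=True)
    (root / "a.txt").write_bytes(b"alpha")
    (root / "sub" / "b.txt").write_bytes(b"beta")
    (root / ".hidden").write_bytes(b"ignored")

    rows = hash_directory(root)
    report = Path(workdir) / "hashes.csv"
    write_csv(rows, report)
    report_lines = report.read_text(encoding="utf-8").splitlines()

print(report_lines)
```

Sorting the rows keeps the output stable between runs, which makes two exports of the same directory easy to compare line by line.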
Disclaimer
This tool does not encrypt files or anonymize data content. It is intended strictly for integrity verification and audit purposes.
Features
- SHA-256 hashing
- Recursive directory traversal
- Chunk-based file reading (memory efficient)
- Logging for traceability
- CSV export for audit and reporting purposes
- Command-line interface (CLI)
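As an illustration of what a command-line interface for such a tool might look like, here is a small `argparse` sketch. The script's real arguments are not documented in this section, so the positional `directory` argument and the `--output` and `--chunk-size` options below are hypothetical, chosen only to mirror the features listed above.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser with hypothetical options mirroring the listed features."""
    parser = argparse.ArgumentParser(
        description="Hash every file in a directory and write the results to CSV."
    )
    parser.add_argument("directory", type=Path, help="directory to scan recursively")
    parser.add_argument(
        "-o", "--output", type=Path, default=Path("hashes.csv"),
        help="path of the CSV report to write",
    )
    parser.add_argument(
        "--chunk-size", type=int, default=65536,
        help="number of bytes read from each file per chunk",
    )
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["./evidence", "--output", "report.csv"])
print(args.directory, args.output, args.chunk_size)
```

Exposing the chunk size as an option is a design choice rather than a requirement; a fixed value works equally well for most files.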
Requirements
- Python 3.8+
- pandas
Install dependencies:
pip install pandas