Skip to content

talesperito/hash_folder

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 

Repository files navigation

hash_folder

File Integrity Verification Using SHA-256

This repository contains a lightweight Python script designed to calculate SHA-256 cryptographic hashes for all files within a given directory and export the results to a CSV file.

Hash-based integrity verification is a foundational practice in information security, digital forensics, and data governance, enabling organizations to detect unauthorized modifications, corruption, or accidental changes to digital assets.

Why Hashing Matters in Organizations

Many institutions underestimate the importance of cryptographic hashing until a data integrity incident occurs.

In practice, hashing plays a critical role across multiple domains:

Digital forensics and investigations

Legal and regulatory compliance

Information security and auditability

Data governance and long-term preservation

A single-bit change in a file results in a completely different hash, making hashes a reliable mechanism for integrity validation.

Common Institutional Use Cases 🔐 Data Integrity & Audit Trails

Organizations can generate hashes to:

Prove that a document has not been altered since a specific point in time

Maintain verifiable audit logs

Support legal or compliance requirements

📁 Secure File Sharing

Before and after sharing sensitive files, hashes can be compared to confirm:

The file was not modified in transit

The recipient received the exact original version

⚖️ Forensic Chain of Custody

In investigative environments, hashes are essential to:

Preserve evidentiary value

Demonstrate technical validation of digital evidence

Ensure reproducibility and trust

Hashing vs. Anonymization (Important Distinction)

While hashing is also widely used for data anonymization and pseudonymization, this script focuses exclusively on file-level integrity, not on transforming individual data fields.

In anonymization workflows, organizations may:

Replace sensitive values (e.g., names, identifiers) with hashes

Retain a protected hash-to-value mapping internally

Share datasets externally without exposing personal data

That approach operates at the data field level, whereas this script operates at the file level, ensuring integrity rather than confidentiality.

What This Script Does

✔ Recursively scans a directory ✔ Calculates a SHA-256 hash for each file ✔ Ignores hidden and system files ✔ Exports results to a structured CSV file ✔ Logs processing activity for traceability

This makes it suitable for:

Integrity validation

Audit documentation

Forensic preprocessing

Secure archival workflows

Disclaimer

This tool does not encrypt files or anonymize data content. It is intended strictly for integrity verification and audit purposes.

⚙️ Features

  • SHA-256 hashing
  • Recursive directory traversal
  • Chunk-based file reading (memory efficient)
  • Logging for traceability
  • CSV export for audit and reporting purposes
  • Command-line interface (CLI)

🛠️ Requirements

  • Python 3.8+
  • pandas

Install dependencies:

pip install pandas

About

Python CLI tool for SHA-256 file integrity verification — exports results to CSV for digital forensics

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages