
chapman4444/folder-dedup-workbench


Folder Dedup Workbench

A Windows-focused Python toolkit for detecting and removing duplicate folders, batch renaming files, managing timestamps, building file archives, and creating backups. Ships as a tkinter GUI with an integrated CLI; the main application lives in a single file, duplicate_folder_finder.py, and the other tools live in tools/.

[Screenshot: Scan results]

Features

Deduper (Main Tab)

  • Scans folders under selected root paths or entire drives
  • Detects duplicate folders by hashing the contents of their top-level files (SHA-256)
  • Content-only matching by default (ignores filenames)
  • Fingerprint pre-pass (top-level file count and total size) skips folders that can't possibly match
  • Optional intra-folder dedup removes duplicate files within each folder first
  • Keeper selection: the shallowest path wins, with ties broken alphabetically (optionally, keep the newest folder instead)
  • Safe staging with manifests and undo scripts: originals are moved, never deleted
  • Subfolders and their contents are always preserved
  • Dry-run mode for previewing without changes
  • CLI mode for scripting: python duplicate_folder_finder.py <root_path> [options]
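The staging idea (move instead of delete, record a manifest, emit an undo script) can be sketched roughly as below. This is a minimal illustration under assumptions, not the tool's implementation: the function stage_files, the staged_<timestamp> folder name, the manifest.json fields, and the Python undo_staging.py script are all hypothetical, and the real tool's undo script may use a different format.

```python
import json
import shutil
from datetime import datetime
from pathlib import Path

# Hypothetical undo script: reads the manifest next to it and moves files back.
UNDO_SCRIPT = """\
import json, os, shutil, pathlib
here = pathlib.Path(__file__).resolve().parent
data = json.loads((here / "manifest.json").read_text(encoding="utf-8"))
for entry in data["files"]:
    os.makedirs(os.path.dirname(entry["original"]), exist_ok=True)
    shutil.move(entry["staged"], entry["original"])
"""

def stage_files(files, root, staging_root):
    """Move files (paths under root) into a timestamped staging dir.

    The relative layout under root is mirrored inside <stage>/files so that
    same-named files from different folders cannot collide.
    """
    root = Path(root)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    stage_dir = Path(staging_root) / f"staged_{stamp}"
    (stage_dir / "files").mkdir(parents=True, exist_ok=True)

    entries = []
    for src in map(Path, files):
        dest = stage_dir / "files" / src.relative_to(root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))  # moved, never deleted
        entries.append({"original": str(src), "staged": str(dest)})

    manifest = {"created": stamp, "root": str(root), "files": entries}
    (stage_dir / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    (stage_dir / "undo_staging.py").write_text(UNDO_SCRIPT, encoding="utf-8")
    return stage_dir
```

Keeping the undo logic data-driven (it only replays the manifest) means the manifest stays the single source of truth for what was moved and where.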

Batch Rename

  • Regex find/replace with capture group support
  • Prefix/suffix add/remove
  • Sequential numbering with configurable padding and templates
  • Case conversion (UPPER, lower, Title, snake_case)
  • Extension changing
  • Live preview with conflict detection
  • Full undo support

Date Mover

  • Filter & Move: select files by date range or age ("older than X days") and move or copy them to a destination
  • Copy Timestamps: transfer mtime/ctime/atime between files matched by name, path, or content hash
  • Bulk Edit: set dates, shift by offset, round to nearest, or extract dates from filenames via regex
  • Windows creation-time (ctime) support via kernel32.SetFileTime through ctypes
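Python's os.utime can only set access and modification times, which is why creation time needs a Win32 call. The sketch below shows one way to call kernel32.SetFileTime through ctypes; it assumes the tool does something similar, and the helper names unix_to_filetime and set_creation_time are hypothetical. A FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC, so a Unix timestamp must be converted first. On non-Windows systems the sketch raises OSError.

```python
import os

# 1970-01-01 expressed in 100-ns FILETIME ticks since 1601-01-01.
EPOCH_AS_FILETIME = 116444736000000000
TICKS_PER_SECOND = 10_000_000

def unix_to_filetime(ts: float) -> int:
    """Convert a Unix timestamp (seconds) to a FILETIME tick count."""
    return EPOCH_AS_FILETIME + round(ts * TICKS_PER_SECOND)

def set_creation_time(path: str, ts: float) -> None:
    """Set the creation time of a file or folder (Windows only)."""
    if os.name != "nt":
        raise OSError("setting creation time requires Windows")
    import ctypes
    from ctypes import wintypes

    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.CreateFileW.argtypes = [wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD,
                                wintypes.LPVOID, wintypes.DWORD, wintypes.DWORD,
                                wintypes.HANDLE]
    k32.CreateFileW.restype = wintypes.HANDLE
    pft = ctypes.POINTER(wintypes.FILETIME)
    k32.SetFileTime.argtypes = [wintypes.HANDLE, pft, pft, pft]
    k32.SetFileTime.restype = wintypes.BOOL
    k32.CloseHandle.argtypes = [wintypes.HANDLE]
    k32.CloseHandle.restype = wintypes.BOOL

    FILE_WRITE_ATTRIBUTES = 0x0100
    FILE_SHARE_ALL = 0x1 | 0x2 | 0x4         # read | write | delete
    OPEN_EXISTING = 3
    FILE_FLAG_BACKUP_SEMANTICS = 0x02000000  # needed to open directories
    INVALID_HANDLE_VALUE = wintypes.HANDLE(-1).value

    handle = k32.CreateFileW(path, FILE_WRITE_ATTRIBUTES, FILE_SHARE_ALL, None,
                             OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, None)
    if handle in (None, INVALID_HANDLE_VALUE):
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        ticks = unix_to_filetime(ts)
        created = wintypes.FILETIME(ticks & 0xFFFFFFFF, ticks >> 32)
        # NULL for last-access and last-write leaves those times unchanged.
        if not k32.SetFileTime(handle, ctypes.byref(created), None, None):
            raise ctypes.WinError(ctypes.get_last_error())
    finally:
        k32.CloseHandle(handle)
```

Opening the handle with FILE_FLAG_BACKUP_SEMANTICS lets the same function work on folders as well as files.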

Archive DB

  • Build JSONL+ZSTD databases (JSON Lines compressed with Zstandard) from folder trees, optionally including file content
  • Browse and search databases with lazy-loading treeview
  • Extract individual files from content-enabled archives
  • Configurable compression level, file size limits, and extension filters
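A JSONL+ZSTD database can be pictured as one JSON object per line, compressed as a single Zstandard stream. The sketch below assumes a record layout (path, size, mtime, and optional base64 content) that is purely illustrative, since the real schema is not documented here; iter_records, write_db, and read_db are hypothetical names. It uses the zstandard package's ZstdCompressor.stream_writer and ZstdDecompressor.stream_reader, imported lazily so that record building works without the optional dependency.

```python
import base64
import io
import json
import os

def iter_records(root, include_content=False, max_size=10 * 1024 * 1024, extensions=None):
    """Yield one dict per file under root (hypothetical schema)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if extensions and os.path.splitext(name)[1].lower() not in extensions:
                continue  # extension filter
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rec = {"path": os.path.relpath(full, root).replace(os.sep, "/"),
                   "size": st.st_size, "mtime": st.st_mtime}
            if include_content and st.st_size <= max_size:  # file size limit
                with open(full, "rb") as fh:
                    rec["content_b64"] = base64.b64encode(fh.read()).decode("ascii")
            yield rec

def write_db(records, out_path, level=10):
    """Write records as JSON Lines inside one zstd stream."""
    import zstandard  # optional dependency
    cctx = zstandard.ZstdCompressor(level=level)
    with open(out_path, "wb") as raw, cctx.stream_writer(raw) as writer:
        for rec in records:
            writer.write((json.dumps(rec, ensure_ascii=False) + "\n").encode("utf-8"))

def read_db(db_path):
    """Stream records back without decompressing the whole file into memory."""
    import zstandard
    with open(db_path, "rb") as raw:
        reader = zstandard.ZstdDecompressor().stream_reader(raw)
        for line in io.TextIOWrapper(reader, encoding="utf-8"):
            yield json.loads(line)
```

Because records are line-delimited, a browser can stream them and build tree nodes on demand instead of loading the whole database at once.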

Backup Manager

  • Create compressed backups with dedup-aware incremental support
  • Browse and restore individual files or folders from backups
  • Conflict handling: skip, overwrite, or rename
  • Backup history with comparison between snapshots
  • Restores original timestamps
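Dedup-aware incremental backups are commonly built on a content-addressed store: each unique file body is saved once under its SHA-256 hash, and each snapshot is a manifest mapping relative paths to hashes and timestamps. The sketch below shows that general technique; this README does not say Backup Manager works exactly this way. The repo layout (objects/, snapshots/), the manifest format, the "name (1).ext" rename pattern, and the function names are hypothetical, and compression is omitted for brevity.

```python
import hashlib
import json
import os
import shutil
import time

def _sha256(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def create_snapshot(source, repo):
    """Store new contents in repo/objects and write a snapshot manifest."""
    objects = os.path.join(repo, "objects")
    os.makedirs(objects, exist_ok=True)
    os.makedirs(os.path.join(repo, "snapshots"), exist_ok=True)
    manifest = {}
    for dirpath, _dirs, files in os.walk(source):
        for name in files:
            full = os.path.join(dirpath, name)
            digest = _sha256(full)
            blob = os.path.join(objects, digest)
            if not os.path.exists(blob):  # content already stored is skipped
                shutil.copyfile(full, blob)
            st = os.stat(full)
            rel = os.path.relpath(full, source).replace(os.sep, "/")
            manifest[rel] = {"sha256": digest, "atime": st.st_atime, "mtime": st.st_mtime}
    base = os.path.join(repo, "snapshots", time.strftime("%Y%m%d_%H%M%S"))
    snap, n = base + ".json", 1
    while os.path.exists(snap):  # two snapshots in the same second
        snap, n = f"{base}_{n}.json", n + 1
    with open(snap, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2)
    return snap

def restore(snap, repo, dest, on_conflict="skip"):
    """Restore a snapshot; on_conflict is 'skip', 'overwrite', or 'rename'."""
    with open(snap, encoding="utf-8") as fh:
        manifest = json.load(fh)
    for rel, info in manifest.items():
        target = os.path.join(dest, *rel.split("/"))
        if os.path.exists(target):
            if on_conflict == "skip":
                continue
            if on_conflict == "rename":
                stem, ext = os.path.splitext(target)
                n = 1
                while os.path.exists(f"{stem} ({n}){ext}"):
                    n += 1
                target = f"{stem} ({n}){ext}"
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copyfile(os.path.join(repo, "objects", info["sha256"]), target)
        os.utime(target, (info["atime"], info["mtime"]))  # original timestamps
```

Comparing two snapshots then reduces to diffing their manifests: paths whose hashes differ changed, and paths present in only one manifest were added or removed.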

Requirements

  • Python 3.8+
  • tkinter (included with standard Python on Windows)
  • No third-party dependencies for core features
  • Optional: zstandard (pip install zstandard) for Archive DB and Backup Manager compression

Usage

GUI

python duplicate_folder_finder.py

CLI

# Scan a folder (dry run)
python duplicate_folder_finder.py E:\MyFolder --dry-run

# Scan and deduplicate
python duplicate_folder_finder.py E:\MyFolder --staging E:\__DELETE_ME

# Scan with intra-folder dedup
python duplicate_folder_finder.py E:\MyFolder --dedupe-within

CLI Options

| Flag | Description |
|---|---|
| root | Root directory or drive to scan (positional argument) |
| --staging | Staging directory for moved files |
| --dry-run | Preview only; no file operations |
| --dedupe-within | Remove duplicate files within each folder first |
| --ignore-filenames | Match by content only, ignoring filenames (default) |
| --filename-aware | Include filenames in the folder signature |
| --cleanup-empty | Remove empty directories after staging |
| --skip | Comma-separated folder names to skip |

Project Structure

duplicate_folder_finder.py   # Main application (GUI + CLI)
tools/
  base_tool.py               # BaseTool(ttk.Frame) base class
  batch_rename.py             # Batch Rename tool
  date_mover.py               # Date Mover tool
  archive_db.py               # Archive DB tool
  backup_manager.py           # Backup Manager tool
testing/test_fs/              # Synthetic test filesystem
Reference/                    # Prior iterations (reference only)
screenshots/                  # Application screenshots

How It Works

  1. Fingerprinting: each folder gets a fast fingerprint from its top-level file count and total size
  2. Hashing: only folders that share a fingerprint are fully hashed (SHA-256 of each top-level file's content)
  3. Grouping: folders with identical signatures (built from their top-level file hashes, plus filenames when --filename-aware is set) are grouped as duplicates
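  4. Keeper selection: one folder in each group is kept (the shallowest path, with ties broken alphabetically, or the newest folder if that option is chosen); the others are staged for review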
  5. Staging: the top-level files of each non-keeper folder are moved to a timestamped staging directory, along with a JSON manifest and an auto-generated undo script
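Steps 1 through 4 can be sketched in plain Python as follows. This is an illustrative reconstruction from the description above, not the project's code: the function names are hypothetical, empty folders are simply skipped, and the signature is a sorted tuple of top-level file hashes (paired with filenames when filename_aware is true), which is one reasonable reading of "folder signature".

```python
import hashlib
import os
from collections import defaultdict

def top_level_files(folder):
    """Regular files directly inside folder; subfolders are ignored."""
    with os.scandir(folder) as it:
        return [e for e in it if e.is_file(follow_symlinks=False)]

def file_sha256(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def find_duplicate_folders(folders, filename_aware=False):
    """Return [(keeper, [others...]), ...] for each duplicate group."""
    # 1. Fingerprint: (top-level file count, total size), using only stat().
    by_fp = defaultdict(list)
    for folder in folders:
        files = top_level_files(folder)
        if files:  # this sketch skips folders without top-level files
            by_fp[(len(files), sum(e.stat().st_size for e in files))].append(folder)

    # 2-3. Hash only folders that share a fingerprint, then group by signature.
    by_sig = defaultdict(list)
    for candidates in by_fp.values():
        if len(candidates) < 2:
            continue  # a unique fingerprint cannot have a duplicate
        for folder in candidates:
            entries = top_level_files(folder)
            if filename_aware:
                sig = tuple(sorted((e.name, file_sha256(e.path)) for e in entries))
            else:
                sig = tuple(sorted(file_sha256(e.path) for e in entries))
            by_sig[sig].append(folder)

    # 4. Keeper: shallowest path wins, ties broken alphabetically.
    result = []
    for group in by_sig.values():
        if len(group) > 1:
            ordered = sorted(group, key=lambda p: (os.path.normpath(p).count(os.sep), p.lower()))
            result.append((ordered[0], ordered[1:]))
    return result
```

The fingerprint pass only needs directory metadata, so expensive hashing is limited to folders that could actually match. When --dedupe-within is used, duplicate files inside each folder are removed first, which would change both the fingerprints and the signatures computed here.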

Screenshots

[Screenshots: Scan Results, Settings]

License

MIT
