A Windows-focused Python toolkit for detecting and removing duplicate folders, batch renaming files, managing timestamps, building file archives, and creating backups. Ships as a single-file tkinter GUI with an integrated CLI.

Duplicate Folder Finder:
- Scans folders under selected root paths or entire drives
- Detects duplicates by hashing top-level file contents (SHA-256)
- Content-only matching by default (ignores filenames)
- Fingerprint pre-pass skips folders that can't possibly match
- Optional intra-folder dedup removes duplicate files within each folder first
- Keeper selection: shallowest path wins, ties broken alphabetically (or keep newest)
- Safe staging with manifests and undo scripts — originals are moved, never deleted
- Subfolders and their contents are always preserved
- Dry-run mode for previewing without changes
- CLI mode for scripting: `python duplicate_folder_finder.py <root_path> [options]`
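The keeper rule above fits naturally into a sort key. What follows is a minimal sketch, not the tool's actual code. `pick_keeper`, `split_group` and `path_depth` are hypothetical names. Paths are parsed with `PureWindowsPath` so the depth count matches Windows paths on any OS. The alphabetical tie-break is assumed to be case-insensitive, and "keep newest" is modeled as an optional dict of folder modification times.

```python
from pathlib import PureWindowsPath
from typing import Dict, List, Optional, Tuple


def path_depth(path: str) -> int:
    """Number of path components, e.g. 2 for a folder directly under a drive."""
    return len(PureWindowsPath(path).parts)


def pick_keeper(group: List[str], mtimes: Optional[Dict[str, float]] = None) -> str:
    """Choose the folder to keep from one duplicate group (hypothetical helper).

    Default: shallowest path wins, ties broken alphabetically
    (case-insensitive here, which is an assumption).
    With `mtimes` ("keep newest"), the most recently modified folder wins.
    """
    if mtimes is not None:
        # Negate the timestamp so min() selects the newest folder.
        return min(group, key=lambda p: (-mtimes[p], p.lower(), p))
    return min(group, key=lambda p: (path_depth(p), p.lower(), p))


def split_group(
    group: List[str], mtimes: Optional[Dict[str, float]] = None
) -> Tuple[str, List[str]]:
    """Return (keeper, losers); the losers are the folders that would be staged."""
    keeper = pick_keeper(group, mtimes)
    return keeper, [p for p in group if p != keeper]
```

For example, among `E:\Backup\Old\Photos`, `E:\Photos` and `E:\Archive\Photos`, the sketch keeps `E:\Photos` because it has the fewest path components.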

Batch Rename:
- Regex find/replace with capture group support
- Prefix/suffix add/remove
- Sequential numbering with configurable padding and templates
- Case conversion (UPPER, lower, Title, snake_case)
- Extension changing
- Live preview with conflict detection
- Full undo support
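A rename preview with conflict detection reduces to computing every new name up front and checking for collisions before anything touches the disk. This is a sketch under stated assumptions: `preview_regex_rename` and `number_names` are hypothetical helpers, names are compared case-insensitively as on Windows, and the `{name}`/`{num}`/`{ext}` template placeholders are invented for illustration rather than taken from the tool.

```python
import os
import re
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def preview_regex_rename(
    names: Sequence[str],
    pattern: str,
    replacement: str,
    folder_listing: Iterable[str] = (),
) -> List[Tuple[str, str, bool]]:
    """Return (old, new, conflict) rows for a regex find/replace preview.

    `replacement` may use capture groups such as \\1 or \\g<name>.
    A row conflicts if two files would get the same new name, or if the
    new name collides with a file in the folder that is not being renamed.
    """
    rx = re.compile(pattern)
    new_names = [rx.sub(replacement, n) for n in names]
    targets = Counter(n.lower() for n in new_names)  # case-insensitive, as on Windows
    batch = {n.lower() for n in names}
    bystanders = {f.lower() for f in folder_listing} - batch
    return [
        (old, new, targets[new.lower()] > 1 or new.lower() in bystanders)
        for old, new in zip(names, new_names)
    ]


def number_names(
    names: Sequence[str], template: str = "{name}_{num}{ext}", start: int = 1, pad: int = 3
) -> List[str]:
    """Sequential numbering with zero padding; the template syntax is hypothetical."""
    out = []
    for i, original in enumerate(names, start=start):
        name, ext = os.path.splitext(original)
        out.append(template.format(name=name, ext=ext, num=str(i).zfill(pad)))
    return out
```

A name that only collides with another file in the same batch (for example, two files swapping names) is not flagged here, so an apply step would likely need a two-phase rename through temporary names.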

Date Mover:
- Filter & Move: select files by date range or "older than X days", move or copy to destination
- Copy Timestamps: transfer mtime/ctime/atime between matched files (by name, path, or content hash)
- Bulk Edit: set dates, shift by offset, round to nearest, or extract dates from filenames via regex
- Windows `ctime` (creation time) support via `kernel32.SetFileTime` through `ctypes`
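Two of the operations above, "older than X days" filtering and extracting a date from a filename, can be sketched with the standard library alone. This is an illustration, not the tool's code: `DEFAULT_DATE_RX` and the helper names are hypothetical, and `os.utime` only sets access and modification times. Setting the Windows creation time would need the `SetFileTime` route mentioned above, which is not shown.

```python
import os
import re
import time
from datetime import datetime
from typing import List, Optional, Sequence

# Hypothetical default pattern: matches 20230715 or 2023-07-15 anywhere in a name.
DEFAULT_DATE_RX = r"(?P<y>\d{4})-?(?P<m>\d{2})-?(?P<d>\d{2})"


def date_from_filename(name: str, pattern: str = DEFAULT_DATE_RX) -> Optional[datetime]:
    """Extract a date using named groups y, m, d; None if absent or invalid."""
    match = re.search(pattern, name)
    if match is None:
        return None
    try:
        return datetime(int(match["y"]), int(match["m"]), int(match["d"]))
    except ValueError:  # e.g. 20231399 matches the regex but is not a real date
        return None


def older_than(paths: Sequence[str], days: float, now: Optional[float] = None) -> List[str]:
    """Paths whose modification time is more than `days` days before `now`."""
    cutoff = (time.time() if now is None else now) - days * 86400
    return [p for p in paths if os.path.getmtime(p) < cutoff]


def apply_filename_date(path: str, pattern: str = DEFAULT_DATE_RX) -> bool:
    """Set atime and mtime to local midnight of the date found in the filename.

    Creation time is left untouched because os.utime cannot set it.
    """
    dt = date_from_filename(os.path.basename(path), pattern)
    if dt is None:
        return False
    ts = dt.timestamp()  # a naive datetime is interpreted as local time
    os.utime(path, (ts, ts))
    return True
```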

Archive DB:
- Build JSONL+ZSTD databases from folder trees (optionally with file content)
- Browse and search databases with lazy-loading treeview
- Extract individual files from content-enabled archives
- Configurable compression level, file size limits, and extension filters
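A JSONL database stores one JSON object per file, one per line, which keeps it streamable and easy to compress. The sketch below assumes a hypothetical record schema (`path`, `size`, `mtime`, optional base64 `content_b64`), a lowercase extension set such as `{".txt"}` for the filter, and compression as a separate final step using the optional `zstandard` package. It is not the Archive DB's real format.

```python
import base64
import json
import os
from typing import Dict, Iterable, Iterator, Optional, Set


def iter_records(root: str, include_content: bool = False,
                 max_content_bytes: int = 10 * 1024 * 1024,
                 extensions: Optional[Set[str]] = None) -> Iterator[Dict]:
    """Yield one metadata record per file under `root` (hypothetical schema)."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for fname in sorted(filenames):
            ext = os.path.splitext(fname)[1].lower()
            if extensions is not None and ext not in extensions:
                continue
            full = os.path.join(dirpath, fname)
            st = os.stat(full)
            rec = {
                "path": os.path.relpath(full, root).replace(os.sep, "/"),
                "size": st.st_size,
                "mtime": st.st_mtime,
            }
            # Content is embedded only for files under the size limit.
            if include_content and st.st_size <= max_content_bytes:
                with open(full, "rb") as fh:
                    rec["content_b64"] = base64.b64encode(fh.read()).decode("ascii")
            yield rec


def to_jsonl(records: Iterable[Dict]) -> bytes:
    """One JSON object per line, UTF-8 encoded."""
    return b"".join(
        json.dumps(r, ensure_ascii=False).encode("utf-8") + b"\n" for r in records
    )


def compress_zstd(data: bytes, level: int = 3) -> bytes:
    """Compress with the optional `zstandard` package."""
    import zstandard  # imported lazily so the rest works without it
    return zstandard.ZstdCompressor(level=level).compress(data)


def extract_file(jsonl: bytes, rel_path: str) -> Optional[bytes]:
    """Return stored bytes for `rel_path`, or None if absent or metadata-only."""
    for line in jsonl.splitlines():
        rec = json.loads(line)
        if rec["path"] == rel_path and "content_b64" in rec:
            return base64.b64decode(rec["content_b64"])
    return None
```

Base64 keeps binary content valid inside JSON at the cost of roughly a third more bytes before compression.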

Backup Manager:
- Create compressed backups with dedup-aware incremental support
- Browse and restore individual files or folders from backups
- Conflict handling: skip, overwrite, or rename
- Backup history with comparison between snapshots
- Restores original timestamps
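One common way to make backups dedup-aware and incremental is a content-addressed blob store: each file's bytes are stored once under their SHA-256 digest, and each snapshot is just a manifest mapping paths to digests and timestamps. The sketch below assumes that design; the `blobs/` layout, the manifest schema, the `" (n)"` rename suffix and all function names are hypothetical, and compression is omitted for brevity.

```python
import hashlib
import json
import os
import shutil
from typing import Optional


def sha256_file(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def backup(src_root: str, store_root: str, snapshot: str) -> int:
    """Write a snapshot manifest and copy only blobs not already stored.

    Returns the number of new blobs; an unchanged tree adds none.
    """
    blobs = os.path.join(store_root, "blobs")
    os.makedirs(blobs, exist_ok=True)
    manifest, new_blobs = {}, 0
    for dirpath, _dirs, files in os.walk(src_root):
        for fname in files:
            full = os.path.join(dirpath, fname)
            st = os.stat(full)  # stat before reading so atime is the original
            digest = sha256_file(full)
            blob = os.path.join(blobs, digest)
            if not os.path.exists(blob):  # identical content is stored once
                shutil.copyfile(full, blob)
                new_blobs += 1
            rel = os.path.relpath(full, src_root).replace(os.sep, "/")
            manifest[rel] = {"sha256": digest, "atime": st.st_atime, "mtime": st.st_mtime}
    with open(os.path.join(store_root, snapshot + ".json"), "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2)
    return new_blobs


def restore(store_root: str, snapshot: str, rel_path: str, dest_root: str,
            on_conflict: str = "skip") -> Optional[str]:
    """Restore one file; on_conflict is 'skip', 'overwrite' or 'rename'."""
    with open(os.path.join(store_root, snapshot + ".json"), encoding="utf-8") as fh:
        entry = json.load(fh)[rel_path]
    target = os.path.join(dest_root, *rel_path.split("/"))
    if os.path.exists(target):
        if on_conflict == "skip":
            return None
        if on_conflict == "rename":
            base, ext = os.path.splitext(target)
            n = 1
            while os.path.exists(f"{base} ({n}){ext}"):
                n += 1
            target = f"{base} ({n}){ext}"
        elif on_conflict != "overwrite":
            raise ValueError(f"unknown conflict mode: {on_conflict}")
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.copyfile(os.path.join(store_root, "blobs", entry["sha256"]), target)
    os.utime(target, (entry["atime"], entry["mtime"]))  # restore original timestamps
    return target
```

Comparing two snapshots then reduces to diffing their manifests by path and digest.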

Requirements:
- Python 3.8+
- tkinter (included with standard Python on Windows)
- No third-party dependencies for core features
- Optional: `zstandard` for Archive DB and Backup Manager compression

Usage:

```
python duplicate_folder_finder.py

# Scan a folder (dry run)
python duplicate_folder_finder.py E:\MyFolder --dry-run

# Scan and deduplicate
python duplicate_folder_finder.py E:\MyFolder --staging E:\__DELETE_ME

# Scan with intra-folder dedup
python duplicate_folder_finder.py E:\MyFolder --dedupe-within
```

| Flag | Description |
|---|---|
| `root` | Root directory or drive to scan |
| `--staging` | Staging directory for moved files |
| `--dry-run` | Preview only, no file operations |
| `--dedupe-within` | Remove duplicate files within each folder first |
| `--ignore-filenames` | Match by content only, ignore filenames (default) |
| `--filename-aware` | Include filenames in folder signature |
| `--cleanup-empty` | Remove empty directories after staging |
| `--skip` | Comma-separated folder names to skip |
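The flag table maps directly onto `argparse`. The parser below is a hypothetical reconstruction from the table only, not the script's actual parser. It assumes `--ignore-filenames` and `--filename-aware` are mutually exclusive and share one destination, `filename_aware`, and that `--skip` values are split on commas with surrounding spaces dropped.

```python
import argparse
from typing import List


def comma_list(value: str) -> List[str]:
    """Split 'a, b,c' into ['a', 'b', 'c'], dropping empty items."""
    return [part.strip() for part in value.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="duplicate_folder_finder.py",
        description="Find duplicate folders and stage the extra copies.",
    )
    parser.add_argument("root", help="Root directory or drive to scan")
    parser.add_argument("--staging", help="Staging directory for moved files")
    parser.add_argument("--dry-run", action="store_true",
                        help="Preview only, no file operations")
    parser.add_argument("--dedupe-within", action="store_true",
                        help="Remove duplicate files within each folder first")
    # The two matching modes exclude each other; content-only is the default.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--ignore-filenames", dest="filename_aware", action="store_false",
                      help="Match by content only (default)")
    mode.add_argument("--filename-aware", dest="filename_aware", action="store_true",
                      help="Include filenames in folder signature")
    parser.set_defaults(filename_aware=False)
    parser.add_argument("--cleanup-empty", action="store_true",
                        help="Remove empty directories after staging")
    parser.add_argument("--skip", type=comma_list, default=[],
                        help="Comma-separated folder names to skip")
    return parser
```

With this layout, `--skip "node_modules, .git"` parses to `["node_modules", ".git"]`, and passing both matching-mode flags is rejected with a usage error.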

Project structure:

```
duplicate_folder_finder.py    # Main application (GUI + CLI)
tools/
    base_tool.py              # BaseTool(ttk.Frame) base class
    batch_rename.py           # Batch Rename tool
    date_mover.py             # Date Mover tool
    archive_db.py             # Archive DB tool
    backup_manager.py         # Backup Manager tool
testing/test_fs/              # Synthetic test filesystem
Reference/                    # Prior iterations (reference only)
screenshots/                  # Application screenshots
```

How it works:
- Fingerprinting: each folder gets a fast fingerprint from its top-level file count and total size
- Hashing: only folders with matching fingerprints are fully hashed (SHA-256 of each file's content)
- Grouping: folders with identical content hashes are grouped as duplicates
- Keeper selection: the best folder in each group is kept; others are staged for review
- Staging: loser folders' top-level files are moved to a timestamped staging directory with a JSON manifest and an auto-generated undo script
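The fingerprint, hash and group steps above can be sketched as a two-stage bucketing. This is a minimal illustration under assumptions, not the application's code: the helper names are hypothetical, symlinks are not followed, folders with no top-level files are skipped, and the content-only signature is modeled as a hash of the sorted per-file SHA-256 digests so that filenames and listing order do not matter.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def top_level_files(folder: str) -> List[str]:
    """Regular files directly inside `folder`; subfolders are ignored."""
    with os.scandir(folder) as entries:
        return sorted(e.path for e in entries if e.is_file(follow_symlinks=False))


def fingerprint(files: Sequence[str]) -> Tuple[int, int]:
    """Cheap pre-pass key: (file count, total size in bytes)."""
    return len(files), sum(os.path.getsize(f) for f in files)


def file_sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def content_signature(files: Sequence[str]) -> str:
    """Content-only signature: sorted digests, so names and order don't matter."""
    digests = sorted(file_sha256(f) for f in files)
    return hashlib.sha256("\n".join(digests).encode("ascii")).hexdigest()


def find_duplicate_groups(folders: Sequence[str]) -> List[List[str]]:
    by_fp: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    files_of: Dict[str, List[str]] = {}
    for folder in folders:
        files = top_level_files(folder)
        if not files:
            continue  # assumption: folders without top-level files are not compared
        files_of[folder] = files
        by_fp[fingerprint(files)].append(folder)

    groups = []
    for candidates in by_fp.values():
        if len(candidates) < 2:
            continue  # unique fingerprint: no possible duplicate, so skip hashing
        by_sig: Dict[str, List[str]] = defaultdict(list)
        for folder in candidates:
            by_sig[content_signature(files_of[folder])].append(folder)
        groups.extend(sorted(g) for g in by_sig.values() if len(g) > 1)
    return sorted(groups)
```

Only folders that share a fingerprint are ever read in full, which is where the pre-pass saves time. A filename-aware variant would presumably hash (name, digest) pairs instead of bare digests.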

Screenshots: *Scan Results* and *Settings* views (image files in `screenshots/`).

License: MIT

