Fix hardlink support for archive creation in FilesFromDisk() by tracking inodes #64
solvingj wants to merge 1 commit into mholt:main
Analysis: Hardlink Support in mholt/archiver vs mholt/archives
Investigation Summary
What Happened to gnu-hardlinks.tar?
The test file …
The Original Hardlink Fix (archiver v3)
Commit: …
Problem it solved: When extracting tar archives with hardlinks AND using path stripping (extracting a subdirectory), the …
The fix (in old …):

```go
// relativize any hardlink names: keep only the parent directory and the
// file name, e.g. "a/b/c/f" becomes "c/f"
if th.Typeflag == tar.TypeLink {
	th.Linkname = filepath.Join(filepath.Base(filepath.Dir(th.Linkname)), filepath.Base(th.Linkname))
}
```

Test case (…): The test extracted …
Why Was It Removed in v4?
The v4 rewrite (…
Old API (v3):
New API (v4):
Key difference: The v4 API is streaming and callback-based. It doesn't:
Does v4 Support Hardlinks?
Yes, but differently.
Extraction (Current State):

```go
// From tar.go line 217
file := FileInfo{
	FileInfo:      info,
	Header:        hdr,
	NameInArchive: hdr.Name,
	LinkTarget:    hdr.Linkname, // ✅ Preserved!
	Open: func() (fs.File, error) {
		return fileInArchive{io.NopCloser(tr), info}, nil
	},
}
```

The …
Creation (Current State):

```go
// From tar.go line 82
hdr, err := tar.FileInfoHeader(file, file.LinkTarget)
```

The library DOES use … THE PROBLEM: There's no code to detect hardlinks during file gathering!
What's Missing in v4
In …
This is the same issue we fix in this fork.
Why Was the Test Removed?
Looking at the v4 rewrite commit, it was a massive refactoring (411 changed lines in the README alone). The test was removed because:
The test wasn't removed because hardlinks became unsupported; it was removed because the specific use case (extracting with path manipulation) no longer exists in the API.
Comparison with Our Fix
Our Implementation (…
mholt left a comment:
Thank you, we'll give it a shot.
Will need to fix Windows compilation though.
Hardlink Support Implementation for mholt/archives (archive creation)
Summary
This branch adds proper hardlink detection and preservation for tar archives in the `archives` library during archive creation. Hardlinks are now correctly identified during file gathering and written to tar archives using `tar.TypeLink` entries, significantly reducing archive size for packages with many hardlinked files.
Changes Made
1. `archives.go`
Added hardlink detection to `FilesFromDisk()`:
- Uses `syscall` to access inode information
- Tracks files by a `{dev, ino}` key to detect hardlinks
- When a file with `Nlink > 1` is encountered, later occurrences of the same inode get `LinkTarget` set to the first occurrence's path

Code additions (~25 lines).
2. `tar.go`
Modified `writeFileToArchive()` to handle hardlinks:
- `hdr.Typeflag = tar.TypeLink`
- `hdr.Linkname = file.LinkTarget`
- `hdr.Size = 0` (no content stored)

Code modifications (~20 lines).
3. `hardlink_test.go` (new file)
Added comprehensive test coverage:
- `TestHardlinkDetection`: Verifies hardlink detection in `FilesFromDisk()`
- `TestHardlinkInTarArchive`: Verifies correct tar headers (`TypeLink`, zero size, `Linkname`)
- `TestHardlinkExtraction`: Verifies hardlink metadata is preserved during extraction

Results
Tested with Git 2.50.1 for linux/amd64 (a package with 150 hardlinks):
Extraction Support
The extraction side already properly handles hardlinks:
- `Extract()` sets `FileInfo.LinkTarget` from `tar.Header.Linkname`
- Hardlink entries can be identified by `file.Header.(*tar.Header).Typeflag == tar.TypeLink`
- `LinkTarget != ""` for link entries

Platform Support
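Inode information comes from `syscall.Stat_t`, which exists on Unix-like systems but not on Windows. The non-Windows code path is marked `// +build !windows`.

Backward Compatibility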
Testing
Run tests:
All tests pass, including:
Future Considerations
Related Issues
This addresses the same issue as mholt/archiver PR #171 from 6 years ago, but is implemented for the newer `archives` library.