[pkg] Resolve directory symlinks in fetched targets#13792
[pkg] Resolve directory symlinks in fetched targets#13792ElectreAAS wants to merge 2 commits intoocaml:mainfrom
Conversation
df327f2 to
91d83ae
Compare
There was a problem hiding this comment.
Here is an example that I think causes a loop in the current algorithm:
/source/
dir_a/
link_to_b -> ../dir_b
dir_b/
link_to_a -> ../dir_a
Following this along we get an infinitely descending directory.
I think you will need to keep a set of resolved targets and check against it if you want to remember where you have been as to not repeat the resolution.
Another issue that I see is the multiple passes that are currently done, but we can improve this later once we have something that works correctly.
We also might need to add some validation that we don't escape the source directory, I don't think its a good idea to allow symlinks to / for example.
|
I just pushed work in progress to adress the latest comment about cycles. |
fe80747 to
5134ba5
Compare
5134ba5 to
b869da0
Compare
| This fails correctly | ||
| $ build_pkg bar 2>&1 | sanitize_pkg_digest bar.0.0.1 | tail -3 | ||
| Error: Unable to resolve symlink | ||
| _build/_private/default/.pkg/bar.0.0.1-DIGEST_HASH/source/dir_b/link_to_a/link_to_b, |
There was a problem hiding this comment.
CI isn't happy because it finds dir_a/link_to_b/link_to_a before the one I wrote in the test
|
Aside from the CI failures due to non-deterministic errors, the main code is ready for review. |
|
I haven't looked closely, but I think your changes might have made that test non-deterministic. |
c43ca48 to
6a8ab11
Compare
Alizter
left a comment
There was a problem hiding this comment.
Here are some problems I've observed:
-
There is a difference between local and tarball sources when it comes to directory symlinks. The tarball sources correctly resolve the contents wheras local sources silently discard them. I think rather than silently discarding them we should either raise a user error if this is something we wish not to support or support it. I would probably consider not supporting it.
-
Broken symlinks appear to be silently ignored. For example something stupid like a symlink to itself. We should probably error to the user in this case saying that we don't accept such symlinks.
| explicitly deleted immediately after the archive is | ||
| extracted. This logic is implemented in the "source-fetch" | ||
| action spec in [Fetch_rules]. *) | ||
| (* CR-Ambre update above comment *) |
| (* [raw_resolved] is a relative build path but it might contain indirections, | ||
| something like _build/foo/../bar | ||
| or _build/../outside *) | ||
| let canon_resolved = Path.of_string raw_resolved in |
There was a problem hiding this comment.
This only canonicalises relative paths and not absolute ones. I think Path.External.canonicalize_abs was the other way you did it.
| | In_build_dir t, In_build_dir of_ -> | ||
| Option.map (Build.descendant t ~of_) ~f:(fun d -> | ||
| in_source_tree (Source0.of_local (Build.local d))) | ||
| | External t, External of_ -> Option.map (External.descendant t ~of_) ~f:external_ |
There was a problem hiding this comment.
I think one of the reasons this was avoided was because it causes complications with the other path types. For example shouldn't everything be considered of the root? I don't know if it makes sense to add such a function yet. It would probably be better to have a descendent function specifically for External instead rather than allowing them to mix.
| | None -> Memo.return Path.Local.Set.empty | ||
| | Some root -> loop root Path.Local.Set.empty Path.Local.root | ||
| | Some root -> | ||
| let rec collect_sources ~acc ~dir ~already_seen = |
There was a problem hiding this comment.
I think there are two separate concepts we are confusing here. We have the directory we would like to read from and the directory we would like to output to.
Normally these two are the same, but with directory symlinks the situation can differ.
In this case I would introduce a new parameter called ~read_dir and let full_path use that instead.
Then when you collect files, rather than appending relative you append a tuple relateive, read_relative where the latter is
let read_relative = Path.Local.relative read_dir name in
Then when you have a directory symlink, you can do:
- else files, local_resolved :: dirs, seen
+ else files, (relative, local_resolved) :: dirs, seeninstead.
In the end, your parallel map will look like:
Memo.parallel_map dirs ~f:(fun (dir, read_dir) ->
collect_sources
~acc:Path.Local.Set.empty
~dir
~read_dir
~already_seen:seen)
This should fix some issues we currently have.
|
Here are some of the issues we currently have: |
bc64415 to
71bc486
Compare
|
Update: we've looked further into this and it seems the desired behaviour in I've pushed a new version containing that deletion, along with a few WIP comments, I'm looking into them |
71bc486 to
7a3f5e6
Compare
Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
… source) Sort entries in fetch to guarantee determinism Remove the symlink resolution happening in pkg_rules as it's too complicated. The main change happening in fetch is still there. Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
|
About the |
7a3f5e6 to
8b44545
Compare
Housekeeping
This PR fixes the tests in #13393
#9873 will not be fixed, but still a significant step towards fixing #13678.
What this PR does
After fetching package sources, add a pass resolving directory symlinks. As they're not problematic at this stage, file symlinks are left as is. Broken symlinks are removed silently to preserve existing behaviour.
Note: both
portable_hardlinkandportable_symlinkwork backwards from what I initially understood, see #13791Done with the help of @Alizter, so thanks :)