Implemented DoubledEndedIterator for Ancestors<'_> + added tests and updated doc comments#153239
Implemented DoubledEndedIterator for Ancestors<'_> + added tests and updated doc comments#153239asder8215 wants to merge 3 commits intorust-lang:mainfrom
Conversation
|
rustbot has assigned @Mark-Simulacrum. Use Why was this reviewer chosen?The reviewer was selected based on:
|
This comment has been minimized.
This comment has been minimized.
79234be to
eb5b1fc
Compare
This comment has been minimized.
This comment has been minimized.
eb5b1fc to
26917f2
Compare
This comment has been minimized.
This comment has been minimized.
b436c61 to
094669b
Compare
…updated doc comments
094669b to
ef86174
Compare
| let os_str_path = self.as_os_str(); | ||
| let path_bytes = os_str_path.as_encoded_bytes(); | ||
| let path_len = path_bytes.len(); | ||
| let trailing_seps = if self.has_trailing_sep() { |
There was a problem hiding this comment.
This looks like a partial implementation of trim_trailing_sep, except without the handling for !self.has_root()...
I added an unresolved question to the tracking issue (#142503) on whether that's the right behavior for trim_trailing_sep/has_trailing_sep, but I think this is probably not what we want here...
|
Reminder, once the PR becomes ready for a review, use |
|
I ran the fuzz script you made after making some changes to the code, and it's been running without errors for ~20 minutes. I'm pretty confident that the code for all the |
|
Re-running the fuzz script for me with what I believe is latest on this branch fails with: i.e., forward returns Can you confirm whether that's the case and perhaps spend some more time fuzzing? The failure is near-instant for me, though I did modify the fuzzing target a bit: #![no_main]
use libfuzzer_sys::fuzz_target;
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::{Path, PathBuf};
fuzz_target!(|path: PathBuf| {
let path: &Path = &path;
let forward = path.ancestors().collect::<Vec<_>>();
let mut backward = path.ancestors().rev().collect::<Vec<_>>();
backward.reverse();
assert_eq!(forward, backward, "{:?}", path);
let mut by_parents = vec![];
let mut parent = path;
loop {
by_parents.push(parent);
parent = if let Some(p) = parent.parent() {
p
} else {
break;
};
}
assert_eq!(forward, by_parents, "{:?}", path);
}); |
Ah shoot. I'm noticing that I forgot to copy over updating the rebound index in the b'.' comparison conditional (!curr_dir_present branch) within I copied over the necessary |
…trailing) in Ancestors<'_>
ad2dc06 to
1a0f437
Compare
|
|
Okay, so I copied over the I did try to see if I could get rid of I'm okay with the existing code here for now, and I wouldn't mind coming back to it later to see if I could remove that function (if you are okay with that). On a separate note, there technically is duplicate code between Lastly, I still haven't set up my Windows 10 VM to run the rust test suite, so I may need some assistance testing for that. I tried installing |
|
Forgot to do this: |
|
This implementation looks pretty complicated. I poked around a bit and a non-Iterator (so needs moving state into a struct) implementation that seems to pass fuzzing at least on unix (I don't have Windows on hand to check that). Specifically note that this makes use of the existing I don't think we should land the PR's current form as-is because even if it's functionally the same (as fuzzing might suggest, though I didn't rerun it against your implementation) it's very hard for me to convince myself it's actually correct as a reviewer. In contrast leaning on Components should broadly work, though it does need verifying for the one assumption I make below ("Any fn fwd_ancestors(path: &Path) -> Vec<&Path> {
let mut ancestors = vec![];
let mut components = path.components();
loop {
ancestors.push(components.as_path());
match components.next_back() {
Some(Component::RootDir) => {
break;
}
None => {
break;
}
_ => {}
}
}
ancestors
}
fn rev_ancestors(path: &Path) -> Vec<&Path> {
let mut ancestors = vec![];
let mut components = path.components();
let trimmed_path = path.components().as_path();
loop {
let current_suffix = components.as_path();
match components.next() {
Some(Component::RootDir) => {
// do nothing
}
_ => {
let prefix = strip_delimited_suffix(trimmed_path, current_suffix);
ancestors.push(prefix);
}
}
if component.is_none() {
break;
}
}
ancestors
}
// Will panic if `suffix` is not actually a suffix of `path`.
//
// Will also panic if boundary in `path` between itself and `suffix` doesn't respect the conditions
// in OsStr::slice_encoded_bytes. Any `Component` boundary should be sufficient to satisfy that
// condition (TODO: verify that claim).
fn strip_delimited_suffix<'a>(path: &'a Path, suffix: &'a Path) -> &'a Path {
let path_bytes = path.as_os_str().as_encoded_bytes();
let suffix_bytes = suffix.as_os_str().as_encoded_bytes();
let stripped = path_bytes
.strip_suffix(suffix_bytes)
.expect("suffix is suffix of path");
Path::new(path.as_os_str().slice_encoded_bytes(..stripped.len()))
} |
|
Yea, the implementation is pretty complicated with keeping in mind about normalizing away current directory '.', and repeating '/' pieces, and then you have certain things to keep in mind of like how relative paths end with an empty path "". There's other behavior here that made this a bit convoluted, like when we use However, if we create an This behavior caused a little bit of a headache for me in terms of symmetry because it meant that the first path that I tried out the code you suggested on rust playground, and as expected it didn't cover the case of preserving the whole path untrimmed, so this isn't an exact equivalent to what The other difference I can point out here between this let path = Path::new("/foo/bar/baz///");
let mut ancestors = path.ancestors();
ancestors.next(); // This shows Some("/")
ancestors.next(); // This shows Some("/foo")
ancestors.next_back(); // This shows Some("/foo/bar/baz///")
ancestors.next_back(); // This shows Some("/foo/bar")
... // Everything else is Nonelet path = Path::new("/foo/bar/baz///");
let mut components = path.ancestors();
components.next(); // Some(RootDir), as_path() produces "foo/bar/baz"
components.next(); // Some(Normal("foo"), as_path() produces "bar/baz"
components.next_back(); // Some(Normal("baz"), as_path() produces "bar"
components.next_back(); // Some(Normal("bar"), as_path() produces ""
... // Everything else is None, producing ""I brought this distinction up because I think it shows a difference in focus in each iterator on how The intention behind this ACP was to avoid having to store the subpaths of a path (forward or backward direction) into a The non-iterator approach you suggested, while it's fine for the forward/ |
|
If it would make things easier for you to review and not feel like this PR will wildly cause something incorrect to occur with the stable The libs-api team also mentioned in the ACP that:
|
|
Yes, optimizing out the equality checks should be trivial - my code already panics if that logic is wrong, and indexing there is much easier to explain than the logic in this PR as-is. My feeling is that doing the trimming that components does shouldn't be a big deal, though it would be good to ask libs-api for an opinion on that, especially if it allows for a simplification for this PR. |
I think that would be best. @rustbot label +I-libs-nominated |
|
@Mark-Simulacrum I thought about your approach with It passes all the tests I created + fuzzing. However, I do notice that it's slower assuming from what I also added Let me know what you think about this implementation. If this is still not preferable, I'm down with the idea of creating a whole separate struct that does the reverse of what I have the old implementation in the previous commit to this current one. I can squash that out should you want that implementation off record. |
View all comments
This PR revolves around the tracking issue for implementing
DoubleEndedIteratorforAncestors<'_>.I tried to minimize the information needed in
Ancestors<'_>struct to effectively produce the correct backward and forward paths using a front and back index; I did opt to use a&'a [u8]over&'a Pathbecause it would reduce code in.next()/.next_back()where I wouldn't need to constantly convert the Path into a u8 slice. Due to how relative paths worked previously forAncestors<'_>, I had to introduce logic that allowed for returning an empty string at the start/end of a relative path for.next_back()/.next(); I'm open to a lot of feedback on how to make the code neater to address these cases.I tested this pretty extensively on my Linux machine (comparing it with the output of what
Ancestors<'_>previously returned); I do have to do more testing for Windows (been struggling a bit to get my Windows VM to run my shared folder containing the rust project through./xthough, so I may need some help on this), which there are some windows-specific test intests/path_ancestors.rs. As far as I'm aware as well, the create_dir_all tests, which does useAncestors<'_>underneath the hood, all passes.I am aware of one breaking change in this code though. I realized for a path like:
Its
Ancestor<'_>iterator contains:The previous implementation doesn't strip trailing slashes for the first path, but in subsequent paths it does eliminate trailing slashes. You can see this here.
With how I use ascii separators to determine that we've reached the next component, this does make it a bit difficult to find a neat way to handle this logic (and do this symmetrically as well for
.next_back()). For example, consider the two paths:With how I'm using a back index approach, I save what the current back idx is in a variable (curr_back), then iterate through the u8 slice until I hit an ascii separator, advance more through potential ascii separators, and then stop reading from there if there are no more adjacent ascii separators, returning what
path[..curr_back]is (trimming separators). Doing this on the first path would return "/foo/bar/baz" in the first call (since I trim separators, but otherwise this would be "/foo/bar/baz/") and then return "/foo/bar/baz" again in the next call (this is because curr_back is set to the idx of where the last "/" is at in "/foo/bar/baz/"). Doing this for the second path would return "/foo/bar/baz" in the first call and then "/foo/bar" in the next call. I currently just trim separators from the original path before giving that to theAncestors<'_>struct when.ancestors()is called.I'm hoping this change isn't room for big concerns because removing trailing separators for the original path does not break logic in accessing that path. However, if someone were to use
Ancestors<'_>for printing paths recursively up, this would cause changes to the first path, which I do not see it being a huge issue if they want to print the original path unchanged since they can opt to consume the first item in the iterator and print the original path. If I do, however, need to account for this logic, then I will try and see what I can do; I'm open to feedback in handling this case as well.UPDATE
I added a bit more information into
Ancestors<'_>(path_lenandtrailing_seps) that provides more information to the iterator so that we can print out the original path on the first call to.next()or when we reach the last component for.next_back(). Unfortunately, this does meanancestors()becomes aO(N)function if we just have ausize::MAXtrailing root path, but that's really irregular and rare to see. As a result of this, this should prevent any breaking changes.The way
Ancestors<'_>is implemented now operates the same as the previous version (at least on Linux/Unix, I still have to test on Windows to see if Prefix components are okay).