Skip to content

Allow shared Arc<Collector> usage#81

Open
yamabiiko wants to merge 1 commit intoibraheemdev:masterfrom
yamabiiko:master
Open

Allow shared Arc<Collector> usage#81
yamabiiko wants to merge 1 commit intoibraheemdev:masterfrom
yamabiiko:master

Conversation

@yamabiiko
Copy link
Copy Markdown

@yamabiiko yamabiiko commented Sep 13, 2025

As previously discussed in #66, this change is motivated by the memory usage of a freshly initialized HashMap or HashSet, which is ~1kB. This is can cause memory usage to grow quite fast when defining multiple HashMap within a single struct.

The change simply adds a C: Borrow<Collector> generic to the HashMap, so in order to make the API safe we constrain such setting to a shared_controller builder function with bounds K: Send + 'static, V: Send + 'static, which should be enough to enforce safety from my understanding.

The implemented Clone behavior for HashMap<K, V, S, Arc<Collector>> is to simply create a new Arc<Collector> rather than cloning it, as that seems the safer approach, even though it might be slightly confusing.

Comment thread src/map.rs Outdated
///
/// Note that all `Guard` references used to access the map must be produced by
/// the provided `collector`.
pub fn collector<C: Borrow<Collector>>(self, collector: C) -> HashMapBuilder<K, V, S, C> {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is unsound without the extra bounds on K/V that were mentioned. It should also be distinct from the default collector builder, which takes an owned Collector without requiring extra bounds.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I mistakenly pushed an outdated version. As I mentioned in the PR, setting an Arc<Controller> should be done via a distinct shared_controller

Comment thread src/map.rs Outdated
// `HashMap` itself. If this was not true, we would require stricter bounds
// on `HashMap` operations themselves.
unsafe impl<K: Send + Sync, V: Send + Sync, S: Sync> Sync for HashMap<K, V, S> {}
unsafe impl<K, V, S: Sync, C: Sync + Borrow<Collector>> Sync for HashMap<K, V, S, C> {}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these changes?

Comment thread src/map.rs
raw: raw::HashMap<K, V, S>,
pub struct HashMap<K, V, S = RandomState, C = Collector>
where
C: Borrow<Collector>,
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would prefer to just unconditionally store an Arc<Collector>. It's useful to limit the size of a HashMap on the stack anyways, and I don't think the extra generics are worth it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so you think it would be better to switch every HashMap to be initialized with an Arc<Collector>? I agree that the extra generics are not the cleanest.

Copy link
Copy Markdown
Owner

@ibraheemdev ibraheemdev Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, yes, I think that would be reasonable. seize::Collector is also quite a large type, and so allocating it would be useful either way.

@yamabiiko yamabiiko force-pushed the master branch 2 times, most recently from 219c4b2 to e7cdac0 Compare September 17, 2025 23:41
yamabiiko added a commit to Ebi-Corp/ebi that referenced this pull request Oct 3, 2025
- create new `FileSysService`, which handles retrieving or creating new nodes given a `ShelfDataRef` and a `PathBuf`
- remove metadata from `File`, now load metadata on `FileSummary`
- temporary use fork of `papapaya` for better memory usage ibraheemdev/papaya#81
- remove `dtag_files` from `Node` and `dtags` from file
- change reference type of owned `Tag` to `SharedRef` to allow information edit
- `attach`, `detach` for tags and dtags now take an input directory `FileId` instead of a path `PathBuf`
- updated the respective calls in `RpcService` to query the `FileSysService` when required
yamabiiko added a commit to Ebi-Corp/ebi that referenced this pull request Oct 13, 2025
- create new `FileSysService`, which handles retrieving or creating new nodes given a `ShelfDataRef` and a `PathBuf`
- remove metadata from `File`, now load metadata on `FileSummary`
- temporary use fork of `papapaya` for better memory usage ibraheemdev/papaya#81
- remove `dtag_files` from `Node` and `dtags` from file
- change reference type of owned `Tag` to `SharedRef` to allow information edit
- `attach`, `detach` for tags and dtags now take an input directory `FileId` instead of a path `PathBuf`
- updated the respective calls in `RpcService` to query the `FileSysService` when required
yamabiiko added a commit to Ebi-Corp/ebi that referenced this pull request Oct 13, 2025
- create new `FileSysService`, which handles retrieving or creating new nodes given a `ShelfDataRef` and a `PathBuf`
- remove metadata from `File`, now load metadata on `FileSummary`
- temporary use fork of `papapaya` for better memory usage ibraheemdev/papaya#81
- remove `dtag_files` from `Node` and `dtags` from file
- change reference type of owned `Tag` to `SharedRef` to allow information edit
- `attach`, `detach` for tags and dtags now take an input directory `FileId` instead of a path `PathBuf`
- updated the respective calls in `RpcService` to query the `FileSysService` when required
@ibraheemdev ibraheemdev added the feature New feature or request label Mar 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants