Improve concurrent sequential access speed by per-file file pointer. #182
tmhannes wants to merge 1 commit into relan:master
Conversation
This commit extends the caching of cluster information introduced by commit 6c332e1 so that multiple open file handles accessing the same node each have their own cache. That prevents dramatic slowdowns when multiple processes are reading from a large file.

* Replace the fptr_index and fptr_cluster members of struct exfat_node introduced by 6c332e1 with a list of exfat_fptr structs, each of which holds one index and cluster.
* In fuse_exfat_open, add a new exfat_fptr to the exfat_node, and store a pointer to both in the fh member of the fuse_file_info.
* In fuse_exfat_read and fuse_exfat_write, pass the exfat_fptr from the fh through to exfat_advance_cluster, so that multiple file handles open on the same file can independently track their most recently used cluster. For all other uses of exfat_advance_cluster, we continue to use the "shared" exfat_fptr stored in the node.
* Adjust grow_file and shrink_file to detect when multiple exfat_fptr instances are stored in the node and update them all as needed (a sketch of the overall shape follows below).
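To make the shape of the change concrete, here is a minimal sketch of the data structures and the open path described above. Only struct exfat_node, fuse_file_info, exfat_advance_cluster, fptr_index/fptr_cluster and the exfat_fptr name come from the description; everything else (struct exfat_filehandle, exfat_fptr_add, the fptrs list member) is an assumed name used purely for illustration, not necessarily what the commit does.

```c
/* Sketch only: struct exfat_filehandle, exfat_fptr_add and node->fptrs
 * are assumed names, not the committed code. Assumes the usual
 * fuse/main.c context (global struct exfat ef), <stdlib.h> for malloc
 * and <errno.h> for ENOMEM. */

struct exfat_fptr
{
	uint32_t index;          /* cluster index most recently resolved */
	cluster_t cluster;       /* cluster number at that index */
	struct exfat_fptr* next; /* other fptrs registered on the same node */
};

/* Allocated once per open(); its address is stored in fi->fh so that
 * read and write can reach both the node and this handle's private
 * file pointer. */
struct exfat_filehandle
{
	struct exfat_node* node;
	struct exfat_fptr* fptr;
};

static int fuse_exfat_open(const char* path, struct fuse_file_info* fi)
{
	struct exfat_node* node;
	struct exfat_filehandle* fh;
	int rc;

	rc = exfat_lookup(&ef, &node, path);
	if (rc != 0)
		return rc;

	fh = malloc(sizeof(struct exfat_filehandle));
	if (fh == NULL)
	{
		exfat_put_node(&ef, node);
		return -ENOMEM;
	}
	fh->node = node;
	/* hypothetical helper: allocate a new exfat_fptr starting at the
	 * first cluster and link it into the node's list */
	fh->fptr = exfat_fptr_add(node, 0, node->start_cluster);
	if (fh->fptr == NULL)
	{
		free(fh);
		exfat_put_node(&ef, node);
		return -ENOMEM;
	}
	fi->fh = (uint64_t) (size_t) fh;
	fi->keep_cache = 1;
	return 0;
}
```

With a layout like this, fuse_exfat_read and fuse_exfat_write would cast fi->fh back and hand fh->fptr to exfat_advance_cluster, while internal callers keep using the node's shared fptr.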
I like the idea, multi-threaded operations are indeed suboptimal. Thanks for your proposal! I have a few concerns about the implementation:
My impression is that this is required when, for example, process A has a file open and process B then truncates it, so that process A is not left holding an invalid cached cluster.
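For example (a sketch under the same assumed names as above, not the actual patch), shrink_file could walk the node's per-handle list and rewind any fptr whose cached cluster lies beyond the new end of the file:

```c
/* Sketch only, illustrative names: after truncation, no open handle may
 * keep pointing at a cluster that has been freed, so rewind any fptr
 * that is now past the new last cluster. */
static void exfat_reset_stale_fptrs(struct exfat_node* node,
		uint32_t new_cluster_count)
{
	struct exfat_fptr* fptr;

	for (fptr = node->fptrs; fptr != NULL; fptr = fptr->next)
		if (fptr->index >= new_cluster_count)
		{
			fptr->index = 0;
			fptr->cluster = node->start_cluster;
		}
}
```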
I agree that the alternative approach would be cleaner, but I didn't pursue it because I wanted to keep the patch as small as possible: requiring every caller of exfat_advance_cluster to supply a position explicitly would have been a much larger change.
This looks useful, what's preventing getting this in?
It is indeed useful, but IMO it would be better to choose the approach where all functions working with file contents require a position argument. This would be much more bug-proof.
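If I understand the suggestion, the idea is to make the position explicit in the signature of every helper that walks file contents, instead of relying on cached state hidden in the node or in a per-handle fptr. Roughly along these lines (a sketch of the idea only, not the current API or a proposed patch):

```c
/* Sketch of the "explicit position" style: the caller states where it
 * is and where it wants to go, so no call site depends on hidden
 * shared state that another thread might invalidate. */
static cluster_t exfat_advance_cluster(const struct exfat* ef,
		struct exfat_node* node,
		cluster_t current,       /* cluster the caller is at now */
		uint32_t current_index,  /* its index within the file */
		uint32_t target_index);  /* index the caller wants to reach */
```

Sequential readers would then each keep their own (current, current_index) pair, which gives the same O(1) advance per handle without any shared cache to invalidate.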
This PR addresses #181 by adding a separate "fptr" for each open file handle, so that sequential reads never have to restart from the beginning of the file.
In light testing on an i7 CPU with an external SSD, the PR allows concurrent readers at different positions in a large file to achieve the same total read throughput and CPU usage as a single reader sequentially processing the whole file.
Any feedback would be gratefully received.