Multithreaded support for apply_forest_proba (Issue #209)#210
Multithreaded support for apply_forest_proba (Issue #209)#210salbert83 wants to merge 2 commits intoJuliaAI:devfrom
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## dev #210 +/- ##
==========================================
+ Coverage 87.99% 88.02% +0.03%
==========================================
Files 10 10
Lines 1249 1253 +4
==========================================
+ Hits 1099 1103 +4
Misses 150 150 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
rikhuijzer
left a comment
There was a problem hiding this comment.
This PR seems like a step in the right direction to solve #209
| apply_tree_proba(tree::Root{S, T}, features::AbstractMatrix{S}, labels; use_multithreading = false) where {S, T} = | ||
| apply_tree_proba(tree.node, features, labels, use_multithreading = use_multithreading) | ||
| apply_tree_proba(tree::LeafOrNode{S, T}, features::AbstractMatrix{S}, labels; use_multithreading = false) where {S, T} = | ||
| stack_function_results(row->apply_tree_proba(tree, row, labels), features, use_multithreading = use_multithreading) |
There was a problem hiding this comment.
| stack_function_results(row->apply_tree_proba(tree, row, labels), features, use_multithreading = use_multithreading) | |
| stack_function_results(row->apply_tree_proba(tree, row, labels), features; use_multithreading) |
Lower bound is set to 1.6 so no need to repeat the keyword name
| for i in 1:N | ||
| out[i, :] = row_fun(X[i, :]) | ||
| end | ||
| else | ||
| for i in 1:N | ||
| out[i, :] = row_fun(X[i, :]) | ||
| end |
| @test depth(model) == 1 | ||
| probs = apply_tree_proba(model, features, classes) | ||
| @test reshape(sum(probs, dims=2), n) ≈ ones(n) | ||
| probs_m = apply_tree_proba(model, features, classes, use_multithreading=true) |
There was a problem hiding this comment.
Although there isn't a format style guide for this repository, consistent use of spaces around keyword argument equal signs seems like a good start. Here at line 19 are no spaces and at 33 and 59 there are. MLJ style is no spaces around keyword arguments equals I think.
Same holds for using the semicolon to separate the arguments from the keyword arguments. At some places in this PR it is done and and some not. Here, it's generally advised to use semicolons because they improve clarity.
ablaom
left a comment
There was a problem hiding this comment.
Thanks for reviewing this @rikhuijzer
@salbert83 Be good to add a docstring, as here: #208
And even better, also add an example in the README.md section on native interface. This should make the new feature more discoverable.
|
@salbert83 Would you have some time soon to respond to the review? |
|
@rikhuijzer I'm not getting a response here. Are you willing and able to fishish this? |
|
It looks like this would result in a nested So, let's leave this open until the lower bound is set to Julia 1.8 or until someone who really needs this implements and shows benchmarks? |
|
Thanks @rikhuijzer for looking into this.
So, does this also apply to the existing implementation added in https://github.com/JuliaAI/DecisionTree.jl/pull/188/files that therefore needs attention? I also notice that the existing implementation is buy-in ( |
Maybe that explains #188 (comment). I'm afraid, I don't know and also I never need multithreading so I'm not the right person to ask unfortunately. Maybe figuring out the right multithreading for this package is something you would like, @ExpandingMan? |
|
@OkonSamuel Do you see obvious issues with the way multithreading is currently implemented in prediction? It's here: DecisionTree.jl/src/classification/main.jl Line 468 in f57a156 |
|
@rikhuijzer I don't believe nested multithreading is an issue. This has been tested before in MLJTuning where optimization multithreading has within it resampling multithreading. My interpretation of the 1.9 changes cited is only that nested threading will typically be more efficient with new default settings for the scheduler. I don't see anything obvious wrong about the proposed implementation (or the existing one), provided user must buy-in, but will wait for the pinged experts to hopefully weigh in. |
I think regression uses the functions in ../classification/main.jl for applying forests to a set of features, so no new development required for this.
Fixes #209