Multithreaded support for apply_forest_proba (Issue #209) by salbert83 · Pull Request #210 · JuliaAI/DecisionTree.jl

salbert83 · 2023-01-15T14:12:25Z

I think regression uses the functions in ../classification/main.jl for applying forests to a set of features, so no new development is required for it.

Fixes #209

codecov-commenter · 2023-01-15T14:16:29Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.02%. Comparing base (835f3cd) to head (686a44b).
Report is 33 commits behind head on dev.

Additional details and impacted files

@@            Coverage Diff             @@
##              dev     #210      +/-   ##
==========================================
+ Coverage   87.99%   88.02%   +0.03%     
==========================================
  Files          10       10              
  Lines        1249     1253       +4     
==========================================
+ Hits         1099     1103       +4     
  Misses        150      150


rikhuijzer

This PR seems like a step in the right direction to solve #209

rikhuijzer · 2023-01-15T16:49:28Z

src/classification/main.jl

+apply_tree_proba(tree::Root{S, T}, features::AbstractMatrix{S}, labels; use_multithreading = false) where {S, T} =
+    apply_tree_proba(tree.node, features, labels, use_multithreading = use_multithreading)
+apply_tree_proba(tree::LeafOrNode{S, T}, features::AbstractMatrix{S}, labels; use_multithreading = false) where {S, T} =
+    stack_function_results(row->apply_tree_proba(tree, row, labels), features, use_multithreading = use_multithreading)


Suggested change

-    stack_function_results(row->apply_tree_proba(tree, row, labels), features, use_multithreading = use_multithreading)
+    stack_function_results(row->apply_tree_proba(tree, row, labels), features; use_multithreading)

The Julia lower bound is set to 1.6, so the shorthand after the semicolon is available and there is no need to repeat the keyword name.
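For readers unfamiliar with the shorthand, here is a minimal sketch of the two equivalent call styles. The functions describe, forward_long and forward_short are hypothetical stand-ins invented for illustration and are not part of DecisionTree.jl; only the keyword forwarding matters.

```julia
# Hypothetical stand-in for stack_function_results; it only reports which path was chosen.
describe(x; use_multithreading=false) = use_multithreading ? "threaded: $x" : "serial: $x"

# Long form: the keyword name is repeated on both sides of the equals sign.
forward_long(x; use_multithreading=false) =
    describe(x, use_multithreading=use_multithreading)

# Short form (Julia 1.5 and later): after the semicolon, a bare variable name
# is passed as a keyword argument with the same name.
forward_short(x; use_multithreading=false) =
    describe(x; use_multithreading)
```

Both forms forward the flag identically, so the choice is purely one of style.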

rikhuijzer · 2023-01-15T16:49:38Z

src/classification/main.jl

+        for i in 1:N
+            out[i, :] = row_fun(X[i, :])
+        end
+    else
+        for i in 1:N
+            out[i, :] = row_fun(X[i, :])
+        end


Are these two cases the same?
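For comparison, a helper of this kind would normally differ between the two branches only in the loop macro. The following is a rough sketch under that assumption, not the PR's actual code; the name stack_rows_sketch is hypothetical, and it assumes row_fun returns a vector of fixed length for every row.

```julia
# Hypothetical sketch, not the PR's implementation: apply row_fun to every row
# of X and stack the returned vectors as the rows of a matrix.
function stack_rows_sketch(row_fun, X::AbstractMatrix; use_multithreading=false)
    N = size(X, 1)
    N == 0 && throw(ArgumentError("X must have at least one row"))
    # Evaluate the first row once to learn the output width and element type.
    first_row = row_fun(X[1, :])
    out = Matrix{eltype(first_row)}(undef, N, length(first_row))
    out[1, :] = first_row
    if use_multithreading
        # Each iteration writes to a distinct row of out, so iterations do not conflict.
        Threads.@threads for i in 2:N
            out[i, :] = row_fun(X[i, :])
        end
    else
        for i in 2:N
            out[i, :] = row_fun(X[i, :])
        end
    end
    return out
end
```

Called with a pure row_fun, both settings of use_multithreading should return the same matrix, which is the property the new tests in this PR check for apply_tree_proba.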

rikhuijzer · 2023-01-15T16:51:33Z

test/classification/iris.jl

 @test depth(model) == 1
 probs = apply_tree_proba(model, features, classes)
 @test reshape(sum(probs, dims=2), n) ≈ ones(n)
+probs_m = apply_tree_proba(model, features, classes, use_multithreading=true)


Although there isn't a format style guide for this repository, consistent use of spaces around the equals sign of keyword arguments seems like a good start. Here, at line 19, there are no spaces, while at lines 33 and 59 there are. I think MLJ style is no spaces around the equals sign of keyword arguments.

The same holds for using the semicolon to separate the positional arguments from the keyword arguments. In some places in this PR it is done and in some it is not. It is generally advised to use semicolons because they improve clarity.

ablaom

Thanks for reviewing this @rikhuijzer

@salbert83 It would be good to add a docstring, as here: #208

And even better, also add an example in the README.md section on native interface. This should make the new feature more discoverable.

ablaom · 2023-01-25T23:18:36Z

@salbert83 Would you have some time soon to respond to the review?

ablaom · 2023-02-06T21:45:33Z

@rikhuijzer I'm not getting a response here. Are you willing and able to finish this?

rikhuijzer · 2023-02-07T07:41:38Z

It looks like this would result in a nested @threads call. One time in stack_function_results and one time inside the row_fun that is passed into stack_function_results. Nested @threads should be possible with the :dynamic scheduler, which appears to have only been added in Julia 1.8 (https://github.com/JuliaLang/julia/blob/master/HISTORY.md). Also, I don't know how to benchmark whether adding multithreading actually saves time or not.

So, let's leave this open until the lower bound is set to Julia 1.8 or until someone who really needs this implements and shows benchmarks?
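On the benchmarking question, one rough approach is to time the same workload with the flag off and on, after a warm-up call, using only Base's @elapsed. The sketch below assumes a hypothetical per-row workload (row_work) and wrapper (apply_rows) in place of real forest prediction. It is only informative when Julia is started with more than one thread (for example, julia --threads 4), and a dedicated tool such as BenchmarkTools.jl would give more careful measurements.

```julia
# Hypothetical per-row workload standing in for a tree or forest prediction.
row_work(row) = sum(sqrt, row)

# Hypothetical wrapper with the same opt-in flag as discussed in this PR.
function apply_rows(X; use_multithreading=false)
    out = Vector{Float64}(undef, size(X, 1))
    if use_multithreading
        Threads.@threads for i in axes(X, 1)
            out[i] = row_work(view(X, i, :))
        end
    else
        for i in axes(X, 1)
            out[i] = row_work(view(X, i, :))
        end
    end
    return out
end

X = rand(10_000, 50)

# Warm up both paths so that compilation time is excluded from the timings.
apply_rows(X)
apply_rows(X; use_multithreading=true)

# Take the minimum over a few repetitions to reduce noise.
t_serial   = minimum(@elapsed(apply_rows(X)) for _ in 1:5)
t_threaded = minimum(@elapsed(apply_rows(X; use_multithreading=true)) for _ in 1:5)

println("threads:  ", Threads.nthreads())
println("serial:   ", t_serial, " s")
println("threaded: ", t_threaded, " s")
```

Whether the threaded path wins depends on the number of threads, the number of rows and the cost per row, so timings from this toy workload say nothing about DecisionTree.jl itself.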

ablaom · 2023-02-07T22:35:08Z

Thanks @rikhuijzer for looking into this.

> It looks like this would result in a nested @threads call.

So, does this also apply to the existing implementation added in https://github.com/JuliaAI/DecisionTree.jl/pull/188/files that therefore needs attention?

I also notice that the existing implementation is opt-in (use_multithreading=false is the default) whereas the present addition is opt-out.

rikhuijzer · 2023-02-08T09:27:53Z

> So, does this also apply to the existing implementation added in https://github.com/JuliaAI/DecisionTree.jl/pull/188/files that therefore needs attention?

Maybe that explains #188 (comment). I'm afraid I don't know, and I never need multithreading, so unfortunately I'm not the right person to ask.

Maybe figuring out the right multithreading for this package is something you would like, @ExpandingMan?

ablaom · 2023-02-08T20:32:57Z

@OkonSamuel Do you see obvious issues with the way multithreading is currently implemented in prediction? It's here:

DecisionTree.jl/src/classification/main.jl

Line 468 in f57a156

Threads.@threads for i in 1:N

ablaom · 2023-02-10T01:13:14Z

@rikhuijzer I don't believe nested multithreading is an issue. This has been tested before in MLJTuning where optimization multithreading has within it resampling multithreading. My interpretation of the 1.9 changes cited is only that nested threading will typically be more efficient with new default settings for the scheduler.

I don't see anything obviously wrong with the proposed implementation (or the existing one), provided the user must opt in, but I will wait for the pinged experts to hopefully weigh in.
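To make the nesting pattern under discussion concrete, here is a minimal sketch of an outer threaded loop whose body calls a function containing its own threaded loop. The names inner_sum_of_squares and nested_apply are hypothetical, and the sketch assumes the default scheduler (no explicit :static). It shows only that the result is correct, not that nesting is efficient.

```julia
# Inner loop: threaded over the entries of one row.
function inner_sum_of_squares(row)
    partial = zeros(length(row))
    Threads.@threads for j in eachindex(row)
        partial[j] = row[j]^2      # each iteration writes to its own slot
    end
    return sum(partial)
end

# Outer loop: threaded over rows, calling the threaded inner function.
function nested_apply(X)
    out = zeros(size(X, 1))
    Threads.@threads for i in axes(X, 1)
        out[i] = inner_sum_of_squares(X[i, :])
    end
    return out
end

X = [1.0 2.0; 3.0 4.0]
result = nested_apply(X)   # [5.0, 25.0]
```

How the inner loop is actually scheduled depends on the Julia version and scheduler, which is the efficiency question raised above.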

salbert83 added 2 commits January 15, 2023 09:07

Multithreaded support for apply_forest_proba

876dd49

Unit tests for multithreaded support for apply_forest_proba

686a44b

rikhuijzer reviewed Jan 15, 2023


ablaom reviewed Jan 16, 2023


4 participants
