RepresentationSeries: pca, nmf, tsne#140
Conversation
…ion" funcionts of representaion module - Implement full support for Representation Series in "Dimensionality Reduction" functions of representation module - add appropriate parameterized tests in `test_representation.py` Note: the dimensionality reduction functions *do support* both flat and represenational input; this is because as discussed in jbesomi#134 we will not implement a `to_repr` function, so users might have e.g. a flattened tfidf series they want to perform pca on -> they can use the function. Of course, we could remove the flat support (if we basically want to "forbid" users from using `VectorSeries` for tfidf/term_freq/count; we're not 100% sure. Co-authored-by: Maximilian Krahn <maximilian.krahn@icloud.com>
|
🎉 Opinions:
Review:
Hope you got some insights! 👍 |
|
I believe that it is a bad idea to restrict the So two functions that both do dimensionality reduction need a different pipeline, that's not nice for users! Yes, This allows the user to call all dimensionality reduction functions the same way. This is why I see two possibilities:
I think it should stay an implementation detail and not be visible to users that |
|
Hey Henri, I totally agree with your view and understand your points! I agree with having all representation functions (pca, nmf, tfidf) to receive as input a
I believe the users should be very aware that Regarding the |
I will add something about that to the docstring! I agree that when writing a full tutorial on representation, we have to describe all this in detail; otherwise users will just use
I agree it's good to stick to the scikit-learn defaults and mention this in the tutorial (the representation tutorial will need to be very good 🤓 !) |
- dim. red. functions only support Representation Series input - appropriately change tests
|
Just removed support for QUESTION: About the clustering functions: should they:
|
|
Good question! a) Only vector series: would very inefficient in some cases ... b) We would not respect the rule "one function type for So, to recap right now we have:
This might be a bit confusing/overcomplicated/limited, what's your advice on that? An alternative might be: keep only Conclusion: I would like to see/understand your opinion! 👍 |
|
I agree it's really annoying that we can't just only work with
I think even if possible, it's not really a nice UX if users have to use I think the possibilities are:
I believe that possibility two is the most intuitive for users, as the only time they'll ever notice it is when trying to put a Series with |
|
Closing this, see #156 |
test_representation.pyNote: the dimensionality reduction functions do support both flat and represenational input; this is because as discussed in #134 we will not implement a
to_reprfunction, so users might have e.g. a flattened tfidf series they want to perform pca on -> we allow them to use the dimensionality reduction functions. Of course, we could remove the flat support (if we basically want to "forbid" users from usingVectorSeriesfor tfidf/term_freq/count); we're not 100% sure.