You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Up until this point, pyrouge has applied Porter stemming (-m) silently, unchangeably, and by default. There is no way to turn this off, even if the user provides an explicit set of ROUGE parameters they want to use. It's also unclear given the existing code structure that the -m parameter is being added, and it seems most users don't realize they are using stemming.
Applying stemming like this is a deviation from the default ROUGE parameters, and tends to fairly significantly inflate ROUGE scores across the board. For this reason, using stemming really needs to be an explicit choice the user makes, rather than being done by default.
My only change here is only deleting the -m when it's added separately in __add_config_option. This makes the set of default parameters clearer, forcing the choice to use stemming to be explicit. This may not be the right solution if some people use and cite pyrouge knowing fully that it stems by default, but my impression is that this is not the case.
I agree with @grusky that it would be quite important to let the user remove the -m flag from the set of options, if needed. For example when evaluating languages other than English.
Given that pyrouge is used by quite a lot of projects (either directly or through wrappers like summ-eval), changing the default may not be the best option here.
We could remove the [-m] from this section of code:
That way, the existing behavior of pyrouge keeps the same, unless a user had previously provided arguments and relied on the fact that -m is added automatically.
@bheinzerling is there any chance to get this change into pyrouge?
That's of course horrible code and I don't know what 8-years-ago me was thinking. I also agree that moving the -m flag into the list of default options is the proper way to fix it. However, I'm weary of changing anything, since it's possible that there are users who rely on the current behaviour.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Up until this point, pyrouge has applied Porter stemming (
-m) silently, unchangeably, and by default. There is no way to turn this off, even if the user provides an explicit set of ROUGE parameters they want to use. It's also unclear given the existing code structure that the-mparameter is being added, and it seems most users don't realize they are using stemming.Applying stemming like this is a deviation from the default ROUGE parameters, and tends to fairly significantly inflate ROUGE scores across the board. For this reason, using stemming really needs to be an explicit choice the user makes, rather than being done by default.
My only change here is only deleting the
-mwhen it's added separately in__add_config_option. This makes the set of default parameters clearer, forcing the choice to use stemming to be explicit. This may not be the right solution if some people use and cite pyrouge knowing fully that it stems by default, but my impression is that this is not the case.