Refine string case conversion helpers #593
I read a bit more about the differences between POSIX pre-built classes and Unicode regex. The main difference is that POSIX classes are locale-dependent, whereas Unicode regex classes are not (see https://www.regular-expressions.info/posixbrackets.html). I wasn't able to test it, but POSIX classes might handle the "ij" digraph properly if the locale is set to Dutch.
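A quick way to see the locale point without setting up a Dutch locale is to compare an ASCII-only class with Unicode character properties. This is a hedged Python sketch, not stringr code. Python's `re` module has no POSIX bracket classes, so the hypothetical helper `upper_positions_ascii` uses `[A-Z]` as a stand-in for what `[[:upper:]]` matches under the C locale. The other helper, `upper_positions_unicode`, relies on `str.isupper()`, which follows Unicode properties regardless of locale.

```python
import re

# Stand-in for a POSIX [[:upper:]] class under the C locale,
# which matches only the ASCII letters A-Z.
ASCII_UPPER = re.compile(r"[A-Z]")


def upper_positions_ascii(s):
    """Indices of uppercase letters according to the ASCII-only class."""
    return [m.start() for m in ASCII_UPPER.finditer(s)]


def upper_positions_unicode(s):
    """Indices of uppercase letters according to Unicode character properties."""
    return [i for i, ch in enumerate(s) if ch.isupper()]


word = "naïveÉcole"
print(upper_positions_ascii(word))    # []
print(upper_positions_unicode(word))  # [5]

# Unicode case mapping is locale-independent, so the Dutch "ij" digraph is
# not capitalized as a unit: this prints "Ijsland", not "IJsland".
print("ijsland".capitalize())
```

The Unicode approach finds the word boundary at "É" in any locale. The trade-off is that language-specific rules, such as Dutch "IJ", are not applied.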
Also, how would you like to handle superscript and subscript letters and numbers? Should we drop them, or normalize them using …?

Currently, digits are always separated from letters. Another option would be to keep digits attached when they directly follow a letter, and use the dash or underscore separator otherwise. The downside of this approach is that it makes the conversion from snake/kebab case to camel case non-reversible.
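The following minimal Python sketch makes the reversibility trade-off between the two digit strategies concrete. The helpers `split_words`, `to_snake`, and `to_camel` and the `attach_digits` flag are hypothetical and ASCII-only. They are not the stringr implementation.

```python
import re


def split_words(s, attach_digits=False):
    """Split a camel/snake/kebab string into lowercase words (ASCII only)."""
    # Boundary between a lowercase letter or digit and a following uppercase letter.
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", s)
    if not attach_digits:
        # Current behavior: a digit run is always its own word.
        s = re.sub(r"([A-Za-z])([0-9])", r"\1 \2", s)
    # In both strategies, a letter that follows a digit starts a new word.
    s = re.sub(r"([0-9])([A-Za-z])", r"\1 \2", s)
    return [w.lower() for w in re.split(r"[\s_\-]+", s) if w]


def to_snake(s, attach_digits=False):
    return "_".join(split_words(s, attach_digits))


def to_camel(words):
    if not words:
        return ""
    return words[0] + "".join(w.capitalize() for w in words[1:])


print(to_snake("version2Update"))                      # version_2_update
print(to_snake("version2Update", attach_digits=True))  # version2_update

# Round trip snake -> camel -> snake
camel = to_camel(split_words("version_2"))  # "version2"
print(to_snake(camel))                      # version_2 (round trip holds)
print(to_snake(camel, attach_digits=True))  # version2  (the separator is lost)
```

Under the attached-digit strategy, both `version_2` and `version2` map to the camel form `version2`. Converting back can only return one of them.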
Thanks! Let's keep this simple for now and not worry about special characters (subscripts, etc.). I could also imagine options that ensure the output is ASCII (since you might want that for programming variables) or a valid R variable name, but I think extensions can wait until we see some use of this in the wild.
Fixes #592
Let me know if you want me to add short comments explaining each regex pattern.
I'm not quite sure how you'd like to handle acronyms in str_to_camel(). I'm more inclined to treat acronyms like any other word (e.g. userId, the current behavior), but I know some people prefer preserving uppercase in two-letter acronyms (e.g. userID).
Some tests might be redundant.
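The sketch below contrasts the two acronym policies. It is written in Python and does not reflect the actual `str_to_camel()` signature. The function `to_camel` and its `preserve_short_acronyms` option are hypothetical. It assumes the input is already split on underscores, dashes, or spaces, and it treats a word as an acronym only if it is exactly two letters long and already all uppercase.

```python
import re


def to_camel(s, preserve_short_acronyms=False):
    """Convert a snake/kebab/space-separated string to lower camel case.

    preserve_short_acronyms is a hypothetical option: when True, a
    two-letter word that is already all uppercase (e.g. "ID") is kept as-is.
    """
    words = [w for w in re.split(r"[\s_\-]+", s) if w]
    out = []
    for i, w in enumerate(words):
        if i == 0:
            out.append(w.lower())  # lower camel case: first word is all lowercase
        elif preserve_short_acronyms and len(w) == 2 and w.isupper():
            out.append(w)  # keep "ID" rather than "Id"
        else:
            out.append(w.capitalize())  # treat acronyms like any other word
    return "".join(out)


print(to_camel("user_ID"))                                      # userId
print(to_camel("user_ID", preserve_short_acronyms=True))        # userID
print(to_camel("load_XML_file", preserve_short_acronyms=True))  # loadXmlFile
print(to_camel("user_id", preserve_short_acronyms=True))        # userId
```

The last call shows a limitation of the preserve-uppercase approach. It only works when the input still carries the uppercase spelling, so already-lowercased input such as `user_id` cannot become `userID` without a list of known acronyms.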