Releases: trinker/textshape
version 0.6.0
NEWS
Versioning
Releases will be numbered with the following semantic versioning format:
<major>.<minor>.<patch>
And constructed with the following guidelines:
- Breaking backward compatibility bumps the major (and resets the minor
  and patch)
- New additions without breaking backward compatibility bump the minor
  (and reset the patch)
- Bug fixes and misc changes bump the patch
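The bump rules above can be sketched in base R. The helper `bump_version` is hypothetical (it is not part of textshape) and only illustrates how the lower components reset:

```r
# Hypothetical helper, not part of textshape: apply the bump rules to a
# "<major>.<minor>.<patch>" version string.
bump_version <- function(version, type = c("major", "minor", "patch")) {
  type <- match.arg(type)
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3L, !anyNA(parts))
  names(parts) <- c("major", "minor", "patch")
  parts[type] <- parts[type] + 1L
  # Everything to the right of the bumped component resets to zero.
  if (type == "major") parts[c("minor", "patch")] <- 0L
  if (type == "minor") parts["patch"] <- 0L
  paste(parts, collapse = ".")
}

bump_version("1.5.1", "minor")  # "1.6.0"
```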
textshape 1.5.1 - 1.6.0
BUG FIXES
- `split_match_regex_to_transcript` gave warnings about `perl = TRUE` that were
  due to `gsub` with a `fixed = TRUE` clashing with the `perl = TRUE`.
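The clash itself can be reproduced in base R without textshape. This is only an illustration of the `gsub` behavior, not the package's internal call:

```r
# Base R only: combining fixed = TRUE with perl = TRUE triggers a warning,
# because a fixed (literal) match does not use the perl regex engine.
msg <- tryCatch(
  gsub(".", "", "a.b", fixed = TRUE, perl = TRUE),
  warning = function(w) conditionMessage(w)
)
msg

# Choosing one mode avoids the warning.
literal <- gsub(".", "", "a.b", fixed = TRUE)   # removes the literal dot
pattern <- gsub("\\.", "", "a.b", perl = TRUE)  # same result via a regex
```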
NEW FEATURES
- `flatten` added for flattening nested, named lists into single tiered lists
  using the concatenated list/atomic vector names as the names of the single
  tiered list.
- `unnest_text` added to locate and un-nest nested text columns in a `data.frame`.
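As a rough illustration of the flattening idea, here is a base R sketch. `flatten_sketch` is hypothetical and is not textshape's `flatten`; the `_` separator is an assumption, and the sketch only concatenates list names (it assumes every level is fully named):

```r
# Hypothetical base R sketch (not textshape's flatten()): collapse a nested,
# named list into a single-level list with concatenated names.
flatten_sketch <- function(x, sep = "_", prefix = NULL) {
  out <- list()
  for (nm in names(x)) {
    key <- if (is.null(prefix)) nm else paste(prefix, nm, sep = sep)
    if (is.list(x[[nm]])) {
      # Recurse into nested lists, carrying the accumulated name along.
      out <- c(out, flatten_sketch(x[[nm]], sep = sep, prefix = key))
    } else {
      out[[key]] <- x[[nm]]
    }
  }
  out
}

nested <- list(a = list(b = 1:3, c = "x"), d = TRUE)
flat <- flatten_sketch(nested)
names(flat)  # "a_b" "a_c" "d"
```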
IMPROVEMENTS
- `tidy_dtm`/`tidy_tdm` did not order unnamed matrices as expected (e.g.,
  `{1, 2, ..., 1}` was ordered as `{1, 10, 2, ...}`). This has been corrected.
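This kind of misordering typically appears when numeric identifiers are sorted as character strings. A base R illustration of that general pitfall (not the package's code):

```r
ids <- as.character(1:10)

# Character sorting compares strings, so "10" lands before "2".
sort(ids)

# Ordering by the numeric value restores the expected sequence.
numeric_order <- ids[order(as.numeric(ids))]
```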
textshape 1.4.0 - 1.5.0
BUG FIXES
- `split_sentence`, `split_word`, `split_token`, & `split_speaker` did not handle
  single row data.frames properly resulting in loss of data. This has been fixed.
NEW FEATURES
- `split_sentence_token` added as a shortcut to split into sentences, add a
  sentence index, and then split into tokens and add a token index.
- `tidy_matrix` and `tidy_adjacency_matrix` added to provide easy tidy
  representations of these data types.
- `cluster_matrix` added for reordering the columns/rows of matrices via
  hierarchical clustering.
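The reordering idea behind `cluster_matrix` can be sketched with base R's `hclust`. This is not the package implementation; the Euclidean distance and complete linkage used here are simply R's defaults and are an assumption:

```r
# Base R sketch, not textshape's cluster_matrix(): reorder rows and columns
# by the leaf order of a hierarchical clustering.
set.seed(1)
m <- matrix(rnorm(20), nrow = 4, dimnames = list(letters[1:4], LETTERS[1:5]))

row_order <- hclust(dist(m))$order      # cluster the rows
col_order <- hclust(dist(t(m)))$order   # cluster the columns
reordered <- m[row_order, col_order]
```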
IMPROVEMENTS
- `split_sentence` now handles digit(s) + inch (in.) abbreviation if not
  followed by a capital letter. Previously, this was split on. Additionally,
  post script (p.s.) is no longer split on.
textshape 1.1.0 - 1.3.0
BUG FIXES
- `tidy_list` did not add `content.attribute.name` for lists of named vectors.
MINOR FEATURES
- `split_match_regex` added as a version of `split_match` with `regex = TRUE` by
  default. This makes it easier to reason about what the function call is doing.
- `split_match_regex_to_transcript` added to directly split a text by a person
  regex and convert to a two column transcript of person and dialogue.
IMPROVEMENTS
- `tidy_list` now uses data.table's `rbind` for lists of `data.frame`s.
  This means column ordering does not need to match and missing columns are
  automatically filled with `NA`s.
- `split_sentence` has better handling for the 'No.' abbreviation that
  distinguishes between 'No.' followed by digits (assumed to be an abbreviation)
  and when no digits follow (assumed to be a complete sentence).
- `split_sentence` has better handling for quoted material (i.e., a punctuation
  mark followed by single or double quotes that is not followed by a comma).
- `split_sentence` has better handling for single and double middle names
  presented as initials.
- `split_sentence` has better handling for abbreviated English units of measure.
CHANGES
- `combine.default` included element names by default. This has been removed to
  include only the elements.
textshape 1.0.2
BUG FIXES
- `tidy_list` with a list of unnamed `data.frame`s resulted in an error (see
  issue #7). This issue has been fixed.
- `split_word.data.frame` and `split_token.data.frame` both used an incorrect
  column naming of `sentence_id` for word and token index respectively. These
  columns are now renamed to `word_id` and `token_id` respectively.
- `split_token` gets a more robust splitting algorithm.
NEW FEATURES
- `column_to_rownames` added to enable one to quickly add a column as rownames
  easily within a pipeline. This is useful when turning a `data.frame` into a
  matrix.
- `tidy_list` picks up the ability to tidy a list of named vectors into three
  columns.
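For the `column_to_rownames` entry, the underlying idea can be shown in base R. The data and column names below are made up for illustration, and this is not the package code:

```r
# Base R sketch of the idea behind column_to_rownames(); not the package code.
df <- data.frame(id = c("a", "b", "c"), x = 1:3, y = c(2.5, 3.5, 4.5))

rownames(df) <- df$id                  # promote the column to rownames
mat <- as.matrix(df[, c("x", "y")])    # drop it so the matrix stays numeric
```

Keeping the identifier out of the matrix body matters because a character column would coerce the whole matrix to character.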
CHANGES
- `as.tibble` removed from all function arguments. This was a nice interactive
  feature that made programming very difficult to reason about. Having an
  environment dependent output would result in no adoption of the textshape
  package as a dependency. Additionally, `set_output` and `tibble_output`,
  two complementary functions, have been removed without being deprecated. The
  problem was so egregious and the package infant enough that removal without
  deprecation was warranted.
textshape 1.0.1
NEW FEATURES
- Users can now globally select a tibble output rather than a data.table
  output for all functions that outputted a data.table. This can be set
  globally via `set_output`. If the user does not set the output type
  textshape tries to infer based on whether or not the user has dplyr
  loaded. If dplyr is loaded then tibble is the default output.
- `set_output` and `tibble_output` added to globally set the output type
  (tibble or data.table) and to check/infer the desired output type.
textshape 1.0.0
CHANGES
- `bind_list`, `bind_table`, & `bind_vector` have been renamed to the more
  meaningful forms of `tidy_list`, `tidy_table`, & `tidy_vector`. The former
  versions are now deprecated. This bumps the version to 1.0.0 as this is a
  major change that breaks backward compatibility.
textshape 0.1.0 - 0.2.0
NEW FEATURES
- `bind_list` added to `rbind` a `list` of named `data.frame`s or `vector`s.
- `split_transcript` added to split a transcript style vector (e.g.,
  `c("greg: Who me", "sarah: yes you!")`) into a name and dialogue vector that is
  coerced to a `data.table`.
- `change_index` added for extracting the indices of changes in runs within an
  atomic vector. Pairs well with `split_index`.
- `bind_vector` added to `cbind` a named atomic `vector`'s names and values.
- `bind_table` added to `cbind` a `table`'s names and values.
- `duration` method for numeric vectors added as well as a `starts` and `ends`
  function for calculating start and end times from a numeric vector.
- `from_to` added to prepare speaker data for a network plot given the flowing
  nature of discourse.
- `tidy_dtm` & `tidy_tdm` added to convert a `DocumentTermMatrix`
  or `TermDocumentMatrix` into a tidied `data.frame`.
- `tidy_colo_dtm` & `tidy_colo_tdm` added to convert a `DocumentTermMatrix`
  or `TermDocumentMatrix` into a collocation matrix and then a tidied `data.frame`.
- `unique_pairs` added to complement the output of `tidy_colo_dtm` &
  `tidy_colo_tdm`. Enables the removal of duplicated collocating pairs caused
  by symmetrical mirroring of the upper and lower triangle of the collocation
  matrix.
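For the `split_transcript` entry above, the name/dialogue split can be approximated in base R using the example vector from that entry. This is a sketch only: the column names `person` and `dialogue` are assumptions, and the result is a plain `data.frame` rather than a `data.table`:

```r
# Base R sketch, not textshape's split_transcript(): split "name: dialogue"
# strings on the first colon.
lines <- c("greg: Who me", "sarah: yes you!")

person   <- sub("^([^:]+):\\s*(.*)$", "\\1", lines)
dialogue <- sub("^([^:]+):\\s*(.*)$", "\\2", lines)
transcript <- data.frame(person, dialogue, stringsAsFactors = FALSE)
```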
CHANGES
- `split_index` now uses `change_index(x)` as the default when `x` is an atomic
  vector.
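The pairing of change indices with index-based splitting can be sketched in base R. The names `change_idx` and `runs` are illustrative, and the exact return values of `change_index` and `split_index` may differ from this sketch:

```r
# Hypothetical base R sketch (not the package implementation).
x <- c("a", "a", "b", "b", "b", "c")

# Positions where the value differs from the previous element.
change_idx <- which(x[-1] != x[-length(x)]) + 1L

# Split the vector into runs that start at those positions.
runs <- split(x, cumsum(seq_along(x) %in% change_idx))
```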
textshape 0.0.1
Tools that can be used to reshape text data.