Enhancement idea: add a new tab called "Tokenization" or "Tokens". In this tab, a user could upload a file in IOB2/BIO format and manually correct the tokenization and tags, for example by splitting or merging tokens and modifying the corresponding tags.
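To make the split and merge operations concrete, here is a minimal sketch in Python. It is only an illustration and not MedTator code: the function names `split_token` and `merge_tokens`, the `(token, tag)` row layout, and the rule for choosing tags after an edit are all assumptions made for this example. The user would still be able to override any tag by hand afterwards.

```python
from typing import List, Tuple

# Hypothetical row layout: one (token, BIO tag) pair per line of the BIO file.
Row = Tuple[str, str]


def split_token(rows: List[Row], index: int, offset: int) -> List[Row]:
    """Split rows[index] at a character offset and return a new list of rows."""
    token, tag = rows[index]
    if not 0 < offset < len(token):
        raise ValueError("offset must fall strictly inside the token")
    # Assumed policy: the right half continues the entity (B-X or I-X becomes I-X),
    # while an O token yields two O tokens.
    right_tag = "O" if tag == "O" else "I-" + tag[2:]
    return rows[:index] + [(token[:offset], tag), (token[offset:], right_tag)] + rows[index + 1:]


def merge_tokens(rows: List[Row], index: int) -> List[Row]:
    """Merge rows[index] with rows[index + 1] (no space) and return a new list."""
    (left, left_tag), (right, right_tag) = rows[index], rows[index + 1]
    # Assumed policy: keep the left tag, unless it is O and the right token is tagged.
    merged_tag = left_tag if left_tag != "O" else right_tag
    return rows[:index] + [(left + right, merged_tag)] + rows[index + 2:]


# A badly tokenized sentence: "5mg" should be two tokens, "as" + "pirin" one token.
rows = [("Take", "O"), ("5mg", "B-DOSE"), ("as", "O"), ("pirin", "B-DRUG")]
fixed = split_token(rows, 1, 1)   # "5mg" becomes "5" (B-DOSE) and "mg" (I-DOSE)
fixed = merge_tokens(fixed, 3)    # "as" + "pirin" becomes "aspirin" (B-DRUG)
for token, tag in fixed:
    print(f"{token}\t{tag}")
```

Both functions return a new list instead of modifying the input, which would make an undo feature in the tab straightforward to add.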
Motivation: the current tokenizer that MedTator uses for BIO export has relatively low accuracy in many special cases, and in those cases the exported BIO files cannot currently be used for training. Even if the tokenizer is replaced by another one (as mentioned in #7), it is unlikely to perform well for all languages and use cases. The ability to manually fix tokens and tags in BIO format would therefore be very helpful for building a gold standard corpus.