Add manual correction of tokenization in BIO format

An idea for an enhancement: add a new tab, "Tokenization" or "Tokens", in which a user could upload a file in IOB2/BIO format and manually correct the tokenization and tags. For example, the user could split or merge tokens and modify the corresponding tags.
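
To make the split and merge operations concrete, here is a minimal sketch in Python. It is not MedTator code: the `(text, tag)` token representation, the function names `split_token` and `merge_tokens`, and the tag-propagation rules are all assumptions chosen for illustration. In a real tab the user would still review and adjust the resulting tags by hand.

```python
from typing import List, Tuple

# Hypothetical representation: one (text, tag) pair per line of a BIO file.
Token = Tuple[str, str]


def split_token(tokens: List[Token], index: int, offset: int) -> List[Token]:
    """Split tokens[index] at character position `offset`.

    Assumed rule: the left part keeps the original tag; the right part
    continues the same entity (B-X becomes I-X, I-X and O are unchanged).
    """
    text, tag = tokens[index]
    if not 0 < offset < len(text):
        raise ValueError("offset must fall strictly inside the token")
    left, right = text[:offset], text[offset:]
    right_tag = "I-" + tag[2:] if tag.startswith("B-") else tag
    return tokens[:index] + [(left, tag), (right, right_tag)] + tokens[index + 1:]


def merge_tokens(tokens: List[Token], index: int, joiner: str = "") -> List[Token]:
    """Merge tokens[index] with the token that follows it.

    Assumed rule: the merged token takes the first tag, unless the first
    token is outside any entity ("O"), in which case it takes the second.
    This is a simplification; conflicting entity types need manual review.
    """
    if not 0 <= index < len(tokens) - 1:
        raise IndexError("there is no following token to merge with")
    (text1, tag1), (text2, tag2) = tokens[index], tokens[index + 1]
    merged_tag = tag1 if tag1 != "O" else tag2
    return tokens[:index] + [(text1 + joiner + text2, merged_tag)] + tokens[index + 2:]
```

For instance, splitting `("10mg", "B-DOSE")` at offset 2 yields `("10", "B-DOSE")` followed by `("mg", "I-DOSE")`, and merging those two tokens again restores the original token. Both functions return a new list rather than mutating the input, which would make an undo feature in such a tab straightforward.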

Motivation: the current tokenizer that MedTator uses for BIO export has relatively low accuracy in many special cases, and in those cases the exported BIO files cannot currently be used for training. Even if the tokenizer is replaced by another one (as mentioned in https://github.com/OHNLP/MedTator/issues/7), it is unlikely to perform well for all languages and use cases. Being able to manually fix the tokens and tags in BIO format would therefore be very helpful for building a gold standard corpus.


Add manual correction of tokenization in BIO format #9
