Custom model with pretokenized input including multiword by ziqianPeng · Pull Request #56 · nlp-uoregon/trankit

ziqianPeng · 2022-07-30T23:10:24Z

Hello!
I'm trying to train a custom parser using trankit with pretokenized input extracted from CoNLL-U files.

I may not be using the intended workflow, but with my approach I ran into bugs for French (multiword tokens) and Chinese (a "KeyError UD-Japanese-Like" when parsing my test file right after training finished), so I modified the source code to fix them. I also changed the path of the xlm_roberta model in file_utils.py so that it is downloaded only once when training multiple models of the same type, such as 'customized'.
The file train_pred_trainkit.py is an example of how to apply these modifications; see in particular the function pred_trankit.
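
For readers unfamiliar with the multiword-token issue, here is a minimal sketch of how pretokenized sentences can be pulled out of CoNLL-U text. It is not the code from this PR: the names `read_pretokenized`, `keep_multiword`, `row` and `SAMPLE` are hypothetical and used only for illustration. It relies only on the CoNLL-U convention that multiword tokens appear as range IDs such as `3-4`, followed by the syntactic words they cover, and that empty nodes use decimal IDs such as `5.1`.

```python
from typing import List


def read_pretokenized(conllu_text: str, keep_multiword: bool = False) -> List[List[str]]:
    """Return one list of forms per sentence from CoNLL-U text.

    keep_multiword=False keeps the syntactic words (e.g. "à", "le").
    keep_multiword=True keeps the surface token (e.g. "au") instead.
    """
    sentences: List[List[str]] = []
    current: List[str] = []
    skip_until = 0  # last word ID covered by a kept multiword token
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:  # a blank line ends the sentence
            if current:
                sentences.append(current)
                current = []
            skip_until = 0
            continue
        if line.startswith("#"):  # sentence-level comment
            continue
        cols = line.split("\t")
        tok_id, form = cols[0], cols[1]
        if "." in tok_id:  # empty node, e.g. "5.1"
            continue
        if "-" in tok_id:  # multiword token range, e.g. "3-4"
            if keep_multiword:
                current.append(form)
                skip_until = int(tok_id.split("-")[1])
            continue
        if int(tok_id) <= skip_until:  # word already covered by the range
            continue
        current.append(form)
    if current:  # file may not end with a blank line
        sentences.append(current)
    return sentences


def row(tok_id: str, form: str) -> str:
    """Build a 10-column CoNLL-U line with placeholder annotations."""
    return "\t".join([tok_id, form] + ["_"] * 8)


SAMPLE = "\n".join([
    "# text = Je vais au marché.",
    row("1", "Je"), row("2", "vais"),
    row("3-4", "au"), row("3", "à"), row("4", "le"),
    row("5", "marché"), row("6", "."),
    "",
])

print(read_pretokenized(SAMPLE))
print(read_pretokenized(SAMPLE, keep_multiword=True))
```

Which of the two views is the right input depends on how the custom model was trained, so treat the choice as an assumption to check against your own pipeline.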

I hope this is helpful, and thanks a lot for developing trankit!

ziqianPeng added 2 commits July 31, 2022 00:28

adapte trankit for custom model

a6b21d4

example

82b1135

ziqianPeng wants to merge 2 commits into nlp-uoregon:master from ziqianPeng:master
