Hello,
I want to apply a custom syntax parser of mine to a set of CoNLL-U files that have already been lemmatized and POS-tagged (but nothing else). Here is a sample portion of one of those files:
# P311545 = dubsar 3, 001
# translation = 1–5 To Šumāya, (thus says) Šulāya, your brother. May you be well. Thus (I say) to my brother: 6–19 Give two kurrus of barley, two sūtu of cress, five sūtu of sesame oil, and six minas of wool to Apil-abi, son of Šamaš-erība, my brother, who is staying with my brother. Then, come and I’ll personally give you here whatever you need. And, if you cannot give it to him, write me quickly so I’ll deliver it to him myself. 20–21 I am writing to my brother out of concern.
1 ana ana ADP PRP _ _ _ to a-na
2 Šumaya Šumaya PROPN PN _ _ _ Šumaya {m}šu-ma-a
3 ummā ummu NOUN N _ _ _ mother um-ma-a
4 ana ana ADP PRP _ _ _ to a-na
5 Šulaya Šulaya PROPN PN _ _ _ Šulaya {m}šu-la-a
6 ŠEŠ-kam-ma ŠEŠ-kam-ma X X _ _ _ _ ŠEŠ-kam-ma
7 ana ana ADP PRP _ _ _ to a-na
8 ka-a-ša₂ delay PROPN PN _ _ _ delay ka-a-ša₂
9 lū lū PART MOD _ _ _ may lu-u₂
10 šulmu šulmu NOUN N _ _ _ completeness šul-mu
11 ummā ummu NOUN N _ _ _ mother um-ma-a
12 ana ana ADP PRP _ _ _ to a-na
13 ahīyama ahu NOUN N _ _ _ brother ŠEŠ-ia₂-a-ma
14 {m}A-AD Apla-abi PROPN PN _ _ _ Apla-abi {m}A-AD
...
In my processing script, I follow one of the examples on this GitHub page, using ConllParser. I feed the CoNLL-U data in as a text string using the parse_conll_text_as_spacy() method:
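import akkModel  # my custom model package
from spacy_conll.parser import ConllParser
...
basic_nlp = akkModel.load()
basic_nlp.add_pipe("conll_formatter", last=True)  # adds doc._.conll_str
nlp = ConllParser(basic_nlp)
...
doc = nlp.parse_conll_text_as_spacy(text)  # text holds the CoNLL-U content as a string
conllu_str = doc._.conll_str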
However, this generates the following error:
  File "/path/to/script/annotate_block_conllu.py", line 62, in <module>
    doc = nlp.parse_conll_text_as_spacy(text)
  File "/path/to/miniforge3/lib/python3.10/site-packages/spacy_conll/parser.py", line 268, in parse_conll_text_as_spacy
    raise ValueError(
ValueError: Your data is in an unexpected format. Make sure that it follows the CoNLL-U format requirements. See https://universaldependencies.org/format.html. Particularly make sure that the DEPREL field is filled in.
I don't see what the problem is. The HEAD and DEPREL fields are necessarily empty, because those are exactly what I want my custom model to generate. Why would anyone feed in a completely filled-in CoNLL-U file if they want a model to annotate certain fields (LEMMA, MORPH, etc.) automatically?
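One possible workaround, sketched here under assumptions rather than as the library's intended usage: read the pre-tagged columns directly with the standard library instead of going through `parse_conll_text_as_spacy()`, so that empty HEAD/DEPREL fields never need to be valid. The function name `read_pretagged_conllu` is hypothetical, and the whitespace fallback exists only because the sample above appears space-separated; real CoNLL-U files are tab-separated.

```python
from typing import Dict, List

# Column order defined by the CoNLL-U format (universaldependencies.org/format.html).
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def _empty() -> Dict[str, List[str]]:
    return {"words": [], "lemmas": [], "pos": [], "tags": []}


def read_pretagged_conllu(text: str) -> List[Dict[str, List[str]]]:
    """Hypothetical helper: collect FORM, LEMMA, UPOS and XPOS per sentence.

    HEAD and DEPREL are read but ignored, so "_" placeholders are fine.
    """
    sentences: List[Dict[str, List[str]]] = []
    current = _empty()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:  # a blank line ends the current sentence
            if current["words"]:
                sentences.append(current)
                current = _empty()
            continue
        if line.startswith("#"):  # sentence-level comments (IDs, translation, ...)
            continue
        # Assumption: tab-separated as in real CoNLL-U, else split on whitespace.
        cols = line.split("\t") if "\t" in line else line.split()
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}: {line!r}")
        row = dict(zip(FIELDS, cols))
        if not row["id"].isdigit():  # skip multiword ranges (1-2) and empty nodes (1.1)
            continue
        current["words"].append(row["form"])
        current["lemmas"].append(row["lemma"])
        current["pos"].append(row["upos"])
        current["tags"].append(row["xpos"])
    if current["words"]:  # last sentence may not be followed by a blank line
        sentences.append(current)
    return sentences
```

Each sentence dict could then be turned into a spaCy `Doc` with `spacy.tokens.Doc(nlp.vocab, words=..., lemmas=..., pos=..., tags=...)`. Recent spaCy v3 releases accept a `Doc` as input to `nlp(...)`, so the pipeline could then add heads and dependency labels. Whether other components in `akkModel` would overwrite the existing lemmas and tags is an open question; restricting the run with `nlp.select_pipes(enable=[...])` is one way to check.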
spaCy version: 3.8.4