Problem feeding in conllu files with empty DEPREL fields(?) #34

@megamattc

Hello,

I want to apply my own custom syntax parser to a set of CoNLL-U files that have already been lemmatized and POS-tagged (but nothing else). Here is a sample from one of those files:

# P311545 = dubsar 3, 001
# translation = 1–5 To Šumāya, (thus says) Šulāya, your brother. May you be well. Thus (I say) to my brother: 6–19 Give two kurrus of barley, two sūtu of cress, five sūtu of sesame oil, and six minas of wool to Apil-abi, son of Šamaš-erība, my brother, who is staying with my brother. Then, come and I’ll personally give you here whatever you need. And, if you cannot give it to him, write me quickly so I’ll deliver it to him myself. 20–21 I am writing to my brother out of concern.
1	ana	ana	ADP	PRP	_	_	_	to	a-na
2	Šumaya	Šumaya	PROPN	PN	_	_	_	Šumaya	{m}šu-ma-a
3	ummā	ummu	NOUN	N	_	_	_	mother	um-ma-a
4	ana	ana	ADP	PRP	_	_	_	to	a-na
5	Šulaya	Šulaya	PROPN	PN	_	_	_	Šulaya	{m}šu-la-a
6	ŠEŠ-kam-ma	ŠEŠ-kam-ma	X	X	_	_	_	_	ŠEŠ-kam-ma
7	ana	ana	ADP	PRP	_	_	_	to	a-na
8	ka-a-ša₂	delay	PROPN	PN	_	_	_	delay	ka-a-ša₂
9	lū	lū	PART	MOD	_	_	_	may	lu-u₂
10	šulmu	šulmu	NOUN	N	_	_	_	completeness	šul-mu
11	ummā	ummu	NOUN	N	_	_	_	mother	um-ma-a
12	ana	ana	ADP	PRP	_	_	_	to	a-na
13	ahīyama	ahu	NOUN	N	_	_	_	brother	ŠEŠ-ia₂-a-ma
14	{m}A-AD	Apla-abi	PROPN	PN	_	_	_	Apla-abi	{m}A-AD
...
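
As a sanity check on the input itself, here is a minimal sketch in plain Python (independent of spacy_conll) that confirms every token line has ten tab-separated columns and reports which columns are `_` on every line. The names `summarize_conllu` and `CONLLU_FIELDS` are made up for this illustration, the column order is taken from the CoNLL-U specification, and the sketch assumes the file is tab-separated. It does not reproduce whatever validation spacy_conll performs internally.

```python
# Names of the ten CoNLL-U columns, in order (per the UD format specification).
CONLLU_FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                 "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def summarize_conllu(text):
    """Return (problems, unfilled).

    problems: list of (line number, message) for malformed token lines.
    unfilled: columns whose value is "_" on every well-formed token line.
    """
    problems = []
    filled = {name: False for name in CONLLU_FIELDS}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        parts = line.split("\t")
        if len(parts) != len(CONLLU_FIELDS):
            problems.append(
                (lineno, f"expected 10 tab-separated columns, found {len(parts)}")
            )
            continue
        for name, value in zip(CONLLU_FIELDS, parts):
            if value == "":
                problems.append((lineno, f"{name} is an empty string"))
            elif value != "_":
                filled[name] = True
    unfilled = [name for name in CONLLU_FIELDS if not filled[name]]
    return problems, unfilled


# First two token lines of the sample above, with explicit tabs.
sample = "\n".join([
    "# P311545 = dubsar 3, 001",
    "1\tana\tana\tADP\tPRP\t_\t_\t_\tto\ta-na",
    "2\tŠumaya\tŠumaya\tPROPN\tPN\t_\t_\t_\tŠumaya\t{m}šu-ma-a",
])
problems, unfilled = summarize_conllu(sample)
print(problems)  # → []
print(unfilled)  # → ['FEATS', 'HEAD', 'DEPREL']
```

For these two lines the only columns left as `_` are FEATS, HEAD and DEPREL, and no line is malformed, so the structure of the file itself looks as intended.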

In my processing script, I follow one of the examples on this GitHub page, using ConllParser. I feed the CoNLL-U data in as a text string using the parse_conll_text_as_spacy() method:

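import akkModel
from spacy_conll.parser import ConllParser
...
# Load the custom pipeline and append the CoNLL formatter component
basic_nlp = akkModel.load()
basic_nlp.add_pipe("conll_formatter", last=True)
nlp = ConllParser(basic_nlp)
...

# text holds the contents of one CoNLL-U file as a single string
doc = nlp.parse_conll_text_as_spacy(text)
conllu_str = doc._.conll_str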

However, this generates an error:

File "/path/to/script/annotate_block_conllu.py", line 62, in <module>
    doc = nlp.parse_conll_text_as_spacy(text)
  File "/path/to/miniforge3/lib/python3.10/site-packages/spacy_conll/parser.py", line 268, in parse_conll_text_as_spacy
    raise ValueError(
ValueError: Your data is in an unexpected format. Make sure that it follows the CoNLL-U format requirements. See https://universaldependencies.org/format.html. Particularly make sure that the DEPREL field is filled in.

I don't see what the problem is. The DEPREL and HEAD fields are necessarily empty because those are what I want my custom model to generate. Why would someone feed in a completely filled-in CoNLL-U file if the point is to have a model annotate certain fields (LEMMA, MORPH, etc.) automatically?

spaCy version: 3.8.4
