Opening the file yourself before parsing is unnecessary, because lxml.etree.parse opens it on its own.
Cf.:
    >>> help(lxml.etree.parse)
    parse(source, parser=None, *, base_url=None)
        Return an ElementTree object loaded with source elements.  If no parser
        is provided as second argument, the default parser is used.

        The ``source`` can be any of the following:

        - a file name/path
        - a file object
        - a file-like object
        - a URL using the HTTP or FTP protocol
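
To illustrate the point, here is a minimal sketch of handing a path straight to the parser instead of opening the file first. The `load_tree` helper and the sample document are hypothetical and not taken from read_tei.py; the fallback import is only there so the sketch also runs where lxml is not installed.

```python
import os
import tempfile

try:
    from lxml import etree  # the library discussed here
except ImportError:  # fallback so the sketch runs without lxml installed
    import xml.etree.ElementTree as etree


def load_tree(path):
    """Parse an XML file by handing the path straight to the parser."""
    # No open() call: parse() accepts a file name and opens the file itself.
    return etree.parse(path)


# Hypothetical sample document, written only to make the sketch runnable.
sample = b'<?xml version="1.0" encoding="UTF-8"?><TEI><text>caf\xc3\xa9</text></TEI>'

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.xml")
    with open(sample_path, "wb") as handle:
        handle.write(sample)
    root = load_tree(sample_path).getroot()
    parsed_text = root.findtext("text")
```

Because the parser reads the bytes itself, it can also honour the encoding declared in the XML prolog, which is one more reason not to pre-open the file in text mode.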
You should also specify the encoding explicitly, especially here:

    with open(outfile, "w") as output:
I'm quoting from the documentation:
> In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
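AFAIK, Windows does not default to UTF-8 here (the locale encoding is typically a legacy code page), so non-ASCII characters in the output might be written incorrectly or fail to encode at all.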
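
A minimal sketch of the suggested fix, assuming the output is meant to be UTF-8. The `write_text` helper and the sample string are hypothetical; only the `encoding="utf-8"` argument is the actual suggestion.

```python
import os
import tempfile


def write_text(outfile, text):
    """Write text to outfile as UTF-8, regardless of the platform locale."""
    # Passing encoding explicitly avoids the locale-dependent default.
    with open(outfile, "w", encoding="utf-8") as output:
        output.write(text)


# Hypothetical sample text containing non-ASCII characters.
sample_text = "Stra\u00dfe, caf\u00e9"

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "out.txt")
    write_text(out_path, sample_text)
    # Read the raw bytes back to check what actually landed on disk.
    with open(out_path, "rb") as handle:
        raw = handle.read()
```

With the encoding pinned, the bytes on disk are the same on Linux, macOS, and Windows.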
Thanks in any case for the TEI XPath expressions 🙂
For reference, the lines discussed above (commit daeaa73):
- toolbox/extract/read_tei.py, line 30: the file is opened manually
- toolbox/extract/read_tei.py, line 35: lxml opens the file itself
- toolbox/extract/read_tei.py, line 89: the output file is opened without an explicit encoding