doi-service does not process filenames having UTF-8 characters #418

@rsjoyner

Checked for duplicates

No - I haven't checked

🐛 Describe the bug

When I attempted to process an XML file that contained a single UTF-8 character in the filename, the s/w issued a very nondescript ERROR message.

The ERROR message led me to believe the UTF-8 character was in the body of the XML.
Only after hours of trying to "find" the UTF-8 characters did I finally rename the file,
and the s/w then processed the renamed file.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/home/pds4/pds-doi-service/bin/pds-doi-cmd", line 8, in <module>
    sys.exit(main())
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/cmd/pds_doi_cmd.py", line 42, in main
    output = action.run(**kwargs)
  File "/data/home/pds4/pds-doi-service/lib/python3.9/site-packages/pds_doi_service/core/actions/release.py", line 322, in run
    raise CriticalDOIException(str(err))
pds_doi_service.core.entities.exceptions.CriticalDOIException: 'utf-8' codec can't decode byte 0xd7 in position 1455: invalid continuation byte
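
The wording of the final line matches Python's built-in UnicodeDecodeError, which is raised when a byte sequence that is not valid UTF-8 is decoded as UTF-8. The sketch below is one way to locate such a byte without searching by hand. The helper name `find_invalid_utf8` and the sample bytes are assumptions for illustration; nothing here is part of pds-doi-service.

```python
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes) -> Optional[Tuple[int, int, int, int]]:
    """Return (offset, line, column, byte value) of the first byte that is
    not valid UTF-8, or None if the whole buffer decodes cleanly.

    Offsets are 0-based; line and column are 1-based and counted in bytes.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        offset = err.start
        # Count newlines before the bad byte to get its line number.
        line = data.count(b"\n", 0, offset) + 1
        # Column is the distance from the start of that line.
        line_start = data.rfind(b"\n", 0, offset) + 1
        column = offset - line_start + 1
        return offset, line, column, data[offset]
    return None


# Hypothetical sample: a lone 0xd7 byte followed by a space is not valid UTF-8.
sample = b"<title>5 \xd7 5 mosaic</title>\n"
print(find_invalid_utf8(sample))
```

To check a real input, the same helper could be called on the raw bytes of the file, for example `find_invalid_utf8(open(path, "rb").read())`.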

🕵️ Expected behavior

I expected the s/w to either process a filename having UTF-8 chars OR at least yield a more human-readable error message.
I had to rename both files to ".txt" because files with the "xml" extension cannot be uploaded here.

bundle_test_20231005.txt
bundle_moon_lro_mini-rf_mosaics_apl_2022.txt
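
As an illustration of the more readable error asked for above, the sketch below wraps the decode failure in a message that names the input file, the offending byte, and its position. `InputDecodeError` and `read_utf8_text` are hypothetical names used only for this example; they are not part of pds-doi-service, and how the service actually reads its inputs is not shown in this issue.

```python
import os
import tempfile


class InputDecodeError(Exception):
    """Hypothetical stand-in exception; not a pds_doi_service class."""


def read_utf8_text(path: str) -> str:
    """Read a file as UTF-8 and raise a descriptive error if decoding fails."""
    with open(path, "rb") as handle:
        raw = handle.read()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # Translate the byte offset into a line number for the user.
        line = raw.count(b"\n", 0, err.start) + 1
        raise InputDecodeError(
            f"Could not decode {path!r} as UTF-8: "
            f"byte 0x{raw[err.start]:02x} at offset {err.start} "
            f"(line {line}): {err.reason}"
        ) from err


# Demonstration with a temporary file that contains an invalid byte.
with tempfile.TemporaryDirectory() as tmp:
    bad_path = os.path.join(tmp, "label.xml")
    with open(bad_path, "wb") as handle:
        handle.write(b"<a>\xd7</a>\n")
    try:
        read_utf8_text(bad_path)
    except InputDecodeError as exc:
        message = str(exc)
        print(message)
```

A message of this shape tells the user which input failed and where to look, which the bare codec message in the traceback does not.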

📜 To Reproduce

  1. scp bundle_test_20231005.xml rsjoyner@pdscloud-prod2:/home/pds4/input/bundle_test_20231005.xml
  2. pds-doi-cmd release -N img -s rsjoyner@jpl.nasa.gov -i /home/pds4/input/bundle_test_20231005.xml --no-review --force > /home/pds4/input/result_activate_no_review_IMG_MiniRF_Global_Mosaics_20231004.json
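
To help narrow down whether the filename itself is involved, a small check like the following lists any non-ASCII characters in a path's base name. This is a sketch only: `non_ascii_chars` is a hypothetical helper, and the second path is an invented example, not a file from this report.

```python
import os
import unicodedata


def non_ascii_chars(path: str):
    """List (index, character, code point, Unicode name) for every
    non-ASCII character in the base name of the given path."""
    name = os.path.basename(path)
    return [
        (i, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(name)
        if ord(ch) > 127
    ]


# The input path exactly as written in step 2 above.
print(non_ascii_chars("/home/pds4/input/bundle_test_20231005.xml"))

# Invented example of a filename that does contain a non-ASCII character.
print(non_ascii_chars("/home/pds4/input/mosaic_5\u00d75.xml"))
```

Running the same check on the original, pre-rename filename would show which character, if any, differs from plain ASCII.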

🖥 Environment Info

  • Version of this software [e.g. vX.Y.Z]
  • Operating System: [e.g. MacOSX with Docker Desktop vX.Y]
    ...

📚 Version of Software Used

No response

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

No response

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    Status
    ToDo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
