Skip to content

Improve hfile_s3.c error handling#2036

Merged
whitwham merged 1 commit into
samtools:developfrom
jmarshall:s3-errors
Jun 17, 2026
Merged

Improve hfile_s3.c error handling#2036
whitwham merged 1 commit into
samtools:developfrom
jmarshall:s3-errors

Conversation

@jmarshall

Copy link
Copy Markdown
Member

The rewritten S3 backend does not translate libcurl error values into errno values; in some circumstances it returns an error indication but does not set errno at all, leaving it set to whatever it was set to previously. So for example a malformatted URL results in a spurious ENOENT error message:

$ htsfile s3:expect_invalid_argument
htsfile: can't open "s3:expect_invalid_argument": No such file or directory

This led to a bioconda package build test failure (bioconda/bioconda-recipes#64884):

running run_test.py
    pysam.AlignmentFile('s3:expect_invalid_argument')
  File "pysam/libcalignmentfile.pyx", line 752, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 952, in pysam.libcalignmentfile.AlignmentFile._open
FileNotFoundError: [Errno 2] Could not open alignment file: No such file or directory: 's3:expect_invalid_argument'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/conda-bld/pysam_1777293322095/test_tmp/run_test.py", line 15, in <module>
    assert e.errno == errno.EINVAL
AssertionError

because the test was written to validate the expected particular errno value that the libcurl backend produces. This particular test has since been rewritten in a more robust way, but nonetheless hfile_s3.c should enable meaningful error messages.

Also requests for buckets which you don't have permission to access or non-existent buckets produce unhelpful errors:

$ htsfile s3://bucket/foo.sam
htsfile: can't open "s3://bucket/foo.sam": Resource temporarily unavailable
$ htsfile s3://bork-asdfghjk/foo.sam  # Nonexistent at the time of writing
htsfile: can't open "s3://bork-asdfghjk/foo.sam": Resource temporarily unavailable

With -vvvv you can see that errno is being set appropriately from the 403 or 404 response, but it is overwritten with a spurious EAGAIN errno value before it is returned to the caller.

Hence this PR:

Libcurl API functions mostly do not set errno, so set it explicitly based on libcurl API error codes. Copy easy_errno() from hfile_libcurl.c, alongside the previously copied http_status_errno().

Also preserve errno in s3_read_open()/s3_write_open()'s error cleanup. Otherwise it will often be changed to EAGAIN by curl_easy_cleanup(), which is called during cleanup_local().

Libcurl API functions mostly do not set errno, so set it explicitly
based on libcurl API error codes. Copy easy_errno() from hfile_libcurl.c,
alongside the previously copied http_status_errno().

Also preserve errno in s3_read_open()/s3_write_open()'s error cleanup.
Otherwise it will often be changed to EAGAIN by curl_easy_cleanup(),
which is called during cleanup_local().

curl_global_init() always returns CURLE_FAILED_INIT on error, so simply
translate that never-happens into a suitably catastrophic errno value.

Signed-off-by: John Marshall <jmarshall@hey.com>
Comment thread hfile_s3.c
Comment on lines +2596 to +2597
// Set a suitably catastrophic error code
errno = ENETDOWN;

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curl_global_init() always returns CURLE_FAILED_INIT on error, so simply translate that never-happens into a suitably catastrophic errno value.

Comment thread hfile_s3.c
cret = curl_easy_getinfo(fp->curl, CURLINFO_RESPONSE_CODE, &response_code);

if (cret != CURLE_OK || response_code > 200) {
if (cret != CURLE_OK) {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the unlikely event curl_easy_getinfo() fails, response_code will usually not have been written to. So separating these cases avoids reading an uninitialised variable in http_status_errno(response_code) below.

@whitwham whitwham merged commit 39552f3 into samtools:develop Jun 17, 2026
9 checks passed
@whitwham

Copy link
Copy Markdown
Member

Thank you.

@jmarshall jmarshall deleted the s3-errors branch June 17, 2026 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants