UFAL/Fix: add bitstream download-by-handle endpoint for curl instructions#1252
milanmajchrak merged 15 commits into dtq-dev
Adds GET /api/core/bitstreams/handle/{prefix}/{suffix}/{filename} endpoint
that directly serves bitstream content by item handle and filename.
This resolves the issue where curl download instructions generated by the
UI produced URLs pointing to non-existent backend endpoints, resulting in
404 errors when users attempted to download files via command line.
The new endpoint resolves the handle to an Item, finds the bitstream by
exact filename in ORIGINAL bundles, and streams the raw content with
correct Content-Type and Content-Disposition headers.
Refs: dataquest-dev/dspace-angular#1210
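The lookup rule described above (exact filename match, ORIGINAL bundles only) can be sketched in isolation. This is a minimal illustration and not the controller's code: the `Bundle` class and `findInOriginal` method are hypothetical stand-ins for the real DSpace types, and the filenames are invented.

```java
import java.util.List;
import java.util.Optional;

public class BitstreamLookupSketch {
    /** Hypothetical stand-in for a bundle: a name plus the filenames it holds. */
    static final class Bundle {
        final String name;
        final List<String> bitstreamNames;

        Bundle(String name, List<String> bitstreamNames) {
            this.name = name;
            this.bitstreamNames = bitstreamNames;
        }
    }

    /** Returns the first exact (case-sensitive) filename match in a bundle named ORIGINAL. */
    static Optional<String> findInOriginal(List<Bundle> bundles, String filename) {
        return bundles.stream()
                .filter(b -> "ORIGINAL".equals(b.name))
                .flatMap(b -> b.bitstreamNames.stream())
                .filter(filename::equals)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Bundle> bundles = List.of(
                new Bundle("ORIGINAL", List.of("data.csv")),
                new Bundle("TEXT", List.of("data.csv", "data.csv.txt")));
        // Present: the file exists in ORIGINAL.
        System.out.println(findInOriginal(bundles, "data.csv").isPresent());
        // Absent: the file exists only in TEXT, which is never searched.
        System.out.println(findInOriginal(bundles, "data.csv.txt").isPresent());
    }
}
```

An empty result is the case a caller would presumably map to a 404 response.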
Pull request overview
This PR adds a new REST API endpoint that allows downloading bitstreams by item handle and filename, addressing a UI issue where curl download instructions were pointing to non-existent endpoints. The implementation provides a direct download mechanism using the pattern /api/core/bitstreams/handle/{prefix}/{suffix}/{filename} to serve bitstream content.
Changes:
- Added BitstreamByHandleRestController with GET/HEAD support for downloading bitstreams by handle and filename
- Comprehensive integration test suite covering various scenarios including authorization, special characters, and error cases
- Minor fix in SubmissionCCLicenseUrlResourceHalLinkFactory to resolve method ambiguity with explicit Object cast
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| dspace-server-webapp/src/main/java/org/dspace/app/rest/BitstreamByHandleRestController.java | New REST controller implementing handle-based bitstream download endpoint with authorization and error handling |
| dspace-server-webapp/src/test/java/org/dspace/app/rest/BitstreamByHandleRestControllerIT.java | Comprehensive integration tests covering success cases, authorization, special characters, and error scenarios |
| dspace-server-webapp/src/main/java/org/dspace/app/rest/link/process/SubmissionCCLicenseUrlResourceHalLinkFactory.java | Added explicit cast to resolve queryParam method ambiguity |
@milanmajchrak, you have unresolved Copilot conversations. Have you already implemented its suggestions?
curl -J on Windows cannot create files with non-ASCII characters (e.g. diacritics like e/a) from a raw UTF-8 Content-Disposition filename header. Uses filename*=UTF-8''percent-encoded-name (RFC 5987/6266) which curl properly decodes. Also includes an ASCII fallback in filename param.
…nloads context.complete() was called before bitstreamService.retrieve(), closing the DB connection and causing 'end of response with X bytes missing' errors. Now context.complete() is called only after the full content has been streamed. For S3 redirect and HEAD paths, context.complete() remains before return since no streaming is needed.
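The ordering problem described in this commit can be reproduced with a toy model. The sketch below is an assumption-laden illustration, not DSpace code: `FakeContext` is a hypothetical class whose streams stop working once `complete()` has been called, mimicking a closed database connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CompleteAfterStreamingSketch {
    /** Hypothetical context: streams it hands out fail once complete() has run. */
    static final class FakeContext {
        private boolean completed = false;

        InputStream retrieve(byte[] content) {
            ByteArrayInputStream delegate = new ByteArrayInputStream(content);
            return new InputStream() {
                @Override
                public int read() throws IOException {
                    if (completed) {
                        throw new IOException("context already completed");
                    }
                    return delegate.read();
                }
            };
        }

        void complete() {
            completed = true;
        }
    }

    /** Fixed ordering: stream everything first, complete the context last. */
    static void copyThenComplete(FakeContext ctx, byte[] content, OutputStream out) throws IOException {
        try (InputStream in = ctx.retrieve(content)) {
            in.transferTo(out);
        }
        ctx.complete();
    }

    /** Buggy ordering: the context is completed before any byte is read. */
    static void completeThenCopy(FakeContext ctx, byte[] content, OutputStream out) throws IOException {
        InputStream in = ctx.retrieve(content);
        ctx.complete();
        in.transferTo(out);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyThenComplete(new FakeContext(), new byte[] {1, 2, 3}, out);
        System.out.println(out.size());
    }
}
```

In this model the buggy ordering fails immediately; in the real controller the symptom reported was a truncated response.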
…fallback The filename parameter now contains the original name (with diacritics like e/a) instead of replacing non-ASCII chars with underscores. Characters in the ISO-8859-1 range are transmitted correctly by Tomcat and understood by curl on Western/Central-European systems. The filename* parameter still provides RFC 5987 percent-encoded UTF-8 for modern clients (curl 7.56+).
…ests Content-Disposition filename parameter now uses ASCII fallback (non-ASCII replaced with underscore) per RFC 6266. Modern clients use filename* (RFC 5987) which has the full UTF-8 name. The curl command no longer relies on Content-Disposition at all (uses -o instead of -OJ).
New integration tests for edge cases:
- Multiple dots in filename (archive.v2.1.tar.gz)
- Double quotes in filename (escaped in Content-Disposition)
- CJK characters (beyond ISO-8859-1)
- Same filename in ORIGINAL and TEXT bundles (only ORIGINAL served)
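A header of the shape this commit describes (ASCII fallback in `filename`, percent-encoded UTF-8 in `filename*`) could be built roughly as follows. This is a sketch under assumptions, not the merged implementation: the class and method names are hypothetical, and the `attachment` disposition type is assumed because the source does not state which type the controller sends.

```java
import java.nio.charset.StandardCharsets;

public class ContentDispositionSketch {
    /** Characters RFC 5987 allows unencoded in an ext-value (attr-char). */
    private static final String ATTR_CHARS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$&+-.^_`|~";

    /** Percent-encodes the UTF-8 bytes of the name for the filename* parameter. */
    static String encodeRfc5987(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            if (ATTR_CHARS.indexOf(c) >= 0) {
                sb.append(c);
            } else {
                sb.append('%').append(String.format("%02X", b & 0xFF));
            }
        }
        return sb.toString();
    }

    /** Replaces non-ASCII and control characters with '_' and escapes quotes and backslashes. */
    static String asciiFallback(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '"' || cp == '\\') {
                sb.append('\\').append((char) cp);
            } else if (cp >= 0x20 && cp < 0x7F) {
                sb.append((char) cp);
            } else {
                sb.append('_');
            }
        }
        return sb.toString();
    }

    /** Assumption: the "attachment" disposition type; only the parameters come from the PR text. */
    static String build(String name) {
        return "attachment; filename=\"" + asciiFallback(name) + "\"; filename*=UTF-8''"
                + encodeRfc5987(name);
    }

    public static void main(String[] args) {
        // \u017E and \u00FD are Latin letters with diacritics.
        System.out.println(build("\u017Elut\u00FD.txt"));
    }
}
```

Clients that understand `filename*` are expected to prefer it over `filename`, so the underscore fallback only matters for older clients.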
- Remove duplicate HttpStatus import (apache vs spring)
- Add missing MediaType import (spring)
- Fix Content-Type assertion to include charset=UTF-8
- Use URI.create() for pre-encoded URLs in tests to prevent double-encoding (%25) rejection by StrictHttpFirewall

All 15 integration tests pass.
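The double-encoding effect mentioned in this commit can be shown with `java.net.URI` alone. The sketch below is an analogy and not the test code from the PR: the handle `123/456` and the filename are invented, and the multi-argument `URI` constructor stands in for any API that encodes its input a second time.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PreEncodedUriSketch {
    /** URI.create parses the string as-is, so an existing %20 stays %20. */
    static String keepEncoding(String encodedPath) {
        return URI.create("http://localhost" + encodedPath).getRawPath();
    }

    /** The multi-argument constructor always quotes '%', turning %20 into %2520. */
    static String reEncode(String encodedPath) throws URISyntaxException {
        return new URI("http", "localhost", encodedPath, null).getRawPath();
    }

    public static void main(String[] args) throws URISyntaxException {
        // Hypothetical handle and filename, already percent-encoded.
        String path = "/api/core/bitstreams/handle/123/456/my%20file.txt";
        System.out.println(keepEncoding(path));
        System.out.println(reEncode(path));
    }
}
```

The second form contains `%25`, which is the sequence the commit message says StrictHttpFirewall rejects.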
…ren) New IT test for filename 'Media (+)#9) ano' verifying correct URL decoding, Content-Disposition encoding, and content delivery. 16/16 tests pass.
@milanmajchrak It is still not working for me.
…in user The test downloadBitstreamByHandleUnauthorizedForNonAdmin uses getClient(token) which means the user IS authenticated. The controller correctly returns 403 (Forbidden) for authenticated users without access, not 401 (Unauthorized). 401 is only for anonymous/unauthenticated requests.
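The 401/403 distinction in this commit reduces to a small decision rule. The following is a hypothetical sketch of that rule only; `statusFor` and its boolean parameters are invented names, and the real controller derives these facts from the DSpace context and authorization service.

```java
public class AccessStatusSketch {
    /**
     * Picks the HTTP status for a download attempt.
     * 401 is reserved for anonymous requests; an authenticated user without access gets 403.
     */
    static int statusFor(boolean authenticated, boolean canRead) {
        if (canRead) {
            return 200;
        }
        return authenticated ? 403 : 401;
    }

    public static void main(String[] args) {
        System.out.println(statusFor(true, false));   // authenticated, no access
        System.out.println(statusFor(false, false));  // anonymous, no access
    }
}
```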
* UFAL/Fixed failing integration test (ufal#1332) (#1249)
* Add debug messages to failing test (cherry picked from commit 4cc3694)
Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>
* [Port to dtq-dev] Fix OpenAIRE integration: null handling and HTTP client lifecycle (#1248)
* Fix OpenAIRE integration: null handling and HTTP client lifecycle (ufal#1330)
* Add test for OpenAIRE connector
* Add null check for OpenAIRE response to prevent NullPointerException
Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>
* Fix HTTP client lifecycle to prevent premature connection closure
Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>
* Keep the try with resources but copy the response into an in memory stream and return that
* license:check
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>
(cherry picked from commit 02984db)
* Handle NumberFormatException in OpenAIREFundingDataProvider.getNumberOfResults and use explicit UTF-8 charset in OpenAIRERestConnectorTest
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>
Co-authored-by: milanmajchrak <milan.majchrak@dataquest.sk>
* UFAL/Added a comment to do not forget mounting the file which is changed via configuration feature (#1247)
* UFAL/Issue 1315: Store file preview to database when file preview is created on Item Page load. (ufal#1316) (#1241)
* Issue ufal/clarin-dspace1315: Store file preview to database when file preview is created on item page load
* PR comments: commit context only when any of the file preview is successfully created (cherry picked from commit aab626b)
Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>
* UFAL/Issue 1313: fixed error when file preview is not generated for bitstream with store_number = 77 (ufal#1318) (#1240)
* Issue ufal#1313: fixed error when file preview is not generated for bitstream with store number = 77 (cherry picked from commit 04d64f7)
Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>
* UFAL/New version metadata issues (#1236)
* Issue ufal#1266: dc.date.available and dc.relation.replaces metadata not cleared properly (ufal#1307)
* Issue ufal#1266: dc.date.available and dc.relation.replaces metadata not cleaned properly in new item version
* resolve MR comments - update ignoredMetadataFields in versioning-service.xml
* update ClarinVersionedHandleIdentifierProviderIT test to check dc.identifier.uri metadata for new version (cherry picked from commit 7ffaf9a)
* Issue 1319: do not copy dc.identifier.doi metadata when new item version is created (cherry picked from commit 1b7ed17)
---------
Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>
* UFAL/Fix: add bitstream download-by-handle endpoint for curl instructions (#1252)
* fix: add bitstream download-by-handle endpoint for curl instructions
Adds GET /api/core/bitstreams/handle/{prefix}/{suffix}/{filename} endpoint that directly serves bitstream content by item handle and filename. This resolves the issue where curl download instructions generated by the UI produced URLs pointing to non-existent backend endpoints, resulting in 404 errors when users attempted to download files via command line. The new endpoint resolves the handle to an Item, finds the bitstream by exact filename in ORIGINAL bundles, and streams the raw content with correct Content-Type and Content-Disposition headers.
Refs: dataquest-dev/dspace-angular#1210
* Fixed compiling errors
* Small refactoring - use constants and removed unnecessary changes
* added comments, return 404 status instead of 402
* unauthorized instead of forbidden
* fix: use RFC 5987 Content-Disposition for non-ASCII filenames
curl -J on Windows cannot create files with non-ASCII characters (e.g. diacritics like e/a) from a raw UTF-8 Content-Disposition filename header. Uses filename*=UTF-8''percent-encoded-name (RFC 5987/6266) which curl properly decodes. Also includes an ASCII fallback in filename param.
* fix: move context.complete() after streaming to prevent truncated downloads
context.complete() was called before bitstreamService.retrieve(), closing the DB connection and causing 'end of response with X bytes missing' errors. Now context.complete() is called only after the full content has been streamed. For S3 redirect and HEAD paths, context.complete() remains before return since no streaming is needed.
* fix: use real UTF-8 filename in Content-Disposition instead of ASCII fallback
The filename parameter now contains the original name (with diacritics like e/a) instead of replacing non-ASCII chars with underscores. Characters in the ISO-8859-1 range are transmitted correctly by Tomcat and understood by curl on Western/Central-European systems. The filename* parameter still provides RFC 5987 percent-encoded UTF-8 for modern clients (curl 7.56+).
* fix: revert to ASCII fallback in Content-Disposition, add edge-case tests
Content-Disposition filename parameter now uses ASCII fallback (non-ASCII replaced with underscore) per RFC 6266. Modern clients use filename* (RFC 5987) which has the full UTF-8 name. The curl command no longer relies on Content-Disposition at all (uses -o instead of -OJ).
New integration tests for edge cases:
- Multiple dots in filename (archive.v2.1.tar.gz)
- Double quotes in filename (escaped in Content-Disposition)
- CJK characters (beyond ISO-8859-1)
- Same filename in ORIGINAL and TEXT bundles (only ORIGINAL served)
* fix: resolve compilation errors and fix IT test assertions
- Remove duplicate HttpStatus import (apache vs spring)
- Add missing MediaType import (spring)
- Fix Content-Type assertion to include charset=UTF-8
- Use URI.create() for pre-encoded URLs in tests to prevent double-encoding (%25) rejection by StrictHttpFirewall
All 15 integration tests pass.
* test: add complex filename test (diacritics, plus, hash, unmatched paren)
New IT test for filename 'Media (+)#9) ano' verifying correct URL decoding, Content-Disposition encoding, and content delivery. 16/16 tests pass.
* fix authorization, comments, tests
* fix: change expected status from 401 to 403 for authenticated non-admin user
The test downloadBitstreamByHandleUnauthorizedForNonAdmin uses getClient(token) which means the user IS authenticated. The controller correctly returns 403 (Forbidden) for authenticated users without access, not 401 (Unauthorized). 401 is only for anonymous/unauthenticated requests.
---------
Co-authored-by: Paurikova2 <michaela.paurikova@dataquest.sk>
---------
Co-authored-by: Ondřej Košarko <ko_ok@centrum.cz>
Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>
Co-authored-by: Paurikova2 <michaela.paurikova@dataquest.sk>
Manual Testing (if applicable)
@Paurikova2 pls test it