Skip to content

UFAL/Release hotfix 2026 02 26 - Merge PR#1260

Merged
milanmajchrak merged 8 commits intocustomer/lindatfrom
ufal/release-hotfix-2026-02-26
Feb 26, 2026
Merged

UFAL/Release hotfix 2026 02 26 - Merge PR#1260
milanmajchrak merged 8 commits intocustomer/lindatfrom
ufal/release-hotfix-2026-02-26

Conversation

@milanmajchrak
Copy link
Collaborator

Problem description

Analysis

(Write here, if there is needed describe some specific problem. Erase it, when it is not needed.)

Problems

(Write here, if some unexpected problems occur during solving issues. Erase it, when it is not needed.)

Manual Testing (if applicable)

Copilot review

  • Requested review from Copilot

kosarko and others added 8 commits February 18, 2026 09:26
* Add debug messages to fauling test

(cherry picked from commit 4cc3694)

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>
…ient lifecycle (#1248)

* Fix OpenAIRE integration: null handling and HTTP client lifecycle (ufal#1330)

* Add test for OpenAIRE connector

* Initial plan

* Add null check for OpenAIRE response to prevent NullPointerException

Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>

* Fix HTTP client lifecycle to prevent premature connection closure

Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>

* Keep the try with resources but copy the response

into an in memory stream and return that

* license:check

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>
(cherry picked from commit 02984db)

* Handle NumberFormatException in OpenAIREFundingDataProvider.getNumberOfResults and use explicit UTF-8 charset in OpenAIRERestConnectorTest

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>
Co-authored-by: milanmajchrak <milan.majchrak@dataquest.sk>
…created on Item Page load. (ufal#1316) (#1241)

* Issue ufal/clarin-dspace1315: Store file preview to database when file preview is created on item page load

* assert text improvement

* PR comments: commit context only when any of the file preview is successfully created

* change variable name

(cherry picked from commit aab626b)

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>
…itstream with store_number = 77 (ufal#1318) (#1240)

* Issue ufal#1313: fixed error when file preview is not generated for bitstream with store number = 77

* resolve MR comments

(cherry picked from commit 04d64f7)

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>
* Issue ufal#1266: dc.date.available and dc.relation.replaces metadata not cleared properly (ufal#1307)

* Issue ufal#1266: dc.date.available and dc.relation.replaces metadata not cleaned properly in new item version

* resolve MR comments - update ignoredMetadataFields in versioning-service.xml

* update ClarinVersionedHandleIdentifierProviderIT test to check dc.identifier.uri metadata for new version

(cherry picked from commit 7ffaf9a)

* Issue 1319: do not copy dc.identifier.doi metadata when new item version is created

(cherry picked from commit 1b7ed17)

---------

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>
…ions (#1252)

* fix: add bitstream download-by-handle endpoint for curl instructions

Adds GET /api/core/bitstreams/handle/{prefix}/{suffix}/{filename} endpoint
that directly serves bitstream content by item handle and filename.

This resolves the issue where curl download instructions generated by the
UI produced URLs pointing to non-existent backend endpoints, resulting in
404 errors when users attempted to download files via command line.

The new endpoint resolves the handle to an Item, finds the bitstream by
exact filename in ORIGINAL bundles, and streams the raw content with
correct Content-Type and Content-Disposition headers.

Refs: dataquest-dev/dspace-angular#1210

* Fixed compliing errors

* Small refactoring - use constants and removed unnecessary changes

* added comments, return 404 status instead of 402

* unauthorized instead of forbidden

* fix: use RFC 5987 Content-Disposition for non-ASCII filenames

curl -J on Windows cannot create files with non-ASCII characters (e.g.
diacritics like e/a) from a raw UTF-8 Content-Disposition filename header.

Uses filename*=UTF-8''percent-encoded-name (RFC 5987/6266) which curl
properly decodes. Also includes an ASCII fallback in filename param.

* fix: move context.complete() after streaming to prevent truncated downloads

context.complete() was called before bitstreamService.retrieve(), closing
the DB connection and causing 'end of response with X bytes missing' errors.
Now context.complete() is called only after the full content has been streamed.
For S3 redirect and HEAD paths, context.complete() remains before return
since no streaming is needed.

* fix: use real UTF-8 filename in Content-Disposition instead of ASCII fallback

The filename parameter now contains the original name (with diacritics like
e/a) instead of replacing non-ASCII chars with underscores. Characters in
the ISO-8859-1 range are transmitted correctly by Tomcat and understood by
curl on Western/Central-European systems. The filename* parameter still
provides RFC 5987 percent-encoded UTF-8 for modern clients (curl 7.56+).

* fix: revert to ASCII fallback in Content-Disposition, add edge-case tests

Content-Disposition filename parameter now uses ASCII fallback (non-ASCII
replaced with underscore) per RFC 6266. Modern clients use filename* (RFC
5987) which has the full UTF-8 name. The curl command no longer relies on
Content-Disposition at all (uses -o instead of -OJ).

New integration tests for edge cases:
- Multiple dots in filename (archive.v2.1.tar.gz)
- Double quotes in filename (escaped in Content-Disposition)
- CJK characters (beyond ISO-8859-1)
- Same filename in ORIGINAL and TEXT bundles (only ORIGINAL served)

* fix: resolve compilation errors and fix IT test assertions

- Remove duplicate HttpStatus import (apache vs spring)
- Add missing MediaType import (spring)
- Fix Content-Type assertion to include charset=UTF-8
- Use URI.create() for pre-encoded URLs in tests to prevent
  double-encoding (%25) rejection by StrictHttpFirewall

All 15 integration tests pass.

* test: add complex filename test (diacritics, plus, hash, unmatched paren)

New IT test for filename 'Media (+)#9) ano' verifying correct URL decoding,
Content-Disposition encoding, and content delivery. 16/16 tests pass.

* fix authorization, comments, tests

* fix: change expected status from 401 to 403 for authenticated non-admin user

The test downloadBitstreamByHandleUnauthorizedForNonAdmin uses getClient(token)
which means the user IS authenticated. The controller correctly returns 403
(Forbidden) for authenticated users without access, not 401 (Unauthorized).
401 is only for anonymous/unauthenticated requests.

---------

Co-authored-by: Paurikova2 <michaela.paurikova@dataquest.sk>
Copilot AI review requested due to automatic review settings February 26, 2026 09:09
@milanmajchrak milanmajchrak merged commit ee9cb65 into customer/lindat Feb 26, 2026
12 checks passed
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Hotfix-focused PR for CLARIN-DSpace that adjusts item versioning metadata behavior, improves/extends bitstream preview & download functionality, and hardens OpenAIRE integration (with added/updated integration tests).

Changes:

  • Update versioning to ignore additional metadata fields during version creation and fix dc.relation.replaces clearing.
  • Add a new direct-download REST endpoint to fetch ORIGINAL bitstreams by item handle + filename, with extensive integration tests.
  • Improve preview content generation persistence (commit when previews are generated) and harden OpenAIRE connector/provider behavior with additional tests.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
dspace/config/spring/api/versioning-service.xml Adds more metadata fields to ignore when creating new versions.
dspace/config/dspace.cfg Adds an operational note near updateable-config settings.
dspace-server-webapp/src/test/java/org/dspace/app/rest/MetadataBitstreamRestRepositoryIT.java Updates preview-related test setup and assertions for stored preview creation.
dspace-server-webapp/src/test/java/org/dspace/app/rest/BitstreamByHandleRestControllerIT.java New integration tests for downloading bitstreams via handle+filename endpoint.
dspace-server-webapp/src/main/java/org/dspace/app/rest/repository/MetadataBitstreamRestRepository.java Commits Context when preview content is generated during byHandle search.
dspace-server-webapp/src/main/java/org/dspace/app/rest/link/process/SubmissionCCLicenseUrlResourceHalLinkFactory.java Minor inline documentation tweak while building URIs.
dspace-server-webapp/src/main/java/org/dspace/app/rest/BitstreamByHandleRestController.java New controller implementing GET/HEAD direct download by handle+filename, incl. S3 redirect support.
dspace-api/src/test/resources/org/dspace/scripts/filepreview/logos.tgz Adds a tgz fixture for preview/sync-storage testing.
dspace-api/src/test/java/org/dspace/workflow/WorkflowCurationIT.java Clears LegacyPlugin cache to stabilize workflow curation test behavior.
dspace-api/src/test/java/org/dspace/scripts/filepreview/FilePreviewIT.java Adds sync-storage preview test and refactors script-running helper.
dspace-api/src/test/java/org/dspace/identifier/ClarinVersionedHandleIdentifierProviderIT.java New integration test validating versioned handle metadata expectations.
dspace-api/src/test/java/org/dspace/external/provider/impl/OpenAIREFundingDataProviderTest.java Adds regression test ensuring null OpenAIRE responses yield 0 results (no NPE).
dspace-api/src/test/java/org/dspace/external/OpenAIRERestConnectorTest.java New test for OpenAIRE connector GET/unmarshal behavior (currently relies on checkstyle suppression).
dspace-api/src/test/java/org/dspace/builder/WorkspaceItemBuilder.java Adds builder helper for creating a bitstream with a specific store number.
dspace-api/src/main/java/org/dspace/versioning/DefaultItemVersionProvider.java Fixes clearMetadata to use Item.ANY for dc.relation.replaces.
dspace-api/src/main/java/org/dspace/storage/bitstore/SyncBitstreamStorageServiceImpl.java Adds retrieveFile implementation for sync bitstream storage.
dspace-api/src/main/java/org/dspace/external/provider/impl/OpenAIREFundingDataProvider.java Hardens parsing/null handling for OpenAIRE total-results count.
dspace-api/src/main/java/org/dspace/external/OpenAIRERestConnector.java Fixes returned InputStream lifetime by copying response content before closing HTTP client.
checkstyle-suppressions.xml Adds suppression entry for the new OpenAIRERestConnectorTest.

@@ -84,16 +85,16 @@ public void setup() throws Exception {
.build();

// create empty THUMBNAIL bundle
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test setup, the comment says an empty THUMBNAIL bundle is created, but the code actually creates an ORIGINAL bundle. Please update the comment (or bundle name) so it matches the behavior being tested.

Suggested change
// create empty THUMBNAIL bundle
// create empty ORIGINAL bundle

Copilot uses AI. Check for mistakes.
# Configuration files that can be updated via the admin API
# Comma-separated list of file names relative to ${dspace.dir}/config directory
# Only these files will be allowed for reading and updating through the REST API
# NOTE! This file should be mounted because after restarting the backend those changes will be lost
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This note is ambiguous: “This file should be mounted” appears in dspace.cfg but is adjacent to the config.admin.updateable.files setting. Consider rewording to explicitly mention which updateable config file(s) (e.g., item-submission.xml) must be mounted/persisted to avoid losing admin-API changes after a restart.

Suggested change
# NOTE! This file should be mounted because after restarting the backend those changes will be lost
# NOTE! Each file listed in config.admin.updateable.files (e.g. item-submission.xml) must be mounted/persisted,
# otherwise any changes made via the admin API will be lost after restarting the backend.

Copilot uses AI. Check for mistakes.
Comment on lines +47 to +52
@Autowired
AuthorizeService authorizeService;

@Autowired
BitstreamService bitstreamService;

Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test class declares an injected BitstreamService field but never uses it, which is likely to fail UnusedImports/unused-field checks and adds noise. Please remove the unused field (and any corresponding import) or use it in assertions.

Copilot uses AI. Check for mistakes.
Comment on lines +210 to +214
// the client will be closed, we need to copy the response stream to a new one that we can return
try (InputStream is = getResponse.getEntity().getContent()) {
byte[] bytes = is.readAllBytes();
result = new java.io.ByteArrayInputStream(bytes);
}
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get() now reads the entire HTTP response into memory via readAllBytes() before returning a ByteArrayInputStream. For large OpenAIRE responses this can cause high memory usage. Consider refactoring so unmarshalling happens inside the HTTP-client try-with-resources (or return an InputStream that closes the underlying response/client on close) to keep streaming behavior without leaking connections.

Copilot uses AI. Check for mistakes.
Comment on lines +41 to +46
// setup mocks so we don't have to set whole DSpace kernel etc.
// still, the idea is to test how the get method behaves
CloseableHttpClient httpClient = spy(HttpClientBuilder.create().build());
doReturn(httpClient.execute(new HttpGet(mockServer.url("").toString())))
.when(httpClient).execute(Mockito.any());

Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses HttpClientBuilder.create() (disallowed by the repository’s checkstyle Regexp rule) and relies on a checkstyle suppression. Prefer mocking CloseableHttpClient directly (or using the existing DSpaceHttpClientFactory abstraction) so the suppression in checkstyle-suppressions.xml isn’t needed.

Copilot uses AI. Check for mistakes.
Comment on lines +59 to +61
} catch (IOException e) {
e.printStackTrace();
}
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IOException is caught and only printStackTrace() is called, which can let the test pass silently on failures and is generally discouraged. Please fail the test (e.g., rethrow as AssertionError / use fail(...)) so an unexpected IOException causes a test failure.

Copilot uses AI. Check for mistakes.
on JMockIt Expectations blocks and similar. See https://github.com/checkstyle/checkstyle/issues/3739 -->
<suppress checks="Indentation" files="src[/\\]test[/\\]java"/>
<suppress checks="Regexp" files="DSpaceHttpClientFactory\.java"/>
<suppress checks="Regexp" files="OpenAIRERestConnectorTest\.java"/>
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a checkstyle suppression for OpenAIRERestConnectorTest.java weakens the repo’s enforced convention (avoid direct HttpClientBuilder.create() usage). It would be better to adjust the test to comply (e.g., mock CloseableHttpClient) and remove this suppression.

Suggested change
<suppress checks="Regexp" files="OpenAIRERestConnectorTest\.java"/>

Copilot uses AI. Check for mistakes.
/**
* Unit Tests for ClarinVersionedHandleIdentifierProvider
*
* @authorMilan Kuchtiak
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Javadoc typo: @authorMilan Kuchtiak is missing a space after @author, so it won’t render as intended. Please change it to @author Milan Kuchtiak.

Suggested change
* @authorMilan Kuchtiak
* @author Milan Kuchtiak

Copilot uses AI. Check for mistakes.
Comment on lines +134 to +167
configurationService.setProperty("sync.storage.service.enabled", true);

context.turnOffAuthorisationSystem();

WorkspaceItem wItem2;
try (InputStream tgzFile = getClass().getResourceAsStream("logos.tgz")) {
wItem2 = WorkspaceItemBuilder.createWorkspaceItem(context, collection)
.withBitstream("logos.tgz", "/local/path/logos.tgz", tgzFile, SYNC_STORE_NUMBER)
.build();
}

context.restoreAuthSystemState();

// Get the item and its bitstream
Item item2 = wItem2.getItem();
List<Bundle> bundles = item2.getBundles();
Bitstream bitstream2 = bundles.get(0).getBitstreams().get(0);

// Set the bitstream format to application/zip
BitstreamFormat bitstreamFormat = bitstreamFormatService.findByMIMEType(context, "application/x-gtar");
bitstream2.setFormat(context, bitstreamFormat);
bitstreamService.update(context, bitstream2);
context.commit();
context.reloadEntity(bitstream2);
context.reloadEntity(item2);

runScriptForItemWithBitstreams(item2);

Bitstream b2 = bitstreamService.findAll(context).stream()
.filter(b -> b.getStoreNumber() == SYNC_STORE_NUMBER)
.findFirst().orElse(null);

assertNotNull(b2);
assertTrue("Expects preview content created and stored.", previewContentService.hasPreview(context, b2));
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configurationService.setProperty("sync.storage.service.enabled", true) is not reset after this test, which can leak state into subsequent tests (ConfigurationService is shared). Please restore the previous value in a finally block or reset it in an @After method.

Suggested change
configurationService.setProperty("sync.storage.service.enabled", true);
context.turnOffAuthorisationSystem();
WorkspaceItem wItem2;
try (InputStream tgzFile = getClass().getResourceAsStream("logos.tgz")) {
wItem2 = WorkspaceItemBuilder.createWorkspaceItem(context, collection)
.withBitstream("logos.tgz", "/local/path/logos.tgz", tgzFile, SYNC_STORE_NUMBER)
.build();
}
context.restoreAuthSystemState();
// Get the item and its bitstream
Item item2 = wItem2.getItem();
List<Bundle> bundles = item2.getBundles();
Bitstream bitstream2 = bundles.get(0).getBitstreams().get(0);
// Set the bitstream format to application/zip
BitstreamFormat bitstreamFormat = bitstreamFormatService.findByMIMEType(context, "application/x-gtar");
bitstream2.setFormat(context, bitstreamFormat);
bitstreamService.update(context, bitstream2);
context.commit();
context.reloadEntity(bitstream2);
context.reloadEntity(item2);
runScriptForItemWithBitstreams(item2);
Bitstream b2 = bitstreamService.findAll(context).stream()
.filter(b -> b.getStoreNumber() == SYNC_STORE_NUMBER)
.findFirst().orElse(null);
assertNotNull(b2);
assertTrue("Expects preview content created and stored.", previewContentService.hasPreview(context, b2));
final String syncStorageEnabledProperty = "sync.storage.service.enabled";
boolean previousSyncStorageEnabled = configurationService.getBooleanProperty(syncStorageEnabledProperty);
try {
configurationService.setProperty(syncStorageEnabledProperty, true);
context.turnOffAuthorisationSystem();
WorkspaceItem wItem2;
try (InputStream tgzFile = getClass().getResourceAsStream("logos.tgz")) {
wItem2 = WorkspaceItemBuilder.createWorkspaceItem(context, collection)
.withBitstream("logos.tgz", "/local/path/logos.tgz", tgzFile, SYNC_STORE_NUMBER)
.build();
}
context.restoreAuthSystemState();
// Get the item and its bitstream
Item item2 = wItem2.getItem();
List<Bundle> bundles = item2.getBundles();
Bitstream bitstream2 = bundles.get(0).getBitstreams().get(0);
// Set the bitstream format to application/zip
BitstreamFormat bitstreamFormat = bitstreamFormatService.findByMIMEType(context, "application/x-gtar");
bitstream2.setFormat(context, bitstreamFormat);
bitstreamService.update(context, bitstream2);
context.commit();
context.reloadEntity(bitstream2);
context.reloadEntity(item2);
runScriptForItemWithBitstreams(item2);
Bitstream b2 = bitstreamService.findAll(context).stream()
.filter(b -> b.getStoreNumber() == SYNC_STORE_NUMBER)
.findFirst().orElse(null);
assertNotNull(b2);
assertTrue("Expects preview content created and stored.",
previewContentService.hasPreview(context, b2));
} finally {
configurationService.setProperty(syncStorageEnabledProperty, previousSyncStorageEnabled);
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants