Skip to content

refactor: remove RERODOC harvesting pipeline and legacy URL support#1094

Merged
PascalRepond merged 1 commit intorero:stagingfrom
PascalRepond:rep-rerodoc
Mar 24, 2026
Merged

refactor: remove RERODOC harvesting pipeline and legacy URL support#1094
PascalRepond merged 1 commit intorero:stagingfrom
PascalRepond:rep-rerodoc

Conversation

@PascalRepond
Copy link
Copy Markdown
Contributor

RERO DOC repository is now closed. Remove the MARC21 transformation pipeline, CLI commands, loader schema, rerod PID minting, /rerodoc/ redirection routes, OAI source config, and all associated tests.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 17, 2026

Walkthrough

This pull request removes RERO DOC integration from the SONAR codebase, deleting its OAI source, CLI commands, DoJSON models/overdo logic, schema registration, PID minting and redirection logic, and multiple related tests and fixtures.

Changes

Cohort / File(s) Summary
Data and Configuration
data/oai_sources.json
Removed the rerodoc OAI source entry and its metadata configuration.
CLI Commands
sonar/modules/documents/cli/documents.py, sonar/modules/documents/cli/rerodoc.py
Removed the rerodoc CLI subcommand registration and deleted its update-file-permissions implementation.
DoJSON Models
sonar/modules/documents/dojson/rerodoc/__init__.py, sonar/modules/documents/dojson/rerodoc/model.py, sonar/modules/documents/dojson/rerodoc/model.py_hesso
Deleted the RERODOC MARC21 → DoJSON transformation modules and their mapping/transform logic.
Overdo Transformation
sonar/modules/documents/dojson/rerodoc/overdo.py
Removed the Overdo subclass and its organisation provisioning, contributor-role mapping, verification, and default-license injection behavior.
Schema Definitions
sonar/modules/documents/loaders/schemas/rerodoc.py, sonar/modules/documents/loaders/schemas/factory.py
Deleted RerodocSchema and removed "rerodoc" from LoaderSchemaFactory.schemas.
PID Minting & Redirects
sonar/modules/documents/minters.py, sonar/theme/views.py
Removed RERO DOC PID minting path and the /rerodoc/... redirection route and related imports.
Test Fixtures & Data
tests/conftest.py, tests/ui/conftest.py, tests/ui/data/harvested_record.xml
Removed a RERO DOC identifiedBy fixture entry, adjusted fixture docstring, and simplified the harvested MARCXML test record to minimal fields.
Tests — CLI / Receivers / Views
tests/ui/documents/cli/test_documents_cli_ark.py, tests/ui/documents/cli/test_documents_cli_rerodoc.py, tests/ui/documents/test_documents_receivers.py, tests/ui/test_views.py
Deleted the RERODOC CLI test module, updated tests to use archive_ouverte_unige/ARK where applicable, and removed rerodoc redirection tests.
Tests — DoJSON & Loader
tests/ui/documents/dojson/rerodoc/test_rerodoc_model.py, tests/ui/documents/dojson/rerodoc/test_rerodoc_overdo.py, tests/unit/documents/loaders/test_loader_schema_factory.py, tests/unit/documents/loaders/test_rerodoc_loader.py
Removed comprehensive RERODOC transformation and Overdo test suites; adjusted loader factory test to expect ArchiveOuverteUnigeSchema.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and accurately describes the main change: removal of the RERODOC harvesting pipeline and legacy URL support.
Description check ✅ Passed The description is directly related to the changeset, explaining that RERO DOC is closed and detailing what components are being removed.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/documents/loaders/test_loader_schema_factory.py (1)

29-31: ⚠️ Potential issue | 🟡 Minor

Assertion is unreachable and uses incorrect accessor.

The assertion on line 31 is inside the with block, so it never executes—control exits the block as soon as the exception is raised on line 30. Additionally, str(exception) should be str(exception.value) to get the exception message.

🐛 Proposed fix
     with pytest.raises(Exception) as exception:
         schema = LoaderSchemaFactory.create("not-existing")
-        assert str(exception) == 'No schema defined for key "not-existing"'
+    assert str(exception.value) == 'No schema defined for key "not-existing"'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/documents/loaders/test_loader_schema_factory.py` around lines 29 -
31, The test's assertion is inside the pytest.raises context and uses the wrong
accessor; update the test that calls LoaderSchemaFactory.create("not-existing")
so the assert moves outside the with pytest.raises(Exception) as exception:
block and compare the raised message via str(exception.value) (e.g. assert
str(exception.value) == 'No schema defined for key "not-existing"') to correctly
verify the exception message from LoaderSchemaFactory.create.
🧹 Nitpick comments (1)
sonar/modules/documents/minters.py (1)

47-56: Docstring return contract is inaccurate.

On Line 55, the docstring says this function returns a PID object, but external_minters(...) does not return anything. Please update the docstring to avoid misleading API expectations.

✏️ Suggested docstring fix
 def external_minters(record_uuid, data, pid_key="pid"):
     """External minters.

-    ARK.
+    ARK.

     :param record_uuid: Record UUID.
     :param data: Record data.
     :param pid_key: PID key.
-    :returns: Created PID object.
+    :returns: None.
     """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sonar/modules/documents/minters.py` around lines 47 - 56, The docstring for
external_minters incorrectly states it returns a PID object; update the
docstring for the function external_minters to reflect that it does not return a
value (e.g., remove or change the ":returns:" section to indicate it returns
None or has no return), and ensure the summary and parameter descriptions remain
accurate so callers aren't misled about a return value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/ui/data/harvested_record.xml`:
- Around line 1-6: The provided test XML lacks the MARC21 wrapper/namespace and
leader expected by Marc21Schema.parse_xml/create_record, so update the test
fixture used in harvested_record.xml to a valid MARC21 Slim document: wrap
records in a marc:collection and each record in marc:record, add the
xmlns:marc="http://www.loc.gov/MARC21/slim" declaration, and include a
marc:leader element alongside controlfield and datafield tags.

---

Outside diff comments:
In `@tests/unit/documents/loaders/test_loader_schema_factory.py`:
- Around line 29-31: The test's assertion is inside the pytest.raises context
and uses the wrong accessor; update the test that calls
LoaderSchemaFactory.create("not-existing") so the assert moves outside the with
pytest.raises(Exception) as exception: block and compare the raised message via
str(exception.value) (e.g. assert str(exception.value) == 'No schema defined for
key "not-existing"') to correctly verify the exception message from
LoaderSchemaFactory.create.

---

Nitpick comments:
In `@sonar/modules/documents/minters.py`:
- Around line 47-56: The docstring for external_minters incorrectly states it
returns a PID object; update the docstring for the function external_minters to
reflect that it does not return a value (e.g., remove or change the ":returns:"
section to indicate it returns None or has no return), and ensure the summary
and parameter descriptions remain accurate so callers aren't misled about a
return value.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0615d2b1-ade1-4a0f-ba74-5a33baddc6d7

📥 Commits

Reviewing files that changed from the base of the PR and between c7d96da and 7af7392.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (23)
  • data/oai_sources.json
  • sonar/modules/documents/cli/documents.py
  • sonar/modules/documents/cli/rerodoc.py
  • sonar/modules/documents/dojson/rerodoc/__init__.py
  • sonar/modules/documents/dojson/rerodoc/model.py
  • sonar/modules/documents/dojson/rerodoc/model.py_hesso
  • sonar/modules/documents/dojson/rerodoc/overdo.py
  • sonar/modules/documents/loaders/schemas/factory.py
  • sonar/modules/documents/loaders/schemas/rerodoc.py
  • sonar/modules/documents/minters.py
  • sonar/theme/views.py
  • tests/conftest.py
  • tests/ui/conftest.py
  • tests/ui/data/harvested_record.xml
  • tests/ui/documents/cli/test_documents_cli_ark.py
  • tests/ui/documents/cli/test_documents_cli_rerodoc.py
  • tests/ui/documents/dojson/rerodoc/test_rerodoc_model.py
  • tests/ui/documents/dojson/rerodoc/test_rerodoc_overdo.py
  • tests/ui/documents/test_documents_api.py
  • tests/ui/documents/test_documents_receivers.py
  • tests/ui/test_views.py
  • tests/unit/documents/loaders/test_loader_schema_factory.py
  • tests/unit/documents/loaders/test_rerodoc_loader.py
💤 Files with no reviewable changes (11)
  • tests/unit/documents/loaders/test_rerodoc_loader.py
  • data/oai_sources.json
  • sonar/modules/documents/dojson/rerodoc/init.py
  • tests/ui/documents/dojson/rerodoc/test_rerodoc_overdo.py
  • sonar/modules/documents/loaders/schemas/rerodoc.py
  • sonar/modules/documents/dojson/rerodoc/model.py_hesso
  • sonar/modules/documents/dojson/rerodoc/model.py
  • sonar/modules/documents/dojson/rerodoc/overdo.py
  • sonar/modules/documents/cli/rerodoc.py
  • tests/ui/documents/dojson/rerodoc/test_rerodoc_model.py
  • tests/ui/documents/cli/test_documents_cli_rerodoc.py

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
sonar/modules/documents/loaders/schemas/factory.py (1)

37-40: Use KeyError instead of generic Exception for missing schema keys.

Raising a broad Exception makes caller-side error handling less precise. KeyError is more semantically appropriate for dictionary lookups. Additionally, the logic can be simplified by using in instead of .get():

Suggested change
     `@staticmethod`
     def create(schema_key):
@@
-        if LoaderSchemaFactory.schemas.get(schema_key):
-            return LoaderSchemaFactory.schemas[schema_key]()
-
-        raise Exception(f'No schema defined for key "{schema_key}"')
+        if schema_key in LoaderSchemaFactory.schemas:
+            return LoaderSchemaFactory.schemas[schema_key]()
+
+        raise KeyError(f'No schema defined for key "{schema_key}"')

The existing test (which catches Exception) will still pass since KeyError is a subclass of Exception. Consider also updating the test assertion to pytest.raises(KeyError) for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sonar/modules/documents/loaders/schemas/factory.py` around lines 37 - 40, The
factory currently uses LoaderSchemaFactory.schemas.get(schema_key) and raises a
generic Exception on miss; change this to check membership with "if schema_key
in LoaderSchemaFactory.schemas" and return
LoaderSchemaFactory.schemas[schema_key]() when present, and raise
KeyError(schema_key) when missing so callers can catch a KeyError specifically;
update any tests to expect KeyError (pytest.raises(KeyError)) if desired.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@sonar/modules/documents/loaders/schemas/factory.py`:
- Around line 37-40: The factory currently uses
LoaderSchemaFactory.schemas.get(schema_key) and raises a generic Exception on
miss; change this to check membership with "if schema_key in
LoaderSchemaFactory.schemas" and return
LoaderSchemaFactory.schemas[schema_key]() when present, and raise
KeyError(schema_key) when missing so callers can catch a KeyError specifically;
update any tests to expect KeyError (pytest.raises(KeyError)) if desired.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a40c1bef-3f99-48e8-9ab5-7c25f281f2f2

📥 Commits

Reviewing files that changed from the base of the PR and between 7af7392 and 7f5f285.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (23)
  • data/oai_sources.json
  • sonar/modules/documents/cli/documents.py
  • sonar/modules/documents/cli/rerodoc.py
  • sonar/modules/documents/dojson/rerodoc/__init__.py
  • sonar/modules/documents/dojson/rerodoc/model.py
  • sonar/modules/documents/dojson/rerodoc/model.py_hesso
  • sonar/modules/documents/dojson/rerodoc/overdo.py
  • sonar/modules/documents/loaders/schemas/factory.py
  • sonar/modules/documents/loaders/schemas/rerodoc.py
  • sonar/modules/documents/minters.py
  • sonar/theme/views.py
  • tests/conftest.py
  • tests/ui/conftest.py
  • tests/ui/data/harvested_record.xml
  • tests/ui/documents/cli/test_documents_cli_ark.py
  • tests/ui/documents/cli/test_documents_cli_rerodoc.py
  • tests/ui/documents/dojson/rerodoc/test_rerodoc_model.py
  • tests/ui/documents/dojson/rerodoc/test_rerodoc_overdo.py
  • tests/ui/documents/test_documents_api.py
  • tests/ui/documents/test_documents_receivers.py
  • tests/ui/test_views.py
  • tests/unit/documents/loaders/test_loader_schema_factory.py
  • tests/unit/documents/loaders/test_rerodoc_loader.py
💤 Files with no reviewable changes (11)
  • sonar/modules/documents/loaders/schemas/rerodoc.py
  • data/oai_sources.json
  • tests/unit/documents/loaders/test_rerodoc_loader.py
  • sonar/modules/documents/dojson/rerodoc/init.py
  • sonar/modules/documents/cli/rerodoc.py
  • tests/ui/documents/cli/test_documents_cli_rerodoc.py
  • tests/ui/documents/dojson/rerodoc/test_rerodoc_overdo.py
  • sonar/modules/documents/dojson/rerodoc/model.py_hesso
  • sonar/modules/documents/dojson/rerodoc/model.py
  • sonar/modules/documents/dojson/rerodoc/overdo.py
  • tests/ui/documents/dojson/rerodoc/test_rerodoc_model.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • sonar/modules/documents/cli/documents.py
  • tests/ui/conftest.py
  • tests/ui/documents/cli/test_documents_cli_ark.py
  • tests/unit/documents/loaders/test_loader_schema_factory.py
  • tests/conftest.py

@PascalRepond PascalRepond requested review from jma and rerowep March 17, 2026 15:16
RERO DOC repository is now closed. Remove the MARC21 transformation
pipeline, CLI commands, loader schema, rerod PID minting, /rerodoc/
redirection routes, OAI source config, and all associated tests.

Co-Authored-by: Pascal Repond <pascal.repond@rero.ch>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unit/documents/loaders/test_loader_schema_factory.py (1)

24-31: Consider adding test coverage for the boris schema.

The factory now contains two schemas (archive_ouverte_unige and boris), but the test only validates one. Adding a test for boris would ensure both remaining schemas are covered.

🧪 Suggested test expansion
+from sonar.modules.documents.loaders.schemas.boris import BorisSchema
 from sonar.modules.documents.loaders.schemas.archive_ouverte_unige import ArchiveOuverteUnigeSchema
 from sonar.modules.documents.loaders.schemas.factory import LoaderSchemaFactory


 def test_loader_schema_factory():
     """Test loader schema factory."""
     schema = LoaderSchemaFactory.create("archive_ouverte_unige")
     assert isinstance(schema, ArchiveOuverteUnigeSchema)

+    schema = LoaderSchemaFactory.create("boris")
+    assert isinstance(schema, BorisSchema)
+
     with pytest.raises(Exception) as exception:
         LoaderSchemaFactory.create("not-existing")
     assert str(exception.value) == 'No schema defined for key "not-existing"'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/documents/loaders/test_loader_schema_factory.py` around lines 24 -
31, Add a test that also asserts the factory returns the boris schema: call
LoaderSchemaFactory.create("boris") and assert the returned object is an
instance of the BorIS schema class (e.g., BorisSchema or the exact class name
used for the "boris" entry); keep the existing negative test for "not-existing".
Reference LoaderSchemaFactory.create and the boris schema class when adding the
new assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/unit/documents/loaders/test_loader_schema_factory.py`:
- Around line 24-31: Add a test that also asserts the factory returns the boris
schema: call LoaderSchemaFactory.create("boris") and assert the returned object
is an instance of the BorIS schema class (e.g., BorisSchema or the exact class
name used for the "boris" entry); keep the existing negative test for
"not-existing". Reference LoaderSchemaFactory.create and the boris schema class
when adding the new assertion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3217e8b4-6ca0-489f-b5dd-60ac2d9bdd2f

📥 Commits

Reviewing files that changed from the base of the PR and between 7f5f285 and 3e28af2.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (23)
  • data/oai_sources.json
  • sonar/modules/documents/cli/documents.py
  • sonar/modules/documents/cli/rerodoc.py
  • sonar/modules/documents/dojson/rerodoc/__init__.py
  • sonar/modules/documents/dojson/rerodoc/model.py
  • sonar/modules/documents/dojson/rerodoc/model.py_hesso
  • sonar/modules/documents/dojson/rerodoc/overdo.py
  • sonar/modules/documents/loaders/schemas/factory.py
  • sonar/modules/documents/loaders/schemas/rerodoc.py
  • sonar/modules/documents/minters.py
  • sonar/theme/views.py
  • tests/conftest.py
  • tests/ui/conftest.py
  • tests/ui/data/harvested_record.xml
  • tests/ui/documents/cli/test_documents_cli_ark.py
  • tests/ui/documents/cli/test_documents_cli_rerodoc.py
  • tests/ui/documents/dojson/rerodoc/test_rerodoc_model.py
  • tests/ui/documents/dojson/rerodoc/test_rerodoc_overdo.py
  • tests/ui/documents/test_documents_api.py
  • tests/ui/documents/test_documents_receivers.py
  • tests/ui/test_views.py
  • tests/unit/documents/loaders/test_loader_schema_factory.py
  • tests/unit/documents/loaders/test_rerodoc_loader.py
💤 Files with no reviewable changes (11)
  • sonar/modules/documents/dojson/rerodoc/init.py
  • data/oai_sources.json
  • tests/ui/documents/cli/test_documents_cli_rerodoc.py
  • tests/ui/documents/dojson/rerodoc/test_rerodoc_overdo.py
  • sonar/modules/documents/loaders/schemas/rerodoc.py
  • sonar/modules/documents/dojson/rerodoc/model.py
  • sonar/modules/documents/dojson/rerodoc/overdo.py
  • sonar/modules/documents/cli/rerodoc.py
  • tests/unit/documents/loaders/test_rerodoc_loader.py
  • sonar/modules/documents/dojson/rerodoc/model.py_hesso
  • tests/ui/documents/dojson/rerodoc/test_rerodoc_model.py
✅ Files skipped from review due to trivial changes (6)
  • tests/ui/documents/cli/test_documents_cli_ark.py
  • tests/ui/conftest.py
  • sonar/modules/documents/cli/documents.py
  • tests/ui/documents/test_documents_receivers.py
  • tests/ui/data/harvested_record.xml
  • tests/ui/documents/test_documents_api.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • sonar/modules/documents/minters.py
  • tests/ui/test_views.py
  • tests/conftest.py
  • sonar/theme/views.py

@PascalRepond PascalRepond merged commit fa3bacf into rero:staging Mar 24, 2026
5 checks passed
@PascalRepond PascalRepond deleted the rep-rerodoc branch March 24, 2026 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants