refactor: remove RERODOC harvesting pipeline and legacy URL support#1094
refactor: remove RERODOC harvesting pipeline and legacy URL support#1094PascalRepond merged 1 commit intorero:stagingfrom
Conversation
WalkthroughThis pull request removes RERO DOC integration from the SONAR codebase, deleting its OAI source, CLI commands, DoJSON models/overdo logic, schema registration, PID minting and redirection logic, and multiple related tests and fixtures. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/documents/loaders/test_loader_schema_factory.py (1)
29-31:⚠️ Potential issue | 🟡 MinorAssertion is unreachable and uses incorrect accessor.
The assertion on line 31 is inside the
withblock, so it never executes—control exits the block as soon as the exception is raised on line 30. Additionally,str(exception)should bestr(exception.value)to get the exception message.🐛 Proposed fix
with pytest.raises(Exception) as exception: schema = LoaderSchemaFactory.create("not-existing") - assert str(exception) == 'No schema defined for key "not-existing"' + assert str(exception.value) == 'No schema defined for key "not-existing"'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/documents/loaders/test_loader_schema_factory.py` around lines 29 - 31, The test's assertion is inside the pytest.raises context and uses the wrong accessor; update the test that calls LoaderSchemaFactory.create("not-existing") so the assert moves outside the with pytest.raises(Exception) as exception: block and compare the raised message via str(exception.value) (e.g. assert str(exception.value) == 'No schema defined for key "not-existing"') to correctly verify the exception message from LoaderSchemaFactory.create.
🧹 Nitpick comments (1)
sonar/modules/documents/minters.py (1)
47-56: Docstring return contract is inaccurate.On Line 55, the docstring says this function returns a PID object, but
external_minters(...)does not return anything. Please update the docstring to avoid misleading API expectations.✏️ Suggested docstring fix
def external_minters(record_uuid, data, pid_key="pid"): """External minters. - ARK. + ARK. :param record_uuid: Record UUID. :param data: Record data. :param pid_key: PID key. - :returns: Created PID object. + :returns: None. """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sonar/modules/documents/minters.py` around lines 47 - 56, The docstring for external_minters incorrectly states it returns a PID object; update the docstring for the function external_minters to reflect that it does not return a value (e.g., remove or change the ":returns:" section to indicate it returns None or has no return), and ensure the summary and parameter descriptions remain accurate so callers aren't misled about a return value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/ui/data/harvested_record.xml`:
- Around line 1-6: The provided test XML lacks the MARC21 wrapper/namespace and
leader expected by Marc21Schema.parse_xml/create_record, so update the test
fixture used in harvested_record.xml to a valid MARC21 Slim document: wrap
records in a marc:collection and each record in marc:record, add the
xmlns:marc="http://www.loc.gov/MARC21/slim" declaration, and include a
marc:leader element alongside controlfield and datafield tags.
---
Outside diff comments:
In `@tests/unit/documents/loaders/test_loader_schema_factory.py`:
- Around line 29-31: The test's assertion is inside the pytest.raises context
and uses the wrong accessor; update the test that calls
LoaderSchemaFactory.create("not-existing") so the assert moves outside the with
pytest.raises(Exception) as exception: block and compare the raised message via
str(exception.value) (e.g. assert str(exception.value) == 'No schema defined for
key "not-existing"') to correctly verify the exception message from
LoaderSchemaFactory.create.
---
Nitpick comments:
In `@sonar/modules/documents/minters.py`:
- Around line 47-56: The docstring for external_minters incorrectly states it
returns a PID object; update the docstring for the function external_minters to
reflect that it does not return a value (e.g., remove or change the ":returns:"
section to indicate it returns None or has no return), and ensure the summary
and parameter descriptions remain accurate so callers aren't misled about a
return value.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0615d2b1-ade1-4a0f-ba74-5a33baddc6d7
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (23)
data/oai_sources.jsonsonar/modules/documents/cli/documents.pysonar/modules/documents/cli/rerodoc.pysonar/modules/documents/dojson/rerodoc/__init__.pysonar/modules/documents/dojson/rerodoc/model.pysonar/modules/documents/dojson/rerodoc/model.py_hessosonar/modules/documents/dojson/rerodoc/overdo.pysonar/modules/documents/loaders/schemas/factory.pysonar/modules/documents/loaders/schemas/rerodoc.pysonar/modules/documents/minters.pysonar/theme/views.pytests/conftest.pytests/ui/conftest.pytests/ui/data/harvested_record.xmltests/ui/documents/cli/test_documents_cli_ark.pytests/ui/documents/cli/test_documents_cli_rerodoc.pytests/ui/documents/dojson/rerodoc/test_rerodoc_model.pytests/ui/documents/dojson/rerodoc/test_rerodoc_overdo.pytests/ui/documents/test_documents_api.pytests/ui/documents/test_documents_receivers.pytests/ui/test_views.pytests/unit/documents/loaders/test_loader_schema_factory.pytests/unit/documents/loaders/test_rerodoc_loader.py
💤 Files with no reviewable changes (11)
- tests/unit/documents/loaders/test_rerodoc_loader.py
- data/oai_sources.json
- sonar/modules/documents/dojson/rerodoc/init.py
- tests/ui/documents/dojson/rerodoc/test_rerodoc_overdo.py
- sonar/modules/documents/loaders/schemas/rerodoc.py
- sonar/modules/documents/dojson/rerodoc/model.py_hesso
- sonar/modules/documents/dojson/rerodoc/model.py
- sonar/modules/documents/dojson/rerodoc/overdo.py
- sonar/modules/documents/cli/rerodoc.py
- tests/ui/documents/dojson/rerodoc/test_rerodoc_model.py
- tests/ui/documents/cli/test_documents_cli_rerodoc.py
7af7392 to
7f5f285
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
sonar/modules/documents/loaders/schemas/factory.py (1)
37-40: UseKeyErrorinstead of genericExceptionfor missing schema keys.Raising a broad
Exceptionmakes caller-side error handling less precise.KeyErroris more semantically appropriate for dictionary lookups. Additionally, the logic can be simplified by usingininstead of.get():Suggested change
`@staticmethod` def create(schema_key): @@ - if LoaderSchemaFactory.schemas.get(schema_key): - return LoaderSchemaFactory.schemas[schema_key]() - - raise Exception(f'No schema defined for key "{schema_key}"') + if schema_key in LoaderSchemaFactory.schemas: + return LoaderSchemaFactory.schemas[schema_key]() + + raise KeyError(f'No schema defined for key "{schema_key}"')The existing test (which catches
Exception) will still pass sinceKeyErroris a subclass ofException. Consider also updating the test assertion topytest.raises(KeyError)for consistency.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sonar/modules/documents/loaders/schemas/factory.py` around lines 37 - 40, The factory currently uses LoaderSchemaFactory.schemas.get(schema_key) and raises a generic Exception on miss; change this to check membership with "if schema_key in LoaderSchemaFactory.schemas" and return LoaderSchemaFactory.schemas[schema_key]() when present, and raise KeyError(schema_key) when missing so callers can catch a KeyError specifically; update any tests to expect KeyError (pytest.raises(KeyError)) if desired.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@sonar/modules/documents/loaders/schemas/factory.py`:
- Around line 37-40: The factory currently uses
LoaderSchemaFactory.schemas.get(schema_key) and raises a generic Exception on
miss; change this to check membership with "if schema_key in
LoaderSchemaFactory.schemas" and return
LoaderSchemaFactory.schemas[schema_key]() when present, and raise
KeyError(schema_key) when missing so callers can catch a KeyError specifically;
update any tests to expect KeyError (pytest.raises(KeyError)) if desired.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a40c1bef-3f99-48e8-9ab5-7c25f281f2f2
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (23)
data/oai_sources.jsonsonar/modules/documents/cli/documents.pysonar/modules/documents/cli/rerodoc.pysonar/modules/documents/dojson/rerodoc/__init__.pysonar/modules/documents/dojson/rerodoc/model.pysonar/modules/documents/dojson/rerodoc/model.py_hessosonar/modules/documents/dojson/rerodoc/overdo.pysonar/modules/documents/loaders/schemas/factory.pysonar/modules/documents/loaders/schemas/rerodoc.pysonar/modules/documents/minters.pysonar/theme/views.pytests/conftest.pytests/ui/conftest.pytests/ui/data/harvested_record.xmltests/ui/documents/cli/test_documents_cli_ark.pytests/ui/documents/cli/test_documents_cli_rerodoc.pytests/ui/documents/dojson/rerodoc/test_rerodoc_model.pytests/ui/documents/dojson/rerodoc/test_rerodoc_overdo.pytests/ui/documents/test_documents_api.pytests/ui/documents/test_documents_receivers.pytests/ui/test_views.pytests/unit/documents/loaders/test_loader_schema_factory.pytests/unit/documents/loaders/test_rerodoc_loader.py
💤 Files with no reviewable changes (11)
- sonar/modules/documents/loaders/schemas/rerodoc.py
- data/oai_sources.json
- tests/unit/documents/loaders/test_rerodoc_loader.py
- sonar/modules/documents/dojson/rerodoc/init.py
- sonar/modules/documents/cli/rerodoc.py
- tests/ui/documents/cli/test_documents_cli_rerodoc.py
- tests/ui/documents/dojson/rerodoc/test_rerodoc_overdo.py
- sonar/modules/documents/dojson/rerodoc/model.py_hesso
- sonar/modules/documents/dojson/rerodoc/model.py
- sonar/modules/documents/dojson/rerodoc/overdo.py
- tests/ui/documents/dojson/rerodoc/test_rerodoc_model.py
🚧 Files skipped from review as they are similar to previous changes (5)
- sonar/modules/documents/cli/documents.py
- tests/ui/conftest.py
- tests/ui/documents/cli/test_documents_cli_ark.py
- tests/unit/documents/loaders/test_loader_schema_factory.py
- tests/conftest.py
RERO DOC repository is now closed. Remove the MARC21 transformation pipeline, CLI commands, loader schema, rerod PID minting, /rerodoc/ redirection routes, OAI source config, and all associated tests. Co-Authored-by: Pascal Repond <pascal.repond@rero.ch>
7f5f285 to
3e28af2
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unit/documents/loaders/test_loader_schema_factory.py (1)
24-31: Consider adding test coverage for theborisschema.The factory now contains two schemas (
archive_ouverte_unigeandboris), but the test only validates one. Adding a test forboriswould ensure both remaining schemas are covered.🧪 Suggested test expansion
+from sonar.modules.documents.loaders.schemas.boris import BorisSchema from sonar.modules.documents.loaders.schemas.archive_ouverte_unige import ArchiveOuverteUnigeSchema from sonar.modules.documents.loaders.schemas.factory import LoaderSchemaFactory def test_loader_schema_factory(): """Test loader schema factory.""" schema = LoaderSchemaFactory.create("archive_ouverte_unige") assert isinstance(schema, ArchiveOuverteUnigeSchema) + schema = LoaderSchemaFactory.create("boris") + assert isinstance(schema, BorisSchema) + with pytest.raises(Exception) as exception: LoaderSchemaFactory.create("not-existing") assert str(exception.value) == 'No schema defined for key "not-existing"'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/documents/loaders/test_loader_schema_factory.py` around lines 24 - 31, Add a test that also asserts the factory returns the boris schema: call LoaderSchemaFactory.create("boris") and assert the returned object is an instance of the BorIS schema class (e.g., BorisSchema or the exact class name used for the "boris" entry); keep the existing negative test for "not-existing". Reference LoaderSchemaFactory.create and the boris schema class when adding the new assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/unit/documents/loaders/test_loader_schema_factory.py`:
- Around line 24-31: Add a test that also asserts the factory returns the boris
schema: call LoaderSchemaFactory.create("boris") and assert the returned object
is an instance of the BorIS schema class (e.g., BorisSchema or the exact class
name used for the "boris" entry); keep the existing negative test for
"not-existing". Reference LoaderSchemaFactory.create and the boris schema class
when adding the new assertion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3217e8b4-6ca0-489f-b5dd-60ac2d9bdd2f
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (23)
data/oai_sources.jsonsonar/modules/documents/cli/documents.pysonar/modules/documents/cli/rerodoc.pysonar/modules/documents/dojson/rerodoc/__init__.pysonar/modules/documents/dojson/rerodoc/model.pysonar/modules/documents/dojson/rerodoc/model.py_hessosonar/modules/documents/dojson/rerodoc/overdo.pysonar/modules/documents/loaders/schemas/factory.pysonar/modules/documents/loaders/schemas/rerodoc.pysonar/modules/documents/minters.pysonar/theme/views.pytests/conftest.pytests/ui/conftest.pytests/ui/data/harvested_record.xmltests/ui/documents/cli/test_documents_cli_ark.pytests/ui/documents/cli/test_documents_cli_rerodoc.pytests/ui/documents/dojson/rerodoc/test_rerodoc_model.pytests/ui/documents/dojson/rerodoc/test_rerodoc_overdo.pytests/ui/documents/test_documents_api.pytests/ui/documents/test_documents_receivers.pytests/ui/test_views.pytests/unit/documents/loaders/test_loader_schema_factory.pytests/unit/documents/loaders/test_rerodoc_loader.py
💤 Files with no reviewable changes (11)
- sonar/modules/documents/dojson/rerodoc/init.py
- data/oai_sources.json
- tests/ui/documents/cli/test_documents_cli_rerodoc.py
- tests/ui/documents/dojson/rerodoc/test_rerodoc_overdo.py
- sonar/modules/documents/loaders/schemas/rerodoc.py
- sonar/modules/documents/dojson/rerodoc/model.py
- sonar/modules/documents/dojson/rerodoc/overdo.py
- sonar/modules/documents/cli/rerodoc.py
- tests/unit/documents/loaders/test_rerodoc_loader.py
- sonar/modules/documents/dojson/rerodoc/model.py_hesso
- tests/ui/documents/dojson/rerodoc/test_rerodoc_model.py
✅ Files skipped from review due to trivial changes (6)
- tests/ui/documents/cli/test_documents_cli_ark.py
- tests/ui/conftest.py
- sonar/modules/documents/cli/documents.py
- tests/ui/documents/test_documents_receivers.py
- tests/ui/data/harvested_record.xml
- tests/ui/documents/test_documents_api.py
🚧 Files skipped from review as they are similar to previous changes (4)
- sonar/modules/documents/minters.py
- tests/ui/test_views.py
- tests/conftest.py
- sonar/theme/views.py
RERO DOC repository is now closed. Remove the MARC21 transformation pipeline, CLI commands, loader schema, rerod PID minting, /rerodoc/ redirection routes, OAI source config, and all associated tests.