Skip to content

Migration lep#520

Open
kpsherva wants to merge 6 commits into
CERNDocumentServer:masterfrom
kpsherva:migration-lep
Open

Migration lep#520
kpsherva wants to merge 6 commits into
CERNDocumentServer:masterfrom
kpsherva:migration-lep

Conversation

@kpsherva
Copy link
Copy Markdown
Contributor

@kpsherva kpsherva commented May 4, 2026

it is best to review commit by commit
PR introduces some slight performance improvements
needs: CERNDocumentServer/cds-rdm#795

@kpsherva kpsherva requested a review from zubeydecivelek May 4, 2026 08:13
@kpsherva kpsherva moved this to In review 🔍 in Sprint Q2 2026 ☀️ May 4, 2026
except (CDSMigrationException, ValidationError, InvalidRelationValue) as e:

exc = ManualImportRequired(
message=str(e),
Copy link
Copy Markdown
Contributor

@zubeydecivelek zubeydecivelek May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we catch any kind of migration errors I get "" for str(e), so we lose the error message.

For example if we get report number already exists error it's ManualImportRequired and when we catch we're losing it's message
Screenshot 2026-05-06 at 14 16 58
with the message:
Screenshot 2026-05-06 at 14 17 12

maybe we can do

message=getattr(e, "message", None) or str(e)

or maybe we can update the CDSMigrationException class so str(e) can keep the error message?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed this elsewhere in the logging, it shouldn't happen anymore

Comment thread cds_migrator_kit/rdm/records/load/load.py
)
for experiment in experiments:
if experiment.lower().strip() == "not applicable":
if experiment.lower().strip() in ["not applicable", "xx"]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your PR adds it already so someone will need to rebase

Comment on lines +258 to +269
def _match_affiliation(self, affiliation_name, json_entry):
"""Match an affiliation against `CDSMigrationAffiliationMapping` db table."""
if is_ror(affiliation_name):
ror = normalize_ror(affiliation_name)
name = AffiliationsMetadata.query.filter_by(pid=ror).one_or_none()
if name is None:
raise ManualImportRequired(
message="Affiliation {ror} does not exist in the AffiliationMetadata table".format(ror=ror),
field="validation",
stage="transform",
description="Add this affiliation",
recid=json_entry["recid"],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since only recid is used in json_entry wouldnt it be better if we just pass the recid?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually yes but I am not sure if it will make a big difference here

Comment on lines +175 to +176
if subject_value.lower() == "xx":
raise IgnoreKey("subjects")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your PR will add this so someone will need to rebase. In general, I wouldn't change a PR because another feature needs something, because it is hard to draw a line where to stop :)

def date(self, key, value):
"""Translates dates."""
dates = self.get("dates", [])
valid = value.get("a")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a date validation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have the validation on the load step



class ResearchModel(CdsOverdo):
"""Translation model for MoUs."""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the docstring? No :)



class ResearchCommitteeModel(CdsOverdo):
"""Translation model for MoUs."""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this correct?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

"8564_8", # file id
"8564_s", # bibdoc id
"8564_x", # icon thumbnails sizes
# "8564_y", # file description - done by files dump
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ignored for almost all models, why not here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems my answer for this was not posted somehow...
because for the research collection we use this field to determine open access fields, we can't ignore it

Comment on lines +466 to +472
if subject.get("subject", "") in ["xx"]:
del subject

subjects(json_entry)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already check in the rule, you think it's needed? Or if we need should we also check select: other subjects to delete?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we can move the whole clean up here. Let's move it after merging your PR, when we have all the values to drop?

* chore(tests): fix testing cases, formatting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants