-
Notifications
You must be signed in to change notification settings - Fork 10
Remodela institutições (versão 3) - wip #1263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Remodela institutições (versão 3) - wip #1263
Conversation
… metadados de fonte
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the organization/institution models from version 2 to version 3, introducing new models and restructuring the existing ones. The title indicates this is work in progress ("wip").
Changes:
- Introduces new models:
OrganizationalLevel,RawOrganization, andBaseOrganizationalLevel - Refactors
Organizationmodel to removeinstitution_type_mec,institution_type_scielo, andis_officialfields - Adds new fields:
source,external_idto Organization and OrganizationInstitutionType - Adds new utility function
clean_xml_tag_contentto standardize text input - Updates institution models with documentation about their legacy status (version 1)
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 21 comments.
Show a summary per file
| File | Description |
|---|---|
| organization/models.py | Major refactoring of Organization models, adding RawOrganization and OrganizationalLevel, removing old fields and methods |
| organization/migrations/0011_organizationallevel_raworganization_and_more.py | Django migration creating new models and altering existing ones |
| organization/choices.py | Replaces inst_type tuple with SOURCE_CHOICES list |
| institution/models.py | Adds documentation marking these models as version 1 (legacy) |
| core/utils/standardizer.py | Adds clean_xml_tag_content and remove_html_tags functions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| field=models.ManyToManyField( | ||
| blank=True, | ||
| help_text="Standardized levels (requires organization)", | ||
| related_name="raw_organization_leves", |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in related_name: "raw_organization_leves" should be "raw_organization_levels" (missing 'l'). This should match the corrected name in the model.
| related_name="raw_organization_leves", | |
| related_name="raw_organization_levels", |
| class Meta: | ||
| verbose_name = _("Organization") | ||
| verbose_name_plural = _("Organizations") | ||
| unique_together = [("name", "location", "external_id", "source")] # NOVO |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unique_together constraint includes 'external_id' and 'source' which are nullable fields. This means multiple records with NULL values for these fields can exist, which may not provide the intended uniqueness. If the intent is to allow multiple organizations with the same name in the same location but different sources/external_ids, consider adding a database constraint that handles NULLs appropriately, or ensure that source always has a value (remove null=True).
organization/models.py
Outdated
| def get(cls, name, original=None, normalized_name=None, acronym=None, country=None, state=None, city=None, level_1=None, level_2=None, level_3=None): | ||
| """ | ||
| Busca registro existente. | ||
| """ | ||
| if not name: | ||
| raise ValueError("RawOrganization.get requires name") | ||
|
|
||
| original = clean_xml_tag_content(original) | ||
| normalized_name = clean_xml_tag_content(normalized_name) | ||
| name = clean_xml_tag_content(name) | ||
| country = clean_xml_tag_content(country) | ||
| state = clean_xml_tag_content(state) | ||
| city = clean_xml_tag_content(city) | ||
| level_1 = clean_xml_tag_content(level_1) | ||
| level_2 = clean_xml_tag_content(level_2) | ||
| level_3 = clean_xml_tag_content(level_3) | ||
|
|
||
| params = {"name__iexact": name} | ||
|
|
||
| if original: | ||
| params["original__iexact"] = original | ||
| else: | ||
| params["original__isnull"] = True | ||
|
|
||
| if normalized_name: | ||
| params["normalized_name__iexact"] = normalized_name | ||
| else: | ||
| params["normalized_name__isnull"] = True | ||
|
|
||
| if country: | ||
| params["country__iexact"] = country | ||
| else: | ||
| params["country__isnull"] = True | ||
|
|
||
| if state: | ||
| params["state__iexact"] = state | ||
| else: | ||
| params["state__isnull"] = True | ||
|
|
||
| if city: | ||
| params["city__iexact"] = city | ||
| else: | ||
| params["city__isnull"] = True | ||
|
|
||
| if level_1: | ||
| params["level_1__iexact"] = level_1 | ||
| else: | ||
| params["level_1__isnull"] = True | ||
|
|
||
| if level_2: | ||
| params["level_2__iexact"] = level_2 | ||
| else: | ||
| params["level_2__isnull"] = True | ||
|
|
||
| if level_3: | ||
| params["level_3__iexact"] = level_3 | ||
| else: | ||
| params["level_3__isnull"] = True | ||
|
|
||
| try: | ||
| return cls.objects.get(**params) | ||
| except cls.MultipleObjectsReturned: | ||
| return cls.objects.filter(**params).first() |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The get method builds query parameters checking for None values and setting __isnull conditions. However, the method doesn't handle the case where 'acronym' is provided but is None in the parameters - line 844 shows acronym is accepted but never checked in the params dictionary construction. This inconsistency could lead to unexpected query behavior. Consider adding similar handling for acronym as done for other optional fields.
organization/models.py
Outdated
| if organization is not None: | ||
| obj.organization = organization | ||
| if organizational_level is not None: | ||
| obj.organizational_level = organizational_level |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot assign ManyToMany fields directly. Lines 1027, 1029 attempt to assign values directly to ManyToMany fields 'organization' and 'organizational_level'. This will raise a TypeError. Use .set() or .add() methods instead, and these operations can only be performed after the object is saved.
core/utils/standardizer.py
Outdated
| if text_.isalpha(): | ||
| return text_ | ||
| else: | ||
| return None |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic in clean_xml_tag_content is problematic. When assert_string=True (which is the default), it uses text_.isalpha() to validate the string. This method returns True only if ALL characters are alphabetic (no spaces, numbers, or special characters). This means that most organization names like "Universidade de São Paulo", "USP-2024", or any name with spaces will be rejected and return None. This will cause data loss when cleaning organization names that contain spaces, numbers, or special characters.
| if text_.isalpha(): | |
| return text_ | |
| else: | |
| return None | |
| # Accept strings that contain at least one alphabetic character. | |
| # This allows spaces, digits, and punctuation in names (e.g., "USP-2024"), | |
| # while still rejecting strings with no letters at all. | |
| if any(ch.isalpha() for ch in text_): | |
| return text_ | |
| return None |
organization/models.py
Outdated
| organizational_level = models.ManyToManyField( | ||
| "OrganizationalLevel", | ||
| blank=True, | ||
| related_name="raw_organization_leves", |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in related_name: "raw_organization_leves" should be "raw_organization_levels" (missing 'l').
| related_name="raw_organization_leves", | |
| related_name="raw_organization_levels", |
| @property | ||
| def is_matched(self): | ||
| """Indica se está vinculado a Organization.""" | ||
| return self.organization_id is not None |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is_matched property checks 'self.organization_id' but 'organization' is defined as a ManyToManyField, not a ForeignKey. ManyToMany fields don't have an '_id' attribute. This will raise an AttributeError. To check if a ManyToMany relationship exists, use 'self.organization.exists()' instead (though this requires the object to be saved first).
| return self.organization_id is not None | |
| return self.organization.exists() |
organization/models.py
Outdated
| organization=organization, | ||
| organizational_level=organizational_level, | ||
| match_status=match_status or "unmatched", | ||
| extra_data=extra_data, |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The create method references 'extra_data' field (line 949) which is not defined in the RawOrganization model. This will raise a TypeError when trying to create an instance with this parameter.
| extra_data=extra_data, |
organization/models.py
Outdated
| self.organization = organization | ||
| self.organizational_level = organizational_level |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot assign ManyToMany fields directly. Lines 1072-1073 attempt to assign values directly to ManyToMany fields 'organization' and 'organizational_level'. This will raise a TypeError. These should use .set() or .add() methods: e.g., 'self.organization.set([organization])' if organization is a single instance, or 'self.organization.add(organization)'.
organization/models.py
Outdated
| from core.forms import CoreAdminModelForm | ||
| from core.models import BaseHistory, CommonControlField | ||
| from core.utils.standardizer import remove_extra_spaces | ||
| from core.utils.standardizer import clean_xml_tag_content, remove_extra_spaces |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'remove_extra_spaces' is not used.
| from core.utils.standardizer import clean_xml_tag_content, remove_extra_spaces | |
| from core.utils.standardizer import clean_xml_tag_content |
…GANIZATION_ROLES)
…tituições a papéis
… em modelos de instituição
…ctionOrganization
… e marca campos antigos como deprecated
…ds extensos em CharFields
… para dados legados
…elo JournalOrganization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 28 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
organization/models.py
Outdated
| def clean(self): | ||
| """Validações de integridade.""" | ||
| super().clean() | ||
|
|
||
| # organizational_level só pode existir se organization existir | ||
| if self.organizational_level and not self.organization: | ||
| raise ValidationError({ | ||
| 'organizational_level': _( | ||
| "Cannot set organizational_level without organization" | ||
| ) | ||
| }) | ||
|
|
||
| # organizational_level deve pertencer à organization | ||
| if (self.organizational_level and self.organization and | ||
| self.organizational_level.organization_id != self.organization_id): | ||
| raise ValidationError({ | ||
| 'organizational_level': _( | ||
| "Organizational level must belong to the selected organization" | ||
| ) | ||
| }) |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clean method checks ManyToMany fields self.organizational_level and self.organization before the instance is saved. ManyToMany relationships cannot be accessed before the instance has a primary key. This validation will fail or produce incorrect results. Consider moving this validation to a custom save method or using Django signals.
organization/models.py
Outdated
| self.organization = organization | ||
| self.organizational_level = organizational_level | ||
| self.match_status = match_status | ||
| self.updated_by = user | ||
| self.full_clean() | ||
| self.save() |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link_to_organization method assigns ManyToMany fields directly at lines 1072-1073. ManyToMany fields must be set using .add(), .set(), or .remove() methods, not direct assignment. This will raise a TypeError.
| def __str__(self): | ||
| return f"{self.name}" | ||
| if self.range: | ||
| return f"{self.organization} - {self.role} ({self.range})" | ||
| return f"{self.organization} - {self.role}" |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BaseOrganizationRole.__str__ method references self.range which comes from BaseDateRange, but BaseOrganizationRole inherits from BaseDateRange that uses CharField for dates (initial_date/final_date), not the property names used in the BaseHistory model. However, looking at line 659, BaseDateRange does define a range property. But the field names don't match - BaseOrganizationRole panels reference fields from BaseDateRange which uses initial_date/final_date (CharField), creating inconsistency with the data model.
journal/api/v1/serializers.py
Outdated
| "start_date": item.start_date, | ||
| "end_date": item.end_date, |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Field name mismatch: The get_organization_by_role method accesses item.start_date and item.end_date at lines 223-224, but JournalOrganization inherits from BaseDateRange which defines fields as initial_date and final_date. This will raise AttributeErrors. Change to use item.initial_date and item.final_date.
| "start_date": item.start_date, | |
| "end_date": item.end_date, | |
| "start_date": item.initial_date, | |
| "end_date": item.final_date, |
| if self.organization: | ||
| return f"{self.organization}" |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The __str__ method checks self.organization but it's a ManyToManyField, not a ForeignKey. This will always evaluate to a Manager object (truthy), not the intended check for whether organizations exist. Consider using self.organization.exists() or self.organization.first() instead.
| if self.organization: | |
| return f"{self.organization}" | |
| organization = self.organization.first() | |
| if organization: | |
| return str(organization) |
| # class Meta: | ||
| # model = models.PublisherHistory | ||
| # fields = [ | ||
| # "name", |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment appears to contain commented-out code.
| # class OwnerSerializer(serializers.ModelSerializer): | ||
| # """ | ||
| # DEPRECATED: This serializer is deprecated and will be removed in a future version. | ||
| # Use JournalOrganizationSerializer with role="owner" instead. | ||
| # """ | ||
| # name = serializers.CharField( | ||
| # source="institution.institution.institution_identification.name" |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment appears to contain commented-out code.
| # class Meta: | ||
| # model = models.OwnerHistory | ||
| # fields = [ | ||
| # "name", |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment appears to contain commented-out code.
| OrgLevelSponsor, | ||
| ) | ||
| from organization.models import HELP_TEXT_ORGANIZATION, Organization | ||
| from organization.models import HELP_TEXT_ORGANIZATION, Organization, BaseOrganizationRole, RawOrganization |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'RawOrganization' is not used.
| from organization.models import HELP_TEXT_ORGANIZATION, Organization, BaseOrganizationRole, RawOrganization | |
| from organization.models import HELP_TEXT_ORGANIZATION, Organization, BaseOrganizationRole |
| ) | ||
| from organization.models import HELP_TEXT_ORGANIZATION, Organization | ||
| from organization.models import HELP_TEXT_ORGANIZATION, Organization, BaseOrganizationRole, RawOrganization | ||
| from organization.choices import ORGANIZATION_ROLES |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'ORGANIZATION_ROLES' is not used.
| from organization.choices import ORGANIZATION_ROLES |
Introduces: - task_migrate_organization_history_for_journal: migrates data for a specific journal. - task_migrate_organization_history_for_collection: dispatches migration tasks for all journals in a collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 23 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Cria ou atualiza registro. | ||
| """ | ||
| if not original: | ||
| raise ValueError("RawOrganization.create_or_update requires name") |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the RawOrganization.create_or_update() method, line 935 raises a ValueError with the message "RawOrganization.create_or_update requires name" but the actual parameter being checked is "original", not "name". The error message should say "requires original" for accuracy.
| raise ValueError("RawOrganization.create_or_update requires name") | |
| raise ValueError("RawOrganization.create_or_update requires original") |
| if has_only_alpha_or_space(text_): | ||
| return text_ | ||
| else: | ||
| return None |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function clean_xml_tag_content() uses assert_string parameter with a default of True, which causes it to return None when the text contains numbers (line 52). This behavior could lead to silent data loss when processing organization names that legitimately contain numbers (e.g., "Lab 3", "Building 42"). Consider whether this validation is appropriate for all use cases, or if it should be optional/configurable.
| if has_only_alpha_or_space(text_): | |
| return text_ | |
| else: | |
| return None | |
| if not has_only_alpha_or_space(text_): | |
| logging.warning( | |
| "clean_xml_tag_content: text failed alpha/space assertion; preserving value: %r", | |
| text_, | |
| ) |
| "start_date", | ||
| "end_date", | ||
| "is_current", |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JournalOrganizationSerializer fields list (lines 82-84) references start_date and end_date, but the actual field names in the model (from BaseDateRange) are initial_date and final_date. The serializer fields should be initial_date and final_date to match the model.
| verbose_name=_("New Title"), | ||
| on_delete=models.SET_NULL, | ||
| null=True, | ||
| null=True, |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field name in JournalOrganization model is "initial_date" and "final_date" (inherited from BaseDateRange), but the model definition on line 93 shows the old_title relationship which has a trailing space after "null=True," on line 93. While this is just a formatting issue, it's inconsistent with the rest of the codebase.
| null=True, | |
| null=True, |
| dict: Estatísticas da migração por journal | ||
| """ | ||
| try: | ||
| user = _get_user(user_id, username) |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The task_migrate_organization_history_for_journal() and task_migrate_organization_history_for_collection() functions call _get_user() incorrectly. On line 470 it's called as _get_user(user_id, username), but the correct signature from core.utils.utils is _get_user(request, username=None, user_id=None). It should be _get_user(self.request, username=username, user_id=user_id) to match the pattern used in other tasks like load_journal_from_article_meta_for_one_collection (line 77).
| user = _get_user(user_id, username) | |
| user = _get_user(self.request, username=username, user_id=user_id) |
| for history_item in history_queryset: | ||
|
|
||
| # Cria o registro no novo modelo consolidado | ||
| JournalOrganization.objects.create( | ||
| journal=self, | ||
| role=role, | ||
| organization=history_item.organization, | ||
| original_data=history_item.institution_name, | ||
| start_date=history_item.initial_date_isoformat, | ||
| end_date=history_item.final_date_isoformat, | ||
| creator=user | ||
| ) |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The migrate_all_organization_history() method accesses history_item.organization (line 1202) and history_item.institution_name (line 1203), but the actual field name in the legacy models is "institution", not "organization". This will cause an AttributeError. The correct field should be history_item.institution for the ForeignKey, and history_item.institution_name property already handles the name extraction.
| return RawOrganization.objects.get(**params) | ||
| except RawOrganization.MultipleObjectsReturned: | ||
| return RawOrganization.objects.filter(**params).first() |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the OrganizationalLevel.get() method, the function returns RawOrganization objects (lines 545 and 547) instead of OrganizationalLevel objects. This is a copy-paste error that will cause incorrect behavior when trying to retrieve OrganizationalLevel instances.
| return RawOrganization.objects.get(**params) | |
| except RawOrganization.MultipleObjectsReturned: | |
| return RawOrganization.objects.filter(**params).first() | |
| return cls.objects.get(**params) | |
| except cls.MultipleObjectsReturned: | |
| return cls.objects.filter(**params).first() |
| def get(cls, original, country=None, state=None, city=None, level_1=None, level_2=None, level_3=None): | ||
| """ | ||
| Busca registro existente. | ||
| """ | ||
| if not original: | ||
| raise ValueError("RawOrganization.get requires original") | ||
|
|
||
| original = clean_xml_tag_content(original) | ||
| params = {"original": original} | ||
| if country is not None: | ||
| params["country"] = country | ||
| if state is not None: | ||
| params["state"] = state | ||
| if city is not None: | ||
| params["city"] = city | ||
| if level_1 is not None: | ||
| params["level_1"] = level_1 | ||
| if level_2 is not None: | ||
| params["level_2"] = level_2 | ||
| if level_3 is not None: | ||
| params["level_3"] = level_3 | ||
| try: | ||
| return cls.objects.get(**params) | ||
| except cls.MultipleObjectsReturned: | ||
| return cls.objects.filter(**params).first() |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The migration's unique_together on line 516 uses ("name", "acronym", ...) but in RawOrganization.get() on line 844, the lookup uses only "original" without including "name" or "acronym" in the params. This mismatch between the unique constraint and the get() method logic could lead to MultipleObjectsReturned exceptions or incorrect record retrieval.
| field=models.ManyToManyField( | ||
| blank=True, | ||
| help_text="Standardized levels (requires organization)", | ||
| related_name="raw_organization_leves", |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The related_name in the migration file for RawOrganization.organizational_level is "raw_organization_leves" (line 438), which appears to be a typo for "raw_organization_levels" (missing an 'l'). This should be corrected to "raw_organization_levels" for consistency and clarity.
| related_name="raw_organization_leves", | |
| related_name="raw_organization_levels", |
| def has_only_alpha_or_space(text): | ||
| """ Verifica se o conteúdo do texto é válido como string, ou seja, | ||
| não é vazio e não contém números. """ | ||
| if not text: | ||
| return False | ||
| parts = text.split() | ||
| for part in parts: | ||
| if not part.isalpha(): | ||
| return False | ||
| return True |
Copilot
AI
Jan 31, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The has_only_alpha_or_space() function on line 31 returns False for text containing numbers, but the function name suggests it should return True if the text has "only alpha or space". The name is misleading - it should be named something like is_alphabetic_text() or has_no_numbers() to better reflect its behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 24 comments.
Comments suppressed due to low confidence (1)
journal/tasks.py:38
- The removed function load_journal_from_classic_website is referenced in a deleted script file but may still be referenced elsewhere in the codebase. Before removing this task, you should verify that no other code attempts to call this function. Use tools to search for any remaining references to ensure this is safe to remove.
@celery_app.task(bind=True)
def load_journal_from_article_meta(
self, username=None, user_id=None, limit=None, collection_acron=None, load_data=None
):
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "initial_date", | ||
| models.CharField( | ||
| blank=True, | ||
| max_length=10, | ||
| null=True, | ||
| verbose_name="Initial Date", | ||
| ), | ||
| ), | ||
| ( | ||
| "final_date", | ||
| models.CharField( | ||
| blank=True, max_length=10, null=True, verbose_name="Final Date" | ||
| ), |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The migration creates JournalOrganization with initial_date and final_date fields (lines 164-176), but the unique_together constraint at lines 1328-1331 doesn't include these date fields. However, BaseDateRange uses initial_date and final_date as field names. There's an inconsistency in the field naming - the migration uses start_date/end_date in the model definition but initial_date/final_date in the actual field creation. Verify which naming convention should be used.
| panels = [ | ||
| AutocompletePanel("organization"), | ||
| FieldPanel("role"), | ||
| ] + BaseDateRange.panels |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BaseOrganizationRole.panels references start_date and end_date fields but BaseDateRange (which BaseOrganizationRole inherits from) defines initial_date and final_date fields. This mismatch will cause errors in the Wagtail admin interface. The panel field names must match the actual model field names.
| ] + BaseDateRange.panels | |
| FieldPanel("initial_date"), | |
| FieldPanel("final_date"), | |
| ] |
| name="source", | ||
| field=models.CharField( | ||
| blank=True, | ||
| choices=[ | ||
| ("user", "user"), | ||
| ("legacy", "legacy"), | ||
| ("MEC", "Ministério da Educação e Cultura"), | ||
| ("ror", "Research Organization Registry"), | ||
| ], | ||
| default="user", | ||
| help_text="Data source (ror, isni, manual, etc.)", | ||
| max_length=50, | ||
| null=True, | ||
| verbose_name="Source", | ||
| ), |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migration 0012 changes the default value from "user_input" to "user" for the source field. This could cause issues with existing data that has "user_input" as the source value. Consider adding a data migration to update existing "user_input" values to "user" before changing the default, or keep "user_input" as a valid choice in SOURCE_CHOICES.
| logger.info(f"Organizations update completed for collections {collection_acron_list}. " | ||
| f"Total: {total_journals}, Success: {successful_updates}, Failed: {failed_updates}") | ||
|
|
||
| return None |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The task returns None instead of returning the update_results dictionary. This makes it impossible to see what was processed. Change return None to return update_results or create a summary result object.
| def clean_xml_tag_content(text, assert_string=True): | ||
| if not text: | ||
| return text | ||
| text = "".join(remove_html_tags(text)) | ||
| text_ = remove_extra_spaces(text) | ||
| if assert_string: | ||
| if has_only_alpha_or_space(text_): | ||
| return text_ | ||
| else: | ||
| return None | ||
| return text_ |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clean_xml_tag_content function has an assert_string parameter that when True will return None if the text contains numbers. This behavior could silently discard valid organization names that contain numbers (e.g., "Lab 21", "Unit 3", "Building 5A"). Consider if this validation is too strict for organization names, or document why numeric content should be rejected.
| def add_organization(self, user, role, organization=None, original_data=None, | ||
| start_date=None, end_date=None): | ||
| """ | ||
| Adiciona uma organização ao periódico. | ||
|
|
||
| Args: | ||
| user: User instance | ||
| role: str - 'owner', 'publisher', 'sponsor', 'copyright_holder' | ||
| organization: Organization instance (opcional) | ||
| original_data: str - dados originais quando organization é None | ||
| start_date: date - data de início | ||
| end_date: date - data de fim | ||
|
|
||
| Returns: | ||
| JournalOrganization instance | ||
| """ | ||
| if not role: | ||
| raise ValueError("add_organization requires role parameter") | ||
| if not organization and not original_data: | ||
| raise ValueError("add_organization requires organization or original_data parameter") | ||
|
|
||
| journal_org = JournalOrganization.objects.create( | ||
| journal=self, | ||
| role=role, | ||
| organization=organization, | ||
| original_data=original_data, | ||
| start_date=start_date, | ||
| end_date=end_date, | ||
| creator=user | ||
| ) | ||
| return journal_org |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method parameters use start_date and end_date but BaseDateRange fields are named initial_date and final_date. The assignment at lines 1053-1054 will try to set non-existent attributes. Either change the parameter names to initial_date/final_date to match the model fields, or update BaseDateRange to use start_date/end_date field names.
| if not obj.end_date: | ||
| return True | ||
| from datetime import date | ||
| return obj.end_date > date.today() |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is_current method accesses obj.end_date but BaseDateRange fields are named final_date. This will cause an AttributeError. Should be obj.final_date instead.
| if not obj.end_date: | |
| return True | |
| from datetime import date | |
| return obj.end_date > date.today() | |
| if not obj.final_date: | |
| return True | |
| from datetime import date | |
| return obj.final_date > date.today() |
| if self.organizations.filter(role=role).count() == history_queryset.count(): | ||
| history_queryset.delete() |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function deletes the legacy history records if the count matches, but this is risky. The migration should be a separate step from deletion, and deletion should only happen after manual verification that the migration was successful. Consider removing this auto-deletion or at least adding a force_delete parameter that defaults to False. The cleanup_legacy_organization_history function exists for this purpose and should be used separately.
|
|
||
| try: | ||
| # Obter usuário | ||
| user = _get_user(self.request, username=username, user_id=user_id) |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function _get_user is called with inconsistent parameters. On line 470, it's called with _get_user(user_id, username) but on line 655, it's called with _get_user(self.request, username=username, user_id=user_id). You need to verify which signature is correct and use it consistently throughout the file.
| user = _get_user(self.request, username=username, user_id=user_id) | |
| user = _get_user(user_id, username) |
| force_update=force_update, | ||
| ) | ||
|
|
||
| result = { |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable result is not used.
O que esse PR faz?
Fale sobre o propósito do pull request como por exemplo: quais problemas ele soluciona ou quais features ele adiciona.
Onde a revisão poderia começar?
Indique o caminho do arquivo e o arquivo onde o revisor deve iniciar a leitura do código.
Como este poderia ser testado manualmente?
Estabeleça os passos necessários para que a funcionalidade seja testada manualmente pelo revisor.
Algum cenário de contexto que queira dar?
Indique um contexto onde as modificações se fazem necessárias ou passe informações que contextualizam
o revisor a fim de facilitar o entendimento da funcionalidade.
Screenshots
Quando aplicável e se fizer possível adicione screenshots que remetem a situação gráfica do problema que o pull request resolve.
Quais são tickets relevantes?
Indique uma issue ao qual o pull request faz relacionamento.
Referências
Indique as referências utilizadas para a elaboração do pull request.