Pull request overview
This PR implements the "Upload Annotations" feature requested in issue #223, which allows users to upload annotation data for existing datasets via a CSV file or pasted text. Additionally, it adds hot-updating of the annotation fields section in the UI when a processor completes, without requiring a page reload.
Changes:
- New `upload-annotations` processor that parses and saves annotation data (including frontend validation via `validate_query`, file handling via `after_create`, and async batch processing in `process()`)
- Hot-update feature for the annotation fields list in the UI: the `check-processors` API response now includes `annotation_fields` from the top-parent dataset, and new JS code dynamically updates the displayed annotation badges on processor completion
- Category reclassification of `annotation_metadata.py` from "Metrics" to "Conversion"
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `processors/conversion/upload_annotations.py` | New processor class for uploading annotations via CSV file or pasted text |
| `webtool/views/api_tool.py` | Adds `annotation_fields` to the `check-processors` API response |
| `webtool/static/js/fourcat.js` | Adds `update_annotation_fields_list()` function and calls it when a processor finishes |
| `processors/metrics/annotation_metadata.py` | Changes category from "Metrics" to "Conversion" |
```python
# Write uploaded annotations as output file
warning = None
if skipped:
    warning = "Uploaded %i annotation(s) for %i item(s). Skipped %i row(s) (item ID not found in dataset or empty)." % (saved, total_rows, skipped)
```
The warning message at line 325 uses total_rows for the second %i placeholder (number of items), but total_rows is the total number of input data rows (including skipped rows), not the count of successfully processed items. The second placeholder represents "how many items were uploaded", which should be len(results) (the number of unique items that had at least one annotation saved).
For example, if the file has 100 rows and 10 were skipped, total_rows is 100, but the message says "Uploaded X annotation(s) for 100 item(s). Skipped 10 row(s)..." — which is misleading, as 90 items (not 100) were actually processed.
The correct variable to use here is len(results) (or a dedicated counter tracking the number of successfully processed items).
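A minimal sketch of the suggested fix, with `saved`, `skipped`, and `results` stubbed in as stand-ins for the processor's own state (here `results` is assumed to be a mapping keyed by item ID, so `len(results)` counts unique items that actually received annotations):

```python
# Stand-ins for the processor's counters: 100 input rows, 10 skipped,
# 90 unique items each receiving one annotation.
saved, skipped = 90, 10
results = {"item-%i" % i: ["some-label"] for i in range(90)}

warning = None
if skipped:
    # Use the number of unique items actually processed (len(results)),
    # not the total number of input rows.
    warning = (
        "Uploaded %i annotation(s) for %i item(s). Skipped %i row(s) "
        "(item ID not found in dataset or empty)." % (saved, len(results), skipped)
    )
```

With these numbers the message correctly reports 90 items rather than 100.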
```python
# Parse data
try:
    reader = csv.reader(f_handle, delimiter=separator)
    header = next(reader, None)
except csv.Error:
    self.dataset.finish_with_error("Could not parse the uploaded data. Make sure the correct separator is used between columns.")
    return

if not header or len(header) < 2:
    self.dataset.finish_with_error(
        "Data must have at least two columns: the first for item IDs, and at least one for annotations."
    )
    return

annotation_labels = [h.strip() for h in header[1:]]

# Check max 20 annotations
if len(annotation_labels) > 20:
    self.dataset.finish_with_error("A maximum of 20 annotation fields per upload is allowed.")
    return

# Check for duplicate annotation field names in existing annotations
source_dataset = self.source_dataset
existing_labels = source_dataset.get_annotation_field_labels()
conflicts = [label for label in annotation_labels if label in existing_labels]
if conflicts:
    self.dataset.finish_with_error(
        "Annotation field(s) already exist: %s. Please rename them in your input data." % ", ".join(conflicts)
    )
    return
```
The f_handle file handle is opened at line 206 (or assigned at line 215), but when the csv.Error exception is caught at line 221-223 and the function returns early, f_handle is not closed before the return. The same applies to the validation failure at lines 225-229 and 234-246. This leaks file handles (particularly for upload_method == "file", where a real file is opened). The code should use a try/finally block or a context manager (with statement) around the entire processing section to ensure f_handle is always closed, even on early exit.
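A sketch of the try/finally pattern, extracted into a standalone function so the cleanup behavior is easy to see. The names `parse_annotations` and `finish_with_error` are hypothetical stand-ins for the processor's method and `self.dataset.finish_with_error`:

```python
import csv
import io

def parse_annotations(f_handle, separator, finish_with_error):
    # Wrapping the whole parsing/validation section in try/finally (or a
    # `with` statement when f_handle is a real file) guarantees the handle
    # is closed on every exit path, including the early returns.
    try:
        try:
            reader = csv.reader(f_handle, delimiter=separator)
            header = next(reader, None)
        except csv.Error:
            finish_with_error("Could not parse the uploaded data.")
            return None
        if not header or len(header) < 2:
            finish_with_error("Data must have at least two columns.")
            return None
        return [h.strip() for h in header[1:]]
    finally:
        f_handle.close()  # runs even when the function returns early
```

The same effect can be had by wrapping the whole section in `with f_handle:` when `f_handle` is a real file object.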
```javascript
            let html = '<div class="fullwidth annotation-fields-list"><dt>Annotations</dt><dd><ul>';
            for (let field_id in annotation_fields) {
                let label = annotation_fields[field_id].label;
                html += '<li><span class="property-badge" data-annotation-id="' + field_id + '"><i class="fa-solid fa-tag"></i> ' + label + '</span></li>';
            }
            html += '</ul></dd></div>';
            card_dl.find('#dataset-result').before(html);
        }
    } else {
        // Update existing section: add any new fields
        let ul = annotation_list.find('ul');
        for (let field_id in annotation_fields) {
            if (ul.find('[data-annotation-id="' + field_id + '"]').length === 0) {
                let label = annotation_fields[field_id].label;
                ul.append('<li><span class="property-badge" data-annotation-id="' + field_id + '"><i class="fa-solid fa-tag"></i> ' + label + '</span></li>');
```
In update_annotation_fields_list, both field_id and label (user-supplied annotation field IDs and labels from uploaded CSV headers) are concatenated directly into HTML strings at lines 689 and 700 without any HTML escaping. A maliciously crafted annotation label like "><script>alert(1)</script> would result in XSS when the hot-update function renders it. The label and field_id values should be HTML-escaped (e.g., using a utility function) before being inserted into the DOM. Alternatively, DOM APIs (like document.createElement, textContent, and setAttribute) should be used instead of string concatenation to build HTML.
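A minimal sketch of the escaping approach. The helper names (`escape_html`, `annotation_badge_html`) are hypothetical, not existing fourcat.js functions; the escaping covers the characters that matter in both element content and double-quoted attribute values:

```javascript
// Escape the five characters that are significant in HTML text and
// double-quoted attribute values. Order matters: '&' must be first.
function escape_html(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Build one badge <li> with both user-supplied values escaped before
// they are concatenated into the HTML string.
function annotation_badge_html(field_id, label) {
    return '<li><span class="property-badge" data-annotation-id="'
        + escape_html(field_id) + '"><i class="fa-solid fa-tag"></i> '
        + escape_html(label) + '</span></li>';
}
```

Alternatively, building the nodes with `document.createElement`, `textContent`, and `setAttribute` avoids string concatenation entirely and needs no escaping helper.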
```javascript
    if (annotation_list.length === 0) {
        // Section does not exist yet; find the dataset details card and insert after
        // the last existing fullwidth div before API Credentials or Processors sections
        let card_dl = $('.result-page .card dl');
        if (card_dl.length > 0) {
            let html = '<div class="fullwidth annotation-fields-list"><dt>Annotations</dt><dd><ul>';
            for (let field_id in annotation_fields) {
                let label = annotation_fields[field_id].label;
                html += '<li><span class="property-badge" data-annotation-id="' + field_id + '"><i class="fa-solid fa-tag"></i> ' + label + '</span></li>';
            }
            html += '</ul></dd></div>';
            card_dl.find('#dataset-result').before(html);
        }
```
When update_annotation_fields_list creates a new .annotation-fields-list section (the annotation_list.length === 0 branch), the generated HTML div is missing the data-label-edit-href attribute that the existing server-rendered section in result-details.html (line 134) always includes. The annotation_label.handle click handler at line 971 reads this attribute via find_parent(target, 'div').getAttribute('data-label-edit-href'). Without it, callback_url will be null, and the subsequent fetch(null, ...) call will fail with an invalid URL error — meaning annotation label editing will be broken for any fields added via hot-update before the page is reloaded. The dataset key needs to be made available to the JS function so it can build the correct URL for the data-label-edit-href attribute.
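A sketch of how the wrapper div could carry the missing attribute once the dataset key is passed in. The route `/api/edit-annotation-field/<key>/` is a placeholder (the real URL must match whatever `result-details.html` renders), and `annotation_section_html` is a hypothetical helper, not an existing function:

```javascript
// Build the section wrapper with the data-label-edit-href attribute the
// annotation_label click handler expects to find on its parent div.
// NOTE: the URL pattern below is an assumption; mirror the template's URL.
function annotation_section_html(dataset_key) {
    let edit_href = '/api/edit-annotation-field/' + encodeURIComponent(dataset_key) + '/';
    return '<div class="fullwidth annotation-fields-list" data-label-edit-href="'
        + edit_href + '"><dt>Annotations</dt><dd><ul></ul></dd></div>';
}
```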
| "annotation_fields": { | ||
| field_id: {"label": field_data.get("label", "")} | ||
| for field_id, field_data in (top_parent.annotation_fields or {}).items() | ||
| } |
There was a problem hiding this comment.
The OpenAPI schema docstring for the check_processor endpoint does not include the newly added annotation_fields field in the response schema. This leaves the API documentation inaccurate with respect to the actual response structure returned by this endpoint.
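Assuming the endpoint's docstring carries an OpenAPI-style response schema (the surrounding schema is not shown in this excerpt, so the exact nesting here is an assumption), the missing fragment might look like:

```yaml
annotation_fields:
  type: object
  description: >-
    Annotation fields defined on the top-level parent dataset, keyed by
    field ID.
  additionalProperties:
    type: object
    properties:
      label:
        type: string
```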
@sal-uva looks like Copilot did a review after you reviewed and merged this. The hanging open file and the ability to inject code that would be executed by our frontend seem the most important to address.

That's stupid of me. Still learning the Copilot PR flow. I'll check the issues (and definitely fix the JS injection one).
New processor allowing users to upload annotations for a dataset via CSV file or pasted text, plus hot-updating of the annotation-fields-list in the UI.
Processor (`processors/conversion/upload_annotations.py`):
- Upload via CSV file or pasted text (conditional display)
- Frontend validation (`validate_query`): checks data format, max 20 annotation fields, no duplicate labels, and conflicts with existing annotation fields on the dataset
- `after_create` persists the uploaded file to `.importing` for async processing
- `finish_with_warning()` for partial success

Hot-update (`fourcat.js` + `api_tool.py`):
- `check-processors` API response now includes `annotation_fields` from the top parent dataset
- `update_annotation_fields_list()` in JS dynamically inserts or appends annotation field badges when a processor completes, without page reload