17 commits
f8e3815
Add OC_URL annotations: clickable hyperlinks anchored to highlighted …
claude May 13, 2026
61c4070
Merge remote-tracking branch 'origin/main' into claude/add-url-annota…
JSv4 May 14, 2026
198830e
Fix mypy: broaden requests headers types to dict[str, str | bytes]
JSv4 May 14, 2026
8371960
Add backend tests for OC_URL annotation mutations and validation
JSv4 May 14, 2026
82163b3
Add frontend tests for OC_URL annotation utilities and modal
JSv4 May 14, 2026
eb2521f
Exclude test_url_annotation from mypy baseline (matches sibling test …
JSv4 May 14, 2026
53d90c7
Address review: extract OC_URL label magic values, dedupe link_url va…
JSv4 May 14, 2026
1379ac8
Replace 'as any' with 'as unknown as <T>' in urlAnnotation tests to k…
JSv4 May 14, 2026
1624a26
Cover useCreateUrlAnnotation throw path (network error → toast, no st…
JSv4 May 14, 2026
869a7df
Address review: fix URLField validator conflict, narrow exception catch
JSv4 May 14, 2026
5f5c12a
Merge duplicate lucide-react imports in TxtAnnotator
JSv4 May 14, 2026
ef30fe9
Address round 3 review: DRY URL allow-list, nullable linkUrl type, na…
JSv4 May 14, 2026
0967149
Fix TxtAnnotator URL editor for OC_URL annotations without a set link…
JSv4 May 14, 2026
4e59bb8
Persist link_url through V2 export/import + legacy ETL round-trip
JSv4 May 14, 2026
633d6b5
Address round 4 review: feed filter, SPA navigation, constants drift,…
JSv4 May 14, 2026
182f558
Address round 5 review: tighten ok=true assertion, dedupe trim in ope…
JSv4 May 14, 2026
6a3ca07
Address round 6 review: protocol-relative open-redirect + whitespace-…
JSv4 May 14, 2026
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

- **OC_URL link annotations — clickable hyperlinks anchored to highlighted text** (`opencontractserver/annotations/models.py`, `opencontractserver/annotations/migrations/0072_annotation_link_url.py`, `opencontractserver/constants/annotations.py`, `config/graphql/{annotation_mutations,serializers,mutations}.py`, `frontend/src/assets/configurations/constants.ts`, `frontend/src/types/graphql-api.ts`, `frontend/src/components/annotator/types/annotations.ts`, `frontend/src/components/annotator/utils/urlAnnotation.ts` (new), `frontend/src/graphql/{queries,mutations}.ts`, `frontend/src/components/annotator/hooks/AnnotationHooks.tsx`, `frontend/src/components/annotator/display/components/{Selection,SelectionBoundary}.tsx`, `frontend/src/components/annotator/renderers/pdf/{PDF,PDFPage,SelectionLayer}.tsx`, `frontend/src/components/annotator/renderers/txt/TxtAnnotator.tsx`, `frontend/src/components/annotator/components/wrappers/TxtAnnotatorWrapper.tsx`, `frontend/src/components/annotator/components/modals/CreateUrlAnnotationModal.tsx` (new), `frontend/src/components/knowledge_base/document/DocumentKnowledgeBase.tsx`, `frontend/src/components/knowledge_base/document/document_kb/DocumentViewer.tsx`, `frontend/src/utils/transform.tsx`). Annotations carrying the new `OC_URL` label render as hyperlinks (underline + external-link icon, pointer cursor) and open their `link_url` on click in both the PDF viewer and the text/markdown viewer. The `Annotation.link_url` URL field (max 2048, nullable) carries the target; an "Add link…" item in the selection action menu prompts for a URL and calls a new `add_url_annotation` GraphQL mutation that auto-creates the `OC_URL` label per-corpus via `Corpus.ensure_label_and_labelset` (mirroring `OC_SECTION`). The pencil icon on an existing `OC_URL` annotation opens a URL-edit modal instead of the label modal. 
Holding `Shift`/`Ctrl`/`Cmd` while clicking a link annotation falls back to the normal "toggle selection" behaviour, so authors can still select a link to delete or re-edit it. **Security**: `link_url` is validated both in `Annotation.save()` (always runs, regardless of `VALIDATE_ANNOTATION_JSON`) and in the GraphQL mutation/serializer layer via a shared `validate_link_url(...)` helper. Only `http://`, `https://`, and site-relative `/...` paths are accepted, so `javascript:`, `data:`, and other dangerous schemes are rejected before persistence and never reach `window.open`. External targets open in a new tab with `noopener,noreferrer`; site-relative paths navigate in the current tab so the SPA router can resolve them.
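
The allow-list in the Security note can be sketched roughly as follows. This is a minimal standalone illustration, not the project's `validate_link_url` helper: the function name `is_safe_link_url`, the `ALLOWED_SCHEMES` set, and the explicit rejection of protocol-relative `//host` values are assumptions made for the example (the commit list mentions a protocol-relative open-redirect fix, but the exact rule is not shown in this diff).

```python
from urllib.parse import urlsplit

# Assumption: mirrors the allow-list described in the changelog entry.
ALLOWED_SCHEMES = frozenset({"http", "https"})


def is_safe_link_url(url: str) -> bool:
    """Return True for http(s) URLs and site-relative paths, else False."""
    candidate = url.strip()
    if not candidate:
        return False
    if candidate.startswith("/"):
        # Site-relative path. "//host" (and "/\host") can be read by
        # browsers as protocol-relative, so this sketch rejects them.
        return not candidate.startswith(("//", "/\\"))
    try:
        parts = urlsplit(candidate)
    except ValueError:
        # urlsplit raises on malformed input such as an unclosed IPv6 host.
        return False
    return parts.scheme.lower() in ALLOWED_SCHEMES and bool(parts.netloc)
```

Checking the scheme against an allow-list, rather than a deny-list of known-bad schemes, means any scheme not explicitly listed is rejected by default.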

### Changed

- **Fork ≡ export+import: V2 parity refactor + roundtrip-loss fixes** (`opencontractserver/tasks/fork_tasks.py`, `opencontractserver/tasks/export_tasks_v2.py`, `opencontractserver/utils/{export_v2,import_v2,etl}.py`, `opencontractserver/tests/{test_corpus_export_import_v2.py,test_corpus_forking.py,test_ingestion_source.py}`, and shared fixture helpers `opencontractserver/tests/{_corpus_fixture,_corpus_snapshot}.py`). Replaces the legacy bespoke `fork_corpus` machinery with a thin shell that drives the V2 export → V2 import pipeline so the two code paths can no longer drift. `build_corpus_v2_zip` is the pure builder that `package_corpus_export_v2` (Celery task) now wraps; `import_corpus_v2_from_bytes` is the in-process entry point that `fork_corpus` invokes after `build_corpus_v2_zip`. Restores fields that previously dropped on round-trip (manual-metadata `Fieldset` / `Column` / `Datacell` rows, `IngestionSource` rows, `DocumentPath` and `CorpusFolder` snapshots, structural-set membership) and adds a three-roundtrip invariant test (`TestV2ThreeRoundTripDataIntegrity`) plus error-handling coverage (`CorpusForkErrorHandlingTest`).
159 changes: 158 additions & 1 deletion config/graphql/annotation_mutations.py
@@ -5,7 +5,7 @@
import logging

import graphene
from django.core.exceptions import ObjectDoesNotExist
from django.core.exceptions import ObjectDoesNotExist, ValidationError
from django.db import transaction
from graphene.types.generic import GenericScalar
from graphql_jwt.decorators import login_required
@@ -25,6 +25,13 @@
Annotation,
Note,
Relationship,
validate_link_url,
)
from opencontractserver.constants.annotations import (
OC_URL_LABEL,
OC_URL_LABEL_COLOR,
OC_URL_LABEL_DESCRIPTION,
OC_URL_LABEL_ICON,
)
from opencontractserver.corpuses.models import Corpus
from opencontractserver.documents.models import Document, DocumentPath
@@ -215,6 +222,21 @@ def mutate(root, info, annotation_id, comment=None) -> "ApproveAnnotation":
)


def _format_link_url_error(exc: ValidationError) -> str:
"""Surface a stable, human-readable link_url validation error.

``str(ValidationError({"link_url": "..."}))`` returns the repr of the
error dict (``{'link_url': ['...']}``), which leaks internal structure.
Pull the first message off the dict so the user sees a clean sentence.
"""
detail = getattr(exc, "message_dict", None)
if detail:
messages = detail.get("link_url", []) or []
if messages:
return str(messages[0])
return "link_url failed validation."
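
The extraction logic above can be exercised without Django. The sketch below is an illustration under stated assumptions: `FieldError` is a hypothetical stand-in for a dict-shaped validation error, and `first_field_message` is a generalised, hypothetical variant of the helper that takes the field name and fallback text as parameters.

```python
class FieldError(Exception):
    """Hypothetical stand-in for an error carrying per-field messages."""

    def __init__(self, message_dict: dict):
        super().__init__(message_dict)
        self.message_dict = message_dict


def first_field_message(exc: Exception, field: str, fallback: str) -> str:
    """Return the first message recorded for ``field``, or ``fallback``."""
    # getattr with a default keeps this safe for exceptions that carry
    # no per-field detail at all.
    detail = getattr(exc, "message_dict", None)
    if detail:
        messages = detail.get(field) or []
        if messages:
            return str(messages[0])
    return fallback
```

Returning a fixed fallback sentence keeps the response stable even when the error has no entry for the field.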


def _resolve_annotation_parents(
user, corpus_pk: int | str, document_pk: int | str
) -> tuple["Document", "Corpus"] | None:
@@ -278,6 +300,13 @@ class Arguments:
required=False,
description="Optional markdown description for this annotation.",
)
link_url = graphene.String(
required=False,
description=(
"Optional URL opened on click. Restricted to http(s):// or "
"site-relative paths; intended for OC_URL annotations."
),
)

ok = graphene.Boolean()
message = graphene.String()
@@ -296,13 +325,22 @@ def mutate(
annotation_label_id,
annotation_type,
long_description=None,
link_url=None,
) -> "AddAnnotation":
corpus_pk = from_global_id(corpus_id)[1]
document_pk = from_global_id(document_id)[1]
label_pk = from_global_id(annotation_label_id)[1]

user = info.context.user

if link_url:
try:
validate_link_url(link_url)
except ValidationError as exc:
return AddAnnotation(
ok=False, annotation=None, message=_format_link_url_error(exc)
)

parents = _resolve_annotation_parents(user, corpus_pk, document_pk)
if parents is None:
return AddAnnotation(
@@ -322,6 +360,10 @@ def mutate(
creator=user,
json=json,
annotation_type=annotation_type.value,
# Normalise empty string to None so the column ends up NULL
# (the ``if link_url:`` guard above only protects the validator
# call, not the persisted value).
link_url=link_url or None,
)
annotation.save()
set_permissions_for_obj_to_user(user, annotation, [PermissionTypes.CRUD])
@@ -331,6 +373,113 @@ def mutate(
)


class AddUrlAnnotation(graphene.Mutation):
"""Create an annotation labelled ``OC_URL`` with a click-through URL.

Convenience wrapper over ``AddAnnotation``: ensures the corpus has an
``OC_URL`` label (creating it if absent) and stamps ``link_url`` on the
resulting annotation so the frontend renders the highlighted text as a
clickable hyperlink.
"""

class Arguments:
json = GenericScalar(
required=True, description="New-style JSON for multipage annotations."
)
page = graphene.Int(
required=True, description="What page is this annotation on (0-indexed)."
)
raw_text = graphene.String(
required=True, description="The raw text being linked."
)
corpus_id = graphene.String(
required=True, description="ID of the corpus this annotation is for."
)
document_id = graphene.String(
required=True, description="ID of the document this annotation is on."
)
annotation_type = graphene.Argument(
graphene.Enum.from_enum(LabelType),
required=True,
description="Annotation type: TOKEN_LABEL for PDFs, SPAN_LABEL for text.",
)
link_url = graphene.String(
required=True,
description="The target URL to open on click.",
)

ok = graphene.Boolean()
message = graphene.String()
annotation = graphene.Field(AnnotationType)

@login_required
@graphql_ratelimit_dynamic(get_rate=get_user_tier_rate("WRITE_LIGHT"))
def mutate(
root,
info,
json,
page,
raw_text,
corpus_id,
document_id,
annotation_type,
link_url,
) -> "AddUrlAnnotation":
corpus_pk = from_global_id(corpus_id)[1]
document_pk = from_global_id(document_id)[1]

user = info.context.user

try:
validate_link_url(link_url)
except ValidationError as exc:
return AddUrlAnnotation(
ok=False, annotation=None, message=_format_link_url_error(exc)
)

parents = _resolve_annotation_parents(user, corpus_pk, document_pk)
if parents is None:
return AddUrlAnnotation(
ok=False,
annotation=None,
message=_ANNOTATION_PARENT_NOT_FOUND_MSG,
)
document, corpus = parents

with transaction.atomic():
# ``ensure_label_and_labelset`` is idempotent per (text, label_type).
# PDF (TOKEN_LABEL) and text (SPAN_LABEL) documents each get their
# own OC_URL row — the lookup filters on both fields, so flipping
# types between calls cannot return a label of the wrong shape to
# the renderer.
label = corpus.ensure_label_and_labelset(
label_text=OC_URL_LABEL,
creator_id=user.pk,
label_type=annotation_type.value,
color=OC_URL_LABEL_COLOR,
icon=OC_URL_LABEL_ICON,
description=OC_URL_LABEL_DESCRIPTION,
)

annotation = Annotation(
page=page,
raw_text=raw_text,
corpus_id=corpus.pk,
document_id=document.pk,
annotation_label_id=label.pk,
creator=user,
json=json,
annotation_type=annotation_type.value,
link_url=link_url,
)
annotation.save()
set_permissions_for_obj_to_user(user, annotation, [PermissionTypes.CRUD])

return AddUrlAnnotation(
ok=True, message="URL annotation created", annotation=annotation
)
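
The "idempotent per (text, label_type)" behaviour described in the comment above can be illustrated with an in-memory stand-in. This is only a sketch: `Label`, `LabelRegistry`, and `ensure_label` are hypothetical names, and the real `Corpus.ensure_label_and_labelset` works against the database and is not shown in this diff.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Label:
    id: int
    text: str
    label_type: str


@dataclass
class LabelRegistry:
    """Hypothetical in-memory stand-in for a corpus label set."""

    labels: dict = field(default_factory=dict)

    def ensure_label(self, text: str, label_type: str) -> Label:
        # Key on both fields so a TOKEN_LABEL lookup can never return
        # a SPAN_LABEL row (and vice versa).
        key = (text, label_type)
        if key not in self.labels:
            self.labels[key] = Label(
                id=len(self.labels) + 1, text=text, label_type=label_type
            )
        return self.labels[key]
```

Repeated calls with the same pair return the same label, while the same text under a different label type gets its own row.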


class AddDocTypeAnnotation(graphene.Mutation):
class Arguments:
corpus_id = graphene.String(
@@ -727,6 +876,14 @@ class Arguments:
long_description = graphene.String()
json = GenericScalar()
annotation_label = graphene.String()
link_url = graphene.String(
required=False,
description=(
"Optional click-through URL for OC_URL annotations. Pass an "
"empty string to clear an existing URL. Restricted to "
"http(s):// or site-relative paths."
),
)
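
The update semantics described for `link_url` (an empty string clears the URL) can be sketched as a small pure function. This is an illustration only: `resolve_link_url` is a hypothetical name, treating an omitted argument (`None`) as "leave unchanged" is an assumption, and validation of non-empty values is assumed to happen elsewhere.

```python
from typing import Optional


def resolve_link_url(current: Optional[str], incoming: Optional[str]) -> Optional[str]:
    """Return the value to persist for ``link_url`` after an update."""
    if incoming is None:
        # Assumption: argument omitted, so keep whatever is stored.
        return current
    if incoming == "":
        # Empty string is the "clear" signal; store NULL, not "".
        return None
    # Non-empty value: assumed to have passed validation already.
    return incoming
```

Separating "omitted" from "empty" lets a client update other fields without touching an existing link.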


class UpdateRelations(graphene.Mutation):
2 changes: 2 additions & 0 deletions config/graphql/mutations.py
@@ -28,6 +28,7 @@
AddAnnotation,
AddDocTypeAnnotation,
AddRelationship,
AddUrlAnnotation,
ApproveAnnotation,
CreateNote,
DeleteNote,
@@ -236,6 +237,7 @@ class Mutation(graphene.ObjectType):

# ANNOTATION MUTATIONS ######################################################
add_annotation = AddAnnotation.Field()
add_url_annotation = AddUrlAnnotation.Field()
remove_annotation = RemoveAnnotation.Field()
update_annotation = UpdateAnnotation.Field()
add_doc_type_annotation = AddDocTypeAnnotation.Field()
16 changes: 16 additions & 0 deletions config/graphql/serializers.py
@@ -210,9 +210,25 @@ class Meta:
"creator_id",
"parent",
"parent_id",
"link_url",
]
read_only_fields = ["id", "creator", "parent"]

def validate_link_url(self, value: str | None) -> str | None:
"""Normalise empty strings to None and reject unsafe schemes.

The frontend sends ``link_url=""`` to clear an existing URL; convert
that to None so the column ends up NULL. All non-empty values flow
through the module-level ``validate_link_url`` helper in
``opencontractserver.annotations.models`` to block ``javascript:``
and other dangerous schemes before they reach persistence.
"""
from opencontractserver.annotations.models import validate_link_url as _validate

if not value:
return None
_validate(value)
return value

def create(self, validated_data: dict) -> Annotation:
"""
Create a new `Annotation` instance, mapping `creator_id` and `parent_id` to their respective
15 changes: 15 additions & 0 deletions frontend/src/assets/configurations/constants.ts
@@ -262,6 +262,21 @@ export const DOCUMENT_ANNOTATION_INDEX_MAX_DEPTH = 6;
// Keep in sync with opencontractserver/constants/annotations.py
export const STRUCTURAL_LABEL_PREFIX = "OC_";
export const OC_SECTION_LABEL = "OC_SECTION";
// Annotations carrying the OC_URL label render as clickable hyperlinks; their
// ``linkUrl`` field is opened on click. Keep in sync with
// opencontractserver/constants/annotations.py.
export const OC_URL_LABEL = "OC_URL";
// Sentinel id used when constructing a placeholder OC_URL ``AnnotationLabel``
// before the server has assigned a real id (e.g. when the user is creating
// a new link annotation from a selection). Surfaced as a named constant so
// downstream code that needs to recognise "this label is still pending"
// has a single source of truth instead of comparing against a raw string.
export const PENDING_OC_URL_LABEL_ID = "__pending_oc_url__";
// Default presentation for the OC_URL label. Mirrors the backend constants
// (``opencontractserver/constants/annotations.py``) so the placeholder used
// before the server has assigned a real label, the renderer's hyperlink
// styling, and the auto-created server-side label all agree.
export const OC_URL_LABEL_COLOR = "#2563EB";

// Document search/picker limits
export const DOCUMENT_PICKER_SEARCH_LIMIT = 20;