Add OC_URL annotations: clickable hyperlinks anchored to highlighted text#1648
Add OC_URL annotations: clickable hyperlinks anchored to highlighted text#1648JSv4 wants to merge 17 commits into
Conversation
…text Annotations carrying the new OC_URL label render as hyperlinks (underline + external-link icon, pointer cursor) and open their link_url on click in both the PDF viewer and the text/markdown viewer. Backend: new Annotation.link_url URLField (max 2048, nullable) with migration 0072. New add_url_annotation GraphQL mutation auto-ensures the OC_URL label per-corpus via Corpus.ensure_label_and_labelset (mirrors the OC_SECTION pattern). Both add_annotation and update_annotation now accept link_url. Validation lives in a shared validate_link_url() helper invoked from Annotation.save() unconditionally and from the GraphQL mutation / DRF serializer layers — only http(s) and site-relative paths are allowed, so javascript: / data: schemes are rejected before persistence. Frontend: linkUrl propagates through RawServerAnnotationType, the ServerSpanAnnotation / ServerTokenAnnotation classes, the primary annotation fetch queries (GET_DOCUMENT_KNOWLEDGE_AND_ANNOTATIONS, GET_ANNOTATIONS, GET_ANNOTATIONS_FOR_ANALYSIS, GET_ANNOTATIONS_FOR_CARDS, and similar), and the create/update mutations. A new useCreateUrlAnnotation hook fronts the addUrlAnnotation mutation; create/url-create handlers are threaded as props (DocumentKnowledgeBase → DocumentViewer → PDF → PDFPage → SelectionLayer; TxtAnnotatorWrapper → TxtAnnotator) so the renderers stay decoupled from Apollo and remain unit-testable. The selection action menu gains an "Add link…" entry in both PDF and text viewers, which opens a shared CreateUrlAnnotationModal that prompts for a URL. The pencil icon on an existing OC_URL annotation opens the URL-edit modal instead of the label-edit modal. Holding Shift / Ctrl / Cmd while clicking a link annotation falls back to the normal toggle-selection behaviour. External URLs open in a new tab with noopener,noreferrer; site-relative paths navigate in the current tab so the SPA router can resolve them. A defensive isSafeUrl mirror of the backend allow-list lives in the renderer so a stale cached unsafe URL still won't reach window.open.
Code Review — PR #1648: OC_URL Link AnnotationsOverviewThis is a well-designed, security-conscious feature that wires clickable hyperlinks through the full stack. The layered URL validation approach (model Bug —
|
types-requests upgrade made requests.post/get expect MutableMapping[str, str | bytes], which is invariant in values, so dict[str, str] no longer matches. Broaden header annotations and maybe_add_cloud_run_auth() signature accordingly.
Covers validate_link_url (model + GraphQL), addUrlAnnotation (happy path, label idempotency, IDOR, unsafe schemes), addAnnotation linkUrl argument, and updateAnnotation linkUrl clearing/validation.
Unit tests:
- urlAnnotation.ts: isUrlAnnotation matrix + openAnnotationUrl
allow-list (http(s)/relative), refusal of javascript: and data: URLs
- useCreateUrlAnnotation hook: corpus-missing short-circuit, blank URL
short-circuit, success path persistence with server linkUrl, server
ok=false path
Component tests (Playwright):
- CreateUrlAnnotationModal: visibility, create/edit headers, empty +
unsafe scheme validation, onConfirm/onCancel callbacks, Enter to
submit, trimmed URL, site-relative paths.
…lidation in clean/save
Constants:
- Add OC_URL_LABEL_COLOR / OC_URL_LABEL_ICON / OC_URL_LABEL_DESCRIPTION
in opencontractserver/constants/annotations.py and use them in
AddUrlAnnotation.mutate (no more magic values).
Model:
- Annotation.clean() now validates link_url BEFORE the JSON-validation
early-return so the check is always run regardless of the
VALIDATE_ANNOTATION_JSON flag.
- Annotation.save() only re-validates link_url when clean() was
skipped (VALIDATE_ANNOTATION_JSON=False), removing the redundant
double-call.
…eep any-baseline gate
Code Review: OC_URL Link Annotations (#1648)OverviewThis is a well-structured feature that adds clickable hyperlink annotations anchored to highlighted text in both the PDF and text viewers. The security considerations are clearly thought through (defense-in-depth URL validation on both backend and frontend, IssuesBug:
The custom Fix: use
In both except Exception as exc:
return AddAnnotation(ok=False, annotation=None, message=str(exc))
Redundant link_url=link_url or None,The Minor IssuesTwo separate lucide-react imports in
Asymmetric Positives
|
|
placeholder test comment - will be replaced |
CLAUDE.mdThis file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. Project OverviewOpenContracts is an MIT-licensed enterprise document analytics platform for PDFs and text-based formats. It features a Django/GraphQL backend with PostgreSQL + pgvector, a React/TypeScript frontend with Jotai state management, and pluggable document processing pipelines powered by machine learning models. Baseline Commit Rules
Essential CommandsBackend (Django)# Run backend tests (sequential, use --keepdb to speed up subsequent runs)
docker compose -f test.yml run django python manage.py test --keepdb
# Run backend tests in PARALLEL (recommended - ~4x faster)
# Uses pytest-xdist with 4 workers, --dist loadscope keeps class tests together
docker compose -f test.yml run django pytest -n 4 --dist loadscope
# Run parallel tests with auto-detected worker count (uses all CPU cores)
docker compose -f test.yml run django pytest -n auto --dist loadscope
# Run parallel tests with fresh databases (first run or after schema changes)
docker compose -f test.yml run django pytest -n 4 --dist loadscope --create-db
# Run specific test file
docker compose -f test.yml run django python manage.py test opencontractserver.tests.test_notifications --keepdb
# Run specific test file in parallel
docker compose -f test.yml run django pytest opencontractserver/tests/test_notifications.py -n 4 --dist loadscope
# Run specific test class/method
docker compose -f test.yml run django python manage.py test opencontractserver.tests.test_notifications.TestNotificationModel.test_create_notification --keepdb
# Apply database migrations
docker compose -f local.yml run django python manage.py migrate
# Create new migration
docker compose -f local.yml run django python manage.py makemigrations
# Django shell
docker compose -f local.yml run django python manage.py shell
# Code quality (runs automatically via pre-commit hooks)
pre-commit run --all-filesFrontend (React/TypeScript)cd frontend
# Start development server (proxies to Django on :8000)
yarn start
# Run unit tests (Vitest) - watches by default
yarn test:unit
# Run component tests (Playwright) - CRITICAL: Use --reporter=list to prevent hanging
yarn test:ct --reporter=list
# Run component tests with grep filter
yarn test:ct --reporter=list -g "test name pattern"
# Run E2E tests
yarn test:e2e
# Coverage reports (unit tests via Vitest, component tests via Playwright + Istanbul)
yarn test:coverage:unit
yarn test:coverage:ct
# Linting and formatting
yarn lint
yarn fix-styles
# Build for production
yarn build
# Preview production build locally
yarn serveProduction Deployment# CRITICAL: Always run migrations FIRST in production
docker compose -f production.yml --profile migrate up migrate
# Then start main services
docker compose -f production.yml upHigh-Level ArchitectureBackend ArchitectureStack: Django 4.x + GraphQL (Graphene) + PostgreSQL + pgvector + Celery Key Patterns:
Frontend ArchitectureStack: React 18 + TypeScript + Apollo Client + Jotai (atoms) + PDF.js + Vite Key Patterns:
Data Flow ArchitectureDocument Processing:
GraphQL Permission Flow:
Critical Security Patterns
Critical Concepts
Testing PatternsManual Test ScriptsLocation: When performing manual testing (e.g., testing migrations, verifying database state, testing API endpoints interactively), always document the test steps in a markdown file under Format: # Test: [Brief description]
## Purpose
What this test verifies.
## Prerequisites
- Required state (e.g., "migration at 0058")
- Required data (e.g., "at least one document exists")
## Steps
1. Step one with exact command
```bash
docker compose -f local.yml run --rm django python manage.py shell -c "..."
Expected Results
CleanupCommands to restore original state if needed. Automated Documentation ScreenshotsLocation: Screenshots for documentation are captured during Playwright component tests and pushed back to a PR branch by the How it works:
Naming convention (
At least 2 segments required, 3 recommended. All lowercase alphanumeric with single hyphens. Example: import { docScreenshot } from "./utils/docScreenshot";
// After the component renders and assertions pass:
await docScreenshot(page, "badges--celebration-modal--auto-award");Rules:
Release Screenshots (Point-in-Time)For release notes, use Location: import { releaseScreenshot } from "./utils/docScreenshot";
await releaseScreenshot(page, "v3.0.0.b3", "landing-page", { fullPage: true });Key differences from
When to use which:
Authenticated Playwright Testing (Live Frontend Debugging)When you need to interact with the running frontend as an authenticated user (e.g., debugging why a query returns empty results), use Django admin session cookies to authenticate GraphQL requests. Architecture context: The frontend uses Auth0 for authentication, but the Django backend also accepts session cookie auth. Apollo Client sends GraphQL requests directly to Step 1: Set a password for the superuser (one-time setup): docker compose -f local.yml exec django python manage.py shell -c "
from django.contrib.auth import get_user_model
User = get_user_model()
u = User.objects.filter(is_superuser=True).first()
u.set_password('testpass123')
u.save()
print(f'Password set for {u.username}')
"Step 2: Playwright script pattern: const { chromium } = require('playwright');
(async () => {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext();
const page = await context.newPage();
// Collect console messages for debugging
const consoleMsgs = [];
page.on('console', msg => consoleMsgs.push('[' + msg.type() + '] ' + msg.text()));
// 1. Login to Django admin to get session cookie
await page.goto('http://localhost:8000/admin/login/');
await page.fill('#id_username', '<superuser-username>');
await page.fill('#id_password', 'testpass123');
await page.click('input[type=submit]');
await page.waitForTimeout(2000);
// 2. Extract the session cookie
const cookies = await context.cookies();
const sessionCookie = cookies.find(c => c.name === 'sessionid');
// 3. Intercept GraphQL requests to inject the session cookie
// (needed because Apollo sends cross-origin requests to :8000)
await page.route('**/graphql/**', async (route) => {
const headers = {
...route.request().headers(),
'Cookie': 'sessionid=' + sessionCookie.value,
};
await route.continue({ headers });
});
// 4. Navigate to the frontend page under test
await page.goto('http://localhost:5173/extracts');
await page.waitForTimeout(5000);
// 5. Inspect results
const bodyText = await page.textContent('body');
console.log(bodyText);
await browser.close();
})();Run from the frontend directory (where cd frontend && node /path/to/script.jsKey details:
Alternative — create a session programmatically (no admin login needed): docker compose -f local.yml exec django python manage.py shell -c "
from django.contrib.sessions.backends.db import SessionStore
from django.contrib.auth import get_user_model
User = get_user_model()
user = User.objects.filter(is_superuser=True).first()
session = SessionStore()
session['_auth_user_id'] = str(user.pk)
session['_auth_user_backend'] = 'django.contrib.auth.backends.ModelBackend'
session['_auth_user_hash'] = user.get_session_auth_hash()
session.save()
print(f'Session key: {session.session_key}')
"Then use the printed session key directly in curl or Playwright route interception. Documentation Locations
Branch StrategyThis project follows trunk-based development:
Changelog MaintenanceIMPORTANT: Always update The changelog follows Keep a Changelog format: ## [Unreleased] - YYYY-MM-DD
### Added
- New features
### Fixed
- Bug fixes with file locations and line numbers
### Changed
- Changes to existing functionality
### Technical Details
- Implementation specifics, architectural notesWhen to update:
What to include:
Pre-commit HooksAutomatically run on commit:
Run manually: Common Pitfalls
|
|
test2 |
|
test |
1 similar comment
|
test |
- Annotation.link_url is now CharField (was URLField). URLField
attaches Django's URLValidator which runs at clean_fields() time
and rejects site-relative paths like /corpus/foo before our own
validate_link_url (in clean()) gets a chance to allow them.
Migration 0072 updated to the same shape.
- AddAnnotation / AddUrlAnnotation catch ValidationError instead
of bare Exception, and surface a clean human-readable message via
_format_link_url_error (extracts the dict-detail's first message
instead of the noisy ["{'link_url': ['...']}"] repr).
- Empty-string normalisation comment clarified on AddAnnotation so
the link_url or None is no longer dead code on re-read.
|
test |
| def mutate( | ||
| root, | ||
| info, | ||
| json, | ||
| page, | ||
| raw_text, | ||
| corpus_id, | ||
| document_id, | ||
| annotation_type, | ||
| link_url, | ||
| ) -> "AddUrlAnnotation": |
|
hello |
Code Review: OC_URL Clickable Hyperlink AnnotationsThis is a well-designed feature with particularly strong security coverage for a user-controlled URL reflection path. The three-layer validation approach (model save → mutation layer → frontend renderer) is exactly right for this attack surface. A few items below worth addressing before merge. OverviewAdds What Works Well
Issues1. TypeScript type mismatch —
|
CLAUDE.mdThis file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. Project OverviewOpenContracts is an MIT-licensed enterprise document analytics platform for PDFs and text-based formats. It features a Django/GraphQL backend with PostgreSQL + pgvector, a React/TypeScript frontend with Jotai state management, and pluggable document processing pipelines powered by machine learning models. Baseline Commit Rules
Essential CommandsBackend (Django)# Run backend tests (sequential, use --keepdb to speed up subsequent runs)
docker compose -f test.yml run django python manage.py test --keepdb
# Run backend tests in PARALLEL (recommended - ~4x faster)
# Uses pytest-xdist with 4 workers, --dist loadscope keeps class tests together
docker compose -f test.yml run django pytest -n 4 --dist loadscope
# Run parallel tests with auto-detected worker count (uses all CPU cores)
docker compose -f test.yml run django pytest -n auto --dist loadscope
# Run parallel tests with fresh databases (first run or after schema changes)
docker compose -f test.yml run django pytest -n 4 --dist loadscope --create-db
# Run specific test file
docker compose -f test.yml run django python manage.py test opencontractserver.tests.test_notifications --keepdb
# Run specific test file in parallel
docker compose -f test.yml run django pytest opencontractserver/tests/test_notifications.py -n 4 --dist loadscope
# Run specific test class/method
docker compose -f test.yml run django python manage.py test opencontractserver.tests.test_notifications.TestNotificationModel.test_create_notification --keepdb
# Apply database migrations
docker compose -f local.yml run django python manage.py migrate
# Create new migration
docker compose -f local.yml run django python manage.py makemigrations
# Django shell
docker compose -f local.yml run django python manage.py shell
# Code quality (runs automatically via pre-commit hooks)
pre-commit run --all-filesFrontend (React/TypeScript)cd frontend
# Start development server (proxies to Django on :8000)
yarn start
# Run unit tests (Vitest) - watches by default
yarn test:unit
# Run component tests (Playwright) - CRITICAL: Use --reporter=list to prevent hanging
yarn test:ct --reporter=list
# Run component tests with grep filter
yarn test:ct --reporter=list -g "test name pattern"
# Run E2E tests
yarn test:e2e
# Coverage reports (unit tests via Vitest, component tests via Playwright + Istanbul)
yarn test:coverage:unit
yarn test:coverage:ct
# Linting and formatting
yarn lint
yarn fix-styles
# Build for production
yarn build
# Preview production build locally
yarn serveProduction Deployment# CRITICAL: Always run migrations FIRST in production
docker compose -f production.yml --profile migrate up migrate
# Then start main services
docker compose -f production.yml upHigh-Level ArchitectureBackend ArchitectureStack: Django 4.x + GraphQL (Graphene) + PostgreSQL + pgvector + Celery Key Patterns:
Frontend ArchitectureStack: React 18 + TypeScript + Apollo Client + Jotai (atoms) + PDF.js + Vite Key Patterns:
Data Flow ArchitectureDocument Processing:
GraphQL Permission Flow:
Critical Security Patterns
Critical Concepts
Testing PatternsManual Test ScriptsLocation: When performing manual testing (e.g., testing migrations, verifying database state, testing API endpoints interactively), always document the test steps in a markdown file under Format: # Test: [Brief description]
## Purpose
What this test verifies.
## Prerequisites
- Required state (e.g., "migration at 0058")
- Required data (e.g., "at least one document exists")
## Steps
1. Step one with exact command
```bash
docker compose -f local.yml run --rm django python manage.py shell -c "..."
Expected Results
CleanupCommands to restore original state if needed. Automated Documentation ScreenshotsLocation: Screenshots for documentation are captured during Playwright component tests and pushed back to a PR branch by the How it works:
Naming convention (
At least 2 segments required, 3 recommended. All lowercase alphanumeric with single hyphens. Example: import { docScreenshot } from "./utils/docScreenshot";
// After the component renders and assertions pass:
await docScreenshot(page, "badges--celebration-modal--auto-award");Rules:
Release Screenshots (Point-in-Time)For release notes, use Location: import { releaseScreenshot } from "./utils/docScreenshot";
await releaseScreenshot(page, "v3.0.0.b3", "landing-page", { fullPage: true });Key differences from
When to use which:
Authenticated Playwright Testing (Live Frontend Debugging)When you need to interact with the running frontend as an authenticated user (e.g., debugging why a query returns empty results), use Django admin session cookies to authenticate GraphQL requests. Architecture context: The frontend uses Auth0 for authentication, but the Django backend also accepts session cookie auth. Apollo Client sends GraphQL requests directly to Step 1: Set a password for the superuser (one-time setup): docker compose -f local.yml exec django python manage.py shell -c "
from django.contrib.auth import get_user_model
User = get_user_model()
u = User.objects.filter(is_superuser=True).first()
u.set_password('testpass123')
u.save()
print(f'Password set for {u.username}')
"Step 2: Playwright script pattern: const { chromium } = require('playwright');
(async () => {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext();
const page = await context.newPage();
// Collect console messages for debugging
const consoleMsgs = [];
page.on('console', msg => consoleMsgs.push('[' + msg.type() + '] ' + msg.text()));
// 1. Login to Django admin to get session cookie
await page.goto('http://localhost:8000/admin/login/');
await page.fill('#id_username', '<superuser-username>');
await page.fill('#id_password', 'testpass123');
await page.click('input[type=submit]');
await page.waitForTimeout(2000);
// 2. Extract the session cookie
const cookies = await context.cookies();
const sessionCookie = cookies.find(c => c.name === 'sessionid');
// 3. Intercept GraphQL requests to inject the session cookie
// (needed because Apollo sends cross-origin requests to :8000)
await page.route('**/graphql/**', async (route) => {
const headers = {
...route.request().headers(),
'Cookie': 'sessionid=' + sessionCookie.value,
};
await route.continue({ headers });
});
// 4. Navigate to the frontend page under test
await page.goto('http://localhost:5173/extracts');
await page.waitForTimeout(5000);
// 5. Inspect results
const bodyText = await page.textContent('body');
console.log(bodyText);
await browser.close();
})();Run from the frontend directory (where cd frontend && node /path/to/script.jsKey details:
Alternative — create a session programmatically (no admin login needed): docker compose -f local.yml exec django python manage.py shell -c "
from django.contrib.sessions.backends.db import SessionStore
from django.contrib.auth import get_user_model
User = get_user_model()
user = User.objects.filter(is_superuser=True).first()
session = SessionStore()
session['_auth_user_id'] = str(user.pk)
session['_auth_user_backend'] = 'django.contrib.auth.backends.ModelBackend'
session['_auth_user_hash'] = user.get_session_auth_hash()
session.save()
print(f'Session key: {session.session_key}')
"Then use the printed session key directly in curl or Playwright route interception. Documentation Locations
Branch StrategyThis project follows trunk-based development:
Changelog MaintenanceIMPORTANT: Always update The changelog follows Keep a Changelog format: ## [Unreleased] - YYYY-MM-DD
### Added
- New features
### Fixed
- Bug fixes with file locations and line numbers
### Changed
- Changes to existing functionality
### Technical Details
- Implementation specifics, architectural notesWhen to update:
What to include:
Pre-commit HooksAutomatically run on commit:
Run manually: Common Pitfalls
|
1 similar comment
CLAUDE.mdThis file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. Project OverviewOpenContracts is an MIT-licensed enterprise document analytics platform for PDFs and text-based formats. It features a Django/GraphQL backend with PostgreSQL + pgvector, a React/TypeScript frontend with Jotai state management, and pluggable document processing pipelines powered by machine learning models. Baseline Commit Rules
Essential CommandsBackend (Django)# Run backend tests (sequential, use --keepdb to speed up subsequent runs)
docker compose -f test.yml run django python manage.py test --keepdb
# Run backend tests in PARALLEL (recommended - ~4x faster)
# Uses pytest-xdist with 4 workers, --dist loadscope keeps class tests together
docker compose -f test.yml run django pytest -n 4 --dist loadscope
# Run parallel tests with auto-detected worker count (uses all CPU cores)
docker compose -f test.yml run django pytest -n auto --dist loadscope
# Run parallel tests with fresh databases (first run or after schema changes)
docker compose -f test.yml run django pytest -n 4 --dist loadscope --create-db
# Run specific test file
docker compose -f test.yml run django python manage.py test opencontractserver.tests.test_notifications --keepdb
# Run specific test file in parallel
docker compose -f test.yml run django pytest opencontractserver/tests/test_notifications.py -n 4 --dist loadscope
# Run specific test class/method
docker compose -f test.yml run django python manage.py test opencontractserver.tests.test_notifications.TestNotificationModel.test_create_notification --keepdb
# Apply database migrations
docker compose -f local.yml run django python manage.py migrate
# Create new migration
docker compose -f local.yml run django python manage.py makemigrations
# Django shell
docker compose -f local.yml run django python manage.py shell
# Code quality (runs automatically via pre-commit hooks)
pre-commit run --all-filesFrontend (React/TypeScript)cd frontend
# Start development server (proxies to Django on :8000)
yarn start
# Run unit tests (Vitest) - watches by default
yarn test:unit
# Run component tests (Playwright) - CRITICAL: Use --reporter=list to prevent hanging
yarn test:ct --reporter=list
# Run component tests with grep filter
yarn test:ct --reporter=list -g "test name pattern"
# Run E2E tests
yarn test:e2e
# Coverage reports (unit tests via Vitest, component tests via Playwright + Istanbul)
yarn test:coverage:unit
yarn test:coverage:ct
# Linting and formatting
yarn lint
yarn fix-styles
# Build for production
yarn build
# Preview production build locally
yarn serveProduction Deployment# CRITICAL: Always run migrations FIRST in production
docker compose -f production.yml --profile migrate up migrate
# Then start main services
docker compose -f production.yml upHigh-Level ArchitectureBackend ArchitectureStack: Django 4.x + GraphQL (Graphene) + PostgreSQL + pgvector + Celery Key Patterns:
Frontend ArchitectureStack: React 18 + TypeScript + Apollo Client + Jotai (atoms) + PDF.js + Vite Key Patterns:
Data Flow ArchitectureDocument Processing:
GraphQL Permission Flow:
Critical Security Patterns
Critical Concepts
Testing PatternsManual Test ScriptsLocation: When performing manual testing (e.g., testing migrations, verifying database state, testing API endpoints interactively), always document the test steps in a markdown file under Format: # Test: [Brief description]
## Purpose
What this test verifies.
## Prerequisites
- Required state (e.g., "migration at 0058")
- Required data (e.g., "at least one document exists")
## Steps
1. Step one with exact command
```bash
docker compose -f local.yml run --rm django python manage.py shell -c "..."
Expected Results
CleanupCommands to restore original state if needed. Automated Documentation ScreenshotsLocation: Screenshots for documentation are captured during Playwright component tests and pushed back to a PR branch by the How it works:
Naming convention (
At least 2 segments required, 3 recommended. All lowercase alphanumeric with single hyphens. Example: import { docScreenshot } from "./utils/docScreenshot";
// After the component renders and assertions pass:
await docScreenshot(page, "badges--celebration-modal--auto-award");Rules:
Release Screenshots (Point-in-Time)For release notes, use Location: import { releaseScreenshot } from "./utils/docScreenshot";
await releaseScreenshot(page, "v3.0.0.b3", "landing-page", { fullPage: true });Key differences from
When to use which:
Authenticated Playwright Testing (Live Frontend Debugging)When you need to interact with the running frontend as an authenticated user (e.g., debugging why a query returns empty results), use Django admin session cookies to authenticate GraphQL requests. Architecture context: The frontend uses Auth0 for authentication, but the Django backend also accepts session cookie auth. Apollo Client sends GraphQL requests directly to Step 1: Set a password for the superuser (one-time setup): docker compose -f local.yml exec django python manage.py shell -c "
from django.contrib.auth import get_user_model
User = get_user_model()
u = User.objects.filter(is_superuser=True).first()
u.set_password('testpass123')
u.save()
print(f'Password set for {u.username}')
"Step 2: Playwright script pattern: const { chromium } = require('playwright');
(async () => {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext();
const page = await context.newPage();
// Collect console messages for debugging
const consoleMsgs = [];
page.on('console', msg => consoleMsgs.push('[' + msg.type() + '] ' + msg.text()));
// 1. Login to Django admin to get session cookie
await page.goto('http://localhost:8000/admin/login/');
await page.fill('#id_username', '<superuser-username>');
await page.fill('#id_password', 'testpass123');
await page.click('input[type=submit]');
await page.waitForTimeout(2000);
// 2. Extract the session cookie
const cookies = await context.cookies();
const sessionCookie = cookies.find(c => c.name === 'sessionid');
// 3. Intercept GraphQL requests to inject the session cookie
// (needed because Apollo sends cross-origin requests to :8000)
await page.route('**/graphql/**', async (route) => {
const headers = {
...route.request().headers(),
'Cookie': 'sessionid=' + sessionCookie.value,
};
await route.continue({ headers });
});
// 4. Navigate to the frontend page under test
await page.goto('http://localhost:5173/extracts');
await page.waitForTimeout(5000);
// 5. Inspect results
const bodyText = await page.textContent('body');
console.log(bodyText);
await browser.close();
})();Run from the frontend directory (where cd frontend && node /path/to/script.jsKey details:
Alternative — create a session programmatically (no admin login needed): docker compose -f local.yml exec django python manage.py shell -c "
from django.contrib.sessions.backends.db import SessionStore
from django.contrib.auth import get_user_model
User = get_user_model()
user = User.objects.filter(is_superuser=True).first()
session = SessionStore()
session['_auth_user_id'] = str(user.pk)
session['_auth_user_backend'] = 'django.contrib.auth.backends.ModelBackend'
session['_auth_user_hash'] = user.get_session_auth_hash()
session.save()
print(f'Session key: {session.session_key}')
"Then use the printed session key directly in curl or Playwright route interception. Documentation Locations
Branch StrategyThis project follows trunk-based development:
Changelog MaintenanceIMPORTANT: Always update The changelog follows Keep a Changelog format: ## [Unreleased] - YYYY-MM-DD
### Added
- New features
### Fixed
- Bug fixes with file locations and line numbers
### Changed
- Changes to existing functionality
### Technical Details
- Implementation specifics, architectural notesWhen to update:
What to include:
Pre-commit HooksAutomatically run on commit:
Run manually: Common Pitfalls
|
Code Review: OC_URL Link AnnotationsOverall this is a well-designed feature with solid security foundations. The defence-in-depth approach is exactly right for a click-to-open flow. A few issues need addressing before merge. Bugs1. OC_URL annotations are silently hidden in the UnifiedContentFeed (critical) File: frontend/src/components/knowledge_base/document/unified_feed/UnifiedContentFeed.tsx:260 STRUCTURAL_LABEL_PREFIX = "OC_" and OC_URL starts with OC_, so every URL annotation is filtered out of the feed entirely. OC_URL is a user-authored annotation that should appear in the feed. This needs either a specific exclusion list (OC_SECTION, OC_EXTRACT_SOURCE) instead of a prefix check, or an explicit carve-out for OC_URL. 2. window.location.assign() for site-relative paths causes a full page reload (medium) File: frontend/src/components/annotator/utils/urlAnnotation.ts:64 window.location.assign() is a hard navigation that discards the Apollo cache and React Router history state. Every other navigation in the annotator uses useNavigate from react-router-dom. The cleanest fix is to accept an optional navigate callback from call sites; Selection.tsx and TxtAnnotator.tsx both already have access to a useNavigate reference they could pass through. Code Quality3. Duplicate lucide-react imports in TxtAnnotator.tsx Lines 37 and 40 both import from lucide-react. Link2 should be merged into the first import statement. 4. Hardcoded values in the SelectionLayer placeholder label Inside handleConfirmCreateLink in SelectionLayer.tsx the text "OC_URL" and color "#2563EB" are hardcoded as literals. Both values are already defined as constants in frontend/src/assets/configurations/constants.ts and opencontractserver/constants/annotations.py. Using literals creates drift risk. 5. Inconsistent label check in Selection.tsx The pencil icon click handler checks annotation.annotationLabel?.text === OC_URL_LABEL directly rather than using isUrlAnnotation(). Unlike the isLinkAnnotation variable (which correctly requires both the label AND a non-empty linkUrl), this branch opens the URL edit modal even when linkUrl is absent. That is arguably correct for the edit flow, but the inconsistency is confusing — a comment explaining the intentional difference would help future readers. 6. Inline styles in CreateUrlAnnotationModal.tsx The rest of the codebase uses styled-components; this modal uses raw inline style object props throughout. Minor inconsistency worth noting. Potential Issues7. URL normalization gap in validate_link_url opencontractserver/annotations/models.py:110: the function validates the stripped form but the caller stores the original self.link_url (possibly with leading/trailing whitespace). The frontend modal pre-trims before sending, so this is safe in practice, but a direct API call or serializer path could persist a URL with leading whitespace that renders as a broken link. Consider stripping in save() before calling validate_link_url. 8. mypy.ini blanket suppression for the new test file Several existing test files use this pattern. Worth a quick pass to see if the suppressed errors are real type gaps that should be fixed rather than ignored. What's Done Well
|
const x = {
id: foo,
}; |
…med sentinel id - Export isSafeUrl from urlAnnotation.ts and use it in CreateUrlAnnotationModal so the http(s)/relative allow-list lives in a single place on the frontend (renderer click handler + author modal share the same predicate). - Tighten NewUrlAnnotationOutputType.annotation.linkUrl to string | null to match the GraphQL schema (server field is nullable). Aligns with RawServerAnnotationType.linkUrl. - Extract the placeholder OC_URL label sentinel id into PENDING_OC_URL_LABEL_ID in constants.ts (was hard-coded as "__pending_oc_url__" in SelectionLayer.tsx). Also use OC_URL_LABEL for the placeholder's text instead of the raw string.
Code Review: OC_URL Link Annotations (#1648)Overall this is a well-designed feature with solid security foundations. The defence-in-depth approach (backend Bugs1. OC_URL annotations are silently hidden in the UnifiedContentFeed (critical)
The existing filter reads: // Always hide OC_* prefixed annotations (platform-generated structural labels)
if (ann.annotationLabel.text?.startsWith(STRUCTURAL_LABEL_PREFIX)) {
return;
}
2.
The PR description says site-relative paths "navigate in the current tab so the SPA router can resolve them", but export function openAnnotationUrl(
annotation: ServerTokenAnnotation | ServerSpanAnnotation,
navigate?: NavigateFn
): boolean {
// ...
if (isRelativePath) {
if (navigate) navigate(normalized);
else window.location.assign(normalized);
}
// ...
}
Code Quality3. Duplicate Lines 37 and 40 both import from 4. Hardcoded values in the SelectionLayer placeholder label Inside 5. Inconsistent label check in The pencil icon click handler uses: if (annotation.annotationLabel?.text === OC_URL_LABEL) {
setIsEditUrlModalVisible(true);
}This is a raw label text comparison rather than 6. Inline styles in The rest of the codebase uses styled-components; this modal uses raw Potential Issues7. URL normalization gap in
if self.link_url:
self.link_url = self.link_url.strip()
validate_link_url(self.link_url)8. Several existing test files use this pattern. Worth a quick pass to see if the suppressed errors are real type gaps that should be fixed rather than ignored. What's Done Well
|
Code Review — PR #1648: OC_URL Link AnnotationsOverviewThis PR adds clickable hyperlink annotations to both the PDF and text viewers. The implementation is well-structured: a IssuesBug — TxtAnnotator edit path cannot reach the URL editor for OC_URL annotations without a set URLIn In Suggestion: Align the TxtAnnotator check with the PDF one — check the label text rather than Gap —
|
…_url The pencil-icon handler in TxtAnnotator routed by isUrlAnnotation, which requires both the OC_URL label AND a non-empty linkUrl. That left OC_URL annotations with an unset linkUrl (e.g. created via the generic addAnnotation path) trapped in the label editor with no way to attach a URL. Aligns the TxtAnnotator check with the PDF Selection check — route on the label text directly so the URL editor opens whenever the user is editing an OC_URL annotation, regardless of current linkUrl value. Also drops a redundant .toLowerCase() call in isSafeUrl by caching the lowered string once.
OC_URL annotations carry their target URL in Annotation.link_url. Without round-trip support, forking a corpus or exporting + re-importing silently drops every link target. This commit adds: - OpenContractsAnnotationPythonType.link_url (NotRequired Optional) to the export type contract. - Export side: structural V2 (export_v2.build_corpus_v2_zip) and legacy / non-structural (etl.build_document_export) both emit link_url when it is non-NULL. - Import side: structural V2 (import_v2.import_corpus_v2_from_bytes) and the bulk import path (importing.import_annotations) both restore link_url on the freshly-created Annotation rows. Falsy / missing values fall through as NULL, so existing corpora without link annotations are unaffected.
Code Review — PR #1648: OC_URL Link AnnotationsOverall this is a well-scoped, well-described feature with good layering across backend and frontend. The security-first URL validation design is commendable. A few issues need attention before merge — one critical, a few majors, and several smaller items. 🔴 Security[CRITICAL] S1 — Protocol-relative URL Both the backend and frontend allow-lists check
Fix: tighten the path check in both places: # backend
if normalized.startswith("/") and not normalized.startswith("//"):
return # site-relative, OK// frontend
if (url.startsWith("/") && !url.startsWith("//")) return true;Also add [MAJOR] S2 — DRF serializer raises
Fix: from rest_framework import serializers as drf_serializers
from django.core.exceptions import ValidationError as DjangoValidationError
try:
_validate(value)
except DjangoValidationError as exc:
msgs = list(exc.messages)
raise drf_serializers.ValidationError(msgs[0] if msgs else "Invalid link URL.")[MINOR] S3 —
Fix: validate for ann in instances:
if ann.link_url:
validate_link_url(ann.link_url) # raises ValidationError early
Annotation.objects.bulk_create(instances)🟠 Bugs[MAJOR] B1 — OC_URL annotations are invisibly filtered out of
Fix: key the rule on the if (ann.annotationLabel.text?.startsWith(STRUCTURAL_LABEL_PREFIX)
&& ann.annotationLabel.text !== OC_URL_LABEL) {
return;
}Or, more robustly, rely solely on [MINOR] B2 — If every if (!Object.keys(json).length) {
toast.error("Could not determine annotation bounds — please try again.");
return;
}[MINOR] B3 — Misleading name:
🟡 Code Quality[MODERATE] Q1 — Deferred local import in # config/graphql/serializers.py
def validate_link_url(self, value):
from opencontractserver.annotations.models import validate_link_url as _validate
...There is no circular-import reason for this to be a deferred local import. Move it to the module-level imports block per standard Django convention. [MINOR] Q2 — Magic color The placeholder label color for OC_URL annotations is hardcoded as [MINOR] Q3 — Per CLAUDE.md: "Utility functions belong in utility files." This Django 🟡 Tests[CRITICAL given S1] T1 — No test for Neither [MAJOR given B1] T2 — No test verifying OC_URL annotations surface in Bug B1 exists entirely undetected because there is no test asserting that an OC_URL-labelled annotation appears in the feed. A simple unit test with a mocked annotation list would catch the regression. [MODERATE given S3] T3 — No test for A direct test of [MINOR] T4 — Weak conditional assertion in if "data" in result:
...This silently passes when the server returns a GraphQL-level error object. The database assertion below is the real safety net, but the intent should be made unconditional so a future refactor can't silently drop the guard. ✅ What's Good
Priority order for fixes before merge:
|
… whitespace trim Bugs: - UnifiedContentFeed previously filtered out *every* annotation whose label started with the OC_ prefix to hide structural / system annotations. OC_URL is user-authored and SHOULD appear, so this silently removed every link annotation from the feed. Add an explicit carve-out for OC_URL_LABEL. - openAnnotationUrl now accepts an optional navigate callback so call sites can route site-relative paths through useNavigate() instead of window.location.assign (a hard reload that discards Apollo cache + router state). PDF (Selection.tsx) and text (TxtAnnotator.tsx) call sites threaded through. The no-arg overload still falls back to window.location.assign for non-router contexts. Hygiene: - Extract OC_URL_LABEL_COLOR constant and use it in SelectionLayer placeholder. Was a hard-coded "#2563EB" literal that could drift from the backend OC_URL_LABEL_COLOR Python constant. - Annotation.clean() and Annotation.save() now .strip() link_url before validating + persisting, so a direct API call that submits leading/trailing whitespace doesn't store a broken URL.
Code Review — PR #1648: OC_URL Link AnnotationsOverviewThis PR adds clickable hyperlink annotations anchored to highlighted text in both PDF and text/markdown viewers. The feature is well-architected with solid defense-in-depth for security (scheme allow-listing at model, serializer, mutation, and frontend renderer layers). The test coverage is comprehensive. Overall this is a high-quality PR. Security — StrongThe multi-layer URL validation is a good pattern:
The Potential Bug — Export / Import Loses
|
…nAnnotationUrl
Test:
- test_update_rejects_unsafe_link_url default-True on .get("ok") so a
missing/absent updateAnnotation payload still fails the assertion.
Silent absence is no longer treated as success. Drop the outer
data-presence guard.
Hygiene:
- openAnnotationUrl trims the URL once and reuses the normalised form
for both the safety check and navigation, instead of letting isSafeUrl
trim again internally.
Code Review — PR #1648: OC_URL Link AnnotationsOverall this is a well-scoped, carefully layered feature. The defence-in-depth approach to URL validation (model validator → GraphQL layer → DRF serializer → frontend renderer) is exactly the right pattern for a security-sensitive field. Test coverage for the new code is high. The issues below are ordered by severity. High — Potential Bugs1. Protocol-relative URL bypass in
|
| Gap | Files |
|---|---|
//evil.com bypass (see #1 above) |
test_url_annotation.py, urlAnnotation.test.ts |
Whitespace-only link_url normalisation (see #2) |
test_url_annotation.py |
Export/import round-trip with link_url |
Currently no test covers etl.py, export_v2.py, import_v2.py changes |
Concurrent ensure_label_and_labelset (see #3) |
Would require a TransactionTestCase with threads |
Minor Nits
isPublicis hardcoded tofalseinuseCreateUrlAnnotation's local-state construction (pre-existing pattern fromuseCreateAnnotation). The server returns the real value; the client should readcreated.isPublicinstead.OC_URL_LABEL_ICONis exported from the backend constants but not fromfrontend/src/assets/configurations/constants.ts, breaking the "mirrors the backend constants" comment in that file. Since the icon isn't used in the frontend, just remove the comment or add the constant.
Summary
The security model is solid and the core implementation is correct. The highest-priority fix is the //evil.com protocol-relative bypass (#1), which is a real open-redirect / safe-URL bypass that affects both the backend validator and the frontend isSafeUrl check. The whitespace normalisation gap (#2) is also worth closing before merge. Everything else is code-quality cleanup that could land in a follow-up.
…only normalisation Critical security fix: - //evil.com is a protocol-relative URL — browsers resolve it as https://evil.com. It starts with / so the previous site-relative branch in validate_link_url (backend) and isSafeUrl (frontend) would accept it as safe, then navigate/window.location.assign would happily redirect to evil.com. Both now explicitly reject any string starting with // before falling through to the allow-list. Test gap: - Whitespace-only link_url (e.g. " ") was truthy as a Python string. Annotation.clean()/save() would strip it to "" and then validate_link_url("") returns clean — so the column ended up with an empty string instead of NULL. Both now collapse whitespace-only to None via self.link_url.strip() or None. Tests: - test_url_annotation.py: new tests for protocol-relative rejection and whitespace-only normalisation. - urlAnnotation.test.ts: new test for protocol-relative rejection.
Code Review: OC_URL Link Annotations (#1648)OverviewThis PR adds clickable hyperlink annotations ( Bugs1. In // ServerSpanAnnotation.update()
delta.linkUrl ?? this.linkUrl
// in update():
linkUrl: 'linkUrl' in delta ? delta.linkUrl : this.linkUrl2. In production ( Code Quality3. Inconsistent
if (annotation.annotationLabel?.text === OC_URL_LABEL) {
setIsEditUrlModalVisible(true);
}
4.
5. const handleSpanClick = linkAnnotation
? (event: React.MouseEvent) => { ... }
: undefined;
return (
<AnnotatedSpan ... onClick={handleSpanClick} />
);This creates a new function reference for every annotated span on every render. Wrap it with Minor Notes6. In combinedRawText += " " + pageAnnotation.rawText;The leading space is trimmed at the end ( 7. Unrelated type-annotation fixes in pipeline files Several pipeline files ( 8. [mypy-opencontractserver.tests.test_url_annotation]
ignore_errors = TrueAll other new test files in this repo are also exempted, so this is consistent. Worth fixing at some point but not a blocker for this PR. SecurityThe multi-layer validation approach is solid:
One suggestion: the Test Coverage
SummaryThe feature is well-designed and production-ready with the two bug fixes above (the |
Annotations carrying the new OC_URL label render as hyperlinks (underline +
external-link icon, pointer cursor) and open their link_url on click in both
the PDF viewer and the text/markdown viewer.
Backend: new Annotation.link_url URLField (max 2048, nullable) with
migration 0072. New add_url_annotation GraphQL mutation auto-ensures the
OC_URL label per-corpus via Corpus.ensure_label_and_labelset (mirrors the
OC_SECTION pattern). Both add_annotation and update_annotation now accept
link_url. Validation lives in a shared validate_link_url() helper invoked
from Annotation.save() unconditionally and from the GraphQL mutation /
DRF serializer layers — only http(s) and site-relative paths are allowed,
so javascript: / data: schemes are rejected before persistence.
Frontend: linkUrl propagates through RawServerAnnotationType, the
ServerSpanAnnotation / ServerTokenAnnotation classes, the primary
annotation fetch queries (GET_DOCUMENT_KNOWLEDGE_AND_ANNOTATIONS,
GET_ANNOTATIONS, GET_ANNOTATIONS_FOR_ANALYSIS, GET_ANNOTATIONS_FOR_CARDS,
and similar), and the create/update mutations. A new useCreateUrlAnnotation
hook fronts the addUrlAnnotation mutation; create/url-create handlers are
threaded as props (DocumentKnowledgeBase → DocumentViewer → PDF → PDFPage →
SelectionLayer; TxtAnnotatorWrapper → TxtAnnotator) so the renderers stay
decoupled from Apollo and remain unit-testable. The selection action menu
gains an "Add link…" entry in both PDF and text viewers, which opens a
shared CreateUrlAnnotationModal that prompts for a URL. The pencil icon on
an existing OC_URL annotation opens the URL-edit modal instead of the
label-edit modal. Holding Shift / Ctrl / Cmd while clicking a link
annotation falls back to the normal toggle-selection behaviour. External
URLs open in a new tab with noopener,noreferrer; site-relative paths
navigate in the current tab so the SPA router can resolve them. A defensive
isSafeUrl mirror of the backend allow-list lives in the renderer so a
stale cached unsafe URL still won't reach window.open.