2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

- **Corpus Home document tree loads roughly an order of magnitude faster** (`config/graphql/filters.py`, `frontend/src/graphql/queries.ts`, `frontend/src/components/corpuses/DocumentTableOfContents.tsx`). The "Loading document structure…" wait on Corpus Home was driven by two over-fetched GraphQL queries. (1) `GET_DOCUMENT_RELATIONSHIPS` (capped at `first: 500`) selected the full `sourceDocument`/`targetDocument` objects (title, description, fileType, **icon**, slug, creator), plus `corpus { …creator }`, `creator { …username }`, `annotationLabel { id, text, color, icon }`, `data`, `created`, `modified`, and `myPermissions`. `DocumentTableOfContents.tsx:535-611` then consumed only four of those fields (`relationshipType`, `annotationLabel.text`, `sourceDocument.id`, `targetDocument.id`). Every `icon` selection still triggered `build_absolute_uri()` via `create_file_resolver`, and the per-row document payloads were thrown away because `GET_CORPUS_DOCUMENTS_FOR_TOC` already provided the same fields. Worse, all relationships came back regardless of type/label and were filtered client-side to `relationshipType === "RELATIONSHIP"` and `annotationLabel.text.toLowerCase() === "parent"`. (2) `GET_CORPUS_DOCUMENTS_FOR_TOC` selected `icon` and `creator { id, slug }`, neither of which the TOC renders (`renderNode` derives the on-screen icon from `fileType` via `getFileIcon`, and creator metadata is never displayed).
Fix: a new `annotation_label_text` `iexact` filter on `DocumentRelationshipFilter` (paired with the existing `relationship_type` filter), so the parent-only restriction now runs on the server; a new ultra-lean `GET_CORPUS_DOCUMENT_TOC_EDGES` query that returns just `{ id, sourceDocument { id }, targetDocument { id } }` and lives alongside (not in place of) the original `GET_DOCUMENT_RELATIONSHIPS`, because `CorpusDocumentRelationships.tsx:314` and `DocumentRelationshipModal.tsx:611` still need the rich payload; and `GET_CORPUS_DOCUMENTS_FOR_TOC` slimmed to `{ id, title, description, slug, fileType }`. `RunCorpusActionModal.tsx` (the only other consumer of the doc query) reads only `node.id` / `node.title`, so the field removal is safe. End result for a 76-document corpus: the relationships payload drops from "76+ source-and-target document blobs" to "76 ID-only edge rows"; up to 152 `build_absolute_uri()` icon-resolver calls and 76 description blobs are eliminated; and any non-parent relationships (notes, future link types) never leave the server.

- **Fork ≡ export+import: V2 parity refactor + roundtrip-loss fixes** (`opencontractserver/tasks/fork_tasks.py`, `opencontractserver/tasks/export_tasks_v2.py`, `opencontractserver/utils/{export_v2,import_v2,etl}.py`, `opencontractserver/tests/{test_corpus_export_import_v2.py,test_corpus_forking.py,test_ingestion_source.py}`, and shared fixture helpers `opencontractserver/tests/{_corpus_fixture,_corpus_snapshot}.py`). Replaces the legacy bespoke `fork_corpus` machinery with a thin shell that drives the V2 export → V2 import pipeline so the two code paths can no longer drift. `build_corpus_v2_zip` is the pure builder that `package_corpus_export_v2` (Celery task) now wraps; `import_corpus_v2_from_bytes` is the in-process entry point that `fork_corpus` invokes after `build_corpus_v2_zip`. Restores fields that were previously dropped on round-trip (manual-metadata `Fieldset` / `Column` / `Datacell` rows, `IngestionSource` rows, `DocumentPath` and `CorpusFolder` snapshots, structural-set membership) and adds a three-roundtrip invariant test (`TestV2ThreeRoundTripDataIntegrity`) plus error-handling coverage (`CorpusForkErrorHandlingTest`).
- **Behavioural change:** `fork_corpus` no longer respects selective `doc_ids` / `annotation_ids` arguments. Any caller passing those now gets a *full* fork (with a `logger.warning`). No live caller in `opencontractserver/`, `config/`, or tests still passes selective args; legacy queued Celery tasks would still run safely (full fork instead of partial). This entry flags the contract change for downstream forks of this repo.
- **Import-side correctness:** `import_metadata_schema` now clears its `column_map` on rollback so callers can't accidentally re-link freshly imported rows to pks that no longer exist.
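
To illustrate why ID-only edge rows are enough for the TOC described in the first entry, here is a minimal sketch of assembling a tree from a document list plus parent edges. The names `TocDoc`, `TocEdge`, `TocNode`, and `buildTocForest` are hypothetical and are not taken from the codebase. The sketch assumes the convention that a source document's parent is the target document, keeps the first parent it sees for each child, and does not attempt cycle detection.

```typescript
// Hypothetical shapes for illustration; they mirror the lean payloads
// described in the changelog but are not copied from the codebase.
interface TocDoc {
  id: string;
  title: string;
}

interface TocEdge {
  sourceDocument: { id: string } | null;
  targetDocument: { id: string } | null;
}

interface TocNode extends TocDoc {
  children: TocNode[];
}

// Build a forest from documents plus ID-only "parent" edges.
// Assumed convention: the source document's parent is the target document.
function buildTocForest(docs: TocDoc[], edges: TocEdge[]): TocNode[] {
  const nodes = new Map<string, TocNode>();
  for (const doc of docs) {
    nodes.set(doc.id, { ...doc, children: [] });
  }

  const hasParent = new Set<string>();
  for (const edge of edges) {
    const childId = edge.sourceDocument?.id;
    const parentId = edge.targetDocument?.id;
    if (!childId || !parentId) continue;
    const child = nodes.get(childId);
    const parent = nodes.get(parentId);
    // Skip edges to unknown documents, self-loops, and repeated parents.
    if (!child || !parent || childId === parentId || hasParent.has(childId)) {
      continue;
    }
    parent.children.push(child);
    hasParent.add(childId);
  }

  // Roots are the documents that never appeared as a child.
  return docs
    .filter((doc) => !hasParent.has(doc.id))
    .map((doc) => nodes.get(doc.id)!);
}
```

All titles and other display fields come from the document list, so the edge rows only need to carry the two IDs.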
4 changes: 4 additions & 0 deletions config/graphql/filters.py
@@ -605,6 +605,10 @@ class Meta:
class DocumentRelationshipFilter(django_filters.FilterSet):
"""Filter set for DocumentRelationship model."""

annotation_label_text = filters.CharFilter(
field_name="annotation_label__text", lookup_expr="iexact"
)

class Meta:
model = DocumentRelationship
fields = [
8 changes: 8 additions & 0 deletions frontend/src/assets/configurations/constants.ts
@@ -254,6 +254,14 @@ export const DOCUMENT_RELATIONSHIP_TOC_LIMIT = 500;
// Backend enforces max 100 records per page on documents connection
export const CORPUS_DOCUMENTS_TOC_LIMIT = 100;

// Document relationship type / label filters used by the corpus TOC tree.
// Mirrors the backend's `RELATIONSHIP_TYPE_CHOICES` for the "RELATIONSHIP"
// member and the conventional "parent" annotation label text. Used as
// GraphQL variables so the server-side filter restricts the edges to the
// hierarchy-defining rows only.
export const DOCUMENT_RELATIONSHIP_TYPE_RELATIONSHIP = "RELATIONSHIP";
export const DOCUMENT_RELATIONSHIP_LABEL_PARENT = "parent";

// Document annotation index (within-document TOC)
// Keep in sync with opencontractserver/constants/annotations.py
export const DOCUMENT_ANNOTATION_INDEX_LIMIT = 500;
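
As a rough local stand-in for what the two new filter constants ask the server to do, the predicate below combines an exact match on relationship type with a case-insensitive exact match on the label text. `RelationshipRow` and `isParentEdge` are hypothetical names used only for this sketch, and lowercasing both sides only approximates the backend `iexact` lookup (the approximation is reliable for plain ASCII labels such as "parent").

```typescript
// Values assumed to match the constants added in this diff.
const RELATIONSHIP_TYPE = "RELATIONSHIP";
const PARENT_LABEL = "parent";

// Hypothetical row shape, for illustration only.
interface RelationshipRow {
  relationshipType: string;
  annotationLabel: { text: string } | null;
}

// Local approximation of the server-side filter: exact match on type,
// case-insensitive exact match on the label text.
function isParentEdge(row: RelationshipRow): boolean {
  return (
    row.relationshipType === RELATIONSHIP_TYPE &&
    row.annotationLabel?.text.toLowerCase() === PARENT_LABEL.toLowerCase()
  );
}
```

Rows with a missing label, a different label, or a different relationship type are rejected, which is the same set the old client-side filter discarded after download.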
74 changes: 26 additions & 48 deletions frontend/src/components/corpuses/DocumentTableOfContents.tsx
@@ -18,10 +18,9 @@ import {
} from "lucide-react";

import {
GET_DOCUMENT_RELATIONSHIPS,
GetDocumentRelationshipsOutput,
GetDocumentRelationshipsInput,
DocumentRelationshipNode,
GET_CORPUS_DOCUMENT_TOC_EDGES,
GetCorpusDocumentTocEdgesInput,
GetCorpusDocumentTocEdgesOutput,
GET_CORPUS_DOCUMENTS_FOR_TOC,
GetCorpusDocumentsForTocInput,
GetCorpusDocumentsForTocOutput,
@@ -38,6 +37,8 @@ import { mediaQuery } from "./styles/corpusDesignTokens";
import {
DOCUMENT_RELATIONSHIP_TOC_LIMIT,
CORPUS_DOCUMENTS_TOC_LIMIT,
DOCUMENT_RELATIONSHIP_TYPE_RELATIONSHIP,
DOCUMENT_RELATIONSHIP_LABEL_PARENT,
} from "../../assets/configurations/constants";
import { DocumentAnnotationIndex } from "./DocumentAnnotationIndex";

@@ -60,7 +61,6 @@ interface DocumentNode {
description?: string;
fileType?: string;
slug?: string;
icon?: string;
children: DocumentNode[];
}

@@ -469,17 +469,22 @@ export const DocumentTableOfContents: React.FC<
// URL-driven expand all state
const expandAllFromUrl = useReactiveVar(tocExpandAll);

// Query for document relationships in this corpus
// Query for "parent"-labeled document relationship edges in this corpus.
// Uses the lean TOC-specific query that returns only source/target IDs and
// pushes the relationship_type/label filters to the server, so we don't
// fetch hundreds of unrelated rows or duplicate document metadata.
const {
data: relationshipsData,
loading: relationshipsLoading,
error: relationshipsError,
} = useQuery<GetDocumentRelationshipsOutput, GetDocumentRelationshipsInput>(
GET_DOCUMENT_RELATIONSHIPS,
} = useQuery<GetCorpusDocumentTocEdgesOutput, GetCorpusDocumentTocEdgesInput>(
GET_CORPUS_DOCUMENT_TOC_EDGES,
{
variables: {
corpusId,
first: DOCUMENT_RELATIONSHIP_TOC_LIMIT,
relationshipType: DOCUMENT_RELATIONSHIP_TYPE_RELATIONSHIP,
annotationLabelText: DOCUMENT_RELATIONSHIP_LABEL_PARENT,
},
skip: !corpusId,
fetchPolicy: "cache-and-network",
@@ -507,7 +512,13 @@ export const DocumentTableOfContents: React.FC<
const loading = relationshipsLoading || documentsLoading;
const error = relationshipsError || documentsError;

// Check if we've hit the limits (potential truncation)
// Check if we've hit the limits (potential truncation).
// NOTE: `relationshipTotalCount` here is the count of *parent-labeled
// RELATIONSHIP* rows only (the server-side filter narrows the queryset
// before it is counted). It is intentionally narrower than the legacy
// `GET_DOCUMENT_RELATIONSHIPS` total, which counted every relationship
// type — a corpus with 600 total relationships but only 50 parent ones
// would have triggered the old warning and won't trigger this one.
const relationshipTotalCount =
relationshipsData?.documentRelationships?.totalCount ?? 0;
const documentsTotalCount = documentsData?.documents?.totalCount ?? 0;
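
The effect of counting only filtered rows can be sketched with a small helper. The actual comparison used by the component is not visible in this hunk, so the `>` test and the `hasNextPage` fallback below are assumptions, and `ConnectionSummary` and `isTruncated` are hypothetical names.

```typescript
// Hypothetical summary of a paginated connection, for illustration only.
interface ConnectionSummary {
  totalCount: number;
  pageInfo?: { hasNextPage: boolean };
}

// Returns true when the server reports more rows than one page could carry.
// Assumption: "truncated" means totalCount exceeds the requested page size,
// or the server explicitly reports another page.
function isTruncated(
  conn: ConnectionSummary | undefined,
  limit: number
): boolean {
  if (!conn) return false;
  return conn.totalCount > limit || conn.pageInfo?.hasNextPage === true;
}

const LIMIT = 500;
// Legacy behaviour: every relationship type was counted.
const legacyWarning = isTruncated({ totalCount: 600 }, LIMIT);
// New behaviour: only parent-labeled RELATIONSHIP rows are counted.
const filteredWarning = isTruncated({ totalCount: 50 }, LIMIT);
```

With the numbers from the comment above, the legacy count of 600 exceeds the limit of 500 while the filtered count of 50 does not.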
@@ -532,15 +543,8 @@ export const DocumentTableOfContents: React.FC<
};
}

// Filter to only "parent" labeled relationships
const parentRelationships = relationships
.map((e) => e.node)
.filter(
(rel): rel is DocumentRelationshipNode =>
rel != null &&
rel.relationshipType === "RELATIONSHIP" &&
rel.annotationLabel?.text?.toLowerCase() === "parent"
);
// Server-side filters already restrict edges to "parent"-labeled
// RELATIONSHIP rows, so no client-side filter is needed here.

// Build a map of document info from ALL corpus documents
const documentMap = new Map<
@@ -551,7 +555,6 @@ export const DocumentTableOfContents: React.FC<
description?: string;
fileType?: string;
slug?: string;
icon?: string;
}
>();

@@ -564,7 +567,6 @@ export const DocumentTableOfContents: React.FC<
description: doc.description || undefined,
fileType: doc.fileType || undefined,
slug: doc.slug,
icon: doc.icon || undefined,
});
});

@@ -574,33 +576,10 @@ export const DocumentTableOfContents: React.FC<
const parentMap = new Map<string, string>(); // child -> parent
const childrenMap = new Map<string, string[]>(); // parent -> children

parentRelationships.forEach((rel) => {
const sourceId = rel.sourceDocument.id;
const targetId = rel.targetDocument.id;

// Update document info with richer data from relationships if available
if (rel.sourceDocument.title) {
documentMap.set(sourceId, {
...documentMap.get(sourceId),
id: sourceId,
title: rel.sourceDocument.title || "Untitled",
description: rel.sourceDocument.description || undefined,
fileType: rel.sourceDocument.fileType || undefined,
slug: rel.sourceDocument.slug,
icon: rel.sourceDocument.icon,
});
}
if (rel.targetDocument.title) {
documentMap.set(targetId, {
...documentMap.get(targetId),
id: targetId,
title: rel.targetDocument.title || "Untitled",
description: rel.targetDocument.description || undefined,
fileType: rel.targetDocument.fileType || undefined,
slug: rel.targetDocument.slug,
icon: rel.targetDocument.icon,
});
}
relationships.forEach((edge) => {
const sourceId = edge.node?.sourceDocument?.id;
const targetId = edge.node?.targetDocument?.id;
if (!sourceId || !targetId) return;

// Source's parent is target (source "has parent" target)
parentMap.set(sourceId, targetId);
@@ -658,7 +637,6 @@ export const DocumentTableOfContents: React.FC<
description: docInfo.description,
fileType: docInfo.fileType,
slug: docInfo.slug,
icon: docInfo.icon,
children,
};
};
82 changes: 72 additions & 10 deletions frontend/src/graphql/queries.ts
@@ -5469,7 +5469,10 @@ export const GET_DOCUMENT_RELATIONSHIPS = gql`
}
`;

// Lightweight query for TOC - gets all documents in a corpus with minimal fields
// Lightweight query for TOC - gets all documents in a corpus with minimal fields.
// Intentionally omits `icon` (the TOC renders an icon derived from `fileType`
// on the frontend) and `creator` (unused). Dropping these avoids one file-URL
// resolver call per document plus an extra join on every page load.
export interface GetCorpusDocumentsForTocInput {
corpusId: string;
first?: number;
@@ -5480,11 +5483,7 @@ export interface CorpusDocumentForToc {
title: string;
description: string | null;
slug: string;
icon: string | null;
fileType: string | null;
creator: {
slug: string;
};
}

export interface GetCorpusDocumentsForTocOutput {
@@ -5511,12 +5510,7 @@ export const GET_CORPUS_DOCUMENTS_FOR_TOC = gql`
title
description
slug
icon
fileType
creator {
id
slug
}
}
}
totalCount
@@ -5530,6 +5524,74 @@ export const GET_CORPUS_DOCUMENTS_FOR_TOC = gql`
}
`;

// Ultra-lean relationships query for the corpus TOC tree.
// Only source/target IDs and the relationship identity are needed to compute
// parent/child edges; document metadata is supplied by GET_CORPUS_DOCUMENTS_FOR_TOC.
// Server-side filtering on `relationshipType` and `annotationLabelText` keeps
// the result set restricted to "parent"-labeled RELATIONSHIP rows.
export interface GetCorpusDocumentTocEdgesInput {
corpusId: string;
first?: number;
relationshipType?: string;
annotationLabelText?: string;
}

export interface CorpusDocumentTocEdge {
id: string;
// `sourceDocument` and `targetDocument` are typed nullable because the
// GraphQL schema marks every relation field as nullable by default. At the
// database level the underlying FKs on `DocumentRelationship` are non-null,
// so in practice these are always present — but consumers must still null-
// guard on the unwrapped value to keep TypeScript happy (and to remain
// safe against any future permission-scoped scrubs of the related rows).
sourceDocument: { id: string } | null;
targetDocument: { id: string } | null;
}

export interface GetCorpusDocumentTocEdgesOutput {
documentRelationships: {
edges: Array<{
node: CorpusDocumentTocEdge;
}>;
totalCount: number;
pageInfo: {
hasNextPage: boolean;
};
};
}

export const GET_CORPUS_DOCUMENT_TOC_EDGES = gql`
query GetCorpusDocumentTocEdges(
$corpusId: ID
$first: Int
$relationshipType: String
$annotationLabelText: String
) {
documentRelationships(
corpusId: $corpusId
first: $first
relationshipType: $relationshipType
annotationLabelText: $annotationLabelText
) {
edges {
node {
id
sourceDocument {
id
}
targetDocument {
id
}
}
}
totalCount
pageInfo {
hasNextPage
}
}
}
`;

// ============================================================================
// CAML ARTICLE (Readme.CAML document)
// ============================================================================
18 changes: 14 additions & 4 deletions frontend/tests/DocumentTableOfContents.ct.tsx
@@ -71,7 +71,12 @@ test.describe("DocumentTableOfContents", () => {
timeout: 10000,
});

await expect(page.getByText("Parent Document")).toBeVisible();
// Use exact match: the mocked description "A parent document for testing
// hierarchy" also matches "Parent Document" under Playwright's default
// case-insensitive substring matching, which would otherwise trigger a
// strict-mode violation.
await expect(
page.getByText("Parent Document", { exact: true })
).toBeVisible();
});
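
The strict-mode problem described in the comment above can be reproduced with a simplified model of text matching. This is not Playwright's implementation; `normalize` and `matchesText` are hypothetical helpers that assume the documented behaviour of `getByText` (case-insensitive substring match by default, case-sensitive whole-string match with `exact: true`, whitespace normalized in both cases).

```typescript
// Simplified model of text matching, not Playwright's actual implementation.
function normalize(s: string): string {
  return s.trim().replace(/\s+/g, " ");
}

// Default: case-insensitive substring match.
// exact: case-sensitive match against the whole normalized string.
function matchesText(
  elementText: string,
  query: string,
  exact = false
): boolean {
  const text = normalize(elementText);
  const q = normalize(query);
  return exact ? text === q : text.toLowerCase().includes(q.toLowerCase());
}

const candidates = [
  "Parent Document",
  "A parent document for testing hierarchy",
];
// Two matches here would be ambiguous for a strict locator.
const loose = candidates.filter((t) => matchesText(t, "Parent Document"));
// Exact matching leaves only the title element.
const strict = candidates.filter((t) =>
  matchesText(t, "Parent Document", true)
);
```

Under this model the default lookup resolves to both the title and the description, while the exact lookup resolves to the title alone.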

test("displays child documents", async ({ mount, page }) => {
@@ -229,8 +234,9 @@ test.describe("DocumentTableOfContents", () => {
timeout: 10000,
});

// Click on a document title
await page.getByText("Parent Document").click();
// Click on a document title (exact match: the description also matches
// "Parent Document" under the default case-insensitive substring matching).
await page.getByText("Parent Document", { exact: true }).click();

// Navigation would happen via React Router - we can't easily test the actual navigation
// but we can verify the click handler is called (no errors thrown)
@@ -331,7 +337,11 @@ test.describe("DocumentTableOfContents", () => {
await expect(page.getByText("Table of Contents")).toBeVisible({
timeout: 10000,
});
await expect(page.getByText("Parent Document")).toBeVisible();
// Exact match: the mocked description "A parent document for testing
// hierarchy" also matches "Parent Document" under the default
// case-insensitive substring matching.
await expect(
page.getByText("Parent Document", { exact: true })
).toBeVisible();

// Expand parent document — use its treeitem's chevron directly
const parentItem = page.getByRole("treeitem", {