moldabayevd/dissertation-assistant

Dissertation Assistant

An LLM agent system for evidence-grounded academic drafting with automated review and revision loops, built on FastAPI, Next.js, PostgreSQL and Ollama.

Drafts are generated strictly from user-approved evidence chunks and validated against strict JSON schemas. A reviewer agent then produces typed, priority-tagged comments, and a revision agent addresses them, producing a new versioned draft with tracked deltas and full provenance.


Why this exists

Off-the-shelf LLMs write dissertation drafts happily, but they:

  • fabricate citations, experiments, and numeric claims;
  • give no traceable link between a generated sentence and its source;
  • overwrite earlier versions, erasing history;
  • have no notion of "supervisor feedback" or structured revision;
  • return arbitrary unstructured text that cannot be safely persisted.

Dissertation Assistant treats the draft as a first-class, versioned, evidence-bound artifact. Every generated sentence is linked to approved evidence. Every revision is a new immutable version with tracked changes. Nothing reaches storage unless it passes strict schema and business validation.

This is not an anti-plagiarism evasion tool and not a system for hiding AI authorship. The product is designed around provenance, user confirmation, evidence traceability, explicit approvals and auditability.


Agent workflow

flowchart LR
    U[User-approved<br/>evidence chunks] --> W[student_writer_agent]
    O[Approved outline<br/>section] --> W
    W -->|Draft v1<br/>strict JSON| V1[DraftVersion v1]
    V1 --> R[supervisor_reviewer_agent]
    R -->|Typed comments<br/>priority-tagged| RB[ReviewBatch]
    RB --> X[revision_agent]
    V1 --> X
    X -->|Applied changes<br/>strict JSON| V2[DraftVersion v2]
    V2 --> R
    V2 --> A{User<br/>approve?}
    A -->|yes| DONE[Approved version]
    A -->|no| R

Three cooperating agents drive the loop:

  • student_writer_agent — generates a single section draft strictly from approved evidence. Produces typed JSON; invalid output is rejected and automatically retried with a correction prompt.
  • supervisor_reviewer_agent — reviews a specific DraftVersion and emits structured ReviewComment items with exact quotes, type, and priority.
  • revision_agent — addresses open review comments on a DraftVersion and produces a new version with applied_changes and tracked deltas. Cosmetic no-op revisions are rejected.

Every step persists provenance: which evidence was used, which prompts ran, which comments were addressed, and which sentences changed between versions.
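
The reject-and-retry behaviour described for the writer agent could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the project's code: `call_model`, `REQUIRED_FIELDS`, the field names, and the wording of the correction prompt are all hypothetical, and the real contract is defined in the backend.

```python
from __future__ import annotations

import json
from typing import Callable

# Hypothetical minimal contract; the real writer schema lives in the backend.
REQUIRED_FIELDS = {"section_id": str, "text": str, "used_evidence_ids": list}


def validate(raw: str) -> tuple[dict | None, str | None]:
    """Return (payload, None) when raw matches the contract, else (None, reason)."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(payload, dict):
        return None, "top-level value must be an object"
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(payload.get(name), expected):
            return None, f"field '{name}' missing or not {expected.__name__}"
    extra = set(payload) - set(REQUIRED_FIELDS)
    if extra:
        return None, f"unexpected fields: {sorted(extra)}"
    return payload, None


def generate_with_retry(
    call_model: Callable[[str], str], prompt: str, max_retries: int = 2
) -> dict:
    """Call the model and re-prompt with the validation error until the output is valid."""
    current = prompt
    error = None
    for _ in range(max_retries + 1):
        payload, error = validate(call_model(current))
        if payload is not None:
            return payload
        # Correction prompt: original task plus the reason the last output was rejected.
        current = (
            f"{prompt}\n\nYour previous output was rejected: {error}. "
            "Return only valid JSON."
        )
    raise ValueError(f"model output still invalid after {max_retries} retries: {error}")
```

Passing the model call in as a plain function keeps the loop testable without a running Ollama instance; a failed run raises instead of returning partial output, so nothing invalid can be stored by the caller.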


Key properties

  • Evidence-first generation. Drafts can only use outline sections and evidence the user has explicitly approved.
  • No fabricated citations or numeric claims. Every reference is tied to a SourceDocument. Every numeric claim is tied to evidence metadata.
  • Immutable versioning. Revision creates a new DraftVersion instead of overwriting. Full history with side-by-side comparison.
  • Typed review workflow. Review comments move through open → addressed → closed → unresolved states. Revisions must include meaningful applied_changes.
  • Schema-hard LLM contracts. Writer, reviewer, and reviser agents use strict JSON-only contracts with retry-on-invalid-output.
  • Full audit trail. Execution metadata, timestamps, and failure reasons are persisted for every generation, review, and revision run.
  • Multi-language. UI language (ru / en / kk) is separate from dissertation language; section-level language_override is supported.
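
As an illustration of the immutable-versioning property, the sketch below models a section history as an append-only list of frozen records. The class and field names (`DraftSectionHistory`, `number`, `parent`) are assumptions made for this example; the project's actual DraftVersion model is a SQLAlchemy entity with more fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DraftVersion:
    """One immutable snapshot of a section draft (hypothetical, simplified shape)."""
    number: int
    text: str
    parent: int | None = None  # number of the version this one was revised from


@dataclass
class DraftSectionHistory:
    """Append-only history: revising adds a version and never edits an old one."""
    versions: list[DraftVersion] = field(default_factory=list)

    def add(self, text: str) -> DraftVersion:
        parent = self.versions[-1].number if self.versions else None
        version = DraftVersion(number=len(self.versions) + 1, text=text, parent=parent)
        self.versions.append(version)
        return version
```

Because `DraftVersion` is frozen, an attempt to assign to `version.text` raises an error, which mirrors the rule that earlier versions are never overwritten.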

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│  Next.js UI     │───▶│  FastAPI API     │───▶│  PostgreSQL      │
│  (TypeScript)   │    │  (Python 3.12)   │    │  (SQLAlchemy +   │
│                 │◀───│                  │◀───│   Alembic)       │
└─────────────────┘    └────────┬─────────┘    └──────────────────┘
                                │
                                │ LLM calls
                                ▼
                       ┌──────────────────┐
                       │  Ollama runtime  │
                       │  (qwen3:14b for  │
                       │   writer/review/ │
                       │   revise agents) │
                       └──────────────────┘
  • /frontend — Next.js App Router, TypeScript, Tailwind. Draft workspace, review workspace, side-by-side version comparison, provenance inspector.
  • /backend — FastAPI, SQLAlchemy, Alembic migrations, Ollama integration through a shared LLM abstraction, job system for background generation, review, and revision runs.
  • /worker — Celery integration entry point for long-running jobs.
  • /infra — Dockerfiles and infrastructure assets.
  • /docs — phase-by-phase architecture and implementation notes.
  • /templates — document template placeholders.

Tech stack

Backend: Python 3.12 · FastAPI · SQLAlchemy · Alembic · Celery · pytest

Frontend: Next.js 15 (App Router) · TypeScript · Tailwind · Vitest

Data & LLM: PostgreSQL · Ollama (qwen3:14b text, gemma3:4b vision, qwen3-embedding:4b)

Infra: Docker · docker-compose


Quick Start

1. Configure environment

Copy .env.example to .env and adjust values if needed.

Important for draft generation, review, and revision:

  • OLLAMA_BASE_URL
  • OLLAMA_TEXT_MODEL
  • OLLAMA_TIMEOUT_SECONDS / OLLAMA_CONNECT_TIMEOUT_SECONDS / OLLAMA_READ_TIMEOUT_SECONDS
  • DRAFT_GENERATION_MAX_RETRIES
  • DRAFT_GENERATION_MAX_EVIDENCE_ITEMS
  • DRAFT_GENERATION_MIN_WORDS

The expected text model for student_writer_agent, supervisor_reviewer_agent, and revision_agent is qwen3:14b.
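
For orientation, a minimal sketch of how a backend might read these variables is shown below. The variable names come from the list above; the `DraftSettings` class, the `load_settings` helper, and every default value except Ollama's standard port 11434 are assumptions for illustration, so check .env.example for the real defaults.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class DraftSettings:
    ollama_base_url: str
    ollama_text_model: str
    max_retries: int
    max_evidence_items: int
    min_words: int


def load_settings(env: Mapping[str, str] | None = None) -> DraftSettings:
    """Read draft-generation settings from the environment (defaults are illustrative)."""
    env = os.environ if env is None else env
    return DraftSettings(
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        ollama_text_model=env.get("OLLAMA_TEXT_MODEL", "qwen3:14b"),
        # The numeric defaults below are placeholders, not the project's values.
        max_retries=int(env.get("DRAFT_GENERATION_MAX_RETRIES", "2")),
        max_evidence_items=int(env.get("DRAFT_GENERATION_MAX_EVIDENCE_ITEMS", "12")),
        min_words=int(env.get("DRAFT_GENERATION_MIN_WORDS", "150")),
    )
```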

2. Ensure Ollama is available

ollama list
ollama pull qwen3:14b
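
The same check can be scripted against Ollama's HTTP API, which lists locally installed models at GET /api/tags. The sketch below is an optional helper, not part of the repository; the function names are hypothetical and the base URL assumes Ollama's default port.

```python
import json
import urllib.request


def installed_models(tags_payload: dict) -> set[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return {model["name"] for model in tags_payload.get("models", [])}


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch the list of locally installed models from a running Ollama instance."""
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    names = installed_models(fetch_tags())
    print("qwen3:14b installed:", "qwen3:14b" in names)
```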

3. Start with Docker Compose

docker compose up --build

Services:

4. Run backend locally without Docker

cd backend
python -m venv .venv
# Windows (Git Bash); on Linux/macOS use: . .venv/bin/activate
. .venv/Scripts/activate
pip install -r requirements.txt
python -m alembic upgrade head
python -m app.seeds.seed
uvicorn app.main:app --reload

5. Run frontend locally

cd frontend
npm install
npm run dev

Tests

Backend

cd backend
python -m pytest

Frontend

cd frontend
npm install
npm test
npm run build

Seed Data

The backend seed creates a demo project named Demo Dissertation Project with:

  • demo user
  • uploaded text and docx materials
  • parsed source documents
  • extracted evidence items
  • initialized wizard steps
  • generated outline

Implementation phases

The project was built incrementally across four phases. Each phase is fully functional and gated behind validation.

Phase 1 — Foundations

  • monorepo scaffold
  • Docker Compose infrastructure
  • FastAPI backend skeleton
  • Next.js + TypeScript + Tailwind frontend skeleton
  • PostgreSQL-ready SQLAlchemy schema for the dissertation workflow domain
  • project CRUD, uploads, wizard steps, seed data, and baseline tests

Phase 2A — Evidence layer

  • TXT / PDF / DOCX parsing
  • SourceDocument and ParsingJob entities
  • chunk-based EvidenceItem
  • evidence moderation with pending / approved / rejected
  • evidence APIs and moderation UI

Phase 2B — Evidence-grounded draft generation

  • provenance-aware draft generation for one outline section at a time
  • Draft, DraftSection, DraftVersion persistence with:
    • prompt context snapshots
    • used evidence snapshots and relations
    • missing data, warnings, execution metadata, validation reports
  • DraftGenerationJob background jobs with status, logs, retry info, failure reason
  • DraftVersionEvidenceLink for used EvidenceItem tracking
  • Ollama text-provider integration through the shared LLM abstraction
  • strict schema validation and business validation before save
  • automatic retry with correction prompt on invalid model output
  • REST API for drafts, versions, provenance, approve/reject
  • frontend draft workspace (outline sections, generate, versions, evidence, warnings, provenance, approve/reject, background job status)
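
To make the "business validation before save" step concrete, here is a minimal sketch of the kind of checks it might run after schema validation passes. The payload field names (`used_evidence_ids`, `text`) and the function name are assumptions for this example; the real rules live in the backend services.

```python
def business_errors(
    payload: dict, approved_evidence_ids: set[str], min_words: int
) -> list[str]:
    """Return a list of rule violations; an empty list means the draft may be saved."""
    errors: list[str] = []
    used = payload.get("used_evidence_ids", [])
    if not used:
        errors.append("draft cites no evidence")
    # Evidence-first rule: every referenced item must be approved by the user.
    unapproved = sorted(set(used) - approved_evidence_ids)
    if unapproved:
        errors.append(f"unapproved evidence referenced: {unapproved}")
    word_count = len(payload.get("text", "").split())
    if word_count < min_words:
        errors.append(f"draft has {word_count} words, minimum is {min_words}")
    return errors
```

Returning all violations at once, rather than failing on the first, gives a correction prompt something specific to quote back to the model.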

Phase 2C — Review / revision loop

  • supervisor review batches linked to a concrete DraftVersion
  • structured ReviewComment items with exact quote, type, priority
  • revision workflow that creates a new DraftVersion instead of overwriting
  • RevisionAction persistence with review-comment linkage and tracked deltas
  • execution metadata, timestamps, audit trail for reviewer and revision runs
  • strict JSON-only reviewer / reviser contracts
  • schema and business validation with correction-prompt retry
  • meaningful revision validation — cosmetic no-op revisions are rejected
  • applied_changes required for successful revisions
  • review comment lifecycle: open → addressed → closed → unresolved
  • side-by-side version comparison with persisted deltas per section
  • ui_language vs dissertation_language separation in project settings
  • OutlineSection.language_override for multilingual special sections
  • review / revision REST API (generate review, list batches, comments, trigger revision, list revision actions, compare versions, section review timeline)
  • frontend review workspace (batches, priority badges, revision trigger, applied changes, side-by-side comparison, section timeline, lifecycle visibility, next-step guidance through generate → review → revise → compare → approve)
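
One way to reject cosmetic no-op revisions and compute sentence-level deltas is sketched below with the standard-library difflib module. This is an assumed approach for illustration only: the normalization rule, the naive sentence splitter, and the function names are not taken from the project.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Collapse whitespace and case so purely cosmetic edits compare equal."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def split_sentences(text: str) -> list[str]:
    """Naive splitter on ., ! and ? followed by whitespace (illustrative only)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_deltas(old: str, new: str) -> list[dict]:
    """List changed sentence runs between two versions of a section."""
    old_s, new_s = split_sentences(old), split_sentences(new)
    matcher = difflib.SequenceMatcher(a=old_s, b=new_s, autojunk=False)
    return [
        {"op": tag, "before": old_s[i1:i2], "after": new_s[j1:j2]}
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def is_meaningful_revision(old: str, new: str, applied_changes: list) -> bool:
    """Require non-empty applied_changes and a difference that survives normalization."""
    return bool(applied_changes) and normalize(old) != normalize(new)
```

A revision that only changes spacing or capitalization fails `is_meaningful_revision` even when `applied_changes` is non-empty, which matches the intent of the rule above.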

Core guardrails

  • No fabricated experiments, datasets, citations, publications, novelty claims, or references.
  • Every reference must be tied to a source.
  • Every numeric claim must be tied to evidence metadata.
  • Important milestones require explicit user confirmation.
  • Provenance, audit logs, and artifact lineage must be persisted.
  • Draft generation may use only approved outline sections and approved evidence.
  • Invalid model output must not be stored.
  • Successful revision must include meaningful tracked changes.
  • User-facing generated comments must stay in the configured dissertation language.
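
The numeric-claim guardrail can be approximated with a simple textual check: every number in a generated sentence must also appear in the evidence linked to it. The sketch below is a deliberately crude stand-in, assuming plain evidence text rather than the evidence metadata the project actually uses; the function name and regex are hypothetical.

```python
import re

# Matches integers and decimals such as 300, 92.5 or 3,14.
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def unsupported_numbers(sentence: str, evidence_texts: list[str]) -> list[str]:
    """Return numbers found in the sentence that occur in none of the evidence texts."""
    known: set[str] = set()
    for text in evidence_texts:
        known.update(NUMBER.findall(text))
    return [number for number in NUMBER.findall(sentence) if number not in known]
```

For example, `unsupported_numbers("Accuracy rose to 92.5% on 300 samples.", ["We evaluated 300 samples."])` returns `["92.5"]`, flagging the figure that has no backing evidence.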

API highlights

Phase 2B — Drafts

  • POST /api/projects/{project_id}/drafts/generate
  • GET /api/projects/{project_id}/drafts
  • GET /api/projects/{project_id}/draft-sections
  • GET /api/projects/{project_id}/draft-sections/{id}/versions
  • GET /api/projects/{project_id}/draft-versions/{id}
  • GET /api/projects/{project_id}/draft-versions/{id}/provenance
  • POST /api/projects/{project_id}/draft-versions/{id}/approve
  • POST /api/projects/{project_id}/draft-versions/{id}/reject
  • GET /api/projects/{project_id}/draft-jobs

Phase 2C — Review / revision

  • POST /api/projects/{project_id}/draft-versions/{id}/reviews/generate
  • GET /api/projects/{project_id}/review-batches
  • GET /api/projects/{project_id}/review-batches/{id}
  • GET /api/projects/{project_id}/review-batches/{id}/comments
  • POST /api/projects/{project_id}/review-batches/{id}/revision
  • GET /api/projects/{project_id}/revision-actions
  • GET /api/projects/{project_id}/draft-version-comparisons
  • GET /api/projects/{project_id}/draft-sections/{id}/review-timeline
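
The generate, review, revise, approve loop maps onto four of the endpoints listed above. The sketch below only builds the URLs; request and response bodies are not documented here, so none are shown. `BASE_URL` assumes uvicorn's default host and port, and `api_url` and `LOOP` are hypothetical helpers for this example.

```python
from urllib.parse import quote

BASE_URL = "http://127.0.0.1:8000"  # assumption: uvicorn default host and port

# Endpoint templates copied from the lists above, in the order the loop uses them.
LOOP = [
    ("POST", "/api/projects/{project_id}/drafts/generate"),
    ("POST", "/api/projects/{project_id}/draft-versions/{id}/reviews/generate"),
    ("POST", "/api/projects/{project_id}/review-batches/{id}/revision"),
    ("POST", "/api/projects/{project_id}/draft-versions/{id}/approve"),
]


def api_url(template: str, **params: object) -> str:
    """Fill a path template with URL-escaped identifiers and prefix the base URL."""
    safe = {key: quote(str(value), safe="") for key, value in params.items()}
    return BASE_URL + template.format(**safe)
```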

Evidence / outline

  • POST /api/projects · GET /api/projects · PATCH /api/projects/{id}
  • POST /api/projects/{id}/files/upload
  • POST /api/projects/{id}/sources/parse
  • GET /api/projects/{id}/parse-jobs · GET /api/projects/{id}/sources
  • GET /api/projects/{id}/evidence · PATCH /api/projects/{id}/evidence/{id}
  • POST /api/projects/{id}/outline/generate · PUT /api/projects/{id}/outline
  • PATCH /api/projects/{id}/steps/{step_key}
  • GET /api/projects/{id}/timeline

Current scope limits (Phase 2C)

  • Only text generation is active at runtime (no vision, OCR, or chart generation).
  • No automatic multi-section dissertation generation — user drives the loop section by section.
  • No production embedding retrieval flow yet; evidence selection is heuristic and intentionally conservative.
  • No final DOCX assembly in this phase.
  • Draft generation requires an approved outline section and approved evidence items in the same project.
  • Reviewer works only on an existing DraftVersion; reviser works only from an existing ReviewBatch.
  • Invalid reviewer / reviser output is not persisted.

Next phase (2D)

  1. Section-level iterative review loops over revised versions.
  2. Stronger section-to-evidence selection heuristics without violating the evidence-first rule.
  3. Tighter approval aggregation across multi-section draft progress.
  4. Embedding-based retrieval and optional vision/OCR pipelines.
  5. Final DOCX assembly with provenance footnotes.

Repository layout

.
├── frontend/            Next.js app — dashboards, draft/review workspaces
├── backend/             FastAPI app — domain, services, tests, seeds
├── worker/              Celery entry point for background jobs
├── infra/               Dockerfiles and infrastructure assets
├── docs/                Phase-by-phase architecture notes
├── templates/           Document template placeholders
├── docker-compose.yml
├── README.md
├── AGENTS.md            Agent contracts and behavior specs
└── PLAN.md              Multi-phase implementation plan

License

Personal project. All rights reserved by the author unless stated otherwise.

Built by Dauren Moldabayev.
