Create an LLM wiki during ingestion #263

Build a wiki-generation layer on top of the current ChatDKU ingestion pipeline so that data is not only chunked for retrieval, but also compiled into a persistent, human-readable knowledge layer.

The goal is not to replace vector search. The goal is to add a second layer that:

- organizes source content into stable wiki pages,
- accumulates cross-source knowledge over time,
- reduces repeated synthesis at query time,
- makes important entities, concepts, and policies easier to inspect and maintain.

## Why Borrow from `llm-wiki-agent`
Karpathy proposed this idea several weeks ago in https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

Available repo: https://github.com/SamurAIGPT/llm-wiki-agent

The key idea worth borrowing is not the exact implementation, but the **compiled knowledge layer**:

- source files remain immutable,
- ingestion produces structured wiki pages,
- pages are linked and indexed ahead of time,
- contradictions and coverage gaps are surfaced during ingestion instead of only during retrieval.
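To make the four properties above concrete, here is a minimal sketch of what a compiled page and its ingestion-time checks could look like. Every name in it (`SourceRef`, `WikiPage`, `merge_claims`, `build_index`) is hypothetical and is not taken from `llm-wiki-agent` or from ChatDKU. The contradiction check is a plain string comparison standing in for whatever LLM-based comparison the real design would use.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRef:
    """Pointer to an immutable source file; the wiki never rewrites the source."""
    file_path: str
    content_sha256: str


@dataclass
class WikiPage:
    """One compiled page. All field names are illustrative assumptions."""
    slug: str
    title: str
    claims: dict[str, str] = field(default_factory=dict)  # claim key -> statement
    sources: list[SourceRef] = field(default_factory=list)
    links: set[str] = field(default_factory=set)  # slugs of related pages


def source_ref(file_path: str, content: bytes) -> SourceRef:
    """Fingerprint a source so a page can record exactly which version it used."""
    return SourceRef(file_path, hashlib.sha256(content).hexdigest())


def merge_claims(page: WikiPage, new_claims: dict[str, str], src: SourceRef) -> list[str]:
    """Add claims from one source and return the keys that conflict with the page."""
    conflicts = []
    for key, statement in new_claims.items():
        existing = page.claims.get(key)
        if existing is not None and existing != statement:
            # Surfaced during ingestion; the existing claim is not overwritten.
            conflicts.append(key)
        else:
            page.claims[key] = statement
    if src not in page.sources:
        page.sources.append(src)
    return conflicts


def build_index(pages: list[WikiPage]) -> dict[str, list[str]]:
    """Precompute backlinks ahead of time: slug -> slugs of pages linking to it."""
    backlinks: dict[str, list[str]] = {p.slug: [] for p in pages}
    for p in pages:
        for target in p.links:
            if target in backlinks:
                backlinks[target].append(p.slug)
    return {slug: sorted(linkers) for slug, linkers in backlinks.items()}
```

Returning conflicts instead of overwriting keeps the decision with a maintainer or a later review step, which matches the goal of surfacing contradictions at ingestion rather than at retrieval.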

This fits ChatDKU well because the current ingestion pipeline already has:

- stable file-level metadata such as `file_path`, `file_name`, timestamps, and access fields,
- a node-generation stage in `update_data.py`,
- downstream vector stores in Chroma and Postgres.

So the missing piece is a **wiki layer** between raw documents and retrieval.
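As a rough illustration of where such a layer could sit, the sketch below groups already-generated nodes by their `file_path` metadata and compiles one page per source file before anything is written to a vector store. It assumes nodes can be viewed as plain dicts with `text` and `metadata` keys. The actual node type produced in `update_data.py` is not shown here, and `compile_wiki`, `render_page`, and the `summarize` callback are hypothetical names. In practice `summarize` would be an LLM call.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Callable


def group_nodes_by_file(nodes: list[dict]) -> dict[str, list[dict]]:
    """Bucket nodes by the stable file-level key, preserving first-seen order."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for node in nodes:
        grouped[node["metadata"]["file_path"]].append(node)
    return dict(grouped)


def render_page(
    file_path: str,
    nodes: list[dict],
    summarize: Callable[[list[str]], str],
) -> str:
    """Render one markdown page for a source file (the layout is an assumption)."""
    meta = nodes[0]["metadata"]
    # Fall back to the last path component when file_name is missing.
    file_name = meta.get("file_name", PurePosixPath(file_path).name)
    body = summarize([node["text"] for node in nodes])
    return "\n".join([f"# {file_name}", "", f"Source: `{file_path}`", "", body, ""])


def compile_wiki(
    nodes: list[dict],
    summarize: Callable[[list[str]], str],
) -> dict[str, str]:
    """Map each source file path to its compiled wiki page text."""
    return {
        path: render_page(path, group, summarize)
        for path, group in group_nodes_by_file(nodes).items()
    }
```

Keying pages on `file_path` is only the simplest starting point. Pages for entities, concepts, and policies that span several files would need a further merge step on top of this.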

