GitHub - limloop/narrative-analyzer: Universal content mining tool with hierarchical context processing

🇬🇧 English...

Narrative Analyzer is not just a script, but a comprehensive methodology for extracting meaning from sequential textual data. Here are the key principles:

📊 Hierarchical Information Processing

The system operates on the principle of an information pyramid:

Raw data
→ Deep analysis
→ Contextual compression
→ Season summary

This enables the analysis of endless volumes of content while preserving relevant context.

🔄 Self-Sustaining Ecosystem

Each stage of analysis fuels the next:

Analysis of episode N relies on the findings from episodes 1...(N-1)
The more data analyzed — the higher the quality of the analysis becomes
The system "learns" to understand the universe as it processes new data

🌐 Universal Application

The tool works with any sequential textual data:

TV series/anime/movie subtitles
Podcast and lecture transcripts
Book chapters and literary works
Let's plays and gaming streams
Article series or blog posts
Historical chronicles and documents

🎯 Deep Semantic Analysis

Unlike simple summarization, the system identifies:

Plot arcs and narrative structures
Character and relationship evolution
Emotional patterns and tone
Unresolved mysteries and foreshadowing
Connections between disparate events

💡 Conceptual Novelty

The project solves the fundamental problem of limited LLM context through:

Dynamic history compression (working with the model's memory)
Semantic ranking (determining what is truly important to keep)
Insight reuse (analysis produces analysis for analysis)

⚙️ Installation and Setup

1. Installing Dependencies

# Cloning the repository
git clone https://github.com/limloop/narrative-analyzer.git
cd narrative-analyzer

# Installing dependencies
pip install -r requirements.txt

2. Setting Environment Variables

Create a .env file in the root directory:

# Required variables
OPENROUTER_API_KEY=your_openrouter_api_key_here

How to get an API key:

Register on OpenRouter
Go to API settings
Create a new key
Copy it into the .env file

3. Project Configuration

Generate a base config and customize it for your project:

# Generating a configuration template
python analyzer.py --generate-config

# Editing the config for your project
nano config.json  # or use any editor

Key settings in config.json:

{
  "content_series_name": "Your Project Name",
  "input_dir": "clean",
  "output_dir": "data",
  
  "model_settings": {
    "name": "z-ai/glm-4.5-air:free",
    "temperature": 0.5,
    "max_tokens": 4096
  },
  
  "api_settings": {
    "base_url": "https://openrouter.ai/api/v1",
    "max_retries": 5
  },
  
  "content_settings": {
    "content_type": "TV series",
    "content_unit": "episode", 
    "main_character": "protagonist"
  }
}

4. Data Preparation

Place VTT subtitle files in the root directory and run the conversion:

# Converting VTT -> clean text
python converter.py

# Files will be saved to the clean/ folder

You can also place raw txt files directly into the clean folder.

5. Running the Analysis

Without a configuration file, the system will run on a standard template, which is useless for real work.

# Running analysis on all files
python analyzer.py --config config.json

6. Generating Summaries (Optional)

After analysis, you can create general season summaries:

python compact.py

This script also transforms the data into a human-readable format.

🔧 Advanced Configuration

Using Different Models

In config.json, you can specify any model from the OpenRouter catalog or any other API compatible with the openai library:

{
  "model_settings": {
    "name": "google/gemini-pro-1.5",
    "temperature": 0.7
  }
}

Customizing Prompts

For fine-tuning the analysis, modify the prompt templates in config.json:

{
  "analysis_prompt": {
    "system_message": "Your custom system prompt...",
    "user_prompt_template": "Your custom user prompt with {variables}..."
  }
}

Processing Large Volumes

For long series, configure the context parameters:

{
  "context_settings": {
    "full_context_episodes": 5,
    "max_context_length": 40000,
    "enable_context_summary": true
  }
}

🎯 Work Examples

The examples/ folder contains a complete working example of real data processing:

📁 Example Structure:

examples/
├── *.vtt                          # Raw VTT subtitles from YouTube
├── clean/                         # Cleaned text files
│   └── *.txt                      # Text after converter processing
├── data/                          # Analysis results
│   └── *.json                     # Structured analysis in JSON
└── combined_results/              # Final summaries
    ├── all_episodes.jsonl         # All episodes in single file
    ├── season_summary.json        # Season summary report
    ├── full_prompt.txt            # Full analysis prompt
    └── compact_season_prompt.txt  # Compact season prompt

🔄 Complete Processing Pipeline:

Source data: VTT subtitles of "Game of God" series
Conversion: python src/converter.py → clean/*.txt
Analysis: python src/analyzer.py → data/*.json
Summarization: python src/compact.py → combined_results/

🎯 What You Can Explore:

Raw data - original VTT files with timestamps
Cleaned texts - converter output results
Deep analysis - AI-structured insights per episode
Season summaries - unified narrative overview

🚀 How to Reproduce:

# Copy config from example
cp examples/config.json config.json

# Run full pipeline
python src/converter.py
python src/analyzer.py  
python src/compact.py

🚨 Troubleshooting

API Errors

Check the correctness of OPENROUTER_API_KEY in .env
Ensure you have available requests on your account

Encoding Issues

If encoding errors occur, try converting files to UTF-8

Long Episodes

For very long episodes, increasing max_context_length may be required

🇷🇺 Русский...

Narrative Analyzer — это не просто скрипт, а целая методология извлечения смысла из последовательных текстовых данных. Вот ключевые принципы:

📊 Иерархическая обработка информации

Система работает по принципу информационной пирамиды:

Сырые данные
→ Глубокий анализ
→ Контекстное сжатие
→ Сезонная сводка

Это позволяет анализировать бесконечные объемы контента, сохраняя релевантный контекст.

🔄 Самоподдерживающаяся экосистема

Каждый этап анализа становится топливом для следующего:

Анализ эпизода N опирается на выводы эпизодов 1...(N-1)
Чем больше данных проанализировано — тем качественнее становится анализ
Система "учится" понимать вселенную по мере обработки новых данных

🌐 Универсальность применения

Инструмент работает с любыми последовательными текстовыми данными:

Субтитры сериалов/аниме/фильмов
Транскрипты подкастов и лекций
Главы книг и литературные произведения
Летсплеи и игровые стримы
Циклы статей или блог-постов
Исторические хроники и документы

🎯 Глубокий смысловой анализ

В отличие от простой суммаризации, система выявляет:

Сюжетные арки и нарративные структуры
Эволюцию персонажей и отношений
Эмоциональные паттерны и тональность
Неразрешенные загадки и форшедоуинг
Связи между разрозненными событиями

💡 Концептуальная новизна

Проект решает фундаментальную проблему ограниченного контекста LLM через:

Динамическое сжатие истории (работа с памятью модели)
Семантическое ранжирование (что действительно важно сохранить)
Переиспользование insights (анализ производит анализ для анализа)

⚙️ Установка и настройка

1. Установка зависимостей

# Клонирование репозитория
git clone https://github.com/limloop/narrative-analyzer.git
cd narrative-analyzer

# Установка зависимостей
pip install -r requirements.txt

2. Настройка переменных окружения

Создайте файл .env в корневой директории:

# Обязательные переменные
OPENROUTER_API_KEY=your_openrouter_api_key_here

Как получить API ключ:

Зарегистрируйтесь на OpenRouter
Перейдите в настройки API
Создайте новый ключ
Скопируйте его в файл .env

3. Конфигурация проекта

Сгенерируйте базовый конфиг и настройте под ваш проект:

# Генерация шаблона конфигурации
python analyzer.py --generate-config

# Редактирование конфига под ваш проект
nano config.json  # или используйте любой редактор

Ключевые настройки config.json:

{
  "content_series_name": "Название вашего проекта",
  "input_dir": "clean",
  "output_dir": "data",
  
  "model_settings": {
    "name": "z-ai/glm-4.5-air:free",
    "temperature": 0.5,
    "max_tokens": 4096
  },
  
  "api_settings": {
    "base_url": "https://openrouter.ai/api/v1",
    "max_retries": 5
  },
  
  "content_settings": {
    "content_type": "сериале",
    "content_unit": "эпизод", 
    "main_character": "герой"
  }
}

4. Подготовка данных

Поместите VTT файлы субтитров в корневую директорию и запустите конвертацию:

# Конвертация VTT -> чистый текст
python converter.py

# Файлы будут сохранены в папку clean/

Вы так же можете поместить сырые txt файлы сразу в папку clean.

5. Запуск анализа

Без конфигурационного файла система будет работать на стандартном шаблоне, который в реальной работе бесполезен.

# Запуск анализа всех файлов
python analyzer.py --config config.json

6. Генерация сводок (опционально)

После анализа можно создать общие сводки по сезону:

python compact.py

Так же этот скрипт преобразовывает данные в читаемый для человека вид.

🔧 Расширенная настройка

Использование разных моделей

В config.json можно указать любую модель из каталога OpenRouter или любого другого совместимого с библиотекой openai api:

{
  "model_settings": {
    "name": "google/gemini-pro-1.5",
    "temperature": 0.7
  }
}

Кастомизация промптов

Для тонкой настройки анализа измените шаблоны промптов в config.json:

{
  "analysis_prompt": {
    "system_message": "Твой кастомный system prompt...",
    "user_prompt_template": "Твой кастомный user prompt с {переменными}..."
  }
}

Обработка больших объемов

Для длинных сериалов настройте параметры контекста:

{
  "context_settings": {
    "full_context_episodes": 5,
    "max_context_length": 40000,
    "enable_context_summary": true
  }
}

🎯 Примеры работы

В папке examples/ вы найдете полный рабочий пример обработки реальных данных:

📁 Структура примера:

examples/
├── *.vtt                          # Сырые VTT-субтитры из YouTube
├── clean/                         # Очищенные текстовые файлы
│   └── *.txt                      # Текст после обработки конвертером
├── data/                          # Результаты анализа
│   └── *.json                     # Структурированный анализ в JSON
└── combined_results/              # Финальные сводки
    ├── all_episodes.jsonl         # Все эпизоды в одном файле
    ├── season_summary.json        # Итоговая сводка сезона
    ├── full_prompt.txt            # Полный промпт для анализа
    └── compact_season_prompt.txt  # Сжатый промпт сезона

🔄 Полный цикл обработки:

Исходные данные: VTT-субтитры серий "Игра Бога"
Конвертация: python src/converter.py → clean/*.txt
Анализ: python src/analyzer.py → data/*.json
Сводка: python src/compact.py → combined_results/

🎯 Что можно изучить:

Сырые данные - оригинальные VTT файлы с временными метками
Очищенные тексты - результат работы конвертера
Глубокий анализ - структурированные выводы ИИ по каждой серии
Сезонные сводки - объединенная картина всего narrative

🚀 Как повторить обработку:

# Копируем конфиг из примера
cp examples/config.json config.json

# Запускаем полный пайплайн
python src/converter.py
python src/analyzer.py  
python src/compact.py

🚨 Решение проблем

Ошибки API

Проверьте правильность OPENROUTER_API_KEY в .env
Убедитесь, что на счету есть доступные запросы

Проблемы с кодировкой

Если возникают ошибки кодировки, попробуйте конвертировать файлы в UTF-8

Длинные серии

Для очень длинных эпизодов может потребоваться увеличение max_context_length

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
examples		examples
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

📊 Hierarchical Information Processing

🔄 Self-Sustaining Ecosystem

🌐 Universal Application

🎯 Deep Semantic Analysis

💡 Conceptual Novelty

⚙️ Installation and Setup

1. Installing Dependencies

2. Setting Environment Variables

3. Project Configuration

Key settings in config.json:

4. Data Preparation

5. Running the Analysis

6. Generating Summaries (Optional)

🔧 Advanced Configuration

Using Different Models

Customizing Prompts

Processing Large Volumes

🎯 Work Examples

📁 Example Structure:

🔄 Complete Processing Pipeline:

🎯 What You Can Explore:

🚀 How to Reproduce:

🚨 Troubleshooting

API Errors

Encoding Issues

Long Episodes

📊 Иерархическая обработка информации

🔄 Самоподдерживающаяся экосистема

🌐 Универсальность применения

🎯 Глубокий смысловой анализ

💡 Концептуальная новизна

⚙️ Установка и настройка

1. Установка зависимостей

2. Настройка переменных окружения

3. Конфигурация проекта

Ключевые настройки config.json:

4. Подготовка данных

5. Запуск анализа

6. Генерация сводок (опционально)

🔧 Расширенная настройка

Использование разных моделей

Кастомизация промптов

Обработка больших объемов

🎯 Примеры работы

📁 Структура примера:

🔄 Полный цикл обработки:

🎯 Что можно изучить:

🚀 Как повторить обработку:

🚨 Решение проблем

Ошибки API

Проблемы с кодировкой

Длинные серии

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages