# Image Metadata Exploration

This repository contains tooling, documentation, and experiments for clustering and interpreting a personal collection of ~7,000 images enriched with captions, metadata, and precomputed embeddings. The goal is to surface recurring visual motifs and themes that reflect personal visual preferences.

## Objectives

- Organize the dataset into meaningful clusters using both the existing embeddings and any additional representations that improve structure.
- Leverage LLM-assisted sensemaking to label clusters and summarize prevailing concepts.
- Iterate on quantitative evaluations that validate cluster quality and capture trends over time.

## Repository Structure

- `docs/` – project documentation, changelog, research notes, and experiment journal.
- `src/analysis/` – Python modules for preprocessing, clustering, and visualization helpers.
- `requirements.txt` – curated dependency list for the experimentation environment.
- `data/` – generated artifacts (ignored by git), created by the preprocessing/clustering scripts.

Note: `eagle_images_rows.csv` is ignored by git due to its size and sensitivity. Place it at the repository root before running any scripts.

## Environment Setup

1. Create and activate a virtual environment (e.g., `python -m venv .venv && source .venv/bin/activate`).
2. Install dependencies: `pip install -r requirements.txt`.
3. Ensure the CSV export from Eagle (`eagle_images_rows.csv`) lives at the project root.

## Script Usage

All modules live under `src/analysis`. Add that directory to `PYTHONPATH` when executing modules:

```bash
PYTHONPATH=src python -m analysis.preprocess --help
```

The planned workflow is:

1. Run `analysis.preprocess` to parse JSON columns, materialize embeddings as NumPy arrays, and serialize a cleaned metadata table.
2. Experiment with `analysis.cluster` to perform UMAP dimensionality reduction followed by HDBSCAN (or other algorithms) and persist experiment outputs.
3. Summarize results, plots, and LLM interpretations in `docs/cluster-journal.md`.
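The embedding-parsing half of step 1 might look like the following minimal sketch. It is not the actual `analysis.preprocess` implementation; the column name `embedding` and the JSON-array cell format are assumptions about the Eagle export, not its documented schema.

```python
import json

import numpy as np
import pandas as pd


def load_embeddings(csv_path: str, embedding_col: str = "embedding"):
    """Parse a JSON-encoded embedding column into a dense NumPy matrix.

    Assumes each cell holds a JSON array such as "[0.12, -0.34, ...]".
    Rows with missing or malformed embeddings are dropped so the returned
    metadata table and embedding matrix stay row-aligned.
    """
    df = pd.read_csv(csv_path)

    def parse(cell):
        try:
            return np.asarray(json.loads(cell), dtype=np.float32)
        except (TypeError, ValueError):
            return None  # NaN cells or malformed JSON

    df["_vec"] = df[embedding_col].map(parse)
    keep = df["_vec"].map(lambda v: v is not None)
    df = df[keep].reset_index(drop=True)
    embeddings = np.stack(df.pop("_vec").to_list())
    return df, embeddings
```

The returned `(df, embeddings)` pair keeps a one-to-one row correspondence, which makes it straightforward to join cluster labels back onto captions and metadata after clustering.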

## License

Personal research project – no license specified yet.
