A Survey and Computational Atlas of Personality Models

, complete walkthrough from history to validation results

44 personality models: standardized datasets, dual-resolution embeddings (1536 + 3072-dim), trained classifiers (RF, SVC, LR, kNN), and verified model cards, ready to use in five minutes.

Raetano, J., Gregor, J., & Tamang, S. (2026). A Survey and Computational Atlas of Personality Models. ACM Transactions on Intelligent Systems and Technology (TIST). Under review.

Abstract

Personality psychology has spent a century characterizing individual differences, yet computational approaches to personality are dominated by a single model: the Big Five (OCEAN). Numerous other validated frameworks exist, spanning clinical diagnosis, narcissism, motivation, cognition, conflict resolution, and applied assessment, but without a shared computational format they remain fragmented across disciplinary silos. This paper surveys 44 personality models from seven research traditions and encodes every construct as a factor chain, a structured descriptor set that disambiguates each construct through its defining adjectives, synonyms, and behavioral terms. The result is a unified computational atlas of 358 factors, each paired with a neural embedding and a trained classifier, searchable across all surveyed models simultaneously. Three layers of evaluation confirm the encoding: classifiers tested against human-authored items from 22 published psychometric instruments, an independent LLM judge panel, and replication across multiple generators, achieving 86.8% accuracy on human-authored items. Embedding-space analysis reveals that the seven disciplinary categories do not form coherent clusters in semantic space; data-driven clustering instead uncovers 15 natural groupings that cut across the predefined categories, approximate established diagnostic boundaries, and provide large-scale evidence that personality science has been studying overlapping constructs under different names for decades, though shared vocabulary alone is not sufficient evidence for construct equivalence. Human personality is too vast for any one framework to capture; each tradition maps territory the others cannot see. The full atlas, trained classifiers, and reproducibility notebooks are openly released.

What the Abstract Is Saying

The problem. Computational approaches to personality are dominated by a single model: the Big Five (OCEAN). Numerous other validated frameworks exist across psychology, but without a shared format they stay siloed.

What we did. We encoded all 44 models into a single machine-readable format called a factor chain, a structured descriptor set (adjective, synonym, verb, noun) that disambiguates every trait. The result is 6,694 factor chains across 358 factors. Each chain gets a neural embedding and a trained classifier that can identify which personality construct a new sentence describes.

Does it work? Three experiments say yes. Classifiers tested against human-authored items from 22 published psychometric instruments, cross-checked by an independent LLM judge panel, and validated across multiple generators. Four classifier types (RF, SVC, LR, kNN) are compared: kNN achieves 83.5% mean accuracy on human items, with a best-per-model oracle ceiling of 86.8%. Every model scores above chance.

What we discovered. The seven disciplinary categories that organize the survey (Clinical, Motivational, etc.) do not form coherent clusters in the embedding space; they are organizational labels, not semantic boundaries. Data-driven clustering instead uncovers 15 natural groupings that cut across traditions, approximate established diagnostic boundaries, and provide large-scale evidence that personality science has been studying overlapping constructs under different names for decades, though shared vocabulary alone is not sufficient evidence for construct equivalence. OCEAN, despite its dominance in AI, ranks only third in per-model information contribution.

Everything is open. All data, classifiers, and code are in this repository.

For definitions of every technical term, see the Glossary of Terms.

The Semantic Personality Index (SPI)

Data-driven clustering of all 6,694 trait embeddings reveals that the seven disciplinary categories do not reflect how personality constructs actually organize in semantic space. The Semantic Personality Index (SPI) identifies 15 natural clusters with silhouette 0.098, modest but significantly above the random-partition null (p < 0.001) and far above the disciplinary categories (silhouette 0.0002). This remains below the 0.25 threshold for substantial cluster structure (Kaufman & Rousseeuw, 1990), so the SPI should be interpreted as a soft semantic map rather than a definitive taxonomy. Clinical psychology fragments into four semantic regions. Narcissism fragments into three. Meanwhile, four clusters (Dominance, Activation, Withdrawal, Self-Direction) span all seven categories, while Warmth spans six of seven. A permutation null model confirms this is not a vocabulary artifact: under random category assignment, 14.4 of 15 clusters span all seven categories by chance; the actual count of 4 is far below the null (p < 0.001), indicating genuine category specificity. The atlas is not merely a catalog of 44 models, it is a map that reveals personality science has been studying overlapping constructs under different names for decades.

Pipeline

flowchart LR
    A["<b>44 Datasets</b><br/>6,694 traits<br/>5-column schema"] --> B["<b>Embeddings</b><br/>text-embedding-3-small<br/>1,536 dimensions"]
    B --> C["<b>Classifiers</b><br/>RF · SVC · LR · kNN<br/>per model"]
    C --> D["<b>FAISS Index</b><br/>6,694 vectors<br/>cross-model search"]

    style A fill:#4C72B0,stroke:#fff,color:#fff
    style B fill:#55A868,stroke:#fff,color:#fff
    style C fill:#C44E52,stroke:#fff,color:#fff
    style D fill:#8172B2,stroke:#fff,color:#fff

Psychometric literature → standardized lexical datasets → semantic embeddings → trained classifiers → searchable cross-model index. Every stage ships in this repository.

Quick Start

Each of the 44 models has a Jupyter notebook you can open and run immediately.

atlas/01_trait_based/ocean/          ← open OCEAN-Updated.ipynb
atlas/02_narcissism_based/dtm/       ← open DTM.ipynb
atlas/05_clinical_health/mmpi/       ← open MMPI.ipynb
...any of 44 folders

In 10 lines:

import pandas as pd
import joblib

# 1. Load any model's dataset and trained classifier
df = pd.read_csv("datasets/ocean.csv")
model = joblib.load("models/ocean_knn_model.pkl")   # or _svc_, _lr_, _rf_
encoder = joblib.load("models/ocean_label_encoder.pkl")

# 2. Load pre-computed embeddings (1536-dim, text-embedding-3-small)
embeddings = pd.read_csv("Embeddings/ocean_embeddings.csv")
X = embeddings.iloc[:, 1:].values  # drop index column

# 3. Predict personality factors
predictions = model.predict(X)
labels = encoder.inverse_transform(predictions)
print(f"Predicted factors: {set(labels)}")

Or run the standalone demo, three pre-baked examples work without any API key:

git clone https://github.com/Wildertrek/survey.git
cd survey
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

python demo.py                                          # list examples
python demo.py "tends to worry about the future"        # -> GAD-7, OCEAN Neuroticism
python demo.py "manipulates others for personal gain"   # -> Narcissism across 5 models
python demo.py "enjoys leading group discussions"        # -> Extraversion across 4 models

The demo searches all 6,694 traits across 44 models and shows which categories, models, and factors match. Add --model ocean to also classify through a specific model's trained classifiers (RF, SVC, LR, kNN).

To search your own queries, set an OpenAI API key (costs < $0.001 per query):

export OPENAI_API_KEY=sk-...
python demo.py "avoids conflict at all costs" --model tki --top 10

Get an API key at: platform.openai.com/api-keys

Dependencies: pandas, scikit-learn, joblib, numpy. For custom queries and notebooks: add openai, python-dotenv.

What's in the Atlas

Asset	Count	Format	Size
Lexical datasets	44	CSV (Factor, Adjective, Synonym, Verb, Noun)	532 KB
Embeddings (1536-dim)	44	CSV (OpenAI text-embedding-3-small)	220 MB
Embeddings (3072-dim)	44	CSV (OpenAI text-embedding-3-large)	440 MB*
Classifiers (1536-dim)	176	scikit-learn pickle (RF, SVC, LR, kNN x 44)	59 MB
RF classifiers (3072-dim)	44	scikit-learn pickle	107 MB*
Label encoders	44	scikit-learn pickle	29 KB
Model graphs	44	PNG + JSON (factor diagrams + machine-readable graph data)	164 MB
Psychometric test items	22	JSON (418 items from published psychometric instruments)	92 KB
Opus test items	45	JSON (5,369 items generated by Claude Opus)	1.9 MB
Model cards	44	Markdown (verified from peer-reviewed appendix)	—
Starter notebooks	44	Jupyter (.ipynb)	—
Total trait rows	6,694	across all 44 models

*3072-dim embeddings and models are hosted separately due to size. See Experiment 2 assets below.

Repository Structure

survey/
├── README.md
├── HISTORY.md                      (how the atlas was built, origin story, timeline)
├── LICENSE                         (AGPL-3.0)
├── CITATION.cff                    (CFF citation metadata)
├── requirements.txt                (pip dependencies)
├── demo.py                         (cross-model trait search demo)
├── examples.json                   (pre-baked demo embeddings, no API key needed)
│
├── atlas/                          ★ Start here: 44 model cards + notebooks
│   ├── README.md                   (master index, all 44 models)
│   ├── references.bib              (328 citations across all cards)
│   │
│   ├── 01_trait_based/             (6 models)
│   │   ├── ocean/                  OCEAN (Big Five)
│   │   ├── mbti/                   Myers-Briggs Type Indicator
│   │   ├── hexaco/                 HEXACO Personality Model
│   │   ├── epm/                    Eysenck Personality Model
│   │   ├── 16pf/                   Sixteen Personality Factors
│   │   └── ftm/                    Four Temperaments
│   │
│   ├── 02_narcissism_based/        (10 models)
│   │   ├── npi/                    Narcissistic Personality Inventory
│   │   ├── pni/                    Pathological Narcissism Inventory
│   │   ├── ffni/                   Five-Factor Narcissism Inventory
│   │   ├── ffni_sf/                FFNI Short Form
│   │   ├── narq/                   Narcissistic Admiration & Rivalry
│   │   ├── hsns/                   Hypersensitive Narcissism Scale
│   │   ├── dtm/                    Dark Triad
│   │   ├── dt4/                    Dark Tetrad
│   │   ├── mcmin/                  MCMI-IV Narcissistic Scales
│   │   └── ipn/                    Inventory of Pathological Narcissism
│   │
│   ├── 03_motivational_value/      (6 models)
│   │   ├── stbv/                   Schwartz Theory of Basic Values
│   │   ├── mst/                    Motivational Systems Theory
│   │   ├── rft/                    Regulatory Focus Theory
│   │   ├── sdt/                    Self-Determination Theory
│   │   ├── aam/                    Approach/Avoidance Motivation
│   │   └── clifton/                Clifton Strengths
│   │
│   ├── 04_cognitive_learning/      (4 models)
│   │   ├── pct/                    Personal Construct Theory
│   │   ├── scm/                    Social-Cognitive Model
│   │   ├── cest/                   Cognitive-Experiential Self-Theory
│   │   └── fsls/                   Felder-Silverman Learning Styles
│   │
│   ├── 05_clinical_health/         (10 models)
│   │   ├── mmpi/                   Minnesota Multiphasic Personality Inventory
│   │   ├── tci/                    Temperament and Character Inventory
│   │   ├── tmp/                    Triarchic Model of Psychopathy
│   │   ├── bdi/                    Beck Depression Inventory
│   │   ├── gad7/                   Generalized Anxiety Disorder 7
│   │   ├── scid/                   Structured Clinical Interview for DSM
│   │   ├── mcmi/                   Millon Clinical Multiaxial Inventory
│   │   ├── rit/                    Rorschach Inkblot Test
│   │   ├── tat/                    Thematic Apperception Test
│   │   └── wais/                   Wechsler Adult Intelligence Scale
│   │
│   ├── 06_interpersonal_conflict/  (2 models)
│   │   ├── tki/                    Thomas-Kilmann Conflict Mode Instrument
│   │   └── disc/                   DiSC Workplace Profile
│   │
│   └── 07_application_holistic/    (6 models)
│       ├── riasec/                 Holland's Theory of Career Choice
│       ├── bt/                     Bartle Types
│       ├── tei/                    Theories of Emotional Intelligence
│       ├── em/                     Enneagram Model
│       ├── papc/                   Parametric Analysis of Person Characteristics
│       └── cmoa/                   Circumplex Model of Affect
│
│   Each model folder contains:
│   ├── MODEL_CARD.md               (verified model card)
│   ├── <MODEL>.ipynb               (starter notebook)
│   └── <model>_small.png           (factor diagram thumbnail)
│
├── datasets/                       44 CSV files (lexical schemas)
├── Embeddings/                     44 embedding CSVs (1536-dim, text-embedding-3-small)
├── models/                         176 classifiers (44 x RF/SVC/LR/kNN) + 44 label encoders (1536-dim)
├── graphs/                         44 high-res model diagrams + 44 JSON graph files
│
├── validate.py                     Standalone validation script
├── test_items/                     44 held-out test item files (5,052 GPT-4o items)
├── test_items_opus/                45 test item files (5,369 Claude Opus items)
├── human_items/                    22 published psychometric instrument items (418 items, "psychometric items")
│
├── notebooks/                      Cross-model analysis scripts + Colab notebooks
│   ├── atlas_quick_start.ipynb     Quick Start: load, predict, PCA, FAISS, 3 experiments
│   ├── atlas_classifier_comparison.ipynb  NEW: 4-way classifier comparison (RF/LR/SVC/kNN)
│   ├── atlas_deep_dive.ipynb       Deep Dive: per-model analysis
│   ├── atlas_embedding_projector.ipynb   PCA/t-SNE/UMAP + SPI clustering
│   └── 01_cross_model_pca_analysis.py  (supports --embedding-dim 1536|3072)
│
├── results/                        PCA + validation results
│   ├── README.md                   (methodology + full results)
│   ├── pca_summary_report.json
│   ├── pca_scree_plot.png
│   ├── pca_2d_all_models.png
│   ├── pca_model_centroids_2d.png
│   ├── pca_factor_loadings_heatmap.png
│   ├── pca_variance_explained.csv
│   ├── pca_top_factors_by_variance.csv
│   ├── pca_model_overlap_matrix.csv
│   ├── pca_3072/                   3072-dim PCA results (same structure)
│   └── validation/                 Per-model validation results
│       ├── individual_model_results.csv
│       ├── experiment2_comparison.csv
│       ├── category_results.csv
│       ├── baselines.json
│       ├── factor_complexity.csv
│       ├── atlas_summary.json
│       └── latency.json

The 44 Models

I. Trait-Based Models (6)

#	Model	Full Name	Traits	Rows
1	OCEAN	Big Five Factor Model	O, C, E, A, N	120
2	MBTI	Myers-Briggs Type Indicator	EI, SN, TF, JP	144
3	HEXACO	HEXACO Personality Model	H, E, X, A, C, O	66
4	EPM	Eysenck Personality Model	P, E, N	90
5	16PF	Sixteen Personality Factors	16 primary factors	192
6	FT	Four Temperaments	Sanguine, Choleric, Melancholic, Phlegmatic	80

II. Narcissism-Based Models (10)

#	Model	Full Name	Rows
7	NPI	Narcissistic Personality Inventory	168
8	PNI	Pathological Narcissism Inventory	168
9	FFNI	Five-Factor Narcissism Inventory	360
10	FFNI-SF	FFNI Short Form	60
11	NARQ	Narcissistic Admiration & Rivalry	48
12	HSNS	Hypersensitive Narcissism Scale	48
13	DT3	Dark Triad	54
14	DT4	Dark Tetrad	96
15	MCMI-Narc	MCMI-IV Narcissistic Scales	72
16	IPN	Inventory of Pathological Narcissism	80

III. Motivational and Value Models (6)

#	Model	Full Name	Rows
17	STBV	Schwartz Theory of Basic Values	128
18	MST	Motivational Systems Theory	64
19	RFT	Regulatory Focus Theory	56
20	SDT	Self-Determination Theory	84
21	AAM	Approach/Avoidance Motivation	40
22	CS	Clifton Strengths	136

IV. Cognitive and Learning Models (4)

#	Model	Full Name	Rows
23	PCT	Personal Construct Theory	899
24	SCM	Social-Cognitive Model	180
25	CEST	Cognitive-Experiential Self-Theory	72
26	FSLS	Felder-Silverman Learning Styles	360

V. Clinical and Psychological Health Models (10)

#	Model	Full Name	Rows
27	MMPI	Minnesota Multiphasic Personality Inventory	360
28	TCI	Temperament and Character Inventory	252
29	TMP	Triarchic Model of Psychopathy	108
30	BDI	Beck Depression Inventory	756
31	GAD-7	Generalized Anxiety Disorder 7	252
32	SCID	Structured Clinical Interview for DSM	401
33	MCMI	Millon Clinical Multiaxial Inventory	84
34	RIT	Rorschach Inkblot Test	45
35	TAT	Thematic Apperception Test	39
36	WAIS	Wechsler Adult Intelligence Scale	60

VI. Interpersonal and Conflict Resolution Models (2)

#	Model	Full Name	Rows
37	TKI	Thomas-Kilmann Conflict Mode Instrument	20
38	DiSC	DiSC Workplace Profile	31

VII. Application-Specific and Holistic Models (6)

#	Model	Full Name	Rows
39	RIASEC	Holland's Theory of Career Choice	120
40	BT	Bartle Types	64
41	TEI	Theories of Emotional Intelligence	48
42	EM	Enneagram Model	81
43	PAPC	Parametric Analysis of Person Characteristics	72
44	CMOA	Circumplex Model of Affect	36

Cross-Model PCA Analysis

Principal Component Analysis across all 44 models (6,694 trait rows, 1536-dim embeddings):

Metric	Value
PC1 variance explained	4.5%
Top-5 components	15.5% cumulative
Top-10 components	25.2% cumulative
Most similar pair	IPN / PNI (cosine = 0.959)*
Least similar pair	TKI / WAIS (cosine = 0.268)
Highest-variance category	Narcissism (350.2)

*IPN (Inventory of Pathological Narcissism) and PNI (Pathological Narcissism Inventory) are the same instrument (Pincus et al., 2009) with its name abbreviated in two different orderings. Both are deliberately included in the atlas as a built-in jangle detection validation: the embedding pipeline correctly identifies them as near-identical (cosine 0.959) without any manual annotation, demonstrating that the atlas can surface cases where different labels mask the same construct. See Jangle Fallacy in the glossary.

See results/README.md for full PCA outputs: factor loadings heatmap, model overlap matrix, similarity rankings, and summary report.

Reproduce:

python notebooks/01_cross_model_pca_analysis.py

Empirical Validation

Three experiments and 14 research questions, including the Semantic Personality Index (SPI) discovery, plus independent PCA and knowledge graph validations, all across 44 models.

Research questions at a glance

RQ	Question	Answer
	Experiment 1: Baseline
1	Can the classifiers distinguish factors above chance?	58.6% mean accuracy on 5,038 novel items. All 44 models beat random (+35.7% lift); 41/44 beat frequency baseline.
2	How does factor count affect accuracy?	r = -0.67 (p < .001). Models with 2-5 factors average 67.6%; 20+ factors average 30.2%.
3	Do independent LLM judges agree on item validity?	Kappa = 0.99 across a triple-judge panel (GPT-5.2, Gemini 3 Pro, Claude Opus 4.6). 95.7% agreement with expected factors.
4	Where do RF classifiers fail relative to judges?	66.7% RF-judge agreement. Judges correct 95.7% of the time. The gap is a classifier problem, not construct ambiguity.
5	Do related constructs from different models converge?	Yes. 1,536-dim index retrieves hits from 7.0 independent models per query (16% of the atlas). Convergence improves further in Experiment 2.
6	Are there category-level performance differences?	Motivational 74.5% > Narcissism 68.3% > Trait-Based 64.0% > Cognitive 51.8% > App/Holistic 50.9% > Clinical 50.6% > Interpersonal 23.7%.
	Experiment 2: Improvement
7	Does upgrading to 3,072-dim embeddings help?	+5.1pp mean across all 44 models. 28 improved, 13 decreased slightly.
8	Does data augmentation help sparse models?	+25.9pp on 14 targeted models (all below 50%). All 14 improved.
9	Does hierarchical classification help high-factor models?	+4.8pp on 8 targeted models (15+ factors). Modest gain; 1/8 won in ablation.
	Experiment 3: External validation
10	Do results hold across different LLM generators?	GPT-4o 58.7% vs Claude Opus 55.5% (delta 3.3pp, p = .041, d = 0.17). Consistent.
11	Can classifiers handle human-authored psychometric items?	83.5% kNN on 418 items from 20 models (oracle ceiling 86.8%). RF baseline 69.8%.
12	Do human items retrieve related content across models?	100% model and category hit rate in top-20 retrieval. Mean 8.2 models per query.
	Semantic Personality Index (SPI)
13	Do the 7 theoretical categories correspond to semantic boundaries?	No. Silhouette 0.0002. Data-driven k=15: silhouette 0.098, significantly above null (p < 0.001). ARI = 0.17.
14	What structure does the embedding space reveal?	15 SPI clusters. Clinical fragments into 4, Narcissism into 3. Interpersonal Circumplex confirmed (Warmth + Dominance span all 7 categories).
	Supplementary
PCA	Is the embedding space redundant?	No. 50 components capture only 56.9% variance. No scree elbow.
KG	Does graph structure agree with embedding similarity?	Yes. Mantel r = 0.66 (p < .001). Graph density predicts classification difficulty.
DSM-5	Does the embedding space preserve clinical construct structure?	98.2% of 222 DSM-5 disorders route to the Clinical atlas category. SCID is the top retrieval for 20/21 DSM-5 categories.

Experiment 1: Baseline (RQ1-6)

5,052 novel test items, generated by GPT-4o, classified through each model's four classifiers (RF, SVC, LR, kNN). None of these items were seen during training. RF is reported as the primary baseline; SVC and kNN consistently outperform it on novel items.

Metric	Value
Test items	5,052 generated (5,038 valid), 358 factors (316 unique)
Mean RF accuracy	58.6% (median 61.1%)
Mean lift over random	+35.7% (all 44 models above chance)
Models beating frequency baseline	41/44 (93%)
Inter-judge agreement (kappa)	0.99
Factor-count vs. accuracy	r = -0.67, p < 0.001
Hardware	Commodity CPU, no GPU required

By category: Motivational (74.5%) > Narcissism (68.3%) > Trait-Based (64.0%) > Cognitive (51.8%) > App/Holistic (50.9%) > Clinical (50.6%) > Interpersonal (23.7%)

The 37-point gap between RF accuracy (58.6%) and judge accuracy (95.7%) is the improvement target for Experiment 2.

Experiment 2: Improvement cycle (RQ7-9)

One intervention per bottleneck, all evaluated on the same 5,038 test items:

Intervention	Scope	Mean Acc	Delta
Exp1 baseline (1536-dim)	44 models	58.7%	—
RQ7: 3072-dim embeddings	44 models	63.8%	+5.1pp
RQ8: Data augmentation	14 targeted	61.1%	+25.9pp*
RQ9: Hierarchical classifiers	8 targeted	39.9%	+4.8pp*
Combined best-per-model	44 models	71.5%	+12.9pp

*Delta computed against baseline for the same targeted subset.

Top 5 improvers: WAIS +42.1pp, EM +40.6pp, DISC +37.4pp, TKI +36.5pp, TEI +32.7pp

After improvement (RF): 22/44 models above 70% (was 13), 0 models below 30% (was 3). With multi-classifier selection: 33/44 above 70%.

The bottlenecks were data sparsity and embedding capacity, not ambiguity in the personality constructs. No specialized hardware or new architectures were needed.

3072-dim assets: The upgraded embeddings (440 MB) and retrained classifiers (107 MB) are on Hugging Face Hub. The 1536-dim assets in this repository remain the default.

Per-model results are in each model's MODEL_CARD.md and in results/validation/. See results/README.md for the full breakdown.

Experiment 3: External validation (RQ10-12)

The first two experiments used LLM-generated items. Experiment 3 tests whether the classifiers generalize to items written by human psychometricians.

Metric	Value
Published instruments	22 (BFI-44, HEXACO-60, GAD-7, Short Dark Triad, NARQ, etc.)
Human-authored items	418 across 20 models
RF accuracy (baseline)	69.8%
kNN accuracy	83.5%
SVC accuracy	81.9%
LR accuracy	82.9%
Best-per-model oracle	86.8% (best classifier chosen per model)
Reliability tiers (best clf)	33 reliable (>70%) / 9 usable (50-70%) / 2 research-only (<50%)
Reverse-scored error (RF vs kNN)	RF 43.0% error, kNN 11.1% error
Multi-generator consistency (RQ10)	GPT-4o 58.7% vs Opus 55.5%, d = 0.17
Convergent validity (RQ12)	100% model/category hit rate, 8.2 models per query
Classifier comparison (Friedman)	chi2 = 22.4, p < 0.001 (significant across 4 classifiers)

Human items score higher because published psychometric items are designed to load cleanly on single factors, so less ambiguity means better classification. The multi-classifier comparison reveals that RF is the weakest of the four classifiers, particularly on reverse-scored items (43.0% error vs 11.1% for kNN). Linear and instance-based methods generalize better to unseen human-authored content.

Human-authored items are in human_items/ (22 JSON files). Second-generator items are in test_items_opus/ (5,369 Claude Opus items).

Cross-model convergent validity (FAISS)

FAISS (Facebook AI Similarity Search; Douze et al., 2024; Johnson et al., 2021) is used for fast nearest-neighbor search over dense trait embeddings. Three FAISS index variants queried with 10 standardized constructs (anxiety, extraversion, narcissism, depression, etc.):

Index	Dim	Vectors	Models/query	Categories/query
A: Original	1,536	6,694	7.0	3.4
B: 3072-dim	3,072	6,694	8.6	3.9
C: 3072 + augmented	3,072	8,419	9.4	4.5

The 3,072-dim space retrieves 23% more independent models per query. Adding augmented data pushes diversity to 9.4 models across 4.5 categories, because previously underrepresented models (DISC, TKI, EM) now have enough vectors to appear in results.

Reproduce the Validation

All test items and classifiers are included in this repository. To reproduce the reported accuracy figures:

# Install dependencies
pip install -r requirements.txt

# Set your OpenAI API key (needed to embed test items; ~$0.05 for all 44 models)
export OPENAI_API_KEY=sk-...

# Validate a single model
python validate.py --model ocean

# Validate all 44 models (1536-dim, ~5 min)
python validate.py --all

# Validate with 3072-dim embeddings (requires models_3072/ directory)
python validate.py --all --dim 3072

# Re-run without re-embedding (uses cached embeddings)
python validate.py --all --no-embed

Results are saved to validation_results_1536.csv (or _3072.csv). The test_items/ directory contains 44 JSON files with 5,052 held-out items generated by GPT-4o. The test_items_opus/ directory contains 45 JSON files with 5,369 items generated by Claude Opus (used in Experiment 3, RQ10). The human_items/ directory contains 418 items from 22 published psychometric instruments (used in Experiment 3, RQ11–12).

Reproduce the PCA

# 1536-dim PCA (default, uses Embeddings/ directory)
python notebooks/01_cross_model_pca_analysis.py

# 3072-dim PCA (uses Embeddings_3072/ directory)
python notebooks/01_cross_model_pca_analysis.py --embedding-dim 3072 --output-dir results/pca_3072

Train All Classifiers

The repository ships 176 pre-trained classifiers (44 models x 4 classifier types). To retrain from scratch or reproduce:

# Train RF, SVC, LR, kNN for all 44 models and evaluate on test + human items
python scripts/train_and_evaluate_all_classifiers.py

This script:

Loads training embeddings from Embeddings/{model}_embeddings.csv
Trains four classifiers per model with the following hyperparameters:
- RF: RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
- LR: LogisticRegression(max_iter=1000, random_state=42)
- SVC: LinearSVC(max_iter=2000, random_state=42, dual="auto")
- kNN: KNeighborsClassifier(n_neighbors=5)
Saves classifiers to models/{model}_{clf}_model.pkl (e.g., ocean_svc_model.pkl)
Evaluates all classifiers on cached test items (.validation_cache/) and human items (human_items/)
Computes reliability tiers, reverse-scored error rates, and Friedman test statistics
Writes results to results/reviewer_experiments/multi_classifier_evaluation.json

File naming convention:

models/
├── ocean_rf_model.pkl          # Random Forest
├── ocean_svc_model.pkl         # Support Vector (LinearSVC)
├── ocean_lr_model.pkl          # Logistic Regression
├── ocean_knn_model.pkl         # k-Nearest Neighbors (k=5)
├── ocean_label_encoder.pkl     # Shared label encoder (used by all 4)
└── ...                         # Same pattern for all 44 models

All classifiers share the same label encoder per model. The label encoder maps factor names to integer indices and is created during training from the training data labels.

Dependencies: scikit-learn, numpy, joblib. No GPU or API key required. Training all 176 classifiers takes approximately 2 minutes on commodity hardware.

Computational Benchmarks

Four classifier types (RF, SVC, LR, kNN) are provided for 1536-dim embeddings (176 classifiers total). RF classifiers are also available at 3072-dim.

Measurement	1536-dim (4 classifiers)	3072-dim (RF only)
Embedding model	text-embedding-3-small	text-embedding-3-large
Total classifiers	176 (44 x RF/SVC/LR/kNN)	44 RF
Total model size	59 MB	107 MB
Best mean accuracy (test items)	74.7% (SVC)	63.8% (RF)
Best mean accuracy (human items)	83.5% (kNN, 20 models)	N/A
Best-per-model oracle (human)	86.8%	N/A
Models above 70% (best clf)	33	19
Total label encoders	29 KB	29 KB
Batch inference (50 chars x 1 model)	5.3 ms	~8 ms
Embedding cost (all 44 models)	$0.27	$0.54
ONNX export	Supported	Supported
Platform	Commodity CPU (no GPU required)	Same

SVC and kNN are recommended over RF for deployment. RF remains included for backward compatibility and because it wins on 4 models. The entire atlas runs on commodity hardware. Individual models are small enough for browser deployment via ONNX.js or mobile via Core ML / TFLite.

Downstream Validation

Preliminary downstream testing applies the atlas to personality inference across 28 novels and 535 literary characters (181 matched against scholarly ground truth), demonstrating the atlas as working infrastructure:

Result	Value	Significance
Personality inference accuracy	MAE = 0.298 [CI: 0.274, 0.324]	181 characters vs. scholarly GT, 28 books, r = 0.635
Author fingerprinting	85.7% (12/14 correct)	Cohen's d = 1.96. Personality vectors only, no text, no metadata.
Knowledge distillation	$0.001 per character	94% accuracy preserved vs. $1.53 full pipeline. 75K Gutenberg titles feasible.

These results use the atlas classifiers and embeddings from this repository applied to literary character profiling, a task the atlas was not specifically designed for.

Applied Use: AGI Cognitive Benchmarking

The 15-cluster Semantic Personality Atlas (SPA) can be instantiated as an LLM reasoning prior — a compact personality state that primes a language model toward a specific cognitive stance. The APERTURE submission to Kaggle's Measuring Progress Toward AGI competition (2026) demonstrates this transfer, instantiating one SPA cluster (the "Systematic Explorer") as a stance prepended to underspecified cognitive tasks across Executive Function, Metacognition, and Social Cognition tracks. Results announced June 1, 2026.

Published Kaggle Benchmark Task	Track
APERTURE-Clarification-Seeking	Metacognition
APERTURE-Executive-Inhibition	Executive Function
APERTURE-Risk-Authority	Social Cognition

Expert Evaluation: Open Invitation

We invite clinical psychologists, personality researchers, and psychometricians to independently evaluate the atlas using our complete evaluation toolkit. No API keys, no coding, no cost. A domain expert can complete all three tasks in 45-60 minutes.

Task	What you do	Items	Time
Item Classification	Assign 50 personality items to their correct factor	50 items across 20 models, all 7 categories	~25 min
Construct Review	Rate whether factor chain entries capture the intended construct	67 factor entries	~15 min
Taxonomy Review	Assess whether the 7 category groupings are appropriate	7 categories	~10 min

To participate:

git clone https://github.com/Wildertrek/survey.git
cd survey/expert_evaluation
pip install streamlit && streamlit run evaluator.py

A browser window opens automatically. When finished, click Submit to download your results and send them via a pre-addressed email. Results will be incorporated with full acknowledgment in published revisions.

See expert_evaluation/README.md for full instructions and system requirements.

Lexical Schema

Every dataset follows a standardized five-column lexical schema called a factor chain, a five-field descriptor that connects a theoretical construct to the words that measure it. Every model, regardless of its theoretical origins, ultimately describes factors, and every factor is defined by trait vocabulary. The factor-chain schema normalizes that relationship so that any model can be embedded, classified, and compared alongside any other.

Column	Description	Example (OCEAN)
`Factor`	Personality dimension	Extraversion
`Adjective`	Trait descriptor	Active
`Synonym`	Near-equivalent	Energetic
`Verb`	Behavioral form	Activate
`Noun`	Nominal quality	Activeness

This schema enables consistent embedding generation, cross-model comparison, and integration into LLM-based personality inference pipelines.

Schema Variations

38 of 44 models use the standard five-column schema above. Six models extend the schema with additional hierarchy or domain columns:

Model	Columns	Notes
WAIS	`Factor, Adjective, Description, Synonym, Verb, Noun, Embedding`	Extra `Description` column
DISC	`Domain, Subcategory, Factor, Adjective, Synonym, Verb, Noun`	Two hierarchy columns prepended
EM	`Type, Name, Factor, Adjective, Synonym, Verb, Noun, Adjacencies`	Enneagram type/name + adjacency list
TAT	`Category, Factor, Adjective, Synonym, Verb, Noun`	One hierarchy column prepended
TEI	`Domain, Factor, Adjective, Synonym, Verb, Noun`	One hierarchy column prepended
TKI	`Category, Factor, Adjective, Synonym, Verb, Noun`	One hierarchy column prepended

All models share the core Factor, Adjective, Synonym, Verb, Noun columns used for embedding generation. The five-column schema is the minimum; extensions provide additional categorical or relational structure.

Model Cards

Each model has a verified MODEL_CARD.md containing:

Description: theoretical basis and origin
Dimensions: factors, facets, and brain-function mappings
Applications: use cases in psychology, AI, and human-AI interaction
Timeline: historical milestones
Psychometrics: reliability, validity, instruments
Data Structure: schema fields and example entries
Resources: links to dataset, embeddings, classifier, and graph in this repo
Validation Results: classifier accuracy (RF, SVC, LR, kNN), baselines, LLM judge evaluation, category context

All cards were verified by the authors and updated with empirical validation results from the companion experiment (5,052 test items, triple-judge LLM panel).

Starter Notebooks

Each model folder in atlas/ includes a Jupyter notebook that demonstrates:

Loading the model's lexical dataset
Generating or loading embeddings
Training a classifier (RF shown in notebooks; SVC, LR, kNN also available as pre-trained .pkl files)
Evaluating classification accuracy
Visualizing embedding clusters (PCA)

These notebooks were developed in the Personality-Trait-Models research repository and represent the exact workflow used to produce the atlas artifacts. For multi-classifier training, see Train All Classifiers.

Colab Notebooks

Cross-model analysis notebooks in notebooks/, runnable directly in Google Colab (no API keys needed):

Notebook	Description
Quick Start	Load any model, PCA, FAISS search, 3 validation experiments, DSM-5 alignment
Classifier Comparison	Four-way comparison (RF/LR/SVC/kNN), Friedman test, reliability tiers, reverse-scored analysis
Deep Dive	Per-model deep analysis
Embedding Projector	PCA/t-SNE/UMAP projections, SPI clustering (RQ13-RQ14)

License

Code is licensed under the AGPL-3.0 License, see LICENSE. Datasets, embeddings, model cards, and non-code assets are licensed under CC BY-NC-SA 4.0, see DATA_LICENSE.

Citation

@article{Raetano2026Atlas,
  title   = {A Survey and Computational Atlas of Personality Models},
  author  = {Raetano, Joseph and Gregor, Jens and Tamang, Suzanne},
  journal = {ACM Transactions on Intelligent Systems and Technology},
  year    = {2026},
  note    = {Under review (TIST-2025-12-1243)}
}

Name		Name	Last commit message	Last commit date
Latest commit History 215 Commits
.github		.github
Embeddings		Embeddings
atlas		atlas
data		data
datasets		datasets
docs		docs
expert_evaluation		expert_evaluation
figures		figures
graphs		graphs
human_items		human_items
models		models
notebooks		notebooks
results		results
scripts		scripts
test_items		test_items
test_items_opus		test_items_opus
.gitignore		.gitignore
ACKNOWLEDGMENTS.md		ACKNOWLEDGMENTS.md
CITATION.cff		CITATION.cff
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
DATA_LICENSE		DATA_LICENSE
HISTORY.md		HISTORY.md
LICENSE		LICENSE
README.md		README.md
REVIEWER_ARTIFACTS.md		REVIEWER_ARTIFACTS.md
SECURITY.md		SECURITY.md
demo.py		demo.py
examples.json		examples.json
requirements.txt		requirements.txt
validate.py		validate.py
validation_results_1536.csv		validation_results_1536.csv

Folders and files

Latest commit

History

Repository files navigation

A Survey and Computational Atlas of Personality Models

Abstract

What the Abstract Is Saying

The Semantic Personality Index (SPI)

Pipeline

Quick Start

What's in the Atlas

Repository Structure

The 44 Models

I. Trait-Based Models (6)

II. Narcissism-Based Models (10)

III. Motivational and Value Models (6)

IV. Cognitive and Learning Models (4)

V. Clinical and Psychological Health Models (10)

VI. Interpersonal and Conflict Resolution Models (2)

VII. Application-Specific and Holistic Models (6)

Cross-Model PCA Analysis

Empirical Validation

Research questions at a glance

Experiment 1: Baseline (RQ1-6)

Experiment 2: Improvement cycle (RQ7-9)

Experiment 3: External validation (RQ10-12)

Cross-model convergent validity (FAISS)

Reproduce the Validation

Reproduce the PCA

Train All Classifiers

Computational Benchmarks

Downstream Validation

Applied Use: AGI Cognitive Benchmarking

Expert Evaluation: Open Invitation

Lexical Schema

Schema Variations

Model Cards

Starter Notebooks

Colab Notebooks

License

Citation

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages