SUN is a small proof-of-concept that explores building a minimal, interpretable language model by composing a network of information sources and influence factors. The aim is educational: to experiment with how small, inspectable components can be combined to produce language-like behavior, not to compete with large production models. SUN is built on the CAPA_v1 architecture (more on that in a future README; this one is AI-generated because I was very lazy, I know).
The project models a language system as a directed, weighted graph of:
- information nodes: corpora, token statistics, lexical resources, heuristics;
- influence edges: weights and transformation rules that modulate signals between nodes;
- aggregation and decoding components that combine signals and produce tokens.
By keeping components small and explicit, SUN makes it easier to reason about which inputs contribute to a decision, and to test simple alternatives to black-box models.
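As a rough illustration of the graph described above, the sketch below represents information nodes as token-score tables and influence edges as plain weights feeding a single aggregation step. The names `InfoNode`, `InfluenceEdge`, `explain`, and `aggregate` are hypothetical and not taken from the SUN code, and the linear sum is just one assumed aggregation rule; the point is only to show how per-source contributions stay inspectable.

```python
from dataclasses import dataclass


@dataclass
class InfoNode:
    """An information source that assigns a score to each token it knows."""
    name: str
    scores: dict[str, float]


@dataclass
class InfluenceEdge:
    """A weighted link from an information node into the aggregator."""
    source: InfoNode
    weight: float


def explain(edges: list[InfluenceEdge], token: str) -> dict[str, float]:
    """Return each source's weighted contribution to one token's score."""
    return {e.source.name: e.weight * e.source.scores.get(token, 0.0) for e in edges}


def aggregate(edges: list[InfluenceEdge], token: str) -> float:
    """Sum the weighted contributions (a simple linear aggregation, assumed here)."""
    return sum(explain(edges, token).values())


# Toy values chosen for illustration only.
unigram = InfoNode("unigram", {"cat": 0.25, "sat": 0.5})
rule = InfoNode("rule", {"cat": 1.0})
edges = [InfluenceEdge(unigram, 2.0), InfluenceEdge(rule, 0.5)]

print(explain(edges, "cat"))    # {'unigram': 0.5, 'rule': 0.5}
print(aggregate(edges, "cat"))  # 1.0
```

Because `explain` returns one entry per source, it is easy to see which node drove a score before the values are summed.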
- Lightweight Python implementation for constructing information-influence networks.
- Configurable node and edge types (statistical, rule-based, heuristic).
- Simple token-level scoring and decoding (greedy / gated aggregation).
- Example datasets and notebooks for exploration.
- src/ — core implementation (network, node classes, aggregation, decoders)
- data/ — tiny example datasets and token-frequency files
- examples/ — runnable scripts demonstrating basic usage
- notebooks/ — interactive experiments and visualizations
- tests/ — small unit tests (if present)
- README.md — this file
Adjust the structure to match the repository if files are organized differently.
A small Python environment is sufficient.
- Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate   # macOS / Linux
.venv\Scripts\activate      # Windows
- Install dependencies (this may not be up to date):
pip install -r requirements.txt
If the requirements.txt is not up to date, install the basics for experimenting:
pip install numpy networkx jupyter
Run an example script (adjust path to match repository):
python examples/run_simple_network.py
Or start a notebook to explore interactively:
jupyter notebook notebooks/demo.ipynb
Minimal conceptual steps:
- Define nodes and edges (via config or code).
- Load small token statistics or datasets into nodes.
- Propagate and aggregate signals across the network to score tokens.
- Decode tokens into short sequences using greedy or simple beam search.
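The four steps above can be sketched end to end as follows. This is a minimal sketch under assumptions: the two nodes (a unigram table and a bigram table), their made-up statistics, the edge weights, and the helper names `score_next` and `greedy_decode` are all hypothetical, and only greedy decoding is shown, not beam search.

```python
# Steps 1-2: two hypothetical nodes holding tiny, made-up statistics.
UNIGRAM = {"the": 4.0, "cat": 2.0, "sat": 1.0}
BIGRAM = {
    "the": {"cat": 3.0},
    "cat": {"sat": 3.0},
    "sat": {"the": 1.0},
}
# Edge weights into the aggregator (assumed values).
W_UNIGRAM = 0.25
W_BIGRAM = 1.0


def score_next(prev: str) -> dict[str, float]:
    """Step 3: combine weighted signals from both nodes for every candidate token."""
    following = BIGRAM.get(prev, {})
    return {
        tok: W_UNIGRAM * UNIGRAM[tok] + W_BIGRAM * following.get(tok, 0.0)
        for tok in UNIGRAM
    }


def greedy_decode(start: str, length: int) -> list[str]:
    """Step 4: repeatedly append the highest-scoring next token."""
    out = [start]
    for _ in range(length - 1):
        scores = score_next(out[-1])
        out.append(max(scores, key=scores.get))
    return out


print(greedy_decode("the", 3))  # ['the', 'cat', 'sat']
```

Changing `W_UNIGRAM` or `W_BIGRAM` and re-running is a quick way to see how much each source influences the decoded sequence.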
Nodes accept small, human-readable inputs such as JSON, CSV, or plain text token-frequency maps. Keep datasets tiny for fast iteration.
Example token-frequency JSON:
{
  "the": 5000,
  "cat": 200,
  "sat": 150
}
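A map like this could be turned into relative frequencies before being handed to a node. The sketch below uses only the standard `json` module; the function name `load_relative_frequencies` is hypothetical, and the JSON is inlined as a string so the example runs on its own rather than assuming a particular file under data/.

```python
import json

# The same example map, inlined so the sketch is self-contained.
RAW = '{"the": 5000, "cat": 200, "sat": 150}'


def load_relative_frequencies(text: str) -> dict[str, float]:
    """Parse a token-frequency JSON object and normalize counts to sum to 1."""
    counts = json.loads(text)
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("token counts must sum to a positive number")
    return {token: count / total for token, count in counts.items()}


freqs = load_relative_frequencies(RAW)
print(round(freqs["the"], 3))  # 0.935
```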
- Educational toy project — not production-ready.
- Simplified components yield poor performance compared to modern LLMs.
- Limited evaluation tooling: add tests and metrics for experiments.
Contributions welcome. Good ways to help:
- Add examples and small datasets.
- Improve documentation and notebooks.
- Implement new node/edge types and aggregation strategies.
- Add tests and CI for reproducible experiments.
When contributing, please open a small, focused pull request describing changes.
There is none... just use it responsibly. I trust y'all!
Owner: @gurkebaui