
Mathematics Underlying the LLM‑Ω Code

The code implements a mathematics‑only language model whose core operations admit a precise mathematical description. Below are the key equations and structures.


1. Mathematical Objects as Trees

Any mathematical expression is represented as a rooted, ordered tree ( \mathcal{T} ).
Nodes are operations ( ( +, \times, \sin, \int, \dots ) ) or symbols (variables, constants).
The set of all trees is ( \mathbf{Tree} ).
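As a concrete sketch of this representation (the class and operator names here are illustrative, not taken from the code):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """A node in a rooted, ordered expression tree: an operator or a symbol."""
    label: str                               # e.g. "+", "sin", "x", "2"
    children: List["Node"] = field(default_factory=list)

    def preorder(self) -> List[str]:
        """Depth-first (preorder) traversal of node labels."""
        out = [self.label]
        for c in self.children:
            out.extend(c.preorder())
        return out

# sin(x + 2) as a tree: sin at the root, + below it, leaves x and 2
tau = Node("sin", [Node("+", [Node("x"), Node("2")])])
print(tau.preorder())  # ['sin', '+', 'x', '2']
```

The preorder traversal is the same depth-first order used later when trees are generated autoregressively.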


2. Pattern Recognition as Functor

The pattern‑recognition engine (PRE) maps raw inputs to trees. For text ( T ), image ( I ), diagram ( D ):

[ \text{PRE}: (T, I, D) \mapsto \tau \in \mathbf{Tree} ]

Denoting the category of sensory data by ( \mathbf{Data} ), the PRE is a learned functor from sensory data to trees:

[ \mathcal{F}: \mathbf{Data} \to \mathbf{Tree} ]

It is trained to minimise the reconstruction error:

[ \mathcal{L}_{\text{PRE}} = \mathbb{E} \left[ \| \tau - \tau_{\text{true}} \|_{\text{tree}} \right] ]

where ( \| \cdot \|_{\text{tree}} ) is a tree edit distance.
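A minimal sketch of an ordered-tree distance, with trees written as nested tuples; this is a coarser stand-in for a full tree edit distance such as Zhang–Shasha, not the loss actually used:

```python
# Trees as nested tuples: (label, child1, child2, ...), e.g. sin(x+2)
# is ("sin", ("+", ("x",), ("2",))).
def tree_dist(a, b):
    """Simplified ordered-tree distance: a relabel cost at the roots plus
    an edit-distance alignment over the two children sequences."""
    if a is None and b is None:
        return 0
    if a is None:                        # cost of inserting subtree b
        return 1 + sum(tree_dist(None, c) for c in b[1:])
    if b is None:                        # cost of deleting subtree a
        return 1 + sum(tree_dist(c, None) for c in a[1:])
    cost = 0 if a[0] == b[0] else 1      # relabel cost at the roots
    ca, cb = a[1:], b[1:]
    m, n = len(ca), len(cb)
    # Classic edit-distance DP over the two ordered children sequences.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + tree_dist(ca[i - 1], None)
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + tree_dist(None, cb[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(D[i - 1][j] + tree_dist(ca[i - 1], None),
                          D[i][j - 1] + tree_dist(None, cb[j - 1]),
                          D[i - 1][j - 1] + tree_dist(ca[i - 1], cb[j - 1]))
    return cost + D[m][n]

print(tree_dist(("sin", ("x",)), ("cos", ("x",))))  # 1: one relabel
```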


3. Core Mathematics Engine as a Tree Transformer

The core mathematics engine (CME) is a neural network that maps a tree ( \tau ) to another tree ( \tau' ):

[ \text{CME}(\tau) = \tau' ]

Internally, it uses tree‑structured attention. Let ( \tau ) have nodes ( {v_i} ) with parent–child relations. The attention weight between nodes ( i ) and ( j ) is:

[ \alpha_{ij} = \frac{\exp( \langle q_i, k_j \rangle )}{\sum_{l} \exp( \langle q_i, k_l \rangle )} \cdot \mathbf{1}_{[j \text{ is an ancestor or descendant of } i]} ]

where the mask restricts attention to the tree’s hierarchical structure. This is equivalent to a graph attention network on the tree’s adjacency.

The output tree is generated autoregressively: at each step, the model predicts the next symbol (node label) conditioned on previously generated nodes, following a depth‑first traversal order.
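The masked attention can be sketched in a few lines of NumPy. Here the mask is applied *before* the softmax so that the weights renormalize over the allowed ancestor/descendant set, a common variant of the masked product written above:

```python
import numpy as np

def tree_attention(Q, K, mask):
    """Attention weights with a tree mask. Q, K: (n, d) query/key
    matrices; mask[i, j] is True iff node j is an ancestor or descendant
    of node i (or j == i). Disallowed pairs get score -inf, so their
    weight is exactly zero after the softmax."""
    scores = np.where(mask, Q @ K.T, -np.inf)       # <q_i, k_j> or -inf
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)

# Toy 3-node tree: node 0 is the root with children 1 and 2. Siblings
# 1 and 2 are neither ancestors nor descendants of each other, so they
# may not attend to each other.
mask = np.array([[1, 1, 1],
                 [1, 1, 0],
                 [1, 0, 1]], dtype=bool)
rng = np.random.default_rng(0)
A = tree_attention(rng.normal(size=(3, 4)), rng.normal(size=(3, 4)), mask)
```

Each row of `A` sums to 1, and the sibling entries `A[1, 2]` and `A[2, 1]` are zero, as the mask requires.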


4. Interactive Theorem Proving as a Game

Let ( \Gamma ) be a set of hypotheses and ( G ) a goal (both trees). A proof is a sequence of transformations:

[ (\Gamma, G) \xrightarrow{\text{tactic}_1} (\Gamma_1, G_1) \xrightarrow{\text{tactic}_2} \dots \xrightarrow{\text{tactic}_n} (\Gamma_n, \top) ]

where ( \top ) is the empty goal. The model outputs a tactic sequence. The reward is 1 if the proof checker accepts, else 0. This is a Markov decision process with state ( s = (\Gamma, G) ) and action ( a = \text{tactic} ). The optimal policy is learned via reinforcement learning (e.g., PPO).
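The episode structure of this MDP can be sketched as follows; `policy` and `step` are illustrative interfaces (the policy network and the proof checker's transition), not the code's actual API:

```python
def run_proof_episode(state, policy, step, max_steps=10):
    """One episode of the proof MDP: states are (hypotheses, goal) pairs,
    actions are tactics. Ends with reward 1 when the goal is discharged
    (goal = None, standing in for the empty goal), else reward 0 when
    the step budget runs out."""
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)
        trajectory.append((state, action))
        state = step(state, action)
        if state[1] is None:                  # empty goal: proof found
            return trajectory, 1.0
    return trajectory, 0.0                    # budget exhausted

# Toy environment: the "goal" is a counter and the only tactic
# decrements it; the goal is discharged when it reaches zero.
def toy_step(state, action):
    hyps, goal = state
    goal -= 1
    return (hyps, None if goal == 0 else goal)

traj, reward = run_proof_episode(((), 3), lambda s: "dec", toy_step)
print(reward)  # 1.0 after three tactic applications
```

The recorded `(state, action)` trajectory is exactly what a policy-gradient update (Section 7) consumes.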


5. Discovery as Sampling from a Generative Model

The model’s output distribution over trees is ( p_\theta(\tau) ). Novelty is measured by embedding distance to known theorems:

[ \text{novelty}(\tau) = \min_{t \in \text{KB}} \| \text{enc}(\tau) - \text{enc}(t) \|_2 ]

The discovery process samples ( \tau \sim p_\theta ), filters by novelty, then attempts a proof. If proved, ( \tau ) is added to the knowledge base (KB). This is a form of active learning with intrinsic motivation (novelty reward).
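The sample–filter–prove loop can be sketched as below; `sample`, `encode`, and `prove` are placeholders for the model, the encoder, and the proof checker:

```python
import numpy as np

def novelty(enc_tau, kb_enc):
    """L2 distance from a candidate's embedding to its nearest neighbour
    in the KB (rows of kb_enc are known-theorem embeddings)."""
    return float(np.sqrt(((kb_enc - enc_tau) ** 2).sum(axis=1)).min())

def discover(sample, encode, prove, kb, kb_enc, threshold, n_tries=100):
    """Sample candidates, discard those too close to the KB, and admit
    the ones that survive the proof check, growing the KB as we go."""
    for _ in range(n_tries):
        tau = sample()
        e = encode(tau)
        if novelty(e, kb_enc) >= threshold and prove(tau):
            kb.append(tau)
            kb_enc = np.vstack([kb_enc, e])
    return kb, kb_enc
```

Note that novelty is checked before proving: proof attempts are expensive, so the cheap embedding filter runs first.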


6. Multi‑Modal Diagram Encoding

A commutative diagram is a directed graph ( \mathcal{G} = (V, E) ) with objects ( v \in V ) and morphisms ( e \in E ). The diagram encoder learns a function:

[ \Phi: \text{Image of diagram} \mapsto \mathcal{G} ]

It is trained on synthetic diagrams paired with their formal specification (e.g., as a category). The loss is a graph matching loss:

[ \mathcal{L}_{\text{diag}} = \sum_{v \in V} \| \mathbf{h}_v - \mathbf{h}_v^{\text{true}} \|^2 + \lambda \sum_{(u,v) \in E} \| \mathbf{m}_{uv} - \mathbf{m}_{uv}^{\text{true}} \|^2 ]

where ( \mathbf{h}_v ) are node embeddings and ( \mathbf{m}_{uv} ) are edge embeddings.
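A minimal sketch of this loss, assuming the node and edge correspondences are already aligned (the harder unaligned case would need an explicit matching step first):

```python
import numpy as np

def diagram_loss(h, h_true, m, m_true, lam=1.0):
    """Graph-matching loss for the diagram encoder: squared error over
    node embeddings plus lambda times squared error over edge embeddings.
    h, h_true: (|V|, d) node embeddings; m, m_true: (|E|, d) edge
    embeddings, rows aligned between prediction and ground truth."""
    node_term = float(((h - h_true) ** 2).sum())
    edge_term = float(((m - m_true) ** 2).sum())
    return node_term + lam * edge_term

# One node off by a unit vector, one edge off by (0, 2), lambda = 0.5:
h = np.array([[1.0, 0.0]]); h_true = np.zeros((1, 2))
m = np.array([[0.0, 2.0]]); m_true = np.zeros((1, 2))
print(diagram_loss(h, h_true, m, m_true, lam=0.5))  # 1 + 0.5 * 4 = 3.0
```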


7. Reinforcement Learning from Mathematical Feedback

Let ( \mathcal{E} ) be an environment (CAS or proof assistant). At step ( t ), the model produces an expression ( e_t ). The environment returns a reward ( r_t = f(e_t, e_{\text{target}}) ), e.g.:

[ r_t = -\frac{|e_t - e_{\text{target}}|}{|e_{\text{target}}| + \epsilon} ]

for numerical tasks, or binary for correctness. The model is trained with policy gradient:

[ \nabla_\theta J = \mathbb{E}_{\tau \sim p_\theta} \left[ \sum_{t} \nabla_\theta \log p_\theta(a_t \mid s_t) \, R_t \right] ]

where ( R_t ) is the discounted cumulative reward.
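The reward and the discounted return it feeds into can be sketched directly from the two formulas above:

```python
def numeric_reward(e, e_target, eps=1e-8):
    """Relative-error reward for numerical tasks: 0 is best, more
    negative is worse; eps guards against a zero target."""
    return -abs(e - e_target) / (abs(e_target) + eps)

def discounted_returns(rewards, gamma=0.99):
    """R_t = sum_{k >= t} gamma^(k-t) r_k, computed right to left.
    These returns weight the log-probability gradients in the
    policy-gradient estimator."""
    R, out = 0.0, []
    for r in reversed(rewards):
        R = r + gamma * R
        out.append(R)
    return out[::-1]

# Reward arrives only at the final step; earlier steps still receive
# discounted credit for it.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
```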


8. Self‑Play and Curriculum Learning

Define two copies of the model: ( G ) (generator) and ( S ) (solver). They play a game:

  1. ( G ) samples a problem ( p \sim p_\theta ).
  2. ( S ) attempts to solve ( p ) by generating a solution ( s ).
  3. A verifier (CAS) checks if ( s ) solves ( p ).

The solver ( S ) is rewarded when the verifier accepts. For the interaction to be adversarial, the generator ( G ) must be rewarded when ( S ) fails; if both players received the same score, their interests would coincide and there would be no pressure toward harder problems. With opposed payoffs this is a two‑player zero‑sum game whose equilibrium concentrates on problems at the boundary of what the current solver can handle. Both players are updated after each round.

The curriculum emerges because as ( S ) improves, ( G ) must generate harder problems to maintain a positive reward. This is a form of adversarial training.
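One round of this game can be sketched as follows, using the zero-sum reading in which the generator's payoff is the complement of the solver's; the interfaces are illustrative, not the code's actual API:

```python
def self_play_round(generator, solver, verifier):
    """One generator/solver round: G proposes a problem, S answers, and
    the verifier (e.g. a CAS) scores the answer. The solver is rewarded
    for success; the generator is rewarded when the solver fails, which
    is what pushes the curriculum toward harder problems."""
    problem = generator()
    solution = solver(problem)
    solver_reward = 1.0 if verifier(problem, solution) else 0.0
    return problem, solution, solver_reward, 1.0 - solver_reward

# Toy instance: problems are (a, b) pairs, solutions are sums, and the
# verifier checks the arithmetic.
p, s, r_solver, r_gen = self_play_round(
    lambda: (2, 3),
    lambda pr: pr[0] + pr[1],
    lambda pr, so: so == pr[0] + pr[1],
)
print(r_solver, r_gen)  # 1.0 0.0: the solver wins this round
```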


9. Unified Objective

The full training objective combines:

  • Supervised next‑step prediction (on math trees): ( \mathcal{L}_{\text{sup}} )
  • RL proof game: ( J_{\text{RL}} )
  • Novelty‑driven discovery: intrinsic reward ( R_{\text{novelty}} )
  • Self‑play adversarial reward: ( R_{\text{sp}} )

[ \mathcal{L}_{\text{total}} = \mathcal{L}_{\text{sup}} - \lambda_1 J_{\text{RL}} - \lambda_2 \mathbb{E}[R_{\text{novelty}}] - \lambda_3 \mathbb{E}[R_{\text{sp}}] ]

The reward‑based terms enter with a minus sign because ( \mathcal{L}_{\text{total}} ) is minimised while ( J_{\text{RL}} ) and the expected rewards are maximised.

All are differentiable with respect to the model parameters through appropriate approximations (REINFORCE, Gumbel‑softmax, etc.).


10. Summary Table of Key Math Objects

| Concept | Mathematical Form |
| --- | --- |
| Expression tree | ( \tau \in \mathbf{Tree} ) |
| PRE functor | ( \mathcal{F}: \mathbf{Data} \to \mathbf{Tree} ) |
| Tree attention | ( \alpha_{ij} \propto \exp(\langle q_i, k_j \rangle) \cdot \text{mask}(i,j) ) |
| Proof state | ( (\Gamma, G) ) |
| Policy gradient | ( \nabla_\theta J = \mathbb{E}[\sum_t \nabla_\theta \log p_\theta(a_t \mid s_t) \, R_t] ) |
| Novelty | ( \min_{t \in \text{KB}} \| \text{enc}(\tau) - \text{enc}(t) \|_2 ) |
| Self‑play payoff | ( R_{\text{sp}} = \mathbf{1}_{\text{solution correct}} ) |

Thus, the code’s behavior is governed by these mathematical structures, making LLM‑Ω a principled mathematical agent.