akumoli-debug/pokervision

Core idea: A stateful agent that builds persistent internal models of other agents from interaction and conditions its decisions on those learned beliefs. This project explores agent-centric world modeling in a minimal multi-agent environment. Poker serves as a controlled testbed for repeated, adversarial interaction: it allows the study of persistent entity representations, belief updates from interaction, and decision-making conditioned on learned internal models rather than equilibrium assumptions.

PokerVision: Exploitative Poker AI

PokerVision is a stateful agent operating in a multi-agent environment that learns persistent, opponent-specific behavioral models from interaction logs. Rather than optimizing for equilibrium play, it infers how other agents deviate from idealized assumptions and conditions its decisions on those learned internal models. Poker is used as a controlled testbed for studying behavioral world modeling under uncertainty.

Rather than relying on larger models or more data, the project focuses on explicit internal state, online belief updates, and interaction-driven learning.

It was built as a lightweight research project to demonstrate how behavioural modelling can outperform static “play-perfect” solvers against real, non‑optimal opponents.

Demo

The core project lives in the pokervision_github/ folder and includes a simple web UI:

  • Run the live assistant (see Quickstart below), then open http://localhost:8000.
  • The UI returns Action, Bet size, Reasoning, and Advice for each analysis.
  • A demo GIF (pokervision_github/assets/demo.gif) in the project shows the assistant analysing hands and suggesting actions.

Architecture

[Figure: PokerVision architecture]

High-level flow:

  • Environment state (hand, pot, position, cards)
  • Opponent belief state (persistent memory over tendencies)
  • Policy network (conditions on state + beliefs)
  • Action recommendation (bet / call / fold, with explanation)
  • Observed opponent action → belief update, which loops back into the opponent belief state

Advice + Belief (Live)

Advice in the live UI is generated from the current hand context (pot odds, hand strength, pressure) and a persistent per-opponent belief updated online after each interaction. That internal state—how often this opponent applies pressure vs checks—drives both the recommendation and the advice text (e.g. “Opponent shows frequent pressure in similar spots”). So decisions and explanations are conditioned on the same belief state.
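As a rough illustration of how hand context and the per-opponent belief could feed a single recommendation, here is a minimal sketch. The function names (`pot_odds`, `advise`), the 0.5 pressure threshold, and the 0.85 equity discount are assumptions made for this example; they are not taken from the repository's code.

```python
def pot_odds(pot: float, to_call: float) -> float:
    """Fraction of the final pot we must contribute in order to call.

    `pot` is the pot size including the opponent's bet.
    """
    return to_call / (pot + to_call)


def advise(hand_strength: float, pot: float, to_call: float,
           pressure_freq: float) -> tuple[str, str]:
    """Return (action, advice text) from hand context plus opponent belief.

    hand_strength: estimated win probability in [0, 1] (hypothetical input).
    pressure_freq: believed frequency with which this opponent bets or
                   raises in similar spots, in [0, 1] (hypothetical summary).
    """
    required = pot_odds(pot, to_call)
    # Assumption: a frequent bettor has a wider, weaker range, so the
    # equity needed to continue is discounted slightly.
    if pressure_freq > 0.5:
        required *= 0.85
        note = "Opponent shows frequent pressure in similar spots"
    else:
        note = "Opponent rarely applies pressure in similar spots"
    action = "call" if hand_strength >= required else "fold"
    return action, note
```

With a pot of 100 and 50 to call, a hand with 30% equity would be a fold against a passive opponent in this sketch, but a call against one believed to bet 70% of the time. The same belief value selects both the action and the advice text.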

Why Poker?

Poker provides a minimal, well-defined environment for studying multi-agent interaction under uncertainty. Agents have hidden state, partial observability, repeated interaction, and incentives to exploit systematic deviations—making it a useful testbed for behavioural world modeling without complex physics or perception.

Quickstart

# 1. Clone and enter the repo
git clone https://github.com/akumoli-debug/pokervision-General-Intuition.git
cd pokervision-General-Intuition/pokervision_github

# 2. Set up the environment
./setup.sh           # installs Python dependencies via pip

# 3. (Optional) Prepare your own data
#    Put PokerNow CSV logs into data/ and run:
python scripts/enhanced_complete_parser.py

# 4. Train the main model (card-aware, ~74% accuracy in our runs)
python scripts/train_with_cards.py

# 5. Launch the live assistant UI
python scripts/live_ui_fixed.py
# Then open http://localhost:8000 — the UI returns Action, Bet size, Reasoning, and Advice per hand.

Key Entry Points

Inside pokervision_github/ the most useful scripts and docs are:

  • Training

    • scripts/train_pytorch.py – basic model training.
    • scripts/train_enhanced_model.py – adds richer strategic features.
    • scripts/train_with_cards.py – full card-aware model used for the main results.
    • scripts/augment_data.py – data augmentation via suit/position symmetry.
    • scripts/finetune_opponent.py – opponent-specific fine-tuning.
  • Evaluation & analysis

    • scripts/compare_models_fixed.py – compare different checkpoints.
    • docs/TRAINING.md – full training instructions and tips.
  • Live play / demo

    • scripts/live_ui_fixed.py – launches the browser-based assistant.
    • docs/API.md – programmatic API usage examples.
  • Context for General Intuition

    • docs/GI_APPLICATION.md – one-pager framing this project for General Intuition.

How it Works

World model: opponent beliefs and conditioning

PokerVision maintains an internal belief state over other agents and updates it through interaction.

  1. Belief state (what is represented)
    For each opponent, the agent stores a persistent summary of latent behavioural tendencies (propensity to fold under pressure, call-down frequency, aggression by position and stack depth). This is encoded as a feature vector / embedding that serves as an approximation of the opponent’s policy.

  2. Online belief update (how it learns)
    After each observed action, the belief state is updated online using new evidence from the hand (action taken, context, outcome). Updates are incremental and confidence-weighted, so representations refine over repeated interaction while remaining robust to short-term variance.

  3. Policy conditioning (how it decides)
    At decision time, the policy conditions jointly on the current environment state and the opponent belief state. The belief modulates action preferences—for example, applying more pressure to inferred over-folders or favouring thin value bets versus calling stations—so behaviour adapts to specific opponents instead of playing a static equilibrium strategy.
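The three steps above can be sketched with a much simpler belief representation than a learned embedding. The example below assumes a single tendency (fold frequency when facing a bet) tracked with Beta-style pseudo-counts; the class `OpponentBelief`, the helper `bluff_frequency`, and all constants are hypothetical and only illustrate the idea of confidence-weighted updates and belief-conditioned decisions.

```python
from dataclasses import dataclass


@dataclass
class OpponentBelief:
    """Persistent per-opponent summary (hypothetical structure).

    Keeps pseudo-counts of how often the opponent folds or continues
    when facing a bet; the defaults act as a weak uniform prior.
    """
    folds: float = 1.0      # prior pseudo-count of folds under pressure
    continues: float = 1.0  # prior pseudo-count of calls/raises under pressure

    @property
    def fold_to_pressure(self) -> float:
        """Current estimate of the fold frequency when facing a bet."""
        return self.folds / (self.folds + self.continues)

    def update(self, folded: bool, confidence: float = 1.0) -> None:
        """Add one observation, scaled by a confidence weight in [0, 1]."""
        if folded:
            self.folds += confidence
        else:
            self.continues += confidence


def bluff_frequency(belief: OpponentBelief, baseline: float = 0.3) -> float:
    """Shift a baseline bluffing frequency towards over-folders (assumed rule)."""
    shift = belief.fold_to_pressure - 0.5
    return min(1.0, max(0.0, baseline + shift))
```

Because each observation adds at most one pseudo-count, a single hand moves the estimate less and less as evidence accumulates, and a low-confidence observation moves it less than a clear one.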

In short (explicit abstraction):

belief ← prior
for each interaction:
    observe action
    belief ← update(belief, action)
    act ← policy(state, belief)
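A runnable toy version of this loop might look as follows. It assumes the belief is a single number (an exponential moving average of how often the opponent bets or raises) and that the policy is a simple threshold rule; `update`, `policy`, `run`, the learning rate, and the threshold formula are all illustrative assumptions rather than the project's actual model.

```python
def update(belief: float, action: str, rate: float = 0.2) -> float:
    """Exponential moving average of how often the opponent bets or raises."""
    observed = 1.0 if action in ("bet", "raise") else 0.0
    return (1.0 - rate) * belief + rate * observed


def policy(state: dict, belief: float) -> str:
    """Toy policy: call lighter against opponents believed to be aggressive."""
    threshold = 0.6 - 0.3 * belief  # assumed linear adjustment
    return "call" if state["hand_strength"] >= threshold else "fold"


def run(observed_actions: list[str], state: dict, prior: float = 0.5) -> list[str]:
    """Replay observed opponent actions, updating the belief before each decision."""
    belief = prior
    decisions = []
    for action in observed_actions:
        belief = update(belief, action)
        decisions.append(policy(state, belief))
    return decisions
```

For a fixed hand strength of 0.4, three observed raises in a row move this toy agent from folding to calling, while three checks keep it folding: the environment state is identical and only the belief differs.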

Data flow:

Hand state + opponent stats
          └──> Neural model (features + cards)
                      └──> Action logits + value
                                └──> Recommended action + updated opponent memory

Results (Snapshot)

Results are included to validate the learning loop; the primary contribution is the agent architecture and belief‑update mechanism rather than absolute performance.

Offline accuracy on held-out data from PokerNow logs:

Model                   Accuracy   Notes
GTO-style baseline      ~55%       Unexploitable, not personalised
Basic neural net        ~63%       Limited strategic features
Enhanced + cards        ~74%       Uses SPR, position, cards
Opponent fine-tuning    80–99%     Per-opponent models on enough data

These numbers are approximate and depend on the exact dataset split and hyperparameters; see the scripts in scripts/ for the full training pipeline.
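For readers unfamiliar with the metric, the sketch below shows one plausible way to compute offline action-matching accuracy: the share of held-out decisions where the model's recommended action equals the action recorded in the log. The function name and the string encoding of actions are assumptions; the repository's evaluation scripts may compute the metric differently.

```python
def action_matching_accuracy(predicted: list[str], recorded: list[str]) -> float:
    """Fraction of held-out decisions where the model's action equals the logged one."""
    if len(predicted) != len(recorded):
        raise ValueError("predicted and recorded must have the same length")
    if not recorded:
        raise ValueError("need at least one decision to score")
    matches = sum(p == r for p, r in zip(predicted, recorded))
    return matches / len(recorded)
```

For example, matching three of four logged decisions gives an accuracy of 0.75. Note that this measures agreement with what players actually did, not whether those actions were profitable.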

Reproducibility

  • Data: Results were obtained on ~4K PokerNow hands, augmented via suit symmetry.
    To reproduce, you will need similar hand histories with comparable stakes and formats.
  • Scripts: All training scripts live in scripts/ and can be run end‑to‑end with the commands in docs/TRAINING.md.
  • Randomness: Training is stochastic (PyTorch initialisation, shuffling, etc.), so expect small variations around the reported accuracies.
    For more deterministic runs, fix seeds in Python / NumPy / PyTorch and keep hardware and library versions constant.
  • “Beats GTO” definition: Here, “beats GTO” means the exploitative model achieves higher offline action‑matching accuracy and higher expected value against recorded opponents than a static GTO‑style policy, when evaluated on the same held‑out hands.
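The suit-symmetry augmentation mentioned under Data can be sketched as below. This is an assumed implementation, not a copy of scripts/augment_data.py: it treats cards as two-character strings such as "As" or "Td" and relabels suits consistently across all cards in a hand (hole cards and board together), which leaves hand strength unchanged in Hold'em.

```python
from itertools import permutations

SUITS = "shdc"  # spades, hearts, diamonds, clubs


def suit_permutations(cards: list[str]) -> list[list[str]]:
    """Return every distinct relabelling of suits for a list of cards.

    Each card is rank + suit, e.g. "As" (assumed encoding). The same suit
    mapping is applied to every card so flushes and suitedness are preserved.
    """
    seen = set()
    variants = []
    for perm in permutations(SUITS):
        mapping = dict(zip(SUITS, perm))
        variant = [card[0] + mapping[card[1]] for card in cards]
        key = tuple(variant)
        if key not in seen:  # different mappings can yield the same hand
            seen.add(key)
            variants.append(variant)
    return variants
```

A suited hand such as ["As", "Ks"] yields 4 variants and an offsuit hand such as ["As", "Kh"] yields 12, so the multiplier depends on how many distinct suits appear in the hand.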

Limitations

  • Not a solver: This is not a full-game equilibrium solver; it focuses on pattern recognition from historical data.
  • Data requirements: Performance depends heavily on having enough clean hand histories per opponent; sparse data can lead to noisy estimates.
  • Domain assumptions: The current feature set and parsing logic target PokerNow-style No-Limit Hold’em logs; other formats may require custom parsing.
  • Evaluation gap: Offline accuracy and EV on historical data are only proxies for live win‑rate; real‑world performance will depend on table dynamics and opponent adaptation.
  • Ethical use: This code is for research and educational purposes; many poker sites restrict or forbid real‑time assistance tools—check and follow the rules of any platform you use.
  • Behavioural stationarity: Opponent models assume behavioural stationarity over short horizons; long‑term adaptation and strategic deception are not yet modelled.

Failure Modes & Open Questions

  • Belief updates assume short-term behavioral stationarity; adversarial deception is not modeled.
  • With sparse interaction data and no strong prior, beliefs can become overconfident.
  • Offline evaluation may overestimate gains, because recorded opponents cannot adapt to the agent's play as live opponents would.

Design Choices

  • Frozen policy backbone with online belief updates: keeps the main decision network stable while beliefs adapt over time.
  • Confidence-weighted opponent updates: down-weights noisy, low-signal hands to reduce overfitting to variance.
  • Explicit separation between environment state and agent beliefs: makes it clear which information comes from the shared world vs. inferred opponent models, and how each influences decisions.

About

An adaptive agent in a multi-agent environment that learns persistent, opponent-specific behavioral patterns from interaction logs. Rather than optimizing for equilibrium, it builds internal models of other agents and conditions decisions on their deviations, illustrating behavioral world modeling under uncertainty.
