Pickomino-Env

Animated demo of the Pickomino game played manually.

Description

A Gymnasium-compatible environment for the dice game Pickomino (Heckmeck am Bratwurmeck). The goal is to train a reinforcement learning agent to play optimally: to decide which die face to collect, when to roll again, and when to stop.

Features


Feature	Description
Gymnasium API	Standard `reset` / `step` / `render` / `close` interface
Push-your-luck mechanics	Lock in die faces one at a time, decide when to stop before busting
Non-trivial decisions	Optimal play requires probability reasoning, not just greedy face selection
Multi-player bots	Play against 1–6 heuristic bot opponents
Reproducible episodes	Full seed support via `env.reset(seed=42)`
Three render modes	`None` (headless), `"human"` (pygame window), `"rgb_array"` (recording)
SB3 compatible	Dict observation space works with Stable-Baselines3 and other RL libraries

Differences from the Physical Game

If you know the physical game, note the following simplifications:

Failed attempt: the highest tile on the table is removed rather than turned face-down.
Tile selection: the best reachable tile is always taken automatically; unlike in the physical game, you cannot choose a lower-valued tile.
Stealing: always performed when possible; you cannot decline.
Win condition: when playing manually with the GUI, the winner is determined as in the physical game (most worms wins, ties broken by the highest tile). When training without a renderer, no winner is declared, so use total reward as your metric. Note that a stolen tile does not reduce your reward, so total reward can exceed your final score.
Stack height: not included in the observation, although it is visible in the physical game.
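Because no winner is declared in headless mode, an evaluation script may want to work one out from the final stacks. The following is a minimal sketch of the win condition stated above (most worms wins, ties broken by the highest tile). `determine_winner` is an illustrative helper, not part of the package, and the worm value per tile (21–24 → 1, 25–28 → 2, 29–32 → 3, 33–36 → 4) is assumed from the physical game.

```python
def determine_winner(stacks: list[list[int]]) -> int:
    """Return the index of the winning player.

    stacks[i] holds the tile numbers (21-36) owned by player i at game end.
    Most worms wins; a tie is broken by the highest single tile.
    """

    def key(player: int) -> tuple[int, int]:
        stack = stacks[player]
        # Assumed worm value per tile: four groups of four tiles, 1 to 4 worms.
        worms = sum((tile - 21) // 4 + 1 for tile in stack)
        return worms, max(stack, default=0)

    return max(range(len(stacks)), key=key)


if __name__ == "__main__":
    # Player 0 (the agent) holds 21 and 25; the two bots hold one or two tiles.
    print(determine_winner([[21, 25], [29], [22, 26]]))
```

How the stacks are obtained at the end of an episode is left open here; only the agent's stack is listed in the info dictionary.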

Action Space

The action space is MultiDiscrete([6, 2]). The step() method accepts both the ndarray returned by action_space.sample() and a plain Python tuple.

action = (die_face (0–5), action_type (0=roll, 1=stop))

Component	Values	Meaning
die_face	0–5	Die face to collect: 0 → 1 eye, 1 → 2 eyes, 2 → 3 eyes, 3 → 4 eyes, 4 → 5 eyes, 5 → worm
action_type	0–1	0 = roll again, 1 = stop and take a tile
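Since die faces are zero-indexed, it is easy to be off by one when building actions by hand. A small sketch of a helper that maps readable face names to the tuple form accepted by step(); `FACE_INDEX`, `ROLL`, `STOP`, and `make_action` are hypothetical names introduced here for illustration.

```python
# Hypothetical helper names; only the (die_face, action_type) layout
# comes from the action space described above.
FACE_INDEX = {"1": 0, "2": 1, "3": 2, "4": 3, "5": 4, "worm": 5}
ROLL, STOP = 0, 1


def make_action(face: str, stop: bool) -> tuple[int, int]:
    """Build a (die_face, action_type) tuple from a face name."""
    return (FACE_INDEX[face], STOP if stop else ROLL)


if __name__ == "__main__":
    print(make_action("worm", stop=False))  # collect worms, then roll again
    print(make_action("4", stop=True))      # collect 4s, then stop
```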

Observation Space

The observation is a dict with four keys:

Key	Max	Shape
dice_collected	8	(6,)
dice_rolled	8	(6,)
tiles_table	1	(16,)
tile_players	36	(number_of_players,)

There are eight dice to roll and collect. Each die has six faces: one to five eyes, and a worm in place of the six. The first five faces are worth their number of eyes. The worm is a distinct sixth face, but it is worth five points (not six), the same as the 5-eye face.

The 16 tiles are numbered 21 to 36 and carry worm values from one to four, in four groups. The game is for two to seven players. Your reinforcement learning agent is the first player, and the other players are computer bots that follow a heuristic. You set the number of bots when you create the environment.
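The count vectors in the observation do not contain the dice total directly. As a minimal sketch, the helpers below derive the total, the number of dice still to roll, and a tile's worm value. The names are illustrative, and the tile grouping (21–24 → 1 worm up to 33–36 → 4 worms) is assumed from the physical game.

```python
# Point value of face indices 0..5; the worm (index 5) scores 5, not 6.
FACE_VALUES = (1, 2, 3, 4, 5, 5)
NUM_DICE = 8


def dice_total(counts) -> int:
    """Sum of points for a six-element vector of per-face counts."""
    return sum(int(n) * value for n, value in zip(counts, FACE_VALUES))


def dice_remaining(dice_collected) -> int:
    """Number of dice not yet collected this turn."""
    return NUM_DICE - sum(int(n) for n in dice_collected)


def tile_worms(tile: int) -> int:
    """Worm value of a tile numbered 21-36 (assumed groups of four)."""
    if not 21 <= tile <= 36:
        raise ValueError(f"tile must be in 21..36, got {tile}")
    return (tile - 21) // 4 + 1


if __name__ == "__main__":
    collected = [0, 0, 0, 2, 1, 2]  # two 4s, one 5, two worms
    print(dice_total(collected), dice_remaining(collected), tile_worms(23))
```

Both count helpers accept plain lists as well as NumPy arrays, since they only iterate over the six entries.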

For a more detailed description of the rules, see the file pickomino-rulebook.pdf. You can play the game online here: https://www.maartenpoirot.com/pickomino/. The heuristic used by the bots is described here: https://frozenfractal.com/blog/2015/5/3/how-to-win-at-pickomino/.

Rewards

The goal is to collect tiles in a stack. The winner is the player whose tiles show the most worms at the end of the game. The agent receives a reward equal to the worm value of a tile when it picks that tile. A failed attempt (see the rulebook) gives a corresponding negative reward. When a bot steals your tile, no negative reward is given, so the total reward at the end of the game can be greater than the score.
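To see how total reward and score can drift apart, the sketch below replays a hand-written list of events. The event labels ("pick", "fail", "stolen") are hypothetical, and it assumes that the penalty for a failed attempt equals the worm value of the returned tile.

```python
def replay(events: list[tuple[str, int]]) -> tuple[int, int]:
    """Return (total_reward, final_score) for a list of (kind, worms) events.

    kind is one of the hypothetical labels "pick", "fail" or "stolen";
    worms is the worm value of the tile involved.
    """
    total_reward = 0
    score = 0
    for kind, worms in events:
        if kind == "pick":      # tile taken: reward and score both rise
            total_reward += worms
            score += worms
        elif kind == "fail":    # failed attempt: tile returned, negative reward
            total_reward -= worms
            score -= worms
        elif kind == "stolen":  # bot steals a tile: score drops, reward does not
            score -= worms
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return total_reward, score


if __name__ == "__main__":
    events = [("pick", 2), ("pick", 3), ("stolen", 3), ("fail", 2), ("pick", 1)]
    print(replay(events))
```

In this example the agent ends with a total reward of 4 but a score of only 1, because the stolen 3-worm tile was never subtracted from the reward.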

To try the environment manually, see Play Manually.

Info Dictionary

The info dictionary is returned at every step. It is intended for debugging and logging, not for learning.

Key	Type	Description
`dice_collected`	`list[int]`	Counts of each die face collected this turn
`dice_rolled`	`list[int]`	Counts of each die face in the current roll
`terminated`	`bool`	Whether the episode has terminated
`truncated`	`bool`	Whether the game was truncated due to the last action
`tiles_table_vec`	`numpy.ndarray[int8]`, shape `(16,)`	Binary vector of tiles currently available on the table
`smallest_tile`	`int`	Lowest-numbered tile still on the table
`explanation`	`str`	Reason for the last termination, truncation, or failed attempt
`player_stack`	`list[int]`	All tiles currently held by the agent
`player_score`	`int`	Agent's current score (sum of worm values)
`current_player_index`	`int`	Index of the player whose turn it is
`bot_scores`	`list[int]`	Scores of all bots, in order
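For logging, a few of these keys can be condensed into one line per step. The function below is only a sketch: `summarize_info` is a hypothetical helper, and it reads nothing beyond the keys listed in the table.

```python
def summarize_info(info: dict) -> str:
    """Build a one-line log summary from selected info keys."""
    bots = ", ".join(str(score) for score in info["bot_scores"])
    line = (
        f"player={info['current_player_index']} "
        f"score={info['player_score']} bots=[{bots}] "
        f"smallest_tile={info['smallest_tile']}"
    )
    # The explanation is only appended when it is non-empty.
    if info.get("explanation"):
        line += f" | {info['explanation']}"
    return line
```

Calling it inside the step loop, for example `print(summarize_info(info))`, gives a compact trace of scores without dumping the full dictionary.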

Starting State

dice_collected = [0, 0, 0, 0, 0, 0].
dice_rolled = [3, 0, 1, 2, 0, 2] (random roll; the six counts sum to 8).
tiles_table = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1].
tile_players = [0, 0, 0] (with number_of_bots = 2).
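The invariants of the starting state can be written down as a quick sanity check, for instance in a test that calls reset(). This is a sketch with a hypothetical function name; it relies only on the four observation keys and the values shown above.

```python
def check_start_observation(obs: dict, number_of_bots: int) -> None:
    """Raise AssertionError if obs does not look like a fresh game."""
    assert sum(obs["dice_collected"]) == 0, "no dice collected at the start"
    assert sum(obs["dice_rolled"]) == 8, "all eight dice are in the first roll"
    assert len(obs["tiles_table"]) == 16, "one entry per tile 21..36"
    assert all(t == 1 for t in obs["tiles_table"]), "all tiles on the table"
    # One entry per player: the agent plus the bots.
    assert len(obs["tile_players"]) == number_of_bots + 1
    assert all(t == 0 for t in obs["tile_players"]), "nobody owns a tile yet"


if __name__ == "__main__":
    example = {
        "dice_collected": [0, 0, 0, 0, 0, 0],
        "dice_rolled": [3, 0, 1, 2, 0, 2],
        "tiles_table": [1] * 16,
        "tile_players": [0, 0, 0],
    }
    check_start_observation(example, number_of_bots=2)
    print("starting state looks valid")
```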

Episode End

Termination occurs when there are no more tiles to take on the table — Game Over.

Truncation

Truncation occurs when the agent attempts an illegal action during dice selection or rolling (for example, selecting a face that was not rolled, selecting a face already collected this turn, or choosing to roll when no dice remain). The game continues, and a new valid action is required.
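An agent can avoid these truncations by filtering actions before calling step(). The sketch below encodes only the three cases named above; the environment's own checks may be stricter, and `is_legal` and `legal_actions` are hypothetical helpers that read the dice_collected and dice_rolled vectors from the observation.

```python
NUM_DICE = 8


def is_legal(dice_collected, dice_rolled, action) -> bool:
    """Check an in-range (die_face, action_type) action against the three rules."""
    face, action_type = int(action[0]), int(action[1])
    if dice_rolled[face] == 0:
        return False  # face is not in the current roll
    if dice_collected[face] > 0:
        return False  # face was already collected this turn
    # Dice left after taking every rolled die of the chosen face.
    left = NUM_DICE - int(sum(dice_collected)) - int(dice_rolled[face])
    if action_type == 0 and left == 0:
        return False  # cannot roll again with no dice remaining
    return True


def legal_actions(dice_collected, dice_rolled) -> list[tuple[int, int]]:
    """All (die_face, action_type) pairs that pass is_legal."""
    return [
        (face, action_type)
        for face in range(6)
        for action_type in (0, 1)
        if is_legal(dice_collected, dice_rolled, (face, action_type))
    ]


if __name__ == "__main__":
    print(legal_actions([0, 0, 0, 0, 0, 0], [3, 0, 1, 2, 0, 2]))
```

Sampling from this list instead of from action_space.sample() keeps a random baseline inside the legal set; the same list can be turned into an action mask.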

Invalid Actions

Out-of-range actions (outside [0–5] or [0–1]) raise a ValueError and do not affect the episode state.

Failed Attempt

A Failed Attempt occurs when the agent fails to secure a tile. If the agent has a stack of already picked tiles, then the top tile is returned to the table, and a negative reward is applied. If the stack is empty, nothing happens, and the reward is zero. The game continues — the episode does not end.
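The effect on the agent's stack can be sketched as a pure function. This is an illustration under stated assumptions, not the environment's code: the top of the stack is taken to be the last list element, index i of tiles_table is taken to be tile 21 + i, and the penalty is taken to equal the worm value of the returned tile. The removal of the highest table tile (see Differences from the Physical Game) is deliberately left out.

```python
def apply_failed_attempt(stack: list[int], tiles_table: list[int]):
    """Return (new_stack, new_tiles_table, reward) after a failed attempt.

    Assumptions: stack[-1] is the top tile, tiles_table[i] refers to
    tile 21 + i, and the penalty is the returned tile's worm value.
    """
    new_table = list(tiles_table)
    if not stack:
        return [], new_table, 0  # empty stack: nothing happens

    top = stack[-1]
    new_table[top - 21] = 1             # tile goes back on the table
    reward = -((top - 21) // 4 + 1)     # assumed worm value, negated
    return list(stack[:-1]), new_table, reward


if __name__ == "__main__":
    table = [1] * 16
    table[25 - 21] = 0
    table[30 - 21] = 0
    print(apply_failed_attempt([25, 30], table))
```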

Arguments

Pass these as keyword arguments to gym.make. The table lists the default used when an argument is omitted.

Parameter	Type	Default	Description
`number_of_bots`	int	1	Number of bot opponents to play against (1–6)
`render_mode`	str or None	None	Visualization mode: None (training), "human" (display), or "rgb_array" (recording)

Bot Heuristic

The bots use the following heuristic, inspired by Frozen Fractal's strategy:

Take the highest-contributing face. Select the die face where count × face value is greatest. Worms count as 5.
Tie-breaking. When two faces contribute equally, prefer worms over 5s. If still tied, prefer the face with fewer dice, which keeps more dice available for future rolls. For example, three 4s are preferred over four 3s.
Worm priority on early rolls. If no dice have been collected yet and this is the third roll or later, take worms if available, regardless of contribution.
Stop as soon as a tile is reachable. Once the running total meets or exceeds the lowest available tile value, and a worm has been collected, the bot stops.
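The face-selection, tie-breaking, and stopping rules above can be sketched as follows. This is not the package's bot code: the function names are hypothetical, the worm-priority rule is omitted, and it is assumed that faces already collected this turn are excluded from the choice.

```python
# Point value of face indices 0..5; worms (index 5) count as 5.
FACE_VALUES = (1, 2, 3, 4, 5, 5)
WORM = 5


def choose_face(dice_rolled, dice_collected):
    """Pick the face with the greatest count x value, or None if none is takeable."""
    candidates = [
        face for face in range(6)
        if dice_rolled[face] > 0 and dice_collected[face] == 0
    ]
    if not candidates:
        return None

    def key(face: int):
        count = dice_rolled[face]
        # 1) contribution, 2) worms beat 5s on a tie, 3) fewer dice preferred.
        return (count * FACE_VALUES[face], face == WORM, -count)

    return max(candidates, key=key)


def should_stop(dice_collected, smallest_tile: int) -> bool:
    """Stop once a worm is held and the total reaches the lowest table tile."""
    total = sum(n * value for n, value in zip(dice_collected, FACE_VALUES))
    return dice_collected[WORM] > 0 and total >= smallest_tile


if __name__ == "__main__":
    # Four 3s and three 4s both contribute 12; the 4s use fewer dice.
    print(choose_face([0, 0, 4, 3, 0, 0], [0] * 6))
```

Sorting on a single tuple key keeps the three criteria in one place. A worm can only tie with a 5 at equal dice count, so placing the worm flag before the dice count does not change the outcome of the other ties.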

Setup

Python 3.10–3.14

Installation

We recommend installing in a virtual environment:

python -m venv .venv

# macOS / Linux
source .venv/bin/activate

# Windows PowerShell
.venv\Scripts\Activate.ps1

# Windows cmd.exe
.venv\Scripts\activate.bat

# Windows Git Bash
source .venv/Scripts/activate

pip install pickomino-env

Verify the installation:

pickomino-play

Quick Start

import gymnasium as gym

# render_mode options:
#   None         — no rendering, fastest (default, recommended for training)
#   "human"      — pygame window, requires a display
#   "rgb_array"  — returns RGB array, useful for recording
env = gym.make("Pickomino-v0", render_mode="human", number_of_bots=2)

# Reset and get initial observation
obs, info = env.reset(seed=42)

# Run one episode
terminated = False
truncated = False
total_reward = 0

while not terminated and not truncated:
    # Agent selects action: (die_face, action_type)
    action = env.action_space.sample()  # Random action for demo
    # Step environment
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if truncated:
        print(f"Invalid action: {info['explanation']}")

print(f"Episode finished. Total reward: {total_reward}")
env.close()

Play Manually

Playing a few games by hand is the fastest way to understand the rules and the strategic depth before training an agent. Launch the pygame GUI:

# One bot (default)
pickomino-play

# Up to six bots
pickomino-play --number-of-bots=3

To adjust bot play speed, change RENDER_DELAY in constants.py. A higher value slows bots down. A lower value speeds them up.

RENDER_DELAY: Final[float] = 2

Security & Bug Bounty

Found a bug? Valid reports are rewarded with a physical copy of the Pickomino board game. See SECURITY.md for scope, timelines, and reporting instructions.

Contributing

Contributions are welcome. The project runs sprints with issues assigned to contributors.

Browse or open an issue on GitHub Issues.
Create a branch using the format <issue-number>-<brief-description>.
Run pre-commit run --all-files before pushing.
Open a Pull Request from your branch to the main branch.

See CONTRIBUTING.md for the full workflow, code style requirements, and definition of done.

Resources

Game Rules: Pickomino Rulebook
Play Online: Maarten Poirot's Pickomino
Play on Board Game Arena: Pickomino with Elo
Strategy Discussion: Playing the Odds, One Worm at a Time
Bot Strategy: How to Win at Pickomino
Gymnasium Docs: gymnasium.farama.org

Contact

Maintained by smallgig. For questions or ideas, open an issue with the label question.

License

MIT License. See LICENSE for details.
