An environment conforming to the Gymnasium API for the dice game Pickomino (Heckmeck am Bratwurmeck).

Goal: train a Reinforcement Learning agent for optimal play, that is, to decide which die face to collect, when to roll, and when to stop.
| Feature | Description |
|---|---|
| Gymnasium API | Standard reset / step / render / close interface |
| Push-your-luck mechanics | Lock in die faces one at a time, decide when to stop before busting |
| Non-trivial decisions | Optimal play requires probability reasoning, not just greedy face selection |
| Multi-player bots | Play against 1–6 heuristic bot opponents |
| Reproducible episodes | Full seed support via env.reset(seed=42) |
| Three render modes | None (headless), "human" (pygame window), "rgb_array" (recording) |
| SB3 compatible | Dict observation space works with Stable-Baselines3 and other RL libraries |
If you know the physical game, note the following simplifications:
- Failed Attempt: the highest tile on the table is removed, not turned face-down.
- Tile selection: the best reachable tile is always taken automatically; you cannot choose a lower-valued tile as you can in the physical game.
- Stealing: always performed when possible; you cannot decline.
- Win condition: determined correctly when playing manually with the GUI (most worms win, ties are broken by the highest tile). When training without a renderer, no winner is declared, so use total reward as your metric. Note that stolen tiles do not reduce your reward, so total reward can exceed your final score.
- Stack height: not included in the observation (visible in the physical game).
The action space is MultiDiscrete([6, 2]). The step() method accepts both
the ndarray returned by action_space.sample() and a plain Python tuple.
action = (die_face (0–5), action_type (0=roll, 1=stop))
| Position | Name | Values | Meaning |
|---|---|---|---|
| 0 | die_face | 0–5 | Die face to collect: 0→1 eye, 1→2 eyes, 2→3 eyes, 3→4 eyes, 4→5 eyes, 5→worm |
| 1 | action_type | 0–1 | 0 = roll again, 1 = stop and take a tile |
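The encoding above can be wrapped in two small helpers. This is a minimal sketch, not part of the package: `make_action`, `describe_action`, and the constant names are hypothetical and only illustrate the documented `(die_face, action_type)` layout.

```python
FACE_NAMES = ("1", "2", "3", "4", "5", "worm")  # index in this tuple = die_face
ROLL, STOP = 0, 1                                # values of action_type


def make_action(face: str, stop: bool) -> tuple[int, int]:
    """Build an action tuple from a face name and a stop/roll decision."""
    return (FACE_NAMES.index(face), STOP if stop else ROLL)


def describe_action(action) -> str:
    """Turn an action (tuple or array-like of length 2) into readable text."""
    die_face, action_type = int(action[0]), int(action[1])
    if not 0 <= die_face <= 5 or action_type not in (ROLL, STOP):
        raise ValueError(f"action out of range: {action!r}")
    follow_up = "stop and take a tile" if action_type == STOP else "roll again"
    return f"collect face {FACE_NAMES[die_face]}, then {follow_up}"


print(make_action("worm", stop=True))  # (5, 1)
print(describe_action((2, 0)))         # collect face 3, then roll again
```

Because the helpers only index into the action, they accept both a plain tuple and the array returned by `action_space.sample()`.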
The observation is a dict with four keys:
| Key | Min | Max | Shape |
|---|---|---|---|
| dice_collected | 0 | 8 | (6,) |
| dice_rolled | 0 | 8 | (6,) |
| tiles_table | 0 | 1 | (16,) |
| tile_players | 0 | 36 | (number_of_players,) |
There are eight dice to roll and collect. Each die has six faces: one to five eyes, plus a worm in place of the six. The worm is a distinct face, but it scores 5 points (not 6), the same as the 5-eye face.

The 16 tiles are numbered 21 to 36 and carry worm values from one to four, spread over four groups. The game is for two to seven players. Here, your Reinforcement Learning agent is the first player and the other players are computer bots that play according to a heuristic. When you create the environment, you have to define the number of bots.
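The scoring arithmetic can be sketched in a few lines. The helper names are hypothetical, and the tile grouping (21–24 carry one worm, 25–28 two, 29–32 three, 33–36 four) is taken from the standard physical game; it is assumed here that the environment uses the same grouping.

```python
FACE_VALUES = (1, 2, 3, 4, 5, 5)  # points per face; index 5 is the worm


def dice_total(counts) -> int:
    """Sum of points for a six-element vector of per-face dice counts."""
    return sum(count * value for count, value in zip(counts, FACE_VALUES))


def tile_worms(tile: int) -> int:
    """Worm value of a tile, assuming four groups of four tiles each."""
    if not 21 <= tile <= 36:
        raise ValueError(f"not a Pickomino tile: {tile}")
    return (tile - 21) // 4 + 1


print(dice_total([0, 0, 0, 2, 1, 2]))             # 2*4 + 1*5 + 2*5 = 23
print([tile_worms(t) for t in (21, 24, 25, 36)])  # [1, 1, 2, 4]
```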
For a more detailed description of the rules, see the file pickomino-rulebook.pdf. You can play the game online here: https://www.maartenpoirot.com/pickomino/. The heuristic used by the bots is described here: https://frozenfractal.com/blog/2015/5/3/how-to-win-at-pickomino/.
The goal is to collect tiles in a stack. The winner is the player who has the most worms on their tiles at the end of the game. The Reinforcement Learning agent receives a reward equal to the worm value of a tile when the tile is picked. For a failed attempt (see rulebook), a corresponding negative reward is given. When a bot steals your tile, no negative reward is given. Hence, the total reward at the end of the game can be greater than the score.
To try the environment manually, see the manual-play instructions (pickomino-play).
The info dictionary is returned at every step. It is intended for debugging and
logging, not for learning.
| Key | Type | Description |
|---|---|---|
| dice_collected | list[int] | Counts of each die face collected this turn |
| dice_rolled | list[int] | Counts of each die face in the current roll |
| terminated | bool | Whether the episode has terminated |
| truncated | bool | Whether the game was truncated due to the last action |
| tiles_table_vec | numpy.ndarray[int8], shape (16,) | Binary vector of tiles currently available on the table |
| smallest_tile | int | Lowest-numbered tile still on the table |
| explanation | str | Reason for the last termination, truncation, or failed attempt |
| player_stack | list[int] | All tiles currently held by the agent |
| player_score | int | Agent's current score (sum of worm values) |
| current_player_index | int | Index of the player whose turn it is |
| bot_scores | list[int] | Scores of all bots, in order |
- dice_collected = [0, 0, 0, 0, 0, 0]
- dice_rolled = [3, 0, 1, 2, 0, 2] (random dice, sum = 8)
- tiles_table = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
- tile_players = [0, 0, 0] (with number_of_bots = 2)
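The observation vectors can be decoded into more readable quantities. The sketch below is illustrative only: the helper names are hypothetical, the mid-game values are made up, and it assumes that index `i` of `tiles_table` corresponds to tile `21 + i`.

```python
FIRST_TILE = 21
TOTAL_DICE = 8


def tiles_on_table(tiles_table) -> list[int]:
    """Tile numbers still available, assuming index i maps to tile 21 + i."""
    return [FIRST_TILE + i for i, present in enumerate(tiles_table) if present]


def dice_remaining(dice_collected) -> int:
    """Number of dice not yet collected this turn."""
    return TOTAL_DICE - int(sum(dice_collected))


# Hypothetical mid-game observation (values invented for illustration).
obs = {
    "dice_collected": [0, 0, 0, 2, 0, 1],
    "dice_rolled": [1, 0, 2, 0, 2, 0],
    "tiles_table": [0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
}

print(tiles_on_table(obs["tiles_table"])[:4])  # [22, 23, 24, 26]
print(dice_remaining(obs["dice_collected"]))   # 5
```

In a consistent state the current roll contains exactly the remaining dice, so `sum(obs["dice_rolled"])` equals `dice_remaining(obs["dice_collected"])`.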
Termination occurs when there are no more tiles to take on the table — Game Over.
Truncation occurs when the agent attempts an illegal action during dice selection or rolling (for example, selecting a face that was not rolled, selecting a face already collected this turn, or choosing to roll when no dice remain). The game continues, and a new valid action is required.
Out-of-range actions (outside [0–5] or [0–1]) raise a ValueError and do not
affect the episode state.
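An agent can avoid truncation by checking an action against the current observation before calling `step()`. The function below is a sketch of the three illegal cases listed above plus the range check; `check_action` and its messages are hypothetical and may not match the environment's own `explanation` strings.

```python
TOTAL_DICE = 8


def check_action(action, dice_rolled, dice_collected):
    """Return None if the action looks legal, else a short reason string.

    Raises ValueError for out-of-range values, mirroring the documented
    behaviour for actions outside [0-5] or [0-1].
    """
    die_face, action_type = int(action[0]), int(action[1])
    if not 0 <= die_face <= 5 or action_type not in (0, 1):
        raise ValueError(f"action out of range: {action!r}")
    if dice_rolled[die_face] == 0:
        return "face not rolled"
    if dice_collected[die_face] > 0:
        return "face already collected"
    # Dice left after taking every die that shows the chosen face.
    remaining = TOTAL_DICE - sum(dice_collected) - dice_rolled[die_face]
    if action_type == 0 and remaining == 0:
        return "no dice left to roll"
    return None


rolled = [3, 0, 1, 2, 0, 2]
collected = [0, 0, 0, 0, 0, 0]
print(check_action((0, 0), rolled, collected))  # None
print(check_action((1, 0), rolled, collected))  # face not rolled
```

Filtering sampled actions through such a check is one simple way to build an action mask for training.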
A Failed Attempt occurs when the agent fails to secure a tile. If the agent has a stack of already picked tiles, then the top tile is returned to the table, and a negative reward is applied. If the stack is empty, nothing happens, and the reward is zero. The game continues — the episode does not end.
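The gap between total reward and final score can be illustrated with a small ledger. This is a sketch, not the environment's code: `RewardLedger` is a hypothetical class, the tile grouping follows the standard game, and it assumes that the penalty for a failed attempt equals the worm value of the returned tile.

```python
def tile_worms(tile: int) -> int:
    """Worm value of a tile, assuming the standard grouping of the game."""
    return (tile - 21) // 4 + 1


class RewardLedger:
    """Tracks the agent's tile stack, score, and accumulated reward."""

    def __init__(self) -> None:
        self.stack: list[int] = []
        self.total_reward = 0

    @property
    def score(self) -> int:
        return sum(tile_worms(t) for t in self.stack)

    def pick(self, tile: int) -> None:
        self.stack.append(tile)
        self.total_reward += tile_worms(tile)

    def failed_attempt(self) -> None:
        # Empty stack: nothing happens and the reward is zero.
        if self.stack:
            self.total_reward -= tile_worms(self.stack.pop())  # assumed penalty

    def stolen(self) -> None:
        # A bot takes the top tile; the reward is left untouched.
        self.stack.pop()


ledger = RewardLedger()
ledger.pick(25)          # +2
ledger.pick(30)          # +3
ledger.stolen()          # tile 30 is gone, reward unchanged
ledger.failed_attempt()  # tile 25 returned, -2
print(ledger.total_reward, ledger.score)  # 3 0
```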
These must be specified.
| Parameter | Type | Default | Description |
|---|---|---|---|
| number_of_bots | int | 1 | Number of bot opponents (1–6) you want to play against |
| render_mode | str or None | None | Visualization mode: None (training), "human" (display), or "rgb_array" (recording) |
The bots use the following heuristic, inspired by Frozen Fractal's strategy:
- Take the highest-contributing face. Select the die face where `count × face value` is greatest. Worms count as 5.
- Tie-breaking. When two faces contribute equally, prefer worms over 5s. If still tied, prefer the face with the fewest dice, which keeps more dice available for future rolls. For example, three 4s are preferred over four 3s.
- Worm priority on early rolls. If no dice have been collected yet and this is the third roll or later, take worms if available, regardless of contribution.
- Stop as soon as a tile is reachable. Once the running total meets or exceeds the lowest available tile value, and a worm has been collected, the bot stops.
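The face-selection, tie-breaking, and stopping rules above can be sketched as follows. This is not the package's bot implementation: the function names are hypothetical, the worm-priority rule is deliberately left out, and it is assumed that a face already collected this turn cannot be chosen again.

```python
FACE_VALUES = (1, 2, 3, 4, 5, 5)  # points per face; index 5 is the worm
WORM = 5


def choose_face(rolled, collected):
    """Pick the face index to collect, or None if no face is available."""
    candidates = [f for f in range(6) if rolled[f] > 0 and collected[f] == 0]
    if not candidates:
        return None
    # Highest contribution first, then worm over 5s, then fewest dice.
    return max(
        candidates,
        key=lambda f: (rolled[f] * FACE_VALUES[f], f == WORM, -rolled[f]),
    )


def should_stop(collected, smallest_tile: int) -> bool:
    """Stop once the total reaches the lowest tile and a worm is held."""
    total = sum(c * v for c, v in zip(collected, FACE_VALUES))
    return total >= smallest_tile and collected[WORM] > 0


print(choose_face([0, 0, 4, 3, 0, 0], [0] * 6))  # 3 (three 4s beat four 3s)
print(choose_face([0, 0, 0, 0, 2, 2], [0] * 6))  # 5 (worms beat 5s)
print(should_stop([0, 0, 0, 2, 1, 2], 21))       # True (total 23, worm held)
```

Encoding all three criteria in one sort key keeps the priority order explicit: tuples compare element by element, so a later criterion only matters when the earlier ones are tied.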
- Python 3.10–3.14
We recommend installing in a virtual environment:
```shell
python -m venv .venv

# macOS / Linux
source .venv/bin/activate

# Windows PowerShell
.venv\Scripts\Activate.ps1

# Windows cmd.exe
.venv\Scripts\activate.bat

# Windows Git Bash
source .venv/Scripts/activate

pip install pickomino-env
```

Verify the installation:
```shell
pickomino-play
```

```python
import gymnasium as gym

# render_mode options:
#   None: no rendering, fastest (default, recommended for training)
#   "human": pygame window, requires a display
#   "rgb_array": returns RGB array, useful for recording
env = gym.make("Pickomino-v0", render_mode="human", number_of_bots=2)

# Reset and get initial observation
obs, info = env.reset(seed=42)

# Run one episode
terminated = False
truncated = False
total_reward = 0
while not terminated and not truncated:
    # Agent selects action: (die_face, action_type)
    action = env.action_space.sample()  # Random action for demo
    # Step environment
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if truncated:
        print(f"Invalid action: {info['explanation']}")

print(f"Episode finished. Total reward: {total_reward}")
env.close()
```

Playing a few games by hand is the fastest way to understand the rules and the strategic depth before training an agent. Launch the pygame GUI:
```shell
# One bot (default)
pickomino-play
# Up to six bots
pickomino-play --number-of-bots=3
```

To adjust bot play speed, change RENDER_DELAY in constants.py. A higher value slows bots down. A lower value speeds them up.

```python
RENDER_DELAY: Final[float] = 2
```

Found a bug? Valid reports are rewarded with a physical copy of the Pickomino board game. See SECURITY.md for scope, timelines, and reporting instructions.
Contributions are welcome. The project runs sprints with issues assigned to contributors.
- Browse or open an issue on GitHub Issues
- Create a branch using the format `<issue-number>-<brief-description>`
- Run `pre-commit run --all-files` before pushing
- Open a Pull Request from your branch to the main branch.
See CONTRIBUTING.md for the full workflow, code style requirements, and definition of done.
- Game Rules: Pickomino Rulebook
- Play Online: Maarten Poirot's Pickomino
- Play on Board Game Arena: Pickomino with Elo
- Strategy Discussion: Playing the Odds, One Worm at a Time
- Bot Strategy: How to Win at Pickomino
- Gymnasium Docs: gymnasium.farama.org
Maintained by smallgig.
For questions or ideas, open an issue with the label question.
MIT License. See LICENSE for details.
