A multi-agent deep reinforcement learning experiment using PyTorch, with a modern real-time monitoring and control dashboard built with Next.js - designing and visualizing collaborative and competitive agents in grid-based arena environments like splix.io, paper.io and tileman.io.
| Aggressive | Balanced | Defensive |
| --- | --- | --- |
| ![]() | ![]() | ![]() |
| Prioritize kills | Land capture and low-risk kills | Prioritize land capture, avoid combat |
The main purpose of the dashboard is to visualize the agents as a live stream and to display real-time statistics in graphs and cards, along with a live leaderboard of the best-performing agents/episodes. This section will be completed as development progresses.
- Live Training Dashboard
  - Real-time updates via WebSocket for:
    - Training progress (% complete, ETA, steps, speed)
    - Live agent stats (total reward, epsilon, etc.)
    - Leaderboard highlights (e.g. new top runs)
  - Initial load data via SSR (REST API) for fast first paint
- Stat Cards
  - Dynamic, color-coded stat blocks for metrics like:
    - Avg Reward, Loss, Episode Length, Utilization
  - Smooth animated number transitions with Framer Motion
  - Icons, delta indicators, and responsive layout
- Training Progress
  - Live animated progress bar
  - ETA and step counter
  - Pause/resume and save-checkpoint controls
- Live Player View
  - Toggle between a specific agent or the current best
  - Real-time updates of player reward and status
- Leaderboard
  - SSR-rendered initial leaderboard
  - WebSocket-driven updates for new top performances
  - Metrics: replay ID, kills, area covered, time
- Graphs & Charts
  - Time series of reward/loss/etc. powered by Recharts
  - SSR-backed initial data, live data injected from WebSocket
  - Smooth transitions and tooltips
- State Management
  - Global state via Redux (minimal, efficient)
  - Built-in selectors to avoid unnecessary re-renders
  - WebSocket dispatches updates directly to the store
- UI Framework
  - shadcn/ui for modern, accessible components
  - TailwindCSS for utility-first styling
  - Lucide icons for clean visuals
- React (v19)
- Next.js (App Router)
- Redux
- shadcn/ui + Tailwind CSS for UI
- Framer Motion for animation
- Recharts for graphs
- WebSocket for real-time updates
- Python FastAPI (REST API) + SSR for initial hydration
- Lucide Icons
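For the WebSocket updates above, the backend pushes stat snapshots to every connected dashboard client. Below is a minimal FastAPI sketch of that idea; the endpoint path, helper names, and payload shape (`/ws/stats`, `broadcast`) are illustrative assumptions, not the repo's actual API:

```python
import json

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
clients: set[WebSocket] = set()

@app.websocket("/ws/stats")
async def stats_socket(ws: WebSocket) -> None:
    """Dashboard clients connect here and receive live stat snapshots."""
    await ws.accept()
    clients.add(ws)
    try:
        while True:
            # The dashboard only listens; receiving keeps the connection
            # open and lets us notice disconnects.
            await ws.receive_text()
    except WebSocketDisconnect:
        clients.discard(ws)

async def broadcast(stats: dict) -> None:
    """Push one stats snapshot (reward, loss, ETA, ...) to every client."""
    message = json.dumps(stats)
    for client in set(clients):
        try:
            await client.send_text(message)
        except Exception:
            clients.discard(client)
```

The training loop would call `broadcast` after each logging interval, and the Redux store on the client simply dispatches each incoming message into state.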
This repository has two main components: deep reinforcement learning algorithms and the grid environment.
- Agents control a character on a grid-based map and aim to own as much land as possible by capturing it with their trail.
- An agent can kill other agents (including itself) by colliding with any of their trail segments.
- A player spawns in a 5x5 area of their own land.

Agents are rewarded for their current land area and for each player kill. A simple reward function is ideal in RL, since it leaves more room for emergent behaviour and is less computationally expensive - important when it is evaluated on every step of the game (for every tile moved).
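A minimal sketch of such a per-step reward; the weight names and default values here are hypothetical, not the experiment's tuned values:

```python
def compute_reward(owned_tiles: int, kills_this_step: int,
                   area_weight: float = 0.01, kill_reward: float = 1.0) -> float:
    """Simple shaping: reward current land area plus a bonus per kill."""
    return area_weight * owned_tiles + kill_reward * kills_this_step
```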
A CNN is required for the AI to have spatial understanding - to avoid enemies and maximize the area captured by its path.
Several scalar inputs are also useful for passing continuous values that vary within a range, or discrete values (flags or states we can enumerate). These can be used to force certain behaviours.
The game grid is viewed as an image with 3 color channels.

- The 1st channel represents the state of the grid - empty, player block or enemy block. Up to `n` unique enemies (a configurable parameter) can be represented within the player's FOV at any instance.
- The 2nd channel represents the trail/path taken by the player.
- The 3rd channel represents the trails/paths of the (up to) `n` enemies simultaneously.
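A sketch of how such an observation could be assembled; the array names, ownership-ID conventions, and enemy-remapping scheme are assumptions for illustration:

```python
import numpy as np

def encode_observation(owner: np.ndarray, trail_owner: np.ndarray,
                       player_id: int, enemy_ids: list[int]) -> np.ndarray:
    """Build the 3-channel FOV image described above.

    owner / trail_owner: (H, W) int arrays of tile/trail ownership inside
    the player's FOV, 0 = empty. Enemy IDs are remapped to 1..n so at
    most n unique enemies are distinguished.
    """
    h, w = owner.shape
    obs = np.zeros((3, h, w), dtype=np.float32)
    remap = {pid: i + 1 for i, pid in enumerate(enemy_ids)}
    # Channel 0: grid state (player land marked -1, enemy land 1..n)
    obs[0][owner == player_id] = -1.0
    for pid, idx in remap.items():
        obs[0][owner == pid] = idx
    # Channel 1: the player's own trail
    obs[1] = (trail_owner == player_id).astype(np.float32)
    # Channel 2: enemy trails, using the same enemy indexing
    for pid, idx in remap.items():
        obs[2][trail_owner == pid] = idx
    return obs
```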
Training a CNN requires a much larger training set. In RL this would require gathering more state transition tuples or simply experiencing the game for longer.
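A minimal PyTorch sketch of a network that combines the CNN image input with the scalar inputs; layer sizes, names, and the action count are illustrative, not the experiment's actual architecture:

```python
import torch
import torch.nn as nn

class GridPolicyNet(nn.Module):
    """CNN over the 3-channel FOV image, concatenated with scalar
    inputs before the action head."""
    def __init__(self, n_scalars: int = 4, n_actions: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),  # -> (B, 64 * 4 * 4)
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 4 * 4 + n_scalars, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, image: torch.Tensor, scalars: torch.Tensor) -> torch.Tensor:
        features = self.conv(image)  # image: (B, 3, H, W)
        return self.head(torch.cat([features, scalars], dim=1))
```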
Three primary play styles arise from these rules - the primary AIs:
- Aggressive - optimize for player kills over survival and land capturing.
- Balanced - capture land while capitalizing on opportunities for kills and avoiding unnecessary risk.
- Defensive - optimize for land capture, ignoring player kills and any danger.
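One way these styles could be induced is by weighting the reward terms differently per style; a hypothetical configuration (the names and values below are illustrative, not the tuned settings):

```python
# Hypothetical per-style reward weights; real values would be tuned in training.
PLAY_STYLES = {
    "aggressive": {"area_weight": 0.002, "kill_reward": 2.0, "death_penalty": -0.5},
    "balanced":   {"area_weight": 0.010, "kill_reward": 1.0, "death_penalty": -1.0},
    "defensive":  {"area_weight": 0.020, "kill_reward": 0.0, "death_penalty": -2.0},
}
```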
However, these classes can be expanded on with new gameplay objectives.
- Given the position of a mobile target (player or position), attempt to achieve a goal. The target may be outside of the FOV (see the encoding sketch after this list).
  - Assassin
    - hunts down the target player, optimally pathing towards it while capturing land
    - ignores other enemy players
    - avoids death from excessive risk
    - kills the target with a high-risk, sacrificial playstyle
  - Defender
    - optimally paths towards the target
    - when the target is found, fights nearby enemies while ensuring the target is still protected
- A group of agents which must maintain a formation to ensure an objective or prevent a failure condition.
  - Guardian Formation
    - Defenders surround their target to prevent its death at all costs.
    - Comprises multiple Defender agents, which must coordinate with other defenders and with enemy Guardian formations.
- Capturing all of an enemy's land renders them vulnerable and reduces their score.
  - Consumer
    - specifically captures enemy land, optimally pathing to enemy territory
    - ignores other players
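One plausible way to feed a (possibly off-screen) target to the network is through the scalar inputs described earlier. A hypothetical encoding, with all names and the FOV convention assumed for illustration:

```python
import math

def target_scalars(player_xy: tuple[int, int], target_xy: tuple[int, int],
                   fov: int = 31) -> list[float]:
    """Encode a mobile target as scalars: a unit direction vector
    towards it, plus a flag for whether it is inside the FOV."""
    dx = target_xy[0] - player_xy[0]
    dy = target_xy[1] - player_xy[1]
    dist = math.hypot(dx, dy) or 1.0  # avoid division by zero
    in_fov = 1.0 if max(abs(dx), abs(dy)) <= fov // 2 else 0.0
    return [dx / dist, dy / dist, in_fov]
```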
This simulation can be run to train AIs.
- Docker
- A GPU which can be exposed using `--gpus` with `docker`. Read more here.
Create a `.env` file in the project root using the contents of `.env.example`.
Training telemetry and logging are sent to a Discord webhook; this feature is optional and can be disabled using the `--no-webhook` CLI option.
See this official blog post for how to set up and use a webhook.
- Example `.env` file:

  ```
  WEBHOOK_ID=<discord-webhook-id>
  WEBHOOK_TOKEN=<discord-webhook-token>
  ```
`./run-offline-nvidia.sh` is the entrypoint script and passes all arguments to `main.py`.
```sh
./run-offline-nvidia.sh
./run-offline-nvidia.sh train
./run-offline-nvidia.sh eval
```