QitOS

QitOS is the torch-flavor framework for agent researchers.

Prototype methods, run benchmarks, and inspect long-horizon trajectories on one AgentModule + Engine kernel with built-in qita observability.

QitOS core is the small framework. Product-grade applications and showcase agents live in qitos-zoo, including planned apps such as qitos-coder and qitos-cyber-agent.

Quickstart · Tutorial Track · Benchmarks · CLI Reference · Changelog · Chinese README

Latest Progress

v0.5 multimodal core phase 1 is now in the main kernel: OpenAI-compatible image input, screenshot-first ObservationPack support, qita visual asset inspection, and a new visual_inspect_agent baseline for visual-web / GUI research.
v0.5 computer-use phase 1 is now live: an OSWorld-inspired DesktopEnv, provider-neutral GUI action protocols, ComputerUseToolSet, and minimal desktop smoke/baseline components.
Desktop benchmarking is now split into clear layers: desktop-starter remains the canonical starter benchmark, qitos.recipes.desktop.osworld_starter now hosts the reproducible baseline recipe, and qitos.benchmark.osworld is the new home for real OSWorld-style adapter/runtime/evaluator integration.
QitOS now separates starter benchmarks, real benchmark adapters, and reproducible recipes across the whole benchmark surface: GAIA, Tau-Bench, CyBench, desktop-starter, and osworld all route through qitos.benchmark plus qitos.recipes, with a new contributor guide for third-party benchmark integration.

What's New in v0.3.0

Official reproducible-run foundation with RunSpec, ExperimentSpec, and normalized benchmark outputs.
New qit bench workflow for run, eval, replay, and export.
qita replay, export, and diff surfaces for review-grade trajectory inspection.
Course-style tutorial track plus new reproducibility and failed-run replay guides.

If this direction resonates, please star the repo, open an issue, or contribute. Early feedback matters a lot.

Live Terminal of QitOS for Code Review

Who QitOS is For

Method researchers who want to change prompts, parsers, critics, tools, and memory policies without rewriting the runtime.
Benchmark users who want GAIA, Tau-Bench, and CyBench workflows on the same kernel they use for agent development.
Long-running agent debuggers who care about trajectory review, replay, diff, and context-collapse diagnosis instead of app scaffolding alone.

Run QitOS in 2 Minutes

The minimal agent in QitOS is a minimal coding agent. It configures a real model, works inside a workspace, edits code, runs a verification command, and leaves behind a qita-ready trace.

pip install "qitos[models]"
export OPENAI_API_KEY="sk-..."
qit demo minimal
qita board --logdir runs

Optional but common for OpenAI-compatible providers:

export OPENAI_BASE_URL="https://api.siliconflow.cn/v1/"
export QITOS_MODEL="Qwen/Qwen3-8B"

qit demo minimal seeds a tiny buggy workspace, asks a model-backed coding agent to fix it, verifies the patch, and writes the trajectory to ./runs.

Then go deeper:

Want ReAct? See examples/patterns/react.py
Want a coding agent? See examples/real/coding_agent.py
Want benchmarks? Start with the benchmark guides

Why QitOS

If you want...	QitOS gives you...
reproducible agent research	a stable `AgentModule + Engine` kernel
observability	`qita` board, replay, export, and trace artifacts
benchmark workflows	GAIA, Tau-Bench, and CyBench adapters
less framework glue code	one canonical execution loop

Example Gallery

Core Patterns

ReAct: text protocol + one-action-per-step baseline.
PlanAct: explicit plan first, then execute step by step.
Tree-of-Thought: branch and select before acting.
Reflexion: actor-critic loop with grounded retry behavior.

Real Agents

Coding agent: practical coding loop with editor, shell, and memory.
Research harness agent: research-first prompt/parser/protocol authoring.
Desktop smoke: minimal deterministic desktop environment loop.

Product-grade coding, desktop, EPUB, and security agents are staged for qitos-zoo, not the QitOS core example path.

Evaluation

GAIA: benchmark runner on the QitOS kernel.
Tau-Bench: standardized benchmark adapter path.
CyBench: CTF-like evaluation with guided metrics.

Canonical examples live in:

Tooling Layout

QiTOS separates tool imports into three layers:

qitos.kit: the simplest curated entrypoint for common toolsets
qitos.kit.toolset: scenario-oriented presets and registry builders
qitos.kit.tool.<domain>: advanced atomic capability imports

Default composition is list-first:

from qitos import ToolRegistry
from qitos.kit.tool.file import ReadFile
from qitos.kit.toolset import coding_tools

registry = ToolRegistry().include_toolset(
    [
        ReadFile(workspace_root="."),
        coding_tools(workspace_root="."),
    ]
)

Security-sensitive tools are explicit opt-in imports and are not part of qitos, qitos.kit, qit demo, or the quickstart path.

Documentation Map

Start here: Introduction
First successful run: Quickstart
Install options: Installation
Build your own minimal coding agent: First Agent
Build the first screenshot-first baseline: Multimodal Core and Visual-Web Research
Learn the runtime: AgentModule / Engine
Inspect traces: Observability
Follow the course: Tutorials
Run benchmarks: Benchmarks Overview
Check commands: CLI Reference
Need API details: API Reference

Preview

QitOS CLI	qita Board	qita Trajectory View

Status

QitOS is currently Alpha.

Stable direction: AgentModule + Engine, trace/qita flow, canonical examples, benchmark adapters, and official reproducible-run contracts.
Likely to evolve: higher-level convenience APIs, some kit modules, and experimental toolsets.
If you are evaluating adoption, start from the kernel and examples, not assumptions about frozen surface area.
For ongoing project evolution and upgrade notes, see CHANGELOG.md.

Installation and Versions

Supported Python version: 3.10+
User install: pip install "qitos[models]"
Minimal coding agent: qit demo minimal
Optional provider config: OPENAI_API_KEY, OPENAI_BASE_URL, QITOS_MODEL
Core-only install: pip install qitos
Repo source install: pip install -r requirements.txt
Full contributor install: pip install -r requirements-dev.txt
Installation guide: Installation

Contributing

Contributions are welcome, especially around benchmark adapters, memory/history workflows, qita UX, and framework contracts. Product-grade agents should target qitos-zoo. Start with CONTRIBUTING.md for the PR process, DEVELOPMENT.md for the local workflow, ARCHITECTURE.md for system design, SECURITY.md for disclosure guidance, and CODE_OF_CONDUCT.md for community expectations.

License

MIT. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
.agents/skills/playwright-cli		.agents/skills/playwright-cli
.claude/skills/playwright-cli		.claude/skills/playwright-cli
.github		.github
assets		assets
docs		docs
examples		examples
plans		plans
qitos		qitos
sandbox		sandbox
templates		templates
tests		tests
.env.example		.env.example
.flake8		.flake8
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
ARCHITECTURE.md		ARCHITECTURE.md
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
CORE_BOUNDARY.md		CORE_BOUNDARY.md
DESIGN.md		DESIGN.md
DEVELOPMENT.md		DEVELOPMENT.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
README.zh.md		README.zh.md
SECURITY.md		SECURITY.md
agent.md		agent.md
agent_new.md		agent_new.md
demo.gif		demo.gif
live_test_flow_debug.py		live_test_flow_debug.py
live_test_flow_debug2.py		live_test_flow_debug2.py
live_test_gen_debug.py		live_test_gen_debug.py
live_test_gen_debug3.py		live_test_gen_debug3.py
live_test_generator.py		live_test_generator.py
live_test_generator2.py		live_test_generator2.py
live_test_parse.py		live_test_parse.py
live_test_pentagi.py		live_test_pentagi.py
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt
security_report.md		security_report.md
setup.py		setup.py
test_generator_debug.py		test_generator_debug.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

QitOS

Latest Progress

What's New in v0.3.0

Live Terminal of QitOS for Code Review

Who QitOS is For

Run QitOS in 2 Minutes

Why QitOS

Example Gallery

Core Patterns

Real Agents

Evaluation

Tooling Layout

Documentation Map

Preview

Status

Installation and Versions

Contributing

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

QitOS

Latest Progress

What's New in v0.3.0

Live Terminal of QitOS for Code Review

Who QitOS is For

Run QitOS in 2 Minutes

Why QitOS

Example Gallery

Core Patterns

Real Agents

Evaluation

Tooling Layout

Documentation Map

Preview

Status

Installation and Versions

Contributing

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages