241 changes: 241 additions & 0 deletions articles/20260531_run_ai_engineer_rehearsals_in_daytona.md
@@ -0,0 +1,241 @@
---
title: 'Run AI Engineer Rehearsals in Daytona'
description: 'Use Daytona workspaces to run Omni Engineer and Claude Engineer as a reproducible pair-review loop for dependency upgrades.'
date: 2026-05-31
author: 'Goodgood Claw'
tags: ['daytona', 'ai engineering', 'dev containers']
---

# Run AI Engineer Rehearsals in Daytona

## Introduction

AI coding assistants are most useful when their work happens inside an
environment that can be recreated, inspected, and deleted. A chat transcript can
suggest a fix, but a repository branch, a passing test run, and a written risk
note are the parts a maintainer can actually review. Daytona is a good fit for
that style of work because it turns a repository configuration into a clean
workspace instead of relying on the developer's laptop setup.

This article shows a practical workflow for running
[Omni Engineer](https://github.com/Doriandarko/omni-engineer) and
[Claude Engineer](https://github.com/Doriandarko/claude-engineer) inside Daytona.
The example is a [dependency upgrade rehearsal](../definitions/20260531_definition_dependency_upgrade_rehearsal.md):
a disposable dry run where one assistant maps the change, the other challenges
the plan, and the developer turns the useful parts into a small pull request.

![Daytona AI engineer rehearsal workflow](assets/20260531_run_ai_engineer_rehearsals_in_daytona_img1.svg)

## TL;DR

- Put Omni Engineer and Claude Engineer in separate Daytona workspaces so each
assistant starts from a clean, reproducible environment.
- Store `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, and optional tool keys as
environment variables, not committed files.
- Use Omni Engineer for repository mapping and upgrade planning.
- Use Claude Engineer for second-pass review, regression checklists, and web or
CLI follow-up.
- Treat AI output as advisory. The artifacts to trust are the branch, the tests,
and the pull request notes created inside Daytona.

## Why Run AI Engineers in Daytona?

Local AI-assisted development often drifts into a messy state. One terminal has
an activated virtual environment, another has a stale dependency cache, and a
third contains exported API keys from yesterday's experiment. That is fine for
quick exploration, but it becomes hard to review when the work needs to be
shared.

Daytona gives the assistant a tighter boundary. The workspace starts from the
repository, installs declared dependencies, and exposes only the environment
variables you choose to provide. If the assistant suggests a package bump, you
can test it in a branch and throw the workspace away afterward. If another
reviewer wants to repeat the same run, they can create the same workspace
instead of reconstructing your machine.

This separation matters even more when you use more than one assistant. Omni
Engineer and Claude Engineer have different interfaces and strengths. Omni
Engineer is a lightweight console around OpenRouter models, useful for quick
mapping, search, and file-context conversations. Claude Engineer provides a CLI
and Flask web interface around Anthropic's API, with tool execution and
self-improvement features. In Daytona, you can let them cross-check each other
without mixing credentials or generated files.

## Prepare the Workspaces

Start by adding the credentials to Daytona's environment store. Set only the
keys that match your providers, and leave optional tool keys unset until you
need them.

```bash
daytona env set OPENROUTER_API_KEY=your_openrouter_key
daytona env set ANTHROPIC_API_KEY=your_anthropic_key
# Optional: only needed for Claude Engineer's E2B code execution tool
daytona env set E2B_API_KEY=your_e2b_key
```

Omni Engineer reads `OPENROUTER_API_KEY`. Claude Engineer reads
`ANTHROPIC_API_KEY`, and its E2B-powered code execution tool can use
`E2B_API_KEY` when enabled. Keeping those values in the workspace environment is
safer than creating a `.env` file that might be accidentally committed.
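A quick preflight check can confirm that a workspace sees the keys it expects before an assistant is started. The following is a minimal sketch, not part of either tool: the mapping of tools to key names follows the description above, and the function names are hypothetical. It reports key names only and never prints a value.

```python
import os

# Key names come from the article; the tool-to-key mapping is an assumption
# about how you split the workspaces.
EXPECTED_KEYS = {
    "omni-engineer": ("OPENROUTER_API_KEY",),
    "claude-engineer": ("ANTHROPIC_API_KEY",),
}
OPTIONAL_KEYS = {
    "claude-engineer": ("E2B_API_KEY",),
}


def missing_keys(env, names):
    """Return the names that are absent or blank in env, never the values."""
    return [name for name in names if not env.get(name, "").strip()]


def report(tool, env):
    """Build a short status report for one tool that mentions key names only."""
    lines = []
    for name in missing_keys(env, EXPECTED_KEYS.get(tool, ())):
        lines.append(f"missing required key: {name}")
    for name in missing_keys(env, OPTIONAL_KEYS.get(tool, ())):
        lines.append(f"optional key not set: {name}")
    return lines or ["all expected keys are set"]


if __name__ == "__main__":
    for tool in EXPECTED_KEYS:
        print(tool, "->", "; ".join(report(tool, os.environ)))
```

Running it inside each workspace gives a safe line to paste into a review thread, because the output contains no secret values.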

Next, create one workspace for each project:

```bash
daytona create https://github.com/Doriandarko/omni-engineer
daytona create https://github.com/Doriandarko/claude-engineer
```

The cleanest setup is to keep a `.devcontainer/devcontainer.json` in each
repository. For Omni Engineer, the Dev Container only needs Python, Git, the
existing `requirements.txt`, and the OpenRouter key. For Claude Engineer, the
Dev Container also forwards port `5000` so the Flask web UI is reachable from
the workspace.

The important part is not the exact image name. It is that the install command,
Python version, forwarded ports, and environment variables live in source
control. Daytona can then create the same assistant workspace every time.
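To make that concrete, here is a minimal sketch that builds the two configurations as Python dictionaries and prints one as JSON. The image name, the `pip install` command, and the use of `remoteEnv` with `${localEnv:...}` are assumptions for illustration, not the contents of the companion PRs. Check how your Daytona setup passes environment variables before relying on the `remoteEnv` entry.

```python
import json


def devcontainer_config(name, api_key_name, forward_ports=()):
    """Build a devcontainer.json structure as a plain dict."""
    config = {
        "name": name,
        # Assumed image; any image that provides Python and Git would serve.
        "image": "mcr.microsoft.com/devcontainers/python:3.11",
        # Assumed install command based on the existing requirements.txt.
        "postCreateCommand": "pip install -r requirements.txt",
        # Reference the key by name instead of hardcoding its value.
        "remoteEnv": {api_key_name: "${localEnv:" + api_key_name + "}"},
    }
    if forward_ports:
        config["forwardPorts"] = list(forward_ports)
    return config


omni = devcontainer_config("omni-engineer", "OPENROUTER_API_KEY")
claude = devcontainer_config(
    "claude-engineer", "ANTHROPIC_API_KEY", forward_ports=[5000]
)

# Print the Claude Engineer variant; write it to
# .devcontainer/devcontainer.json once you have reviewed it.
print(json.dumps(claude, indent=2))
```

Generating the file is optional. The point is that both configurations differ only in the key name and the forwarded port, which keeps the two workspaces easy to compare.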

## Split the Assistants by Job

A dependency upgrade rehearsal works best when the assistants have different
responsibilities. Do not ask both tools to make broad changes at the same time.
That produces overlapping edits and makes it harder to see which suggestion was
useful.

Use Omni Engineer first as the mapper. In the target repository, ask it to
summarize:

- the package manager and lockfile in use
- the dependency you want to upgrade
- code paths that import or configure that dependency
- tests that already cover those paths
- likely migration notes from the changelog

The output you want is not code yet. You want a small plan with files, commands,
and risk areas. For example, a useful Omni Engineer result might say: "Upgrade
the HTTP client, inspect middleware initialization, run the API tests, and add a
regression case for timeout handling." That is specific enough to act on and
small enough to review.
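The "code paths that import that dependency" item is easy to verify yourself, which gives you a baseline for judging the assistant's map. The sketch below uses only the standard library and assumes a Python target repository. The package name `requests` in the usage line is just an example, and the function names are hypothetical.

```python
import ast
from pathlib import Path


def imports_package(source, package):
    """Return True if the Python source imports `package` or a submodule."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x".
            names = [node.module]
        else:
            continue
        if any(n == package or n.startswith(package + ".") for n in names):
            return True
    return False


def files_importing(root, package):
    """List .py files under root that import the package, skipping unparsable files."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            if imports_package(path.read_text(encoding="utf-8"), package):
                hits.append(str(path))
        except (SyntaxError, ValueError):
            continue
    return hits


if __name__ == "__main__":
    for hit in files_importing(".", "requests"):
        print(hit)
```

If the assistant's plan names files that this scan does not find, or misses files that it does, that gap is worth raising before any edit is made.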

Then use Claude Engineer as the challenger. Feed it the plan, the lockfile diff,
and the first test result. Ask it what is missing, which assumptions are weak,
and which regression test would prove the behavior. This second pass is where
many AI-assisted changes improve. One assistant proposes the path; the other
tries to find the sharp edges.

## Run a Rehearsal Branch

Create a branch in the project you actually want to upgrade:

```bash
git checkout -b rehearse-http-client-upgrade
```

Apply the smallest dependency change first. For Python, that might be a single
`requirements.txt` edit. For Node.js, it may be a `package.json` and lockfile
update. Avoid combining the upgrade with formatting, folder moves, or unrelated
cleanup. The branch should answer one question: can this dependency move safely?

Run the existing test command before changing source code:

```bash
python -m pytest
```

If the first run fails, paste the failing test name and stack trace into Claude
Engineer. Ask for a diagnosis, not a patch. When the diagnosis points to a real
breaking change, make the code edit yourself or review the assistant's proposed
edit line by line before applying it.

After the source code is fixed, add one regression test for the behavior that
could break again. This is the part maintainers care about. A version bump with
no test evidence asks them to trust the tool. A version bump with a focused
regression test gives them something stable to review.
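As an illustration of what "focused" means here, the sketch below pins a timeout default with two small pytest-style tests. `ClientSettings`, `build_client_settings`, and the `HTTP_TIMEOUT_SECONDS` and `HTTP_MAX_RETRIES` names are hypothetical stand-ins for whatever your project uses to configure its HTTP client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientSettings:
    """Hypothetical settings object standing in for real client setup."""

    timeout_seconds: float
    max_retries: int


def build_client_settings(env):
    """Hypothetical project helper that reads client settings from a mapping."""
    return ClientSettings(
        timeout_seconds=float(env.get("HTTP_TIMEOUT_SECONDS", "10")),
        max_retries=int(env.get("HTTP_MAX_RETRIES", "2")),
    )


def test_default_timeout_is_unchanged():
    # Pins the project's default so a change in client setup shows up as a failure.
    assert build_client_settings({}).timeout_seconds == 10.0


def test_explicit_timeout_is_respected():
    # Guards the override path that callers rely on.
    settings = build_client_settings({"HTTP_TIMEOUT_SECONDS": "2.5"})
    assert settings.timeout_seconds == 2.5
```

Each test covers one behavior and names it in the function name, so a failure after a future upgrade points directly at what changed.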

## Keep Secrets and Context Out of Git

AI engineer tools can make it tempting to paste everything into the prompt:
environment values, private issue comments, service tokens, and entire terminal
histories. Do not do that. A good Daytona workflow gives the assistant enough
context to reason about the code while keeping private state outside the
repository.

Use these rules during the rehearsal:

- Never commit `.env`, generated chat logs, or local prompt history.
- Describe secrets by name, such as `ANTHROPIC_API_KEY`, instead of pasting the
secret value.
- Share failing command output only after removing account IDs, tokens, and
private URLs.
- Keep assistant-generated scratch files out of the final branch unless they are
part of the documented project.
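The rule about cleaning command output can be partly automated. The following sketch masks two common shapes of secret before text is pasted into a prompt. The patterns are assumptions and will not catch every token format, so treat the result as a first pass and still read the output before sharing it.

```python
import re

# Assumed patterns; extend them for the token formats your providers use.
_ASSIGNMENT = re.compile(r"\b([A-Z][A-Z0-9_]*(?:KEY|TOKEN|SECRET))=(\S+)")
_BEARER = re.compile(r"\b(bearer)\s+[A-Za-z0-9._\-]+", re.IGNORECASE)


def redact(text):
    """Mask secret-looking values while keeping variable names readable."""
    text = _ASSIGNMENT.sub(r"\1=<redacted>", text)
    text = _BEARER.sub(r"\1 <redacted>", text)
    return text


if __name__ == "__main__":
    sample = "ANTHROPIC_API_KEY=example-value python main.py"
    print(redact(sample))
```

Keeping the variable name visible matches the rule above: the assistant learns which secret is involved without ever seeing its value.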

Daytona helps by giving the work a disposable home. When the rehearsal is done,
you can keep the branch and delete the workspace. The repository history remains
clean, and the next contributor can reproduce the setup from committed
configuration.

## Write the Pull Request Evidence

The pull request should explain the upgrade in the same structure as the
rehearsal:

- what changed
- why the dependency needed to move
- which files were touched
- which tests were run
- what risk remains

Here is a compact PR note format:

```markdown
## Summary
- upgrade the HTTP client dependency
- adjust timeout handling for the new client behavior
- add a regression test for request cancellation

## Validation
- python -m pytest tests/api/test_timeouts.py
- python -m pytest

## Risk
- low; the change is limited to API client setup and covered by regression tests
```

This is also where the two-assistant workflow pays off. Omni Engineer's mapping
becomes the summary. Claude Engineer's challenge list becomes the risk section.
Your actual test commands become the validation section. The assistant output is
not copied blindly; it is distilled into evidence a maintainer can verify.
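If you run rehearsals often, the note can be assembled from structured inputs so that every PR uses the same three sections. This is a minimal sketch with a hypothetical `render_pr_note` helper. The section names mirror the format above, and the list contents remain yours to write.

```python
def render_pr_note(summary, validation, risk):
    """Render the three-section PR note as Markdown text."""
    sections = (("Summary", summary), ("Validation", validation), ("Risk", risk))
    blocks = []
    for title, items in sections:
        lines = [f"## {title}"] + [f"- {item}" for item in items]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(
        render_pr_note(
            summary=["upgrade the HTTP client dependency"],
            validation=["python -m pytest"],
            risk=["low; covered by regression tests"],
        )
    )
```

Filling the validation list from the commands you actually ran keeps the note honest about what was tested.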

## When to Use This Pattern

This workflow is strongest for changes with a narrow blast radius:

- dependency upgrades
- SDK migrations
- small framework configuration changes
- test coverage for known compatibility issues
- documentation updates backed by runnable examples

It is weaker for large rewrites or ambiguous product decisions. In those cases,
the assistant can still help with exploration, but the branch may not be small
enough for a clean rehearsal. Daytona keeps the environment reproducible, but it
does not replace engineering judgment about scope.

## Conclusion

Running Omni Engineer and Claude Engineer inside Daytona turns AI-assisted
coding into a reviewable loop. The workspace gives the tools a clean boundary.
The branch gives maintainers a concrete artifact. The tests decide whether the
change works.

That balance is the main benefit. You can move quickly with AI support while
still producing the kind of evidence that belongs in a serious pull request:
minimal diffs, repeatable setup, clear validation, and no leaked secrets.

## References

- [Omni Engineer on GitHub](https://github.com/Doriandarko/omni-engineer)
- [Claude Engineer on GitHub](https://github.com/Doriandarko/claude-engineer)
- [Daytona environment variables article](20241126_Using_Environmental_Variables_in_Daytona.md)
- [Dev Containers specification](https://containers.dev/)
- [Companion Omni Engineer Dev Container PR](https://github.com/Doriandarko/omni-engineer/pull/43)
- [Companion Claude Engineer Dev Container PR](https://github.com/Doriandarko/claude-engineer/pull/267)
23 changes: 23 additions & 0 deletions authors/goodgood_claw.md
@@ -0,0 +1,23 @@
Author: Goodgood Claw

Title: Independent Open Source Contributor

Description: Goodgood Claw is an independent open-source contributor focused on
developer tooling, reproducible environments, and practical AI-assisted
engineering workflows. They write about using automation without losing the
review habits that keep software changes trustworthy.

Author Image: [GitHub avatar](https://github.com/goodgoodclaw.png?size=512)

Author LinkedIn:

Author Twitter:

Company Name: Independent Contributor

Company Description: Independent software contributor focused on practical
developer workflow writing and tooling.

Company Logo Dark: N/A

Company Logo White: N/A
32 changes: 32 additions & 0 deletions definitions/20260531_definition_dependency_upgrade_rehearsal.md
@@ -0,0 +1,32 @@
---
title: 'Dependency Upgrade Rehearsal'
description: 'A disposable, repeatable dry run used to plan, test, and document a dependency upgrade before it reaches the main branch.'
date: 2026-05-31
author: 'Goodgood Claw'
---

# Dependency Upgrade Rehearsal

## Definition

A dependency upgrade rehearsal is a controlled dry run for changing one or more
software dependencies before the change is proposed for the main branch. The
developer uses a disposable environment, a temporary branch, and a clear test
checklist to discover breaking changes, migration steps, security notes, and
release risks early.

## Context and Usage

In a Daytona workspace, a dependency upgrade rehearsal is useful because the
environment can be recreated from repository configuration instead of the
developer's laptop state. A team can run the same package manager commands,
execute the same tests, and compare results without manually rebuilding local
toolchains.

The rehearsal usually starts by reading the package changelog and lockfile diff,
then applying the smallest possible upgrade. The developer records failing
tests, fixes code paths affected by the new dependency, and adds regression
coverage for behavior that changed. When AI assistants are used, their output
should be treated as planning and review input. The durable artifacts remain the
branch, test results, and pull request notes produced inside the reproducible
workspace.