diff --git a/articles/20260531_run_ai_engineer_rehearsals_in_daytona.md b/articles/20260531_run_ai_engineer_rehearsals_in_daytona.md
new file mode 100644
index 00000000..53de6c38
--- /dev/null
+++ b/articles/20260531_run_ai_engineer_rehearsals_in_daytona.md
@@ -0,0 +1,241 @@
+---
+title: 'Run AI Engineer Rehearsals in Daytona'
+description: 'Use Daytona workspaces to run Omni Engineer and Claude Engineer as a reproducible pair-review loop for dependency upgrades.'
+date: 2026-05-31
+author: 'Goodgood Claw'
+tags: ['daytona', 'ai engineering', 'dev containers']
+---
+
+# Run AI Engineer Rehearsals in Daytona
+
+## Introduction
+
+AI coding assistants are most useful when their work happens inside an
+environment that can be recreated, inspected, and deleted. A chat transcript can
+suggest a fix, but a repository branch, a passing test run, and a written risk
+note are the parts a maintainer can actually review. Daytona is a good fit for
+that style of work because it turns a repository configuration into a clean
+workspace instead of relying on the developer's laptop setup.
+
+This article shows a practical workflow for running
+[Omni Engineer](https://github.com/Doriandarko/omni-engineer) and
+[Claude Engineer](https://github.com/Doriandarko/claude-engineer) inside Daytona.
+The example is a [dependency upgrade rehearsal](../definitions/20260531_definition_dependency_upgrade_rehearsal.md):
+a disposable dry run where one assistant maps the change, the other challenges
+the plan, and the developer turns the useful parts into a small pull request.
+
+![Daytona AI engineer rehearsal workflow](assets/20260531_run_ai_engineer_rehearsals_in_daytona_img1.svg)
+
+## TL;DR
+
+- Put Omni Engineer and Claude Engineer in separate Daytona workspaces so each
+  assistant starts from a clean, reproducible environment.
+- Store `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, and optional tool keys as
+  environment variables, not committed files.
+- Use Omni Engineer for repository mapping and upgrade planning.
+- Use Claude Engineer for second-pass review, regression checklists, and web or
+  CLI follow-up.
+- Treat AI output as advisory. The artifact to trust is the branch, tests, and
+  pull request notes created inside Daytona.
+
+## Why Run AI Engineers in Daytona?
+
+Local AI-assisted development often drifts into a messy state. One terminal has
+an activated virtual environment, another has a stale dependency cache, and a
+third contains exported API keys from yesterday's experiment. That is fine for
+quick exploration, but it becomes hard to review when the work needs to be
+shared.
+
+Daytona gives the assistant a tighter boundary. The workspace starts from the
+repository, installs declared dependencies, and exposes only the environment
+variables you choose to provide. If the assistant suggests a package bump, you
+can test it in a branch and throw the workspace away afterward. If another
+reviewer wants to repeat the same run, they can create the same workspace
+instead of reconstructing your machine.
+
+This separation matters even more when you use more than one assistant. Omni
+Engineer and Claude Engineer have different interfaces and strengths. Omni
+Engineer is a lightweight console around OpenRouter models, useful for quick
+mapping, search, and file-context conversations. Claude Engineer wraps
+Anthropic's API in a CLI and a Flask web interface, with tool execution and
+self-improvement features. In Daytona, you can let them cross-check each other
+without mixing credentials or generated files.
+
+## Prepare the Workspaces
+
+Start by adding the credentials to Daytona's environment store. Use the keys
+that match your providers; optional tool keys such as `E2B_API_KEY` can wait
+until you enable that tool.
+
+```bash
+daytona env set OPENROUTER_API_KEY=your_openrouter_key
+daytona env set ANTHROPIC_API_KEY=your_anthropic_key
+daytona env set E2B_API_KEY=your_e2b_key
+```
+
+Omni Engineer reads `OPENROUTER_API_KEY`. Claude Engineer reads
+`ANTHROPIC_API_KEY`, and its E2B-powered code execution tool can use
+`E2B_API_KEY` when enabled. Keeping those values in the workspace environment is
+safer than creating a `.env` file that might be accidentally committed.
+
+Next, create one workspace for each project:
+
+```bash
+daytona create https://github.com/Doriandarko/omni-engineer
+daytona create https://github.com/Doriandarko/claude-engineer
+```
+
+The cleanest setup is to keep a `.devcontainer/devcontainer.json` in each
+repository. For Omni Engineer, the Dev Container only needs Python, Git, the
+existing `requirements.txt`, and the OpenRouter key. For Claude Engineer, it
+also forwards port `5000` so the Flask web UI can open from the workspace.
+
+The important part is not the exact image name. It is that the install command,
+Python version, forwarded ports, and environment variables live in source
+control. Daytona can then create the same assistant workspace every time.
+
+## Split the Assistants by Job
+
+A dependency upgrade rehearsal works best when the assistants have different
+responsibilities. Do not ask both tools to make broad changes at the same time.
+That produces overlapping edits and makes it harder to see which suggestion was
+useful.
+
+Use Omni Engineer first as the mapper. In the target repository, ask it to
+summarize:
+
+- the package manager and lockfile in use
+- the dependency you want to upgrade
+- code paths that import or configure that dependency
+- tests that already cover those paths
+- likely migration notes from the changelog
+
+The output you want is not code yet. You want a small plan with files, commands,
+and risk areas.
+For example, a useful Omni Engineer result might say: "Upgrade the HTTP client,
+inspect middleware initialization, run the API tests, and add a regression case
+for timeout handling." That is specific enough to act on and small enough to
+review.
+
+Then use Claude Engineer as the challenger. Feed it the plan, the lockfile diff,
+and the first test result. Ask it what is missing, which assumptions are weak,
+and which regression test would prove the behavior. This second pass is often
+where the rehearsal catches what the first plan missed: one assistant proposes
+the path, and the other tries to find the sharp edges.
+
+## Run a Rehearsal Branch
+
+Create a branch in the project you actually want to upgrade:
+
+```bash
+git checkout -b rehearse-http-client-upgrade
+```
+
+Apply the smallest dependency change first. For Python, that might be a single
+`requirements.txt` edit. For Node.js, it may be a `package.json` and lockfile
+update. Avoid combining the upgrade with formatting, folder moves, or unrelated
+cleanup. The branch should answer one question: can this dependency move safely?
+
+Run the existing test command before changing source code:
+
+```bash
+python -m pytest
+```
+
+If the first run fails, paste the failing test name and stack trace into Claude
+Engineer. Ask for a diagnosis, not a patch. When the diagnosis points to a real
+breaking change, make the code edit yourself or review the assistant's proposed
+edit line by line before applying it.
+
+After the source code is fixed, add one regression test for the behavior that
+could break again. This is the part maintainers care about. A version bump with
+no test evidence asks them to trust the tool. A version bump with a focused
+regression test gives them something stable to review.
+
+## Keep Secrets and Context Out of Git
+
+AI engineer tools can make it tempting to paste everything into the prompt:
+environment values, private issue comments, service tokens, and entire terminal
+histories. Do not do that.
+A good Daytona workflow gives the assistant enough context to reason about the
+code while keeping private state outside the repository.
+
+Use these rules during the rehearsal:
+
+- Never commit `.env`, generated chat logs, or local prompt history.
+- Describe secrets by name, such as `ANTHROPIC_API_KEY`, instead of pasting the
+  secret value.
+- Share failing command output only after removing account IDs, tokens, and
+  private URLs.
+- Keep assistant-generated scratch files out of the final branch unless they are
+  part of the documented project.
+
+Daytona helps by giving the work a disposable home. When the rehearsal is done,
+you can push the branch and delete the workspace. The repository history remains
+clean, and the next contributor can reproduce the setup from committed
+configuration.
+
+## Write the Pull Request Evidence
+
+The pull request should explain the upgrade in the same structure as the
+rehearsal:
+
+- what changed
+- why the dependency needed to move
+- which files were touched
+- which tests were run
+- what risk remains
+
+Here is a compact PR note format:
+
+```markdown
+## Summary
+- upgrade the HTTP client dependency
+- adjust timeout handling for the new client behavior
+- add a regression test for request cancellation
+
+## Validation
+- python -m pytest tests/api/test_timeouts.py
+- python -m pytest
+
+## Risk
+- low; the change is limited to API client setup and covered by regression tests
+```
+
+This is also where the two-assistant workflow pays off. Omni Engineer's mapping
+becomes the summary. Claude Engineer's challenge list becomes the risk section.
+Your actual test commands become the validation section. The assistant output is
+not copied blindly; it is distilled into evidence a maintainer can verify.
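One way to keep the Validation section tied to commands that actually ran is to generate it from the runs themselves. The Python sketch below is illustrative only: `CommandResult`, `redact`, `run_validation`, and `render_pr_note` are hypothetical helpers, not features of Daytona, Omni Engineer, or Claude Engineer. It assumes the secrets you care about are environment variables you list by name, and it masks only their exact values, so account IDs and private URLs still need the manual check described in the secrets rules.

```python
# Hypothetical helpers for a dependency upgrade rehearsal; not part of any
# Daytona, Omni Engineer, or Claude Engineer API.
from __future__ import annotations

import os
import shlex
import subprocess
from dataclasses import dataclass


@dataclass
class CommandResult:
    command: list[str]
    returncode: int
    output: str  # combined stdout + stderr, already redacted


def redact(text: str, secret_names: list[str]) -> str:
    """Replace the values of the named environment variables with placeholders."""
    for name in secret_names:
        value = os.environ.get(name)
        if value:  # skip unset or empty values; replacing "" would corrupt the text
            text = text.replace(value, f"<{name}>")
    return text


def run_validation(commands: list[list[str]], secret_names: list[str]) -> list[CommandResult]:
    """Run each validation command and keep its exit code and redacted output."""
    results = []
    for command in commands:
        proc = subprocess.run(command, capture_output=True, text=True)
        output = redact(proc.stdout + proc.stderr, secret_names)
        results.append(CommandResult(command, proc.returncode, output))
    return results


def render_pr_note(summary: list[str], results: list[CommandResult], risk: str) -> str:
    """Render the Summary / Validation / Risk note from the rehearsal records."""
    lines = ["## Summary", *(f"- {item}" for item in summary), "", "## Validation"]
    for result in results:
        status = "passed" if result.returncode == 0 else f"failed (exit {result.returncode})"
        lines.append(f"- {shlex.join(result.command)}: {status}")
    lines += ["", "## Risk", f"- {risk}"]
    return "\n".join(lines) + "\n"
```

Inside a workspace you might call `run_validation([["python", "-m", "pytest"]], ["ANTHROPIC_API_KEY"])` and pass the results to `render_pr_note`. The generated text is a draft to edit, not a substitute for reading the command output yourself.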
+
+## When to Use This Pattern
+
+This workflow is strongest for changes with a narrow blast radius:
+
+- dependency upgrades
+- SDK migrations
+- small framework configuration changes
+- test coverage for known compatibility issues
+- documentation updates backed by runnable examples
+
+It is weaker for large rewrites or ambiguous product decisions. In those cases,
+the assistant can still help with exploration, but the branch may not be small
+enough for a clean rehearsal. Daytona keeps the environment reproducible, but it
+does not replace engineering judgment about scope.
+
+## Conclusion
+
+Running Omni Engineer and Claude Engineer inside Daytona turns AI-assisted
+coding into a reviewable loop. The workspace gives the tools a clean boundary.
+The branch gives maintainers a concrete artifact. The tests decide whether the
+change works.
+
+That balance is the main benefit. You can move quickly with AI support while
+still producing the kind of evidence that belongs in a serious pull request:
+minimal diffs, repeatable setup, clear validation, and no leaked secrets.
+
+## References
+
+- [Omni Engineer on GitHub](https://github.com/Doriandarko/omni-engineer)
+- [Claude Engineer on GitHub](https://github.com/Doriandarko/claude-engineer)
+- [Daytona environment variables article](20241126_Using_Environmental_Variables_in_Daytona.md)
+- [Dev Containers specification](https://containers.dev/)
+- [Companion Omni Engineer Dev Container PR](https://github.com/Doriandarko/omni-engineer/pull/43)
+- [Companion Claude Engineer Dev Container PR](https://github.com/Doriandarko/claude-engineer/pull/267)
diff --git a/articles/assets/20260531_run_ai_engineer_rehearsals_in_daytona_img1.svg b/articles/assets/20260531_run_ai_engineer_rehearsals_in_daytona_img1.svg
new file mode 100644
index 00000000..e350fd17
--- /dev/null
+++ b/articles/assets/20260531_run_ai_engineer_rehearsals_in_daytona_img1.svg
@@ -0,0 +1,63 @@
[SVG markup omitted. Figure "AI engineer upgrade rehearsal in Daytona" (title: "Daytona AI engineer rehearsal workflow"); panels: Daytona workspace (Dev Container, env vars); Omni Engineer (map repository, draft upgrade plan); Claude Engineer (challenge the plan, build test checklist); Patch branch (dependency bump, source edits, regression tests); PR (notes, tests, risks). Footer: "Review loop: AI output is advisory. The trusted artifact is the branch, tests, PR notes, and reproducible Daytona workspace."]
diff --git a/authors/goodgood_claw.md b/authors/goodgood_claw.md
new file mode 100644
index 00000000..86bc3285
--- /dev/null
+++ b/authors/goodgood_claw.md
@@ -0,0 +1,23 @@
+Author: Goodgood Claw
+
+Title: Independent Open Source Contributor
+
+Description: Goodgood Claw is an independent open-source contributor focused on
+developer tooling, reproducible environments, and practical AI-assisted
+engineering workflows. They write about using automation without losing the
+review habits that keep software changes trustworthy.
+
+Author Image: [GitHub avatar](https://github.com/goodgoodclaw.png?size=512)
+
+Author LinkedIn:
+
+Author Twitter:
+
+Company Name: Independent Contributor
+
+Company Description: Independent software contributor focused on practical
+developer workflow writing and tooling.
+
+Company Logo Dark: N/A
+
+Company Logo White: N/A
diff --git a/definitions/20260531_definition_dependency_upgrade_rehearsal.md b/definitions/20260531_definition_dependency_upgrade_rehearsal.md
new file mode 100644
index 00000000..1e7c3603
--- /dev/null
+++ b/definitions/20260531_definition_dependency_upgrade_rehearsal.md
@@ -0,0 +1,32 @@
+---
+title: 'Dependency Upgrade Rehearsal'
+description: 'A disposable, repeatable dry run used to plan, test, and document a dependency upgrade before it reaches the main branch.'
+date: 2026-05-31
+author: 'Goodgood Claw'
+---
+
+# Dependency Upgrade Rehearsal
+
+## Definition
+
+A dependency upgrade rehearsal is a controlled dry run for changing one or more
+software dependencies before the change is proposed for the main branch. The
+developer uses a disposable environment, a temporary branch, and a clear test
+checklist to discover breaking changes, migration steps, security notes, and
+release risks early.
+
+## Context and Usage
+
+In a Daytona workspace, a dependency upgrade rehearsal is useful because the
+environment can be recreated from repository configuration instead of the
+developer's laptop state.
+A team can run the same package manager commands, execute the same tests, and
+compare results without manually rebuilding local toolchains.
+
+The rehearsal usually starts by reading the package changelog and lockfile diff,
+then applying the smallest possible upgrade. The developer records failing
+tests, fixes code paths affected by the new dependency, and adds regression
+coverage for behavior that changed. When AI assistants are used, their output
+should be treated as planning and review input. The durable artifact remains the
+branch, test results, and pull request notes produced inside the reproducible
+workspace.
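The first step in that sequence, reading the lockfile diff to confirm the upgrade is as small as intended, can be sketched in Python for a pip-style `requirements.txt` with exact `name==version` pins. This is an assumption-laden illustration, not a full requirements parser: `parse_pins` and `diff_pins` are hypothetical names, and the sketch skips comments, `-r` includes, range specifiers such as `>=`, and `--hash` lines.

```python
from __future__ import annotations

import re

# Matches "name==version" with optional extras, e.g. "requests[socks]==2.32.3".
# Anything after the version (spaces, ";" markers, "#" comments, "\") is ignored.
PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*==\s*([^\s;#\\]+)")


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 does (case-insensitive, -/_/. folded)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text: str) -> dict[str, str]:
    """Return {normalized_name: version} for every exact pin in the text."""
    pins = {}
    for line in text.splitlines():
        match = PIN.match(line)
        if match:
            pins[normalize(match.group(1))] = match.group(2)
    return pins


def diff_pins(old_text: str, new_text: str):
    """Compare two pin sets and report changed, added, and removed packages."""
    old, new = parse_pins(old_text), parse_pins(new_text)
    changed = {n: (old[n], new[n]) for n in sorted(old.keys() & new.keys()) if old[n] != new[n]}
    added = {n: new[n] for n in sorted(new.keys() - old.keys())}
    removed = {n: old[n] for n in sorted(old.keys() - new.keys())}
    return changed, added, removed
```

A result with one entry in `changed` and nothing in `added` or `removed` is the narrow case the definition describes. In a fully pinned file, a single bump can also move transitive pins; listing those in the pull request's risk notes is more honest than leaving them out.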