feat: project AGENTS.md instructions and DuckDB federation console#66

Open
tobias-gp wants to merge 3 commits into main from add-project-agents-md-instructions

@tobias-gp tobias-gp commented Jun 4, 2026

Summary

Project-root AGENTS.md (semantic model agent)

  • Load an optional, user-authored AGENTS.md at the project root into the semantic-model authoring agent via the Deep Agents memory feature (no custom file reading; a missing file is tolerated). Adds base system-prompt guidance instructing the agent to follow those project-specific instructions.
  • Repurpose the AGENTS.md slot: stop auto-generating the model-summary AGENTS.md (which nothing in the app ever consumed) and remove regenerateAgentsMd + its write()/delete() call sites.
  • On startup, remove stale auto-generated AGENTS.md files (identified by their # Semantic Models header), preserving user-authored files. Idempotent.
  • Docs + conventions updated (apps/docs semantic-models guide, openspec/project.md); OpenSpec change archived with applied spec deltas (semantic-model-agent +1, semantic-models +1/-1).

DuckDB federation console

  • Add Data Federation → Console (/$projectId/connections/console) for ad-hoc federated SQL against the project's DuckDB instance (raw connection slugs; not MCP scoped VIEWs).
  • Setup commands panel: copyable pre-installed INSTALL/LOAD pairs, redacted per-connection ATTACH examples, and an example federation query.
  • Install / Load control for validated single-statement INSTALL <name> [FROM community] or LOAD <name>.
  • API: GET/POST /api/projects/:projectId/duckdb-console/{setup,query,extensions}; core service with console-specific SQL validation and credential redaction in errors.
  • OpenSpec proposal: openspec/changes/add-duckdb-federation-console/.
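
For illustration, the validated single-statement `INSTALL <name> [FROM community]` / `LOAD <name>` control could look roughly like the sketch below. This is not the PR's code: `parseExtensionStatement`, the `ExtensionStatement` type, and the identifier pattern are hypothetical, and it assumes extension names are plain identifiers.

```typescript
// Hypothetical sketch: accept only `INSTALL <name> [FROM community]` or `LOAD <name>`.
type ExtensionStatement =
  | { kind: "install"; name: string; fromCommunity: boolean }
  | { kind: "load"; name: string };

// Assumption: extension names are simple identifiers (letters, digits, underscore).
// Anything after the statement other than one optional trailing semicolon fails the match.
const EXTENSION_STATEMENT =
  /^(INSTALL|LOAD)\s+([A-Za-z_][A-Za-z0-9_]*)(\s+FROM\s+community)?\s*;?\s*$/i;

function parseExtensionStatement(sql: string): ExtensionStatement {
  const match = EXTENSION_STATEMENT.exec(sql.trim());
  if (match === null) {
    throw new Error("Expected INSTALL <name> [FROM community] or LOAD <name>");
  }
  const [, keyword = "", name = "", fromClause] = match;
  const fromCommunity = fromClause !== undefined;
  if (keyword.toUpperCase() === "LOAD") {
    if (fromCommunity) {
      throw new Error("LOAD does not accept a FROM clause");
    }
    return { kind: "load", name };
  }
  return { kind: "install", name, fromCommunity };
}
```

Matching the whole string against one anchored pattern means a second statement smuggled in after a semicolon is rejected without a separate multi-statement check.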

Test plan

  • pnpm typecheck (incl. @archmax/api build) — exits 0
  • pnpm lint — exits 0
  • npx vitest run — passes; includes duckdb-console unit + API integration tests
  • openspec validate add-duckdb-federation-console --strict
  • Manual (AGENTS.md): drop an AGENTS.md in a project root and confirm the agent follows it; confirm a brand-new project (no file) starts without error
  • Manual (console): open Data Federation → Console, run SELECT 1, copy a setup command, install a community extension

Notes

  • AGENTS.md edge case (documented in design.md): a user-authored file beginning with # Semantic Models would be removed by startup cleanup.
  • Console extensions apply to the API process in-memory DuckDB instance; the worker has its own cache until rebuilt (documented in the data-federation guide).

Note

Medium Risk
Console runs arbitrary read-oriented SQL against raw federated catalogs (bypassing MCP scoped VIEWs) and can install DuckDB extensions on the API process instance; AGENTS.md content is injected into the agent system prompt.

Overview
This PR adds two operator-facing capabilities and the supporting specs/docs.

Optional project-root AGENTS.md for the semantic-model builder: The authoring agent now loads AGENTS.md via Deep Agents memory: ["AGENTS.md"], with base prompt guidance to treat it as authoritative project instructions. Auto-generation of the unused model-summary AGENTS.md on every model write/delete is removed. API startup runs idempotent cleanup that deletes only legacy files whose content starts with # Semantic Models, preserving user-authored files.

DuckDB Federation Console: New Data Federation → Console route and authenticated API (GET /setup, POST /query, POST /extensions) backed by @archmax/core services with read-only query allowlists, validated INSTALL/LOAD, timeouts, and redacted errors/attach examples. The UI is a single SQL textarea: Run sends the statement to the query or extensions endpoint based on its leading keyword, shows tabular results, and is disabled when the project has no active connections (the /setup response drives that gate; the page has no copyable setup panel). Docs and OpenSpec deltas cover console behavior, sidebar nav, and the AGENTS.md convention.
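
The leading-keyword routing mentioned above can be pictured with a small sketch. The helper name `routeConsoleStatement` and the `ConsoleEndpoint` type are hypothetical; only the `query` and `extensions` endpoint names come from the PR description.

```typescript
// Hypothetical helper: pick the console endpoint from the statement's first keyword.
type ConsoleEndpoint = "query" | "extensions";

function routeConsoleStatement(sql: string): ConsoleEndpoint {
  // The first whitespace-delimited word decides the route; empty input falls through to "query".
  const firstWord = sql.trim().split(/\s+/, 1)[0] ?? "";
  const keyword = firstWord.toUpperCase();
  return keyword === "INSTALL" || keyword === "LOAD" ? "extensions" : "query";
}
```

Routing only chooses the endpoint; each endpoint would still have to validate the statement on the server side.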

Reviewed by Cursor Bugbot for commit 6060afc. Bugbot is set up for automated code reviews on this repo. Configure here.

Let project owners steer the semantic-model authoring agent with an optional,
user-authored AGENTS.md at the project root, loaded via the Deep Agents
`memory` feature (no custom file reading; missing file is tolerated). Add base
system-prompt guidance telling the agent to follow those instructions.

Repurpose the AGENTS.md slot: stop auto-generating the model-summary AGENTS.md
(which nothing consumed) and remove stale auto-generated files on startup
(identified by their `# Semantic Models` header), preserving user-authored
files. Update docs and project conventions accordingly.

Co-authored-by: Cursor <cursoragent@cursor.com>
@railway-app railway-app Bot temporarily deployed to archmax SemLayer / archmax-pr-66 June 4, 2026 11:58 Destroyed
railway-app Bot commented Jun 4, 2026

🚅 Deployed to the archmax-pr-66 environment in archmax SemLayer

| Service | Status | Web | Updated (UTC) |
| --- | --- | --- | --- |
| archmax_standalone_with_volume | ✅ Success (View Logs) | | Jun 4, 2026 at 12:23 pm |
| archmax_standalone | ✅ Success (View Logs) | | Jun 4, 2026 at 12:22 pm |
| archmax_external_dbs | ✅ Success (View Logs) | | Jun 4, 2026 at 12:22 pm |

Comment thread apps/api/src/index.ts
const removedLegacyAgentsMd = await new SemanticModelFileService(getEnv().projectsDir).cleanupLegacyAgentsMd();
if (removedLegacyAgentsMd > 0) {
  console.log(`[startup] Removed ${removedLegacyAgentsMd} legacy auto-generated AGENTS.md file(s)`);
}

Worker loads legacy before cleanup

Medium Severity

Legacy AGENTS.md cleanup runs only during API startup, while the BullMQ worker starts in parallel and creates the authoring agent with memory: ["AGENTS.md"] without running cleanup. Jobs can load the stale auto-generated summary as project instructions until API cleanup finishes, and worker-only runs never remove legacy files.

Reviewed by Cursor Bugbot for commit 81832b0. Configure here.
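
One possible way to address the ordering problem described in this thread is to have every process that builds the authoring agent await the same cleanup first. The sketch below only shows a run-once wrapper; `once` and the wiring in the trailing comment are hypothetical and not taken from the PR.

```typescript
// Hypothetical helper: run an async task at most once per process and
// hand every caller the same promise.
function once<T>(task: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = task();
    }
    return pending;
  };
}

// Assumed wiring (names are illustrative only):
//   const ensureLegacyCleanup = once(() => fileService.cleanupLegacyAgentsMd());
// API startup and the worker's job handler would both `await ensureLegacyCleanup()`
// before creating an agent with memory: ["AGENTS.md"].
```

Because the cleanup is described as idempotent, running it once in the API process and once in the worker process would be harmless; the wrapper only avoids repeating it for every job.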

lines.push("", "**Metrics:**", ...model.metrics.map((m) => `- ${m.name}`));
if (content.startsWith(LEGACY_AGENTS_MD_SIGNATURE)) {
  await unlink(agentsPath).catch(() => {});
  removed++;

Counts removal when unlink fails

Low Severity

cleanupLegacyAgentsMd increments its removed count whenever the legacy header matches, even if unlink fails (errors are swallowed). Startup can log that legacy files were removed while the stale AGENTS.md remains and may still be loaded into the agent.

Reviewed by Cursor Bugbot for commit 81832b0. Configure here.
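
A minimal sketch of the counting fix suggested above: increment only after `unlink` resolves, and report failures instead of swallowing them. The `FileOps` interface and the explicit `paths` parameter are assumptions made to keep the example self-contained (the real service presumably uses `node:fs/promises` and discovers project directories itself), and the constant's value is inferred from the `# Semantic Models` header mentioned in the PR description.

```typescript
// Hypothetical file-system surface; stands in for node:fs/promises in this sketch.
interface FileOps {
  readFile(path: string): Promise<string>;
  unlink(path: string): Promise<void>;
}

// Assumed value, based on the "# Semantic Models" header named in the PR description.
const LEGACY_AGENTS_MD_SIGNATURE = "# Semantic Models";

async function cleanupLegacyAgentsMd(paths: string[], fs: FileOps): Promise<number> {
  let removed = 0;
  for (const agentsPath of paths) {
    let content: string;
    try {
      content = await fs.readFile(agentsPath);
    } catch {
      continue; // missing or unreadable file: nothing to clean up
    }
    if (!content.startsWith(LEGACY_AGENTS_MD_SIGNATURE)) {
      continue; // user-authored file: leave it alone
    }
    try {
      await fs.unlink(agentsPath);
      removed++; // count only after the delete actually succeeded
    } catch (error) {
      console.warn(`[startup] Could not remove legacy ${agentsPath}:`, error);
    }
  }
  return removed;
}
```

With this shape, the startup log line reports only files that are really gone, and a failed delete leaves a visible warning.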

@cursor cursor Bot left a comment

Security review completed for the changed files in this PR. I did not find any concrete issues against the requested threat surfaces.

Checked:

  • MCP endpoint auth: the PR does not change apps/api/src/mcp/archmax-route.ts; existing flow authenticates bearer tokens and binds sessions to token/project before registerArchmaxTools() runs.
  • Query execution sandboxing: the PR does not change execute_query; existing executeScopedQuery() still checks token model scope, validates SQL via validateSqlAst(..., { mode: "mcp" }), materializes model views, opens DuckDB read-only, enforces timeout, and caps results at 1000 rows.
  • Admin auth / Better Auth: no Better Auth config changes; existing env validation requires BETTER_AUTH_SECRET min length 32 and production cookies are Secure, HttpOnly, SameSite=Lax.
  • API input validation: no Hono request handlers or schemas were changed; the new startup cleanup has no request input.
  • Environment secrets: no .env.local or hardcoded real secrets added; the only new key-like value is test-key in a unit-test mock.
  • Dependency exposure: no package.json or pnpm-lock.yaml changes, so no new dependency exposure to audit.

No inline findings.

Sent by Cursor Automation: archmax Security Review

Add an admin console to run read-oriented federated SQL, install/load
extensions, and copy setup commands (INSTALL/LOAD/ATTACH). Includes API
routes, core service, frontend page, docs, and OpenSpec proposal.

Co-authored-by: Cursor <cursoragent@cursor.com>
@railway-app railway-app Bot temporarily deployed to archmax SemLayer / archmax-pr-66 June 4, 2026 12:02 Destroyed
@tobias-gp tobias-gp changed the title feat(agent): optional project-root AGENTS.md as agent instructions feat: project AGENTS.md instructions and DuckDB federation console Jun 4, 2026
  if (!CONSOLE_ALLOWED_KEYWORDS.has(keyword)) {
    throw new Error(`Statement type ${keyword} is not allowed; use SELECT, WITH, SHOW, DESCRIBE, or EXPLAIN`);
  }
}

EXPLAIN ANALYZE bypasses denylist

Medium Severity

Federation console query validation only inspects the first SQL keyword, so statements starting with EXPLAIN pass even when EXPLAIN ANALYZE wraps INSERT, UPDATE, DELETE, or other denied types. That undermines the console’s read-only denylist because EXPLAIN ANALYZE executes the inner statement.

Reviewed by Cursor Bugbot for commit f8e9939. Configure here.
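
A rough sketch of one way to close the gap described in this thread: unwrap `EXPLAIN [ANALYZE]` and apply the allowlist to the statement it wraps. The function name `validateConsoleStatement` and the tokenizer are hypothetical; the allowed keywords are taken from the error message quoted above. It deliberately handles only the plain `EXPLAIN [ANALYZE] <statement>` form, so a parenthesized option list after `EXPLAIN` is rejected rather than interpreted.

```typescript
// Keywords taken from the error message in the thread above.
const CONSOLE_ALLOWED_KEYWORDS = new Set(["SELECT", "WITH", "SHOW", "DESCRIBE", "EXPLAIN"]);

function validateConsoleStatement(sql: string): void {
  // Split into keyword-like words and single punctuation characters.
  const tokens = sql.trim().toUpperCase().match(/[A-Z_]+|\S/g) ?? [];
  let index = 0;
  // Skip over EXPLAIN [ANALYZE] so the wrapped statement is what gets checked.
  while (tokens[index] === "EXPLAIN") {
    index++;
    if (tokens[index] === "ANALYZE") {
      index++;
    }
  }
  const keyword = tokens[index] ?? "";
  if (!CONSOLE_ALLOWED_KEYWORDS.has(keyword)) {
    throw new Error(
      `Statement type ${keyword} is not allowed; use SELECT, WITH, SHOW, DESCRIBE, or EXPLAIN`,
    );
  }
}
```

A keyword check like this remains a heuristic. The security review in this PR notes that the MCP path validates SQL via `validateSqlAst`, and an AST-based check would likely be the sturdier option for the console as well.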

github-actions Bot commented Jun 4, 2026

Docker image ready

docker pull ghcr.io/archmaxai/archmax:pr-66

Remove the setup-commands side panel and the separate extension field.
One editor now runs queries and routes INSTALL/LOAD to the extension
endpoint, with the Run action in the page header per UI guidelines.
Update spec and data-federation docs to match.

Co-authored-by: Cursor <cursoragent@cursor.com>
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit 6060afc. Configure here.

const withoutTrailing = trimmed.replace(/;+\s*$/, "");
if (withoutTrailing.includes(";")) {
  throw new Error("Only a single SQL statement is allowed");
}

Semicolon inside string rejected

Low Severity

Multi-statement detection uses a raw ; search on the trimmed SQL string, so a single SELECT that contains a semicolon inside a string literal is rejected as multiple statements even though it is one valid statement.

Reviewed by Cursor Bugbot for commit 6060afc. Configure here.
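
To illustrate the issue in this thread, here is a hedged sketch of a multi-statement check that ignores semicolons inside quoted literals and comments. `hasMultipleStatements` and `skipQuoted` are hypothetical names. The sketch assumes single- and double-quoted literals with doubled-quote escapes and non-nested block comments; it does not handle dollar-quoted strings, so a real implementation would be safer relying on the SQL parser.

```typescript
// Returns true when non-comment content follows a top-level semicolon.
function hasMultipleStatements(sql: string): boolean {
  let seenSemicolon = false;
  let i = 0;
  while (i < sql.length) {
    const ch = sql.charAt(i);
    const next = sql.charAt(i + 1);
    if (ch === "-" && next === "-") {
      // Line comment: skip to the end of the line.
      const end = sql.indexOf("\n", i);
      i = end === -1 ? sql.length : end + 1;
    } else if (ch === "/" && next === "*") {
      // Block comment (assumed non-nested): skip past the closing delimiter.
      const end = sql.indexOf("*/", i + 2);
      i = end === -1 ? sql.length : end + 2;
    } else if (ch === "'" || ch === '"') {
      if (seenSemicolon) return true; // a literal after ";" starts a second statement
      i = skipQuoted(sql, i, ch);
    } else if (ch === ";") {
      seenSemicolon = true;
      i++;
    } else {
      if (seenSemicolon && !/\s/.test(ch)) return true;
      i++;
    }
  }
  return false;
}

// Returns the index just past the closing quote; a doubled quote is an escaped quote.
function skipQuoted(sql: string, start: number, quote: string): number {
  let i = start + 1;
  while (i < sql.length) {
    if (sql.charAt(i) === quote) {
      if (sql.charAt(i + 1) === quote) {
        i += 2;
        continue;
      }
      return i + 1;
    }
    i++;
  }
  return sql.length; // unterminated literal: treat the rest as quoted
}
```

Skipping comments matters for safety as well as convenience: without it, an apostrophe inside a comment could open a phantom string literal and hide a real statement separator.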
