15 changes: 15 additions & 0 deletions content/Agents/evaluations.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,21 @@ description: Automatically test and validate agent outputs for quality and compl

Evaluations (evals) are automated tests that run after your agent completes. They validate output quality, check compliance, and monitor performance without blocking agent responses.

## Why Evals?

Most evaluation tools test the LLM: did the model respond appropriately? That's fine for chatbots, but agents aren't single LLM calls. They're entire runs with multiple model calls, tool executions, and orchestration working together.

Agent failures can happen anywhere in the run: a tool call that returns bad data, a state bug that corrupts context, or a faulty orchestration step. Testing only the LLM response misses most of these.

Agentuity evals test the whole run: every tool call, state change, and orchestration step. They run on every session in production, so you catch issues with real traffic.

**The result:**

- **Full-run evaluation**: Test the entire agent execution, not just LLM responses
- **Production monitoring**: Once configured, evals run automatically on every session
- **Async by default**: Evals don't block responses, so users aren't waiting
- **Preset library**: Common checks (PII, safety, hallucination) available out of the box

Evals come in two types: **binary** (pass/fail) for yes/no criteria, and **score** (0-1) for quality gradients.
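
The difference between the two types can be sketched as a small discriminated union. This is an illustration only: the type names, field names, and both example checks below are hypothetical and are not taken from the Agentuity SDK, whose actual eval API may look different.

```typescript
// Hypothetical shapes for illustration; the SDK's actual eval types may differ.
type BinaryResult = { kind: 'binary'; passed: boolean };
type ScoreResult = { kind: 'score'; score: number }; // score is in [0, 1]
type EvalResult = BinaryResult | ScoreResult;

// Binary eval: a yes/no criterion, here "the output contains no email address".
function noEmailEval(output: string): EvalResult {
  const hasEmail = /[^\s@]+@[^\s@]+\.[^\s@]+/.test(output);
  return { kind: 'binary', passed: !hasEmail };
}

// Score eval: a quality gradient, here the fraction of expected keywords present.
function keywordCoverageEval(output: string, keywords: string[]): EvalResult {
  if (keywords.length === 0) return { kind: 'score', score: 1 };
  const text = output.toLowerCase();
  const hits = keywords.filter((k) => text.includes(k.toLowerCase())).length;
  return { kind: 'score', score: hits / keywords.length };
}
```

A binary result suits hard requirements such as compliance checks, while a score lets you track gradual quality drift over many sessions.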

<Callout type="info" title="Where Scores Appear">
65 changes: 46 additions & 19 deletions content/Agents/standalone-execution.mdx
Expand Up @@ -16,10 +16,20 @@ import { createAgentContext } from '@agentuity/runtime';
import chatAgent from '@agent/chat';

const ctx = createAgentContext();
const result = await ctx.invoke(() => chatAgent.run({ message: 'Hello' }));
const result = await ctx.run(chatAgent, { message: 'Hello' });
```

The `invoke()` method executes your agent with full infrastructure support: tracing, session management, and access to all storage services.
The `run()` method executes your agent with full infrastructure support: tracing, session management, and access to all storage services.

For agents that don't require input:

```typescript
const result = await ctx.run(statusAgent);
```

<Callout type="info" title="Legacy invoke() Method">
The older `ctx.invoke(() => agent.run(input))` pattern still works, but `ctx.run(agent, input)` is preferred for its cleaner syntax.
</Callout>
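
To make the relationship between the two call shapes concrete, here is a minimal sketch using stand-in types. `SketchContext`, `Agent`, and `echoAgent` are hypothetical names invented for this example; this is not how `@agentuity/runtime` is implemented, only a way to show that both styles produce the same result from the caller's point of view.

```typescript
// Minimal stand-ins, NOT the @agentuity/runtime implementation.
interface Agent<I, O> {
  run(input: I): Promise<O>;
}

class SketchContext {
  // Legacy style: the caller wraps the agent call in a closure.
  async invoke<T>(fn: () => Promise<T>): Promise<T> {
    // A real context would set up the session and tracing around this call.
    return fn();
  }

  // Preferred style: pass the agent and its input directly.
  async run<I, O>(agent: Agent<I, O>, input: I): Promise<O> {
    return this.invoke(() => agent.run(input));
  }
}

// Hypothetical agent used only to exercise the two call shapes.
const echoAgent: Agent<{ message: string }, string> = {
  run: async ({ message }) => `echo: ${message}`,
};
```

With these stand-ins, `new SketchContext().run(echoAgent, { message: 'Hello' })` and `new SketchContext().invoke(() => echoAgent.run({ message: 'Hello' }))` resolve to the same value.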

## Options

Expand All @@ -45,10 +55,7 @@ await createApp();
// Run cleanup every hour
cron.schedule('0 * * * *', async () => {
const ctx = createAgentContext({ trigger: 'cron' });

await ctx.invoke(async () => {
await cleanupAgent.run({ task: 'expired-sessions' });
});
await ctx.run(cleanupAgent, { task: 'expired-sessions' });
});
```

Expand All @@ -58,35 +65,33 @@ For most scheduled tasks, use the [`cron()` middleware](/Routes/cron) instead. I

## Multiple Agents in Sequence

Run multiple agents within a single `invoke()` call to share the same session and tracing context:
Run multiple agents in sequence with the same context:

```typescript
const ctx = createAgentContext();

const result = await ctx.invoke(async () => {
// First agent analyzes the input
const analysis = await analyzeAgent.run({ text: userInput });

// Second agent generates response based on analysis
const response = await respondAgent.run({
analysis: analysis.summary,
sentiment: analysis.sentiment,
});
// First agent analyzes the input
const analysis = await ctx.run(analyzeAgent, { text: userInput });

return response;
// Second agent generates response based on analysis
const response = await ctx.run(respondAgent, {
analysis: analysis.summary,
sentiment: analysis.sentiment,
});
```

Each `ctx.run()` call shares the same session and tracing context.

## Reusing Contexts

Create a context once and reuse it for multiple invocations:

```typescript
const ctx = createAgentContext({ trigger: 'websocket' });

// Each invoke() gets its own session and tracing span
// Each run() gets its own session and tracing span
websocket.on('message', async (data) => {
const result = await ctx.invoke(() => messageAgent.run(data));
const result = await ctx.run(messageAgent, data);
websocket.send(result);
});
```
Expand All @@ -104,6 +109,28 @@ Standalone contexts provide the same infrastructure as HTTP request handlers:
- **Session events**: Start/complete events for observability
</Callout>

## Detecting Runtime Context

Use `isInsideAgentRuntime()` to check if code is running within the Agentuity runtime:

```typescript
import { isInsideAgentRuntime, createAgentContext } from '@agentuity/runtime';
import myAgent from '@agent/my-agent';

async function processRequest(data: unknown) {
  if (isInsideAgentRuntime()) {
    // Already in runtime context, call agent directly
    return myAgent.run(data);
  }

  // Outside runtime, create context first
  const ctx = createAgentContext();
  return ctx.run(myAgent, data);
}
```

This is useful for writing utility functions that work both inside agent handlers and in standalone scripts.

## Next Steps

- [Calling Other Agents](/Agents/calling-other-agents): Agent-to-agent communication patterns
13 changes: 13 additions & 0 deletions content/Agents/workbench.mdx
Expand Up @@ -5,6 +5,19 @@ description: Use the built-in development UI to test agents, validate schemas, a

Workbench is a built-in UI for testing your agents during development. It automatically discovers your agents, displays their input/output schemas, and lets you execute them with real inputs.

## Why Workbench?

Testing agents isn't like testing traditional APIs. You need to validate input schemas, see how responses are formatted, test multi-turn conversations, and understand execution timing. Using `curl` or Postman means manually constructing JSON payloads and parsing responses.

Workbench understands your agents. It reads your schemas, generates test forms, maintains conversation threads, and shows execution metrics. When something goes wrong, you see exactly what the agent received and returned.

**Key capabilities:**

- **Schema-aware testing**: Input forms generated from your actual schemas
- **Thread persistence**: Test multi-turn conversations without manual state tracking
- **Execution metrics**: See token usage and response times for every request
- **Quick iteration**: Test prompts display in the UI for one-click execution

## Enabling Workbench

Add a `workbench` section to your `agentuity.config.ts`:
146 changes: 144 additions & 2 deletions content/Learn/Cookbook/Patterns/server-utilities.mdx
@@ -1,6 +1,6 @@
---
title: SDK Utilities for External Apps
description: Use storage, logging, error handling, and schema utilities from external backends like Next.js or Express
description: Use storage, queues, logging, and error handling utilities from external backends like Next.js or Express
---

Use `@agentuity/server` and `@agentuity/core` utilities in external apps, scripts, or backends that integrate with Agentuity.
Expand Down Expand Up @@ -122,6 +122,146 @@ export async function GET(request: NextRequest) {
}
```

## Queue Management

Manage queues programmatically from external apps or scripts using `APIClient`:

```typescript title="lib/agentuity-queues.ts"
import { APIClient, createLogger, getServiceUrls } from '@agentuity/server';

export const logger = createLogger('info');
const urls = getServiceUrls(process.env.AGENTUITY_REGION!);

export const client = new APIClient(
  urls.catalyst,
  logger,
  process.env.AGENTUITY_SDK_KEY
);
```

### Creating and Managing Queues

```typescript
import {
  createQueue,
  listQueues,
  deleteQueue,
  pauseQueue,
  resumeQueue,
} from '@agentuity/server';
import { client } from '@/lib/agentuity-queues';

// Create a worker queue
const queue = await createQueue(client, {
  name: 'order-processing',
  queue_type: 'worker',
  settings: {
    default_max_retries: 5,
    default_visibility_timeout_seconds: 60,
  },
});

// List all queues
const { queues } = await listQueues(client);

// Pause and resume
await pauseQueue(client, 'order-processing');
await resumeQueue(client, 'order-processing');

// Delete a queue
await deleteQueue(client, 'old-queue');
```

### Dead Letter Queue Operations

```typescript
import {
  listDeadLetterMessages,
  replayDeadLetterMessage,
  purgeDeadLetter,
} from '@agentuity/server';
import { client, logger } from '@/lib/agentuity-queues';

// List failed messages
const { messages } = await listDeadLetterMessages(client, 'order-processing');

for (const msg of messages) {
  logger.warn('Failed message', { id: msg.id, reason: msg.failure_reason });

  // Replay back to the queue
  await replayDeadLetterMessage(client, 'order-processing', msg.id);
}

// Purge all DLQ messages
await purgeDeadLetter(client, 'order-processing');
```

### Webhook Destinations

```typescript
import { createDestination } from '@agentuity/server';
import { client } from '@/lib/agentuity-queues';

await createDestination(client, 'order-processing', {
  destination_type: 'http',
  config: {
    url: 'https://api.example.com/webhook/orders',
    method: 'POST',
    headers: { 'X-API-Key': 'secret' },
    timeout_ms: 30000,
    retry_policy: {
      max_attempts: 5,
      initial_backoff_ms: 1000,
      max_backoff_ms: 60000,
      backoff_multiplier: 2.0,
    },
  },
});
```
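
To get a feel for what the `retry_policy` values above imply, the following sketch computes a delay schedule. The field names mirror the config object, but the schedule itself is an assumption: it models plain exponential backoff with a cap and no jitter, and it treats `max_attempts` as including the first delivery attempt. The service's actual timing is not specified here, and `backoffSchedule` is a hypothetical helper, not an SDK function.

```typescript
// Field names mirror the retry_policy object; the schedule is an assumption
// (plain exponential backoff with a cap, no jitter).
interface RetryPolicy {
  max_attempts: number;
  initial_backoff_ms: number;
  max_backoff_ms: number;
  backoff_multiplier: number;
}

// Returns the wait before each retry. With max_attempts = N this assumes
// N - 1 retries after the first attempt.
function backoffSchedule(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  let delay = policy.initial_backoff_ms;
  for (let retry = 1; retry < policy.max_attempts; retry++) {
    delays.push(Math.min(delay, policy.max_backoff_ms));
    delay *= policy.backoff_multiplier;
  }
  return delays;
}

const delays = backoffSchedule({
  max_attempts: 5,
  initial_backoff_ms: 1000,
  max_backoff_ms: 60000,
  backoff_multiplier: 2.0,
});
// delays is [1000, 2000, 4000, 8000]
```

Under these assumptions the cap only matters for longer retry chains: the seventh retry would be 64000 ms uncapped and is clamped to 60000 ms.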

### HTTP Ingestion Sources

```typescript
import { createSource } from '@agentuity/server';
import { client, logger } from '@/lib/agentuity-queues';

const source = await createSource(client, 'webhook-queue', {
  name: 'stripe-webhooks',
  description: 'Receives Stripe payment events',
  auth_type: 'header',
  auth_value: 'Bearer whsec_...',
});

// External services POST to this URL
logger.info('Source created', { url: source.url });
```

### Pull-Based Consumption

For workers that pull and acknowledge messages:

```typescript
import { receiveMessage, ackMessage, nackMessage } from '@agentuity/server';
import { client } from '@/lib/agentuity-queues';

// Receive a message (blocks until available or timeout)
const message = await receiveMessage(client, 'order-processing');

if (message) {
  try {
    await processOrder(message.payload);
    await ackMessage(client, 'order-processing', message.id);
  } catch (error) {
    // Message returns to queue for retry
    await nackMessage(client, 'order-processing', message.id);
  }
}
```
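
The single receive/ack/nack step above is typically wrapped in a loop. The sketch below shows one way to do that. To stay runnable without the SDK it depends on a hypothetical `QueueOps` interface; in a real worker its three functions would be thin wrappers around `receiveMessage`, `ackMessage`, and `nackMessage`. The names `QueueMessage`, `QueueOps`, and `drainQueue` are assumptions made for this example.

```typescript
// Hypothetical interfaces so the sketch runs without the SDK.
interface QueueMessage<T> {
  id: string;
  payload: T;
}

interface QueueOps<T> {
  receive(): Promise<QueueMessage<T> | null>;
  ack(id: string): Promise<void>;
  nack(id: string): Promise<void>;
}

// Pulls messages until receive() reports none available, acknowledging
// successes and negatively acknowledging failures. Returns both counts.
async function drainQueue<T>(
  ops: QueueOps<T>,
  handler: (payload: T) => Promise<void>,
): Promise<{ acked: number; nacked: number }> {
  let acked = 0;
  let nacked = 0;
  for (;;) {
    const message = await ops.receive();
    if (!message) break;
    try {
      await handler(message.payload);
      await ops.ack(message.id);
      acked++;
    } catch {
      // The message goes back to the queue for a later retry.
      await ops.nack(message.id);
      nacked++;
    }
  }
  return { acked, nacked };
}
```

Because a nacked message returns to the queue, a message that always fails can be received again by a loop like this one. Presumably the queue's retry limit (such as `default_max_retries`) eventually moves it to the dead letter queue, but that behavior is an assumption here.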

<Callout type="info" title="CLI for Quick Operations">
For one-off queue management, use the CLI instead: `agentuity cloud queue create`, `agentuity cloud queue dlq`, etc. See [Queues](/Services/queues) for CLI commands.
</Callout>

## Alternative: HTTP Routes

If you want to centralize storage logic in your Agentuity project (for [middleware](/Routes/middleware), sharing across multiple apps, or avoiding SDK key distribution), use [HTTP routes](/Routes/http) instead.
Expand Down Expand Up @@ -182,7 +322,8 @@ export default router;
Add authentication middleware to protect storage endpoints:

```typescript title="src/api/sessions/route.ts"
import { createRouter, createMiddleware } from '@agentuity/runtime';
import { createRouter } from '@agentuity/runtime';
import { createMiddleware } from 'hono/factory';

const router = createRouter();

Expand Down Expand Up @@ -330,6 +471,7 @@ const jsonSchema = toJSONSchema(schema);

## See Also

- [Queues](/Services/queues): Queue concepts and CLI commands
- [HTTP Routes](/Routes/http): Route creation with `createRouter`
- [Route Middleware](/Routes/middleware): Authentication patterns
- [RPC Client](/Frontend/rpc-client): Typed client generation
52 changes: 52 additions & 0 deletions content/Reference/CLI/configuration.mdx
Expand Up @@ -118,6 +118,58 @@ agentuity cloud secret import .env.secrets

**In agents:** Access secrets via `process.env.API_KEY`. Secrets are injected at runtime and never logged.

## Organization-Level Configuration

Set environment variables and secrets at the organization level to share them across all projects in that organization. Use the `--org` flag with any `env` or `secret` command.

### Set Org-Level Variables

```bash
# Set using your default org
agentuity cloud env set DATABASE_URL "postgresql://..." --org

# Set for a specific org
agentuity cloud env set DATABASE_URL "postgresql://..." --org org_abc123
```

### Set Org-Level Secrets

```bash
# Set shared secret for default org
agentuity cloud secret set SHARED_API_KEY "sk_..." --org

# Set for specific org
agentuity cloud secret set SHARED_API_KEY "sk_..." --org org_abc123
```

### List Org-Level Values

```bash
# List org environment variables
agentuity cloud env list --org

# List org secrets
agentuity cloud secret list --org
```

### Get/Delete Org-Level Values

```bash
# Get an org variable
agentuity cloud env get DATABASE_URL --org

# Delete an org secret
agentuity cloud secret delete OLD_KEY --org
```

<Callout type="info" title="Inheritance">
Organization-level values are inherited by all projects in that organization. Project-level values take precedence over organization-level values when both are set.
</Callout>
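
The precedence rule can be pictured as a simple merge. This is a conceptual sketch only: `resolveEnv` and `EnvMap` are hypothetical names, and the platform's actual resolution logic is not shown in this documentation.

```typescript
type EnvMap = Record<string, string>;

// Later spreads win, so project-level keys override organization-level keys.
function resolveEnv(orgLevel: EnvMap, projectLevel: EnvMap): EnvMap {
  return { ...orgLevel, ...projectLevel };
}

const resolved = resolveEnv(
  { DATABASE_URL: 'postgresql://org-db', SHARED_API_KEY: 'sk_org' },
  { DATABASE_URL: 'postgresql://project-db' },
);
// resolved.DATABASE_URL is 'postgresql://project-db' (project wins)
// resolved.SHARED_API_KEY is 'sk_org' (inherited from the organization)
```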

<Callout type="tip" title="Default Organization">
Set a default organization with `agentuity auth org select` to avoid specifying `--org` on every command. See [Getting Started](/Reference/CLI/getting-started) for details.
</Callout>

## API Keys

Create and manage API keys for programmatic access to your project.