refactor: use inline usage cost from OpenRouter instead of generation cost API #4328

Open
FadhlanR wants to merge 2 commits into main from cs-10506-refactor-ai-credits-spending

Conversation

Contributor

@FadhlanR FadhlanR commented Apr 3, 2026

Summary

  • OpenRouter now includes cost directly in responses via usage.cost, so we primarily use that instead of polling a separate endpoint
  • Removed the old saveUsageCost flow that always polled OpenRouter's /generation API, and replaced it with direct spendUsageCost calls using the inline cost
  • Simplified CreditStrategy interface by removing the separate spendUsageCost method — saveUsageCost now handles both inline cost extraction and fallback
  • ~140 lines of billing code removed (old saveUsageCost, extractGenerationIdFromResponse)

Why we still need the generation cost API as a fallback

OpenRouter includes usage.cost in the final streaming chunk (the one with finish_reason). However, if a user cancels/stops the stream before that final chunk arrives, the cost is never received. Without a fallback, these interrupted generations would go unbilled. The generation cost API polling (/generation?id=) is retained as a fallback for this case — OpenRouter still tracks the cost server-side even for interrupted streams, so we can retrieve it after the fact.

Flow:

  1. Inline usage.cost available → use it directly (fast path, no extra API call)
  2. No inline cost but generationId available → poll /generation?id= endpoint with backoff (fallback for cancelled streams)
  3. Neither available → log warning, skip deduction
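The three-step resolution above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the function name `resolveCostInUsd` and the minimal response shape are assumptions, while `usage.cost` and the top-level `id` follow the fields discussed in this PR.

```typescript
// Minimal shape of an OpenRouter response for billing purposes (assumed).
type OpenRouterResponse = {
  id?: string;
  usage?: { cost?: number };
};

// Resolve the cost to deduct, mirroring the three-step flow above.
// `fetchFallback` stands in for the /generation?id= polling helper
// (fetchGenerationCostWithBackoff in the PR).
async function resolveCostInUsd(
  response: OpenRouterResponse,
  fetchFallback: (generationId: string) => Promise<number | null>,
): Promise<number | null> {
  // 1. Fast path: inline usage.cost from the final streaming chunk.
  const inlineCost = response.usage?.cost;
  if (typeof inlineCost === 'number' && Number.isFinite(inlineCost)) {
    return inlineCost;
  }
  // 2. Fallback: poll the generation cost API using the generation id.
  const generationId = response.id;
  if (generationId) {
    return fetchFallback(generationId);
  }
  // 3. Neither available: caller logs a warning and skips deduction.
  return null;
}
```

A caller would treat a `null` result as "nothing billable observed" and log a warning rather than deducting credits.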

Closes CS-10506

Test plan

  • Verify AI chat generates responses and credits are deducted correctly (inline cost path)
  • Verify streaming responses that are cancelled mid-way still deduct credits (fallback path)
  • Verify non-streaming forwarded requests deduct credits from inline cost
  • Run realm-server request-forward tests (includes both inline and fallback test cases)

🤖 Generated with Claude Code

… cost API

OpenRouter now includes cost directly in streaming/non-streaming responses
via `usage.cost`. This eliminates the need for the separate generation cost
polling endpoint, removing the backoff/retry logic and simplifying the
billing flow significantly.

Closes CS-10506

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-actions bot commented Apr 3, 2026

Host Test Results

2 098 tests +1   2 083 ✅ +1   15 💤 ±0   0 ❌ ±0
    1 suites ±0      1 files ±0   2h 2m 7s ⏱️ - 15m 12s

Results for commit d531ef2. ± Comparison against base commit 0bdc6eb.

♻️ This comment has been updated with latest results.

github-actions bot commented Apr 3, 2026

Realm Server Test Results

  1 files  ±0    1 suites  ±0   13m 8s ⏱️ +36s
834 tests +6  834 ✅ +6  0 💤 ±0  0 ❌ ±0 
905 runs  +6  905 ✅ +6  0 💤 ±0  0 ❌ ±0 

Results for commit d531ef2. ± Comparison against base commit 0bdc6eb.

This pull request removes 1 and adds 7 tests. Note that renamed tests count towards both.
default ‑ should handle streaming requests
default ‑ can successfully run a command
default ‑ rejects invalid JSON body
default ‑ rejects missing command
default ‑ rejects missing realmURL
default ‑ requires auth
default ‑ should fall back to generation cost API when inline cost is missing
default ‑ should handle streaming requests and deduct credits from inline cost

♻️ This comment has been updated with latest results.

When a user cancels a stream mid-way, the final chunk containing
usage.cost never arrives. In this case, fall back to polling
OpenRouter's /generation endpoint using the generationId to ensure
credits are still deducted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@FadhlanR FadhlanR marked this pull request as ready for review April 3, 2026 15:21
@FadhlanR FadhlanR requested a review from jurgenwerk April 3, 2026 15:21

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d531ef2670


return;
}

const generationId = response?.id;
P2: Restore fallback generation ID extraction

When inline usage.cost is missing, this now falls back using only response.id, but the previous implementation also handled response.choices[0].id and response.usage.generation_id. For forwarded OpenRouter responses that do not include a top-level id, the fallback /generation?id=... lookup is skipped entirely, so those requests will not deduct credits even though they previously would have been billed.
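A hedged sketch of the broader extraction this comment describes. The alternate field locations (`choices[0].id`, `usage.generation_id`) are taken from the comment above, not verified against OpenRouter's schema, and `extractGenerationId` is a hypothetical helper name:

```typescript
// Assumed response shape covering the three id locations named in the review.
type ForwardedResponse = {
  id?: string;
  choices?: Array<{ id?: string }>;
  usage?: { generation_id?: string };
};

// Try each known location in order, so forwarded responses without a
// top-level id can still reach the /generation?id= fallback.
function extractGenerationId(response: ForwardedResponse): string | undefined {
  return (
    response.id ??
    response.choices?.[0]?.id ??
    response.usage?.generation_id
  );
}
```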


Contributor

@jurgenwerk jurgenwerk left a comment


I tried running this locally and saw that credits are deducted correctly.

However, when I stopped AI generation in the middle of the AI response, I did not see any credits being spent. I am not sure whether this is how it worked before this change, but it is something you might want to take a look at.

  await spendUsageCost(this.pgAdapter, matrixUserId, costInUsd);
} else if (generationId) {
  log.info(
    `No inline cost for user ${matrixUserId}, falling back to generation cost API (generationId: ${generationId})`,
Contributor


In which case is there no inline cost?

Copilot AI left a comment


Pull request overview

Refactors OpenRouter billing to primarily use inline usage.cost from responses, with a fallback to the /generation?id= cost API when inline cost is unavailable (e.g., interrupted streams).

Changes:

  • Update realm-server request-forward billing to deduct credits via inline usage.cost and retain /generation polling as a fallback.
  • Simplify credit strategy interface/implementations to route all deductions through saveUsageCost.
  • Update ai-bot and realm-server tests to cover inline-cost streaming and generation-cost fallback paths.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Summary per file:

  • packages/realm-server/tests/request-forward-test.ts: Updates streaming test to use inline usage.cost and adds a fallback test for /generation polling.
  • packages/realm-server/lib/credit-strategies.ts: Refactors OpenRouter strategy to spend from inline cost first, then fall back to the generation cost API.
  • packages/realm-server/handlers/handle-request-forward.ts: Captures usage.cost during SSE proxying and passes cost/generationId into saveUsageCost; simplifies the non-stream deduction flow.
  • packages/billing/ai-billing.ts: Removes the old saveUsageCost helper and exports fetchGenerationCostWithBackoff for shared fallback usage.
  • packages/ai-bot/main.ts: Switches ai-bot usage tracking to the inline-cost fast path with generation-cost fallback.
Comments suppressed due to low confidence (1)

packages/billing/ai-billing.ts:147

  • fetchGenerationCostWithBackoff now appears to be the primary fallback path after removing saveUsageCost, but on terminal failure it only logs an error and returns null. Because this can lead to permanently unbilled generations, consider capturing this failure in Sentry (or otherwise surfacing it) and including enough context (generationId, possibly matrixUserId when available) to investigate billing gaps.
export async function fetchGenerationCostWithBackoff(
  generationId: string,
  openRouterApiKey: string,
): Promise<number | null> {
  let startedAt = Date.now();
  let delayMs = INITIAL_BACKOFF_MS;

  for (let attempt = 1; attempt <= MAX_FETCH_ATTEMPTS; attempt++) {
    try {
      let cost = await fetchGenerationCost(generationId, openRouterApiKey);
      if (cost !== null) {
        return cost;
      }
    } catch (error) {
      log.warn(
        `Attempt ${attempt} to fetch generation cost failed (generationId: ${generationId})`,
        error,
      );
    }

    let elapsed = Date.now() - startedAt;
    if (attempt === MAX_FETCH_ATTEMPTS || elapsed >= MAX_FETCH_RUNTIME_MS) {
      break;
    }

    let remainingTime = MAX_FETCH_RUNTIME_MS - elapsed;
    let sleepMs = Math.min(delayMs, remainingTime);
    await delay(sleepMs);
    delayMs = Math.min(delayMs * 2, MAX_BACKOFF_DELAY_MS);
  }

  log.error(
    `Failed to fetch generation cost within ${MAX_FETCH_ATTEMPTS} attempts or ${Math.round(MAX_FETCH_RUNTIME_MS / 60000)} minutes (generationId: ${generationId})`,
  );
  return null;
}


Comment on lines +68 to +86
// Deduct credits using the cost from the streaming response.
// Chain per-user promises so costs are recorded sequentially.
const previousPromise =
  pendingCostPromises.get(matrixUserId) ?? Promise.resolve();
const costPromise = previousPromise
  .then(() =>
    endpointConfig.creditStrategy.saveUsageCost(
      dbAdapter,
      matrixUserId,
      { id: generationId, usage: { cost: costInUsd } },
    ),
  )
  .finally(() => {
    if (pendingCostPromises.get(matrixUserId) === costPromise) {
      pendingCostPromises.delete(matrixUserId);
    }
  });
pendingCostPromises.set(matrixUserId, costPromise);


Copilot AI Apr 7, 2026


In the streaming [DONE] handler you always enqueue a saveUsageCost call, even when neither a generationId nor an inline cost was ever observed. This can generate noisy warnings (and sets a pendingCostPromises entry) for streams where no billable metadata exists. Consider guarding this block so you only schedule cost deduction when you have either a valid numeric costInUsd or a generationId to use for the fallback.

Suggested change

Before:

// Deduct credits using the cost from the streaming response.
// Chain per-user promises so costs are recorded sequentially.
const previousPromise =
  pendingCostPromises.get(matrixUserId) ?? Promise.resolve();
const costPromise = previousPromise
  .then(() =>
    endpointConfig.creditStrategy.saveUsageCost(
      dbAdapter,
      matrixUserId,
      { id: generationId, usage: { cost: costInUsd } },
    ),
  )
  .finally(() => {
    if (pendingCostPromises.get(matrixUserId) === costPromise) {
      pendingCostPromises.delete(matrixUserId);
    }
  });
pendingCostPromises.set(matrixUserId, costPromise);

After:

// Deduct credits using the cost from the streaming response only
// when we have enough metadata to save or resolve billing details.
// Chain per-user promises so costs are recorded sequentially.
const hasNumericCost =
  typeof costInUsd === 'number' && Number.isFinite(costInUsd);
const hasBillingMetadata = hasNumericCost || generationId != null;
if (hasBillingMetadata) {
  const previousPromise =
    pendingCostPromises.get(matrixUserId) ?? Promise.resolve();
  const costPromise = previousPromise
    .then(() =>
      endpointConfig.creditStrategy.saveUsageCost(
        dbAdapter,
        matrixUserId,
        { id: generationId, usage: { cost: costInUsd } },
      ),
    )
    .finally(() => {
      if (pendingCostPromises.get(matrixUserId) === costPromise) {
        pendingCostPromises.delete(matrixUserId);
      }
    });
  pendingCostPromises.set(matrixUserId, costPromise);
}

  this.openRouterApiKey,
);
if (fetchedCost !== null) {
  await spendUsageCostFromBilling(dbAdapter, matrixUserId, fetchedCost);
Copilot AI Apr 7, 2026


When falling back to fetchGenerationCostWithBackoff, a terminal failure (fetchedCost === null) is silently ignored here. Since this results in under-billing, please add explicit error reporting (at least a warn/error log including matrixUserId + generationId, and/or forwarding the failure to Sentry) so missed deductions are observable.

Suggested change

Before:

  await spendUsageCostFromBilling(dbAdapter, matrixUserId, fetchedCost);

After:

  await spendUsageCostFromBilling(dbAdapter, matrixUserId, fetchedCost);
} else {
  log.warn(
    `Failed to fetch generation cost after retries for user ${matrixUserId} (generationId: ${generationId}), skipping credit deduction`,
  );

  process.env.OPENROUTER_API_KEY!,
);
if (fetchedCost !== null) {
  await spendUsageCost(this.pgAdapter, matrixUserId, fetchedCost);
Copilot AI Apr 7, 2026


In the ai-bot fallback path, if fetchGenerationCostWithBackoff returns null the code currently does nothing (beyond whatever logging happens inside ai-billing) and proceeds without any bot-level warning that credits were not deducted for this user. Please add explicit logging (and/or Sentry capture) here on fetchedCost === null so failed deductions for interrupted streams are visible and actionable.

Suggested change

Before:

  await spendUsageCost(this.pgAdapter, matrixUserId, fetchedCost);

After:

  await spendUsageCost(this.pgAdapter, matrixUserId, fetchedCost);
} else {
  let message = `Failed to fetch generation cost for user ${matrixUserId} (generationId: ${generationId}); credits were not deducted`;
  log.warn(message);
  Sentry.captureMessage(message, {
    level: 'warning',
    extra: {
      matrixUserId,
      generationId,
    },
  });
