
[Enhancement] Expose inference speed metrics for Chat and Completion #399

Open
cch1rag wants to merge 1 commit into undreamai:main from cch1rag:enhancement/expose-inference-speed

Conversation


@cch1rag cch1rag commented Mar 23, 2026

Summary

Closes #315.

This exposes the backend timing data on the Unity side.

After a completed Chat() or Completion() call, you can now read:

  • TokensPerSecond
  • PromptTokensPerSecond

These values come from the timing data already returned by the backend. If timing data is not available, both properties stay at -1.
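As a rough illustration of how such values can be derived, here is a minimal Python sketch of the calculation and the -1 fallback. The field names (`predicted_n`, `predicted_ms`, `prompt_n`, `prompt_ms`) follow the llama.cpp server's `timings` object and are an assumption about the backend payload; the actual PR does this parsing in C# inside LLMClient.

```python
import json

def parse_speed_metrics(response_json: str) -> tuple[float, float]:
    """Return (tokens_per_second, prompt_tokens_per_second).

    Falls back to (-1.0, -1.0) when the response is not valid JSON or
    carries no timing data, mirroring the behavior described above.
    """
    try:
        timings = json.loads(response_json).get("timings")
    except (ValueError, TypeError, AttributeError):
        return -1.0, -1.0
    if not timings:
        return -1.0, -1.0

    def rate(count_key: str, ms_key: str) -> float:
        # tokens / seconds, guarding against missing or zero fields
        count, ms = timings.get(count_key), timings.get(ms_key)
        if not count or not ms:
            return -1.0
        return count / (ms / 1000.0)

    return rate("predicted_n", "predicted_ms"), rate("prompt_n", "prompt_ms")

example = ('{"content": "hi", "timings": {"predicted_n": 50, '
           '"predicted_ms": 2500, "prompt_n": 120, "prompt_ms": 300}}')
print(parse_speed_metrics(example))  # (20.0, 400.0)
print(parse_speed_metrics('{"content": "hi"}'))  # (-1.0, -1.0)
```

The key design point is that a missing or malformed `timings` object never raises; callers always get a sentinel pair they can check before displaying the metrics.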

What changed

  • added TokensPerSecond and PromptTokensPerSecond to LLMClient
  • updated the managed wrapper to request the JSON completion response when needed
  • parsed the response JSON and still returned plain text to callers
  • used the same parsing path for both Completion() and LLMAgent.Chat()
  • added tests for JSON parsing, fallback behavior, and stale timing reset

Testing

  • git diff --check
  • Unity EditMode parser tests passed:
    • LLMUnityTests.TestLLMClient_ResponseParsing

I kept the README update and the package test-discovery cleanup out of this PR to keep it focused on the issue itself. I'm happy to send those as follow-ups if they would be useful.



Development

Successfully merging this pull request may close these issues.

How to Calculate Token Score (Token / minute) to check if the CPU is strong enough?
