Improve performance metrics collection #104

@orionpapadakis

Description

It would be very helpful to expose more detailed performance metrics in gpullama, similar to what Ollama provides.

Right now, it is difficult to properly evaluate performance (especially across CPU/GPU backends and TornadoVM execution) without fine-grained timing information. Having a consistent set of metrics would significantly improve benchmarking, profiling, and optimization.

Proposed metrics

Core metrics (aligned with Ollama-style reporting):

total_duration – total time to generate the full response
load_duration – time spent loading the model
prompt_eval_count – number of input tokens processed
prompt_eval_duration (prefill) – time spent processing the prompt
eval_count – number of generated output tokens
eval_duration (decode) – time spent generating tokens

TornadoVM-specific metrics:

tornado_task_graph_compile_duration – time to compile the Tornado task graph
tornado_task_graph_warmup_duration – time spent in warmup/execution until steady state

All timings should ideally be reported in nanoseconds for consistency and precision.
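As a rough sketch, the proposed fields could be collected in a single immutable holder, with all durations in nanoseconds as suggested above. The type and field names here are illustrative, not an existing gpullama API:

```java
// Hypothetical metrics holder; field names mirror the proposed metric names.
// All *_Ns fields are durations in nanoseconds for consistency and precision.
public record InferenceMetrics(
        long totalDurationNs,                     // total_duration
        long loadDurationNs,                      // load_duration
        int promptEvalCount,                      // prompt_eval_count
        long promptEvalDurationNs,                // prompt_eval_duration (prefill)
        int evalCount,                            // eval_count
        long evalDurationNs,                      // eval_duration (decode)
        long tornadoTaskGraphCompileDurationNs,   // TornadoVM graph compile time
        long tornadoTaskGraphWarmupDurationNs) {  // TornadoVM warmup time
}
```

Timestamps could be captured with `System.nanoTime()` at phase boundaries and subtracted to fill these fields.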

With the above we can calculate:

  • time_to_first_token (TTFT) – can be derived from existing durations (e.g. load + prefill + first decode step), so it may not need separate instrumentation if timestamps are available
  • prefill_throughput as tok/s = prompt_eval_count / prompt_eval_duration
  • decode_throughput as tok/s = eval_count / eval_duration
  • total_throughput as tok/s = (prompt_eval_count + eval_count) / total_duration <--- as we do now
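The derived throughputs could be computed along these lines, converting nanosecond durations to tokens per second. The class and method names are hypothetical, not part of gpullama today:

```java
// Hypothetical helpers for the derived throughput metrics.
// Durations are in nanoseconds; results are tokens per second.
public final class DerivedMetrics {
    private static final double NANOS_PER_SECOND = 1_000_000_000.0;

    // prefill_throughput = prompt_eval_count / prompt_eval_duration
    public static double prefillThroughput(long promptEvalCount, long promptEvalDurationNs) {
        return promptEvalCount / (promptEvalDurationNs / NANOS_PER_SECOND);
    }

    // decode_throughput = eval_count / eval_duration
    public static double decodeThroughput(long evalCount, long evalDurationNs) {
        return evalCount / (evalDurationNs / NANOS_PER_SECOND);
    }

    // total_throughput = (prompt_eval_count + eval_count) / total_duration
    public static double totalThroughput(long promptEvalCount, long evalCount, long totalDurationNs) {
        return (promptEvalCount + evalCount) / (totalDurationNs / NANOS_PER_SECOND);
    }
}
```

Note the parentheses in `totalThroughput`: both token counts are summed before dividing by the total duration.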

Why this is useful

These metrics would make it easier to:

break down execution into loading, prefill, decode, and runtime overheads
understand TornadoVM-specific costs (compilation and warmup)
compare CPU vs GPU vs TornadoVM performance more accurately
identify bottlenecks and guide optimizations
