GPT-5 Responses taking a long time to complete #206

@richarddas

I've dug in and integrated GPT-5 into my app. Although I understand it's a reasoning model, responses are taking 20 seconds or more to complete.

I enabled streaming, and there is still a very long delay before the stream begins.

I am well aware that GPT-5 is a reasoning model and therefore takes longer (and burns 10× the tokens compared to 4.1!!! 😳), but I'm wondering whether going via AIProxy is introducing any latency, or whether there is anything in our control that can speed things up?

I've tried turning the reasoning effort down to .minimal (which throws an error because the OpenAI API doesn't recognise it; see my comment on the other ticket!), but even on .low the time to first token is still very high.

I've opened this ticket to start a discussion. I understand it might not all be under AIProxy's control, but I wanted to see whether there's something we can debug, or maybe just something to include in the docs.
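
For anyone who wants to put numbers on this, here is a minimal sketch of timing the first streamed chunk separately from the full response. It wraps any `AsyncSequence`, so it should work with whatever stream the client returns. `StreamTiming`, `measureStream`, and `makeFakeStream` are hypothetical names made up for illustration, not part of AIProxy or the OpenAI API, and the fake stream only stands in for a real response. Running the same prompt through the proxy and directly against the API, then comparing `timeToFirstChunk`, might help show where the delay comes from.

```swift
import Foundation

/// Timing summary for one streamed response (hypothetical helper type).
struct StreamTiming {
    let timeToFirstChunk: TimeInterval?  // nil if the stream produced no chunks
    let totalTime: TimeInterval
    let chunkCount: Int
}

/// Consumes a stream and records when the first chunk arrived and when it finished.
func measureStream<S: AsyncSequence>(_ stream: S) async throws -> StreamTiming {
    let start = Date()
    var firstChunkAt: TimeInterval? = nil
    var count = 0
    for try await _ in stream {
        if firstChunkAt == nil {
            firstChunkAt = Date().timeIntervalSince(start)
        }
        count += 1
    }
    return StreamTiming(
        timeToFirstChunk: firstChunkAt,
        totalTime: Date().timeIntervalSince(start),
        chunkCount: count
    )
}

/// Stand-in for a real streamed response: waits 200 ms, then yields two chunks.
func makeFakeStream() -> AsyncStream<String> {
    AsyncStream { continuation in
        Task {
            try? await Task.sleep(nanoseconds: 200_000_000)
            continuation.yield("Hello")
            continuation.yield(" world")
            continuation.finish()
        }
    }
}

// Usage: replace makeFakeStream() with the real streaming call.
let timing = try await measureStream(makeFakeStream())
print("first chunk after:", timing.timeToFirstChunk ?? -1, "s")
print("total:", timing.totalTime, "s, chunks:", timing.chunkCount)
```

If most of the wait happens before the first chunk both with and without the proxy, the delay is more likely the model's reasoning phase than anything added in transit.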
