GPT-5 Responses taking a long time to complete #206

I've integrated GPT-5 into my app, and although I understand it's a reasoning model, responses are taking 20s or more to complete.

I enabled streaming, but there is still a very long delay before the stream begins.

I'm well aware that GPT-5 is a reasoning model and therefore takes longer (and burns 10× the tokens compared to 4.1!!! 😳), but I'm wondering whether going via AIProxy introduces any latency, or whether there is anything in our control that can speed things up.

I've tried turning the reasoning effort down to `.minimal` (which throws an error as the OAI API doesn't recognise it, see my comment on the other ticket!), but even on `.low` the time to first token is still very high.

I've opened this ticket to start a discussion. I understand it might not all be under AIProxy's control, but I wanted to see if there's something we can debug, or maybe just something to include in the docs.
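To make the "time to first token" numbers above comparable between runs (for example `.low` effort versus a non-reasoning model, or via the proxy versus direct), here is a minimal sketch of a timing helper. It is not part of AIProxy or the OpenAI API: `StreamTiming` and `measureStream` are hypothetical names, and the helper only assumes the streaming response is exposed as some `AsyncSequence`. The fake stream at the bottom stands in for a real response so the snippet runs on its own.

```swift
import Foundation

/// Timing summary for one streamed response (hypothetical helper type).
struct StreamTiming {
    let timeToFirstChunk: TimeInterval?  // nil if the stream produced no chunks
    let totalTime: TimeInterval
    let chunkCount: Int
}

/// Consumes any async sequence and records when the first element arrives.
/// Call this immediately after creating the request so `start` is meaningful.
func measureStream<S: AsyncSequence>(_ stream: S) async throws -> StreamTiming {
    let start = Date()
    var first: TimeInterval? = nil
    var count = 0
    for try await _ in stream {
        if first == nil {
            first = Date().timeIntervalSince(start)
        }
        count += 1
    }
    return StreamTiming(
        timeToFirstChunk: first,
        totalTime: Date().timeIntervalSince(start),
        chunkCount: count
    )
}

// Stand-in for a real streamed response: waits 200 ms, then yields two chunks.
let fake = AsyncStream<String> { continuation in
    Task {
        try? await Task.sleep(nanoseconds: 200_000_000)
        continuation.yield("Hello")
        continuation.yield(" world")
        continuation.finish()
    }
}

let timing = try await measureStream(fake)
if let ttft = timing.timeToFirstChunk {
    print("time to first chunk: \(ttft)s")
}
print("total: \(timing.totalTime)s over \(timing.chunkCount) chunks")
```

Running the same helper against the proxied stream and against a direct call with identical parameters would show whether the delay before the first chunk comes from the proxy hop or from the model's reasoning phase.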
