I've dug in and integrated GPT-5 into my app, and although I understand it's a reasoning model, responses were taking 20s or more to complete.
I enabled streaming, but there is still a very long delay before the stream begins.
I'm well aware that GPT-5 is a reasoning model and therefore takes longer (and burns 10× the tokens compared to 4.1!!! 😳), but I'm wondering whether going via AIProxy is introducing any latency, or whether there is anything in our control that can speed things up?
I've tried turning the reasoning effort down to `.minimal` (which throws an error as the OpenAI API doesn't recognise it, see my comment on the other ticket!), but even on `.low` the time to first token is still very high.
I've opened this as a ticket to discuss it. I understand it might not all be under AIProxy's control, but I wanted to see if there's either something we can debug, or maybe just something to include in the docs?
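To help with debugging, here is a rough sketch of how time to first token could be measured, so that the same prompt can be timed via AIProxy and directly against the API and the two numbers compared. It is written in Python purely for illustration and makes no assumptions about the AIProxy or OpenAI SDKs: `measure_stream` and `fake_stream` are hypothetical names, and `fake_stream` is only a stand-in for a real streamed response.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_stream(chunks: Iterable[str]) -> Tuple[Optional[float], float, int]:
    """Consume a stream and return (time_to_first_chunk, total_time, chunk_count).

    time_to_first_chunk is None if the stream yields nothing.
    """
    start = time.perf_counter()
    first: Optional[float] = None
    count = 0
    for _ in chunks:
        if first is None:
            # Elapsed time until the first chunk arrives.
            first = time.perf_counter() - start
        count += 1
    return first, time.perf_counter() - start, count


def fake_stream(delay_before_first: float, n: int) -> Iterator[str]:
    """Hypothetical stand-in for a streamed response: waits, then yields n chunks."""
    time.sleep(delay_before_first)
    for i in range(n):
        yield f"chunk-{i}"


if __name__ == "__main__":
    ttft, total, count = measure_stream(fake_stream(0.05, 3))
    print(f"first chunk after {ttft:.3f}s, {count} chunks in {total:.3f}s")
```

Swapping `fake_stream` for the real streaming iterator on each path should show whether the delay before the first chunk differs meaningfully between the proxied and direct calls.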