I've found that without headphones, the realtime model hears its own output through the speaker and gets caught in conversation loops as it keeps interrupting itself. This is especially pronounced when using .semanticVAD; a makeshift workaround that reduces the problem is to use .serverVAD with a high threshold.
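For anyone who wants to try the workaround above, here is a rough sketch of what a high-threshold server VAD configuration might look like at the wire level. It does not use this library's API (the exact parameters of .serverVAD are not shown here); it only builds a raw `session.update` event as I understand the OpenAI Realtime API, assuming the session shape where `turn_detection` sits directly under `session`. The type names, the helper `makeHighThresholdUpdate`, and the value 0.9 are all illustrative assumptions, not tuned recommendations.

```swift
import Foundation

// Hypothetical Codable mirror of the Realtime API `turn_detection` object.
// Field names follow the documented server_vad settings; the Swift type
// names are made up for this sketch.
struct TurnDetection: Codable {
    var type = "server_vad"
    var threshold: Double        // 0.0 to 1.0; higher needs louder input to trigger
    var prefixPaddingMs: Int
    var silenceDurationMs: Int

    enum CodingKeys: String, CodingKey {
        case type, threshold
        case prefixPaddingMs = "prefix_padding_ms"
        case silenceDurationMs = "silence_duration_ms"
    }
}

struct Session: Codable {
    var turnDetection: TurnDetection

    enum CodingKeys: String, CodingKey {
        case turnDetection = "turn_detection"
    }
}

struct SessionUpdate: Codable {
    var type = "session.update"
    var session: Session
}

// Builds the JSON text for a session.update event with a high VAD threshold.
// 0.9 is an assumed starting point, not a measured value.
func makeHighThresholdUpdate(threshold: Double = 0.9) throws -> String {
    let detection = TurnDetection(
        threshold: threshold,
        prefixPaddingMs: 300,
        silenceDurationMs: 500
    )
    let event = SessionUpdate(session: Session(turnDetection: detection))
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    return String(decoding: try encoder.encode(event), as: UTF8.self)
}
```

The resulting string would be sent over the realtime connection as a client event. A higher threshold only makes the VAD less sensitive to the speaker's output; it does not remove the echo, so quiet user speech may also be missed.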
I know a lot of work has been put into this already, and I probably won't be able to work on a solution for it right away, but I do want to highlight that this still seems to be a problem and document it here.
This was found while testing on an iPhone 16 Pro running iOS 26.0.1.