I recently tested this model at work and it worked very well, so we might use it further. The Delay model gives very good results but is slow. Could I try using SGLang or vLLM as a backend and maybe open a PR for that? That should speed it up considerably. While I'm at it, I might also try quantization so it runs on smaller GPUs.