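Instead of using Hugging Face Transformers alone, consider vLLM: https://github.com/vllm-project/vllm. It implements continuous batching, which adds new requests to the running batch as soon as earlier ones finish instead of waiting for the whole batch to complete. It is also written primarily in Python.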
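To show why continuous batching helps, here is a toy Python simulation. It is not vLLM's actual scheduler. It assumes each request's output length is known in advance and counts only decode steps, ignoring prefill, KV-cache memory and other real-world limits. The function names are hypothetical and exist only for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: int
    remaining: int  # tokens still to generate (assumed known in this toy model)


def continuous_batching(lengths: list[int], max_batch: int) -> int:
    """Count decode steps when finished sequences are evicted and
    waiting ones are admitted after every step (toy model)."""
    waiting = deque(Request(i, n) for i, n in enumerate(lengths))
    running: list[Request] = []
    steps = 0
    while waiting or running:
        # Fill any free slots from the queue before this step.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One decode step: every running sequence emits one token.
        for req in running:
            req.remaining -= 1
        steps += 1
        # Drop finished sequences right away so their slots free up.
        running = [r for r in running if r.remaining > 0]
    return steps


def static_batching(lengths: list[int], max_batch: int) -> int:
    """Count decode steps when each fixed batch runs until its longest member finishes."""
    steps = 0
    for start in range(0, len(lengths), max_batch):
        steps += max(lengths[start:start + max_batch])
    return steps


if __name__ == "__main__":
    lengths = [10, 2, 2, 2]  # one long request followed by three short ones
    print(static_batching(lengths, 2), continuous_batching(lengths, 2))  # 12 10
```

With static batching, the two short requests in the first batch leave their slots idle while the long request keeps running. Continuous batching reuses those slots immediately, so the same work finishes in fewer steps.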