Currently, the codebase uses Python Flask for serving the LLM. I propose migrating this implementation to FastAPI, which is a more advanced library and provides better asynchronous support. This change will improve scalability, performance, and modernize the API architecture.