feat(docker): split docker-compose UI and gateway into separate services #28711
yassin-berriai wants to merge 8 commits into
…ces (LIT-2815)
- Replace monolithic `litellm` service with dedicated `ui` (nginx/Next.js, port 3000) and `gateway` (uvicorn/FastAPI, port 4000) services, each building from its own Dockerfile
- Update prometheus.yml scrape target from `litellm:4000` → `gateway:4000`
- Add litellm/proxy/_experimental/out/ to .gitignore and remove the 680 committed build artefacts; the UI bundle is now produced inside the Docker build, not checked in to source control
- Add tests/test_litellm/test_docker_compose.py with 17 static checks covering service presence, build config, ports, env vars, health checks, dependencies, and prometheus scrape targets
Resolves LIT-2815
https://claude.ai/code/session_01JVLLUH66aUXF9kxoHcYxWu
Codecov Report: ✅ All modified and coverable lines are covered by tests.
…opied artefacts
The _experimental/out/ directory was removed from source control in the UI/gateway split (LIT-2815). Dockerfile.non_root was still copying from that path, causing the test-server-root-path CI job to fail.
Replace the cp from the now-absent _experimental/out/ with an inline npm ci + npm run build using the nodejs/npm already installed in the builder stage, then copy from ui/litellm-dashboard/out/ instead.
https://claude.ai/code/session_01JVLLUH66aUXF9kxoHcYxWu
After removing litellm/proxy/_experimental/out/ from source control (LIT-2815), a fresh CI checkout no longer contains that directory. build_ui.sh used `cp -r ./out/* <dest>` which fails when <dest> does not exist. Replace with `rm -rf <dest> && mv ./out <dest>`, which works whether or not the destination exists — identical to the approach already used in the e2e_ui_testing CircleCI job (see config.yml comment). https://claude.ai/code/session_01JVLLUH66aUXF9kxoHcYxWu
…n_root
Per review feedback, keep litellm/proxy/_experimental/out/ tracked in git so the PR diff stays focused on the docker-compose split. Remove the gitignore entry added in the initial commit and restore Dockerfile.non_root to its original approach of staging the pre-built UI artefacts.
The build_ui.sh rm+mv fix (avoiding cp-as-child) is retained as a standalone improvement.
Resolves LIT-2815
https://claude.ai/code/session_01JVLLUH66aUXF9kxoHcYxWu
…al state
Complete the revert started in the previous commit: remove the litellm/proxy/_experimental/out/ gitignore entry and restore Dockerfile.non_root to copy from the checked-in static export rather than building from npm source.
https://claude.ai/code/session_01JVLLUH66aUXF9kxoHcYxWu
…rt (LIT-2815)
Add the `backend` service (port 4001, backend/Dockerfile) to docker-compose.yml to mirror the three-service architecture in helm/litellm: gateway (4000), backend (4001), ui (3000). Update tests to cover the new service.
https://claude.ai/code/session_01JVLLUH66aUXF9kxoHcYxWu
The backend service at port 4001 was not part of LIT-2815 (UI/gateway split) and duplicated gateway's config without serving a defined role. Remove it so the compose file matches the two-service architecture described in the ticket and PR. Also trim the seven backend-specific test cases and the unused `import os` from test_docker_compose.py; all 17 remaining checks still pass. Resolves LIT-2815
Generated by Claude Code
Greptile Summary
This PR splits the monolithic
Confidence Score: 5/5
Safe to merge: the changes are confined to Docker/infra config and a new static test file, with no modifications to application logic.
All application code is untouched. The docker-compose split is clean, the service_healthy dependency is correctly wired, and the new tests use exact port-equality assertions. The only gap is a minor one in the else branch of the depends_on test, which has no effect on the current configuration.
No files require special attention; the minor test-coverage gap in test_gateway_depends_on_db is low risk.
| Filename | Overview |
|---|---|
| docker-compose.yml | Splits monolithic litellm service into ui (nginx/Next.js, port 3000) and gateway (FastAPI, port 4000); gateway now uses condition: service_healthy for depends_on: db, and both services have healthchecks configured. |
| prometheus.yml | One-line update: scrape target changed from litellm:4000 to gateway:4000 to match the renamed service. |
| tests/test_litellm/test_docker_compose.py | New static-analysis test file with 17 checks; port assertions now use exact equality; test_gateway_depends_on_db verifies service_healthy in the dict branch but the else branch skips that check, leaving a small coverage gap. |
| ui/litellm-dashboard/build_ui.sh | Replaced rm -rf dir/* && cp -r out/* dir && rm -rf out with the cleaner rm -rf dir && mv out dir, making the build output replacement atomic and resilient to a missing destination directory. |
Reviews (2): Last reviewed commit: "fix(docker): address Greptile review and..."
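The coverage gap noted for test_gateway_depends_on_db arises because Compose accepts two shapes for depends_on: a short list of service names and a long mapping with a per-dependency condition. The following is a minimal sketch of one way to cover both branches, assuming the compose file has already been parsed into plain Python dicts. The helper name `dependency_condition` and the sample fragments are hypothetical and are not taken from the PR's test file.

```python
def dependency_condition(service: dict, dependency: str):
    """Return the startup condition a service declares for a dependency.

    Returns None when the dependency is not declared at all. The short
    (list) form carries no explicit condition, so it is reported here as
    "service_started", which is how Compose treats it.
    """
    depends_on = service.get("depends_on", [])
    if isinstance(depends_on, dict):
        if dependency not in depends_on:
            return None
        entry = depends_on[dependency] or {}
        return entry.get("condition", "service_started")
    # Short form: a plain list of service names.
    return "service_started" if dependency in depends_on else None


# Hypothetical parsed fragments standing in for docker-compose.yml.
long_form = {"depends_on": {"db": {"condition": "service_healthy"}}}
short_form = {"depends_on": ["db"]}

assert dependency_condition(long_form, "db") == "service_healthy"
# The list form no longer slips through: it reports service_started,
# so a test that requires service_healthy would fail on it.
assert dependency_condition(short_form, "db") == "service_started"
```

With a helper of this shape, a single assertion on the returned condition covers both the dict and the list branch.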
Addressed both items from the 4/5 review:
Also restored
Generated by Claude Code
…tests)
Two issues raised in Greptile 4/5 review:
1. gateway depends_on used the list form which starts the gateway before
Postgres is ready. Switch to condition: service_healthy so the
gateway waits for pg_isready to pass on cold boots.
2. Port tests used substring matching ("3000" in str(p)) which could
pass spurious mappings like "13000:3000". Change to exact string
equality "3000:3000" / "4000:4000". Also assert service_healthy
in test_gateway_depends_on_db.
Resolves LIT-2815
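To illustrate the second item, here is a small sketch of the difference between the substring check and the exact-equality check. The function names and the sample port lists are hypothetical; only the `"13000:3000"` and `"3000:3000"` strings come from the commit message above.

```python
def exposes_port_substring(ports, port):
    """Loose check: passes if the port number appears anywhere in a mapping."""
    return any(port in str(p) for p in ports)


def exposes_port_exact(ports, mapping):
    """Strict check: passes only if the exact "host:container" string is listed."""
    return mapping in [str(p) for p in ports]


spurious = ["13000:3000"]

# The substring check accepts a mapping that publishes the wrong host port.
assert exposes_port_substring(spurious, "3000")
# The exact check rejects it and accepts only the intended mapping.
assert not exposes_port_exact(spurious, "3000:3000")
assert exposes_port_exact(["3000:3000"], "3000:3000")
```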
44cb2ee to 9275c6d
Summary
Resolves LIT-2815
The monolithic `litellm` service in `docker-compose.yml` served both the LLM API and the admin UI from a single container. This PR splits it into two independent services:
- `gateway`: FastAPI/uvicorn API proxy (port 4000), built from `gateway/Dockerfile`
- `ui`: nginx/Next.js admin dashboard (port 3000), built from `ui/Dockerfile`

The `gateway` uses `depends_on: db: condition: service_healthy` so it waits for Postgres to pass `pg_isready` before starting (avoids connection-refused on cold boot).

Changes
| File | Change |
|---|---|
| `docker-compose.yml` | Replace `litellm` service with `ui` + `gateway`; gateway waits for db health |
| `prometheus.yml` | `litellm:4000` → `gateway:4000` |
| `ui/litellm-dashboard/build_ui.sh` | `rm`+`mv` instead of `rm -rf *`+`cp` |
| `tests/test_litellm/test_docker_compose.py` | |

Test Matrix
| Test | Check |
|---|---|
| `ui` service exists | |
| `gateway` service exists | |
| `litellm` service is gone | |
| `ui` builds from `ui/Dockerfile` | `build.dockerfile == "ui/Dockerfile"` |
| `gateway` builds from `gateway/Dockerfile` | `build.dockerfile == "gateway/Dockerfile"` |
| `ui` exposes port 3000 | `"3000:3000"` in ports |
| `gateway` exposes port 4000 | `"4000:4000"` in ports |
| `gateway` has `DATABASE_URL` env var | |
| `gateway` has `STORE_MODEL_IN_DB` env var | |
| `gateway` health check configured | `healthcheck.test` non-empty |
| `ui` health check configured | `healthcheck.test` non-empty |
| `gateway` depends on `db` with `service_healthy` | `condition: service_healthy` |
| `db` service exists | |
| `db` health check configured | `healthcheck.test` non-empty |
| `prometheus` service exists | |
| `postgres_data` is declared | |
| `prometheus.yml` scrapes `gateway:4000` | `"gateway"` |
| `prometheus.yml` does not reference `litellm:4000` | |

All 17 tests pass (`uv run pytest tests/test_litellm/test_docker_compose.py -v`).

Migration note
Users who have a local `prometheus.yml` pointing at `litellm:4000` should update the target to `gateway:4000`. The documentation PR (BerriAI/litellm-docs#210) updates the quick-start guide accordingly.
https://claude.ai/code/session_01LoB2H4kqJM5cFS98gV8GPY
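
For anyone checking a local config against the migration note, a rough sketch of a stale-target scan follows. It works on the raw file text so it needs no YAML parser. The function name `stale_targets` and the sample config are hypothetical; a real `prometheus.yml` may be laid out differently.

```python
def stale_targets(config_text: str, old: str = "litellm:4000"):
    """Return 1-based line numbers that still mention the old scrape target.

    Lines that are entirely comments are ignored.
    """
    return [
        number
        for number, line in enumerate(config_text.splitlines(), start=1)
        if old in line and not line.lstrip().startswith("#")
    ]


# Hypothetical local config still pointing at the removed service name.
SAMPLE = "\n".join([
    "scrape_configs:",
    "  - job_name: litellm",
    "    static_configs:",
    '      - targets: ["litellm:4000"]',
])

assert stale_targets(SAMPLE) == [4]
# After applying the migration note, nothing is flagged.
assert stale_targets(SAMPLE.replace("litellm:4000", "gateway:4000")) == []
```

The job name itself is not flagged, only the `host:port` target, which is the part that stops resolving once the `litellm` service is renamed.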