Add pluggable event, compute, and database backends to modern stack #95
7d3d223 into codex/fix-remaining-issues-and-raise-pr
```python
else:
    database_config = {
        "engine": "postgresql",
        "url": settings.DATABASE_URL,
    }
```
🔴 Database credentials leaked through architecture API endpoint
The _storage() method includes the full database connection URL (containing credentials) in its return value, which is served directly to API consumers via the /advanced-stack/architecture endpoint.
Root Cause and Impact
Previously, architecture() returned just the string "postgresql" for the database field. The new _storage() method now returns settings.DATABASE_URL (line 153), settings.COCKROACHDB_URL (line 133), or settings.MONGODB_URL (line 140), all of which are full connection strings with embedded credentials.
For example, the default DATABASE_URL is postgresql+psycopg2://airflow:airflow@localhost:5432/flexiroaster and COCKROACHDB_URL is postgresql://root@localhost:26257/flexiroaster?sslmode=disable.
This dict is returned by architecture() at pipeline/backend/core/modern_stack.py:194 and served to HTTP clients at pipeline/backend/api/routes/advanced_stack.py:26. The endpoint is accessible to users with the viewer role (pipeline/backend/api/routes/advanced_stack.py:23), meaning even low-privilege users can see database credentials.
Impact: Database credentials are exposed to any authenticated user with at least viewer role through the REST API.
Prompt for agents
In pipeline/backend/core/modern_stack.py, the _storage() method (lines 126-163) should not include raw database connection URLs in its return value, as this data is served through the public API endpoint at pipeline/backend/api/routes/advanced_stack.py:26. Remove the 'url' key from the database_config dicts for postgresql (line 153), cockroachdb (line 133), and mongodb (line 140). Instead, include only non-sensitive metadata like the engine name. For example, the postgresql case should be: database_config = {"engine": "postgresql"} without the url field. Apply the same pattern to cockroachdb and mongodb cases.
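The prompt's fix can be sketched as follows. This is a minimal illustration under stated assumptions: `StackSettings` and `storage_metadata()` are hypothetical stand-ins for the real `settings` object and `ModernOrchestrationStack._storage()`, whose full bodies are not shown in this review. The two default URLs are the ones quoted above, and the unknown-backend fallback mirrors the `else:` branch in the excerpt.

```python
from dataclasses import dataclass
from typing import Dict

# Database backends named in the PR description; anything else falls back to postgresql.
KNOWN_ENGINES = {"postgresql", "cockroachdb", "mongodb", "cassandra"}


@dataclass
class StackSettings:
    # Hypothetical stand-in for the settings in pipeline/backend/config.py.
    DATABASE_BACKEND: str = "postgresql"
    DATABASE_URL: str = "postgresql+psycopg2://airflow:airflow@localhost:5432/flexiroaster"
    COCKROACHDB_URL: str = "postgresql://root@localhost:26257/flexiroaster?sslmode=disable"


def storage_metadata(settings: StackSettings) -> Dict[str, str]:
    """Describe the configured database by engine name only.

    No *_URL setting is read, so connection strings (and the credentials
    embedded in them) never end up in the returned dict.
    """
    backend = settings.DATABASE_BACKEND.strip().lower()
    # Mirror the original else: branch by defaulting to postgresql.
    engine = backend if backend in KNOWN_ENGINES else "postgresql"
    return {"engine": engine}
```

Because the function never reads a `*_URL` setting, credentials cannot reach `architecture()` through this path, however the URLs are configured.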
```python
config={
    "url": settings.RABBITMQ_URL,
```
🔴 RabbitMQ credentials leaked through architecture API endpoint
The _event_layer() method includes the full RabbitMQ connection URL (containing credentials) in its return value when EVENT_BACKEND is "rabbitmq".
Root Cause and Impact
At pipeline/backend/core/modern_stack.py:55, the RabbitMQ config includes "url": settings.RABBITMQ_URL. The default value of RABBITMQ_URL is amqp://guest:guest@localhost:5672/ (see pipeline/backend/config.py:155), which contains the username and password.
This is returned by _event_layer() and included in the architecture() response at line 183, which is served to HTTP clients via the /advanced-stack/architecture endpoint at pipeline/backend/api/routes/advanced_stack.py:26. Any user with viewer role can see these credentials.
Impact: RabbitMQ credentials are exposed to any authenticated user with at least viewer role through the REST API.
```diff
-config={
-    "url": settings.RABBITMQ_URL,
+config={
+    "exchange": settings.RABBITMQ_EXCHANGE,
```
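If some connection detail (host, port, database name) is still useful to API consumers, one option is to strip the userinfo part of the URL before returning it. The sketch below uses only the standard library's `urllib.parse`; `redact_url()` is a hypothetical helper, not something in this codebase, and the review's own recommendation of returning no URL at all remains the simpler choice.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Return url with any username/password removed from its authority part."""
    parts = urlsplit(url)
    # Nothing to redact: return the input unchanged.
    if parts.username is None and parts.password is None:
        return url
    host = parts.hostname or ""
    # .hostname strips brackets from IPv6 literals; put them back.
    if ":" in host:
        host = f"[{host}]"
    netloc = f"{host}:{parts.port}" if parts.port is not None else host
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Example with the default RABBITMQ_URL quoted in the review.
print(redact_url("amqp://guest:guest@localhost:5672/"))
```

Query parameters can also carry secrets (libpq-style PostgreSQL URIs, for instance, accept a `password` parameter), and `redact_url()` leaves the query untouched, so engine-only metadata stays the safer default for a viewer-accessible endpoint.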
Motivation
Description
- `pipeline/backend/config.py`: allow runtime selection of eventing (`EVENT_BACKEND` + Pulsar/RabbitMQ/NATS settings), distributed compute (`DISTRIBUTED_COMPUTE_BACKEND` + Spark/Dask settings), and database backend (`DATABASE_BACKEND` + CockroachDB/MongoDB/Cassandra settings).
- `ModernOrchestrationStack` in `pipeline/backend/core/modern_stack.py`: resolve providers through the helper methods `._event_layer()`, `._distributed_compute()`, and `._storage()`, and use those resolved components in `architecture()` and `submit_execution()` to generate provider-specific command metadata.
- `pipeline/backend/tests/test_modern_stack.py`: validate selection and command generation for alternative backends (Pulsar + Spark + CockroachDB and RabbitMQ + Dask), using `monkeypatch` to override settings.
Testing
- `pytest -q pipeline/backend/tests/test_modern_stack.py`: all tests passed (4 passed).
- Calls to `architecture()` and `submit_execution()` for default and alternative backend combinations succeeded without regressions.
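The settings-driven selection and command generation described above can be sketched as one self-contained module. Everything here is hypothetical: the `settings` namespace, its placeholder values (not the PR's real defaults), `run_pipeline.py`, and its `--pipeline-id` flag are assumptions, and the real `ModernOrchestrationStack` resolves providers through `._event_layer()`, `._distributed_compute()`, and `._storage()` rather than these free functions. The PR's tests override settings with pytest's `monkeypatch`; to stay dependency-free, this sketch swaps in `unittest.mock.patch.multiple`, which likewise restores the original values afterwards.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Set
from unittest import mock

# Provider families listed in the PR description.
EVENT_BACKENDS: Set[str] = {"pulsar", "rabbitmq", "nats"}
COMPUTE_BACKENDS: Set[str] = {"spark", "dask"}
DATABASE_BACKENDS: Set[str] = {"postgresql", "cockroachdb", "mongodb", "cassandra"}

# Hypothetical stand-in for the settings object; values are placeholders.
settings = SimpleNamespace(
    EVENT_BACKEND="pulsar",
    DISTRIBUTED_COMPUTE_BACKEND="spark",
    DATABASE_BACKEND="cockroachdb",
)


def _resolve(name: str, supported: Set[str]) -> str:
    """Read a *_BACKEND setting, normalise it, and reject unknown providers."""
    value = str(getattr(settings, name)).strip().lower()
    if value not in supported:
        raise ValueError(f"unsupported {name}: {value!r}")
    return value


def architecture() -> Dict[str, str]:
    """Report the selected providers by name only (no connection URLs)."""
    return {
        "events": _resolve("EVENT_BACKEND", EVENT_BACKENDS),
        "compute": _resolve("DISTRIBUTED_COMPUTE_BACKEND", COMPUTE_BACKENDS),
        "database": _resolve("DATABASE_BACKEND", DATABASE_BACKENDS),
    }


def submit_execution(pipeline_id: str) -> Dict[str, Any]:
    """Build provider-specific command metadata for the selected compute backend."""
    compute = _resolve("DISTRIBUTED_COMPUTE_BACKEND", COMPUTE_BACKENDS)
    # run_pipeline.py and its --pipeline-id flag are hypothetical placeholders.
    if compute == "spark":
        command: List[str] = ["spark-submit", "--name", pipeline_id, "run_pipeline.py"]
    else:
        command = ["python", "run_pipeline.py", "--pipeline-id", pipeline_id]
    return {"pipeline_id": pipeline_id, "compute": compute, "command": command}


def test_rabbitmq_dask_selection() -> None:
    # Stdlib stand-in for pytest's monkeypatch.setattr on the settings object.
    with mock.patch.multiple(settings, EVENT_BACKEND="rabbitmq", DISTRIBUTED_COMPUTE_BACKEND="dask"):
        assert architecture()["events"] == "rabbitmq"
        assert submit_execution("demo")["command"][0] == "python"
    # Patched values are restored when the context manager exits.
    assert architecture()["compute"] == "spark"
```

One design choice worth noting: `_resolve()` raises on an unrecognised provider, whereas the excerpted `_storage()` falls back to PostgreSQL in its `else:` branch. Failing fast surfaces a misspelled `*_BACKEND` value immediately; falling back keeps the service running on a known default.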