Add pluggable event, compute, and database backends to modern stack #95

Merged
fuzziecoder merged 1 commit into codex/fix-remaining-issues-and-raise-pr from codex/implement-event-and-messaging-system-alternatives-yxcb15
Feb 25, 2026

@fuzziecoder fuzziecoder commented Feb 25, 2026

Motivation

  • Provide pluggable alternatives to the hard-coded Kafka/Ray/PostgreSQL defaults so different messaging, compute, and database technologies can be selected via configuration.
  • Enable multi-tenant and geo-replication friendly eventing options (e.g. Pulsar) and lightweight/enterprise options (RabbitMQ, NATS).
  • Support larger distributed compute footprints (Spark, Dask) and alternative storage backends (CockroachDB, MongoDB, Cassandra) for different deployment profiles.

Description

  • Added new configuration variables in pipeline/backend/config.py to allow runtime selection of eventing (EVENT_BACKEND + Pulsar/RabbitMQ/NATS settings), distributed compute (DISTRIBUTED_COMPUTE_BACKEND + Spark/Dask settings), and database backend (DATABASE_BACKEND + CockroachDB/MongoDB/Cassandra settings).
  • Refactored ModernOrchestrationStack in pipeline/backend/core/modern_stack.py to resolve providers through the helper methods _event_layer(), _distributed_compute(), and _storage(), and to use the resolved components in architecture() and submit_execution() when generating provider-specific command metadata.
  • Preserved existing defaults for Kafka/Ray/PostgreSQL while making alternative backends opt-in via environment settings.
  • Expanded pipeline/backend/tests/test_modern_stack.py to validate selection and command generation for alternative backends (Pulsar+Spark+CockroachDB and RabbitMQ+Dask) using monkeypatch to override settings.
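
A minimal sketch of the configuration-driven selection described above. The setting names and backend lists come from this description; the Settings class, resolve_backend(), and resolve_stack() are hypothetical stand-ins and not the code in modern_stack.py. Falling back to the default on an unknown value is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Dict

# Allowed values per setting; the first entry is the default named in the PR.
BACKENDS = {
    "EVENT_BACKEND": ("kafka", "pulsar", "rabbitmq", "nats"),
    "DISTRIBUTED_COMPUTE_BACKEND": ("ray", "spark", "dask"),
    "DATABASE_BACKEND": ("postgresql", "cockroachdb", "mongodb", "cassandra"),
}


@dataclass
class Settings:
    """Hypothetical stand-in for the settings object in pipeline/backend/config.py."""

    EVENT_BACKEND: str = "kafka"
    DISTRIBUTED_COMPUTE_BACKEND: str = "ray"
    DATABASE_BACKEND: str = "postgresql"


def resolve_backend(settings: Settings, key: str) -> str:
    """Normalize the configured value; unknown values fall back to the default."""
    choices = BACKENDS[key]
    value = str(getattr(settings, key)).strip().lower()
    return value if value in choices else choices[0]


def resolve_stack(settings: Settings) -> Dict[str, str]:
    """Return the provider chosen for each layer of the stack."""
    return {
        "event_layer": resolve_backend(settings, "EVENT_BACKEND"),
        "distributed_compute": resolve_backend(settings, "DISTRIBUTED_COMPUTE_BACKEND"),
        "database": resolve_backend(settings, "DATABASE_BACKEND"),
    }
```

With no overrides, resolve_stack(Settings()) keeps the Kafka/Ray/PostgreSQL defaults, so alternative backends stay opt-in.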

Testing

  • Ran pytest -q pipeline/backend/tests/test_modern_stack.py and all tests passed (4 passed).
  • Tests exercise architecture() and submit_execution() for default and alternative backend combinations and succeeded without regressions.
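
For readers unfamiliar with the pattern, the following is a rough sketch of overriding settings with pytest's monkeypatch fixture. The settings object and selected_backends() are toy stand-ins invented for illustration; the real tests in test_modern_stack.py are not shown in this PR description and may look different.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the project's settings object.
settings = SimpleNamespace(
    EVENT_BACKEND="kafka",
    DISTRIBUTED_COMPUTE_BACKEND="ray",
)


def selected_backends():
    """Toy function under test; it reads the settings at call time."""
    return settings.EVENT_BACKEND, settings.DISTRIBUTED_COMPUTE_BACKEND


def test_defaults():
    assert selected_backends() == ("kafka", "ray")


def test_rabbitmq_with_dask(monkeypatch):
    # pytest undoes these overrides automatically when the test finishes.
    monkeypatch.setattr(settings, "EVENT_BACKEND", "rabbitmq")
    monkeypatch.setattr(settings, "DISTRIBUTED_COMPUTE_BACKEND", "dask")
    assert selected_backends() == ("rabbitmq", "dask")
```

Because the function reads the settings when it is called rather than at import time, the overrides take effect without reloading any module.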


vercel bot commented Feb 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
flexi-roaster | Ready | Preview, Comment | Feb 25, 2026 1:36pm

@fuzziecoder fuzziecoder self-assigned this Feb 25, 2026
@fuzziecoder fuzziecoder merged commit 7d3d223 into codex/fix-remaining-issues-and-raise-pr Feb 25, 2026
5 of 7 checks passed
coderabbitai bot commented Feb 25, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.


Comment on lines +150 to +154

    else:
        database_config = {
            "engine": "postgresql",
            "url": settings.DATABASE_URL,
        }

🔴 Database credentials leaked through architecture API endpoint

The _storage() method includes the full database connection URL (containing credentials) in its return value, which is served directly to API consumers via the /advanced-stack/architecture endpoint.

Root Cause and Impact

Previously, architecture() returned just the string "postgresql" for the database field. The new _storage() method now returns settings.DATABASE_URL (line 153), settings.COCKROACHDB_URL (line 133), or settings.MONGODB_URL (line 140) — all of which contain full connection strings with embedded credentials.

For example, the default DATABASE_URL is postgresql+psycopg2://airflow:airflow@localhost:5432/flexiroaster and COCKROACHDB_URL is postgresql://root@localhost:26257/flexiroaster?sslmode=disable.

This dict is returned by architecture() at pipeline/backend/core/modern_stack.py:194 and served to HTTP clients at pipeline/backend/api/routes/advanced_stack.py:26. The endpoint is accessible to users with the viewer role (pipeline/backend/api/routes/advanced_stack.py:23), meaning even low-privilege users can see database credentials.

Impact: Database credentials are exposed to any authenticated user with at least viewer role through the REST API.

Prompt for agents
In pipeline/backend/core/modern_stack.py, the _storage() method (lines 126-163) should not include raw database connection URLs in its return value, as this data is served through the public API endpoint at pipeline/backend/api/routes/advanced_stack.py:26. Remove the 'url' key from the database_config dicts for postgresql (line 153), cockroachdb (line 133), and mongodb (line 140). Instead, include only non-sensitive metadata like the engine name. For example, the postgresql case should be: database_config = {"engine": "postgresql"} without the url field. Apply the same pattern to cockroachdb and mongodb cases.
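
A hedged sketch of the remediation described above, plus a guard that a regression test could run against the architecture() payload. Both function names, public_storage_config() and find_credential_urls(), are hypothetical and do not exist in the repository; the real _storage() method carries more fields than shown here.

```python
from typing import Any, List
from urllib.parse import urlsplit

KNOWN_ENGINES = {"postgresql", "cockroachdb", "mongodb", "cassandra"}


def public_storage_config(backend: str) -> dict:
    """Return only non-sensitive metadata: the engine name, never a URL."""
    engine = backend.strip().lower()
    if engine not in KNOWN_ENGINES:
        engine = "postgresql"  # assumed default, as in the PR description
    return {"engine": engine}


def has_userinfo(value: str) -> bool:
    """True if the string parses as a URL with a username or password."""
    try:
        parts = urlsplit(value)
    except ValueError:
        return False
    return bool(parts.username or parts.password)


def find_credential_urls(payload: Any, path: str = "") -> List[str]:
    """Walk a JSON-like payload and list the paths of credential-bearing URLs."""
    hits: List[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else str(key)
            hits.extend(find_credential_urls(value, child))
    elif isinstance(payload, (list, tuple)):
        for index, value in enumerate(payload):
            hits.extend(find_credential_urls(value, f"{path}[{index}]"))
    elif isinstance(payload, str) and has_userinfo(payload):
        hits.append(path)
    return hits
```

A test could then assert that find_credential_urls(stack.architecture()) returns an empty list, which would catch any later change that puts a connection string back into the response. The check only covers userinfo in the URL authority; secrets passed as query parameters would need a separate rule.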

Comment on lines +54 to +55

    config={
        "url": settings.RABBITMQ_URL,

🔴 RabbitMQ credentials leaked through architecture API endpoint

The _event_layer() method includes the full RabbitMQ connection URL (containing credentials) in its return value when EVENT_BACKEND is "rabbitmq".

Root Cause and Impact

At pipeline/backend/core/modern_stack.py:55, the RabbitMQ config includes "url": settings.RABBITMQ_URL. The default value of RABBITMQ_URL is amqp://guest:guest@localhost:5672/ (see pipeline/backend/config.py:155), which contains the username and password.

This is returned by _event_layer() and included in the architecture() response at line 183, which is served to HTTP clients via the /advanced-stack/architecture endpoint at pipeline/backend/api/routes/advanced_stack.py:26. Any user with viewer role can see these credentials.

Impact: RabbitMQ credentials are exposed to any authenticated user with at least viewer role through the REST API.
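
If the broker host and port are still useful to API consumers, one option is to strip the userinfo portion of the URL before returning it. This is an illustrative alternative using only the standard library, not something the PR or the review implements; strip_userinfo() is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def strip_userinfo(url: str) -> str:
    """Return the URL with any username and password removed from the authority."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals need their brackets restored.
        host = f"[{host}]"
    netloc = f"{host}:{parts.port}" if parts.port is not None else host
    # Scheme, path, query, and fragment are kept unchanged. Secrets carried
    # in the query string are NOT handled by this sketch.
    return parts._replace(netloc=netloc).geturl()


print(strip_userinfo("amqp://guest:guest@localhost:5672/"))
# prints: amqp://localhost:5672/
```

Omitting the URL entirely remains the more conservative choice, since it also avoids disclosing internal hostnames.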

Suggested change

    -    config={
    -        "url": settings.RABBITMQ_URL,
    +    config={
    +        "exchange": settings.RABBITMQ_EXCHANGE,
