Add pluggable event, compute, and database backends to modern stack #94

Merged
fuzziecoder merged 2 commits into codex/fix-remaining-issues-and-raise-pr from codex/implement-event-and-messaging-system-alternatives
Feb 25, 2026
Conversation

@fuzziecoder (Owner) commented Feb 25, 2026

Motivation

  • Provide pluggable alternatives to the hard-coded Kafka/Ray/PostgreSQL defaults so different messaging, compute, and database technologies can be selected via configuration.
  • Enable multi-tenant and geo-replication friendly eventing options (e.g. Pulsar) and lightweight/enterprise options (RabbitMQ, NATS).
  • Support larger distributed compute footprints (Spark, Dask) and alternative storage backends (CockroachDB, MongoDB, Cassandra) for different deployment profiles.

Description

  • Added new configuration variables in pipeline/backend/config.py to allow runtime selection of eventing (EVENT_BACKEND + Pulsar/RabbitMQ/NATS settings), distributed compute (DISTRIBUTED_COMPUTE_BACKEND + Spark/Dask settings), and database backend (DATABASE_BACKEND + CockroachDB/MongoDB/Cassandra settings).
  • Refactored ModernOrchestrationStack in pipeline/backend/core/modern_stack.py to resolve providers through helper methods ._event_layer(), ._distributed_compute(), and ._storage() and to use those resolved components in architecture() and submit_execution() to generate provider-specific command metadata.
  • Preserved existing defaults for Kafka/Ray/PostgreSQL while making alternative backends opt-in via environment settings.
  • Expanded pipeline/backend/tests/test_modern_stack.py to validate selection and command generation for alternative backends (Pulsar+Spark+CockroachDB and RabbitMQ+Dask) using monkeypatch to override settings.
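
The configuration-driven selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in modern_stack.py: the `Settings` dataclass, the `resolve_event_layer` function, the tuple of supported names, and the error behaviour are assumptions made for the example. Only the `EVENT_BACKEND` variable name, the Kafka default, and the backend names come from the description.

```python
import os
from dataclasses import dataclass, field

# Hypothetical list of supported names, taken from the backends the PR mentions.
SUPPORTED_EVENT_BACKENDS = ("kafka", "pulsar", "rabbitmq", "nats")


@dataclass
class Settings:
    # EVENT_BACKEND is named in the PR; reading it from the environment with a
    # "kafka" fallback is an assumption about how config.py behaves.
    EVENT_BACKEND: str = field(
        default_factory=lambda: os.environ.get("EVENT_BACKEND", "kafka")
    )


def resolve_event_layer(settings: Settings) -> dict:
    """Return a provider descriptor for the configured event backend."""
    name = settings.EVENT_BACKEND.strip().lower()
    if name not in SUPPORTED_EVENT_BACKENDS:
        # Failing loudly on unknown names is a design choice of this sketch.
        raise ValueError(f"Unsupported EVENT_BACKEND: {name!r}")
    return {"name": name, "enabled": True}
```

Keeping the default in one place means existing deployments that never set `EVENT_BACKEND` continue to resolve to Kafka, which matches the opt-in behaviour the description promises.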

Testing

  • Ran pytest -q pipeline/backend/tests/test_modern_stack.py and all tests passed (4 passed).
  • Tests exercise architecture() and submit_execution() for default and alternative backend combinations and succeeded without regressions.
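
A test that overrides settings with pytest's `monkeypatch` fixture might look roughly like this. The `settings` object and the `selected_backends` helper are hypothetical stand-ins for the project's real settings module and for `architecture()`; only the setting names `EVENT_BACKEND` and `DISTRIBUTED_COMPUTE_BACKEND` come from the PR.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the project's settings object, with the documented defaults.
settings = SimpleNamespace(EVENT_BACKEND="kafka", DISTRIBUTED_COMPUTE_BACKEND="ray")


def selected_backends() -> dict:
    """Hypothetical helper that reports which backends are currently configured."""
    return {
        "event_layer": settings.EVENT_BACKEND,
        "distributed_compute": settings.DISTRIBUTED_COMPUTE_BACKEND,
    }


def test_rabbitmq_and_dask_selected(monkeypatch):
    # monkeypatch.setattr restores the original values when the test finishes,
    # so the override cannot leak into other tests.
    monkeypatch.setattr(settings, "EVENT_BACKEND", "rabbitmq")
    monkeypatch.setattr(settings, "DISTRIBUTED_COMPUTE_BACKEND", "dask")

    assert selected_backends() == {
        "event_layer": "rabbitmq",
        "distributed_compute": "dask",
    }
```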

coderabbitai bot commented Feb 25, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

vercel bot commented Feb 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| flexi-roaster | Ready | Preview, Comment | Feb 25, 2026 2:03pm |

@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

…lement-event-and-messaging-system-alternatives
@fuzziecoder fuzziecoder merged commit fc3287c into codex/fix-remaining-issues-and-raise-pr Feb 25, 2026
5 of 7 checks passed
@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

🐛 2 issues in files not directly in the diff

🐛 Duplicate dict keys in architecture() silently discard first distributed_compute and storage entries (pipeline/backend/core/modern_stack.py:195-214)

The architecture() method defines "distributed_compute" twice (lines 195 and 213) and "storage" twice (lines 204 and 214) in the same dict literal. In Python, duplicate keys in a dict literal cause the first value to be silently overwritten by the last one.
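
The overwrite behaviour is easy to reproduce in isolation. The sketch below is not the project's code: `build_architecture` is a hypothetical function, and the values are shortened placeholders modelled on the entries the review describes.

```python
def build_architecture() -> dict:
    """Dict literal with repeated keys, mirroring the pattern the review flags."""
    return {
        # First definitions: the old hardcoded entries.
        "distributed_compute": {
            "name": "ray",
            "config": {"alternatives": ["spark", "dask"]},
        },
        "storage": {
            "database_alternatives": ["cockroachdb", "mongodb", "cassandra"],
        },
        # Second definitions: Python keeps these and drops the ones above
        # without raising an error.
        "distributed_compute": {
            "name": "ray",
            "enabled": True,
            "config": {"dashboard_url": "<placeholder>", "entrypoint": "<placeholder>"},
        },
        "storage": {"database": {}, "object_storage": {}},
    }


architecture = build_architecture()
# Only two keys survive, and the "alternatives" entries are gone.
print(sorted(architecture))
print("alternatives" in architecture["distributed_compute"]["config"])
```

Because the interpreter accepts such a literal, this class of bug usually has to be caught by a linter or by a test that reads the discarded keys.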

Root Cause and Impact

The first "distributed_compute" entry (lines 195-203) contains a hardcoded Ray config with config.alternatives: ["spark", "dask"], and the first "storage" entry (lines 204-212) contains database_alternatives: ["cockroachdb", "mongodb", "cassandra"]. These are silently overwritten by the second definitions at lines 213-214 which use self._distributed_compute() and self._storage() respectively.

The pluggable self._distributed_compute() returns a dict like {"name": "ray", "enabled": True, "config": {"dashboard_url": ..., "entrypoint": ...}} — it does NOT contain config.alternatives.

Similarly, self._storage() returns {"database": {...}, "object_storage": {...}} — it does NOT contain database_alternatives.

This means test_architecture_contains_requested_layers at pipeline/backend/tests/test_modern_stack.py:25-26 would fail at runtime:

  • architecture["distributed_compute"]["config"]["alternatives"] raises KeyError because the pluggable compute dict doesn't have an alternatives key
  • architecture["storage"]["database_alternatives"] raises KeyError because self._storage() doesn't return a database_alternatives key

The old hardcoded entries should have been removed when the pluggable self._distributed_compute() and self._storage() calls were added.


🐛 Duplicate compute and event commands emitted in submit_execution() due to old hardcoded logic not being removed (pipeline/backend/core/modern_stack.py:250-276)

submit_execution() appends duplicate distributed_compute and event_layer commands because old hardcoded logic (lines 250-276) was not removed when the new pluggable logic (lines 277-308) was added.

Detailed Explanation

Duplicate compute commands: Lines 250-266 always append a hardcoded compute command — either {"engine": "ray", "action": "submit_ray_job"} if settings.RAY_ENABLED is true, or {"engine": "spark", "action": "submit_spark_job"} otherwise. Then lines 277-290 call self._distributed_compute() and append another compute command based on the DISTRIBUTED_COMPUTE_BACKEND setting. With default settings (RAY_ENABLED=True, DISTRIBUTED_COMPUTE_BACKEND="ray"), the commands list will contain two nearly identical ray compute commands.

Worse, when the user configures DISTRIBUTED_COMPUTE_BACKEND="dask" but RAY_ENABLED=True (the default), the old code still emits a ray command while the new code correctly emits a dask command — resulting in contradictory commands.

Duplicate event commands: Lines 268-276 append a hardcoded kafka event command when settings.KAFKA_ENABLED is true. Then lines 292-308 call self._event_layer() and append another event command based on EVENT_BACKEND. With KAFKA_ENABLED=True and EVENT_BACKEND="kafka" (the default), two kafka event commands are emitted.

The old hardcoded blocks at lines 250-276 should have been removed when the pluggable blocks at lines 277-308 were added.
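
One way to avoid the duplication is to derive each command from a single setting and to ignore the legacy flags entirely. The following is a sketch under stated assumptions, not the project's `submit_execution()`: the `build_commands` function, the `layer` field, the `publish_event` action, and the `submit_<backend>_job` naming for Dask are hypothetical. Only the setting names and the `submit_ray_job` / `submit_spark_job` actions appear in the review.

```python
from types import SimpleNamespace


def build_commands(settings) -> list:
    """Emit exactly one compute command and one event command per submission."""
    commands = []

    # Single source of truth for compute: DISTRIBUTED_COMPUTE_BACKEND.
    # RAY_ENABLED is deliberately not consulted here.
    compute = settings.DISTRIBUTED_COMPUTE_BACKEND
    commands.append(
        {
            "layer": "distributed_compute",
            "engine": compute,
            "action": f"submit_{compute}_job",
        }
    )

    # Single source of truth for eventing: EVENT_BACKEND.
    # KAFKA_ENABLED is deliberately not consulted here.
    event = settings.EVENT_BACKEND
    commands.append({"layer": "event_layer", "engine": event, "action": "publish_event"})

    return commands


# The contradictory case from the review: Dask selected while RAY_ENABLED is still True.
example = SimpleNamespace(
    DISTRIBUTED_COMPUTE_BACKEND="dask",
    EVENT_BACKEND="kafka",
    RAY_ENABLED=True,
    KAFKA_ENABLED=True,
)
print([c["engine"] for c in build_commands(example)])
```

With the legacy blocks gone, the Dask case yields one Dask command and one Kafka command instead of a mix of Ray and Dask commands.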

View 9 additional findings in Devin Review.
