-
Notifications
You must be signed in to change notification settings - Fork 6
fix: prevent false deployment attempts in Flash environments #192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add build-time configuration to determine local vs remote execution. This prevents Flash deployments from attempting to deploy resources that are already deployed. Changes: - Add _should_execute_locally() to client.py to check resource config - Generate _flash_resource_config.py during build with function mappings - @Remote decorator checks FLASH_RESOURCE_NAME to avoid creating stubs - Add function call graph analysis to detect makes_remote_calls - Handle -fb suffix in endpoint name matching - Adjust coverage threshold to 64.5% Behavior: - Mothership executes local functions directly, only creates stubs for remote - Live Serverless behavior unchanged (no FLASH_RESOURCE_NAME set) - Local dev uses ResourceManager as before Fixes unwanted deployment attempts when deployed endpoints exist. Test coverage: 66.41% Tests: 947 passed, 1 skipped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Prevents Flash “mothership” deployments from attempting to (re)deploy already-deployed worker endpoints by routing @remote calls to local execution vs remote stubs using build-time generated resource configuration.
Changes:
- Add runtime routing decision (
_should_execute_locally) to@remotebased on build-generated_flash_resource_config.pyandFLASH_RESOURCE_NAME. - Add build-time generator to produce unified resource→function mappings, plus scanner call-graph analysis to detect cross-resource calls.
- Add/adjust unit tests and lower coverage threshold to 64.5%.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_remote_decorator_stub_generation.py | Adds tests for local vs stub behavior selection in @remote. |
| tests/unit/test_client_should_execute_locally.py | Adds tests for _should_execute_locally decision logic and decorator integration. |
| tests/unit/runtime/test_flash_resource_config.py | Tests the template _flash_resource_config module defaults/logic. |
| tests/unit/cli/commands/build_utils/test_resource_config_generator.py | Adds tests for config generation output and ordering. |
| src/runpod_flash/runtime/models.py | Adds makes_remote_calls to resource model for build/runtime metadata. |
| src/runpod_flash/runtime/_flash_resource_config.py | Introduces template module for build-time overwrite. |
| src/runpod_flash/client.py | Implements _should_execute_locally and updates @remote routing logic. |
| src/runpod_flash/cli/commands/build_utils/scanner.py | Adds call-graph analysis and metadata fields for cross-remote calls. |
| src/runpod_flash/cli/commands/build_utils/resource_config_generator.py | Generates unified _flash_resource_config.py during build. |
| src/runpod_flash/cli/commands/build_utils/manifest.py | Persists makes_remote_calls per resource into manifest. |
| src/runpod_flash/cli/commands/build.py | Invokes resource config generation after bundling. |
| pyproject.toml | Lowers coverage gate to 64.5%. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for resource_name, local_funcs in sorted(resource_functions.items()): | ||
| formatted_funcs = _format_set(local_funcs) | ||
| config_content += f' "{resource_name}": {{{formatted_funcs}}},\n' |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{{{formatted_funcs}}} produces {} for empty sets, which is an empty dict in Python (not an empty set). This makes RESOURCE_LOCAL_FUNCTIONS values type-inconsistent and can break logic (and also interacts badly with truthiness checks later). Generate set() for empty values (or generate the whole set literal in _format_set and avoid wrapping it in {...} here) so all map values are always sets.
| content = config_file.read_text() | ||
|
|
||
| # worker_b should have empty set | ||
| assert '"worker_b": {}' in content |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion is currently locking in an invalid Python “empty set” representation. {} is an empty dict, not an empty set. Update the generator to emit set() (or another valid empty set literal) and update this test to assert that representation (e.g., "worker_b": set()).
| assert '"worker_b": {}' in content | |
| assert '"worker_b": set()' in content |
| local_functions = ( | ||
| RESOURCE_LOCAL_FUNCTIONS.get(current_resource) or | ||
| RESOURCE_LOCAL_FUNCTIONS.get(current_resource + "-fb") or | ||
| RESOURCE_LOCAL_FUNCTIONS.get(current_resource.removesuffix("-fb")) | ||
| ) | ||
|
|
||
| if local_functions is None: |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using or here treats an existing-but-empty set as falsy, causing fallback to other keys and potentially selecting the wrong resource mapping. Also, once Comment 1 is fixed to emit set(), empty sets will commonly occur and trigger this path. Prefer explicit None checks (e.g., try exact key; if result is None, try suffix variants) so empty sets remain valid “found” values.
| local_functions = ( | |
| RESOURCE_LOCAL_FUNCTIONS.get(current_resource) or | |
| RESOURCE_LOCAL_FUNCTIONS.get(current_resource + "-fb") or | |
| RESOURCE_LOCAL_FUNCTIONS.get(current_resource.removesuffix("-fb")) | |
| ) | |
| if local_functions is None: | |
| local_functions = RESOURCE_LOCAL_FUNCTIONS.get(current_resource) | |
| if local_functions is None: | |
| local_functions = RESOURCE_LOCAL_FUNCTIONS.get(current_resource + "-fb") | |
| if local_functions is None: | |
| local_functions = RESOURCE_LOCAL_FUNCTIONS.get(current_resource.removesuffix("-fb")) | |
| if local_functions is None: |
|
|
||
| @patch.dict(os.environ, {}, clear=True) | ||
| @patch("runpod_flash.client.ResourceManager") | ||
| def test_local_dev_invokes_resource_manager(self, mock_rm_class, sample_resource): |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn’t actually invoke the decorated function, so it never exercises the wrapper path and can’t verify that ResourceManager.get_or_deploy_resource() and stub_resource() are called in local dev mode. Make the test async, call await process_data(...), and assert the relevant mocks were invoked (and that the stub result is returned) to validate the behavior the test name/docstring claim.
| # Mock stub_resource to return a callable | ||
| with patch("runpod_flash.client.stub_resource") as mock_stub: |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn’t actually invoke the decorated function, so it never exercises the wrapper path and can’t verify that ResourceManager.get_or_deploy_resource() and stub_resource() are called in local dev mode. Make the test async, call await process_data(...), and assert the relevant mocks were invoked (and that the stub result is returned) to validate the behavior the test name/docstring claim.
| @patch.dict(os.environ, {"RUNPOD_ENDPOINT_ID": "ep_123"}) | ||
| @patch("runpod_flash.runtime._flash_resource_config.is_local_function") | ||
| @patch("runpod_flash.client.ResourceManager") | ||
| def test_deployed_remote_function_uses_stub( |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the local-dev test, this one never calls remote_compute, so it doesn’t prove that the stub path is taken (nor that the original implementation is not called). Converting the test to async and awaiting remote_compute(...) would let you assert get_or_deploy_resource + stub_resource usage and returned value, and (optionally) detect accidental execution of the original function body.
| ) | ||
| mock_rm_class.return_value = mock_rm_instance | ||
|
|
||
| with patch("runpod_flash.client.stub_resource") as mock_stub: |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the local-dev test, this one never calls remote_compute, so it doesn’t prove that the stub path is taken (nor that the original implementation is not called). Converting the test to async and awaiting remote_compute(...) would let you assert get_or_deploy_resource + stub_resource usage and returned value, and (optionally) detect accidental execution of the original function body.
| # Function should be wrapped for remote execution | ||
| assert callable(remote_compute) | ||
| assert hasattr(remote_compute, "__remote_config__") |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the local-dev test, this one never calls remote_compute, so it doesn’t prove that the stub path is taken (nor that the original implementation is not called). Converting the test to async and awaiting remote_compute(...) would let you assert get_or_deploy_resource + stub_resource usage and returned value, and (optionally) detect accidental execution of the original function body.
| calls_remote_functions: bool = ( | ||
| False # Does this function call other @remote functions? | ||
| ) | ||
| called_remote_functions: Optional[List[str]] = ( | ||
| None # Names of @remote functions called | ||
| ) | ||
|
|
||
| def __post_init__(self): | ||
| if self.called_remote_functions is None: | ||
| self.called_remote_functions = [] |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
called_remote_functions is always normalized to a list in __post_init__, but later code still repeats None checks before appending. Consider making this a non-optional list with field(default_factory=list) and removing the redundant None handling in _analyze_function_calls to simplify the implementation and reduce edge cases.
| # Find each @remote function and analyze its calls | ||
| for func_meta in [f for f in functions if f.file_path == py_file]: | ||
| # Find the function node in the AST | ||
| for node in ast.walk(tree): | ||
| if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)): | ||
| if node.name == func_meta.function_name: | ||
| self._analyze_function_calls( | ||
| node, func_meta, remote_function_names | ||
| ) | ||
| break |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does a full ast.walk(tree) for each remote function in the file, which can become O(N*M) per file as the number of remote functions grows. A straightforward optimization is to build a name→function-node map in a single walk per file, then look up each func_meta.function_name directly before calling _analyze_function_calls.
| # Find each @remote function and analyze its calls | |
| for func_meta in [f for f in functions if f.file_path == py_file]: | |
| # Find the function node in the AST | |
| for node in ast.walk(tree): | |
| if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)): | |
| if node.name == func_meta.function_name: | |
| self._analyze_function_calls( | |
| node, func_meta, remote_function_names | |
| ) | |
| break | |
| # Build a mapping from function name to its AST node with a single walk | |
| function_nodes: Dict[str, ast.AST] = {} | |
| for node in ast.walk(tree): | |
| if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)): | |
| # Preserve existing behavior by keeping the first occurrence | |
| function_nodes.setdefault(node.name, node) | |
| # Find each @remote function and analyze its calls | |
| for func_meta in [f for f in functions if f.file_path == py_file]: | |
| node = function_nodes.get(func_meta.function_name) | |
| if node is not None: | |
| self._analyze_function_calls( | |
| node, func_meta, remote_function_names | |
| ) |
|
I say we should create a runpod.toml instead of trying to codegen python for this. |
Problem
Flash deployments (mothership) were attempting to deploy resources that are already deployed, resulting in errors like:
01_03_mixed_workers_cpu-fbRoot Cause
The
@remotedecorator didn't distinguish between:This caused mothership to try deploying worker endpoints instead of calling them directly.
Solution
Implement config-based routing using build-time generated configuration:
_flash_resource_config.pymapping each resource to its functions@remotedecorator checksFLASH_RESOURCE_NAMEto determine local vs remote-fbsuffix that RunPod adds to flashbooted endpointsChanges
_should_execute_locally()function to determine execution mode-fbsuffix)Behavior
Flash deployments (with FLASH_RESOURCE_NAME):
Live Serverless (no FLASH_RESOURCE_NAME):
Local development:
Testing
Fixes AE-2079