Conversation
…to test-refactor
|
@microsoft-github-policy-service agree company="Microsoft" |
yaml.safe_load_all() enters an infinite loop when passed a MagicMock object because PyYAML detects the .read attribute and treats it as a file-like stream, then loops forever waiting to buffer enough bytes (len(MagicMock()) returns 0 by default). Fix by setting create_template.return_value to a valid YAML string in the three create_deployment tests, so yaml.safe_load_all receives a real string and parses it via the non-blocking code path. Affected tests: - test_create_deployment_success - test_create_deployment_failure - test_create_deployment_partial_success
begin_create_or_update() returns an LROPoller that was being discarded, allowing execution to continue while Azure still had an operation in-progress. Subsequent scale/delete calls were then rejected with OperationNotAllowed. Fix by calling poller.result() in scale_node_pool and _progressive_scale to block until Azure fully completes each operation before proceeding.
| ) | ||
| node_pool.count = step # Update node count in the node pool object | ||
| result = self.aks_client.agent_pools.begin_create_or_update( | ||
| poller = self.aks_client.agent_pools.begin_create_or_update( |
There was a problem hiding this comment.
Why did we add poller here ? We do not need it
There was a problem hiding this comment.
while running the pipeline I was getting failure error "OperationNotAllowed: Operation is not allowed because there's an in progress scale node pool operation". Operation was trying to move forward, while prior operation was still in progress.
begin_create_or_update was already returning a poller but it was being discarded. Adding poller.result (line 473) enforces a wait and check so that the prior operation can finish before moving on to the next. thereby fixing the failure I was receiving.
| wait_condition_type="available", | ||
| resource_name=deployment_name, | ||
| namespace="default", | ||
| timeout_seconds=300 # 5 minutes timeout |
There was a problem hiding this comment.
Can we use the self.step_timeout here
There was a problem hiding this comment.
yes, have removed hardcoded timeout and added self.step_timeout
| Returns: | ||
| True if all deployment creations were successful, False otherwise | ||
| """ | ||
| logger.info(f"Creating {number_of_deployments} deployment(s)") |
There was a problem hiding this comment.
Follow %-style logs for consistency with other methods
There was a problem hiding this comment.
Have refactored this and fixed
| operation_timeout_in_minutes=5, | ||
| namespace="default", | ||
| pod_count=replicas, | ||
| label_selector="app=nginx-container" |
There was a problem hiding this comment.
lets make label-selector derive from the parameters instead of hardcoding on the template and here
There was a problem hiding this comment.
have refactored this and updated deployment file
| resource_type="deployment", | ||
| wait_condition_type="available", | ||
| resource_name=deployment_name, | ||
| namespace="default", |
There was a problem hiding this comment.
pass namespace as parameter and make the default value as default
modules/python/crud/main.py
Outdated
| "deployment", parents=[common_parser], help="create deployments" | ||
| ) | ||
| deployment_parser.add_argument("--node-pool-name", required=True, help="Node pool name") | ||
| deployment_parser.add_argument("--deployment-name", required=True, help="Deployment name") |
There was a problem hiding this comment.
Do we need deployment name ?
There was a problem hiding this comment.
umm probably not, its not mentioned in deploy_kwargs. I'll remove it
modules/python/crud/main.py
Outdated
| deployment_parser.add_argument("--node-pool-name", required=True, help="Node pool name") | ||
| deployment_parser.add_argument("--deployment-name", required=True, help="Deployment name") | ||
| deployment_parser.add_argument( | ||
| "--number_of_deployments", |
There was a problem hiding this comment.
should be number-of-deployments, not number_of_deployments
| "replicas": args.replicas, | ||
| "manifest_dir": args.manifest_dir, | ||
| "number_of_deployments": args.number_of_deployments | ||
| } |
There was a problem hiding this comment.
add else here
else:
logger.error("Unknown workload command: '%s'", command)
return 1
There was a problem hiding this comment.
have added this to stop false successes and return logged errors
| node groups, including create, scale (up/down), and delete operations. It supports | ||
| both direct and progressive scaling operations and handles GPU-enabled node groups. | ||
| node groups, including create, scale (up/down), and delete operations. | ||
| It supports both direct and progressive scaling operations and handles GPU-enabled node groups. |
There was a problem hiding this comment.
Lets revert this change
nginx-container was hardcoded in deployment template and in create deployment method - add label_selector to parameters - replace nginx-container in deployment.yaml (label_alue) - derive label_value from selector - pass label_selector directly
No description provided.