initial update#879

Draft
engineeredcurlz wants to merge 65 commits into main from test-refactor

Conversation

@engineeredcurlz

No description provided.

@engineeredcurlz
Author

@microsoft-github-policy-service agree company="Microsoft"

yaml.safe_load_all() enters an infinite loop when passed a MagicMock
object because PyYAML detects the .read attribute and treats it as a
file-like stream, then loops forever waiting to buffer enough bytes
(len(MagicMock()) returns 0 by default).

Fix by setting create_template.return_value to a valid YAML string in
the three create_deployment tests, so yaml.safe_load_all receives a
real string and parses it via the non-blocking code path.

Affected tests:
- test_create_deployment_success
- test_create_deployment_failure
- test_create_deployment_partial_success
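A minimal sketch of the failure and the fix (the `create_template` mock mirrors the tests; the YAML content is illustrative):

```python
from unittest.mock import MagicMock
import yaml

# A bare MagicMock auto-creates any attribute, including .read, so PyYAML
# treats it as a file-like stream and enters the blocking buffer loop.
assert hasattr(MagicMock(), "read")

# The fix: make the mocked template helper return a real YAML string, so
# yaml.safe_load_all takes the plain-string (non-blocking) code path.
VALID_YAML = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
"""
create_template = MagicMock()
create_template.return_value = VALID_YAML

docs = list(yaml.safe_load_all(create_template()))
assert docs[0]["kind"] == "Deployment"
```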
begin_create_or_update() returns an LROPoller that was being discarded, allowing execution to continue while Azure still had an operation in progress. Subsequent scale/delete calls were then rejected with OperationNotAllowed.

Fix by calling poller.result() in scale_node_pool and _progressive_scale to block until Azure fully completes each operation before proceeding.
)
 node_pool.count = step  # Update node count in the node pool object
-result = self.aks_client.agent_pools.begin_create_or_update(
+poller = self.aks_client.agent_pools.begin_create_or_update(
Contributor

Why did we add poller here? We do not need it.

Author

While running the pipeline I was getting the failure "OperationNotAllowed: Operation is not allowed because there's an in progress scale node pool operation". Execution kept moving forward while the prior operation was still in progress.

begin_create_or_update was already returning a poller, but it was being discarded. Calling poller.result() (line 473) enforces a wait so that each operation finishes before the next one starts, which fixes the failure I was seeing.
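The pattern can be sketched as follows; the method and argument names are illustrative, and a MagicMock stands in for the real AKS client so the sketch runs without Azure:

```python
from unittest.mock import MagicMock

def scale_node_pool(aks_client, resource_group, cluster_name, pool_name, node_pool):
    # begin_create_or_update returns an LROPoller immediately; discarding it
    # lets the caller issue the next scale/delete call while Azure is still
    # working, which surfaces as OperationNotAllowed.
    poller = aks_client.agent_pools.begin_create_or_update(
        resource_group, cluster_name, pool_name, node_pool
    )
    # Block until the long-running operation actually completes.
    return poller.result()

# Exercised with a mock client (no real Azure call):
client = MagicMock()
client.agent_pools.begin_create_or_update.return_value.result.return_value = "Succeeded"
assert scale_node_pool(client, "rg", "aks", "np1", object()) == "Succeeded"
```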

wait_condition_type="available",
resource_name=deployment_name,
namespace="default",
timeout_seconds=300 # 5 minutes timeout
Contributor

Can we use the self.step_timeout here

Author

Yes, have removed the hardcoded timeout and now use self.step_timeout.
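A sketch of the change; the class and method names are illustrative:

```python
class WorkloadRunner:
    def __init__(self, step_timeout=300):
        # One configurable timeout shared by every wait step, instead of a
        # hardcoded 300 repeated at each call site.
        self.step_timeout = step_timeout

    def wait_kwargs(self, deployment_name):
        return {
            "resource_type": "deployment",
            "wait_condition_type": "available",
            "resource_name": deployment_name,
            "namespace": "default",
            "timeout_seconds": self.step_timeout,
        }

runner = WorkloadRunner(step_timeout=120)
assert runner.wait_kwargs("web")["timeout_seconds"] == 120
```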

Returns:
True if all deployment creations were successful, False otherwise
"""
logger.info(f"Creating {number_of_deployments} deployment(s)")
Contributor

Follow %-style logging for consistency with other methods.

Author

Have refactored this and fixed it.
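The refactored call would look something like this (logger name is illustrative):

```python
import logging

logger = logging.getLogger("workload")

def log_creation(number_of_deployments):
    # %-style formatting is lazy: the message is only interpolated when a
    # handler actually emits the record, unlike an f-string, which always
    # pays the formatting cost up front.
    logger.info("Creating %d deployment(s)", number_of_deployments)

# What such a record renders to:
record = logging.LogRecord("workload", logging.INFO, __file__, 0,
                           "Creating %d deployment(s)", (3,), None)
assert record.getMessage() == "Creating 3 deployment(s)"
```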

operation_timeout_in_minutes=5,
namespace="default",
pod_count=replicas,
label_selector="app=nginx-container"
Contributor

Let's make label_selector derive from the parameters instead of hardcoding it in the template and here.

Author

Have refactored this and updated the deployment file.
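The derivation can be sketched like this; the helper name and parameter are illustrative:

```python
def build_label_selector(app_name):
    # Derive the selector from the same parameter that populates the
    # deployment template, instead of hardcoding "app=nginx-container"
    # both in the template and at the call site.
    return f"app={app_name}"

assert build_label_selector("nginx-container") == "app=nginx-container"
```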

resource_type="deployment",
wait_condition_type="available",
resource_name=deployment_name,
namespace="default",
Contributor

Pass namespace as a parameter and make "default" its default value.

Author

refactored this
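A sketch of the refactored signature; the function name and kwargs are illustrative:

```python
def wait_for_deployment(deployment_name, namespace="default", timeout_seconds=300):
    # namespace is now a parameter whose default preserves the old behavior;
    # callers targeting another namespace can simply override it.
    return {
        "resource_type": "deployment",
        "wait_condition_type": "available",
        "resource_name": deployment_name,
        "namespace": namespace,
        "timeout_seconds": timeout_seconds,
    }

assert wait_for_deployment("web")["namespace"] == "default"
assert wait_for_deployment("web", namespace="staging")["namespace"] == "staging"
```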

"deployment", parents=[common_parser], help="create deployments"
)
deployment_parser.add_argument("--node-pool-name", required=True, help="Node pool name")
deployment_parser.add_argument("--deployment-name", required=True, help="Deployment name")
Contributor

Do we need deployment name?

Author

Umm, probably not; it's not mentioned in deploy_kwargs. I'll remove it.

deployment_parser.add_argument("--node-pool-name", required=True, help="Node pool name")
deployment_parser.add_argument("--deployment-name", required=True, help="Deployment name")
deployment_parser.add_argument(
"--number_of_deployments",
Contributor

Should be --number-of-deployments, not --number_of_deployments.

Author

fixed the hyphen
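For reference, argparse maps the hyphenated flag back to an underscored attribute, so the rest of the code keeps reading args.number_of_deployments:

```python
import argparse

parser = argparse.ArgumentParser()
# Hyphenated flag per CLI convention; argparse exposes it on the namespace
# as args.number_of_deployments automatically.
parser.add_argument("--number-of-deployments", type=int, default=1,
                    help="Number of deployments to create")

args = parser.parse_args(["--number-of-deployments", "3"])
assert args.number_of_deployments == 3
```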

"replicas": args.replicas,
"manifest_dir": args.manifest_dir,
"number_of_deployments": args.number_of_deployments
}
Contributor

add else here

else:
    logger.error("Unknown workload command: '%s'", command)
    return 1

Author

Have added this to stop false successes; unknown commands now log an error and return a failure code.
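A sketch of the dispatch with the fallback branch; the command names and handler are illustrative:

```python
import logging

logger = logging.getLogger("workload")

def run_workload_command(command, args):
    # Explicit dispatch with an else branch: an unrecognized command now logs
    # an error and returns a nonzero exit code instead of silently falling
    # through and being reported as a success.
    if command == "deployment":
        return 0  # would call the deployment handler in the real code
    else:
        logger.error("Unknown workload command: '%s'", command)
        return 1

assert run_workload_command("deployment", None) == 0
assert run_workload_command("bogus", None) == 1
```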

-node groups, including create, scale (up/down), and delete operations. It supports
-both direct and progressive scaling operations and handles GPU-enabled node groups.
+node groups, including create, scale (up/down), and delete operations.
+It supports both direct and progressive scaling operations and handles GPU-enabled node groups.
Contributor

Lets revert this change

Author

Restored the original line wrapping.
