@vyadavmsft
Collaborator

Perf-stable enhancements:

  • Add system settle phase: wait for cloud-init/systemd background work
    and transient CPU load to stabilize before running metrics tests
    (configurable via perf_system_settle_* knobs)
  • Add per-test policies: network tests now get a small warmup to reduce
    cold-start variance (ARP/route cache, initial softirq scheduling)
  • Refine storage warmup: skip pmem devices (only validate presence),
    keep dd warmup for block devices targeting actual metrics disk
  • Improve host state diagnostics: add numa topology, cpu placement,
    and mpstat/sar/pidstat output to read-back logging
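The settle phase described above can be sketched roughly as follows. This is a minimal local illustration, not the code in this PR: the function names `wait_for_settle` and `read_load_per_cpu`, the `poll_interval_s` parameter, and the injectable `clock`/`sleep` hooks are assumptions made for the example, and the real change presumably runs its checks on the test node rather than locally. The three tuning parameters mirror the `perf_system_settle_*` knobs.

```python
import os
import time
from typing import Callable, Optional


def read_load_per_cpu() -> float:
    """Return the 1-minute load average divided by the CPU count (Unix only)."""
    load_1m, _, _ = os.getloadavg()
    return load_1m / max(1, os.cpu_count() or 1)


def wait_for_settle(
    timeout_s: float,
    load_threshold: float,
    stable_seconds: float,
    read_load: Callable[[], float] = read_load_per_cpu,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    poll_interval_s: float = 1.0,  # hypothetical knob, not named in the PR
) -> bool:
    """Return True once the load has stayed at or below load_threshold
    for stable_seconds; return False if timeout_s elapses first."""
    deadline = clock() + timeout_s
    stable_since: Optional[float] = None
    while True:
        now = clock()
        if read_load() <= load_threshold:
            # Start (or continue) the quiet window.
            if stable_since is None:
                stable_since = now
            if now - stable_since >= stable_seconds:
                return True
        else:
            # Any spike above the threshold resets the quiet window.
            stable_since = None
        if now >= deadline:
            return False
        sleep(poll_interval_s)
```

Requiring a continuous quiet window, rather than a single low reading, is what keeps a brief lull between two bursts of background work from being mistaken for a settled system.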

Contributor

Copilot AI left a comment

Pull request overview

This PR enhances the Cloud Hypervisor performance testing framework to reduce variance in metrics tests through three key improvements: system stabilization before testing, per-test network warmup policies, and refined storage warmup logic.

Key changes:

  • Adds a configurable system settle phase that waits for cloud-init/systemd background services and CPU load to stabilize before running metrics tests
  • Implements per-test policies with network-specific warmup to reduce cold-start variance
  • Refines storage warmup to properly handle pmem devices (verify presence only) vs block devices (perform dd warmup)
Comments suppressed due to low confidence (1)

lisa/microsoft/testsuites/cloud_hypervisor/ch_tests_tool.py:1567

  • The new system settle configuration parameters are not included in the perf knobs logging. These parameters should be added to the knobs dictionary to ensure complete reproducibility tracking across runs:
      • perf_system_settle_enabled
      • perf_system_settle_timeout_s
      • perf_system_settle_load_threshold
      • perf_system_settle_stable_seconds

This is important for debugging and comparing results across test runs with different settle configurations.

        knobs: Dict[str, Any] = {
            "perf_stable_enabled": self.perf_stable_enabled,
            "perf_numa_node": self.perf_numa_node,
            "perf_warmup_seconds": self.perf_warmup_seconds,
            "perf_mq_test_timeout": self.perf_mq_test_timeout,
            "perf_block_policy": self.perf_block_policy,
            "perf_read_cache_policy": self.perf_read_cache_policy,
            "perf_mtu": self.perf_mtu,
        }
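One way the suggestion could be applied is sketched below. The helper `with_settle_knobs` and the `SETTLE_KNOB_NAMES` tuple are hypothetical names introduced for illustration, and the values in the usage example are placeholders rather than the PR's defaults; only the four attribute names come from the review comment.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Attribute names taken from the review comment above.
SETTLE_KNOB_NAMES = (
    "perf_system_settle_enabled",
    "perf_system_settle_timeout_s",
    "perf_system_settle_load_threshold",
    "perf_system_settle_stable_seconds",
)


def with_settle_knobs(knobs: Dict[str, Any], tool: Any) -> Dict[str, Any]:
    """Return a copy of knobs extended with the settle parameters read from tool."""
    merged = dict(knobs)
    for name in SETTLE_KNOB_NAMES:
        merged[name] = getattr(tool, name)
    return merged


# Usage with a stand-in object; the values are placeholders only.
tool = SimpleNamespace(
    perf_system_settle_enabled=True,
    perf_system_settle_timeout_s=300,
    perf_system_settle_load_threshold=0.5,
    perf_system_settle_stable_seconds=10,
)
knobs = with_settle_knobs({"perf_mtu": 1500}, tool)
```

Adding the four entries directly to the existing dictionary literal would work equally well; the helper only makes it harder to log some settle knobs and forget others.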

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

lisa/microsoft/testsuites/cloud_hypervisor/ch_tests_tool.py:1567

  • The new system settle configuration parameters (perf_system_settle_enabled, perf_system_settle_timeout_s, perf_system_settle_load_threshold, perf_system_settle_stable_seconds) are not included in the logged knobs dictionary. These should be added to ensure complete reproducibility tracking as mentioned in the function's docstring.
        knobs: Dict[str, Any] = {
            "perf_stable_enabled": self.perf_stable_enabled,
            "perf_numa_node": self.perf_numa_node,
            "perf_warmup_seconds": self.perf_warmup_seconds,
            "perf_mq_test_timeout": self.perf_mq_test_timeout,
            "perf_block_policy": self.perf_block_policy,
            "perf_read_cache_policy": self.perf_read_cache_policy,
            "perf_mtu": self.perf_mtu,
        }

@vyadavmsft vyadavmsft force-pushed the ch-perf-stable-metrics-fixes branch from b69495b to 8d78c21 Compare December 22, 2025 00:53
@vyadavmsft vyadavmsft force-pushed the ch-perf-stable-metrics-fixes branch from 8d78c21 to b4accc6 Compare December 22, 2025 01:03
@vyadavmsft
Collaborator Author

@LiliDeng can you pls check this.

        try:
            cpu_count = max(1, int((cpu_count_raw or "1").strip().splitlines()[0]))
        except Exception as exc:
            self._log.warning(
Collaborator

don't use warning


self.node.execute(cmd, shell=True, sudo=True)

def _system_settle(self) -> None:
Collaborator

=> _settle_system
