ch: tighten perf-stable metrics policies and logging #4167
base: main
Pull request overview
This PR enhances the Cloud Hypervisor performance testing framework to reduce variance in metrics tests through three key improvements: system stabilization before testing, per-test network warmup policies, and refined storage warmup logic.
Key changes:
- Adds a configurable system settle phase that waits for cloud-init/systemd background services and CPU load to stabilize before running metrics tests
- Implements per-test policies with network-specific warmup to reduce cold-start variance
- Refines storage warmup to properly handle pmem devices (verify presence only) vs block devices (perform dd warmup)
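The settle phase described above can be sketched as a polling loop. This is a minimal illustration, not the PR's implementation: `wait_for_settle` and its `read_load`, `clock`, and `sleep` parameters are hypothetical names, and only the threshold, stable-seconds, and timeout concepts come from the `perf_system_settle_*` knobs named in this PR.

```python
import time
from typing import Callable, Optional


def wait_for_settle(
    read_load: Callable[[], float],
    load_threshold: float,
    stable_seconds: float,
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once load stays at or below the threshold for stable_seconds.

    Returns False if that does not happen before timeout_s elapses.
    All names here are illustrative assumptions, not the PR's API.
    """
    deadline = clock() + timeout_s
    calm_since: Optional[float] = None  # start of the current quiet window
    while True:
        now = clock()
        if read_load() <= load_threshold:
            if calm_since is None:
                calm_since = now
            if now - calm_since >= stable_seconds:
                return True
        else:
            # Any spike above the threshold restarts the quiet window.
            calm_since = None
        if now >= deadline:
            return False
        sleep(poll_interval_s)
```

On a local Linux host, `read_load` could be something like `lambda: os.getloadavg()[0]`; in the test suite the load would presumably be read from the guest node instead. Injecting `clock` and `sleep` keeps the loop testable without real waiting.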
Comments suppressed due to low confidence (1)
lisa/microsoft/testsuites/cloud_hypervisor/ch_tests_tool.py:1567
- The new system settle configuration parameters are not included in the perf knobs logging. These parameters should be added to the knobs dictionary to ensure complete reproducibility tracking across runs:
- perf_system_settle_enabled
- perf_system_settle_timeout_s
- perf_system_settle_load_threshold
- perf_system_settle_stable_seconds
This is important for debugging and comparing results across test runs with different settle configurations.
knobs: Dict[str, Any] = {
"perf_stable_enabled": self.perf_stable_enabled,
"perf_numa_node": self.perf_numa_node,
"perf_warmup_seconds": self.perf_warmup_seconds,
"perf_mq_test_timeout": self.perf_mq_test_timeout,
"perf_block_policy": self.perf_block_policy,
"perf_read_cache_policy": self.perf_read_cache_policy,
"perf_mtu": self.perf_mtu,
}
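One way to act on this comment is to merge the four settle knobs into the logged dictionary. The sketch below is self-contained and therefore uses a hypothetical `SettleKnobs` dataclass and `with_settle_knobs` helper; the four field names come from the review comment, while the default values are placeholders and not taken from the PR.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class SettleKnobs:
    # Field names match the knobs listed in the review comment.
    # Default values are placeholder assumptions for illustration only.
    perf_system_settle_enabled: bool = True
    perf_system_settle_timeout_s: int = 120
    perf_system_settle_load_threshold: float = 0.5
    perf_system_settle_stable_seconds: int = 10


def with_settle_knobs(knobs: Dict[str, Any], settle: SettleKnobs) -> Dict[str, Any]:
    """Return a copy of the knobs dictionary extended with the settle knobs."""
    merged = dict(knobs)
    merged.update(asdict(settle))
    return merged


if __name__ == "__main__":
    existing = {"perf_stable_enabled": True, "perf_mtu": 1500}
    for key, value in sorted(with_settle_knobs(existing, SettleKnobs()).items()):
        print(f"{key}={value}")
```

In the actual method the same effect could likely be achieved by adding four more `"name": self.name` entries to the existing literal.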
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
lisa/microsoft/testsuites/cloud_hypervisor/ch_tests_tool.py:1567
- The new system settle configuration parameters (perf_system_settle_enabled, perf_system_settle_timeout_s, perf_system_settle_load_threshold, perf_system_settle_stable_seconds) are not included in the logged knobs dictionary. These should be added to ensure complete reproducibility tracking as mentioned in the function's docstring.
knobs: Dict[str, Any] = {
"perf_stable_enabled": self.perf_stable_enabled,
"perf_numa_node": self.perf_numa_node,
"perf_warmup_seconds": self.perf_warmup_seconds,
"perf_mq_test_timeout": self.perf_mq_test_timeout,
"perf_block_policy": self.perf_block_policy,
"perf_read_cache_policy": self.perf_read_cache_policy,
"perf_mtu": self.perf_mtu,
}
b69495b to 8d78c21
8d78c21 to b4accc6
@LiliDeng, can you please check this?
try:
    cpu_count = max(1, int((cpu_count_raw or "1").strip().splitlines()[0]))
except Exception as exc:
    self._log.warning(
don't use warning
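A possible reading of this comment is that a failed CPU-count parse is an expected fallback and should be logged at a lower level. The sketch below assumes that reading; `parse_cpu_count` and the logger name are hypothetical, and it also narrows the broad `except Exception` to the two errors the expression can raise on bad input.

```python
import logging
from typing import Optional

log = logging.getLogger("perf_stable_sketch")  # hypothetical logger name


def parse_cpu_count(cpu_count_raw: Optional[str], default: int = 1) -> int:
    """Parse the first line of a command's output as a CPU count (at least 1)."""
    try:
        first_line = (cpu_count_raw or "").strip().splitlines()[0]
        return max(1, int(first_line))
    except (ValueError, IndexError) as exc:
        # Falling back to a default is expected here, so log quietly at debug
        # level (an assumed interpretation of the review comment).
        log.debug("could not parse CPU count %r: %s", cpu_count_raw, exc)
        return default
```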
    self.node.execute(cmd, shell=True, sudo=True)

def _system_settle(self) -> None:
Rename to _settle_system.
Perf-stable enhancements:
and transient CPU load to stabilize before running metrics tests
(configurable via perf_system_settle_* knobs)
cold-start variance (ARP/route cache, initial softirq scheduling)
keep dd warmup for block devices targeting actual metrics disk
and mpstat/sar/pidstat output to read-back logging
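The pmem versus block-device split in the storage warmup can be illustrated with a small command planner. This is a sketch under stated assumptions: `plan_storage_warmup` is a hypothetical helper, detecting pmem by the `pmem` device-name prefix is an assumption, and the specific `dd` arguments (read direction, block size, count, `iflag=direct`) are illustrative choices, not the ones used in the PR.

```python
import shlex


def plan_storage_warmup(device: str, megabytes: int = 64) -> str:
    """Return a shell command for warming up (or just verifying) a device.

    pmem devices are only checked for presence; other block devices get a
    short dd read. Detection by name prefix is an assumption of this sketch.
    """
    name = device.rsplit("/", 1)[-1]
    quoted = shlex.quote(device)
    if name.startswith("pmem"):
        # Presence check only: no I/O is issued against the pmem device.
        return f"test -e {quoted}"
    # Illustrative dd read of the target disk, bypassing the page cache.
    return f"dd if={quoted} of=/dev/null bs=1M count={megabytes} iflag=direct"
```

For example, `plan_storage_warmup("/dev/pmem0")` yields a presence check, while `plan_storage_warmup("/dev/vdb")` yields a `dd` read of the disk under test.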