Requirement Information
Requirement ID: REQ-NF-PERF-BENCHMARKS-001
Type: Non-Functional Requirement
Priority: P2 (Performance)
Status: Draft
Description
The driver MUST provide comprehensive performance benchmarks covering latency, throughput, CPU usage, and memory footprint to establish baseline performance and detect regressions.
Performance Metrics
1. Latency Benchmarks:
IOCTL Latency (user-mode to kernel-mode round-trip):
- IOCTL_PHC_GET_TIME: P95 <100µs
- IOCTL_PHC_SET_TIME: P95 <200µs
- IOCTL_TAS_GET_CONFIG: P95 <150µs
- IOCTL_TAS_SET_CONFIG: P95 <500µs
- Rationale: Low-latency clock access is critical for AVB synchronization
Packet Processing Latency (interrupt to DPC completion):
- P95 <50µs for 64-byte frames
- Rationale: Fast packet handling prevents queue buildup
2. Throughput Benchmarks:
TCP Performance:
- Bidirectional: >900 Mbps (Gigabit Ethernet)
- Rationale: Near line-rate performance for bulk data
UDP Performance:
- Unidirectional: >950 Mbps, <0.1% packet loss
- Rationale: Minimize overhead for real-time streams
Small Frame Performance:
- 64-byte UDP: >148,000 packets/second
- Rationale: AVB audio streams use small frames
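For context, the theoretical ceilings these targets sit under can be worked out from Ethernet framing overhead. The sketch below assumes 1 Gb/s Ethernet, a 1500-byte MTU, IPv4 without options, and TCP without options; the function names are illustrative and not part of any tool named in this requirement.

```python
LINK_BPS = 1_000_000_000      # Gigabit Ethernet line rate
WIRE_OVERHEAD = 8 + 12        # preamble/SFD + inter-frame gap, bytes per frame
ETH_HEADER_FCS = 14 + 4       # Ethernet header + FCS, bytes per frame


def max_frames_per_second(frame_bytes: int) -> float:
    """Line-rate frame rate for a frame size that includes header and FCS."""
    return LINK_BPS / ((frame_bytes + WIRE_OVERHEAD) * 8)


def max_goodput_mbps(mtu: int, l3_l4_header_bytes: int) -> float:
    """Best-case payload throughput in Mbps when every frame is full size."""
    payload = mtu - l3_l4_header_bytes
    wire_bytes = mtu + ETH_HEADER_FCS + WIRE_OVERHEAD
    return LINK_BPS * payload / wire_bytes / 1e6


if __name__ == "__main__":
    print(f"64-byte frames: {max_frames_per_second(64):,.0f} frames/s")
    print(f"TCP goodput:    {max_goodput_mbps(1500, 20 + 20):.1f} Mbps")
    print(f"UDP goodput:    {max_goodput_mbps(1500, 20 + 8):.1f} Mbps")
```

Under these assumptions the ceiling for minimum-size frames is about 1.488 million frames per second, so the 148,000 packets/second target is roughly 10% of line rate. The TCP and UDP ceilings come out near 949 and 957 Mbps, which leaves only a small margin above the >900 and >950 Mbps targets.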
3. CPU Usage:
- DPC time: <10% (single core) at 500 Mbps
- Interrupt time: <5% (single core) at 500 Mbps
- Total driver overhead: <15% CPU at 500 Mbps
Rationale: Low CPU usage leaves headroom for applications
4. Memory Footprint:
- Non-paged pool: <10 MB
- Paged pool: <5 MB
- No memory leaks (stable over 24-hour stress test)
Rationale: Kernel memory is a limited resource
Acceptance Criteria
AC1: IOCTL Latency:
- Given 1000 iterations of each IOCTL
- When benchmark executes
- Then P95 latency meets the per-IOCTL targets (PHC GET <100µs, PHC SET <200µs, TAS GET <150µs, TAS SET <500µs)
- And results logged to JSON file
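A minimal sketch of how AC1 could be evaluated offline: it computes nearest-rank percentiles from raw latency samples and writes a JSON summary. The JSON field names and the synthetic samples are assumptions for illustration; the requirement does not define a result schema.

```python
import json
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(name, samples_us, p95_target_us):
    """Build a JSON-serializable summary for one IOCTL (field names are hypothetical)."""
    p95 = percentile(samples_us, 95)
    return {
        "ioctl": name,
        "iterations": len(samples_us),
        "p50_us": percentile(samples_us, 50),
        "p95_us": p95,
        "p99_us": percentile(samples_us, 99),
        "p95_target_us": p95_target_us,
        "passed": p95 < p95_target_us,
    }


if __name__ == "__main__":
    # Synthetic data standing in for 1000 measured round-trips, in microseconds.
    samples = [40 + (i % 50) for i in range(1000)]
    report = [summarize("IOCTL_PHC_GET_TIME", samples, 100)]
    with open("ioctl_results.json", "w") as f:
        json.dump(report, f, indent=2)
```

Nearest-rank is used here because it always returns an observed sample, which keeps the reported P95 directly comparable to the strict `<` targets.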
AC2: Throughput:
- Given iperf3 test for 60 seconds
- When TCP bidirectional traffic runs
- Then throughput >900 Mbps
- And UDP packet loss <0.1% (measured in the unidirectional UDP run)
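The AC2 check could be automated by reading the iperf3 JSON reports. The sketch below assumes the report layout produced by recent iperf3 releases (`end.sum_received` for TCP, `end.sum_received_bidir_reverse` when `--bidir` is used, and `end.sum` with `lost_percent` for UDP); verify the keys against the iperf3 version on the test rig. The function names are illustrative.

```python
def tcp_mbps(result: dict) -> dict:
    """Receiver-side TCP throughput in Mbps for each direction present in the report."""
    end = result["end"]
    rates = {"forward": end["sum_received"]["bits_per_second"] / 1e6}
    reverse = end.get("sum_received_bidir_reverse")  # assumed present only with --bidir
    if reverse is not None:
        rates["reverse"] = reverse["bits_per_second"] / 1e6
    return rates


def udp_summary(result: dict) -> tuple:
    """Return (throughput in Mbps, packet loss in percent) from a UDP report."""
    total = result["end"]["sum"]
    return total["bits_per_second"] / 1e6, total["lost_percent"]


def check_ac2(tcp_result, udp_result, min_tcp_mbps=900.0, max_loss_pct=0.1):
    """True when every TCP direction exceeds the target and UDP loss is under the limit."""
    rates = tcp_mbps(tcp_result)
    _, loss = udp_summary(udp_result)
    return all(r > min_tcp_mbps for r in rates.values()) and loss < max_loss_pct


if __name__ == "__main__":
    import json

    with open("tcp_throughput.json") as f:
        tcp = json.load(f)
    with open("udp_throughput.json") as f:
        udp = json.load(f)
    print("AC2 passed" if check_ac2(tcp, udp) else "AC2 failed")
```

Checking each direction separately matters for a bidirectional run, since a healthy transmit path can otherwise mask a slow receive path.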
AC3: CPU Usage:
- Given 500 Mbps bidirectional traffic
- When performance monitor samples every second
- Then DPC+Interrupt time <15% CPU
- And CPU usage stable over 5-minute run
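AC3 does not define "stable", so any automated check has to pick a criterion. The sketch below assumes per-second DPC and interrupt percentages have already been exported from the performance monitor, and treats a population standard deviation of at most 2 percentage points as stable; that tolerance and the function name are assumptions, not part of the requirement.

```python
from statistics import mean, pstdev


def evaluate_cpu(dpc_pct, isr_pct, limit_pct=15.0, max_stdev_pct=2.0):
    """Evaluate per-second DPC and interrupt samples (percent of one core)."""
    if not dpc_pct or len(dpc_pct) != len(isr_pct):
        raise ValueError("need two non-empty sample lists of equal length")
    combined = [d + i for d, i in zip(dpc_pct, isr_pct)]
    avg = mean(combined)
    spread = pstdev(combined)
    return {
        "mean_pct": avg,
        "max_pct": max(combined),
        "stdev_pct": spread,
        "passed": avg < limit_pct and spread <= max_stdev_pct,  # tolerance is an assumption
    }


if __name__ == "__main__":
    # Synthetic 5-minute run: 300 one-second samples.
    print(evaluate_cpu([6.0] * 300, [3.0] * 300))
```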
AC4: Memory Footprint:
- Given 24-hour stress test (900 Mbps continuous)
- When memory pools monitored hourly
- Then driver footprint <15 MB total
- And no leaks detected (Driver Verifier enabled)
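One way to turn the hourly pool samples into a pass/fail result is to fit a line through them and flag sustained growth. The sketch below assumes equally spaced hourly samples in MB; the 0.05 MB/hour slope tolerance is a hypothetical value chosen for illustration, and Driver Verifier remains the authoritative leak check.

```python
def leak_slope_mb_per_hour(samples_mb):
    """Least-squares slope of pool usage over equally spaced hourly samples."""
    n = len(samples_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples_mb) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples_mb))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def evaluate_memory(nonpaged_mb, paged_mb, max_slope_mb_per_hour=0.05):
    """Check peak usage against the 10 MB / 5 MB limits and look for steady growth."""
    worst_slope = max(leak_slope_mb_per_hour(nonpaged_mb),
                      leak_slope_mb_per_hour(paged_mb))
    return {
        "nonpaged_peak_ok": max(nonpaged_mb) < 10.0,
        "paged_peak_ok": max(paged_mb) < 5.0,
        "no_leak": worst_slope <= max_slope_mb_per_hour,  # tolerance is an assumption
    }


if __name__ == "__main__":
    # Synthetic 24-hour run sampled hourly (25 points including the start).
    print(evaluate_memory([8.0] * 25, [3.0] * 25))
```

A slope test is less sensitive to a single noisy sample than comparing only the first and last readings.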
AC5: Regression Detection:
- Given baseline benchmark results
- When new driver build benchmarked
- Then no metric is more than 10% worse than baseline
- And CI fails if regression >10%
Traceability
Rationale
Why This Requirement:
- Performance directly impacts AVB stream quality (audio/video glitches if slow)
- Baseline metrics enable regression detection (prevent accidental slowdowns)
- CPU/memory limits constrain embedded and industrial systems
- Benchmarks guide optimization efforts (focus on actual bottlenecks)
Industry Context:
- IEEE 802.1AS (gPTP): Targets synchronization within 1µs across up to 7 hops, so clock access must be fast and predictable
- AVB Class A: <2ms end-to-end latency budget
- AVB Class B: <50ms latency budget
- Driver overhead must be small fraction of total budget
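To make "small fraction" concrete, the packet-processing target can be compared against the class budgets listed above. This is a rough sketch that charges the 50µs P95 target once against the whole end-to-end budget; how the budget is actually apportioned per node is not specified in this requirement.

```python
BUDGETS_US = {"AVB Class A": 2_000, "AVB Class B": 50_000}
PACKET_P95_TARGET_US = 50  # interrupt-to-DPC-completion target from this requirement


def budget_share(latency_us: float, budget_us: float) -> float:
    """Fraction of a latency budget consumed by one processing step."""
    return latency_us / budget_us


if __name__ == "__main__":
    for name, budget in BUDGETS_US.items():
        share = budget_share(PACKET_P95_TARGET_US, budget)
        print(f"{name}: {share:.2%} of the budget")
```

With these numbers the driver's packet-processing target uses 2.5% of the Class A budget and 0.1% of the Class B budget.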
Impact of Non-Compliance:
- Slow IOCTLs → Applications miss gPTP sync deadlines → Audio/video drift
- High CPU usage → System overload → Packet drops
- Memory leaks → System instability over time → Reboot required
- No regression testing → Silent performance degradation → User complaints
Implementation Notes
Benchmarking Tools:
1. IOCTL Latency (custom tool):
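// avb_test_i210_um.c (benchmark mode)
// Requires <windows.h>, <stdio.h>, <stdlib.h>

// qsort comparator: ascending order of ULONGLONG latencies
static int compare(const void* a, const void* b) {
    ULONGLONG x = *(const ULONGLONG*)a;
    ULONGLONG y = *(const ULONGLONG*)b;
    return (x > y) - (x < y);
}

void BenchmarkIoctl(HANDLE Device, ULONG Ioctl, ULONG Iterations) {
    LARGE_INTEGER frequency, start, end;
    if (Iterations == 0) return;
    QueryPerformanceFrequency(&frequency);
    ULONGLONG* latencies = malloc(Iterations * sizeof(ULONGLONG));
    if (latencies == NULL) return;
    for (ULONG i = 0; i < Iterations; i++) {
        QueryPerformanceCounter(&start);
        DeviceIoControl(Device, Ioctl, ...);  // IOCTL-specific buffers elided
        QueryPerformanceCounter(&end);
        latencies[i] = (end.QuadPart - start.QuadPart) * 1000000 / frequency.QuadPart; // µs
    }
    // Calculate P50, P95, P99 from the sorted samples
    qsort(latencies, Iterations, sizeof(ULONGLONG), compare);
    printf("P50: %llu µs, P95: %llu µs, P99: %llu µs\n",
           latencies[Iterations/2],
           latencies[Iterations*95/100],
           latencies[Iterations*99/100]);
    free(latencies);  // release the sample buffer
}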
2. Throughput (iperf3):
# TCP bidirectional
iperf3 -c <server> -t 60 --bidir --json > tcp_throughput.json
# UDP unidirectional
iperf3 -c <server> -u -b 1000M -t 60 --json > udp_throughput.json
# Small frames
iperf3 -c <server> -u -b 100M -l 64 -t 60 --json > small_frames.json
3. CPU Usage (ETW tracing):
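# Start trace (PROFILE is required for -stackwalk Profile; LOADER lets samples be attributed to driver modules)
xperf -on PROC_THREAD+LOADER+PROFILE+DPC+INTERRUPT -stackwalk Profile
# Run traffic (5 minutes, bidirectional as required by AC3)
iperf3 -c <server> -t 300 -b 500M --bidir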
# Stop trace
xperf -d cpu_trace.etl
# Analyze (CPU time by driver)
wpaexporter cpu_trace.etl -outputfolder results
4. Memory Footprint (WMI query):
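# Note: these counters are system-wide, so the delta only approximates the
# driver's footprint on an otherwise idle test rig.
# Before traffic
$before = Get-WmiObject Win32_PerfRawData_PerfOS_Memory | Select PoolNonpagedBytes, PoolPagedBytes
# Run traffic (24 hours)
iperf3 -c <server> -t 86400
# After traffic
$after = Get-WmiObject Win32_PerfRawData_PerfOS_Memory | Select PoolNonpagedBytes, PoolPagedBytes
# Pool growth during the run
$nonPagedDelta = $after.PoolNonpagedBytes - $before.PoolNonpagedBytes
$pagedDelta = $after.PoolPagedBytes - $before.PoolPagedBytes
Write-Host "Non-paged pool growth: $($nonPagedDelta / 1MB) MB"
Write-Host "Paged pool growth: $($pagedDelta / 1MB) MB"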
Automation (CI integration):
# .github/workflows/performance-benchmarks.yml
name: Performance Benchmarks
on:
  schedule:
    - cron: '0 2 * * *' # Nightly at 2 AM
  workflow_dispatch:
jobs:
  benchmark:
    runs-on: [self-hosted, performance-test-rig] # Dedicated hardware
    steps:
      - uses: actions/checkout@v4
      - name: Build driver
        run: msbuild IntelAvbFilter.sln /p:Configuration=Release
      - name: Install driver
        run: |
          pnputil /add-driver IntelAvbFilter.inf /install
          sc start IntelAvbFilter
      - name: Run IOCTL benchmark
        run: ./avb_test_i210_um.exe --benchmark-ioctl --json ioctl_results.json
      - name: Run throughput benchmark
        run: |
          iperf3 -c ${{ secrets.PERF_SERVER_IP }} -t 60 --bidir --json > tcp_throughput.json
          iperf3 -c ${{ secrets.PERF_SERVER_IP }} -u -b 1000M -t 60 --json > udp_throughput.json
      - name: Analyze results
        run: python scripts/analyze_benchmarks.py --compare-to baseline.json
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: performance-results
          path: |
            ioctl_results.json
            tcp_throughput.json
            udp_throughput.json
      - name: Fail on regression
        run: |
          # CI fails if any metric >10% worse than baseline
          python scripts/check_regression.py --threshold 0.10
Regression Detection:
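# scripts/check_regression.py
import argparse
import json
import sys

def check_regression(current, baseline, threshold=0.10, higher_is_better=()):
    """Return False if any metric is more than `threshold` worse than baseline.

    Metrics are treated as lower-is-better (latency, CPU, memory) unless named in
    `higher_is_better` (e.g. throughput); those names depend on the results schema.
    """
    ok = True
    for metric, value in current.items():
        baseline_value = baseline.get(metric)
        if baseline_value:
            regression = (value - baseline_value) / baseline_value
            if metric in higher_is_better:
                regression = -regression  # a drop is the regression for these metrics
            if regression > threshold:
                print(f"❌ Regression detected: {metric} {regression*100:.1f}% worse")
                ok = False  # keep going so every regression is reported
    return ok

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--threshold", type=float, default=0.10)
    args = parser.parse_args()
    with open("current_results.json") as f:
        current = json.load(f)
    with open("baseline.json") as f:
        baseline = json.load(f)
    if not check_regression(current, baseline, args.threshold):
        sys.exit(1)  # Fail CI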
Dependencies
- Dedicated performance test hardware (consistent results)
- iperf3 server (network peer)
- Windows Performance Toolkit (WPT)
- Baseline results (initial benchmark run)
Risks
- Hardware Variability: Results differ across systems → Baseline per test rig
- Background Noise: Other processes interfere → Dedicated test system, minimal services
- False Positives: Transient spikes trigger false regressions → Use P95, not max latency
- Mitigation: Run benchmarks 3x, take median result
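The "run 3x, take median" mitigation can be applied per metric before the comparison against baseline. A minimal sketch, assuming each run is a flat dictionary of numeric metrics; the metric names in the example are hypothetical.

```python
from statistics import median


def median_of_runs(runs):
    """Combine per-run metric dicts into one dict holding the per-metric median."""
    if not runs:
        raise ValueError("no runs")
    # Only metrics present in every run are aggregated.
    common = set(runs[0]).intersection(*runs[1:])
    return {metric: median(run[metric] for run in runs) for metric in sorted(common)}


if __name__ == "__main__":
    runs = [
        {"ioctl_get_p95_us": 80, "tcp_mbps": 940},
        {"ioctl_get_p95_us": 300, "tcp_mbps": 935},  # transient spike in one run
        {"ioctl_get_p95_us": 82, "tcp_mbps": 938},
    ]
    print(median_of_runs(runs))
```

Taking the median rather than the mean means a single outlier run, like the 300µs spike in the example, does not shift the reported value.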
Performance Targets Summary
| Metric | Target | Rationale |
|---|---|---|
| IOCTL GET latency | P95 <100µs (PHC), <150µs (TAS) | gPTP sync accuracy |
| IOCTL SET latency | P95 <200µs (PHC), <500µs (TAS) | Config changes low-latency |
| Packet latency | P95 <50µs | Minimize queueing delay |
| TCP throughput | >900 Mbps | Near line-rate performance |
| UDP throughput | >950 Mbps | Minimize protocol overhead |
| Small frame rate | >148k pps | AVB audio streams |
| CPU usage (500 Mbps) | <15% | Leave headroom for applications |
| Memory footprint | <15 MB | Kernel memory limit |
| Stress test | 24 hours stable | Long-running system stability |
References
Created: 2025-12-30
Last Updated: 2025-12-30
Status: Draft