REQ-NF-PERF-BENCHMARKS-001: Performance Benchmarks #307

@zarfld

Requirement Information

Requirement ID: REQ-NF-PERF-BENCHMARKS-001
Type: Non-Functional Requirement
Priority: P2 (Performance)
Status: Draft

Description

The driver MUST provide comprehensive performance benchmarks covering latency, throughput, CPU usage, and memory footprint to establish baseline performance and detect regressions.

Performance Metrics

1. Latency Benchmarks:

IOCTL Latency (user-mode to kernel-mode round-trip):

  • IOCTL_PHC_GET_TIME: P95 <100µs
  • IOCTL_PHC_SET_TIME: P95 <200µs
  • IOCTL_TAS_GET_CONFIG: P95 <150µs
  • IOCTL_TAS_SET_CONFIG: P95 <500µs

Rationale: Low-latency clock access is critical for AVB synchronization

Packet Processing Latency (interrupt to DPC completion):

  • P95 <50µs for 64-byte frames
  • Rationale: Fast packet handling prevents queue buildup

2. Throughput Benchmarks:

TCP Performance:

  • Bidirectional: >900 Mbps (Gigabit Ethernet)
  • Rationale: Near line-rate performance for bulk data

UDP Performance:

  • Unidirectional: >950 Mbps, <0.1% packet loss
  • Rationale: Minimize overhead for real-time streams

Small Frame Performance:

  • 64-byte UDP: >148,000 packets/second
  • Rationale: AVB audio streams use small frames
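
For context, the small-frame target can be compared with the theoretical frame rate of a gigabit link. The sketch below assumes standard Ethernet framing, where each frame carries 20 bytes of on-wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap) and the 64-byte frame size already includes the FCS. The function name is illustrative only.

```python
# On-wire overhead per Ethernet frame: 7 preamble + 1 SFD + 12 inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def line_rate_pps(frame_bytes: int, link_bps: float = 1e9) -> float:
    """Theoretical maximum frames per second for a frame size that includes the FCS."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    max_pps = line_rate_pps(64)
    target_pps = 148_000  # small-frame target from this requirement
    print(f"64-byte line rate: {max_pps:,.0f} pps")
    print(f"Target share of line rate: {target_pps / max_pps:.1%}")
```

On a gigabit link this gives roughly 1.49 million frames per second for 64-byte frames, so the 148,000 pps target corresponds to about a tenth of line rate.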

3. CPU Usage:

  • DPC time: <10% (single core) at 500 Mbps
  • Interrupt time: <5% (single core) at 500 Mbps
  • Total driver overhead: <15% CPU at 500 Mbps

Rationale: Low CPU usage leaves headroom for applications

4. Memory Footprint:

  • Non-paged pool: <10 MB
  • Paged pool: <5 MB
  • No memory leaks (stable over 24-hour stress test)

Rationale: Kernel memory is a limited resource

Acceptance Criteria

AC1: IOCTL Latency:

  • Given 1000 iterations of each IOCTL
  • When benchmark executes
  • Then P95 latency meets the per-IOCTL targets (PHC GET <100µs, PHC SET <200µs, TAS GET <150µs, TAS SET <500µs)
  • And results logged to JSON file
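
A minimal sketch of how AC1 could be evaluated once raw latency samples exist. It assumes the samples are already in microseconds and uses the nearest-rank percentile definition; the report layout and the function names `percentile`, `evaluate`, and `write_report` are hypothetical, not part of the existing test tool.

```python
import json

# Per-IOCTL P95 targets in microseconds, taken from the latency benchmarks section.
P95_TARGETS_US = {
    "IOCTL_PHC_GET_TIME": 100,
    "IOCTL_PHC_SET_TIME": 200,
    "IOCTL_TAS_GET_CONFIG": 150,
    "IOCTL_TAS_SET_CONFIG": 500,
}


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling division avoids floating-point rounding at the boundary.
    rank = max(1, -(-len(ordered) * pct // 100))
    return ordered[rank - 1]


def evaluate(ioctl_name, latencies_us):
    """Summarize one IOCTL's samples and compare P95 against its target."""
    p95 = percentile(latencies_us, 95)
    target = P95_TARGETS_US[ioctl_name]
    return {
        "ioctl": ioctl_name,
        "iterations": len(latencies_us),
        "p50_us": percentile(latencies_us, 50),
        "p95_us": p95,
        "target_us": target,
        "passed": p95 < target,
    }


def write_report(results, path):
    """Write the list of per-IOCTL summaries to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
```

With 1000 iterations, nearest-rank P95 is the 950th smallest sample, so up to 50 outliers are tolerated before the criterion fails.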

AC2: Throughput:

  • Given iperf3 test for 60 seconds
  • When TCP bidirectional traffic runs
  • Then throughput >900 Mbps
  • And UDP packet loss <0.1% (measured in the UDP unidirectional run; iperf3 reports datagram loss only for UDP)
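
One way AC2 could be checked from iperf3 JSON output is sketched below. The field names (`end.sum_received`, `end.sum_received_bidir_reverse`, `end.sum.lost_percent`) are assumptions based on recent iperf3 releases and should be verified against the output of the installed version. The sketch also assumes that the 900 Mbps threshold applies to each direction separately, which the requirement does not state explicitly.

```python
def tcp_bidir_mbps(report):
    """Return (forward, reverse) receive throughput in Mbps from a --bidir TCP run."""
    end = report["end"]
    forward = end["sum_received"]["bits_per_second"] / 1e6
    # Assumed key for the reverse direction of a --bidir run.
    reverse = end["sum_received_bidir_reverse"]["bits_per_second"] / 1e6
    return forward, reverse


def udp_loss_percent(report):
    """Return the datagram loss percentage from a UDP run."""
    return report["end"]["sum"]["lost_percent"]


def check_ac2(tcp_report, udp_report, min_mbps=900.0, max_loss_pct=0.1):
    """Evaluate throughput and loss thresholds; both reports are parsed JSON dicts."""
    forward, reverse = tcp_bidir_mbps(tcp_report)
    loss = udp_loss_percent(udp_report)
    return {
        "forward_mbps": forward,
        "reverse_mbps": reverse,
        "udp_loss_pct": loss,
        "passed": forward > min_mbps and reverse > min_mbps and loss < max_loss_pct,
    }
```

The reports would be loaded with `json.load` from `tcp_throughput.json` and `udp_throughput.json` before calling `check_ac2`.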

AC3: CPU Usage:

  • Given 500 Mbps bidirectional traffic
  • When performance monitor samples every second
  • Then DPC+Interrupt time <15% CPU
  • And CPU usage stable over 5-minute run
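
AC3 does not define "stable", so the sketch below picks one possible interpretation: the mean of the per-second samples must stay under the 15% limit, and their population standard deviation must stay under a hypothetical bound of 2 percentage points. Both the bound and the function name are assumptions for illustration.

```python
from statistics import mean, pstdev

CPU_LIMIT_PCT = 15.0         # DPC + interrupt time limit from AC3
STABILITY_STDDEV_PCT = 2.0   # hypothetical stability bound, not from the requirement


def check_cpu_usage(samples_pct):
    """Evaluate per-second DPC+interrupt CPU samples (percent of one core)."""
    if not samples_pct:
        raise ValueError("no samples")
    avg = mean(samples_pct)
    spread = pstdev(samples_pct)
    return {
        "mean_pct": avg,
        "stddev_pct": spread,
        "max_pct": max(samples_pct),
        "within_limit": avg < CPU_LIMIT_PCT,
        "stable": spread <= STABILITY_STDDEV_PCT,
    }
```

A 5-minute run sampled once per second yields 300 samples, which is enough for the standard deviation to be a meaningful stability indicator.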

AC4: Memory Footprint:

  • Given 24-hour stress test (900 Mbps continuous)
  • When memory pools monitored hourly
  • Then driver footprint <15 MB total
  • And no leaks detected (Driver Verifier enabled)
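
Hourly pool samples can also be screened for slow leaks by fitting a straight line and inspecting the slope. This is a sketch under stated assumptions: samples are in MB and evenly spaced one hour apart, and the growth threshold of 0.01 MB per hour is a hypothetical value chosen for illustration. It complements Driver Verifier rather than replacing it.

```python
POOL_LIMIT_MB = 15.0            # total footprint limit from AC4
MAX_GROWTH_MB_PER_HOUR = 0.01   # hypothetical leak threshold, not from the requirement


def growth_rate_mb_per_hour(samples_mb):
    """Least-squares slope of hourly samples (MB per hour)."""
    n = len(samples_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    hours = range(n)
    mean_x = sum(hours) / n
    mean_y = sum(samples_mb) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, samples_mb))
    variance = sum((x - mean_x) ** 2 for x in hours)
    return covariance / variance


def check_memory(samples_mb):
    """Check the peak footprint and flag sustained growth across the run."""
    slope = growth_rate_mb_per_hour(samples_mb)
    peak = max(samples_mb)
    return {
        "peak_mb": peak,
        "growth_mb_per_hour": slope,
        "within_limit": peak < POOL_LIMIT_MB,
        "leak_suspected": slope > MAX_GROWTH_MB_PER_HOUR,
    }
```

A slope test catches a leak that stays under the 15 MB limit during the 24-hour window but would exceed it on a longer-running system.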

AC5: Regression Detection:

  • Given baseline benchmark results
  • When new driver build benchmarked
  • Then metrics within ±10% of baseline
  • And CI fails if regression >10%

Traceability

Rationale

Why This Requirement:

  • Performance directly impacts AVB stream quality (audio/video glitches if slow)
  • Baseline metrics enable regression detection (prevent accidental slowdowns)
  • CPU/memory limits constrain embedded and industrial systems
  • Benchmarks guide optimization efforts (focus on actual bottlenecks)

Industry Context:

  • IEEE 802.1AS (gPTP): Targets synchronization within 1µs across up to 7 hops, so clock-access latency must stay small and predictable
  • AVB Class A: <2ms end-to-end latency budget
  • AVB Class B: <50ms latency budget
  • Driver overhead must be small fraction of total budget
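
To make "small fraction" concrete, the driver-side targets can be expressed as a share of the AVB latency budgets. The sketch below only does that arithmetic, using the 50µs packet-processing target from this requirement and the Class A and Class B budgets listed above. It assumes a single traversal of the driver and ignores every other contributor to end-to-end latency.

```python
# End-to-end latency budgets in microseconds, from the industry context above.
AVB_BUDGETS_US = {"Class A": 2_000, "Class B": 50_000}

PACKET_LATENCY_TARGET_US = 50  # P95 interrupt-to-DPC target from this requirement


def budget_share(component_us, budget_us):
    """Fraction of a latency budget consumed by one component."""
    return component_us / budget_us


if __name__ == "__main__":
    for name, budget in AVB_BUDGETS_US.items():
        share = budget_share(PACKET_LATENCY_TARGET_US, budget)
        print(f"{name}: {share:.2%} of budget")
```

Under these assumptions the packet-processing target uses 2.5% of the Class A budget and 0.1% of the Class B budget.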

Impact of Non-Compliance:

  • Slow IOCTLs → Applications miss gPTP sync deadlines → Audio/video drift
  • High CPU usage → System overload → Packet drops
  • Memory leaks → System instability over time → Reboot required
  • No regression testing → Silent performance degradation → User complaints

Implementation Notes

Benchmarking Tools:

1. IOCTL Latency (custom tool):

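// avb_test_i210_um.c (benchmark mode)
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

// qsort comparator: ascending order of ULONGLONG latencies
static int compare(const void* a, const void* b) {
    ULONGLONG x = *(const ULONGLONG*)a;
    ULONGLONG y = *(const ULONGLONG*)b;
    return (x > y) - (x < y);
}

void BenchmarkIoctl(HANDLE Device, ULONG Ioctl, ULONG Iterations) {
    LARGE_INTEGER frequency, start, end;
    QueryPerformanceFrequency(&frequency);

    if (Iterations == 0) return;
    ULONGLONG* latencies = malloc(Iterations * sizeof(ULONGLONG));
    if (latencies == NULL) return;

    for (ULONG i = 0; i < Iterations; i++) {
        QueryPerformanceCounter(&start);
        DeviceIoControl(Device, Ioctl, ...);  // buffer arguments elided; check the BOOL result in real code
        QueryPerformanceCounter(&end);

        // Convert ticks to microseconds
        latencies[i] = (end.QuadPart - start.QuadPart) * 1000000 / frequency.QuadPart; // µs
    }

    // Calculate P50, P95, P99 from the sorted samples
    qsort(latencies, Iterations, sizeof(ULONGLONG), compare);
    printf("P50: %llu µs, P95: %llu µs, P99: %llu µs\n",
           latencies[Iterations/2],
           latencies[Iterations*95/100],
           latencies[Iterations*99/100]);

    free(latencies);
}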

2. Throughput (iperf3):

# TCP bidirectional
iperf3 -c <server> -t 60 --bidir --json > tcp_throughput.json

# UDP unidirectional
iperf3 -c <server> -u -b 1000M -t 60 --json > udp_throughput.json

# Small frames
iperf3 -c <server> -u -b 100M -l 64 -t 60 --json > small_frames.json

3. CPU Usage (ETW tracing):

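# Start trace (PROFILE is required for -stackwalk Profile; LOADER allows module/symbol resolution)
xperf -on PROC_THREAD+LOADER+DPC+INTERRUPT+PROFILE -stackwalk Profile

# Run traffic (5 minutes, bidirectional as required by AC3)
iperf3 -c <server> -t 300 -b 500M --bidir

# Stop trace
xperf -d cpu_trace.etl

# Analyze (DPC/ISR time per module)
xperf -i cpu_trace.etl -o dpcisr_report.txt -a dpcisr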

4. Memory Footprint (WMI query):

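# Note: these counters are system-wide, so the delta only approximates the driver's share.
# Before traffic
$before = Get-WmiObject Win32_PerfRawData_PerfOS_Memory | Select PoolNonpagedBytes, PoolPagedBytes

# Run traffic (24 hours)
iperf3 -c <server> -t 86400

# After traffic
$after = Get-WmiObject Win32_PerfRawData_PerfOS_Memory | Select PoolNonpagedBytes, PoolPagedBytes

# Pool growth during the run (non-paged and paged)
$nonPagedDelta = $after.PoolNonpagedBytes - $before.PoolNonpagedBytes
$pagedDelta = $after.PoolPagedBytes - $before.PoolPagedBytes
Write-Host "Non-paged pool growth: $($nonPagedDelta / 1MB) MB"
Write-Host "Paged pool growth: $($pagedDelta / 1MB) MB"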

Automation (CI integration):

# .github/workflows/performance-benchmarks.yml
name: Performance Benchmarks

on:
  schedule:
    - cron: '0 2 * * *'  # Nightly at 2 AM
  workflow_dispatch:

jobs:
  benchmark:
    runs-on: [self-hosted, performance-test-rig]  # Dedicated hardware
    steps:
      - uses: actions/checkout@v4
      
      - name: Build driver
        run: msbuild IntelAvbFilter.sln /p:Configuration=Release
      
      - name: Install driver
        run: |
          pnputil /add-driver IntelAvbFilter.inf /install
          sc start IntelAvbFilter
      
      - name: Run IOCTL benchmark
        run: ./avb_test_i210_um.exe --benchmark-ioctl --json ioctl_results.json
      
      - name: Run throughput benchmark
        run: |
          iperf3 -c ${{ secrets.PERF_SERVER_IP }} -t 60 --bidir --json > tcp_throughput.json
          iperf3 -c ${{ secrets.PERF_SERVER_IP }} -u -b 1000M -t 60 --json > udp_throughput.json
      
      - name: Analyze results
        run: python scripts/analyze_benchmarks.py --compare-to baseline.json
      
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: performance-results
          path: |
            ioctl_results.json
            tcp_throughput.json
            udp_throughput.json
      
      - name: Fail on regression
        run: |
          # CI fails if any metric >10% worse than baseline
          python scripts/check_regression.py --threshold 0.10

Regression Detection:

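# scripts/check_regression.py
import argparse
import json
import sys

# Assumption: metric names containing these substrings are "higher is better";
# all others (latency, CPU, memory) are "lower is better". Adjust to the real metric names.
HIGHER_IS_BETTER = ("throughput", "pps")

def check_regression(current, baseline, threshold=0.10):
    """Return True if no metric is more than `threshold` worse than baseline"""
    ok = True
    for metric, value in current.items():
        baseline_value = baseline.get(metric)
        if not baseline_value:  # missing or zero baseline: nothing to compare against
            continue
        change = (value - baseline_value) / baseline_value
        if any(tag in metric for tag in HIGHER_IS_BETTER):
            change = -change  # a drop in throughput is the regression
        if change > threshold:
            print(f"❌ Regression detected: {metric} {change*100:.1f}% worse")
            ok = False  # keep going so every regression is reported
    return ok

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--threshold", type=float, default=0.10)
    args = parser.parse_args()

    with open("current_results.json") as f:
        current = json.load(f)
    with open("baseline.json") as f:
        baseline = json.load(f)

    if not check_regression(current, baseline, args.threshold):
        sys.exit(1)  # Fail CI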

Dependencies

  • Dedicated performance test hardware (consistent results)
  • iperf3 server (network peer)
  • Windows Performance Toolkit (WPT)
  • Baseline results (initial benchmark run)

Risks

  • Hardware Variability: Results differ across systems → Baseline per test rig
  • Background Noise: Other processes interfere → Dedicated test system, minimal services
  • False Positives: Transient spikes trigger false regressions → Use P95, not max latency
  • Mitigation: Run benchmarks 3x, take median result
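
The median-of-three mitigation can be sketched as follows. It assumes each run produces a flat mapping of metric name to numeric value, which is also what the regression check expects; the function name and metric names are illustrative.

```python
from statistics import median


def median_of_runs(runs):
    """Combine several benchmark runs into one result by taking each metric's median.

    `runs` is a list of dicts mapping metric name to value. Only metrics present
    in every run are kept, so a failed measurement cannot skew the result.
    """
    if not runs:
        raise ValueError("no runs")
    common = set(runs[0])
    for run in runs[1:]:
        common &= set(run)
    return {metric: median(run[metric] for run in runs) for metric in sorted(common)}


if __name__ == "__main__":
    runs = [
        {"ioctl_get_p95_us": 90, "tcp_throughput_mbps": 940},
        {"ioctl_get_p95_us": 140, "tcp_throughput_mbps": 935},  # transient spike
        {"ioctl_get_p95_us": 92, "tcp_throughput_mbps": 941},
    ]
    print(median_of_runs(runs))
```

With an odd number of runs the median is always one of the measured values, so a single transient spike such as the 140µs sample above is discarded rather than averaged in.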

Performance Targets Summary

| Metric | Target | Rationale |
|---|---|---|
| IOCTL GET latency | P95 <100µs (PHC), <150µs (TAS) | gPTP sync accuracy |
| IOCTL SET latency | P95 <200µs (PHC), <500µs (TAS) | Low-latency config changes |
| Packet latency | P95 <50µs | Minimize queueing delay |
| TCP throughput | >900 Mbps | Near line-rate performance |
| UDP throughput | >950 Mbps, <0.1% loss | Minimize protocol overhead |
| Small frame rate | >148k pps | AVB audio streams |
| CPU usage (500 Mbps) | <15% | Leave headroom for applications |
| Memory footprint | <15 MB | Kernel memory limit |
| Stress test | 24 hours stable | Long-running system stability |

Created: 2025-12-30
Last Updated: 2025-12-30
Status: Draft
