REQ-NF-PERF-BENCHMARKS-001: Performance Benchmarks

## Requirement Information

**Requirement ID**: REQ-NF-PERF-BENCHMARKS-001  
**Type**: Non-Functional Requirement  
**Priority**: P2 (Performance)  
**Status**: Draft  

## Description

The driver MUST provide comprehensive performance benchmarks covering latency, throughput, CPU usage, and memory footprint to establish baseline performance and detect regressions.

### Performance Metrics

**1. Latency Benchmarks**:

**IOCTL Latency** (user-mode to kernel-mode round-trip):
- `IOCTL_PHC_GET_TIME`: P95 <100µs
- `IOCTL_PHC_SET_TIME`: P95 <200µs
- `IOCTL_TAS_GET_CONFIG`: P95 <150µs
- `IOCTL_TAS_SET_CONFIG`: P95 <500µs

Rationale: Low-latency clock access is critical for AVB synchronization

**Packet Processing Latency** (interrupt to DPC completion):
- P95 <50µs for 64-byte frames
- Rationale: Fast packet handling prevents queue buildup

**2. Throughput Benchmarks**:

**TCP Performance**:
- Bidirectional: >900 Mbps (Gigabit Ethernet)
- Rationale: Near line-rate performance for bulk data

**UDP Performance**:
- Unidirectional: >950 Mbps, <0.1% packet loss
- Rationale: Minimize overhead for real-time streams

**Small Frame Performance**:
- 64-byte UDP: >148,000 packets/second
- Rationale: AVB audio streams use small frames
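
For context, the small-frame target can be compared with the theoretical frame rate of a Gigabit link. The sketch below is illustrative arithmetic only: it adds the fixed per-frame wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte minimum inter-frame gap) to the frame size. The function name is hypothetical and not part of the driver or its test tools.

```python
# Illustrative arithmetic: maximum Ethernet frame rate at a given link speed.
PREAMBLE_SFD_BYTES = 8      # 7-byte preamble + 1-byte start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum inter-frame gap


def max_frames_per_second(frame_bytes: int, link_bps: float = 1e9) -> float:
    """Theoretical maximum frames/s for frames of `frame_bytes` (FCS included)."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


line_rate = max_frames_per_second(64)
print(f"64-byte line rate: {line_rate:,.0f} frames/s")
print(f"148,000 pps is {148_000 / line_rate:.1%} of line rate")
```

At 64 bytes each frame occupies 84 bytes (672 bits) on the wire, so the 148,000 pps target is roughly one tenth of the theoretical Gigabit maximum.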

**3. CPU Usage**:
- DPC time: <10% (single core) at 500 Mbps
- Interrupt time: <5% (single core) at 500 Mbps
- Total driver overhead: <15% CPU at 500 Mbps

Rationale: Low CPU usage leaves headroom for applications

**4. Memory Footprint**:
- Non-paged pool: <10 MB
- Paged pool: <5 MB
- No memory leaks (stable over 24-hour stress test)

Rationale: Kernel memory is a limited resource

### Acceptance Criteria

**AC1: IOCTL Latency**:
- Given 1000 iterations of each IOCTL
- When benchmark executes
- Then P95 latency of each IOCTL meets its target (PHC GET <100µs, PHC SET <200µs, TAS GET <150µs, TAS SET <500µs)
- And results logged to JSON file
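
A minimal sketch of how AC1 could be evaluated and logged, assuming the benchmark tool exports raw per-call latencies in microseconds. The nearest-rank percentile definition, the record layout, and the function names are assumptions for illustration; the sample data is synthetic.

```python
import json
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(name, latencies_us, target_p95_us):
    """Build a JSON-serializable result record for one IOCTL."""
    p95 = percentile(latencies_us, 95)
    return {
        "ioctl": name,
        "iterations": len(latencies_us),
        "p50_us": percentile(latencies_us, 50),
        "p95_us": p95,
        "p99_us": percentile(latencies_us, 99),
        "target_p95_us": target_p95_us,
        "pass": p95 < target_p95_us,
    }


# Synthetic latencies for illustration only: 1000 samples between 20 and 69 µs
samples = [20 + (i % 50) for i in range(1000)]
record = summarize("IOCTL_PHC_GET_TIME", samples, target_p95_us=100)
print(json.dumps(record, indent=2))
```

Storing the target next to the measured percentiles keeps each JSON record self-describing, so a later regression check does not need a separate copy of the thresholds.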

**AC2: Throughput**:
- Given iperf3 tests of 60 seconds each
- When TCP bidirectional and UDP unidirectional traffic run
- Then TCP throughput >900 Mbps and UDP throughput >950 Mbps
- And UDP packet loss <0.1%
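
AC2 could be checked directly from iperf3's `--json` output. The sketch below assumes the usual report layout (`end.sum_received.bits_per_second` for TCP, `end.sum.lost_percent` for UDP) and assumes that `--bidir` runs add an `end.sum_received_bidir_reverse` entry; verify these keys against the iperf3 version on the test rig. Loss is read from a UDP report because iperf3 reports `lost_percent` only for UDP. The function names are hypothetical and the reports shown are synthetic.

```python
def tcp_mbps(report: dict) -> dict:
    """Received TCP throughput in Mbps for each direction present in the report."""
    end = report["end"]
    rates = {"forward": end["sum_received"]["bits_per_second"] / 1e6}
    reverse = end.get("sum_received_bidir_reverse")  # assumed key, --bidir only
    if reverse is not None:
        rates["reverse"] = reverse["bits_per_second"] / 1e6
    return rates


def udp_loss_percent(report: dict) -> float:
    """Datagram loss percentage from a UDP report."""
    return report["end"]["sum"]["lost_percent"]


def meets_ac2(tcp_report, udp_report, min_mbps=900.0, max_loss_pct=0.1) -> bool:
    """True if every TCP direction exceeds min_mbps and UDP loss is below the limit."""
    rates = tcp_mbps(tcp_report)
    return (all(rate > min_mbps for rate in rates.values())
            and udp_loss_percent(udp_report) < max_loss_pct)


# Synthetic reports for illustration only
tcp_report = {"end": {"sum_received": {"bits_per_second": 941e6},
                      "sum_received_bidir_reverse": {"bits_per_second": 938e6}}}
udp_report = {"end": {"sum": {"bits_per_second": 956e6, "lost_percent": 0.02}}}
print(tcp_mbps(tcp_report), meets_ac2(tcp_report, udp_report))
```

In practice the two dictionaries would come from `json.load` on `tcp_throughput.json` and `udp_throughput.json`.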

**AC3: CPU Usage**:
- Given 500 Mbps bidirectional traffic
- When performance monitor samples every second
- Then DPC+Interrupt time <15% CPU
- And CPU usage stable over 5-minute run
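
AC3 does not define "stable", so the sketch below makes one possible interpretation explicit: the mean of the second half of the run may differ from the mean of the first half by at most 2 percentage points. That drift limit, the function name, and the sample data are assumptions, not part of the requirement.

```python
from statistics import mean


def check_cpu(samples_pct, limit_pct=15.0, max_drift_pct=2.0):
    """Evaluate per-second DPC+interrupt CPU samples (percent of one core)."""
    if len(samples_pct) < 2:
        raise ValueError("need at least two samples")
    half = len(samples_pct) // 2
    average = mean(samples_pct)
    # Drift: difference between the means of the two halves of the run
    drift = abs(mean(samples_pct[half:]) - mean(samples_pct[:half]))
    return {
        "mean_pct": average,
        "peak_pct": max(samples_pct),
        "drift_pct": drift,
        "pass": average < limit_pct and drift <= max_drift_pct,
    }


# Synthetic 5-minute run (300 one-second samples) for illustration only
steady = [10 + (i % 3) for i in range(300)]
print(check_cpu(steady))
```

A run whose average stays under 15% but climbs steadily would fail the drift check, which is the case a plain average would miss.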

**AC4: Memory Footprint**:
- Given 24-hour stress test (900 Mbps continuous)
- When memory pools monitored hourly
- Then driver footprint <15 MB total
- And no leaks detected (Driver Verifier enabled)
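
One way to turn the hourly samples of AC4 into a pass/fail result is to fit a least-squares line and flag sustained growth. This is a sketch under stated assumptions: the 0.05 MB/hour slope limit is an arbitrary illustrative value, the function names are hypothetical, and the data is synthetic. It complements Driver Verifier rather than replacing it.

```python
def slope_mb_per_hour(samples_mb):
    """Least-squares slope of memory usage (MB) over equally spaced hourly samples."""
    n = len(samples_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples_mb) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples_mb))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def check_memory(samples_mb, limit_mb=15.0, max_slope=0.05):
    """Pass if the peak stays under limit_mb and growth stays under max_slope MB/hour."""
    slope = slope_mb_per_hour(samples_mb)
    return {
        "peak_mb": max(samples_mb),
        "slope_mb_per_hour": slope,
        "pass": max(samples_mb) < limit_mb and slope <= max_slope,
    }


# Synthetic hourly samples (25 readings over 24 hours) for illustration only
flat = [8.0 + 0.1 * (i % 2) for i in range(25)]
leaky = [8.0 + 0.2 * i for i in range(25)]
print(check_memory(flat)["pass"], check_memory(leaky)["pass"])
```

The `leaky` series never exceeds 15 MB within 24 hours, yet its slope of 0.2 MB/hour fails the check, which is why the trend is tested in addition to the peak.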

**AC5: Regression Detection**:
- Given baseline benchmark results
- When new driver build benchmarked
- Then metrics within ±10% of baseline
- And CI fails if regression >10%

## Traceability

- Traces to: #31 (StR-REQ-001: Stakeholder Requirements Definition)
- Traces to: #127, #128, #185, #180
- Verified by: #253 

## Rationale

**Why This Requirement**:
- Performance directly impacts AVB stream quality (audio/video glitches if slow)
- Baseline metrics enable regression detection (prevent accidental slowdowns)
- CPU/memory limits constrain embedded and industrial systems
- Benchmarks guide optimization efforts (focus on actual bottlenecks)

**Industry Context**:
- IEEE 802.1AS (gPTP): Targets sub-microsecond synchronization between time-aware systems (within 1 µs across 7 hops)
- AVB Class A: <2ms end-to-end latency budget
- AVB Class B: <50ms latency budget
- Driver overhead must be small fraction of total budget
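
To make "small fraction" concrete, the packet-processing target can be compared with the class budgets listed above. The sketch is plain arithmetic; the function name is hypothetical, and it considers a single endpoint only, whereas the budgets are end-to-end.

```python
# Latency budgets from the list above, in microseconds
CLASS_BUDGET_US = {"A": 2_000, "B": 50_000}


def budget_fraction(driver_latency_us: float, avb_class: str) -> float:
    """Fraction of the end-to-end class budget consumed by one driver traversal."""
    return driver_latency_us / CLASS_BUDGET_US[avb_class]


# 50 µs is the P95 packet processing target from this requirement
for avb_class in ("A", "B"):
    print(f"Class {avb_class}: {budget_fraction(50, avb_class):.1%} of budget")
```

At the 50µs target, one driver traversal uses 2.5% of the Class A budget and 0.1% of the Class B budget.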

**Impact of Non-Compliance**:
- Slow IOCTLs → Applications miss gPTP sync deadlines → Audio/video drift
- High CPU usage → System overload → Packet drops
- Memory leaks → System instability over time → Reboot required
- No regression testing → Silent performance degradation → User complaints

## Implementation Notes

**Benchmarking Tools**:

**1. IOCTL Latency** (custom tool):
```c
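// avb_test_i210_um.c (benchmark mode)
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

// qsort comparator: ascending order of ULONGLONG latencies
static int compare(const void* a, const void* b) {
    ULONGLONG x = *(const ULONGLONG*)a, y = *(const ULONGLONG*)b;
    return (x > y) - (x < y);
}

void BenchmarkIoctl(HANDLE Device, ULONG Ioctl, ULONG Iterations) {
    LARGE_INTEGER frequency, start, end;
    QueryPerformanceFrequency(&frequency);

    ULONGLONG* latencies = malloc(Iterations * sizeof(ULONGLONG));
    if (latencies == NULL || Iterations == 0) {
        free(latencies);  // free(NULL) is a no-op
        return;
    }

    for (ULONG i = 0; i < Iterations; i++) {
        QueryPerformanceCounter(&start);
        DeviceIoControl(Device, Ioctl, ...);  // buffers depend on the IOCTL under test
        QueryPerformanceCounter(&end);

        latencies[i] = (end.QuadPart - start.QuadPart) * 1000000 / frequency.QuadPart; // µs
    }

    // Calculate P50, P95, P99 from the sorted samples
    qsort(latencies, Iterations, sizeof(ULONGLONG), compare);
    printf("P50: %llu µs, P95: %llu µs, P99: %llu µs\n",
           latencies[Iterations/2],
           latencies[Iterations*95/100],
           latencies[Iterations*99/100]);

    free(latencies);
}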
```

**2. Throughput** (iperf3):
```bash
# TCP bidirectional
iperf3 -c <server> -t 60 --bidir --json > tcp_throughput.json

# UDP unidirectional
iperf3 -c <server> -u -b 1000M -t 60 --json > udp_throughput.json

# Small frames
iperf3 -c <server> -u -b 100M -l 64 -t 60 --json > small_frames.json
```

**3. CPU Usage** (ETW tracing):
```powershell
# Start trace
xperf -on PROC_THREAD+LOADER+PROFILE+DPC+INTERRUPT -stackwalk Profile

# Run traffic (5 minutes)
iperf3 -c <server> -t 300 -b 500M

# Stop trace
xperf -d cpu_trace.etl

# Analyze (CPU time by driver)
wpaexporter cpu_trace.etl -outputfolder results
```

**4. Memory Footprint** (WMI query):
```powershell
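# Before traffic (system-wide non-paged pool; the driver's share is inferred from the delta)
$before = Get-WmiObject Win32_PerfRawData_PerfOS_Memory | Select-Object PoolNonpagedBytes

# Run traffic (24 hours)
iperf3 -c <server> -t 86400

# After traffic
$after = Get-WmiObject Win32_PerfRawData_PerfOS_Memory | Select-Object PoolNonpagedBytes

# Non-paged pool growth during the run (approximates driver growth on an otherwise idle test rig)
$delta = $after.PoolNonpagedBytes - $before.PoolNonpagedBytes
Write-Host "Non-paged pool growth: $($delta / 1MB) MB"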
```

**Automation** (CI integration):
```yaml
# .github/workflows/performance-benchmarks.yml
name: Performance Benchmarks

on:
  schedule:
    - cron: '0 2 * * *'  # Nightly at 2 AM
  workflow_dispatch:

jobs:
  benchmark:
    runs-on: [self-hosted, performance-test-rig]  # Dedicated hardware
    steps:
      - uses: actions/checkout@v4
      
      - name: Build driver
        run: msbuild IntelAvbFilter.sln /p:Configuration=Release
      
      - name: Install driver
        run: |
          pnputil /add-driver IntelAvbFilter.inf /install
          sc.exe start IntelAvbFilter
      
      - name: Run IOCTL benchmark
        run: ./avb_test_i210_um.exe --benchmark-ioctl --json ioctl_results.json
      
      - name: Run throughput benchmark
        run: |
          iperf3 -c ${{ secrets.PERF_SERVER_IP }} -t 60 --bidir --json > tcp_throughput.json
          iperf3 -c ${{ secrets.PERF_SERVER_IP }} -u -b 1000M -t 60 --json > udp_throughput.json
      
      - name: Analyze results
        run: python scripts/analyze_benchmarks.py --compare-to baseline.json
      
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: performance-results
          path: |
            ioctl_results.json
            tcp_throughput.json
            udp_throughput.json
      
      - name: Fail on regression
        run: |
          # CI fails if any metric >10% worse than baseline
          python scripts/check_regression.py --threshold 0.10
```

**Regression Detection**:
```python
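# scripts/check_regression.py
import argparse
import json
import sys

# Metrics where a larger value is better; all others are treated as lower-is-better.
# NOTE: these names are illustrative; match them to the keys in the results JSON.
HIGHER_IS_BETTER = {"tcp_throughput_mbps", "udp_throughput_mbps", "small_frame_pps"}

def check_regression(current, baseline, threshold=0.10):
    """Return True if no metric is more than `threshold` worse than baseline."""
    ok = True
    for metric, value in current.items():
        baseline_value = baseline.get(metric)
        if not baseline_value:
            continue  # No baseline (or zero baseline): nothing to compare against
        change = (value - baseline_value) / baseline_value
        # A throughput drop is a regression; a latency/CPU/memory increase is a regression
        regression = -change if metric in HIGHER_IS_BETTER else change
        if regression > threshold:
            print(f"❌ Regression detected: {metric} {regression*100:.1f}% worse")
            ok = False
    return ok

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--threshold", type=float, default=0.10)
    args = parser.parse_args()

    with open("current_results.json") as f:
        current = json.load(f)
    with open("baseline.json") as f:
        baseline = json.load(f)

    if not check_regression(current, baseline, args.threshold):
        sys.exit(1)  # Fail CI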
```

## Dependencies

- Dedicated performance test hardware (consistent results)
- iperf3 server (network peer)
- Windows Performance Toolkit (WPT)
- Baseline results (initial benchmark run)

## Risks

- **Hardware Variability**: Results differ across systems → Baseline per test rig
- **Background Noise**: Other processes interfere → Dedicated test system, minimal services
- **False Positives**: Transient spikes trigger false regressions → Use P95 rather than max latency; run benchmarks 3x and take the median result
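
The median-of-three mitigation could be applied per metric before results are compared with the baseline. A minimal sketch, assuming each run is a flat mapping of metric name to value; the metric names and values are synthetic and the function name is hypothetical.

```python
from statistics import median


def aggregate_runs(runs):
    """Per-metric median across repeated benchmark runs (list of dicts)."""
    if not runs:
        raise ValueError("no runs")
    # Only aggregate metrics that every run reported
    common = set.intersection(*(set(run) for run in runs))
    return {metric: median(run[metric] for run in runs) for metric in sorted(common)}


# Three synthetic runs for illustration only; the second contains a latency spike
runs = [
    {"ioctl_get_p95_us": 42, "tcp_throughput_mbps": 941},
    {"ioctl_get_p95_us": 97, "tcp_throughput_mbps": 936},
    {"ioctl_get_p95_us": 44, "tcp_throughput_mbps": 943},
]
print(aggregate_runs(runs))
```

With three runs the median discards a single outlier in either direction, so one transient spike does not reach the regression check.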

## Performance Targets Summary

| Metric                | Target              | Rationale                          |
|-----------------------|---------------------|-----------------------------------|
| IOCTL GET latency     | P95 <100µs (PHC), <150µs (TAS) | gPTP sync accuracy                 |
| IOCTL SET latency     | P95 <200µs (PHC), <500µs (TAS) | Config changes low-latency         |
| Packet latency        | P95 <50µs           | Minimize queueing delay            |
| TCP throughput        | >900 Mbps           | Near line-rate performance         |
| UDP throughput        | >950 Mbps           | Minimize protocol overhead         |
| Small frame rate      | >148k pps           | AVB audio streams                  |
| CPU usage (500 Mbps)  | <15%                | Leave headroom for applications    |
| Memory footprint      | <15 MB              | Kernel memory limit                |
| Stress test           | 24 hours stable     | Long-running system stability      |

## References

- IEEE 802.1AS-2020 (gPTP timing requirements)
- IEEE 802.1Q-2018 (AVB latency budgets)
- Windows Performance Toolkit: https://learn.microsoft.com/windows-hardware/test/wpt/
- iperf3: https://software.es.net/iperf/

---

**Created**: 2025-12-30  
**Last Updated**: 2025-12-30  
**Status**: Draft
