
SIMD diagonal fill (SSE4.2) #71

@crashfrog

Description

Parent

#58 (Phase 1 MVP: Evidence-Informed Alignment with Deferred Filtering)

What to build

Implement an SSE4.2 SIMD-accelerated diagonal fill for the wavefront alignment (WFA) algorithm in phraya-align (x86_64 only). Use runtime CPU feature detection via is_x86_feature_detected!("sse4.2") to dispatch between the SIMD and scalar implementations. Document every unsafe block with a SAFETY comment stating the invariants it relies on.

The SIMD diagonal fill processes several cells per instruction (four 32-bit cells per 128-bit SSE register), which should speed up alignment on modern x86 CPUs. The scalar fallback keeps the code correct on CPUs without SSE4.2.
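The issue does not spell out which recurrence the fill computes, so the following assumes the edit-distance WFA wavefront step (an assumption: diagonal k = h - v, offsets measured along h; gap-affine WFA tracks more wavefronts). Under that assumption each cell of wavefront s depends only on three cells of wavefront s - 1, so four adjacent diagonals can be computed independently, one per 32-bit lane, from three loads of the previous wavefront shifted by one element each:

```latex
\begin{aligned}
M_{s,k} &= \max\bigl(M_{s-1,k-1} + 1,\; M_{s-1,k} + 1,\; M_{s-1,k+1}\bigr) \\[4pt]
\begin{pmatrix} M_{s,k} \\ M_{s,k+1} \\ M_{s,k+2} \\ M_{s,k+3} \end{pmatrix}
&= \max\left(
\begin{pmatrix} M_{s-1,k-1} \\ M_{s-1,k} \\ M_{s-1,k+1} \\ M_{s-1,k+2} \end{pmatrix} + \mathbf{1},\;
\begin{pmatrix} M_{s-1,k} \\ M_{s-1,k+1} \\ M_{s-1,k+2} \\ M_{s-1,k+3} \end{pmatrix} + \mathbf{1},\;
\begin{pmatrix} M_{s-1,k+1} \\ M_{s-1,k+2} \\ M_{s-1,k+3} \\ M_{s-1,k+4} \end{pmatrix}
\right)
\end{aligned}
```

Here max is lane-wise (_mm_max_epi32) and + 1 is _mm_add_epi32; the match-extension step that follows each wavefront is not shown.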

Acceptance criteria

  • wfa_diagonal_fill_sse42(diagonal: &mut [i32]) using SSE4.2 intrinsics
  • Runtime CPU detection: use SIMD if available, scalar fallback otherwise
  • All unsafe blocks documented with SAFETY comments
  • Tests: correctness (SIMD result == scalar result)
  • Tests: SIMD-enabled path (requires an SSE4.2 CPU or emulation)
  • Tests: scalar fallback path (force-disable SIMD)
  • Tests: alignment correctness vs baseline scalar WFA
  • Benchmark: 10 kb alignment (scalar vs SIMD; measure speedup)
  • Cross-platform: the SIMD path is compiled out on non-x86_64 targets (ARM, etc.), which use the scalar implementation
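A minimal sketch of how these criteria could fit together, assuming the edit-distance recurrence next[i] = max(prev[i] + 1, prev[i + 1] + 1, prev[i + 2]), where next covers diagonals lo..=hi and prev covers lo-1..=hi+1. Every name here (next_wavefront_scalar, next_wavefront_sse42, next_wavefront, the force_scalar flag, simd_matches_scalar) is hypothetical, not phraya-align code. The sketch also deviates from the listed signature: wfa_diagonal_fill_sse42(diagonal: &mut [i32]) works in place on one slice whose layout the issue leaves open, while this version reads the previous wavefront from one slice and writes the next into another. Note that _mm_max_epi32 is an SSE4.1 instruction; every SSE4.2 CPU also supports SSE4.1, but the sketch enables and checks both so the SAFETY argument does not depend on that. Band clamping, out-of-range sentinels, and the extend step are omitted.

```rust
// Sketch only: hypothetical names, assumed edit-distance WFA recurrence.

/// Scalar reference. wrapping_add mirrors _mm_add_epi32 so both paths agree bit-for-bit.
pub fn next_wavefront_scalar(prev: &[i32], next: &mut [i32]) {
    assert_eq!(prev.len(), next.len() + 2, "prev must cover lo-1..=hi+1");
    for i in 0..next.len() {
        next[i] = prev[i]
            .wrapping_add(1)
            .max(prev[i + 1].wrapping_add(1))
            .max(prev[i + 2]);
    }
}

/// SIMD kernel: four diagonals per 128-bit register, scalar tail for the rest.
///
/// # Safety
/// The caller must ensure the CPU supports SSE4.1 and SSE4.2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1,sse4.2")]
pub unsafe fn next_wavefront_sse42(prev: &[i32], next: &mut [i32]) {
    use std::arch::x86_64::{
        __m128i, _mm_add_epi32, _mm_loadu_si128, _mm_max_epi32, _mm_set1_epi32, _mm_storeu_si128,
    };
    assert_eq!(prev.len(), next.len() + 2, "prev must cover lo-1..=hi+1");
    let n = next.len();
    let mut i = 0;
    // SAFETY: CPU support is the caller's contract. Inside the loop i + 4 <= n,
    // so the widest read, prev[i + 2..i + 6], ends at most at n + 2 == prev.len(),
    // and the write next[i..i + 4] ends at most at n. loadu/storeu accept
    // unaligned pointers, and prev/next cannot overlap (& vs &mut borrows).
    unsafe {
        let one = _mm_set1_epi32(1);
        while i + 4 <= n {
            let p = prev.as_ptr().add(i);
            let left = _mm_loadu_si128(p as *const __m128i); // diagonals k-1
            let mid = _mm_loadu_si128(p.add(1) as *const __m128i); // diagonals k
            let right = _mm_loadu_si128(p.add(2) as *const __m128i); // diagonals k+1
            let best = _mm_max_epi32(
                _mm_max_epi32(_mm_add_epi32(left, one), _mm_add_epi32(mid, one)),
                right,
            );
            _mm_storeu_si128(next.as_mut_ptr().add(i) as *mut __m128i, best);
            i += 4;
        }
    }
    // Remaining n % 4 diagonals.
    next_wavefront_scalar(&prev[i..], &mut next[i..]);
}

/// Runtime dispatch; force_scalar lets tests exercise the fallback on any CPU.
pub fn next_wavefront(prev: &[i32], next: &mut [i32], force_scalar: bool) {
    #[cfg(target_arch = "x86_64")]
    {
        if !force_scalar
            && is_x86_feature_detected!("sse4.1")
            && is_x86_feature_detected!("sse4.2")
        {
            // SAFETY: both required features were detected at runtime just above.
            unsafe { next_wavefront_sse42(prev, next) };
            return;
        }
    }
    #[cfg(not(target_arch = "x86_64"))]
    let _ = force_scalar; // non-x86_64 targets only compile the scalar path
    next_wavefront_scalar(prev, next);
}

/// Differential check: dispatched vs forced-scalar output for every length
/// 0..=max_len, on deterministic xorshift32 inputs (no external crates).
pub fn simd_matches_scalar(max_len: usize) -> bool {
    let mut state: u32 = 0x9E37_79B9;
    let mut next_rand = || {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        (state % 1000) as i32 - 500
    };
    (0..=max_len).all(|n| {
        let prev: Vec<i32> = (0..n + 2).map(|_| next_rand()).collect();
        let (mut auto, mut scalar) = (vec![0; n], vec![0; n]);
        next_wavefront(&prev, &mut auto, false);
        next_wavefront(&prev, &mut scalar, true);
        auto == scalar
    })
}
```

A test can assert simd_matches_scalar(64) for the SIMD == scalar criterion. On a CPU without SSE4.2, or on non-x86_64 targets, both calls take the scalar path and the comparison is trivially true, so the SIMD-enabled test only means something on an SSE4.2 machine or emulator.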

Blocked by

#70 (WFA base implementation)

Metadata

Assignees

No one assigned

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
