Magnetic upwind kernel optimization #1

@ekeever1

Description

The cudaMagW kernels are partially optimized, but at least a few register variables can still be eliminated. The Y and Z kernels need only 2 of the 3 components in dims, so dropping the unused component could free one more register.

If a third tile of shared memory does not cost too much occupancy (check with NVIDIA's occupancy calculator spreadsheet), at least one __syncthreads() can be eliminated from the YZ kernels. Attempts to rewrite the algorithm to eliminate conditionals using exact math operations (x+0, x*1, and, for finite x, x-x all evaluate exactly in IEEE 754) are encouraged.
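
To make the exact-math idea concrete, here is a minimal sketch of a branch-free upwind selection. It is not taken from the cudaMagW source: `upwind_select`, its arguments, and the `HOST_DEVICE` macro are hypothetical names, and the convention that a positive velocity picks the left value is an assumption. The function is plain C++ that nvcc would also accept as a host/device function.

```cpp
#include <cassert>

// Compiles as plain C++ and, under nvcc, as a host/device function.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper (not from the cudaMagW kernels).
// Returns `left` when velocity > 0 and `right` otherwise, without an
// if/else on the data path.
HOST_DEVICE inline double upwind_select(double velocity, double left, double right)
{
    // w is exactly 1.0 or 0.0.
    double w = static_cast<double>(velocity > 0.0);

    // One product is x*1 (exactly x) and the other is x*0 (a zero for
    // finite x); adding that zero (x+0) leaves the selected value unchanged.
    return w * left + (1.0 - w) * right;
}

// Small host-side check that the selection is bit-for-bit exact for
// finite inputs, including values that are not exactly representable.
inline void check_upwind_select()
{
    assert(upwind_select( 1.0, 0.1, 0.7) == 0.1);
    assert(upwind_select(-1.0, 0.1, 0.7) == 0.7);
    assert(upwind_select( 0.0, 0.1, 0.7) == 0.7);
}
```

Two caveats apply to this kind of rewrite. The rejected operand must be finite, because 0*inf and 0*NaN give NaN and would contaminate the result. Also, a selected -0.0 can come back as +0.0, which compares equal but differs in sign bit. Whether this beats the compiler's own predicated select would need to be measured on the actual kernels.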
