vs_3_0/GLSL: normalize() fed by a div-by-zero-guard ternary returns NaN (0*Inf in flattened select); vs_4_0 correct #85

@ymiroshnyk

MojoShader vs_3_0 → GLSL: flattened fallback ternary computes inversesqrt(0)=Inf, then 0 * Inf = NaN poisons the selected (else) value

Component: MojoShader (HLSL bytecode → GLSL translator) as shipped in MonoGame DesktopGL.
Severity: Correctness. A normalize() whose argument comes from a "length>0 ? normalized : fallback" ternary returns NaN on OpenGL, silently culling geometry. The DirectX (vs_4_0) build of the same .fx is correct.
Status: Reproduced with a 4-line minimal standalone shader (below), root-caused down to the generated GLSL.


1. TL;DR

This extremely common HLSL idiom — normalize a vector but guard the zero-length case with a ternary —

float2 d   = seed * flag;                 // can be the zero vector
float  len = length(d);
float2 dir = (len > 1e-6) ? (d / len) : seed;   // guard div-by-zero
float2 n   = normalize(dir);

returns n = (NaN, NaN) on the OpenGL (vs_3_0/MojoShader) backend whenever the else arm is
taken (i.e. when d is the zero vector), even though dir and n are perfectly finite on
DirectX (vs_4_0) from the same .fx.

Root cause (visible in the generated GLSL, §5): MojoShader flattens the ternary into a
branchless select that evaluates both arms unconditionally. The then arm d / len is emitted as
d * inversesqrt(dot(d,d)). When d == 0, dot(d,d) == 0, so inversesqrt(0) == +Inf, and
0 * Inf == NaN. The select is emitted as mask*(then - else) + else; with mask == 0 it should
yield the else value, but IEEE-754 makes 0 * NaN == NaN, so the NaN from the unused then arm
poisons the result. DirectX/fxc does not hit this (it appears to guard or predicate the reciprocal).


2. Environment

Item                                   | Value
MonoGame.Framework.DesktopGL           | 3.8.0.1641 (NuGet)
MonoGame.Framework.WindowsDX (control) | 3.8.0.1641 (NuGet)
dotnet-mgfxc (effect compiler)         | 3.8.0.1641
GL profile                             | vs_3_0 / ps_3_0
DX profile (same .fx)                  | vs_4_0 / ps_4_0
OS                                     | Windows 11

Reproduces independently of GPU/driver: it is a translator codegen issue (the length() of the same
normalized vector reads back correctly in the very same draw; only the components are NaN).


3. Minimal reproduction (complete, self-contained)

All files are in repro/. The trigger is 4 lines of HLSL; the rest is a 60-line MonoGame
harness that renders one triangle and reads back the center pixel.

3.1 repro.fx (the shader — full file)

#if OPENGL
    #define VS_SHADERMODEL vs_3_0
    #define PS_SHADERMODEL ps_3_0
#else
    #define VS_SHADERMODEL vs_4_0
    #define PS_SHADERMODEL ps_4_0
#endif

struct VSIn  { float3 pos : POSITION0; float2 seed : TEXCOORD0; float flag : TEXCOORD1; };
struct VSOut { float4 pos : SV_POSITION; float4 color : COLOR0; };

VSOut MainVS(VSIn input)
{
    VSOut o;
    o.pos = float4(input.pos, 1.0);

    // THE TRIGGER. flag==0 -> dA is the zero vector -> lenA==0 -> dir takes the ELSE arm.
    float2 dA   = input.seed * input.flag;
    float  lenA = length(dA);
    float2 dirA = (lenA > 1e-6) ? (dA / lenA) : input.seed;
    float2 B1   = normalize(dirA);

    // Encode B1 so the rendered pixel reveals it: R=B1.x, G=B1.y, B=length(B1). NaN -> 0.
    o.color = float4(saturate(B1.x * 0.5 + 0.5), saturate(B1.y * 0.5 + 0.5), saturate(length(B1)), 1.0);
    return o;
}

float4 MainPS(VSOut input) : COLOR0 { return input.color; }

technique T { pass P0 {
    VertexShader = compile VS_SHADERMODEL MainVS();
    PixelShader  = compile PS_SHADERMODEL MainPS();
} }

3.2 Build

mgfxc repro.fx repro_gl.mgfxo /Profile:OpenGL
mgfxc repro.fx repro_dx.mgfxo /Profile:DirectX_11

3.3 Run (the harness — repro/Program.cs)

Renders one full-screen triangle whose 3 vertices all use flag=0, seed=(0.6,0.8), reads the
center pixel of a 64×64 RenderTarget, and decodes it. Two project files build the same Program.cs
against the two backends (repro/harness_gl/Repro.GL.csproj,
repro/harness_dx/Repro.DX.csproj):

dotnet run --project repro/harness_dx/Repro.DX.csproj   # control
dotnet run --project repro/harness_gl/Repro.GL.csproj   # the bug

3.4 Observed output (verbatim)

==================== DX ====================
[REPRO] backend = DirectX (vs_4_0 / fxc)
[REPRO] center pixel RGBA = (204,230,255,255)
[REPRO]   R -> B1.x   =   0.600  (finite)
[REPRO]   G -> B1.y   =   0.804  (finite)
[REPRO]   B -> length =   1.000  (==1 (normalize ran OK))
[REPRO] RESULT: no bug (components finite)
==================== GL ====================
[REPRO] backend = OpenGL (vs_3_0 / MojoShader)
[REPRO] center pixel RGBA = (0,0,255,255)
[REPRO]   R -> B1.x   =  -1.000  (ZERO => NaN)
[REPRO]   G -> B1.y   =  -1.000  (ZERO => NaN)
[REPRO]   B -> length =   1.000  (==1 (normalize ran OK))
[REPRO] RESULT: BUG REPRODUCED (length==1 but B1.x/B1.y read as NaN)
  • DX: B1 = (0.60, 0.80) — correct (the normalized seed).
  • GL: B1 = (NaN, NaN) — but length(B1) == 1.0. Both components are NaN; the length (a
    separate dot/sqrt instruction sequence over the same register) is correct.
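
The decoded values in the log follow from inverting the shader's encoding, channel = saturate(v * 0.5 + 0.5), on the 8-bit readback. A minimal Python sketch of that arithmetic is below; the helper name decode_channel is ours for illustration and is not taken from Program.cs, and an 8-bit UNORM render target is assumed.

```python
def decode_channel(byte_value: int) -> float:
    """Invert channel = saturate(v * 0.5 + 0.5) for an 8-bit UNORM channel."""
    return (byte_value / 255.0 - 0.5) / 0.5


# DX readback (204, 230, 255): R and G decode to roughly the normalized seed.
dx_x = decode_channel(204)
dx_y = decode_channel(230)

# GL readback (0, 0, 255): a zero channel decodes to -1.0.
gl_x = decode_channel(0)

print(f"{dx_x:.3f} {dx_y:.3f} {gl_x:.3f}")
```

A decoded (-1, -1) cannot be the components of a vector whose length reads back as 1, which is consistent with the harness treating a zero channel as the NaN marker (the shader comment in §3.1 notes "NaN -> 0").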

4. The exact characterization

Quantity                             | DX (vs_4_0)      | GL (vs_3_0/MojoShader)
dir (ternary result), else arm taken | finite (== seed) | finite for length, NaN for components
B1.x                                 | 0.600            | NaN
B1.y                                 | 0.804            | NaN
length(B1)                           | 1.000            | 1.000

The "length correct but components NaN" split is the tell: the NaN is produced in the
ternary-select instruction sequence (which both arms feed), not in the application's data.


5. Root cause in the generated GLSL (smoking gun)

mgfxc /Profile:OpenGL embeds the MojoShader GLSL in the .mgfxo. The full vertex shader for the
minimal repro is attached as repro_gl_vertex.glsl.txt. The relevant
lines (register names are MojoShader's; vs_v1=seed, vs_v2=flag):

vs_r0.xy = vs_v1.xy;                              // dA = seed
vs_r0.xy = vs_r0.xy * vs_v2.xx;                   // dA = seed * flag        (flag==0 -> dA = 0)
vs_r0.zw = vs_r0.xy * vs_r0.xy;                   // dA.x^2, dA.y^2
vs_r0.z  = vs_r0.w + vs_r0.z;                     // lenA^2 = 0
vs_r0.z  = inversesqrt(vs_r0.z);                  // *** inversesqrt(0) = +Inf ***
vs_r0.xy = (vs_r0.xy * vs_r0.zz) + -vs_v1.xy;     // THEN arm: dA*(1/lenA) - seed = (0*Inf) - seed = NaN
vs_r0.z  = 1.0 / vs_r0.z;                         // lenA = 1/Inf = 0
vs_r0.z  = float(vs_c0.x < vs_r0.z);             // mask = (1e-6 < 0) = 0.0  -> select the ELSE arm
vs_r0.xy = (vs_r0.zz * vs_r0.xy) + vs_v1.xy;      // SELECT: 0.0 * NaN + seed = NaN + seed = NaN  ***
vs_r0.zw = vs_r0.xy * vs_r0.xy;                   // normalize(NaN)...
vs_r0.z  = vs_r0.w + vs_r0.z;
vs_r0.z  = inversesqrt(vs_r0.z);                  // ... still NaN-poisoned
vs_r0.xy = vs_r0.zz * vs_r0.xy;                   // B1 = (NaN, NaN)

The SELECT line (the second one marked ***) is the classic flattened-ternary mistake:
result = mask*(then - else) + else, with mask == 0. The (then - else) term was computed earlier
(the line commented THEN arm) and is already NaN, from the unconditionally-evaluated
0 * inversesqrt(0). Because IEEE-754 defines 0 * NaN = NaN, the result is
NaN even though the mask selected the else arm. The application's guard
(len > 1e-6 ? … : …) is meant precisely to avoid the div-by-zero, but MojoShader's branchless
lowering evaluates the guarded arm anyway and then lets its NaN leak through the multiply-select.
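
The register sequence above can be replayed on the CPU to check the arithmetic independently of any GPU. The following is a rough Python sketch, not MojoShader code: trace_generated_vs and inversesqrt are hypothetical helper names, vs_c0.x is assumed to hold the 1e-6 threshold, and Python doubles stand in for the shader's 32-bit floats (the 0 * Inf and 0 * NaN rules are the same).

```python
import math


def inversesqrt(x: float) -> float:
    """1/sqrt(x) with IEEE-style +Inf at zero (Python would raise on 1.0 / 0.0)."""
    if x == 0.0:
        return math.inf
    return 1.0 / math.sqrt(x)


def trace_generated_vs(seed, flag, eps=1e-6):
    """Replay the generated GLSL line by line; returns B1 as (x, y)."""
    x, y = seed[0] * flag, seed[1] * flag          # dA = seed * flag
    z = x * x + y * y                              # lenA^2
    z = inversesqrt(z)                             # +Inf when dA == 0
    x, y = x * z - seed[0], y * z - seed[1]        # then - else (0 * Inf = NaN)
    z = 1.0 / z                                    # lenA (1/Inf = 0)
    mask = float(eps < z)                          # 0.0 selects the else arm
    x, y = mask * x + seed[0], mask * y + seed[1]  # select: 0 * NaN + seed = NaN
    z = inversesqrt(x * x + y * y)                 # normalize, NaN propagates
    return x * z, y * z


bad = trace_generated_vs((0.6, 0.8), 0.0)   # else arm taken
good = trace_generated_vs((0.6, 0.8), 1.0)  # then arm taken
print(bad, good)
```

With flag = 0 both components come back NaN, while flag = 1 returns approximately (0.6, 0.8), matching the GL observation in §3.4.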

(The same pattern at larger scale is in our real strip/ribbon shader — attached
mojoshader_strip_vs.glsl.txt, 7 inversesqrt in one flattened VS
— where one ribbon segment whose previous-neighbour edge is degenerate vanishes on GL; that is what
led us here.)


6. Why DirectX is fine

fxc compiling the same HLSL to vs_4_0 does not produce the 0 * Inf hazard in the final result: it
appears to predicate the reciprocal or use a guarded path, so a non-finite then arm does not reach
the output when len == 0. So the
defect is specifically in the MojoShader vs_3_0 → GLSL lowering of the conditional select over a
rcp/rsqrt-bearing expression, not in the HLSL or the application.


7. Suggested fix (translator side)

The select lowering mask*(then - else) + else is unsafe whenever either arm is non-finite. Options:

  1. Use a NaN-safe select when lowering ternaries. Note that mix(else, then, mask) with a
    floating-point mask is unsafe for the same reason; what is needed is a true branch, or a
    mask != 0 ? then : else that does not multiply the dead arm (e.g. emit an actual if, or use a
    bitwise select).
  2. Lower normalize(x) and x / length(x) with a guarded reciprocal (len > 0 ? 1/len : 0) so the dead
    arm cannot produce Inf/NaN in the first place.

Either option removes the whole class of failures. It is the same root cause as a second, earlier
instance we hit of "a VS value that should be finite comes back NaN/garbage on GL only".
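
To make the two options concrete, here is a small CPU-side Python sketch comparing the arithmetic select with both proposed fixes. It only models the floating-point behaviour; the function names (lerp_select, branch_select, guarded_rcp_len) are illustrative assumptions and do not correspond to anything in MojoShader.

```python
import math


def lerp_select(mask: float, then_v: float, else_v: float) -> float:
    """Arithmetic select as in the generated GLSL: mask*(then - else) + else."""
    return mask * (then_v - else_v) + else_v


def branch_select(mask: float, then_v: float, else_v: float) -> float:
    """Option 1: pick by comparison, so the dead arm never enters arithmetic."""
    return then_v if mask != 0.0 else else_v


def guarded_rcp_len(len_sq: float) -> float:
    """Option 2: reciprocal length that yields 0 instead of +Inf for a zero vector."""
    return 1.0 / math.sqrt(len_sq) if len_sq > 0.0 else 0.0


seed_x = 0.6
d_x = 0.0                                  # one component of the zero vector

poisoned_then = d_x * math.inf             # unguarded: 0 * Inf = NaN
safe_then = d_x * guarded_rcp_len(0.0)     # guarded:   0 * 0   = 0

current = lerp_select(0.0, poisoned_then, seed_x)    # NaN leaks through
option1 = branch_select(0.0, poisoned_then, seed_x)  # else arm, untouched
option2 = lerp_select(0.0, safe_then, seed_x)        # dead arm is finite

print(current, option1, option2)
```

Option 1 fixes the select regardless of what the dead arm contains; option 2 keeps the cheap arithmetic select but only covers the reciprocal-length source of non-finite values.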


8. Application-side workaround we use

Because shader-side rewrites on MojoShader proved unreliable, we precompute the affected vertex
values on the CPU and pass them through a trivial passthrough vertex shader on the GL backend
(we already do CPU-side vertex replication), bypassing the MojoShader codegen path for those values.
A pure-HLSL mitigation that may work: avoid the x/length(x) form inside a ternary — e.g.
float2 dir = normalize(lenA > 1e-6 ? dA : seed); (normalize once, after the select) — but this
depends on how MojoShader lowers that and we have not exhaustively verified it across our shaders.
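
As a sanity check on why the "normalize once, after the select" form might help, the sketch below replays it on the CPU in Python. It assumes MojoShader would still lower the ternary to mask*(then - else) + else and compute lenA as 1/inversesqrt(dot), which we have not confirmed; normalize_after_select and inversesqrt are hypothetical helper names.

```python
import math


def inversesqrt(x: float) -> float:
    """1/sqrt(x) with IEEE-style +Inf at zero (Python would raise on 1.0 / 0.0)."""
    if x == 0.0:
        return math.inf
    return 1.0 / math.sqrt(x)


def normalize_after_select(seed, flag, eps=1e-6):
    """Emulate normalize(lenA > eps ? dA : seed) with an arithmetic select."""
    dx, dy = seed[0] * flag, seed[1] * flag
    length = 1.0 / inversesqrt(dx * dx + dy * dy)  # 1/Inf == 0 for the zero vector
    mask = float(eps < length)
    # Both arms are finite here, so mask * (then - else) cannot produce NaN.
    sx = mask * (dx - seed[0]) + seed[0]
    sy = mask * (dy - seed[1]) + seed[1]
    rsq = inversesqrt(sx * sx + sy * sy)
    return sx * rsq, sy * rsq


fallback = normalize_after_select((0.6, 0.8), 0.0)  # else arm taken
degenerate = normalize_after_select((0.0, 0.0), 0.0)  # seed is zero too
print(fallback, degenerate)
```

Under these assumptions the fallback case stays finite because neither arm of the select contains a reciprocal. The final normalize is still unguarded, so a zero seed yields NaN in this emulation.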


9. Where to report

  1. Primary — MonoGame main repo (its DesktopGL / 2MGFX path is the MojoShader consumer we use):
    https://github.com/MonoGame/MonoGame/issues
    (precedent: MojoShader codegen issues are filed here, e.g. MonoGame#1813.)
    MonoGame's MojoShader fork — https://github.com/MonoGame/mojoshader — has no Issues tab.
  2. Upstream — icculus/mojoshader (the canonical translator where the fix lives):
    https://github.com/icculus/mojoshader/issues
    (precedent: issue #10, "[FXC] saturate() on vectors is broken in FXC debug mode, and mojoshader doesn't work around it". MojoShader already special-cases per-component vector codegen
    for an FXC quirk; this is the same lowering area.)

Recommendation: file in MonoGame/MonoGame first, cross-link icculus/mojoshader.


10. Attachments (all under docs/bugreports/)

  • repro/ — the complete buildable repro: repro.fx, Program.cs,
    harness_gl/Repro.GL.csproj, harness_dx/Repro.DX.csproj, and the two compiled *.mgfxo.
  • repro_gl_vertex.glsl.txt — the MojoShader-generated GLSL of the
    minimal repro VS (the smoking gun in §5).
  • mojoshader_strip_vs.glsl.txt — the generated GLSL of the real
    strip shader where we first hit this (larger, same pattern).
