Skip to content

fix DRI3 version detection#2165

Merged
metux merged 1 commit intoX11Libre:masterfrom
josephcrowell:dri3-version-detect-fix
Apr 10, 2026
Merged

fix DRI3 version detection#2165
metux merged 1 commit intoX11Libre:masterfrom
josephcrowell:dri3-version-detect-fix

Conversation

@josephcrowell
Copy link
Copy Markdown
Contributor

@josephcrowell josephcrowell commented Apr 9, 2026

This ensures that the info struct is populated when valid info has been obtained after the screen was initialized with a null info struct which always happens.

@josephcrowell josephcrowell force-pushed the dri3-version-detect-fix branch from 70574c7 to 1aecede Compare April 9, 2026 01:07
Signed-off-by: Joseph Crowell <joseph.w.crowell@gmail.com>
@cepelinas9000
Copy link
Copy Markdown
Contributor

@josephcrowell can you provide context where/what bug hit this fix

@josephcrowell
Copy link
Copy Markdown
Contributor Author

josephcrowell commented Apr 9, 2026

It's the thing we were talking about in the chat about 1.5 months ago. There were also some function exports that needed to happen I think but Enrico was going to handle that one.

Copy link
Copy Markdown
Contributor

@cepelinas9000 cepelinas9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@stefan11111
Copy link
Copy Markdown
Contributor

Is this intended to allow re-initialization of DRI3?

@josephcrowell
Copy link
Copy Markdown
Contributor Author

josephcrowell commented Apr 10, 2026

Is this intended to allow re-initialization of DRI3?

DRI3 is initialized once early in the lifecycle with a null info struct. (dri3.c line 102)

Later on when the video card and driver are initializing this again (glamor_egl.c line 1174), they call this with a valid info struct that has all the DRI3 function calls from the driver included, but since it was already initialized earlier in the lifecycle with the null info struct, the if condition returns false and the info struct never gets populated with the valid info value.

In kwin, I had to do a workaround where it was checking for the existence of the DRI functions directly instead of using the version check because of this and also because of the missing function export.

At initialization time everything seems fine because the log shows that the driver initialized DRI3 and it's supposed to be available, but from a consumer's standpoint (such as kwin_x11 or even glxgears), it is NOT fine because they do the version check and it always reports DRI3 1.0 regardless of the actual capabilities of the driver because the null info struct of course doesn't point to any DRI3 functions.

@josephcrowell
Copy link
Copy Markdown
Contributor Author

Note: Enrico's commits #685 and #1204 didn't introduce this issue, they just made the code readable enough to more easily see it.

@metux metux merged commit 1a75635 into X11Libre:master Apr 10, 2026
@stefan11111
Copy link
Copy Markdown
Contributor

Doesn't glamor_egl call dri3_screen_init before the dix initializes the extensions?

dri3_extension_init has the following check:

    /* If no screens support DRI3, there's no point offering the
     * extension at all
     */
    if (dri3_screen_generation != serverGeneration)
        return;

If dri3_extension_init was called before dri3_screen_init, then wouldn't the server not report DRI3 support at all?

@josephcrowell
Copy link
Copy Markdown
Contributor Author

Doesn't glamor_egl call dri3_screen_init before the dix initializes the extensions?

dri3_extension_init has the following check:

    /* If no screens support DRI3, there's no point offering the
     * extension at all
     */
    if (dri3_screen_generation != serverGeneration)
        return;

If dri3_extension_init was called before dri3_screen_init, then wouldn't the server not report DRI3 support at all?

It has enough info to know that some level of DRI support exists, but the version check defaults to 1.0 and only raises the version based on finding functions, which can't be found because of this bug.

proc_dri3_query_version(ClientPtr client) dri3_request.c line 78

@stefan11111
Copy link
Copy Markdown
Contributor

It has enough info to know that some level of DRI support exists

How does it get this info?

When the driver calls glamor_init, glamor calls glamor_egl_screen_init, which calls dri3_screen_init

dri3_screen_init then sets dri3_screen_generation, and is the only thing that sets it.

Then, dix indirectly calls dri3_extension_init, which checks it, and is the only thing that checks it.

If dri3_extension_init somehow gets called before dri3_screen_init, then dri3_screen_generation != serverGeneration, and it returns early before doing anything, so DRI3 wouldn't work at all.

Does the amd driver try to reinitialize dri3 after glamor does it / instead of glamor , and it is not able to because of the check that was here before?

@josephcrowell
Copy link
Copy Markdown
Contributor Author

josephcrowell commented Apr 10, 2026

The amd driver does (and has to do) the exact same thing as the modesetting driver so the modesetting driver would be broken too fyi. I would prefer to use the modesetting driver and not have to screw around with all these different drivers myself but to make it as functional as the amd/nvidia/intel/etc specific drivers you have to do a bunch of nested if statements in real time code and every one of them involves a performance hit. It's kind of a chicken and egg type situation.

@josephcrowell
Copy link
Copy Markdown
Contributor Author

josephcrowell commented Apr 10, 2026

Does the amd driver try to reinitialize dri3 after glamor does it / instead of glamor , and it is not able to because of the check that was here before?

Yes that's pretty much what happens with every driver. dri3_extension_init has to be called to initialize DRI3. It happens so early in the lifecycle that the info struct hasn't been initialized yet so it passes null as the info struct, but if it isn't called at all then the whole driver falls over (for every driver). Then the driver tries to reinitialize dri3 just after it has gotten the info struct but the init fails to assign the info struct to the info variable within the screen struct because it finds an already initialized screen struct.

It's happening everywhere and people are finding it in bug reports all over the place when they run various things like glxgears, zutty apparently see screen shot, me with kwin. DRI3 simply hasn't been working properly with ANY driver looking at the commit history, for around 10 years, ever since Kieth Packard committed the current logic.

image

@stefan11111
Copy link
Copy Markdown
Contributor

stefan11111 commented Apr 11, 2026

I took a look at the amdgpu driver:

In AMDGPUScreenInit_KMS, amdgpu_dri3_screen_init gets called first, then amdgpu_glamor_init gets called.

amdgpu_dri3_screen_init then calls dri3_screen_init with the amd dri3 implementation.

amdgpu_glamor_init then calls glamor_init with GLAMOR_NO_DRI3

This is what the modesetting driver does:

In ScreenInit, drmmode_init gets called.
drmmode_init calls glamor_init, and uses glamor's dri3 implementation.

As I see it, both drivers initialize dri3 and glamor in ScreenInit, so if modesetting has working dri3, amdgpu should too.

And in modesetting, dri3 does work.
As a test:

$ xdpyinfo | grep DRI3
    DRI3

And vkgears works.

Sadly, dri3 is not runtime-disableable (yet), so I can't compare with the output from modesetting, but here is Xfbdev with no DRI3 support.

$ DISPLAY=:1 xdpyinfo | grep DRI3
<end of output>
$ DISPLAY=:1 vkgears
MESA: info: vulkan: No DRI3 support detected - required for presentation
Note: you can probably enable DRI3 in your Xorg config
Vulkan not supported on given XCB surface
Failed to create surface!

Fwiw, DRI3 also works in Xfbdev with this patch (applied on top of the whole pr): #2143 (comment)

This is what I don't get. How come DRI3 works in modesetting (and Xfbdev), but not with the amd driver?
Looking at the code, both modesetting and amdgpu initialize dri3 and glamor at the same time in ScreenInit, so if DRI3 works in one, it should work in the other and vice-versa.

EDIT: To be clear, I'm not denying there is an issue, not am I opposed to the change. I just want to understand what exactly is happening here.

@josephcrowell
Copy link
Copy Markdown
Contributor Author

To be clear, I'm not denying there is an issue, not am I opposed to the change. I just want to understand what exactly is happening here.

I would like to know too. We are inheriting spaghetti code and sorting through it one strand at a time. It would be better if the double initialization wasn't necessary at all in my opinion.

@stefan11111
Copy link
Copy Markdown
Contributor

stefan11111 commented Apr 11, 2026

If you've got the time, could you run the X server with the amd driver under gdb (both the server and the driver built with debug symbols), with breakpoints on dri3_screen_init and dri3_extension_init?

This way, we could see exactly where these functions get called, and the entire call stack at those points.

The way I understand it, dri3_screen_init should only be called once.

@josephcrowell
Copy link
Copy Markdown
Contributor Author

josephcrowell commented Apr 11, 2026

If it wasn't getting called twice, I wouldn't have made the change. I already tested and know although I was using debug messages to get the state of variables at every point in the path instead of break points.

The real problem is the call in dri3_extension_init.

    DIX_FOR_EACH_SCREEN({
        if (!dri3_screen_init(walkScreen, NULL))
            goto bail;
    });

The second null argument is the info struct. This is never good and will always break DRI3. The info struct isn't just screen information. It's the mechanism by which applications are supposed to access the DRI3 functions in the driver.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants