Currently, ptx-json emits invalid PTX, so building kernels always requires a two-pass C++ frontend compilation, whether or not the second pass is actually needed. #5355 serves as a proof of concept for extracting data from PTX generated from LTOIR. Adapting that technique in c.parallel would let us eliminate one of the C++ frontend passes, speeding up the build step considerably. Investigate making this possible.