
[kernelbench] Initial XeGPU support #129

Closed

tkarna wants to merge 6 commits into llvm:main from tkarna:kb-xegpu

Conversation

tkarna (Contributor) commented May 4, 2026

Adds initial support for xegpu in the kernel_bench tool.

  • The inspect_payload utility returns payload matmul shapes in its result dict. The utility is moved to lighthouse/utils/mlir.py.
  • The xegpu parameter_selector is moved to lighthouse/schedule/xegpu/xegpu_parameter_selector.py.
  • Refactored Runner.get_gpu_argument_access_callback: it now takes a host numpy buffer and a func arg index.
  • Adds xegpu support to kernel_bench (roughly as sketched below):
    • Adds a --target CLI option that defaults to "cpu".
    • Uses inspect_payload to infer func args and matmul shapes.
    • If no pipeline is set and the target is "xegpu", uses the "xegpu_mlp_pipeline", which is currently hard-coded.
    • Uses xegpu_parameter_selector to generate mlp schedule parameters.
    • Handles GPU data movement with GPUMemoryManager.
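For orientation, a rough sketch of the resulting xegpu flow. The names inspect_payload, parameter_selector, mlp_schedule, xegpu_to_binary, and add_module_stage appear in this PR, but the signatures and module paths below are illustrative assumptions, not verbatim from the code.

```python
# Hypothetical sketch of the xegpu lowering flow added by this PR.
# Signatures and module paths are assumptions; only the names come from the PR.
from lighthouse.utils.mlir import inspect_payload
from lighthouse.schedule.xegpu.xegpu_parameter_selector import parameter_selector

def lower_for_xegpu(driver, payload_module):
    # Inspect the payload to infer func args and matmul shapes ...
    info = inspect_payload(payload_module)
    # ... and derive mlp schedule parameters (tile sizes etc.) from them.
    pipeline_params = parameter_selector(info)
    # Populate the compiler driver stages (matches the diff at lines +94/+95).
    # mlp_schedule and xegpu_to_binary are assumed to be importable builders.
    driver.add_module_stage(mlp_schedule(pipeline_params))
    driver.add_module_stage(xegpu_to_binary())
```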

tkarna requested review from adam-smnk, fschlimb, and rengolin on May 4, 2026 at 13:49

tkarna (Contributor, Author) commented May 4, 2026

The main difference with the previous lowering flow is that now we need to access the payload IR module to inspect it:
https://github.com/tkarna/lighthouse/blob/5bf6ff5b7c98e2723d0cbffa06785a5023585980/tools/kernel_bench#L229-L231

and that we need to be able to pass a parameter dict to the lowering schedule:
https://github.com/tkarna/lighthouse/blob/5bf6ff5b7c98e2723d0cbffa06785a5023585980/tools/kernel_bench#L92-L95

Comment thread: tools/kernel_bench
Comment on lines +94 to +95
driver.add_module_stage(mlp_schedule(pipeline_params))
driver.add_module_stage(xegpu_to_binary())
Member commented:

Do you think mlp_schedule with its params could be represented by a pipeline yaml file?

It'd be cleaner to avoid special builders, but it's fine if this is simpler. Just asking to understand possible limitations.

rengolin (Member) commented May 5, 2026:

I'd second that. Hard-coding the API at this point will make things hard later when we have too many variations. This tool should not get hard-coded schedules via the API. If anything, we should add transform stages to the yaml file, or even improve the yaml support.

tkarna (Contributor, Author):

Thanks, I think this is a central question for the pipeline design.

I suspect that in the long run we are going to have some kind of oracle/cost model/autotuner that generates tile sizes and other parameters. In the generic case this is some black-box python routine. How we handle this in the yaml interface is an open question.

a) We could find a way to serialize the parameter dict in the yaml file, either directly as a yaml entry or via some placeholder, e.g. to indicate that the params should be read from a foo.json file. I think this is too restrictive though: for example, the cost model should first inspect the payload and the proposed pipeline and only then dump parameters to a yaml or json file. The user would then pass in the generated yaml/json files to use the optimal parameters.

b) We could try to incorporate payload analysis and cost models into the transform schedules. Analysis results could be encoded e.g. in payload module attributes that subsequent schedules read (see the sketch after this list). The cost model/oracle, I think, cannot in the general case be represented as transform ops, so it would be some magic python routine call in the transform schedule. The entire pipeline with analysis and cost models could then be represented as a list of schedules, e.g. in a yaml file. Developing this capability is nontrivial though; we cannot expect to have it in the short term, e.g. to run the kernel bench on GPU.

c) We could use "placeholder" schedule names in the yaml file. In this PR I'm proposing the "xegpu_mlp_pipeline" string. It's not a schedule per se but refers to a "standard" lowering flow that can include payload analysis, parameter selection stages, etc. The flow is defined in python: currently it is hard-coded in kernel_bench, but once established it should be moved to lighthouse. We could use this string in the pipeline yaml file (not done in this PR yet). This option allows defining the full lowering with a string representation while still allowing arbitrarily complex dynamic analysis/cost model routines under the hood, and it is faster to implement than (b).
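As a concrete illustration of the attribute idea in (b), here is a minimal sketch using the upstream MLIR python bindings. The attribute name "lh.matmul_shapes" and the string encoding are made up for illustration.

```python
# A minimal sketch of option (b)'s "analysis results as module attributes"
# idea, using the upstream MLIR python bindings. The attribute name
# "lh.matmul_shapes" and the string encoding are made up for illustration.
from mlir import ir

def annotate_matmul_shapes(module: ir.Module, shapes: list[int]) -> None:
    with module.context:
        # An analysis stage records its results on the payload module ...
        module.operation.attributes["lh.matmul_shapes"] = ir.StringAttr.get(
            ",".join(str(s) for s in shapes)
        )

def read_matmul_shapes(module: ir.Module) -> list[int]:
    # ... and a later schedule/stage reads them back instead of re-analyzing.
    attr = ir.StringAttr(module.operation.attributes["lh.matmul_shapes"])
    return [int(s) for s in attr.value.split(",")]
```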

Member commented:

> For example, the cost model should first inspect the payload and the proposed pipeline and only then dump parameters to a yaml or json file. The user would then pass in the generated yaml/json files to use the optimal parameters.

Yes, this is one of the uses I had in mind.

> We could try to incorporate payload analysis and cost models into the transform schedules. The entire pipeline with analysis and cost models could then be represented as a list of schedules, e.g. in a yaml file. Developing this capability is nontrivial though ...

Yup, this is why the one above should come first.

> We could use "placeholder" schedule names in the yaml file. In this PR I'm proposing the "xegpu_mlp_pipeline" string.

The question is: can that string just be a yaml file instead? For now, hard-coded, so neither of the solutions above needs to exist for this to merge.

adam-smnk (Member) commented May 5, 2026:

I'm leaning toward option (c), i.e., shifting complexity to yaml.

As you mentioned, ideally all these decisions could be reified into IR using schedules or annotation attributes. A good north star, but not feasible as-is today.
That's why I think adding custom logic and syntax to the yaml descriptor is a good placeholder. Then we can slowly start shifting what works into IR proper over time.

tkarna (Contributor, Author):

> The question is: can that string just be a yaml file instead? For now, hard-coded, so neither of the solutions above needs to exist for this to merge.

Well, we cannot express the "inspect payload -> call parameter_selector -> pass parameters to schedule" progression in yaml. We can use placeholder strings ("xegpu_mlp_pipeline") in yaml that are interpreted correctly in python, as in the sketch below.
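A minimal sketch of that interpretation. The expand function and the load_schedule fallback are made-up names; inspect_payload, parameter_selector, mlp_schedule, and xegpu_to_binary are as in the sketch in the PR description.

```python
# Hypothetical interpretation of a placeholder pipeline entry from yaml.
# expand_pipeline_entry and load_schedule are illustrative names only.
def expand_pipeline_entry(name, driver, payload_module):
    if name == "xegpu_mlp_pipeline":
        # Placeholder: run the inspect -> select -> lower progression.
        params = parameter_selector(inspect_payload(payload_module))
        driver.add_module_stage(mlp_schedule(params))
        driver.add_module_stage(xegpu_to_binary())
    else:
        # Ordinary entries are treated as plain schedule names.
        driver.add_module_stage(load_schedule(name))
```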

Member commented:

The progression is in the class, the pipeline in the yaml file. I would not try to encode payload/parameter logic into yaml, just the pipeline. The rest is in python, in a GPU class that holds the pipeline driver and just loads the referenced yaml file. The CPU class would be even simpler and just do what is done today.

There could be a base class that does the importing and other similar tasks, along the lines of the sketch below.
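A hedged sketch of that split; all class and method names below are illustrative assumptions, not existing lighthouse API.

```python
# Hypothetical sketch of the proposed split. Classes own the driver; the
# GPU subclass adds inspection/selection. Names are illustrative only;
# inspect_payload / parameter_selector are as in the earlier sketches.
class BaseLowering:
    def __init__(self, driver, pipeline_yaml):
        self.driver = driver                # the class owns the pipeline driver
        self.pipeline_yaml = pipeline_yaml

    def import_payload(self, mlir_file):
        # Shared import logic lives in the base class (assumed driver method).
        self.driver.load_payload(mlir_file)

class CpuLowering(BaseLowering):
    def build(self):
        # Just what is done today: load the pipeline from yaml.
        self.driver.load_pipeline(self.pipeline_yaml)

class XeGpuLowering(BaseLowering):
    def build(self):
        # Inspect the payload and select parameters before loading.
        params = parameter_selector(inspect_payload(self.driver.payload_module))
        self.driver.load_pipeline(self.pipeline_yaml, params=params)
```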

tkarna (Contributor, Author) commented May 5, 2026:

So the yaml pipeline in this case would contain the schedules ["mlp_schedule", "xegpu_to_binary"], and the high-level GPU lowering class would have the payload inspector and parameter selector calls hard-coded in the lowering flow. How do we know that the parameter dict should be passed to the "mlp_schedule" schedule and not to the other one? What if the user adds a new "foobar" schedule? To me it seems that in any case we need a special stage string (e.g. "xegpu_mlp_schedule" or "xegpu_mlp_pipeline") that the GPU lowering class recognizes and handles appropriately.

rengolin (Member) commented May 5, 2026:

No, the yaml file would contain what both of those schedules contain today.

The small parts that are actual schedules need to be factored out. This is what we planned for the CPU pipeline, too. We don't need passes inside schedules.

tkarna (Contributor, Author) commented May 5, 2026:

Splitting the mlp_schedule into passes and smaller parts does not change the big picture. For example, these parts are still going to be transform schedules that require parameters passed in as a python dict (unless we implement option (b)).

Comment thread: tools/kernel_bench

def define_compiler_pipeline(
    mlir_file: Path,
    driver: CompilerDriver,
rengolin (Member) commented May 5, 2026:

This is not defining the compiler pipeline, it's receiving it from outside. This is the same discussion as before: users control their data structures, and libraries get them passed down from the user. This here is a creator of the CompilerDriver, so it can't receive it from outside.

We need to compartmentalize top-down, otherwise it will be hard to know what's broken when things start to fall apart.

tkarna (Contributor, Author) commented May 5, 2026:

Yes, this change is needed because we must be able to inspect the payload module object that CompilerDriver owns in order to obtain the parameters we pass to the lowering schedule. Thus this must happen before we populate the CompilerDriver stages. We can refactor the define_compiler_pipeline function in any way we like, but the flow does not change.

Member commented:

Then this can be encapsulated in the GPU-specific code that wraps this function, keeping it out of the CPU-specific code that doesn't need any of that.

It's not clear to me if we need a class inside lighthouse at this point, or just here in this file for now. For now it seems we keep them here and see if we need to move them later.

To be clear, both classes must own the driver/pipeline as before, so that any logic is performed by them, not by external functions.

Comment thread: tools/kernel_bench
# Set target specific default pipeline if no pipeline is provided.
default_pipeline = None
pipeline_params = None
if args.target == "xegpu" and args.pipeline is None:
Member commented:

Can we have the GPU/CPU logic separated outside of main? Having all those if xegpu checks is not reasonable for such a high-level tool.

tkarna (Contributor, Author):

Hmm, we can move it outside main by refactoring the inspect-lower-and-execute logic into a helper function or object. The current version is, however, kernel bench specific. I'd suggest we refactor to a generic method once we have more similar flows/use cases.

Member commented:

Agreed. I'd still keep it kernel bench specific, for now. But we need to separate CPU/GPU in a way that doesn't leave if/else blocks in each function.
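One hedged way to do that, building on the hypothetical classes sketched earlier; the registry itself is also an assumption, not code from this PR.

```python
# Hypothetical target registry so main() branches exactly once.
LOWERINGS = {
    "cpu": CpuLowering,
    "xegpu": XeGpuLowering,
}

def make_lowering(args, driver):
    # Pick the target-specific class once; no per-function `if xegpu` checks.
    return LOWERINGS[args.target](driver, args.pipeline)
```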

rengolin (Member) commented May 5, 2026

> now we need to access the payload IR module to inspect it:
> https://github.com/tkarna/lighthouse/blob/5bf6ff5b7c98e2723d0cbffa06785a5023585980/tools/kernel_bench#L229-L231

This is (for now) a GPU-only issue, and it can go into the GPU class. But in time we'll need to do this for all targets in order to know about the inputs. For example, look into kernel bench's global variables to know what's configurable (for the init args), look at global constants, etc.

> and that we need to be able to pass a parameter dict to the lowering schedule:
> https://github.com/tkarna/lighthouse/blob/5bf6ff5b7c98e2723d0cbffa06785a5023585980/tools/kernel_bench#L92-L95

This can/should be done using the yaml descriptor.

tkarna (Contributor, Author) commented May 7, 2026

Closing this for now. The kernel_bench changes should be introduced/merged only after we have an xegpu pipeline that can run without any parameters (e.g. one with a built-in tile size selector).

tkarna closed this on May 7, 2026
tkarna added a commit that referenced this pull request on May 8, 2026:
…133)

Pulling in the commits from #129 that are not `kernel_bench` related.
Generic util changes that'll be useful in the future.

- The `inspect_payload` utility returns payload matmul shapes in its result dict. The utility is moved to `lighthouse/utils/mlir.py`.
- The xegpu `parameter_selector` is moved to `lighthouse/schedule/xegpu/xegpu_parameter_selector.py`.
- Refactored `Runner.get_gpu_argument_access_callback`: it now takes a host numpy buffer and a func arg index.
