Conversation
The main difference from the previous lowering flow is that we now need to access the payload IR module in order to inspect it, and we need to be able to pass a parameter dict to the lowering schedule:
```python
driver.add_module_stage(mlp_schedule(pipeline_params))
driver.add_module_stage(xegpu_to_binary())
```
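For concreteness, a minimal sketch of the flow this implies; the import paths come from the notes at the end of this thread, but the exact symbol names and the `driver.payload_module` attribute are assumptions:

```python
# Sketch only, not the PR's exact code: the payload must be inspected and
# parameters selected *before* the driver stages are populated.
from lighthouse.utils.mlir import inspect_payload
from lighthouse.schedule.xegpu.xegpu_parameter_selector import parameter_selector

payload_info = inspect_payload(driver.payload_module)   # e.g. matmul shapes
pipeline_params = parameter_selector(payload_info)      # e.g. tile sizes

driver.add_module_stage(mlp_schedule(pipeline_params))
driver.add_module_stage(xegpu_to_binary())
```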
---
Do you think `mlp_schedule` with its params could be represented by a pipeline yaml file?
It'd be cleaner to avoid special builders, but fine if this is simpler. Just asking to understand possible limitations.
---
I'd second that. Hard-coding the API at this point will make it hard later when we have too many variations. This tool should not get hard-coded schedules via API. If anything, we add transform stages to the yaml file, or even improve the yaml support.
---
Thanks, I think this is a central question for the pipeline design.
I suspect that in the long run we are going to have some kind of oracle/cost model/autotuner that generates tile sizes and other parameters. In the generic case this is some blackbox python routine. How we handle this in the yaml interface is an open question.
a) We could find a way to serialize the parameter dict in the yaml file, either directly as a yaml entry or using some placeholder, e.g. to indicate that the params should be read from a foo.json file, or something like that. I think this is too restrictive though. E.g., the cost model should first inspect the payload and proposed pipeline and then dump parameters to a yaml or json file. The user would then pass in the generated yaml/json files to use the optimal parameters.
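A minimal sketch of the consuming side of option a), assuming a cost model has already dumped a parameter file (all names here are illustrative):

```python
import json
from pathlib import Path

def load_pipeline_params(params_file: Path) -> dict:
    """Read back the parameter dict a cost model dumped for this payload."""
    with params_file.open() as f:
        return json.load(f)

# The yaml pipeline entry would carry only the file reference, e.g.
# `params_file: mlp_params.json`; the tool would then do something like:
#   params = load_pipeline_params(Path("mlp_params.json"))
#   driver.add_module_stage(mlp_schedule(params))
```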
b) We could try to incorporate payload analysis and cost models into the transform schedules. Analysis results could be encoded e.g. in payload module attributes that subsequent schedules read. The cost model/oracle, I think, in the general case cannot be represented as transform ops, so it would be some magic python routine call in the transform schedule. Then the entire pipeline with analysis and cost models could be represented as a list of schedules, e.g. in a yaml file. Developing this capability is nontrivial though; we cannot expect to have this in the short term to e.g. run the kernel bench on GPU.
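To illustrate the attribute-encoding half of option b), a small sketch using the upstream MLIR python bindings; the `lighthouse.tile_m` attribute name is made up:

```python
from mlir import ir

with ir.Context():
    module = ir.Module.parse("module {}")
    # An analysis stage records its result as a discardable, dialect-prefixed
    # attribute on the payload module ...
    i64 = ir.IntegerType.get_signless(64)
    module.operation.attributes["lighthouse.tile_m"] = ir.IntegerAttr.get(i64, 32)
    # ... which a subsequent schedule or python stage can read back.
    tile_m = ir.IntegerAttr(module.operation.attributes["lighthouse.tile_m"]).value
    print(tile_m)  # -> 32
```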
c) We could use some "placeholder" schedule names in the yaml file. In this PR I'm proposing the "xegpu_mlp_pipeline" string. It's not a schedule per se but refers to a "standard" lowering flow that can include payload analysis and parameter selection stages etc. The flow is defined in python; currently it is just hard-coded in kernel_bench, but once established it should be moved to lighthouse. We could use this string in the pipeline yaml file (not done in this PR yet). This option allows defining the full lowering with a string representation while still allowing arbitrarily complex dynamic analysis/cost model routines under the hood, and it is faster to implement than b).
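A rough sketch of the option c) mechanics, i.e. a placeholder name that the python side expands into a full flow; the registry and helper names are invented for illustration:

```python
PIPELINE_BUILDERS = {}

def register_pipeline(name):
    def wrap(fn):
        PIPELINE_BUILDERS[name] = fn
        return fn
    return wrap

@register_pipeline("xegpu_mlp_pipeline")
def build_xegpu_mlp_pipeline(driver):
    # The placeholder expands to: inspect payload -> select params -> stages.
    info = inspect_payload(driver.payload_module)
    params = parameter_selector(info)
    driver.add_module_stage(mlp_schedule(params))
    driver.add_module_stage(xegpu_to_binary())

def add_stage_from_yaml(driver, name):
    """A yaml stage entry is either a placeholder or a plain schedule name."""
    if name in PIPELINE_BUILDERS:
        PIPELINE_BUILDERS[name](driver)   # expands into multiple stages
    else:
        driver.add_module_stage(name)     # ordinary schedule entry
```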
---
> E.g., the cost model should first inspect the payload and proposed pipeline and then dump parameters to a yaml or json file. The user would then pass in the generated yaml/json files to use the optimal parameters.
Yes, this is one of the uses I had in mind.
> We could try to incorporate payload analysis and cost models into the transform schedules. Then the entire pipeline with analysis and cost models could be represented as a list of schedules, e.g. in a yaml file. Developing this capability is nontrivial though
Yup, this is why the one above should come first.
> We could use some "placeholder" schedule names in the yaml file. In this PR I'm proposing the "xegpu_mlp_pipeline" string.
The question is: can that string just be a yaml file instead? For now hard-coded, so neither solution above needs to exist for this to merge.
---
I'm leaning toward option c), i.e., shifting complexity to yaml.
As you mentioned, ideally all these decisions could be reified into IR using schedules or annotation attributes. A good north star, but not feasible as is today.
That's why I think adding custom logic and syntax to the yaml descriptor is a good placeholder. Then we can slowly start shifting what works into IR proper over time.
---
> The question is: can that string just be a yaml file instead? For now hard-coded, so neither solution above needs to exist for this to merge.
Well, we cannot express the "inspect payload -> call parameter_selector -> pass parameters to schedule" progression in yaml. We can use some placeholder strings ("xegpu_mlp_pipeline") in yaml which are interpreted correctly in python.
---
The progression is in the class, the pipeline in the yaml file. I would not try to encode payload/parameter logic into yaml, just the pipeline. The rest is in Python, in a GPU class that holds the pipeline driver and just loads the referred yaml file. The CPU class would be even simpler and just do what is done today.
There could be a base class that does the importing and other similar tasks.
---
So the yaml pipeline in this case would contain the schedules ["mlp_schedule", "xegpu_to_binary"], and the high-level GPU lowering class has the payload inspector and parameter selector calls hard-coded in the lowering flow. How do we know that the parameter dict should be passed to the "mlp_schedule" schedule and not to the other one? What if the user adds a new "foobar" schedule? To me it seems that in any case we need a special stage string (e.g. "xegpu_mlp_schedule" or "xegpu_mlp_pipeline") that the GPU lowering class recognizes and does the right thing.
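To make the question concrete, one hypothetical way the yaml could bind the parameter dict to a specific stage; the `params: selected` syntax is entirely invented:

```python
import yaml

PIPELINE_YAML = """
stages:
  - name: mlp_schedule
    params: selected      # consumes the dict from the parameter selector
  - name: xegpu_to_binary
"""

def build_stages(driver, selected_params, schedules):
    # `schedules` maps stage names to schedule builder callables.
    for stage in yaml.safe_load(PIPELINE_YAML)["stages"]:
        builder = schedules[stage["name"]]
        if stage.get("params") == "selected":
            driver.add_module_stage(builder(selected_params))
        else:
            driver.add_module_stage(builder())
```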
---
No, the yaml file would contain what today both of those schedules contain.
The small parts that are actual schedules need to be factored out. This is what we planned for the CPU pipeline, too. We don't need passes inside schedules.
---
Splitting the mlp_schedule into passes and smaller parts does not change the big picture. For example, these parts are going to be transform schedules that require parameters passed in as a python dict (unless we implement option b).
---
```python
def define_compiler_pipeline(
    mlir_file: Path,
    driver: CompilerDriver,
```
---
This is not defining the compiler pipeline; it's receiving it from outside. This is the same discussion as before: users control their data structures, libraries get passed down from the user. This here is a creator of the CompilerDriver, so it can't receive it from outside.
We need to compartmentalize top-down, otherwise it will be hard to know what's broken when things start to fall apart.
---
Yes, this change is needed because we must be able to inspect the payload module object that CompilerDriver owns, to obtain the parameters we pass to the lowering schedule. Thus this must be done before we populate the CompilerDriver stages. We can refactor the define_compiler_pipeline function in any way we like, but the flow does not change.
---
Then this can be encapsulated in the GPU-specific code that wraps this function, and left out of the CPU-specific code that doesn't need any of that.
It's not clear to me if we need a class inside lighthouse at this point, or just here in this file for now. But it seems for now we keep them here and see if we need to move later.
To be clear, both classes must own the driver/pipeline as before, so that any logic is performed by them, not external functions.
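A sketch of that ownership split; all class and method names here are invented:

```python
class PipelineRunner:
    """Base class: owns the CompilerDriver and handles importing and other
    shared tasks."""
    def __init__(self, mlir_file):
        self.driver = CompilerDriver()        # created here, never passed in
        self.driver.load_payload(mlir_file)   # hypothetical import step
        self.populate_stages()

    def populate_stages(self):
        raise NotImplementedError

class CpuPipelineRunner(PipelineRunner):
    def populate_stages(self):
        # What is done today: a fixed stage list, no payload inspection.
        ...

class GpuPipelineRunner(PipelineRunner):
    def populate_stages(self):
        # GPU-only logic lives here: inspect the owned payload, select
        # parameters, then populate the stages.
        info = inspect_payload(self.driver.payload_module)
        params = parameter_selector(info)
        self.driver.add_module_stage(mlp_schedule(params))
        self.driver.add_module_stage(xegpu_to_binary())
```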
---
```python
# Set target specific default pipeline if no pipeline is provided.
default_pipeline = None
pipeline_params = None
if args.target == "xegpu" and args.pipeline is None:
```
---
Can we have the GPU/CPU logic separated outside of main? Having all those `if xegpu` checks is not reasonable for such a high-level tool.
---
Hmm, we can move it outside main by refactoring the inspect-lower-and-execute logic into a helper function or object. The current version is however kernel bench specific. I'd suggest we refactor to a generic method once we have more similar flows/use cases.
---
Agreed. I'd still keep it kernel bench specific, for now. But we need to separate CPU/GPU in a way that doesn't leave if/else blocks on each function.
This is (for now) a GPU-only issue, and can go to the GPU class. But in time, we'll need to do that for all targets in order to know about inputs. For example, look into kernel bench's global variables to know what's configurable (for the init args), look at global constants, etc.
This can/should be done using the yaml descriptor.
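For instance, the target defaults could live in the descriptor instead of main(); a hypothetical shape with made-up keys:

```python
import yaml

TARGETS_YAML = """
targets:
  cpu:
    default_pipeline: null            # today's behavior, no default needed
  xegpu:
    default_pipeline: xegpu_mlp_pipeline
"""

def default_pipeline_for(target: str):
    # main() would only do a lookup here instead of `if target == ...` blocks.
    return yaml.safe_load(TARGETS_YAML)["targets"][target]["default_pipeline"]

# e.g. default_pipeline_for("xegpu") -> "xegpu_mlp_pipeline"
```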
---
Closing this for now. The kernel_bench changes should be introduced/merged only after we have an xegpu pipeline that can run without any parameters (e.g. has a built-in tile size selector). |
---
…133) Pulling in the commits from #129 that are not `kernel_bench` related. Generic util changes that'll be useful in the future.

- `inspect_payload` utility returns payload matmul shapes in the dict. Utility is moved to `lighthouse/utils/mlir.py`.
- xegpu `parameter_selector` is moved to `lighthouse/schedule/xegpu/xegpu_parameter_selector.py`.
- Refactored `Runner.get_gpu_argument_access_callback`: takes host numpy buffer and func arg index.
Adds initial support for xegpu in the `kernel_bench` tool.

- `inspect_payload` utility returns payload matmul shapes in the dict. Utility is moved to `lighthouse/utils/mlir.py`.
- `parameter_selector` is moved to `lighthouse/schedule/xegpu/xegpu_parameter_selector.py`.
- Refactored `Runner.get_gpu_argument_access_callback`: takes host numpy buffer and func arg index.
- `kernel_bench` `--target` CLI option defaults to `"cpu"`.
- Uses `inspect_payload` to infer func args and matmul shapes.
- With `"xegpu"`, uses `"xegpu_mlp_pipeline"`, which is currently hard-coded.
- Uses `xegpu_parameter_selector` to generate mlp schedule parameters.
- Uses `GPUMemoryManager`.