Organizations

@owensgroup @gunrock

maawad/README.md

👋 Hi there, I'm Muhammad Awad

I'm a Senior Member of Technical Staff at AMD Research, where I conceive, architect, and tech-lead system-level software, libraries, and runtimes for next-generation computing. I lead cross-organizational teams and coordinate initiatives across ML frameworks, kernel engineering, distributed systems, and research teams.

Before joining AMD, I completed my Ph.D. in Electrical and Computer Engineering at UC Davis, advised by John Owens. My research there focused on building dynamic concurrent data structures on GPUs, such as B-trees, dynamic graphs, multiversioned trees, and high-performance hash tables.


💼 Current Focus @ AMD

At AMD, I lead and contribute to projects spanning:

  • AI-powered tools for GPU performance and productivity
  • Libraries and runtimes for heterogeneous and distributed systems
  • Programming models for AMD GPUs and Ryzen™ AI NPUs

Notable projects:

  • πŸ” Iris: Conceived, architected, and tech-led a Triton-based multi-GPU programming framework from scratch. Leading a cross-organizational team, demonstrated significant speedups in production LLM workloads.
  • 🧠 IntelliKit: Conceived and architected an open-source LLM-ready profiling toolkit for AMD CPUs and GPUs. Serving as tooling lead for AMD's company-wide ML for performance engineering initiative.
  • 🧠 IntelliPerf: Conceived and led development of an LLM-powered autonomous GPU performance engineering framework that profiles, diagnoses, and optimizes kernels end-to-end.
  • βš™οΈ IRON: Contributor to IRON, a low-level development stack for AMD Ryzenβ„’ AI NPUs with Python APIs and MLIR compiler passes.

🔬 Academic Research

I'm broadly interested in parallel computing, concurrent data structures, performance analysis, and low-level GPU programming. As a Ph.D. student, I designed and built several GPU-native data structures.


📫 Connect


Pinned

  1. ROCm/iris (Public)

    AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

    Python · 171 stars · 33 forks

  2. owensgroup/BGHT (Public)

    BGHT: High-performance static GPU hash tables.

    C++ · 71 stars · 8 forks

  3. owensgroup/MVGpuBTree (Public)

    GPU B-Tree with support for versioning (snapshots).

    C++ · 51 stars · 5 forks

  4. gunrock/gunrock (Public)

    Programmable CUDA/C++ GPU Graph Analytics

    C++ · 1.1k stars · 218 forks

  5. owensgroup/GpuBTree (Public)

    Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019

    Cuda · 58 stars · 13 forks

  6. PTX_BCHT (Public)

    Bucketed Cuckoo hash set written in PTX and JIT-compiled.

    C++ · 1 star · 1 fork