Fuse aarch64 register moves #2723

Open
siraben wants to merge 1 commit into ish-app:master from siraben:bench/hot-system-paths

@siraben (Contributor) commented Apr 27, 2026

Summary

Fuse 32-bit register-to-register mov on aarch64 into one direct gadget instead of emitting separate load and store gadgets.

Changes:

  • Add direct aarch64 mov32_reg_reg gadgets for the general register matrix.
  • Route 32-bit register-to-register MOV generation through the direct gadget table on aarch64.
  • Keep non-aarch64 and non-register MOV generation on the existing load/store path.
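The difference between the two paths can be modelled in plain C. This is only a rough sketch, not the code in this PR: the real gadgets are aarch64 assembly, and every name here (`struct cpu`, `gadget_t`, the `FOR_EACH_*` macros, `emit_mov32_unfused`, `emit_mov32_fused`, `run_block`) is hypothetical. It assumes a translated block is an array of gadget pointers, so the unfused path emits two code words per move and the fused path emits one from an 8x8 table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest state: eight 32-bit registers plus a scratch slot that
 * stands in for the temporary passed between load and store gadgets. */
struct cpu {
    uint32_t regs[8];
    uint32_t tmp;
};

typedef void (*gadget_t)(struct cpu *);

/* Two separate iteration macros are needed for the matrix, because a C
 * macro cannot expand itself recursively. */
#define FOR_EACH_REG(X) X(0) X(1) X(2) X(3) X(4) X(5) X(6) X(7)
#define FOR_EACH_SRC(X, dst) \
    X(dst, 0) X(dst, 1) X(dst, 2) X(dst, 3) X(dst, 4) X(dst, 5) X(dst, 6) X(dst, 7)
#define FOR_EACH_PAIR(X) \
    FOR_EACH_SRC(X, 0) FOR_EACH_SRC(X, 1) FOR_EACH_SRC(X, 2) FOR_EACH_SRC(X, 3) \
    FOR_EACH_SRC(X, 4) FOR_EACH_SRC(X, 5) FOR_EACH_SRC(X, 6) FOR_EACH_SRC(X, 7)

/* Unfused path: one gadget loads the source, a second stores to the destination. */
#define DEFINE_LOAD(r)  static void load32_##r(struct cpu *c)  { c->tmp = c->regs[r]; }
#define DEFINE_STORE(r) static void store32_##r(struct cpu *c) { c->regs[r] = c->tmp; }
FOR_EACH_REG(DEFINE_LOAD)
FOR_EACH_REG(DEFINE_STORE)

/* Fused path: one gadget per (dst, src) pair, 64 in total. */
#define DEFINE_MOV(dst, src) \
    static void mov32_##dst##_##src(struct cpu *c) { c->regs[dst] = c->regs[src]; }
FOR_EACH_PAIR(DEFINE_MOV)

#define LOAD_ENTRY(r)       [r] = load32_##r,
#define STORE_ENTRY(r)      [r] = store32_##r,
#define MOV_ENTRY(dst, src) [dst][src] = mov32_##dst##_##src,
static const gadget_t load32_table[8]           = { FOR_EACH_REG(LOAD_ENTRY) };
static const gadget_t store32_table[8]          = { FOR_EACH_REG(STORE_ENTRY) };
static const gadget_t mov32_reg_reg_table[8][8] = { FOR_EACH_PAIR(MOV_ENTRY) };

/* Emits load + store gadgets; returns the number of code words written (2). */
size_t emit_mov32_unfused(gadget_t *out, unsigned dst, unsigned src) {
    assert(dst < 8 && src < 8);
    out[0] = load32_table[src];
    out[1] = store32_table[dst];
    return 2;
}

/* Emits a single direct gadget; returns the number of code words written (1). */
size_t emit_mov32_fused(gadget_t *out, unsigned dst, unsigned src) {
    assert(dst < 8 && src < 8);
    out[0] = mov32_reg_reg_table[dst][src];
    return 1;
}

/* Runs an emitted block by calling each gadget in order. */
void run_block(struct cpu *c, const gadget_t *code, size_t n) {
    for (size_t i = 0; i < n; i++)
        code[i](c);
}
```

In this model both emitters leave the guest registers in the same state; the fused one simply produces a shorter block, at the cost of a larger gadget table.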

Results

Measured with the benchmark/profiling branch now kept separately at siraben/ish:bench/hot-system-paths-bench-profile. Release build, 7 runs each:

benchmark        baseline avg   optimized avg   change
python_startup   0.157s         0.151s          -3.8%
python_compute   8.179s         7.524s          -8.0%
python_imports   2.039s         1.846s          -9.5%
bash_control     36.543s        34.936s         -4.4%
shell_pipeline   0.600s         0.561s          -6.5%

Instrumentation also showed translated block code words dropping for representative workloads:

  • bash control: 144074 -> 142484
  • python compute: 896963 -> 886056
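For scale, those code word counts correspond to reductions of roughly 1.1% (bash control) and 1.2% (python compute). The helper below is a hypothetical illustration of that arithmetic, not part of the benchmark branch; the same formula gives the "change" column in the timing results.

```c
/* Relative change from `before` to `after`, in percent.
 * Negative values mean a reduction. Hypothetical helper for illustration. */
double percent_change(double before, double after) {
    return (after - before) / before * 100.0;
}
```

For example, `percent_change(144074, 142484)` is about -1.10 and `percent_change(896963, 886056)` is about -1.22.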

Validation

  • ninja -C build

@siraben force-pushed the bench/hot-system-paths branch 2 times, most recently from 3773893 to 3dcf07a (April 27, 2026 18:25)
@siraben marked this pull request as ready for review (April 27, 2026 19:33)
@siraben force-pushed the bench/hot-system-paths branch from 3dcf07a to 3985ec9 (April 27, 2026 19:37)
@siraben changed the title from "Benchmark hot paths and fuse aarch64 register moves" to "Fuse aarch64 register moves" (Apr 27, 2026)

@saagarjha (Member) left a comment

Nice work! I measured a consistent 2-5% win across various workloads, which seems in line with what you were seeing. A couple of comments though; also, if you could make an x86-64 path for this optimization too, that would be much appreciated.

Comment thread asbestos/gen.c
return true;
}

static inline enum arg gen_resolve_arg(enum arg arg, struct modrm *modrm, uint64_t *imm, dword_t addr_offset) {

This is basically the same code as what is right above it; can we avoid duplicating it? We aren't even using its full functionality; I think we should probably just fast-path the things we care about.
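
One possible shape for such a narrow fast path is sketched below. It is an assumption-laden illustration, not the code in asbestos/gen.c: `enum operand_kind`, `struct operand`, and `mov32_reg_reg_fastpath` are all hypothetical names. The idea is to check only the single case of interest and report failure for everything else, so the caller can fall back to the existing general path without a second copy of the argument-resolution logic.

```c
#include <stdbool.h>

/* Hypothetical operand model; not the types used in asbestos/gen.c. */
enum operand_kind { OPERAND_REG, OPERAND_MEM, OPERAND_IMM };

struct operand {
    enum operand_kind kind;
    unsigned reg; /* register index 0..7, meaningful only for OPERAND_REG */
};

/* Returns true and fills dst_reg/src_reg only for a 32-bit
 * register-to-register MOV. Returns false for every other case,
 * leaving the outputs untouched, so the caller can use the general path. */
bool mov32_reg_reg_fastpath(const struct operand *dst, const struct operand *src,
                            unsigned size_bits,
                            unsigned *dst_reg, unsigned *src_reg) {
    if (size_bits != 32)
        return false;
    if (dst->kind != OPERAND_REG || src->kind != OPERAND_REG)
        return false;
    if (dst->reg >= 8 || src->reg >= 8)
        return false;
    *dst_reg = dst->reg;
    *src_reg = src->reg;
    return true;
}
```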

Comment on lines +22 to +40
.macro mov32_reg_reg_row dst_name, dst
mov32_reg_reg \dst_name, \dst, reg_a, eax
mov32_reg_reg \dst_name, \dst, reg_c, ecx
mov32_reg_reg \dst_name, \dst, reg_d, edx
mov32_reg_reg \dst_name, \dst, reg_b, ebx
mov32_reg_reg \dst_name, \dst, reg_sp, esp
mov32_reg_reg \dst_name, \dst, reg_bp, ebp
mov32_reg_reg \dst_name, \dst, reg_si, esi
mov32_reg_reg \dst_name, \dst, reg_di, edi
.endm

mov32_reg_reg_row reg_a, eax
mov32_reg_reg_row reg_c, ecx
mov32_reg_reg_row reg_d, edx
mov32_reg_reg_row reg_b, ebx
mov32_reg_reg_row reg_sp, esp
mov32_reg_reg_row reg_bp, ebp
mov32_reg_reg_row reg_si, esi
mov32_reg_reg_row reg_di, edi

I think we have an each_reg macro for this or something like that

emkey1 pushed a commit to emkey1/ish-AOK that referenced this pull request May 3, 2026
Add a narrower variant of the register-to-register MOV fast path discussed in upstream ish-app#2723. This keeps argument handling on the existing gen_op path for the general case, fast-paths only 32-bit register moves, and emits direct host gadgets for both aarch64 and x86_64.