Fuse aarch64 register moves#2723
Open
siraben wants to merge 1 commit intoish-app:masterfrom
Open
Conversation
3773893 to
3dcf07a
Compare
3dcf07a to
3985ec9
Compare
saagarjha
requested changes
May 3, 2026
Member
saagarjha
left a comment
There was a problem hiding this comment.
Nice work! I measured a consistent 2-5% win across various workloads which seems in line with what you were seeing. A couple of comments though, and also if you could make an x86-64 path for this optimization too that would be much appreciated.
| return true; | ||
| } | ||
|
|
||
| static inline enum arg gen_resolve_arg(enum arg arg, struct modrm *modrm, uint64_t *imm, dword_t addr_offset) { |
Member
There was a problem hiding this comment.
This is basically the same code as what is right above it, can we avoid duplicating it? We aren't even using its full functionality; I think we should probably just fastpath the things we care about
Comment on lines
+22
to
+40
| .macro mov32_reg_reg_row dst_name, dst | ||
| mov32_reg_reg \dst_name, \dst, reg_a, eax | ||
| mov32_reg_reg \dst_name, \dst, reg_c, ecx | ||
| mov32_reg_reg \dst_name, \dst, reg_d, edx | ||
| mov32_reg_reg \dst_name, \dst, reg_b, ebx | ||
| mov32_reg_reg \dst_name, \dst, reg_sp, esp | ||
| mov32_reg_reg \dst_name, \dst, reg_bp, ebp | ||
| mov32_reg_reg \dst_name, \dst, reg_si, esi | ||
| mov32_reg_reg \dst_name, \dst, reg_di, edi | ||
| .endm | ||
|
|
||
| mov32_reg_reg_row reg_a, eax | ||
| mov32_reg_reg_row reg_c, ecx | ||
| mov32_reg_reg_row reg_d, edx | ||
| mov32_reg_reg_row reg_b, ebx | ||
| mov32_reg_reg_row reg_sp, esp | ||
| mov32_reg_reg_row reg_bp, ebp | ||
| mov32_reg_reg_row reg_si, esi | ||
| mov32_reg_reg_row reg_di, edi |
Member
There was a problem hiding this comment.
I think we have an each_reg macro for this or something like that
emkey1
pushed a commit
to emkey1/ish-AOK
that referenced
this pull request
May 3, 2026
Add a narrower variant of the register-to-register MOV fast path discussed in upstream ish-app#2723. This keeps argument handling on the existing gen_op path for the general case, fast-paths only 32-bit register moves, and emits direct host gadgets for both aarch64 and x86_64.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fuse 32-bit register-to-register
movon aarch64 into one direct gadget instead of emitting separate load and store gadgets.Changes:
mov32_reg_reggadgets for the general register matrix.Results
Measured with the benchmark/profiling branch now kept separately at
siraben/ish:bench/hot-system-paths-bench-profile. Release build, 7 runs each:Instrumentation also showed translated block code words dropping for representative workloads:
144074 -> 142484896963 -> 886056Validation
ninja -C build