penguin: guest register access, CPUArchState CFFI headers, powerpc64el target #7
Merged
Penguin's PyPANDA compatibility layer needs real guest register reads and writes to restore PANDA-era semantics for arch.get_reg, arch.set_arg and the MIPS A3 syscall success/failure flag in arch.set_retval. The hypercall callback ABI only writes back a single return register, so expose two small wrappers over the gdbstub register accessors, keyed by GDB core-feature register number. cpu_synchronize_state keeps them correct under both TCG and KVM.
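A minimal sketch of the caller side of such a regnum-keyed interface, assuming register contents cross the boundary as raw bytes in guest byte order. The table and helper names are hypothetical, and the MIPS numbers assume the usual GDB core-feature layout (r0..r31 as 0..31, pc as 37); check them against the target's gdbstub before relying on them.

```python
# Hypothetical name -> GDB register number table for 32-bit MIPS.
# Assumption: r0..r31 map to 0..31 and pc is 37, as in the GDB MIPS core feature.
MIPS_GDB_REGNUM = {
    "v0": 2, "a0": 4, "a1": 5, "a2": 6, "a3": 7,
    "sp": 29, "ra": 31, "pc": 37,
}


def decode_reg(raw: bytes, little_endian: bool) -> int:
    """Interpret raw register bytes (guest byte order) as an unsigned integer."""
    return int.from_bytes(raw, "little" if little_endian else "big", signed=False)


def encode_reg(value: int, width: int, little_endian: bool) -> bytes:
    """Serialize a value into `width` bytes, wrapping negatives to two's complement."""
    mask = (1 << (8 * width)) - 1
    return (value & mask).to_bytes(width, "little" if little_endian else "big")
```

Keeping the name table and byte-order handling on the Python side matches the split described here: the exports only need a register number and a buffer.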
Penguin spells the little-endian ppc64 archend "powerpc64el" and asks for libqemu-system-powerpc64el.so, but the build only staged a "powerpc64le" alias, leaving the architecture unusable. Build and package the powerpc64el spelling alongside the existing one (both alias the ppc64-softmmu target).
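For illustration, a loader that wants to tolerate both spellings could try them in order. This is a sketch only: `ARCH_ALIASES` and `library_candidates` are hypothetical names, not part of Penguin or the QEMU build.

```python
# Hypothetical alias table: Penguin archend spelling -> alternate spellings to try.
ARCH_ALIASES = {"powerpc64el": ["powerpc64le"]}


def library_candidates(arch: str) -> list[str]:
    """Return shared-library file names to try, preferred spelling first."""
    names = [arch] + ARCH_ALIASES.get(arch, [])
    return [f"libqemu-system-{name}.so" for name in names]
```

With both aliases packaged, the first candidate resolves; the second only matters against older packages that staged the `powerpc64le` spelling alone.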
The GDB-numbered register accessors cover the core register set only; Penguin also wants typed access to the rest of the per-target CPU state (coprocessor registers, timers, FPU) the way PyPANDA exposed env. Generate a layout-exact CFFI declaration of CPUArchState per target by walking the DWARF of the just-built library, so the header can never drift from the binary it ships with. Every emitted struct is verified field-by-field against DWARF offsets using cffi itself; members that cannot be represented (bitfields, exotic types, anonymous members) are dropped and padded over, keeping all other offsets exact. Export penguin_cpu_env (the CPUState+1 layout contract validated in cpu-target.c, usable from common code) and penguin_sync_cpu_state so env reads are fresh and writes stick under KVM. The generated qemu_cffi_<mode>_<arch>_env.h headers ride along in the existing cffi manifest and penguin-qemu.tar.gz.
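The "drop and pad over, then verify" idea can be sketched with the standard-library `ctypes` standing in for cffi (the generator itself uses cffi). The offsets below are the mipsel values quoted in this PR's Testing notes; the field types and the flattened `active_tc_gpr` name are assumptions made for the example.

```python
import ctypes


def padded_struct(name, size, fields):
    """Build a Structure with each field at its recorded offset, padding the gaps.

    fields: list of (field_name, offset, ctype), sorted by offset.
    Relies on natural alignment; verify() catches any field that cannot
    sit at its recorded offset.
    """
    members, cursor = [], 0
    for index, (fname, offset, ctype) in enumerate(fields):
        if offset < cursor:
            raise ValueError(f"{fname} overlaps the previous field")
        if offset > cursor:
            members.append((f"_pad{index}", ctypes.c_uint8 * (offset - cursor)))
        members.append((fname, ctype))
        cursor = offset + ctypes.sizeof(ctype)
    if size < cursor:
        raise ValueError("declared size is smaller than the last field")
    if size > cursor:
        members.append(("_pad_tail", ctypes.c_uint8 * (size - cursor)))
    return type(name, (ctypes.Structure,), {"_fields_": members})


def verify(struct, size, fields):
    """Check every field offset and the total size against the recorded layout."""
    if ctypes.sizeof(struct) != size:
        raise AssertionError(f"sizeof is {ctypes.sizeof(struct)}, expected {size}")
    for fname, offset, _ in fields:
        actual = getattr(struct, fname).offset
        if actual != offset:
            raise AssertionError(f"{fname} at {actual}, expected {offset}")


# Offsets from the mipsel numbers in this PR; types are assumed for illustration.
MIPS_FIELDS = [
    ("active_tc_gpr", 0, ctypes.c_uint32 * 32),
    ("CP0_Count", 1052, ctypes.c_int32),
    ("CP0_Compare", 1064, ctypes.c_int32),
]
MipsEnv = padded_struct("MipsEnv", 7520, MIPS_FIELDS)
verify(MipsEnv, 7520, MIPS_FIELDS)
```

Members that cannot be declared simply never appear in `fields`; the padding keeps everything after them at the right offset.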
Compile an API-mode CFFI extension module per target alongside the ABI-mode env header. The cdef uses real type names with ellipsis in every struct and is compiled against the actual QEMU headers using the flags harvested from compile_commands.json, making the C compiler the layout authority: bitfields and anonymous members are fully supported and nothing can drift from the library build. Named structs needed inside anonymous inline types (where cffi forbids partial types) are inlined as complete anonymous copies, keeping x87/MMX register unions accessible. The module is tied to the build's CPython ABI (cp310 on the 22.04 builder, matching Penguin's container); the ABI-mode header remains as a Python-version-agnostic fallback. Modules ship in penguin-qemu.tar.gz under lib/penguin-qemu-env/ via an env_module key in the cffi manifest.
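Harvesting flags from a compilation database might look like the sketch below. It assumes the standard `compile_commands.json` entry shape (`file` plus either `command` or `arguments`); the function name and the set of flags kept are illustrative choices, not what the generator script necessarily does.

```python
import shlex

# Flags whose value arrives as a separate argument.
PAIRED = ("-I", "-D", "-include", "-isystem", "-iquote")
# Flags whose value is joined to the flag itself.
JOINED_PREFIXES = ("-I", "-D", "-std=")


def harvest_flags(entries, source_suffix):
    """Return preprocessor-relevant flags for the first entry matching source_suffix."""
    for entry in entries:
        if not entry["file"].endswith(source_suffix):
            continue
        argv = entry.get("arguments") or shlex.split(entry["command"])
        flags = []
        i = 1  # argv[0] is the compiler itself
        while i < len(argv):
            arg = argv[i]
            if arg in PAIRED and i + 1 < len(argv):
                flags += [arg, argv[i + 1]]
                i += 2
                continue
            if arg.startswith(JOINED_PREFIXES):
                flags.append(arg)
            i += 1
        return flags
    raise LookupError(f"no compile command for {source_suffix}")
```

Reusing the include paths and defines of a real translation unit is what lets the C compiler see the same `CPUArchState` definition the library was built with.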
lacraig2 added a commit to rehosting/penguin that referenced this pull request on Jun 11, 2026
QEMU 0.0.8 ships the guest register access exports, the powerpc64el system target assets, and the generated CPUArchState env headers from rehosting/qemu#7, activating the full register/env semantics in the compat layer. Verified the shim against the released artifact: register exports resolve, powerpc64el loads with ppc64le conventions, and panda.cpu_env() works via the ABI env header. Note: 0.0.8 contains no compiled CFFI env modules -- the builder image lacks python3-dev, so the module compile silently fell back to header-only (fix queued in rehosting/qemu). The shim already prefers compiled modules when a future release ships them; no penguin change will be needed.
lacraig2 added a commit to rehosting/penguin that referenced this pull request on Jun 11, 2026
The QEMU shim previously emulated only the hypercall-captured view of
guest registers: get_reg returned 0 for anything uncaptured, set_arg
mutated host-side state without touching the guest, and set_retval
ignored the convention/failure contract (dropping the MIPS A3
success/failure flag and error negation). Restore PANDA parity using
the new penguin_{read,write}_guest_reg QEMU exports, keyed by per-arch
GDB core-feature register numbers (verified against each target's
gdbstub). set_retval's default convention is now 'default', matching
pandare2, so A3 semantics apply only on an explicit syscall convention.
Everything degrades gracefully (warning, captured-only) against QEMU
libraries lacking the exports.
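The convention/failure contract can be pictured as a pure function that decides which guest registers to write. This is a sketch of one reading of the paragraph above, not the shim's code: the function name, the return-register table, and the treatment of non-MIPS targets (negating the value on failure, as Linux syscalls return negative errno there) are assumptions.

```python
# Hypothetical return-register table for a few targets.
RETURN_REG = {"mips": "v0", "x86_64": "rax", "arm": "r0"}


def retval_writes(arch, value, convention="default", failure=False):
    """Return {register_name: value} describing the guest writes for a return value."""
    ret_reg = RETURN_REG[arch]
    if convention != "syscall":
        # 'default' convention: plain function return, no flag handling.
        return {ret_reg: value}
    if arch == "mips":
        # MIPS Linux syscalls: v0 carries the value (positive errno on error),
        # a3 is 1 on failure and 0 on success.
        return {ret_reg: value, "a3": 1 if failure else 0}
    # Other targets signal failure with a negated errno in the return register.
    return {ret_reg: -value if failure else value}
```

Because the default convention is `'default'`, a plain `set_retval(cpu, val)` never touches A3.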
Add typed CPUArchState access: panda.cpu_env(cpu) returns the full
per-target env (coprocessor registers, timers, FPU) via the compiled
CFFI module shipped with the QEMU package, falling back to the
generated ABI-mode env header; panda.sync_cpu_state(cpu) keeps env
fresh and writes sticking under KVM (cpu_env auto-syncs there).
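The module-first, header-second preference could be sketched as follows. The two name patterns come from rehosting/qemu#7; `pick_env_backend`, its parameters, and the directory argument are hypothetical.

```python
import importlib
import os


def env_names(mode, arch):
    """Build the module and header names from the patterns used by the QEMU package."""
    return (f"_penguin_qemu_env_{mode}_{arch}", f"qemu_cffi_{mode}_{arch}_env.h")


def pick_env_backend(module_name, header_path):
    """Prefer the compiled CFFI module; fall back to the ABI-mode header."""
    try:
        return ("module", importlib.import_module(module_name))
    except ImportError:
        pass  # not shipped, or built for a different CPython ABI
    if os.path.exists(header_path):
        return ("header", header_path)
    raise RuntimeError("no env module or env header available for this target")
```

Catching `ImportError` also covers a module built for a different CPython ABI, which is the case the Python-version-agnostic header exists for.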
Restore fail-fast guest callbacks: _record_callback_exception stores
the first fatal handler error, requests shutdown, and run() re-raises
it after the main loop exits, mirroring PyPANDA.
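The fail-fast pattern, reduced to its essentials, looks roughly like this. Only `_record_callback_exception` and `run` are names from the commit; the class, `dispatch`, and the injected `request_shutdown` callable are illustrative.

```python
class CallbackRunner:
    """Record the first handler exception, stop the loop, re-raise after it exits."""

    def __init__(self, request_shutdown):
        self._request_shutdown = request_shutdown  # callable that stops the main loop
        self._fatal = None

    def _record_callback_exception(self, exc):
        # Only the first error is kept; later ones are usually fallout from it.
        if self._fatal is None:
            self._fatal = exc
            self._request_shutdown()

    def dispatch(self, handler, *args):
        """Invoke a guest callback without letting its exception unwind the emulator."""
        try:
            return handler(*args)
        except Exception as exc:
            self._record_callback_exception(exc)
            return None

    def run(self, main_loop):
        main_loop()
        if self._fatal is not None:
            raise self._fatal
```

Deferring the raise until the main loop has returned avoids unwinding Python exceptions through C callback frames.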
Also fix powerpc64el (normalize to the ppc64le conventions and resolve
the powerpc64le-spelled library/header assets as a fallback) and make
virtual_memory_read(fmt='int') decode unsigned, matching pandare2 --
guest kernel pointers read this way must not come back negative.
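The signedness issue is easy to reproduce outside the shim. In this small sketch `decode_int` is a hypothetical helper and the address is just a typical x86_64 kernel-range value.

```python
import struct


def decode_int(raw: bytes, little_endian: bool = True) -> int:
    """Decode guest memory bytes as an unsigned integer."""
    return int.from_bytes(raw, "little" if little_endian else "big", signed=False)


# A kernel-range pointer has its top bit set.
KERNEL_PTR = 0xFFFFFFFF81000000
raw = KERNEL_PTR.to_bytes(8, "little")

signed_view = struct.unpack("<q", raw)[0]  # signed decode: comes back negative
unsigned_view = decode_int(raw)            # unsigned decode: round-trips exactly
```

A negative value breaks address arithmetic and comparisons against symbol addresses, which is why the unsigned decode matters here.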
Requires rehosting/qemu#7 for the register/env exports and compiled
env modules; without them the new APIs degrade or raise cleanly.
Summary
Three changes restoring PANDA-era CPU state access for Penguin's PyPANDA compatibility layer (part of the panda-ng → qemu migration cleanup):
1. `penguin_read_guest_reg` / `penguin_write_guest_reg`: small exports over the gdbstub register accessors (`gdb_read_register` / `gdb_write_register`), keyed by GDB core-feature register number, with `cpu_synchronize_state` so they are correct under both TCG and KVM. These restore PANDA-era semantics for `arch.get_reg` outside the captured syscall args, `arch.set_arg` write-through to the guest, and the MIPS A3 syscall success/failure flag in `arch.set_retval`. Penguin handles name→regnum mapping and endianness and degrades gracefully against libs without these exports.
2. Compiled `CPUArchState` CFFI modules + generated headers: the gdbstub surface covers the GDB core register set only; Penguin also wants typed access to the rest of the per-target CPU state (coprocessor registers, timers, FPU) the way PyPANDA's `CPUArchState` cdefs did. `scripts/penguin-env-cffi-gen.py` enumerates fields from the DWARF of each just-built library and produces two artifacts per target:
   - An API-mode extension module (`_penguin_qemu_env_<mode>_<arch>`): real type names with `...` ellipses, compiled against the actual QEMU headers with flags harvested from `compile_commands.json`. The C compiler is the layout authority, bitfields and anonymous members are fully supported (x87/MMX register unions included), and layout checks run at import. Cached as a build artifact; tied to the builder's CPython ABI (cp310, matching Penguin's 22.04 container).
   - An ABI-mode header (`qemu_cffi_<mode>_<arch>_env.h`) as a Python-version-agnostic fallback: layout-exact cdef verified field-by-field against DWARF offsets, unrepresentable members padded over.

   Two supporting exports: `penguin_cpu_env` (the `CPUState`+1 layout contract validated in `cpu-target.c`, usable from common code) and `penguin_sync_cpu_state` (env freshness/write-back under KVM). Both artifacts ride along in the cffi manifest and `penguin-qemu.tar.gz`; Penguin prefers the compiled module and falls back to the header.
3. powerpc64el system target assets: Penguin's archend spelling for little-endian ppc64 is `powerpc64el` and it dlopens `libqemu-system-powerpc64el.so`, but the build only staged a `powerpc64le` alias, leaving the arch unusable under the QEMU backend. Both spellings now alias `ppc64-softmmu`.

Testing
- Builds (`mipsel-softmmu`, `x86_64-softmmu`): compile and link cleanly; all four new symbols exported with default visibility; NULL-CPU guards return error codes without crashing when called via cffi.
- Layout offsets for mipsel (`sizeof(CPUArchState)`=7520, `CP0_Count`@1052, `CP0_Compare`@1064, `active_tc.gpr[7]`@28) and x86_64 (`regs`@0, `eip`@256, `cr`@552, `xmm_regs`@992, `efer`@728, sizeof=15344): all exact.
- `panda.cpu_env(cpu)` returns a typed `CPUArchState *` (from the compiled module when present, header otherwise) whose field writes land at the gdb-verified byte offsets; x86_64 `fpregs[i].d.low`, MMX views, and the flattened breakpoint/watchpoint union are all accessible.
- `pyelftools>=0.31` + `cffi` added to the builder image (DWARF5 needs recent pyelftools).

The Penguin-side counterpart (GDB regnum maps, `cpu_env()` / `sync_cpu_state()` shim methods, set_retval/set_arg/get_reg semantics) lands separately in rehosting/penguin.