penguin: guest register access, CPUArchState CFFI headers, powerpc64el target#7

Merged
lacraig2 merged 4 commits into main from penguin-guest-reg-access
Jun 11, 2026

Conversation

lacraig2 commented Jun 10, 2026

Contributor

Summary

Three changes restoring PANDA-era CPU state access for Penguin's PyPANDA compatibility layer (part of the panda-ng → qemu migration cleanup):

  1. penguin_read_guest_reg / penguin_write_guest_reg — small exports over the gdbstub register accessors (gdb_read_register/gdb_write_register), keyed by GDB core-feature register number, with cpu_synchronize_state so they are correct under both TCG and KVM. These restore PANDA-era semantics for arch.get_reg outside the captured syscall args, arch.set_arg write-through to the guest, and the MIPS A3 syscall success/failure flag in arch.set_retval. Penguin handles name→regnum mapping and endianness and degrades gracefully against libs without these exports.

  2. Compiled CPUArchState CFFI modules + generated headers — the gdbstub surface covers the GDB core register set only; Penguin also wants typed access to the rest of the per-target CPU state (coprocessor registers, timers, FPU) the way PyPANDA's CPUArchState cdefs did. scripts/penguin-env-cffi-gen.py enumerates fields from the DWARF of each just-built library and produces two artifacts per target:

    • a compiled CFFI API-mode module (_penguin_qemu_env_<mode>_<arch>): the cdef uses real type names with ... ellipses and is compiled against the actual QEMU headers with flags harvested from compile_commands.json. The C compiler is therefore the layout authority: bitfields and anonymous members are fully supported (x87/MMX register unions included), and layout checks run at import. The module is cached as a build artifact and tied to the builder's CPython ABI (cp310, matching Penguin's 22.04 container).
    • an ABI-mode header (qemu_cffi_<mode>_<arch>_env.h) as a Python-version-agnostic fallback: layout-exact cdef verified field-by-field against DWARF offsets, unrepresentable members padded over.

    Two supporting exports: penguin_cpu_env (the CPUState+1 layout contract validated in cpu-target.c, usable from common code) and penguin_sync_cpu_state (env freshness/write-back under KVM). Both artifacts ride along in the cffi manifest and penguin-qemu.tar.gz; Penguin prefers the compiled module and falls back to the header.

  3. powerpc64el system target assets — Penguin's archend spelling for little-endian ppc64 is powerpc64el and it dlopens libqemu-system-powerpc64el.so, but the build only staged a powerpc64le alias, leaving the arch unusable under the QEMU backend. Both spellings now alias ppc64-softmmu.
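
How a caller might consume the register exports from item 1 can be sketched in Python. This is a minimal sketch, not Penguin's shim: the C signature (`int penguin_read_guest_reg(cpu, regnum, buf, len)` returning the number of bytes read), the `GuestRegs` class, and the `FakeLib` stand-in are assumptions made for illustration. The PR states only that the exports are keyed by GDB register number and that Penguin handles name-to-regnum mapping, endianness, and libraries without the exports. The MIPS numbers follow GDB's MIPS numbering, where r0 to r31 are 0 to 31 and pc is 37.

```python
import warnings

# GDB register numbers for 32-bit MIPS: GPRs r0..r31 map to 0..31, pc is 37.
MIPS_REGNUMS = {"v0": 2, "a0": 4, "a1": 5, "a2": 6, "a3": 7, "sp": 29, "pc": 37}


class GuestRegs:
    """Name-keyed register reads over a regnum-keyed export (hypothetical shim)."""

    def __init__(self, lib, regnums, byteorder="little", width=4):
        self.regnums = regnums
        self.byteorder = byteorder
        self.width = width
        # Assumed signature: int penguin_read_guest_reg(cpu, regnum, buf, len)
        self._read = getattr(lib, "penguin_read_guest_reg", None)
        if self._read is None:
            warnings.warn("library lacks penguin_read_guest_reg; captured registers only")

    def get_reg(self, cpu, name, captured=None):
        captured = captured or {}
        if self._read is None:
            # Degraded mode: only the hypercall-captured view is available.
            return captured.get(name, 0)
        buf = bytearray(self.width)
        nbytes = self._read(cpu, self.regnums[name], buf, self.width)
        if nbytes != self.width:
            raise RuntimeError(f"register read failed for {name}")
        return int.from_bytes(buf, self.byteorder, signed=False)


class FakeLib:
    """Stand-in for the QEMU library so the sketch runs without a guest."""

    def __init__(self, regs):
        self.regs = regs  # regnum -> value

    def penguin_read_guest_reg(self, cpu, regnum, buf, length):
        buf[:length] = self.regs[regnum].to_bytes(length, "little")
        return length


regs = GuestRegs(FakeLib({7: 1, 37: 0x80001000}), MIPS_REGNUMS)
print(hex(regs.get_reg(None, "pc")))
```

Probing for the symbol once at construction keeps the per-read path cheap and gives a single place to emit the degraded-mode warning.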

Testing

  • Single-target smoke builds (mipsel-softmmu, x86_64-softmmu): compile and link cleanly; all four new symbols exported with default visibility; NULL-CPU guards return error codes without crashing when called via cffi.
  • Env header generation runs in ~6s per target. Layout cross-checked independently against gdb ground truth on both targets: mipsel (sizeof(CPUArchState)=7520, CP0_Count@1052, CP0_Compare@1064, active_tc.gpr[7]@28) and x86_64 (regs@0, eip@256, cr@552, xmm_regs@992, efer@728, sizeof=15344) — all exact.
  • Only dropped members across both targets: zero-length save/reset marker arrays, one Int128 field, and the host-side breakpoint/watchpoint debug union — no guest-visible state lost.
  • End-to-end through Penguin's compat shim: panda.cpu_env(cpu) returns a typed CPUArchState * (from the compiled module when present, header otherwise) whose field writes land at the gdb-verified byte offsets; x86_64 fpregs[i].d.low, MMX views, and the flattened breakpoint/watchpoint union all accessible.
  • Build deps: pyelftools>=0.31 + cffi added to the builder image (DWARF5 needs recent pyelftools).
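
The idea of a layout-exact declaration that pads over everything it does not name, then is checked offset by offset, can be illustrated with a small sketch. It uses `ctypes` from the standard library rather than cffi so that it runs without the build environment, and it is not the generator script. The helper names are hypothetical, the field types (`uint32` GPRs, `int32` CP0 registers) are assumptions, and `gpr` at offset 0 is inferred from `active_tc.gpr[7]@28` with 4-byte registers. The offsets and size are the mipsel figures reported in the list above.

```python
import ctypes


def padded_struct(name, total_size, fields):
    """Build a ctypes struct with named fields at fixed offsets, padding the gaps.

    fields: iterable of (field_name, ctype, byte_offset).
    """
    members, cursor = [], 0
    for index, (fname, ctype, offset) in enumerate(sorted(fields, key=lambda f: f[2])):
        if offset < cursor:
            raise ValueError(f"{fname} overlaps the previous field")
        if offset > cursor:
            members.append((f"_pad{index}", ctypes.c_uint8 * (offset - cursor)))
        members.append((fname, ctype))
        cursor = offset + ctypes.sizeof(ctype)
    if total_size > cursor:
        members.append(("_tail", ctypes.c_uint8 * (total_size - cursor)))

    class Env(ctypes.Structure):
        _fields_ = members

    Env.__name__ = name
    return Env


def verify_layout(struct, total_size, fields):
    """Check every field offset and the overall size against ground truth."""
    for fname, _ctype, offset in fields:
        actual = getattr(struct, fname).offset
        if actual != offset:
            raise AssertionError(f"{fname}: offset {actual}, expected {offset}")
    if ctypes.sizeof(struct) != total_size:
        raise AssertionError(f"sizeof {ctypes.sizeof(struct)}, expected {total_size}")


# mipsel ground truth from the testing notes; field types are assumptions.
MIPSEL_FIELDS = [
    ("gpr", ctypes.c_uint32 * 32, 0),  # active_tc.gpr, so gpr[7] sits at byte 28
    ("CP0_Count", ctypes.c_int32, 1052),
    ("CP0_Compare", ctypes.c_int32, 1064),
]
MipselEnv = padded_struct("MipselEnv", 7520, MIPSEL_FIELDS)
verify_layout(MipselEnv, 7520, MIPSEL_FIELDS)

# A field write through the typed view lands at the verified byte offset.
backing = bytearray(7520)
env = MipselEnv.from_buffer(backing)
env.gpr[7] = 0xDEADBEEF
print(backing[28:32].hex())
```

Running the verification against the same table that built the struct looks redundant, but it catches the case where the host's alignment rules silently shift a field away from its intended offset.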

The Penguin-side counterpart (GDB regnum maps, cpu_env()/sync_cpu_state() shim methods, set_retval/set_arg/get_reg semantics) lands separately in rehosting/penguin.

lacraig2 added 3 commits June 10, 2026 08:13

Penguin's PyPANDA compatibility layer needs real guest register reads
and writes to restore PANDA-era semantics for arch.get_reg, arch.set_arg
and the MIPS A3 syscall success/failure flag in arch.set_retval. The
hypercall callback ABI only writes back a single return register, so
expose two small wrappers over the gdbstub register accessors, keyed by
GDB core-feature register number. cpu_synchronize_state keeps them
correct under both TCG and KVM.

Penguin spells the little-endian ppc64 archend "powerpc64el" and asks
for libqemu-system-powerpc64el.so, but the build only staged a
"powerpc64le" alias, leaving the architecture unusable. Build and
package the powerpc64el spelling alongside the existing one (both alias
the ppc64-softmmu target).

The GDB-numbered register accessors cover the core register set only;
Penguin also wants typed access to the rest of the per-target CPU state
(coprocessor registers, timers, FPU) the way PyPANDA exposed env.

Generate a layout-exact CFFI declaration of CPUArchState per target by
walking the DWARF of the just-built library, so the header can never
drift from the binary it ships with. Every emitted struct is verified
field-by-field against DWARF offsets using cffi itself; members that
cannot be represented (bitfields, exotic types, anonymous members) are
dropped and padded over, keeping all other offsets exact.

Export penguin_cpu_env (the CPUState+1 layout contract validated in
cpu-target.c, usable from common code) and penguin_sync_cpu_state so
env reads are fresh and writes stick under KVM. The generated
qemu_cffi_<mode>_<arch>_env.h headers ride along in the existing cffi
manifest and penguin-qemu.tar.gz.

lacraig2 changed the title from "penguin: guest register access exports + powerpc64el target" to "penguin: guest register access, CPUArchState CFFI headers, powerpc64el target" Jun 10, 2026

Compile an API-mode CFFI extension module per target alongside the
ABI-mode env header. The cdef uses real type names with ellipsis in
every struct and is compiled against the actual QEMU headers using the
flags harvested from compile_commands.json, making the C compiler the
layout authority: bitfields and anonymous members are fully supported
and nothing can drift from the library build. Named structs needed
inside anonymous inline types (where cffi forbids partial types) are
inlined as complete anonymous copies, keeping x87/MMX register unions
accessible.

The module is tied to the build's CPython ABI (cp310 on the 22.04
builder, matching Penguin's container); the ABI-mode header remains as
a Python-version-agnostic fallback. Modules ship in penguin-qemu.tar.gz
under lib/penguin-qemu-env/ via an env_module key in the cffi manifest.

lacraig2 merged commit 8ccd9c9 into main Jun 11, 2026
1 check passed

lacraig2 added a commit to rehosting/penguin that referenced this pull request Jun 11, 2026
QEMU 0.0.8 ships the guest register access exports, the powerpc64el
system target assets, and the generated CPUArchState env headers from
rehosting/qemu#7, activating the full register/env semantics in the
compat layer. Verified the shim against the released artifact: register
exports resolve, powerpc64el loads with ppc64le conventions, and
panda.cpu_env() works via the ABI env header.

Note: 0.0.8 contains no compiled CFFI env modules -- the builder image
lacks python3-dev, so the module compile silently fell back to
header-only (fix queued in rehosting/qemu). The shim already prefers
compiled modules when a future release ships them; no penguin change
will be needed.

lacraig2 added a commit to rehosting/penguin that referenced this pull request Jun 11, 2026
The QEMU shim previously emulated only the hypercall-captured view of
guest registers: get_reg returned 0 for anything uncaptured, set_arg
mutated host-side state without touching the guest, and set_retval
ignored the convention/failure contract (dropping the MIPS A3
success/failure flag and error negation). Restore PANDA parity using
the new penguin_{read,write}_guest_reg QEMU exports, keyed by per-arch
GDB core-feature register numbers (verified against each target's
gdbstub). set_retval's default convention is now 'default', matching
pandare2, so A3 semantics apply only on an explicit syscall convention.
Everything degrades gracefully (warning, captured-only) against QEMU
libraries lacking the exports.
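
The return-value contract described here can be sketched as a pure function that decides which registers to write. This is an illustrative sketch only: `plan_retval_writes`, its arch table, and its parameter names are hypothetical and are not taken from the Penguin source. It relies on the MIPS Linux syscall convention, in which a3 is 0 on success and 1 on failure and v0 carries a positive errno on failure.

```python
def plan_retval_writes(arch, value, convention="default", failure=False):
    """Return {register_name: value} writes for a return value (hypothetical helper)."""
    retval_reg = {"mips": "v0", "x86_64": "rax"}[arch]
    writes = {}
    if arch == "mips" and convention == "syscall":
        # MIPS Linux syscalls report failure out of band: a3 = 1, v0 = positive errno.
        writes["a3"] = 1 if failure else 0
        if failure and value < 0:
            value = -value
    writes[retval_reg] = value
    return writes


# With the default convention only the return register is touched.
print(plan_retval_writes("mips", -2))
# With the syscall convention the flag is set and the error is negated.
print(plan_retval_writes("mips", -2, convention="syscall", failure=True))
```

Keeping 'default' as the default convention means existing callers that only want to overwrite v0 do not clobber a3 by accident.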

Add typed CPUArchState access: panda.cpu_env(cpu) returns the full
per-target env (coprocessor registers, timers, FPU) via the compiled
CFFI module shipped with the QEMU package, falling back to the
generated ABI-mode env header; panda.sync_cpu_state(cpu) keeps env
fresh and writes sticking under KVM (cpu_env auto-syncs there).
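
The prefer-module, fall-back-to-header selection might look like the sketch below. The module and header name patterns come from rehosting/qemu#7; the function name, the `header_dir` parameter, and the returned tuple shape are assumptions for illustration, not the shim's real interface.

```python
import importlib
from pathlib import Path


def locate_env_binding(mode, arch, header_dir):
    """Pick the compiled env module if importable, else the ABI-mode header."""
    module_name = f"_penguin_qemu_env_{mode}_{arch}"
    try:
        return "module", importlib.import_module(module_name)
    except ImportError:
        # Covers a package without the module and a module built for
        # a different CPython ABI, which the import system will not find.
        pass
    header = Path(header_dir) / f"qemu_cffi_{mode}_{arch}_env.h"
    if header.is_file():
        return "header", header
    raise FileNotFoundError(f"no env module or header for {mode}/{arch}")
```

A caller would then branch on the first element: use the module's typed `CPUArchState` directly, or feed the header text to an ABI-mode cdef.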

Restore fail-fast guest callbacks: _record_callback_exception stores
the first fatal handler error, requests shutdown, and run() re-raises
it after the main loop exits, mirroring PyPANDA.
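
The record, shut down, re-raise pattern can be sketched in isolation. Only the `_record_callback_exception` and `run` names appear in the commit message; the `CallbackRunner` class, its `dispatch` method, and the event list standing in for the emulator main loop are hypothetical.

```python
class CallbackRunner:
    """Stores the first handler error, stops the loop, and re-raises afterwards."""

    def __init__(self):
        self._first_exc = None
        self.shutdown_requested = False

    def _record_callback_exception(self, exc):
        # Keep only the first failure; later ones are usually consequences of it.
        if self._first_exc is None:
            self._first_exc = exc
        self.shutdown_requested = True

    def dispatch(self, handler, *args):
        try:
            handler(*args)
        except Exception as exc:
            self._record_callback_exception(exc)

    def run(self, events):
        # Stand-in for the emulator main loop: stop once shutdown is requested.
        for handler, args in events:
            if self.shutdown_requested:
                break
            self.dispatch(handler, *args)
        if self._first_exc is not None:
            raise self._first_exc
```

Catching inside `dispatch` keeps the exception from unwinding through the emulator's C frames, and re-raising in `run` surfaces it to the caller with its original traceback.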

Also fix powerpc64el (normalize to the ppc64le conventions and resolve
the powerpc64le-spelled library/header assets as a fallback) and make
virtual_memory_read(fmt='int') decode unsigned, matching pandare2 --
guest kernel pointers read this way must not come back negative.
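
Why the signedness matters is easy to show with a kernel-style address. The helper name below is hypothetical; it only demonstrates the unsigned decode of raw guest bytes.

```python
def decode_guest_int(data, byteorder="little"):
    """Decode raw guest bytes as an unsigned integer."""
    return int.from_bytes(data, byteorder, signed=False)


raw = bytes.fromhex("00100080ffffffff")  # little-endian 0xffffffff80001000
print(hex(decode_guest_int(raw)))  # 0xffffffff80001000
# The signed interpretation of the same bytes is negative.
print(int.from_bytes(raw, "little", signed=True) < 0)  # True
```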

Requires rehosting/qemu#7 for the register/env exports and compiled
env modules; without them the new APIs degrade or raise cleanly.