Restore PANDA-faithful semantics in the QEMU compatibility layer #832

Merged: lacraig2 merged 6 commits into main from fix/qemu-compat-parity on Jun 11, 2026

lacraig2 (Collaborator) commented on Jun 11, 2026

Summary

Fixes a batch of regressions from the panda-ng → qemu migration, identified in a systematic audit of the old stack (oldpenguin + PANDA) vs the new one. Counterpart to rehosting/qemu#7, which adds the QEMU-side exports these fixes use.

Register/env semantics (compat/qemu_compat.py) — the shim previously only emulated the hypercall-captured register view:

  • arch.get_reg on uncaptured registers returned 0; now reads the real register via the new GDB-numbered QEMU exports (per-arch regnum maps verified against each target's gdbstub).
  • arch.set_arg was a host-side-only no-op (live callers in nvram2); now writes through to the guest.
  • arch.set_retval ignored convention/failure; the MIPS A3 success/failure flag and error negation are restored, gated on an explicit convention="syscall" so that the default stays convention="default", as in pandare2.
  • New panda.cpu_env(cpu) returns a typed CPUArchState * (full env: coprocessor registers, timers, FPU) via the compiled CFFI module shipped in the QEMU package, with the generated ABI-mode header as fallback; panda.sync_cpu_state(cpu) handles KVM freshness/write-back and cpu_env auto-syncs there.
  • powerpc64el now normalizes to ppc64le conventions and resolves powerpc64le-spelled assets — the arch was entirely unusable before.
  • virtual_memory_read(fmt="int") decodes unsigned again (pandare2 parity; kernel pointers must not come back negative).
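
The set_retval bullet above can be illustrated with a minimal sketch. It models the guest registers as a plain dict rather than calling the real shim; the function name `mips_set_retval` and the dict-based register file are assumptions made for illustration and are not the code in compat/qemu_compat.py. It relies only on the Linux MIPS syscall ABI, where a3 is 0 on success and 1 on error, and v0 carries a positive errno on error.

```python
def mips_set_retval(regs, value, convention="default", failure=False):
    """Write a return value into a dict of MIPS registers (illustrative model)."""
    if convention == "syscall":
        # Linux MIPS syscall ABI: a3 flags success (0) or failure (1).
        regs["a3"] = 1 if failure else 0
        # On failure the kernel reports a positive errno in v0, so a
        # caller-supplied negative errno is negated.
        if failure and value < 0:
            value = -value
    # With the 'default' convention a3 is left untouched.
    regs["v0"] = value & 0xFFFFFFFF
    return regs
```

Calling it with `convention="syscall", failure=True` and `-2` leaves `a3 == 1` and `v0 == 2`; calling it without a convention only touches `v0`.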

Fail-fast guest callbacks — hypercall handler exceptions were logged and swallowed, with the guest seeing rv=0 for an unserviced hypercall. Now the first fatal error stops dispatch, ends the emulation, and re-raises out of panda.run(), mirroring PyPANDA.
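
A rough sketch of the fail-fast pattern described here, under the assumption that handlers are plain callables: the first exception is recorded, further dispatch stops, and the error is re-raised once the loop exits. The class `FailFastDispatcher` and its `stopped` flag (standing in for a shutdown request to the emulator) are hypothetical names, not the shim's API.

```python
class FailFastDispatcher:
    """Records the first handler error, stops dispatching, re-raises from run()."""

    def __init__(self):
        self.first_error = None
        self.stopped = False  # stand-in for "emulator shutdown requested"

    def dispatch(self, handler, *args):
        if self.stopped:
            return None  # nothing is serviced after a fatal error
        try:
            return handler(*args)
        except Exception as exc:
            if self.first_error is None:
                self.first_error = exc  # keep only the first fatal error
            self.stopped = True
            return None

    def run(self, calls):
        """calls: iterable of (handler, args) pairs, standing in for the main loop."""
        for handler, args in calls:
            self.dispatch(handler, *args)
            if self.stopped:
                break
        if self.first_error is not None:
            raise self.first_error
```

The point of the design is that a failed handler cannot be mistaken for a serviced hypercall: the caller of `run()` sees the original exception instead of a silently diverging guest.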

Syscall-event writeback — the direct writeback path lost the address mask and the portal fallback of the old mem.write_bytes path; hooked-syscall modifications could be silently dropped. Both restored.
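
The two restored protections can be sketched as follows. This is an illustration only: `write_back`, `direct_write`, and `portal_write` are hypothetical names, and the sketch assumes a failed direct write surfaces as an `OSError`, which may not match how the real QEMU write reports failure.

```python
def write_back(addr, data, direct_write, portal_write, bits=32):
    """Mask the guest address, try the direct write, fall back to the portal."""
    # A 32-bit guest pointer may arrive sign-extended to 64 bits;
    # masking recovers the address the guest actually uses.
    addr &= (1 << bits) - 1
    try:
        direct_write(addr, data)
        return "direct"
    except OSError:
        # Assumed failure mode: the direct write raises; the
        # guest-mediated path is then used instead of dropping the data.
        portal_write(addr, data)
        return "portal"
```

For example, `0xFFFFFFFF80001000` is masked to `0x80001000` before either write path sees it.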

Portalcalls — unregistered magics skipped the real syscall and faked success (guest sendto() returned 0); now the real syscall runs and the first miss per magic logs at error level.
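
One way to picture the new behavior is a dispatch table that returns a sentinel for unknown magics and logs only the first miss at error level. The names `NO_HANDLER` and `PortalcallTable` are invented for this sketch, and the choice to log repeat misses at debug level is an assumption; the PR only specifies the first-miss behavior.

```python
import logging

logger = logging.getLogger("portalcall_sketch")

# Hypothetical sentinel: tells the caller to let the real syscall run.
NO_HANDLER = object()


class PortalcallTable:
    def __init__(self):
        self.handlers = {}
        self._missed = set()  # magics that have already been reported

    def register(self, magic, handler):
        self.handlers[magic] = handler

    def dispatch(self, magic, *args):
        handler = self.handlers.get(magic)
        if handler is None:
            if magic not in self._missed:
                self._missed.add(magic)
                logger.error("no handler for portalcall magic %#x", magic)
            else:
                logger.debug("repeat miss for portalcall magic %#x", magic)
            return NO_HANDLER  # never fake a successful return value
        return handler(*args)
```

A caller would compare the result with `is NO_HANDLER` and, on a match, leave the guest syscall unskipped.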

Events — portalcall-delivered events published cpu=None; subscribers doing memory reads got a NULL CPUState*. Now the real current CPU is passed on both delivery paths.

Now pins QEMU 0.0.8, which ships the rehosting/qemu#7 exports, the powerpc64el assets, and the generated env headers — so the full register/env semantics are active. Verified the shim directly against the released v0.0.8 artifact: register exports resolve, powerpc64el loads with ppc64le conventions, and panda.cpu_env() works via the ABI env header.

One caveat: v0.0.8 contains no compiled CFFI env modules — the qemu builder image lacks python3-dev, so the module build silently fell back to header-only (fix + release gate in rehosting/qemu#8). The shim already prefers compiled modules when present, so the next qemu release activates them with no penguin change.
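
The "prefer compiled, fall back to header" behavior can be sketched with a plain import attempt. The function `load_env_backend` and its `header_loader` callback are hypothetical; how the shim actually locates its CFFI modules and headers is not shown in this PR description.

```python
import importlib


def load_env_backend(module_name, header_loader):
    """Return ("compiled", module) when importable, else ("header", fallback)."""
    try:
        # A compiled module, when shipped, is preferred.
        return "compiled", importlib.import_module(module_name)
    except ImportError:
        # Otherwise fall back to whatever the header-based loader builds.
        return "header", header_loader()
```

Because the choice is made at load time, a later package that ships the compiled module is picked up without any change on the caller's side.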

Testing

  • 11-check behavioral harness: MIPS A3/negation/convention gating, set_arg write-through + graceful degradation, get_reg fallback, fail-fast recording, portalcall passthrough with the registered path intact, unsigned fmt="int", powerpc64el resolution.
  • End-to-end against a locally built qemu lib with the PR#7 exports: register writes land in env at gdb-verified offsets; cpu_env field writes (CP0_Count, active_tc.gpr[7], x86 fpregs[i].d.low) land at gdb-verified byte offsets via both the compiled module and the header fallback.
  • tests/unit_tests/test_kvm_logic.py / test_kvm_runner_final.py unchanged (3 pre-existing host-environment failures present with and without these changes).

Note: the rw_logger target_long cast site flagged in the audit is already covered by 79a8f89's guest-sized target_long typedefs — no change needed there.

lacraig2 added 6 commits June 10, 2026 20:32
The QEMU shim previously emulated only the hypercall-captured view of
guest registers: get_reg returned 0 for anything uncaptured, set_arg
mutated host-side state without touching the guest, and set_retval
ignored the convention/failure contract (dropping the MIPS A3
success/failure flag and error negation). Restore PANDA parity using
the new penguin_{read,write}_guest_reg QEMU exports, keyed by per-arch
GDB core-feature register numbers (verified against each target's
gdbstub). set_retval's default convention is now 'default', matching
pandare2, so A3 semantics apply only on an explicit syscall convention.
Everything degrades gracefully (warning, captured-only) against QEMU
libraries lacking the exports.

Add typed CPUArchState access: panda.cpu_env(cpu) returns the full
per-target env (coprocessor registers, timers, FPU) via the compiled
CFFI module shipped with the QEMU package, falling back to the
generated ABI-mode env header; panda.sync_cpu_state(cpu) keeps env
fresh and makes writes stick under KVM (cpu_env auto-syncs there).

Restore fail-fast guest callbacks: _record_callback_exception stores
the first fatal handler error, requests shutdown, and run() re-raises
it after the main loop exits, mirroring PyPANDA.

Also fix powerpc64el (normalize to the ppc64le conventions and resolve
the powerpc64le-spelled library/header assets as a fallback) and make
virtual_memory_read(fmt='int') decode unsigned, matching pandare2 --
guest kernel pointers read this way must not come back negative.
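
A small sketch of both fixes in this paragraph, using only the standard library. The helper names `decode_guest_int` and `asset_candidates` are made up for illustration and do not appear in the shim.

```python
def decode_guest_int(raw, endian="little"):
    """Decode guest bytes as an unsigned integer."""
    return int.from_bytes(raw, endian, signed=False)


def asset_candidates(arch):
    """Spellings to try, in order, when locating per-arch library/header files."""
    if arch == "powerpc64el":
        # The powerpc64le spelling is tried as a fallback.
        return ["powerpc64el", "powerpc64le"]
    return [arch]
```

With a signed decode, the eight bytes of a kernel pointer such as `0xFFFFFFFF80001000` would come back as a negative number; the unsigned decode returns the pointer value itself.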

Requires rehosting/qemu#7 for the register/env exports and compiled
env modules; without them the new APIs degrade or raise cleanly.

Exceptions in hypercall handlers were caught, logged, and swallowed;
the guest saw rv=0 (success) for a hypercall that was never serviced
and the run continued diverging silently. PyPANDA was fail-fast.
Record the exception, stop dispatching, and end the emulation; the
error re-raises out of panda.run().

The direct syscall-event writeback path lost two protections the old
mem.write_bytes path had: the address mask (sign-extended 32-bit guest
pointers) and the guest-mediated portal fallback when the direct QEMU
write fails. A hooked syscall whose modified event could not be written
back silently lost its skip/retval/arg rewrites. Restore both.

An unregistered portalcall magic skipped the guest syscall and returned
0, making the guest's sendto() appear to succeed; previously the real
syscall executed and failed loudly. Return a missing-handler sentinel
so the syscall runs, and log the first miss per magic at error level
instead of debug.

Portalcall-delivered events published cpu=None, so subscribers doing
memory reads (e.g. hyper/shell) would hand a NULL CPUState to the
compat layer. The portalcall arrives via a syscall hypercall on the
vCPU thread, so pass the real current CPU, keeping the subscriber
signature identical across both delivery paths.

QEMU 0.0.8 ships the guest register access exports, the powerpc64el
system target assets, and the generated CPUArchState env headers from
rehosting/qemu#7, activating the full register/env semantics in the
compat layer. Verified the shim against the released artifact: register
exports resolve, powerpc64el loads with ppc64le conventions, and
panda.cpu_env() works via the ABI env header.

Note: 0.0.8 contains no compiled CFFI env modules -- the builder image
lacks python3-dev, so the module compile silently fell back to
header-only (fix queued in rehosting/qemu). The shim already prefers
compiled modules when a future release ships them; no penguin change
will be needed.

lacraig2 merged commit c19b599 into main on Jun 11, 2026
14 checks passed