Restore PANDA-faithful semantics in the QEMU compatibility layer #832
Merged
The QEMU shim previously emulated only the hypercall-captured view of
guest registers: get_reg returned 0 for anything uncaptured, set_arg
mutated host-side state without touching the guest, and set_retval
ignored the convention/failure contract (dropping the MIPS A3
success/failure flag and error negation). Restore PANDA parity using
the new penguin_{read,write}_guest_reg QEMU exports, keyed by per-arch
GDB core-feature register numbers (verified against each target's
gdbstub). set_retval's default convention is now 'default', matching
pandare2, so A3 semantics apply only on an explicit syscall convention.
Everything degrades gracefully (warning, captured-only) against QEMU
libraries lacking the exports.
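
The set_retval contract described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the shim's code: `apply_retval`, the plain-dict register file, and the 32-bit truncation are hypothetical; only the behavior (A3 flag and error negation applied solely under an explicit syscall convention) comes from the description.

```python
def apply_retval(regs, value, convention="default", failure=False):
    """Hypothetical helper: write a return value into a MIPS-style register dict."""
    if convention == "syscall":
        # MIPS syscall ABI: A3 carries the success/failure flag.
        regs["a3"] = 1 if failure else 0
        # A failing syscall reports a positive errno in V0, so a negative
        # value passed by the caller is negated.
        if failure and value < 0:
            value = -value
    # Assumption for the sketch: a 32-bit guest, so truncate to register width.
    regs["v0"] = value & 0xFFFFFFFF
    return regs


# Explicit syscall convention: A3 is set and the error is negated.
syscall_regs = apply_retval({"v0": 0, "a3": 0}, -2, convention="syscall", failure=True)

# Default convention: A3 is left untouched and the value is written as given.
default_regs = apply_retval({"v0": 0, "a3": 7}, -2)
```

With the default convention a caller that never mentions syscalls cannot clobber A3 by accident, which is the point of matching pandare2's default.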
Add typed CPUArchState access: panda.cpu_env(cpu) returns the full
per-target env (coprocessor registers, timers, FPU) via the compiled
CFFI module shipped with the QEMU package, falling back to the
generated ABI-mode env header; panda.sync_cpu_state(cpu) keeps env reads
fresh and makes writes stick under KVM (cpu_env auto-syncs there).
Restore fail-fast guest callbacks: _record_callback_exception stores
the first fatal handler error, requests shutdown, and run() re-raises
it after the main loop exits, mirroring PyPANDA.
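
The record-then-re-raise pattern can be sketched like this. Only the name `_record_callback_exception` and the overall behavior come from the description; the `FailFastRunner` class, its event list, and its fields are hypothetical stand-ins for the real emulation driver.

```python
class FailFastRunner:
    """Hypothetical stand-in for the emulation driver, not the real panda object."""

    def __init__(self):
        self._fatal = None        # first fatal handler error, if any
        self._shutdown = False    # set once a shutdown has been requested
        self.dispatched = []      # names of handlers that completed

    def _record_callback_exception(self, exc):
        # Keep only the first error; later ones would be consequences of it.
        if self._fatal is None:
            self._fatal = exc
        self._shutdown = True

    def dispatch(self, handler, *args):
        if self._shutdown:
            return  # stop dispatching once a fatal error is recorded
        try:
            handler(*args)
            self.dispatched.append(handler.__name__)
        except Exception as exc:
            self._record_callback_exception(exc)

    def run(self, events):
        # Stand-in for the main loop: it exits early when shutdown is requested.
        for handler, args in events:
            if self._shutdown:
                break
            self.dispatch(handler, *args)
        # Re-raise only after the loop has exited.
        if self._fatal is not None:
            raise self._fatal
```

Deferring the raise until the loop exits lets the emulator shut down in an orderly way while still surfacing the error to the caller of `run()`.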
Also fix powerpc64el (normalize to the ppc64le conventions and resolve
the powerpc64le-spelled library/header assets as a fallback) and make
virtual_memory_read(fmt='int') decode unsigned, matching pandare2 --
guest kernel pointers read this way must not come back negative.
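
A short illustration of why the signedness matters. `decode_int` and the sample pointer value are hypothetical; the sketch only uses Python's built-in `int.from_bytes`.

```python
def decode_int(raw: bytes, endian: str = "little", signed: bool = False) -> int:
    """Hypothetical decoder for a guest word; unsigned unless asked otherwise."""
    return int.from_bytes(raw, byteorder=endian, signed=signed)


# Little-endian bytes of a made-up 32-bit kernel pointer, 0xC0FFF000.
kernel_ptr = bytes.fromhex("00f0ffc0")

as_unsigned = decode_int(kernel_ptr)              # usable as an address
as_signed = decode_int(kernel_ptr, signed=True)   # top bit set, so negative
```

Any address with the top bit set, which covers typical 32-bit kernel space, turns negative under a signed decode and breaks later comparisons or masking.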
Requires rehosting/qemu#7 for the register/env exports and compiled
env modules; without them the new APIs degrade or raise cleanly.
Exceptions in hypercall handlers were caught, logged, and swallowed; the guest saw rv=0 (success) for a hypercall that was never serviced, and the run continued while silently diverging. PyPANDA was fail-fast. Record the exception, stop dispatching, and end the emulation; the error re-raises out of panda.run().
The direct syscall-event writeback path lost two protections the old mem.write_bytes path had: the address mask (sign-extended 32-bit guest pointers) and the guest-mediated portal fallback when the direct QEMU write fails. A hooked syscall whose modified event could not be written back silently lost its skip/retval/arg rewrites. Restore both.
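
The two protections can be sketched together. All names here (`write_back`, `direct_write`, `portal_write`) are hypothetical, and the sketch assumes a failed direct write is signalled by an exception, which the description does not specify.

```python
def write_back(addr, data, direct_write, portal_write, bits=32):
    """Hypothetical writeback: mask the address, then try direct, then portal."""
    # A 32-bit guest pointer that was sign-extended to 64 bits has its upper
    # half set; masking to the guest width recovers the real address.
    addr &= (1 << bits) - 1
    try:
        direct_write(addr, data)
        return "direct"
    except OSError:
        # Assumption: the direct QEMU write raises on failure. Fall back to
        # the guest-mediated portal so the modification is not lost.
        portal_write(addr, data)
        return "portal"
```

For example, `0xFFFFFFFF80001000` masks to `0x80001000` for a 32-bit guest, so both write paths see the same address the guest used.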
An unregistered portalcall magic skipped the guest syscall and returned 0, making the guest's sendto() appear to succeed; previously the real syscall executed and failed loudly. Return a missing-handler sentinel so the syscall runs, and log the first miss per magic at error level instead of debug.
Portalcall-delivered events published cpu=None, so subscribers doing memory reads (e.g. hyper/shell) would hand a NULL CPUState to the compat layer. The portalcall arrives via a syscall hypercall on the vCPU thread, so pass the real current CPU, keeping the subscriber signature identical across both delivery paths.
QEMU 0.0.8 ships the guest register access exports, the powerpc64el system target assets, and the generated CPUArchState env headers from rehosting/qemu#7, activating the full register/env semantics in the compat layer. Verified the shim against the released artifact: register exports resolve, powerpc64el loads with ppc64le conventions, and panda.cpu_env() works via the ABI env header. Note: 0.0.8 contains no compiled CFFI env modules -- the builder image lacks python3-dev, so the module compile silently fell back to header-only (fix queued in rehosting/qemu). The shim already prefers compiled modules when a future release ships them; no penguin change will be needed.
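
The "prefer compiled, fall back to header" selection can be sketched with the standard import machinery. `load_env_backend`, the module name passed to it, and `header_loader` are hypothetical; the real shim's lookup is not shown in this description.

```python
import importlib


def load_env_backend(module_name, header_loader):
    """Hypothetical selector: compiled CFFI module if importable, else header."""
    try:
        return "compiled", importlib.import_module(module_name)
    except ImportError:
        # No compiled module in this release: build the env view from the
        # generated ABI-mode header instead.
        return "header", header_loader()


# With a release that ships no compiled module, the header path is taken.
kind, backend = load_env_backend("no_such_compiled_env_module", lambda: "abi-header-env")
```

Because the choice is made at load time, a later release that ships the compiled module changes the result without any change on the caller's side.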
Summary
Fixes a batch of regressions from the panda-ng → qemu migration, identified in a systematic audit of the old stack (oldpenguin + PANDA) vs the new one. Counterpart to rehosting/qemu#7, which adds the QEMU-side exports these fixes use.
- Register/env semantics (`compat/qemu_compat.py`): the shim previously emulated only the hypercall-captured register view.
  - `arch.get_reg` on uncaptured registers returned 0; it now reads the real register via the new GDB-numbered QEMU exports (per-arch regnum maps verified against each target's gdbstub).
  - `arch.set_arg` was a host-side-only no-op (live callers in nvram2); it now writes through to the guest.
  - `arch.set_retval` ignored `convention`/`failure`; the MIPS A3 success/failure flag and error negation are restored, gated on an explicit `convention="syscall"` to match pandare2's `'default'` default.
  - `panda.cpu_env(cpu)` returns a typed `CPUArchState *` (full env: coprocessor registers, timers, FPU) via the compiled CFFI module shipped in the QEMU package, with the generated ABI-mode header as fallback; `panda.sync_cpu_state(cpu)` handles KVM freshness/write-back, and `cpu_env` auto-syncs there.
  - `powerpc64el` now normalizes to ppc64le conventions and resolves `powerpc64le`-spelled assets; the arch was entirely unusable before.
  - `virtual_memory_read(fmt="int")` decodes unsigned again (pandare2 parity; kernel pointers must not come back negative).
- Fail-fast guest callbacks: hypercall handler exceptions were logged and swallowed, with the guest seeing rv=0 for an unserviced hypercall. Now the first fatal error stops dispatch, ends the emulation, and re-raises out of `panda.run()`, mirroring PyPANDA.
- Syscall-event writeback: the direct writeback path lost the address mask and the portal fallback of the old `mem.write_bytes` path; hooked-syscall modifications could be silently dropped. Both are restored.
- Portalcalls: unregistered magics skipped the real syscall and faked success (guest `sendto()` returned 0); now the real syscall runs and the first miss per magic logs at error level.
- Events: portalcall-delivered events published `cpu=None`; subscribers doing memory reads got a NULL `CPUState*`. Now the real current CPU is passed on both delivery paths.

Now pins QEMU 0.0.8, which ships the rehosting/qemu#7 exports, the powerpc64el assets, and the generated env headers, so the full register/env semantics are active. Verified the shim directly against the released v0.0.8 artifact: register exports resolve, powerpc64el loads with ppc64le conventions, and `panda.cpu_env()` works via the ABI env header.

One caveat: v0.0.8 contains no compiled CFFI env modules. The qemu builder image lacks `python3-dev`, so the module build silently fell back to header-only (fix + release gate in rehosting/qemu#8). The shim already prefers compiled modules when present, so the next qemu release activates them with no penguin change.

Testing
- `fmt="int"`, powerpc64el resolution.
- `cpu_env` field writes (`CP0_Count`, `active_tc.gpr[7]`, x86 `fpregs[i].d.low`) land at gdb-verified byte offsets via both the compiled module and the header fallback.
- `tests/unit_tests/test_kvm_logic.py` / `test_kvm_runner_final.py` unchanged (3 pre-existing host-environment failures present with and without these changes).

Note: the `rw_logger` target_long cast site flagged in the audit is already covered by 79a8f89's guest-sized `target_long` typedefs, so no change is needed there.