Arch: single registry, broad aliases, x86_64 canonical #821

Merged: lacraig2 merged 2 commits into main from config-reshape-arch on Jun 11, 2026

@lacraig2 (Collaborator) commented Jun 6, 2026

Fourth PR in the config-reshape series. Targets config-reshape-patches (#820) because it overlaps arch.py, utils.get_arch_subdir, and templating.build_context. Merge #818 → #820 → this.

Makes architecture naming consistent and forgiving: one source of truth, accept any common spelling, and flip the odd-one-out intel64 to x86_64.

One registry

src/penguin/arch_registry.py (pure stdlib) defines one ArchSpec per arch holding every per-namespace name — arch_subdir, dylib_subdir, kmod_subdir, qemu/panda arch, machine, cpu, kconf, kernel_fmt/kernel_whole, serial (major,minor), console fixup, endianness — plus accepted aliases. normalize_arch()/spec()/accessors are alias-tolerant. A unit test asserts the schema Literal equals arch_registry.all_names(), so it can't drift.
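
As a rough illustration of the shape described above, here is a minimal sketch of a single registry plus the drift check. It is not the actual arch_registry.py: the field subset, the placeholder values, the `_REGISTRY` and `ArchName` names, and the `schema_matches_registry` helper are all assumptions made for this example. Only `ArchSpec` and `all_names()` are named in the PR.

```python
from dataclasses import dataclass
from typing import Literal, get_args


@dataclass(frozen=True)
class ArchSpec:
    # Hypothetical subset of the per-namespace names; the real spec has more.
    name: str
    arch_subdir: str
    dylib_subdir: str
    endianness: str
    aliases: tuple[str, ...] = ()


# Illustrative entries only; values are placeholders, not the real tables.
_REGISTRY = {
    s.name: s
    for s in (
        ArchSpec("x86_64", "x86_64", "x86_64", "little", ("intel64", "amd64")),
        ArchSpec("aarch64", "aarch64", "aarch64", "little", ("arm64",)),
    )
}


def all_names() -> set[str]:
    """Every accepted spelling: canonical names plus their aliases."""
    names: set[str] = set()
    for s in _REGISTRY.values():
        names.add(s.name)
        names.update(s.aliases)
    return names


# Stand-in for the schema's Literal type (assumed name).
ArchName = Literal["x86_64", "intel64", "amd64", "aarch64", "arm64"]


def schema_matches_registry() -> bool:
    """The drift check: the Literal's members must equal the registry's names."""
    return set(get_args(ArchName)) == all_names()
```

Adding an alias to the registry without widening the Literal (or the reverse) makes `schema_matches_registry()` return False, which is the kind of failure the unit test is meant to surface.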

Broad aliases, x86_64 canonical

You can now write x86_64, intel64, amd64, arm64, ppc64le, powerpc64el, …, and all of them normalize to one canonical name at load. x86_64 is now canonical (intel64 is an accepted alias), matching the on-disk asset layout; the config spelling intel64 was always the odd one out.
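
A minimal sketch of alias-tolerant normalization, assuming a flat lookup table built once from a canonical-to-aliases mapping. The `_ALIASES` and `_LOOKUP` names are hypothetical, the table lists only spellings mentioned in this PR, and raising `ValueError` for unknown names is an assumption (the PR says only that unknown names raise).

```python
# Hypothetical canonical -> aliases table, limited to spellings named in the PR.
_ALIASES = {
    "x86_64": ("intel64", "amd64"),
    "aarch64": ("arm64",),
    "powerpc64le": ("ppc64le", "powerpc64el"),
}

# Flatten once so every accepted spelling maps straight to its canonical name.
_LOOKUP = {}
for canonical, aliases in _ALIASES.items():
    for spelling in (canonical, *aliases):
        _LOOKUP[spelling.lower()] = canonical


def normalize_arch(name: str) -> str:
    """Return the canonical arch name for any accepted spelling (case-insensitive)."""
    try:
        return _LOOKUP[name.lower()]
    except KeyError:
        # Exception type is an assumption for this sketch.
        raise ValueError(f"unknown architecture: {name!r}") from None
```

With this table, `normalize_arch("intel64")` and `normalize_arch("AMD64")` both yield `"x86_64"`.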

Consolidation (deletes the duplicates)

Refactored every consumer to read the registry: utils.get_arch_subdir/get_driver_kmod_path, arch.get_dylib_subdir, dropin_compile.DYLIB_DIRS, q_config.qemu_configs, abi_info, the nvram2 dylib-override copy, and the inline serial/console branches. Bugs fixed along the way:

  • q_config.load_q_config("powerpc64le") no longer raises KeyError (the table was keyed by the stale powerpc64el), and it now returns a fresh dict instead of mutating a shared module-level dict.
  • dropin_compile.DYLIB_DIRS was missing intel64, so it would look in dylibs/intel64; that bug is gone now that the lookup reads the registry.

Safe transition (no x86 breakage)

The only two assets named by the config arch are guesthopper.<arch> and sysroots/<arch>. utils.resolve_arch_asset prefers the canonical (x86_64) name and falls back to an alias filename that actually exists, so x86 rehosting keeps working against the current guesthopper.intel64 artifact. The in-repo Dockerfile sysroots stage is renamed intel64 → x86_64.
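
The prefer-canonical, fall-back-to-alias lookup might look roughly like the sketch below. The function name, its signature, and the choice to return None when nothing exists are assumptions; the real utils.resolve_arch_asset may differ on all three.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_arch_asset_sketch(
    base_dir: Path, prefix: str, canonical: str, aliases: Iterable[str]
) -> Optional[Path]:
    """Return the first existing asset path, trying the canonical name first.

    `prefix` is the part before the arch, e.g. "guesthopper." or "sysroots/".
    """
    for name in (canonical, *aliases):
        candidate = base_dir / f"{prefix}{name}"
        if candidate.exists():
            return candidate
    # Assumption: signal "not found" with None rather than raising.
    return None
```

While only guesthopper.intel64 exists on disk, a call with canonical `"x86_64"` and aliases `["intel64"]` resolves to the intel64 file; once guesthopper.x86_64 is published, the same call picks it up with no config change.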

Cross-repo follow-up (deferred, non-breaking): rename the guesthopper sibling artifact guesthopper.intel64 → guesthopper.x86_64. The resolver covers the interim.

Left intentionally

arch.arch_filter/arch_end and static_analyses keep emitting intel64 — that's the ELF-identifier namespace, distinct from config arch; normalization happens at the identifier→config boundary (set_arch_info).

Known pre-existing smell (not changed)

penguin_prep.py keeps its own ARCH_ABI_INFO copy (drifted from abi_info.py on powerpc64/powerpc64le); I rekeyed/normalized it but didn't merge the duplicate to avoid behavior change. Flagged for a follow-up.

Testing

tests/unit_tests/test_config.py: 75 passing on host — alias normalization (all arches, case-insensitive, unknown raises), Literal↔registry sync, subdir/dylib/kmod parity vs the OLD tables, load_config with intel64 and x86_64 producing identical realized configs, q_config powerpc64le + fresh-dict, abi rekey, dropin dedupe, templating alias→canonical. Recommend tests/comprehensive/test.sh across the arch matrix in-container before merge.

@lacraig2 force-pushed the config-reshape-patches branch from 098c882 to d87ba87 (June 7, 2026 02:09)
@lacraig2 force-pushed the config-reshape-arch branch from ee2681d to 2adeddf (June 7, 2026 02:09)
@lacraig2 force-pushed the config-reshape-patches branch from d87ba87 to 388e2b4 (June 7, 2026 03:08)
@lacraig2 force-pushed the config-reshape-arch branch 2 times, most recently from c544e17 to 9102e90 (June 7, 2026 13:57)
@lacraig2 force-pushed the config-reshape-patches branch from 388e2b4 to 32a062b (June 7, 2026 17:28)
@lacraig2 force-pushed the config-reshape-arch branch from 106ae1f to e8ec7a6 (June 7, 2026 17:31)
@lacraig2 force-pushed the config-reshape-patches branch from 32a062b to 48d2db9 (June 8, 2026 13:58)
@lacraig2 force-pushed the config-reshape-arch branch from e8ec7a6 to 8738a7e (June 8, 2026 13:59)
@lacraig2 force-pushed the config-reshape-patches branch 3 times, most recently from 99b376e to da51102 (June 11, 2026 17:16)
@lacraig2 force-pushed the config-reshape-arch branch from 8738a7e to 72b1945 (June 11, 2026 21:29)
@lacraig2 marked this pull request as ready for review (June 11, 2026 21:30)
@lacraig2 changed the base branch from config-reshape-patches to main (June 11, 2026 21:37)
@lacraig2 closed this (Jun 11, 2026)
@lacraig2 reopened this (Jun 11, 2026)
Luke Craig added 2 commits June 11, 2026 17:39
Consolidate the seven scattered/duplicated architecture tables into one source
of truth and let users write any common spelling for an arch.

- New src/penguin/arch_registry.py (pure stdlib): one ArchSpec per arch holding
  every per-namespace name (arch_subdir, dylib_subdir, kmod_subdir, qemu/panda
  arch, machine, cpu, kconf, kernel_fmt/kernel_whole, serial, console, endianness)
  plus accepted aliases. normalize_arch()/spec()/accessors are alias-tolerant.
- Flip x86 canonical to x86_64 (intel64 becomes an alias), matching the on-disk
  asset names. Broad aliases for every arch (arm64->aarch64, ppc64le/powerpc64el
  ->powerpc64le, amd64->x86_64, ...).
- Refactor consumers to read the registry, deleting the duplicate tables:
  utils.get_arch_subdir/get_driver_kmod_path, arch.get_dylib_subdir,
  dropin_compile.DYLIB_DIRS (also fixes its missing-intel64 bug),
  q_config.qemu_configs/load_q_config (now returns a fresh dict and fixes the
  powerpc64le KeyError + the in-place mutation), abi_info (rekey intel64->x86_64
  + arch_abi_info() helper), and the nvram2 dylib-override duplicate. Rewire the
  serial (config_patchers) and console (penguin_run) inline branches.
- Normalize core.arch to canonical at config-load (after merge), so all
  downstream consumers and the realized config use one spelling; widen the schema
  Literal to canonical+aliases (with a unit test asserting it equals
  arch_registry.all_names()); canonicalize {{ arch }} in templating.
- Safe transition for the two assets still named by config arch
  (guesthopper.<arch>, sysroots/<arch>): utils.resolve_arch_asset prefers the
  canonical name and falls back to an alias filename that exists, so x86 keeps
  working until the guesthopper sibling artifact is renamed. Dockerfile sysroots
  stage renamed intel64 -> x86_64. Update plugin arch maps (live_image, unwind,
  kffi) for the flip.

Identifier-namespace code (arch.arch_filter/arch_end, static_analyses) is left
emitting intel64 by design; normalization happens at the identifier->config
boundary (config_patchers.set_arch_info).
@lacraig2 force-pushed the config-reshape-arch branch from 72b1945 to 022af4b (June 11, 2026 21:40)
@lacraig2 enabled auto-merge (squash) (June 11, 2026 21:40)
@lacraig2 merged commit 7080253 into main (Jun 11, 2026)
20 of 27 checks passed
@lacraig2 deleted the config-reshape-arch branch (June 12, 2026 21:15)