Summary
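The PseudofileFinder static analysis scrapes /dev/* and /proc/* paths from firmware binaries with greedy regexes that run across string boundaries, producing invalid, concatenated pseudofile paths. PseudofilesTailored then faithfully models each one (the pseudofiles.dynamic patch). The bogus entries are usually harmless noise, but some actively break or crash rehostings:
- /proc/sys/fs/binfmt_misc/WSLInteropos/signal makes the guest try to create a ctl_table under the filesystem-backed binfmt_misc node. On older kernels (e.g. 4.10, shipped for aarch64) the failed register_sysctl() panics in its cleanup path and kills init ~5s into boot. (Mitigated as a symptom in #829, "hyperfile/sysctl: reject unregisterable sysctl paths (fixes aarch64 boot panic)", and igloo_driver#76, "portal_sysctl: don't panic creating sysctls under non-existent dirs (old kernels)", but the bad input is still generated.)
- Turning /dev/null into a directory. The entry /dev/null/dev/ptmx/dev/pts/SSH_FX_OKhmac forces /dev/null to be created as a directory, so every >/dev/null redirect in the guest's init fails (can't create /dev/null: Is a directory) and services never start.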
Where it comes from
src/penguin/static_analyses.py:743-744 — the finder patterns:
dev_pattern = re.compile(r"/dev/([a-zA-Z0-9_/]+)", re.MULTILINE)
proc_pattern = re.compile(r"/proc/([a-zA-Z0-9_/]+)", re.MULTILINE)
Because / is inside the character class, a match does not stop at the end of a real path — it greedily consumes any following [A-Za-z0-9_/] run, so two adjacent strings in a binary (or a string immediately followed by more alnum bytes) get glued into one nonsensical path.
src/penguin/gen_config.py:251 — wires the finder output into the generator: CP.PseudofilesTailored(static_results['PseudofileFinder']).
src/penguin/config_patchers.py:1640 — PseudofilesTailored.generate() emits a default model for every path it was handed, with no validation.
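The over-matching described for the finder patterns can be reproduced in isolation. The sketch below uses the two patterns exactly as quoted above; the input strings and the find_paths helper are hypothetical, and it assumes the scanned text has no separator between neighbouring strings, which may not be exactly how the finder reads binaries.

```python
import re

# The two patterns as quoted from static_analyses.py.
dev_pattern = re.compile(r"/dev/([a-zA-Z0-9_/]+)", re.MULTILINE)
proc_pattern = re.compile(r"/proc/([a-zA-Z0-9_/]+)", re.MULTILINE)


def find_paths(text):
    """Hypothetical helper: re-attach the prefix to each captured suffix."""
    dev = ["/dev/" + m for m in dev_pattern.findall(text)]
    proc = ["/proc/" + m for m in proc_pattern.findall(text)]
    return dev + proc


# Two separate strings that happen to sit next to each other.
glued_dev = find_paths("/dev/null" "/dev/ptmx")
# A real sysctl path immediately followed by unrelated alphanumerics.
glued_proc = find_paths("/proc/sys/net/core/somaxconn" "abi")

print(glued_dev)   # ['/dev/null/dev/ptmx']
print(glued_proc)  # ['/proc/sys/net/core/somaxconnabi']
```

Because "/" is in the character class, the first match swallows the second path whole, and nothing marks where somaxconn ends and abi begins.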
Examples actually produced (GL.iNet GL-MT3000 firmware, aarch64)
From the generated static_patches/pseudofiles.dynamic.yaml:
/dev/null/dev/ptmx/dev/pts/SSH_FX_OKhmac # makes /dev/null a directory
/dev/nullfiles /dev/nullinterruptbus /dev/nullstream
/proc/sys/fs/binfmt_misc/WSLInteropos/signal # crashes 4.10 register_sysctl
/proc/sys/kernel/hostname2006 /proc/sys/kernel/hostnamejson
/proc/sys/net/core/somaxconnabi /proc/sys/net/core/somaxconnos
/proc/sys/net/ipv6flush /proc/sys/none
WSLInteropos = WSLInterop + os, somaxconnabi = somaxconn + abi, hostname2006 = hostname + 2006, etc. — classic "matched past the real path boundary."
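Impact
[…] /dev/null was a directory. Dropping the pseudofiles.dynamic patch entirely makes the device boot cleanly to its full service set (uhttpd, dnsmasq, dropbear, samba, avahi). The examples/cisco/rv130 device already works around the analogous auto-generated models by sed-deleting the patch from config.yaml, which suggests this bites broadly.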
Suggested fixes (any/all)
- Bound the extraction. Don't let / extend a match indefinitely. Capture path components conservatively (e.g. stop at the first byte that isn't a valid filename char, cap path depth/length, require the match to be NUL/quote/whitespace-delimited in the binary rather than mid-run).
- Validate before modeling in PseudofilesTailored / the finder: reject paths with implausible components, reject creating an entry whose path nests under an existing device node (/dev/null/...), and skip subtrees that are filesystem mounts rather than pseudofiles/sysctls (/proc/sys/fs/binfmt_misc, etc.).
- Never turn a known char device into a directory. Guard /dev/null, /dev/zero, /dev/console, … specifically: modeling a child path under them should be dropped, not allowed to recreate the parent as a dir.
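A minimal sketch of how the three suggestions could fit together, not the Penguin implementation. It assumes the finder can scan raw bytes; the names PATH_RE, is_plausible, and find_pseudofiles, the delimiter set, the length cap, and the two deny-lists are all hypothetical.

```python
import re

# Hypothetical limits and deny-lists, chosen for illustration only.
MAX_LEN = 128
CHAR_DEVICES = ("/dev/null", "/dev/zero", "/dev/console")
MOUNT_SUBTREES = ("/proc/sys/fs/binfmt_misc",)

# Anything that is NOT a delimiter. Delimiters here are NUL, whitespace,
# quotes and a few separators (an assumption about what surrounds paths).
NOT_DELIM = rb"[^\x00\s\"'=:,]"

# The negative lookarounds require a delimiter, or the start/end of the
# data, on both sides, so a match cannot begin or end mid-run.
PATH_RE = re.compile(
    rb"(?<!" + NOT_DELIM + rb")"
    rb"(/(?:dev|proc)(?:/[A-Za-z0-9_.-]+)+)"
    rb"(?!" + NOT_DELIM + rb")"
)


def is_plausible(path):
    """Reject paths that should never be modeled."""
    if len(path) > MAX_LEN:
        return False
    # Never nest an entry under a known character device.
    if any(path.startswith(dev + "/") for dev in CHAR_DEVICES):
        return False
    # Skip subtrees that are filesystem mounts, not sysctls.
    if any(path == m or path.startswith(m + "/") for m in MOUNT_SUBTREES):
        return False
    return True


def find_pseudofiles(data):
    """Return the sorted, validated /dev and /proc paths found in data."""
    found = {m.group(1).decode("ascii") for m in PATH_RE.finditer(data)}
    return sorted(p for p in found if is_plausible(p))


sample = (
    b"\x00/dev/null\x00/dev/ptmx\x00open /proc/sys/net/core/somaxconn\x00abi\x00"
    b"/dev/null/dev/ptmx\x00/proc/sys/fs/binfmt_misc/WSLInterop\x00os\x00"
)
print(find_pseudofiles(sample))
# ['/dev/null', '/dev/ptmx', '/proc/sys/net/core/somaxconn']
```

Delimiting keeps somaxconn and abi apart when a NUL separates them in the binary. It cannot split two strings stored with no delimiter between them, which is why the validation pass still runs afterwards.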
#829 added a guard for the sysctl symptom on the Penguin side (and igloo_driver#76 as a kernel backstop), but those only stop the crash; the generator still emits invalid paths that cause other breakage (the /dev/null case above). This issue is about fixing the source.
Repro
penguin init any firmware that contains binaries with dense path-like strings (the GL-MT3000 image is a reliable example) and inspect static_patches/pseudofiles.dynamic.yaml for entries like the ones above.
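To speed up the inspection step, a rough triage helper can flag entries that extend another listed entry, as /dev/nullfiles and /proc/sys/kernel/hostname2006 do. This is a sketch under assumptions: the layout of pseudofiles.dynamic.yaml is not described here, so the helper pulls paths out of the raw text with a regex instead of parsing YAML, and the sample text and the name suspicious_entries are hypothetical.

```python
import re

# Grab anything that looks like a /dev or /proc path from raw text.
PATH_RE = re.compile(r"/(?:dev|proc)/[A-Za-z0-9_/.-]+")


def suspicious_entries(yaml_text):
    """Map each path that extends another listed path to that shorter path."""
    paths = sorted(set(PATH_RE.findall(yaml_text)))
    flagged = {}
    for path in paths:
        for other in paths:
            if other != path and path.startswith(other):
                # Sorted order puts the shortest prefix first.
                flagged.setdefault(path, other)
    return flagged


# Hypothetical excerpt; the real file layout may differ.
sample_text = """
/dev/null:
/dev/null/dev/ptmx/dev/pts/SSH_FX_OKhmac:
/dev/nullfiles:
/proc/sys/kernel/hostname:
/proc/sys/kernel/hostname2006:
/proc/sys/net/core/somaxconnabi:
"""

for path, prefix in suspicious_entries(sample_text).items():
    print(f"{path}  (extends {prefix})")
```

The heuristic is only a starting point. It reports legitimate pairs such as a directory and its children, and it misses glued entries like somaxconnabi whose real prefix is not itself listed.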