Dynamic pseudofile generator emits invalid paths (greedy string scrape) — crashes kernel / clobbers /dev/null #830

@lacraig2

Summary

The PseudofileFinder static analysis scrapes /dev/* and /proc/* paths from firmware binaries with greedy regexes that run across string boundaries, producing invalid, concatenated pseudofile paths. PseudofilesTailored then faithfully models each one (the pseudofiles.dynamic patch). Most of the bogus entries are harmless noise, but some actively break or crash rehostings.

Where it comes from

  • src/penguin/static_analyses.py:743-744 — the finder patterns:
    dev_pattern  = re.compile(r"/dev/([a-zA-Z0-9_/]+)",  re.MULTILINE)
    proc_pattern = re.compile(r"/proc/([a-zA-Z0-9_/]+)", re.MULTILINE)
    Because / is inside the character class, a match does not stop at the end of a real path — it greedily consumes any following [A-Za-z0-9_/] run, so two adjacent strings in a binary (or a string immediately followed by more alnum bytes) get glued into one nonsensical path.
  • src/penguin/gen_config.py:251 — wires the finder output into the generator: CP.PseudofilesTailored(static_results['PseudofileFinder']).
  • src/penguin/config_patchers.py:1640, where PseudofilesTailored.generate() emits a default model for every path it was handed, with no validation.
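
The gluing is easy to reproduce in isolation. The sketch below applies the dev_pattern quoted above to a made-up buffer in which several strings sit back to back with nothing between them. The buffer contents are an assumption chosen to mirror the first example entry further down, not bytes taken from the firmware, and the sketch does not show how the finder actually reads the binary.

```python
import re

# Pattern copied from static_analyses.py as quoted above.
dev_pattern = re.compile(r"/dev/([a-zA-Z0-9_/]+)", re.MULTILINE)

# Hypothetical buffer: five separate strings with no delimiter between them.
pieces = ["/dev/null", "/dev/ptmx", "/dev/pts/", "SSH_FX_OK", "hmac"]
blob = "".join(pieces)

# Because "/" is in the character class, the first match swallows everything.
matches = dev_pattern.findall(blob)
print(matches)  # ['null/dev/ptmx/dev/pts/SSH_FX_OKhmac']
```

A single match comes back instead of three device paths, and its first component is null, which is what later leads the generator to treat /dev/null as a parent directory.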

Examples actually produced (GL.iNet GL-MT3000 firmware, aarch64)

From the generated static_patches/pseudofiles.dynamic.yaml:

/dev/null/dev/ptmx/dev/pts/SSH_FX_OKhmac      # makes /dev/null a directory
/dev/nullfiles    /dev/nullinterruptbus    /dev/nullstream
/proc/sys/fs/binfmt_misc/WSLInteropos/signal  # crashes 4.10 register_sysctl
/proc/sys/kernel/hostname2006   /proc/sys/kernel/hostnamejson
/proc/sys/net/core/somaxconnabi /proc/sys/net/core/somaxconnos
/proc/sys/net/ipv6flush   /proc/sys/none

WSLInteropos = WSLInterop + os, somaxconnabi = somaxconn + abi, hostname2006 = hostname + 2006, etc. — classic "matched past the real path boundary."

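Impact

  • Entries nested under /dev/null (e.g. /dev/null/dev/ptmx/dev/pts/SSH_FX_OKhmac) cause /dev/null to be created as a directory, clobbering the device node.
  • /proc/sys/fs/binfmt_misc/WSLInteropos/signal crashes register_sysctl on the 4.10 kernel.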

Suggested fixes (any/all)

  1. Bound the extraction. Don't let / extend a match indefinitely. Capture path components conservatively (e.g. stop at the first byte that isn't a valid filename char, cap path depth/length, require the match to be NUL/quote/whitespace-delimited in the binary rather than mid-run).
  2. Validate before modeling in PseudofilesTailored / the finder: reject paths with implausible components, reject creating an entry whose path nests under an existing device node (/dev/null/...), and skip subtrees that are filesystem mounts rather than pseudofiles/sysctls (/proc/sys/fs/binfmt_misc, etc.).
  3. Never turn a known char device into a directory. Guard /dev/null, /dev/zero, /dev/console, … specifically — modeling a child path under them should be dropped, not allowed to recreate the parent as a dir.
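
As a rough illustration of fix 1, the sketch below scans raw bytes and only accepts a path that is delimited on both sides and stays within a depth and length cap. The delimiter set, the caps (8 components, 64 bytes each), the allowed filename characters, and the names PATH_RE and find_pseudofile_paths are all assumptions made for this example, not existing Penguin code. It only helps when the neighbouring strings really are separated by a delimiter in the binary; a run that is contiguous in the file would still need the validation described in fixes 2 and 3.

```python
import re

# Bytes treated as string boundaries (assumption): NUL, ASCII whitespace, quotes.
_DELIM = rb"\x00\s\"'"
# One path component: conservative filename characters, capped at 64 bytes.
_COMPONENT = rb"[A-Za-z0-9_.\-]{1,64}"

PATH_RE = re.compile(
    rb"(?<![^" + _DELIM + rb"])"                         # buffer start or a delimiter before
    + rb"(/(?:dev|proc)(?:/" + _COMPONENT + rb"){1,8})"  # at most 8 components deep
    + rb"/?(?![^" + _DELIM + rb"])"                      # buffer end or a delimiter after
)


def find_pseudofile_paths(blob: bytes) -> list:
    """Return the sorted, de-duplicated delimited /dev and /proc paths in blob."""
    return sorted({m.decode("ascii") for m in PATH_RE.findall(blob)})


# NUL-separated strings are reported individually instead of being glued.
blob = b"\x00/dev/null\x00/dev/ptmx\x00/dev/pts/\x00SSH_FX_OK\x00hmac\x00"
print(find_pseudofile_paths(blob))  # ['/dev/null', '/dev/ptmx', '/dev/pts']
```

A candidate that starts mid-run (preceded by another printable byte) or exceeds the depth cap is rejected outright rather than truncated, so the scan never invents a shorter path that was not in the binary.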

#829 added a guard for the sysctl symptom on the Penguin side (and igloo_driver#76 as a kernel backstop), but those only stop the crash; the generator still emits invalid paths that cause other breakage (the /dev/null case above). This issue is about fixing the source.
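
A filter at the source, along the lines of fixes 2 and 3, might look like the sketch below. The device and mount lists contain only the names mentioned in this issue, and KNOWN_CHAR_DEVICES, MOUNT_SUBTREES, is_plausible, and filter_paths are hypothetical names, not part of PseudofilesTailored. It does not try to judge "implausible components", so entries such as /dev/nullfiles would still pass.

```python
from pathlib import PurePosixPath

# Assumption: a short hand-maintained list; a real guard would cover more nodes.
KNOWN_CHAR_DEVICES = {"/dev/null", "/dev/zero", "/dev/console"}
# Subtrees that are filesystem mounts rather than pseudofiles or sysctls.
MOUNT_SUBTREES = {"/proc/sys/fs/binfmt_misc"}


def is_plausible(path: str) -> bool:
    """Reject paths that nest under a char device or sit in a mount subtree."""
    for ancestor in PurePosixPath(path).parents:
        if str(ancestor) in KNOWN_CHAR_DEVICES:
            return False  # modeling this would recreate the device as a directory
    for mount in MOUNT_SUBTREES:
        if path == mount or path.startswith(mount + "/"):
            return False
    return True


def filter_paths(paths):
    """Split candidate paths into (kept, dropped), preserving input order."""
    kept = [p for p in paths if is_plausible(p)]
    dropped = [p for p in paths if not is_plausible(p)]
    return kept, dropped


kept, dropped = filter_paths([
    "/dev/null",
    "/dev/null/dev/ptmx/dev/pts/SSH_FX_OKhmac",
    "/proc/sys/fs/binfmt_misc/WSLInteropos/signal",
    "/proc/sys/kernel/hostname",
])
print(kept)  # ['/dev/null', '/proc/sys/kernel/hostname']
```

Returning the dropped list as well leaves room to log what was discarded, which would make it easier to spot over-eager filtering.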

Repro

Run penguin init on any firmware that contains binaries with dense path-like strings (the GL-MT3000 image is a reliable example), then inspect static_patches/pseudofiles.dynamic.yaml for entries like the ones above.
