Device-node stripping is unconditional and silent — can break rehosts of static-/dev firmware #53

@lacraig2

Summary

fw2tar unconditionally strips all character and block device nodes from every
extracted filesystem, and the only record of what was removed is opt-in. For firmware
that ships a static /dev (no devtmpfs/mdev at runtime), this silently removes
nodes the firmware needs, which can surface downstream as a hard-to-diagnose daemon crash
rather than an obvious "missing device" error.

Current behavior

  • src/archive.rs excludes any node where is_block_device() || is_char_device() from the
    output tar (the entry is simply not added).
  • --log-devices (default off, see src/args.rs) is the only way to find out what was
    dropped; it writes a *.devices.log listing the paths, but nothing recreates them and the
    major/minor/type aren't captured.
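The exclusion described above can be sketched roughly as follows. This is not the code in src/archive.rs. The helper names `is_device_node` and `partition_entries` are hypothetical, and the only calls assumed are the standard library's `FileTypeExt::is_block_device()` and `is_char_device()`. Unlike the current behavior, the sketch returns the dropped paths to the caller so they can be reported.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::FileTypeExt;
use std::path::Path;

/// Returns true when `path` is itself a character or block device node.
/// `symlink_metadata` is used so that a symlink pointing at a device is
/// not mistaken for a device node.
fn is_device_node(path: &Path) -> io::Result<bool> {
    let file_type = fs::symlink_metadata(path)?.file_type();
    Ok(file_type.is_block_device() || file_type.is_char_device())
}

/// Hypothetical helper: splits candidate entries into (kept, dropped).
/// Device nodes go into `dropped` instead of silently disappearing.
fn partition_entries<'a>(
    paths: &[&'a Path],
) -> io::Result<(Vec<&'a Path>, Vec<&'a Path>)> {
    let mut kept = Vec::new();
    let mut dropped = Vec::new();
    for &path in paths {
        if is_device_node(path)? {
            dropped.push(path);
        } else {
            kept.push(path);
        }
    }
    Ok((kept, dropped))
}
```

Returning the dropped list costs almost nothing, and it is the same information a default-on manifest would need.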

This is reasonable as a default (an unprivileged user unpacking the tar can't mknod, and
most modern targets recreate /dev at runtime), but it's lossy and silent.

Where it's fine vs. where it bites

  • devtmpfs/mdev firmware (most modern Linux targets): harmless. The image ships an
    essentially empty /dev and the kernel/userland repopulate it at boot. Stripping removes
    almost nothing (often just /dev/console).
  • Static-/dev firmware (older/simpler embedded SDK images): the device nodes live in
    the rootfs itself. Stripping them means a daemon doing open("/dev/<x>") gets ENOENT;
    if that failure isn't checked (common in vendor C code), the result is typically a
    segfault at startup (for example a NULL dereference after an unchecked fopen()) that
    looks like a generic crash, not a missing-file error. Because --log-devices is off by
    default, there's no breadcrumb pointing at extraction.

A quick way to tell which class you're in: inspect the extracted rootfs /dev — empty
directory => runtime-populated (strip is harmless); pre-populated with nodes => static
/dev (strip is potentially destructive).
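A minimal sketch of that check, assuming it is pointed at a /dev directory that still contains whatever nodes the image shipped (the tar fw2tar produces has already had them removed). The `DevClass` enum and `classify_dev` function are hypothetical names. The zero-node threshold is also an assumption; a lone /dev/console could reasonably be treated as runtime-populated too.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::FileTypeExt;
use std::path::Path;

/// Hypothetical classification of a rootfs /dev directory.
#[derive(Debug, PartialEq)]
enum DevClass {
    /// No device nodes shipped in the image: /dev is filled in at boot.
    RuntimePopulated,
    /// The image ships its own nodes: stripping them may be destructive.
    Static { nodes: usize },
}

/// Counts character/block device nodes directly inside `dev_dir`.
/// `DirEntry::file_type` does not follow symlinks, so links are not counted.
fn classify_dev(dev_dir: &Path) -> io::Result<DevClass> {
    let mut nodes = 0;
    for entry in fs::read_dir(dev_dir)? {
        let file_type = entry?.file_type()?;
        if file_type.is_block_device() || file_type.is_char_device() {
            nodes += 1;
        }
    }
    Ok(if nodes == 0 {
        DevClass::RuntimePopulated
    } else {
        DevClass::Static { nodes }
    })
}
```

The sketch only looks at the top level of /dev; images that keep nodes in subdirectories would need a recursive walk.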

Why this matters for rehosting

In a recent rehost, a web daemon bound its port and then crashed on startup. The
investigation could have been short-circuited if the extraction step had surfaced "these N
device nodes were removed" by default. (In that particular case the firmware was
devtmpfs-based, so the strip turned out not to be the cause — but precisely because
there was no default manifest, ruling fw2tar in/out took manual digging. The signal is
cheap; its absence is what costs time.)

Suggested improvements (in rough order of value)

  1. Always emit the device manifest (not gated behind --log-devices), or at minimum
    emit it whenever any node was dropped. Cheap, and it turns a silent loss into a visible
    one.
  2. Record type, major, minor, and mode in that manifest, not just the path, so a
    downstream consumer can faithfully recreate the nodes (e.g. a sidecar *.devices.tsv/JSON).
    Today only the path is logged.
  3. Optionally re-materialize the nodes for consumers that can act on it — via a
    fakeroot-style path that preserves them in the archive, or by leaving the manifest for
    the orchestrator (e.g. Penguin) to recreate as static device files. The manifest from (2)
    is the enabling piece.
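As a sketch of what the richer manifest in (2) could carry: the `DeviceRecord` struct, the `NodeKind` enum, the helper names, and the TSV column order below are all hypothetical and not an existing fw2tar format. The major/minor decoding assumes the Linux/glibc `dev_t` bit layout, so it only applies when extraction runs on Linux.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::{FileTypeExt, MetadataExt};
use std::path::Path;

/// Device node type as it would appear in the manifest.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeKind {
    Char,
    Block,
}

/// Hypothetical manifest row: enough to recreate the node later.
#[derive(Debug, Clone, PartialEq)]
struct DeviceRecord {
    path: String,
    kind: NodeKind,
    major: u32,
    minor: u32,
    mode: u32, // permission bits only, e.g. 0o666
}

/// Major number from a raw dev_t (assumes the Linux/glibc encoding).
fn dev_major(rdev: u64) -> u32 {
    (((rdev >> 32) & 0xffff_f000) | ((rdev >> 8) & 0x0000_0fff)) as u32
}

/// Minor number from a raw dev_t (assumes the Linux/glibc encoding).
fn dev_minor(rdev: u64) -> u32 {
    (((rdev >> 12) & 0xffff_ff00) | (rdev & 0x0000_00ff)) as u32
}

/// Builds a record for `path`, or returns None if it is not a device node.
fn record_for(path: &Path) -> io::Result<Option<DeviceRecord>> {
    let meta = fs::symlink_metadata(path)?;
    let file_type = meta.file_type();
    let kind = if file_type.is_char_device() {
        NodeKind::Char
    } else if file_type.is_block_device() {
        NodeKind::Block
    } else {
        return Ok(None);
    };
    Ok(Some(DeviceRecord {
        path: path.display().to_string(),
        kind,
        major: dev_major(meta.rdev()),
        minor: dev_minor(meta.rdev()),
        mode: meta.mode() & 0o7777,
    }))
}

/// One TSV row: path, type (c/b), major, minor, octal mode.
fn tsv_line(rec: &DeviceRecord) -> String {
    let kind = match rec.kind {
        NodeKind::Char => 'c',
        NodeKind::Block => 'b',
    };
    format!("{}\t{}\t{}\t{}\t{:04o}", rec.path, kind, rec.major, rec.minor, rec.mode)
}
```

With rows like these, an orchestrator has the path, type, major, minor, and mode it would need to recreate each node, which a path-only log cannot provide.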

Scope note

This is specifically about static-/dev firmware. For devtmpfs targets the missing
runtime state is driver/sysfs-created and is not addressed by preserving image device
nodes — that's a separate, orchestrator-side modeling concern and shouldn't be conflated
with this.
