insmod kernel module slow issue #32

@matthewgui

Issue Description

A customer reported that loading kernel modules with insmod is slow on Linux kernel 6.6.90. Tracing insmod with strace showed that most of the time is spent in the finit_module system call. Adding timestamp prints inside the kernel narrowed the bottleneck down to module_frob_arch_sections.

On a UX900 FPGA running at 50MHz, insmod of a roughly 2.8MB .ko file takes about 11.4 seconds in total, most of it spent in module_frob_arch_sections.

Issue Solution

The issue has already been addressed in the upstream Linux kernel (v6.18) with a series of optimizations to arch/riscv/kernel/module-sections.c.

Backporting the relevant upstream patches resolved the problem. The same 2.8MB module on the same hardware now loads in about 1.9 seconds, down from 11.4 seconds: a reduction of roughly 83% in load time (about 6x faster).

Upstream Patches

commit c42458fcf54b3d0bc2ac06667c98dceb43831889
Author: Miaoqian Lin <linmq006@gmail.com>
Date:   Mon Oct 27 11:40:44 2025 -0600

    riscv: Fix memory leak in module_frob_arch_sections()

    The current code directly overwrites the scratch pointer with the
    return value of kvrealloc(). If kvrealloc() fails and returns NULL,
    the original buffer becomes unreachable, causing a memory leak.

    Fix this by using a temporary variable to store kvrealloc()'s return
    value and only update the scratch pointer on success.

    Found via static anlaysis and this is similar to commit 42378a9ca553
    ("bpf, verifier: Fix memory leak in array reallocation for stack state")

    Fixes: be17c0df6795 ("riscv: module: Optimize PLT/GOT entry counting")
    Cc: stable@vger.kernel.org
    Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
    Link: https://lore.kernel.org/r/20251026091912.39727-1-linmq006@gmail.com
    Signed-off-by: Paul Walmsley <pjw@kernel.org>
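
The pattern this commit describes can be illustrated in user space. The following is a minimal sketch, not the kernel code: it uses realloc() in place of kvrealloc(), and struct scratch and scratch_grow() are hypothetical names made up for the illustration. The point is only that the result of the reallocation goes into a temporary first, so the original pointer survives a failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable buffer; stands in for the kernel's scratch buffer. */
struct scratch {
    int *data;
    size_t capacity; /* number of elements currently allocated */
};

/*
 * Grow buf so it can hold at least new_capacity elements.
 * Returns 0 on success and -1 on failure. On failure buf->data and
 * buf->capacity are left untouched, so the caller can still free the
 * old buffer instead of leaking it.
 */
int scratch_grow(struct scratch *buf, size_t new_capacity)
{
    int *tmp;

    assert(buf != NULL);

    if (new_capacity <= buf->capacity)
        return 0;

    /* Reject sizes whose byte count would overflow size_t. */
    if (new_capacity > SIZE_MAX / sizeof(*buf->data))
        return -1;

    /*
     * Buggy form: buf->data = realloc(buf->data, ...);
     * If realloc() returned NULL, the only pointer to the old block
     * would be overwritten and the block would leak.
     */
    tmp = realloc(buf->data, new_capacity * sizeof(*buf->data));
    if (!tmp)
        return -1;

    /* Only commit the new pointer once the reallocation succeeded. */
    buf->data = tmp;
    buf->capacity = new_capacity;
    return 0;
}
```

A caller that gets -1 back can free buf->data and bail out, which is the cleanup path the original code made impossible.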

commit be17c0df67959fe4f88dac75dc26ed9252d4b133
Author: Samuel Holland <samuel.holland@sifive.com>
Date:   Wed Apr 9 10:14:51 2025 -0700

    riscv: module: Optimize PLT/GOT entry counting

    perf reports that 99.63% of the cycles from `modprobe amdgpu` are spent
    inside module_frob_arch_sections(). This is because amdgpu.ko contains
    about 300000 relocations in its .rela.text section, and the algorithm in
    count_max_entries() takes quadratic time.

    Apply two optimizations from the arm64 code, which together reduce the
    total execution time by 99.58%. First, sort the relocations so duplicate
    entries are adjacent. Second, reduce the number of relocations that must
    be sorted by filtering to only relocations that need PLT/GOT entries, as
    done in commit d4e0340919fb ("arm64/module: Optimize module load time by
    optimizing PLT counting").

    Unlike the arm64 code, here the filtering and sorting is done in a
    scratch buffer, because the HI20 relocation search optimization in
    apply_relocate_add() depends on the original order of the relocations.
    This allows accumulating PLT/GOT relocations across sections so sorting
    and counting is only done once per module.

    Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
    Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
    Link: https://lore.kernel.org/r/20250409171526.862481-3-samuel.holland@sifive.com
    Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
    Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
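
To make the complexity difference concrete, here is a small user-space sketch of the first optimization (sorting so that duplicates become adjacent). It is an illustration under simplifying assumptions, not the code from module-sections.c: struct reloc is a hypothetical reduced relocation with only a symbol index and an addend, the function names are invented, and the filtering step that keeps only PLT/GOT relocations is left out. As in the commit, sorting happens in a scratch copy so the caller's relocation order is preserved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified relocation: symbol index plus addend. */
struct reloc {
    uint32_t sym;
    int64_t addend;
};

/* Total order on (sym, addend); equal relocations compare as 0. */
static int reloc_cmp(const void *a, const void *b)
{
    const struct reloc *x = a, *y = b;

    if (x->sym != y->sym)
        return x->sym < y->sym ? -1 : 1;
    if (x->addend != y->addend)
        return x->addend < y->addend ? -1 : 1;
    return 0;
}

/* Quadratic baseline: compare each relocation against all earlier ones. */
size_t count_unique_quadratic(const struct reloc *r, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        int seen = 0;

        for (size_t j = 0; j < i; j++) {
            if (reloc_cmp(&r[i], &r[j]) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            count++;
    }
    return count;
}

/*
 * O(n log n) variant: sort a scratch copy, then count positions where
 * the entry differs from its predecessor. The input array is not
 * modified. Returns (size_t)-1 if the scratch buffer cannot be allocated.
 */
size_t count_unique_sorted(const struct reloc *r, size_t n)
{
    struct reloc *scratch;
    size_t count;

    assert(r != NULL || n == 0);
    if (n == 0)
        return 0;

    scratch = malloc(n * sizeof(*scratch));
    if (!scratch)
        return (size_t)-1;
    memcpy(scratch, r, n * sizeof(*scratch));

    qsort(scratch, n, sizeof(*scratch), reloc_cmp);

    count = 1; /* the first sorted entry is always unique so far */
    for (size_t i = 1; i < n; i++) {
        if (reloc_cmp(&scratch[i - 1], &scratch[i]) != 0)
            count++;
    }

    free(scratch);
    return count;
}
```

Both functions return the same count; only the sorted variant stays practical at the roughly 300000 relocations mentioned in the commit message.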

For reference, the full patch context can be found at:
https://github.com/Nuclei-Software/linux/commits/2dcdd77f9fbe4cf521fb4ad63a8a2ce9f7f876a8/arch/riscv/kernel/module-sections.c

Additional Patches for RV32

For 32-bit RISC-V (RV32), the following additional patch is required to correctly handle the R_RISCV_PLT32 relocation type and allocate sufficient PLT space:

commit 1ee1313f4722e6d67c6e9447ee81d24d6e3ff4ad
Author: Samuel Holland <samuel.holland@sifive.com>
Date:   Wed Apr 9 10:14:50 2025 -0700

    riscv: module: Allocate PLT entries for R_RISCV_PLT32

    apply_r_riscv_plt32_rela() may need to emit a PLT entry for the
    referenced symbol, so there must be space allocated in the PLT.

    Fixes: 8fd6c5142395 ("riscv: Add remaining module relocations")
    Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
    Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
    Link: https://lore.kernel.org/r/20250409171526.862481-2-samuel.holland@sifive.com
    Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Note: Depending on the exact kernel version and configuration, backporting may also require additional preparatory patches, for instance to module.c.
