insmod kernel module slow issue
Issue Description
A customer reported that loading kernel modules with insmod is slow on Linux kernel 6.6.90. Tracing insmod with strace showed that most of the time is spent in the finit_module system call. Adding timestamp prints inside the kernel then identified module_frob_arch_sections as the main bottleneck.
On a UX900 FPGA running at 50MHz, insmod of an approximately 2.8MB .ko file takes about 11.4 seconds in total, most of it spent in module_frob_arch_sections.
Issue Solution
The issue has already been addressed in the upstream Linux kernel (v6.18) with a series of optimizations to arch/riscv/kernel/module-sections.c.
Backporting the relevant upstream patches resolved the problem. The same 2.8MB module on the same hardware now loads in about 1.9 seconds, down from 11.4 seconds, a reduction in load time of roughly 83%.
Upstream Patches
commit c42458fcf54b3d0bc2ac06667c98dceb43831889
Author: Miaoqian Lin <linmq006@gmail.com>
Date: Mon Oct 27 11:40:44 2025 -0600
riscv: Fix memory leak in module_frob_arch_sections()
The current code directly overwrites the scratch pointer with the
return value of kvrealloc(). If kvrealloc() fails and returns NULL,
the original buffer becomes unreachable, causing a memory leak.
Fix this by using a temporary variable to store kvrealloc()'s return
value and only update the scratch pointer on success.
Found via static anlaysis and this is similar to commit 42378a9ca553
("bpf, verifier: Fix memory leak in array reallocation for stack state")
Fixes: be17c0df6795 ("riscv: module: Optimize PLT/GOT entry counting")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20251026091912.39727-1-linmq006@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
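The leak pattern described in this commit can be illustrated with a small userspace sketch. It uses realloc() as a stand-in for kvrealloc(). The struct scratch_buf type and the scratch_grow() helper are hypothetical names chosen for illustration and are not taken from the kernel patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable buffer, standing in for the kernel's scratch buffer. */
struct scratch_buf {
    unsigned char *data;
    size_t size;
};

/*
 * Grow buf to new_size bytes. Returns 0 on success, -1 on failure.
 * On failure buf->data still points to the original allocation,
 * so the caller can keep using it or free it.
 */
int scratch_grow(struct scratch_buf *buf, size_t new_size)
{
    unsigned char *tmp;

    assert(buf != NULL);
    assert(new_size > 0);

    /*
     * Leaky variant (what the fix avoids):
     *     buf->data = realloc(buf->data, new_size);
     * If realloc() returns NULL, the old pointer is overwritten and lost.
     */
    tmp = realloc(buf->data, new_size);
    if (!tmp)
        return -1;

    /* Only update the owning pointer once the reallocation succeeded. */
    buf->data = tmp;
    buf->size = new_size;
    return 0;
}
```

The same idea applies to any reallocating API: keep the result in a temporary, and commit it to the long-lived pointer only on success.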
commit be17c0df67959fe4f88dac75dc26ed9252d4b133
Author: Samuel Holland <samuel.holland@sifive.com>
Date: Wed Apr 9 10:14:51 2025 -0700
riscv: module: Optimize PLT/GOT entry counting
perf reports that 99.63% of the cycles from `modprobe amdgpu` are spent
inside module_frob_arch_sections(). This is because amdgpu.ko contains
about 300000 relocations in its .rela.text section, and the algorithm in
count_max_entries() takes quadratic time.
Apply two optimizations from the arm64 code, which together reduce the
total execution time by 99.58%. First, sort the relocations so duplicate
entries are adjacent. Second, reduce the number of relocations that must
be sorted by filtering to only relocations that need PLT/GOT entries, as
done in commit d4e0340919fb ("arm64/module: Optimize module load time by
optimizing PLT counting").
Unlike the arm64 code, here the filtering and sorting is done in a
scratch buffer, because the HI20 relocation search optimization in
apply_relocate_add() depends on the original order of the relocations.
This allows accumulating PLT/GOT relocations across sections so sorting
and counting is only done once per module.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250409171526.862481-3-samuel.holland@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
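The filter, sort, and count approach described in this commit can be sketched in userspace C as follows. This is a simplified illustration under stated assumptions, not the kernel implementation: struct reloc, the RELOC_NEEDS_PLT and RELOC_OTHER constants, and count_plt_entries() are hypothetical names, and the real code works on ELF rela entries and counts PLT and GOT entries separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical relocation kinds; real code checks ELF relocation types. */
enum { RELOC_NEEDS_PLT = 1, RELOC_OTHER = 2 };

/* Simplified stand-in for an ELF rela entry. */
struct reloc {
    uint32_t type;
    uint32_t sym;
    int64_t addend;
};

/* Order by (sym, addend) so that duplicate entries become adjacent. */
static int cmp_reloc(const void *a, const void *b)
{
    const struct reloc *x = a, *y = b;

    if (x->sym != y->sym)
        return x->sym < y->sym ? -1 : 1;
    if (x->addend != y->addend)
        return x->addend < y->addend ? -1 : 1;
    return 0;
}

/*
 * Count distinct (sym, addend) pairs among relocations that need a PLT
 * entry. Works on a scratch copy so the original order of relocs is
 * preserved. Returns (size_t)-1 if the scratch allocation fails.
 */
size_t count_plt_entries(const struct reloc *relocs, size_t n)
{
    struct reloc *scratch;
    size_t kept = 0, unique = 0, i;

    assert(relocs != NULL || n == 0);
    if (n == 0)
        return 0;

    scratch = malloc(n * sizeof(*scratch));
    if (!scratch)
        return (size_t)-1;

    /* Step 1: keep only the relocations that can need a PLT entry. */
    for (i = 0; i < n; i++)
        if (relocs[i].type == RELOC_NEEDS_PLT)
            scratch[kept++] = relocs[i];

    /* Step 2: sort, O(k log k) instead of comparing every pair. */
    qsort(scratch, kept, sizeof(*scratch), cmp_reloc);

    /* Step 3: one linear pass, counting entries that differ from the previous one. */
    for (i = 0; i < kept; i++)
        if (i == 0 || cmp_reloc(&scratch[i - 1], &scratch[i]) != 0)
            unique++;

    free(scratch);
    return unique;
}
```

Copying into a scratch buffer mirrors the reason given in the commit message: the input array keeps its original order, which later relocation processing depends on.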
For reference, the full patch context can be found at:
https://github.com/Nuclei-Software/linux/commits/2dcdd77f9fbe4cf521fb4ad63a8a2ce9f7f876a8/arch/riscv/kernel/module-sections.c
commit c42458fcf54b3d0bc2ac06667c98dceb43831889
commit be17c0df67959fe4f88dac75dc26ed9252d4b133
Additional Patches for RV32
For 32-bit RISC-V (RV32), the following additional patch is required to correctly handle the R_RISCV_PLT32 relocation type and allocate sufficient PLT space:
commit 1ee1313f4722e6d67c6e9447ee81d24d6e3ff4ad
Author: Samuel Holland <samuel.holland@sifive.com>
Date: Wed Apr 9 10:14:50 2025 -0700
riscv: module: Allocate PLT entries for R_RISCV_PLT32
apply_r_riscv_plt32_rela() may need to emit a PLT entry for the
referenced symbol, so there must be space allocated in the PLT.
Fixes: 8fd6c5142395 ("riscv: Add remaining module relocations")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250409171526.862481-2-samuel.holland@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Note: Depending on the exact kernel version and configuration, backporting may also require additional preparatory patches, for instance to module.c.