From 54108bb6a6cfabd4decc3fe7620b879538d7e8a9 Mon Sep 17 00:00:00 2001
From: Yoshifumi Nakamura <nakamura@riken.jp>
Date: Tue, 16 Jun 2026 22:05:25 +0900
Subject: [PATCH 1/4] Add GPU LightGBM estimator package

Connect PerfTools LightGBM_model/1.0 as a GPU kernel section package so GENESIS can exercise the second GPU estimation model through the existing instrumented section flow.

Add a small NCU raw CSV preparation bridge for LightGBM, wire the shared PerfTools checkout into the estimation runner, and make GENESIS default to the LightGBM GPU section package while still allowing BK_GENESIS_GPU_SECTION_PACKAGE to select the MLP path.

Keep padata archives smaller by excluding heavy Nsight Compute binary reports by default. The raw CSV and text import remain available for estimation and portal summaries, with BK_PROFILER_ARCHIVE_NCU_REPORT=true left as an explicit debugging opt-in.

Add shell coverage for the LightGBM prediction CSV path, GENESIS package selection, and the new profiler archive behavior.

Signed-off-by: Yoshifumi Nakamura <nakamura@riken.jp>
---
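Reviewer note: the prediction CSV handling that the new section package performs can be sketched in a few lines of Python. This is a simplified illustration, not the code in the patch: it keeps only the column priority list and the ns-to-seconds summation, and the function name `total_predicted_seconds` and the sample rows are hypothetical.

```python
import csv

# Column names copied from the package's time_columns list, in priority order.
TIME_COLUMNS = [
    "O-Execution Time",
    "O-Execution Time [ns]",
    "Execution Time [ns]",
    "Predicted Execution Time [ns]",
    "predicted_execution_time_ns",
]


def total_predicted_seconds(csv_text: str) -> float:
    """Sum per-kernel predicted times (in ns) and return the total in seconds."""
    # Skip blank lines and '#' comment lines, as the package does.
    lines = [
        line
        for line in csv_text.splitlines(keepends=True)
        if line.strip() and not line.lstrip().startswith("#")
    ]
    reader = csv.DictReader(lines)
    fieldnames = reader.fieldnames or []
    # Use the first recognized execution-time column.
    column = next((c for c in TIME_COLUMNS if c in fieldnames), None)
    if column is None:
        raise ValueError("no supported execution-time column")
    return sum(float(row[column]) for row in reader) / 1e9


# Hypothetical two-kernel prediction CSV; the values are made up for illustration.
SAMPLE = "# comment\nmeta-kernel,O-Execution Time\nkernel_a,1500000\nkernel_b,500000\n"
print(total_predicted_seconds(SAMPLE))
```

The real package additionally records per-kernel metrics and the source and target GPU names, and it rejects rows whose time value is not a finite number.
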
 .github/workflows/result-server-tests.yml     |   3 +
 docs/guides/add-app.md                        |   3 +-
 docs/guides/add-estimation-package.md         |  20 +-
 docs/guides/profiler-support.md               |   6 +-
 programs/genesis/estimate.sh                  |   8 +-
 scripts/bk_functions.sh                       |  16 +-
 .../instrumented_app_sections_dummy.sh        |   1 +
 .../prepare_gpu_lightgbm_ncu_input.py         |  69 +++
 scripts/estimation/run.sh                     |  16 +-
 .../gpu_kernel_lightgbm_v10.sh                | 543 ++++++++++++++++++
 scripts/test_estimate_submit.sh               |   1 +
 scripts/tests/test_bk_profiler.sh             |  19 +-
 ...test_estimation_gpu_kernel_lightgbm_v10.sh |  74 +++
 .../tests/test_genesis_gpu_mlp_estimation.sh  |   7 +-
 14 files changed, 770 insertions(+), 16 deletions(-)
 create mode 100644 scripts/estimation/prepare_gpu_lightgbm_ncu_input.py
 create mode 100644 scripts/estimation/section_packages/gpu_kernel_lightgbm_v10.sh
 create mode 100644 scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh
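
Reviewer note: the section package looks up its inputs through section-scoped environment variables before falling back to the unscoped variable. The sketch below mirrors that naming rule in Python under the assumption that section names are ASCII; `section_key` and `resolve_setting` are hypothetical names, and the final fallback to the first section artifact path is omitted.

```python
import re


def section_key(section_name: str) -> str:
    """Upper-case the name and replace anything outside A-Z0-9 with '_'."""
    return re.sub(r"[^A-Z0-9]", "_", section_name.upper())


def resolve_setting(prefix: str, section_name: str, env: dict) -> str:
    """Prefer PREFIX_<SECTION_KEY>, then PREFIX, else an empty string."""
    scoped = env.get(f"{prefix}_{section_key(section_name)}", "")
    if scoped:
        return scoped
    return env.get(prefix, "")


# Hypothetical environment; the paths are placeholders.
env = {
    "BK_GPU_LIGHTGBM_PREDICTION_CSV": "/path/to/default_pred.csv",
    "BK_GPU_LIGHTGBM_PREDICTION_CSV_GPU_KERNEL_REGION": "/path/to/pred.csv",
}
print(resolve_setting("BK_GPU_LIGHTGBM_PREDICTION_CSV", "gpu_kernel_region", env))
print(resolve_setting("BK_GPU_LIGHTGBM_PREDICTION_CSV", "other-section", env))
```

With this rule a section named `gpu_kernel_region` reads `BK_GPU_LIGHTGBM_PREDICTION_CSV_GPU_KERNEL_REGION` first, which matches the scoped variable style the MLP package documentation already uses.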

diff --git a/.github/workflows/result-server-tests.yml b/.github/workflows/result-server-tests.yml
index 137f601..a46c441 100644
--- a/.github/workflows/result-server-tests.yml
+++ b/.github/workflows/result-server-tests.yml
@@ -9,6 +9,7 @@ on:
       - "scripts/result_server/**"
       - "scripts/estimation/**"
       - "scripts/tests/test_bk_profiler.sh"
+      - "scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh"
       - "scripts/tests/test_estimation_gpu_kernel_mlp_v15.sh"
       - "scripts/tests/test_genesis_gpu_mlp_estimation.sh"
       - "scripts/tests/test_qws_gpu_mlp_smoke_estimation.sh"
@@ -33,6 +34,7 @@ on:
       - "scripts/result_server/**"
       - "scripts/estimation/**"
       - "scripts/tests/test_bk_profiler.sh"
+      - "scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh"
       - "scripts/tests/test_estimation_gpu_kernel_mlp_v15.sh"
       - "scripts/tests/test_genesis_gpu_mlp_estimation.sh"
       - "scripts/tests/test_qws_gpu_mlp_smoke_estimation.sh"
@@ -102,6 +104,7 @@ jobs:
           bash scripts/tests/test_result_profile_data.sh
           bash scripts/tests/test_send_results_profile_data.sh
           bash scripts/tests/test_send_estimate_artifacts.sh
+          bash scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh
           bash scripts/tests/test_estimation_gpu_kernel_mlp_v15.sh
           bash scripts/tests/test_genesis_gpu_mlp_estimation.sh
           bash scripts/tests/test_qws_gpu_mlp_smoke_estimation.sh
diff --git a/docs/guides/add-app.md b/docs/guides/add-app.md
index 3fdf40e..c36586d 100644
--- a/docs/guides/add-app.md
+++ b/docs/guides/add-app.md
@@ -402,7 +402,8 @@ bk_profiler ncu --level single --archive ../results/padata0.tgz --raw-dir ncu --
 ```
 
 `ncu` の既定 level は `single` です。最初は採取時間を抑えるため、`single` または `simple` から始めてください。
-raw report は `padata*.tgz` 内の `bk_profiler_artifact/raw/rep1/` に保存され、可能な場合は `bk_profiler_artifact/reports/ncu_import_rep1.txt` に text report が保存されます。
+`padata*.tgz` には、可能な場合は `bk_profiler_artifact/reports/ncu_import_rep1.txt` に text report、`BK_PROFILER_NCU_RAW_CSV=true` の場合は `bk_profiler_artifact/raw/rep1/profile_raw.csv` に raw CSV が保存されます。
+Nsight Compute の binary report (`*.ncu-rep` など) は重いため既定では `padata*.tgz` から除外されます。デバッグ目的で保存したい場合だけ `BK_PROFILER_ARCHIVE_NCU_REPORT=true` を明示してください。
 site の既定 module に `ncu` が含まれない場合は、アプリ側で module を load するか、system 固有の module 変数を用意してください。
 Genesis GH200 参照実装では `GENESIS_MIYABIG_MODULE` / `GENESIS_GH200_MODULE` で module を上書きできます。
 既定の `ncu` が PATH にない場合は warning を出して profiler なしで benchmark 本体を実行しますが、`GENESIS_MIYABIG_PROFILER_TOOL=ncu`、`GENESIS_GH200_PROFILER_TOOL=ncu`、または `GENESIS_PROFILER_TOOL=ncu` を明示した場合は採取不能として失敗します。
diff --git a/docs/guides/add-estimation-package.md b/docs/guides/add-estimation-package.md
index 4b81662..475590a 100644
--- a/docs/guides/add-estimation-package.md
+++ b/docs/guides/add-estimation-package.md
@@ -43,6 +43,7 @@
   - `counter_papi_detailed.sh`
   - `trace_mpi_basic.sh`
   - `overlap_max_basic.sh`
+  - `gpu_kernel_lightgbm_v10.sh`
   - `gpu_kernel_mlp_v15.sh`
 
 ## 3. top-level package の責務
@@ -69,19 +70,28 @@ section package はもっと小さくてかまいません。
 ここでは「1 区間の変換規則」に集中し、Estimate JSON 全体の組み立てや current / future の side 管理は BenchKit 共通層や top-level package 側へ寄せる方が自然です。
 
 GPU kernel 単位の外部推定ツールは、通常は section package として扱います。
-たとえば `gpu_kernel_mlp_v15` は、PerfTools の `MLP_NN/v1.5` を「GPU 区間だけを変換する package」として接続します。
-top-level package は `instrumented_app_sections_dummy` などのままにして、GPU 区間にだけ `gpu_kernel_mlp_v15` を割り当てます。
+たとえば `gpu_kernel_mlp_v15` は PerfTools の `MLP_NN/v1.5`、`gpu_kernel_lightgbm_v10` は PerfTools の `LightGBM_model/1.0` を「GPU 区間だけを変換する package」として接続します。
+top-level package は `instrumented_app_sections_dummy` などのままにして、GPU 区間にだけ GPU kernel section package を割り当てます。
 
 ```bash
 bk_declare_section --side future gpu_kernel_region gpu_kernel_mlp_v15
 bk_emit_declared_section --side future gpu_kernel_region "$measured_gpu_time" results/estimation_artifacts/gpu_kernel_region_input.csv
 ```
 
+GENESIS では既定は `gpu_kernel_lightgbm_v10` です。MLP 経路を使う場合は次のように切り替えられます。
+
+```bash
+export BK_GENESIS_GPU_SECTION_PACKAGE=gpu_kernel_mlp_v15
+```
+
 PerfTools 本体は BenchKit に vendoring せず、実行時に次の環境変数で渡します。
 
 ```bash
 export BK_GPU_MLP_PERFTOOLS_ROOT=/path/to/PerfTools
 export BK_GPU_MLP_PYTHON=python3
+# LightGBM package だけを明示したい場合
+export BK_GPU_LIGHTGBM_PERFTOOLS_ROOT=/path/to/PerfTools
+export BK_GPU_LIGHTGBM_PYTHON=python3
 ```
 
 section artifact は PerfTools 側の static GPU spec sheet から作られた prepared CSV を想定します。
@@ -94,7 +104,8 @@ export BK_GPU_MLP_ARTIFACT_MODE=prediction
 export BK_GPU_MLP_PREDICTION_CSV_GPU_KERNEL_REGION=/path/to/pred.csv
 ```
 
-section package は prediction CSV の `Execution Time [ns]` を合算し、その section の future-side `time` にします。
+section package は prediction CSV の推定実行時間を合算し、その section の future-side `time` にします。
+MLP package は `Execution Time [ns]`、LightGBM package は `O-Execution Time` を主な入力列として扱います。
 
 qws を使って CI 配管だけを確認する場合は、実際の qws が GPU 化されていなくても GPU MLP smoke test を有効にできます。
 `BK_QWS_GPU_MLP_SMOKE_MODE=prediction` では、同梱のサンプル prediction CSV を使い、run job が `gpu_kernel_region` section と prediction CSV artifact を結果に埋め込みます。
@@ -114,7 +125,8 @@ export BK_GPU_MLP_PERFTOOLS_REF=main
 `BK_QWS_GPU_MLP_SMOKE` は qws を使った配管確認用、`BK_QWS_GPU_MLP_SMOKE_MODE` は prediction fixture 取り込みと PerfTools 実行の切り替え用、`BK_ESTIMATE_RUNNER_TAG` は推定用 runner/container を手動で逃がすためのものです。
 実際の GPU profiling input と推定 runner の運用が固まったら、専用の package/runner 設定へ置き換え、これらの暫定変数は削除対象として見直してください。
 
-`perftools` smoke mode は GitHub から PerfTools を取得するため、推定 runner/container には `git` と外部接続、Python 3.12 以上、numpy/pandas/torch が必要です。
+`perftools` smoke mode は GitHub から PerfTools を取得するため、推定 runner/container には `git` と外部接続、Python 3.12 以上が必要です。
+MLP package には numpy/pandas/torch、LightGBM package には numpy/pandas/lightgbm/pyyaml が必要です。
 実運用では smoke mode ではなく、推定 runner/container に PerfTools checkout を用意し、section artifact として実アプリ由来の prepared input CSV を渡してください。
 
 ## 5. metadata に持たせるもの
diff --git a/docs/guides/profiler-support.md b/docs/guides/profiler-support.md
index 4c97f67..6410ed3 100644
--- a/docs/guides/profiler-support.md
+++ b/docs/guides/profiler-support.md
@@ -49,6 +49,7 @@ bk_profiler <tool> [options] -- <command ...>
 - `BK_PROFILER_REPORT_ARGS`
 - `BK_PROFILER_DIR`
 - `BK_PROFILER_STAGE_DIR`
+- `BK_PROFILER_ARCHIVE_NCU_REPORT`
 
 ## 3. 共通語彙としての level
 
@@ -86,7 +87,10 @@ BenchKit は「CSV があること」を共通必須にはしない。
 - `detailed` → `--set full --nvtx`
 
 既定の report format は `text` とする。
-raw report は archive 内の `bk_profiler_artifact/raw/rep1/profile*.ncu-rep` または Nsight Compute の出力形式に従う report file として保存し、可能な場合は `ncu --import ... --page details` の出力を `bk_profiler_artifact/reports/ncu_import_rep1.txt` に保存する。
+`padata*.tgz` の肥大化を避けるため、Nsight Compute の binary report (`*.ncu-rep` など) は既定では archive から除外する。
+可能な場合は `ncu --import ... --page details` の出力を `bk_profiler_artifact/reports/ncu_import_rep1.txt` に保存する。
+`BK_PROFILER_NCU_RAW_CSV=true` の場合は、推定 package が使う raw CSV を `bk_profiler_artifact/raw/rep1/profile_raw.csv` に保存する。
+binary report も保存したいデバッグ用途では、`BK_PROFILER_ARCHIVE_NCU_REPORT=true` を明示する。
 
 MPI launcher 経由の GPU application では、既定で `--target-processes all` を付けて child process も採取対象にする。
 追加の kernel filter、section set、NVTX filter などは `BK_PROFILER_ARGS` で `ncu` に渡す。
diff --git a/programs/genesis/estimate.sh b/programs/genesis/estimate.sh
index 1c26103..c1f021b 100644
--- a/programs/genesis/estimate.sh
+++ b/programs/genesis/estimate.sh
@@ -2,6 +2,8 @@
 # estimate.sh — GENESIS estimation entrypoint and run-time section metadata.
 
 genesis_declare_estimation_layout() {
+  local gpu_section_package="${BK_GENESIS_GPU_SECTION_PACKAGE:-gpu_kernel_lightgbm_v10}"
+
   bk_clear_estimation_defaults
   bk_clear_estimation_declarations
   bk_define_current_estimation_package weakscaling
@@ -11,7 +13,7 @@ genesis_declare_estimation_layout() {
   bk_define_future_system "${BK_ESTIMATION_FUTURE_SYSTEM:-GPU_MLP_TARGET}"
   bk_define_current_target_nodes "${BK_ESTIMATION_CURRENT_TARGET_NODES:-1}"
   bk_define_future_target_nodes "${BK_ESTIMATION_FUTURE_TARGET_NODES:-1}"
-  bk_declare_section --side future gpu_kernel_region gpu_kernel_mlp_v15
+  bk_declare_section --side future gpu_kernel_region "$gpu_section_package"
 }
 
 genesis_emit_estimation_data_from_fom() {
@@ -42,9 +44,13 @@ BK_ESTIMATION_SECTION_DEFAULT_FACTOR="${BK_ESTIMATION_SECTION_DEFAULT_FACTOR:-1.
 BK_GPU_MLP_ARTIFACT_MODE="${BK_GPU_MLP_ARTIFACT_MODE:-ncu}"
 BK_GPU_MLP_SOURCE_GPU="${BK_GPU_MLP_SOURCE_GPU:-H100}"
 BK_GPU_MLP_KERNEL_COUNT="${BK_GPU_MLP_KERNEL_COUNT:-20}"
+BK_GPU_LIGHTGBM_ARTIFACT_MODE="${BK_GPU_LIGHTGBM_ARTIFACT_MODE:-ncu}"
+BK_GPU_LIGHTGBM_SOURCE_GPU="${BK_GPU_LIGHTGBM_SOURCE_GPU:-${BK_GPU_MLP_SOURCE_GPU}}"
 export BK_GPU_MLP_ARTIFACT_MODE
 export BK_GPU_MLP_SOURCE_GPU
 export BK_GPU_MLP_KERNEL_COUNT
+export BK_GPU_LIGHTGBM_ARTIFACT_MODE
+export BK_GPU_LIGHTGBM_SOURCE_GPU
 
 genesis_declare_estimation_layout
 bk_estimation_apply_declared_defaults
diff --git a/scripts/bk_functions.sh b/scripts/bk_functions.sh
index e4f098d..daf90bb 100644
--- a/scripts/bk_functions.sh
+++ b/scripts/bk_functions.sh
@@ -802,7 +802,12 @@ bk_profiler_find_ncu_report() {
     -name '*.ncu-rep' -o \
     -name '*.nsight-cuprof' -o \
     -name 'profile*' \
-  \) | head -n 1
+  \) \
+    ! -name 'profile_raw.csv' \
+    ! -name 'profile_raw.csv.log' \
+    ! -name '*.csv' \
+    ! -name '*.log' \
+    | head -n 1
 }
 
 bk_json_escape() {
@@ -1164,6 +1169,15 @@ bk_profiler() {
           ;;
       esac
       cp -R "$_bk_ncu_rep_dir" "$_bk_stage_dir/raw/${_bk_ncu_rep_name}"
+      case "${BK_PROFILER_ARCHIVE_NCU_REPORT:-false}" in
+        1|true|TRUE|yes|YES|on|ON) ;;
+        *)
+          find "$_bk_stage_dir/raw/${_bk_ncu_rep_name}" -maxdepth 1 -type f \( \
+            -name '*.ncu-rep' -o \
+            -name '*.nsight-cuprof' \
+          \) -delete
+          ;;
+      esac
       _bk_profiler_run_names="${_bk_ncu_rep_name}"
       _bk_profiler_run_events="${_bk_profiler_level}"
       ;;
diff --git a/scripts/estimation/packages/instrumented_app_sections_dummy.sh b/scripts/estimation/packages/instrumented_app_sections_dummy.sh
index 1549914..d4eab3d 100644
--- a/scripts/estimation/packages/instrumented_app_sections_dummy.sh
+++ b/scripts/estimation/packages/instrumented_app_sections_dummy.sh
@@ -31,6 +31,7 @@ bk_estimation_package_metadata() {
     "quarter",
     "counter_papi_detailed",
     "trace_mpi_basic",
+    "gpu_kernel_lightgbm_v10",
     "gpu_kernel_mlp_v15",
     "logp"
   ],
diff --git a/scripts/estimation/prepare_gpu_lightgbm_ncu_input.py b/scripts/estimation/prepare_gpu_lightgbm_ncu_input.py
new file mode 100644
index 0000000..d889989
--- /dev/null
+++ b/scripts/estimation/prepare_gpu_lightgbm_ncu_input.py
@@ -0,0 +1,69 @@
+#!/usr/bin/env python3
+"""Prepare a PerfTools LightGBM_model/1.0 NCU input CSV.
+
+BenchKit's NCU profiler archive stores Nsight Compute raw CSV in the wide
+metric layout exported by ``ncu --page raw --csv``.  PerfTools LightGBM can read
+wide CSV directly, but it expects a few compatibility columns such as
+``Duration [ns]`` to already exist.  This bridge normalizes the archive into
+that wide CSV without running the MLP-specific ``prepare_data.py`` step.
+"""
+
+from __future__ import annotations
+
+import argparse
+import shutil
+import tempfile
+from pathlib import Path
+
+from prepare_gpu_mlp_ncu_input import (
+    build_wide_ncu_csv,
+    extract_padata,
+    read_clean_raw_csv,
+    strip_ncu_log_preamble,
+)
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser()
+    input_group = parser.add_mutually_exclusive_group(required=True)
+    input_group.add_argument("--padata", help="BenchKit padata*.tgz archive")
+    input_group.add_argument("--raw-csv", help="Nsight Compute raw wide CSV")
+    parser.add_argument("--source-gpu", default="H100")
+    parser.add_argument("--out-csv", required=True)
+    parser.add_argument("--work-dir")
+    parser.add_argument("--keep-work", action="store_true")
+    return parser.parse_args()
+
+
+def main() -> None:
+    args = parse_args()
+    out_csv = Path(args.out_csv).resolve()
+    work_dir_owned = False
+    if args.work_dir:
+        work_dir = Path(args.work_dir).resolve()
+        work_dir.mkdir(parents=True, exist_ok=True)
+    else:
+        work_dir = Path(tempfile.mkdtemp(prefix="benchkit-gpu-lightgbm-"))
+        work_dir_owned = True
+
+    try:
+        if args.raw_csv:
+            raw_csv = Path(args.raw_csv).resolve()
+        else:
+            raw_csv = extract_padata(Path(args.padata).resolve(), work_dir / "padata")
+
+        clean_csv = work_dir / "profile_raw_clean.csv"
+        strip_ncu_log_preamble(raw_csv, clean_csv)
+        raw_df = read_clean_raw_csv(clean_csv)
+        if raw_df.empty:
+            raise SystemExit(f"no kernel rows found in {raw_csv}")
+
+        build_wide_ncu_csv(raw_df, out_csv, args.source_gpu)
+        print(f"wrote {out_csv}: {len(raw_df)} kernels")
+    finally:
+        if work_dir_owned and not args.keep_work:
+            shutil.rmtree(work_dir, ignore_errors=True)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/estimation/run.sh b/scripts/estimation/run.sh
index 15c5e19..37eabab 100644
--- a/scripts/estimation/run.sh
+++ b/scripts/estimation/run.sh
@@ -24,6 +24,10 @@ bk_estimation_gpu_mlp_perftools_needed() {
     return 0
   fi
 
+  if bk_estimation_bool_enabled "${BK_GPU_LIGHTGBM_FETCH_PERFTOOLS:-false}"; then
+    return 0
+  fi
+
   if [[ "${code:-}" == "genesis" ]] && bk_estimation_bool_enabled "${BK_GENESIS_GPU_MLP_PROFILE:-false}"; then
     return 0
   fi
@@ -58,14 +62,14 @@ bk_estimation_prepare_gpu_mlp_perftools() {
     use_genesis_ncu=1
   fi
 
-  if [[ ! -f "${root}/MLP_NN/v1.5/predict_v15.py" ]]; then
+  if [[ ! -f "${root}/MLP_NN/v1.5/predict_v15.py" && ! -f "${root}/LightGBM_model/1.0/AI_model/run_inference.py" ]]; then
     if ! command -v git >/dev/null 2>&1; then
-      echo "ERROR: git is required to fetch PerfTools for GPU MLP estimation" >&2
+      echo "ERROR: git is required to fetch PerfTools for GPU estimation" >&2
       return 1
     fi
 
     mkdir -p "$(dirname "$root")"
-    echo "Fetching PerfTools for GPU MLP estimation: ${repo} (${ref})"
+    echo "Fetching PerfTools for GPU estimation: ${repo} (${ref})"
     git clone --depth 1 "$repo" "$root"
     if [[ "$ref" != "main" && "$ref" != "master" ]]; then
       git -C "$root" fetch --depth 1 origin "$ref" || true
@@ -74,12 +78,16 @@ bk_estimation_prepare_gpu_mlp_perftools() {
   fi
 
   export BK_GPU_MLP_PERFTOOLS_ROOT="$root"
+  export BK_GPU_LIGHTGBM_PERFTOOLS_ROOT="${BK_GPU_LIGHTGBM_PERFTOOLS_ROOT:-$root}"
   export BK_GPU_MLP_OUTPUT_DIR="${BK_GPU_MLP_OUTPUT_DIR:-results/estimation_artifacts/gpu_kernel_mlp_v15}"
+  export BK_GPU_LIGHTGBM_OUTPUT_DIR="${BK_GPU_LIGHTGBM_OUTPUT_DIR:-results/estimation_artifacts/gpu_kernel_lightgbm_v10}"
 
-  echo "GPU MLP estimator root: ${BK_GPU_MLP_PERFTOOLS_ROOT}"
+  echo "GPU estimator root: ${BK_GPU_MLP_PERFTOOLS_ROOT}"
   if [[ "$use_genesis_ncu" -eq 1 ]]; then
     export BK_GPU_MLP_ARTIFACT_MODE="${BK_GPU_MLP_ARTIFACT_MODE:-ncu}"
+    export BK_GPU_LIGHTGBM_ARTIFACT_MODE="${BK_GPU_LIGHTGBM_ARTIFACT_MODE:-ncu}"
     echo "GPU MLP estimator artifact mode: ${BK_GPU_MLP_ARTIFACT_MODE}"
+    echo "GPU LightGBM estimator artifact mode: ${BK_GPU_LIGHTGBM_ARTIFACT_MODE}"
   elif [[ "$use_qws_example" -eq 1 ]]; then
     input_csv="${BK_GPU_MLP_INPUT_CSV:-${root}/MLP_NN/examples/example_input_mixed-src_20kernels.csv}"
     if [[ ! -f "$input_csv" ]]; then
diff --git a/scripts/estimation/section_packages/gpu_kernel_lightgbm_v10.sh b/scripts/estimation/section_packages/gpu_kernel_lightgbm_v10.sh
new file mode 100644
index 0000000..4f9863d
--- /dev/null
+++ b/scripts/estimation/section_packages/gpu_kernel_lightgbm_v10.sh
@@ -0,0 +1,543 @@
+#!/bin/bash
+# gpu_kernel_lightgbm_v10.sh - Section package for PerfTools LightGBM_model/1.0.
+
+bk_section_package_metadata_gpu_kernel_lightgbm_v10() {
+  cat <<'EOF'
+{
+  "name": "gpu_kernel_lightgbm_v10",
+  "fallback_target": "identity",
+  "source_system_scope": {
+    "kind": "benchmark_system",
+    "accepted_values": ["any"]
+  },
+  "target_system_scope": {
+    "accepted_values": ["any"]
+  },
+  "item_kind_scope": ["section"],
+  "required_result_fields": ["name", "time or bench_time"],
+  "required_artifact_kinds": [
+    "PerfTools LightGBM_model/1.0 compatible NCU CSV",
+    "precomputed prediction CSV",
+    "or BenchKit padata archive with Nsight Compute raw CSV"
+  ],
+  "acquisition_mode": "external",
+  "output_fields": [
+    "time",
+    "bench_time",
+    "scaling_method",
+    "metrics",
+    "package_applicability"
+  ],
+  "not_applicable_when": [
+    "item kind is not section",
+    "neither section artifact nor BK_GPU_LIGHTGBM_INPUT_CSV/BK_GPU_LIGHTGBM_PREDICTION_CSV is available",
+    "padata artifact mode is requested but the archive has no Nsight Compute raw CSV",
+    "PerfTools checkout is not available when running the external predictor",
+    "Python runtime for CSV parsing or external inference is not available",
+    "prediction CSV does not contain a recognized execution-time column"
+  ]
+}
+EOF
+}
+
+_bk_gpu_lightgbm_section_key() {
+  local section_name="$1"
+  printf '%s' "$section_name" | tr '[:lower:]' '[:upper:]' | tr -c 'A-Z0-9' '_'
+}
+
+_bk_gpu_lightgbm_section_var() {
+  local prefix="$1"
+  local section_name="$2"
+  local key
+
+  key=$(_bk_gpu_lightgbm_section_key "$section_name")
+  printf '%s_%s\n' "$prefix" "$key"
+}
+
+_bk_gpu_lightgbm_env_value() {
+  local var_name="$1"
+  printf '%s\n' "${!var_name:-}"
+}
+
+_bk_gpu_lightgbm_perftools_root() {
+  printf '%s\n' "${BK_GPU_LIGHTGBM_PERFTOOLS_ROOT:-${BK_GPU_MLP_PERFTOOLS_ROOT:-${BK_PERFTOOLS_ROOT:-}}}"
+}
+
+_bk_gpu_lightgbm_model_dir() {
+  local root="$1"
+
+  if [[ -z "$root" ]]; then
+    printf '%s\n' ""
+    return 0
+  fi
+
+  printf '%s\n' "${root}/LightGBM_model/1.0"
+}
+
+_bk_gpu_lightgbm_predictor() {
+  local root="$1"
+  local model_dir
+
+  model_dir=$(_bk_gpu_lightgbm_model_dir "$root")
+  if [[ -z "$model_dir" ]]; then
+    printf '%s\n' ""
+    return 0
+  fi
+
+  printf '%s\n' "${model_dir}/AI_model/run_inference.py"
+}
+
+_bk_gpu_lightgbm_python_exists() {
+  local python_bin="$1"
+
+  if [[ "$python_bin" == */* ]]; then
+    [[ -x "$python_bin" ]]
+    return $?
+  fi
+
+  command -v "$python_bin" >/dev/null 2>&1
+}
+
+_bk_gpu_lightgbm_abs_path() {
+  local path="$1"
+  local dir
+  local base
+
+  if [[ -z "$path" ]]; then
+    printf '%s\n' ""
+    return 0
+  fi
+
+  if [[ "$path" == /* ]]; then
+    printf '%s\n' "$path"
+    return 0
+  fi
+
+  dir=$(dirname "$path")
+  base=$(basename "$path")
+  if [[ -d "$dir" ]]; then
+    (cd "$dir" && printf '%s/%s\n' "$PWD" "$base")
+  else
+    printf '%s/%s\n' "$PWD" "$path"
+  fi
+}
+
+_bk_gpu_lightgbm_first_artifact_path() {
+  local item_json="$1"
+
+  echo "$item_json" | jq -r '(.artifacts // [])[0].path // empty'
+}
+
+_bk_gpu_lightgbm_artifact_mode() {
+  case "${BK_GPU_LIGHTGBM_ARTIFACT_MODE:-input}" in
+    ncu|padata|profiler|profile) printf 'ncu\n' ;;
+    prediction) printf 'prediction\n' ;;
+    *) printf 'input\n' ;;
+  esac
+}
+
+_bk_gpu_lightgbm_resolve_section_input_csv() {
+  local item_json="$1"
+  local section_name="$2"
+  local scoped_var
+  local value
+  local artifact_path
+
+  scoped_var=$(_bk_gpu_lightgbm_section_var "BK_GPU_LIGHTGBM_INPUT_CSV" "$section_name")
+  value=$(_bk_gpu_lightgbm_env_value "$scoped_var")
+  if [[ -n "$value" ]]; then
+    printf '%s\n' "$value"
+    return 0
+  fi
+
+  if [[ -n "${BK_GPU_LIGHTGBM_INPUT_CSV:-}" ]]; then
+    printf '%s\n' "$BK_GPU_LIGHTGBM_INPUT_CSV"
+    return 0
+  fi
+
+  artifact_path=$(_bk_gpu_lightgbm_first_artifact_path "$item_json")
+  if [[ -n "$artifact_path" && "$(_bk_gpu_lightgbm_artifact_mode)" == "input" ]]; then
+    printf '%s\n' "$artifact_path"
+    return 0
+  fi
+
+  printf '%s\n' ""
+}
+
+_bk_gpu_lightgbm_resolve_section_ncu_archive() {
+  local item_json="$1"
+  local section_name="$2"
+  local scoped_var
+  local value
+  local artifact_path
+
+  scoped_var=$(_bk_gpu_lightgbm_section_var "BK_GPU_LIGHTGBM_NCU_ARCHIVE" "$section_name")
+  value=$(_bk_gpu_lightgbm_env_value "$scoped_var")
+  if [[ -n "$value" ]]; then
+    printf '%s\n' "$value"
+    return 0
+  fi
+
+  if [[ -n "${BK_GPU_LIGHTGBM_NCU_ARCHIVE:-}" ]]; then
+    printf '%s\n' "$BK_GPU_LIGHTGBM_NCU_ARCHIVE"
+    return 0
+  fi
+
+  artifact_path=$(_bk_gpu_lightgbm_first_artifact_path "$item_json")
+  if [[ -n "$artifact_path" ]]; then
+    case "$(_bk_gpu_lightgbm_artifact_mode):${artifact_path}" in
+      ncu:*|*:*.tgz|*:*.tar.gz)
+        printf '%s\n' "$artifact_path"
+        return 0
+        ;;
+    esac
+  fi
+
+  printf '%s\n' ""
+}
+
+_bk_gpu_lightgbm_resolve_section_prediction_csv() {
+  local item_json="$1"
+  local section_name="$2"
+  local scoped_var
+  local value
+  local artifact_path
+
+  scoped_var=$(_bk_gpu_lightgbm_section_var "BK_GPU_LIGHTGBM_PREDICTION_CSV" "$section_name")
+  value=$(_bk_gpu_lightgbm_env_value "$scoped_var")
+  if [[ -n "$value" ]]; then
+    printf '%s\n' "$value"
+    return 0
+  fi
+
+  if [[ -n "${BK_GPU_LIGHTGBM_PREDICTION_CSV:-}" ]]; then
+    printf '%s\n' "$BK_GPU_LIGHTGBM_PREDICTION_CSV"
+    return 0
+  fi
+
+  artifact_path=$(_bk_gpu_lightgbm_first_artifact_path "$item_json")
+  if [[ -n "$artifact_path" && "$(_bk_gpu_lightgbm_artifact_mode)" == "prediction" ]]; then
+    printf '%s\n' "$artifact_path"
+    return 0
+  fi
+
+  printf '%s\n' ""
+}
+
+_bk_gpu_lightgbm_section_slug() {
+  local section_name="$1"
+  printf '%s_%s_%s' "${est_code:-unknown}" "$section_name" "${est_uuid:-local}" |
+    tr -c 'A-Za-z0-9._-' '_'
+}
+
+bk_section_package_check_applicability_gpu_kernel_lightgbm_v10() {
+  local item_json="$1"
+  local item_kind="$2"
+  local section_name
+  local prediction_csv
+  local input_csv
+  local ncu_archive
+  local root
+  local predictor
+  local python_bin="${BK_GPU_LIGHTGBM_PYTHON:-${BK_GPU_MLP_PYTHON:-python3}}"
+  local missing=()
+
+  if [[ "$item_kind" != "section" ]]; then
+    cat <<'EOF'
+{"status":"not_applicable","missing_inputs":["item_kind:section_required"]}
+EOF
+    return 1
+  fi
+
+  section_name=$(echo "$item_json" | jq -r '.name // "gpu_section"')
+  prediction_csv=$(_bk_gpu_lightgbm_resolve_section_prediction_csv "$item_json" "$section_name")
+  input_csv=$(_bk_gpu_lightgbm_resolve_section_input_csv "$item_json" "$section_name")
+  ncu_archive=$(_bk_gpu_lightgbm_resolve_section_ncu_archive "$item_json" "$section_name")
+
+  if ! _bk_gpu_lightgbm_python_exists "$python_bin"; then
+    missing+=("\"python:${python_bin}\"")
+  fi
+
+  if [[ -n "$prediction_csv" ]]; then
+    if [[ ! -f "$prediction_csv" ]]; then
+      missing+=("\"prediction_csv:${prediction_csv}\"")
+    fi
+  else
+    root=$(_bk_gpu_lightgbm_perftools_root)
+    predictor=$(_bk_gpu_lightgbm_predictor "$root")
+
+    if [[ -z "$input_csv" && -z "$ncu_archive" ]]; then
+      missing+=('"gpu_lightgbm_input_csv"')
+    fi
+    if [[ -n "$input_csv" && ! -f "$input_csv" ]]; then
+      missing+=("\"input_csv:${input_csv}\"")
+    fi
+    if [[ -n "$ncu_archive" && ! -f "$ncu_archive" ]]; then
+      missing+=("\"ncu_archive:${ncu_archive}\"")
+    fi
+    if [[ -z "$root" || ! -d "$root" ]]; then
+      missing+=('"BK_GPU_LIGHTGBM_PERFTOOLS_ROOT"')
+    fi
+    if [[ -z "$predictor" || ! -f "$predictor" ]]; then
+      missing+=('"PerfTools LightGBM_model/1.0/AI_model/run_inference.py"')
+    fi
+  fi
+
+  if (( ${#missing[@]} > 0 )); then
+    printf '{"status":"not_applicable","missing_inputs":[%s]}\n' "$(IFS=,; echo "${missing[*]}")"
+    return 1
+  fi
+
+  cat <<'EOF'
+{"status":"applicable","missing_inputs":[]}
+EOF
+}
+
+_bk_gpu_lightgbm_parse_prediction_csv() {
+  local prediction_csv="$1"
+  local package_name="$2"
+  local model_version="$3"
+  local python_bin="${BK_GPU_LIGHTGBM_PYTHON:-${BK_GPU_MLP_PYTHON:-python3}}"
+
+  "$python_bin" - "$prediction_csv" "$package_name" "$model_version" <<'PY'
+import csv
+import json
+import math
+import sys
+
+prediction_csv, package_name, model_version = sys.argv[1:4]
+
+time_columns = [
+    "O-Execution Time",
+    "O-Execution Time [ns]",
+    "Execution Time [ns]",
+    "Predicted Execution Time [ns]",
+    "predicted_execution_time_ns",
+]
+name_columns = ["meta-kernel", "kernel_name", "Kernel Name", "kernel", "Kernel", "name", "Name"]
+source_columns = ["meta-src_gpu", "src_gpu", "source_gpu"]
+target_columns = ["meta-tgt_gpu", "tgt_gpu", "target_gpu"]
+metric_columns = [
+    "O-Memory Throughput [%]",
+    "O-Achieved Occupancy",
+    "O-breakdown_memory",
+    "O-breakdown_pipeline_contention",
+    "O-breakdown_sync",
+    "O-breakdown_scheduling_overhead",
+]
+
+
+def cleaned_lines(path):
+    with open(path, newline="", encoding="utf-8-sig") as handle:
+        for line in handle:
+            if not line.strip() or line.lstrip().startswith("#"):
+                continue
+            yield line
+
+
+def as_number(value):
+    if value is None or value == "":
+        return None
+    try:
+        number = float(value)
+    except ValueError:
+        return None
+    if math.isnan(number) or math.isinf(number):
+        return None
+    return number
+
+
+reader = csv.DictReader(cleaned_lines(prediction_csv))
+if not reader.fieldnames:
+    raise SystemExit(f"prediction CSV has no header: {prediction_csv}")
+
+time_column = next((col for col in time_columns if col in reader.fieldnames), None)
+if time_column is None:
+    raise SystemExit(
+        "prediction CSV does not contain a supported execution-time column: "
+        + ", ".join(time_columns)
+    )
+
+kernels = []
+source_gpus = []
+target_gpus = []
+total_seconds = 0.0
+
+for idx, row in enumerate(reader, start=1):
+    predicted_ns = as_number(row.get(time_column))
+    if predicted_ns is None:
+        raise SystemExit(f"row {idx} has no numeric predicted execution time in {time_column}")
+
+    raw_name = next(((row.get(col) or "").strip() for col in name_columns if (row.get(col) or "").strip()), "")
+    source_gpu = next(((row.get(col) or "").strip() for col in source_columns if (row.get(col) or "").strip()), "")
+    target_gpu = next(((row.get(col) or "").strip() for col in target_columns if (row.get(col) or "").strip()), "")
+    if source_gpu:
+        source_gpus.append(source_gpu)
+    if target_gpu:
+        target_gpus.append(target_gpu)
+
+    seconds = predicted_ns / 1e9
+    total_seconds += seconds
+
+    metrics = {
+        key: as_number(row.get(key))
+        for key in metric_columns
+        if key in row and as_number(row.get(key)) is not None
+    }
+    kernel = {
+        "name": raw_name or f"kernel_{idx}",
+        "predicted_time_ns": predicted_ns,
+        "predicted_time": seconds,
+    }
+    if source_gpu:
+        kernel["source_gpu"] = source_gpu
+    if target_gpu:
+        kernel["target_gpu"] = target_gpu
+    if metrics:
+        kernel["metrics"] = metrics
+    kernels.append(kernel)
+
+print(json.dumps({
+    "time": total_seconds,
+    "metrics": {
+        "kernel_count": len(kernels),
+        "time_column": time_column,
+        "total_predicted_time_ns": total_seconds * 1e9,
+        "source_gpus": sorted(set(source_gpus)),
+        "target_gpus": sorted(set(target_gpus)),
+        "kernels": kernels,
+    },
+    "package_applicability": {
+        "status": "applicable",
+        "missing_inputs": [],
+    },
+    "model": {
+        "type": "cross_gpu_kernel_prediction_model",
+        "name": "PerfTools LightGBM_model/1.0",
+        "version": model_version,
+        "repository": "https://github.com/masaaki-kondo/PerfTools",
+    },
+    "estimation_package": package_name,
+}))
+PY
+}
+
+_bk_gpu_lightgbm_prepare_input_from_ncu() {
+  local ncu_archive="$1"
+  local _section_name="$2"
+  local output_dir="$3"
+  local slug="$4"
+  local python_bin="${BK_GPU_LIGHTGBM_PYTHON:-${BK_GPU_MLP_PYTHON:-python3}}"
+  local source_gpu="${BK_GPU_LIGHTGBM_SOURCE_GPU:-${BK_GPU_MLP_SOURCE_GPU:-H100}}"
+  local prepared_csv="${output_dir}/${slug}_lightgbm_input.csv"
+  local script_path="scripts/estimation/prepare_gpu_lightgbm_ncu_input.py"
+  local archive_abs
+  local prepared_abs
+
+  archive_abs=$(_bk_gpu_lightgbm_abs_path "$ncu_archive")
+  prepared_abs=$(_bk_gpu_lightgbm_abs_path "$prepared_csv")
+
+  "$python_bin" "$script_path" \
+    --padata "$archive_abs" \
+    --source-gpu "$source_gpu" \
+    --out-csv "$prepared_abs" >&2
+
+  printf '%s\n' "$prepared_csv"
+}
+
+_bk_gpu_lightgbm_run_predictor() {
+  local item_json="$1"
+  local section_name="$2"
+  local root
+  local model_dir
+  local input_csv
+  local ncu_archive
+  local output_dir="${BK_GPU_LIGHTGBM_OUTPUT_DIR:-results/estimation_artifacts/gpu_kernel_lightgbm_v10}"
+  local prediction_csv
+  local prediction_log
+  local input_csv_abs
+  local prediction_csv_abs
+  local prediction_log_abs
+  local python_bin="${BK_GPU_LIGHTGBM_PYTHON:-${BK_GPU_MLP_PYTHON:-python3}}"
+  local source_gpu="${BK_GPU_LIGHTGBM_SOURCE_GPU:-${BK_GPU_MLP_SOURCE_GPU:-H100}}"
+  local target_gpu="${BK_GPU_LIGHTGBM_TARGET_GPU:-${BK_GPU_MLP_TARGET_GPU:-A100}}"
+  local slug
+
+  root=$(_bk_gpu_lightgbm_perftools_root)
+  model_dir=$(_bk_gpu_lightgbm_model_dir "$root")
+  input_csv=$(_bk_gpu_lightgbm_resolve_section_input_csv "$item_json" "$section_name")
+  ncu_archive=$(_bk_gpu_lightgbm_resolve_section_ncu_archive "$item_json" "$section_name")
+  slug=$(_bk_gpu_lightgbm_section_slug "$section_name")
+
+  mkdir -p "$output_dir"
+  if [[ -z "$input_csv" && -n "$ncu_archive" ]]; then
+    input_csv=$(_bk_gpu_lightgbm_prepare_input_from_ncu "$ncu_archive" "$section_name" "$output_dir" "$slug")
+  fi
+
+  prediction_csv="${output_dir}/${slug}_pred.csv"
+  prediction_log="${output_dir}/${slug}.log"
+  input_csv_abs=$(_bk_gpu_lightgbm_abs_path "$input_csv")
+  prediction_csv_abs=$(_bk_gpu_lightgbm_abs_path "$prediction_csv")
+  prediction_log_abs=$(_bk_gpu_lightgbm_abs_path "$prediction_log")
+
+  (
+    cd "$model_dir"
+    "$python_bin" AI_model/run_inference.py \
+      --src_gpu="$source_gpu" \
+      --tgt_gpu="$target_gpu" \
+      --ncu_csv="$input_csv_abs" \
+      --out="$prediction_csv_abs" \
+      --log="$prediction_log_abs"
+  ) >/dev/null
+
+  printf '%s\t%s\t%s\n' "$prediction_csv" "$input_csv" "$prediction_log"
+}
+
+bk_section_package_transform_gpu_kernel_lightgbm_v10() {
+  local item_json="$1"
+  local _target_nodes="$2"
+  local _bench_nodes="$3"
+  local _default_factor="$4"
+  local _item_kind="$5"
+  local section_name
+  local prediction_csv
+  local input_csv=""
+  local prediction_log=""
+  local run_outputs
+  local parsed_json
+  local package_name="gpu_kernel_lightgbm_v10"
+  local model_version="${BK_GPU_LIGHTGBM_MODEL_VERSION:-1.0}"
+
+  section_name=$(echo "$item_json" | jq -r '.name // "gpu_section"')
+  prediction_csv=$(_bk_gpu_lightgbm_resolve_section_prediction_csv "$item_json" "$section_name")
+
+  if [[ -z "$prediction_csv" ]]; then
+    run_outputs=$(_bk_gpu_lightgbm_run_predictor "$item_json" "$section_name") || return 1
+    IFS=$'\t' read -r prediction_csv input_csv prediction_log <<< "$run_outputs"
+  fi
+
+  parsed_json=$(_bk_gpu_lightgbm_parse_prediction_csv "$prediction_csv" "$package_name" "$model_version")
+
+  echo "$item_json" | jq -c \
+    --arg prediction_csv "$prediction_csv" \
+    --arg input_csv "$input_csv" \
+    --arg prediction_log "$prediction_log" \
+    --argjson parsed "$parsed_json" '
+    .
+    + {
+        time: $parsed.time,
+        bench_time: (.bench_time // .time // null),
+        scaling_method: "gpu-kernel-lightgbm-v1.0",
+        estimation_package: $parsed.estimation_package,
+        package_applicability: $parsed.package_applicability,
+        model: $parsed.model,
+        metrics: $parsed.metrics
+      }
+    | .artifacts = (
+        (.artifacts // [])
+        + [{kind: "gpu_lightgbm_prediction_csv", path: $prediction_csv}]
+        + (if $input_csv != "" then [{kind: "gpu_lightgbm_input_csv", path: $input_csv}] else [] end)
+        + (if $prediction_log != "" then [{kind: "gpu_lightgbm_log", path: $prediction_log}] else [] end)
+      )
+  '
+}
diff --git a/scripts/test_estimate_submit.sh b/scripts/test_estimate_submit.sh
index e605ce7..1d4d2ac 100644
--- a/scripts/test_estimate_submit.sh
+++ b/scripts/test_estimate_submit.sh
@@ -90,6 +90,7 @@ rm -rf results
 mkdir -p results
 
 export BK_GENESIS_GPU_MLP_PROFILE="\${BK_GENESIS_GPU_MLP_PROFILE:-true}"
+export BK_GENESIS_GPU_SECTION_PACKAGE="\${BK_GENESIS_GPU_SECTION_PACKAGE:-gpu_kernel_lightgbm_v10}"
 export BK_GPU_MLP_NCU_LAUNCH_COUNT="\${BK_GPU_MLP_NCU_LAUNCH_COUNT:-20}"
 export BK_GPU_MLP_SOURCE_GPU="\${BK_GPU_MLP_SOURCE_GPU:-H100}"
 export BK_GPU_MLP_KERNEL_COUNT="\${BK_GPU_MLP_KERNEL_COUNT:-20}"
diff --git a/scripts/tests/test_bk_profiler.sh b/scripts/tests/test_bk_profiler.sh
index 3fe1b1a..79e4339 100644
--- a/scripts/tests/test_bk_profiler.sh
+++ b/scripts/tests/test_bk_profiler.sh
@@ -170,12 +170,23 @@ bk_profiler ncu --level single --archive "$ncu_archive" --raw-dir "$ncu_raw" --
 mkdir -p "$ncu_extract"
 tar -xzf "$ncu_archive" -C "$ncu_extract"
 test -f "${ncu_extract}/bk_profiler_artifact/meta.json"
-test -f "${ncu_extract}/bk_profiler_artifact/raw/rep1/profile.ncu-rep"
+test ! -f "${ncu_extract}/bk_profiler_artifact/raw/rep1/profile.ncu-rep"
 test -f "${ncu_extract}/bk_profiler_artifact/reports/ncu_import_rep1.txt"
 grep -q '"tool": "ncu"' "${ncu_extract}/bk_profiler_artifact/meta.json"
-grep -q '"kind": "ncu_report"' "${ncu_extract}/bk_profiler_artifact/meta.json"
+if grep -q '"kind": "ncu_report"' "${ncu_extract}/bk_profiler_artifact/meta.json"; then exit 1; fi
 grep -q '"ncu_options": \["--target-processes", "all", "--set", "basic", "--launch-count", "1"\]' "${ncu_extract}/bk_profiler_artifact/meta.json"
 
+ncu_report_archive="${TMP_DIR}/ncu_report.tgz"
+ncu_report_extract="${TMP_DIR}/ncu_report_extract"
+ncu_report_raw="${TMP_DIR}/ncu_report_pa"
+export BK_PROFILER_ARCHIVE_NCU_REPORT=true
+bk_profiler ncu --level single --archive "$ncu_report_archive" --raw-dir "$ncu_report_raw" -- bash -c 'printf "ncu report target\n"'
+unset BK_PROFILER_ARCHIVE_NCU_REPORT
+mkdir -p "$ncu_report_extract"
+tar -xzf "$ncu_report_archive" -C "$ncu_report_extract"
+test -f "${ncu_report_extract}/bk_profiler_artifact/raw/rep1/profile.ncu-rep"
+grep -q '"kind": "ncu_report"' "${ncu_report_extract}/bk_profiler_artifact/meta.json"
+
 ncu_detailed_archive="${TMP_DIR}/ncu_detailed.tgz"
 ncu_detailed_extract="${TMP_DIR}/ncu_detailed_extract"
 ncu_detailed_raw="${TMP_DIR}/ncu_detailed_pa"
@@ -193,7 +204,9 @@ unset BK_PROFILER_NCU_RAW_CSV
 mkdir -p "$ncu_raw_csv_extract"
 tar -xzf "$ncu_raw_csv_archive" -C "$ncu_raw_csv_extract"
 test -f "${ncu_raw_csv_extract}/bk_profiler_artifact/raw/rep1/profile_raw.csv"
+test ! -f "${ncu_raw_csv_extract}/bk_profiler_artifact/raw/rep1/profile.ncu-rep"
 grep -q '"kind": "ncu_raw_csv"' "${ncu_raw_csv_extract}/bk_profiler_artifact/meta.json"
+if grep -q '"kind": "ncu_report"' "${ncu_raw_csv_extract}/bk_profiler_artifact/meta.json"; then exit 1; fi
 
 fapp_fail_archive="${TMP_DIR}/fapp_fail.tgz"
 fapp_fail_extract="${TMP_DIR}/fapp_fail_extract"
@@ -226,6 +239,6 @@ test "$ncu_fail_status" -eq 42
 mkdir -p "$ncu_fail_extract"
 tar -xzf "$ncu_fail_archive" -C "$ncu_fail_extract"
 test -f "${ncu_fail_extract}/bk_profiler_artifact/meta.json"
-test -f "${ncu_fail_extract}/bk_profiler_artifact/raw/rep1/profile.ncu-rep"
+test ! -f "${ncu_fail_extract}/bk_profiler_artifact/raw/rep1/profile.ncu-rep"
 
 echo "bk_profiler tests passed"
diff --git a/scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh b/scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh
new file mode 100644
index 0000000..bf1be54
--- /dev/null
+++ b/scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh
@@ -0,0 +1,74 @@
+#!/bin/bash
+set -euo pipefail
+
+SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
+REPO_DIR=$(cd "${SCRIPT_DIR}/../.." && pwd)
+
+TMP_DIR=$(mktemp -d)
+trap 'rm -rf "${TMP_DIR}"' EXIT
+
+if ! command -v jq >/dev/null 2>&1; then
+  echo "jq not found; skipping gpu_kernel_lightgbm_v10 estimation test"
+  exit 0
+fi
+if ! command -v python3 >/dev/null 2>&1; then
+  echo "python3 not found; skipping gpu_kernel_lightgbm_v10 estimation test"
+  exit 0
+fi
+
+cat > "${TMP_DIR}/lightgbm_pred.csv" <<'EOF'
+meta-kernel,meta-src_gpu,meta-tgt_gpu,O-Execution Time,O-Memory Throughput [%],O-Achieved Occupancy,O-breakdown_memory,O-breakdown_pipeline_contention,O-breakdown_sync,O-breakdown_scheduling_overhead
+kern_inter,H100,A100,1000,51.5,70.1,0.5,0.2,0.1,0.2
+kern_intra,H100,A100,2000,48.0,69.0,0.4,0.3,0.1,0.2
+EOF
+
+cat > "${TMP_DIR}/breakdown.json" <<EOF
+{
+  "sections": [
+    {
+      "name": "gpu_kernel_region",
+      "bench_time": 0.009,
+      "estimation_package": "gpu_kernel_lightgbm_v10",
+      "artifacts": [
+        {"path": "${TMP_DIR}/lightgbm_pred.csv"}
+      ]
+    },
+    {
+      "name": "cpu_tail",
+      "bench_time": 0.001,
+      "estimation_package": "identity"
+    }
+  ],
+  "overlaps": []
+}
+EOF
+
+pushd "${REPO_DIR}" >/dev/null
+source scripts/estimation/common.sh
+source scripts/estimation/packages/instrumented_app_sections_dummy.sh
+
+export BK_GPU_LIGHTGBM_ARTIFACT_MODE="prediction"
+export BK_GPU_LIGHTGBM_PYTHON="python3"
+
+transformed=$(bk_top_level_transform_breakdown "$(cat "${TMP_DIR}/breakdown.json")" "1" "1" "1" "identity" "identity")
+popd >/dev/null
+
+echo "$transformed" | jq -e '
+  (.sections | length == 2) and
+  .sections[0].name == "gpu_kernel_region" and
+  .sections[0].time == 0.000003 and
+  .sections[0].bench_time == 0.009 and
+  .sections[0].scaling_method == "gpu-kernel-lightgbm-v1.0" and
+  .sections[0].estimation_package == "gpu_kernel_lightgbm_v10" and
+  .sections[0].package_applicability.status == "applicable" and
+  .sections[0].metrics.kernel_count == 2 and
+  .sections[0].metrics.time_column == "O-Execution Time" and
+  .sections[0].metrics.source_gpus == ["H100"] and
+  .sections[0].metrics.target_gpus == ["A100"] and
+  .sections[0].metrics.kernels[0].name == "kern_inter" and
+  .sections[0].metrics.kernels[0].metrics."O-Memory Throughput [%]" == 51.5 and
+  .sections[0].artifacts[-1].kind == "gpu_lightgbm_prediction_csv" and
+  .sections[1].time == 0.001
+' >/dev/null
+
+echo "gpu_kernel_lightgbm_v10 section estimation test passed"
diff --git a/scripts/tests/test_genesis_gpu_mlp_estimation.sh b/scripts/tests/test_genesis_gpu_mlp_estimation.sh
index e28c40b..7acb476 100644
--- a/scripts/tests/test_genesis_gpu_mlp_estimation.sh
+++ b/scripts/tests/test_genesis_gpu_mlp_estimation.sh
@@ -28,9 +28,14 @@ grep -q 'profiler archive was not found' results/no_archive.err
 
 touch results/padata0.tgz
 genesis_emit_estimation_data_from_fom 10 > results/with_archive.result
-grep -q '^SECTION:gpu_kernel_region ' results/with_archive.result
+grep -q '^SECTION:gpu_kernel_region time:10 estimation_package:gpu_kernel_lightgbm_v10 ' results/with_archive.result
 grep -q 'artifact:results/padata0.tgz' results/with_archive.result
 
+BK_GENESIS_GPU_SECTION_PACKAGE=gpu_kernel_mlp_v15 \
+  bash -c 'source programs/genesis/estimate.sh; genesis_emit_estimation_data_from_fom 10' \
+  > results/with_mlp_archive.result
+grep -q '^SECTION:gpu_kernel_region time:10 estimation_package:gpu_kernel_mlp_v15 ' results/with_mlp_archive.result
+
 mkdir -p genesis_benchmark_input/npt/genesis2.0beta_3.5fs/apoa1
 GENESIS_BENCHKIT_ROOT="$PWD" \
   bash -c 'source programs/genesis/estimate.sh; cd genesis_benchmark_input/npt/genesis2.0beta_3.5fs/apoa1; export BK_GENESIS_GPU_MLP_PROFILE=true; genesis_emit_estimation_data_from_fom 10' \

From da13e3affa2ce7357f5fccbe56d438c4658b5788 Mon Sep 17 00:00:00 2001
From: Yoshifumi Nakamura <nakamura@riken.jp>
Date: Tue, 16 Jun 2026 22:44:11 +0900
Subject: [PATCH 2/4] Fail estimation when GPU predictors fail

Propagate section package transform failures through the top-level breakdown dispatcher so external predictor failures cannot produce a green estimate job with missing values.

Make the MLP and LightGBM predictor wrappers fail explicitly when inference exits non-zero or no prediction CSV is produced, and cover the LightGBM failure path in shell tests.

Signed-off-by: Yoshifumi Nakamura <nakamura@riken.jp>
---
 .../packages/top_level_package_common.sh      | 22 ++++++----
 .../gpu_kernel_lightgbm_v10.sh                | 12 +++++-
 .../section_packages/gpu_kernel_mlp_v15.sh    | 12 +++++-
 ...test_estimation_gpu_kernel_lightgbm_v10.sh | 40 +++++++++++++++++++
 4 files changed, 75 insertions(+), 11 deletions(-)
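
Note on the dispatcher change: when a transform's output is captured inside an array append and nothing inspects the result, the loop keeps going and stores an empty element. The sketch below illustrates the difference between that pattern and the assign-then-check pattern adopted in this patch. `fake_transform`, `collect_unchecked` and `collect_checked` are hypothetical names used only for illustration; they are not BenchKit functions.

```bash
#!/bin/bash
set +e  # keep the demo running after the simulated failure

# Hypothetical stand-in for a section package transform whose predictor fails.
fake_transform() {
  echo "ERROR: predictor failed" >&2
  return 1
}

# Unchecked pattern: nothing inspects the result, so the function carries on
# and an empty element ends up in the output array.
collect_unchecked() {
  local out=()
  out+=("$(fake_transform 2>/dev/null)")
  echo "unchecked: stored ${#out[@]} element(s), first='${out[0]}'"
}

# Checked pattern: assign first, test the status, and only then append.
collect_checked() {
  local out=()
  local item
  if ! item=$(fake_transform 2>/dev/null); then
    echo "checked: transform failed, nothing stored"
    return 1
  fi
  out+=("$item")
  echo "checked: stored ${#out[@]} element(s)"
}

collect_unchecked
collect_checked || echo "caller saw the failure"
```

The checked variant is what lets the estimate job turn red instead of emitting a breakdown with missing values.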

diff --git a/scripts/estimation/packages/top_level_package_common.sh b/scripts/estimation/packages/top_level_package_common.sh
index 49d4693..00ac4df 100644
--- a/scripts/estimation/packages/top_level_package_common.sh
+++ b/scripts/estimation/packages/top_level_package_common.sh
@@ -100,6 +100,7 @@ bk_top_level_dispatch_bound_item() {
   local fallback_target
   local check_result
   local missing_inputs_json
+  local transformed_item
 
   package_name=$(echo "$item_json" | jq -r '.estimation_package // empty')
   if [[ -z "$package_name" ]]; then
@@ -122,7 +123,11 @@ bk_top_level_dispatch_bound_item() {
       check_result=$(bk_top_level_unsupported_bound_package_result "$package_name" "$item_kind")
     fi
     if declare -F "$fn_name" >/dev/null 2>&1 && [[ "$(echo "$check_result" | jq -r '.status // "not_applicable"')" == "applicable" ]]; then
-      "$fn_name" "$item_json" "$target_nodes" "$bench_nodes" "$default_factor" "$item_kind"
+      if ! transformed_item=$("$fn_name" "$item_json" "$target_nodes" "$bench_nodes" "$default_factor" "$item_kind"); then
+        echo "ERROR: section package ${package_name} failed for ${item_kind}" >&2
+        return 1
+      fi
+      printf '%s\n' "$transformed_item"
       return 0
     fi
 
@@ -163,6 +168,7 @@ bk_top_level_transform_breakdown() {
   local sections_out=()
   local overlaps_out=()
   local item_json
+  local transformed_item
 
   if [[ -z "$breakdown_json" || "$breakdown_json" == "null" ]]; then
     echo ""
@@ -171,16 +177,18 @@ bk_top_level_transform_breakdown() {
 
   while IFS= read -r item_json; do
     [[ -z "$item_json" ]] && continue
-    sections_out+=("$(
-      bk_top_level_dispatch_bound_item "$item_json" "$target_nodes" "$bench_nodes" "$default_factor" "section" "$default_section_package"
-    )")
+    if ! transformed_item=$(bk_top_level_dispatch_bound_item "$item_json" "$target_nodes" "$bench_nodes" "$default_factor" "section" "$default_section_package"); then
+      return 1
+    fi
+    sections_out+=("$transformed_item")
   done < <(echo "$breakdown_json" | jq -c '.sections // [] | .[]')
 
   while IFS= read -r item_json; do
     [[ -z "$item_json" ]] && continue
-    overlaps_out+=("$(
-      bk_top_level_dispatch_bound_item "$item_json" "$target_nodes" "$bench_nodes" "$default_factor" "overlap" "$default_overlap_package"
-    )")
+    if ! transformed_item=$(bk_top_level_dispatch_bound_item "$item_json" "$target_nodes" "$bench_nodes" "$default_factor" "overlap" "$default_overlap_package"); then
+      return 1
+    fi
+    overlaps_out+=("$transformed_item")
   done < <(echo "$breakdown_json" | jq -c '.overlaps // [] | .[]')
 
   jq -cn \
diff --git a/scripts/estimation/section_packages/gpu_kernel_lightgbm_v10.sh b/scripts/estimation/section_packages/gpu_kernel_lightgbm_v10.sh
index 4f9863d..161bffe 100644
--- a/scripts/estimation/section_packages/gpu_kernel_lightgbm_v10.sh
+++ b/scripts/estimation/section_packages/gpu_kernel_lightgbm_v10.sh
@@ -480,7 +480,7 @@ _bk_gpu_lightgbm_run_predictor() {
   prediction_csv_abs=$(_bk_gpu_lightgbm_abs_path "$prediction_csv")
   prediction_log_abs=$(_bk_gpu_lightgbm_abs_path "$prediction_log")
 
-  (
+  if ! (
     cd "$model_dir"
     "$python_bin" AI_model/run_inference.py \
       --src_gpu="$source_gpu" \
@@ -488,7 +488,15 @@ _bk_gpu_lightgbm_run_predictor() {
       --ncu_csv="$input_csv_abs" \
       --out="$prediction_csv_abs" \
       --log="$prediction_log_abs"
-  ) >/dev/null
+  ) >/dev/null; then
+    echo "ERROR: PerfTools LightGBM_model/1.0 inference failed" >&2
+    return 1
+  fi
+
+  if [[ ! -s "$prediction_csv_abs" ]]; then
+    echo "ERROR: PerfTools LightGBM_model/1.0 did not create prediction CSV: ${prediction_csv_abs}" >&2
+    return 1
+  fi
 
   printf '%s\t%s\t%s\n' "$prediction_csv" "$input_csv" "$prediction_log"
 }
diff --git a/scripts/estimation/section_packages/gpu_kernel_mlp_v15.sh b/scripts/estimation/section_packages/gpu_kernel_mlp_v15.sh
index 93b826a..18b0fda 100644
--- a/scripts/estimation/section_packages/gpu_kernel_mlp_v15.sh
+++ b/scripts/estimation/section_packages/gpu_kernel_mlp_v15.sh
@@ -469,14 +469,22 @@ _bk_gpu_mlp_run_predictor() {
   prediction_csv_abs=$(_bk_gpu_mlp_abs_existing_path "$prediction_csv")
   prediction_log_abs=$(_bk_gpu_mlp_abs_existing_path "$prediction_log")
 
-  (
+  if ! (
     cd "$root"
     "$python_bin" MLP_NN/v1.5/predict_v15.py \
       --csv "$input_csv_abs" \
       --row "${BK_GPU_MLP_ROW:-all}" \
       --out "$prediction_csv_abs" \
       --log "$prediction_log_abs"
-  ) >/dev/null
+  ) >/dev/null; then
+    echo "ERROR: PerfTools MLP_NN/v1.5 inference failed" >&2
+    return 1
+  fi
+
+  if [[ ! -s "$prediction_csv_abs" ]]; then
+    echo "ERROR: PerfTools MLP_NN/v1.5 did not create prediction CSV: ${prediction_csv_abs}" >&2
+    return 1
+  fi
 
   printf '%s\n' "$prediction_csv"
 }
diff --git a/scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh b/scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh
index bf1be54..d085ccb 100644
--- a/scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh
+++ b/scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh
@@ -71,4 +71,44 @@ echo "$transformed" | jq -e '
   .sections[1].time == 0.001
 ' >/dev/null
 
+FAKE_PERFTOOLS="${TMP_DIR}/PerfTools"
+mkdir -p "${FAKE_PERFTOOLS}/LightGBM_model/1.0/AI_model"
+cat > "${FAKE_PERFTOOLS}/LightGBM_model/1.0/AI_model/run_inference.py" <<'PY'
+raise SystemExit(7)
+PY
+
+cat > "${TMP_DIR}/input.csv" <<'EOF'
+Kernel Name,Duration [ns]
+probe_kernel,1000
+EOF
+
+cat > "${TMP_DIR}/breakdown_input.json" <<EOF
+{
+  "sections": [
+    {
+      "name": "gpu_kernel_region",
+      "bench_time": 0.011,
+      "estimation_package": "gpu_kernel_lightgbm_v10",
+      "artifacts": [
+        {"path": "${TMP_DIR}/input.csv"}
+      ]
+    }
+  ],
+  "overlaps": []
+}
+EOF
+
+pushd "${REPO_DIR}" >/dev/null
+export BK_GPU_LIGHTGBM_ARTIFACT_MODE="input"
+export BK_GPU_LIGHTGBM_PERFTOOLS_ROOT="${FAKE_PERFTOOLS}"
+export BK_GPU_LIGHTGBM_OUTPUT_DIR="${TMP_DIR}/lightgbm_outputs"
+if bk_top_level_transform_breakdown "$(cat "${TMP_DIR}/breakdown_input.json")" "1" "1" "1" "identity" "identity" >"${TMP_DIR}/lightgbm_unexpected.out" 2>"${TMP_DIR}/lightgbm_failure.err"; then
+  echo "expected failing LightGBM predictor to fail the transform" >&2
+  cat "${TMP_DIR}/lightgbm_unexpected.out" >&2
+  exit 1
+fi
+popd >/dev/null
+grep -q "PerfTools LightGBM_model/1.0 inference failed" "${TMP_DIR}/lightgbm_failure.err"
+grep -q "section package gpu_kernel_lightgbm_v10 failed" "${TMP_DIR}/lightgbm_failure.err"
+
 echo "gpu_kernel_lightgbm_v10 section estimation test passed"

From ca3ca67233dd405523a5973b732d164feb55f230 Mon Sep 17 00:00:00 2001
From: Yoshifumi Nakamura <nakamura@riken.jp>
Date: Tue, 16 Jun 2026 23:17:00 +0900
Subject: [PATCH 3/4] Normalize current weak-scaling section packages

Treat recorded/current weak-scaling breakdowns as identity/logp projections instead of carrying cross-system GPU section packages into the current side.

This removes misleading fallback metadata such as gpu_kernel_lightgbm_v10 -> identity from current_system while preserving the future-side GPU estimator result.

Signed-off-by: Yoshifumi Nakamura <nakamura@riken.jp>
---
 scripts/estimation/common.sh                  |  3 ++
 scripts/estimation/packages/weakscaling.sh    | 28 +++++++++++++--
 ...test_estimation_gpu_kernel_lightgbm_v10.sh | 34 +++++++++++++++++++
 3 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/scripts/estimation/common.sh b/scripts/estimation/common.sh
index e5c0fdb..1e9d817 100644
--- a/scripts/estimation/common.sh
+++ b/scripts/estimation/common.sh
@@ -291,6 +291,9 @@ bk_estimation_run_recorded_current_with_weakscaling() {
   if [[ -z "$baseline_breakdown" || "$baseline_breakdown" == "null" ]]; then
     baseline_breakdown="$est_input_fom_breakdown"
   fi
+  if declare -F bk_estimation_package_normalize_recorded_current_breakdown >/dev/null 2>&1; then
+    baseline_breakdown=$(bk_estimation_package_normalize_recorded_current_breakdown "$baseline_breakdown")
+  fi
 
   est_current_system="$baseline_system"
   est_current_target_nodes="$current_target_nodes"
diff --git a/scripts/estimation/packages/weakscaling.sh b/scripts/estimation/packages/weakscaling.sh
index 7249f90..6401d33 100644
--- a/scripts/estimation/packages/weakscaling.sh
+++ b/scripts/estimation/packages/weakscaling.sh
@@ -126,6 +126,30 @@ bk_estimation_package_check_applicability() {
   return 0
 }
 
+_bk_weakscaling_normalize_breakdown_packages() {
+  local breakdown_json="$1"
+
+  echo "$breakdown_json" | jq -c '
+    .
+    | .sections = ((.sections // []) | map(
+        if (.estimation_package // "") == "logp" then
+          .
+        else
+          (. + {estimation_package: "identity"}
+           | del(.requested_estimation_package, .fallback_used, .package_applicability, .scaling_method, .model, .metrics))
+        end
+      ))
+    | .overlaps = ((.overlaps // []) | map(
+        . + {estimation_package: "identity"}
+        | del(.requested_estimation_package, .fallback_used, .package_applicability, .scaling_method, .model, .metrics)
+      ))
+  '
+}
+
+bk_estimation_package_normalize_recorded_current_breakdown() {
+  _bk_weakscaling_normalize_breakdown_packages "$1"
+}
+
 bk_estimation_package_run() {
   local current_system="${BK_ESTIMATION_CURRENT_SYSTEM:-$est_system}"
   local future_system="${BK_ESTIMATION_FUTURE_SYSTEM:-$est_system}"
@@ -146,7 +170,7 @@ bk_estimation_package_run() {
   est_current_bench_numproc_node="$est_numproc_node"
   est_current_bench_timestamp="$est_timestamp"
   est_current_bench_uuid="$est_uuid"
-  est_current_fom_breakdown=$(bk_top_level_transform_breakdown "$est_input_fom_breakdown" "$current_target_nodes" "$est_node_count" "1" "identity" "identity")
+  est_current_fom_breakdown=$(bk_top_level_transform_breakdown "$(_bk_weakscaling_normalize_breakdown_packages "$est_input_fom_breakdown")" "$current_target_nodes" "$est_node_count" "1" "identity" "identity")
   est_current_fom=$(bk_top_level_breakdown_total_time "$est_current_fom_breakdown")
 
   est_future_system="$future_system"
@@ -158,7 +182,7 @@ bk_estimation_package_run() {
   est_future_bench_numproc_node="$est_numproc_node"
   est_future_bench_timestamp="$est_timestamp"
   est_future_bench_uuid="$est_uuid"
-  est_future_fom_breakdown=$(bk_top_level_transform_breakdown "$est_input_fom_breakdown" "$future_target_nodes" "$est_node_count" "1" "identity" "identity")
+  est_future_fom_breakdown=$(bk_top_level_transform_breakdown "$(_bk_weakscaling_normalize_breakdown_packages "$est_input_fom_breakdown")" "$future_target_nodes" "$est_node_count" "1" "identity" "identity")
   est_future_fom=$(bk_top_level_breakdown_total_time "$est_future_fom_breakdown")
 
   applicability_issues_json=$(jq -cn \
diff --git a/scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh b/scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh
index d085ccb..7804e2d 100644
--- a/scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh
+++ b/scripts/tests/test_estimation_gpu_kernel_lightgbm_v10.sh
@@ -111,4 +111,38 @@ popd >/dev/null
 grep -q "PerfTools LightGBM_model/1.0 inference failed" "${TMP_DIR}/lightgbm_failure.err"
 grep -q "section package gpu_kernel_lightgbm_v10 failed" "${TMP_DIR}/lightgbm_failure.err"
 
+cat > "${TMP_DIR}/weak_breakdown.json" <<'EOF'
+{
+  "sections": [
+    {
+      "name": "gpu_kernel_region",
+      "bench_time": 10,
+      "estimation_package": "gpu_kernel_lightgbm_v10"
+    },
+    {
+      "name": "comm",
+      "bench_time": 2,
+      "estimation_package": "logp"
+    }
+  ],
+  "overlaps": []
+}
+EOF
+
+pushd "${REPO_DIR}" >/dev/null
+source scripts/estimation/packages/weakscaling.sh
+normalized=$(bk_estimation_package_normalize_recorded_current_breakdown "$(cat "${TMP_DIR}/weak_breakdown.json")")
+current_breakdown=$(bk_top_level_transform_breakdown "$normalized" "1" "1" "1" "identity" "identity")
+popd >/dev/null
+
+echo "$current_breakdown" | jq -e '
+  .sections[0].name == "gpu_kernel_region" and
+  .sections[0].estimation_package == "identity" and
+  .sections[0].time == 10 and
+  (.sections[0].requested_estimation_package // "") == "" and
+  (.sections[0].fallback_used // "") == "" and
+  .sections[1].name == "comm" and
+  .sections[1].estimation_package == "logp"
+' >/dev/null
+
 echo "gpu_kernel_lightgbm_v10 section estimation test passed"

From f86e62cbf85cdde4e5622ef95be8104153b12b1a Mon Sep 17 00:00:00 2001
From: Yoshifumi Nakamura <nakamura@riken.jp>
Date: Wed, 17 Jun 2026 08:32:54 +0900
Subject: [PATCH 4/4] Clarify GPU estimator package selection docs

Document GPU kernel estimator packages as selectable options rather than canonical defaults. Note that GENESIS can choose the section package through BK_GENESIS_GPU_SECTION_PACKAGE during bring-up, and keep qws smoke guidance scoped to the MLP package.

Signed-off-by: Yoshifumi Nakamura <nakamura@riken.jp>
---
 docs/guides/add-estimation-package.md | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)
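
Note on package selection: the guide now asks readers to set `BK_GENESIS_GPU_SECTION_PACKAGE` explicitly when reproducibility matters. A minimal sketch of the selection logic follows, assuming a hypothetical helper `resolve_gpu_section_package` (not part of BenchKit) and assuming the app passes its own default as the argument.

```bash
#!/bin/bash
# Hypothetical helper (not part of BenchKit): resolve the GPU section package
# from BK_GENESIS_GPU_SECTION_PACKAGE, falling back to a caller-supplied default.
resolve_gpu_section_package() {
  local fallback="$1"
  # ":-" falls back when the variable is unset *or* empty.
  printf '%s\n' "${BK_GENESIS_GPU_SECTION_PACKAGE:-$fallback}"
}

unset BK_GENESIS_GPU_SECTION_PACKAGE
resolve_gpu_section_package gpu_kernel_lightgbm_v10   # fallback is used

BK_GENESIS_GPU_SECTION_PACKAGE=gpu_kernel_mlp_v15
resolve_gpu_section_package gpu_kernel_lightgbm_v10   # explicit choice wins
```

Because an empty value also triggers the fallback, a CI variable that is defined but blank behaves the same as an unset one.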

diff --git a/docs/guides/add-estimation-package.md b/docs/guides/add-estimation-package.md
index 475590a..36f407b 100644
--- a/docs/guides/add-estimation-package.md
+++ b/docs/guides/add-estimation-package.md
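@@ -70,17 +70,31 @@ A section package can be much smaller.
 Here it is more natural to focus on the conversion rule for a single section and to leave assembly of the whole Estimate JSON and management of the current / future sides to the BenchKit common layer or the top-level package.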
 
 External estimation tools that work per GPU kernel are normally handled as section packages.
-For example, `gpu_kernel_mlp_v15` connects PerfTools `MLP_NN/v1.5`, and `gpu_kernel_lightgbm_v10` connects PerfTools `LightGBM_model/1.0`, each as a package that converts only the GPU section.
+For example, the following packages each connect a PerfTools model as a package that converts only the GPU section.
+
+- `gpu_kernel_mlp_v15`
+  - PerfTools `MLP_NN/v1.5`
+  - Main dependencies: numpy/pandas/torch
+- `gpu_kernel_lightgbm_v10`
+  - PerfTools `LightGBM_model/1.0`
+  - Main dependencies: numpy/pandas/lightgbm/pyyaml and `libgomp`
+
 The top-level package stays as something like `instrumented_app_sections_dummy`, and a GPU kernel section package is assigned only to the GPU section.
+Which section package is used by default depends on the app's bring-up status and on the CI runner/container, so this guide does not fix any particular package as the correct choice.
+When you add a new package, make it visible as an option both in this list and in the app-side switching variable.
 
 ```bash
-bk_declare_section --side future gpu_kernel_region gpu_kernel_mlp_v15
+gpu_section_package="${BK_GENESIS_GPU_SECTION_PACKAGE:-gpu_kernel_lightgbm_v10}"
+bk_declare_section --side future gpu_kernel_region "$gpu_section_package"
 bk_emit_declared_section --side future gpu_kernel_region "$measured_gpu_time" results/estimation_artifacts/gpu_kernel_region_input.csv
 ```
 
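-In GENESIS the default is `gpu_kernel_mlp_v15`, but you can switch as follows to try LightGBM.
+The GPU kernel section package for GENESIS can be switched with `BK_GENESIS_GPU_SECTION_PACKAGE`.
+The default used when it is unset may change to follow the implementation currently being brought up, so set it explicitly when you need verification or reproducibility.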
 
 ```bash
+export BK_GENESIS_GPU_SECTION_PACKAGE=gpu_kernel_mlp_v15
+# or
 export BK_GENESIS_GPU_SECTION_PACKAGE=gpu_kernel_lightgbm_v10
 ```
 
@@ -111,6 +125,7 @@ qws を使って CI 配管だけを確認する場合は、実際の qws が GPU
 With `BK_QWS_GPU_MLP_SMOKE_MODE=prediction`, the bundled sample prediction CSV is used, and the run job embeds the `gpu_kernel_region` section and the prediction CSV artifact in the result.
 With `BK_QWS_GPU_MLP_SMOKE_MODE=perftools`, the estimate job checks out the PerfTools repo and passes `MLP_NN/examples/example_input_mixed-src_20kernels.csv` to `predict_v15.py` to generate the prediction CSV.
 In both modes, this confirms that the estimate job can convert the result into Estimate JSON through the `gpu_kernel_mlp_v15` section package.
+Other GPU kernel section packages such as LightGBM are checked through an app-side switching variable, like `BK_GENESIS_GPU_SECTION_PACKAGE` for GENESIS, or through dedicated tests.
 The qws estimation script on its own has this disabled by default, but during the bring-up period of the GPU estimator integration the GitLab CI default is temporarily enabled.
 
 ```bash
@@ -125,8 +140,9 @@ export BK_GPU_MLP_PERFTOOLS_REF=main
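 `BK_QWS_GPU_MLP_SMOKE` is for checking the plumbing with qws, `BK_QWS_GPU_MLP_SMOKE_MODE` switches between importing the prediction fixture and running PerfTools, and `BK_ESTIMATE_RUNNER_TAG` is for manually redirecting to a different estimation runner/container.
 Once the actual GPU profiling input and the operation of the estimation runner have settled, replace these with dedicated package/runner settings and review these provisional variables as candidates for removal.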
 
-The `perftools` smoke mode fetches PerfTools from GitHub, so the estimation runner/container needs `git`, external network access, and Python 3.12 or later.
-The MLP package needs numpy/pandas/torch, and the LightGBM package needs numpy/pandas/lightgbm/pyyaml.
+The `perftools` smoke mode fetches PerfTools from GitHub, so the estimation runner/container needs `git` and external network access.
+Python and the libraries should match the requirements of the selected PerfTools model.
+The MLP package needs Python 3.11 or later and numpy/pandas/torch; the LightGBM package needs Python 3.11 or later and numpy/pandas/lightgbm/pyyaml, plus `libgomp` to run LightGBM.
 In production, do not use smoke mode; instead prepare a PerfTools checkout on the estimation runner/container and pass a prepared input CSV derived from the real app as the section artifact.
 
 ## 5. What to put in metadata