cpu: Add imHist-assisted MGSC IMLI path#851
Conversation
Change-Id: If70a82b86d47732edce31a7f1c848be2698d5fda
📝 WalkthroughWalkthroughThis PR adds constant-IMLI-history (IMHist) table support to the BTBMGSC branch predictor. Four new configuration parameters control table count, history lengths, and index width. Prediction generation computes folded indices, looks up perceptron contributions, and includes them in the I-component sum. Speculative and recovery logic maintain IM state across speculative branches and mispredictions. ChangesIMHist Feature
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[Generated by GEM5 Performance Robot] Align BTB PerformanceOverall Score
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 59a529d069
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/cpu/pred/btb/btb_mgsc.hh (1)
256-258: ⚡ Quick winRename new helper methods to lower_snake_case.
foldIntHistory,updateImHistState, andupdateImliCountStatedon’t follow the repository’s function/method naming rule.As per coding guidelines,
**/*.{c,cpp,cc,cxx,h,hpp,hh,hxx,py}: Use lower_snake_case for functions and methods.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cpu/pred/btb/btb_mgsc.hh` around lines 256 - 258, The three method names in btb_mgsc.hh violate the lower_snake_case rule: rename foldIntHistory -> fold_int_history, updateImHistState -> update_im_hist_state, and updateImliCountState -> update_imli_count_state; update the declarations in btb_mgsc.hh and every corresponding definition and call site in the implementation files to use the new names (including any virtual/override signatures, tests, and references), and rebuild to ensure all references are updated consistently.src/cpu/pred/btb/test/btb_mgsc.test.cc (1)
185-196: ⚡ Quick winRename the new harness method to lower_snake_case.
setOnlyIMHistTabledoes not match the repository function/method naming rule.As per coding guidelines,
**/*.{c,cpp,cc,cxx,h,hpp,hh,hxx,py}: Use lower_snake_case for functions and methods.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cpu/pred/btb/test/btb_mgsc.test.cc` around lines 185 - 196, Rename the test harness method setOnlyIMHistTable to a lower_snake_case name (e.g., set_only_im_hist_table) to comply with the repository naming rules; update the method definition and all call sites and declarations that reference setOnlyIMHistTable, keeping the body unchanged (the references to BTBMGSC::TestAccess::enableBwTable, enableLTable, enableITable, enableIMHistTable, enableGTable, enablePTable, enableBiasTable, enablePCThreshold, and forceUseSC should remain as-is).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cpu/pred/btb/btb_mgsc.cc`:
- Around line 87-92: The loop that checks imHistHistLen[i] can index past the
end when imHistTableNum is larger than the length of imHistHistLen; add a size
guard before accessing imHistHistLen or iterate using the smaller of the two
sizes. Specifically, in the block that calls allocPredTable and then loops over
i from 0 to imHistTableNum, ensure you first verify i < imHistHistLen.size() (or
compute auto checkNum = std::min(imHistTableNum, imHistHistLen.size()) and loop
to checkNum) before evaluating imHistHistLen[i], keeping the existing assertions
on range and pow2 against imHistTableSize (variables: imHistTableNum,
imHistHistLen, imHistTableSize, allocPredTable, imHistTableIdxWidth).
- Around line 937-940: The IMHist table is being updated regardless of the
feature flag; guard updates with the enableIMHistTable check so IMHist writes
are skipped when disabled. Specifically, wrap the call to
updatePredTable(imHistTable, pred.imHistIndex, imHistTableNum, entry.pc,
actual_taken) and any related updates (e.g., updateWeightTable calls that use
pred.imHistIndex or imHistTableNum) in a conditional that tests
enableIMHistTable before invoking them, leaving existing iTable updates
(updatePredTable(iTable, ...), updateWeightTable(iWeightTable, ...)) unchanged.
- Around line 347-361: foldIntHistory currently XORs the last chunk with
foldedMask even when the remaining bits (bitsLeft) < foldedLen, leaking higher
bits; fix it by masking the final chunk down to bitsLeft before XORing: inside
the loop compute a per-iteration mask (use foldedMask when bitsLeft >=
foldedLen, otherwise (1ULL << bitsLeft) - 1), then XOR folded with (history &
currentMask) and proceed as before; this change should be applied in
BTBMGSC::foldIntHistory to ensure correct folded indices for non-multiple
histLen values.
---
Nitpick comments:
In `@src/cpu/pred/btb/btb_mgsc.hh`:
- Around line 256-258: The three method names in btb_mgsc.hh violate the
lower_snake_case rule: rename foldIntHistory -> fold_int_history,
updateImHistState -> update_im_hist_state, and updateImliCountState ->
update_imli_count_state; update the declarations in btb_mgsc.hh and every
corresponding definition and call site in the implementation files to use the
new names (including any virtual/override signatures, tests, and references),
and rebuild to ensure all references are updated consistently.
In `@src/cpu/pred/btb/test/btb_mgsc.test.cc`:
- Around line 185-196: Rename the test harness method setOnlyIMHistTable to a
lower_snake_case name (e.g., set_only_im_hist_table) to comply with the
repository naming rules; update the method definition and all call sites and
declarations that reference setOnlyIMHistTable, keeping the body unchanged (the
references to BTBMGSC::TestAccess::enableBwTable, enableLTable, enableITable,
enableIMHistTable, enableGTable, enablePTable, enableBiasTable,
enablePCThreshold, and forceUseSC should remain as-is).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c50f6acd-415d-4a08-8e75-ddfde5b8eda1
📒 Files selected for processing (4)
src/cpu/pred/BranchPredictor.pysrc/cpu/pred/btb/btb_mgsc.ccsrc/cpu/pred/btb/btb_mgsc.hhsrc/cpu/pred/btb/test/btb_mgsc.test.cc
| auto imHistTableSize = allocPredTable(imHistTable, imHistTableNum, imHistTableIdxWidth); | ||
| for (unsigned int i = 0; i < imHistTableNum; ++i) { | ||
| assert(imHistHistLen[i] >= 0); | ||
| assert(static_cast<unsigned>(imHistHistLen[i]) < 63); | ||
| assert(pow2(static_cast<unsigned>(imHistHistLen[i])) <= imHistTableSize); | ||
| } |
There was a problem hiding this comment.
Add a size guard before indexing imHistHistLen.
This loop indexes imHistHistLen[i] up to imHistTableNum; a mismatched config can cause out-of-bounds access during init.
Proposed fix
+ assert(imHistHistLen.size() == imHistTableNum);
auto imHistTableSize = allocPredTable(imHistTable, imHistTableNum, imHistTableIdxWidth);
for (unsigned int i = 0; i < imHistTableNum; ++i) {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| auto imHistTableSize = allocPredTable(imHistTable, imHistTableNum, imHistTableIdxWidth); | |
| for (unsigned int i = 0; i < imHistTableNum; ++i) { | |
| assert(imHistHistLen[i] >= 0); | |
| assert(static_cast<unsigned>(imHistHistLen[i]) < 63); | |
| assert(pow2(static_cast<unsigned>(imHistHistLen[i])) <= imHistTableSize); | |
| } | |
| assert(imHistHistLen.size() == imHistTableNum); | |
| auto imHistTableSize = allocPredTable(imHistTable, imHistTableNum, imHistTableIdxWidth); | |
| for (unsigned int i = 0; i < imHistTableNum; ++i) { | |
| assert(imHistHistLen[i] >= 0); | |
| assert(static_cast<unsigned>(imHistHistLen[i]) < 63); | |
| assert(pow2(static_cast<unsigned>(imHistHistLen[i])) <= imHistTableSize); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cpu/pred/btb/btb_mgsc.cc` around lines 87 - 92, The loop that checks
imHistHistLen[i] can index past the end when imHistTableNum is larger than the
length of imHistHistLen; add a size guard before accessing imHistHistLen or
iterate using the smaller of the two sizes. Specifically, in the block that
calls allocPredTable and then loops over i from 0 to imHistTableNum, ensure you
first verify i < imHistHistLen.size() (or compute auto checkNum =
std::min(imHistTableNum, imHistHistLen.size()) and loop to checkNum) before
evaluating imHistHistLen[i], keeping the existing assertions on range and pow2
against imHistTableSize (variables: imHistTableNum, imHistHistLen,
imHistTableSize, allocPredTable, imHistTableIdxWidth).
| // Update I tables | ||
| updatePredTable(iTable, pred.iIndex, iTableNum, entry.pc, actual_taken); | ||
| updatePredTable(imHistTable, pred.imHistIndex, imHistTableNum, entry.pc, actual_taken); | ||
| updateWeightTable(iWeightTable, weightTableIdx, entry.pc, pred.i_weight_scale_diff, |
There was a problem hiding this comment.
Gate IMHist training with enableIMHistTable.
The IMHist table is trained even when disabled, which breaks feature-flag semantics and performs unnecessary writes.
Proposed fix
// Update I tables
updatePredTable(iTable, pred.iIndex, iTableNum, entry.pc, actual_taken);
- updatePredTable(imHistTable, pred.imHistIndex, imHistTableNum, entry.pc, actual_taken);
+ if (enableIMHistTable) {
+ updatePredTable(imHistTable, pred.imHistIndex, imHistTableNum, entry.pc, actual_taken);
+ }
updateWeightTable(iWeightTable, weightTableIdx, entry.pc, pred.i_weight_scale_diff,
(pred.i_percsum >= 0) == actual_taken);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Update I tables | |
| updatePredTable(iTable, pred.iIndex, iTableNum, entry.pc, actual_taken); | |
| updatePredTable(imHistTable, pred.imHistIndex, imHistTableNum, entry.pc, actual_taken); | |
| updateWeightTable(iWeightTable, weightTableIdx, entry.pc, pred.i_weight_scale_diff, | |
| // Update I tables | |
| updatePredTable(iTable, pred.iIndex, iTableNum, entry.pc, actual_taken); | |
| if (enableIMHistTable) { | |
| updatePredTable(imHistTable, pred.imHistIndex, imHistTableNum, entry.pc, actual_taken); | |
| } | |
| updateWeightTable(iWeightTable, weightTableIdx, entry.pc, pred.i_weight_scale_diff, | |
| (pred.i_percsum >= 0) == actual_taken); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cpu/pred/btb/btb_mgsc.cc` around lines 937 - 940, The IMHist table is
being updated regardless of the feature flag; guard updates with the
enableIMHistTable check so IMHist writes are skipped when disabled.
Specifically, wrap the call to updatePredTable(imHistTable, pred.imHistIndex,
imHistTableNum, entry.pc, actual_taken) and any related updates (e.g.,
updateWeightTable calls that use pred.imHistIndex or imHistTableNum) in a
conditional that tests enableIMHistTable before invoking them, leaving existing
iTable updates (updatePredTable(iTable, ...), updateWeightTable(iWeightTable,
...)) unchanged.
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
可以,把它讲成三层就很清楚。 1. 原来的 IMLI
for (outer = 0; outer < ...; outer++) {
for (i = 0; i < 8; i++) {
if (pattern depends on outer and i) ...
}
}旧 IMLI 能分开 2. 新增的 imHist 所以预测时不再只是: 而是额外查一张 helper table: 最后它不是单独投票,而是加回 I family: 这是“给 IMLI 加 phase memory”,让同一个 loop iteration 能根据最近在这个 iteration 位置看到的分支结果分相位。 3. imHistStoreBits 的作用 每次更新: 所以它的作用就是“限制每个 bucket 只保留最近 N 次方向”,避免历史无限增长,同时保证后面 fold index 时有足够位数可用。 举个例子,假设: 同一个 loop 位置 那么 下一次如果这个位置又看到 只保留最近 4 位。 一个具体例子 旧 IMLI 看到的都是: 它会把 taken 和 not-taken 混在一个 counter/table entry 里,最后学不稳。 新 imHist 会看到: 这样同一个 对应代码位置大概是:
一句话总结:旧 IMLI 是“loop 位置预测”,新增 imHist 后变成“loop 位置 + 该位置自己的历史相位预测”。这就是为什么它对 |
Summary
This PR adds an optional
imHist-assisted helper table for the MGSC IMLI path.The change is mainly useful as an IMLI modeling/alignment experiment: it makes the I-family much stronger in standalone SC / no-TAGE mode, but the current SPEC-level gain on the normal TAGE-on mainline path is very small. So this PR is not urgent to merge; it is mostly here to document and preserve the mechanism and data.
Basic Principle
The old MGSC IMLI path was effectively indexed by:
imliCountseparates positions inside a loop, but it cannot distinguish cases where the same inner-loop position has different behavior under different outer-loop phases. For example, "iteration 3" may be taken in one outer phase and not-taken in another. Count-only IMLI aliases those samples together.This PR adds a small helper history per IMLI count:
Prediction then uses an additional GEHL-style helper table indexed by:
The helper contribution is added into the existing I-family perceptron sum:
So this is not a separate new SC family. It is an auxiliary path for the existing IMLI/I-family, sharing the same I-family weight path.
Focused Micro-Test Results
The focused
mgsc_testprobes show the intended mechanism clearly when isolatingi_only:imli_thresholdimli_phase_shiftimli_iterimli_two_hot_positionsThe strongest evidence is
imli_phase_shift: the old IMLI aliases different outer phases at the same inner position, whileimHist[imliCount]separates them and flips the key PCs to the expected I-table-driven behavior.SPEC06 CI Results
no-TAGE / standalone SC
Baseline:
run529imHist branch:
run542Branch-counter changes line up with a real predictor improvement:
cond_MPKIdeltah264refastarperlbenchsjenggobmkThe frontend counters also move in the expected direction. For example,
fetch_nisn_totaldrops by about-8.1%onastar,-5.6%onperlbench,-5.2%onsjeng,-3.5%onh264ref, and-2.6%ongobmk.normal TAGE-on mainline path
Baseline:
run544imHist branch:
run543Branch/frontend changes are mostly noise-level:
cond_MPKIdeltah264refastarperlbenchsjenggobmkThe frontend group mean absolute change is only about
0.17%.Counter-Based Explanation
The MGSC raw counters confirm that the helper is working, but the normal TAGE-on path leaves little remaining correction headroom.
In no-TAGE mode, imHist makes IMLI percsum more accurate and increases SC net fixes:
scCorrectTageWrong - scWrongTageCorrecth264refastarWith TAGE enabled, IMLI percsum can still improve, but the net correction space is much smaller because TAGE already predicts most of those branches correctly:
scCorrectTageWrong - scWrongTageCorrecth264refastarsjengAlso, many SC uses are in TAGE high-confidence regions. For example, in the TAGE-on imHist run, weighted high/mid/low SC-use counts are approximately:
h264refgobmkSo the current interpretation is: imHist-assisted IMLI is a real improvement for standalone SC/no-TAGE behavior, but on the normal mainline path most of that information is already captured by TAGE or becomes same-direction confirmation rather than a new final correction.
Merge Note
This PR is probably better treated as an alignment/data point for now rather than an urgent performance patch. The mechanism is useful and the micro-test evidence is strong, but the mainline TAGE-on SPEC gain is currently close to noise.
Change-Id: If70a82b86d47732edce31a7f1c848be2698d5fda