Skip to content

cpu: Add imHist-assisted MGSC IMLI path #851

Open · jensen-yan wants to merge 1 commit into xs-dev from xs-dev-imhist-align

@jensen-yan
Collaborator

@jensen-yan jensen-yan commented May 7, 2026

Summary

This PR adds an optional imHist-assisted helper table for the MGSC IMLI path.

The change is mainly useful as an IMLI modeling/alignment experiment. It makes the I-family much stronger in standalone SC (no-TAGE) mode, but the SPEC-level gain on the normal TAGE-on mainline path is currently close to noise. This PR is therefore not urgent to merge; it is mainly here to document and preserve the mechanism and data.

Basic Principle

The old MGSC IMLI path was effectively indexed by:

PC + imliCount

imliCount separates positions inside a loop, but it cannot distinguish cases where the same inner-loop position has different behavior under different outer-loop phases. For example, "iteration 3" may be taken in one outer phase and not-taken in another. Count-only IMLI aliases those samples together.

This PR adds a small helper history per IMLI count:

imHist[imliCount]

Prediction then uses an additional GEHL-style helper table indexed by:

PC + folded(imHist[imliCount])
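The two index formulas can be made concrete with a minimal C++ sketch. It assumes that "+" means hashing the two operands together (a `(pc >> 1)` XOR here), a 10-bit table index, and plain XOR folding. `foldHist`, `countIndex`, `imHistIndex`, and `kIdxWidth` are hypothetical names, not the BTBMGSC code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical index width; real sizes come from the predictor configuration.
constexpr unsigned kIdxWidth = 10;
constexpr uint64_t kIdxMask = (1ULL << kIdxWidth) - 1;

// XOR-fold the low histLen bits of hist down to kIdxWidth bits.
uint64_t foldHist(uint64_t hist, unsigned histLen) {
    assert(histLen <= 64);
    if (histLen < 64)
        hist &= (1ULL << histLen) - 1;   // ignore bits older than histLen
    uint64_t folded = 0;
    while (hist != 0) {
        folded ^= hist & kIdxMask;
        hist >>= kIdxWidth;
    }
    return folded;
}

// Old count-only I-table index: PC + imliCount.
uint64_t countIndex(uint64_t pc, unsigned imliCount) {
    return ((pc >> 1) ^ imliCount) & kIdxMask;
}

// Helper-table index: PC + folded(imHist[imliCount]).
uint64_t imHistIndex(uint64_t pc, const std::vector<uint64_t> &imHist,
                     unsigned imliCount, unsigned histLen) {
    return ((pc >> 1) ^ foldHist(imHist[imliCount], histLen)) & kIdxMask;
}
```

With the same PC and imliCount = 3, `countIndex` selects one entry for both outer phases, while imHist[3] = 0b1011 and imHist[3] = 0b0100 select two different helper entries.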

The helper contribution is added into the existing I-family perceptron sum:

i_percsum = i_count_percsum + imHist_percsum

So this is not a separate new SC family. It is an auxiliary path for the existing IMLI/I-family, sharing the same I-family weight path.
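The shared I-family sum can be sketched as follows, assuming plain signed counters. The real MGSC code may center or scale counter values, and it applies the I-family weight (`i_weight_scale_diff` appears in the update code). This sketch omits both. `CounterTable`, `sumTables`, and `iPercsum` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical signed counter table: one counter per index.
using CounterTable = std::vector<int8_t>;

// Sum the counter that each table's index selects.
int sumTables(const std::vector<CounterTable> &tables,
              const std::vector<uint64_t> &indices) {
    assert(tables.size() == indices.size());
    int sum = 0;
    for (std::size_t t = 0; t < tables.size(); ++t)
        sum += tables[t][indices[t]];
    return sum;
}

// The helper tables add into the existing I-family sum instead of voting separately.
int iPercsum(const std::vector<CounterTable> &countTables,
             const std::vector<uint64_t> &countIdx,
             const std::vector<CounterTable> &imHistTables,
             const std::vector<uint64_t> &imHistIdx,
             bool enableIMHistTable) {
    int i_count_percsum = sumTables(countTables, countIdx);
    int imHist_percsum = enableIMHistTable ? sumTables(imHistTables, imHistIdx) : 0;
    return i_count_percsum + imHist_percsum;
}
```

A non-negative result predicts taken, matching the `(pred.i_percsum >= 0)` check in the I-table update code. Both parts share one sum and one weight path, so a confident helper counter can override a weak count-only counter rather than cast a separate vote.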

Focused Micro-Test Results

The focused mgsc_test probes show the intended mechanism clearly when the I-family is isolated (i_only):

| Test | Old count-only IMLI condMiss | imHist-assisted IMLI condMiss |
|---|---|---|
| imli_threshold | 20001 | 4049 |
| imli_phase_shift | 20488 | 2050 |
| imli_iter | 3083 | 2198 |
| imli_two_hot_positions | 5128 | 4107 |

The strongest evidence is imli_phase_shift: the old IMLI aliases different outer phases at the same inner position, while imHist[imliCount] separates them and flips the key PCs to the expected I-table-driven behavior.
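The phase-shift aliasing can be reproduced in a toy model. This is an illustrative sketch, not mgsc_test and not BTBMGSC. It assumes one 2-bit counter per imliCount for the count-only path, one 2-bit counter per (imliCount, imHist[imliCount]) pair for the helper path, 2 history bits per bucket, and a bucket that only the branch under test updates.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model, not mgsc_test or BTBMGSC: a branch at inner-loop position
// imliCount == 3 is taken on even outer iterations and not-taken on odd ones.
constexpr unsigned kInner = 8;                        // inner-loop trip count
constexpr unsigned kHistBits = 2;                     // bits kept per imHist bucket
constexpr unsigned kHistMask = (1u << kHistBits) - 1;

struct Counter2 {                                     // 2-bit saturating counter
    uint8_t v = 1;                                    // starts weakly not-taken
    bool predict() const { return v >= 2; }
    void update(bool taken) {
        if (taken && v < 3) ++v;
        if (!taken && v > 0) --v;
    }
};

struct Misses {
    unsigned countOnly;
    unsigned imHistAssisted;
};

Misses simulate(unsigned outerIters) {
    std::array<Counter2, kInner> countOnly{};                            // [imliCount]
    std::array<std::array<Counter2, kHistMask + 1>, kInner> withHist{};  // [imliCount][imHist]
    std::array<unsigned, kInner> imHist{};                               // imHist[imliCount]
    Misses m{0, 0};
    for (unsigned outer = 0; outer < outerIters; ++outer) {
        const unsigned imliCount = 3;                 // position under test; others omitted
        const bool taken = (outer % 2 == 0);          // outcome flips with the outer phase
        Counter2 &c = countOnly[imliCount];
        Counter2 &h = withHist[imliCount][imHist[imliCount]];
        m.countOnly += (c.predict() != taken);
        m.imHistAssisted += (h.predict() != taken);
        c.update(taken);
        h.update(taken);
        imHist[imliCount] = ((imHist[imliCount] << 1) | (taken ? 1u : 0u)) & kHistMask;
    }
    return m;
}
```

With 100 outer iterations, the count-only counter mispredicts every time because the alternating outcome keeps it bouncing between the two weak states. The history-indexed counters mispredict only twice during warm-up.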

SPEC06 CI Results

no-TAGE / standalone SC

Baseline: run529
imHist branch: run542

| Metric | run529 | run542 | Delta |
|---|---|---|---|
| SPECint score/GHz | 16.2086 | 16.4736 | +0.2650 |
| Overall score/GHz | 18.1067 | 18.3297 | +0.2230 |

Branch-counter changes line up with a real predictor improvement:

| Benchmark | cond_MPKI delta |
|---|---|
| h264ref | -33.6% |
| astar | -25.7% |
| perlbench | -28.0% |
| sjeng | -13.4% |
| gobmk | -4.9% |

The frontend counters also move in the expected direction. For example, fetch_nisn_total drops by about 8.1% on astar, 5.6% on perlbench, 5.2% on sjeng, 3.5% on h264ref, and 2.6% on gobmk.

normal TAGE-on mainline path

Baseline: run544
imHist branch: run543

| Metric | run544 | run543 | Delta |
|---|---|---|---|
| SPECint score/GHz | 18.729385 | 18.729351 | ~0 |
| Overall score/GHz | 19.93025 | 19.93676 | +0.0065 |

Branch/frontend changes are mostly noise-level:

| Benchmark | cond_MPKI delta |
|---|---|
| h264ref | -0.28% |
| astar | +0.10% |
| perlbench | +0.71% |
| sjeng | -0.06% |
| gobmk | -0.41% |

The frontend group mean absolute change is only about 0.17%.

Counter-Based Explanation

The MGSC raw counters confirm that the helper is working, but the normal TAGE-on path leaves little remaining correction headroom.

In no-TAGE mode, imHist makes IMLI percsum more accurate and increases SC net fixes:

| Benchmark | IMLI percsum accuracy | scCorrectTageWrong - scWrongTageCorrect |
|---|---|---|
| h264ref | 78.4% -> 86.7% | 361k -> 375k |
| astar | 59.9% -> 62.7% | 1067k -> 1114k |

With TAGE enabled, IMLI percsum accuracy can still improve, but the net correction space is much smaller because TAGE already predicts most of those branches correctly. On h264ref and astar, the net fix count even drops slightly:

| Benchmark | IMLI percsum accuracy | scCorrectTageWrong - scWrongTageCorrect |
|---|---|---|
| h264ref | 52.6% -> 73.0% | 7.7k -> 6.3k |
| astar | 63.8% -> 69.8% | 2.0k -> 1.9k |
| sjeng | 75.4% -> 76.7% | 9.9k -> 10.5k |

Also, many SC uses are in TAGE high-confidence regions. For example, in the TAGE-on imHist run, weighted high/mid/low SC-use counts are approximately:

| Benchmark | High SC use | Mid SC use | Low SC use |
|---|---|---|---|
| h264ref | 659k | 27k | 124k |
| gobmk | 2330k | 348k | 1149k |

So the current interpretation is: imHist-assisted IMLI is a real improvement for standalone SC/no-TAGE behavior, but on the normal mainline path most of that information is already captured by TAGE or becomes same-direction confirmation rather than a new final correction.

Merge Note

This PR is probably better treated as an alignment/data point for now rather than an urgent performance patch. The mechanism is useful and the micro-test evidence is strong, but the mainline TAGE-on SPEC gain is currently close to noise.

Change-Id: If70a82b86d47732edce31a7f1c848be2698d5fda

@coderabbitai
coderabbitai Bot commented May 7, 2026

Walkthrough

This PR adds constant-IMLI-history (IMHist) table support to the BTBMGSC branch predictor. Four new configuration parameters enable the tables and control table count, history lengths, and index width. Prediction generation computes folded indices, looks up perceptron contributions, and adds them to the I-component sum. Speculative-update and recovery logic maintains IM state across speculative branches and mispredictions.

Changes

IMHist Feature

| Layer | File(s) | Summary |
|---|---|---|
| Configuration Parameters | src/cpu/pred/BranchPredictor.py, src/cpu/pred/btb/btb_mgsc.hh | New public parameters enableIMHistTable, imHistTableNum, imHistTableIdxWidth, and imHistHistLen configure constant-IMLI-history tables. |
| Prediction Metadata Structures | src/cpu/pred/btb/btb_mgsc.hh | MgscPrediction adds imHistIndex vector and imHist_percsum field; MgscMeta adds imliCount and imHist state fields for recovery. |
| Predictor State & Table Storage | src/cpu/pred/btb/btb_mgsc.hh | BTBMGSC declares imHistTable container, per-instruction imHistIndex cache, and IM accumulation state members (imliCount, imHist). |
| Storage Initialization | src/cpu/pred/btb/btb_mgsc.cc | initStorage() allocates IMHist table structures, resizes index vectors, and initializes IM state based on configuration parameters. |
| Helper Functions | src/cpu/pred/btb/btb_mgsc.cc | New foldIntHistory(), updateImHistState(), and updateImliCountState() utilities compute folded indices and manage IM state transitions. |
| Prediction Generation | src/cpu/pred/btb/btb_mgsc.cc | generateSinglePrediction() computes IMHist indices, sums table perceptron contributions into I-component, stores metadata for recovery, and returns extended MgscPrediction. |
| Predictor Update & Recovery | src/cpu/pred/btb/btb_mgsc.cc | updateSinglePredictor() trains IMHist table; specUpdateIHist() advances IM state during speculation; recoverIHist() restores state and applies resolved outcome after misprediction. |
| Unit Test Support & Coverage | src/cpu/pred/btb/btb_mgsc.cc, src/cpu/pred/btb/btb_mgsc.hh, src/cpu/pred/btb/test/btb_mgsc.test.cc | Constructor wiring, test accessors, helper methods (setOnlyIMHistTable()), and new test case (IMHistTableSeparatesOuterPhaseAtSameIteration) validate IMHist prediction improvements over I-table-only configuration. |
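The following minimal sketch shows how the helper state could be advanced speculatively and restored after a misprediction. It is modeled on the walkthrough's description of updateImHistState(), updateImliCountState(), specUpdateIHist(), and recoverIHist(). Everything here is a hypothetical illustration, not the BTBMGSC code. This covers `ImliState`, `ImliMeta`, `applyOutcome`, `specUpdate`, and `recover`. It also covers the choice to shift every conditional branch into imHist before updating imliCount, and the full-array checkpoint. The count rule (increment on taken backward branches, reset on a not-taken backward branch) follows the author's "consecutive backward-taken" description later in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical IMLI state; BTBMGSC's real members and signatures may differ.
struct ImliState {
    unsigned imliCount = 0;          // consecutive taken backward branches
    std::vector<uint64_t> imHist;    // imHist[imliCount], one bucket per count value
    uint64_t storeMask = 0;          // keeps the newest imHistStoreBits outcomes

    ImliState(unsigned maxCount, unsigned storeBits) : imHist(maxCount + 1, 0) {
        assert(storeBits > 0 && storeBits < 64);
        storeMask = (1ULL << storeBits) - 1;
    }
};

// Checkpoint carried with each prediction (cf. the imliCount/imHist fields in MgscMeta).
struct ImliMeta {
    unsigned imliCount;
    std::vector<uint64_t> imHist;    // full copy: younger wrong-path branches may touch any bucket
};

// Shift the direction into the current bucket, then update the loop position.
void applyOutcome(ImliState &s, bool isBackward, bool taken) {
    uint64_t &h = s.imHist[s.imliCount];
    h = ((h << 1) | (taken ? 1u : 0u)) & s.storeMask;
    if (isBackward) {
        if (!taken)
            s.imliCount = 0;                            // loop exit resets the position
        else if (s.imliCount + 1 < s.imHist.size())
            ++s.imliCount;                              // saturating increment
    }
}

// Speculative update (cf. specUpdateIHist): checkpoint, then apply the prediction.
ImliMeta specUpdate(ImliState &s, bool isBackward, bool predTaken) {
    ImliMeta meta{s.imliCount, s.imHist};
    applyOutcome(s, isBackward, predTaken);
    return meta;
}

// Recovery (cf. recoverIHist): restore the checkpoint, then apply the resolved outcome.
void recover(ImliState &s, const ImliMeta &meta, bool isBackward, bool actualTaken) {
    s.imliCount = meta.imliCount;
    s.imHist = meta.imHist;
    applyOutcome(s, isBackward, actualTaken);
}
```

Checkpointing the whole bucket array keeps the sketch correct even when younger wrong-path branches touched other buckets. This thread does not say whether BTBMGSC copies the whole array or something smaller.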

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • OpenXiangShan/GEM5#686: Both modify the BTBMGSC predictor state and MgscPrediction structure alongside BranchPredictor.py parameter additions.
  • OpenXiangShan/GEM5#839: Both adjust BTBMGSC table flags, parameters, and internal MGSC configuration.
  • OpenXiangShan/GEM5#710: Both modify BTBMGSC predictor classes and test harness scaffolding in src/cpu/pred/btb/*.

Suggested labels

perf, align-kmhv3

Suggested reviewers

  • CJ362ff
  • Yakkhini

Poem

🐰 A history table hops through time,
Folding integers so fine—
IMHist joins the MGSC fold,
Making predictions less bold!
Speculative dreams now trace,
Recovery keeps the proper pace. 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: adding imHist-assisted MGSC IMLI path support with new configuration parameters and prediction logic across multiple files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 59a529d
workflow: gem5 Align BTB Performance Test(0.3c)

Align BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 18.73 | 18.73 | -0.00 |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 59a529d069

Comment thread src/cpu/pred/btb/btb_mgsc.cc
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
src/cpu/pred/btb/btb_mgsc.hh (1)

256-258: ⚡ Quick win

Rename new helper methods to lower_snake_case.

foldIntHistory, updateImHistState, and updateImliCountState don’t follow the repository’s function/method naming rule.

As per coding guidelines, **/*.{c,cpp,cc,cxx,h,hpp,hh,hxx,py}: Use lower_snake_case for functions and methods.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cpu/pred/btb/btb_mgsc.hh` around lines 256 - 258, The three method names
in btb_mgsc.hh violate the lower_snake_case rule: rename foldIntHistory ->
fold_int_history, updateImHistState -> update_im_hist_state, and
updateImliCountState -> update_imli_count_state; update the declarations in
btb_mgsc.hh and every corresponding definition and call site in the
implementation files to use the new names (including any virtual/override
signatures, tests, and references), and rebuild to ensure all references are
updated consistently.
src/cpu/pred/btb/test/btb_mgsc.test.cc (1)

185-196: ⚡ Quick win

Rename the new harness method to lower_snake_case.

setOnlyIMHistTable does not match the repository function/method naming rule.

As per coding guidelines, **/*.{c,cpp,cc,cxx,h,hpp,hh,hxx,py}: Use lower_snake_case for functions and methods.

🤖 Prompt for AI Agents
In `@src/cpu/pred/btb/test/btb_mgsc.test.cc` around lines 185 - 196, Rename the
test harness method setOnlyIMHistTable to a lower_snake_case name (e.g.,
set_only_im_hist_table) to comply with the repository naming rules; update the
method definition and all call sites and declarations that reference
setOnlyIMHistTable, keeping the body unchanged (the references to
BTBMGSC::TestAccess::enableBwTable, enableLTable, enableITable,
enableIMHistTable, enableGTable, enablePTable, enableBiasTable,
enablePCThreshold, and forceUseSC should remain as-is).
🤖 Prompt for all review comments with AI agents
Inline comments:
In `@src/cpu/pred/btb/btb_mgsc.cc`:
- Around line 87-92: The loop that checks imHistHistLen[i] can index past the
end when imHistTableNum is larger than the length of imHistHistLen; add a size
guard before accessing imHistHistLen or iterate using the smaller of the two
sizes. Specifically, in the block that calls allocPredTable and then loops over
i from 0 to imHistTableNum, ensure you first verify i < imHistHistLen.size() (or
compute auto checkNum = std::min(imHistTableNum, imHistHistLen.size()) and loop
to checkNum) before evaluating imHistHistLen[i], keeping the existing assertions
on range and pow2 against imHistTableSize (variables: imHistTableNum,
imHistHistLen, imHistTableSize, allocPredTable, imHistTableIdxWidth).
- Around line 937-940: The IMHist table is being updated regardless of the
feature flag; guard updates with the enableIMHistTable check so IMHist writes
are skipped when disabled. Specifically, wrap the call to
updatePredTable(imHistTable, pred.imHistIndex, imHistTableNum, entry.pc,
actual_taken) and any related updates (e.g., updateWeightTable calls that use
pred.imHistIndex or imHistTableNum) in a conditional that tests
enableIMHistTable before invoking them, leaving existing iTable updates
(updatePredTable(iTable, ...), updateWeightTable(iWeightTable, ...)) unchanged.
- Around line 347-361: foldIntHistory currently XORs the last chunk with
foldedMask even when the remaining bits (bitsLeft) < foldedLen, leaking higher
bits; fix it by masking the final chunk down to bitsLeft before XORing: inside
the loop compute a per-iteration mask (use foldedMask when bitsLeft >=
foldedLen, otherwise (1ULL << bitsLeft) - 1), then XOR folded with (history &
currentMask) and proceed as before; this change should be applied in
BTBMGSC::foldIntHistory to ensure correct folded indices for non-multiple
histLen values.
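Here is a minimal sketch of the masking fix described above. The actual `BTBMGSC::foldIntHistory` signature is not shown in this thread, so `foldIntHistorySketch` and its argument order are hypothetical. The fix matters whenever the stored history is wider than `histLen`. For instance, a bucket may keep `max(imHistHistLen)` bits while a table uses a shorter length and the value is not pre-masked.

```cpp
#include <cassert>
#include <cstdint>

// Fold the low histLen bits of history into foldedLen bits by XOR-ing
// successive foldedLen-bit chunks. The last chunk is masked down to the bits
// that remain, so bits above histLen never leak into the folded index.
uint64_t foldIntHistorySketch(uint64_t history, unsigned histLen, unsigned foldedLen) {
    assert(foldedLen > 0 && foldedLen < 64 && histLen <= 64);
    const uint64_t foldedMask = (1ULL << foldedLen) - 1;
    uint64_t folded = 0;
    unsigned bitsLeft = histLen;
    while (bitsLeft > 0) {
        const uint64_t currentMask =
            bitsLeft >= foldedLen ? foldedMask : (1ULL << bitsLeft) - 1;
        folded ^= history & currentMask;
        history >>= foldedLen;
        bitsLeft = bitsLeft >= foldedLen ? bitsLeft - foldedLen : 0;
    }
    return folded;
}
```

For `history = 0b111111`, `histLen = 5`, `foldedLen = 3`, this returns 4. A loop that always applied `foldedMask` would return 0, because bit 5 (outside `histLen`) would leak into the second chunk.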

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c50f6acd-415d-4a08-8e75-ddfde5b8eda1

📥 Commits

Reviewing files that changed from the base of the PR and between c345ad8 and 59a529d.

📒 Files selected for processing (4)
  • src/cpu/pred/BranchPredictor.py
  • src/cpu/pred/btb/btb_mgsc.cc
  • src/cpu/pred/btb/btb_mgsc.hh
  • src/cpu/pred/btb/test/btb_mgsc.test.cc

Comment on lines +87 to +92
auto imHistTableSize = allocPredTable(imHistTable, imHistTableNum, imHistTableIdxWidth);
for (unsigned int i = 0; i < imHistTableNum; ++i) {
assert(imHistHistLen[i] >= 0);
assert(static_cast<unsigned>(imHistHistLen[i]) < 63);
assert(pow2(static_cast<unsigned>(imHistHistLen[i])) <= imHistTableSize);
}
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a size guard before indexing imHistHistLen.

This loop indexes imHistHistLen[i] up to imHistTableNum; a mismatched config can cause out-of-bounds access during init.

Proposed fix
+    assert(imHistHistLen.size() == imHistTableNum);
     auto imHistTableSize = allocPredTable(imHistTable, imHistTableNum, imHistTableIdxWidth);
     for (unsigned int i = 0; i < imHistTableNum; ++i) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert(imHistHistLen.size() == imHistTableNum);
auto imHistTableSize = allocPredTable(imHistTable, imHistTableNum, imHistTableIdxWidth);
for (unsigned int i = 0; i < imHistTableNum; ++i) {
assert(imHistHistLen[i] >= 0);
assert(static_cast<unsigned>(imHistHistLen[i]) < 63);
assert(pow2(static_cast<unsigned>(imHistHistLen[i])) <= imHistTableSize);
}

Comment thread src/cpu/pred/btb/btb_mgsc.cc
Comment on lines 937 to 940
// Update I tables
updatePredTable(iTable, pred.iIndex, iTableNum, entry.pc, actual_taken);
updatePredTable(imHistTable, pred.imHistIndex, imHistTableNum, entry.pc, actual_taken);
updateWeightTable(iWeightTable, weightTableIdx, entry.pc, pred.i_weight_scale_diff,
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Gate IMHist training with enableIMHistTable.

The IMHist table is trained even when disabled, which breaks feature-flag semantics and performs unnecessary writes.

Proposed fix
         // Update I tables
         updatePredTable(iTable, pred.iIndex, iTableNum, entry.pc, actual_taken);
-        updatePredTable(imHistTable, pred.imHistIndex, imHistTableNum, entry.pc, actual_taken);
+        if (enableIMHistTable) {
+            updatePredTable(imHistTable, pred.imHistIndex, imHistTableNum, entry.pc, actual_taken);
+        }
         updateWeightTable(iWeightTable, weightTableIdx, entry.pc, pred.i_weight_scale_diff,
                           (pred.i_percsum >= 0) == actual_taken);
📝 Committable suggestion

Suggested change
// Update I tables
updatePredTable(iTable, pred.iIndex, iTableNum, entry.pc, actual_taken);
if (enableIMHistTable) {
updatePredTable(imHistTable, pred.imHistIndex, imHistTableNum, entry.pc, actual_taken);
}
updateWeightTable(iWeightTable, weightTableIdx, entry.pc, pred.i_weight_scale_diff,
(pred.i_percsum >= 0) == actual_taken);

@github-actions
github-actions Bot commented May 7, 2026

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.3130 | - |
| This PR | 2.3123 | 📉 -0.0008 (-0.03%) |

✅ Difftest smoke test passed!

@jensen-yan
Collaborator Author

Sure. It is easiest to explain in three layers.

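1. The original IMLI
The old I table essentially only encodes:

current branch PC + current loop position imliCount

imliCount can be understood as the number of consecutive taken backward branches, i.e. roughly which position in the inner loop execution has reached. For example, take a loop in which the same hot branch executes in every iteration: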

for (outer = 0; outer < ...; outer++) {
    for (i = 0; i < 8; i++) {
        if (pattern depends on outer and i) ...
    }
}

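The old IMLI can separate i=0/1/2/3..., but if the same i=3 has different outcomes in different outer phases, it aliases them.

2. The new imHist
The new table adds:

a separate local direction history for each imliCount: imHist[imliCount]

So prediction is no longer just:

PC + imliCount

but additionally looks up a helper table indexed by:

PC + folded(imHist[imliCount])

Finally, it does not vote on its own; its contribution is added back into the I family:

i_percsum = i_count_percsum + imHist_percsum

This "gives IMLI a phase memory": the same loop iteration can be split into phases according to the branch outcomes recently seen at that iteration position.

3. What imHistStoreBits does
imHistStoreBits is the number of history bits kept in each imHist[imliCount] bucket. The code takes the maximum of all imHistHistLen values and builds a mask from it:

imHistStoreMask = (1 << imHistStoreBits) - 1

On each update:

imHist[imliCount] = ((imHist[imliCount] << 1) | taken) & imHistStoreMask

So its role is to keep only the most recent N directions in each bucket. This stops the history from growing without bound, while still guaranteeing enough bits for the later index folding.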

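For example, suppose:

imHistStoreBits = 4
imliCount = 3

and that loop position 3 has seen the outcomes:

T, N, T, T

Then imHist[3] might be, in binary:

1011

If this position sees N next, the update gives:

0110

Only the most recent 4 bits are kept.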
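The bucket arithmetic above can be checked with a short C++ sketch. `computeStoreBits`, `storeMask`, and `pushOutcome` are hypothetical helpers. `imHistHistLen` is assumed to hold signed ints, because the init code asserts `imHistHistLen[i] >= 0`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// imHistStoreBits = the largest configured history length, so every helper
// table can fold as many bits as it needs from the same bucket.
unsigned computeStoreBits(const std::vector<int> &imHistHistLen) {
    assert(!imHistHistLen.empty());
    int maxLen = *std::max_element(imHistHistLen.begin(), imHistHistLen.end());
    assert(maxLen > 0 && maxLen < 64);
    return static_cast<unsigned>(maxLen);
}

// imHistStoreMask = (1 << imHistStoreBits) - 1
uint64_t storeMask(unsigned storeBits) {
    return (1ULL << storeBits) - 1;
}

// imHist[imliCount] = ((imHist[imliCount] << 1) | taken) & imHistStoreMask
uint64_t pushOutcome(uint64_t bucket, bool taken, uint64_t mask) {
    return ((bucket << 1) | (taken ? 1u : 0u)) & mask;
}
```

Pushing T, N, T, T into an empty 4-bit bucket yields 0b1011, and a following N yields 0b0110, which matches the example above.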

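A concrete example
Suppose a branch at inner-loop iteration 3 behaves as follows:

even outer iterations: taken
odd outer iterations: not-taken

The old IMLI always sees:

PC = X, imliCount = 3

It mixes taken and not-taken into the same counter/table entry and never learns a stable prediction.

The new imHist sees:

PC = X, imliCount = 3, imHist[3] = 1011  -> taken
PC = X, imliCount = 3, imHist[3] = 0100  -> not-taken

This way imHist[3] further splits the same imliCount=3 into different phases, and the helper table can learn both directions.

The corresponding code locations are roughly:

In one sentence: the old IMLI is "loop-position prediction"; with imHist it becomes "loop position plus that position's own history phase". That is why it is especially effective on tests like imli_phase_shift.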
