Skip to content

Larger tage align#846

Open
jensen-yan wants to merge 4 commits into
xs-devfrom
larger-tage-align
Open

Larger tage align#846
jensen-yan wants to merge 4 commits into
xs-devfrom
larger-tage-align

Conversation

@jensen-yan
Copy link
Copy Markdown
Collaborator

@jensen-yan jensen-yan commented Apr 28, 2026

Summary by CodeRabbit

Release Notes

  • New Features
    • Added loop predictor override capability to force loop predictions when valid
    • Added shadow overflow tables for extended branch prediction capacity
    • Enabled row-bundle mode for branch predictor table organization
    • Introduced position-based index mixing for enhanced table indexing
    • Expanded TAGE branch predictor associativity configuration options

Change-Id: Ia31f46f482243e08f5052ff3f7a4cabe212c4626
Change-Id: Idfc615700d8b7e216e3520efa293d00f11c9a19c
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

Warning

Rate limit exceeded

@jensen-yan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 46 minutes and 46 seconds before requesting another review.

To continue reviewing without waiting, purchase usage credits in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 87f21553-761f-4418-87c4-5c450bae8dd9

📥 Commits

Reviewing files that changed from the base of the PR and between 506afd1 and 9d8ea49.

📒 Files selected for processing (4)
  • configs/example/kmhv3.py
  • src/cpu/pred/BranchPredictor.py
  • src/cpu/pred/btb/btb_tage.cc
  • src/cpu/pred/btb/btb_tage.hh
📝 Walkthrough

Walkthrough

The PR extends the BTBTAGE branch predictor with loop predictor integration, shadow overflow tables for sparse entries, row-bundle mode for TAGE tables, and position-aware indexing. New configuration parameters control these features, and the kmhv3 example configuration is updated to enable row-bundle tables.

Changes

Cohort / File(s) Summary
Configuration Setup
configs/example/kmhv3.py
Enables row-bundle support with explicit TAGE table configuration by setting rowBundleTables parameter in the DecoupledBPUWithBTB branch predictor setup.
Parameter Definitions
src/cpu/pred/BranchPredictor.py
Adds forceLoopUse parameter to LoopPredictor and extends BTBTAGE with 9 new parameters for loop predictor integration, position-aware indexing, shadow overflow tables, and row-bundle support.
Loop Predictor Implementation
src/cpu/pred/loop_predictor.cc, src/cpu/pred/loop_predictor.hh
Adds forceLoopUse configuration to allow loop predictions to override normal control flow when valid, and initializes this parameter in the predictor constructor.
Core Predictor Data Structures
src/cpu/pred/btb/btb_tage.hh
Extends TageEntry and TageTableInfo with row-bundle subentry support; adds shadowTageTable storage; introduces getTageIndex overload and getShadowIndex helper; integrates loop predictor metadata via TageMeta with LoopPredictionMeta; adds configuration fields for position mixing, shadow overflow, and row-bundle modes; expands statistics tracking.
Core Predictor Logic
src/cpu/pred/btb/btb_tage.cc
Implements loop predictor lookup/override path with dedicated statistics; adds shadow overflow table search on miss and allocation tracking; introduces row-bundle secondary subentry handling with position-disambiguated matching; adds position mixing in index generation; updates counter/useful access through new TageTableInfo accessors; expands statistics suite for new features.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

Suggested labels

perf, align-kmhv3

Suggested reviewers

  • CJ362ff
  • Yakkhini
  • tastynoob

Poem

🐰 Hop, hop, the TAGE grows!
With shadows deep and row-bundles in rows,
Loop predictors now override with flair,
Position-aware indexing fills the air!
KMHv3 stands taller, enhanced with care

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Larger tage align' is vague and does not clearly convey the substantial changes made to branch predictor functionality. Consider using a more descriptive title that summarizes the primary changes, such as 'Add loop predictor override and shadow overflow support to BTBTAGE' or 'Extend BTBTAGE with row-bundle and loop predictor integration'.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch larger-tage-align

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cpu/pred/btb/btb_tage.cc`:
- Around line 386-406: The shadow-table match/allocate path currently compares
only tag and thus ignores row-bundle's position-specific subentries when both
enableShadowOverflow and enableRowBundle are enabled; update the shadow logic
(in the code around shadowTageTable lookup, getShadowIndex, getShadowNumWays,
matching_entry, matching_way, matching_subentry, and index assignment) to mirror
the main-table position/subentry handling: compute the correct subentry for
row-bundle entries, include position in the equality/matching check (or use the
same getSubentry/getPosition logic used by the main table), and set
matching_subentry accordingly so updates target the correct sub-slot;
alternatively, add an explicit guard that rejects/avoids using the shadow path
when enableRowBundle && enableShadowOverflow to prevent incorrect aliasing.
- Around line 1158-1172: The code calls loopPredictor->updateStats(...) for
every conditional branch that has loop metadata even when the loop predictor
didn't provide a prediction; change the logic in the block that handles
predMeta->loopPreds (the loop_it handling) to only call
LoopPredictor::updateStats(...) when loop_it->second.loopInfo->loopPredUsed is
true (i.e., move or guard updateStats behind the loopPredUsed check that
increments tageStats.loopPredUsedCorrect/Wrong), while still calling
loopPredictor->condBranchUpdate(...) as before; use the existing symbols
(enableLoopPredictor, has_original_pred, predMeta->loopPreds, loop_it, loopInfo,
loopPredUsed, updateStats, condBranchUpdate, instShiftAmt) to locate and modify
the code.
- Around line 497-514: The code overwrites the original TAGE prediction in
pred.taken before saving the pre-override direction, causing
LoopPredictor::loopUpdate() (when useDirectionBit is enabled) to see the
post-override value and mis-train for BTBTAGE; preserve the original TAGE
direction by storing the initial TAGE result (tage_pred_taken) into
loop_info->predTaken before calling loopPredict or otherwise ensure
meta->loopPreds[btb_entry.pc] receives the pre-override direction
(tage_pred_taken) along with the loop info; update the block around loopPredict,
pred.taken, loop_info->predTaken, and meta->loopPreds so loop_info->predTaken
contains the TAGE direction used before loopPredict flips pred.taken.
- Around line 1704-1715: The stats increments for shadowMainProvider,
shadowAltProvider, rowBundleSecondaryMainProvider and
rowBundleSecondaryAltProvider in updateStatsWithTagePrediction() are executed
both at lookup and at resolved update, causing double-counting; guard these
increments with the lookup-only condition by checking the when_pred variable
(e.g. only run them when when_pred indicates the initial prediction/lookup
phase) and leave other stats updates unchanged—locate the block using
pred.mainInfo / pred.altInfo and wrap the four increments in an if (when_pred ==
<lookup-phase>) { ... } using the existing when_pred enum/constant in this file.

In `@src/cpu/pred/loop_predictor.cc`:
- Around line 292-295: The branch info flag bi->loopPredUsed can remain stale
when the loop predictor is valid-but-not-used (or invalid), so reset it to false
before the override check: set bi->loopPredUsed = false prior to the if that
checks (forceLoopUse || loopUseCounter >= 0) && bi->loopPredValid, then keep the
existing assignment bi->loopPredUsed = true inside that block so the flag
correctly reflects whether the loop prediction was actually used; reference
symbols: bi->loopPredUsed, bi->loopPredValid, forceLoopUse, loopUseCounter,
pred_taken.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d1b3fc58-e287-47e9-8d12-c7b0e377e5bd

📥 Commits

Reviewing files that changed from the base of the PR and between b813b57 and 506afd1.

📒 Files selected for processing (6)
  • configs/example/kmhv3.py
  • src/cpu/pred/BranchPredictor.py
  • src/cpu/pred/btb/btb_tage.cc
  • src/cpu/pred/btb/btb_tage.hh
  • src/cpu/pred/loop_predictor.cc
  • src/cpu/pred/loop_predictor.hh

Comment thread src/cpu/pred/btb/btb_tage.cc
Comment thread src/cpu/pred/btb/btb_tage.cc
Comment thread src/cpu/pred/btb/btb_tage.cc
Comment on lines +1704 to +1715
if (pred.mainInfo.found && pred.mainInfo.inShadow) {
shadowMainProvider++;
}
if (pred.altInfo.found && pred.altInfo.inShadow) {
shadowAltProvider++;
}
if (pred.mainInfo.found && pred.mainInfo.subentry == 1) {
rowBundleSecondaryMainProvider++;
}
if (pred.altInfo.found && pred.altInfo.subentry == 1) {
rowBundleSecondaryAltProvider++;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Gate the new provider-shape stats on when_pred.

updateStatsWithTagePrediction() runs during lookup and again during resolved update. These increments happen before the when_pred split, so shadowMainProvider, shadowAltProvider, and the row-bundle secondary-provider stats are double-counted.

Suggested fix
-    if (pred.mainInfo.found && pred.mainInfo.inShadow) {
-        shadowMainProvider++;
-    }
-    if (pred.altInfo.found && pred.altInfo.inShadow) {
-        shadowAltProvider++;
-    }
-    if (pred.mainInfo.found && pred.mainInfo.subentry == 1) {
-        rowBundleSecondaryMainProvider++;
-    }
-    if (pred.altInfo.found && pred.altInfo.subentry == 1) {
-        rowBundleSecondaryAltProvider++;
-    }
     if (when_pred) {
+        if (pred.mainInfo.found && pred.mainInfo.inShadow) {
+            shadowMainProvider++;
+        }
+        if (pred.altInfo.found && pred.altInfo.inShadow) {
+            shadowAltProvider++;
+        }
+        if (pred.mainInfo.found && pred.mainInfo.subentry == 1) {
+            rowBundleSecondaryMainProvider++;
+        }
+        if (pred.altInfo.found && pred.altInfo.subentry == 1) {
+            rowBundleSecondaryAltProvider++;
+        }
         if (hit) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/btb/btb_tage.cc` around lines 1704 - 1715, The stats increments
for shadowMainProvider, shadowAltProvider, rowBundleSecondaryMainProvider and
rowBundleSecondaryAltProvider in updateStatsWithTagePrediction() are executed
both at lookup and at resolved update, causing double-counting; guard these
increments with the lookup-only condition by checking the when_pred variable
(e.g. only run them when when_pred indicates the initial prediction/lookup
phase) and leave other stats updates unchanged—locate the block using
pred.mainInfo / pred.altInfo and wrap the four increments in an if (when_pred ==
<lookup-phase>) { ... } using the existing when_pred enum/constant in this file.

Comment on lines +292 to 295
if ((forceLoopUse || loopUseCounter >= 0) && bi->loopPredValid) {
pred_taken = bi->loopPred;
bi->loopPredUsed = true;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reset loopPredUsed before the override check.

This path only writes bi->loopPredUsed on success. When the loop predictor is valid-but-not-used, or invalid, the flag keeps its prior/uninitialized value, and BTBTAGE now reads it for stats/update decisions.

Suggested fix
     bool pred_taken = prev_pred_taken;
     if (cond_branch) {
         // loop prediction
         bi->loopPred = getLoop(branch_pc, bi, useSpeculation, instShiftAmt);
+        bi->loopPredUsed = false;
 
         if ((forceLoopUse || loopUseCounter >= 0) && bi->loopPredValid) {
             pred_taken = bi->loopPred;
             bi->loopPredUsed = true;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cpu/pred/loop_predictor.cc` around lines 292 - 295, The branch info flag
bi->loopPredUsed can remain stale when the loop predictor is valid-but-not-used
(or invalid), so reset it to false before the override check: set
bi->loopPredUsed = false prior to the if that checks (forceLoopUse ||
loopUseCounter >= 0) && bi->loopPredValid, then keep the existing assignment
bi->loopPredUsed = true inside that block so the flag correctly reflects whether
the loop prediction was actually used; reference symbols: bi->loopPredUsed,
bi->loopPredValid, forceLoopUse, loopUseCounter, pred_taken.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 506afd1505

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

tageStats.updateStatsWithTagePrediction(pred, true);
results.push_back({btb_entry.pc, pred.taken || btb_entry.alwaysTaken});
tageInfoForMgscs[btb_entry.pc].tage_pred_taken = pred.taken;
tageInfoForMgscs[btb_entry.pc].tage_pred_taken = tage_pred_taken;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Feed loop-adjusted direction into MGSC handoff

This line passes the pre-loop TAGE result to MGSC (tage_pred_taken captured before loopPredict), so when MGSC is enabled its fallback path (src/cpu/pred/btb/btb_mgsc.cc, taken = ... ? ... : tage_info.tage_pred_taken) ignores any loop-predictor override. In practice, enabling BTBTAGE.enableLoopPredictor will not affect final branch direction whenever MGSC does not select SC, because MGSC rewrites condTakens using this stale value.

Useful? React with 👍 / 👎.

const unsigned shadowWays = getShadowNumWays(i);
for (unsigned way = 0; way < shadowWays; way++) {
auto &entry = shadowTageTable[i][shadowIndex][way];
if (entry.valid && tag == entry.tag) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Disambiguate shadow hits by branch position

Shadow lookup matches only on tag, but row-bundle mode removes position from the tag (tagPosition = 0 for bundled tables). If enableRowBundle and enableShadowOverflow are both used on the same tables, different branch positions in the same fetch block can alias to the same shadow tag and be treated as a hit, producing wrong directions and updating the wrong shadow entry.

Useful? React with 👍 / 👎.

@XiangShanRobot
Copy link
Copy Markdown

[Generated by GEM5 Performance Robot]
commit: 506afd1
workflow: gem5 Align BTB Performance Test(0.3c)

Align BTB Performance

Overall Score

PR Master Diff(%)
Score 18.82 18.72 +0.54 🟢

1 similar comment
@XiangShanRobot
Copy link
Copy Markdown

[Generated by GEM5 Performance Robot]
commit: 506afd1
workflow: gem5 Align BTB Performance Test(0.3c)

Align BTB Performance

Overall Score

PR Master Diff(%)
Score 18.82 18.72 +0.54 🟢

jensen-yan added 2 commits May 7, 2026 18:08
Change-Id: I135a9fab8186bc4d41be1cc6aca0e8d0bfa9da2d
Change-Id: Ie46db37ec481b93f9735bf8606304f17c9059655
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9d8ea492aa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +396 to +397
const Addr shadowIndex = getShadowIndex(
index, i, position, startPC, indexFoldedHist[i].get());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use snapshot history for shadow lookup

When generateSinglePrediction() is called with predMeta on the update/recompute path, the main index and tag above use the prediction-time folded histories, but this shadow lookup still hashes the live indexFoldedHist. With enableShadowOverflow plus useAltHashForShadowIndex, any speculative history change between lookup and update makes recomputation probe a different shadow set, so shadow providers are missed and the update can train/allocate as if no provider existed.

Useful? React with 👍 / 👎.

Comment on lines +962 to +963
const Addr shadowIndex = getShadowIndex(
newIndex, ti, position, startPC, indexFoldedHist[ti].get());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Allocate shadow entries with prediction history

This shadow allocation uses the live folded history even though the primary allocation just above is indexed with meta->indexFoldedHist. When useAltHashForShadowIndex is enabled, updates occur after history has advanced, so entries placed in the shadow table are keyed by update-time history and later predictions for the same branch/history will probe a different shadow index, effectively losing the allocation.

Useful? React with 👍 / 👎.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

🚀 Coremark Smoke Test Results

Branch IPC Change
Base (xs-dev) 2.3130 -
This PR 2.3237 📈 +0.0107 (+0.46%)

✅ Difftest smoke test passed!

@XiangShanRobot
Copy link
Copy Markdown

[Generated by GEM5 Performance Robot]
commit: 9d8ea49
workflow: gem5 Align BTB Performance Test(0.3c)

Align BTB Performance

Overall Score

PR Master Diff(%)
Score 18.73 18.73 +0.01 🟢

[Generated by GEM5 Performance Robot]
commit: 9d8ea49
workflow: gem5 Align BTB Performance Test(0.3c)

Align BTB Performance

Overall Score

PR Previous Commit Diff(%)
Score 18.73 18.82 -0.49 🔴

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants