cpu-o3: btbtage two slot pre entry#832
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 54 minutes and 46 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughSlot-aware TAGE: prediction, lookup, update, and allocation are rewritten to track per-entry slots (positioned by block-relative position). Tag computation no longer XORs branch position. Trace/schema extended with slot fields. Tests and helper APIs updated; several function signatures changed to return allocated slot information. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[Generated by GEM5 Performance Robot] Align BTB PerformanceOverall Score
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cpu/pred/btb/btb_tage.cc`:
- Around line 547-556: syncEntryLegacyMirror currently updates only
counter/useful and must also mirror the slot-0 PC per the compatibility
contract; update BTBTAGE::syncEntryLegacyMirror to set entry.pc =
entry.slots[0].pc when entry.slots[0].valid (and set entry.pc to a defined
invalid/sentinel value when not valid, e.g., 0 or the project's INVALID_PC) so
legacy readers observe the same PC as slot 0; update any tests that relied on
manually rewriting entry.pc to validate the helper.
- Around line 789-824: The code currently weakens a non-useful strong slot in
same_entry and then returns false, which stops the outer loop over tables and
prevents probing higher-history tables; change that final return so the
allocator continues probing later tables: after adjusting cand_slot.counter,
call sortEntrySlotsByPosition(same_entry), syncEntryLegacyMirror(same_entry),
noteAllocationFailure(), and then continue the outer for (unsigned ti =
start_table; ti < numPredictors; ++ti) loop instead of returning false (i.e.,
replace the final "return false;" with a control flow that continues to the next
ti), keeping the existing updates to tageStats and allocation tracking only
where appropriate.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 94de6458-6410-4e28-a096-98a0a5b5f8ce
📒 Files selected for processing (7)
src/cpu/pred/btb/btb_tage.ccsrc/cpu/pred/btb/btb_tage.hhsrc/cpu/pred/btb/btb_tage_ub.ccsrc/cpu/pred/btb/common.hhsrc/cpu/pred/btb/docs/btb_tage.mdsrc/cpu/pred/btb/microtage.ccsrc/cpu/pred/btb/test/btb_tage.test.cc
| BTBTAGE::syncEntryLegacyMirror(TageEntry &entry) | ||
| { | ||
| if (entry.slots[0].valid) { | ||
| entry.counter = entry.slots[0].counter; | ||
| entry.useful = entry.slots[0].useful; | ||
| } else { | ||
| entry.counter = 0; | ||
| entry.useful = false; | ||
| } | ||
| } |
There was a problem hiding this comment.
syncEntryLegacyMirror() no longer syncs the mirrored pc.
The compatibility contract in src/cpu/pred/btb/btb_tage.hh says counter/useful/pc still mirror slots[0], but this helper only refreshes counter/useful. After sortEntrySlotsByPosition() moves a different branch into slot 0, entry.pc stays on the old branch, so any remaining legacy reader will attribute the entry to the wrong PC. The new test helper in src/cpu/pred/btb/test/btb_tage.test.cc:108-113 having to rewrite entry.pc manually is a symptom of the same gap.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cpu/pred/btb/btb_tage.cc` around lines 547 - 556, syncEntryLegacyMirror
currently updates only counter/useful and must also mirror the slot-0 PC per the
compatibility contract; update BTBTAGE::syncEntryLegacyMirror to set entry.pc =
entry.slots[0].pc when entry.slots[0].valid (and set entry.pc to a defined
invalid/sentinel value when not valid, e.g., 0 or the project's INVALID_PC) so
legacy readers observe the same PC as slot 0; update any tests that relied on
manually rewriting entry.pc to validate the helper.
| int replace_slot = -1; | ||
| for (unsigned slot = 0; slot < same_entry.slots.size(); ++slot) { | ||
| const auto &cand_slot = same_entry.slots[slot]; | ||
| if (!cand_slot.useful && isWeakishCounter(cand_slot.counter)) { | ||
| replace_slot = static_cast<int>(slot); | ||
| break; | ||
| } | ||
| } | ||
| if (replace_slot >= 0) { | ||
| // same-tag + full slots + replaceable slot: replace. | ||
| same_entry.slots[replace_slot] = TageSlot(true, position, initCounter, false); | ||
| sortEntrySlotsByPosition(same_entry); | ||
| syncEntryLegacyMirror(same_entry); | ||
| tageStats.updateAllocSuccess++; | ||
| allocated_table = ti; | ||
| allocated_index = newIndex; | ||
| allocated_way = same_tag_way; | ||
| allocated_slot = findSlotByPosition(same_entry, position); | ||
| return true; | ||
| } | ||
|
|
||
| // same-tag + full slots + no replaceable slot: weaken one non-useful strong slot. | ||
| for (auto &cand_slot : same_entry.slots) { | ||
| if (!cand_slot.useful && !isWeakishCounter(cand_slot.counter)) { | ||
| if (cand_slot.counter > 0) { | ||
| cand_slot.counter--; | ||
| } else { | ||
| cand_slot.counter++; | ||
| } | ||
| break; | ||
| } | ||
| } | ||
| sortEntrySlotsByPosition(same_entry); | ||
| syncEntryLegacyMirror(same_entry); | ||
| noteAllocationFailure(); | ||
| return false; |
There was a problem hiding this comment.
Keep probing higher tables after a full same-tag entry can't accept the new slot.
This path weakens the local slot and then returns immediately, which skips the rest of the for (unsigned ti = start_table; ti < numPredictors; ++ti) search. In practice, one congested shared entry in table ti can prevent allocation in every longer-history table above it, even though those later tables are still valid allocation targets.
Possible fix
// same-tag + full slots + no replaceable slot: weaken one non-useful strong slot.
for (auto &cand_slot : same_entry.slots) {
if (!cand_slot.useful && !isWeakishCounter(cand_slot.counter)) {
if (cand_slot.counter > 0) {
cand_slot.counter--;
} else {
cand_slot.counter++;
}
break;
}
}
sortEntrySlotsByPosition(same_entry);
syncEntryLegacyMirror(same_entry);
noteAllocationFailure();
- return false;
+ continue;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| int replace_slot = -1; | |
| for (unsigned slot = 0; slot < same_entry.slots.size(); ++slot) { | |
| const auto &cand_slot = same_entry.slots[slot]; | |
| if (!cand_slot.useful && isWeakishCounter(cand_slot.counter)) { | |
| replace_slot = static_cast<int>(slot); | |
| break; | |
| } | |
| } | |
| if (replace_slot >= 0) { | |
| // same-tag + full slots + replaceable slot: replace. | |
| same_entry.slots[replace_slot] = TageSlot(true, position, initCounter, false); | |
| sortEntrySlotsByPosition(same_entry); | |
| syncEntryLegacyMirror(same_entry); | |
| tageStats.updateAllocSuccess++; | |
| allocated_table = ti; | |
| allocated_index = newIndex; | |
| allocated_way = same_tag_way; | |
| allocated_slot = findSlotByPosition(same_entry, position); | |
| return true; | |
| } | |
| // same-tag + full slots + no replaceable slot: weaken one non-useful strong slot. | |
| for (auto &cand_slot : same_entry.slots) { | |
| if (!cand_slot.useful && !isWeakishCounter(cand_slot.counter)) { | |
| if (cand_slot.counter > 0) { | |
| cand_slot.counter--; | |
| } else { | |
| cand_slot.counter++; | |
| } | |
| break; | |
| } | |
| } | |
| sortEntrySlotsByPosition(same_entry); | |
| syncEntryLegacyMirror(same_entry); | |
| noteAllocationFailure(); | |
| return false; | |
| int replace_slot = -1; | |
| for (unsigned slot = 0; slot < same_entry.slots.size(); ++slot) { | |
| const auto &cand_slot = same_entry.slots[slot]; | |
| if (!cand_slot.useful && isWeakishCounter(cand_slot.counter)) { | |
| replace_slot = static_cast<int>(slot); | |
| break; | |
| } | |
| } | |
| if (replace_slot >= 0) { | |
| // same-tag + full slots + replaceable slot: replace. | |
| same_entry.slots[replace_slot] = TageSlot(true, position, initCounter, false); | |
| sortEntrySlotsByPosition(same_entry); | |
| syncEntryLegacyMirror(same_entry); | |
| tageStats.updateAllocSuccess++; | |
| allocated_table = ti; | |
| allocated_index = newIndex; | |
| allocated_way = same_tag_way; | |
| allocated_slot = findSlotByPosition(same_entry, position); | |
| return true; | |
| } | |
| // same-tag + full slots + no replaceable slot: weaken one non-useful strong slot. | |
| for (auto &cand_slot : same_entry.slots) { | |
| if (!cand_slot.useful && !isWeakishCounter(cand_slot.counter)) { | |
| if (cand_slot.counter > 0) { | |
| cand_slot.counter--; | |
| } else { | |
| cand_slot.counter++; | |
| } | |
| break; | |
| } | |
| } | |
| sortEntrySlotsByPosition(same_entry); | |
| syncEntryLegacyMirror(same_entry); | |
| noteAllocationFailure(); | |
| continue; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/cpu/pred/btb/btb_tage.cc` around lines 789 - 824, The code currently
weakens a non-useful strong slot in same_entry and then returns false, which
stops the outer loop over tables and prevents probing higher-history tables;
change that final return so the allocator continues probing later tables: after
adjusting cand_slot.counter, call sortEntrySlotsByPosition(same_entry),
syncEntryLegacyMirror(same_entry), noteAllocationFailure(), and then continue
the outer for (unsigned ti = start_table; ti < numPredictors; ++ti) loop instead
of returning false (i.e., replace the final "return false;" with a control flow
that continues to the next ti), keeping the existing updates to tageStats and
allocation tracking only where appropriate.
There was a problem hiding this comment.
♻️ Duplicate comments (2)
src/cpu/pred/btb/btb_tage.cc (2)
588-598:⚠️ Potential issue | 🟠 Major
syncEntryLegacyMirror()still leaves the mirroredpcstale.After
sortEntrySlotsByPosition()moves a different branch intoslots[0], this helper refreshes onlycounter/useful. Any remaining legacy reader usingentry.pcwill now observe the wrong branch.Suggested change
void BTBTAGE::syncEntryLegacyMirror(TageEntry &entry) { if (entry.slots[0].valid) { entry.counter = entry.slots[0].counter; entry.useful = entry.slots[0].useful; + entry.pc = entry.ownerBlockBase + + (static_cast<Addr>(entry.slots[0].position) << instShiftAmt); } else { entry.counter = 0; entry.useful = false; + entry.pc = 0; } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpu/pred/btb/btb_tage.cc` around lines 588 - 598, syncEntryLegacyMirror updates only counter/useful after sortEntrySlotsByPosition may have moved a different branch into slots[0], leaving entry.pc stale; modify syncEntryLegacyMirror (operating on TageEntry &entry) to also refresh the mirrored pc from entry.slots[0].pc when entry.slots[0].valid, and set entry.pc to a safe default/invalid value when not valid so legacy readers see the correct branch identity.
1012-1014:⚠️ Potential issue | 🟠 MajorKeep probing higher tables after a blocked same-tag entry.
This
return falseexits the outer allocation loop immediately after weakening the current table. A congested table attican therefore prevent allocation in every longer-history table above it.Suggested change
weakenFirstNonUsefulStrongSlot(ti, newIndex); noteAllocationFailure(); - return false; + continue;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpu/pred/btb/btb_tage.cc` around lines 1012 - 1014, The early "return false" after calling weakenFirstNonUsefulStrongSlot(ti, newIndex) and noteAllocationFailure() prematurely exits the outer allocation loop and prevents probing higher-history tables; change the control flow so that instead of returning, the code continues probing upper tables (e.g., replace the return false with a continue of the allocation loop or otherwise skip to the next table index), ensuring weakenFirstNonUsefulStrongSlot and noteAllocationFailure are still invoked but allocation attempts proceed for larger ti values.
🧹 Nitpick comments (1)
src/cpu/pred/btb/btb_tage.hh (1)
130-132: MakeTageTableInfo::taken()return false on misses.
slotInfodefaults tocounter == 0, so a default-constructed miss currently reportstaken() == true. Most call sites guard onfound, but baking that check into the helper avoids another subtle misuse later.Suggested change
bool taken() const { - return slotInfo.taken(); + return found && slotInfo.valid && slotInfo.taken(); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpu/pred/btb/btb_tage.hh` around lines 130 - 132, TageTableInfo::taken() currently returns slotInfo.taken() even for misses (default slotInfo.counter==0), so change taken() to return false when this entry isn't valid by checking the `found` flag first (e.g., if (!found) return false;), otherwise return slotInfo.taken(); update the TageTableInfo::taken() implementation to reference `found` and `slotInfo.taken()` so callers no longer need to guard on `found`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/cpu/pred/btb/btb_tage.cc`:
- Around line 588-598: syncEntryLegacyMirror updates only counter/useful after
sortEntrySlotsByPosition may have moved a different branch into slots[0],
leaving entry.pc stale; modify syncEntryLegacyMirror (operating on TageEntry
&entry) to also refresh the mirrored pc from entry.slots[0].pc when
entry.slots[0].valid, and set entry.pc to a safe default/invalid value when not
valid so legacy readers see the correct branch identity.
- Around line 1012-1014: The early "return false" after calling
weakenFirstNonUsefulStrongSlot(ti, newIndex) and noteAllocationFailure()
prematurely exits the outer allocation loop and prevents probing higher-history
tables; change the control flow so that instead of returning, the code continues
probing upper tables (e.g., replace the return false with a continue of the
allocation loop or otherwise skip to the next table index), ensuring
weakenFirstNonUsefulStrongSlot and noteAllocationFailure are still invoked but
allocation attempts proceed for larger ti values.
---
Nitpick comments:
In `@src/cpu/pred/btb/btb_tage.hh`:
- Around line 130-132: TageTableInfo::taken() currently returns slotInfo.taken()
even for misses (default slotInfo.counter==0), so change taken() to return false
when this entry isn't valid by checking the `found` flag first (e.g., if
(!found) return false;), otherwise return slotInfo.taken(); update the
TageTableInfo::taken() implementation to reference `found` and
`slotInfo.taken()` so callers no longer need to guard on `found`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cc0f0afd-4086-423a-afcb-4aec26aa3263
📒 Files selected for processing (5)
src/cpu/pred/btb/btb_tage.ccsrc/cpu/pred/btb/btb_tage.hhsrc/cpu/pred/btb/common.hhsrc/cpu/pred/btb/microtage.ccsrc/cpu/pred/btb/test/btb_tage.test.cc
🚧 Files skipped from review as they are similar to previous changes (2)
- src/cpu/pred/btb/microtage.cc
- src/cpu/pred/btb/test/btb_tage.test.cc
|
[Generated by GEM5 Performance Robot] Align BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Align BTB PerformanceOverall Score
|
Change-Id: I54fe1b814015c69233d5944f89e67bfa177886fb
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/cpu/pred/btb/btb_tage.cc (1)
515-525: Document the "weakish" counter threshold.The formula
std::abs(counter * 2 + 1) <= 3classifies counters -2, -1, 0, 1 as "weakish" (not saturated or near-saturated). Consider adding a brief comment explaining this threshold choice for future maintainers.📝 Suggested documentation
bool BTBTAGE::isWeakishCounter(short counter) const { + // "Weakish" = counters -2, -1, 0, 1 (not saturated or near-saturated) + // Formula maps [-2,1] to |val| <= 3, while [-4,-3] and [2,3] map to |val| >= 5 return std::abs(counter * 2 + 1) <= 3; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpu/pred/btb/btb_tage.cc` around lines 515 - 525, Add a short inline comment above BTBTAGE::isWeakishCounter explaining the threshold choice: clarify that the expression std::abs(counter * 2 + 1) <= 3 is used to detect "weakish" 2-bit-ish signed counters (it evaluates true for counter values -2, -1, 0, 1 which are not saturated/strongly biased), and note that isSlotUnprotected treats slots with such counters (and non-useful) as unprotected; update comments near BTBTAGE::isWeakishCounter and BTBTAGE::isSlotUnprotected to document this rationale for future maintainers.src/cpu/pred/btb/test/btb_tage.test.cc (1)
964-1026: Consider adding test teardown to prevent memory leaks.Multiple tests (e.g., this one at line 965) create new
BTBTAGEinstances without deleting the previous one assigned totage. While acceptable for short-lived test processes, this pattern could accumulate significant memory in large test suites.♻️ Suggested pattern to avoid leaks
TEST_F(BTBTAGETest, DifferentTagWholeEntryEvictionRequiresAllSlotsUnprotected) { + delete tage; tage = new BTBTAGE(1, 2, 32);Or consider using smart pointers in the test fixture:
std::unique_ptr<BTBTAGE> tage;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cpu/pred/btb/test/btb_tage.test.cc` around lines 964 - 1026, The test DifferentTagWholeEntryEvictionRequiresAllSlotsUnprotected (and others) repeatedly does tage = new BTBTAGE(...) without freeing the previous BTBTAGE, causing memory leaks; fix by ensuring the fixture-owned tage is freed between tests—either change the fixture member to std::unique_ptr<BTBTAGE> tage and replace raw new with std::make_unique, or add a TearDown/cleanup that deletes the existing tage before reassigning (or at test end); update all usages of the tage pointer accordingly so the BTBTAGE instance is properly destroyed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/cpu/pred/btb/btb_tage.cc`:
- Around line 515-525: Add a short inline comment above
BTBTAGE::isWeakishCounter explaining the threshold choice: clarify that the
expression std::abs(counter * 2 + 1) <= 3 is used to detect "weakish" 2-bit-ish
signed counters (it evaluates true for counter values -2, -1, 0, 1 which are not
saturated/strongly biased), and note that isSlotUnprotected treats slots with
such counters (and non-useful) as unprotected; update comments near
BTBTAGE::isWeakishCounter and BTBTAGE::isSlotUnprotected to document this
rationale for future maintainers.
In `@src/cpu/pred/btb/test/btb_tage.test.cc`:
- Around line 964-1026: The test
DifferentTagWholeEntryEvictionRequiresAllSlotsUnprotected (and others)
repeatedly does tage = new BTBTAGE(...) without freeing the previous BTBTAGE,
causing memory leaks; fix by ensuring the fixture-owned tage is freed between
tests—either change the fixture member to std::unique_ptr<BTBTAGE> tage and
replace raw new with std::make_unique, or add a TearDown/cleanup that deletes
the existing tage before reassigning (or at test end); update all usages of the
tage pointer accordingly so the BTBTAGE instance is properly destroyed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4b0f074f-9c0b-4eb2-aec9-58219c831da4
📒 Files selected for processing (3)
src/cpu/pred/btb/btb_tage.ccsrc/cpu/pred/btb/btb_tage.hhsrc/cpu/pred/btb/test/btb_tage.test.cc
|
[Generated by GEM5 Performance Robot] Align BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Align BTB PerformanceOverall Score
|
Change-Id: I731abc26584eaeb7d275beeeac8264836e4c0a1f
|
[Generated by GEM5 Performance Robot] Align BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Align BTB PerformanceOverall Score
|
Summary by CodeRabbit