Hello gem5 community,
I am working on modeling Predictive Store Forwarding (PSF), as documented by AMD and Intel, within gem5's O3 CPU model. I'm primarily targeting the RISC-V ISA for my research project.
During my implementation, I ran into some structural issues within the gem5 O3 pipeline that prevent PSF from working as intended. I wanted to ask if there are any ongoing efforts, open PRs, or future roadmap plans to modernize these specific pipeline behaviors:
MemDepUnit (store set predictor) Stalling Loads in the IQ:
When the Store Sets predictor predicts that a load will alias with an older unresolved store, the MemDepUnit clears the load's CanIssue flag, trapping it in the instruction queue (IQ). Because the load is not allowed to generate its effective address or enter the LSQ, our predictor update logic (which relies on address matching in the LSQ at store execution) cannot verify the alias and build confidence. If my understanding is correct, are there any architectural plans to modernize the MemDepUnit so that predicted-dependent loads can issue, calculate their addresses, and stall inside the LSQ rather than being blocked at the IQ?
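To make the concern concrete, here is a toy Python model of the two stall policies. It is only a sketch and not gem5 code: the policy names, the `Load` record, and the saturating confidence counter are all hypothetical, and pipeline latencies are ignored. It shows that when a predicted-dependent load is held in the IQ, its address is never available at store execution, so an LSQ-side address match has nothing to compare against.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Load:
    addr: int                  # effective address the load would compute
    addr_ready_cycle: int      # cycle at which its base register is ready
    predicted_dependent: bool  # store-set prediction: aliases an older store


def observed_alias(policy: str, load: Load, store_addr: int,
                   store_exec_cycle: int) -> Optional[bool]:
    """What an LSQ address match can see when the store executes.

    Returns True/False if the load address is known, None otherwise.
    Both policy names are hypothetical labels for this sketch.
    """
    if policy == "stall_in_iq" and load.predicted_dependent:
        # Load never issues before the store, so no address exists yet.
        return None
    if load.addr_ready_cycle <= store_exec_cycle:
        # "stall_in_lsq": the load computed its address, then waits in the LSQ.
        return load.addr == store_addr
    return None


def train_confidence(policy: str, events, max_conf: int = 3) -> int:
    """Saturating counter updated only when an alias can be verified."""
    conf = 0
    for load, store_addr, store_exec_cycle in events:
        seen = observed_alias(policy, load, store_addr, store_exec_cycle)
        if seen is True:
            conf = min(conf + 1, max_conf)
        elif seen is False:
            conf = 0
        # seen is None: nothing to compare, so no update is possible
    return conf


# Four repetitions of the same aliasing load/store pair.
events = [(Load(addr=0x1000, addr_ready_cycle=2, predicted_dependent=True),
           0x1000, 10)] * 4
iq_conf = train_confidence("stall_in_iq", events)
lsq_conf = train_confidence("stall_in_lsq", events)
```

In this sketch the counter stays at zero under the IQ-stall policy and saturates under the LSQ-stall policy, which is the behavior difference the question is about.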
(Incidentally) Monolithic RISC-V Stores:
To trigger PSF, we need a store's data to be ready while its address calculation is heavily delayed. I could be wrong, but from my observation, stores in the current RISC-V ISA implementation are monolithic: the instruction queue gates issue until both the address base register and the data register are ready. Once issued, initiateAcc calculates the effective address and writes the data to the LSQ in the same cycle. Are there any plans or ongoing efforts to decouple this process for RISC-V?
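The timing difference can be sketched with another toy Python model. Again, this is not gem5 code: `store_timing`, `psf_window`, and the idea of splitting a store into separate address and data operations are assumptions made for illustration, and execution latency is ignored. The point is that a monolithic store never exposes a cycle in which its data is in the LSQ while its address is still unknown.

```python
from dataclasses import dataclass


@dataclass
class StoreTiming:
    addr_cycle: int  # cycle at which the store address reaches the LSQ
    data_cycle: int  # cycle at which the store data reaches the LSQ


def store_timing(base_ready: int, data_ready: int, split: bool) -> StoreTiming:
    """Cycle at which address and data become visible in the LSQ.

    split=False models a monolithic store that issues only when both
    source registers are ready. split=True models a hypothetical
    decoupled store whose address and data halves issue independently.
    """
    if split:
        return StoreTiming(addr_cycle=base_ready, data_cycle=data_ready)
    issue = max(base_ready, data_ready)
    return StoreTiming(addr_cycle=issue, data_cycle=issue)


def psf_window(t: StoreTiming) -> int:
    """Cycles during which data is available but the address is not."""
    return max(0, t.addr_cycle - t.data_cycle)


# Data register ready at cycle 5, address base register delayed to cycle 20.
mono = psf_window(store_timing(base_ready=20, data_ready=5, split=False))
split = psf_window(store_timing(base_ready=20, data_ready=5, split=True))
```

Under these assumptions the monolithic store gives a window of zero cycles, while the decoupled store leaves a 15-cycle window in which predictive forwarding could act.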
We are trying to gauge the engineering effort required to implement these changes ourselves versus waiting for upstream updates. Any insights into whether these pipeline behaviors are slated for an overhaul would be greatly appreciated.
Thank you for your time and guidance!