[Feature] Implement DMA support #293
s.mem_rd_req_val = OutPort() # dma_read_request_valid
s.mem_rd_req_rdy = InPort() # dma_read_request_ready
s.mem_rd_req_addr = OutPort(DmaDramAddrType)
Why can't we use the RecvIfcRTL and SendIfcRTL interfaces to connect the DmaRTL?
s.data_mem.spm_dma_rval //= s.spm_dma_rval
s.data_mem.spm_dma_rrdy //= s.spm_dma_rrdy
s.data_mem.spm_dma_raddr //= s.spm_dma_raddr
s.data_mem.spm_dma_rresp_val //= s.spm_dma_rresp_val
s.data_mem.spm_dma_rresp_rdy //= s.spm_dma_rresp_rdy
s.data_mem.spm_dma_rresp_data //= s.spm_dma_rresp_data
As we discussed before, should we connect the DMA to the controller (as an intermediate interface) instead of connecting it directly to the data SPM?
So then we can leverage
VectorCGRA/controller/ControllerRTL.py
Lines 138 to 139 in 44618d5
…RecvIfcRTL. Replace `mem` with `dram` for clarity.
Hi @tancheng @BenkangPeng, I summarized two directions for the DMA design below:
I prefer the second method, but I think some logic still needs to be written in DataMemControllerRTL. WDYT?
Hi @HobbitQia, option 2 looks good to me, though I am not sure what logic should additionally be in
s.dma_cmd_val //= s.dma.dma_cmd_val
s.dma_cmd_rdy //= s.dma.dma_cmd_rdy
s.dma_cmd_opcode //= s.dma.dma_cmd_opcode
s.dma_cmd_dram_addr //= s.dma.dma_cmd_dram_addr
s.dma_cmd_spm_addr //= s.dma.dma_cmd_spm_addr
s.dma_cmd_bytes //= s.dma.dma_cmd_bytes
s.dma_cmd_tag //= s.dma.dma_cmd_tag

s.dma_done_val //= s.dma.dma_done_val
s.dma_done_rdy //= s.dma.dma_done_rdy
s.dma_done_tag //= s.dma.dma_done_tag

s.dram_rd_req //= s.dma.dram_rd_req
s.dram_rd_resp //= s.dma.dram_rd_resp

s.dram_wr_req_val //= s.dma.dram_wr_req_val
s.dram_wr_req_rdy //= s.dma.dram_wr_req_rdy
s.dram_wr_req_addr //= s.dma.dram_wr_req_addr
s.dram_wr_req_data //= s.dma.dram_wr_req_data
s.dram_wr_req_mask //= s.dma.dram_wr_req_mask

s.dram_wr_resp_val //= s.dma.dram_wr_resp_val
s.dram_wr_resp_rdy //= s.dma.dram_wr_resp_rdy

# DMA to SPM connections.

s.dma.spm_dma_wval //= s.cgra.spm_dma_wval
s.dma.spm_dma_wrdy //= s.cgra.spm_dma_wrdy
s.dma.spm_dma_waddr //= s.cgra.spm_dma_waddr
s.dma.spm_dma_wdata //= s.cgra.spm_dma_wdata
s.dma.spm_dma_wmask //= s.cgra.spm_dma_wmask

s.dma.spm_dma_rval //= s.cgra.spm_dma_rval
s.dma.spm_dma_rrdy //= s.cgra.spm_dma_rrdy
s.dma.spm_dma_raddr //= s.cgra.spm_dma_raddr
s.dma.spm_dma_rresp_val //= s.cgra.spm_dma_rresp_val
s.dma.spm_dma_rresp_rdy //= s.cgra.spm_dma_rresp_rdy
s.dma.spm_dma_rresp_data //= s.cgra.spm_dma_rresp_data
All of these will change if we go with @HobbitQia's option 2, right?
Moreover, can we use send/recv interfaces and define a msg (https://github.com/tancheng/VectorCGRA/blob/master/lib/messages.py) that encapsulates data, addr, or whatever else is needed as a struct? Then we don't need to declare so many ports and explicitly connect each of them. The CGRA RTL shouldn't see these details; the struct can be decomposed inside the submodule.
Yes.
I agree that we can use these helper functions or classes to define the ports. Initially I designed the interface only from the perspective of connecting the DMA engine to Chipyard, which is why I defined these individual ports. But I believe these input/output ports can be folded into our struct and wrapped by another adapter so that we can connect them. @BenkangPeng
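To make the struct-plus-adapter idea concrete, here is a minimal sketch in plain Python rather than PyMTL3. The names `SpmDmaWriteMsg`, `SendIfc`, `pack_write`, `unpack_write`, and `transfer` are hypothetical and are not taken from `lib/messages.py`; the sketch only shows how the separate `spm_dma_waddr`/`spm_dma_wdata`/`spm_dma_wmask` signals could travel as one message behind a single val/rdy handshake.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpmDmaWriteMsg:
    """Hypothetical bundle of the spm_dma_w* payload signals."""
    addr: int
    data: int
    mask: int


class SendIfc:
    """Toy val/rdy send interface (a stand-in, not the real SendIfcRTL)."""

    def __init__(self):
        self.val = False   # driven by the sender
        self.rdy = False   # driven by the receiver
        self.msg = None    # payload, valid only while val is high


def pack_write(waddr, wdata, wmask):
    """Adapter direction 1: flat ports -> one message struct."""
    return SpmDmaWriteMsg(addr=waddr, data=wdata, mask=wmask)


def unpack_write(msg):
    """Adapter direction 2: message struct -> flat ports inside the submodule."""
    return msg.addr, msg.data, msg.mask


def transfer(ifc, sink):
    """Model one cycle: the message moves only when val and rdy are both high."""
    if ifc.val and ifc.rdy:
        sink.append(ifc.msg)
        return True
    return False
```

With this shape, the top level would connect one interface per direction, and only the adapter at the Chipyard boundary would need to know the individual fields.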
I am thinking that if we want to enable DMA and traditional load/store to run concurrently, we need to multiplex the Data SPM port, and I think this logic can be implemented in
I thought they are the same latency if we can distinguish the Adding logic inside the |
If the DMA data goes through the controller packet path, there may be extra latency from packetizing, and there may be contention among NoC/CPU/tile requests to the SPM. Or would we have two separate paths in the controller?
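As a rough illustration of the port-sharing question raised above, the following plain-Python sketch models a round-robin arbiter between a DMA requester and a load/store requester on one SPM port. The class name `TwoWayArbiter` and the round-robin policy are assumptions made for illustration; the thread does not settle which policy or which module would hold this logic.

```python
class TwoWayArbiter:
    """Round-robin arbiter for two requesters sharing one SPM port (hypothetical)."""

    DMA = 0
    LSU = 1  # traditional load/store path

    def __init__(self):
        # Pretend the load/store side won last, so DMA wins the first tie.
        self.last = self.LSU

    def grant(self, dma_val, lsu_val):
        """Return the granted requester for this cycle, or None if idle."""
        if dma_val and lsu_val:
            # Both request: grant whichever side did not win last time.
            winner = self.LSU if self.last == self.DMA else self.DMA
        elif dma_val:
            winner = self.DMA
        elif lsu_val:
            winner = self.LSU
        else:
            return None
        self.last = winner
        return winner
```

Under this policy neither side can starve the other, but each request may wait one extra cycle when both are active, which is the kind of latency cost the comments above are weighing.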


Related issue: coredac/CGRA-SoC#2
This PR introduces `CgraDmaRTL`, which integrates the CGRA with a DMA engine, enabling direct memory transfers between external DRAM (not implemented for now) and the CGRA's data SPM.
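For readers unfamiliar with how a command like the one carried by the `dma_cmd_*` ports might expand into SPM accesses, here is a behavioral sketch in plain Python. It is not the PR's implementation: the 4-byte word width, the word-addressed `spm_addr`, the word-aligned start, and the names `DmaCmd` and `split_into_beats` are all assumptions, and the `opcode` field is carried but not interpreted.

```python
from dataclasses import dataclass

WORD_BYTES = 4  # assumed SPM word width; the PR does not state it here


@dataclass(frozen=True)
class DmaCmd:
    """Hypothetical mirror of the dma_cmd_* fields."""
    opcode: int
    dram_addr: int   # byte address in DRAM
    spm_addr: int    # assumed to be a word address in the SPM
    nbytes: int      # corresponds to dma_cmd_bytes
    tag: int


def split_into_beats(cmd):
    """Split a transfer into per-word beats.

    Returns a list of (dram_byte_addr, spm_word_addr, byte_mask) tuples,
    where byte_mask has one bit per valid byte in that word.
    """
    beats = []
    offset = 0
    while offset < cmd.nbytes:
        chunk = min(WORD_BYTES, cmd.nbytes - offset)
        mask = (1 << chunk) - 1  # low `chunk` bytes are valid
        beats.append((cmd.dram_addr + offset,
                      cmd.spm_addr + offset // WORD_BYTES,
                      mask))
        offset += chunk
    return beats
```

For example, a 10-byte transfer produces two full-word beats and one final beat whose mask covers only two bytes, which is where a signal such as `spm_dma_wmask` would come into play.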