<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>CS/ECE 570 Project</title>
<style>
body {
font-family: Arial, sans-serif;
line-height: 1.6;
margin: 80px;
padding-left: 100px;
padding-right: 100px;
background-color: rgb(247, 252, 252); /* Very pale blue */
}
h1, h2, h3 {
color: #000000;
}
h1 {
border-bottom: 2px solid #333;
padding-bottom: 10px;
}
p {
text-align: justify;
}
section {
margin-bottom: 20px;
padding-left: 100px;
padding-right: 100px;
padding-bottom: 20px;
padding-top: 20px;
background-color: rgb(187, 217, 227);
border-radius: 10px;
}
table, th, td {
border: 1px solid black;
border-collapse: collapse;
padding: 10px;
}
button {
font-size:20px;
background-color: rgb(187, 217, 227);
padding: 10px;
border-radius: 10px;
}
</style>
</head>
<body>
<h1><center>CS/ECE570 Project Mid-Report</center></h1>
<h2><center>Heterogeneous Computing in an AI Context</center></h2>
<h3>
<p><center>Amber Kahklen &amp; Ninad Anklesaria</center></p>
<p><center>Winter 2024</center></p>
</h3>
<section><p>If you would like to view the project proposal, final report, or presentation slides please use the links below:</p>
<p>Note: Presentation slides will not be available until March 14th, 2024, thank you!</p>
<a href="proposal.html" title="Project Proposal"><button type="button">Project Proposal</button></a>
<a href="index.html" title="Project Final Report"><button type="button">Project Final Report</button></a>
<a href="Heterogeneous_Computing_in_an_AI_Context_1.pdf" title="Presentation Slides"><button type="button">Presentation Slides</button></a>
<p></p>
</section>
<section>
<h3><p><b>1 Introduction</b></p></h3>
<p> The advancement of artificial intelligence (AI) has driven the adoption and development of heterogeneous computing as designers look for new ways to increase efficiency.
This project is a survey of heterogeneous computing in an AI context. It examines two directions: how AI workloads have influenced the growth and
development of heterogeneous computing, and how AI is used to optimize a given architecture by assisting in the scheduling or design of the system.
For the second direction, we hope to identify the different applications of AI in optimizing these processes and the successes or
failures of those implementations. We chose this topic because of the increasing relevance of heterogeneous
computing, the growing number of AI applications that rely on it, and their effect on the field of computer architecture.
Through the survey, the project will explain what heterogeneous computing is, how it relates to AI, and what adoption, challenges, and benefits
it brings to AI applications. The survey will draw on both academic research and industry applications to give a well-rounded understanding,
and will analyze and present the collected information in a meaningful way.</p>
<h3><p><b>2 Background</b></p></h3>
<b>2.1 What is Heterogeneous Computing?</b>
<p> Heterogeneous computing is a computing architecture that incorporates multiple types of processors or cores within a single system,
each specialized for different computational tasks. This approach leverages the strengths of various processing units—such as CPUs, GPUs, DSPs, and FPGAs—to optimize performance,
energy efficiency, and computational speed. By allocating tasks to the most suitable processor type, heterogeneous computing systems can handle a wide range of applications more effectively than homogeneous systems.
</p>
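<p>The idea of allocating each task to the most suitable processor type can be sketched as follows. This is a minimal illustration only: the device names, task kinds, and throughput numbers in <code>THROUGHPUT</code> are hypothetical values chosen for the example, not measurements from any real system.</p>
<pre>
```python
# Hypothetical relative throughput (work units per second) of each device
# for each kind of task. Illustrative numbers, not measurements.
THROUGHPUT = {
    "cpu": {"control": 10.0, "matrix": 1.0, "signal": 2.0},
    "gpu": {"control": 1.0, "matrix": 20.0, "signal": 5.0},
    "dsp": {"control": 0.5, "matrix": 2.0, "signal": 15.0},
}


def estimated_time(device, kind, work):
    """Estimated seconds for `work` units of a task of type `kind` on `device`."""
    return work / THROUGHPUT[device][kind]


def assign(tasks):
    """Map each (name, kind, work) task to the device with the lowest estimated time."""
    plan = {}
    for name, kind, work in tasks:
        plan[name] = min(THROUGHPUT, key=lambda d: estimated_time(d, kind, work))
    return plan


tasks = [
    ("parse_input", "control", 5.0),
    ("train_step", "matrix", 100.0),
    ("fft", "signal", 30.0),
]
plan = assign(tasks)
print(plan)
```
</pre>
<p>With these assumed numbers, the control-heavy task lands on the CPU, the matrix-heavy task on the GPU, and the signal-processing task on the DSP. A real scheduler would also have to account for data transfer costs and device availability, which this sketch ignores.</p>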
<b>2.2 Why is it relevant to Artificial Intelligence?</b>
<p> Heterogeneous computing is relevant for AI due to its ability to enhance performance, improve energy efficiency, and provide flexibility for diverse AI workloads.
By leveraging various types of processors—such as CPUs for general tasks, GPUs for parallel processing,
and specialized accelerators for specific AI functions—heterogeneous computing systems can efficiently handle the intensive computational demands of training and deploying AI models.
This approach enables faster processing, reduces power consumption, and supports the scalability and innovation needed for the complex and varied applications of AI technology.</p>
<p>
</p>
<h3><p><b>3 Related Literature</b></p></h3>
<p> Memeti <i>et al.</i> [1] propose an approach that combines AI heuristic search techniques with machine learning to optimize the use of a heterogeneous system.
They call their method AI heuristics with machine learning (AML) and evaluate it against the Enumeration and Measurements (EnuM) technique.
EnuM, also known as brute-force search, evaluates every available option before making a decision based on those results. The
authors aim to determine a near-optimal system configuration. Their method uses the heuristic search as a guide
through the parameter space, with simulated annealing (SA) conducting the exploration. The outline of this approach is shown in
Figure 1:
</p>
<center><img src="ECE570_IMG1.png" alt="Memeti et al. approach" style="width:50%;height:50%;"></center>
<p><center>Figure 1: Memeti <i>et al.</i> [1] AML design.</center></p>
<p>
The authors then use decision tree regression,
a supervised machine learning model, to evaluate each system configuration. Their experimental results show that AML is more than 1300 times faster
than EnuM. AML also achieves energy efficiency similar to that of EnuM after evaluating only around 7% of the
possible configurations, whereas EnuM must evaluate all of them.
</p>
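<p>The contrast between exhaustive enumeration and a budgeted simulated annealing search can be sketched as below. This is not the authors' implementation: the parameter space <code>SPACE</code> and the <code>cost</code> function are hypothetical stand-ins (a real study would measure energy or predict it with a trained model), and the budget of 23 evaluations is simply about 7% of the 330 configurations in this toy space.</p>
<pre>
```python
import itertools
import math
import random

# Hypothetical parameter space: host threads, accelerator threads, and the
# percentage of work offloaded to the accelerator (6 * 5 * 11 = 330 options).
SPACE = [
    [2, 4, 8, 16, 24, 48],
    [30, 60, 120, 180, 240],
    list(range(0, 101, 10)),
]


def cost(config):
    """Synthetic energy estimate; a stand-in for a measurement or ML prediction."""
    host, acc, pct = config
    runtime = max((100 - pct) / host, pct / (acc / 10.0))
    power = 50 + 2.0 * host + 0.5 * acc
    return runtime * power


def brute_force():
    """Evaluate every configuration and return (best, best_cost, evaluations)."""
    configs = list(itertools.product(*SPACE))
    best = min(configs, key=cost)
    return best, cost(best), len(configs)


def neighbor(config, rng):
    """Shift one randomly chosen parameter to an adjacent value in its list."""
    idx = [values.index(v) for values, v in zip(SPACE, config)]
    dim = rng.randrange(len(SPACE))
    moved = idx[dim] + rng.choice([-1, 1])
    idx[dim] = min(max(moved, 0), len(SPACE[dim]) - 1)
    return tuple(values[i] for values, i in zip(SPACE, idx))


def simulated_annealing(budget, seed=0, temp=50.0, cooling=0.9):
    """Explore with a fixed evaluation budget; return (best, best_cost, evaluations)."""
    rng = random.Random(seed)
    current = tuple(rng.choice(values) for values in SPACE)
    current_cost = cost(current)
    best, best_cost = current, current_cost
    for _ in range(budget - 1):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Accept improvements always, and worse moves with probability exp(-delta / temp).
        if 0 >= delta or math.exp(-delta / temp) > rng.random():
            current, current_cost = candidate, candidate_cost
        if best_cost > current_cost:
            best, best_cost = current, current_cost
        temp *= cooling
    return best, best_cost, budget


exhaustive_best, exhaustive_cost, exhaustive_evals = brute_force()
sa_best, sa_cost, sa_evals = simulated_annealing(budget=23)
print(exhaustive_evals, sa_evals, round(exhaustive_cost, 1), round(sa_cost, 1))
```
</pre>
<p>The brute-force result is the true optimum of the toy cost function, so the annealing result can only match it or be worse. How close it gets depends on the seed, the cooling schedule, and the shape of the cost surface.</p>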
<p> Greathouse <i>et al.</i> [2] take a different approach and use machine learning (ML) to predict an application's performance and
power on a range of heterogeneous systems. The authors first build the dataset for the model by running various applications on
different combinations of hardware and storing the results. The kernels are then trained and clustered using a fully connected neural network with a linear input layer and sigmoid functions for the hidden
and output layers, as shown in Figure 2.
</p>
<center><img src="ECE570_IMG3.png" alt="Greathouse et al. approach" style="width:50%;height:50%;"></center>
<p><center>Figure 2: Greathouse <i>et al.</i> [2] clustering setup. Each kernel produces a scaling<br> curve and similar kernels are grouped together into clusters.</center></p>
<p>
Within each cluster, the authors pick the kernel with the highest value as the representative
for that cluster. To make a prediction for a new application, the model selects the cluster the application is closest to. The scaling
curve that the authors built for that cluster from the collected data is then used to predict the kernel's performance or power, depending on the desired configuration. This model fits into
the overall setup shown in Figure 3:
</p>
<center><img src="ECE570_IMG2.png" alt="Greathouse et al. approach overview" style="width:40%;height:40%;"></center>
<p><center>Figure 3: Greathouse <i>et al.</i> [2] overall model setup.</center></p>
<p>
The authors find that the designed system is very effective and does not require heavy computational effort. They tested the accuracy of their model and found
an error rate of only 15% for performance predictions and less than 10% for power. </p>
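<p>The cluster-then-scale prediction step described above can be sketched as follows. This is a simplified illustration, not the authors' model: it swaps their neural network for a plain nearest-representative lookup, and the cluster names, feature vectors, and scaling curves in <code>CLUSTERS</code> are hypothetical values made up for the example.</p>
<pre>
```python
import math

# Hypothetical clusters. Each has a representative feature vector (normalized
# counter values) and a scaling curve that gives the speedup relative to a
# base configuration of 8 compute units.
CLUSTERS = {
    "compute_bound": {"features": (0.9, 0.1), "curve": {8: 1.0, 16: 1.9, 32: 3.6}},
    "memory_bound": {"features": (0.2, 0.8), "curve": {8: 1.0, 16: 1.2, 32: 1.3}},
}


def closest_cluster(features):
    """Return the name of the cluster whose representative is nearest (Euclidean)."""
    return min(CLUSTERS, key=lambda name: math.dist(features, CLUSTERS[name]["features"]))


def predict_runtime(features, base_runtime, compute_units):
    """Scale a runtime measured on the base configuration using the cluster's curve."""
    name = closest_cluster(features)
    speedup = CLUSTERS[name]["curve"][compute_units]
    return name, base_runtime / speedup


name, runtime = predict_runtime((0.3, 0.7), base_runtime=10.0, compute_units=16)
print(name, round(runtime, 2))
```
</pre>
<p>The point of the design is that only one measurement on a base configuration is needed for a new kernel. The rest of the prediction comes from the scaling curve of the cluster it resembles.</p>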
<p> Maliţa <i>et al.</i> [3] explore the challenges and advancements in hardware acceleration for deep learning. Their chapter details the computational components of Deep Neural Networks (DNNs)
and reviews state-of-the-art hardware solutions, including Intel's MIC, Nvidia's GPUs, and Google's TPUs.
</p>
<p>Computational Components of DNNs: The authors outline the key computational components of DNNs, including fully connected layers,
convolutional layers, pooling layers, and softmax layers. These components are computationally intensive, which underscores
the importance of efficient acceleration for optimal performance in deep learning tasks. By identifying and understanding these components, researchers and engineers can develop hardware architectures that specifically target their computational demands.</p>
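<p>To give a rough sense of the computational intensity of these layer types, the sketch below counts the basic operations in one forward pass of each. The formulas are the standard operation counts for dense, convolutional, and max-pooling layers. The layer sizes used in the example are assumptions picked for illustration and do not come from the chapter.</p>
<pre>
```python
def fully_connected_macs(inputs, outputs):
    """Multiply-accumulate operations for a dense layer (bias ignored)."""
    return inputs * outputs


def conv2d_macs(in_channels, out_channels, kernel, out_height, out_width):
    """Multiply-accumulates for a 2D convolution with a square kernel."""
    return out_height * out_width * out_channels * in_channels * kernel * kernel


def max_pool_comparisons(channels, window, out_height, out_width):
    """Max pooling needs window * window - 1 comparisons per output element."""
    return channels * out_height * out_width * (window * window - 1)


def softmax_exponentials(classes):
    """Softmax evaluates one exponential per class, plus one sum and the divisions."""
    return classes


# Assumed example sizes: a 3x3 convolution from 3 to 64 channels on a 224x224
# output, a 2x2 max pool down to 112x112, and a 4096-to-1000 dense layer.
conv = conv2d_macs(3, 64, 3, 224, 224)
pool = max_pool_comparisons(64, 2, 112, 112)
dense = fully_connected_macs(4096, 1000)
print(conv, pool, dense, softmax_exponentials(1000))
```
</pre>
<p>Even in this small example, the single convolutional layer needs roughly twenty times as many multiply-accumulates as the dense layer, which helps explain why accelerators focus so heavily on convolution and matrix multiplication.</p>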
<p>State of the Art Hardware Solutions: The authors review state-of-the-art hardware solutions commonly used to accelerate deep learning tasks, including Intel's Many Integrated Core (MIC) processors, Nvidia's Graphics Processing Units (GPUs),
and Google's Tensor Processing Units (TPUs). The discussion covers the performance characteristics, energy efficiency, and limitations of each architecture in the context of deep learning applications. They conclude that by examining the strengths and weaknesses of these hardware solutions,
researchers can make informed decisions when selecting or designing hardware platforms for deep learning tasks.</p>
<p>Limitations of Specific ASICs like TPU: They delve into the limitations of specific Application-Specific Integrated Circuits (ASICs), focusing on Google's Tensor Processing Units (TPUs). While TPUs offer significant computational power, they lack the flexibility to efficiently
support the diverse computational demands of different DNNs. The chapter [3] highlights issues related to flexibility, resource utilization, memory hierarchy, and architectural suitability. By understanding these limitations, researchers can explore alternative approaches or develop
strategies to mitigate the challenges associated with specific ASICs like TPUs.</p>
<p>Conclusion and Future Directions: The authors summarize the key findings of the chapter and offer insights into potential avenues for future research and development. They emphasize that there is a need for innovative architectural designs that can adapt to the evolving landscape of deep learning while
optimizing for performance and energy efficiency. The chapter also discusses emerging trends in hardware acceleration for deep learning, such as the integration of specialized hardware units for specific tasks and the exploration of novel memory hierarchy designs.
</p>
<h3><p><b>4 Results</b></p></h3>
<p> The survey so far has identified several approaches that use AI to improve heterogeneous computing implementations.
Each approach targets efficiency in a different way. Memeti <i>et al.</i> [1] focus on
optimizing the system's configuration, while Greathouse <i>et al.</i> [2] focus on predicting performance and energy consumption, which reduces time and energy
costs in industry by reducing the need for physical tests or simulations.
</p>
<h3><p><b>5 Conclusion and Future Work</b></p></h3>
<p> There are many different approaches and implementations of heterogeneous computing in an AI context, and the sources included in this survey
show how broad the range of possibilities is. Heterogeneous computing can be very useful for AI, just as AI can help
in configuring heterogeneous systems and predicting their behavior. For the rest of the term, our team will continue conducting a thorough survey of
heterogeneous computing in an artificial intelligence context. We plan to look more closely at the various implementations and possibilities, bringing in approaches
that use heterogeneous computing for AI. We aim to examine the different frameworks for heterogeneous computing and how these frameworks are used for AI applications.
We are also interested in researching the most effective algorithms or implementations for the most common heterogeneous systems.
Once those topics have been researched, we will bring the many different implementations together in a presentation and provide a comprehensive overview.
The result of this project should provide good insight into heterogeneous computing and into the increasing use of artificial intelligence within the field.
</p>
<p><u><b>Deliverables</b></u></p>
<p>Presentation:
<ul>
<li><u>Deliverable Due:</u> March 12, 2024</li>
<li><u>Description: </u> The final presentation will showcase the work done over the course of the term. The
presentation will outline the work done by the team and display results collected through the survey.</li>
</ul>
</p>
<p>Final Report:
<ul>
<li><u>Deliverable Due:</u> March 19, 2024</li>
<li><u>Description: </u> The final report will go into detail regarding the project description, the information collected,
and the resulting analysis, giving a more detailed view of the overall project and its results.</li>
</ul>
</p>
<p><u><b>Timeline</b></u></p>
<table>
<tr>
<th>Tasks</th>
<th>Deadline</th>
</tr>
<tr>
<td>Project idea has been decided on and initial research done, project proposal drafted and website created.</td>
<td>Week 4</td>
</tr>
<tr>
<td>Website and report have been edited and revised. Survey 50% done and mid-report in progress.</td>
<td>Week 6</td>
</tr>
<tr>
<td>Mid-report completed and website updated with mid-report.</td>
<td>Week 7</td>
</tr>
<tr>
<td>Survey completed and all data collected. Survey data is ready to be used.</td>
<td>Week 9</td>
</tr>
<tr>
<td>Presentation finished and being revised/practiced.</td>
<td>Week 10</td>
</tr>
<tr>
<td>Final Report finished and website updated with final report.</td>
<td>Finals week - March 19th by 6pm</td>
</tr>
</table>
<p></p>
</section>
<section>
<p><b><u>References</u></b></p>
<p>
<ol>
<li>Memeti, S., Pllana, S. Optimization of heterogeneous systems with AI planning heuristics and machine learning: a performance and energy aware approach. Computing 103, 2943–2966 (2021). https://doi.org/10.1007/s00607-021-01017-6</li>
<li>J. L. Greathouse and G. H. Loh, "Machine Learning for Performance and Power Modeling of Heterogeneous Systems," 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Diego, CA, USA, 2018, pp. 1-6, doi: 10.1145/3240765.3243484. </li>
<li>Mihaela Maliţa, George Vlǎduţ Popescu, & Ştefan, G. M. (2019). Heterogeneous Computing System for Deep Learning. Studies in Computational Intelligence, 287–319. https://doi.org/10.1007/978-3-030-31756-0_10</li>
</ol>
</p>
</section>
</body>
</html>