<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Math6380p: Deep Learning </title>
</head>
<body background="../images/crysback.jpg">
<!-- PAGE HEADER -->
<div class="Section1">
<table border="0" cellpadding="0" width="100%" style="width: 100%;">
<tbody>
<tr>
<td style="padding: 0.75pt;" width="80" align="center">
<p class="MsoNormal"> <img width="64" height="64"
id="_x0000_i1025"
src="../images/hkust0_starry.jpg" alt="HKUST">
</p>
</td>
<td style="padding: 0.75pt;">
<p>
<span style="font-size: 18pt;">
<b><big>MATH 6380P. Advanced Topics in Deep Learning <br>
Fall 2020</big></b>
<br>
</span>
</p>
</td>
</tr>
</tbody>
</table>
<div class="MsoNormal" align="center" style="text-align: center;">
<hr size="2" width="100%" align="center"> </div>
<!-- COURSE INFORMATION BANNER -->
<table border="0" cellpadding="0" width="100%" bgcolor="#990000"
style="background: rgb(153,0,0) none repeat scroll 0% 50%; width: 100%;">
<tbody>
<tr>
<td style="padding: 2.25pt;">
<p class="MsoNormal"><b><span
style="font-size: 13.5pt; color: white;">Course Information</span></b></p>
</td>
</tr>
</tbody>
</table>
<!-- COURSE INFORMATION -->
<h3>Synopsis</h3>
<p style="margin-left: 0.5in;">
<big> This course is a continuation of <a href="https://deeplearning-math.github.io/2018spring.html">Math 6380o, Spring 2018</a>, inspired by Stanford Stats 385, <a href="http://stats385.github.io">Theories of Deep Learning</a>,
taught by Prof. Dave Donoho, Dr. Hatef Monajemi, and Dr. Vardan Papyan, as well as the Simons Institute program on
<a href="https://simons.berkeley.edu/programs/dl2019">Foundations of Deep Learning</a> in the summer of 2019 and the IAS@HKUST workshop on
<a href="http://ias.ust.hk/events/201801mdl/">Mathematics of Deep Learning</a> held during January 8-12, 2018.
The aim of this course is to provide graduate students interested in deep learning with a variety of current perspectives on neural networks, as a foundation for future research.
</big>
<br>
<big>
Prerequisite: there is no formal prerequisite, though mathematical maturity in approximation theory, harmonic analysis, optimization, and statistics will be helpful.
Do-it-yourself (DIY) and critical thinking (CT) are the most important skills in this course. Enrolled students should have some programming experience with modern neural network frameworks such as PyTorch, TensorFlow, MXNet, Theano, or Keras.
Otherwise, it is recommended to first take a course on Statistical Learning (<a href="https://yuany-pku.github.io/2018_math4432/">Math 4432</a> or 5470) and a course on deep learning, such as
<a href="https://cs231n.github.io/">Stanford CS231n</a> with its assignments, or the similar course COMP4901J by Prof. CK TANG at HKUST.
</big>
</p>
<h3>Reference</h3>
<p style="margin-left: 0.5in;">
<big>
<em><a href="http://stats385.github.io">Theories of Deep Learning</a>, Stanford STATS385 by Dave Donoho, Hatef Monajemi, and Vardan Papyan </em>
</big>
</p>
<p style="margin-left: 0.5in;">
<big>
<em><a href="https://simons.berkeley.edu/programs/dl2019">Foundations of Deep Learning</a>, by Simons Institute for the Theory of Computing, UC Berkeley </em>
</big>
</p>
<p style="margin-left: 0.5in;">
<big>
<em><a href="http://www.deepmath.org">On the Mathematical Theory of Deep Learning</a>, by <a href="http://www.math.tu-berlin.de/~kutyniok">Gitta Kutyniok</a> </em>
</big>
</p>
<h3>Tutorials: preparation for beginners</h3>
<p style="margin-left: 0.5in;">
<big>
<em><a href="http://cs231n.github.io/python-numpy-tutorial/">Python-Numpy Tutorials</a> by Justin Johnson </em>
</big>
</p>
<p style="margin-left: 0.5in;">
<big>
<em><a href="http://scikit-learn.org/stable/tutorial/">scikit-learn Tutorials</a>: An Introduction of Machine Learning in Python</em>
</big>
</p>
<p style="margin-left: 0.5in;">
<big>
<em><a href="http://cs231n.github.io/ipython-tutorial/">Jupyter Notebook Tutorials</a> </em>
</big>
</p>
<p style="margin-left: 0.5in;">
<big>
<em><a href="http://pytorch.org/tutorials/">PyTorch Tutorials</a> </em>
</big>
</p>
<p style="margin-left: 0.5in;">
<big>
<em><a href="http://www.di.ens.fr/~lelarge/dldiy/">Deep Learning: Do-it-yourself with PyTorch</a>, </em> A course at ENS
</big>
</p>
<p style="margin-left: 0.5in;">
<big>
<em><a href="https://www.tensorflow.org/tutorials/">Tensorflow Tutorials</a></em>
</big>
</p>
<p style="margin-left: 0.5in;">
<big>
<em><a href="https://mxnet.incubator.apache.org/tutorials/index.html">MXNet Tutorials</a></em>
</big>
</p>
<p style="margin-left: 0.5in;">
<big>
<em><a href="http://deeplearning.net/software/theano/tutorial/">Theano Tutorials</a></em>
</big>
</p>
<p style="margin-left: 0.5in;">
<big>
<em><a href="https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff">Manning: Deep Learning with Python</a>, by Francois Chollet</em> [<a href="https://github.com/fchollet/deep-learning-with-python-notebooks">GitHub source in Python 3.6 and Keras 2.0.8</a>]
</big>
</p>
<p style="margin-left: 0.5in;">
<big>
<em><a href="http://www.deeplearningbook.org/">MIT: Deep Learning</a>, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
</em>
</big>
</p>
<h3>Instructors: </h3>
<p style="margin-left: 0.5in;">
<big>
<em><a href="http://yao-lab.github.io/">Yuan Yao</a> </em>
</big>
</p>
<h3>Time and Place:</h3>
<p style="margin-left: 0.5in;">
<big><em>Wed 3:00PM - 5:50PM, Zoom </em> <br>
<!--- <em> Venue changed: Rm 4582 (Lift 27-28) from Sep 10, 2018.</em> <img src="./images/new.jpg" height="40"> ---->
</big>
</p>
<h3>Homework and Projects:</h3>
<p style="margin-left: 0.5in;">
<big><em> No exams, but extensive discussions and projects will be expected. </em>
</big></p>
<!---
<h3>Teaching Assistant:</h3>
<p style="margin-left: 0.5in;">
<big> <br>
Email: Mr. Mingxuan CAI <em> deeplearning.math (add "AT gmail DOT com" afterwards) </em>
</big>
</p>
--->
<h3>Schedule</h3>
<table border="1" cellspacing="0">
<tbody>
<tr>
<td align="left"><strong>Date</strong></td>
<td align="left"><strong>Topic</strong></td>
<td align="left"><strong>Instructor</strong></td>
<td align="left"><strong>Scribe</strong></td>
</tr>
<tr>
<td>09/09/2020, Wednesday</td>
<td>Lecture 01: Overview I <a href="./2020/slides/Lecture01_overview.pdf">[ slides ]</a>
<br>
<!---<ul>[Reference]:
<li> Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix Wichmann, Wieland Brendel,
<a href="https://openreview.net/forum?id=Bygh9j09KX">ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness</a>, ICLR 2019
[<a href="https://videoken.com/embed/W2HvLBMhCJQ?tocitem=46"> video </a>]
</li>
<li> Aleksander Madry (MIT),
<a href="https://simons.berkeley.edu/talks/tbd-57">A New Perspective on Adversarial Perturbation</a>, Simons Institute for Theory of Computing, 2019.
[<a href="https://arxiv.org/abs/1905.02175">Adversarial Examples Are Not Bugs, They Are Features</a>]
</li>
</ul>
--->
</td>
<td>Y.Y.</td>
<td></td>
</tr>
<tr>
<td>09/16/2020, Wednesday</td>
<td>Lecture 02: Symmetry and Network Architectures: Wavelet Scattering Net, DCFnet, Frame Scattering, and Permutation Invariant/Equivariant Nets <a href="./2020/slides/Lecture02_symmetry.pdf">[ slides ]</a> and <a href="./2020/project1/project1.pdf"> Project 1</a>.
<br>
<ul>[Reference]:
<li> Vardan Papyan, X.Y. Han, David L. Donoho,
<a href="https://arxiv.org/pdf/2008.08186.pdf">Prevalence of Neural Collapse during the terminal phase of deep learning training</a>, arXiv:2008.08186.
</li>
<li> Stephane Mallat's short course on Mathematical Mysteries of Deep Neural Networks: <a href="https://www.youtube.com/watch?v=0wRItoujFTA">[ Part I video ]</a>, <a href="https://www.youtube.com/watch?v=kZkjb52zh5k">[ Part II video ]</a>,
<a href="http://learning.mpi-sws.org/mlss2016/slides/CadixCours2016.pdf"> [ slides ] </a>
</li>
<li> Stephane Mallat, <a href="https://www.di.ens.fr/~mallat/papiers/ScatCPAM.pdf">Group Invariant Scattering</a>, Communications on Pure and Applied Mathematics, Vol. LXV, 1331–1398 (2012) </li>
<li> Joan Bruna and Stephane Mallat, <a href="http://www.cmapx.polytechnique.fr/~bruna/Publications_files/pami.pdf">Invariant Scattering Convolution Networks</a>, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012 </li>
<li> Thomas Wiatowski and Helmut Bolcskei, <a href="https://www.nari.ee.ethz.ch/commth//pubs/files/deep-2016.pdf">A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction</a>, 2016.
</li>
<li> Qiang Qiu, Xiuyuan Cheng, Robert Calderbank, Guillermo Sapiro, <a href="https://arxiv.org/abs/1802.04145">DCFNet: Deep Neural Network with Decomposed Convolutional Filters</a>, ICML 2018. arXiv:1802.04145.
</li>
<li>Taco S. Cohen, Max Welling, <a href="https://arxiv.org/abs/1602.07576"> Group Equivariant Convolutional Networks</a>, ICML 2016. arXiv:1602.07576.
</li>
<li> Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, Alexander Smola. <a href="https://arxiv.org/abs/1703.06114"> Deep Sets </a>, NIPS, 2017. arXiv:1703.06114.
</li>
<li> Akiyoshi Sannai, Yuuki Takai, Matthieu Cordonnier. <a href="https://arxiv.org/abs/1903.01939"> Universal approximations of permutation invariant/equivariant functions by deep neural networks
</a>, arXiv:1903.01939, 2019.
</li>
<li> Haggai Maron, Heli Ben-Hamu, Nadav Shamir, Yaron Lipman. <a href="https://openreview.net/pdf?id=Syx72jC9tm"> Invariant and Equivariant Graph Networks </a>. ICLR 2019. <a href="https://arxiv.org/abs/1812.09902">arXiv:1812.09902</a>
</li>
</ul>
<ul> [Public codes]:
<li> <a href="http://www.di.ens.fr/data/software/"> Scattering Net Matlab codes </a> </li>
<li> <a href="https://github.com/edouardoyallon/pyscatwave"> pyscatwave: Scattering Transform in Python </a> </li>
<li> <a href="https://github.com/tdeboissiere/DeepLearningImplementations/tree/master/ScatteringTransform"> Deep Hybrid Transform in Python </a> </li>
<li> <a href="https://github.com/xycheng/DCFNet"> DCFNet </a> </li>
<li> <a href="https://exhibits.stanford.edu/data/catalog/ng812mz4543"> Data from "Prevalence of Neural Collapse during the terminal phase of deep learning training" </a> </li>
</ul>
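As a toy illustration of the permutation-invariant architectures referenced above (in the spirit of Deep Sets), the following NumPy sketch implements f(X) = rho(sum_i phi(x_i)); the weights, shapes, and nonlinearities here are illustrative assumptions, not course material:

```python
# Minimal sketch of a Deep Sets-style permutation-invariant function.
# All weights/shapes below are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(3, 8))   # phi: per-element feature map
W2 = rng.normal(size=(8, 1))   # rho: readout on the pooled representation

def f(X):
    phi = np.tanh(X @ W1)      # phi is applied to each set element independently
    pooled = phi.sum(axis=0)   # sum pooling makes the output order-invariant
    return (np.tanh(pooled) @ W2).item()

X = rng.normal(size=(5, 3))    # a "set" of 5 elements in R^3
perm = rng.permutation(5)
# Permuting the set elements leaves the output unchanged (up to float roundoff).
assert np.isclose(f(X), f(X[perm]))
```

Because the only interaction between elements is a symmetric sum, any permutation of the rows of X yields the same output, which is exactly the invariance property studied in the Zaheer et al. and Sannai et al. papers above.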
</td>
<td>Y.Y.</td>
<td></td>
</tr>
<tr>
<td>09/23/2020, Wednesday</td>
<td>Lecture 03: Generalization of Deep Learning. <a href="./2020/slides/Lecture03_generalization.pdf">[ slides ]</a>.
<br>
<ul>[Title]: From Classical Statistics to Modern Machine Learning
[<a href="https://simons.berkeley.edu/sites/default/files/docs/14172/slides1.pdf"> slide </a>]
<ul>[Speaker]: <a href="http://www.cse.ohio-state.edu/~mbelkin/"> Misha Belkin </a> (OSU)
</ul>
<ul>[Abstract]:
"A model with zero training error is overfit to the training data and will typically generalize poorly," goes statistical textbook wisdom. Yet, in modern practice, over-parametrized deep networks with near perfect fit on training data still show excellent test performance. As I will discuss in the talk, this apparent contradiction is key to understanding the practice of modern machine learning.
While classical methods rely on a trade-off balancing the complexity of predictors with training error, modern models are best described by interpolation, where a predictor is chosen among functions that fit the training data exactly, according to a certain (implicit or explicit) inductive bias. Furthermore, classical and modern models can be unified within a single "double descent" risk curve, which extends the classical U-shaped bias-variance curve beyond the point of interpolation. This understanding of model performance delineates the limits of the usual "what you see is what you get" generalization bounds in machine learning and points to new analyses required to understand computational, statistical, and mathematical properties of modern models.
I will proceed to discuss some important implications of interpolation for optimization, both in terms of "easy" optimization (due to the scarcity of non-global minima), and to fast convergence of small mini-batch SGD with fixed step size.
</ul>
<ul>[Video]
<li>[<a href="https://simons.berkeley.edu/talks/tbd-65"> Simons link </a>]</li>
<li>[<a href="https://www.bilibili.com/video/av69027489?p=13"> Bilibili link </a>] </li>
<li>[<a href="https://www.youtube.com/watch?v=JS-Bl36aVPs"> MIT CBMM: Beyond Empirical Risk Minimization: the lessons of deep learning </a>] and an [<a href="https://www.youtube.com/watch?v=JI8BPQAR9fE"> interview with Tommy Poggio </a>] </li>
</ul>
<ul>[Reference]
<li>Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal. Reconciling modern machine learning practice and the bias-variance trade-off.
<a href="https://www.pnas.org/content/116/32/15849.short"> PNAS</a>, 2019, 116 (32). [<a href="https://arxiv.org/abs/1812.11118"> arXiv:1812.11118 </a>]
</li>
<li>Mikhail Belkin, Alexander Rakhlin, Alexandre B Tsybakov.
Does data interpolation contradict statistical optimality?
<a href="http://proceedings.mlr.press/v89/belkin19a/belkin19a.pdf"> AISTATS</a>, 2019.
[<a href="https://arxiv.org/abs/1806.09471"> arXiv:1806.09471 </a>]
</li>
<li> Mikhail Belkin, Daniel Hsu, Partha Mitra.
Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate.
Neural Inf. Proc. Systems (NeurIPS) 2018.
[<a href="https://arxiv.org/abs/1806.05161"> arXiv:1806.05161 </a>] </li>
</ul>
</ul>
<ul>[Title]: Generalization of linearized neural networks: staircase decay and double descent
[<a href="https://deeplearning-math.github.io/slides/MEI_Song_Gen_LNN_HKUST.pdf"> slide </a>]
<ul>[Speaker]: <a href="https://www.stat.berkeley.edu/~songmei/"> MEI, Song </a> (UC Berkeley)
</ul>
<ul>[Abstract]:
Deep learning methods operate in regimes that defy the traditional statistical mindset. Despite the non-convexity of empirical risks and the huge complexity of neural network
architectures, stochastic gradient algorithms can often find the global minimizer of the training loss and achieve small generalization error on test data.
As one possible explanation for the training efficiency of neural networks, tangent kernel theory shows that a multi-layer neural network — in a proper large
width limit — can be well approximated by its linearization. As a consequence, the gradient flow of the empirical risk turns into a linear dynamics and
converges to a global minimizer. Since last year, linearization has become a popular approach in analyzing training dynamics of neural networks. However,
this naturally raises the question of whether the linearization perspective can also explain the observed generalization efficacy. In this talk, I will
discuss the generalization error of linearized neural networks, which reveals two interesting phenomena: the staircase decay and the double-descent curve.
Through the lens of these phenomena, I will also address the benefits and limitations of the linearization approach for neural networks.
</ul>
<ul>[Video]
<li>[<a href="https://hkust.zoom.us/rec/share/_mySHasnwGTyR4eaTQ7q-pqmrzwha4djDoCPZQrdG0GP4_DP21SRrvzOUs8zZArB.h659yY4czbtVFt3d"> HKUST Zoom </a>]</li>
</ul>
<ul>[Reference]
<li>Song Mei and Andrea Montanari. The generalization error of random features regression: Precise asymptotics and double descent curve.
[<a href="https://arxiv.org/abs/1908.05355"> arXiv:1908.05355 </a>]
</li>
<li>Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, and Andrea Montanari. Linearized two-layers neural networks in high dimension.
[<a href="https://arxiv.org/abs/1904.12191"> arXiv:1904.12191 </a>]
</li>
</ul>
</ul>
</td>
<td>Y.Y.</td>
<td></td>
</tr>
<tr>
<td>09/30/2020, Wednesday</td>
<td>Lecture 04: Generalization in Deep Learning: I and II. Introduction and Uniform Law of Large Numbers. <a href="./2019/slides/BartlettRakhlin_Generalization.pdf">[ slides ]</a>
<br>
<ul>[Speaker]: <a href="http://www.mit.edu/~rakhlin/"> Sasha Rakhlin </a> (Massachusetts Institute of Technology) and <a href="https://simons.berkeley.edu/people/peter-bartlett">Peter Bartlett</a> (UC Berkeley) </ul>
<ul>[Abstract]: We review tools useful for the analysis of the generalization performance of deep neural networks on classification and regression problems. We review uniform convergence properties, which show how this performance depends on notions of complexity, such as Rademacher averages, covering numbers, and combinatorial dimensions, and how these quantities can be bounded for neural networks. We also review the analysis of the performance of nonparametric estimation methods such as nearest-neighbor rules and kernel smoothing. Deep networks raise some novel challenges, since they have been observed to perform well even with a perfect fit to the training data. We review some recent efforts to understand the performance of interpolating prediction rules, and highlight the questions raised for deep learning.
</ul>
<ul>[Video] <a href="https://simons.berkeley.edu/workshops/schedule/10624">Deep Learning Bootcamp</a>, Simons Institute for the Theory of Computing at UC Berkeley, 2019 [<a href="https://weibo.com/7337268754/Ign1TBCQR?from=page_1005057337268754_profile&type=comment"> Weibo collection </a>]
<li>[<a href="https://simons.berkeley.edu/talks/generalization-i"> Part I </a>] [<a href="https://www.bilibili.com/video/av75713571/"> Bilibili link </a>] </li>
<li>[<a href="https://simons.berkeley.edu/talks/generalization-ii"> Part II </a>] [<a href="https://www.bilibili.com/video/av75713571?p=2"> Bilibili link </a>] </li>
</ul>
</td>
<td>Y.Y.</td>
<td></td>
</tr>
<tr>
<td>10/07/2020, Wednesday</td>
<td>Lecture 05: Generalization in Deep Learning: III. Classification and Rademacher Complexity <a href="./2019/slides/BartlettRakhlin_Generalization.pdf">[ slides ]</a>
<br>
<ul>[Speaker]: <a href="http://www.mit.edu/~rakhlin/"> Sasha Rakhlin </a> (Massachusetts Institute of Technology) </ul>
<ul>[Abstract]: We review tools useful for the analysis of the generalization performance of deep neural networks on classification and regression problems. We review uniform convergence properties, which show how this performance depends on notions of complexity, such as Rademacher averages, covering numbers, and combinatorial dimensions, and how these quantities can be bounded for neural networks. We also review the analysis of the performance of nonparametric estimation methods such as nearest-neighbor rules and kernel smoothing. Deep networks raise some novel challenges, since they have been observed to perform well even with a perfect fit to the training data. We review some recent efforts to understand the performance of interpolating prediction rules, and highlight the questions raised for deep learning.
</ul>
<ul>[Video] <a href="https://simons.berkeley.edu/workshops/schedule/10624">Deep Learning Bootcamp</a>, Simons Institute for the Theory of Computing at UC Berkeley, 2019
<li>[<a href="https://simons.berkeley.edu/talks/generalization-iii"> Part III </a>] [<a href="https://www.bilibili.com/video/av75713571?p=3"> Bilibili link </a>] </li>
</ul>
<ul>[ Title ]: Rethinking Breiman's Dilemma in Neural Networks: Phase Transitions of Margin Dynamics.
[<a href="https://deeplearning-math.github.io/slides/?"> slide </a>]
<ul>[Speaker]: ZHU, Weizhi (HKUST)
</ul>
<ul>[Abstract]:
Margin enlargement over training data has been an important strategy since perceptrons in machine learning for the purpose of boosting the confidence
of training toward a good generalization ability. Yet Breiman shows a dilemma (Breiman, 1999) that a uniform improvement on margin distribution does not
necessarily reduce generalization errors. In this paper, we revisit Breiman’s dilemma in deep neural networks with recently proposed spectrally normalized
margins, from a novel perspective based on phase transitions of normalized margin distributions in training dynamics. The normalized margin distribution of a
classifier over the data can be divided into two parts: low/small margins such as some negative margins for misclassified samples vs. high/large margins
for high confident correctly classified samples, that often behave differently during the training process. Low margins for training and test datasets are
often effectively reduced in training, along with reductions of training and test errors; while high margins may exhibit different dynamics, reflecting the
trade-off between expressive power of models and complexity of data. When data complexity is comparable to the model expressiveness, high margin distributions
for both training and test data undergo similar decrease-increase phase transitions during training. In such cases, one can predict the trend of generalization
or test error by margin-based generalization bounds with restricted Rademacher complexities, shown in two ways in this paper with early stopping time exploiting
such phase transitions. On the other hand, over-expressive models may have both low and high training margins undergoing uniform improvements, with a distinct
phase transition in test margin dynamics. This reconfirms Breiman’s dilemma associated with overparameterized neural networks where margins fail to predict
overfitting. Experiments are conducted with some basic convolutional networks, AlexNet, VGG-16, and ResNet-18, on several datasets including Cifar10/100 and
mini-ImageNet.
</ul>
<ul>[Reference]
<li> Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals,
<a href="https://arxiv.org/abs/1611.03530">Understanding deep learning requires rethinking generalization.
</a> ICLR 2017.
<a href="https://github.com/pluskid/fitting-random-labels">[Chiyuan Zhang's codes]</a>
</li>
<li> Peter L. Bartlett, Dylan J. Foster, Matus Telgarsky. Spectrally-normalized margin bounds for neural networks.
<a href="https://arxiv.org/abs/1706.08498">[ arXiv:1706.08498 ]</a>. NIPS 2017. </li>
<li> Neyshabur, B., Bhojanapalli, S., McAllester, D., and Srebro, N. A pac-bayesian approach to spectrally-normalized
margin bounds for neural networks. [<a href="https://arxiv.org/abs/1707.09564"> arXiv:1707.09564 </a>].<I> International Conference on Learning Representations (ICLR)</I>, 2018.
</li>
<li> Noah Golowich, Alexander (Sasha) Rakhlin, Ohad Shamir. Size-Independent Sample Complexity of Neural Networks.
[<a href="https://arxiv.org/abs/1712.06541"> arXiv:1712.06541 </a>]. COLT 2018. </li>
<li> Weizhi Zhu, Yifei Huang, Yuan Yao. Rethinking Breiman's Dilemma in Neural Networks: Phase Transitions of Margin Dynamics.
[<a href="https://arxiv.org/abs/1810.03389"> arXiv: 1810.03389 </a>].
</li>
<li> Vaishnavh Nagarajan, J. Zico Kolter. Uniform convergence may be unable to explain generalization in deep learning.
<a href="https://arxiv.org/abs/1902.04742">[ arXiv:1902.04742 ]</a>. NIPS 2019.
<a href="https://locuslab.github.io/2019-07-09-uniform-convergence/">[ Github ]</a>.
(It argues that all the generalization bounds above might fail to explain generalization in deep learning)
</li>
</ul>
</ul>
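The spectrally-normalized margins discussed above (cf. Bartlett et al., arXiv:1706.08498) can be sketched for a toy two-layer ReLU network as follows; the weights, shapes, and data here are arbitrary illustrations, not the paper's experiments:

```python
# Sketch: spectrally-normalized multiclass margins for a tiny 2-layer net.
# Shapes and weights are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
W1, W2 = rng.normal(size=(10, 32)), rng.normal(size=(32, 3))

def normalized_margins(X, y):
    scores = np.maximum(X @ W1, 0) @ W2            # ReLU network class scores
    correct = scores[np.arange(len(y)), y]
    others = scores.copy()
    others[np.arange(len(y)), y] = -np.inf
    raw = correct - others.max(axis=1)             # multiclass margin
    # Normalize by the product of layer spectral norms (a Lipschitz bound
    # on the network), so margins are comparable across scales of weights.
    lip = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)
    return raw / lip

X = rng.normal(size=(50, 10))
y = rng.integers(0, 3, size=50)
m = normalized_margins(X, y)
# Misclassified points have negative normalized margins.
```

Rescaling all weights by a constant leaves these normalized margins unchanged, which is the point of spectral normalization in the margin bounds listed above.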
</td>
<td>Y.Y.</td>
<td></td>
</tr>
<tr>
<td>10/14/2020, Wednesday</td>
<td>Lecture 06: Generalization in Deep Learning: IV. Interpolation. <a href="./2019/slides/BartlettRakhlin_Generalization.pdf">[ slides ]</a>
<br>
<ul>[Speaker]: <a href="https://simons.berkeley.edu/people/peter-bartlett">Peter Bartlett</a> (UC Berkeley) </ul>
<ul>[Abstract]: We review tools useful for the analysis of the generalization performance of deep neural networks on classification and regression problems. We review uniform convergence properties, which show how this performance depends on notions of complexity, such as Rademacher averages, covering numbers, and combinatorial dimensions, and how these quantities can be bounded for neural networks. We also review the analysis of the performance of nonparametric estimation methods such as nearest-neighbor rules and kernel smoothing. Deep networks raise some novel challenges, since they have been observed to perform well even with a perfect fit to the training data. We review some recent efforts to understand the performance of interpolating prediction rules, and highlight the questions raised for deep learning.
</ul>
<ul>[Video] <a href="https://simons.berkeley.edu/workshops/schedule/10624">Deep Learning Bootcamp</a>, Simons Institute for the Theory of Computing at UC Berkeley, 2019
<li>[<a href="https://simons.berkeley.edu/talks/generalization-iv"> Part IV </a>] [<a href="https://www.bilibili.com/video/av75713571?p=4"> Bilibili link </a>]</li>
</ul>
<ul>[Title]:
Benign Overfitting in Linear Prediction
<ul>[Speaker]: <a href="https://simons.berkeley.edu/people/peter-bartlett">Peter Bartlett</a> (UC Berkeley) </ul>
<ul>[Abstract]:
Classical theory that guides the design of nonparametric prediction methods like deep neural networks involves a tradeoff between the fit to the training data and the complexity of the prediction rule. Deep learning seems to operate outside the regime where these results are informative, since deep networks can perform well even with a perfect fit to noisy training data. We investigate this phenomenon of 'benign overfitting' in the simplest setting, that of linear prediction. We give a characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy. The characterization is in terms of two notions of effective rank of the data covariance. It shows that overparameterization is essential: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size. We discuss implications for deep networks and for robustness to adversarial examples.
Joint work with Phil Long, Gábor Lugosi, and Alex Tsigler.
</ul>
<ul>[Video]
<li>[<a href="https://simons.berkeley.edu/talks/tbd-51"> Simons link </a>]</li>
<li>[<a href="https://www.bilibili.com/video/av69027489?p=14"> Bilibili link </a>] </li>
</ul>
<ul>[Reference]
<li> Peter L. Bartlett, Philip M. Long, Gábor Lugosi, Alexander Tsigler. Benign Overfitting in Linear Regression. <a href="https://arxiv.org/abs/1906.11300"> arXiv:1906.11300 </a> </li>
</ul>
</ul>
<ul>[ Gallery of Project 1 ]:
<ul>
<li> Description of <a href="./2020/project1/project1.pdf"> Project 1</a></li>
<!---
<li> Peer Review requirement: <a href="./2020/project1/project1_review.pdf"> Peer Review </a> and <a href="./2019/project1/project1review_assignment.pdf"> Report Assignment </a></li>
<li> Rebuttal Guideline: <a href="./2019/project1/project1_rebuttal.pdf"> Rebuttal </a> </li>
<li> Doodle Vote for Top 3 Reports: <a href="https://doodle.com/poll/56s69neeyme7ry23"> vote link </a> </li>
---->
<li> <B>FAN, Ganghua </B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/FAN/Math%206380P%20Project%201.pdf"> report </a>]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/tree/master/2019/project1/group01/review"> source </a>]
</li>
<li> <B>FANG Linjiajie (20382284), Liu Yiyuan (20568864), Wang Qiyue (20672641), Wang Ya (20549569)</B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/FANG/MATH6380_project1_report_FANGL.pdf"> report (pdf) </a>]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/tree/master/2020/project1/FANG/project1"> source </a>]
</li>
<li> <B>Hao HE, He CAO, Yue GUO, Haoyi CHENG</B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/Hao/Project_1_Report_Hao_He_Yue_Haoyi/Project_1_Final_Report.html"> report (html) </a>]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/Hao/Project_1_Report_Hao_He_Yue_Haoyi.ipynb"> source </a>]
</li>
<li> <B> WU Huimin, HE Changxiang</B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/Huimin/Project%201.doc.pdf"> report (pdf) </a>]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/Huimin/project1.ipynb"> source </a>]
</li>
<li> <B> VU Tuan-Anh </B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/Tuan-Anh/Project1_report.html"> report (html) </a>]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/Tuan-Anh"> source </a>]
</li>
<li> <B> CAO Yang, WU Jiamin </B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/Yang/Project_1_CAO_WU.pptx"> report (pptx) </a>]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/tree/master/2020/project1/Yang/code_Project_1_CAO_WU"> source </a>]
</li>
<li> <B> DU, Yipai and QU, Yongquan </B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/Yipai/MATH6380P-P1.pdf"> report (pdf) </a>]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/tree/master/2020/project1/Yipai/MATH6380P-P1-main"> source </a>]
</li>
<li> <B> Zheyue Fang, Chutian Huang, Yue WU, and Lu YANG </B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/Zheyue/Math%206380p%20project1.pdf"> report (pdf) </a>]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/Zheyue/MATH6380P_Project1%20.ipynb"> source </a>]
</li>
<li> <B> Shizhe Diao, Jincheng Yu, Duo Li, Yimin Zheng </B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/MATH6380p%20Project%201.pdf"> report (pdf) </a>]
</li>
<li> <B> Kai WANG, Weizhen DING </B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/Math%206380P%20Project%201_Weizhen%20DING(12229218)_Kai%20Wang(20738081).pdf"> report (pdf) </a>]
</li>
<li> <B> Tony C. W. Mok, Jierong Wang </B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/MATH6380p_project1_report.pdf"> report (pdf) </a>]
</li>
<li> <B> Rongrong GAO (20619663), Junming CHEN (20750649), Zifan SHI (20619455) </B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/Mini-Project_1_Feature%20Extraction%20and%20Transfer%20Learning_GaoRongrong_ShiZifan_ChenJunming.pdf"> report (pdf) </a>]
</li>
<li> <B> Samuel Cahyawijaya, Etsuko Ishii, Ziwei Ji, Ye Jin Bang </B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/Poster%20MATH6380P%20Deep%20learning.pdf"> report (pdf) </a>]
</li>
<li> <B> Hanli Huang </B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/Hanli.ipynb"> report (ipynb) </a>]
</li>
<li> <B> ABDULLAH, Murad </B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/poject1_report.pdf"> report (pdf) </a>]
</li>
<li> <B> PANG Hong Wing and WONG, Yik Ben </B>.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/PangWong/PangWong_Project1.pdf"> report (pdf) </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project1/PangWong/codes"> source </a >]
</li>
</ul>
</ul>
</td>
<td>Y.Y.</td>
<td></td>
</tr>
<tr>
<td>10/21/2020, Wednesday</td>
<td>Lecture 07: Is Optimization a Sufficient Language for Understanding Deep Learning? <a href="http://www.offconvex.org/2019/06/03/trajectories/">[ link ]</a>
<br>
<ul>[Speaker]: <a href="https://www.cs.princeton.edu/~arora/">Sanjeev Arora</a> (Princeton University) </ul>
<ul>[Abstract]: In this Deep Learning era, machine learning usually boils down to defining a suitable objective/cost function for the learning task at hand,
and then optimizing this function using some variant of gradient descent (implemented via backpropagation). Little wonder that hundreds of ML papers
each year are devoted to various aspects of optimization. Today I will suggest that if our goal is mathematical understanding of deep learning, then
the optimization viewpoint is potentially insufficient — at least in the conventional view.
</ul>
<ul>[Video]
<li>[<a href="https://www.youtube.com/watch?v=HMdJd2minAI&list=PLWQvhvMdDChzsThHFe4lYAff3pu2m0v2H&index=5&t=1s"> IAS Princeton </a>]</li>
</ul>
<ul>[Seminars]:
<ul>
<li><B>FANG Linjiajie (20382284), Liu Yiyuan (20568864), Wang Qiyue (20672641), Wang Ya (20549569)</B></li>
<li><B> WU Huimin, HE Changxiang </B></li>
<li><B> CAO Yang, WU Jiamin </B></li>
</ul>
</ul>
</td>
<td>Y.Y.</td>
<td></td>
</tr>
<tr>
<td>10/28/2020, Wednesday</td>
<td>Lecture 08: Final Project [<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/project2.pdf"> pdf </a>]
<br>
<ul>[Title]: Compression and Acceleration of Pre-trained Language Models
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/slides/Hou_Lu_slide_handout.pdf"> slide </a>]
<ul>[Speaker]: Dr. <a href="https://scholar.google.com.hk/citations?user=rnjoL5cAAAAJ&hl=en">Lu HOU</a>, Huawei Noah’s Ark Lab
</ul>
<ul>[Abstract]:
Recently, pre-trained language models based on the Transformer structure like BERT and RoBERTa have achieved remarkable results on various natural language processing tasks and even some computer vision tasks. However, these models have many parameters, hindering their deployment on edge devices with limited storage. In this talk, I will first introduce some basics of pre-trained language modeling and our proposed pre-trained language model NEZHA. Then I will elaborate on how we alleviate these concerns in various deployment scenarios, during both inference and training. Specifically, compression and acceleration methods using knowledge distillation, dynamic networks, and network quantization will be discussed.
Finally, I will also discuss some recent progress about training deep networks on edge through quantization.
</ul>
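The network quantization mentioned in the abstract can be sketched as a minimal uniform post-training weight quantizer (a generic illustration only, not the method used in the talk; the function name and parameters here are assumptions):

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    """Symmetric uniform quantization of a weight tensor to signed
    num_bits integers; returns integer codes plus the dequantization scale."""
    qmax = 2 ** (num_bits - 1) - 1      # 127 for 8-bit codes
    scale = np.max(np.abs(w)) / qmax    # map [-max|w|, max|w|] onto [-qmax, qmax]
    codes = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
codes, scale = quantize_uniform(w)
w_hat = codes.astype(np.float32) * scale  # dequantized approximation
# Rounding bounds the per-weight error by half a quantization step.
max_err = np.max(np.abs(w - w_hat))
```

Storing 8-bit codes instead of float32 weights cuts storage roughly 4x at the cost of a per-weight error of at most half a quantization step; the methods discussed in the talk (distillation, dynamic networks, quantization-aware training) go well beyond this baseline.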
<ul>[Bio]:
Dr. Lu HOU is a researcher at the Speech and Semantics Lab in Huawei Noah's Ark Lab. She obtained her Ph.D. from the Hong Kong University of Science and Technology in 2019, under the supervision of Prof. James T. Kwok. Her current research interests include compression and acceleration of deep neural networks, natural language processing, and deep learning optimization.
</ul>
</ul>
</td>
<td>Y.Y.</td>
<td></td>
</tr>
<tr>
<td>11/04/2020, Wednesday</td>
<td>Lecture 09: Overparameterization and Optimization <a href="./2019/slides/Lee_simons_tutorial_Overparam_opt_DL.pdf">[ slides ]</a>
<br>
<ul>[Speaker]: Prof. <a href="https://jasondlee88.github.io/">Jason Lee</a>, Princeton University </ul>
<ul>[Abstract]: We survey recent developments in the optimization and learning of deep neural networks. The three focus topics are:
<li>1) geometric results for the optimization of neural networks, </li>
<li>2) overparametrized neural networks in the kernel regime (Neural Tangent Kernel) and its implications and limitations, </li>
<li>3) potential strategies to prove SGD improves on kernel predictors.</li>
</ul>
<ul>[Video] <a href="https://simons.berkeley.edu/workshops/schedule/10624">Deep Learning Bootcamp</a>, Simons Institute for the Theory of Computing at UC Berkeley, 2019. [<a href="https://weibo.com/7337268754/Ign0S6ePZ?from=page_1005057337268754_profile&wvr=6&mod=weibotime&type=comment"> Weibo collection </a>]
<li>[<a href="https://simons.berkeley.edu/talks/optimizations-i"> Part I </a>] [<a href="https://www.bilibili.com/video/av75713946/"> Bilibili link</a>]</li>
<li>[<a href="https://simons.berkeley.edu/talks/optimization-ii"> Part II </a>] [<a href="https://www.bilibili.com/video/av75713946?p=2"> Bilibili link</a>]</li>
</ul>
</td>
<td>Y.Y.</td>
<td></td>
</tr>
<tr>
<td>11/11/2020, Wednesday</td>
<td>Lecture 10: Implicit Regularization
<br>
<ul>[Speaker]: <a href="https://ttic.uchicago.edu/~nati/"> Nati Srebro </a> (Toyota Technological Institute at Chicago) </ul>
<ul>[Abstract]: We review the implicit regularization of gradient-descent-type algorithms in machine learning.
</ul>
<ul>[Video] <a href="https://simons.berkeley.edu/workshops/schedule/10624">Deep Learning Bootcamp</a>, Simons Institute for the Theory of Computing at UC Berkeley, 2019.
[<a href="https://www.weibo.com/7337268754/Igaho8Jp7?type=comment"> Weibo link </a>]
<li>[<a href="https://simons.berkeley.edu/talks/implicit-regularization-i"> Part I </a>] [<a href="https://www.bilibili.com/video/av75621719/"> Bilibili link </a>] </li>
<li>[<a href="https://simons.berkeley.edu/talks/implicit-regularization-ii"> Part II </a>] [<a href="https://www.bilibili.com/video/av75621719?p=2"> Bilibili link </a>]</li>
</ul>
<ul>[Reference]
<li> Inductive Bias and Optimization in Deep Learning <a href="https://www.youtube.com/watch?v=F9172atHmRs&feature=youtu.be"> Lecture video </a> at
<a href="https://stats385.github.io/lecture_videos"> Stanford Stats385 </a>
</li>
<li> Matus Telgarsky. A Primal-dual Analysis of Margin Maximization by Steepest Descent Methods
<a href="https://simons.berkeley.edu/talks/tbd-56"> Simons Institute </a> </li>
<li> Behnam Neyshabur, Ryota Tomioka, Nathan Srebro. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning.
[<a href="https://arxiv.org/abs/1412.6614"> arXiv:1412.6614 </a>]
</li>
<li> Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro. Geometry of Optimization and Implicit Regularization in Deep Learning.
[<a href="https://arxiv.org/abs/1705.03071"> arXiv: 1705.03071</a>] An older paper that takes a higher level view of what might be going on and what we want to try to achieve. </li>
<li> Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data.
[<a href="https://arxiv.org/abs/1710.10345"> arXiv:1710.10345 </a>]. ICLR 2018. Gradient descent on logistic regression leads to max margin. </li>
<li> Matus Telgarsky. Margins, Shrinkage, and Boosting. <a href="https://arxiv.org/abs/1303.4172">[ arXiv:1303.4172 ]</a>. ICML 2013. An older paper showing that gradient descent on exponential/logistic loss leads to max margin. </li>
<li> Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro. Implicit Regularization in Matrix Factorization.
[<a href="https://arxiv.org/abs/1705.09280"> arXiv:1705.09280 </a>] </li>
<li> Yuanzhi Li, Tengyu Ma, Hongyang Zhang. Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations.
[<a href="https://arxiv.org/abs/1712.09203"> arXiv:1712.09203 </a>] </li>
<li> Blake Woodworth, Suriya Gunasekar, Pedro Savarese, Edward Moroshko, Itay Golan, Jason Lee, Daniel Soudry, Nathan Srebro. Kernel and Rich Regimes in Overparametrized Models.
[<a href="https://arxiv.org/abs/1906.05827"> arXiv:1906.05827 </a>] </li>
<li> Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro. Characterizing Implicit Bias in Terms of Optimization Geometry
[<a href="https://arxiv.org/abs/1802.08246"> arXiv:1802.08246 </a>]
</li>
<li> Mor Shpigel Nacson, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry. Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
[<a href="https://arxiv.org/abs/1905.07325"> arXiv:1905.07325 </a>] A generalization of implicit regularization in linear convolutional nets: <a href="https://arxiv.org/abs/1806.00468"> arXiv:1806.00468 </a> </li>
<li> Greg Ongie, Rebecca Willett, Daniel Soudry, Nathan Srebro. A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case
[<a href="https://arxiv.org/abs/1910.01635"> arXiv:1910.01635 </a> ] Inductive bias in infinite-width ReLU networks of high dimensionality </li>
</ul>
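The max-margin implicit bias described in Soudry et al. (arXiv:1710.10345) above can be checked numerically with a minimal sketch: plain gradient descent on the unregularized logistic loss over a toy separable dataset drifts toward the hard-margin SVM direction. The dataset and step size below are illustrative assumptions; by symmetry the limit direction for this particular set is exactly (1, 0).

```python
import numpy as np

# A symmetric, linearly separable toy set: the classes differ only in x1.
X = np.array([[2.0, 1.0], [2.0, -1.0], [-2.0, 1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.zeros(2)
lr = 0.1
for _ in range(20000):
    margins = y * (X @ w)
    # Gradient of sum_i log(1 + exp(-margin_i)) with respect to w.
    grad = -(X.T * y) @ (1.0 / (1.0 + np.exp(margins)))
    w -= lr * grad

# The norm of w diverges (logarithmically in time), but its direction
# converges to the max-margin separator, here (1, 0).
direction = w / np.linalg.norm(w)
```

Note that no explicit regularizer appears anywhere in the loop; the bias toward the max-margin direction comes purely from the optimization dynamics, which is the point of the papers listed above.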
</td>
<td>Y.Y.</td>
<td></td>
</tr>
<tr>
<td>11/18/2020, Wednesday</td>
<td>Lecture 11: Seminars.
<br>
<ul>[Title]: Theory of Deep Convolutional Neural Networks.
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/slides/dxZhouHKUST2020tk.pdf"> slides </a>]
</ul>
<ul>[Speaker]: Ding-Xuan ZHOU, City University of Hong Kong.
</ul>
<ul>[Time]: 3:00-4:20pm </ul>
<ul>[Abstract]:
Deep learning has been widely applied and brought breakthroughs in speech recognition, computer vision, and many other domains. The involved deep neural network architectures and computational issues have been well studied in machine learning. But a theoretical foundation for understanding the modelling, approximation, or generalization ability of deep learning models with these network architectures is still lacking. Here we are interested in deep convolutional neural networks (CNNs). The convolutional architecture makes deep CNNs essentially different from fully-connected deep neural networks, and the classical theory for fully-connected networks developed around 30 years ago does not apply. This talk describes a mathematical theory of deep CNNs associated with the rectified linear unit (ReLU) activation function.
In particular, we give the first proof for the universality of deep CNNs, meaning that a deep CNN can be used to approximate any continuous function to an arbitrary accuracy when the depth of the neural network is large enough. We also give explicit rates of approximation, and show that the approximation ability of deep CNNs is at least as good as that of fully-connected multi-layer neural networks for general functions, and is better for radial functions. Our quantitative estimate, given tightly in terms of the number of free parameters to be computed, verifies the efficiency of deep CNNs in dealing with big data.
</ul>
<ul>[Bio]:
Ding-Xuan Zhou is a Chair Professor in the School of Data Science and Department of Mathematics at City University of Hong Kong, serving also as Associate Dean of the School of Data Science and Director of the Liu Bie Ju Centre for Mathematical Sciences. His recent research interest is deep learning theory.
He is an Editor-in-Chief of the journals "Analysis and Applications" and "Mathematical Foundations of Computing", and serves on the editorial boards of more than ten journals. He received a Fund for Distinguished Young Scholars from the NSF of China in 2005, and was rated a Highly Cited Researcher by Thomson Reuters/Clarivate Analytics in 2014-2017.
</ul>
<ul>[Reference]
<li> [<a href="https://hkust.zoom.us/rec/share/1W7REu8HLTjYh4WqX0CIWbjMqQ7OETmnJsELexe7QmK9MpTPm4ZJx208HYWiOZyP.K_ypkVpcq9Agv18o"> video </a>]</li>
</ul>
<br>
<ul>[Title]: Analyzing Optimization and Generalization in Deep Learning via Dynamics of Gradient Descent
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/slides/Cohen_Nadav_19slides.pdf"> slides </a>]
</ul>
<ul>[Speaker]: <a href="https://www.cohennadav.com/">Nadav Cohen</a>, Tel Aviv University.
</ul>
<ul>[Time]: 4:30-5:50pm </ul>
<ul>[Abstract]:
Understanding deep learning calls for addressing the questions of: (i) optimization --- the effectiveness of simple gradient-based algorithms in solving neural network training programs that are non-convex and thus seemingly difficult; and (ii) generalization --- the phenomenon of deep learning models not overfitting despite having many more parameters than examples to learn from. Existing analyses of optimization and/or generalization typically adopt the language of classical learning theory, abstracting away many details on the setting at hand. In this talk I will argue that a more refined perspective is in order, one that accounts for the dynamics of the optimizer. I will then demonstrate a manifestation of this approach, analyzing the dynamics of gradient descent over linear neural networks. We will derive what is, to the best of my knowledge, the most general guarantee to date for efficient convergence to global minimum of a gradient-based algorithm training a deep network. Moreover, in stark contrast to conventional wisdom, we will see that sometimes, adding (redundant) linear layers to a classic linear model significantly accelerates gradient descent, despite the introduction of non-convexity. Finally, we will show that such addition of layers induces an implicit bias towards low rank (different from any type of norm regularization), and by this explain generalization of deep linear neural networks for the classic problem of low rank matrix completion.
<br>
Works covered in this talk were in collaboration with Sanjeev Arora, Noah Golowich, Elad Hazan, Wei Hu, Yuping Luo and Noam Razin.
</ul>
<ul>[Reference]
<li> [<a href="https://www.youtube.com/watch?v=ytTLSHs2fL4&feature=youtu.be"> video </a>] </li>
<li> Noam Razin and Nadav Cohen.
<a href="https://arxiv.org/pdf/2005.06398.pdf">Implicit Regularization in Deep Learning May Not Be Explainable by Norms</a>.
Conference on Neural Information Processing Systems (NeurIPS) 2020.
</li>
<li> Sanjeev Arora, Nadav Cohen, Wei Hu and Yuping Luo (alphabetical order).
<a href="https://papers.nips.cc/paper/8960-implicit-regularization-in-deep-matrix-factorization.pdf">Implicit Regularization in Deep Matrix Factorization</a>.
Conference on Neural Information Processing Systems (NeurIPS) 2019. </li>
<li> Sanjeev Arora, Nadav Cohen, Noah Golowich and Wei Hu (alphabetical order).
<a href="https://openreview.net/pdf?id=SkMQg3C5K7">A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks</a>.
International Conference on Learning Representations (ICLR) 2019.</li>
<li> Sanjeev Arora, Nadav Cohen and Elad Hazan (alphabetical order).
<a href="http://proceedings.mlr.press/v80/arora18a.html">On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization</a>.
International Conference on Machine Learning (ICML) 2018.</li>
</ul>
</td>
<td></td>
<td></td>
</tr>
<tr>
<td>11/25/2020, Wednesday</td>
<td>Lecture 12: Mean Field Theory for Neural Networks.
<br>
<ul>[Title]: Mean Field Theory and Tangent Kernel Theory in Neural Networks.
[<a href="https://stats385.github.io/assets/lectures/MF_dynamics_Stanford.pdf"> slides </a>]
</ul>
<ul>[Speaker]: Song Mei, University of California at Berkeley.
</ul>
<ul>[Time]: 3:00-4:20pm </ul>
<ul>[Abstract]:
Deep neural networks trained with stochastic gradient algorithms often achieve near vanishing training error, and generalize well on test data. Such empirical success of optimization and generalization, however, is quite surprising from a theoretical point of view, mainly due to non-convexity and overparameterization of deep neural networks.
<br>
In this lecture, I will talk about the mean field theory and the tangent kernel theory on the training dynamics of neural networks, and discuss their benefits and shortcomings in terms of both optimization and generalization. Then I will analyze the generalization error of linearized neural networks with two interesting phenomena: staircase and double descent. Finally, I will propose challenges and open problems in analyzing deep neural networks. </ul>
<ul>[Reference]
<li> [<a href="https://www.youtube.com/watch?v=7vHr7f3byLc&feature=youtu.be"> video </a>]
</li>
<li> Mei, Montanari, and Nguyen. A mean field view of the landscape of two-layers neural networks. Proceedings of the National Academy of Sciences 115, E7665-E7671.
</li>
<li> Rotskoff and Vanden-Eijnden. Neural networks as interacting particle systems: Asymptotic convexity of the loss landscape and universal scaling of the approximation error. arXiv:1805.00915.
</li>
<li> Chizat and Bach. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport. Advances in neural information processing systems, 2018, pp. 3036–3046.
</li>
<li> Jacot, Gabriel, and Hongler. Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Advances in neural information processing systems, 2018, pp. 8571–8580.
</li>
<li> Belkin, Hsu, Ma, and Mandal. Reconciling modern machine learning practice and the bias-variance trade-off. Proceedings of the National Academy of Sciences 116.32 (2019): 15849-15854.
</li>
<li> Bach. Breaking the Curse of Dimensionality with Convex Neural Networks. The Journal of Machine Learning Research 18 (2017), no. 1, 629–681.
</li>
<li> Ghorbani, Mei, Misiakiewicz, and Montanari. Linearized two-layers neural networks in high dimension. arXiv:1904.12191.
</li>
<li> Hastie, Montanari, Rosset, and Tibshirani. Surprises in High-Dimensional Ridgeless Least Squares Interpolation. arXiv:1903.08560.
</li>
<li> Mei and Montanari. The generalization error of random features regression: Precise asymptotics and double descent curve. arXiv:1908.05355.
</li>
</ul>
<br>
<ul>[Title]: A mean-field theory for certain deep neural networks
</ul>
<ul>[Speaker]: <a href="https://sites.google.com/view/roboliv">Roberto I. Oliveira</a>, IMPA.
</ul>
<ul>[Time]: 4:30-5:50pm </ul>
<ul>[Abstract]:
A natural approach to understand overparameterized deep neural networks is to ask if there is some kind of natural limiting behavior when the number of neurons diverges. We present a rigorous limit result of this kind for networks with complete connections and "random-feature-style" first and last layers. Specifically, we show that network weights are approximated by certain "ideal particles" whose distribution and dependencies are described by a McKean-Vlasov mean-field model. We will present the intuition behind our approach, sketch some of the key technical challenges along the way, and connect our results to some of the recent literature on the topic.
</ul>
<ul>[Reference]
<li> [<a href="https://youtu.be/JiWhXpVtsm8"> video </a>] </li>
<li> Dyego Araújo, Roberto I. Oliveira, Daniel Yukimura.
A mean-field limit for certain deep neural networks.
<a href="https://arxiv.org/abs/1906.00193">arXiv:1906.00193</a>.
</li>
<li>Justin Sirignano, Konstantinos Spiliopoulos.
"Mean field analysis of deep neural networks", 2020, <em> Mathematics of Operations Research</em>,
<a href="https://arxiv.org/abs/1903.04440">[ArXiv:1903.04440]</a>, to appear.
</li>
<li>Jean-François Jabir, David Šiška, Łukasz Szpruch.
Mean-Field Neural ODEs via Relaxed Optimal Control
<a href="https://arxiv.org/abs/1912.05475">arXiv:1912.05475</a>.
</li>
<li> Phan-Minh Nguyen, Huy Tuan Pham.
A Rigorous Framework for the Mean Field Limit of Multilayer Neural Networks.
<a href="https://arxiv.org/abs/2001.11443">arXiv:2001.11443</a>.
</li>
<li> Weinan E, Stephan Wojtowytsch.
On the Banach spaces associated with multi-layer ReLU networks: Function representation, approximation theory and gradient descent dynamics.
<a href="https://arxiv.org/abs/2007.15623">arXiv:2007.15623</a>.
</li>
</ul>
</td>
<td></td>
<td></td>
</tr>
<tr>
<td>12/02/2020, Wednesday</td>
<td>Lecture 13: Seminars.
<br>
<ul>[Title]: Learning assisted modeling of molecules and materials.
</ul>
<ul>[Speaker]: Linfeng ZHANG, Beijing Institute of Big Data Research and Princeton University.
</ul>
<ul>[Time]: 3:00-4:00pm </ul>
<ul>[Abstract]:
In recent years, machine learning (ML) has emerged as a promising tool for dealing with the difficulty of representing high dimensional functions. This gives us an unprecedented opportunity to revisit theoretical foundations of various scientific fields and solve problems that were too complicated for conventional approaches to address. Here we identify a list of such problems in the context of multi-scale molecular and materials modeling and review ML-based strategies that boost simulations with ab initio accuracy to much larger scales than conventional approaches. Using examples at scales of many-electron Schrödinger equation, density functional theory, and molecular dynamics, we present two equally important principles: 1) ML-based models should respect important physical constraints in a faithful and adaptive way; 2) to build truly reliable models, efficient algorithms are needed to explore relevant physical space and construct optimal training data sets. Finally, we present our efforts on developing related software packages and high-performance computing schemes, which have now been widely used worldwide by experts and practitioners in the molecular and materials simulation community.
</ul>
<ul>[Bio]:
Linfeng Zhang is temporarily working as a research scientist at the Beijing Institute of Big Data Research. In May 2020, he graduated from the Program in Applied and Computational Mathematics (PACM), Princeton University, working with Profs. Roberto Car and Weinan E. Linfeng has been focusing on developing machine learning based physical models for electronic structures, molecular dynamics, as well as enhanced sampling. He is one of the main developers of DeePMD-kit, a very popular deep learning based open-source software for molecular simulation in physics, chemistry, and materials science. He is a recipient of the 2020 ACM Gordon Bell Prize for the project
“Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning.”
</ul>
<ul>[Reference]
<li> Weile Jia, Han Wang, Mohan Chen, Denghui Lu, Lin Lin, Roberto Car, Weinan E, Linfeng Zhang.
Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning.
<a href="https://arxiv.org/abs/2005.00223"> arXiv:2005.00223 </a>
</li>
</ul>
<br>
<ul>[Title]: Robust Estimation via Generative Adversarial Networks.
</ul>
<ul>[Speaker]: Weizhi ZHU, HKUST.
</ul>
<ul>[Time]: 4:00-5:00pm </ul>
<ul>[Abstract]:
Robust estimation under Huber's ε-contamination model has become an important topic in statistics and theoretical computer science. Rate-optimal procedures such as Tukey's median and other estimators based on statistical depth functions are impractical because of their computational intractability. In this talk, we establish an intriguing connection between f-GAN, various depth functions, and proper scoring rules. Similar to the derivation of f-GAN, we show that these depth functions that lead to rate-optimal robust estimators can all be viewed as variational lower bounds of the total variation distance in the framework of f-Learning.
</ul>
<ul>[Reference]
<li> GAO, Chao, Jiyu LIU, Yuan YAO, and Weizhi ZHU.
Robust Estimation and Generative Adversarial Nets.
[<a href="https://arxiv.org/abs/1810.02030"> arXiv:1810.02030 </a>] [<a href="https://github.com/zhuwzh/Robust-GAN-Center"> GitHub </a>] [<a href="https://simons.berkeley.edu/talks/robust-estimation-and-generative-adversarial-nets"> GAO, Chao's Simons Talk </a>]
</li>
<li> GAO, Chao, Yuan YAO, and Weizhi ZHU.
Generative Adversarial Nets for Robust Scatter Estimation: A Proper Scoring Rule Perspective. Journal of Machine Learning Research, 21(160):1-48, 2020.
[<a href="https://arxiv.org/abs/1903.01944"> arXiv:1903.01944 </a>] [<a href="https://github.com/zhuwzh/Robust-GAN-Scatter"> GitHub </a>]
</li>
</ul>
<br>
<ul>[Title]: Towards a mathematical understanding of supervised learning: What we know and what we don't know
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/slides/E_One-world-1.pdf"> slides </a>]
</ul>
<ul>[Speaker]: Weinan E, Princeton University.
</ul>
<ul>[Time]: 4:30-5:50pm </ul>
<ul>[Abstract]:
Two of the biggest puzzles in machine learning are: Why is it so successful and why is it quite fragile?
<br>
This talk will present a framework for unraveling these puzzles from the perspective of approximating functions in high dimensions. We will discuss what's known and what's not known about the approximation and generalization properties of neural-network-type hypothesis spaces, as well as the dynamics and generalization properties of the training process. We will also discuss the relative merits of shallow vs. deep neural network models and suggest ways to formulate more robust machine learning models.
<br>
This is joint work with Chao Ma, Stephan Wojtowytsch and Lei Wu.
</ul>
<ul>[Reference]
<li> [<a href="https://www.youtube.com/watch?v=Ixs45jX6Oq8"> video </a>]
</li>
</ul>
<br>
<ul>[ Gallery of Project 2 ]:
<ul>
<li> Description of <a href="./2020/project2/project2.pdf"> Project 2 (Final Project)</a></li>
<li> Kaggle in-class contest on semi-conductor image classification 2 mini: [<a href="https://www.kaggle.com/c/semi-conductor-image-classification-second-stage"> link </a>] </li>
<!---
<li> Peer Review requirement: <a href="./2020/project1/project1_review.pdf"> Peer Review </a> and <a href="./2019/project1/project1review_assignment.pdf"> Report Assignment </a></li>
<li> Rebuttal Guideline: <a href="./2019/project1/project1_rebuttal.pdf"> Rebuttal </a> </li>
<li> Doodle Vote for Top 3 Reports: <a href="https://doodle.com/poll/56s69neeyme7ry23"> vote link </a> </li>
---->
<li> <B> PANG, Hong Wing and Wong, Yik Ben </B>.
<br> 1. Can Object Detectors Generalize?<br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/1.Can%20Object%20Dector%20Generalize%20-%20Pang%20Hong%20Wing%20et%20al./WONG%2CYik%20Ben%20and%20PANG%2C%20Hong%20Wing.pdf"> poster </a >]
[<a href="https://youtu.be/6ZQXoDSpYXc"> video </a >]
</li>
<li> <B> Ye Jin Bang, Etsuko Ishii, Samuel Cahyawijaya, and Ziwei Ji </B>.
<br> 2. Model Generalization on COVID19 Fake News Detection <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/2.Covid19%20-%20Yejin%20Bang%20et%20al./math_final_report.pdf"> report </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/2.Covid19%20-%20Yejin%20Bang%20et%20al./math_presentation.pdf"> slides </a >]
[<a href="https://github.com/yjbang/math6380"> source </a >]
[<a href="https://youtu.be/UKfaBNn_Kyk"> video </a >]
</li>
<li> <B> Zheyue FANG, Chutian HUANG, Yue WU, and Lu YANG </B>.
<br> 3. Home Credit Default Risk Project <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/3.Default%20Risk%20-%20Zheyue%20Fang%20et%20al./Final_Project.pdf"> report </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/3.Default%20Risk%20-%20Zheyue%20Fang%20et%20al./final%20project.ipynb"> source </a >]
[<a href="https://www.bilibili.com/video/BV1Jz4y1C7kz/"> video </a >]
</li>
<li> <B> Yipai Du and Yongquan Qu </B>.
<br> 4. Interpretability of Deep Learning on Home Credit Default Risk Dataset <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/4.Home%20Default%20Risk%20-%20Yipai%20Du%20et%20al./poster.pdf"> poster </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/4.Home%20Default%20Risk%20-%20Yipai%20Du%20et%20al./slides.pdf"> slides </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/tree/master/2020/project2/4.Home%20Default%20Risk%20-%20Yipai%20Du%20et%20al./Advanced-Tops-in-Deep-Learning-Final-Project-main"> source </a >]
[<a href="https://hkust.zoom.us/rec/share/ATxgDFSrCJjXA5nNKZBvwFnzhHVKhfxBggHbuiOndoUjv-qQw-QO51--hkrVMnkQ.arMqGGmvPdtjeas_?startTime=1607872415000"> video </a >]
</li>
<li> <B> Shizhe Diao, Jincheng Yu, Duo Li, and Yimin Zheng </B>.
<br> 5. Improving Batch Normalization via Scaling and Shifting Relay <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/5.Improve%20Batch%20Normalization%20-%20Shizhe%20Diao%20et%20al./Poster.pdf"> poster </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/5.Improve%20Batch%20Normalization%20-%20Shizhe%20Diao%20et%20al./Slides.pptx"> slides </a >]
[<a href="https://youtu.be/GG8__vs4UC0"> video </a >]
</li>
<li> <B> ABDULLAH, Murad </B>.
<br> 6. Classification of Nexperia Image Dataset: An Averaging Ensemble Approach <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/tree/master/2020/project2/6.Semiconductor%20-%20ABDULLAH%2CMurad/murad_report2.pdf"> report </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/tree/master/2020/project2/6.Semiconductor%20-%20ABDULLAH%2CMurad/Murad_codes"> source </a >]
[<a href="https://www.youtube.com/watch?v=raMwajlc8ac&feature=youtu.be&ab_channel=MuradAbdullah"> video </a >]
</li>
<li> <B> HE, Changxiang and XU, Yan </B>.
<br> 7. Nexperia Image Classification <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/7.Semiconductor%20-%20Changxiang%20He%20et%20al./Project%202%20HeChangxiang%2020675461.doc"> report </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/7.Semiconductor%20-%20Changxiang%20He%20et%20al./Project%202.pptx"> slides </a >]
[<a href="https://www.bilibili.com/video/BV1u54y1t7Jj?from=search&seid=4711626716740759584"> video </a >]
</li>
<li> <B> Ganghua Fan </B>.
<br> 8. Kaggle in-class Contest: Nexperia Image Classification II <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/8.Semiconductor%20-%20Ganghua%20Fan/Ganghua%20FAN_poster.pptx"> poster </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/8.Semiconductor%20-%20Ganghua%20Fan/FAN%20Ganghua_presentation%20slides.pdf"> slides </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/8.Semiconductor%20-%20Ganghua%20Fan/MATH6380P%20Project2_code"> codes </a >]
[<a href="https://hkust.zoom.us/rec/share/Jukmg3mzMgQ9XmRD83VXYX8xDHXjGYE2Pbem-xSu2XpD4f-xzn9wEryQJ6Eo-zR5._wMX0hjohbvF2m2n?startTime=1607866318000"> video </a >]
</li>
<li> <B> Hanli Huang </B>.
<br> 9. Semi-conductor Defect Images Classification <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/9.Semiconductor%20-%20Hanli%20HUANG/MATH_6380P_project2_Hanli_Huang.pptx"> poster (pptx) </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/9.Semiconductor%20-%20Hanli%20HUANG/MATH_6380P_project2_Hanli_Huang_PRESENTATION.pptx"> slides (pptx) </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/tree/master/2020/project2/9.Semiconductor%20-%20Hanli%20HUANG"> codes </a >]
[<a href="https://youtu.be/ReJ_i7TVzdM"> video </a >]
</li>
<li> <B> Huimin Wu </B>.
<br> 10. Nexperia Image Classification II with Noise Handling <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/10.Semiconductor%20-%20Huimin%20Wu/Project2_poster.pptx"> poster </a >]
[<a href="https://hkust.zoom.us/rec/share/Adtwl-tWN0xCg32JU0al99bhpqNx1fvZc7g_uvjy1Mm_nq65gxkU2omXNeRghJL3.aMND9pW6-5wCf_D5"> video </a >]
</li>
<li> <B> FANG Linjiajie, Liu Yiyuan, Wang Qiyue, and Wang Ya </B>.
<br> 11. Solving Semi-Conductor Classification Problem by Light-weighted Model with Stratified Convolutions <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/11.Semiconductor%20-%20Linjiajie%20Fang%20et%20al./MATH6380_final_report.pdf"> report </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/11.Semiconductor%20-%20Linjiajie%20Fang%20et%20al./MATH6380_pre.pdf"> slides </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/tree/master/2020/project2/11.Semiconductor%20-%20Linjiajie%20Fang%20et%20al./codes"> codes </a >]
[<a href="https://hkust.zoom.us/rec/share/gJFQauBncQZgtj5iEn5PYQSql_SAJ44OJQZen5vyia2WuY05fi_iOHklHuA1cfO-.4oOED-3REmTTkjim"> video </a >]
</li>
<li> <B> Rongrong GAO, Junming CHEN, and Zifan SHI </B>.
<br> 12. Nexperia Image Classification <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/tree/master/2020/project2/12.Semiconductor%20-%20Rongrong%20Gao%20et%20al./report.pdf"> report </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/12.Semiconductor%20-%20Rongrong%20Gao%20et%20al./final_project_slides.pdf"> slides </a>]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/tree/master/2020/project2/12.Semiconductor%20-%20Rongrong%20Gao%20et%20al."> codes </a >]
[<a href="https://youtu.be/Mrq4V-nt1I8"> video </a >]
</li>
<li> <B> Tony C.W. Mok and Jierong Wang </B>.
<br> 13. Toward Fast and Accurate Semi-conductor Image Classification using Deep Convolutional Neural Networks <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/13.Semiconductor%20-%20Tony%20Mok/MATH6380p_project_2_report.pdf"> report </a >]
[<a href="https://github.com/wingwing518/MATH6380p_finalproject"> codes </a >]
</li>
<li> <B> Tuan Anh VU </B>.
<br> 14. Anomaly Detection using Transfer Learning in Semiconductors <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/14.Semiconductor%20-%20VU%20Tuan%20Anh/MATH6380P_Project_2_Anomaly_Detection_using_Transfer_Learning_in_Semiconductors.pdf"> report </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/14.Semiconductor%20-%20VU%20Tuan%20Anh/MATH6380P_VU_Tuan_Anh_Presentation.pdf"> slides </a >]
[<a href="https://github.com/tuananh1007/MATH6380P"> codes </a >]
[<a href="https://youtu.be/0nP9UWc-lo4"> video </a >]
</li>
<li> <B> Yang Cao and Jiamin Wu </B>.
<br> 15. Nexperia Image Classification <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/15.Semiconductor%20-%20Yang%20Cao%20et%20al./Final%20Project%20poster.pdf"> poster </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/15.Semiconductor%20-%20Yang%20Cao%20et%20al./MATH6380P%20Final%20Project_CAO_WU%20presentation.pdf"> slides </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/15.Semiconductor%20-%20Yang%20Cao%20et%20al./Final%20project%20code.py"> codes </a >]
[<a href="https://youtu.be/P52qUpzkibo"> video </a >]
</li>
<li> <B> Yue Guo, Hao He, He Cao, and Haoyi Cheng </B> (DreamDragon).
<br> 16. Image Classification of Semiconductors <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/16.Semiconductor%20-%20Yue%20Guo%20et%20al./report.pdf"> report </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/16.Semiconductor%20-%20Yue%20Guo%20et%20al./ppt.pptx"> slides (pptx) </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/tree/master/2020/project2/16.Semiconductor%20-%20Yue%20Guo%20et%20al./code_YueGUO_HaoHE_HeCAO_HaoyiCHENG_MATH6380_Proj2"> codes </a>]
[<a href="https://hkustconnect-my.sharepoint.com/:u:/g/personal/yguoar_connect_ust_hk/EaUAnpNxZWJJhSAs6eWBqyMB1fq-Z-HNDIQS3Crjc28lKA"> video </a >]
</li>
<li> <B> Kai Wang and Weizhen Ding </B>.
<br> 17. Defect Detection in Semiconductor Images <br>
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/17.Semiconductor%20-%20Kai%20Wang/Math6380_project2_report.pdf"> report </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/17.Semiconductor%20-%20Kai%20Wang/Final_Project_weizhen_kai.pptx"> slides (pptx) </a >]
[<a href="https://github.com/deeplearning-math/deeplearning-math.github.io/blob/master/2020/project2/17.Semiconductor%20-%20Kai%20Wang/Code"> codes </a >]
[<a href="https://youtu.be/MecxuIo2gNc"> video </a >]
</li>
</ul>
</td>
<td></td>
<td></td>
</tr>
<!---
<tr>
<td>09/18/2018, Wednesday</td>
<td>Seminar: Asymptotic Behavior of Robust Wasserstein Profile Inference (RWPI) Function Analysis --- selecting \delta for DRO (Distributionally Robust Optimization) Problems.
<a href="">[ slides ]</a>
<br>
<ul>[Speaker]: XIE, Jin, Stanford University.
</ul>
<ul>[Time]: 3:00-4:20pm </ul>
<ul>[Venue]: LTJ (Lift 33) </ul>
<ul>[Abstract]:
Recently, [1] showed that several machine learning algorithms, such as Lasso, Support Vector Machines, and
regularized logistic regression, and many others can be represented exactly as distributionally robust
optimization (DRO) problems. The uncertainty is then defined as a neighborhood centered at the empirical
distribution. A key element of the study of uncertainty is the Robust Wasserstein Profile function. In [1],
the authors study the asymptotic behavior of the RWP function in the case of L^p costs under the true
parameter. We consider costs in more generalized forms, namely Bregman distance or in the more general
symmetric format of d(x-y) and analyze the asymptotic behavior of the RWP function in these cases. For
the purpose of statistical applications, we then study the RWP function with plug-in estimators.
This is a joint work with Yue Hui, Jose Blanchet and Peter Glynn.
<li> [1] Blanchet, J., Kang, Y., & Murthy, K. Robust Wasserstein Profile Inference and Applications to Machine Learning, <a href="https://arxiv.org/pdf/1610.05627.pdf">arXiv:1610.05627</a>, 2016.
[<a href="./2019/slides/Blanchet_Tutorial_APS_2017.pdf"> tutorial slides </a> ]</li>
</ul>
<ul>[Reference]
<li> John Duchi (Stanford), Distributional Robustness, Learning, and Empirical Likelihood [<a href="https://simons.berkeley.edu/talks/john-duchi-11-30-17">Simons Institute, 2017</a>]
</li>
<li> Stefanie Jegelka (MIT), Robust Learning via Robust Optimization. [<a href="https://www.youtube.com/watch?v=IgAPc0i0-9E"> video </a>]
</li>
<li> Matthew Staib and Stefanie Jegelka. Distributionally Robust Optimization and Generalization in Kernel Methods. [<a href="https://arxiv.org/pdf/1905.10943.pdf"> arXiv:1905.10943 </a>]
</li>
<li> Daniel Kuhn (EPFL), Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning.