This is a seminar organized by 2prime© at PKU. Our goal is to gain basic knowledge of deep learning and become able to tackle simple computer vision tasks.

However, this is not only a technical course; our focus is the mathematics behind deep learning. We will also study the optimization methods used in deep learning, such as Adam and other stochastic optimization methods.
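As a preview of the optimization topic, the Adam update rule (Kingma & Ba) can be sketched in a few lines. This is a minimal NumPy sketch, not a production optimizer; the function name `adam_step` and the toy objective are illustrative choices, and the hyperparameter defaults are the standard ones from the paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient and its
    square, with bias correction for the zero initialization."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2 starting from x = 5.0.
x, m, v = np.array(5.0), 0.0, 0.0
for t in range(1, 5001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.01)
```

Note how the per-coordinate scaling by `sqrt(v_hat)` makes the step size roughly invariant to the gradient's magnitude, which is one reason Adam needs little tuning compared to plain SGD.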

Course introduction can be seen here: link. Course information: link.

| Director | 2prime |
| --- | --- |
| E-mail | luyiping9712@pku.edu.cn |
| TextBook | Deep Learning Book link |
| | The Elements of Statistical Learning: Data Mining, Inference, and Prediction. link |

| Topics |
| --- |
| Machine Learning Elements |
| Deep Learning & Neural Networks |
| Semi-supervised Learning and Unsupervised Problems |
| Computer Vision |
| Stochastic Optimization & Randomized Numerical Linear Algebra |
| Sparse Optimization & Compressed Sensing |

**Textbook PDF**: pdf

Michael I. Jordan's Advice: link

- Convex Optimization: lecture note. You can get more information here.
- Acceleration tricks in optimization: link, link, saddle point
- ADMM review by Boyd: link
- Candès' course on convex optimization: link
- Some new results in ADMM: link
- Deep learning review by Yann LeCun, Yoshua Bengio, and Geoffrey Hinton: link
- Learning Deep Architectures for AI: link
- Lecture 2 of CS231n: link
- Chapters 5 and 6 of the Deep Learning Book.
- TensorFlow: link
- Supplementary material:
- Choosing a loss function in low-level computer vision tasks: link, link, link
- Stéphane Mallat's view on DNNs: link
- ICLR 2017 Best Paper: Understanding deep learning requires rethinking generalization. link
- ICLR 2017: On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. link
- ICLR 2013: Intriguing properties of neural networks. link

- An interview with LeCun: link

You can consider this homework as an entrance test for our seminar. You can get a PDF version at link.

- Read the three papers from ICLR and write a review. You can see the comments from the chairs (link) and, at the same time, the discussion on Zhihu (link).
- Prove that there exists a two-layer neural network with ReLU activations and 2n + d weights that can represent any function on a sample of size n in d dimensions.

**Programming Project**

- You can try a demo of a neural network here: TensorFlow Playground
- Understand the backpropagation algorithm; you can learn it from Section 6.5 of the Deep Learning Book or Lecture 4 of CS231n (link).
- Try to train a network to conquer the MNIST task. (Any network structure is OK except LeNet. Can you get 99% accuracy?)
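As a warm-up for the backpropagation exercise above, the forward and backward passes of a two-layer ReLU network can be sketched by hand in NumPy. This is a minimal sketch on a toy linear-regression target, not the MNIST assignment itself; the layer sizes, learning rate, and number of steps are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3))          # 64 samples, 3 features
y = X @ np.array([[1.0], [2.0], [3.0]])   # toy regression target

W1 = rng.standard_normal((3, 16)) * 0.5   # first-layer weights
b1 = np.zeros(16)
W2 = rng.standard_normal((16, 1)) * 0.5   # second-layer weights
b2 = np.zeros(1)

lr = 0.01
for step in range(3000):
    # forward pass
    z1 = X @ W1 + b1
    h = np.maximum(z1, 0.0)               # ReLU
    pred = h @ W2 + b2
    loss = ((pred - y) ** 2).mean()       # mean-squared error

    # backward pass (chain rule, layer by layer)
    dpred = 2.0 * (pred - y) / len(X)
    dW2, db2 = h.T @ dpred, dpred.sum(axis=0)
    dh = dpred @ W2.T
    dz1 = dh * (z1 > 0)                   # ReLU gradient mask
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)

    # plain gradient-descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

The backward pass simply retraces the forward computation in reverse, multiplying local Jacobians, which is exactly the algorithm described in Chapter 6.5 of the Deep Learning Book.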

- Read the paper "Learning Fast Approximations of Sparse Coding" by LeCun (ICML 2010) (link) and write a review.
  - **Optional**: you can also read "Maximal Sparsity with Deep Networks?" (NIPS 2016) (link); there is a similar paper in ICLR 2017 called "Learning to Optimize" (link).
  - Some similar papers:
    - Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems: link
    - Designing Neural Network Architectures Using Reinforcement Learning: link
    - Neural Architecture Search with Reinforcement Learning: link
    - Deep Convolutional Neural Network for Inverse Problems in Imaging: link
    - Deep ADMM-Net for Compressive Sensing MRI: link
    - A blog: link
    - Adaptive Acceleration of Sparse Coding via Matrix Factorization (ICLR 2017): link
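For context on the sparse-coding reading above: the LISTA paper learns a fast approximation of the classical ISTA iteration for the lasso problem min_z ½‖x − Dz‖² + λ‖z‖₁. A minimal NumPy sketch of plain ISTA follows; the dictionary, sparsity level, and parameters are illustrative, and the step size 1/L uses the standard Lipschitz constant of the smooth term.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, D, lam=0.1, n_iter=500):
    """ISTA for min_z 0.5*||x - D z||^2 + lam*||z||_1:
    a gradient step on the smooth term followed by soft thresholding."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of D^T(Dz - x)
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)           # gradient of the smooth part
        z = soft_threshold(z - grad / L, lam / L)
    return z

# Toy usage: recover a 3-sparse code from a random dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
z_true = np.zeros(50)
z_true[[3, 17, 40]] = [1.5, -2.0, 1.0]
x = D @ z_true
z_hat = ista(x, D, lam=0.05, n_iter=500)
```

LISTA's contribution is to unroll a small, fixed number of these iterations into a network and learn the matrices, trading per-iteration optimality for far fewer iterations.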

- (**Optional**) Find a proof of the No Free Lunch Theorem.

**Material**

- Bayesian Dropout
- Information Theory In Deep Learning(Information Bottleneck)

- Neural Transfer: link
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- Improving training of deep neural networks via Singular Value Bounding
- Deep Residual Networks and Weight Initialization
- Expressive power of neural networks: link
- Data-Driven Sparse Structure Selection for Deep Neural Networks
- Riemannian approach to batch normalization
- Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification
- Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

**Material**

- Johnson R, Zhang T. Accelerating stochastic gradient descent using predictive variance reduction NIPS2013
- Chaudhari P, Choromanska A, Soatto S, et al. Entropy-SGD: Biasing Gradient Descent Into Wide Valleys ICLR2017
- Hardt M, Recht B, Singer Y. Train faster, generalize better: Stability of stochastic gradient descent ICML2016
- Train longer, generalize better: closing the generalization gap in large batch training of neural networks NIPS2017
- Keskar N S, Mudigere D, Nocedal J, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima ICLR2017
- Dinh L, Pascanu R, Bengio S, et al. Sharp Minima Can Generalize For Deep Nets ICML2017
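The variance-reduction idea from the first paper in this list (SVRG) can be sketched compactly: periodically compute a full gradient at a snapshot point and use it as a control variate in the stochastic updates. This is a minimal NumPy sketch for a least-squares objective; the epoch length, step size, and problem sizes are illustrative choices.

```python
import numpy as np

def svrg(A, b, w0, lr=0.02, epochs=30, rng=None):
    """SVRG for min_w (1/2n)||A w - b||^2.
    Each epoch: take a snapshot, compute its full gradient, then run n
    stochastic steps whose noise is reduced by the snapshot's gradients."""
    rng = rng or np.random.default_rng(0)
    n = len(A)
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = A.T @ (A @ w_snap - b) / n        # full gradient at snapshot
        for _ in range(n):                            # inner stochastic loop
            i = rng.integers(n)
            gi = A[i] * (A[i] @ w - b[i])             # sample gradient at w
            gi_snap = A[i] * (A[i] @ w_snap - b[i])   # same sample at snapshot
            w = w - lr * (gi - gi_snap + full_grad)   # variance-reduced step
    return w

# Toy usage: recover the true weights of a noiseless linear model.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
b = A @ w_true
w_hat = svrg(A, b, np.zeros(5), lr=0.02, epochs=30, rng=rng)
```

Because the control variate vanishes at the optimum, the update's variance shrinks as the iterates converge, which is what restores the linear convergence rate that plain SGD loses.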

Deep residual networks, constructed by Kaiming He et al., won the ImageNet challenge in 2015. They overcome the difficulty of training as the network becomes deep. This is a hot and interesting topic; the discussion will be hosted by Yiping Lu (SMS).
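The core idea of the residual block, y = x + F(x), can be sketched in a few lines. This is a minimal NumPy sketch of a fully-connected residual block (the papers below use convolutional F with batch normalization); the layer sizes and zero initialization are illustrative.

```python
import numpy as np

def residual_block(x, W1, b1, W2, b2):
    """Compute y = x + F(x), where F is a small two-layer ReLU network.
    The identity shortcut lets gradients flow through the block unchanged,
    which is what makes very deep stacks trainable."""
    h = np.maximum(x @ W1 + b1, 0.0)   # first layer + ReLU
    f = h @ W2 + b2                    # the learned residual F(x)
    return x + f                       # identity shortcut

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((4, d))

# With F's weights at zero, the block is exactly the identity map,
# so a deep stack starts out as a well-behaved identity function.
W1, b1 = np.zeros((d, d)), np.zeros(d)
W2, b2 = np.zeros((d, d)), np.zeros(d)
y = residual_block(x, W1, b1, W2, b2)
```

This illustrates the argument from the first paper below: the block only has to learn the residual F(x) = H(x) − x, and representing the identity (F = 0) is trivial rather than something many stacked nonlinear layers must approximate.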

| Link | Paper Title |
| --- | --- |
| link | Deep Residual Learning for Image Recognition |
| link | Identity Mappings in Deep Residual Networks |
| link | Learning Identity Mappings with Residual Gates |
| link | Demystifying ResNet |
| link | Aggregated Residual Transformations for Deep Neural Networks: code |

To be published later.

- A talk given by Yiping Lu at Prof. Dong's group: slide