Hey everyone, welcome to the session. We'll be covering convolutional neural networks today, and there are a lot of nice aspects to cover. We'll go through some of the fundamental concepts: what a convolutional neural network is and what the key layers involved in one are. We'll also look at popular convolutional neural network operations, and then work through some hands-on examples, both specific layer operations and how you use a CNN model to build an image classification system. The slide deck and code, I think the link has already been shared in the chat, so for people who are in the chat, feel free to hit that link. You'll get all the relevant material, the slide deck as well as the notebooks I'll be covering. So I'd request you to focus on trying to understand the session, because everything will be available to you for reference later on. Wonderful. Okay, so the session agenda for today: since Chanukya has already introduced me, I'll keep the introduction about myself brief. I'll be covering how to understand what a convolutional neural network is, its structure, and the different layers in it. We'll also look at the various layer operations: what a convolution layer is, what a kernel or filter is, what pooling is, what dropout is. Then we'll briefly touch upon transfer learning, in terms of what pre-trained models are and how you leverage them. And obviously, we'll be looking at some hands-on examples, building image classification models as well as trying out specific convolution and pooling operations from scratch by implementing them using NumPy. At the end, we'll briefly touch upon the popular applications of convolutional neural networks, as in what the different types of models are and how they're being used in the industry. So let's dive into the session. We'll start with understanding what convolutional neural networks are.
So basically convolutional neural networks are also known as CNNs. As you can see, this is the typical structure of a CNN: a hierarchical structure with multiple layers, interconnected sequentially. The overall perspective of a CNN is to learn hierarchical spatial features when you're working on image data. And if you're working on video data, there is also the aspect of time, so in that case it tries to learn spatio-temporal features. There are two main building blocks in a convolutional neural network. Typically, it is built of convolution layers followed by pooling layers, and several of these convolution-and-pooling blocks are stacked one after the other, after which we have some fully connected dense layers followed by an output layer, depending on whether you're building a classification model, a regression model, an object detection model, and so on. Convolution layers, and we'll be diving deeper into this shortly, use what are called convolution filters or kernels to build feature maps. So the main task of a convolution layer is to perform feature extraction on source images and leverage these features to understand various patterns with regard to how an image is structured and what its key components are: textures, edges, corners, and so on. And the pooling layers help in reducing the dimensionality after the convolution operation is performed, which means once you apply a convolution filter and extract what are known as feature maps from your source image, the pooling layer downsamples them, which helps in compression and also in enhancing specific aspects of these feature maps.
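To make the convolution and pooling operations concrete, here's a minimal from-scratch NumPy sketch of the two steps just described: sliding a small kernel over an image to produce a feature map, then downsampling that feature map with max pooling. This is a naive loop-based illustration, not how frameworks implement it, and the vertical-edge kernel and the 6x6 toy image are my own illustrative choices, not from the session.

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2D convolution (strictly, cross-correlation,
    which is what deep learning libraries actually compute)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise multiply the kernel with the patch, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling (stride == window size)."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # trim edges that don't fit
    fm = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return fm.max(axis=(1, 3))

# Toy 6x6 "image" and a hand-made vertical-edge detector kernel.
image = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])

fmap = conv2d(image, edge_kernel)   # feature map, shape (4, 4)
pooled = max_pool(fmap)             # downsampled map, shape (2, 2)
```

Note how the spatial size shrinks at each step: 6x6 input, 4x4 feature map after a 3x3 "valid" convolution, and 2x2 after 2x2 max pooling, which is exactly the dimensionality reduction the pooling layer is there for.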
And one thing to remember is that, just like in any neural network, as you may have learned in this bootcamp so far or read on your own, nonlinear activation functions are typically applied so that the network can learn more complex nonlinear patterns. That is why nonlinear activation functions like the rectified linear unit will typically be applied after every conv layer, pooling layer, and so on. Another interesting aspect, and again this is optional, is that often your model may end up overfitting on your training data. We have all heard about overfitting: your training performance is really good but your validation performance is really bad. So to prevent overfitting, you can use layers like dropout or batch normalization. And finally, you end up with the fully connected or dense layers, and typically you have an output layer which helps you make the final prediction.
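As a quick illustration of the two layer types just mentioned, here's a small NumPy sketch of ReLU and of "inverted" dropout, the variant most libraries use. The function names, the 0.5 rate, and the sample activations are my own illustrative choices, not from the session; this is a sketch of the idea, not a framework implementation.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each activation with
    probability `rate` and scale the survivors by 1/(1-rate), so the
    expected activation matches inference, where dropout is a no-op."""
    if not training or rate == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= rate  # True = keep this unit
    return x * mask / (1.0 - rate)

acts = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
hidden = relu(acts)                      # negatives clipped to 0
regularized = dropout(hidden, rate=0.5)  # randomly zeroed, survivors doubled
```

The key detail is the `training` flag: dropout only fires while training, which is why frameworks make you switch the model between training and inference modes.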