Video Transcript
Now the question might arise: why do we need several layers, and why do we need several filters, which I've been talking about so far?

One reason we need several layers, as I mentioned at the beginning, is that a CNN is a hierarchical, layered neural network. It is built this way so that it can learn a hierarchy of features, where each set of features represents specific aspects of the image. Assume you have an input image here. The convolution layers closest to the input image learn very low-level features, such as edges or specific textures, as you can see here. When these feature maps are fed into further convolution layers, those layers combine multiple edges to form, say, corners, or combine circular shapes to identify more specific parts, like the eye of the lion. When these mid-level features are passed into the deeper convolution layers, which are closer to the output layer, those layers start learning larger structures, as you can see, such as a head or a leg. In this way, the network learns a hierarchy of features: specific layers learn specific types of features and are activated by specific types of input.

We need multiple filters because we don't want to learn just one edge or one corner; we want to learn many different aspects of the same image. That is why each convolution layer uses multiple filters, each extracting a different feature from the same input. So those are the reasons why we need several layers and several filters.
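To make the two ideas concrete, here is a minimal pure-Python sketch, assuming a single-channel image, stride 1 and no padding. The two filters are hand-picked, hypothetical edge detectors (a trained CNN learns its filter values instead), and the names `conv2d`, `conv_layer` and `receptive_field` are illustrative helpers, not part of any library. Applying several filters to the same image yields one feature map per filter, and the receptive-field helper shows how each extra 3x3 layer lets a unit "see" a larger patch of the input, which is one reason deeper layers can respond to larger structures.

```python
# Minimal sketch (assumptions: single-channel image, stride 1, no padding,
# hand-picked filters for illustration; real CNNs learn their filters).

def conv2d(image, kernel):
    """Slide a kernel over a 2-D image ("valid" cross-correlation,
    which is what most deep learning libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


def relu(fmap):
    """Keep only positive responses, as a typical activation would."""
    return [[max(0, v) for v in row] for row in fmap]


# Hypothetical filters: each one responds to a different pattern.
FILTERS = {
    "vertical_edge": [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    "horizontal_edge": [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],
}


def conv_layer(image, filters):
    """One convolution layer: every filter produces its own feature map."""
    return {name: relu(conv2d(image, k)) for name, k in filters.items()}


def receptive_field(num_layers, kernel_size=3):
    """Input patch width seen by one unit after stacking stride-1 convs
    (no pooling): it grows by kernel_size - 1 per layer."""
    return 1 + num_layers * (kernel_size - 1)


# 6x6 image: left half dark (0), right half bright (1), i.e. one vertical edge.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
maps = conv_layer(image, FILTERS)

for name, fmap in maps.items():
    print(name, fmap[0])
print([receptive_field(n) for n in (1, 2, 3)])
```

Running it prints `vertical_edge [0, 3, 3, 0]` and `horizontal_edge [0, 0, 0, 0]`: only the filter tuned to vertical edges fires on this image, which is why a layer needs many filters to capture many different aspects of the same input. The last line prints `[3, 5, 7]`, showing the patch covered by one unit widening as layers are stacked; pooling or larger strides would widen it faster.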