Learning Objectives
- Statistics Refresher
- Cost Function
- Gradient Descent
Statistics Refresher
- Mean: The mean is the average of a data set. For example, take a list of numbers: 10, 20, 40, 10, 70
  Mean = (10 + 20 + 40 + 10 + 70) / 5 = 150 / 5 = 30
- Median: The median is the middle value of the sorted data set. To find the median, first sort the list of numbers: 10, 10, 20, 40, 70
  The exact middle number, i.e., 20, is the median.
- Mode: The mode is the most common number in a data set.
  In the above list of numbers, 10 occurs two times while the other three numbers occur one time each. So, the mode is 10 here.
- Range: The difference between the highest and lowest values in the data set.
  For the list of numbers 10, 20, 40, 10, 70, the range is 70 - 10 = 60.
- Variance: The average of the squared differences from the mean. Steps to calculate variance:
  - Calculate the mean (the mean is simply the average)
  - Find the difference of each data point from the mean
  - Square all the differences
  - Take the average of the squares.
- Standard Deviation: It shows how much your data is spread around the mean. Its symbol is σ (the Greek letter sigma). It is the square root of the variance. Therefore, the variance is often represented as σ².
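The definitions above can be checked with a few lines of Python. This is a minimal sketch using the standard `statistics` module on the same list of numbers; the variable names are illustrative, and the variance is computed by hand following the four steps listed above (it is the population variance, i.e. dividing by the number of values).

```python
import math
import statistics

data = [10, 20, 40, 10, 70]

mean = statistics.mean(data)        # average of the values
median = statistics.median(data)    # middle value after sorting
mode = statistics.mode(data)        # most common value
data_range = max(data) - min(data)  # highest minus lowest

# Variance, step by step
differences = [x - mean for x in data]   # difference of each value from the mean
squares = [d ** 2 for d in differences]  # square all the differences
variance = sum(squares) / len(squares)   # average of the squares
std_dev = math.sqrt(variance)            # standard deviation (sigma)

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("range:", data_range)
print("variance:", variance)
print("standard deviation:", round(std_dev, 2))
```

For this list the squared differences are 400, 100, 100, 400, and 1600, so the variance is 2600 / 5 = 520 and the standard deviation is roughly 22.8.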
Normal Distribution
Unimodal and Multimodal Distribution
Coming back to Linear Regression
This section continues yesterday's topic.
Cost Function
What?
Now that we have built a model, we need to measure its performance and understand whether it works well. The cost function measures the performance of a Machine Learning model for given data. It quantifies the error between predicted and expected values and presents it as a single real number.
Depending on the problem, the cost function can be formed in many different ways. A cost function is designed to be either:
- Minimized - the returned value is usually called cost, loss, or error. The goal is to find the values of the model parameters for which the cost function returns as small a number as possible.
- Maximized - the returned value is called a reward. The goal is to find the values of the model parameters for which the returned number is as large as possible.
Two terms come up whenever we talk about cost:
- Predicted value: The value that your machine learning model outputs.
- Expected value: The actual value (the label present in your data).
Often machine learning models are not 100% accurate or perfect; they tend to deviate from the actual or expected value.
For example, suppose we are predicting a person's age based on a few input variables or features.
- Our machine learning model predicted the age as 28 years
- However, the actual age of the person is 29 years.
- Here 28 years is the predicted value and 29 years is the expected or true value. As data scientists, we try to minimize errors while building models.
The difference between the actual value and the model's predicted value is called the residual. In the example above, the residual is 29 - 28 = 1 year.
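As a small illustration, the residual can be computed for several predictions at once. This sketch assumes the usual convention residual = actual - predicted; the first pair is the age example above, and the remaining ages are made-up values for illustration only.

```python
# Actual (expected) ages and the ages a hypothetical model predicted.
# Only the first pair (29, 28) comes from the example; the rest are made up.
actual_ages = [29, 35, 42]
predicted_ages = [28, 37, 42]

# Residual = actual value - predicted value
residuals = [a - p for a, p in zip(actual_ages, predicted_ages)]

print(residuals)
```

A positive residual means the model predicted too low, a negative one means it predicted too high, and zero means the prediction was exact.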
Cost Function Types/Evaluation Metrics
There are three primary metrics used to evaluate linear models (to find how well a model is performing):
- Mean Squared Error
- Root Mean Squared Error
- Mean Absolute Error
Mean Squared Error (MSE)
- MSE is simply the average of the squared differences between the true target values and the values predicted by the regression model.
- Because it squares the differences, it penalizes large errors (gives them extra weight for deviating from the objective) far more than small ones. A few outliers can therefore dominate the score and lead to over-estimating how bad the model is.
Root Mean Squared Error (RMSE)
- It is just the square root of the mean squared error, so it is expressed in the same units as the target value.
- It is preferred in some cases because the errors are squared before averaging, which places a high penalty on large errors. This implies that RMSE is useful when large errors are undesired.
Mean Absolute Error (MAE)
- MAE is the average of the absolute differences between the target values and the values predicted by the model.
- MAE weights every error in proportion to its size, so it does not penalize large errors as heavily as MSE. This makes it less suitable for use-cases where you want to pay more attention to the outliers.
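The three metrics can be sketched in plain Python as follows. The function names and the sample values are illustrative assumptions (they are not from any particular library or dataset); the last prediction is deliberately far off so that the effect of one large error is visible.

```python
import math

def mean_squared_error(actual, predicted):
    """Average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mean_squared_error(actual, predicted))

def mean_absolute_error(actual, predicted):
    """Average of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up values: three good predictions and one that is off by 10.
actual = [29, 35, 42, 50]
predicted = [28, 37, 42, 40]

mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)

print("MSE: ", mse)
print("RMSE:", round(rmse, 2))
print("MAE: ", mae)
```

Here the errors are 1, -2, 0, and 10. The single error of 10 contributes 100 of the 105 total squared error, which is why MSE (26.25) and RMSE (about 5.12) come out noticeably larger than MAE (3.25).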
Gradient Descent
- As Data Scientists, we always want to optimize our algorithms and go for the best ones. Gradient Descent is one of those optimizers that help us do this!
- Gradient Descent is an optimization technique that minimizes the cost function by repeatedly adjusting the model parameters in the direction that reduces the cost. Most machine learning algorithms are trained by minimizing a cost function.
- For now, we are not getting too much into how it works. We will learn about it as we proceed. Below is the link to a video for further understanding: https://www.youtube.com/watch?v=vsWrXfO3wWw
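For readers who want a preview, here is a minimal sketch of gradient descent fitting a line y = m*x + b by minimizing the MSE. Everything in it is an assumption chosen for illustration: the function name, the learning rate, the number of steps, and the toy data (generated from y = 2x + 1). It is not the only way to implement the technique.

```python
def gradient_descent(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = m*x + b by repeatedly stepping against the MSE gradient."""
    m, b = 0.0, 0.0  # start from an arbitrary line
    n = len(xs)
    for _ in range(steps):
        # Error of the current line on every data point (predicted - actual)
        errors = [(m * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to m and b
        grad_m = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Move a small step in the direction that lowers the cost
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

# Toy data lying exactly on the line y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

m, b = gradient_descent(xs, ys)
print("slope:", round(m, 3), "intercept:", round(b, 3))
```

With these settings the slope and intercept end up very close to 2 and 1. A learning rate that is too large can make the updates overshoot and diverge, while one that is too small needs many more steps.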
Recap
- Linear regression is used to predict a value (like the sale price of a house).
- Given a set of data, first try to fit a line to it.
- The cost function tells you how good your line is.
- You can use gradient descent to find the best line.
Optional Reading Material
If you are interested in learning more about gradient descent, refer to the video below: