Earn 20 XP

Machine Learning Classification Vs. Regression


Data Science Modeling Process


Problem Solving

  • Define Objective or understand the problem statement
  • Data Requirements
  • Data Collection
  • Exploratory Data Analysis
  • Data Pre-processing
  • Build a model
    • Understand whether it is a regression or classification problem
  • Evaluate
  • Optimise
  • Production
  • Monitor
  • You keep Optimising it every now and then

Objective/Problem Statement

  • The goal of the model is to predict whether a passenger survived or not in the Titanic disaster, given their age, class, and a few other features.



  • We have the data!

Understanding the Data

  • PassengerId - this is just a generated Id
  • Pclass - which class did the passenger ride - first, second, or third
  • Name - self-explanatory
  • Sex - male or female
  • Age
  • SibSp - were the passenger's spouse or siblings with them on the ship
  • Parch - were the passenger's parents or children with them on the ship
  • Ticket - ticket number
  • Fare - ticket price
  • Cabin
  • Embarked - port of embarkation
  • Survived - did the passenger survive the sinking of the Titanic?

Explore the data

Omitting Irrelevant Variables/Columns

  • You shouldn't drop columns or variables just like that! Unless there is a strong premise.
  • Id, port, cabin, and name

Split the data into train and test


Model Building - Decision Tree

  • Now, what is this decision tree?
  • Well, we all might have seen it by now!
  • Decision Tree Examples



Now, what next?


Let's do it!

Model Evaluation

  • Evaluate on test dataset to check the performance!
  • Well, we build a model on historical data and expose them to new data that we would see in the future. Technically they will be exposed to unseen data

Overfitting - Underfitting



Model Evaluation

We are not done yet. We can improve it significantly.

How? It will follow in due course!

What else can be done in general?

  • Feature Selection
  • Cross-validation
  • Applying different ML Models
  • Hyperparameter tuning, etc.

And as data scientists, we must keep optimizing and building better models that derive meaningful results.