Logistic regression
A logistic regression model was used to predict the "Healthy" column of a dataset with four categorical columns ('Food preference', 'Smoker?', 'Living in?', 'Any hereditary condition?') and eleven numerical columns ('Specific ailments', 'Age', 'BMI', 'Follow Diet', 'Physical activity', 'Regular sleeping hours', 'Alcohol consumption', 'Social interaction', 'Taking supplements', 'Mental health management', 'Illness count last year').
Missing values were handled in two ways. Eight of the numerical columns ('Follow Diet', 'Physical activity', 'Regular sleeping hours', 'Alcohol consumption', 'Social interaction', 'Taking supplements', 'Mental health management', 'Illness count last year') were imputed with their median using a simple imputer. Missing values in the remaining columns were filled with the constant 'missing'.
Logistic regression is a widely used linear model for binary classification. It computes a weighted sum of the input features plus a bias term, z = w·x + b, and passes it through the logistic (sigmoid) function σ(z) = 1 / (1 + e^(-z)). The result is the estimated probability that an instance belongs to the positive class. A class label is then assigned by thresholding this probability, typically at 0.5.
After training, the model's performance was evaluated with standard classification metrics such as accuracy, precision, recall, and F1 score. These metrics indicate how well the classifier predicts the "Healthy" values.
The model's predictions were saved to a CSV file named "submission.csv". This file contains the predicted "Healthy" value for each input record.
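The workflow above can be sketched with scikit-learn as follows. This is a minimal illustration under several stated assumptions, not the original notebook. The real dataset is not available, so the hypothetical `make_synthetic_data` helper generates random rows with the same column names, made-up category values and a made-up relationship to "Healthy". The constant 'missing' fill is applied here only to the four categorical columns, which are then one-hot encoded. 'Specific ailments', 'Age' and 'BMI' are assumed to have no missing values and are only standardized. The standard scaling, the 75/25 train/test split and the single-column layout of submission.csv are also assumptions not stated in the text. The two `assert` lines check the model description numerically: `predict_proba` equals σ(w·x + b), and `predict` returns 1 exactly when w·x + b > 0.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

CATEGORICAL = ["Food preference", "Smoker?", "Living in?", "Any hereditary condition?"]
MEDIAN_IMPUTED = [
    "Follow Diet", "Physical activity", "Regular sleeping hours", "Alcohol consumption",
    "Social interaction", "Taking supplements", "Mental health management",
    "Illness count last year",
]
OTHER_NUMERIC = ["Specific ailments", "Age", "BMI"]  # assumption: no missing values here


def make_synthetic_data(n=600, seed=0):
    """Hypothetical stand-in for the real dataset: random rows, same column names."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Food preference": rng.choice(["Veg", "Non-veg", "Vegan"], n),
        "Smoker?": rng.choice(["YES", "NO"], n),
        "Living in?": rng.choice(["URBAN", "RURAL"], n),
        "Any hereditary condition?": rng.choice(["Yes", "No"], n),
        "Specific ailments": rng.integers(0, 50, n),
        "Age": rng.integers(18, 80, n),
        "BMI": rng.normal(25, 4, n),
    })
    df[CATEGORICAL] = df[CATEGORICAL].astype(object)
    for col in MEDIAN_IMPUTED:
        df[col] = rng.normal(0, 1, n)
    # Made-up ground truth so that the labels depend on a few features.
    logit = (1.2 * df["Physical activity"] - 0.8 * df["Alcohol consumption"]
             + np.where(df["Smoker?"] == "YES", -1.0, 0.5))
    df["Healthy"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    # Blank out roughly 10% of the values in the imputed columns.
    for col in MEDIAN_IMPUTED + CATEGORICAL:
        df.loc[rng.random(n) < 0.1, col] = np.nan
    return df


preprocess = ColumnTransformer([
    # Median imputation for the eight columns named in the text.
    ("median", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()),
     MEDIAN_IMPUTED),
    # Constant 'missing' fill, then one-hot encoding, for the categorical columns.
    ("categorical", make_pipeline(SimpleImputer(strategy="constant", fill_value="missing"),
                                  OneHotEncoder(handle_unknown="ignore")), CATEGORICAL),
    ("other_numeric", StandardScaler(), OTHER_NUMERIC),
])
model = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression(max_iter=1000))])

df = make_synthetic_data()
X, y = df.drop(columns="Healthy"), df["Healthy"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
model.fit(X_train, y_train)
pred = model.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "f1": f1_score(y_test, pred),
}
print(metrics)

# The predicted probability is the sigmoid of the linear score z = w·x + b ...
z = model.decision_function(X_test)
proba = 1 / (1 + np.exp(-z))
assert np.allclose(proba, model.predict_proba(X_test)[:, 1])
# ... and a row is labelled 1 exactly when z > 0, i.e. when proba > 0.5.
assert np.array_equal((z > 0).astype(int), pred)

# The held-out split stands in for the unlabeled data scored for submission.
pd.DataFrame({"Healthy": pred}).to_csv("submission.csv", index=False)
```

Keeping the imputers inside the `Pipeline` means the medians are learned from the training rows only and then reused unchanged when scoring new data, which avoids leaking information from the evaluation split.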
Tags:
#machine-learning
#classification