Network Activity Anomaly Detection

Objective: Classify network activities as either normal (0) or Neptune attacks (1) using a machine learning model.

Dataset:
- Training Set: 86,845 rows; includes both the independent variables and the target variable (attack).
- Test Set: 21,712 rows; includes only the independent variables. The target variable (attack) is missing.

Target Variable:
- normal (0): Normal network activity.
- neptune (1): Neptune (SYN flood) attack.

Steps Taken:
- Data Preparation: Loaded and merged train.csv and test.csv. One-hot encoded the categorical features (protocoltype, service, flag). Scaled the numerical features using StandardScaler.
- Feature and Target Separation: Separated features (X_train) and target (y_train) in the training set.
- Handling Imbalanced Data: Used SMOTE to oversample the minority class in the training set.
- Train-Test Split for Validation: Split the training data into training and validation sets for model evaluation.
- Model Training: Trained a RandomForestClassifier on the training data.
- Model Evaluation: Evaluated the model using F1 score, precision, and recall on the validation set. Achieved satisfactory performance metrics.
- Prediction on Test Set: Ensured the attack column was not present in the test data before making predictions. Generated predictions for the test set.
- Submission Preparation: Created a sample-submission.csv file with the predicted values of the target variable (attack) for the test set.

Evaluation Metrics:
- F1 Score
- Precision
- Recall

The submission file final-submission.csv contains predictions for the test set in the required format.
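The steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is a small synthetic stand-in for train.csv and test.csv, the numeric column names (duration, srcbytes) and the toy labelling rule are hypothetical, and the single-column submission layout is an assumption. The sketch also applies SMOTE after the validation split rather than before it, so the validation fold contains only real rows; that ordering is a choice made here. SMOTE comes from the separate imbalanced-learn package, and the sketch skips oversampling if that package is not installed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

try:  # SMOTE ships in the separate imbalanced-learn package
    from imblearn.over_sampling import SMOTE
except ImportError:
    SMOTE = None

rng = np.random.default_rng(0)
CAT_COLS = ["protocoltype", "service", "flag"]   # names from the write-up
NUM_COLS = ["duration", "srcbytes"]              # hypothetical numeric columns


def make_toy_frame(n, with_target):
    """Synthetic stand-in for train.csv / test.csv (not the real data)."""
    df = pd.DataFrame({
        "duration": rng.exponential(2.0, n),
        "srcbytes": rng.integers(0, 5000, n).astype(float),
        "protocoltype": rng.choice(["tcp", "udp", "icmp"], n),
        "service": rng.choice(["http", "private", "smtp"], n),
        "flag": rng.choice(["SF", "S0", "REJ"], n, p=[0.7, 0.15, 0.15]),
    })
    if with_target:
        # Toy rule only: rows with flag S0 are labelled as attacks (1)
        df["attack"] = (df["flag"] == "S0").astype(int)
    return df


train = make_toy_frame(600, with_target=True)
test = make_toy_frame(150, with_target=False)

# Merge so one-hot encoding produces identical columns for both sets
merged = pd.concat([train.drop(columns="attack"), test], keys=["train", "test"])
merged = pd.get_dummies(merged, columns=CAT_COLS, dtype=float)
merged[NUM_COLS] = StandardScaler().fit_transform(merged[NUM_COLS])

# Separate features and target again
X_all, X_test = merged.loc["train"], merged.loc["test"]
y_all = train["attack"]
assert "attack" not in X_test.columns

# Hold out a validation fold, then oversample only the training fold
X_tr, X_val, y_tr, y_val = train_test_split(
    X_all, y_all, test_size=0.2, stratify=y_all, random_state=0
)
if SMOTE is not None:
    X_tr, y_tr = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)

val_pred = model.predict(X_val)
f1 = f1_score(y_val, val_pred)
precision = precision_score(y_val, val_pred)
recall = recall_score(y_val, val_pred)
print(f"F1={f1:.3f} precision={precision:.3f} recall={recall:.3f}")

# Assumed submission layout: one 'attack' column, one row per test row
submission = pd.DataFrame({"attack": model.predict(X_test)})
submission.to_csv("final-submission.csv", index=False)
```

Concatenating the two sets before encoding keeps the one-hot columns aligned, so a category that appears only in the test set does not cause a column mismatch at prediction time.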

7/5/2024

Tags:  

#python 

#classification 

#machine-learning 

#intermediate