Naad_hackathon.ipynb

Step 1: Data Loading
We began by loading the provided training and test datasets. The training dataset contained network activity records labeled as either "normal" or "neptune" (a SYN flood attack).

Step 2: Data Preprocessing
We separated the features from the target variable and mapped the target values to binary labels: 0 for normal activity and 1 for Neptune attacks.

Step 3: Feature Engineering
We identified the categorical and numerical columns in the dataset. Numerical features were scaled with a standard scaler, and categorical features were one-hot encoded to prepare them for the machine learning model.

Step 4: Model Definition
We created a machine learning pipeline that combines the preprocessing steps with a logistic regression classifier. Keeping preprocessing and training in a single pipeline makes the workflow reproducible and guarantees that the same transformations are applied at training and prediction time.

Step 5: Model Training and Validation
We split the training data into training and validation sets. The model was trained on the training set and evaluated on the validation set using classification metrics such as the F1 score. This let us gauge the model's performance before making predictions on the test set.

Step 6: Prediction on Test Data
We made sure the test data had the same format as the training data, then used the trained model to make predictions on it.

Step 7: Creating Submission File
To prepare the submission file, we loaded the sample submission file provided by the hackathon organizers, replaced its 'attack' column with our predictions, and saved the file in the required format.

Conclusion
Following these steps, we developed a machine learning model that distinguishes Neptune attacks from normal network activity, achieving an F1 score of 0.9999243284146804. The process covered structured data preprocessing, feature engineering, model training, evaluation, and preparation of the final submission file, so that our solution met the hackathon requirements.
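The steps above can be sketched in code roughly as follows. This is a minimal illustration using scikit-learn and pandas, not the notebook's actual code: the toy data, the feature column names (`protocol_type`, `flag`, `src_bytes`, `count`), the helper names, the 75/25 split, and `max_iter=1000` are all assumptions made for the example. Only the "normal"/"neptune" labels, the 0/1 mapping, the scaler, the one-hot encoder, the logistic regression classifier, and the 'attack' submission column come from the description.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_toy_data() -> pd.DataFrame:
    """Hypothetical stand-in for the hackathon training file (Step 1)."""
    rows = []
    for i in range(40):
        is_attack = i % 2 == 1
        rows.append({
            "protocol_type": "tcp" if is_attack or i % 4 else "udp",
            "flag": "S0" if is_attack else "SF",
            "src_bytes": 0 if is_attack else 200 + i,
            "count": 150 + i if is_attack else 5 + i % 3,
            "attack": "neptune" if is_attack else "normal",
        })
    return pd.DataFrame(rows)


def build_pipeline(X: pd.DataFrame) -> Pipeline:
    """Steps 3-4: scale numeric columns, one-hot encode the rest, then classify."""
    numerical_cols = X.select_dtypes(include="number").columns.tolist()
    # Every non-numeric column is treated as categorical in this sketch.
    categorical_cols = [c for c in X.columns if c not in numerical_cols]
    preprocessor = ColumnTransformer(transformers=[
        ("num", StandardScaler(), numerical_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline(steps=[
        ("preprocess", preprocessor),
        ("classifier", LogisticRegression(max_iter=1000)),
    ])


def make_submission(sample_submission: pd.DataFrame, predictions) -> pd.DataFrame:
    """Step 7: replace the 'attack' column of the sample file with predictions."""
    submission = sample_submission.copy()
    submission["attack"] = predictions
    return submission


# Step 2: separate features from the target and map labels to 0/1.
train_df = make_toy_data()
X = train_df.drop(columns=["attack"])
y = train_df["attack"].map({"normal": 0, "neptune": 1})

# Step 5: hold out a validation set, train, and score with F1.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
model = build_pipeline(X_train)
model.fit(X_train, y_train)
val_f1 = f1_score(y_val, model.predict(X_val))
print(f"Validation F1: {val_f1:.4f}")

# Step 6: give the test data the same columns, in the same order, as training.
test_df = make_toy_data().drop(columns=["attack"])[X.columns]
test_predictions = model.predict(test_df)

# Step 7: a placeholder sample submission; the real one comes from the organizers.
sample_submission = pd.DataFrame({"attack": [0] * len(test_df)})
submission = make_submission(sample_submission, test_predictions)
# submission.to_csv("submission.csv", index=False)  # hypothetical output path
```

Because the scaler and encoder live inside the pipeline, they are fitted only on the training split and then reused unchanged for the validation and test data, which avoids leaking validation statistics into training. Setting `handle_unknown="ignore"` is a defensive choice in this sketch so that a category seen only in the test data does not raise an error.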

7/3/2024

Tags:  

#python 

#machine-learning 

#regression 

#beginner