Network Activity Anomaly Detection
Here is an overview of my final approach:

1. Preprocessing: I imported the dataset and ran an initial EDA (exploratory data analysis) to learn more about its structure and contents. This included checking for missing values, displaying summary statistics, and encoding the categorical variables (protocoltype, flag) with LabelEncoder. To convert the service column to numerical values, I used a hash function.
2. Feature Engineering: I used train_test_split to divide the data into training and validation sets, then scaled the numerical features with StandardScaler.
3. Neural network with TensorFlow/Keras: The architecture consisted of a dense layer with 64 units and ReLU activation, a second dense layer with 32 units and ReLU activation, and a single sigmoid output unit for binary classification.
4. Model compilation and training: I compiled the model with the Adam optimizer and a binary cross-entropy loss, then trained it for 5 epochs with a batch size of 32, tracking accuracy and loss on both the training and validation sets.
5. Visualization: I used Matplotlib to plot training and validation accuracy and loss, to check the model for overfitting or underfitting.
6. Model performance metrics: After training, I computed precision, recall, and the F1 score on the validation set.
7. Test set predictions: I applied the same preprocessing steps used during training to the new test dataset, made predictions on it, and saved them to a submission file (submission.csv).

Overall, the workflow covered data cleaning, building and training a neural network, evaluating it with validation metrics, and finally using it to predict on unseen test data.
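The categorical encoding from step 1 can be sketched as below. This is a minimal sketch, not the exact code used: it assumes a pandas DataFrame with NSL-KDD-style column names `protocol_type`, `flag`, and `service` (assumed names; the text writes "protocoltype"), and `stable_hash` and `encode_categoricals` are hypothetical helpers. One caveat with hashing: Python's built-in `hash()` on strings is salted per process (see `PYTHONHASHSEED`), so the same service name can map to different numbers in separate training and test runs. A `hashlib` digest avoids that. The bucket count of 1000 is an arbitrary assumption.

```python
import hashlib

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Assumed column names (NSL-KDD style); adjust to the actual dataset.
LABEL_COLS = ["protocol_type", "flag"]
HASH_COL = "service"
N_BUCKETS = 1000  # arbitrary bucket count for the hashed column (assumption)


def stable_hash(value: str, n_buckets: int = N_BUCKETS) -> int:
    """Hypothetical helper: map a string to a bucket deterministically.

    Unlike the built-in hash(), an MD5 digest is the same in every
    Python process, so training and test rows get identical codes.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def encode_categoricals(df: pd.DataFrame):
    """Label-encode LABEL_COLS and hash HASH_COL.

    Returns an encoded copy of df plus the fitted LabelEncoders, which
    should be reused (not refit) on the test set.
    """
    out = df.copy()
    encoders = {}
    for col in LABEL_COLS:
        enc = LabelEncoder()
        out[col] = enc.fit_transform(out[col])
        encoders[col] = enc
    out[HASH_COL] = out[HASH_COL].astype(str).map(stable_hash)
    return out, encoders
```

For the EDA part of the same step, `df.isnull().sum()` counts missing values per column and `df.describe()` displays summary statistics.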
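The split-then-scale order in step 2 matters: the scaler should learn its mean and standard deviation from the training split only and then reuse them on the validation split, otherwise validation statistics leak into training. A minimal sketch, assuming the features are already numeric; the function name `split_and_scale`, the 20% validation size, the seed, and stratifying on the label are assumptions, not details from the post.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_and_scale(X: np.ndarray, y: np.ndarray, val_size: float = 0.2, seed: int = 42):
    """Split into train/validation sets, then standardize the features.

    The scaler is fit on the training split only and applied unchanged
    to the validation split.
    """
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=val_size, random_state=seed, stratify=y  # stratify: assumption
    )
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)  # learn mean/std from training data
    X_val = scaler.transform(X_val)          # reuse the training mean/std
    return X_train, X_val, y_train, y_val, scaler
```

Returning the fitted `scaler` lets the same transformation be applied to the test set later.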
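Steps 3 and 4 (architecture, compilation, and training) might look like the following with `tf.keras`. The layer sizes, activations, optimizer, loss, 5 epochs, and batch size of 32 come from the post; the helper names `build_model` and `train` are hypothetical. Passing `validation_data` to `fit` is what makes Keras record both training and validation loss and accuracy in `History.history`.

```python
import numpy as np
import tensorflow as tf


def build_model(n_features: int) -> tf.keras.Model:
    """Dense(64, ReLU) -> Dense(32, ReLU) -> Dense(1, sigmoid), compiled with Adam."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # outputs P(label == 1)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


def train(model, X_train, y_train, X_val, y_val, epochs=5, batch_size=32):
    """Fit with validation data so history holds loss/accuracy for both sets."""
    return model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        epochs=epochs,
        batch_size=batch_size,
        verbose=0,
    )
```

Usage: `history = train(build_model(X_train.shape[1]), X_train, y_train, X_val, y_val)`; `history.history` then contains the keys `loss`, `accuracy`, `val_loss`, and `val_accuracy`.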
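For step 5, a side-by-side plot of the two curves is usually enough to diagnose the fit: validation loss rising while training loss keeps falling suggests overfitting, and both staying high suggests underfitting. A minimal sketch, assuming `history` is the `History.history` dict produced by Keras with `metrics=["accuracy"]`; the function name `plot_history` and the output filename are hypothetical, and the plot is saved to a file rather than shown.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this also runs without a display
import matplotlib.pyplot as plt


def plot_history(history: dict, path: str = "training_curves.png") -> None:
    """Plot training vs. validation accuracy and loss, then save to `path`."""
    epochs = range(1, len(history["loss"]) + 1)
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))

    ax_acc.plot(epochs, history["accuracy"], label="training")
    ax_acc.plot(epochs, history["val_accuracy"], label="validation")
    ax_acc.set_title("Accuracy")
    ax_acc.set_xlabel("epoch")
    ax_acc.legend()

    ax_loss.plot(epochs, history["loss"], label="training")
    ax_loss.plot(epochs, history["val_loss"], label="validation")
    ax_loss.set_title("Loss")
    ax_loss.set_xlabel("epoch")
    ax_loss.legend()

    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)  # free the figure's memory
```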
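Step 6 needs one extra step before scoring: the sigmoid output is a probability, so it has to be thresholded into 0/1 labels first. The F1 score is then the harmonic mean of precision $P$ and recall $R$, $F_1 = 2PR/(P+R)$. A minimal sketch using scikit-learn's metric functions; the 0.5 threshold and the function name `validation_metrics` are assumptions.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score


def validation_metrics(y_true, y_prob, threshold: float = 0.5) -> dict:
    """Threshold sigmoid outputs into 0/1 labels and score them.

    ravel() flattens Keras' (n, 1) prediction array to shape (n,).
    """
    y_pred = (np.asarray(y_prob).ravel() >= threshold).astype(int)
    return {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }


# Worked example: predictions [1, 0, 0, 1] vs. truth [1, 0, 1, 1]
# give TP=2, FP=0, FN=1, so P=1.0, R=2/3 and F1=0.8.
scores = validation_metrics([1, 0, 1, 1], [[0.9], [0.2], [0.4], [0.7]])
```

With a trained Keras model this would be called as `validation_metrics(y_val, model.predict(X_val))`.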
Tags:
#python
#machine-learning
#deep-learning