IIT Guwahati Network Activity Anomaly Detection
Overview

This analysis evaluates the performance of twelve machine learning models on the datathon's network-activity dataset: K-Nearest Neighbors (KNN), Logistic Regression, Decision Tree, Random Forest, Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), AdaBoost, LightGBM, CatBoost, Naive Bayes, a Voting Classifier, and Support Vector Machine (SVM). Both training and test accuracies are reported so that overfitting can be assessed alongside raw performance; a sketch of the scoring loop appears after the conclusion.

Model Performance

K-Nearest Neighbors (KNN)
Train Score: 0.999803
Test Score: 0.999808
Summary: KNN is highly accurate on both sets, with the test score marginally above the training score, indicating strong performance and no sign of overfitting.

Logistic Regression
Train Score: 0.999638
Test Score: 0.999539
Summary: Logistic Regression performs excellently; the test accuracy is only slightly below the training accuracy, suggesting a well-generalized model.

Decision Tree
Train Score: 1.0
Test Score: 0.999463
Summary: The Decision Tree fits the training set perfectly but scores slightly lower on the test set, indicating mild overfitting.

Random Forest
Train Score: 1.0
Test Score: 0.999846
Summary: Random Forest ties for the highest test accuracy of any model, though its perfect training score points to potential overfitting.

Gradient Boosting Machine (GBM)
Train Score: 1.0
Test Score: 0.999655
Summary: GBM fits the training set perfectly with a slightly lower test score: good performance with some overfitting.

Extreme Gradient Boosting (XGBoost)
Train Score: 1.0
Test Score: 0.999731
Summary: XGBoost combines perfect training accuracy with high test accuracy, indicating strong performance but potential overfitting.

AdaBoost
Train Score: 1.0
Test Score: 0.999731
Summary: AdaBoost performs very well; its perfect training score again suggests possible overfitting.

LightGBM
Train Score: 1.0
Test Score: 0.999846
Summary: LightGBM matches Random Forest for the best test accuracy, but its perfect training accuracy indicates potential overfitting.

CatBoost
Train Score: 1.0
Test Score: 0.999731
Summary: CatBoost is highly accurate on both sets, with the perfect training score suggesting overfitting.

Naive Bayes
Train Score: 0.997352
Test Score: 0.997083
Summary: Naive Bayes scores slightly below the other models, but its small train-test gap indicates good generalization.

Voting Classifier
Train Score: 1.0
Test Score: 0.999731
Summary: The Voting Classifier reaches high test accuracy, but its perfect training accuracy suggests overfitting.

Support Vector Machine (SVM)
Train Score: 0.999836
Test Score: 0.999808
Summary: SVM performs excellently, with nearly identical training and test accuracies and therefore minimal overfitting.

Conclusion

All models performed exceptionally well, with test accuracies very close to perfect. However, several models reach perfect training accuracy, a warning sign for overfitting. KNN, Logistic Regression, and SVM, whose training accuracies stay below 1.0 and closely track their test accuracies, demonstrate the best generalization and are strong candidates for reliable predictions on unseen data. The Decision Tree, Random Forest, and the boosting models, while also performing well, should be tuned further to reduce overfitting; a sketch of such tuning follows the scoring loop below.
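For reference, here is a minimal sketch of the kind of train/test scoring loop assumed above. The original dataset, preprocessing, and hyperparameters are not shown in this writeup, so make_classification stands in for the real network-activity data and all models use scikit-learn defaults; the scores it prints will not match the figures reported here.

```python
# Minimal sketch of a train/test scoring loop over several classifiers.
# make_classification is a stand-in for the real network-activity data,
# so printed scores will not match the reported ones.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              AdaBoostClassifier, VotingClassifier)
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

models = {
    "KNN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "GBM": GradientBoostingClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
    # A simple majority-vote ensemble over a few of the base models.
    "Voting Classifier": VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(random_state=42)),
                    ("nb", GaussianNB())],
        voting="hard"),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name:20s} train={model.score(X_train, y_train):.6f} "
          f"test={model.score(X_test, y_test):.6f}")
```

XGBClassifier, LGBMClassifier, and CatBoostClassifier (from the xgboost, lightgbm, and catboost packages) expose the same fit/score interface and can be added to the dict when those libraries are installed.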
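As a starting point for the tuning recommended in the conclusion, the sketch below runs a cross-validated grid search over regularizing hyperparameters for a Random Forest. The grid values are illustrative, not the settings used in this analysis; capping tree depth and requiring more samples per leaf are standard ways to pull training accuracy back from a perfect 1.0.

```python
# Illustrative sketch: cross-validated search over regularizing
# hyperparameters for a Random Forest. The grid values are examples,
# not the settings used in this analysis.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

param_grid = {
    "max_depth": [5, 10, 20, None],   # shallower trees generalize better
    "min_samples_leaf": [1, 5, 20],   # larger leaves smooth the fit
    "n_estimators": [100, 300],
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, n_jobs=-1)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print(f"cv score:    {search.best_score_:.6f}")
print(f"test score:  {search.best_estimator_.score(X_test, y_test):.6f}")
```

The same pattern applies to the boosting models, where learning rate, number of estimators, and tree depth are the usual knobs for trading training fit against generalization.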
Tags:
#datathon
#machine-learning