Predictive Model for Crime Rate for Metropolitan Data
In this analysis, a Decision Tree Regressor model was trained and evaluated to predict "crime_rate" from the provided dataset. The key steps and results are as follows:
Data Loading: The dataset was loaded from a Google Sheets document using the "read_data" function.
Data Preprocessing: The dataset was separated into features (X) and the target variable (y), with the "crime_rate" column as the target.
Data Splitting: The data was split into training and testing sets using an 80:20 ratio.
Feature Scaling: The features were scaled with StandardScaler so that each feature has zero mean and unit variance.
Exploratory Data Analysis: Summary statistics and visualizations were produced to gain insight into the dataset. Histograms show the distribution of each feature, scatter plots show the relationship between each feature and the target variable, and a correlation heatmap shows the correlations between features.
Modeling: A Decision Tree Regressor model was created and trained on the training data.
Model Evaluation: The trained model was evaluated on the test data using Root Mean Squared Error (RMSE) and R-squared (R^2). RMSE is the square root of the mean squared difference between predicted and actual values, expressed in the units of the target, while R^2 measures the proportion of variance in the target explained by the model.
Model Interpretation: Feature importances were computed from the trained model and plotted as a bar chart to identify the features that contribute most to predicting "crime_rate".
Hyperparameter Tuning: GridSearchCV was used to search the provided parameter grid for the best hyperparameters for the Decision Tree Regressor model.
Cross-Validation: Cross-validation scores were calculated on the training data to estimate the model's performance, and their mean and standard deviation were reported.
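The steps from data splitting through cross-validation can be sketched roughly as follows. This is a minimal illustration, not the original notebook: the synthetic data, the feature names, the parameter grid, the random seeds, and the 5-fold setting are all assumptions, since the write-up does not state them.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real dataset (assumption: the actual data comes
# from a Google Sheets document and is not reproduced here).
X_arr, y_arr = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
feature_names = [f"feature_{i}" for i in range(X_arr.shape[1])]  # hypothetical names
df = pd.DataFrame(X_arr, columns=feature_names)
df["crime_rate"] = y_arr

# Separate features and target.
X = df.drop(columns=["crime_rate"])
y = df["crime_rate"]

# 80:20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the decision tree and evaluate it on the held-out test set.
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

# Feature importances, sorted for plotting as a bar chart.
importances = pd.Series(model.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)

# Grid search over a hypothetical parameter grid (the original grid is not given).
param_grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(DecisionTreeRegressor(random_state=42), param_grid, cv=5)
search.fit(X_train_scaled, y_train)

# 5-fold cross-validation on the training data; with no `scoring` argument,
# a regressor is scored with R^2.
cv_scores = cross_val_score(model, X_train_scaled, y_train, cv=5)

print(f"RMSE: {rmse:.4f}, R^2: {r2:.4f}")
print(f"Best params: {search.best_params_}")
print(f"CV mean: {cv_scores.mean():.4f}, CV std: {cv_scores.std():.4f}")
```

Fitting the scaler on the training split only keeps test-set statistics out of training. Decision trees split on thresholds, so scaling generally does not change their predictions, but it does no harm and keeps the preprocessing reusable for other models.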
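Model Deployment: The trained model was used to make predictions on the test dataset, and RMSE and R^2 were calculated on those predictions to evaluate its performance.
Model Saving: The trained model was saved as "predictive_model_for_crime_rate.pkl" using the joblib library for future use.

The model achieved a mean cross-validation score of -0.4416 with a standard deviation of 0.6122. If that score is R^2 (scikit-learn's default for regressors when no scoring metric is specified), a negative mean indicates that, on average across folds, the model performed worse than always predicting the mean of the target, and the large standard deviation indicates that performance varied considerably from fold to fold. On the test dataset, the model achieved an RMSE of 11.5438 and an R^2 of 0.2808, meaning it explains about 28% of the variance in the test targets. The feature importance chart shows which features the tree relied on most when predicting "crime_rate". The saved model can be loaded later for further predictions or analysis.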
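As an illustration of the saving step, the sketch below saves and reloads a model with joblib. Only the file name comes from the text above; the toy training data and the temporary directory are assumptions made to keep the example self-contained.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data standing in for the real training set (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=50)

model = DecisionTreeRegressor(random_state=42).fit(X, y)

# Save to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "predictive_model_for_crime_rate.pkl")
    joblib.dump(model, model_path)          # serialize the fitted model
    loaded_model = joblib.load(model_path)  # restore it from disk

# The reloaded model should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(X), loaded_model.predict(X))
print(same_predictions)
```

Because the model here was trained on scaled features, the fitted StandardScaler would need to be saved alongside it (or both wrapped in a single scikit-learn Pipeline) so that new data can be transformed the same way before prediction.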
Tags:
#python
#machine-learning
#regression
#intermediate