Ended on 9th Nov'20 03:30 PM (Coordinated Universal Time)
Data Sprint #13: Cyberbullying
Identify cyberbullying comments
338
Hard
Challenge Starts
06 Nov 03:30 pm
Registration Ends
09 Nov 03:30 pm
Challenge Ends
09 Nov 03:30 pm
Context
What is cyberbullying?
Cyberbullying, also known as cyberharassment, is a form of bullying or harassment which happens over electronic media (or over the internet). It is also known as online bullying.
It has become increasingly common as the digital sphere has expanded and technology has advanced.

Cyberbullying is when someone bullies or harasses others on the internet and in other digital spaces, particularly on social media sites. Harmful bullying behavior can include posting rumors, threats, sexual remarks, a victim's personal information, or pejorative labels (i.e. hate speech) with the intention of causing embarrassment or humiliation. Bullying or harassment can be identified by repeated behavior and an intent to harm. Victims of cyberbullying may experience lower self-esteem, increased suicidal ideation, and a variety of negative emotional responses including being scared, frustrated, angry, or depressed.
Problem Statement
Platforms on the internet (websites, forums, social media sites, etc.) receive thousands of new posts and comments every day from all over the globe. It is practically impossible for them to moderate these comments manually in order to identify cyberbullying and take appropriate action.
Objective
Your objective is to build a machine learning model that identifies cyberbullying comments.
Evaluation Criteria
Submissions are evaluated using the F1 score.
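As a rough illustration of the metric, the sketch below computes F1 (the harmonic mean of precision and recall) for binary labels in plain Python. It assumes labels are encoded as 0/1 with 1 meaning "cyberbullying" and that only the positive class is scored; this page does not state the label encoding or the averaging mode, so both are assumptions. The function name `f1_score_binary` is hypothetical.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one positive class.

    Assumption: labels are 0/1 and `positive` marks a cyberbullying comment.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, made up for illustration only.
actual = [1, 0, 1, 1, 0]
predicted = [1, 1, 0, 1, 0]
example_f1 = f1_score_binary(actual, predicted)
print(round(example_f1, 3))  # 0.667
```

Because F1 ignores true negatives, a model that labels every comment as harmless scores 0 even if most comments really are harmless, which is worth keeping in mind if the classes turn out to be imbalanced.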

How do we do it?
Once we release the data, anyone can download it, build a model, and make a submission. We give competitors a set of data (training data) with both the independent and dependent variables.
We also release another set of data (test dataset) with just the independent variables, and we hide the dependent variable that corresponds with this set. You submit the predicted values of the dependent variable for this set and we compare them against the actual values.
The predictions are evaluated based on the evaluation metric defined in the datathon.
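The workflow described above (fit on labeled training data, predict on unlabeled test data, submit the predictions) might look roughly like the following sketch. It uses a tiny hand-rolled Naive Bayes classifier so that it runs with the standard library alone; it is a baseline illustration, not the organizers' method. The toy comments, the 0/1 label encoding, and the `id`/`label` submission columns are all hypothetical, since the actual file layout is not described here.

```python
import csv
import io
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def train_naive_bayes(comments, labels):
    """Count words per class (multinomial Naive Bayes, add-one smoothing at predict time)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(comments, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict(model, text):
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        if class_counts[label] == 0:
            continue  # class never seen in training
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # ignore words unseen during training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: independent variable (comment) plus
# dependent variable (label, assumed 1 = cyberbullying, 0 = not).
train_comments = [
    "you are stupid and ugly",
    "nobody likes you loser",
    "great game last night",
    "thanks for the helpful answer",
]
train_labels = [1, 1, 0, 0]

# Hypothetical test data: comments only, labels hidden by the organizers.
test_comments = ["you are a loser", "thanks for the great game"]

model = train_naive_bayes(train_comments, train_labels)
predictions = [predict(model, text) for text in test_comments]

# Write predictions in a made-up submission layout (id, label).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "label"])
for row_id, label in enumerate(predictions):
    writer.writerow([row_id, label])
submission_csv = buffer.getvalue()
print(submission_csv)
```

In practice the in-memory buffer would be replaced by a file in whatever format the challenge requires, and the whitespace tokenizer by something sturdier.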
