Plan of Action
Our plan was to evaluate the accuracy of the model created by the University of Wisconsin on a different labeled dataset. The original model was trained on a dataset of ~7,000 tweets labeled for cyberbullying. For our test data, we used a labeled dataset from DataTurks of ~20,000 tweets labeled for cyber-aggression.
Extracting Data
In order to use the program from the University of Wisconsin, we needed each tweet on its own line in a text file. However, the data we acquired from DataTurks was a JSON file. To convert the JSON file into the appropriate form, we wrote "parse.cpp", which extracts the tweets into "tweets.txt" and the corresponding labels into "labels.txt".
Running the Experiment
We used the code provided by the University of Wisconsin, which consisted of two programs: BullyingV3.0, which collected the data used for the original analysis, and bullyingtraceV2, which took tweets as input and classified them as cyberbullying. On further analysis, we determined that we could not acquire the original dataset, since most of the tweets had either been deleted or were too old to retrieve from Twitter's API. The classifier was a Java program implementing an SVM with a linear kernel, pre-trained on the original data. After compiling the program, we could place tweets in a text file and run it. The program included an optional enrichment step that filtered out tweets that did not meet a certain threshold of aggressive words; we skipped this step because it would have broken the alignment between tweets and their labels. Finally, we ran the main classification program from the command line, which took the input file of tweets and classified each one as cyberbullying or not. After getting the results, we wrote "comparing.cpp" to check whether each prediction matched its label. On the DataTurks dataset, the University's model correctly identified 11,229 out of 19,991 tweets, an accuracy of 56.17%.
Analysis
After much consideration, we attribute our low accuracy to differences between the datasets. The original dataset used by the University of Wisconsin was labeled for cyberbullying and was much more detailed than the DataTurks data, which was labeled only for cyber-aggression. After examining the DataTurks dataset, we saw that its labels largely reflected the use of harsh words rather than attempts to undermine another person. We acknowledge that the labels cannot be fully trusted; however, labeled tweet data was hard to come by.
For example, the program marked the following tweet as cyberbullying even though it was labeled as non-cyberaggressive: "are you only acting nice to people now because you have to stay at brock?"