Abstract
Sentiment analysis has become the go-to machine learning technique for understanding, decoding, and analysing millions of textual data items written by consumers. It is an efficient way to convert subjective reviews into a machine-readable representation such as 1s and 0s. Sentiment analysis has already been tested with many different machine learning models; in this study, we compare several of those models side by side and additionally use a Voting Classifier to determine the best. To increase the diversity and accuracy of our models, we created an umbrella dataset combining three popular datasets. From this larger dataset, we then drew a more focused set of 75,000 entries using a randomization algorithm that we developed. The supervised learning algorithms we designed and tested are Random Forest, Extra Tree Classifier, Decision Tree, Logistic Regression, and XGBoost. We also used a word cloud library, commonly used for visual representation, to further enhance the understanding of our data. Finally, we conclude that the Extra Tree Classifier achieved the best accuracy among the models, while the Voting Classifier reported the best precision for the use case we set up for this research study.
Keywords
Extra Tree Classifier, Machine Learning, Sentiment Analysis, Supervised Algorithms, Voting Classifier
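The abstract does not say whether the Voting Classifier uses hard (majority) or soft (probability-averaged) voting. As a minimal sketch, assuming hard voting over binary sentiment labels, the idea of combining the predictions of several models can be illustrated as follows. The function name `hard_vote`, the tie-breaking rule, and the prediction values are all hypothetical and are not taken from the study.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model label predictions by majority vote.

    predictions_per_model: list of equal-length lists, one list per model.
    Ties go to the label predicted by the earliest model in the list
    (an illustrative choice, not something the study specifies).
    """
    if not predictions_per_model:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*...) groups the votes that every model cast for the same sample
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # first label, in model order, that reaches the highest vote count
        winner = next(v for v in votes if counts[v] == top)
        combined.append(winner)
    return combined


# Hypothetical predictions (1 = positive, 0 = negative) from five models
# on four reviews; the values are made up for illustration only.
model_predictions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
ensemble = hard_vote(model_predictions)
print(ensemble)
```

In practice, a library implementation such as scikit-learn's `VotingClassifier` would wrap the fitted estimators directly; the sketch above only shows the voting step itself.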