Study of different Statistical Machine Learning Techniques for Text Sentiment Classification

Document Type : Original Article

Authors

1 Teaching Assistant, Electrical Engineering Department, Faculty of Engineering, Fayoum University, Fayoum 63514, Egypt

2 Professor of Electrical Engineering, Faculty of Engineering, Fayoum University, Fayoum 63514, Egypt

Abstract

Text classification is an important NLP task with applications ranging from movie review classification to market analysis. NLP provides the capability to process large amounts of text and draw conclusions from it. In this paper we investigate statistical machine learning for document classification. The target problem is sentiment analysis: we explore various techniques for text pre-processing, feature selection, and model selection to find a well-fitting model. This paper acts as both a system proposal and a primer for those who want to start practicing NLP; we try to provide insight and intuition about modelling choices for text classification that extend beyond the scope of this task to general NLP. We propose a feature-based approach to text sentiment analysis that relies heavily on the BoN (Bag of N-grams) model and feeds these features to a statistical ML classifier. We use the IMDB movie review dataset (Maas et al. 2011) for benchmarking.
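As a rough illustration of the pipeline described in the abstract, the following minimal sketch builds Bag of N-grams features and passes them to a statistical classifier. The choice of multinomial Naive Bayes with Laplace smoothing, the names `ngrams` and `NaiveBayesBoN`, the tokenizer, and the toy reviews are all assumptions made for illustration; they are not taken from the paper, and the paper's actual classifier and pre-processing may differ.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercase, tokenize, and return all 1..n_max-grams as strings."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NaiveBayesBoN:
    """Hypothetical multinomial Naive Bayes over bag-of-n-grams counts."""

    def __init__(self, n_max=2, alpha=1.0):
        self.n_max = n_max    # largest n-gram order to extract
        self.alpha = alpha    # Laplace smoothing constant

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.gram_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(ngrams(text, self.n_max))
        self.vocab = set()
        for counts in self.gram_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known n-gram.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.gram_counts[c].values()) + self.alpha * len(self.vocab)
            for gram in ngrams(text, self.n_max):
                if gram in self.vocab:  # skip n-grams unseen in training
                    score += math.log((self.gram_counts[c][gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy reviews invented for illustration; not drawn from the IMDB dataset.
train_texts = [
    "a wonderful film with great acting",
    "great story and wonderful cast",
    "a terrible film with awful acting",
    "awful story and terrible pacing",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayesBoN(n_max=2).fit(train_texts, train_labels)
print(model.predict("wonderful acting and a great story"))  # pos
print(model.predict("terrible and awful film"))             # neg
```

Including bigrams alongside unigrams lets the model pick up short phrases such as "great story" in addition to single words, which is the main motivation for using n-grams rather than a plain bag of words.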
