Abstract
Sentiment analysis has attracted significant research attention in both academia and industry due to the exponential growth of sentiment-bearing big data on the Internet, especially short texts such as product reviews and posts on micro-blogging sites. Unlike formal text, such informal text is very short and contains many abbreviations, misspellings, and special symbols (e.g. hashtags, emoticons, "lol"). This raises two research questions for building an efficient system for sentiment prediction on short texts:

• Q1. How can we handle the noisy nature of this informal data?
• Q2. Which learning methods are the most suitable for the task?

Traditionally, the problem has been handled either with discrete machine-learning models, which are learned from manually defined sparse features, or with rule-based approaches, which employ many external lexical resources (e.g. lexicons, negation, and part-of-speech patterns). However, manually extracting features and defining tailored rules is time-consuming and inefficient for short texts. Over the past few years, neural network models have received increasing attention in most sub-areas of sentiment analysis and have given highly promising results. A main reason is that neural models can automatically learn dense features that capture subtle semantic information from raw inputs, information that is difficult to model with traditional discrete features based on words and n-gram patterns.

In this work, we study different neural network approaches to address the two questions above in different sub-areas of sentiment analysis on short texts. The contributions of this dissertation are threefold. First, we release polyglot sentiment lexicons, which are often an important component in building a sentiment classifier. Second, we design a state-of-the-art target-dependent classifier for Twitter by automatically extracting rich neural features.
Third, we construct a neural network structure that is competitive with cutting-edge methods on both short and long texts. We conclude that low-dimensional dense neural features and neural network methods are well suited to coping with the challenges of short texts.