Automatically generating annotator rationales to improve sentiment classification
Ainur Yessenalina, Yejin Choi, Claire Cardie
Department of Computer Science, Cornell University, Ithaca, NY 14853, USA
{ainur, ychoi, cardie}@cs.cornell.edu
Abstract
One of the central challenges in sentiment-based text categorization is that not every portion of a document is equally informative for inferring the overall sentiment of the document. Previous research has shown that enriching the sentiment labels with human annotators’ “rationales” can produce substantial improvements in categorization performance (Zaidan et al., 2007). We explore methods to automatically generate annotator rationales for document-level sentiment classification. Rather unexpectedly, we find the automatically generated rationales just as helpful as human rationales.
1 Introduction
One of the central challenges in sentiment-based text categorization is that not every portion of a given document is equally informative for inferring its overall sentiment (e.g., Pang and Lee (2004)). Zaidan et al. (2007) address this problem by asking human annotators to mark (at least some of) the relevant text spans that support each document-level sentiment decision. The text spans of these “rationales” are then used to construct additional training examples that can guide the learning algorithm toward better categorization models. But could we perhaps enjoy the performance gains of rationale-enhanced learning models without any additional human effort whatsoever (beyond the document-level sentiment label)? We hypothesize that in the area of sentiment analysis, where there has been a great deal of recent research attention given to various aspects of the task (Pang and Lee, 2008), this might be possible: using existing resources for sentiment analysis, we might be able to construct annotator rationales automatically.
In this paper, we explore a number of methods to automatically generate rationales for document-level sentiment classification. In particular, we investigate the use of off-the-shelf sentiment analysis components and lexicons for this purpose. Our approaches for generating annotator rationales can be viewed as mostly unsupervised in that we do not require manually annotated rationales for training. Rather unexpectedly, our empirical results show that automatically generated rationales (91.78%) are just as good as human rationales (91.61%) for document-level sentiment classification of movie reviews. In addition, complementing the human annotator rationales with automatic rationales boosts performance even further for this domain, achieving 92.5% accuracy. We further evaluate our rationale-generation approaches on product review data for which human rationales are not available: here we find that even randomly generated rationales can improve classification accuracy, although rationales generated from sentiment resources are not as effective as they are for movie reviews.

The rest of the paper is organized as follows. We first briefly summarize the SVM-based learning approach of Zaidan et al. (2007) that allows the incorporation of rationales (Section 2). We next introduce three methods for the automatic generation of rationales (Section 3). Experimental results are presented in Section 4, followed by related work (Section 5) and conclusions (Section 6).
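As a concrete preview of the lexicon-based idea, the sketch below marks sentences containing strongly polar words that agree with the document-level label as candidate rationales; contrast examples can then be built from these spans exactly as with human rationales. This is our minimal illustration under assumed details, not the paper's implementation: the toy lexicon, the matching rule, and all names are our own.

```python
# Minimal sketch of lexicon-based rationale generation (illustrative only;
# the lexicon and matching rule are assumptions, not the paper's method).
POSITIVE = {"superb", "excellent", "moving", "brilliant"}  # toy polarity lexicon
NEGATIVE = {"dull", "awful", "tedious", "incoherent"}

def generate_rationales(sentences, doc_label):
    """Return sentences containing a lexicon word that agrees with doc_label (+1 or -1)."""
    lexicon = POSITIVE if doc_label == +1 else NEGATIVE
    rationales = []
    for sent in sentences:
        tokens = {tok.strip(".,!?;:").lower() for tok in sent.split()}
        if tokens & lexicon:  # any agreeing polar word makes the sentence a candidate
            rationales.append(sent)
    return rationales

review = ["The plot is dull and the pacing tedious.", "The score, however, is fine."]
print(generate_rationales(review, doc_label=-1))
# ['The plot is dull and the pacing tedious.']
```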
2 Contrastive Learning with SVMs
Zaidan et al. (2007) first introduced the notion of annotator rationales: text spans highlighted by human annotators as support or evidence for each document-level sentiment decision. These rationales, of course, are only useful if the sentiment categorization algorithm can be extended to exploit them effectively. With this in mind, Zaidan et al. (2007) propose a contrastive extension to standard SVM learning: each training document is paired with one or more "contrast" examples, formed by deleting the rationale text spans, and the learner is constrained to classify the original document more confidently than its less informative contrasts.
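Concretely, writing $\mathbf{x}_i$ for the feature vector of training document $i$ with label $y_i \in \{-1,+1\}$, and $\mathbf{v}_{ij}$ for its $j$-th contrast example, the following is a sketch of the contrastive soft-margin objective in the spirit of Zaidan et al. (2007); the notation is ours and some details are omitted:

$$\min_{\mathbf{w},\,\boldsymbol{\xi},\,\boldsymbol{\gamma}} \quad \frac{1}{2}\lVert\mathbf{w}\rVert^{2} \;+\; C\sum_{i}\xi_{i} \;+\; C_{\mathrm{contrast}}\sum_{i,j}\gamma_{ij}$$

subject to

$$y_{i}\,(\mathbf{w}\cdot\mathbf{x}_{i}) \;\ge\; 1-\xi_{i}, \qquad y_{i}\,(\mathbf{w}\cdot\mathbf{x}_{i}-\mathbf{w}\cdot\mathbf{v}_{ij}) \;\ge\; \mu\,(1-\gamma_{ij}), \qquad \xi_{i},\,\gamma_{ij} \;\ge\; 0.$$

Here $\mu > 0$ sets how much more confidently the full document must be classified than its rationale-free contrast, and $C_{\mathrm{contrast}}$ weights the contrast constraints relative to the ordinary margin constraints. Rewriting each contrast constraint in terms of a pseudo-example $\mathbf{x}_{ij}' = (\mathbf{x}_{i}-\mathbf{v}_{ij})/\mu$ reduces the problem to a standard SVM, so it can be solved with off-the-shelf SVM software.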