Projects
Turkish Sentiment Analysis
Title: A Novel Approach to Rule Based Turkish Sentiment Analysis Using Sentiment Lexicon
Duration: 15.09.2015 - 15.03.2017 (18 Months)
Supporter: TUBITAK (The Scientific and Technological Research Council of Turkey)
Grant No: 115E440
Abstract:
Although supervised approaches perform well on sentiment classification, they are limited by the availability of sentiment-annotated data. In addition, term-based features such as bag-of-words or n-grams do not improve performance further when the sentiment of a text is expressed through ambiguous words or phrases, which matters because natural language is inherently ambiguous. In such cases, sentiment lexicons play an important role in sentiment analysis systems. They are valuable for the supervised approach as well, since they can be used to extract more effective features alongside term-based ones. However, despite their success in English sentiment analysis systems, these lexicons cannot be used for a new language that lacks such lexical resources. This project proposes an automatic translation approach that creates a sentiment lexicon for a new language from available English resources. In this approach, an automatic mapping from a sense-level resource to a word-level one is generated by applying a triple unification process, which produces a single polarity score for each term by incorporating all of its sense polarities. The main idea is to handle sense ambiguity during lexicon transfer and to provide a general-purpose sentiment lexicon for languages such as Turkish that do not have a freely available machine-readable dictionary. Translation quality is also critical in lexicon transfer because of the ambiguity problem. The project therefore also proposes a multiple bilingual translation approach to find the most appropriate equivalents for the source-language terms. In this approach, three algorithms (parallel, series, and hybrid) are used to integrate the translation results. As a result, three lexicons of different sizes are obtained for the target language.
The generated lexicons are used in a rule-based sentiment classification process and compared with the supervised approach.
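The pipeline described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify how the triple unification process combines sense polarities, so a plain average stands in for it; the sense scores, the small English-to-Turkish dictionary, and the function names `word_polarity`, `transfer_lexicon`, and `classify` are all hypothetical; and the single negation rule is only an example of what a rule-based classifier might contain, not the project's rule set.

```python
from statistics import mean

# Hypothetical sense-level resource: each English term maps to a list of
# (positive, negative) scores, one pair per sense. The values are made up.
SENSE_SCORES = {
    "good": [(0.75, 0.0), (0.5, 0.0), (0.625, 0.125)],
    "cold": [(0.0, 0.5), (0.0, 0.75), (0.25, 0.0)],
}

# Hypothetical bilingual dictionary (English -> Turkish candidates).
TRANSLATIONS = {
    "good": ["iyi", "güzel"],
    "cold": ["soğuk"],
}


def word_polarity(senses):
    """Collapse per-sense (pos, neg) pairs into one word-level score.

    Assumption: a simple mean of (pos - neg) over all senses. The
    project's actual unification process is not described in the abstract.
    """
    return mean(pos - neg for pos, neg in senses)


def transfer_lexicon(sense_scores, translations):
    """Build a target-language lexicon from word-level source scores."""
    collected = {}
    for term, senses in sense_scores.items():
        score = word_polarity(senses)
        for target_term in translations.get(term, []):
            collected.setdefault(target_term, []).append(score)
    # If several source terms map to one target term, average their scores.
    return {term: mean(scores) for term, scores in collected.items()}


def classify(tokens, lexicon, negators=("değil",)):
    """Sum lexicon scores over tokens and return a polarity label.

    Illustrative rule: a word directly followed by the Turkish negation
    particle "değil" has its score flipped.
    """
    total = 0.0
    for i, token in enumerate(tokens):
        score = lexicon.get(token, 0.0)
        if i + 1 < len(tokens) and tokens[i + 1] in negators:
            score = -score
        total += score
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


lexicon = transfer_lexicon(SENSE_SCORES, TRANSLATIONS)
print(classify(["film", "iyi", "değil"], lexicon))
```

A lexicon-based classifier of this kind needs no annotated training data, which is the limitation of the supervised approach that the abstract points out.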
Coordinator:
Prof. Dr. Ebru Akçapinar Sezer
Computer Engineering Department
Hacettepe University, Ankara, Turkey
Phone: +90 (312) 297 7500 (138)
E-Mail: ebru[at]hacettepe.edu.tr
Research assistants:
Alaettin Uçan
E-Mail: aucan[at]hacettepe.edu.tr
Phone: +90 (312) 297 7500
Behzad Naderalvojoud
E-Mail: n.behzad[at]hacettepe.edu.tr
Phone: -