A micro-word based approach for Arabic sentiment analysis

Fawaz S. Al-Anzi, Dia Abuzeina

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

10 Scopus citations

Abstract

Sentiment analysis of social network data has recently received a great deal of attention. Social networks are characterized by uncommon language that differs from the standard form of the language. Hence, there is a demand for effective methods to analyze the huge volume of new word variants that appear daily in the digital and online world. In text classification, the vector space model (VSM) is based on the vocabulary list (i.e. the words of the entire training set) and ignores odd words, which leads to a partial loss of textual information. To address this challenge, we propose using each pair of neighboring letters in a word as the basic feature unit instead of the word itself. That is, instead of using words in the VSM, we propose a new method that decomposes each word into a sequence of micro-words, each consisting of only two consecutive letters. Two data collections were employed to investigate the performance: common (i.e. standard form) Arabic text and uncommon Arabic text (obtained from Instagram). For the common text, we used a corpus that contains 1,500 documents for training and 500 documents for testing. The proposed method was evaluated using latent semantic indexing (LSI) for textual features and the cosine similarity measure for classification. The experimental results are promising: the proposed method correctly classifies the testing set documents with an accuracy of up to 83.6%.
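The micro-word idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that micro-words are overlapping two-letter windows (the abstract does not say whether they overlap), that documents are tokenised on whitespace, and that raw counts are compared directly with cosine similarity. The LSI step is omitted, and the names `micro_words`, `micro_word_counts`, and `cosine_similarity` are hypothetical.

```python
from collections import Counter
from math import sqrt


def micro_words(word):
    """Split a word into two-letter units.

    Assumption: the units overlap (a sliding window of width 2).
    Words shorter than two letters are kept as a single unit.
    """
    if len(word) < 2:
        return [word] if word else []
    return [word[i:i + 2] for i in range(len(word) - 1)]


def micro_word_counts(text):
    """Count micro-words over a whitespace-tokenised document."""
    counts = Counter()
    for word in text.split():
        counts.update(micro_words(word))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# A lengthened spelling shares most of its micro-words with the standard form,
# whereas a word-level vocabulary would treat it as an unseen word.
standard = micro_word_counts("good")
variant = micro_word_counts("goood")
similarity = cosine_similarity(standard, variant)
```

Python slices strings by code point, so the same function applies to unvocalised Arabic text. Diacritics are separate code points and would need to be stripped first under this sketch's assumptions.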

Original language: English
Title of host publication: Proceedings - 2017 IEEE/ACS 14th International Conference on Computer Systems and Applications, AICCSA 2017
Publisher: IEEE Computer Society
Pages: 910-914
Number of pages: 5
ISBN (Electronic): 9781538635810
State: Published - 2 Jul 2017
Event: 14th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2017 - Hammamet, Tunisia
Duration: 30 Oct 2017 → 3 Nov 2017

Publication series

Name: Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA
Volume: 2017-October
ISSN (Print): 2161-5322
ISSN (Electronic): 2161-5330

Conference

Conference: 14th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2017
Country/Territory: Tunisia
City: Hammamet
Period: 30/10/17 → 3/11/17

Keywords

  • Arabic text
  • Classification
  • Cosine similarity measure
  • Latent semantic indexing
  • Sentiment analysis
  • Social networks

Funding Agency

  • Kuwait Foundation for the Advancement of Sciences
