Arabic text classification using linear discriminant analysis

Fawaz S. Al-Anzi, Dia Abuzeina

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

Linear discriminant analysis (LDA) is a dimensionality reduction technique that is widely used in pattern recognition applications. LDA aims to generate effective feature vectors by projecting the original data (e.g. a bag-of-words textual representation) into a lower-dimensional space. Hence, LDA is a convenient method for text classification, a task characterized by very high-dimensional feature vectors. In this paper, we empirically investigated two LDA-based methods for Arabic text classification. The first method is based on computing the generalized eigenvectors of the ratio of the between-class scatter to the within-class scatter. The second method uses linear classification functions that assume equal population covariance matrices (i.e. a pooled sample covariance matrix). We used a textual data collection that contains 1,750 documents belonging to five categories. The testing set contains 250 documents belonging to the same five categories (50 documents for each category). The experimental results show that the linear classification functions method outperforms the eigenvalue decomposition method. We emphasize that the goal of this work is to demonstrate how to employ the LDA algorithm in text classification rather than to compare its performance with other well-known text classification algorithms.
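
As a rough illustration of the two approaches named in the abstract, the following NumPy sketch fits both on a toy term-count matrix. It is not the authors' implementation: the function names, the toy data, the small ridge term added to keep the scatter matrices invertible, the equal class priors, and the nearest-centroid decision rule used after the eigenvector projection are all assumptions made for this example.

```python
import numpy as np


def scatter_matrices(X, y):
    """Return class labels, class means, within-class and between-class scatter."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    means = []
    for c in classes:
        X_c = X[y == c]
        m_c = X_c.mean(axis=0)
        means.append(m_c)
        centered = X_c - m_c
        S_w += centered.T @ centered
        diff = (m_c - overall_mean)[:, None]
        S_b += len(X_c) * (diff @ diff.T)
    return classes, np.array(means), S_w, S_b


def fit_eigen_lda(X, y, ridge=1e-3):
    """Method 1: leading eigenvectors of S_w^{-1} S_b (ridge is an assumption)."""
    classes, means, S_w, S_b = scatter_matrices(X, y)
    S_w_reg = S_w + ridge * np.eye(X.shape[1])
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w_reg, S_b))
    # Keep at most (number of classes - 1) discriminant directions.
    order = np.argsort(eigvals.real)[::-1][: len(classes) - 1]
    W = eigvecs[:, order].real
    return classes, W, means @ W


def predict_eigen_lda(model, X):
    """Assign each row to the nearest projected class centroid (assumed rule)."""
    classes, W, centroids = model
    Z = X @ W
    dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def fit_linear_functions(X, y, ridge=1e-3):
    """Method 2: one linear function per class from a pooled covariance matrix."""
    classes, means, S_w, _ = scatter_matrices(X, y)
    pooled = S_w / (len(X) - len(classes)) + ridge * np.eye(X.shape[1])
    coef = np.linalg.solve(pooled, means.T).T  # row k is pooled^{-1} m_k
    intercept = -0.5 * np.sum(coef * means, axis=1)  # equal priors assumed
    return classes, coef, intercept


def predict_linear_functions(model, X):
    """Pick the class whose linear classification function scores highest."""
    classes, coef, intercept = model
    scores = X @ coef.T + intercept
    return classes[scores.argmax(axis=1)]


# Toy bag-of-words counts: 9 documents, 4 terms, 3 categories (made-up data).
X_train = np.array([
    [5, 1, 0, 0], [4, 0, 1, 0], [6, 1, 0, 1],
    [0, 5, 1, 0], [1, 4, 0, 0], [0, 6, 1, 1],
    [0, 1, 5, 4], [1, 0, 4, 5], [0, 0, 6, 4],
], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
X_new = np.array([[5, 0, 0, 0], [0, 5, 0, 0], [0, 0, 5, 5]], dtype=float)

print(predict_eigen_lda(fit_eigen_lda(X_train, y_train), X_new))
print(predict_linear_functions(fit_linear_functions(X_train, y_train), X_new))
```

On real bag-of-words data the number of terms usually far exceeds the number of documents, so the within-class scatter is singular; some form of regularization or prior dimensionality reduction would be needed, and the ridge term here is only one simple option.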

Original language: English
Title of host publication: Proceedings - 2017 International Conference on Engineering and MIS, ICEMIS 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9781509067787
State: Published - 2 Jul 2017
Event: 2017 International Conference on Engineering and MIS, ICEMIS 2017 - Monastir, Tunisia
Duration: 8 May 2017 to 10 May 2017

Publication series

Name: Proceedings - 2017 International Conference on Engineering and MIS, ICEMIS 2017
Volume: 2018-January

Conference

Conference: 2017 International Conference on Engineering and MIS, ICEMIS 2017
Country/Territory: Tunisia
City: Monastir
Period: 8/05/17 to 10/05/17

Keywords

  • Arabic
  • classification
  • eigenvectors
  • Fisher
  • linear discriminant analysis
  • text

Funding Agency

  • Kuwait Foundation for the Advancement of Sciences
