Word Embedding for Arabic Text Classification

dc.contributor.author: Habib, Titraoui
dc.contributor.author: Supervisor: Belkacem, Brahimi
dc.date.accessioned: 2025-07-08T08:55:47Z
dc.date.available: 2025-07-08T08:55:47Z
dc.date.issued: 2025-06-15
dc.description.abstract: Arabic text classification is a critical task in natural language processing (NLP) with broad applications across media, business, and education. The unique linguistic features of Arabic, such as rich morphology and significant dialectal variation, together with limited annotated resources, pose substantial challenges for automated classification systems. This thesis investigates and compares the effectiveness of traditional machine learning models (such as SVM and Random Forest) and state-of-the-art BERT-based deep learning models (such as AraBERT and QARiB) for Arabic text classification. The study uses a pre-existing multi-dialectal dataset of 3,600 samples across 18 functional categories, which was developed and annotated by expert linguists to address the limitations of existing resources. Each approach is evaluated in terms of classification accuracy, robustness to dialectal and functional variation, and computational efficiency. Results demonstrate that while BERT-based models outperform traditional approaches in handling morphological complexity and contextual ambiguity, they require significantly greater computational resources. The findings highlight the importance of dataset diversity, dialectal representation, and resource considerations in developing robust Arabic text classification systems. This research contributes to the advancement of Arabic NLP by providing practical insights and recommendations for future model development and deployment.
dc.identifier.uri: https://depot.univ-msila.dz/handle/123456789/46758
dc.language.iso: en
dc.publisher: Mohamed Boudiaf University of M'sila
dc.subject: Arabic text classification
dc.subject: natural language processing (NLP)
dc.subject: Arabic morphology
dc.subject: dialectal variation
dc.subject: machine learning
dc.subject: traditional models (SVM, Random Forest)
dc.subject: pre-trained BERT-based models (AraBERT, QARiB)
dc.subject: annotated multidialectal corpus
dc.subject: classification accuracy
dc.subject: contextual ambiguity
dc.subject: computational efficiency
dc.subject: Arabic language resources
dc.title: Word Embedding for Arabic Text Classification
dc.type: Thesis

Files

Original bundle

Name: Habib Titraoui.pdf
Size: 1.07 MB
Format: Adobe Portable Document Format
