Title: | Multilingual Sentiment Analysis | Document type: | final-year project | Authors: | Basma ESSATOUTI, Author | Languages: | English (eng) | Categories: | Internet des Objets et Services Mobiles (IOSM)
| Keywords: | Sentiment Analysis, Moroccan Dialect, Arabic Language, Deep Learning,
Convolutional Neural Network, Language Identification. | Decimal index: | mast 71/18 | Abstract: | Sentiment analysis is one of the most important applications of NLP, and with
social networks being an integral part of people's everyday lives all around the world,
encountering multilingual content has become more frequent than ever.
This multilingual aspect of social networks is especially pronounced in the Moroccan
context, where, for various political and especially cultural reasons, the use of multiple
languages is part of people's speaking, and thus writing, habits. Indeed,
Moroccan web content is a mixture of languages, most importantly MSA (Modern Standard Arabic),
French, English and Moroccan Dialect. Covering all these languages is therefore
essential for capturing the polarity of the opinions expressed in this part of
the world.
In this perspective, a sentiment analysis study focused on polarity detection
is conducted on Moroccan Facebook comments that illustrate this linguistic diversity.
To support the polarity detection process, a language
identification system based on a character-level n-gram statistical model is
developed to distinguish between the languages involved in a given comment.
The system achieved good results in distinguishing, on the one hand, between Arabic and Moroccan
Dialect written in Arabic letters and, on the other hand, between
French, English and Moroccan Dialect written in Latin letters, using a Deep Neural
Network classifier in both cases.
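As a minimal sketch of what character-level n-gram features can look like, the Python snippet below turns a comment into relative n-gram frequencies that a classifier could consume. The function names, the trigram size, and the tiny vocabulary are illustrative assumptions; the abstract does not specify the features or the network actually used.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of `text`.

    The text is lowercased and padded with one space on each side so that
    word boundaries at the start and end are also captured (an assumption).
    """
    padded = f" {text.strip().lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_features(text, vocabulary, n=3):
    """Relative frequency of each vocabulary n-gram within `text`."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values()) or 1  # avoid division by zero on empty text
    return [counts[gram] / total for gram in vocabulary]


# Hypothetical vocabulary mixing English-like and French-like trigrams.
vocabulary = ["the", " th", "he ", " le", "les", "es "]
features = ngram_features("the house", vocabulary)
```

A feature vector of this kind would then be fed to a classifier, for example a dense neural network, which is the role the abstract attributes to its Deep Neural Network.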
Using the resulting language identification models, and in order to gather the
polarity information from all the languages present, a matrix representation based
on a word-level n-gram model is introduced to represent the comments studied.
A CNN model is then trained on these matrices for polarity detection
and correctly predicted the polarity in 71% of the cases.
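The abstract does not describe the layout of the word-level n-gram matrix, so the following Python sketch shows only one plausible reading: each row holds the vocabulary indices of one word n-gram, and the matrix is padded to a fixed number of rows so that a CNN can take it as input. The vocabulary, the padding and unknown-word ids, and the function names are all hypothetical.

```python
def word_ngrams(tokens, n=2):
    """Return the overlapping word n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def comment_matrix(comment, vocab, n=2, max_rows=4, pad_id=0, unk_id=1):
    """Build a fixed-size matrix with one row of word ids per n-gram.

    Words missing from `vocab` map to `unk_id`; comments with fewer than
    `max_rows` n-grams are padded with rows of `pad_id`, longer ones are
    truncated (all of these choices are assumptions).
    """
    tokens = comment.lower().split()
    rows = [[vocab.get(word, unk_id) for word in gram]
            for gram in word_ngrams(tokens, n)]
    rows = rows[:max_rows]
    rows += [[pad_id] * n for _ in range(max_rows - len(rows))]
    return rows


# Hypothetical vocabulary; ids 0 and 1 are reserved for padding and unknowns.
vocab = {"very": 2, "good": 3, "service": 4}
matrix = comment_matrix("very good service today", vocab)
```

In a multilingual setting, the language predicted for each comment or word could select which vocabulary is used for the lookup, but that link is a guess rather than something the abstract states.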
|
Multilingual Sentiment Analysis [final-year project] / Basma ESSATOUTI, Author. - [n.d.]. Languages: English (eng) Categories: | Internet des Objets et Services Mobiles (IOSM)