Title: | Content Analysis from Audio and Video Data | Document type: | graduation project | Authors: | Fatima Zohra Daha / Miloud Aqqa, Author | Year of publication: | 2016 | Languages: | English (eng) | Categories: | Business Intelligence
| Keywords: | Machine learning, crowd understanding, image processing, audio analysis, classification, deep learning | Decimal index: | 1568/16 | Abstract: | This document summarizes the work we carried out as part of our graduation project, completed at the University of Houston. Our work consists of exploring two applications of machine learning that involve video and audio data.
In the first part of this work, we aim to understand human activities in crowded scenes by extracting pertinent parameters from crowd videos.
Several tasks were involved in achieving the project's goal. Initially, a literature study was required to establish the state of the art in crowd understanding. After that, we designed our system's architecture around the pertinent parameters that help in understanding crowd behavior. Then, we had to create our own dataset. Finally, we computed these parameters using methods suitable for our project.
In the second part of this work, our aim is to build an iPhone application that calms a crying baby down by automatically playing lullabies.
This project was developed as a complete machine learning pipeline. We propose using Mel-frequency cepstral coefficients (MFCCs) to represent the spectral information. To carry out this work, 'Baby Cry' and 'No Cry' sound signals collected from YouTube were used. Support Vector Machines (SVMs) were used to capture the discriminative information for the above-mentioned binary classification problem. Evaluations on the created dataset show that our Baby Cry Recognition system achieves a recognition performance of around 99%. In the final step, an iPhone application, named iCry, was designed and developed.
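The MFCC + SVM pipeline described above can be sketched as follows. This is not the authors' implementation: the sample rate, frame sizes, filterbank parameters, and the synthetic 'cry'/'quiet' signals standing in for the YouTube clips are all illustrative assumptions.

```python
# Illustrative sketch of an MFCC + SVM cry classifier (not the original code).
import numpy as np
from scipy.fft import dct
from sklearn.svm import SVC

SR = 8000  # assumed sample rate

def mel_filterbank(n_filters=20, n_fft=512, sr=SR):
    """Triangular mel-spaced filterbank over the positive FFT bins."""
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        if c > l:
            fb[i - 1, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i - 1, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def mfcc(signal, n_fft=512, hop=256, n_coef=13):
    """Frame the signal, take log mel energies, DCT, average over frames."""
    frames = [signal[i:i + n_fft] * np.hamming(n_fft)
              for i in range(0, len(signal) - n_fft, hop)]
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    mel_energy = np.log(spec @ mel_filterbank(n_fft=n_fft).T + 1e-10)
    coefs = dct(mel_energy, type=2, axis=1, norm='ortho')[:, :n_coef]
    return coefs.mean(axis=0)  # fixed-size feature vector per clip

rng = np.random.default_rng(0)

def fake_cry():    # pulsed high-pitched tone, a stand-in for a 'Baby Cry' clip
    t = np.arange(SR) / SR
    burst = (np.sin(2 * np.pi * 4 * t) > 0)
    return np.sin(2 * np.pi * 450 * t) * burst + 0.05 * rng.standard_normal(SR)

def fake_quiet():  # low-level noise, a stand-in for a 'No Cry' clip
    return 0.1 * rng.standard_normal(SR)

# Build a tiny labeled set (1 = cry, 0 = no cry) and train an RBF SVM.
X = np.array([mfcc(fake_cry()) for _ in range(20)]
             + [mfcc(fake_quiet()) for _ in range(20)])
y = np.array([1] * 20 + [0] * 20)
clf = SVC(kernel='rbf').fit(X, y)
```

A new clip would be classified with `clf.predict([mfcc(clip)])`. Averaging the per-frame MFCCs is the simplest way to get one fixed-size vector per clip for the SVM; a real system might instead classify short windows and smooth the decisions over time.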
|