| Title: | TUNING HYPERPARAMETERS OF CLASSIFICATION TECHNIQUES IN BREAST CANCER |  | Document type: | final-year project |  | Authors: | Souhaila Serbout, Author |  | Languages: | English (eng) |  | Categories: | Web Engineering and Mobile Computing
  |  | Keywords: | Machine Learning, Classification, Breast Cancer, Grid search, PSO, 
SMOTE. |  | Decimal index: | 1912/18  |  | Abstract: | In the context of classification, hyperparameter optimization is the problem 
of choosing a set of hyperparameters for a learning algorithm, usually with the goal of 
optimizing a measure of the algorithm's performance on an independent data set. The 
work presented in this document addresses this problem for four binary classification 
models: K-Nearest Neighbors, Support Vector Machine, Multilayer Perceptron, 
and Decision Trees. Two search approaches were applied, 
Grid Search (GS) and Particle Swarm Optimization (PSO), alongside a third approach 
that uses the default parameters of the Weka software. 
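As an illustration of the grid search idea described above, the following is a minimal Python sketch, not the thesis's Java/Weka implementation. It exhaustively tries every combination in a small grid for a toy k-nearest-neighbors classifier and keeps the combination with the fewest misclassified validation instances. The data points, the grid values, and the function names (`knn_predict`, `misclassified`, `grid_search`) are all hypothetical and chosen only for the example.

```python
from collections import Counter
from itertools import product


def knn_predict(train, query, k, p):
    """Majority vote among the k training points nearest to `query`,
    using the Minkowski distance of order p."""
    def dist(a, b):
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def misclassified(train, validation, k, p):
    """Number of validation instances whose predicted label is wrong."""
    return sum(1 for x, y in validation if knn_predict(train, x, k, p) != y)


def grid_search(train, validation, grid):
    """Evaluate every combination in `grid` and return the first one
    that reaches the lowest misclassification count."""
    names = sorted(grid)
    best_params, best_errors = None, None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        errors = misclassified(train, validation, **params)
        if best_errors is None or errors < best_errors:
            best_params, best_errors = params, errors
    return best_params, best_errors


# Hypothetical toy data: two clusters plus one deliberately mislabeled point.
train = [
    ((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.9, 1.3), 0),
    ((4.0, 4.2), 1), ((4.3, 3.9), 1), ((3.8, 4.1), 1),
    ((1.1, 1.1), 1),  # label noise
]
validation = [((1.0, 0.9), 0), ((4.1, 4.0), 1), ((1.3, 1.1), 0), ((3.9, 4.3), 1)]

# Hypothetical search grid: neighborhood size k and distance order p.
grid = {"k": [1, 3, 5], "p": [1, 2]}

best_params, best_errors = grid_search(train, validation, grid)
print(best_params, best_errors)
```

Grid search costs one model evaluation per grid cell, so the number of evaluations grows multiplicatively with each added hyperparameter. That cost is one reason to compare it with a population-based search such as PSO.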
The solution is a desktop Java application that first automatically detects 
imbalanced data in a data set and addresses it with the synthetic minority oversampling 
technique (SMOTE). The tool's main functionality is to let users run 
several classification models simultaneously and generate a table for evaluating the classifiers. 
It also supports the GS and PSO methods, both of which use 
the number of incorrectly classified instances of the classification model as the metric for 
selecting its optimal parameters. 
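The imbalance check and the SMOTE step can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the application's actual code. The 0.5 ratio threshold, the toy data, and the names `is_imbalanced` and `smote` are hypothetical. The interpolation rule is the standard SMOTE idea: each synthetic point lies on the segment between a minority instance and one of its k nearest minority neighbors.

```python
import random
from collections import Counter


def is_imbalanced(labels, threshold=0.5):
    """Flag a data set as imbalanced when the minority/majority class
    ratio falls below `threshold` (the threshold is an assumption)."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values()) < threshold


def smote(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic minority points by interpolating between
    a randomly chosen minority point and one of its k nearest minority
    neighbors. Requires at least two minority points."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: sq_dist(base, m))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 8 majority instances and 3 minority instances.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]
labels = [0] * len(majority) + [1] * len(minority)

if is_imbalanced(labels):
    # Generate just enough synthetic points to equalize the class sizes.
    synthetic = smote(minority, n_new=len(majority) - len(minority), k=2)
    balanced_minority = minority + synthetic
else:
    synthetic, balanced_minority = [], minority

print(len(majority), len(balanced_minority))
```

Oversampling should be applied only to training data. Synthetic points that leak into the evaluation set would distort the misclassification count used to select hyperparameters.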
Four well-known breast cancer data sets were used to 
compare the three approaches and the performance of the models: Breast 
Cancer Data Set, Breast Cancer Wisconsin (Original) Data Set, Breast Cancer Wisconsin 
(Diagnostic) Data Set, and Breast Cancer Wisconsin (Prognostic) Data Set. Performance is 
evaluated with the following metrics: Accuracy, Precision, and Recall. 
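For reference, the three metrics named above can be computed from the confusion counts of a binary classifier. The sketch below is a plain Python illustration. The label vectors are invented for the example and are not results from the thesis, and treating label `1` as the positive class is an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all instances that were classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of true positives that the classifier found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Invented labels for illustration only (1 = positive class, an assumption).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, fp, tn, fn), precision(tp, fp), recall(tp, fn))
```

The number of incorrectly classified instances used for parameter selection equals `fp + fn`, so minimizing it is equivalent to maximizing accuracy on the same evaluation set.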
 |   
 
  TUNING HYPERPARAMETERS OF CLASSIFICATION TECHNIQUES IN BREAST CANCER [final-year project] / Souhaila Serbout, Author. - [s.d.]. Languages: English (eng) | Categories: | Web Engineering and Mobile Computing