Title: | TUNING HYPERPARAMETERS OF CLASSIFICATION TECHNIQUES IN BREAST CANCER | Document type: | final-year project | Authors: | Souhaila Serbout, Author | Languages: | English (eng) | Categories: | Web Engineering and Mobile Computing
| Keywords: | Machine Learning, Classification, Breast Cancer, Grid search, PSO,
SMOTE. | Decimal index: | 1912/18 | Summary: | Abstract — In the context of classification, hyperparameter optimization is the problem
of choosing a set of hyperparameters for a learning algorithm, usually with the goal of
optimizing a measure of the algorithm’s performance on an independent data set. The
work presented in this document assesses this problem for four classification
models used for binary classification: the K-nearest Neighbors algorithm, Support Vector
Machines, the Multilayer Perceptron, and Decision Trees. Two search approaches have been applied:
Grid Search (GS) and Particle Swarm Optimization (PSO), alongside a third approach that
uses the default parameters of the Weka software.
The solution is based on a desktop Java application that, first, automatically
detects the problem of imbalanced data in databases and addresses it with an approach
designed for this purpose: the Synthetic Minority Oversampling Technique (SMOTE).
The tool's main functionality is to let users run several classification models
simultaneously and generate a table for evaluating the classifiers.
It also supports the GS and PSO methods; both are applied using the number of
incorrectly classified instances of the classification model as the metric for
selecting its optimal parameters.
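As an illustrative sketch only (the thesis tool tunes actual Weka classifiers), the search loop described above can be shown with a toy error function standing in for "train a model and count its misclassified instances on a validation split":

```java
// Illustrative sketch: a hypothetical stand-in error function replaces real
// Weka model training, so the grid-search loop itself is runnable.
public class GridSearchSketch {

    // Hypothetical stand-in for "train a KNN model with k neighbors and
    // count misclassified instances on a validation split".
    static int misclassified(int k) {
        return Math.abs(k - 5) * 3; // toy error surface with its minimum at k = 5
    }

    // Exhaustive grid search: return the candidate k with the fewest
    // classification errors, the selection metric named in the abstract.
    static int bestK(int[] candidates) {
        int best = candidates[0];
        int bestErr = Integer.MAX_VALUE;
        for (int k : candidates) {
            int err = misclassified(k);
            if (err < bestErr) {
                bestErr = err;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("best k = " + bestK(new int[] {1, 3, 5, 7, 9}));
    }
}
```

PSO would explore the same error surface with a population of moving particles instead of an exhaustive sweep, but the selection criterion (fewest misclassified instances) is identical.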
Four well-known breast cancer databases have been used to compare the three
approaches and evaluate model performance: the Breast Cancer Data Set, the Breast
Cancer Wisconsin (Original) Data Set, the Breast Cancer Wisconsin (Diagnostic)
Data Set, and the Breast Cancer Wisconsin (Prognostic) Data Set. The following
metrics are computed to measure performance: Accuracy, Precision, and Recall.
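For reference, the three reported metrics are simple functions of the binary confusion-matrix counts. A minimal sketch (method names are our own, not the tool's API):

```java
// Accuracy, precision, and recall from binary confusion-matrix counts:
// tp = true positives, tn = true negatives,
// fp = false positives, fn = false negatives.
public class EvalMetrics {
    static double accuracy(int tp, int tn, int fp, int fn) {
        return (double) (tp + tn) / (tp + tn + fp + fn);
    }
    static double precision(int tp, int fp) {
        return (double) tp / (tp + fp);
    }
    static double recall(int tp, int fn) {
        return (double) tp / (tp + fn);
    }
    public static void main(String[] args) {
        // e.g. 40 true positives, 50 true negatives, 5 of each error type
        System.out.printf("acc=%.2f prec=%.2f rec=%.2f%n",
                accuracy(40, 50, 5, 5), precision(40, 5), recall(40, 5));
    }
}
```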
|