Unbalanced data processing using oversampling: Machine Learning


Authors:
Viloria, Amelec
Pineda Lezama, Omar Bonerge
Mercado Caruso, Nohora Nubia
Resource type:
Journal article
Publication date:
2020
Institution:
Corporación Universidad de la Costa
Repository:
REDICUC - Repositorio CUC
Language:
eng
OAI Identifier:
oai:repositorio.cuc.edu.co:11323/7655
Online access:
https://hdl.handle.net/11323/7655
https://doi.org/10.1016/j.procs.2020.07.018
https://repositorio.cuc.edu.co/
Keywords:
Imbalance of classes
Microarray databases
Genetic expression
Deep learning techniques
Rights
openAccess
License
CC0 1.0 Universal
Description
Summary: Deep learning (DL) algorithms currently perform well on problems that share two characteristics: large amounts of data and high dimensionality. However, one of the main open challenges is the classification of high-dimensional databases with very few samples and high class imbalance. Biomedical gene expression microarray databases have exactly these characteristics: few samples, high dimensionality, and imbalanced classes. Class imbalance arises when the set of samples belonging to one class is much larger than the set of samples of the other class or classes, and it has been identified as one of the main challenges for algorithms applied in the context of Big Data. The objective of this research is to study gene expression databases using conventional undersampling and oversampling methods for class balancing: random undersampling (RUS), random oversampling (ROS), and SMOTE (Synthetic Minority Over-sampling Technique). The databases were modified in two ways: in one case by increasing their class imbalance, and in the other by adding artificial noise.
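The three resampling methods named in the summary can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names (`random_oversample`, `random_undersample`, `smote_like`), the toy two-feature data set, and the choice of `k=2` neighbours are all assumptions made for illustration, and the SMOTE variant is a simplified interpolation between nearby minority samples rather than a full reproduction of the published algorithm. In practice a maintained library such as imbalanced-learn would normally be used instead.

```python
import random
from collections import Counter


def _by_label(X, y, label):
    """Return the samples of X whose label equals `label`."""
    return [x for x, lab in zip(X, y) if lab == label]


def random_oversample(X, y, seed=0):
    """ROS: duplicate random samples of each smaller class up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = _by_label(X, y, label)
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        for x in rng.sample(_by_label(X, y, label), target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, k=2, seed=0):
    """SMOTE-style: create synthetic minority points by interpolating between
    a minority sample and one of its k nearest minority neighbours.
    Assumes the minority class has at least two samples."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    pool = _by_label(X, y, minority)
    X_out, y_out = list(X), list(y)
    for _ in range(target - counts[minority]):
        base = rng.choice(pool)
        others = [p for p in pool if p is not base]
        # Rank the remaining minority samples by squared Euclidean distance.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_out.append(minority)
    return X_out, y_out


# Toy data (made up): 8 majority samples (label 0), 3 minority samples (label 1).
X = [[0.1 * i, 0.2 * i] for i in range(8)] + [[2.0, 2.1], [2.2, 1.9], [1.9, 2.3]]
y = [0] * 8 + [1] * 3

X_ros, y_ros = random_oversample(X, y)
X_rus, y_rus = random_undersample(X, y)
X_sm, y_sm = smote_like(X, y, minority=1)

print("original:", Counter(y))
print("ROS:     ", Counter(y_ros))
print("RUS:     ", Counter(y_rus))
print("SMOTE:   ", Counter(y_sm))
```

ROS and SMOTE both grow the minority class to the size of the majority class, but ROS only repeats existing samples while the SMOTE-style function produces new points lying between existing minority samples. RUS instead discards majority samples, which costs information when, as in microarray data, samples are already scarce.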