Smart Data Augmentation using Generative Adversarial Networks for Rare Oncological Disease Classification

Authors

  • Rahul Vadisetty, Electrical Engineering, Wayne State University, Detroit, MI, USA
  • Himanshu Suyal, School of Computer Science Engineering and Technology, Bennett University, Greater Noida, India

DOI:

https://doi.org/10.63503/j.ijssic.2025.122

Keywords:

Generative Adversarial Networks, Synthetic Medical Data, Rare Oncological Diseases, Data Augmentation, Deep Learning, cGAN, WGAN, Classification

Abstract

Generative Adversarial Networks (GANs) have become a powerful tool for generating synthetic data, and they help address the scarcity and imbalance of real data in healthcare applications. Annotated data suitable for training machine learning models on rare oncological diseases is rarely available. This paper describes a GAN-based framework that uses conditional GANs (cGANs) and Wasserstein GANs (WGANs) to extend existing datasets and improve classification outcomes for rare diseases. The framework combines extensive preprocessing, noise injection to avoid overfitting, and post-synthesis validation procedures that preserve biological consistency and statistical coherence. In the reported experiments, classifiers trained on augmented data achieve substantially better sensitivity, specificity, and F1 scores than the baseline models when the classes are significantly imbalanced. Data realism is measured with heatmap correlation analysis and distributional assessments between synthetic and real samples, within a modular framework that couples adversarial training with strict validation of the synthetic data used for augmentation. The outcomes support the idea that GAN-generated datasets offer a promising way to build more robust diagnostic models, thereby addressing the data shortage that is widespread in oncology research. This research broadens the use of GANs in synthesising medical data and adds to the growing toolkit of computational, data-driven approaches for strengthening the early detection and categorisation of rare cancers.
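The two evaluation ideas named in the abstract, classifier metrics under class imbalance and correlation-based comparison of real and synthetic samples, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`classification_metrics`, `correlation_gap`), the column-wise data layout, and the use of a mean absolute difference between Pearson correlation matrices as a realism score are all assumptions made for illustration.

```python
from math import sqrt


def classification_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and F1, treating `positive` as the rare class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_matrix(columns):
    """Feature-by-feature correlation matrix; `columns` is a list of feature columns."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


def correlation_gap(real_columns, synthetic_columns):
    """Mean absolute difference between the real and synthetic correlation matrices.

    Assumed realism score for illustration: 0.0 means the synthetic data
    reproduces the pairwise feature correlations of the real data exactly.
    """
    real = correlation_matrix(real_columns)
    synth = correlation_matrix(synthetic_columns)
    k = len(real)
    total = sum(abs(real[i][j] - synth[i][j]) for i in range(k) for j in range(k))
    return total / (k * k)


if __name__ == "__main__":
    # Toy labels: class 1 is the rare class.
    print(classification_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
    # Toy feature columns: the synthetic set flips the sign of one correlation.
    real = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
    synthetic = [[1.0, 2.0, 3.0], [6.0, 4.0, 2.0]]
    print(correlation_gap(real, synthetic))
```

A smaller `correlation_gap` indicates that the synthetic samples preserve more of the pairwise feature structure of the real data; the two correlation matrices it compares are the same quantities a correlation heatmap would display.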

Published

2025-05-23

How to Cite

Rahul Vadisetty, & Himanshu Suyal. (2025). Smart Data Augmentation using Generative Adversarial Networks for Rare Oncological Disease Classification. International Journal on Smart & Sustainable Intelligent Computing, 2(2), 65–79. https://doi.org/10.63503/j.ijssic.2025.122

Section

Research Articles