Breathomics Meets Deep Learning: A Novel Approach for Early Detection of Lung Cancer

6 October 2025

A. Catino, A. Di Gilio, J. Palmisani, M. Nisi, V. Pizzillo, T. Guarino, S. Pisconti, L. Bellantuono, R. Bellotti, D. Diacono, R. Tommasi, A. Lo Sasso, N. Varesano, P. Petrillo, E. Serra, M. Gesualdo, E.S. Montagna, M. Montrone, V. Longo, F. Pesola, P. Pizzutilo, I. Marech, G. De Gennaro, D. Galetta

IRCCS Istituto Tumori “Giovanni Paolo II”, Bari/IT
Apulian Regional Centre for the Breath Analysis, Department of Biosciences, Biotechnology and Environment, University of Bari, Bari/IT
Oncology Unit, Ospedale San Giuseppe Moscati, Taranto/IT
Department of Translational Biomedicine and Neuroscience (DiBraiN), University of Bari, Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari/IT
Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Dipartimento Interateneo di Fisica, University of Bari, Bari/IT
Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari/IT
Department of Translational Biomedicine and Neuroscience (DiBraiN), University of Bari, Bari/IT

Introduction
Lung cancer (LC) is one of the most common malignancies worldwide and the leading cause of cancer-related death. Tobacco smoking is the main aetiological factor in lung carcinogenesis, but long-term exposure to air pollution and genetic susceptibility may also play a role. In recent decades, the diagnosis and treatment of LC have improved considerably, but its clinical management remains challenging because most cases are diagnosed at advanced stages. There is therefore a growing need for innovative, easy-to-use, and non-invasive diagnostic methods that can be implemented within large-scale screening protocols to enable early detection and improve clinical management and outcomes. Recently, the analysis of volatile organic compounds (VOCs) in exhaled breath, known as breathomics, has shown promise for identifying disease-specific metabolic signatures. In this study, we analysed breath samples and used deep learning to detect and characterize the breath fingerprint of patients with lung cancer.
Methods
A total of 195 participants were recruited: 114 patients with LC and 81 healthy controls (HC). End-tidal breath was sampled directly onto two-bed adsorbent cartridges (Biomonitoring steel tubes, Markes International) using the automated sampler Mistral (Predict srl). Ambient air (AA) samples were collected at each sampling session. Breath samples were thermally desorbed (Unity Ultra-xr, Markes) and analysed by gas chromatography/mass spectrometry (GC Agilent 7890/MS Agilent 5975). Ion channels were acquired across a mass-to-charge (m/z) range of 35 to 250 with unit mass resolution, i.e. 216 channels. The instrument operated at a scanning frequency of approximately 3.31 Hz, yielding about 8805 retention time (RT) points over a 44-minute run. Each sample was therefore represented by a matrix of dimension R × 216, where R ≈ 8805. All temporal scans were preprocessed using Gaussian smoothing and Savitzky-Golay filtering to increase the signal-to-noise ratio. To reduce the large volume of experimental data, the preprocessed R × 216 matrix of each subject was converted into a 2D image in which each pixel corresponds to an (m/z, RT) pair and its color encodes the intensity of the chromatographic signal. These 2D images were then used as input to an artificial intelligence workflow to train a classifier that distinguishes patients with LC from healthy controls.
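The data dimensions and preprocessing steps described above can be sketched in Python with NumPy and SciPy. This is a minimal sketch under stated assumptions, not the authors' pipeline: the smoothing parameters (`sigma`, `window`, `polyorder`), the linear 8-bit intensity scaling, and the function names `preprocess` and `to_image` are hypothetical. It also checks the arithmetic: m/z 35 to 250 at unit resolution gives 216 channels, and 3.31 Hz over 44 minutes gives about 8738 scans, in line with the reported R ≈ 8805 given that the scanning frequency is approximate.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import savgol_filter

# Acquisition parameters reported in the abstract.
MZ_MIN, MZ_MAX = 35, 250   # unit mass resolution
SCAN_HZ = 3.31             # approximate scanning frequency (Hz)
RUN_MIN = 44               # run length (minutes)

N_MZ = MZ_MAX - MZ_MIN + 1                 # 216 ion channels
N_RT_EST = round(SCAN_HZ * RUN_MIN * 60)   # 8738 scans; the abstract reports ~8805


def preprocess(matrix, sigma=2.0, window=11, polyorder=3):
    """Smooth every m/z channel along the RT axis.

    matrix: array of shape (R, 216), rows = RT scans, columns = m/z channels.
    sigma, window and polyorder are hypothetical; the abstract does not report them.
    """
    smoothed = gaussian_filter1d(np.asarray(matrix, dtype=float), sigma=sigma, axis=0)
    return savgol_filter(smoothed, window_length=window, polyorder=polyorder, axis=0)


def to_image(matrix):
    """Map an (R, 216) intensity matrix to an 8-bit grayscale image.

    Output rows are m/z channels and columns are RT points, so each pixel is one
    (m/z, RT) pair. Linear min-max scaling to 0..255 is an assumption.
    """
    lo, hi = matrix.min(), matrix.max()
    span = hi - lo
    scaled = (matrix - lo) / span if span > 0 else np.zeros_like(matrix)
    return np.round(scaled * 255).astype(np.uint8).T


# Usage on synthetic data: white noise plus one Gaussian peak in a single channel.
rng = np.random.default_rng(0)
raw = rng.normal(0.0, 1.0, size=(N_RT_EST, N_MZ))
rt = np.arange(N_RT_EST)
raw[:, 10] += 50.0 * np.exp(-0.5 * ((rt - 4000) / 15.0) ** 2)

clean = preprocess(raw)
img = to_image(clean)
print(img.shape)  # (216, 8738)
```

Here Gaussian smoothing is applied before Savitzky-Golay filtering; the abstract lists both filters but does not state their order or parameters, so the actual pipeline may differ.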
Results
A preliminary classification of the images was performed with Decision Tree, Random Forest, and Convolutional Neural Network (CNN) algorithms, using 5-fold cross-validation repeated 10 times. The best results were achieved with CNNs, with both mean AUC and mean balanced accuracy above 0.90, within uncertainties, over the 100 iterations of the workflow.
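The evaluation protocol can be sketched in NumPy. This is a minimal sketch under stated assumptions: a nearest-centroid scorer (hypothetical) stands in for the Decision Tree, Random Forest and CNN models, all helper names are illustrative, and the abstract's "10-fold repeated 5-fold" wording is read as 10 repetitions of a stratified 5-fold split. Balanced accuracy is the mean of sensitivity and specificity, and AUC is computed as the Mann-Whitney probability that a randomly chosen LC sample scores higher than a randomly chosen control.

```python
import numpy as np


def stratified_folds(y, k, rng):
    """Return k disjoint test-index arrays, each preserving the class ratio."""
    folds = [[] for _ in range(k)]
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        for i, chunk in enumerate(np.array_split(idx, k)):
            folds[i].extend(chunk.tolist())
    return [np.array(sorted(f)) for f in folds]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on LC = 1) and specificity (recall on HC = 0)."""
    sens = np.mean(y_pred[y_true == 1] == 1)
    spec = np.mean(y_pred[y_true == 0] == 0)
    return 0.5 * (sens + spec)


def roc_auc(y_true, scores):
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    pos = scores[y_true == 1][:, None]
    neg = scores[y_true == 0][None, :]
    return np.mean((pos > neg) + 0.5 * (pos == neg))


def nearest_centroid_scores(x_train, y_train, x_test):
    """Hypothetical stand-in classifier: score > 0 means closer to the LC centroid."""
    c0 = x_train[y_train == 0].mean(axis=0)
    c1 = x_train[y_train == 1].mean(axis=0)
    return np.linalg.norm(x_test - c0, axis=1) - np.linalg.norm(x_test - c1, axis=1)


def repeated_cv(x, y, k=5, repeats=10, seed=0):
    """Collect per-fold AUC and balanced accuracy over repeated stratified k-fold CV."""
    rng = np.random.default_rng(seed)
    aucs, baccs = [], []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            train = np.ones(len(y), dtype=bool)
            train[test_idx] = False
            s = nearest_centroid_scores(x[train], y[train], x[test_idx])
            aucs.append(roc_auc(y[test_idx], s))
            baccs.append(balanced_accuracy(y[test_idx], (s > 0).astype(int)))
    return np.array(aucs), np.array(baccs)


# Usage: class sizes match the study (114 LC, 81 HC); the features are synthetic.
gen = np.random.default_rng(1)
y = np.array([1] * 114 + [0] * 81)
x = gen.normal(size=(195, 50)) + 1.5 * y[:, None]
aucs, baccs = repeated_cv(x, y)
print(len(aucs), aucs.mean(), baccs.mean())
```

Stratifying each fold keeps the LC/HC ratio close to 114:81 in every test split, so both classes are always present and AUC and balanced accuracy are defined for every fold. In the study the flattened or raw 2D images would replace the synthetic `x`, and the CNN would replace the stand-in scorer.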
Conclusions
The proposed models outperformed conventional expert-driven methodologies, enabling more accurate and efficient discrimination between healthy individuals and patients with lung cancer while maintaining high specificity. These findings highlight the potential of this data-driven approach to advance breath-based diagnostic strategies by improving group differentiation, optimizing analytical processes, and facilitating scalable, cost-effective implementation in clinical practice.
