Exploring volatile organic compounds in breath for high-accuracy prediction of lung cancer

Ping Hsien Tsou, Zong Lin Lin, Yu Chiang Pan, Hui Chen Yang, Chien Jen Chang, Sheng Kai Liang, Yueh Feng Wen, Chia Hao Chang, Lih Yu Chang, Kai Lun Yu, Chia Jung Liu, Li Ta Keng, Meng Rui Lee, Jen Chung Ko*, Guan Hua Huang, Yaw-Kuen Li

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

30 Scopus citations


(1) Background: Lung cancer is silent in its early stages and fatal in its advanced stages. Current examinations for lung cancer are usually based on imaging. Conventional chest X-rays lack accuracy, and chest computed tomography (CT) is associated with radiation exposure and cost, limiting screening effectiveness. Breathomics, a noninvasive strategy, has recently been studied extensively. Volatile organic compounds (VOCs) derived from human breath can reflect metabolic changes caused by diseases and possibly serve as biomarkers of lung cancer. (2) Methods: Selected ion flow tube mass spectrometry (SIFT-MS) was used to quantitatively analyze 116 VOCs in breath samples from 148 patients with histologically confirmed lung cancer and 168 healthy volunteers. We used eXtreme Gradient Boosting (XGBoost), a machine learning method, to build a model for predicting lung cancer occurrence based on quantitative VOC measurements. (3) Results: The proposed prediction model outperformed previously reported approaches, with an accuracy, sensitivity, specificity, and area under the curve (AUC) of 0.89, 0.82, 0.94, and 0.95, respectively. When we further adjusted for the confounding effect of environmental VOCs on the relationship between participants’ exhaled VOCs and lung cancer occurrence, the model improved to 0.92 accuracy, 0.96 sensitivity, 0.88 specificity, and 0.98 AUC. (4) Conclusion: A quantitative VOC databank integrated with an XGBoost classifier provides a persuasive platform for lung cancer prediction.
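The four metrics reported in the Results can be illustrated with a minimal sketch. It assumes binary labels (1 = lung cancer, 0 = healthy) and a classifier that outputs a score per participant. The labels, scores, 0.5 threshold, and function names below are invented for illustration and are not study data or the authors' code; training the XGBoost model itself is not shown.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancer, predicted cancer
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, predicted healthy
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, predicted cancer
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancer, predicted healthy
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 patients and 4 healthy volunteers (not from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Assumed decision threshold of 0.5 turns scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC is computed from the scores alone, which is why the four figures in the abstract can move in different directions after the environmental-VOC adjustment.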

Original language: English
Article number: 1431
Pages (from-to): 1-14
Number of pages: 14
Issue number: 6
State: Published - 2 Mar 2021


  • Breath analysis
  • Lung cancer
  • Machine learning
  • Volatile organic compounds
  • XGBoost

