Morphological Profiling of Lung Cancer Through Explainable Machine Learning

Karuna Tanthanawarakun
Prompong Sugunnasil

Abstract

Lung cancer remains the leading cause of cancer-related mortality worldwide. According to the WHO, it represents 11.6% of all cancer cases, with approximately 2.2 million new diagnoses and 1.8 million deaths recorded in 2020. Computed Tomography (CT) imaging is a critical tool for lung cancer diagnosis because it can identify both solid and subsolid ("ground glass") nodules. Although CT image segmentation has demonstrated significant clinical value, healthcare practitioners need a thorough understanding of the underlying algorithmic mechanisms to ensure diagnostic precision. This research investigates interpretable machine learning methods for feature extraction from lung CT imaging data. We compare transparent and opaque classification algorithms on a dataset of 1,229 normal and 1,010 abnormal pulmonary CT scans. The processed data are evaluated with both interpretable models (including Logistic Regression, Decision Trees, and K-Nearest Neighbors) and black-box models (such as Multi-Layer Perceptrons, Convolutional Neural Networks, and Support Vector Machines). Our findings indicate that the interpretable algorithms consistently outperform their black-box counterparts across multiple metrics. The evaluation framework covers accuracy, F1 score, precision, recall, computational efficiency, and resource utilization. The results demonstrate high classification accuracy for pulmonary malignancy detection while preserving explainability, giving clinicians diagnostic assistance that is both practical and transparent. This work contributes to the development of accountable artificial intelligence systems for deployment in mission-critical healthcare environments.
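
The classification metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the names `Report`, `evaluate`, and `threshold_rule`, and the six toy samples, are hypothetical and chosen only for illustration. Prediction time stands in for computational efficiency here; resource utilization is not measured.

```python
from dataclasses import dataclass
from time import perf_counter
from typing import Callable, Sequence


@dataclass
class Report:
    accuracy: float
    precision: float
    recall: float
    f1: float
    seconds: float  # wall-clock time spent predicting


def evaluate(predict: Callable[[Sequence[float]], int],
             samples: Sequence[Sequence[float]],
             labels: Sequence[int]) -> Report:
    """Score a binary classifier (1 = abnormal scan, 0 = normal scan)."""
    if not labels or len(samples) != len(labels):
        raise ValueError("samples and labels must be non-empty and equal in length")

    start = perf_counter()
    predictions = [predict(x) for x in samples]
    seconds = perf_counter() - start

    # Confusion-matrix counts for the positive (abnormal) class.
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)

    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return Report(accuracy, precision, recall, f1, seconds)


# Hypothetical one-feature "model": flag a scan when its feature exceeds 0.5.
def threshold_rule(x: Sequence[float]) -> int:
    return 1 if x[0] > 0.5 else 0


# Toy data, not taken from the study's dataset.
samples = [[0.9], [0.8], [0.4], [0.2], [0.7], [0.1]]
labels = [1, 1, 1, 0, 0, 0]
report = evaluate(threshold_rule, samples, labels)
print(report)
```

Any model that exposes a per-sample prediction function, interpretable or black-box, could be passed to `evaluate` in the same way, which keeps the comparison on identical metrics.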

Article Details

Section
Academic Articles
