Cancer is a leading cause of death worldwide. The World Health Organization (WHO) reported approximately 10 million cancer deaths in 2020, and the number is expected to increase in the coming years. This paper studies the prediction of cancer epitopes using machine learning, classifying epitopes on the surface of cancer cells against epitopes on the surface of healthy cells, and compares several machine learning algorithms. The work can support training T cells to recognize cancer cells and release enzymes that kill them (targeted therapy). The experimental results show that, on the imbalanced data, the Support Vector Machine (SVM) model built on the Dipeptide Composition (DPC) feature performed best, with an accuracy of 79%, a sensitivity of 16%, and a specificity of 100% on the test dataset. On data balanced with SMOTE, the Random Forest (RF) model built on the DPC feature performed best, with an accuracy of 80%, a sensitivity of 28%, and a specificity of 96% on the same test dataset. In conclusion, SVM and RF models built on the DPC feature can be used to predict cancer epitopes on the imbalanced and the balanced dataset, respectively.
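The abstract does not spell out how the DPC feature or the reported metrics are computed. The following is a minimal sketch under common conventions, not the authors' implementation. It assumes that DPC is the percentage of each of the 400 ordered pairs of the 20 standard amino acids among all adjacent residue pairs in a peptide, and that the epitope class is labeled 1. The function names `dipeptide_composition` and `confusion_metrics` and the toy labels are hypothetical.

```python
from itertools import product

# The 20 standard amino acids; their ordered pairs give 400 dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional DPC vector (percentages) for one peptide.

    Assumption: pairs containing a non-standard residue are skipped and
    the counts are normalized by the number of valid adjacent pairs.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical toy peptide, not taken from the study's dataset.
    features = dipeptide_composition("ACAC")
    print(len(features), features[DIPEPTIDES.index("AC")])

    # Hypothetical imbalanced labels: predicting only the majority class
    # keeps accuracy and specificity high while sensitivity collapses.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [0] * 10
    print(confusion_metrics(y_true, y_pred))
```

The toy labels illustrate why accuracy alone can mislead on imbalanced data: a classifier that rarely predicts the minority class can still score high accuracy and specificity, which is consistent with the pattern of high specificity and low sensitivity reported above.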
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.