The Optimal Parameters of Spline Regression for SNP-Set Analysis in Genome-Wide Association Study

Sirikanlaya Sookkhee; Pianpool Kirdwichai; Fazil Baksh

Published: Mar 16, 2021

Keywords:

Sequence kernel association test, Generalized higher criticism, Permutation test, Spline regression, B-spline, GWAS

Sirikanlaya Sookkhee

Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand

Pianpool Kirdwichai

Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok 10800, Thailand

Fazil Baksh

Department of Mathematics and Statistics, School of Mathematics, University of Reading RG6 6AH, UK

Abstract

This research aims to develop a capable and reliable method, based on spline regression, for identifying significant regions in a Genome-Wide Association Study (GWAS). We determine the optimal spline parameters by smoothing and tuning the p-values obtained from two SNP-set tests: the Sequence Kernel Association Test with normal weights (SKAT normal weight) and Generalized Higher Criticism (GHC). False positive (FP) and true positive (TP) rates were evaluated under different genetic models for disease, with significance thresholds adjusted for multiple hypothesis testing by a permutation method. The simulated data were constructed from a control data set in a study of Crohn’s disease, with 1,500 replicates of studies comprising 3,000 cases and 3,000 controls. The simulation results show that, under the one-disease-SNP model, the optimal spline parameter for the p-values of both SKAT normal weight and GHC is 1,000 degrees of freedom. GHC is preferable in terms of FP and TP rates, but it carries a heavier computational burden than SKAT. Finally, the optimal parameter for both methods was applied to real data on Crohn’s disease. Both methods identified important regions of the NOD2 gene, which is strongly associated with the development of Crohn’s disease.
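The smoothing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "degrees of freedom" means the number of B-spline basis functions, that interior knots sit at quantiles of the position index, and that smoothing is applied to -log10(p). The function name `smooth_neg_log_p`, the simulated p-values, and the small value `df=40` (rather than the 1,000 reported above) are hypothetical choices made only for illustration.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline


def smooth_neg_log_p(positions, p_values, df, degree=3):
    """Least-squares B-spline fit to -log10(p) along an ordered index.

    Assumption: df counts the B-spline basis functions, so the number of
    interior knots is df - degree - 1. The paper may define df differently.
    """
    x = np.asarray(positions, dtype=float)
    # Floor p-values to avoid taking the log of an exact zero.
    p = np.clip(np.asarray(p_values, dtype=float), 1e-300, 1.0)
    y = -np.log10(p)

    n_interior = df - degree - 1
    if n_interior < 0:
        raise ValueError("df must be at least degree + 1")

    # Interior knots at evenly spaced quantiles of x (endpoints excluded).
    probs = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.quantile(x, probs)

    spline = LSQUnivariateSpline(x, y, knots, k=degree)
    return spline(x)


# Hypothetical data: 2,000 ordered SNP-sets with null p-values and one
# stretch (indices 900-1099) carrying a much stronger signal.
rng = np.random.default_rng(0)
n_sets = 2000
positions = np.arange(n_sets)
p_values = rng.uniform(size=n_sets)
p_values[900:1100] *= 1e-4

smoothed = smooth_neg_log_p(positions, p_values, df=40)
peak = int(np.argmax(smoothed))  # index of the strongest smoothed signal
```

A larger `df` lets the fitted curve follow local peaks more closely at the cost of retaining more noise, which is the trade-off that tuning this parameter is meant to address. In the study, significance thresholds for the smoothed values come from a permutation method, which this sketch does not reproduce.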

How to Cite

Sookkhee, S., Kirdwichai, P., & Baksh, F. (2021). The Optimal Parameters of Spline Regression for SNP-Set Analysis in Genome-Wide Association Study. Science & Technology Asia, 26(1), 39–52. Retrieved from https://ph02.tci-thaijo.org/index.php/SciTechAsia/article/view/240351

Issue

Vol.26 No.1 (January-March 2021)

Section

Physical sciences
