The Optimal Parameters of Spline Regression for SNP-Set Analysis in Genome-Wide Association Study
Keywords: Sequence kernel association test, Generalized higher criticism, Permutation test, Spline regression, B-spline, GWAS
This research aims to develop a reliable method, based on spline regression, for identifying significant regions in genome-wide association studies (GWAS). We determine the optimal spline parameters by smoothing and tuning the p-values obtained from two SNP-set tests: the Sequence Kernel Association Test with normal weights (SKAT normal weight) and Generalized Higher Criticism (GHC). False positive (FP) and true positive (TP) rates were evaluated under different genetic disease models, with significance thresholds adjusted for multiple hypothesis testing by a permutation method. The simulated data were constructed from the control data set of a Crohn’s disease study, with 1,500 replicates, each comprising 3,000 cases and 3,000 controls. The simulation results show that, under the one-disease-SNP model, the optimal spline parameter for the p-values of both SKAT normal weight and GHC is 1,000 degrees of freedom. GHC is preferable in terms of FP and TP rates, but it is at a disadvantage to SKAT in terms of computation time. Finally, the optimal parameter for both methods was applied to real Crohn’s disease data. Both methods identified important regions of the gene NOD2, which is strongly associated with the development of Crohn’s disease.
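As an illustration of the permutation-based threshold adjustment mentioned above, the following is a minimal sketch of one common approach, the minimum-p-value permutation threshold. The abstract does not specify the exact procedure used in the study, so this particular scheme is an assumption, as are the function name `min_p_threshold` and the input layout (one list of SNP-set p-values per permutation of the case/control labels).

```python
import math


def min_p_threshold(perm_pvalues, alpha=0.05):
    """Genome-wide significance threshold from permutation p-values.

    perm_pvalues: list of lists (hypothetical layout). Each inner list holds
    the p-values of all SNP-sets computed on one permutation of the
    case/control labels.
    Returns the largest observed minimum p-value t such that at most a
    fraction alpha of permutations have a minimum p-value <= t
    (assuming no ties among the minima).
    """
    if not perm_pvalues:
        raise ValueError("need at least one permutation")
    # Smallest p-value across the genome within each permutation, ascending.
    minima = sorted(min(pvals) for pvals in perm_pvalues)
    # Number of permutations allowed to fall at or below the threshold.
    # The small tolerance guards against floating-point error in alpha * B.
    k = math.floor(alpha * len(minima) + 1e-9)
    if k == 0:
        raise ValueError("too few permutations for the requested alpha")
    return minima[k - 1]


# Toy input: 20 permutations whose minimum p-values are 0.01, 0.02, ..., 0.20.
toy = [[i / 100, 0.9, 0.5] for i in range(1, 21)]
threshold = min_p_threshold(toy, alpha=0.1)
```

In this sketch, a SNP-set in the observed data would be declared significant when its p-value (raw or spline-smoothed) is at or below `threshold`. In practice the number of permutations would be far larger than in the toy input.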