Using Simple Statistics to Compare Genetic Sequences

  • Hossam Farag Abou-Shaara
Keywords: Parametric, non-parametric, significant, bioinformatics, phylogenetic


The aim of this study was to compare different genetic sequences using simple statistics. First, sequences of four viruses were downloaded from the National Center for Biotechnology Information (NCBI). These sequences were then arranged in Excel sheets as numbers and subjected to statistical analysis using parametric and non-parametric tests. The results were compared with those obtained by phylogenetic analysis and gene cluster analysis of the same viruses. The results of the statistical analysis, from ANOVA and the Kruskal-Wallis test, were similar to the phylogenetic relationships and shared gene clusters. Additional information could be obtained from the sequences using simple statistics, with either parametric or non-parametric tests. The results of this study could help software developers and bioinformatics specialists develop simple analytical methods for extracting information from sequences.
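As an illustration only, the workflow summarized above (sequences coded as numbers, then compared with a parametric and a non-parametric test) could be sketched in Python as follows. The numeric coding A=1, C=2, G=3, T=4 and the function names `encode`, `anova_f`, and `kruskal_h` are assumptions made for this sketch; the abstract does not state which coding or software routines were used. The functions return only the test statistics (the one-way ANOVA F value and the tie-corrected Kruskal-Wallis H value), not p-values.

```python
from itertools import chain

# Hypothetical coding: the abstract does not specify the numbers used.
NUCLEOTIDE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def encode(sequence):
    """Map a nucleotide string to numbers, skipping symbols outside A/C/G/T."""
    return [NUCLEOTIDE_CODES[b] for b in sequence.upper() if b in NUCLEOTIDE_CODES]


def anova_f(groups):
    """One-way ANOVA F statistic for a list of numeric groups."""
    values = list(chain.from_iterable(groups))
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def kruskal_h(groups):
    """Kruskal-Wallis H statistic using average ranks and a tie correction."""
    values = sorted(chain.from_iterable(groups))
    n_total = len(values)
    rank_of, tie_term, i = {}, 0, 0
    while i < n_total:
        j = i
        while j < n_total and values[j] == values[i]:
            j += 1
        # Tied values occupy ranks i+1..j and share their mean rank.
        rank_of[values[i]] = (i + 1 + j) / 2
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    rank_sum_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_sum_term - 3 * (n_total + 1)
    return h / (1 - tie_term / (n_total ** 3 - n_total))


if __name__ == "__main__":
    # Toy strings for demonstration, not real viral sequences.
    coded = [encode(s) for s in ("ACGTACGTAA", "ACGTTCGTAC", "GGGTTTGGTT")]
    print("F =", anova_f(coded))
    print("H =", kruskal_h(coded))
```

Because coded sequences contain only four distinct values, ties are heavy and the tie correction in `kruskal_h` matters. Both functions divide by zero when every value is identical, so degenerate inputs would need separate handling.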


