A Comparison of Hadoop Distributions – Cluster Installation and Management Aspects


Araya Florence
Thanisa Numnonda


Big data is one of the most promising technologies, working alongside cloud computing and the Internet of Everything. Every second, data generated by billions of devices is sent to the cloud, where it is analysed and may be used for prediction or prevention in various applications. A big data platform is the foundation of any implementation, providing an ecosystem in which data can be imported, processed and exported. This article compares three platforms, Apache Hadoop, Cloudera (Express) and Hortonworks, in terms of stability, installation and cluster management. Apache Spark was chosen to test processing on all three distributions, since it is reported to be ten times faster than Hive and 100 times faster than MapReduce. HiBench was used as the testing benchmark, and its results were previously reported in [1]. In terms of cluster management, the commercial distributions are more likely to offer better tools for installation and cluster management, while Apache Hadoop is robust but lacks manageability.



Research Article


