Distance Measures for Mixed Data with Application in Cluster Analysis

Authors

  • พิชญา บุตรขุนทอง
  • อัครินทร์ ไพบูลย์พานิช

Keywords:

quantitative variable, nominal variable, ordinal variable, mixed data, distance

Abstract

This study compares the performance of cluster analysis using the Partitioning Around Medoids (PAM) algorithm on mixed data containing numerical, nominal, and ordinal variables under four distance measures: the Kaufman and Rousseeuw distance (KR) and the Podani distance (P), both adapted from Gower’s similarity coefficient, and two newly proposed measures, one combining KR with the Noorbehbahani et al. distance (KR&N) and the other combining P with the Noorbehbahani et al. distance (P&N). Mixed data were simulated with equal and unequal frequencies for the nominal and ordinal variables. For unequal-frequency data, clustering with the KR&N distance gives better results. For equal-frequency data, the four distances perform similarly.
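To make the idea of a Gower-type distance for mixed data concrete, the following is a minimal sketch of a Kaufman and Rousseeuw style dissimilarity between two records. It is an illustration under stated assumptions, not the authors' implementation: the function name `gower_kr_distance`, its parameters, and the example records are hypothetical. Numeric differences are divided by the variable's range, nominal variables use simple matching, and ordinal variables are coded as ranks 1..M and scaled by M - 1 (which assumes both extreme ranks occur in the data). The Podani, KR&N, and P&N variants studied in the paper are not reproduced here.

```python
def gower_kr_distance(a, b, kinds, ranges, levels):
    """Average per-variable dissimilarity between records a and b.

    kinds  : "numeric", "nominal", or "ordinal" for each variable
    ranges : observed range (max - min) for numeric variables, else None
    levels : number of ordered categories M for ordinal variables, else None
    All names here are hypothetical and chosen for illustration only.
    """
    total = 0.0
    for x, y, kind, rng, m in zip(a, b, kinds, ranges, levels):
        if kind == "numeric":
            # Range-normalized absolute difference, in [0, 1]
            total += abs(x - y) / rng
        elif kind == "nominal":
            # Simple matching: 0 if the categories agree, 1 otherwise
            total += 0.0 if x == y else 1.0
        elif kind == "ordinal":
            # Ranks 1..M treated as interval-scaled after scaling by M - 1
            total += abs(x - y) / (m - 1)
        else:
            raise ValueError(f"unknown variable kind: {kind}")
    return total / len(kinds)


# Hypothetical example: one numeric, one nominal, one ordinal variable
kinds = ("numeric", "nominal", "ordinal")
ranges = (40.0, None, None)   # assumed observed range of the numeric variable
levels = (None, None, 5)      # assumed five ordered categories

rec_a = (10.0, "red", 1)
rec_b = (20.0, "blue", 3)
d_ab = gower_kr_distance(rec_a, rec_b, kinds, ranges, levels)
```

A matrix of such pairwise distances is the kind of input a medoid-based method such as PAM works from, since PAM needs only dissimilarities between observations rather than coordinates.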

Author Biographies

พิชญา บุตรขุนทอง

Department of Statistics, Faculty of Commerce and Accountancy, Chulalongkorn University

อัครินทร์ ไพบูลย์พานิช

Department of Statistics, Faculty of Commerce and Accountancy, Chulalongkorn University

References

Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster Analysis (5th ed.). London: John Wiley & Sons, Ltd.

Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27(4), 857-871.

Kaufman, L. & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. USA: A Wiley-Interscience Publication.

Madhulatha, T. S. (2011). Comparison between K-Means and K-Medoids Clustering Algorithms. Communications in Computer and Information Science, 198, 472-481.

Noorbehbahani, F., Mousavi, S. R., & Mirzaei, A. (2014). An incremental mixed data clustering method using a new distance measure. Springer-Verlag Berlin Heidelberg.

Podani, J. (1999). Extending Gower’s general coefficient of similarity to ordinal characters. International Association for Plant Taxonomy, 48(2), 331-340.

Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65.

Published

2019-01-03

How to Cite

บุตรขุนทอง พ., & ไพบูลย์พานิช อ. (2019). Distance Measures for Mixed Data with Application in Cluster Analysis. Journal of Applied Statistics and Information Technology, 1(1), 31–45. Retrieved from https://ph02.tci-thaijo.org/index.php/asit-journal/article/view/164670