Title | K-Means Clustering on Multiple Correspondence Analysis Coordinates |
Authors | Phan, Le; Liu, Hongzhe; Tortora, Cristina |
Year | 2019 |
Volume | Archives of Data Science, Series B 1(1) / 2019 |
Abstract | On April 18, 2017, the International Federation of Classification Societies (IFCS) issued a challenge to its members and the classification community to analyze a data set of 928 low back pain patients. In this paper, we present our contribution in terms of a cluster analysis of this data set. We will discuss our data cleaning process, which we view as a two-pronged approach: inferring values that are missing not at random and imputing values that are missing at random. We will also discuss the challenges in clustering mixed data types and the required data transformation prior to applying a clustering algorithm. We call our proposed data transformation process split-then-join. Finally, we offer our interpretation of the clustering results with respect to validation variables and we present some thoughts on selecting important variables to classify new observations. |