Friday, August 2, 2013

KMeans on Categorical and Mixed Data Types

Below is a link to an article on running KMeans on the Australian Credit dataset. The distance measure is a mixture of cosine distance and Euclidean distance, combined with customizable weights, and KMeans uses this combined distance to find clusters. The Australian set has 2 classes, marked + and -, and our success rate on this set was 82%. The article is not written in English, but the code may still be helpful to someone. Data is here.
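For readers who cannot follow the linked article, here is a minimal sketch of the general idea, not the article's actual code. It assumes (this is my assumption, not something stated above) that Euclidean distance is applied to the numeric columns and cosine distance to one-hot encoded categorical columns, and that the two are added with weights. The names `mixed_distance`, `kmeans_mixed`, `two_cluster_accuracy`, `w_num` and `w_cat` are hypothetical, and the toy data is made up for illustration. Centroids are updated with a plain mean, which is a common heuristic rather than an exact minimizer for the cosine part.

```python
import numpy as np

def mixed_distance(X_num, X_cat, C_num, C_cat, w_num=1.0, w_cat=1.0):
    """Weighted sum of Euclidean (numeric) and cosine (one-hot) distances.

    Returns an array of shape (n_samples, n_clusters).
    """
    # Euclidean distance between every sample and every centroid.
    d_euc = np.linalg.norm(X_num[:, None, :] - C_num[None, :, :], axis=2)
    # Cosine distance = 1 - cosine similarity; eps guards against zero rows.
    eps = 1e-12
    Xn = X_cat / (np.linalg.norm(X_cat, axis=1, keepdims=True) + eps)
    Cn = C_cat / (np.linalg.norm(C_cat, axis=1, keepdims=True) + eps)
    d_cos = 1.0 - Xn @ Cn.T
    return w_num * d_euc + w_cat * d_cos

def kmeans_mixed(X_num, X_cat, k=2, w_num=1.0, w_cat=1.0, n_iter=50, seed=0):
    """KMeans-style loop using the mixed distance above (illustrative only)."""
    X_num = np.asarray(X_num, dtype=float)
    X_cat = np.asarray(X_cat, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct samples chosen at random.
    idx = rng.choice(len(X_num), size=k, replace=False)
    C_num, C_cat = X_num[idx].copy(), X_cat[idx].copy()
    labels = np.full(len(X_num), -1)
    for _ in range(n_iter):
        d = mixed_distance(X_num, X_cat, C_num, C_cat, w_num, w_cat)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            mask = labels == j
            if mask.any():  # leave an empty cluster's centroid where it was
                C_num[j] = X_num[mask].mean(axis=0)
                C_cat[j] = X_cat[mask].mean(axis=0)
    return labels, C_num, C_cat

def two_cluster_accuracy(labels, y):
    """Success rate for 2 clusters vs. 2 classes; cluster ids are arbitrary,
    so take the better of the two possible cluster-to-class mappings."""
    agree = float(np.mean(np.asarray(labels) == np.asarray(y)))
    return max(agree, 1.0 - agree)

# Made-up toy data: one numeric column, one categorical column (one-hot).
X_num = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
X_cat = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
y = np.array([0, 0, 0, 1, 1, 1])

labels, C_num, C_cat = kmeans_mixed(X_num, X_cat, k=2, w_num=1.0, w_cat=1.0)
accuracy = two_cluster_accuracy(labels, y)
print(labels, accuracy)
```

In practice the numeric columns would usually be scaled first, since otherwise the Euclidean term can dominate the cosine term (which is at most 1 for non-negative one-hot vectors); the weights are the knob for balancing the two.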


