Description
A brief introduction

K-tree is a tree-structured clustering algorithm. It is also referred to as a Tree Structured Vector Quantizer (TSVQ). The goal of cluster analysis is to group objects based on similarity. Each object in a K-tree is represented by an n-dimensional vector. All vectors in the tree must have the same number of dimensions.

The algorithm is a hybrid of the B+-tree and k-means algorithms. It uses a tree structure similar to that of the B+-tree and uses k-means to perform splits. The tree forms a nearest neighbour search tree. Unlike k-means, the number of clusters does not need to be specified up front. However, a tree order must be specified that restricts how many vectors can be stored in any node. Each level of the tree produces a different number of clusters.
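
The description above can be illustrated with a minimal sketch in plain Python. It is not taken from any of the K-tree implementations: the names KTree, Node, two_means and nearest are hypothetical, the split uses an unweighted k-means with k=2 and a simple deterministic seeding, and the fallback for degenerate splits is an assumption made to keep the example short.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def two_means(vectors, iters=10):
    """Unweighted k-means with k=2; returns two non-empty lists of indices."""
    # Seeding is an assumption: the first vector and the one farthest from it.
    centres = [vectors[0], max(vectors, key=lambda v: dist2(v, vectors[0]))]
    for _ in range(iters):
        groups = ([], [])
        for i, v in enumerate(vectors):
            groups[dist2(v, centres[1]) < dist2(v, centres[0])].append(i)
        if not groups[0] or not groups[1]:
            # Degenerate case (e.g. identical vectors): fall back to halving.
            half = len(vectors) // 2
            return list(range(half)), list(range(half, len(vectors)))
        centres = [mean([vectors[i] for i in g]) for g in groups]
    return groups

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []      # data vectors (leaf) or child centroids (internal)
        self.counts = []    # number of data vectors summarised by each key
        self.children = []  # child nodes, parallel to keys (internal only)

    def centroid(self):
        """Mean of all data vectors stored beneath this node."""
        total = sum(self.counts)
        return [sum(k[d] * c for k, c in zip(self.keys, self.counts)) / total
                for d in range(len(self.keys[0]))]

def _split(node):
    """Split an over-full node into two nodes using two_means on its keys."""
    halves = []
    for idx in two_means(node.keys):
        half = Node(node.leaf)
        half.keys = [node.keys[i] for i in idx]
        half.counts = [node.counts[i] for i in idx]
        if not node.leaf:
            half.children = [node.children[i] for i in idx]
        halves.append(half)
    return halves

def _insert(node, v, order):
    """Insert v beneath node; return the one or two nodes that replace it."""
    if node.leaf:
        node.keys.append(v)
        node.counts.append(1)
    else:
        # Follow the nearest centroid, then refresh (or replace) its entry.
        i = min(range(len(node.keys)), key=lambda j: dist2(node.keys[j], v))
        parts = _insert(node.children[i], v, order)
        node.keys[i:i + 1] = [p.centroid() for p in parts]
        node.counts[i:i + 1] = [sum(p.counts) for p in parts]
        node.children[i:i + 1] = parts
    return [node] if len(node.keys) <= order else _split(node)

class KTree:
    def __init__(self, order):
        if order < 2:
            raise ValueError("order must be at least 2")
        self.order = order
        self.dim = None
        self.root = Node(leaf=True)

    def insert(self, v):
        v = list(v)
        if self.dim is None:
            self.dim = len(v)
        elif len(v) != self.dim:
            raise ValueError("all vectors must have the same dimensionality")
        parts = _insert(self.root, v, self.order)
        if len(parts) == 2:
            # The root split, so the tree grows by one level, as in a B+-tree.
            root = Node(leaf=False)
            root.children = parts
            root.keys = [p.centroid() for p in parts]
            root.counts = [sum(p.counts) for p in parts]
            self.root = root

    def nearest(self, v):
        """Approximate nearest neighbour of v; the tree must be non-empty."""
        node = self.root
        while not node.leaf:
            node = node.children[min(range(len(node.keys)),
                                     key=lambda j: dist2(node.keys[j], v))]
        return min(node.keys, key=lambda k: dist2(k, v))
```

In this sketch, each internal level holds one centroid per child node, so the number of clusters grows from the root towards the leaves. Only the order is chosen in advance, for example KTree(order=4), and vectors are then added one at a time with insert.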

The K-tree algorithm is useful for clustering large data sets with many features. Compared with traditional approaches, it scales best when there are many objects to cluster into a large number of clusters, so that each cluster contains relatively few objects. For example, a document collection of three million documents can be clustered into one hundred thousand clusters, an average of 30 documents per cluster.

Further Development
Future directions

Currently K-tree has implementations in C, Java and Python. The Python version was recently written by Ulf and focusses on rapid prototyping for research. Future work will look at integrating all previous and future research into a C++ project with bindings to MATLAB and Python. This includes disk-based, parallel and distributed versions of K-tree. It will also be a toolkit for machine learning and data mining with text and XML documents. For more details see the Wiki.

Download K-tree
Licensed under LGPL and GPL

To download K-tree, see the software page. When citing K-tree, please cite the papers listed on the publications page.

Development Team
K-tree developers

The following people have contributed to the development of K-tree (sorted lexicographically by last name):
    Lance De Vine, QUT
    Chris De Vries, QUT
    Shlomo Geva, QUT
    Ulf Großekathöfer, Bielefeld University

The development of K-tree has been proudly supported by the QUT Faculty of Science and Technology.

Figure: K-tree codebook clusters in 2D (2 normal distributions in 2D)