K-tree is a tree structured clustering algorithm. It is also refered to as a Tree Structured Vector Quantizer (TSVQ). The goal of cluster analysis is to group objects based on similarity. Each object in a K-tree is represented by an n-dimensional vector. All vectors in the tree must have the same number of dimensions.
The algorithm is a hybrid of the B+-tree and k-means algorithms. It uses a similar tree structure to the B+-tree and uses k-means to perform splits. The tree forms a nearest neighbour search tree. Unlike k-means the number of clusters does not need to be specified upfront. However, a tree order must be specified that restricts how many vectors can be stored in any node. Each level of the tree produces a different number of clusters.
The K-tree algorithm is useful for clustering large data sets with many features. It scales best in comparison to traditional approaches when there are many objects to cluster into a large number of clusters. In this scenario each cluster contains relatively few objects. For example, a document collection of three million documents can be clustered into one hundred thousand clusters.