Development has started on LMW-tree. It is a generic C++ template library implementing EM-tree, K-tree and related research using Boost and Intel Thread Building Blocks. It includes K-tree, EM-tree and other algorithms. It reduces memory usage and increases execution speed. Parallelized versions of the algorithms have been implemented. Streaming and distributed implementations of the EM-tree are being developed for clustering of extremely large collections containing billions of examples. Development is taking place on GitHub.
K-tree is a tree structured clustering algorithm. It is also refered to as a Tree Structured Vector Quantizer (TSVQ). The goal of cluster analysis is to group objects based on similarity. Each object in a K-tree is represented by an n-dimensional vector. All vectors in the tree must have the same number of dimensions.
The algorithm is a hybrid of the B+-tree and k-means algorithms. It uses a similar tree structure to the B+-tree and uses k-means to perform splits. The tree forms a nearest neighbour search tree. Unlike k-means the number of clusters does not need to be specified upfront. However, a tree order must be specified that restricts how many vectors can be stored in any node. Each level of the tree produces a different number of clusters.
The K-tree algorithm is useful for clustering large data sets with many features. It scales best in comparison to traditional approaches when there are many objects to cluster into a large number of clusters. In this scenario each cluster contains relatively few objects. For example, a document collection of three million documents can be clustered into one hundred thousand clusters.
Currently K-tree has implementations in C++, C, Java and Python. The Python version has recently been written by Ulf and focusses on rapid prototyping for research. Development has recently started on the C++ version.
The following people have contributed to the development of K-tree (sorted lexicographically by last name)
Lance De Vine, QUT
Chris De Vries, QUT
Shlomo Geva, QUT
Ulf Großekathöfer, Bielefeld University
The development of K-tree has been proudly supported by the QUT Faculty of Science and Technology.