Posts

Introduction
Continuing the series on biclustering algorithms, this post is about BiMax, a method developed to serve as a baseline in a comparison of biclustering algorithms [1]. Because BiMax was developed as a reference method, its objective is intentionally simple: it finds all biclusters consisting entirely of 1s in a binary matrix. Specifically, BiMax enumerates all inclusion-maximal biclusters, which are biclusters of all 1s to which no row or column can be added without introducing 0s.
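To make "inclusion-maximal" concrete, here is a brute-force sketch that enumerates the same set of biclusters BiMax targets. It is not BiMax's own divide-and-conquer procedure: it simply tries every non-empty column subset, keeps the rows that are 1 in all of those columns, then expands the column set to every column that is 1 in all of those rows. That closure step is what guarantees no row or column can be added. The function name `maximal_biclusters` is hypothetical, and the cost is exponential in the number of columns, so this is only suitable for tiny matrices.

```python
from itertools import combinations

import numpy as np


def maximal_biclusters(A):
    """Hypothetical brute-force enumeration of inclusion-maximal all-1s biclusters.

    Returns a sorted list of (row_indices, column_indices) tuples.
    Exponential in the number of columns; for illustration only.
    """
    A = np.asarray(A, dtype=bool)
    n_cols = A.shape[1]
    found = set()
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            # Rows that contain a 1 in every selected column.
            rows = np.flatnonzero(A[:, list(cols)].all(axis=1))
            if rows.size == 0:
                continue
            # Close the column set: every column that is 1 in all of these rows.
            closed_cols = np.flatnonzero(A[rows].all(axis=0))
            found.add((tuple(rows.tolist()), tuple(closed_cols.tolist())))
    return sorted(found)


A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
print(maximal_biclusters(A))
# [((0, 1), (0, 1)), ((0, 1, 2), (1,)), ((1,), (0, 1, 2)), ((1, 2), (1, 2))]
```

Note that the four biclusters overlap; BiMax reports all of them rather than choosing a partition.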


Introduction
The subject of today’s post is a biclustering algorithm commonly referred to by the names of its authors, Yizong Cheng and George Church [1]. It is one of the best-known biclustering algorithms, with over 1,400 citations, because it was the first to apply biclustering to gene microarray data. It remains popular as a benchmark: almost every newly published biclustering algorithm includes a comparison with it. Cheng and Church sought biclusters with a small mean squared residue, a measure of bicluster homogeneity.
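For a bicluster with row set I and column set J, the residue of entry a_ij is a_ij - a_iJ - a_Ij + a_IJ, where a_iJ is the mean of row i, a_Ij is the mean of column j, and a_IJ is the mean of the whole bicluster. The mean squared residue is the average of the squared residues. A minimal numpy sketch, assuming the bicluster has already been extracted as a dense submatrix (the function name is hypothetical):

```python
import numpy as np


def mean_squared_residue(B):
    """Mean squared residue of a bicluster B (rows I x columns J)."""
    B = np.asarray(B, dtype=float)
    row_means = B.mean(axis=1, keepdims=True)  # a_iJ
    col_means = B.mean(axis=0, keepdims=True)  # a_Ij
    overall = B.mean()                         # a_IJ
    residue = B - row_means - col_means + overall
    return float((residue ** 2).mean())


# A purely additive bicluster (a_ij = r_i + c_j) is perfectly homogeneous.
print(mean_squared_residue(np.add.outer([1, 2, 3], [0, 10])))  # 0.0
print(mean_squared_residue([[1, 0], [0, 1]]))                  # 0.25
```

The first example shows why the score suits gene expression data: rows that rise and fall together, even at different baseline levels, still score zero.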


Introduction
Continuing the discussion of spectral biclustering methods, this post covers the Spectral Biclustering algorithm (Kluger et al., 2003) [1]. This method was originally formulated for clustering tumor profiles collected via RNA microarrays. This section introduces the checkerboard bicluster structure that the algorithm fits. The next section describes the algorithm in detail. Finally, the last section shows how it can be used to cluster real microarray data.
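A checkerboard structure arises when every row belongs to one row cluster and every column to one column cluster, and the value of entry (i, j) depends on the pair of clusters. The sketch below builds such a matrix in the simplest block-constant form, with optional Gaussian noise; it is an illustrative assumption, not the exact data model from the paper. The function and parameter names are hypothetical (scikit-learn provides a similar generator, `sklearn.datasets.make_checkerboard`).

```python
import numpy as np


def checkerboard_matrix(row_labels, col_labels, block_values, noise=0.0, seed=0):
    """Hypothetical generator for a block-constant checkerboard matrix.

    Entry (i, j) equals block_values[row_labels[i], col_labels[j]] plus noise.
    """
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    block_values = np.asarray(block_values, dtype=float)
    # np.ix_ broadcasts the two label vectors into a full grid of block lookups.
    X = block_values[np.ix_(row_labels, col_labels)]
    rng = np.random.default_rng(seed)
    return X + noise * rng.standard_normal(X.shape)


X = checkerboard_matrix(row_labels=[0, 0, 1, 1, 1],
                        col_labels=[0, 1, 1, 0],
                        block_values=[[1, 5], [3, 2]])
print(X)
```

Rows 0 and 1 are identical, as are rows 2 to 4; columns 0 and 3 match, as do columns 1 and 2. With the rows and columns shuffled and noise added, the checkerboard is no longer visible, and recovering the row and column labels is the task Spectral Biclustering addresses.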


Introduction
This is the second entry in the series on biclustering algorithms, this time covering spectral biclustering. This first part focuses on the Spectral Co-Clustering algorithm (Dhillon, 2001) [1]; the next part will cover a related algorithm, Spectral Biclustering (Kluger et al., 2003) [2]. To motivate the spectral biclustering problem, let us return to our friend Bob, who hosted the party in the previous article. Bob is thinking about music again, but this time he is writing song lyrics.
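As a preview of how Spectral Co-Clustering works, here is a minimal sketch of its two-cluster case. The matrix is normalized as A_n = D1^{-1/2} A D2^{-1/2}, where D1 and D2 hold the row and column sums; the second left and right singular vectors of A_n are rescaled by D1^{-1/2} and D2^{-1/2}; and rows and columns are then split together. Dhillon's method clusters these vectors with k-means (and uses more singular vectors for k > 2); splitting on the sign, as done here, is a simplifying assumption. The function name is hypothetical, and the matrix must have strictly positive row and column sums. scikit-learn's `SpectralCoclustering` estimator covers the general case.

```python
import numpy as np


def cocluster_bipartition(A):
    """Hypothetical two-way co-clustering via the second singular vectors."""
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # row sums (diagonal of D1)
    d2 = A.sum(axis=0)  # column sums (diagonal of D2)
    # A_n = D1^{-1/2} A D2^{-1/2}
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, s, Vt = np.linalg.svd(An)
    # Skip the first (trivial) singular pair; rescale the second one.
    z_rows = U[:, 1] / np.sqrt(d1)
    z_cols = Vt[1, :] / np.sqrt(d2)
    # Sign split in place of k-means (simplification).
    return (z_rows > 0).astype(int), (z_cols > 0).astype(int)


A = [[5, 5, 1, 1],
     [5, 5, 1, 1],
     [1, 1, 5, 5],
     [1, 1, 5, 5]]
row_labels, col_labels = cocluster_bipartition(A)
print(row_labels, col_labels)
```

Because A_n v = s u, the row and column vectors share a consistent sign, so each row cluster is paired with the column cluster that holds its large entries. The actual 0/1 label values can flip between runs or platforms; only the grouping is meaningful.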


Introduction
This is the first in my series of biclustering posts and serves as an introduction to the concept. Later posts will cover the individual algorithms that I am implementing for scikit-learn. Before discussing biclustering, it is necessary to cover the basics of clustering. Clustering is a fundamental problem in data mining: given a set of objects, group them according to some measure of similarity. This deceptively simple concept has given rise to a wide variety of problems and algorithms to solve them.


This summer I will be participating in Google Summer of Code by implementing biclustering algorithms for scikit-learn. As required by the Python Software Foundation, I will be reporting my progress on this blog.
