## An introduction to biclustering

## Introduction

This is the first in my series of biclustering posts. It serves as an introduction to the concept. The later posts will cover the individual algorithms that I am implementing for scikit-learn.

Before talking about biclustering, it is necessary to cover the basics of clustering. Clustering is a fundamental problem in data mining: given a set of objects, group them according to some measure of similarity. This deceptively simple concept has given rise to a wide variety of problems and the algorithms to solve them. scikit-learn provides a number of clustering methods, but these represent only a small fraction ...
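To make "group them according to some measure of similarity" concrete, here is a minimal sketch of one classic approach, Lloyd's k-means iteration, on one-dimensional points. It is written in plain Python for illustration only and is not scikit-learn's implementation; the function name `kmeans_1d`, the hand-picked starting centers, and the toy data are all assumptions made for this example.

```python
def kmeans_1d(points, centers, iterations=10):
    """Group 1-D points around the nearest of the given centers.

    Illustrative only: the initial centers are supplied by the caller,
    and absolute difference is the (assumed) similarity measure.
    """
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        # Assignment step: each point joins the group of its nearest center.
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups, centers


points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
groups, centers = kmeans_1d(points, centers=[0.0, 6.0])
print(groups)
```

With these starting centers the points separate into a low group and a high group. A different similarity measure or a different number of centers would give a different grouping, which is part of why so many clustering algorithms exist.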

## Biclustering paper accepted

Our paper, *A Comparative Analysis of Biclustering Algorithms for Gene
Expression Data*, has been accepted by Briefings in Bioinformatics.

## Cheng and Church

## Introduction

The subject of today’s post is a biclustering algorithm commonly referred to by the names of its authors, Yizong Cheng and George Church [1]. It is one of the best-known biclustering algorithms, with over 1,400 citations, because it was the first to apply biclustering to gene microarray data. This algorithm remains popular as a benchmark: almost every newly published biclustering algorithm includes a comparison with it.

Cheng and Church were interested in finding biclusters with a small mean squared residue, which is a measure of bicluster homogeneity. To define the mean squared residue, the following notation will ...
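As a rough illustration of the mean squared residue, the sketch below uses the standard definition: for a bicluster with rows \(I\) and columns \(J\), the residue of entry \(a_{ij}\) is \(a_{ij} - a_{iJ} - a_{Ij} + a_{IJ}\), where \(a_{iJ}\) is the row mean, \(a_{Ij}\) the column mean, and \(a_{IJ}\) the overall mean of the submatrix, and the score is the average of the squared residues. The symbols here may not match the notation used in the full post, and the function name and toy matrix are assumptions made for this example.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols.

    Plain-Python sketch; `matrix` is a list of equal-length lists.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: what is left after removing row and column effects.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy data (made up): the first three columns differ between rows only
# by a constant shift, while the fourth column breaks that pattern.
data = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 7.0],
]
additive = mean_squared_residue(data, rows=[0, 1, 2], cols=[0, 1, 2])
whole = mean_squared_residue(data, rows=[0, 1, 2], cols=[0, 1, 2, 3])
print(additive, whole)
```

The first three columns form a perfectly additive pattern, so their score is zero up to floating-point error. Including the fourth column raises the score, which is the sense in which a small mean squared residue indicates a homogeneous bicluster.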

## Google Summer of Code 2013

This summer I will be participating in the Google Summer of Code by implementing biclustering algorithms for scikit-learn. As required by the Python Software Foundation, I will be reporting my progress on this blog.

## New position at Heidelberg University

On 7 March 2012 I successfully defended my master’s thesis,
*Application of biclustering algorithms to biological data*, which
will soon be available at the OhioLINK ETD Center.

Next month I will officially begin a new position in the Multidimensional Image Processing lab at Heidelberg University.

## New software: BiBench

My associates and I at the HPC Lab have just released some new software: BiBench, a Python package for biclustering. In addition to lots of other functionality, it provides a common interface to twelve biclustering algorithms, including our own CPB algorithm.

BiBench makes biclustering gene expression datasets easy. In only a few lines of code you can download a GDS dataset, bicluster it, and perform GO enrichment analysis on the genes of the resulting biclusters. Though we developed BiBench to specifically target gene expression data, it is useful for any data that may be meaningfully biclustered.

A download link, installation ...

## Spectral biclustering, part 1

## Introduction

This is the second entry in the series on biclustering algorithms, this time covering spectral biclustering. This is the first part, focusing on the Spectral Co-Clustering algorithm (Dhillon, 2001) [1]. The next part will focus on a related algorithm, Spectral Biclustering (Kluger et al., 2003) [2].

To motivate the spectral biclustering problem, let us return to our friend Bob, who hosted the party in the previous article. Bob is thinking about music again, but this time he is writing song lyrics. Bob plans to imitate popular songs by including similar words in his lyrics. To learn which words to ...

## Spectral biclustering, part 2

## Introduction

Continuing the discussion of spectral biclustering methods, this post covers the Spectral Biclustering algorithm (Kluger et al., 2003) [1]. This method was originally formulated for clustering tumor profiles collected via RNA microarrays. This section introduces the checkerboard bicluster structure that the algorithm fits. The next section describes the algorithm in detail. Finally, in the last section we will see how it can be used for clustering real microarray data.

The data collected from a gene expression microarray experiment may be arranged in a matrix \(A\) , where the rows represent genes and the columns represent individual microarrays. Entry \(A_{ij ...

## The BiMax algorithm

## Introduction

The focus of this post is BiMax, a method developed to serve as a baseline in a comparison of biclustering algorithms [1]. Because BiMax was developed as a reference method, its objective is intentionally simple: it finds all biclusters consisting entirely of 1s in a binary matrix.

Specifically, BiMax enumerates all inclusion-maximal biclusters, which are biclusters of all 1s to which no row or column can be added without introducing 0s. More formally, an inclusion-maximal bicluster of \(m \times n\) binary matrix \(M\) is a set of rows and a set of columns \((R, C)\) , with \(R \in 2 ...
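To illustrate what "inclusion-maximal" means, here is a brute-force sketch that lists every such bicluster in a tiny binary matrix. It is not BiMax itself: it simply tries every column subset, so its running time is exponential in the number of columns, and it is only meant to show what the output looks like. The function name and the example matrix are assumptions made for this example.

```python
from itertools import combinations


def inclusion_maximal_biclusters(matrix):
    """Brute-force listing of all-1 biclusters that cannot be extended.

    Returns (rows, cols) pairs as tuples of indices.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    found = []
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # All rows that contain a 1 in every chosen column.
            rows = tuple(i for i in range(n_rows)
                         if all(matrix[i][j] for j in cols))
            if not rows:
                continue
            # All columns that contain a 1 in every one of those rows.
            closure = tuple(j for j in range(n_cols)
                            if all(matrix[i][j] for i in rows))
            # If the closure adds columns, `cols` was not maximal.
            if closure == cols:
                found.append((rows, cols))
    return found


M = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
biclusters = inclusion_maximal_biclusters(M)
for rows, cols in biclusters:
    print("rows", rows, "cols", cols)
```

For this matrix the sketch reports four biclusters. For example, rows 0 and 1 with columns 0 and 1 form a block of 1s, and adding row 2 or column 2 would bring in a 0.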

## We got runner-up for best poster at ACM BCB 2011

I just got back from presenting a poster at the ACM Conference on Bioinformatics, Computational Biology and Biomedicine in Chicago, IL. We were voted runner-up for best poster! The poster will soon be available at the HPC lab website.
