Selected Publications

FLEA (Full-Length Envelope Analyzer) is a bioinformatics pipeline that performs end-to-end analysis and visualization of long-read sequencing data. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser.
bioRxiv, 2017

RIFRAF is a sequence consensus algorithm that takes a set of error-prone reads and a reference sequence and infers an accurate in-frame consensus. RIFRAF consistently finds a consensus sequence that is more accurate and in-frame, especially with small numbers of reads. RIFRAF is able to achieve these results and keep the consensus in-frame even with a distantly related reference sequence. Moreover, unlike other frame correction methods, RIFRAF can detect and keep true indels while removing erroneous ones.
bioRxiv, 2017

Recent Publications

Full-Length Envelope Analyzer (FLEA): A tool for longitudinal analysis of viral amplicons. bioRxiv, 2017.

HIV Envelope Glycoform Heterogeneity and Localized Diversity Govern the Initiation and Maturation of a V2 Apex Broadly Neutralizing Antibody Lineage. Immunity, 2017.

Antibody 10-1074 suppresses viremia in HIV-1-infected individuals. Nature Medicine, 2017.

Rapid Sequencing of Complete env Genes from Primary HIV-1 Samples. Virus Evolution, 2016.

Early Antibody Lineage Diversification and Independent Limb Maturation Lead to Broad HIV-1 Neutralization Targeting the Env High-Mannose Patch. Immunity, 2016.

Gene-Wide Identification of Episodic Selection. Molecular Biology and Evolution, 2016.

Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering. Microarray Data Analysis, 2015.

A comparative analysis of biclustering algorithms for gene expression data. Briefings in Bioinformatics, 2012.

Recent Posts

Introduction Continuing the series on biclustering algorithms, this post is about BiMax, a method developed to serve as a baseline in a comparison of biclustering algorithms [1]. Because BiMax was developed as a reference method, its objective is intentionally simple: it finds all biclusters consisting entirely of 1s in a binary matrix. Specifically, BiMax enumerates all inclusion-maximal biclusters, which are biclusters of all 1s to which no row or column can be added without introducing 0s.
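The definition of an inclusion-maximal bicluster can be illustrated with a short brute-force sketch. This is not the BiMax algorithm itself, which uses a more efficient search; it only enumerates the same kind of output on a tiny matrix. The function name `maximal_all_ones_biclusters` and the list-of-lists input format are assumptions made for this example.

```python
from itertools import combinations


def maximal_all_ones_biclusters(matrix):
    """Brute-force enumeration of inclusion-maximal all-1s biclusters.

    Illustrative only: exponential in the number of rows, and not the
    search strategy used by BiMax.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    found = []
    for size in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Columns that contain a 1 in every selected row.
            cols = tuple(
                j for j in range(n_cols)
                if all(matrix[i][j] == 1 for i in rows)
            )
            if not cols:
                continue
            # Rows that contain a 1 in every one of those columns.
            closure = tuple(
                i for i in range(n_rows)
                if all(matrix[i][j] == 1 for j in cols)
            )
            # If the closure adds no row, nothing can be added to the
            # bicluster without introducing a 0, so it is maximal.
            if closure == rows:
                found.append((rows, cols))
    return found


example = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
biclusters = maximal_all_ones_biclusters(example)
```

Each result is a pair of row indices and column indices. Because the column set is recomputed from the rows and the row set is checked against the columns, every reported bicluster is closed in both directions.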

Introduction The subject of today’s post is a biclustering algorithm commonly referred to by the names of its authors, Yizong Cheng and George Church [1]. It is one of the best-known biclustering algorithms, with over 1,400 citations, because it was the first to apply biclustering to gene microarray data. This algorithm remains popular as a benchmark: almost every new published biclustering algorithm includes a comparison with it. Cheng and Church were interested in finding biclusters with a small mean squared residue, which is a measure of bicluster homogeneity.
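As a rough illustration of the mean squared residue, the sketch below computes it for a chosen set of rows and columns: each entry's residue is the entry minus its row mean, minus its column mean, plus the overall mean of the bicluster, and the score is the average of the squared residues. The function name `mean_squared_residue` and the list-of-lists input are assumptions for this example, and the sketch covers only the score, not the search procedure of the algorithm.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the bicluster given by `rows` and `cols`."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    # Means taken within the bicluster only.
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [
        sum(sub[i][j] for i in range(n_rows)) / n_rows
        for j in range(n_cols)
    ]
    overall_mean = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Rows that differ only by a constant shift form a perfectly
# homogeneous bicluster under this measure.
additive = [
    [1, 2, 3],
    [2, 3, 4],
    [4, 5, 6],
]
score_additive = mean_squared_residue(additive, [0, 1, 2], [0, 1, 2])

# A checkerboard-like block has no consistent row or column effect.
mixed = [
    [1, 0],
    [0, 1],
]
score_mixed = mean_squared_residue(mixed, [0, 1], [0, 1])
```

A score near zero means the rows and columns of the bicluster vary together, which is why a small mean squared residue indicates a homogeneous bicluster.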

Introduction Continuing the discussion of spectral biclustering methods, this post covers the Spectral Biclustering algorithm (Kluger et al., 2003) [1]. This method was originally formulated for clustering tumor profiles collected via RNA microarrays. This section introduces the checkerboard bicluster structure that the algorithm fits. The next section describes the algorithm in detail. Finally, in the last section we will see how it can be used for clustering real microarray data.

Introduction This is the second entry in the series on biclustering algorithms, this time covering spectral biclustering. This is the first part, focusing on the Spectral Co-Clustering algorithm (Dhillon, 2001) [1]. The next part will focus on a related algorithm, Spectral Biclustering (Kluger et al., 2003) [2]. To motivate the spectral biclustering problem, let us return to our friend Bob, who hosted the party in the previous article. Bob is thinking about music again, but this time he is writing song lyrics.

Introduction This is the first in my series of biclustering posts. It serves as an introduction to the concept. The later posts will cover the individual algorithms that I am implementing for scikit-learn. Before talking about biclustering, it is necessary to cover the basics of clustering. Clustering is a fundamental problem in data mining: given a set of objects, group them according to some measure of similarity. This deceptively simple concept has given rise to a wide variety of problems and the algorithms to solve them.
