FLEA (Full-Length Envelope Analyzer) is a bioinformatics pipeline which performs end-to-end analysis and visualization of long-read sequencing data. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser.
Kemal Eren, Steven Weaver, Robert Ketteringham, Morné Valentyn, Melissa Laird Smith, Venkatesh Kumar, Sanjay Mohan, Sergei L Kosakovsky Pond, Ben Murrell
RIFRAF is a sequence consensus algorithm that takes a set of error-prone reads and a reference sequence and infers an accurate in-frame consensus. RIFRAF consistently finds a consensus sequence that is more accurate and in-frame, especially with small numbers of reads. RIFRAF is able to achieve these results and keep the consensus in-frame even with a distantly related reference sequence. Moreover, unlike other frame correction methods, RIFRAF can detect and keep true indels while removing erroneous ones.
Marina Caskey, Till Schoofs, Henning Gruell, Allison Settler, Theodora Karagounis, Edward F Kreider, Ben Murrell, Nico Pfeifer, Lilian Nogueira, Thiago Y Oliveira, Gerald H Learn, Yehuda Z Cohen, Clara Lehmann, Daniel Gillor, Irina Shimeliovich, Cecilia Unson-O’Brien, Daniela Weiland, Alexander Robles, Tim Kümmerle, Christoph Wyen, Rebeka Levin, Maggi Witmer-Pack, Kemal Eren, Caroline Ignacio, Szilard Kiss, Anthony P West Jr, Hugo Mouquet, Barry S Zingman, Roy M Gulick, Tibor Keler, Pamela J Bjorkman, Michael S Seaman, Beatrice H Hahn, Gerd Fätkenheuer, Sarah J Schlesinger, Michel C Nussenzweig, Florian Klein.
Antibody 10-1074 suppresses viremia in HIV-1-infected individuals.
Nature Medicine, 2017.
Melissa Laird Smith, Ben Murrell, Kemal Eren, Caroline Ignacio, Elise Landais, Steven Weaver, Pham Phung, Colleen Ludka, Lance Hepler, Gemma Caballero, Tristan Pollner, Yan Guo, Douglas Richman, The IAVI Protocol C Investigators & The IAVI African HIV Research Network, Pascal Poignard, Ellen E. Paxinos, Sergei L. Kosakovsky Pond, Davey M. Smith.
Rapid Sequencing of Complete env Genes from Primary HIV-1 Samples.
Virus Evolution, 2016.
Ben Murrell, Steven Weaver, Martin D. Smith, Joel O. Wertheim, Sasha Murrell, Anthony Aylward, Kemal Eren, Tristan Pollner, Darren P. Martin, Davey M. Smith, Konrad Scheffler, Sergei L. Kosakovsky Pond.
Gene-Wide Identification of Episodic Selection.
Molecular Biology and Evolution, 2016.
Continuing the series on biclustering algorithms, this post is about BiMax, a method developed to serve as a baseline in a comparison of biclustering algorithms [1]. Because BiMax was developed as a reference method, its objective is intentionally simple: it finds all biclusters consisting entirely of 1s in a binary matrix.
Specifically, BiMax enumerates all inclusion-maximal biclusters, which are biclusters of all 1s to which no row or column can be added without introducing 0s.
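To make "inclusion-maximal" concrete, here is a minimal brute-force sketch in Python. It is not BiMax's own search strategy, just an exhaustive check over column subsets that is only practical for tiny matrices. The function name `maximal_biclusters` and the example matrix are made up for illustration.

```python
from itertools import combinations


def maximal_biclusters(matrix):
    """Enumerate inclusion-maximal all-1s biclusters by brute force.

    `matrix` is a list of equal-length rows containing 0s and 1s.
    Returns a sorted list of (row_indices, column_indices) tuples.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    found = []
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # Every row that has a 1 in all of the chosen columns.
            rows = tuple(
                i for i in range(n_rows) if all(matrix[i][j] for j in cols)
            )
            if not rows:
                continue
            # Every column that has a 1 in all of those rows.
            closed = tuple(
                j for j in range(n_cols) if all(matrix[i][j] for i in rows)
            )
            # If the two column sets agree, no row or column can be added
            # without introducing a 0, so the bicluster is maximal.
            if closed == cols:
                found.append((rows, cols))
    return sorted(found)


example = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
for rows, cols in maximal_biclusters(example):
    print("rows", rows, "columns", cols)
```

For this example the sketch reports four biclusters, including the single row 1 across all three columns and the single column 1 across all three rows.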
The subject of today’s post is a biclustering algorithm commonly referred to by the names of its authors, Yizong Cheng and George Church [1]. It is one of the best-known biclustering algorithms, with over 1,400 citations, because it was the first to apply biclustering to gene microarray data. This algorithm remains popular as a benchmark: almost every new published biclustering algorithm includes a comparison with it.
Cheng and Church were interested in finding biclusters with a small mean squared residue, which is a measure of bicluster homogeneity.
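As a rough illustration of that measure, the sketch below computes the mean squared residue of a small submatrix in plain Python. Each entry's residue is the entry minus its row mean, minus its column mean, plus the overall mean. The score is the average of the squared residues. The function name `mean_squared_residue` and the example data are illustrative only.

```python
def mean_squared_residue(submatrix):
    """Mean squared residue of a bicluster given as a list of rows."""
    n_rows = len(submatrix)
    n_cols = len(submatrix[0])
    row_means = [sum(row) / n_cols for row in submatrix]
    col_means = [
        sum(row[j] for row in submatrix) / n_rows for j in range(n_cols)
    ]
    overall_mean = sum(row_means) / n_rows

    total = 0.0
    for i, row in enumerate(submatrix):
        for j, value in enumerate(row):
            # Residue: what is left after removing row and column effects.
            residue = value - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Rows that differ only by a constant shift are perfectly homogeneous.
additive = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
print(mean_squared_residue(additive))

# A pattern with no consistent row or column effect scores higher.
print(mean_squared_residue([[1, 0], [0, 1]]))
```

A score near zero means the rows (and columns) of the bicluster rise and fall together. Larger scores indicate less coherent submatrices.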
Continuing the discussion of spectral biclustering methods, this post covers the Spectral Biclustering algorithm (Kluger et al., 2003) [1]. This method was originally formulated for clustering tumor profiles collected via RNA microarrays. This section introduces the checkerboard bicluster structure that the algorithm fits. The next section describes the algorithm in detail. Finally, in the last section we will see how it can be used for clustering real microarray data.
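To picture the checkerboard structure, here is a small sketch that builds a noise-free checkerboard matrix from a row partition, a column partition, and one value per block. It only illustrates the structure, not the algorithm that recovers it. The function name `checkerboard` and all values are invented for this example.

```python
def checkerboard(row_labels, col_labels, block_means):
    """Build a matrix whose entry (i, j) depends only on the cluster of
    row i and the cluster of column j."""
    return [
        [block_means[r][c] for c in col_labels]
        for r in row_labels
    ]


# Two row clusters and two column clusters give four blocks.
row_labels = [0, 0, 1]
col_labels = [0, 1, 1, 0]
block_means = [[1, 5], [9, 3]]

for row in checkerboard(row_labels, col_labels, block_means):
    print(row)
```

Sorting the rows and columns by their labels would rearrange such a matrix into visibly rectangular blocks, which is where the checkerboard name comes from.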
This is the second entry in the series on biclustering algorithms, this time covering spectral biclustering. This first part focuses on the Spectral Co-Clustering algorithm (Dhillon, 2001) [1]. The next part will focus on a related algorithm, Spectral Biclustering (Kluger et al., 2003) [2].
To motivate the spectral biclustering problem, let us return to our friend Bob, who hosted the party in the previous article. Bob is thinking about music again, but this time he is writing song lyrics.
This is the first in my series of biclustering posts. It serves as an introduction to the concept. The later posts will cover the individual algorithms that I am implementing for scikit-learn.
Before talking about biclustering, it is necessary to cover the basics of clustering. Clustering is a fundamental problem in data mining: given a set of objects, group them according to some measure of similarity. This deceptively simple concept has given rise to a wide variety of problems and the algorithms to solve them.