ISMB 2013 Announces the List of Accepted Papers

The annual international conference on Intelligent Systems for Molecular Biology (ISMB) has announced the list of accepted papers that will be presented at this year’s ISMB, to be held in Berlin, Germany, on July 19-23. ISMB is one of the biggest bioinformatics/computational biology conferences and is organized by the International Society for Computational Biology (ISCB). The 21st ISMB is joining hands with the annual ECCB conference to make it the biggest computational biology event.

The ISMB 2013 website lists over 40 papers that will be presented at the conference as 20-minute talks in the proceedings track. These accepted papers will also be published online as the ISMB 2013 proceedings in the journal Bioinformatics. The accepted papers cover a wide range of topics in computational biology/bioinformatics, including next-generation sequencing analysis. The new papers on next-gen sequencing analysis include a new short read aligner for populations of genomes, a de Bruijn graph assembler for transcriptomes, and a method to correct RNA-seq read misalignment due to pseudogenes. Here is a list of some of the accepted papers with their abstracts, with a clear genomics bias.

[Update] Here is the link to the Bioinformatics issue that published the ISMB/ECCB 2013 papers.

Short Read Alignment with Populations of Genomes

Author: Lin Huang, Stanford University, United States
Additional authors:
Victoria Popic, Stanford University
Serafim Batzoglou, Stanford University

The increasing availability of high-throughput sequencing technologies has led to thousands of human genomes being sequenced in the past few years. Efforts such as the 1000 Genomes Project further add to the availability of human genome variation data. However, to date there is no method that can map reads of a newly sequenced human genome to a large collection of genomes. Instead, methods rely on aligning reads to a single reference genome. This leads to inherent biases and lower accuracy. To tackle this problem, a new alignment tool, BWBBLE, is introduced in this paper.

We (1) introduce a new compressed representation of a collection of genomes, which explicitly tackles the genomic variation observed at every position, and (2) design a new alignment algorithm based on the Burrows-Wheeler transform that maps short reads from a newly sequenced genome to an arbitrary collection of 2 or more (up to millions of) genomes with high accuracy and no inherent bias to one specific genome.
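To give a feel for the Burrows-Wheeler idea the abstract builds on, here is a minimal Python sketch of exact-match backward search over a single reference string. It is not BWBBLE: the multi-genome representation and inexact matching described in the paper are not reproduced, and the names `build_fm_index` and `backward_search` are made up for this illustration.

```python
def build_fm_index(text):
    """Build a naive FM-index (suffix array, C table, Occ table) for text."""
    text += "$"  # terminator, assumed absent from text and smaller than A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    alphabet = sorted(set(text))
    # c_table[c]: number of characters in text strictly smaller than c
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(pattern):  # extend the match one base to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, c_table, occ = build_fm_index("ACGTACGA")
print(backward_search("ACG", sa, c_table, occ))  # [0, 4]
```

The naive suffix sort and full Occ table are only practical for toy inputs; real aligners use sampled, compressed versions of these structures.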

GeneScissors: a comprehensive approach to detecting and correcting spurious transcriptome inference due to RNAseq reads misalignment

Author: Zhaojun Zhang, UNC Chapel Hill, United States

Shunping Huang, UNC Chapel Hill
Jack Wang, UNC Chapel Hill
Xiang Zhang, Case Western Reserve University
Fernando Pardo Manuel De Villena, UNC Chapel Hill
Leonard McMillan, UNC Chapel Hill
Wei Wang, UNC Chapel Hill

Motivation: RNA-seq techniques provide an unparalleled means for exploring a transcriptome with deep coverage and base pair level resolution. Various analysis tools have been developed to align and assemble RNA-seq data, such as the widely used TopHat/Cufflinks pipeline. A common observation is that a sizable fraction of the fragments/reads align to multiple locations of the genome. These multiple alignments pose substantial challenges to existing RNA-seq analysis tools. Inappropriate treatment may result in reporting spurious expressed genes (false positives), and missing the real expressed genes (false negatives). Such errors impact the subsequent analysis, such as differential expression analysis.

In our study, we observe that about 3.5% of transcripts reported by the TopHat/Cufflinks pipeline correspond to annotated nonfunctional pseudogenes. Moreover, about 10.0% of reported transcripts are not annotated in the Ensembl database. These genes could be either novel expressed genes or false discoveries.

Results: We examine the underlying genomic features that lead to multiple alignments and investigate how they generate systematic errors in RNA-seq analysis. We develop a general tool, GeneScissors, which exploits machine learning techniques guided by biological knowledge to detect and correct spurious transcriptome inference by existing RNA-seq analysis methods. In our simulated study, GeneScissors can predict spurious transcriptome calls due to misalignment with an accuracy close to 90%. It provides substantial improvement over the widely used TopHat/Cufflinks or MapSplice/Cufflinks pipelines in both precision and F-measurement. On real data, GeneScissors reports 53.6% fewer pseudogenes and 0.97% more expressed and annotated transcripts, when compared with the TopHat/Cufflinks pipeline. In addition, among the 10.0% unannotated transcripts reported by TopHat/Cufflinks, GeneScissors finds that more than 16.3% of them are false positives. Availability: The software can be downloaded at http://csbio.unc.edu/genescissors/

IDBA-Tran: A More Robust de novo de Bruijn Graph Assembler for Transcriptomes with Uneven Expression Levels

Author: Yu Peng, The University of Hong Kong, Hong Kong
Henry C.M. Leung, The University of Hong Kong
S.M. Yiu, The University of Hong Kong
Xin-Guang Zhu, Shanghai Institutes for Biological Sciences
Ming-Zhu Lv, Shanghai Institutes for Biological Sciences
Francis Chin, The University of Hong Kong

Motivation: RNA sequencing based on next-generation sequencing technology is an effective approach for analyzing transcriptomes. Similar to de novo genome assembly, de novo transcriptome assembly does not rely on a reference genome or additional annotated information. It is well-known that the transcriptome assembly problem is more difficult. In particular, isoforms can have very uneven expression levels (e.g. 1:100) which make it very difficult to identify low-expressed isoforms. Technically, a core issue is to remove erroneous vertices/edges with high multiplicity (produced by high-expressed isoforms) in the de Bruijn graph without removing those correct ones with not so high multiplicity corresponding to low-expressed isoforms. Failing to do so will result in the loss of low-expressed isoforms or having complicated subgraphs with transcripts of different genes mixed together due to the erroneous vertices and edges.

Contributions: Unlike existing tools which usually remove erroneous vertices/edges if their multiplicities are lower than a global threshold, we developed a probabilistic progressive approach with local thresholds to iteratively remove those erroneous vertices/edges. This enables us to decompose the graph into disconnected components, each of which contains a few, if not single, genes, while keeping a lot of correct vertices/edges of low-expressed isoforms. Combined with existing techniques, IDBA-Tran is able to assemble both high-expressed and low-expressed transcripts and outperforms existing assemblers in terms of sensitivity and specificity for both simulated and real data. Availability: http://www.cs.hku.hk/~alse/idba_tran
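The difference between a global and a local multiplicity threshold can be seen on a toy k-mer graph. The Python sketch below is a deliberate simplification and not IDBA-Tran's probabilistic progressive method: it assumes a single pass, a cutoff defined as a fixed fraction of the largest k-mer count in each connected component, and no reverse-complement handling. All function names and the example reads are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (de Bruijn vertex multiplicity) in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def neighbors(kmer, counts):
    """Observed k-mers that overlap kmer by k-1 bases on either side."""
    found = []
    for base in "ACGT":
        for cand in (kmer[1:] + base, base + kmer[:-1]):
            if cand in counts and cand != kmer:
                found.append(cand)
    return found


def components(counts):
    """Connected components of the k-mer graph (iterative depth-first search)."""
    seen, comps = set(), []
    for start in counts:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            km = stack.pop()
            comp.add(km)
            for n in neighbors(km, counts):
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        comps.append(comp)
    return comps


def prune_global(counts, threshold):
    """Keep k-mers whose count reaches one threshold shared by the whole graph."""
    return {km for km, c in counts.items() if c >= threshold}


def prune_local(counts, ratio):
    """Keep k-mers whose count reaches ratio * (max count in their component)."""
    kept = set()
    for comp in components(counts):
        cutoff = ratio * max(counts[km] for km in comp)
        kept.update(km for km in comp if counts[km] >= cutoff)
    return kept


# Toy data: a highly expressed transcript, an erroneous copy of it,
# and an unrelated transcript with low expression.
reads = ["AACCGGTT"] * 100 + ["AACCTGTT"] * 5 + ["GATATCAG"] * 3
counts = kmer_counts(reads, 4)
print("CCTG" in prune_global(counts, 4), "GATA" in prune_global(counts, 4))
print("CCTG" in prune_local(counts, 0.1), "GATA" in prune_local(counts, 0.1))
```

In this toy input the erroneous k-mers (count 5) outnumber the genuine low-expression k-mers (count 3), so no single global cutoff can remove the former while keeping the latter, whereas the per-component cutoff does.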

Haplotype assembly in polyploid genomes and identical by descent shared tracts

Author: Derek Aguiar, Brown University, United States
Sorin Istrail, Brown University

Motivation: Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing these high-throughput sequencing data must scale favorably in terms of both accuracy and computational efficiency. Furthermore, current models and methodologies for haplotype assembly (1) do not consider individuals sharing haplotypes jointly which reduces the size and accuracy of assembled haplotypes and (2) are unable to model genomes having more than two sets of homologous chromosomes (polyploidy). Particularly, polyploid organisms are becoming the target of many research groups interested in studying the genomics of disease, phylogenetics, botany, and evolution but there is an absence of theory and methods for polyploid haplotype reconstruction.

Results: In this work, we present a number of results, extensions, and generalizations of Compass graphs and our HapCompass framework (Aguiar et al. 2012). We prove the theoretical complexity of two haplotype assembly optimizations, thereby motivating the use of heuristics. We present graph theory-based algorithms for the problem of haplotype assembly from sequencing data using our previously developed HapCompass framework for (1) novel implementations of haplotype assembly optimizations (minimum error correction), (2) assembly of a pair of individuals sharing a tract identical by descent, and (3) assembly of polyploid genomes. We demonstrate the accuracy of each method on the 1000 Genomes Project, Pacific Biosciences, and simulated sequence data. HapCompass is available for download at http://www.brown.edu/Research/Istrail_Lab/
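For readers unfamiliar with the minimum error correction (MEC) objective mentioned above, the following Python sketch scores a diploid haplotype against read fragments and finds the best haplotype by exhaustive search. It is not HapCompass: the encoding (strings over `0`, `1`, and `-` for uncovered sites), the function names, and the brute-force search are assumptions made for illustration, and the search is exponential in the number of sites.

```python
from itertools import product


def mismatches(fragment, haplotype):
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != "-" and f != h)


def mec_score(fragments, haplotype):
    """MEC: each fragment is charged its mismatches to the closer of the
    haplotype and its complement (the second diploid haplotype)."""
    complement = "".join("1" if a == "0" else "0" for a in haplotype)
    return sum(
        min(mismatches(f, haplotype), mismatches(f, complement))
        for f in fragments
    )


def brute_force_mec(fragments, n_sites):
    """Try all 2**n_sites haplotypes; ties are broken lexicographically."""
    candidates = ("".join(bits) for bits in product("01", repeat=n_sites))
    return min(candidates, key=lambda h: (mec_score(fragments, h), h))


# Six consistent fragments plus one ("00--") carrying a sequencing error.
fragments = ["01--", "-10-", "--01", "10--", "-01-", "--10", "00--"]
best = brute_force_mec(fragments, 4)
print(best, mec_score(fragments, best))  # 0101 1
```

The exhaustive search doubles in cost with every added site, which gives some intuition for why heuristics are used in practice.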

Using State Machines to Model the IonTorrent Sequencing Process and Improve Read Error-Rates

Author: David Golan, Tel Aviv University, Israel
Paul Medvedev, The Pennsylvania State University

Motivation: The importance of fast and affordable DNA sequencing methods for current day life sciences, medicine and biotechnology is hard to overstate. A major player is IonTorrent, a pyrosequencing-like technology which produces flowgrams – sequences of incorporation values – which are converted into nucleotide sequences by a base-calling algorithm. Because of its exploitation of ubiquitous semiconductor technology and innovation in chemistry, IonTorrent has been gaining popularity since its debut in 2011. Despite the advantages, however, IonTorrent read accuracy remains a significant concern.

Results: We present FlowgramFixer, a new algorithm for converting flowgrams into reads. Our key observation is that the incorporation signals of neighboring flows, even after normalization and phase correction, carry considerable mutual information and are important in making the correct base-call. We therefore propose that base-calling of flowgrams should be done on a read-wide level, rather than one flow at a time. We show that this can be done in linear time by combining a state machine with a Viterbi algorithm to find the nucleotide sequence that maximizes the likelihood of the observed flowgram. FlowgramFixer is applicable to any flowgram based sequencing platform. We demonstrate FlowgramFixer’s superior performance on Ion Torrent E. coli data, with a 4.8% improvement in the number of high-quality mapped reads and a 7.1% improvement in the number of uniquely mappable reads. Availability: Binaries and source code of FlowgramFixer are freely available at: http://www.cs.tau.ac.il/~davidgo5/flowgramfixer.html
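As a rough illustration of decoding a whole read at once rather than flow by flow, here is a generic log-space Viterbi decoder in Python with a toy model. Everything about the model is an assumption for this sketch and not FlowgramFixer's state machine: hidden states are homopolymer lengths 0 to 2, signals are the state plus Gaussian noise, and the second transition table forbids two consecutive zero-length flows purely to show how neighboring flows can change a call.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state path (one pass over the observations)."""
    # best[s]: log-probability of the best path that ends in state s
    best = {s: log_start[s] + log_emit(s, observations[0]) for s in states}
    back = []  # back[t][s]: predecessor of s on the best path into step t+1
    for obs in observations[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            p = max(states, key=lambda q: prev[q] + log_trans[q][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit(s, obs)
            pointers[s] = p
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def gaussian_log_emit(sigma):
    """Toy emission model: signal ~ Normal(state, sigma), constants dropped."""
    def log_emit(state, signal):
        return -((signal - state) ** 2) / (2 * sigma ** 2)
    return log_emit


STATES = [0, 1, 2]  # hypothetical homopolymer lengths per flow
UNIFORM = {s: math.log(1 / 3) for s in STATES}
UNIFORM_TRANS = {q: dict(UNIFORM) for q in STATES}
# Arbitrary toy constraint: a zero-length flow cannot follow a zero-length flow.
NO_DOUBLE_ZERO = {
    0: {0: float("-inf"), 1: math.log(0.5), 2: math.log(0.5)},
    1: dict(UNIFORM),
    2: dict(UNIFORM),
}

emit = gaussian_log_emit(0.3)
print(viterbi([0.1, 0.4], STATES, UNIFORM, UNIFORM_TRANS, emit))   # [0, 0]
print(viterbi([0.1, 0.4], STATES, UNIFORM, NO_DOUBLE_ZERO, emit))  # [0, 1]
```

With uniform transitions the decoder reduces to rounding each flow independently; once the transition table carries information, the ambiguous 0.4 signal is resolved by its neighbor. The running time is linear in the number of flows for a fixed number of states.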

Integrating sequence, expression and interaction data to determine condition-specific miRNA regulation

Author: Hai-Son Le, Carnegie Mellon, United States
Additional authors:
Ziv Bar-Joseph, Carnegie Mellon

MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression post-transcriptionally. MiRNAs were shown to play an important role in development and disease, and accurately determining the networks regulated by these miRNAs in a specific condition is of great interest. Early work on miRNA target prediction has focused on utilizing static sequence information. More recently, researchers have combined sequence and expression data to identify such targets in various conditions.

Results: Here we propose a regression-based probabilistic method that integrates sequence, expression and interaction data to identify modules of mRNAs controlled by small sets of miRNAs. We formulate an optimization problem and develop a learning framework to determine the module regulation and membership. Applying our method to cancer data, we show that by adding protein interaction data and modeling combinatorial regulation, our method can accurately identify both miRNAs and their targets, improving upon prior methods. We next used our method to jointly analyze a number of different types of cancers and identified both common and cancer type specific miRNA regulators.

Inference of historical migration rates via haplotype sharing

Author: Pier Francesco Palamara, Columbia University, United States
Itsik Pe’er, Columbia University

Pairs of individuals from a study cohort will often share long-range haplotypes identical-by-descent (IBD). Such haplotypes are transmitted from common ancestors that lived tens to hundreds of generations in the past, and can now be efficiently detected in high-resolution genomic datasets, providing a novel source of information in several domains of genetic analysis. Recently, haplotype sharing distributions were studied in the context of demographic inference, and were used to reconstruct recent demographic events in several populations. Here we extend this framework to handle demographic models that contain multiple demes interacting through migration. We extensively test our formalism in several demographic scenarios, and provide a freely available software tool for demographic inference.

IBD-Groupon: An Efficient Method for Detecting Group-wise Identity-by-Descent regions simultaneously in Multiple Individuals based on Pairwise IBD relationships

Author: Dan He, IBM T.J. Watson, United States

Detecting Identity-by-Descent (IBD) is a very important problem in genetics. Most of the existing methods focus on detecting pairwise IBDs, which have relatively low power to detect short IBDs. Methods to detect IBDs among multiple individuals simultaneously, or group-wise IBDs, have better performance for short IBD detection. Meanwhile, group-wise IBDs can be applied to a wide range of applications such as disease mapping, pedigree reconstruction, etc. The existing group-wise IBD detection method is computationally inefficient and is only able to handle small data sets such as 20 to 30 individuals with hundreds of SNPs. It also requires a prior specification of the number of IBD groups, which may not be realistic in many cases. The method can only handle a small number of IBD groups, such as two or three, due to scalability issues. What’s more, it does not take LD into consideration. In this work, we developed a very efficient method, IBD-Groupon, which detects group-wise IBDs based on pairwise IBD relationships and is able to address all the drawbacks mentioned above. To our knowledge, our method is the first group-wise IBD detection method that is scalable to very large data sets, for example, hundreds of individuals with thousands of SNPs, while remaining powerful enough to detect short IBDs. Our method does not need the number of IBD groups to be specified, as it is detected automatically. Our method also takes LD into consideration, as it is based on pairwise IBDs where LD can be easily incorporated.
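The general idea of building groups from pairwise relationships can be sketched with a union-find structure. The Python snippet below is only a naive illustration and not the IBD-Groupon algorithm: it assumes error-free pairwise segments given as hypothetical `(individual_a, individual_b, start, end)` tuples, treats IBD as transitive at a single position, and ignores LD entirely.

```python
def ibd_groups_at(position, pairwise_segments):
    """Merge individuals linked by pairwise IBD segments covering position."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, start, end in pairwise_segments:
        if start <= position <= end:
            parent[find(a)] = find(b)  # union the two groups

    groups = {}
    for individual in parent:
        groups.setdefault(find(individual), set()).add(individual)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


segments = [
    ("A", "B", 100, 500),
    ("B", "C", 300, 800),
    ("D", "E", 0, 50),
    ("A", "D", 600, 900),
]
print(ibd_groups_at(400, segments))  # [['A', 'B', 'C']]
print(ibd_groups_at(700, segments))  # [['A', 'D'], ['B', 'C']]
```

Note that the number of groups is never supplied; it falls out of the merging step, which is the property the abstract highlights.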

 
