Sailfish: Alignment-free isoform quantification from RNA-seq reads

Sailfish, a very fast, alignment-free method for isoform-level quantitation of RNA-seq data, is now published in Nature Biotechnology.

The Sailfish manuscript first appeared on the open preprint server arXiv in August 2013, and it is now available in Nature Biotechnology’s online publication.

In case you missed reading about Sailfish earlier, it is a new computational method for isoform-level expression quantitation from RNA-seq data, developed by a team from Carnegie Mellon University and the University of Maryland, College Park (UMCP). Here is the abstract of the Sailfish Nature Biotechnology paper.

We introduce Sailfish, a computational method for quantifying the abundance of previously annotated RNA isoforms from RNA-seq data. Because Sailfish entirely avoids mapping reads, a time-consuming step in all current methods, it provides quantification estimates much faster than do existing approaches (typically 20 times faster) without loss of accuracy. By facilitating frequent reanalysis of data and reducing the need to optimize parameters, Sailfish exemplifies the potential of lightweight algorithms for efficiently processing sequencing reads.

What makes Sailfish novel is that it is an alignment-free approach that works with k-mers and uses some cool EM (expectation-maximization) tricks to run super-fast. As Steve Mount, one of the authors, tweeted a while ago, RNA-seq quantitation now need not take longer than a cup of coffee :)
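To give a feel for what "k-mers plus EM" means, here is a toy sketch in Python. It is not Sailfish's actual algorithm or code: the function names (`kmers`, `quantify`), the tiny made-up sequences, and the simple mixture model are all assumptions chosen for illustration. It counts the k-mers found in the reads without aligning anything, then uses EM to share out each k-mer's count among the transcripts that contain it.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def quantify(transcripts, reads, k=5, n_iter=50):
    """Toy EM estimate of the fraction of read k-mers coming from each transcript."""
    # Index: k-mer -> {transcript name: occurrences of the k-mer in that transcript}
    index = {}
    n_kmers = {}
    for name, seq in transcripts.items():
        n_kmers[name] = max(len(seq) - k + 1, 0)
        for km in kmers(seq, k):
            index.setdefault(km, Counter())[name] += 1

    # Count read k-mers that occur in at least one transcript (no alignment step).
    counts = Counter(km for r in reads for km in kmers(r, k) if km in index)

    # Start from uniform abundances.
    theta = {name: 1.0 / len(transcripts) for name in transcripts}
    for _ in range(n_iter):
        assigned = dict.fromkeys(transcripts, 0.0)
        # E-step: split each k-mer's count in proportion to the current abundance
        # times the chance of drawing that k-mer from the transcript.
        for km, c in counts.items():
            weights = {t: theta[t] * n / n_kmers[t] for t, n in index[km].items()}
            total = sum(weights.values())
            if total == 0:
                continue
            for t, w in weights.items():
                assigned[t] += c * w / total
        # M-step: renormalise the assigned counts into abundances.
        norm = sum(assigned.values())
        if norm == 0:
            break
        theta = {t: a / norm for t, a in assigned.items()}
    return theta


# Made-up example: two "isoforms" sharing their first 8 bases; all reads come from A.
transcripts = {"A": "ACGTACGTTTGACCA", "B": "ACGTACGTGGCATTC"}
reads = ["ACGTACGTTTGACCA"] * 3
abundance = quantify(transcripts, reads)
print(abundance)
```

In this toy run the k-mers from the shared region are ambiguous at first, but the k-mers unique to A pull the estimate toward A with each EM iteration. The real method has to deal with much more (transcript length, sequencing errors, speed at scale), so treat this only as a picture of the idea.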

And we had a quick blog post on Sailfish almost immediately @nextgenseek (the only useful post from @nextgenseek in a long time :-)). The arXiv manuscript also allowed a great open discussion and peer review online, as the RNA-seq methods pioneer Lior Pachter had a blog post on Sailfish right away.

Sailfish: Worry-Free, not just Alignment-Free?

Looking forward to reading the Nature Biotechnology paper and writing a “Buzzfeed”-style post, “10 reasons you should be using Sailfish, not xxxx” :) Before that, here is just a sample :)

The biggest benefit, and a golden one, is that Sailfish lets you stop worrying about the zillion default parameters that come with any quantitation approach using sequencing data. As the Sailfish paper says in the supplement:

Sailfish avoids parameters specifying, for example, the number of mismatches to tolerate, total allowable quality of mismatched bases, gap open and extension penalties, whether and how much to trim reads, number and quality of alignments to report from the aligner and pass into the estimation procedure.



