[Update:] Since this post was published, a preprint of the kallisto manuscript has become available. Lior Pachter has written a blog post on pseudoalignments, and Rob Patro has a nice post on “Light weight algorithms for RNA-seq”. Here they are:
- kallisto preprint
- Near-optimal RNA-Seq quantification with kallisto
- Rob Patro’s blog post: Not-quite alignments: Salmon, kallisto and efficient quantification of RNA-seq data
The 2015 Biology of Genomes conference at CSHL, NY, started yesterday. There are a lot of interesting genomics stories to watch out for during the conference. On the methodological side, there is kallisto, a new, superfast RNA-seq quantitation method from Lior Pachter’s group at Berkeley. kallisto is being presented as a poster at BOG15:
- Ultrafast accurate RNA-Seq analysis, Nicolas Bray, Harold Pimentel, Páll Melsted, Lior Pachter
The Pachter team opened up the software before today’s poster session. It is freely downloadable from kallisto: ultra fast RNA-seq quantitation. A preprint describing the method is expected soon. Although not much is known about the method per se, it is fairly easy to see the basic idea behind it. Here is a quick post on it.
kallisto is a software program, written mainly in C++, for quantifying transcript abundances from RNA-Seq data. kallisto is fast: the software page shows that it is faster than Sailfish, one of the fastest RNA-seq quantitation methods, which uses k-mer lookups instead of regular “alignment”.
The original implementation of Sailfish breaks reads down into k-mers for quantitation. However, using the k-mers alone throws away the information about which k-mers came from the same read, and this can lead to inaccurate abundance estimates. In contrast, alignment approaches preserve the information in the reads, but the alignment process is slower.
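To make the loss of information concrete, here is a minimal Python sketch. It is a toy illustration, not Sailfish’s actual code; the reads and the helper name `kmers` are made up for this example. Once the k-mers from all reads are pooled into a single table of counts, nothing records which k-mers came from the same read.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Two made-up reads (hypothetical data for illustration only).
reads = ["ACGTAC", "GTACGT"]
k = 4

# Pool the k-mers from every read into one count table.
# The table keeps how often each k-mer was seen, but not which
# read it came from, so the read-level grouping is lost.
pooled = Counter(km for read in reads for km in kmers(read, k))

for km, count in sorted(pooled.items()):
    print(km, count)
```

An alignment-based method would instead keep each read intact and ask where the whole read fits.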
kallisto seems to gain both speed and accuracy by using the idea of
pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment
Instead of regular alignment or pure k-mer based analysis, kallisto seems to use a “pseudoalignment” process to find the set of potential transcripts a read could have come from. The term “pseudoalignment” is vague here, but the GitHub page for kallisto points to a clever algorithmic use of k-mers and hashing that preserves read identity while finding the potential origins of reads very quickly.
The kallisto GitHub page also shows the use of an EM algorithm on the “pseudoalignments” to resolve “read origin” ambiguities for reads that are compatible with multiple transcripts. It will be interesting to see whether kallisto uses a regular EM algorithm with clever optimizations, like Sailfish, or some kind of online algorithm, like eXpress. In the end, kallisto yields expected read counts for every transcript. It is also good to see that kallisto has moved away from outputting FPKM/RPKM in favor of TPM in the output file.
The kallisto page has some description of kallisto’s performance and a comparison of its speed with other methods such as Sailfish and eXpress.
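One way to picture the idea, going only by the description above, is a hash table from k-mers to the transcripts that contain them, with a read’s candidate origins being the intersection of the sets hit by its k-mers. The sketch below is an assumption-laden toy, not kallisto’s implementation: the transcript sequences, the function names `build_index` and `compatible_transcripts`, and the choice to skip k-mers missing from the index are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(transcripts, k):
    """Map each k-mer to the set of transcript names containing it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def compatible_transcripts(read, index, k):
    """Intersect the transcript sets of the read's k-mers.

    Assumption of this toy: k-mers absent from the index (for example,
    ones containing a sequencing error) are skipped, not treated as a
    mismatch.
    """
    compat = None
    for km in kmers(read, k):
        hits = index.get(km)
        if not hits:
            continue
        compat = set(hits) if compat is None else compat & hits
    return compat or set()


# Hypothetical toy transcriptome: t1 and t2 share a common prefix.
transcripts = {"t1": "ACGTACGGA", "t2": "ACGTACTTT"}
k = 4
index = build_index(transcripts, k)

shared = compatible_transcripts("ACGTAC", index, k)   # lies in the shared prefix
unique = compatible_transcripts("GTACGG", index, k)   # extends into t1 only
nothing = compatible_transcripts("GGGGGG", index, k)  # no k-mer is indexed
```

No base-level alignment is computed here. The read stays a unit, and the only output is the set of transcripts it is compatible with, which is all that quantification needs.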
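For readers who have not seen it, here is a hedged sketch of how a plain batch EM could turn per-read compatibility sets into expected counts, and how those counts convert to TPM. It is not kallisto’s code (which variant of EM kallisto uses is exactly the open question above), and it uses raw transcript lengths where real tools would use effective lengths. The names `em_counts` and `to_tpm` and the toy data are hypothetical.

```python
def em_counts(compat_sets, lengths, n_iter=100):
    """Plain batch EM over per-read compatibility sets (toy version)."""
    names = list(lengths)
    # alpha[t]: fraction of reads attributed to transcript t; start uniform.
    alpha = {t: 1.0 / len(names) for t in names}
    counts = {t: 0.0 for t in names}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in names}
        # E-step: split each read across its compatible transcripts in
        # proportion to alpha[t] / length[t].
        for compat in compat_sets:
            weights = {t: alpha[t] / lengths[t] for t in compat}
            total = sum(weights.values())
            if total == 0:
                continue  # read has no usable transcript; leave it unassigned
            for t, w in weights.items():
                counts[t] += w / total
        # M-step: re-estimate alpha from the expected counts.
        assigned = sum(counts.values())
        if assigned == 0:
            break
        alpha = {t: counts[t] / assigned for t in names}
    return counts


def to_tpm(counts, lengths):
    """Convert expected counts to transcripts per million (TPM)."""
    rate = {t: counts[t] / lengths[t] for t in counts}
    total = sum(rate.values())
    return {t: 1e6 * r / total for t, r in rate.items()}


# Hypothetical data: 30 reads unique to t1, 10 unique to t2, 20 ambiguous.
lengths = {"t1": 1000, "t2": 1000}
compat_sets = [{"t1"}] * 30 + [{"t2"}] * 10 + [{"t1", "t2"}] * 20
counts = em_counts(compat_sets, lengths)
tpm = to_tpm(counts, lengths)
```

In this toy case the 20 ambiguous reads end up split 3:1 in favor of t1, following the ratio of the uniquely assigned reads. TPM values always sum to one million within a sample, which is one reason they are easier to compare across samples than FPKM/RPKM.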
On benchmarks with standard RNA-Seq data, kallisto can quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only the read sequences and a transcriptome index that itself takes less than 10 minutes to build. Pseudoalignment of reads preserves the key information needed for quantification, and kallisto is therefore not only fast, but also as accurate than existing quantification tools. In fact, because the pseudoalignment procedure is robust to errors in the reads, in many benchmarks kallisto significantly outperforms existing tools.
Just love that the kallisto FAQ page has this and leaves it to the imagination :)
Is there a reason you picked the name kallisto for your program?
The kallisto bear does seem to like the sailfish. It will be interesting to see how kallisto will like the upcoming “salmon”. Salmon, a new approach from the Sailfish team, seems a bit similar to kallisto in some aspects, so the comparison will be interesting to see. Another interesting comparison would be with RNA-Skim, another k-mer based approach that is faster than Sailfish.
A few more updates on kallisto from #bog15 tweets.
Many people have been asking for kallisto vs RNASkim comparisons. This is why we haven’t done it: https://t.co/mdkusLefY2
— Harold Pimentel (@hjpimentel) May 6, 2015