Tuesday, June 21, 2011

RealTimeGenomics Goes Free, Provides Alternative for CG Users

I've mentioned RealTimeGenomics (RTG) in the past, and Joke Reumers mentioned their software recently in her talk at the Complete Genomics user group meeting. Today, RTG announced that they were going free to individual researchers with their RTG Investigator 2.2 package. I kind of knew this was going to happen after some chats with them about the pricing models they were considering and the hard sell it would be to academia.

The Hard Sell
I think that it's a hard sell to get academic researchers to pay for something they can get for free. I've been in a lab that preferred to make its own Taq polymerase rather than pay for a commercial enzyme even when that meant using it at a 1:1 ratio with the rest of the PCR reaction (and no, I didn't stay in that lab for too long, but the point stands).

In a world where BWA, TopHat, Samtools, SoapSNP, and GATK are free and fairly well documented, selling an aligner and variant caller is going to be difficult unless it does something particularly special. Plus, a major focus of RTG's strategy is providing an alternative to Complete Genomics' own analysis, which comes "free" when you buy a whole genome from them. Very hard sell.

So, again, unless your software does something particularly special, like being more sensitive and/or specific, like being faster, like being significantly easier to use, like including a bunch of bells and whistles in the form of visualization tools or fancy reports, you're going to have trouble selling your product.

But how, as a company, do you prove that your software has something like this to offer? Traditionally, trial licences have been the way, but that works for software that doesn't have a strong free alternative. A company lets you try the software and see if you like it, then you buy it if you do. But most sequencing labs already have their pipelines in place. And comparing and contrasting two software packages isn't really worth the time unless the claims have been substantiated by other groups.

That's where this foot-in-the-door approach comes in. Basically, you give the software away to academia and let them do your legwork for you. If your software offers something special and academia can prove it, you'll start to be able to sell to the bigger corporate entities and sequencing cores. So the solution to the hard sell is to not sell at all! Brilliant!

So, how is the software?
I tested RTG's software on Illumina data because at the time that's all I was using. My findings were that it was easy to use (in a native, parallel environment), ran fast, mapped about 80% of reads (similar to Novoalign/BWA), and found a similar number of variants to GATK. Basically, it worked and seemed to work pretty well. I admit I have yet to compare the findings in any depth. When I have some time, I intend to do a more comprehensive assessment of its performance.

I also ran it in a mode that combined the Complete Genomics and Illumina data I had from the same patient. I found this to be a pretty cool option that I enjoyed using.

Really, if you're dealing with Complete Genomics data, this is your only option (as far as I know; let me know if this isn't true) for an aligner and variant caller other than CG's own. You could also align using the RTG mapper and then try variant calling with YFA (your favorite algorithm).

What's unique?

They have a cool program called "mapx" that does a translated nucleotide alignment against protein databases. You can then take that and use their "similarity" tool to basically create phylogenetic clusters based on your reads alone. Very cool for metagenomics. I'm planning to try this out with a whole genome sample I have derived from saliva.
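To make "translated nucleotide alignment" a bit more concrete: before a read can be compared against a protein database, it has to be translated in all six reading frames (three offsets on each strand). Here's a minimal Python sketch of that first step only. To be clear, this is not how mapx is implemented (I have no idea what it does internally), and the function names are my own. It just uses the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(read):
    """Map frame labels (+1..+3, -1..-3) to the protein translation of a read."""
    read = read.upper()
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Each of the six peptides would then be searched against the protein database. That search is where the real work (and presumably the real cleverness) happens, and it's not shown here.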

Why does this matter?

Well, frankly, there's the chance they just may be on to something. They make a lot of claims about their sensitivity and specificity. They have some killer ROC curves. They have that cool metagenomics tool that I honestly haven't heard about from anywhere else.

And now there's no fear of losing access to it when the trial licence expires.

I fully admit it: I wanted this to be free, because I am one of those people who likes trying new programs and seeing if I can squeeze a bit more information out of my data set. I was just thinking today about how I should go back to our old U87MG dataset and call variants using GATK and the new SV pipeline we have.

Finally, I think it really has implications for users of Complete Genomics. Joke Reumers showed that CG variants that were also detected by RTG were highly accurate. That's key as an in silico validation step. Plus, it empowers us to analyze the data ourselves. I love CG, but I also want the ability to adjust my alignment and variant calling settings myself. I also want to be able to update my older analyses so they stay compatible with each other without having to pay a couple thousand dollars more on top of my original investment.
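For anyone wondering what that kind of in silico validation looks like in practice, the core of it is just intersecting two call sets. Below is a rough Python sketch, assuming you've already parsed each caller's output into (chromosome, position, ref, alt) records. The `Variant` type and `concordance` function are names I made up for illustration. This is not Joke Reumers' method or anything from RTG or CG.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A variant call reduced to the fields needed for an exact-match comparison."""
    chrom: str
    pos: int
    ref: str
    alt: str


def concordance(calls_a, calls_b):
    """Split two call sets into shared and caller-specific variants.

    Matching is exact on (chrom, pos, ref, alt), so the same indel written
    two different ways by two callers will NOT be counted as shared.
    """
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    return {
        "shared": shared,
        "only_a": a - b,
        "only_b": b - a,
        # Fraction of caller A's variants that caller B also reports.
        "frac_a_confirmed": len(shared) / len(a) if a else 0.0,
    }
```

The variants in the shared set are the ones you'd trust most. The caller-specific ones are where you'd spend your validation budget. For real data you'd want to normalize indels first, since exact matching undercounts agreement there.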

I do wonder how it's going to pan out. I hope, of course, that it ends up helping them out. As I tell all my corporate friends: I want them to succeed, because their success is my success.

At the very least, the software is now out in the wild. It's now on the users to figure out if it's worth using. I'll be doing my part over the coming months and I promise to share!