Tuesday, July 26, 2011

The Ion Torrent Paper (Nature)

An integrated semiconductor device enabling non-optical genome sequencing

I'm just going to discuss my thoughts and comments on the paper, their findings, and how they relate to claimed specs for the IonTorrent PGM.

And just for some comparisons later on, the current Ion Torrent PGM product sheet is also attached [click].

I'm going to try to step around some of the issues with the paper that have been well covered at Daniel MacArthur's blog Genetic Future. I think he is pretty fair to the paper in criticizing its "validation rate" and so forth. Suffice it to say, a year and a half to two years ago, perhaps a 15x human genome would have been considered adequate. But this is a paper coming out of LifeTech, manufacturer of the SOLiD sequencer: if you're going to use the whole genome sequence off the SOLiD as validation, let's go for at least 30x coverage.


"..there is a desire to continue to drop the cost of sequencing at an exponential rate consistent with the semiconductor industry's Moore's Law..."
They bring up Moore's law repeatedly, and they sequenced Moore himself in the paper. But wait a second... the per-base cost of sequencing is dropping significantly faster than Moore's law! I suppose it's a minor complaint, but let's give sequencing the credit it's due.

Also, let me complain very briefly about the use of a few buzz terms:
"To overcome these limitations and further democratize the practice of sequencing, a paradigm shift based on non-optical sequencing on newly developed integrated circuits was pursued."
If there is any term that is going to supplant "paradigm shift" as the default excessively pretentious term in scientific papers, it has to be "democratize". Look, unless this device is offering $100 genomes, it's not democratizing sequencing. Can we leave sensationalist buzz words for the advertisements and stick to reality for the Nature papers? (Wait, what am I saying?) I appreciate that we're talking about a non-light system here, but the observation of protons rather than photons released upon base incorporation isn't really a paradigm shift. Once we're taking pictures of DNA with electron microscopes and reading the entire genome instantaneously in one shot, then we can start talking about paradigm shifts.


Okay, let's look at the scalability based on their data, and what's being touted in their product sheet.

"A typical 2-h run using an ion chip with 1.2M sensors generates approximately 25 million bases."

That pretty much throws out the Ion 314 (1.3M wells) for human genome sequencing. At 25Mb per 2-hour run, a 30x diploid human genome (taking the diploid genome as roughly 6.4Gb) would require 640 days of back-to-back runs on the 1.2M sensor chip in the paper. Even just 1x coverage would take three weeks. Yikes.
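For anyone who wants to check that arithmetic, here is a rough sketch. The 25Mb per 2-hour run comes from the paper; the 6.4Gb diploid genome size is an assumption (it is the figure that reproduces the numbers above), and the function name is only for illustration.

```python
def days_for_coverage(coverage, bases_per_run, hours_per_run=2.0, genome_bases=6.4e9):
    """Days of back-to-back runs on one machine to reach the given coverage.

    genome_bases defaults to an assumed 6.4 Gb diploid human genome.
    """
    runs_needed = coverage * genome_bases / bases_per_run
    return runs_needed * hours_per_run / 24.0


# 1.2M-sensor chip from the paper: ~25 million bases per 2-hour run
print(f"1x:  {days_for_coverage(1, 25e6):.1f} days")
print(f"30x: {days_for_coverage(30, 25e6):.0f} days")
```

This ignores library prep, chip loading, and any turnaround between runs, so the real-world numbers would only be worse.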

Later there is a rather astounding statement:
"At present, 20-40% of the sensors in a given run yield mappable reads."
Room for improvement there, methinks.

In Table 1, they test the 11M chip with E. coli and Human. The E. coli yields 273.9Mb of sequence off that chip. At about 20-40% of sensors yielding mappable reads (2.2M-4.4M reads), that gives an average read length of 62b-125b. This is consistent with their finding that 2.6M reads are >=21b and that 1.8M are >=100b. It is also consistent with Figure S15, where it appears the majority of read lengths are around 110b-120b. So at least the read lengths are not disappointing.
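The read-length range above is just total yield divided by the number of sensors that produced a mappable read. A minimal sketch, assuming the 20-40% figure applies to this particular 11M chip run (the paper does not say so explicitly), with an illustrative function name:

```python
def mean_read_length(total_bases, sensors, fraction_with_reads):
    """Average read length if only a fraction of sensors yield a mappable read."""
    reads = sensors * fraction_with_reads
    return total_bases / reads


# Table 1: 273.9 Mb of E. coli sequence off the 11M chip
for fraction in (0.20, 0.40):
    length = mean_read_length(273.9e6, 11e6, fraction)
    print(f"{fraction:.0%} of sensors -> about {length:.1f} b per read")
```

The lower the fraction of productive sensors, the longer the reads have to be to account for the same total yield, which is why 20% gives the upper end of the range.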

Their Ion 318 is the 12 million well chip. I think this is similar to their 11M chip in the paper. Going back to Table 1, they got 273.9Mb off the 11M chip. At issue is the promised "[starting] 1 Gb of high-quality sequence" off the Ion 318 chip in the Ion Torrent product sheet. Now, I completely believe that advancements have been made since the paper's acceptance on May 26th, 2011, but nearly four times the yield? Not so sure about that claim. I'm not doubting it can get there, but I'll put it this way: this is a paper from the company that makes the product. If anyone can make it work optimally, it should be them. And their optimal report here has it at significantly lower than what they're advertising. Oh, and the spec sheet has small print next to the Ion 318 chip entry that says "the content provided herein [...] is subject to change without notice". Let's just say I'm skeptical.
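To put the gap between the paper and the product sheet in numbers, here is a quick sketch comparing total yield and average yield per well. It assumes the 11M chip in the paper and the 12M-well Ion 318 are comparable, which is a guess, and the function name is illustrative.

```python
def bases_per_well(total_bases, wells):
    """Average yield per well, counting empty and failed wells too."""
    return total_bases / wells


paper_yield, paper_wells = 273.9e6, 11e6    # Table 1, E. coli on the 11M chip
claimed_yield, claimed_wells = 1e9, 12e6    # Ion 318 product sheet claim

print(f"total yield ratio: {claimed_yield / paper_yield:.2f}x")
print(f"paper:   {bases_per_well(paper_yield, paper_wells):.1f} bases per well")
print(f"claimed: {bases_per_well(claimed_yield, claimed_wells):.1f} bases per well")
```

That works out to roughly a 3.65x jump in total yield, or going from about 25 to about 83 bases per well. With reads in the 100b range, that would mean most wells on the chip producing a usable read, versus the 20-40% reported in the paper.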

Anyway, the rest of the paper is pretty vague. One issue on everyone's mind is the homopolymer issue, which is addressed in a single sentence stating the accuracy of 5-base homopolymers (97.328%; not terrible, but not overly good either) and that it's "better than pyrosequencing-based methods" (read: 454). How about longer ones? No idea. Figure S16b also only goes out to 5b, and its curve isn't looking too encouraging.

Apparently the Ion 316, their 6.3M well chip, is currently available, and they claim at least 100Mb of sequence per run. This is consistent with their mapped bases in the paper (169.6Mb off a 6.1M ion chip). With this chip at the paper's 169.6Mb per 2-hour run, you're talking about 3 days on one machine for 1x diploid human coverage, and about 90 days for 30x coverage. Better, but still not there when it comes to human sequencing. Still, it's at the level of completing entire bacterial genomes in 2 hours. If you're into that sort of thing (and don't have access to a HiSeq or something else...).
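Flipping the question around, it helps to look at how much coverage a single 2-hour run buys. A rough sketch, assuming a 6.4Gb diploid human genome and an E. coli genome of roughly 4.6Mb (both are approximate figures of mine, not from the paper):

```python
def coverage_per_run(bases_per_run, genome_bases):
    """Fold coverage delivered by a single run over a genome of the given size."""
    return bases_per_run / genome_bases


HUMAN_DIPLOID = 6.4e9   # assumption: approximate diploid human genome size
E_COLI = 4.6e6          # assumption: approximate E. coli genome size

# 169.6 Mb mapped in the paper (6.1M chip); 100 Mb is the product sheet minimum
print(f"human, 169.6Mb run:  {coverage_per_run(169.6e6, HUMAN_DIPLOID):.4f}x")
print(f"E. coli, 100Mb run:  {coverage_per_run(100e6, E_COLI):.1f}x")
```

So a single run is a tiny sliver of a human genome but a comfortable depth for a bacterium, even at the lower 100Mb figure.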

Not Quite "Post-Light"

You know, there's a lot of "post-light" jibber jabber in the paper. It's Ion Torrent's favorite buzz phrase, and I'm a fan, actually (much more so than I am of "democratize" and "paradigm shift"). But at this point, with this performance, I'm not sure we're "post-light" yet. The technology is there, but it isn't scaled up enough yet. There's a claim made at the end of the paper that's interesting:
"The G. Moore genome sequence required on the order of a thousand individual ion chips comprising about one billion sensors. ...our work suggests that readily available CMOS nodes should enable the production of one-billion-sensor ion chips and low-cost routine human genome sequencing."
Doubtless this is long in the works already, and I hope it is a reality. Because making the leap from the things in this paper to a functional 1B sensor chip would make a huge difference.

I'd say I was a bit disappointed with this paper. It felt half done. I'm confused about the way the comparison to SOLiD was done--why wasn't the SOLiD WGS of G. Moore done to an adequate depth? I'm a bit annoyed at the lack of comprehensive information, as well. The homopolymer issue is known--why hide behind homopolymers of 5b or smaller? Just give the whole story in your paper--it's an article in Nature, not an advertisement.

Anyway, to quote a very smart man I know, "it is what it is." Ion Torrent is here to stay and it's only going to improve. I certainly hope it does--I'd love to see it pumping out 1Gb/2hrs with long read lengths.