<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-3162625598067227745</id><updated>2012-01-27T16:00:14.632-08:00</updated><category term='complete genomics user conference 2011'/><category term='exome-seq'/><category term='coverage'/><category term='webtools'/><category term='bedtools'/><category term='Technology'/><category term='ion torrent'/><category term='news'/><category term='23andMe'/><category term='politics'/><category term='personalized genomics'/><category term='genomics'/><category term='GATK'/><category term='journal club'/><category term='blogging'/><category term='read depth'/><category term='bioinformatics'/><category term='complete genomics'/><category term='vcftools'/><category term='humor'/><title type='text'>Mendelian Disorder</title><subtitle type='html'>genomics . bioinformatics . biotechnology . sequencing . 
opinions</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>29</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-6617041519134560340</id><published>2011-12-09T16:33:00.001-08:00</published><updated>2011-12-09T16:35:41.454-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='GATK'/><category scheme='http://www.blogger.com/atom/ns#' term='bioinformatics'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><title type='text'>GATK's available annotations</title><content type='html'>Perhaps because it changes too often, GATK's available annotations for VCF files does not seem to be online anywhere that I've seen. The GATK site says to run GATK with the "--list" parameter to list them. Doing that requires putting in valid input files and such. 
Basically, it's a pain.&lt;br /&gt;&lt;br /&gt;So here's the list from GATK v1.3-21-gcb284ee&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;Available annotations for the VCF INFO field:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ChromosomeCounts&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; IndelType&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; HardyWeinberg&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SpanningDeletions&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; NBaseCount&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; AlleleBalance&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MappingQualityZero&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: 
&amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; LowMQ&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; BaseCounts&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MVLikelihoodRatio&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; InbreedingCoeff&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; RMSMappingQuality&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; TechnologyComposition&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; HaplotypeScore&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SampleList&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; QualByDepth&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" 
/&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FisherStrand&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SnpEff&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; HomopolymerRun&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; DepthOfCoverage&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MappingQualityZeroFraction&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; GCContent&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MappingQualityRankSumTest&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ReadPosRankSumTest&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; BaseQualityRankSumTest&lt;/span&gt;&lt;br 
style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;Available annotations for the VCF FORMAT field:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ReadDepthAndAllelicFractionBySample&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; AlleleBalanceBySample&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; DepthPerAlleleBySample&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MappingQualityZeroBySample&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;Available classes/groups of annotations:&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; RodRequiringAnnotation&lt;/span&gt;&lt;br style="font-family: 
&amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; StandardAnnotation&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; WorkInProgressAnnotation&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ExperimentalAnnotation&lt;/span&gt;&lt;br style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;" /&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; RankSumTest&lt;/span&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;No promises about how accurate this is for any other version.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-6617041519134560340?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/6617041519134560340/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/12/gatks-available-annotations.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/6617041519134560340'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/6617041519134560340'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/12/gatks-available-annotations.html' title='GATK&apos;s available 
annotations'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-5966952414591063861</id><published>2011-11-16T15:56:00.001-08:00</published><updated>2011-11-16T16:09:11.331-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bioinformatics'/><category scheme='http://www.blogger.com/atom/ns#' term='blogging'/><category scheme='http://www.blogger.com/atom/ns#' term='politics'/><title type='text'>Stop SOPA</title><content type='html'>I'm not a huge politics guy, so I don't want to go on a tirade about the Stop Online Piracy Act. Suffice it to say, it's a huge censorship bill parading as a bill to protect intellectual property. While I think the majority of us support protecting IP, I can't imagine the best way to do so is to monitor everything we do and censor websites based on some government-backed list of sensitive content.&lt;br /&gt;&lt;br /&gt;As an example, if someone were to post copyrighted material in the comments section of my blog without my notice, my blog could potentially be shut down (censored) because of it. If I were to link to a site that had, somewhere on it, shared copyrighted material (even if I had no idea it was there and didn't intend for anyone to go there and see it or download it), my blog could be shut down (censored).&lt;br /&gt;&lt;br /&gt;Moreover, the bill basically forces both providers and hosting services to strongly monitor content and shut down sites that potentially "infringe" on protected IP. Ever posted a picture of something that was copyrighted? Ever shared a link to a YouTube video with a copyrighted song in the background? 
That could be enough to get you shut down (because your host or provider doesn't want to be sued).&lt;br /&gt;&lt;br /&gt;My wife and I often talk about the situation in Japan (she's Japanese) regarding the Fukushima nuclear reactor situation and how censorship and control of the media in Japan is so strong that the general public has no idea how dire the situation is. We even see that censorship and media control bleed over to here in America, where the general public is under the impression the nuclear situation isn't as bad as it is, in large part because the government and big media have an incentive to see nuclear power as an industry succeed.&lt;br /&gt;&lt;br /&gt;Now here's the scary thing: If I were to start posting excerpts from copyrighted articles about that topic to respond to, if SOPA were to pass, my blog could potentially be shut down. I could personally be denied access. It's unclear to me exactly how much power this bill would give big media and the government. And that's the major problem.&lt;br /&gt;&lt;br /&gt;In our field of genomics, a lot of us utilize freedom of sharing information and media to rapidly advance the science. I understand that this bill is meant to limit piracy of software and other digital media, but it represents a foot-in-the-door to all sorts of censorship. Could SEQanswers, for example, be sued for having a post up that contains the Illumina adaptor sequences? It certainly has been threatened in the past based on such things, but with SOPA passed, SEQanswers very well could have been shut down for that. 
What a detriment that would have been to the genomics and bioinformatics community.&lt;br /&gt;&lt;br /&gt;Anyway, I just wanted to share this on my outlet to the world, as it is a very important issue generally and to our field in particular.&lt;br /&gt;&lt;br /&gt;If you are an American and you do not support SOPA, please &lt;a href="https://wfc2.wiredforchange.com/o/9042/p/dia/action/public/?action_KEY=8173" target="_blank"&gt;send a notice&lt;/a&gt; to your congresspeople telling them not to support it, either.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-5966952414591063861?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/5966952414591063861/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/11/stop-sopa.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/5966952414591063861'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/5966952414591063861'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/11/stop-sopa.html' title='Stop SOPA'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-5778085188113243514</id><published>2011-11-16T15:34:00.001-08:00</published><updated>2011-11-16T15:36:12.638-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Technology'/><title type='text'>Drobo</title><content type='html'>&lt;div&gt;&lt;p&gt; A secure, offline storage solution with automagical backing up. Forget the cloud for sensitive data. &lt;/p&gt;&lt;p&gt;#lovetech&lt;/p&gt;&lt;br/&gt;&lt;img src='http://lh6.ggpht.com/-rotbjmQsznk/TsRIkOhl0SI/AAAAAAAAAxA/xv5Py9_cghg/2011-11-16_14-53-27_70.png' /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-5778085188113243514?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/5778085188113243514/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/11/drobo.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/5778085188113243514'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/5778085188113243514'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/11/drobo.html' title='Drobo'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-rotbjmQsznk/TsRIkOhl0SI/AAAAAAAAAxA/xv5Py9_cghg/s72-c/2011-11-16_14-53-27_70.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-689677205109850157</id><published>2011-11-03T13:15:00.001-07:00</published><updated>2011-11-03T13:15:55.175-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='humor'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><title type='text'>There's something funny about this quality score</title><content type='html'>&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;@COLUMBO:3:1:1653:950#0/1 NGCCGCGATATCGGATCCAACAGATCGGAA&lt;wbr&gt;&lt;/wbr&gt;GAGCTC +COLUMBO:3:1:1653:950#0/1 BOOOOTTTTTYYYYY__________b____&lt;wbr&gt;&lt;/wbr&gt;[T[[[_&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-689677205109850157?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/689677205109850157/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/11/theres-something-funny-about-this.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/689677205109850157'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/689677205109850157'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/11/theres-something-funny-about-this.html' 
title='There&apos;s something funny about this quality score'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-1868719438929574518</id><published>2011-10-13T11:16:00.000-07:00</published><updated>2011-10-13T11:16:07.708-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='journal club'/><category scheme='http://www.blogger.com/atom/ns#' term='personalized genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='exome-seq'/><title type='text'>Exome Sequencing Q&amp;A</title><content type='html'>In case you missed it, the most recent issue of Genome Biology is focused on exome-sequencing. Included among a number of exome-seq platform comparison papers (like &lt;a href="http://genomebiology.com/2011/12/9/R94/abstract"&gt;this one&lt;/a&gt; and &lt;a href="http://genomebiology.com/2011/12/9/R95/abstract"&gt;this one&lt;/a&gt;, which I intend to discuss in comparison with &lt;a href="http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.1975.html"&gt;my own recent publication&lt;/a&gt; in Nature Biotech in a future post) is a fun Q&amp;amp;A article entitled "&lt;a href="http://genomebiology.com/2011/12/9/128"&gt;Exome sequencing: the expert view&lt;/a&gt;". The article asked three experts (Dr. Leslie G. Biesecker of NHGRI, Dr. Kevin V. Shianna of Duke University, and Dr. Jim C. 
Mullikin of NHGRI) a series of questions related to exome-seq ranging from what we've learned to whether exome-seq will be around in a couple years.&lt;br /&gt;&lt;br /&gt;Since I like to think of myself as a bit of an expert on the topic (mostly because I dedicated the past year of my life to it), I thought it might be fun to answer the questions myself and discuss a bit what I think of the answers in the article.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;&lt;b&gt;How can exome sequencing contribute to our understanding of the dynamic nature of the genome?&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;Exome sequencing is quite literally the best way we currently have to assess the most interpretable part of the genome: the protein-coding sequence. While it accounts for only about 1% of the whole genome, that 1% is the part we understand the best, because it's the part that we can assess through central dogma. Basically, when we see a mutation in an exon, we can then determine how the RNA will be affected, and subsequently how the protein will be mutated. We can determine precisely which amino acids will be mutated, and which amino acids they will turn into. And thanks to decades of work on proteins and amino acids, we can predict fairly reasonably how damaging a particular amino acid change will be. We can also predict nonsense mutations, which almost always result in loss of the protein as well as frameshifts (which typically lead to a nonsense mutation), splice site variations, and regulatory site mutations (assuming the UTRs/upstream and downstream regions are enriched).&lt;br /&gt;&lt;br /&gt;I put it like this to someone not long ago: What would you look at if you sequenced someone's genome? Exons. That is literally the first thing, because it's the low hanging fruit. 
If you find nothing there, you might zoom out and look at splice sites (which exome-seq generally captures), micro RNAs (which exome-seq generally captures), and regulatory elements (which are becoming recognized as increasingly important--be ready for regulatory element/transcription factor binding site enrichment kits in the near future). As for the intergenic regions, they're such a mystery with regard to function that I feel secure saying we basically don't know what to do with variants in those regions.&lt;br /&gt;&lt;br /&gt;At the very least, I have no doubt exome-sequencing will drive a revolution in Mendelian disorder research (it already is). We've already seen a number of solved Mendelian disorders published, and as more samples are sequenced, that number will only grow. Because Mendelians are most often caused by changes in the protein-coding regions of genes, exome-seq is a prime way of solving them.&amp;nbsp;The one major barrier I see for solving Mendelians is the (perhaps surprising) prevalence of dominant low-penetrance disorders. Mendelian disorders with this mode of inheritance are inherently difficult to solve, and will require a large number of samples sequenced. That said, this again primes exome-seq as a major method, and I feel secures it as a major technology for at least a few more years.&lt;br /&gt;&lt;br /&gt;In the paper, Kevin Shianna mentions some shortcomings. In particular, he mentions that structural variations (SVs) are difficult to detect by exome-seq, which is certainly true. 
I would say there is some light at the end of that tunnel, at least with regards to copy-number variations (CNVs), as &lt;a href="http://bioinformatics.oxfordjournals.org/content/early/2011/08/09/bioinformatics.btr462.abstract"&gt;some successful work&lt;/a&gt; is already showing positive results with detecting CNVs in exome-seq.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;&lt;b&gt;How much has exome sequencing been driven by cost alone?&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;Certainly I think the increasing&amp;nbsp;prevalence&amp;nbsp;of exome-seq has been almost completely driven by cost. Targeted sequencing (and subsequently, exome-seq) were developed precisely because labs wanted to assess particular regions by sequencing without paying for an entire genome sequence (most of which they wouldn't even look at the data from). That said, were the exome-seq and WGS assays the same cost, there would still be significantly higher cost both in terms of time and money associated with the bioinformatics, analysis and storage of WGS data compared with exome-seq data. Even at equivalent assay cost, exome-seq is cheaper. Will that ever change? Unlikely, I think.&lt;br /&gt;&lt;br /&gt;Consider the value of an exome versus the value of a whole genome currently. One could certainly argue that because the information in the exome is so much more meaningful (please don't write me letters for saying that), every exome base is worth significantly more than intergenic bases. In the paper, LGB and JCM make the point that an exome costs about 1/6th what a whole genome costs. That seems pretty accurate to me. And that does demonstrate how much more valuable we consider the exome compared to the other regions. 
Because of that I would not say that exome-seq has been driven by cost alone, but also because it is considered more valuable per base than whole genome.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;What are the major limitations of exome sequencing?&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;Obviously, any meaningful variation outside of the exome is missed by exome-seq, and that's a major drawback. Just because they're less interpretable does not mean that non-exonic variations are meaningless. Missing SVs is another large drawback, although I feel it's important to point out that SV detection does not require a deep whole genome sequencing experiment--a low depth WGS that is not adequate for small variant calling is more than adequate for SV calling. Therefore, exome-seq could be supplemented by low depth WGS for SV calling without going as far as doing a full WGS. I'd estimate the cost of an exome-seq plus low-depth SV-WGS to still be less than 1/4th&amp;nbsp;the&amp;nbsp;cost of a full WGS.&lt;br /&gt;&lt;br /&gt;The other major issue, which is also brought up in the paper, is the fact that exome-seq misses almost everything it doesn't target. If the mutation causing a particular disorder is in an exon that doesn't happen to be on your target list, you're going to miss it. WGS, on the other hand, would likely catch it. That's a major limitation. However, if the exon is poorly annotated and that's why it's not on the exome-seq design, there's a good chance you'll miss it with WGS as well. So while this is a major limitation, it's as much a major limitation of sequencing in general. 
Specifically, mutational analysis is heavily reliant on annotation, which is incomplete at best.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;&lt;b&gt;What lessons from exome sequencing studies can be applied to whole genome sequencing studies?&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;Most of what we've learned from exome sequencing studies is directly applicable to whole genome sequencing studies. Currently, the field is in a state where exomic variation is highly detectable and interpretable, and that's due in no small part to exome-seq. Coding region annotation in particular has become significantly more powerful as a result of all the exome-seq work that's been done. We have a much stronger appreciation for the limitations of sequencing as well. Although we are very sensitively detecting variants, even at minuscule false positive rates we have trouble detecting disease-causing variants with single samples. As far as I'm aware, not a single Mendelian disorder has been solved by sequencing of a single sample (exome or otherwise). That's because we need some way to eliminate false positives, and we use family members and additional samples for that.&lt;br /&gt;&lt;br /&gt;And that brings up the other major lesson we've learned that we'll need to apply to WGS in the future: Small sample groups are simply not adequate for most purposes. Even for monogenic conditions, numerous samples and multiple families are required. Much like GWAS and linkage studies, we're going to need to sequence a lot of people to find meaning. This has a lot to do with sensitivity and specificity. A false positive rate of 1% sounds okay in a lot of fields, but in genomics, that's 400 false positives in the exome and 30,000 in the whole genome. Good luck finding your one disease-causing mutation in there! But we're doing it through clever decision-making with regard to which family members to sequence, which individuals to sequence, et cetera. 
And that is perhaps the most useful lesson from exome-seq so far.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;How do exome sequencing studies contribute to our mechanistic understanding of disease?&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;The most obvious answer is that exome-seq is very likely to be how we find the genetic etiology of the vast majority of Mendelian disorders. How this will translate to complex disease has yet to be seen, though it's certainly being pursued, particularly in the field of autism.&lt;br /&gt;&lt;br /&gt;I would like to respond to a statement made by Kevin Shianna in the paper:&lt;br /&gt;&lt;blockquote&gt;&lt;i&gt;However, exome studies will have very limited power to identify causative variants in regulatory regions spread across the genome (transcription binding sites, enhancers, and so on). Implementing a WGS approach would allow detection of variants in these regions, thus increasing our knowledge of disease beyond the coding region of the genome&lt;/i&gt;.&lt;/blockquote&gt;While it is certainly true that exome-seq may be unable to identify mutations in the regulome (yes, &lt;i&gt;regulome&lt;/i&gt;), WGS is not the only alternative to it. Why not instead supplement the exome-seq with a regulome-seq? I guarantee you companies are going to create such a thing, and even if you don't want to wait for them to be commercially available, one could create a custom target enrichment that covers the regulome as well.&lt;br /&gt;&lt;br /&gt;Back to the cost issue, at about 1/6th the cost of WGS each, exome-seq + regulome-seq would still only come to about 1/3rd the cost of WGS. Outside the exome and regulome, is there much more in the genome that we can even comprehend by sequencing at this moment? Structural variation, perhaps, but as I've already mentioned, SVs can be detected with a low depth WGS. So for half the cost, we may be able to obtain the same biological meaning. 
That's something to seriously consider when deciding how to budget your sequencing. If you can squeeze out double the samples and obtain practically the same info by doing exome-seq + regulome-seq + low depth WGS, wouldn't you?&lt;br /&gt;&lt;br /&gt;I would also add that while it may not be mechanistic, the rise of exome-seq has led to a realization that we are not adequately prepared from a policy standpoint yet. More than anything, I think it has led to a general realization that medical genetics, genetic counselors and DTC genomics are arriving before society is really ready for them. And that's an issue that I think is going to need increased attention as we researchers charge forward with sequencing &lt;i&gt;everybody&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Does exome sequencing have a limited 'shelf life'?&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;Not as limited as many people think. Even in the $1000 genome world, exome-seq will still have a place. I'll discuss this a bit more shortly, but there are genomic elements that are more sensitively determined by exome-seq. I feel strongly that once sequencing is sufficiently cheap, exome-seq will become the &lt;i&gt;de facto&lt;/i&gt; standard for genomic information much as microarrays were for a good decade. This is due to a combination of factors--the majority of the genome being uninterpretable, the storage cost of genomes versus exomes, the bioinformatic challenges of whole genome versus whole exome, et cetera. The article goes into good depth on many of those issues. 
The one I'd focus on, however, is that until WGS is actually cheaper than exome-seq, exome-seq will always have a place in our field.&lt;br /&gt;&lt;br /&gt;I have to disagree strongly with the following statement:&lt;br /&gt;&lt;blockquote&gt;&lt;i&gt;Yes, as soon as the difference in cost between exome and whole genome diminishes (which will be soon) and issues with data management and storage are resolved, whole genome sequencing will be the method of choice.&lt;/i&gt;&lt;/blockquote&gt;First of all, he knows as well as the rest of us that "issues with data management and storage" are not trivial to resolve. But beyond that, this is the stance that says, basically, "what's a thousand dollars for a whole genome compared to $300 for just the exome"? I state these numbers because that's where we're heading. At some point exome-seq will stop getting cheaper because the enrichment assay will always cost something. Same for WGS. And my answer to that question is simple: You'll still be able to get three exomes for the price of one whole genome.&lt;br /&gt;&lt;br /&gt;The real thing that will limit exome-seq's lifespan is the interpretability of the rest of the genome. If we can utilize the intergenic data to a greater degree, then those bases become more valuable. Then WGS becomes more appealing. Until then, it's all exome (and regulome--I swear, it is coming soon!).&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;&lt;b&gt;How much do you think that future research will be restricted by the IT-related costs of the analysis?&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;A great deal. Clouds are not cheap. I'm not convinced they're even the answer at this point. But are clusters? I'm not so sure there, either. 
I'm not convinced we have the IT solution to this yet because, frankly, I don't think the hardware manufacturers have realized what a lucrative market it's going to be.&lt;br /&gt;&lt;br /&gt;I had a conversation about this with somebody not long ago. I said, we've gone from producing gigabytes to terabytes to petabytes over the span of four years in the sequencing world. By now I think there's a good chance we're at or nearing a worldwide exabyte of genomic information. As we start sequencing more and more people, we're going to start hitting the limit for storage capacity worldwide. Sounds crazy, right? But is it?&lt;br /&gt;&lt;br /&gt;I also had another rather humorous thought that evolved from that one. How much energy will the number of hard drives needed to store the entire human population's genomic data require? How much heat will those drives produce? In the future, could sequencing be a major cause of global warming? Think about it!&lt;br /&gt;&lt;br /&gt;Frankly, I've started to take the issue seriously. One of the tasks on my plate is a thorough assessment of alternative storage formats for genomic data. But that's a post for another day.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;&lt;b&gt;Are there any advantages of whole exome over whole genome sequencing?&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;Yes. Let me say that in less uncertain terms:&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;&lt;b&gt;YES!&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;And by that I don't mean what Leslie Biesecker and Jim Mullikin said in their response in the paper. There are literally exonic regions that are better resolved by exome-seq than WGS. We demonstrate that in our recent Nature Biotech paper. A typical WGS will miss a small but meaningful number of exonic variations that are detected by exome-seq. To be fair, the opposite is also true: WGS will detect some exonic variations missed by exome-seq. 
To me, that says one thing: To be truly comprehensive at this point in time, we need to do both. Naturally, budgets prevent such a thing, but it's important to recognize that targeted enrichment can allow sequencing of regions missed by WGS, and that this is an advantage of exome-seq.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;b&gt;Reference:&lt;/b&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="background-color: white; font-family: arial, helvetica, sans-serif; line-height: 18px;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;div class="cit" style="line-height: 1.45em;"&gt;Biesecker LG, Shianna KV, Mullikin JC. &lt;a href="http://genomebiology.com/content/12/9/128"&gt;Exome sequencing: the expert view&lt;/a&gt;. 2011. Genome Biol. Sep 14; 12(9):128 [Epub]&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-1868719438929574518?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/1868719438929574518/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/10/exome-sequencing-q.html#comment-form' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/1868719438929574518'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/1868719438929574518'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/10/exome-sequencing-q.html' title='Exome Sequencing Q&amp;A'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>8</thr:total><georss:featurename>Palo Alto, CA</georss:featurename><georss:point>37.4384069 -122.1802205</georss:point><georss:box>12.061150899999998 -162.60990800000002 62.81566289999999 -81.750533</georss:box></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-1037118517773479755</id><published>2011-10-04T17:06:00.000-07:00</published><updated>2011-10-04T17:12:09.677-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='23andMe'/><category scheme='http://www.blogger.com/atom/ns#' term='blogging'/><category scheme='http://www.blogger.com/atom/ns#' term='personalized genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><title type='text'>And he's back with a site redesign!</title><content type='html'>You may have been wondering: Where have I been?&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Choose from the following:&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;form&gt;&lt;input name="group1" type="checkbox" value="wedding" /&gt; Getting married.&lt;br /&gt;&lt;input name="group1" type="checkbox" value="driving" /&gt; Driving my wife's family from Japan around the Bay Area.&lt;br /&gt;&lt;input name="group1" type="checkbox" value="honeymoon" /&gt;  On my honeymoon.&lt;br /&gt;&lt;input name="group1" type="checkbox" value="paper" /&gt; Finishing up a huge paper.&lt;br /&gt;&lt;input name="group1" type="checkbox" value="paper" /&gt; Sick with meningitis.&lt;br /&gt;&lt;input name="group1" type="checkbox" value="paper" /&gt; Adopting a cat.&lt;/form&gt;&lt;/div&gt;&lt;br /&gt;&lt;i&gt;Answer:&lt;/i&gt;&lt;br /&gt;If you checked any of the above boxes, you're &lt;i&gt;correct&lt;/i&gt;! 
Congratulations.&lt;br /&gt;&lt;br /&gt;Yes, it's been a very busy time in my life, but I am finally back and promise to update more often than ever before.&lt;br /&gt;&lt;br /&gt;Also, I hope readers like the site redesign. This is a new offering from Blogger: a dynamic theme that allows you to choose how you'll view it from the dropdown menu on the left. I'm a big fan of the "Magazine" look, so that's what I've set as the default.&lt;br /&gt;&lt;br /&gt;Also, here's a cool site: [&lt;a href="http://opensnp.org/"&gt;http://opensnp.org/&lt;/a&gt;] Basically, they let you sign up and will host your DTC SNP chip data with them free (with all the consequences* that come with that).&lt;br /&gt;&lt;br /&gt;Finally, &lt;a href="https://www.23andme.com/exome/"&gt;this offer&lt;/a&gt; from 23andMe is interesting. $999 for an 80x human exome. That's raw data only, folks. No analysis. I've got a long post about this whole thing in mind that I'll put up in the coming days. Still, it's a great opening salvo in the DTC/PGM era. Exome-seq DTC is truly here. Finally.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;*Consequences unspecified. You should be protected by GINA regardless and, honestly, if someone wanted to know your genotypes so bad they could just pick up that coffee cup you threw away last week and do it themselves. 
But whatever, you still better ask dad before you post half of his DNA on the web.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-1037118517773479755?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/1037118517773479755/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/10/and-hes-back-with-site-redesign.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/1037118517773479755'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/1037118517773479755'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/10/and-hes-back-with-site-redesign.html' title='And he&apos;s back with a site redesign!'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-4652834828202230493</id><published>2011-08-02T12:04:00.000-07:00</published><updated>2011-08-02T12:04:37.592-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='23andMe'/><category scheme='http://www.blogger.com/atom/ns#' term='personalized genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><title type='text'>$50 off 23andMe Coupon Code</title><content type='html'>Hello friends!&lt;br /&gt;&lt;br /&gt;23andMe just 
sent me an email with a coupon code in it for $50 off! Apparently it's share-able, so if you've been waiting for a genomic deal, here you go!&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;table border="0" cellpadding="5" cellspacing="0" style="display: table; margin-bottom: 10px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="left" colspan="1" rowspan="1" style="color: black; font-family: Arial, Helvetica, sans-serif; font-size: 11pt; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: left;"&gt;&lt;div style="font-size: 11pt; margin-bottom: 0px; margin-top: 0px;"&gt;To use this coupon, visit our&amp;nbsp;&lt;a href="http://r20.rs6.net/tn.jsp?llr=s4lvzjcab&amp;amp;et=1106890110824&amp;amp;s=141941&amp;amp;e=001uJFOj8mhmDdIjZJC6z0I-Jxqiya0XI0QEYGGScUmLu_j_TZQo2R-QcEG5K6VyguSzdgLQFcxzbLPM777cqfsyvzzDZUXS0yx19Z3TyVndzIYfaSU1fDacYV2rddF2cT4" shape="rect" style="color: blue; text-decoration: underline;" target="_blank"&gt;online store&lt;/a&gt;&amp;nbsp;and add an order to your cart. 
Click "I have a discount code" and enter the code below.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;a href="" name="13189fa044ceab11_LETTER.BLOCK3" style="color: #5c4520;"&gt;&lt;/a&gt;&lt;table border="0" cellpadding="5" cellspacing="0" style="background-color: #da6497; margin-bottom: 10px; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td colspan="1" rowspan="1" style="border-bottom-color: rgb(255, 255, 255); border-bottom-style: dashed; border-bottom-width: 2px; border-left-color: rgb(255, 255, 255); border-left-style: dashed; border-left-width: 2px; border-right-color: rgb(255, 255, 255); border-right-style: none; border-right-width: 2px; border-top-color: rgb(255, 255, 255); border-top-style: dashed; border-top-width: 2px; font-family: Arial, Helvetica, sans-serif; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 3px; padding-left: 3px; padding-right: 3px; padding-top: 3px;"&gt;&lt;div align="center" style="color: white; font-size: 48pt; margin-bottom: 0px; margin-top: 0px; text-align: center;"&gt;&lt;strong&gt;$50 Off&lt;/strong&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan="1" rowspan="1" style="border-bottom-color: rgb(255, 255, 255); border-bottom-style: dashed; border-bottom-width: 2px; border-left-color: rgb(255, 255, 255); border-left-style: none; border-left-width: 2px; border-right-color: rgb(255, 255, 255); border-right-style: dashed; border-right-width: 2px; border-top-color: rgb(255, 255, 255); border-top-style: dashed; border-top-width: 2px; color: white; font-family: Arial, Helvetica, sans-serif; font-size: 18pt; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 3px; padding-left: 3px; padding-right: 3px; padding-top: 3px;"&gt;&lt;div style="margin-bottom: 0px; margin-top: 0px;"&gt;&lt;strong&gt;&lt;span style="font-size: 12pt;"&gt;Coupon 
code:&lt;/span&gt;&amp;nbsp;4DPGQP&lt;/strong&gt;&lt;/div&gt;&lt;div style="font-size: 12pt; margin-bottom: 0px; margin-top: 0px;"&gt;&lt;strong&gt;Share with your friends!&lt;/strong&gt;&lt;/div&gt;&lt;div style="font-size: 12pt; margin-bottom: 0px; margin-top: 0px;"&gt;(Valid for new customers only)&amp;nbsp;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px;"&gt;If you do use it, please share your data with me!&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-4652834828202230493?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/4652834828202230493/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/08/50-off-23andme-coupon-code.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/4652834828202230493'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/4652834828202230493'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/08/50-off-23andme-coupon-code.html' title='$50 off 23andMe Coupon Code'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-7874137633231402254</id><published>2011-07-28T20:29:00.000-07:00</published><updated>2011-07-28T20:29:21.309-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='vcftools'/><category scheme='http://www.blogger.com/atom/ns#' term='bioinformatics'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><title type='text'>Intersecting Indels with VCFtools</title><content type='html'>Indel detection is not what I'd call accurate at this point in our history. I, along with probably every other bioinformatician and genomicist looking at next-gen data, have noticed that immediately adjacent indels get called as separate events all the time, even though they are really the same variant called differently due to sequence context and the nature of our variant callers.&lt;br /&gt;&lt;br /&gt;A band-aid approach is to simply look for overlap in indel calls within a window. Even a tiny window can make a big difference to small indels.&lt;br /&gt;&lt;br /&gt;To do this, I currently use &lt;a href="http://vcftools.sourceforge.net/"&gt;VCFtools&lt;/a&gt;, which makes it very simple. Specifically, use the vcf-isec command with the -w parameter. 
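&lt;br /&gt;&lt;br /&gt;To make the windowing idea concrete, here is a minimal Python sketch of position-only matching between two call sets. It is an illustration under stated assumptions and not how vcf-isec is implemented: the function name, the (chromosome, position) tuples and the toy coordinates are all hypothetical, and alleles are ignored entirely.

```python
from bisect import bisect_left
from collections import defaultdict


def count_windowed_matches(calls_a, calls_b, window=0):
    """Count calls in calls_a that have a call in calls_b on the same
    chromosome whose position differs by at most `window` bases."""
    # Index the second call set: chromosome -> sorted list of positions.
    positions_b = defaultdict(list)
    for chrom, pos in calls_b:
        positions_b[chrom].append(pos)
    for positions in positions_b.values():
        positions.sort()

    matched = 0
    for chrom, pos in calls_a:
        positions = positions_b.get(chrom, [])
        # Index of the first position in B at or after the window start.
        i = bisect_left(positions, pos - window)
        # It is a match if that position does not lie past the window end.
        if i != len(positions) and pos + window >= positions[i]:
            matched += 1
    return matched


# Hypothetical toy data: one indel is reported a base apart in the
# two libraries, one is reported identically, one has no partner.
library1 = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
library2 = [("chr1", 1001), ("chr1", 5000), ("chr2", 900)]

exact = count_windowed_matches(library1, library2, window=0)
within_5 = count_windowed_matches(library1, library2, window=5)
print(exact, within_5)
```

With a window of zero only the identically placed call is shared; allowing five bases of slack also picks up the call that shifted by one base.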
&lt;br /&gt;&lt;br /&gt;If I compare two libraries sequenced from the same individual that had indels called independently (using the same method), I end up with a few thousand overlapping indels that would have been assessed as independent from one another if I looked for exact overlap.&lt;br /&gt;&lt;br /&gt;Exact overlap:&lt;br /&gt;&lt;blockquote&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;vcf-isec -f -n =2 -o indels1.vcf.gz indels2.vcf.gz | wc -l&lt;/div&gt;&lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;468136&lt;/span&gt;&lt;/blockquote&gt;Overlap +/- 5b:&lt;br /&gt;&lt;blockquote&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;vcf-isec -f -n =2 -o -w 5 indels1.vcf.gz indels2.vcf.gz | wc -l&lt;/div&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;471047&lt;/div&gt;&lt;/blockquote&gt;I realize it's not that astounding a difference, but keep in mind this is looking at two different libraries from the same individual. 
If you're comparing calls from two completely different sequencing platforms or variant callers, these numbers jump quite a bit.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-7874137633231402254?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/7874137633231402254/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/07/intersecting-indels-with-vcftools.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/7874137633231402254'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/7874137633231402254'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/07/intersecting-indels-with-vcftools.html' title='Intersecting Indels with VCFtools'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-7804903713304519860</id><published>2011-07-26T19:48:00.000-07:00</published><updated>2011-07-26T19:48:14.366-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ion torrent'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><title type='text'>The Ion Torrent Paper (Nature)</title><content type='html'>&lt;span class="Apple-style-span" style="color: #333333; font-family: arial, helvetica, clean, 
sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;h1 class="article-heading" style="color: #222222; font-size: 26px; font-weight: normal; letter-spacing: -0.5px; line-height: 1.173; margin-bottom: 20px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;a href="http://www.nature.com/nature/journal/v475/n7356/full/nature10242.html"&gt;An integrated semiconductor device enabling non-optical genome sequencing&lt;/a&gt;&lt;/h1&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;I'm just going to discuss my thoughts and comments on the paper, their findings, and how they relate to claimed specs for the IonTorrent PGM.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;And just for some comparisons later on, the current Ion Torrent PGM product sheet is also attached [&lt;a href="http://bit.ly/ouykT8"&gt;click&lt;/a&gt;].&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;span class="Apple-style-span" style="color: #333333; font-family: inherit;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;I'm going to try to step around some of the issues with the paper that have been well covered at&amp;nbsp;&lt;a href="http://www.wired.com/wiredscience/2011/07/how-accurate-is-the-new-ion-torrent-genome-really/"&gt;Daniel MacArthur's blog Genetic Future&lt;/a&gt;. 
I think he is pretty fair to the paper in criticizing its "validation rate" and so forth.&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="color: #333333; line-height: 19px;"&gt;Suffice it to say, a year and a half to two years ago, perhaps a 15x human genome would have been considered adequate, but in a paper coming out of LifeTech, manufacturer of the SOLiD sequencer, if you're going to use the whole genome sequence off the SOLiD as validation, let's go for at least 30x coverage.&lt;/span&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;span class="Apple-style-span" style="color: #333333; font-family: inherit;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;span class="Apple-style-span" style="color: #333333; font-family: inherit; font-size: large;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;b&gt;Comments&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;span class="Apple-style-span" style="color: #333333; line-height: 19px;"&gt;"..there is a desire to continue to drop the cost of sequencing at an exponential rate consistent with the semiconductor industry's Moore's Law..."&lt;/span&gt;&lt;/span&gt;&lt;/blockquote&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: #333333; font-family: inherit; line-height: 19px;"&gt;They bring up Moore's law repeatedly, and they sequenced Moore himself in the paper. 
But wait a second... &lt;a href="http://www.genome.gov/sequencingcosts/"&gt;sequencing costs are dropping significantly faster than Moore's law&lt;/a&gt;! I suppose it's a minor complaint, but let's give sequencing the credit it's due--per-base cost of sequencing is dropping much faster than Moore's law!&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333; font-family: inherit; line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;Also, let me complain very briefly about the use of a few buzz terms:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;"To overcome these limitations and further &lt;i&gt;democratize&lt;/i&gt; the practice of sequencing, a &lt;i&gt;paradigm shift&lt;/i&gt; based on non-optical sequencing on newly developed integrated circuits was pursued."&lt;/span&gt;&lt;/span&gt;&lt;/blockquote&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;If there is any term that is going to supplant "paradigm shift" as the default excessively pretentious term in scientific papers, it has to be "democratize". Look, unless this device is offering $100 genomes, it's not democratizing sequencing. Can we leave sensationalist buzz words for the advertisements and stick to reality for the Nature papers? 
(Wait, what am I saying?)&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="color: #333333; line-height: 19px;"&gt;I appreciate that we're talking about a non-light system here, but the observation of protons rather than photons released upon base incorporation isn't really a &lt;/span&gt;&lt;span class="Apple-style-span" style="color: #333333; line-height: 19px;"&gt;&lt;i&gt;paradigm shift&lt;/i&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="color: #333333; line-height: 19px;"&gt;. Once we're taking pictures of DNA with electron microscopes and reading the entire genome instantaneously in one shot, then we can start talking about &lt;/span&gt;&lt;span class="Apple-style-span" style="color: #333333; line-height: 19px;"&gt;&lt;i&gt;paradigm shifts&lt;/i&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="color: #333333; line-height: 19px;"&gt;.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333; font-size: large;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;b&gt;Scalability&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;Okay, let's look at the scalability based on their data, and what's being touted in their product sheet.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span 
class="Apple-style-span" style="line-height: 19px;"&gt;A typical 2-h run using an ion chip with 1.2M sensors generates approximately 25 million bases.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;That pretty much throws out the Ion 314 (1.3M wells) for human genome sequencing. A 30x diploid human genome would require 640 days on the 1.2M sensor chip in the paper. Even just 1x coverage would take three weeks. Yikes.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;Later there is a rather astounding statement:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;"At present, 20-40% of the sensors in a given run yield mappable reads."&lt;/span&gt;&lt;/span&gt;&lt;/blockquote&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;Room for improvement there, methinks.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;In Table 1, they test the 11M chip with E. coli and Human. The E. 
coli yields 273.9Mb of sequence off that chip. At about 20-40% of sensors yielding mappable reads, that gives an average read length of 62b-125b. This is consistent with their finding that 2.6M reads are &amp;gt;=21b and that 1.8M are &amp;gt;= 100b. Also with Figure S15, where it appears the majority of read lengths are around 110b-120b. So at least the read lengths are not disappointing.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;Their Ion 318 is the 12 million wells chip. I think this is similar to their 11M chip in the paper. Going back to Table 1, they got 273.9Mb off the 11M chip. At issue is the promised "[starting] 1 Gb of high-quality sequence" off the Ion 318 chip in the Ion Torrent product sheet. Now, I completely believe that advancements have been made since the paper's acceptance on May 26th, 2011, but four times the yield? Not so sure about that claim. I'm not doubting it can get there, but I'll put it this way: This is a paper from the company that makes the product--if anyone can make it work optimally, it should be them. And their optimal report here has it at significantly lower than what they're advertising. Oh, and the specs sheet has small print next to the Ion 318 chip entry that says "the content provided herein [...] is subject to change without notice". 
Let's just say I'm skeptical.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;Anyway, the rest of the paper is pretty vague. One issue on everyone's mind is the homopolymer issue, which is addressed in a single sentence stating the accuracy of 5-base homopolymers (97.328%--not terrible, but not overly good either) and that it's "better than pyrosequencing-based methods" (read: 454). &amp;nbsp;How about longer ones? No idea. Figure S16b also only goes out to 5b, and its curve isn't looking too encouraging.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;Apparently the Ion 316, their 6.3M-well chip, is currently available, and they claim at least 100Mb of sequence per run. This is consistent with their mapped bases in the paper (169.6Mb off a 6.1M ion chip). &amp;nbsp;With this chip, you're talking about 3 days on one machine for 1x diploid human coverage, and about 90 days for 30x coverage. Better, but still not there when it comes to human sequencing. &amp;nbsp;Still, it's at the level of completing entire &lt;i&gt;bacterial&lt;/i&gt;&amp;nbsp;genomes in 2 hours. 
If you're into that sort of thing (and don't have access to a HiSeq or something else...).&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333; font-size: large;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;b&gt;Not Quite "Post-Light"&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;You know, there's a lot of "post-light" jibber jabber in the paper. It's Ion Torrent's favorite buzz phrase, and I'm a fan, actually (much more so than I am of "democratize" and "paradigm shift"). But at this point, with this performance, I'm not sure we're "post-light" yet. The technology is there, but it isn't scaled up enough yet. There's an interesting claim made at the end of the paper:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;"The G. Moore genome sequence required on the order of a thousand individual ion chips comprising about one billion sensors. ...our work suggests that readily available CMOS nodes should enable the production of one-billion-sensor ion chips and low-cost routine human genome sequencing."&lt;/span&gt;&lt;/span&gt;&lt;/blockquote&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;Doubtless this is long in the works already, and I hope it is a reality. 
Because making the leap from the things in this paper to a functional 1B sensor chip would make a huge difference.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;I'd say I was a bit disappointed with this paper. It felt half done. I'm confused about the way the comparison to SOLiD was done--why wasn't the SOLiD WGS of G. Moore done to an adequate depth? I'm a bit annoyed at the lack of comprehensive information, as well. The homopolymer issue is known--why hide behind homopolymers of 5b or smaller? Just give the whole story in your paper--it's an article in Nature, not an advertisement.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;Anyway, to quote a very smart man I know, "it is what it is." Ion Torrent is here to stay and it's only going to improve. 
I certainly hope it does--I'd love to see it pumping out 1Gb/2hrs with long read lengths.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-7804903713304519860?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/7804903713304519860/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/07/ion-torrent-paper-nature.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/7804903713304519860'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/7804903713304519860'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/07/ion-torrent-paper-nature.html' title='The Ion Torrent Paper (Nature)'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-8488853274269451560</id><published>2011-07-20T18:40:00.000-07:00</published><updated>2011-07-20T18:40:49.898-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='news'/><category scheme='http://www.blogger.com/atom/ns#' term='blogging'/><title type='text'>Mendel Google Doodle</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a 
href="http://4.bp.blogspot.com/-el6PbMFaSqg/TieDGMBSWsI/AAAAAAAAAq4/1fKhNZU44zs/s1600/6a00d8341c630a53ef014e89ffe433970d-600wi.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="123" src="http://4.bp.blogspot.com/-el6PbMFaSqg/TieDGMBSWsI/AAAAAAAAAq4/1fKhNZU44zs/s320/6a00d8341c630a53ef014e89ffe433970d-600wi.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;a href="http://www.google.com/"&gt;Google&lt;/a&gt;'s current doodle is a very neat nod to the father of genetics, Gregor Mendel! They're celebrating his 189th birthday with a representation of his original experiment--crossing green and yellow pea plants and tracing the color trait.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-8488853274269451560?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/8488853274269451560/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/07/mendel-google-doodle.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/8488853274269451560'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/8488853274269451560'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/07/mendel-google-doodle.html' title='Mendel Google Doodle'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-el6PbMFaSqg/TieDGMBSWsI/AAAAAAAAAq4/1fKhNZU44zs/s72-c/6a00d8341c630a53ef014e89ffe433970d-600wi.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-3400041450602021156</id><published>2011-07-04T00:11:00.000-07:00</published><updated>2011-08-29T13:58:18.568-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bedtools'/><category scheme='http://www.blogger.com/atom/ns#' term='bioinformatics'/><category scheme='http://www.blogger.com/atom/ns#' term='coverage'/><category scheme='http://www.blogger.com/atom/ns#' term='read depth'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><title type='text'>Accurate genome-wide read depth calculation (how-to)</title><content type='html'>I'm currently working ferociously on revisions to a paper and need to calculate mean genome-wide read depth as a fine point in the paper. My first inclination was genomeCoverageBed from BEDtools. Trying it out on chromosome 22 first, I noted a huge number (&amp;gt;30%) of the bases had 0 coverage. Of course, this must be because genomeCoverageBed is including the massive centromere (chr22 is acrocentric--the entire p-arm is unmappable heterochromatin). I kind of already knew genomeCoverageBed wasn't meant for this purpose anyway, but I was hoping to stumble upon something.&lt;br /&gt;&lt;br /&gt;I decided to Google "genomeCoverageBed centromere" and "genomeCoverageBed not include centromere" and came up with bunk (well, not quite bunk, I came across &lt;a href="http://kevin-gattaca.blogspot.com/2011/07/genomecoveragebed-to-look-at-coverage.html"&gt;a blog post about genomeCoverageBed&lt;/a&gt; that happens to have been posted tomorrow... yes, tomorrow!). 
&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;As I explained in a comment there, I find genomeCoverageBed's approach lacking in meaning. Is it fair to include unmappable regions in a calculation of coverage? That's what led me to asking myself if "genome-wide coverage" really has a meaningful use as a statistic. We all know you're going to have a ton of bases with 0 coverage in the centromere because they're unmappable, but that says nothing about how your sequencing performed or how your aligner worked. All it says is that those bases are missing from the reference assembly. And that confounds any other meaning you might get from the number, really.&lt;br /&gt;&lt;br /&gt;Not to say that genomeCoverageBed is useless. To the contrary, it does exactly what it's supposed to: generates a histogram of genome-wide coverage. But I do not think it's overly useful beyond that histogram. A lot of people like to see the "mean coverage" or "mean read depth" statistic when you talk about a project, and you'd certainly be selling yourself short if you're generating that number with genomeCoverageBed.&lt;br /&gt;&lt;br /&gt;I think the author of BEDtools (Hi Aaron if you ever read this!) would be the first to say, "do not use genomeCoverageBed for calculating mean read depth". But BEDtools can, fortunately, give us a very strong way of doing just that through intersectBed and coverageBed.&lt;br /&gt;&lt;br /&gt;My solution is quite simple. I take the "gaps" track from UCSC (Tables-&amp;gt;All Tracks-&amp;gt;Gap) and create a BED file of all regions that are NOT in gaps. 
You can easily generate this file by obtaining the gap track from UCSC as a BED file and then using BEDtools subtractBed to subtract those regions from a whole genome BED file.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: 'Courier New', Courier, monospace;"&gt;$ subtractBed -a hg19.bed -b ~/resources/hg19_gaps.bed &amp;gt; hg19.gapless.bed &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Then, take your gap-less whole genome, run intersectBed with your input BAM file and pipe it to coverageBed, again using the gap-less whole genome BED file as your -b. Make sure to use the -hist option in coverageBed.&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;$ intersectBed -abam in.bam -b hg19.gapless.bed | coverageBed -abam stdin -b hg19.gapless.bed -hist &amp;gt; coverageHist.bed&lt;/div&gt;&lt;br /&gt;If you want to save yourself some trouble, you can pipe that to grep and grep out "all"--that's your&amp;nbsp; histogram of genomic coverage across the gap-less genome. Actually, you can calculate mean read depth across this gapless genome in one line without writing any intermediate files with some awk magic:&lt;br /&gt;&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;$ intersectBed -abam in.bam -b hg19.gapless.bed | coverageBed -abam stdin -b hg19.gapless.bed -hist | grep all | awk '{NUM+=$2*$3; DEN+=$3} END {print NUM/DEN}'&lt;/div&gt;&lt;br /&gt;Not everything outside the gaps is mappable, but at least those bases are present in the reference genome. &lt;br /&gt;&lt;br /&gt;Here's an example from a real WGS experiment restricted to chr22:&lt;br /&gt;&lt;br /&gt;genomeCoverageBed mean read depth: 0.68&lt;br /&gt;gap-less genome coverageBed mean read depth: 30.637&lt;br /&gt;&lt;br /&gt;And yes, calculated by other means, average read depth was right around 30x. For the sake of accuracy, I call it "mean read depth of the reference genome assembly". 
Doesn't roll off the tongue, but that's what it is, and it means a lot more than "genomic read depth" or "genomic coverage", in my opinion.&lt;br /&gt;&lt;br /&gt;(All of that said, genomeCoverageBed will run a lot faster than the way above, but the info it reports is different and really serves a different purpose.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-3400041450602021156?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/3400041450602021156/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/07/im-currently-working-ferociously-on.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/3400041450602021156'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/3400041450602021156'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/07/im-currently-working-ferociously-on.html' title='Accurate genome-wide read depth calculation (how-to)'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-4625594548679823094</id><published>2011-07-01T13:16:00.000-07:00</published><updated>2011-07-01T13:16:11.446-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bioinformatics'/><category 
scheme='http://www.blogger.com/atom/ns#' term='blogging'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><title type='text'>Gists</title><content type='html'>I've been super busy writing a paper lately, so I apologize for the lack of updates. I do intend to comment on the recent 23andMe paper soon™. &lt;br /&gt;&lt;br /&gt;A new feature on the blog is &lt;a href="http://github.com/mjclark.atom"&gt;my Gist feed&lt;/a&gt; and a link to &lt;a href="http://gist.github.com/mjclark"&gt;my Gist page&lt;/a&gt;. Gist is basically a quick-and-dirty code-sharing site from Github. Since I'm a quick-and-dirty bioinformatics programmer, I'll do my best to keep random code snippets that might have general application for others up there and publicly available. Also, please feel free to fork my code, fix it, et cetera.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://gist.github.com/1057839"&gt;My first Gist&lt;/a&gt; is an interesting one: A shell script for calculating mean heterozygous allele balance from a VCF4 file (basically GATK output with the "AB" INFO field). 
Very simplistic, but fast and easy to use for people wondering what the overall reference bias is in their variant calls.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-4625594548679823094?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/4625594548679823094/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/07/gists.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/4625594548679823094'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/4625594548679823094'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/07/gists.html' title='Gists'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-6048088325244730279</id><published>2011-06-21T20:19:00.000-07:00</published><updated>2011-06-21T20:19:43.951-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='news'/><category scheme='http://www.blogger.com/atom/ns#' term='complete genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><title type='text'>RealTimeGenomics Goes Free, Provides Alternative for CG Users</title><content type='html'>I've mentioned &lt;a 
href="http://www.realtimegenomics.com/"&gt;RealTimeGenomics&lt;/a&gt; (RTG) in the past, and Joke Reumers mentioned their software recently in &lt;a href="http://mendeliandisorder.blogspot.com/2011/06/joke-reumers-talk.html"&gt;her talk&lt;/a&gt; at the Complete Genomics user group meeting. Today, RTG announced that they were going free to individual researchers with their RTG Investigator 2.2 package. I kind of knew this was going to happen after some chats with them about the pricing models they were considering and the hard sell it would be to academia.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;The Hard Sell&lt;/span&gt;&lt;br /&gt;I think that it's a hard sell to get academic researchers to pay for something they can get for free. I've been in a lab that preferred to make its own Taq polymerase rather than pay for a commercial enzyme even when that meant using it at a 1:1 ratio with the rest of the PCR reaction (and no, I didn't stay in that lab for too long, but the point stands). &lt;br /&gt;&lt;br /&gt;In a world where BWA, TopHat, Samtools, SoapSNP, and GATK are free and fairly well documented, &lt;i&gt;selling&lt;/i&gt; an aligner and variant caller is going to be difficult unless it does something particularly special. Plus, a major focus of RTG's strategy is providing an alternative to Complete Genomics' own analysis that comes "free" when you buy a whole genome from them. Very hard sell.&lt;br /&gt;&lt;br /&gt;So, again, unless your software does something particularly special, like being more sensitive and/or specific, like being faster, like being significantly easier to use, like including a bunch of bells and whistles in the form of visualization tools or fancy reports, you're going to have trouble selling your product. &lt;br /&gt;&lt;br /&gt;But how, as a company, do you prove that your software has something like this to offer? 
Traditionally trial licences have been the way, but that's with software that doesn't have a strong free alternative. A company lets you try the software and see if you like it, then you buy it if you do. But most sequencing labs have their pipelines done already. And comparing and contrasting two software packages isn't really worth the time unless the claims have been substantiated by other groups.&lt;br /&gt;&lt;br /&gt;That's where this foot-in-the-door approach comes in. Basically, you give the software away to academia and let them do your leg work for you. If your software offers something special and academia can prove it, you'll start to be able to sell to the bigger corporate entities and sequencing cores. So the solution to the hard sell is to not sell at all! Brilliant!&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;So, how is the software?&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;I tested RTG's software on Illumina data because at the time that's all I was using. My findings were that it was easy to use (in a native, parallel environment), ran fast, mapped about 80% of reads (similar to Novoalign/BWA), and found a similar number of variants to GATK. Basically, it worked and seemed to work pretty well. I admit I have yet to go too in depth on comparing the findings. However, when I have some time, I intend to do a more comprehensive assessment of its performance.&lt;br /&gt;&lt;br /&gt;I also ran it in a mode that combined the Complete Genomics and Illumina data I had from the same patient. I found this to be a pretty cool option that I enjoyed using.&lt;br /&gt;&lt;br /&gt;Really, if you're dealing with Complete Genomics data, this is your only option (as far as I know, let me know if this isn't true) for an alternative aligner and variant caller to theirs. You could also align using the RTG mapper and then try variant calling with YFA (your favorite algorithm). 
&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;What's unique?&lt;br /&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;They have a cool program called "mapx" that does a translated nucleotide alignment against protein databases. You can then take that and use their "similarity" tool to basically create phylogenetic clusters based on your reads alone. Very cool for metagenomics. I'm planning to try this out with a whole genome sample I have derived from saliva. &lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt; Why does this matter? &lt;/span&gt;&lt;/b&gt;&lt;br /&gt;Well, frankly, there's the chance they just may be on to something. They make a lot of claims about their sensitivity and specificity. They have some killer ROC curves. They have that cool metagenomics tool that I honestly haven't heard about from anywhere else.&lt;br /&gt;&lt;br /&gt;And now there's no fear of losing access to it when the trial licence expires.&lt;br /&gt;&lt;br /&gt;I fully admit it: I wanted this to be free. Because I am one of those people who likes trying new programs and seeing if I can squeeze a bit more information out of my data set. I was just thinking today about how I should go back to our old U87MG dataset and call variants using GATK and the new SV pipeline we have.&lt;br /&gt;&lt;br /&gt;Finally, I think it really has implications for users of Complete Genomics. Joke Reumers showed that CG variants detected by RTG as well were highly accurate. That's key as an &lt;i&gt;in silico&lt;/i&gt;&amp;nbsp;validation step. Plus, it empowers us to analyze the data ourselves. I love CG, but I also want the ability to adjust my alignment and variant calling settings myself. I also want to be able to update my analyses to be compatible with each other without having to pay a couple thousand dollars more on top of my original investment.&lt;br /&gt;&lt;br /&gt;I do wonder how it's going to pan out. 
I hope, of course, that it ends up helping them out. As I tell all my corporate friends: I want them to succeed, because their success is my success.&lt;br /&gt;&lt;br /&gt;At the very least, the software is now out in the wild. It's now on the users to figure out if it's worth using. I'll be doing my part over the coming months and I promise to share!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-6048088325244730279?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/6048088325244730279/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/realtimegenomics-goes-free-provides.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/6048088325244730279'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/6048088325244730279'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/realtimegenomics-goes-free-provides.html' title='RealTimeGenomics Goes Free, Provides Alternative for CG Users'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-218783965143699172</id><published>2011-06-18T18:11:00.000-07:00</published><updated>2011-06-20T10:23:06.659-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='complete genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='complete genomics user conference 2011'/><title type='text'>Complete Genomics User Conference 2011 Day 2 Recap</title><content type='html'>Yesterday was a "half-day" at the CG User Conference and my netbook was inconveniently out of juice, so I didn't get to live blog. However, I took plenty of juicy notes at the interesting and useful talks of the day. The rundown included:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Dr. Sek Won Kong, Children's Hospital Boston, about a downstream analysis and annotation pipeline for WGS/exome-seq called gKnome.&lt;/li&gt;&lt;li&gt;Dr. Jay Kasberger of the Gallo Research Center at UCSF about their tools and scripts for analyzing CG data that they will soon&lt;span class="Apple-style-span" style="font-family: sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;b&gt;™&lt;/b&gt;&lt;/span&gt;&amp;nbsp;make available.&lt;/li&gt;&lt;li&gt;Dr. Stephan Sanders from Matthew State's lab at Yale talking about identification of &lt;i&gt;de novo&lt;/i&gt; variants in CG/WGS data.&lt;/li&gt;&lt;li&gt;Dr. Andrew Stubbs, Erasmus Medical Centre, about HuVariome, a database of variations common to the human genome for use as a filtering/informative resource.&amp;nbsp;&lt;/li&gt;&lt;li&gt;A quite interesting panel discussion that went especially into issues related to making sequence data and variants public and how to filter data.&lt;/li&gt;&lt;/ol&gt;I'll summarize each of the talks and some of my thoughts below.&lt;br /&gt;&lt;span class="Apple-style-span" style="font-weight: 800;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Dr. Sek Won Kong.&lt;/span&gt;&lt;b style="font-size: x-large;"&gt; gKnome: An analysis and annotation pipeline for whole-genome/exome sequencing.&amp;nbsp;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Dr. 
Kong presented a sequence analysis and annotation pipeline Michael Hsing and others in his group have developed for variant analysis. Although it certainly takes CG data, it looks like it'll take Illumina data as well.&lt;br /&gt;&lt;br /&gt;Looked very nice. I think behind the scenes was a MySQL and Python based annotation pipeline. It utilizes the CG diversity panel (69 public genomes released by CG) to filter out systematic CG variants. Actually, this was a major theme at the conference that I'll go into at the panel discussion part.&lt;br /&gt;&lt;br /&gt;In its first version, the gKnome pipeline will be able to annotate from RefSeq, CCDS, Ensembl and UCSC known genes.&lt;br /&gt;&lt;br /&gt;The other cool part is the web front-end for the whole thing. Their system auto-generates a number of reports including box plots of #rare/NS variants/genome, which they demonstrated is closely tied to ethnicity. It also has built-in pathway analysis and disease risk summaries (including utilizing HGMD if one has a subscription). Finally they showed a nice R-based plot of CNV results that are auto-generated.&lt;br /&gt;&lt;br /&gt;There was also a quick slide of hypervariable genes shown that was a point of much conversation generally. Basically, everyone agreed there's a set of specific genes and gene families that always end up with variants in them. Dr. Kong showed the list to include HYDIN, PDE4DIP, MUC6, AHNAK2, HRNR, PRIM2, and ZNF806. I've seen most of these pop up in my many exome-seq experiments as well. I've even had PDE4DIP, AHNAK2, PRIM2 and many of the ZNF and MUC genes look like disease-causing variants before.&lt;br /&gt;&lt;br /&gt;So where can you get it? Well, it's not available yet, but should be completed by September. 
In the meantime, you can check out what they have going for them at: &lt;a href="http://gknome.genomicevidence.org/"&gt;gknome.genomicevidence.org&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Let me just throw it out there that this looks superior to the alternative currently used by most people, which is &lt;a href="http://www.openbioinformatics.org/annovar/"&gt;Annovar&lt;/a&gt;. Annovar is a great tool and getting better all the time, but with its rather clunky input and output formats, lack of any downstream stats or visuals, and some notable bugs in its conversion scripts, gKnome is looking pretty nice.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Dr. Jay Kasberger. &lt;b&gt;Integrating tools and methods into analytical workflows for Complete Genomics data.&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;This one was pretty nice because it was kind of an overview of tools used with CG data and how a lab with a lot of CG data might implement them. Dr. Kasberger especially presented the way they assess the variants and CNVs provided by CG.&lt;br /&gt;&lt;br /&gt;There was a tool for auto-comparison with Illumina Omni SNP chip data, which is worth a look (although I should note that you have to deal with the Illumina SNP chip problems like the ambiguous calls at some percent of spots that don't tell you the correct ref/alt alleles, etc. yourself... and frankly, I'm not sure which ones those are myself--I usually just compare heterozygous calls between the platforms for validation).&lt;br /&gt;&lt;br /&gt;He also demonstrated a tool that goes from the CG masterVAR format to PLINK format for downstream IBD estimates.&lt;br /&gt;&lt;br /&gt;Finally, they have a number of scripts generating Circos plots for variant density, coverage (using heatmap tracks), etc. 
And we all know I love Circos, so cool on them for that.&lt;br /&gt;&lt;br /&gt;These tools are available from them at: &lt;a href="http://sequencing.galloresearch.org/"&gt;sequencing.galloresearch.org&lt;/a&gt;&amp;nbsp;(but you'll need a login by being a collaborator/member of their group, so you'll have to contact them for access).&lt;br /&gt;&lt;br /&gt;This talk was interesting because it showed off tools that probably every genomics lab with CG or WGS data has developed for themselves. This stuff needs to be packaged up, published and shared openly in my opinion. For example, I don't like that there's this gateway through their page for it all. Put it up on GitHub or SourceForge or something, guys!&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Dr. Stephan Sanders.&lt;b&gt; Identifying &lt;/b&gt;&lt;i style="font-weight: bold;"&gt;de novo&lt;/i&gt;&lt;b&gt; variants in Complete Genomics data.&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Dr. Sanders focused on assessing de novo variants. By this he does not mean what we typically call novel variants. Rather, he's talking about variants not inherited from parents (which coincidentally are not likely to be known variants either). He claimed that (based on the Roach et al. paper), de novo variants are extremely rare, somewhere around 1x10^-8 chance per base, or about 0.5 disruptive events per exome.&lt;br /&gt;&lt;br /&gt;That equates to fewer than 100 de novo variants per genome. But when you actually assess the number from real data with standard filters, you end up with around 20,000 candidates. That's a problem.&lt;br /&gt;&lt;br /&gt;He demonstrated that the rare de novo candidate variants (the 20,000) have a much lower distribution of quality scores than true variants.&lt;br /&gt;&lt;br /&gt;He then went into a fairly extensive discussion of how to estimate the specificity needed to narrow those candidates down to the correct number, which was great. 
BUT, the bottom line is really that you just have to move up the quality cut-off. At a high enough level, the de novo candidate total drops off and the very few leftovers are highly enriched for true positives. He showed that he narrowed down to ~70 candidates and that they validated a little more than thirty of them. This matched well with the expected number he calculated earlier.&lt;br /&gt;&lt;br /&gt;Cool talk, but the take-home is that all you have to do is sacrifice sensitivity for specificity and you'll throw away the vast majority of false-positive de novo variants. So for the ~20k candidates, apply a more stringent filter and voila!&lt;br /&gt;&lt;br /&gt;So, pretty cool idea for what to do with alleles that show up as Mendelian inheritance errors. Just apply a stringent base quality and read depth filter to them and highly enrich for true variants. Kind of an obvious conclusion, but not something many have been doing, I'd bet.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Andrew Stubbs. &lt;b&gt;HuVariome: A high quality human variation study and resource for rare variant detection and validation.&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This one is going to be pretty brief, though it was a good talk especially for the conversation points it brought up.&lt;br /&gt;&lt;br /&gt;HuVariome is a database they're putting together of well-annotated known variants. The goal is to use it as an alternative to dbSNP for filtering (which honestly shouldn't be used for filtering in the first place). It will be available at: &lt;a href="http://huvariome.erasmusmc.nl/"&gt;huvariome.erasmusmc.nl&lt;/a&gt;. 
Definitely a possible alternative to using dbSNP, especially if it's as well annotated as they suggest it will be.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Panel Discussion&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;Worth mentioning because the same themes kept coming up. Specifically, there was quite a bit about everyone developing their own list of false positive variants/genes and how to filter SNPs adequately. Generally, it's agreed that (1) there is a subset of variants and genes that show up in nearly every sequencing experiment and therefore are more likely false positives than anything else (they don't tend to validate, either) and (2) that dbSNP is too inclusive of disease samples and lacking adequate phenomic info to be used as a blind filter. I've personally always told people to use dbSNP as a &lt;i&gt;guide&lt;/i&gt;. Never dismiss a variant just because it's in dbSNP. You can look at the non-dbSNP variants first if you want, since those might be the "jackpot" spots, but if you find nothing there, try looking at those in dbSNP.&lt;br /&gt;That then leads to the need for things like HuVariome and lists of "bad" genes/variants (like the hypervariable genes mentioned by Dr. Kong). But the problem then becomes how to share variants publicly that are present in protected samples, even if they're artifacts, because consent wasn't given to make any variants publicly available.&lt;br /&gt;Personally, I see that as an issue that will solve itself, but of course, we want it sooner rather than later. Solutions such as projects specifically intended to produce these lists are possible, though (and some in the audience said they were doing just that).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;That's a Wrap&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;Anyway, that's a wrap on this conference. I hope it was informative for everyone. 
Also, props to Complete Genomics for putting on a pretty decent corporate conference. I didn't think it was overly biased, I found it useful and interesting, and the venue and food were quite good.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-218783965143699172?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/218783965143699172/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/complete-genomics-user-conference-2011_18.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/218783965143699172'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/218783965143699172'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/complete-genomics-user-conference-2011_18.html' title='Complete Genomics User Conference 2011 Day 2 Recap'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-8140083088427976095</id><published>2011-06-16T14:40:00.000-07:00</published><updated>2011-06-16T15:20:51.635-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='complete genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='complete genomics user conference 2011'/><title type='text'>Complete Genomics 
Community</title><content type='html'>Apparently they just released this publicly:&lt;br /&gt;&lt;a href="http://community.completegenomics.com/"&gt;http://community.completegenomics.com/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Pretty cool, actually. It has a forum, a wiki-ish knowledgebase (or at least, hopefully it'll be like a wiki in the future) and a tools section.&lt;br /&gt;&lt;br /&gt;Currently the tools section has CGAtools and some other scripts written by Steve Lincoln's team at CG. Seems like in the future it will include community-derived data as well.&lt;br /&gt;&lt;br /&gt;Also, I'm just gonna throw this out there: It seems like the community is really not satisfied with annotation tools. And by that I do mean Annovar and the Ensembl annotation tools. So I think that's really an open area for development for an ambitious bioinformaticist or two out there.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Cancer Grant Program&lt;/span&gt;&lt;br /&gt;Will be officially announced week of June 20th. Basically, it'll be an abstract plus some simple questions. There will be two winners in US and Europe. Applications will be due July 29th, winners decided by August 12th and samples will need to be submitted by September 16th.&lt;br /&gt;Bonus: All applicants will get a future discount regardless.&lt;br /&gt;&lt;br /&gt;Pretty cool... time to find some cancer samples to sequence.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Live Polling&lt;/span&gt;&lt;br /&gt;Kinda cool (and totally unrelated to genomics/CG... oh well). Never seen this before. Hosted by &lt;a href="http://polleverywhere.com/"&gt;PollEverywhere.com&lt;/a&gt;. Basically, you can host a live poll during a Powerpoint presentation. 
People can text message in a response to the poll and it'll update on the fly within the presentation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-8140083088427976095?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/8140083088427976095/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/complete-genomics-community.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/8140083088427976095'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/8140083088427976095'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/complete-genomics-community.html' title='Complete Genomics Community'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-6732765916678970427</id><published>2011-06-16T11:30:00.000-07:00</published><updated>2011-06-16T11:30:41.103-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='complete genomics user conference 2011'/><title type='text'>Simon Lin Talk</title><content type='html'>Sequencing Outsourcing: Northwestern Experience&lt;br /&gt;&lt;br /&gt;An overview of the needs and costs of running your own sequencing rather than simply outsourcing.&lt;br /&gt;&lt;br 
/&gt;In-house: Numerous costs, need of additional staff, need of storage and analysis computers, et cetera.&lt;br /&gt;&lt;br /&gt;Outsource: lower cost, access to bioinformatics, etc.&lt;br /&gt;&lt;br /&gt;Need to consider cost, quality, and the impact to "internal culture/image/students"&lt;br /&gt;&lt;br /&gt;Lose at least some of the control by outsourcing. (However, re-analysis of the data itself is obviously possible.) But tweaks and changes can't really be made with outsourced analyses.&lt;br /&gt;&lt;br /&gt;They have a 454 (four-fifty-four... or is it four-five-four? I've always called it four-five-four.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-6732765916678970427?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/6732765916678970427/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/simon-lin-talk.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/6732765916678970427'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/6732765916678970427'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/simon-lin-talk.html' title='Simon Lin Talk'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-4238413401420550958</id><published>2011-06-16T11:15:00.000-07:00</published><updated>2011-06-16T11:15:10.128-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='complete genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='complete genomics user conference 2011'/><title type='text'>Zachary Hunter Talk</title><content type='html'>Paired Whole Genome Sequencing Studies in Waldenstrom's Macroglobulinemia&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This type of lymphoma is pretty rare and also not much is known about it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Approach: 10 paired genomes plus 20 unpaired&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For normal they used CD19 depleted PBMCs and also took buccal cells as a backup.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;They actually did do exome sequencing as well.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;They found a large number of strong candidates (&amp;gt;7?)... one was present in 100% of the 10 paired and 87% (26/30) of all 30 patients. Strangely, it was the same exact SNP in all individuals, but it's a very good functional candidate.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Interestingly, with Sanger sequencing they saw a very small peak of the same variant in the trace from an individual with a very weak case (one of the four that didn't have it). Perhaps all patients with the disease have this variant?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Detailed Circos plots. Mapped out zygosity, CN, allele balance, CGI coverage levels and then testing results. 
At this detailed level, there were many notable CN/zygosity regions and UPD regions. (Everyone LOVES Circos.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;They made their own wrapper for Annovar because the input/output for Annovar is something they "don't like". (Hey, guess what? You're not alone! I made just such a wrapper myself!)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Nice talk. I loved the Circos. Again, I have to ask whether a lot of these findings couldn't have been made solely through exome-seq, though...&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-4238413401420550958?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/4238413401420550958/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/zachary-hunter-talk.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/4238413401420550958'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/4238413401420550958'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/zachary-hunter-talk.html' title='Zachary Hunter Talk'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-379762597274101579</id><published>2011-06-16T11:01:00.001-07:00</published><updated>2011-06-16T11:31:38.077-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='complete genomics user conference 2011'/><title type='text'>View from the CG user conference</title><content type='html'>&lt;div&gt;Fantastic view and great San Francisco weather today!&lt;br /&gt;&lt;br /&gt;&lt;img src="http://lh6.ggpht.com/-wUOGzw4ryH8/TfpE45wYjhI/AAAAAAAAAo4/6Q6WosTB3sk/2011-06-16_10-24-13_997.png" /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-379762597274101579?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/379762597274101579/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/view-from-cg-user-group-meeting.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/379762597274101579'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/379762597274101579'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/view-from-cg-user-group-meeting.html' title='View from the CG user conference'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/-wUOGzw4ryH8/TfpE45wYjhI/AAAAAAAAAo4/6Q6WosTB3sk/s72-c/2011-06-16_10-24-13_997.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-5700150610268313780</id><published>2011-06-16T10:49:00.000-07:00</published><updated>2011-06-16T10:52:16.715-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='complete genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='complete genomics user conference 2011'/><title type='text'>Joke Reumers Talk</title><content type='html'>Two major experiments covered:&lt;br /&gt;tumor/normal ovarian cancer&lt;br /&gt;monozygotic twins (schizophrenia)&lt;br /&gt;&lt;br /&gt;Even at the low error rate of CG data (1/10000), that's ~30k errors per genome, which is too much for twins.&lt;br /&gt;Detected 2.7M shared variants, but 46k discordant variants between twins&lt;br /&gt;&lt;br /&gt;Individual filters for quality/genomic complexity/bioinformatic errors used...&lt;br /&gt;Quality: low read depth, low variation score, snp clusters, indel proximity to snp (5bp from snp)&lt;br /&gt;Complexity: simple repeats, segdups, homopolymer stretches&lt;br /&gt;Bioinformatic errors: Collaboration with RealTimeGenomics, re-analysis&lt;br /&gt;&lt;br /&gt;1.7M shared variants, 846 discordant variants&lt;br /&gt;&lt;br /&gt;So basically swung the error rate from type 1 to type 2.&lt;br /&gt;&lt;br /&gt;All 846 discordancies here were validated by Sanger sequencing.&lt;br /&gt;&lt;br /&gt;Also 2 of the shared variants were found to actually be discordant.&lt;br /&gt;&lt;br /&gt;Reduced error rate down to 4.3x10^-7&amp;nbsp;(from 1.79x10^-4).&lt;br /&gt;&lt;br /&gt;Of the 846, 541 were false positives.&lt;br /&gt;&lt;br 
/&gt;NA19240, 1000 genomes Illumina sequencing versus CG&lt;br /&gt;&lt;br /&gt;Before filtering, CG had more false negatives, Illumina had more false positives.&lt;br /&gt;&lt;br /&gt;After filtering, they were both down to about 1% error rates.&lt;br /&gt;&lt;br /&gt;As for tumor/normal, adding the filtering made a little difference... from 437 down to 21. But of course, this kills some true positives.&lt;br /&gt;&lt;br /&gt;Summary: Very good talk, I liked this one. I was pleasantly surprised to see &lt;a href="http://www.realtimegenomics.com/"&gt;RealTimeGenomics&lt;/a&gt;&amp;nbsp;get a shout out as one of their filter approaches. I used their software myself, it's very good and I hope to collaborate again with them, especially after seeing it helping other groups with their filters. Also, I think there's a lot to be said about the different error rates with CG versus BWA/GATK et cetera. I'm leaning toward combined approaches... for example, why not do exome-seq on Illumina as a validation of CG and to adjust error rates?&lt;br /&gt;&lt;br /&gt;Bonus: Hilarious pic of a desk covered in hard drives. In our lab's case, I think they're stuffed under people's desks. 
Someone needs to do something about the drive overload.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-5700150610268313780?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/5700150610268313780/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/joke-reumers-talk.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/5700150610268313780'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/5700150610268313780'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/joke-reumers-talk.html' title='Joke Reumers Talk'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-4224264789365904585</id><published>2011-06-16T10:14:00.000-07:00</published><updated>2011-06-16T10:50:32.712-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='complete genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='complete genomics user conference 2011'/><title type='text'>Complete Genomics User Conference 2011</title><content type='html'>Today I'm attending the Complete Genomics User Conference in San Francisco at the Fairmont. 
Nice venue, particularly for the guests that had to come from far away (actually, being from Palo Alto, I would have preferred if they had it down in Mountain View).&lt;br /&gt;&lt;br /&gt;First talk I saw was from Dr. Kevin Jacobs of the National Cancer Institute. Thought it was very nice, and I'll provide a run-down in a few minutes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-4224264789365904585?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/4224264789365904585/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/complete-genomics-user-conference-2011.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/4224264789365904585'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/4224264789365904585'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/06/complete-genomics-user-conference-2011.html' title='Complete Genomics User Conference 2011'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-3389875566036805579</id><published>2011-05-04T02:57:00.000-07:00</published><updated>2011-05-04T13:25:03.565-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='23andMe'/><category 
scheme='http://www.blogger.com/atom/ns#' term='personalized genomics'/><title type='text'>The Earwax Trait Story</title><content type='html'>My first post related to 23andMe is going to be about something near and dear to my heart: the earwax type SNP rs17822931! Before I start, let me say that I do have wet earwax. Yes, I'm carrying the CC genotype at rs17822931. Not a big shocker considering I'm of 100% European descent. &lt;br /&gt;&lt;br /&gt;You may wonder why this particular variant and trait is important at all to me. The real reason is because when I taught human genetics at UCLA, we used the original Nature article as an example of a "modern" genetic study. The students read it and (hopefully) found it interesting to learn that a single SNP (rs17822931) was dictating whether they had wet or dry earwax.&lt;br /&gt;&lt;br /&gt;In fact, most of the students were basically unaware that there were two types of earwax! At UCLA, we had students of all different ethnic backgrounds, and it turned out about half of my class had dry and half had wet earwax. That should give you a hint at the ethnic mix there.&lt;br /&gt;&lt;br /&gt;Anyway, the point of this post is to explain why, before jumping the gun and shouting that you have dry earwax when you have a CC at rs17822931, you should consider whether or not you're accurately self-diagnosing it, and then why you might have dry earwax when your genotype says you should have wet.&lt;br /&gt;&lt;br /&gt;I remember clearly that one student of European descent claimed to have "dry" earwax. 
She may have (it's not impossible), but another student wisely pointed out something in the paper that may explain why she thought she had "dry" earwax.&lt;br /&gt;&lt;br /&gt;In the original study that identified this variant as the one causing the Mendelian earwax type trait (Yoshiura et al., 2006), they explain that they actually had to use two different groups of patients to identify the variant.&lt;br /&gt;&lt;br /&gt;The first group consisted of 64 "dry" and 54 "wet" control individuals. This group was "self-declared", meaning they basically checked off a box stating whether they had wet or dry earwax. This first pass group resulted in inconclusive results. Basically, they narrowed down the region with this group, but found some "phenotype-genotype inconsistency" in some of the samples and could not therefore narrow down exactly which was the causative SNP.&lt;br /&gt;&lt;br /&gt;So they followed up with an association study on a second group of 126 individuals (88 dry and 38 wet) whose earwax types were identified by a medical practitioner.&amp;nbsp; In this set, 87/88 individuals with dry earwax were AA. All 38 with wet earwax were GA or GG. &lt;br /&gt;&lt;br /&gt;That one GA individual with dry earwax turned out to have a deletion in exon 29 of the ABCC11 gene (downstream of rs17822931 and his G allele).&lt;br /&gt;&lt;br /&gt;So this really teaches us two things: &lt;br /&gt;1) Self-diagnosis is not accurate. You may have wet earwax and not know it! Just because it's flaky and seems dry to you does not mean it is dry earwax. And vice-versa.&lt;br /&gt;2) Other mutations do exist! In this case, particularly if you are CT but truly have dry earwax, it's possible you're carrying around a secondary mutation not unlike the unique case in the original paper. It's exceedingly unlikely a CC individual would have such a secondary mutation damaging both alleles without some sort of inbreeding, though, so... 
keep that in mind if you're going to claim #2 with a CC genotype.&lt;br /&gt;&lt;br /&gt;Much of this could be said for nearly any trait determined by SNPs, but the earwax trait makes a convenient real-world example of a simple trait where ascertainment problems (basically, the inability for the layman to know what his true trait is) and the rare secondary mutation can cause perceived discrepancies. &lt;br /&gt;&lt;br /&gt;More in-depth stuff on my own 23andMe experiences as I go through them. So far I'm having a blast.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-3389875566036805579?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/3389875566036805579/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/05/earwax-trait-story.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/3389875566036805579'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/3389875566036805579'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/05/earwax-trait-story.html' title='The Earwax Trait Story'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-5957818438838850370</id><published>2011-04-15T11:55:00.000-07:00</published><updated>2011-04-15T11:55:24.177-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='23andMe'/><category scheme='http://www.blogger.com/atom/ns#' term='personalized genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><title type='text'>Happy DNA Day! Celebrate with 23andMe unboxing!</title><content type='html'>I'd like to start by wishing all of you a happy &lt;a href="http://www.genome.gov/DNADay/"&gt;DNA Day&lt;/a&gt;! That's right, it's already April 15th again, which can mean only one thing: DNA Day is back! (...because &lt;a href="http://www.efile.com/tax-day-deadlines/"&gt;it's not tax day in 2011&lt;/a&gt;, so you can celebrate DNA Day and get back to filling out your taxes tomorrow.)&lt;br /&gt;&lt;br /&gt;Anyway, I got myself a gift for DNA Day, which arrived yesterday (and which you might be aware of if you read my last post). That's right, my &lt;a href="http://www.23andme.com/"&gt;23andMe&lt;/a&gt; package arrived! So I thought I'd share the unboxing in case you were curious what you get in the mail when you order 23andMe.&lt;br /&gt;&lt;br /&gt;It comes in a box a little bigger than a CD jewel case that looks like something from frog design. Very nice. 
The box is actually plastic wrapped with a sticker having your name and serial number on it (removed prior to taking these pics).&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-cUfW8-Gvifk/TaiR5-6GDjI/AAAAAAAAAoY/8DALkQ2aiJo/s1600/2011-04-15_11-23-33_427.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://4.bp.blogspot.com/-cUfW8-Gvifk/TaiR5-6GDjI/AAAAAAAAAoY/8DALkQ2aiJo/s320/2011-04-15_11-23-33_427.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Opening it up, there's a set of instructions that look like something from Ikea explaining how to use the kit and send it back. (That's not a bad thing--they clearly put effort into making it dummy-proof.)&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-3lq9dCmY9tU/TaiSOBBEX_I/AAAAAAAAAoc/V1UYtmZ1Ewk/s1600/2011-04-15_11-23-55_62.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://2.bp.blogspot.com/-3lq9dCmY9tU/TaiSOBBEX_I/AAAAAAAAAoc/V1UYtmZ1Ewk/s320/2011-04-15_11-23-55_62.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;Savvy observers may have noted that the kit is a simple Oragene saliva kit. Spit in the tube. Close the lid. Remove the top. Screw the cap on and shake. (Again, dummy-proof instructions inside the plastic box containing the vial.)&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-65v5IeBw7QY/TaiSqY2hZgI/AAAAAAAAAog/4za2b3azVSc/s1600/2011-04-15_11-25-54_229.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://1.bp.blogspot.com/-65v5IeBw7QY/TaiSqY2hZgI/AAAAAAAAAog/4za2b3azVSc/s320/2011-04-15_11-25-54_229.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Very straightforward. 
And I think the design is very well done. The box is the same one you send back to them. Just put the vial in the biohazard bag, put it in the box, re-seal it and throw it in the mail (it's postage-paid already). The Oragene box does contain an instruction booklet with details in it for those interested.&lt;br /&gt;&lt;br /&gt;You probably noticed in the second picture that they want you to go to their site to register. I did indeed do that. Pretty standard set of consent documents to go through.&lt;br /&gt;&lt;br /&gt;I did find it interesting that, at the bottom of their consent form, they have three options about consent. One is to consent for yourself, one is to consent for another adult, and the third one is to convey your child's consent and authorize it as the parent/guardian. There goes my plan to genotype my children before they're old enough to deny me! (Kidding, kidding.)&lt;br /&gt;&lt;br /&gt;You also get to choose whether to allow them to biobank the sample. I said yes, of course. Then you provide a few other general details (DOB, sex).&lt;br /&gt;&lt;br /&gt;Took no more than five minutes! The caveat is that you can't eat or drink for a half hour before spitting in the collection vial. So I will be doing that in, oh, about a half hour.&lt;br /&gt;&lt;br /&gt;I'll update more on my 23andMe experience as it happens. Coming up next: Surveys and more surveys. 
They seem to have a ton of phenotyping through surveys, which I think is fantastic.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-5957818438838850370?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/5957818438838850370/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/04/happy-dna-day-celebrate-with-23andme.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/5957818438838850370'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/5957818438838850370'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/04/happy-dna-day-celebrate-with-23andme.html' title='Happy DNA Day! 
Celebrate with 23andMe unboxing!'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-cUfW8-Gvifk/TaiR5-6GDjI/AAAAAAAAAoY/8DALkQ2aiJo/s72-c/2011-04-15_11-23-33_427.jpg' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-6285428545368616766</id><published>2011-04-11T16:50:00.000-07:00</published><updated>2011-04-11T18:41:47.203-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='personalized genomics'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><title type='text'>23andMe: FREE* genotyping, today only!</title><content type='html'>23andMe is offering &lt;a href="http://spittoon.23andme.com/2011/04/11/23andme-is-celebrating-dna-day-a-little-early-with-a-sale/"&gt;FREE* genotyping&lt;/a&gt; today until 11:59PM PST to celebrate DNA Day early (apparently).&lt;br /&gt;&lt;br /&gt;The asterisk is because it's not actually free. It's $108 plus shipping/handling costs because you have to pay for a 12 month subscription to their services.&lt;br /&gt;&lt;br /&gt;Still, for about 1,000,000 SNP genotyping, this is a darn good deal. &lt;br /&gt;&lt;br /&gt;I shot Dr. Wu (the one who posted the blog article on the 23andMe  blog announcing the deal) a note asking about whether we got raw data  back:&lt;br /&gt;&lt;blockquote&gt;I’m a little curious about the subscription. &lt;br /&gt;We get the raw data to keep and the subscription is to be able to  view the data using your tools? 
So even after the year subscription is  up, we’ll have the data for ourselves, right?&lt;br /&gt;Thanks.&lt;/blockquote&gt;Her response:&lt;br /&gt;&lt;blockquote&gt;Hi M.J. Clark,&lt;br /&gt;Yes, you will always be able to download the raw data and retain  access to content you had while you were subscribed. Some features, like  the ability to Browse your raw data using our website, receipt of  updates to your health reports and Relative Finder matches, and storage  of your saliva sample (if you choose to biobank) may, however, be  discontinued.&lt;/blockquote&gt;Sounds to me like we get whatever they qualify as "raw data"  permanently regardless of the subscription, which is fantastic. Plus,  gives those of us in the field something fun to play with.&lt;br /&gt;&lt;br /&gt;I'm not actually sure yet what we receive back in terms of raw data (or if they actually give you back raw data). It's currently their version 3 platform, which is a modified Illumina OmniExpress Plus Genotyping BeadChip. Seems their v2 "raw data" took the form of simply genotype calls, positions and rsids. Not sure if that's all you get with v3 (but I'll ask and update later).&lt;br /&gt;&lt;br /&gt;I'm interested for myself, of course, but if your lab has five or fewer human samples (per lab member) you've been meaning to get 1,000,000 SNP genotyped and you don't care about which specific platform it is, this is probably the best deal you'd get for a while. Food for thought! (Then again, why are you still genotyping? Go exome-seq those things!)&lt;br /&gt;&lt;br /&gt;Update (18:40): Answer regarding the nature of raw data.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Dr Clark,&lt;br /&gt;The raw data for v3 is formatted exactly the same as for v2, and is  also against build 36 of the human reference assembly.  
That information  is in the header of the raw data file; not sure if it is documented  elsewhere except where people have made their raw data files public.&lt;/blockquote&gt;So it's NCBIv36 at least. I'm hoping they'll be willing to provide raw data in whatever basic format for those of us with interest later. Guess we'll find out!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-6285428545368616766?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/6285428545368616766/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/04/23andme-free-genotyping-today-only.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/6285428545368616766'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/6285428545368616766'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/04/23andme-free-genotyping-today-only.html' title='23andMe: FREE* genotyping, today only!'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-8371356627546866400</id><published>2011-03-09T16:23:00.000-08:00</published><updated>2011-04-11T17:14:49.977-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='news'/><category 
scheme='http://www.blogger.com/atom/ns#' term='politics'/><title type='text'>Rising gas prices and their effect</title><content type='html'>This post is in response to an article I read on CNNMoney.com. Now, let's be frank: CNN is in the business of "gotcha news", posting articles that are often inaccurate or misleading with the intent to draw views and reap chatter. Typically, I wouldn't let it bother me, but this particular article is heavily misrepresenting things to make a point. That point is: "Stop whining, California."&lt;br /&gt;&lt;br /&gt;&lt;a href="http://money.cnn.com/2011/03/09/news/economy/gas_prices_mississippi/index.htm"&gt;Here's the article&lt;/a&gt;. It presents some data in an interesting fashion. It shows states shaded by the average cost for a gallon of gas and then it shows states shaded by the amount spent on gas as a percent of income. The conclusion? Despite having dramatically higher gas prices, Californians should "quit whining" because they on average spend a smaller percent of their income on gas. 
&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="https://lh5.googleusercontent.com/-pL74wC-WQj4/TXgB9Wv3CBI/AAAAAAAAAn8/P4f2lueSDU0/s1600/Screen+shot+2011-03-09+at+2.38.45+PM.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="223" src="https://lh5.googleusercontent.com/-pL74wC-WQj4/TXgB9Wv3CBI/AAAAAAAAAn8/P4f2lueSDU0/s320/Screen+shot+2011-03-09+at+2.38.45+PM.png" width="320" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Gas expenditure as a percent of income&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="https://lh3.googleusercontent.com/-MzHEnRKiIaY/TXgB-nC8irI/AAAAAAAAAoA/m52AQBPuSso/s1600/Screen+shot+2011-03-09+at+2.38.57+PM.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="224" src="https://lh3.googleusercontent.com/-MzHEnRKiIaY/TXgB-nC8irI/AAAAAAAAAoA/m52AQBPuSso/s320/Screen+shot+2011-03-09+at+2.38.57+PM.png" width="320" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Average gas price per gallon&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;The article accounts for this phenomenon in a series of theories about why it occurs (rather than, you know, going ahead and proving any of them). Example:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;"When you live in California or New York, where things are close by and  there's public transit, that's great -- but we don't have those options  here," said Hilary Hamblin, a Tupelo, Miss. 
resident who said her family  typically spends about 10% of their monthly income on gas alone.&lt;br /&gt;&lt;br /&gt;That's not uncommon in the state. Mississippi families earn a median  household income of about $37,000 -- the lowest in the country -- but  spend a whopping $402 per month, or 13.2% on gas.&lt;br /&gt;&lt;br /&gt;In contrast,  Californians earn a median of $59,000 per household, and spend about  $380, or 7.8% of their income on gas each month&lt;/blockquote&gt;What we see here is a way of manipulating the facts to put together a dramatic article that will piss off Californians and New Yorkers and rally red-staters. What we don't see is any semblance of meaning.&lt;br /&gt;&lt;br /&gt;Here's a little tidbit the article doesn't tell you: California is a major agricultural center, with significantly more farms than Mississippi (or, it appears, any other state). Check out &lt;a href="http://www.ers.usda.gov/data/ruralatlas/atlas.htm#map"&gt;this fun little tool&lt;/a&gt; from the US Government's Economic Research Service. &lt;br /&gt;&lt;br /&gt;How does having the highest gas prices in America impact those rural workers? What percent of their income is spent on gas? Lumping all of California's 37M people together and calculating mean gas expenditure as a percent of income, then comparing to Mississippi's 3M people is like comparing apples to oranges.&lt;br /&gt;&lt;br /&gt;Personally, my gas expenditure is very low. I usually ride a bike to work. I don't drive far or often. But I guarantee you farmers down in Kern County or in other rural counties of California are driving quite a bit and paying similarly inflated gas prices to what I pay. So people like me in California are bringing that "gas expenditure as a percent of income" mean value down quite a bit. That doesn't mean the cost to the farmers here in Cali is any lower, though!&lt;br /&gt;&lt;br /&gt;I'm not saying Mississippi doesn't have it bad now. To the contrary. 
What I'm saying is that this article is misrepresenting facts to make a false point. Farmers and farm communities in California likely have just as much if not more reason to complain than those elsewhere because they're penalized by being in a state with huge urban areas that cause gas prices to be higher (due to increased taxes, more expensive gas blends to deal with urban air pollution, et cetera).&lt;br /&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;I grabbed the data from the ERS and played around a little bit with it. Here are some interesting stats:&lt;/span&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;span class="storybyline"&gt;&lt;b&gt;Total number of farms&lt;/b&gt;:&lt;/span&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;California: 81,033&lt;/span&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;Mississippi: 41,959&lt;/span&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;&lt;b&gt;Mean percent of land per county dedicated to agriculture&lt;/b&gt;:&lt;/span&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;California: 33.28%&lt;/span&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;Mississippi: 38.36%&lt;/span&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="storybyline"&gt;Mean percent of farmland per county (for counties with &amp;gt;50% farmland)&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;California: 70.23%&lt;/span&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;Mississippi: 67.67%&lt;/span&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="storybyline"&gt;Total Population Size&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;California: 36,961,664&lt;/span&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;Mississippi: 2,951,996&lt;/span&gt;&lt;br /&gt;&lt;span 
class="storybyline"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;b&gt;&lt;span class="storybyline"&gt;Population of rural counties (counties with &amp;gt;50% farmland)&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;California: 4,933,655 (13.35%)&lt;/span&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;Mississippi: 482,751 (16.35%)&lt;/span&gt;&lt;/blockquote&gt;&lt;span class="storybyline"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="storybyline"&gt;Let these numbers settle in. Similar proportions of the population live in rural areas in both states. Rural counties appear very similar broken down by percentages here. But California has &lt;b&gt;more than twelve times&lt;/b&gt; as many people as Mississippi. In rural counties, California has about &lt;b&gt;ten times&lt;/b&gt; as many people. However, there are only about &lt;b&gt;twice&lt;/b&gt; as many farms in California as there are in Mississippi.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;But what does it all mean? It means there are about ten times as many people in rural California paying significantly higher gas prices while doing the same job as those in Mississippi. Californians may on average make more than people in Mississippi, but does that mean the farmers in California make more than the farmers in Mississippi? This is a hard number to determine. As a rough proxy, we can use what the ERS calls the "average value of agricultural products sold". In this case, I'm taking those "rural counties" I talked about earlier with &amp;gt;50% farmland and then just calculating the straight mean of these values.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;b&gt;Mean for rural counties of average value of agricultural products sold&lt;/b&gt;&lt;br /&gt;California: 535,808&lt;br /&gt;Mississippi: 291,167&lt;/blockquote&gt;&lt;br /&gt;In this light, Mississippi farmers appear to make much less on average than California farmers. 
But if we look on a county-by-county basis, we note that a few counties are skewing California's rosy outlook. Let's look at that distribution to see whether a lot of California farmers are in more dire straits than simply taking a mean would show.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh5.googleusercontent.com/-nPJ7GuuQTg8/TXgYFyKRHSI/AAAAAAAAAoE/-zP2ohKjBiE/s1600/avg_market_value_products_sold.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="246" src="https://lh5.googleusercontent.com/-nPJ7GuuQTg8/TXgYFyKRHSI/AAAAAAAAAoE/-zP2ohKjBiE/s320/avg_market_value_products_sold.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Mississippi is in blue, California is in red (didn't expect that, did you?). We can see pretty clearly here that there are two counties skewing the California number. These are Kings County and Monterey County. If we remove those, the mean value for California's rural counties drops to 373,475. Not exactly making up for the fact that ten times as many people live in those counties.&lt;br /&gt;&lt;br /&gt;Now, I don't want to make generalizations. The only point I truly want to make is that an article telling Californians to "stop whining" because farmers in Mississippi have it bad is ignoring the fact that there are many farmers in California in the same situation as those in Mississippi. I don't want to say they have it worse, either, but my guess (emphasis on guess, as I'm not proving it) is that these folks spend a similar percent of their income on gas as well.&lt;br /&gt;&lt;br /&gt;Basically, I'm going to go out on a limb and say that given that the makeup of farming counties in California and Mississippi is very similar (despite a difference in scale, given California's much larger size), the California farmer has just as much a right to be distraught about rising gas prices as the Mississippi farmer. 
In fact, the California farmer has to pay significantly more per gallon, so even if they do make ten thousand dollars a year more (an estimate based on mean income in Kings County, California, which is down around 44k per year at this point), they probably spend a similar percent of income.&lt;br /&gt;&lt;br /&gt;All of that said, you won't find me complaining about gas prices. But I may end up complaining about produce prices, which is obviously directly tied to gas prices on the farmer's end.&lt;br /&gt;&lt;span class="storybyline"&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-8371356627546866400?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/8371356627546866400/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/03/rising-gas-prices-and-their-effect.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/8371356627546866400'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/8371356627546866400'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/03/rising-gas-prices-and-their-effect.html' title='Rising gas prices and their effect'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='https://lh5.googleusercontent.com/-pL74wC-WQj4/TXgB9Wv3CBI/AAAAAAAAAn8/P4f2lueSDU0/s72-c/Screen+shot+2011-03-09+at+2.38.45+PM.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-3196648412354102494</id><published>2011-02-11T14:50:00.000-08:00</published><updated>2011-02-11T16:38:09.470-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bioinformatics'/><category scheme='http://www.blogger.com/atom/ns#' term='genomics'/><title type='text'>The "Data Deluge" and DNA</title><content type='html'>The current issue of Science has &lt;a href="http://www.sciencemag.org/site/special/data/"&gt;a special series of articles&lt;/a&gt; related to the "data deluge", an issue that is currently impacting numerous fields of science including genomics. Basically, it's an issue where the amount of data is outstripping our analytical capacity due to both a lack of computational power and man power.&lt;br /&gt;&lt;br /&gt;Naturally there are articles about the data deluge and genomics in the issue.&lt;br /&gt;&lt;br /&gt;One by Scott D. Kahn of Illumina entitled "On the Future of Genomic Data"&lt;span class="Apple-style-span" style="border-collapse: separate; color: black; font-family: inherit; font-size: small; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"&gt;&lt;span class="Apple-style-span" style="color: #333333; font-size: 13px; line-height: 16px; text-align: left;"&gt;&lt;/span&gt;&lt;/span&gt; [&lt;a href="http://www.sciencemag.org/content/331/6018/728.full"&gt;link&lt;/a&gt;] is in great part about the meaning of "raw data" in genomics. 
He basically explains that "raw data" in next-gen sequencing is being defined downstream of the actual raw data, either as the sequence reads translated from the images (the images being the ultimate true raw data) or as the variations from the reference. He explains well that these definitions are in great part due to the fact that the actual raw data amounts to an enormous volume of data that is by and large unnecessary.&lt;br /&gt;&lt;br /&gt;Another entitled "Will Computers Crash Genomics?" by Elizabeth Pennisi [&lt;a href="http://www.sciencemag.org/content/331/6018/666.full"&gt;link&lt;/a&gt;] discusses two major issues. First, it emphasizes Lincoln Stein's view that funding agencies have inadequately funded analysis in favor of data production, and that if this doesn't change we'll be in for some tough times because there will be far too much data for our bioinformatics infrastructure to support. Second, it discusses the potential solution to the genomics data deluge found in cloud computing (while warning about the privacy issues that solution brings with it).&lt;br /&gt;&lt;br /&gt;Both articles are well written and astute. I think together they emphasize a lot of the issues related to our data deluge problem in genomics. 
I think the Pennisi article in particular puts the focus in an important place: that bioinformatics as a field is behind our data production capacity and, thus far, does not appear to be catching up at an adequate rate. That may be good news for bioinformaticists in genomics like myself, but it's not good news for genomics as a field. (A humorous note: the word "bioinformaticists" comes up on my spell checker as not existing. So does "bioinformaticians". That speaks volumes.)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;The Analytical Deluge&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I think one thing that the articles hint at but don't really touch on in any depth is the advancement and dissemination of analytical approaches. Specifically, analytical approaches have advanced significantly, but not at a fast enough pace to keep up with data production. I estimate that currently, the amount of sequence data in the world is growing exponentially, but analytical approaches have, in contrast, advanced at a plodding pace. Most of these advances stem from a handful of institutes with huge funding that produce the most data and therefore require the most robust analytical approaches. &lt;br /&gt;&lt;br /&gt;Just look at how significantly alignment and variant calling have improved over the past two years, for example. While these advances have been a major boon, they've also made current analyses nearly incomparable with old analyses. If we want to compare a genome we've just recently processed and analyzed with one from two years ago, it's not really a fair comparison unless we go re-analyze that two-year-old dataset using current tools. This is an issue we encountered with the U87MG genome. We compared our variant calls to the Watson and YanHuang genomes and ended up with a huge number of differences. 
But most of them can probably be attributed to each project using different sequencing platforms with different alignment algorithms and different variant calling algorithms with different settings. We can't be expected to go obtain all these huge data sets ourselves and re-analyze them to match our current projects. We have neither the infrastructure nor manpower (read: funding) for that.&lt;br /&gt;&lt;br /&gt;I will say the community (or, particularly, the 1000 Genomes Project) has done a nice job pushing standards that will help us use larger portions of the world's genomic data. However, a bit of that is self-fulfilling. The 1000 Genomes Project is the largest source of genomic data in the world right now (though the Beijing Genomics Institute may outpace it in the future) and, no surprise, the alignment algorithm (BWA), variant caller (GATK) and even the formats of the data (SAM for alignments and VCF for variants) used by most genomic scientists today were created by its participants.&lt;br /&gt;&lt;br /&gt;Are they the best ways of doing things? Certainly not. I think even the authors of said programs will admit there will be better ways of doing these analyses even in the not so distant future (and it's not unlikely that these same people may be the ones who develop them). But it takes a lot of work to create programs of this sort and, honestly, there is neither enough funding nor enough people to get it done quickly.&lt;br /&gt;&lt;br /&gt;And I would say that's the problem. Who's going to go back and bring the old data up to the current standard every time a new and better analysis comes along? Do we just leave that data to the back issues of Nature and proceed with new data? I think not.&lt;br /&gt;&lt;br /&gt;So it comes full circle, really. We need to keep more than just a list of variants relative to the reference genome. That's not adequate for reanalysis. 
At this point, I think it's safe to say we won't be squeezing anything more useful out of the images off the machines, so the raw read data is probably as far back as we need to go for the foreseeable future if we want our data to stay relevant.&lt;br /&gt;&lt;br /&gt;But the available resources for storing that data are not yet adequate. The Sequence Read Archive (SRA) is a good attempt, but difficult to use and navigate and likely limited in its future given it will need infinite expansion capability. Clouds offer a cost-effective alternative, but storing personal genomic data on a company-owned computer system definitely rubs the medical and research community the wrong way.&lt;br /&gt;&lt;br /&gt;So what do I offer as a solution? The answer is the same as for nearly any problem of this sort:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;$$$&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Anyone who's applied for a grant in bioinformatics can tell you how insanely difficult it can be to get funding for such projects.&lt;br /&gt;&lt;br /&gt;Just try getting funding for a project to establish a standard format for structural variation calling (because, let's face it, the current VCF attempt at it is not good enough). Try getting funding to write an assembly algorithm that doesn't take either three months or 96GB of RAM to run on a human genome (wait, does that even exist yet?). Or just write a grant about sequencing twenty cancers and do it anyway with that funding, because that's much more likely to get funded.&lt;br /&gt;&lt;br /&gt;It's like Chris Ponting says in Elizabeth Pennisi's Science article: There needs to be a priority shift for funding in academia towards more bioinformatics.&lt;br /&gt;&lt;br /&gt;Then again, we can see already that the big companies are scooping up as many bright, young bioinformaticists as they can. Perhaps we will be leaving these analyses to the corporate world. 
I can already see immense value in a company that solely develops the best software for specific bioinformatics needs--in fact, such companies already exist: Novocraft, CLCBio, and many others. &lt;br /&gt;&lt;br /&gt;But this leaves us with the issue of what to do with all our data. One of my fellow postdocs here at Stanford has about twenty 2TB external hard drives under his desk. I can tell you right now: That's no future for genomic data. Sure, snail mailing 10TB of data is faster (and, ironically, more secure) than sending it over the Internet, but even over USB 3, it takes a long time just to transfer that much data from the drive to an internal disk. Solid state drives are still prohibitively expensive, but ultimately that's what we want to be using (as a large portion of our data analysis time right now is spent reading from and writing to disks!). Meanwhile, labs can't be expected to forever buy hard drives &lt;i&gt;nor&lt;/i&gt; to rely on cloud "solutions" that are potentially insecure and often rely on snail-mailing disks.&lt;br /&gt;&lt;br /&gt;And then there's the problem I mentioned above: What about bringing old data up to date for the sake of comparisons? We have a ton of data already that isn't commonly being used because it's "too old", though in actuality there's nothing wrong with it and it could easily be brought up to date given the disk space, manpower and time to do it.&lt;br /&gt;&lt;br /&gt;I'd love to see someone write a grant to the effect of: "We're going to take all the world's genome sequencing data and keep it up to date with the latest analytical techniques." I'd love to see that project exist and get funded. Maybe I'll write it.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Thanks for reading! So what do you think of this "data deluge" problem? 
Is it a problem at all in genomics?&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-3196648412354102494?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/3196648412354102494/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/02/data-deluge-and-genomics.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/3196648412354102494'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/3196648412354102494'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2011/02/data-deluge-and-genomics.html' title='The &quot;Data Deluge&quot; and DNA'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-5157547594175792243</id><published>2010-06-01T18:05:00.000-07:00</published><updated>2010-06-01T18:05:35.136-07:00</updated><title type='text'>Genome Studies: Where Do We Go From Here?</title><content type='html'>&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;div class="MsoNormal"&gt;As I prepare to write my PhD dissertation, I have been reflecting on the state of genomics, particularly of publishing genomics. 
A question I sometimes get asked, which surprises me every time, is: “What is the point of sequencing the whole genome?” I admit, the first time I was shocked. But I tried to think about it from this other biologist’s perspective. From his point of view, sequencing a whole genome was all it really took to publish in a major journal. It is no simple task to sequence an entire genome, but it operates more in a “data production” mode. Someone like him needs to go through ten or more individual, unique experiments to establish what a particular mutation in a particular gene is doing in a mouse before he can publish. I think biological scientists generally desire hypothesis-driven experimentation: answering a question by performing experiments. They see sequencing as just one big experiment. And maybe it is, but it also gives an incredibly large amount of information, making it a lot different from a single experiment of another type.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;In our sequencing of the U87MG cell line, we tried to derive some biological relevance from the sequence and we did so by making general observations about the genome. To some biologists, that can feel lacking. Partly for this reason, much of the field is moving toward whole exome sequencing in order to sequence the low-hanging fruit across a large number of samples rather than exploring the whole genome. They want to supplement their more traditional experimental approaches with next-gen sequencing, but they don’t see a point to whole genomes. &lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;I think there’s a great deal of merit to that, but I also think there are important, biologically relevant questions that cannot be answered with whole exome sequencing alone, and therefore I do think there is a need to perform whole genome studies in some cases. 
It all depends on the question being asked.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;a name='more'&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-size: medium;"&gt;A Whole Genome Sequencing Project&lt;/span&gt;&lt;br /&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-size: medium;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;What were those observations we made in the U87MG sequence?&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;One thing we did not find was an abnormally large number of small variants. To the contrary, U87MG appeared quite consistent with the relatively few other genomes that had been published up to that point in terms of the sheer number of small variants. The vast majority of small variants were in the dbSNP database, and of those that weren’t, about 10% overlapped with the YanHuang (first Chinese) genome. A whopping 50% of the single nucleotide variants (SNVs--including those in dbSNP) were shared with YanHuang. &lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;U87MG being a cell line derived from a Caucasian glioma, I was honestly surprised not to see a significantly higher number of small variants in its genome. We were certainly hoping, with it being the first cancer cell line to be whole genome sequenced, that we would find something striking in terms of variation.&lt;/div&gt;&lt;div class="MsoNormal"&gt;And then we did: Structural variations. The genome of U87MG is well known for its cytogenetic aberrations, but the resolution of karyotyping, SKY, and microarrays is not high enough to accurately elucidate the complexity of those aberrations. 
Sequencing, on the other hand, can visualize these events in much higher resolution, such that we gained a picture that suggested a more complex mechanism for structural variation than merely large genomic breaking and rejoining events. For example, we saw small regions of some chromosomes nestled between large fragments of other chromosomes, facilitating a major translocation.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;But again, what did it mean biologically? With the sequence alone, it’s hard to say much about it. We described our observations: That structural variation appears to be the major mode of mutation in this cancer cell line, and that said structural variations are often more complex than previously thought. I do think there’s a great deal of value to these observations, but we definitely wanted to find a smoking gun: a new cancer gene like IDH1, for example.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;Now, all of that said, U87MG’s genome is important for other reasons even without a “smoking gun”. Specifically, it is one of the most used cancer cell lines, and certainly the most used brain cancer cell line. It is very often used as an in vitro model for brain cancer. Tons of papers have been published using U87MG. For any of those papers, going back and looking at the sequence, which is now freely available, could very possibly reveal new and important insight into the genetic etiology of a given phenotype, such as drug response. But as for the U87MG sequencing project, getting the sequencing done, aligning the data, calling and annotating variants and validating was a huge effort by itself, done by a fairly small group for a relatively small amount of money. 
&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;The sequencing data itself from our U87MG project is being used by other genomic scientists in their studies, as it’s an easily-accessible, freely-obtainable cancer sequence. The advantages of that could fill another blog post, but sufficed to say, it will continue to produce results for years after we published it, which makes the whole thing more than worthwhile.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;The U87MG project and paper was not merely for the sake of sequencing a genome, however. It was a showcase of genomic technology and analytical techniques. That was the tail-end of a time when every group doing genomic analysis performed their own techniques and designed their own analytical tools to get everything done. For our part, we used the SOLiD system, and were the first group to publish a whole genome using it outside of Life Technologies itself. We used an alignment algorithm written by a member of the lab (Nils Homer-BFAST), annotated variants using a database written by another member of the lab (Brian O’Connor-SeqWare) and called structural variations using a program written by myself (Breakway). &lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-size: medium;"&gt;Where Do We Go From Here?&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-size: medium;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;So what is the next step? Some people, particularly from the larger genome sequencing groups, feel that publishing on single whole genomes is no longer worthwhile. Sequence ten whole genomes, or a hundred, or a thousand, and then publish. But I don’t see this as the only route for performing biologically meaningful whole genome studies. 
&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;In the case of cancer, for example, we have already shown in one cancer cell line sample that cytogenetic abnormalities are the major mutational mechanism. If this holds true for other cancer samples, simply whole exome sequencing isn’t going to be enough, because it doesn’t generally resolve those events. I think there is going to be a lot of power in paired sequencing of patient tumor and normal samples for the sake of discovering novel tumor mutations. Ironically, I see that as a major source of low-hanging fruit. By supplementing those experiments with some other biological experiments—assays to test the effect of detected mutations in cell lines, for example—there is an excellent chance of detecting cancer mutations that would be very unlikely to be detected through other means.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;As for other genetic disorders, perhaps whole genome is not practical yet, but ultimately I again think it will be the way to go in the future when it’s more affordable and analysis is streamlined. The Autism genome, for example, may take thousands of whole genomes before it really gives us the genetic reasons for Autism. &lt;span style="mso-spacerun: yes;"&gt;&amp;nbsp;&lt;/span&gt;But, fortunately for us, a time is coming soon when sequencing will actually be that affordable. &lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;The value of a single genome sequence is dependent on the experiment at hand. It may only take a single paired tumor/normal sample being whole genome sequenced to detect the next major cancer gene. Doubtless it will take just a few individuals being sequenced to identify the cause of many Mendelian disorders. But it may take thousands of whole genomes to figure out Autism or Athersclerosis. 
It’s likely we’ll be rewarded not by relying solely on whole genome sequencing for those experiments, but by combining whole genome studies with other omics techniques—analyzing metabolic flux, assessing the whole inflammatory response, ChIP-seq, RNAseq, et cetera. It may be that analyzing the entire biological system through all these means and putting it all in the context of whole genome sequences results in the answers to many of these complex disorders.&lt;/div&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-5157547594175792243?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/5157547594175792243/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2010/06/genome-studies-where-do-we-go-from-here.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/5157547594175792243'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/5157547594175792243'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2010/06/genome-studies-where-do-we-go-from-here.html' title='Genome Studies: Where Do We Go From Here?'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-5950431484887161859</id><published>2010-05-23T15:03:00.000-07:00</published><updated>2010-05-23T15:09:56.106-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bioinformatics'/><category scheme='http://www.blogger.com/atom/ns#' term='webtools'/><title type='text'>Google Charts API</title><content type='html'>So there are a lot of free tools online that are fun and easy to use. The &lt;a href="http://code.google.com/apis/chart/image_charts.html"&gt;Google Charts API&lt;/a&gt; is a free, powerful on-the-fly chart generator. Apparently it was designed for in-house use (some of the charts look familiar--I think I've seen them on &lt;a href="http://www.google.com/analytics/"&gt;Google Analytics&lt;/a&gt;), but they decided it was useful enough to let the world have access to them. &lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;We actually used this in the &lt;a href="http://www.plosgenetics.org/article/info:doi%2F10.1371%2Fjournal.pgen.1000832"&gt;U87MG paper&lt;/a&gt; to generate our Venn diagrams. Figures are created by adjusting parameters in the URL, though they've added a &lt;a href="http://code.google.com/apis/chart/docs/chart_playground.html"&gt;live chart design tool&lt;/a&gt; that makes designing figures a bit easier.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;As a simple example, I've been charting my weight loss (yes, I'm on a diet!) 
using the API:&amp;nbsp;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://chart.apis.google.com/chart?cht=lc&amp;amp;chtt=Morning+Weights&amp;amp;chs=500x500&amp;amp;chd=t:85,75,60,40&amp;amp;chxt=x,y,x,y&amp;amp;chxr=1,200,220,1&amp;amp;chxl=0:%7CMay%2019%7CMay%2020%7CMay%2021%7CMay%2022%7C2:%7C%7CDate%7C%7C3:%7C%7CWeight+%28lbs%29%7C" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://chart.apis.google.com/chart?cht=lc&amp;amp;chtt=Morning+Weights&amp;amp;chs=500x500&amp;amp;chd=t:85,75,60,40&amp;amp;chxt=x,y,x,y&amp;amp;chxr=1,200,220,1&amp;amp;chxl=0:%7CMay%2019%7CMay%2020%7CMay%2021%7CMay%2022%7C2:%7C%7CDate%7C%7C3:%7C%7CWeight+%28lbs%29%7C" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;All the data to generate this chart is encoded in the URL:&lt;br /&gt;&lt;div style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;http://chart.apis.google.com/chart?cht=lc&amp;amp;chtt=Morning+Weights&amp;amp;chs=500x500&amp;amp;chd=t:85,75,60,40&amp;amp;chxt=x,y,x,y&amp;amp;chxr=1,200,220,1&amp;amp;chxl=0:|May%2019|May%2020|May%2021|May%2022|2:||Date||3:||Weight+%28lbs%29|&lt;/div&gt;&lt;br /&gt;The API is pretty manual for the time being. For example, axis scaling is completely manual. Notice that I set &lt;span style="font-family: &amp;quot;Courier New&amp;quot;,Courier,monospace;"&gt;chd=t:85,75,60,40&lt;/span&gt;, which are the weight values (217, 215, 212, 208) rescaled to the data range (which always runs from 0 to 100): with the Y-axis labeled from 200 to 220 through chxr, 217 becomes (217 - 200) / (220 - 200) * 100 = 85. Also note that to title each axis ("Date" and "Weight (lbs)"), I have to add a second "x,y" to chxt, &lt;i&gt;then&lt;/i&gt; label them in chxl accordingly &lt;i&gt;and&lt;/i&gt; center them by adding in surrounding empty labels. Not overly difficult, but definitely manual.&lt;br /&gt;&lt;br /&gt;The applications for bioinformatics are pretty huge. 
First of all, the API just makes some pretty charts easily, so it's a decent choice for figures generally.&lt;br /&gt;&lt;br /&gt;For example, here's a Venn diagram of large insertions detected by &lt;a href="http://breakway.sf.net/"&gt;Breakway&lt;/a&gt; in a tumor/germline paired sample from the same patient:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://chart.apis.google.com/chart?cht=v&amp;amp;chs=600x400&amp;amp;chd=t:41,96,0,23&amp;amp;chco=FF6342,63C6DE,ADDE63&amp;amp;chtt=Insertions&amp;amp;chts=000000,20&amp;amp;chdl=Germline%7CTumor&amp;amp;chdls=000000,14" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="266" src="http://chart.apis.google.com/chart?cht=v&amp;amp;chs=600x400&amp;amp;chd=t:41,96,0,23&amp;amp;chco=FF6342,63C6DE,ADDE63&amp;amp;chtt=Insertions&amp;amp;chts=000000,20&amp;amp;chdl=Germline%7CTumor&amp;amp;chdls=000000,14" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;And here's a pie chart showing events detected in the tumor:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://chart.apis.google.com/chart?cht=p3&amp;amp;chs=700x300&amp;amp;chd=t:3,96,101&amp;amp;chco=FF6342%7CADDE63%7C63C6DE&amp;amp;chl=Int.%20Translocations%20%283%29%7CInsertions%20%2896%29%7CDeletions%20%28101%29&amp;amp;chtt=Tumor%20Events&amp;amp;chts=000000,20" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="171" src="http://chart.apis.google.com/chart?cht=p3&amp;amp;chs=700x300&amp;amp;chd=t:3,96,101&amp;amp;chco=FF6342%7CADDE63%7C63C6DE&amp;amp;chl=Int.%20Translocations%20%283%29%7CInsertions%20%2896%29%7CDeletions%20%28101%29&amp;amp;chtt=Tumor%20Events&amp;amp;chts=000000,20" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;These images are linked directly from the API, so 
check the image location for the code used to generate them.&lt;br /&gt;&lt;br /&gt;Probably one of the most powerful parts of the API, though, is the ability to generate them on-the-fly from URLs. This would make it a useful tool for auto-generating figures of performance stats that could be remotely monitored, for example. Could be pretty nice for monitoring sequencer performance, project stats, et cetera.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-5950431484887161859?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/5950431484887161859/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2010/05/google-charts-api.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/5950431484887161859'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/5950431484887161859'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2010/05/google-charts-api.html' title='Google Charts API'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-2609793619540855985</id><published>2010-05-06T17:38:00.000-07:00</published><updated>2010-05-06T17:38:47.876-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='blogging'/><title 
type='text'>Google Verified Blogging</title><content type='html'>The next step in optimizing the Blog involves making sure it gets visibility through Google searching by getting it "Google Verified". This is really simple.&lt;br /&gt;&lt;br /&gt;Go to &lt;a href="http://www.google.com/webmasters/"&gt;Google Webmaster Central&lt;/a&gt; and sign in. Add your blog. Your blog will be added instantly, but you'll have to verify it. This is easy to do using the &lt;b&gt;meta tag &lt;/b&gt;with Blogger. Go to your &lt;b&gt;Blogger Dashboard&lt;/b&gt;, go to &lt;b&gt;Layout&lt;/b&gt;, and edit HTML. Go to the bottom of your layout's head and insert the meta tag. The easiest way to do this without screwing anything up is to put it right above the last line of the header. Voila! Verify your site after implementing the meta tag and you're good to go.&lt;br /&gt;&lt;br /&gt;Google Webmaster Central includes some interesting stats especially regarding searches that pop up your blog. It's no replacement for &lt;a href="http://www.googleanalytics.com/"&gt;GoogleAnalytics&lt;/a&gt;, but it does make it easier for Google to put your blog entries up as search results when appropriate.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-2609793619540855985?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/2609793619540855985/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2010/05/google-verified-blogging.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/2609793619540855985'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/3162625598067227745/posts/default/2609793619540855985'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2010/05/google-verified-blogging.html' title='Google Verified Blogging'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3162625598067227745.post-7318173956541442243</id><published>2010-05-06T16:22:00.000-07:00</published><updated>2010-05-06T16:23:18.624-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='blogging'/><title type='text'>Setting up GoogleAnalytics with Blogger</title><content type='html'>&lt;b&gt;&lt;a href="http://www.google.com/analytics/"&gt;GoogleAnalytics&lt;/a&gt;&lt;/b&gt; is a very cool free online tool for tracking site usage and access. It includes some handy features like showing usage over time, tracking sources and showing where users are viewing your page from. It's very simple to set up in a traditional webpage, too, by just planting a little snippet of code at the bottom of your pages. It's also very easy to set up on a Blogger page.&lt;br /&gt;&lt;br /&gt;First, log on to &lt;b&gt;&lt;a href="http://www.google.com/analytics/"&gt;GoogleAnalytics&lt;/a&gt;&lt;/b&gt; and set up a new account for your blog. You'll receive tracking code that needs to be inserted into the HTML of your blog's pages. Fortunately, this is really easy thanks to Blogger using an HTML template for every page of your blog.&lt;br /&gt;&lt;br /&gt;In your &lt;b&gt;Blogger Dashboard&lt;/b&gt;, go to &lt;b&gt;Layout &lt;/b&gt;and add a gadget to the &lt;b&gt;Footer&lt;/b&gt;. 
The gadget you want is "&lt;b&gt;HTML/JavaScript&lt;/b&gt;". Now just copy and paste the tracking code from GoogleAnalytics into this gadget. (Leave the title blank--there's no need for it to have a title.) Save your layout and you should be good to go. Within a few hours, your GoogleAnalytics for your blog ought to have a green checkmark signifying that GA is receiving data from your blog.&lt;br /&gt;&lt;br /&gt;For those with HTML savvy or heavily modified layouts, you can do the same thing by just putting the tracking code at the bottom of your page's body. Then again, you probably already knew that.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3162625598067227745-7318173956541442243?l=mendeliandisorder.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mendeliandisorder.blogspot.com/feeds/7318173956541442243/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mendeliandisorder.blogspot.com/2010/05/setting-up-googleanalytics-with-blogger.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/7318173956541442243'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3162625598067227745/posts/default/7318173956541442243'/><link rel='alternate' type='text/html' href='http://mendeliandisorder.blogspot.com/2010/05/setting-up-googleanalytics-with-blogger.html' title='Setting up GoogleAnalytics with Blogger'/><author><name>MJ</name><uri>http://www.blogger.com/profile/15925017686062610434</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://1.bp.blogspot.com/-HTMyuoZZmcI/Tp312OoFZwI/AAAAAAAAAts/HG8JIQnyggA/s220/63294_481448885235_672250235_7212947_4269557_n.jpg'/></author><thr:total>0</thr:total></entry></feed>
