Tuesday, June 1, 2010

Genome Studies: Where Do We Go From Here?

As I prepare to write my PhD dissertation, I have been reflecting on the state of genomics, and particularly on publishing in genomics. A question I sometimes get asked, which surprises me every time, is: “What is the point of sequencing the whole genome?” I admit, the first time I was shocked. But I tried to think about it from this other biologist’s perspective. From his point of view, sequencing a whole genome was all it really took to publish in a major journal. Sequencing an entire genome is no simple task, but it is more “data production” than experimentation. Someone like him needs to go through ten or more individual, unique experiments to establish what a particular mutation in a particular gene is doing in a mouse before he can publish. I think biological scientists generally prefer hypothesis-driven experimentation: answering a question by performing experiments. They see sequencing as just one big experiment. And maybe it is, but it also yields an enormous amount of information, which makes it quite different from a single experiment of any other type.

In our sequencing of the U87MG cell line, we tried to derive some biological relevance from the sequence by making general observations about the genome. To some biologists, that kind of analysis can feel lacking. For this reason, much of the field is moving toward whole exome sequencing, which picks the low-hanging fruit across a large number of samples rather than exploring the whole genome. They want to supplement their more traditional experimental approaches with next-gen sequencing, but they don’t see a point to whole genomes.

I think there’s a great deal of merit to that, but I also think there are important, biologically relevant questions that cannot be answered with whole exome sequencing alone, and therefore I do think there is a need to perform whole genome studies in some cases. It all depends on the question being asked.

Sunday, May 23, 2010

Google Charts API

So there are a lot of free tools online that are fun and easy to use. The Google Charts API is a free, powerful on-the-fly chart generator. Apparently it was designed for in-house use (some of the charts look familiar; I think I've seen them on Google Analytics), but they decided it was useful enough to let the world have access to it.

We actually used this in the U87MG paper to generate our Venn diagrams. Figures are created by adjusting parameters in the URL, though they've added a live chart design tool that makes designing figures a bit easier.

As a simple example, I've been charting my weight loss (yes, I'm on a diet!) using the API: 
All the data to generate this chart is encoded in the URL:
http://chart.apis.google.com/chart?cht=lc&chtt=Morning+Weights&chs=500x500&chd=t:85,75,60,40&chxt=x,y,x,y&chxr=1,200,220,1&chxl=0:|May%2019|May%2020|May%2021|May%2022|2:||Date||3:||Weight+%28lbs%29|

The API is pretty manual for the time being. For example, axis scaling is completely manual. Notice that I set chd=t:85,75,60,40, which are the weight values (217, 215, 212, 208) rescaled from the Y-axis range given in chxr (200 to 220) to the default 0-100 data range; 217 on a 200-220 axis, for instance, becomes 85. Also note that to title each axis ("Date" and "Weight (lbs)"), I have to add a second "x,y" to chxt, then label them in chxl accordingly and center them by adding empty labels on either side. Not overly difficult, but definitely manual.
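The manual scaling is easy enough to script. Here is a minimal Python sketch that rebuilds the URL above from the raw weights; the function names are my own invention for illustration, and since the chart service may no longer answer requests, the sketch only builds the URL string rather than fetching anything.

```python
from urllib.parse import urlencode

def scale_to_chart(values, axis_min, axis_max):
    """Rescale raw values from [axis_min, axis_max] to the 0-100 data range."""
    span = axis_max - axis_min
    return [round((v - axis_min) * 100 / span) for v in values]

def build_weight_chart_url(weights, dates, axis_min=200, axis_max=220):
    """Build a line-chart URL (hypothetical helper, not part of the API)."""
    scaled = scale_to_chart(weights, axis_min, axis_max)
    params = {
        "cht": "lc",                                   # line chart
        "chtt": "Morning Weights",                     # chart title
        "chs": "500x500",                              # size in pixels
        "chd": "t:" + ",".join(str(s) for s in scaled),
        "chxt": "x,y,x,y",                             # second x,y pair holds the axis titles
        "chxr": f"1,{axis_min},{axis_max},1",          # range shown on the y axis
        # Axis 0 gets the dates; axes 2 and 3 get a centered title each,
        # padded with an empty label on either side.
        "chxl": "0:|" + "|".join(dates) + "|2:||Date||3:||Weight (lbs)|",
    }
    # Keep ':', ',' and '|' readable; spaces and parentheses are escaped.
    return "http://chart.apis.google.com/chart?" + urlencode(params, safe=":,|")

url = build_weight_chart_url(
    [217, 215, 212, 208],
    ["May 19", "May 20", "May 21", "May 22"],
)
print(url)
```

The only real work is in scale_to_chart; everything else is string assembly, which is exactly why the API feels manual.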

The applications for bioinformatics are pretty huge. First of all, the API just makes some pretty charts easily, so it's a decent choice for figures generally.

For example, here's a Venn diagram of large insertions detected by Breakway in a tumor/germline paired sample from the same patient:


And here's a pie chart showing events detected in the tumor:


These images are linked directly from the API, so check the image location for the code used to generate them.

Probably one of the most powerful parts of the API, though, is the ability to generate charts on-the-fly from URLs. This would make it a useful tool for auto-generating figures of performance stats that could be remotely monitored, for example. Could be pretty nice for monitoring sequencer performance, project stats, et cetera.
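As a rough sketch of what that could look like, the Python below turns a dictionary of counts into a pie chart URL that a status page could regenerate on every refresh. The function name, the lane labels, and the read counts are all made up for illustration; they are not real run data.

```python
from urllib.parse import urlencode

def pie_chart_url(counts, title, size="400x250"):
    """Build a pie-chart URL from a {label: count} dict (hypothetical helper)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one non-zero count")
    # Express each slice as a percentage so the data fits the 0-100 range.
    percents = [round(100 * n / total, 1) for n in counts.values()]
    params = {
        "cht": "p",                                    # pie chart
        "chtt": title,
        "chs": size,
        "chd": "t:" + ",".join(str(p) for p in percents),
        "chl": "|".join(counts.keys()),                # one label per slice
    }
    return "http://chart.apis.google.com/chart?" + urlencode(params, safe=":,|")

# Made-up read counts per lane, purely for illustration.
example_counts = {"Lane 1": 50, "Lane 2": 30, "Lane 3": 20}
status_url = pie_chart_url(example_counts, "Reads per lane")
print(status_url)
```

A monitoring script would only need to recompute the counts and drop the resulting URL into an image tag.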

Thursday, May 6, 2010

Google Verified Blogging

The next step in optimizing the blog involves making sure it gets visibility through Google searching by getting it "Google Verified". This is really simple.

Go to Google Webmaster Central and sign in. Add your blog. Your blog will be added instantly, but you'll have to verify it. This is easy to do using the meta tag with Blogger. Go to your Blogger Dashboard, go to Layout, and edit HTML. Go to the bottom of your layout's head and insert the meta tag. The easiest way to do this without screwing anything up is to put it right above the last line of the header. Voila! Verify your site after implementing the meta tag and you're good to go.
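To double-check the edit before clicking Verify, a short Python sketch like the one below could scan a saved copy of the template for the tag. The tag name google-site-verification and the placeholder token are assumptions on my part; use whatever tag Webmaster Central actually hands you.

```python
from html.parser import HTMLParser

class VerificationTagFinder(HTMLParser):
    """Collect the content of verification meta tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attrs = dict(attrs)
            # Assumed tag name; match it to the tag you were given.
            if attrs.get("name") == "google-site-verification":
                self.tokens.append(attrs.get("content"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def find_verification_tokens(html):
    """Return the tokens of all verification meta tags placed in the head."""
    finder = VerificationTagFinder()
    finder.feed(html)
    return finder.tokens

# Toy template with a placeholder token; a tag in the body is ignored.
template = (
    "<html><head><title>My Blog</title>\n"
    "<meta name='google-site-verification' content='PLACEHOLDER_TOKEN' />\n"
    "</head><body>"
    "<meta name='google-site-verification' content='ignored' />"
    "</body></html>"
)
print(find_verification_tokens(template))
```

An empty list means the tag did not end up inside the head, which is the usual reason verification fails.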

Google Webmaster Central includes some interesting stats, especially about the searches that turn up your blog. It's no replacement for GoogleAnalytics, but it does make it easier for Google to put your blog entries up as search results when appropriate.

Setting up GoogleAnalytics with Blogger

GoogleAnalytics is a very cool free online tool for tracking site usage and access. It includes handy features like showing usage over time, tracking traffic sources, and showing where users are viewing your page from. It's very simple to set up in a traditional webpage, too, by just planting a little snippet of code at the bottom of your pages. It's also very easy to set up on a Blogger page.

First, log on to GoogleAnalytics and set up a new account for your blog. You'll receive tracking code that needs to be inserted into the HTML of your blog's pages. Fortunately, this is really easy because Blogger uses one HTML template for every page of your blog.

In your Blogger Dashboard, go to Layout and add a gadget to the Footer. The gadget you want is "HTML/JavaScript". Now just copy and paste the tracking code from GoogleAnalytics into this gadget. (Leave the title blank--there's no need for it to have a title.) Save your layout and you should be good to go. Within a few hours, your GoogleAnalytics for your blog ought to have a green checkmark signifying that GA is receiving data from your blog.

For those with HTML savvy or heavily modified layouts, you can do the same thing by just putting the tracking code at the bottom of your page's body. Then again, you probably already knew that.