Friday, February 17, 2012

Exome Annotations

I just posted a thread on 23andMe about which annotations I use for my exome data. Here's what I said:

I currently use Annovar for annotating VCF files. The output from Annovar is not particularly intuitive, so I wrote a Perl script that generates a VCF-based report. I thought I would share the annotations I've been using and the ones I plan to add, and see if anyone else has other annotation ideas. These could be useful for annotating our own genomes (and potentially for 23andMe to provide in the future).

The annotations I've been including are:

  • Gene annotation (type of mutation: exonic, intronic, splicing, etc.)
  • Gene name
  • Mutational description (e.g., the specific amino acid change)
  • dbSNP130
  • dbSNP135
  • NHLBI Exome Variant Server (EVS)
  • Transcription Factor Binding Site (TFBS)
  • SIFT score
  • PolyPhen-2 score (PP2)
  • GWAS presence
  • Segmental duplication

(The reason I include both dbSNP130 and 135 is that 135 contains quite a few SNPs that are potentially meaningful from a disease and trait standpoint, while 130 consists mostly of markers that don't directly affect diseases and traits. 130 is a subset of 135. Also, the EVS is potentially more useful than either of them as a filter.)
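The Perl report script isn't shown in this post. As a minimal sketch of the general idea only (in Python, with hypothetical INFO keys such as `FUNC`, `GENE`, `SIFT` and `PP2` standing in for the annotations listed above), each variant's annotations can be folded into the INFO column of a VCF record, which the VCF format defines as semicolon-separated `key=value` pairs, with bare keys for flags and `.` when empty:

```python
import re

def format_info(annotations):
    """Build a VCF INFO string from a dict of annotation name -> value.

    True becomes a bare flag (e.g. SEGDUP), None/False are skipped,
    and everything else becomes key=value.
    """
    fields = []
    for key, value in annotations.items():
        if value is None or value is False:
            continue  # annotation absent for this variant
        if value is True:
            fields.append(key)  # VCF flag: key present, no value
        else:
            # Conservatively replace characters VCF 4.1 disallows in INFO values
            text = re.sub(r"[\s;=]", "_", str(value))
            fields.append(f"{key}={text}")
    return ";".join(fields) if fields else "."

def vcf_record(chrom, pos, ref, alt, annotations, variant_id="."):
    """Return the eight fixed VCF columns (CHROM..INFO) as one tab-separated line.

    QUAL and FILTER are left as '.' (missing) in this sketch.
    """
    columns = [chrom, str(pos), variant_id, ref, alt, ".", ".", format_info(annotations)]
    return "\t".join(columns)

# Hypothetical annotations for one hypothetical variant
example = {"FUNC": "exonic", "GENE": "GENEX", "SIFT": 0.02, "PP2": 0.91,
           "SEGDUP": False, "TFBS": None}
print(vcf_record("1", 123456, "G", "A", example))
```

Keeping the report in VCF means the annotated file can still be read by ordinary VCF tools while carrying the extra columns of information.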

Ones that I would like to include in the future:
  • VAAST
  • MIE sites/scores (Mendelian inheritance errors)
  • 23andMe annotations (anything from 23andMe's SNP databases; can 23andMe help with that?)
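To make the MIE idea concrete: at an autosomal site, a child must carry one allele from each parent, so a child genotype that can't be assembled that way is a Mendelian inheritance error (either a genotyping error or a genuine de novo mutation). A minimal sketch, assuming genotypes from a parent-child trio are given as allele pairs; it ignores sex chromosomes and missing calls, and is not any particular tool's MIE score:

```python
def is_mendelian_consistent(child, mother, father):
    """True if the child's two alleles split into one maternal and one paternal allele.

    Genotypes are 2-tuples of alleles at an autosomal site, e.g. ("A", "G").
    """
    first, second = child
    return (first in mother and second in father) or (second in mother and first in father)

def count_mies(trio_genotypes):
    """Count sites where the child's genotype violates Mendelian inheritance."""
    return sum(
        1
        for child, mother, father in trio_genotypes
        if not is_mendelian_consistent(child, mother, father)
    )

# Hypothetical trio genotypes as (child, mother, father)
sites = [
    (("A", "G"), ("A", "A"), ("G", "G")),  # consistent: A from mother, G from father
    (("G", "G"), ("A", "A"), ("A", "G")),  # MIE: mother has no G to pass on
    (("A", "A"), ("A", "G"), ("A", "G")),  # consistent
]
print(count_mies(sites))  # 1
```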

Any other ideas for great annotations that should be included?

The idea behind these types of annotations is to give us a way to sift through the data and extract biologically meaningful results. For example, we are most interested in mutations that actually cause a protein coding change, that are uncommon in the population, and that are predicted to have a dramatic effect on function.

So far these types of annotations have allowed me to narrow very long lists of results in exomes (on the order of 30,000-50,000 mutations) down to just a handful (1-20) of candidate mutations for particular Mendelian disorders.
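The filtering logic described above (protein-changing, uncommon, predicted damaging) can be sketched as a simple predicate over annotated variants. This is an illustration, not the Perl script mentioned above: the field names, the ANNOVAR-style category labels, and the cutoffs (population frequency below 1%, SIFT at or below 0.05, PolyPhen-2 at or above 0.85) are assumptions chosen for the example, and the variants are made up:

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ANNOVAR-style exonic function labels treated as protein-changing
PROTEIN_CHANGING = {"nonsynonymous SNV", "stopgain SNV", "stoploss SNV",
                    "frameshift insertion", "frameshift deletion"}
# Truncating changes that SIFT/PolyPhen-2 do not score; treated as damaging here
TRUNCATING = {"stopgain SNV", "frameshift insertion", "frameshift deletion"}

@dataclass
class Variant:
    gene: str
    func: str                   # "exonic", "intronic", "splicing", ...
    exonic_func: str            # e.g. "nonsynonymous SNV"; "" if not exonic
    pop_freq: Optional[float]   # population allele frequency (e.g. EVS); None if never seen
    sift: Optional[float]       # SIFT: lower = more likely damaging
    pp2: Optional[float]        # PolyPhen-2: higher = more likely damaging

def is_candidate(v, max_freq=0.01, sift_cutoff=0.05, pp2_cutoff=0.85):
    changes_protein = v.func == "splicing" or (
        v.func == "exonic" and v.exonic_func in PROTEIN_CHANGING)
    uncommon = v.pop_freq is None or v.pop_freq < max_freq  # unseen = novel
    damaging = (
        v.func == "splicing"
        or v.exonic_func in TRUNCATING
        or (v.sift is not None and v.sift <= sift_cutoff)
        or (v.pp2 is not None and v.pp2 >= pp2_cutoff)
    )
    return changes_protein and uncommon and damaging

variants = [
    Variant("GENEA", "exonic", "nonsynonymous SNV", 0.002, 0.01, 0.97),  # rare, damaging
    Variant("GENEB", "exonic", "synonymous SNV", None, None, None),      # no protein change
    Variant("GENEC", "exonic", "nonsynonymous SNV", 0.30, 0.00, 1.00),   # too common
    Variant("GENED", "intronic", "", None, None, None),                  # not coding
    Variant("GENEE", "exonic", "stopgain SNV", None, None, None),        # novel truncating
    Variant("GENEF", "exonic", "nonsynonymous SNV", 0.001, 0.40, 0.10),  # predicted benign
]
candidates = [v.gene for v in variants if is_candidate(v)]
print(candidates)  # ['GENEA', 'GENEE']
```

Keeping each criterion as a separate condition makes it easy to loosen or tighten one cutoff and see how much each stage narrows the list.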

Anything I missed?

Friday, February 10, 2012

Sequencing My Exome: Why?

“Why do you want to do this?”
My wife, immediately after I tell her I'm going to sequence my own exome.

There are a few times in life when you want to do something so badly but find it difficult to explain to others why. This was one of those times. Such a simple question, but so many different answers, each as valid as the others, and all of them together explaining why I would want to do something so, well, unusual.

I could frame a whole dissertation on the reasons behind wanting to sequence my own exome. (And I will.) But first, the simple answer:

I’m curious about myself.

I want to see if I can figure out why I am the way I am. And by that I mean both physically and mentally. Some people will never think about this. Others might see very clearly that they are this way because God made them this way, or because their parents raised them like this, or because they had bad luck. They may simply accept that they have specific features that make them who they are and aren’t concerned with why.

For me, those answers are not good enough.

I know that there are mysteries to be solved in my genetic code. It comes with the territory of being a geneticist. That said, almost everyone thinks this way, usually without even realizing it. We can all look at our own families as a proxy for genetics. If your mom had type 2 diabetes and your sister has type 2 diabetes and your uncle has type 2 diabetes, you’re pretty sure you have a higher chance of getting type 2 diabetes. You’ll hear all sorts of people say as much: “guess I got my mom’s bad genes,” “guess he took after his father,” and so forth. If you’ve ever known somebody who lived long enough, you might have heard her tell you how her mother lived to a ripe old age, and her mother before her, and so on. That’s what I call thinking genetic.

The difference for me is that I’m thinking genetic at a different level. I actually want to look at my genetics to try to explain these types of things. I don’t believe in fate without reason. If my mom lives to be 90, and my grandmother lived to be 90, I want to know if I got the mutations that helped them get there. Could I just shrug my shoulders, say, “I probably did,” and move on? Absolutely. But that’s just not good enough for me.

And really, it’s not good enough for anyone. We’re no longer entering an era where we can do better than that; we’re already there. Exome sequencing represents the first major step into that era.

To make a case for exome sequencing, I think I first need to explain what we can learn from it. (By the way, whole genome sequencing is basically just an expansion of and improvement upon exome sequencing; more on that later.) And to do that, I first need to explain what an exome is. Readers who already know all about this can feel free to skip ahead.

What is an exome, anyway?

To understand what the exome is, you first have to understand what the genome is. There are massive tomes on the details of the subject, but to describe the genome succinctly:

The genome is the blueprint for every cell in your body.

Every single protein in every one of your cells is encoded on this massive blueprint. In order to create and maintain a cell (and therefore, your body and very being), your cell quite literally reads the genome and generates certain amounts and types of various proteins to fit the particular cell it’s trying to become or to fulfill a particular function.

The exome is the subset of the genome that contains the instructions for making the proteins themselves. It makes up only about 1-2% of the whole genome. If the genome is the blueprint, then:

The exome is the instructions for making every protein in your body.

Therefore, being able to read those instructions means we can figure out if differences in them will result in different protein structures.
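As a concrete illustration of how a single-letter change in those instructions can alter a protein, here is a minimal sketch that translates a coding sequence with the standard genetic code and classifies a single-base substitution as synonymous, missense, nonsense or stop-loss. The 12-base sequence is made up, and real annotation tools also handle strands, multiple transcripts and insertions/deletions:

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop codon
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(cds):
    """Translate coding DNA into one-letter amino acids (ignores a trailing partial codon)."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))

def classify_snv(cds, pos, alt):
    """Classify a substitution of base `alt` at 0-based position `pos` in a coding sequence."""
    start, offset = pos - pos % 3, pos % 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return f"missense {ref_aa}{start // 3 + 1}{alt_aa}"  # e.g. G2S: Gly at codon 2 -> Ser

cds = "ATGGGCTGGTAA"               # hypothetical: Met-Gly-Trp-stop
print(translate(cds))              # MGW*
print(classify_snv(cds, 3, "A"))   # missense G2S  (GGC -> AGC)
print(classify_snv(cds, 5, "T"))   # synonymous    (GGC -> GGT)
print(classify_snv(cds, 8, "A"))   # nonsense      (TGG -> TGA)
```

Note that the synonymous change alters the DNA but not the protein, which is why the exome annotations above distinguish between types of exonic mutation.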

What can I learn from the exome?

Identifying variations in the exome that lead to differences in proteins (which we call mutations) gives us a direct way of determining whether a protein might function differently in us than in other people. Mutations that significantly alter a protein can manifest themselves as traits. To bring up an example from a previous post, whether your earwax is wet or dry comes down to a variation in the exome that changes a protein (encoded by the ABCC11 gene). But it goes far beyond that type of “interesting” trait. Mendelian disorders (from which this blog takes its title) are disorders caused by mutations in a single gene, which we can detect in the exome (in fact, quite a few Mendelian disorders have now been “solved” through exome sequencing).

By sequencing the exome, we can directly assess every line of the “instructions” and identify those lines that differ from the norm.

But that’s not the only way to use this information. We can hunt for mutations that damage our proteins, and that is the first obvious thing to do when looking at the exome. But we don’t know how every mutation will affect a person. On the contrary, there are very few mutations whose effects we understand.

In fact, the current standard in personal genomics testing (such as that from direct-to-consumer companies like 23andMe or through-physician companies like Navigenics) is an approach that dominated the field for about a decade before next-generation sequencing became a reality. Using microarray technology, these tests measure specific sites in the genome known to harbor variants that are associated with a trait or disease but typically do not cause it.

For example, right now if you were to do a standard 23andMe test, you’d have genetic variations assessed at about a million sites across your genome. These variations would then be compared to a database that tells how strongly particular variations associate with particular traits or diseases. So 23andMe can tell me that I have a collection of variants that associate with type 2 diabetes, and it can calculate how that increases my risk of getting the disease compared to the average person.

This is more of a science than people often think. This is thinking genetic at a slightly more advanced level. I could simply turn to my family history and guess that I’m at an increased risk for type 2 diabetes. However, the fact that my genetics confirm the increased risk makes it much more “real” to me. Not only do I have a family history, I actually inherited some of those genetic factors. My risk is real.
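23andMe's exact risk model isn't described here, but the basic arithmetic of updating an average risk with several associated variants can be sketched. This minimal version assumes each variant contributes a likelihood ratio relative to the average person and that variants act independently, so the ratios multiply on the odds scale; the 25% average risk and the ratios below are made-up numbers:

```python
import math

def adjusted_risk(average_risk, likelihood_ratios):
    """Update an average lifetime risk using per-genotype likelihood ratios.

    Assumes each ratio compares a person's genotype with the population average
    and that variants are independent, so ratios multiply on the odds scale.
    """
    odds = average_risk / (1 - average_risk)   # probability -> odds
    odds *= math.prod(likelihood_ratios)       # combine independent variants
    return odds / (1 + odds)                   # odds -> probability

# Hypothetical: 25% average risk, two risk genotypes with ratios 1.2 and 1.1
risk = adjusted_risk(0.25, [1.2, 1.1])
print(f"{risk:.1%}")  # 30.6%
```

Combining the ratios on the odds scale rather than multiplying probabilities directly keeps the result between 0 and 1 no matter how many variants are included.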

Knowing my exome sequence takes that to the next level. Rather than simply having associations, I may actually be able to go into the regions of association and identify the mutations causing these problems.

Moreover, as more and more information about the genetic causes of various traits and diseases is discovered, my exome sequence will always be at hand for me to cross-reference. Imagine that tomorrow a study is released identifying a gene that tells you with complete confidence whether or not you’ll get type 2 diabetes. I would check that gene in my own exome for mutations immediately!

That may sound unrealistic, but when it comes to conditions like cancer, these kinds of studies come out all the time. I may find a mutation in my own genome in a gene that predisposes people to a particular type of cancer, and then I will know that I need to have my doctor monitor for it. Having worked closely on brain cancer for a few years, I was struck that one reason it’s among the deadliest types of cancer is that by the time we detect it, it’s already at a very advanced stage. But if we knew of a gene or set of genes that predisposes people to malignant brain tumors, we could look in our own exomes for mutations in those genes and then get MRIs starting at a particular age to try to detect tumors earlier, hopefully allowing effective, long-term treatment.

I think anyone can see how powerful that type of diagnostic and predictive tool can be.

And that brings up a major reason to sequence one’s genome: This information is immutable. Your exome is not changing. On the day you die, you’ve got pretty much the same exome and genome you had when you were born. If a major discovery is made tomorrow, I’ll have my exome to look at for it. If another discovery is made in ten years, I can take that same exome sequence and look for it. There’s no “expiration date” on that information.
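Because the sequence itself doesn't change, checking it against a new discovery amounts to re-querying the same stored variants with a new list of genes. A minimal sketch, assuming the annotated variants have been saved as simple records with `gene` and `change` fields; the gene names, changes and gene lists are hypothetical placeholders:

```python
def variants_in_genes(variants, genes):
    """Return stored variants falling in any gene of interest, grouped by gene."""
    wanted = set(genes)
    hits = {}
    for variant in variants:
        if variant["gene"] in wanted:
            hits.setdefault(variant["gene"], []).append(variant)
    return hits

# The same stored exome, queried years apart with different gene lists
my_exome = [
    {"gene": "GENEA", "change": "p.G12S"},
    {"gene": "GENEB", "change": "p.R40*"},
    {"gene": "GENEC", "change": "p.L7L"},
]
print(variants_in_genes(my_exome, ["GENEA", "GENEZ"]))  # a finding from "tomorrow"
print(variants_in_genes(my_exome, ["GENEB", "GENEC"]))  # a finding "ten years" later
```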

And that’s what really sold me on the whole thing, actually. My intimate knowledge that my exome is always going to be a part of me, and that our understanding of genetics and diseases will always be expanding. That means my investment now is going to pay off for my whole life. Or at least until I sequence my whole genome.

I hope that conveys my major reasoning behind why I would want to do this. Of course there are other factors as well. For one thing, I am a geneticist. Genetics is not just my job, it’s my hobby. I love it. And over the years I’ve become increasingly interested in my own genetics. But that’s honestly not the only reason. At this point, I see it as a choice that will help me keep myself healthy throughout my life. 

I think there will be a general shift toward that thinking in the medical community at large in the very near future. It may be only a couple of years before your doctor suggests you get your exome sequenced, too. In a society where I feel most of us already think genetic, I think it's only a matter of time before we stop simply guessing that something is genetic and instead actually prove it. And beyond that, we'll figure out that there's something we can do about it. That right there is empowering.

Thursday, February 9, 2012

Enter the Exome





My Exome kit from 23andMe has arrived! In a few short weeks, I will have my exome sequence in hand and ready to analyze. I've looked at hundreds of exomes over the past year, but only now, when I'm about to look at my own, have I started to really think about how to extract meaning from a healthy individual's exome. All of the work I've done has either been to assess exome sequencing as a science or to hunt for mutations causing specific conditions (Mendelian disorders and novel genetic syndromes).

Now I'm going to have my own sequence in hand and have a very basic yet exceedingly complex question to answer: What does this all mean?

And with that question come other questions:

Why do I care about my own exome?

What can I learn from it?

What justifies the cost?

Is it safe?


In the coming days, I will be posting about my answers to these questions and more. I came to a realization a couple of days ago (right after I ordered this kit) that even questions that seem simple to me as a geneticist are not so simple for most people.

"Why do you want to do this?" is harder to answer than people may think. Off the cuff I might say, "I'm a geneticist, it's what I do!" but that isn't at all the whole answer. So I am going to make it a goal to explain why anyone would want to have his or her exome (or genome) sequenced in terms that hopefully anyone can understand.