Friday, May 11, 2012

ANNOVAR Patching (May 11, 2012)

Making Annovar Work

Add support for Feb 1kG release

It turns out I had made a typo in the command and the Feb 2012 variants work fine! Just be very careful to have it as "1000g2012feb_all" in the annotation part.

Kai, the author of Annovar, explained to me that the 1kG is going to start releasing variants in ethnic subgroups in the future, thus the addition of the "_all" to these datasets now. Works for me! (see below)

Fix to 1kG annotation confusion

Also something to note is that the Annovar website is a bit unclear (in my opinion) about the naming scheme it uses for the databases from 1kG. To download them, one would use the naming scheme on the downloads page (e.g. "1000g2011may"). 

But line 136 in the script enforces a naming scheme that doesn't match this. Rather, one would need to use "1000g2011may_all" or else the db isn't recognized as native and is treated as a "generic" db. 

Rather than patching a "fix" to that, I simply changed my queries to add the "_all" onto all of the 1kG annotations.

TFBS memory problem

I don't know if I'm experiencing this because of using a new cluster system (I think that's why) or if it's the new version of Annovar, but I was actually having annotation against transcription factor binding sites ("tfbs" in Annovar). Thankfully, there's now memory functions built into Annovar that seem to help this. 

I simply added "--memtotal 100000000" and it started working.

Not sure what happens if it actually tries to use more than that much memory--does it dump to a tmp file or crash? Not sure. But so far it's working with this much room.

What's kind of weird is no other database has this issue so far. Just TFBS of all things.