Friday, May 11, 2012

ANNOVAR Patching (May 11, 2012)

Making Annovar Work

Add support for Feb 1kG release



It turns out I had made a typo in the command and the Feb 2012 variants work fine! Just be very careful to have it as "1000g2012feb_all" in the annotation part.

Kai, the author of Annovar, explained to me that the 1kG is going to start releasing variants in ethnic subgroups in the future, thus the addition of the "_all" to these datasets now. Works for me! (see below)

Fix to 1kG annotation confusion

Also something to note is that the Annovar website is a bit unclear (in my opinion) about the naming scheme it uses for the databases from 1kG. To download them, one would use the naming scheme on the downloads page (e.g. "1000g2011may"). 

But line 136 in the annotate_variation.pl script enforces a naming scheme that doesn't match this. Rather, one would need to use "1000g2011may_all" or else the db isn't recognized as native and is treated as a "generic" db. 

Rather than patching a "fix" to that, I simply changed my queries to add the "_all" onto all of the 1kG annotations.

TFBS memory problem

I don't know if I'm experiencing this because of using a new cluster system (I think that's why) or if it's the new version of Annovar, but I was actually having annotation against transcription factor binding sites ("tfbs" in Annovar). Thankfully, there's now memory functions built into Annovar that seem to help this. 

I simply added "--memtotal 100000000" and it started working.

Not sure what happens if it actually tries to use more than that much memory--does it dump to a tmp file or crash? Not sure. But so far it's working with this much room.

What's kind of weird is no other database has this issue so far. Just TFBS of all things.

2 comments:

  1. Hello Michael,

    I can't seem to replicate your issue in my own samples, as that change resulted in the exact same output file from 273576 variants. (187063 variants were annotated with this database)

    When you say it partially works do you mean that some variants are missed?

    Thanks,
    Rick

    ReplyDelete
    Replies
    1. Thanks for your post, Rick!

      I went through and tried to figure out what the problem with what I was doing was because Kai emailed me back and showed me that he had added another function that handles this. I realized that my original problem was from a typo!

      Specifically, I was using "-dbtype 1000g2012feb_al":

      perl annotate_variation.pl snpindel.avinput -filter -buildver hg19 -dbtype 1000g2012feb_al -v --outfile test humandb/
      NOTICE: Variants matching filtering criteria are written to test.hg19_AL.sites.2012_02_dropped, other variants are written to test.hg19_AL.sites.2012_02_filtered
      NOTICE: Processing next batch with 61849 unique variants in 61853 input lines
      Error: cannot read from input database file humandb/hg19_AL.sites.2012_02.txt: No such file or directory

      When I tried it later, I rewrote it correctly and it worked. But because I had been copy-pasting the commands, it appeared consistent.

      Facepalm. Anyway, thanks for pointing it out!

      Delete