What's hidden in your GWAS?
Over the past decade, genome-wide association studies (GWAS) have identified numerous loci for a range of complex diseases. Often these genomic locations fall within non-coding regions that have limited annotation and no obvious functional consequence. Additionally, most loci account for only a small fraction of disease heritability, with the vast majority of published SNPs having odds ratios under two. The standard GWAS analysis paradigm is ill-suited to collecting these functionally ambiguous, small-effect associations and interpreting them together. Given the apparent polygenic nature of most complex diseases, better integrative approaches are required to extract information from all disease associations, not just the top hits, and place them in an informative context.

How can Sherlock help?
Like a good detective, Sherlock attempts to aggregate small clues from GWAS to implicate the real "culprits" of complex disease. It uses a database of gene expression associations (eQTL) in different tissues to identify patterns in GWAS (i.e. genetic signatures) that match those for specific genes. Unlike other approaches, it incorporates information from all eQTL SNPs, both those acting in cis and those acting in trans. In isolation, many such associations fall below genome-wide significance and are typically ignored. However, multiple loci may operate in concert with other polymorphisms to alter the expression of functionally relevant genes. Typically, the GWAS results alone give no indication that such genes are disease-relevant. Our collection of sample results highlights several such instances.

Finding disease genes is elementary, my dear Watson!
Sherlock Holmes

How do I use Sherlock?
Simply submit your list of GWAS associations (SNPs and p-values) as described here. It is important to upload all SNPs in your association study, not just the top hits, because Sherlock may be able to group multiple lower-confidence SNPs to discover functionally important genes. If you have access to imputed SNPs (not just the set of SNPs on your microarray), please use these. Currently, all eQTL studies in our database are for CEU cohorts. The system will email results to you, usually within one day.
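
As a rough illustration of preparing such a list, the Python sketch below writes every SNP and its p-value to tab-separated text without filtering for significance. The two-column layout, the function name `prepare_submission`, and the rsIDs are assumptions made for this example only; the required file format is the one described on the submission page.

```python
import csv
import io


def prepare_submission(rows):
    """Build tab-separated 'SNP<TAB>p-value' text from (snp_id, p_value) pairs.

    Hypothetical helper: the two-column layout is an assumption, not the
    documented Sherlock input format. All SNPs are kept, not just top hits.
    """
    seen = set()
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for snp_id, p in rows:
        p = float(p)
        # A p-value must lie in (0, 1]; anything else signals a parsing problem.
        if not (0.0 < p <= 1.0):
            raise ValueError(f"invalid p-value for {snp_id}: {p}")
        # Keep only the first occurrence of a repeated SNP identifier.
        if snp_id in seen:
            continue
        seen.add(snp_id)
        writer.writerow([snp_id, f"{p:.3g}"])
    return out.getvalue()


# Example with made-up identifiers and p-values.
example = prepare_submission([("rs1", 0.5), ("rs2", "3e-9"), ("rs3", 0.04)])
print(example, end="")
```

Note that the sketch deliberately applies no significance threshold, since weakly associated SNPs are the ones Sherlock is meant to aggregate.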

How does it work?
We posit that if a molecular trait has a causal relation to a complex phenotype, then any genetic variation that affects this molecular trait can affect the phenotype as well. Sherlock uses a Bayesian statistical method to match the "signature" of genes from eQTL with patterns of association in GWAS. Compared with earlier eQTL approaches for mining GWAS, our method uses gene expression SNPs acting in both cis and trans, can distinguish causality from coincidence, and can be generalized to any molecular trait (e.g. metabolite level).
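
To make the matching idea concrete, here is a toy Python sketch. It is not Sherlock's actual statistical model: it is a naive log Bayes factor in which each eQTL SNP of a gene adds evidence when it is also associated in the GWAS and subtracts evidence when it is not. The function name, the hit threshold, and both probabilities are illustrative assumptions.

```python
import math


def toy_gene_log_bayes_factor(eqtl_snps, gwas_p, hit_threshold=1e-3,
                              p_hit_if_causal=0.5, p_hit_if_not=0.01):
    """Sum per-SNP log Bayes factors for one gene (toy model, assumed values).

    eqtl_snps: SNP ids associated with the gene's expression (cis or trans).
    gwas_p:    dict mapping SNP id to its GWAS p-value.
    """
    lbf = 0.0
    for snp in eqtl_snps:
        p = gwas_p.get(snp)
        if p is None:
            continue  # SNP absent from the GWAS: contributes no evidence
        if p < hit_threshold:
            # eQTL SNP also associated with the disease: supports causality.
            lbf += math.log(p_hit_if_causal / p_hit_if_not)
        else:
            # eQTL SNP with no disease association: counts against the gene.
            lbf += math.log((1 - p_hit_if_causal) / (1 - p_hit_if_not))
    return lbf


# Made-up GWAS p-values for illustration only.
gwas = {"rs1": 1e-5, "rs2": 2e-4, "rs3": 0.4, "rs4": 0.9}
print(toy_gene_log_bayes_factor(["rs1", "rs2", "rs3"], gwas))
print(toy_gene_log_bayes_factor(["rs3", "rs4"], gwas))
```

Under these assumed values, a gene whose eQTL SNPs repeatedly coincide with GWAS signals accumulates a positive score, while a gene sharing only a single chance overlap is pulled down by its non-associated eQTL SNPs. That penalty is the intuition behind separating causality from coincidence.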

The degree of overlap in GWAS and eQTL SNPs can indicate a causal gene -> disease relationship.




Hao Li Laboratory at The University of California, San Francisco