While GWAS have successfully identified genomic regions associated with human traits and diseases, the biological impact of these risk variants is unclear. Here we outline a protocol to computationally predict putative target genes of GWAS risk variants using chromatin interaction profiles. Identifying risk genes is often a first step toward understanding disease mechanisms and can allow for novel therapeutic approaches.
We hope that the results of this work could eventually lead to strategies to diagnose and treat Alzheimer's disease. The main advantage of this technique is that by using 3D chromatin contact frequencies we can identify the genes affected by Alzheimer's disease risk variants even if they are thousands or even millions of base pairs away. When attempting this protocol, familiarity with R and a Unix system is critical because the user is expected to conduct the entire protocol with these systems.
To perform this computational protocol, refer to the code in the text manuscript or onscreen. Begin by setting up R and generating a GRanges object for the credible single nucleotide polymorphisms, or SNPs. For positional mapping, set up R, then load the promoter and exonic regions and generate a GRanges object.
Overlap the credible SNPs with the exonic regions and with the promoter regions. To link SNPs to their putative target genes using chromatin interactions, load the Hi-C dataset and generate a GRanges object. Overlap the credible SNPs with the Hi-C GRanges object.
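The overlap steps described above can be sketched outside of R. The following is a minimal illustration in plain Python, not the protocol's own code, which uses GRanges objects in R. The `Interval` type, the `find_overlaps` helper, and every coordinate and gene name are hypothetical and chosen only to show the idea: a SNP is assigned to a gene when it falls inside that gene's promoter or exon, or inside a Hi-C anchor that interacts with the gene.

```python
from collections import namedtuple

# A genomic interval with closed, 1-based coordinates (hypothetical helper type).
Interval = namedtuple("Interval", ["chrom", "start", "end", "name"])


def overlaps(a, b):
    """True when two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def find_overlaps(queries, subjects):
    """Return (query name, subject name) for every overlapping pair."""
    return [(q.name, s.name) for q in queries for s in subjects if overlaps(q, s)]


# Toy data, invented for illustration only.
snps = [
    Interval("chr1", 1000, 1000, "snpA"),
    Interval("chr1", 50000, 50000, "snpB"),
    Interval("chr2", 300, 300, "snpC"),
]
promoters = [Interval("chr1", 900, 1100, "GENE1")]
exons = [Interval("chr2", 250, 350, "GENE2")]
# Each Hi-C anchor is labeled with the gene found in its interacting region.
hic_anchors = [Interval("chr1", 45000, 55000, "GENE3")]

# Positional mapping: SNPs inside promoters or exons.
positional = find_overlaps(snps, promoters) + find_overlaps(snps, exons)
# Chromatin interaction mapping: SNPs inside Hi-C anchors.
hic_links = find_overlaps(snps, hic_anchors)
# Candidate genes are the union of both mappings.
candidate_genes = sorted({gene for _, gene in positional + hic_links})

print(positional)
print(hic_links)
print(candidate_genes)
```

The nested loop is quadratic and is only suitable for toy inputs; interval-tree based tools such as GRanges overlap queries in R are the practical choice for genome-scale data.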
Then compile the AD candidate genes defined by positional mapping and chromatin interaction profiles. Next, explore developmental trajectories. Set up R and process the expression metadata.
Specify developmental stages and select cortical regions. Extract the developmental expression profiles of AD risk genes and compare prenatal versus postnatal expression levels. Investigate cell-type expression profiles by setting up R and extracting the cellular expression profiles of AD risk genes.
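The prenatal versus postnatal comparison mentioned above amounts to grouping samples by developmental stage and comparing the average expression of the risk genes in each group. The following is a minimal sketch in plain Python under stated assumptions: the sample values are invented, the `group_means` helper is hypothetical, and a real analysis would add a statistical test, which this sketch omits.

```python
from statistics import mean

# Invented samples: (developmental stage, centered expression of the risk genes).
samples = [
    ("prenatal", -0.4),
    ("prenatal", -0.2),
    ("prenatal", -0.3),
    ("postnatal", 0.1),
    ("postnatal", 0.3),
    ("postnatal", 0.2),
]


def group_means(records):
    """Average the expression values within each developmental stage."""
    groups = {}
    for stage, value in records:
        groups.setdefault(stage, []).append(value)
    return {stage: mean(values) for stage, values in groups.items()}


means = group_means(samples)
# A positive difference indicates higher postnatal expression.
difference = means["postnatal"] - means["prenatal"]

print(means)
print(difference)
```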
Finally, perform gene annotation enrichment analysis of AD risk genes. Download and configure HOMER. Then run HOMER and plot the enriched terms with RStudio.
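Annotation enrichment tools commonly ask whether a gene list contains more genes carrying a given term than expected by chance, and a hypergeometric test is one standard way to score this. The sketch below illustrates that general idea only; it is not HOMER's implementation, and the function name, parameters, and numbers are assumptions made for the example.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes when `sample`
    genes are drawn without replacement from `population` genes, of which
    `annotated` carry the term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Invented example: 20 background genes, 5 carry the term,
# and 2 of the 4 genes in the list of interest carry it.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=2)
print(p_value)
```

A smaller value means the overlap is less likely to arise by chance; in practice the values are also corrected for testing many terms at once.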
A set of 800 credible SNPs was investigated using this process. Positional mapping revealed that 103 SNPs overlapped with promoters and 42 SNPs overlapped with exons, while 84% of the SNPs remained unannotated. Using Hi-C datasets in the adult brain, an additional 208 SNPs were linked to 64 genes based on physical proximity.
In total, 284 AD credible SNPs were mapped to 112 AD risk genes. AD risk genes were associated with amyloid precursor proteins, amyloid beta formation, and immune response, which reflects the known biology of the disease. Developmental expression profiles of AD risk genes showed marked postnatal enrichment, indicative of the age-associated elevated risk of the disease.
Finally, the genes were highly expressed in microglia, the primary immune cells in the brain, which supports the recurrent findings that AD has a strong immune basis. Here we use Hi-C data from brain tissue to analyze the biological impact of Alzheimer's disease risk variants. However, to apply this method to another GWAS study, the availability of Hi-C data from the relevant tissue is critical.
These results can be further studied and validated using CRISPR-based technologies, enhancer reporter assays, or by intersecting with other functional genomic datasets such as eQTLs. Here we identify dozens of Alzheimer's disease risk genes, and we expect that the identification of these genes can help us understand their previously unknown role in Alzheimer's disease.