Method Article
* These authors contributed equally.
We present a strategic plan and protocol for identifying non-coding genetic variants affecting transcription factor (TF) DNA binding. A detailed experimental protocol is provided for electrophoretic mobility shift assay (EMSA) and DNA affinity precipitation assay (DAPA) analysis of genotype-dependent TF DNA binding.
Population and family-based genetic studies typically result in the identification of genetic variants that are statistically associated with a clinical disease or phenotype. For many diseases and traits, most variants are non-coding, and are thus likely to act by impacting subtle, comparatively hard to predict mechanisms controlling gene expression. Here, we describe a general strategic approach to prioritize non-coding variants, and screen them for their function. This approach involves computational prioritization using functional genomic databases followed by experimental analysis of differential binding of transcription factors (TFs) to risk and non-risk alleles. For both electrophoretic mobility shift assay (EMSA) and DNA affinity precipitation assay (DAPA) analysis of genetic variants, a synthetic DNA oligonucleotide (oligo) is used to identify factors in the nuclear lysate of disease or phenotype-relevant cells. For EMSA, the oligonucleotides with or without bound nuclear factors (often TFs) are analyzed by non-denaturing electrophoresis on a tris-borate-EDTA (TBE) polyacrylamide gel. For DAPA, the oligonucleotides are bound to a magnetic column and the nuclear factors that specifically bind the DNA sequence are eluted and analyzed through mass spectrometry or with a reducing sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) followed by Western blot analysis. This general approach can be widely used to study the function of non-coding genetic variants associated with any disease, trait, or phenotype.
Sequencing and genotyping-based studies, including Genome-Wide Association Studies (GWAS), candidate locus studies, and deep-sequencing studies, have identified many genetic variants that are statistically associated with a disease, trait, or phenotype. Contrary to early predictions, most of these variants (85-93%) are located in non-coding regions and do not change the amino acid sequence of proteins1,2. Interpreting the function of these non-coding variants and determining the biological mechanisms connecting them to the associated disease, trait, or phenotype has proven challenging3-6. We have developed a general strategy to identify the molecular mechanisms that link variants to an important intermediate phenotype: gene expression. This pipeline is specifically designed to identify modulation of TF binding by genetic variants. The strategy combines computational approaches and molecular biology techniques to predict the biological effects of candidate variants in silico and to verify these predictions empirically (Figure 1).
Figure 1: Strategic approach for the analysis of non-coding genetic variants. Steps that are not included in the detailed protocol associated with this manuscript are shaded in grey.
In many cases, it is important to begin by expanding the list of variants to include all those in high linkage disequilibrium (LD) with each statistically associated variant. LD is a measure of non-random association of alleles at two different chromosomal positions, and can be quantified by the r² statistic7, where r² = 1 denotes perfect linkage between two variants. Alleles in high LD are found to co-segregate on the chromosome across ancestral populations. Current genotyping arrays do not include all known variants in the human genome. Instead, they exploit the LD within the human genome and include a subset of the known variants that act as proxies for other variants within a particular region of LD8. Thus, a variant without any biological consequence may be associated with a particular disease because it is in LD with the causal variant, that is, the variant with a meaningful biological effect. Procedurally, it is recommended to convert the latest release of the 1000 Genomes Project9 variant call format (VCF) files into binary files compatible with PLINK10,11, an open-source tool for whole genome association analysis. Subsequently, all other genetic variants with LD r² > 0.8 with each input genetic variant can be identified as candidates. It is important to use the appropriate reference population for this step; e.g., if a variant was identified in subjects of European ancestry, data from subjects of similar ancestry should be used for LD expansion.
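To make the r² statistic concrete, the following minimal Python sketch computes it from phased haplotypes using the standard definition r² = D² / (pA(1 - pA) pB(1 - pB)), with D = pAB - pA·pB. It is an illustration only and not part of the PLINK-based workflow described above; the function name `ld_r2` and the two toy haplotype panels are assumptions made for this example.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic variants A and B.

    haplotypes: list of (a, b) tuples, one per phased chromosome, where
    a and b are 0 (reference allele) or 1 (alternate allele).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # alternate allele frequency at A
    p_b = sum(b for _, b in haplotypes) / n          # alternate allele frequency at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either variant is monomorphic")
    return d * d / denom


# Hypothetical toy panels of 10 chromosomes each (not real genotype data).
perfect = [(1, 1)] * 4 + [(0, 0)] * 6                # alleles always co-occur
partial = [(1, 1)] * 3 + [(1, 0)] * 1 + [(0, 0)] * 6  # one recombinant haplotype

print(round(ld_r2(perfect), 3))  # 1.0, perfect LD
print(round(ld_r2(partial), 3))  # 0.643, below the 0.8 threshold
```

In the toy data, a single discordant haplotype out of ten is enough to drop the pair below the r² > 0.8 cutoff, which illustrates how sensitive the candidate list can be to the reference panel used.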
LD expansion often results in dozens of candidate variants, and it is likely that only a small fraction of these contribute to disease mechanism. Often, it is infeasible to experimentally examine each of these variants individually. It is therefore useful to leverage the thousands of publicly available functional genomic datasets as a filter to prioritize the variants. For example, the ENCODE consortium12 has performed thousands of ChIP-seq experiments describing the binding of TFs and co-factors, and histone marks in a wide range of contexts, along with chromatin accessibility data from technologies such as DNase-seq13, ATAC-seq14, and FAIRE-seq15. Databases and web servers such as the UCSC Genome Browser16, Roadmap Epigenomics17, Blueprint Epigenome18, Cistrome19, and ReMap20 provide free access to data produced by these and other experimental techniques across a wide range of cell types and conditions. When there are too many variants to examine experimentally, these data can be used to prioritize those located within likely regulatory regions in relevant cell and tissue types. Further, in cases where a variant is within a ChIP-seq peak for a specific protein, these data can provide potential leads as to the specific TF(s) or co-factors whose binding might be affected.
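The prioritization step amounts to asking which candidate variants fall inside annotated regulatory intervals. A minimal Python sketch of that overlap test is given here, assuming peaks are supplied as BED-style intervals (0-based start, exclusive end) and variants as 1-based positions. The function name, variant names, coordinates, and peak labels are all hypothetical and do not come from any of the databases listed above.

```python
def variants_in_peaks(variants, peaks):
    """Map each variant name to the labels of the peaks that contain it.

    variants: list of (name, chrom, pos) with 1-based positions.
    peaks:    list of (chrom, start, end, label), BED-style 0-based half-open.
    Variants that overlap no peak are omitted from the result.
    """
    hits = {}
    for name, chrom, pos in variants:
        # A 1-based position p lies in a BED interval when start < p <= end.
        labels = sorted({label for p_chrom, start, end, label in peaks
                         if p_chrom == chrom and start < pos <= end})
        if labels:
            hits[name] = labels
    return hits


# Hypothetical example data, for illustration only.
variants = [("var1", "chr1", 1050), ("var2", "chr1", 5000), ("var3", "chr2", 1050)]
peaks = [
    ("chr1", 1000, 1100, "TF_A ChIP-seq"),
    ("chr1", 1040, 1060, "DNase-seq"),
    ("chr2", 2000, 2100, "TF_B ChIP-seq"),
]

print(variants_in_peaks(variants, peaks))
# {'var1': ['DNase-seq', 'TF_A ChIP-seq']}
```

For genome-scale data, an interval tree or a dedicated tool would be more appropriate than this quadratic scan; the sketch is only meant to show the coordinate logic.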
Next, the resulting prioritized variants are screened experimentally to validate predicted genotype-dependent protein binding using EMSA21,22. EMSA measures the change in the migration of the oligo on a non-reducing TBE gel. Fluorescently labeled oligo is incubated with the nuclear lysate, and binding of nuclear factors will retard the movement of the oligo on the gel. In this manner, oligo that has bound more nuclear factors will present as a stronger fluorescent signal upon scanning. Notably, EMSA does not require predictions about the specific proteins whose binding will be affected.
Once variants are identified that are located within predicted regulatory regions and are capable of differentially binding nuclear factors, computational methods are employed to predict the specific TF(s) whose binding they might affect. We prefer to use CIS-BP23,24, RegulomeDB25, UniProbe26, and JASPAR27. Once candidate TFs are identified, these predictions can be specifically tested using antibodies against these TFs (EMSA-supershifts and DAPA-Westerns). An EMSA-supershift involves the addition of a TF-specific antibody to the nuclear lysate and oligo. A positive result in an EMSA-supershift is represented as a further shift in the EMSA band, or a loss of the band (reviewed in reference28). In the complementary DAPA, a 5'-biotinylated oligo duplex containing the variant and the 20 base-pair flanking nucleotides is incubated with nuclear lysate from relevant cell type(s) to capture any nuclear factors specifically binding the oligos. The oligo duplex-nuclear factor complex is immobilized by streptavidin microbeads in a magnetic column. The bound nuclear factors are collected directly through elution29,48. Binding predictions can then be assessed by a Western blot using antibodies specific for the protein. In cases where there are no obvious predictions, or too many predictions, the elutions from variant pull-downs of the DAPA experiments can be sent to a proteomics core to identify candidate TFs using mass spectrometry; these candidates can subsequently be validated using the methods described above.
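One common way such motif-based predictions are made is to score both alleles against a position weight matrix and compare the best log-odds score among the windows that overlap the variant. The Python sketch below illustrates that idea under strong assumptions: `TOY_MOTIF` is an invented 4-bp matrix, not a motif from CIS-BP, JASPAR, or any other database; only the forward strand is scanned; and a uniform background of 0.25 per base is assumed. The two sequences are the forward-strand oligos from Table 1.

```python
import math

BACKGROUND = 0.25  # assumed uniform base frequency

# Invented 4-bp motif favouring "GGGT"; purely illustrative, not a real TF motif.
TOY_MOTIF = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]


def window_score(window, motif):
    """Sum of per-position log2 odds of the motif versus background."""
    return sum(math.log2(col[base] / BACKGROUND) for base, col in zip(window, motif))


def best_score_over_variant(seq, variant_index, motif):
    """Best forward-strand score among windows covering the variant (0-based index)."""
    width = len(motif)
    first = max(0, variant_index - width + 1)
    last = min(variant_index, len(seq) - width)
    return max(window_score(seq[s:s + width], motif) for s in range(first, last + 1))


ref = "GTAATGCCTTAATGAGAGAGAGTTAGTCATCTTCTCACTTC"     # rs76562819_REF_FOR
nonref = "GTAATGCCTTAATGAGAGAGGGTTAGTCATCTTCTCACTTC"  # rs76562819_NONREF_FOR
snp_index = 20  # 0-based index of the central nucleotide

ref_score = best_score_over_variant(ref, snp_index, TOY_MOTIF)
nonref_score = best_score_over_variant(nonref, snp_index, TOY_MOTIF)
print(f"REF {ref_score:.2f}  NONREF {nonref_score:.2f}")
```

With this invented matrix the non-reference allele scores higher because it completes a GGGT match; a real analysis would scan both strands with curated motifs and treat any score difference as a hypothesis to test by EMSA-supershift or DAPA-Western.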
In the remainder of the article, the detailed protocol for EMSA and DAPA analysis of genetic variants is provided.
1. Preparation of Solutions and Reagents
Name | Sequence |
rs76562819_REF_FOR | GTAATGCCTTAATGAGAGAGAGTTAGTCATCTTCTCACTTC |
rs76562819_REF_REV | GAAGTGAGAAGATGACTAACTCTCTCTCATTAAGGCATTAC |
rs76562819_NONREF_FOR | GTAATGCCTTAATGAGAGAGGGTTAGTCATCTTCTCACTTC |
rs76562819_NONREF_REV | GAAGTGAGAAGATGACTAACCCTCTCTCATTAAGGCATTAC |
Table 1: Example EMSA/DAPA oligonucleotide design to test a SNP for differential binding. "REF" stands for the reference allele, while "NONREF" stands for the non-reference allele. "FOR" stands for the forward strand, while "REV" indicates its reverse complement. The SNP is the central nucleotide (position 21 of 41) of each oligo.
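Before ordering oligos, it can be worth checking a design like the one in Table 1 programmatically. The short Python sketch below confirms that each "REV" oligo is the reverse complement of its "FOR" partner and that the two alleles differ only at the central position. The sequences are copied from Table 1; the helper function names are assumptions made for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_positions(a, b):
    """Return 1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


oligos = {
    "rs76562819_REF_FOR": "GTAATGCCTTAATGAGAGAGAGTTAGTCATCTTCTCACTTC",
    "rs76562819_REF_REV": "GAAGTGAGAAGATGACTAACTCTCTCTCATTAAGGCATTAC",
    "rs76562819_NONREF_FOR": "GTAATGCCTTAATGAGAGAGGGTTAGTCATCTTCTCACTTC",
    "rs76562819_NONREF_REV": "GAAGTGAGAAGATGACTAACCCTCTCTCATTAAGGCATTAC",
}

for allele in ("REF", "NONREF"):
    fwd = oligos[f"rs76562819_{allele}_FOR"]
    rev = oligos[f"rs76562819_{allele}_REV"]
    print(allele, "strands anneal:", reverse_complement(fwd) == rev)

diff = mismatch_positions(oligos["rs76562819_REF_FOR"], oligos["rs76562819_NONREF_FOR"])
print("alleles differ at position(s):", diff)  # [21]
```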
2. Preparation of Nuclear Lysate from Cultured Cells
Note: This experimental protocol was optimized using B-lymphoblastoid cell lines, but it has also been tested successfully in several unrelated adherent and suspension cell lines.
3. Electrophoretic Mobility Shift Assay (EMSA)
Reagent | Final Conc. | Rxn #1 | Rxn #2 | Rxn #3 | Rxn #4 |
Ultrapure Water | to 20 µl vol. | 13.5 µl | 11.98 µl | 13.5 µl | 11.98 µl |
10x Binding Buffer | 1x | 2 µl | 2 µl | 2 µl | 2 µl |
DTT/TW-20 | 1x | 2 µl | 2 µl | 2 µl | 2 µl |
Salmon Sperm DNA | 500 ng/µl | 0.5 µl | 0.5 µl | 0.5 µl | 0.5 µl |
Poly d(I-C) (1 µg/µl) | 1 µg | 1 µl | 1 µl | 1 µl | 1 µl |
Nuclear Extract (5.26 µg/µl) | 8 µg | - | 1.52 µl | - | 1.52 µl |
NE Buffer | | 1.52 µl | - | 1.52 µl | - |
Reference allele oligo | 50 fmol | 1 µl | 1 µl | - | - |
Non-Reference allele oligo | 50 fmol | - | - | 1 µl | 1 µl |
Table 2: Example EMSA reaction setup. The table illustrates an example EMSA to test the hypothesis that there is genotype-dependent binding of TFs to a specific SNP.
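The nuclear extract and water volumes in Table 2 depend on the measured lysate concentration, so they need to be recalculated for each new lysate preparation. The Python sketch below reproduces the arithmetic for Rxn #2 only, assuming a 20 µl final volume, 8 µg of protein per reaction, and the 5.26 µg/µl example concentration. The function names are hypothetical, and volumes are rounded to 0.01 µl for display.

```python
def extract_volume_ul(target_ug, conc_ug_per_ul):
    """Volume of nuclear extract (µl) that delivers the target protein mass."""
    if conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return round(target_ug / conc_ug_per_ul, 2)


def water_volume_ul(total_ul, component_volumes_ul):
    """Water (µl) needed to bring the listed components to the total volume."""
    used = sum(component_volumes_ul.values())
    if used > total_ul:
        raise ValueError("components exceed the total reaction volume")
    return round(total_ul - used, 2)


# Rxn #2 from Table 2: reference allele oligo plus nuclear extract.
extract = extract_volume_ul(target_ug=8, conc_ug_per_ul=5.26)
rxn2 = {
    "10x Binding Buffer": 2.0,
    "DTT/TW-20": 2.0,
    "Salmon Sperm DNA": 0.5,
    "Poly d(I-C)": 1.0,
    "Nuclear Extract": extract,
    "Reference allele oligo": 1.0,
}
water = water_volume_ul(20.0, rxn2)

print(f"Nuclear extract: {extract} µl")  # 1.52 µl
print(f"Ultrapure water: {water} µl")    # 11.98 µl
```

A more dilute lysate increases the extract volume and reduces the water volume accordingly; if the components alone exceed 20 µl, the sketch raises an error rather than returning a negative volume.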
4. DNA Affinity Precipitation Assay (DAPA)
This section provides representative results of what to expect when performing an EMSA or DAPA, and characterizes the variability associated with lysate quality. For example, it has been suggested that freezing and thawing protein samples multiple times may result in denaturation. In order to explore the reproducibility of EMSA analysis in the context of these "freeze-thaw" cycles, two 35 bp oligos differing at one genetic variant were incubated with a single bat...
Although advances in sequencing and genotyping technologies have greatly enhanced our capacity to identify genetic variants associated with disease, our ability to understand the functional mechanisms impacted by these variants is lagging. A major source of the problem is that many disease-associated variants are located in non-coding regions of the genome, which likely affect harder-to-predict mechanisms controlling gene expression. Here, we present a protocol based on the EMSA and DAPA techniques, valuable molecular t...
The authors have nothing to disclose.
We thank Erin Zoller, Jessica Bene, and Lindsey Hays for input and direction in protocol development. MTW was supported in part by NIH R21 HG008186 and a Trustee Award grant from the Cincinnati Children's Hospital Research Foundation. ZHP was supported in part by T32 GM063483-13.
Name | Company | Catalog Number | Comments |
Custom DNA Oligonucleotides | Integrated DNA Technologies | http://www.idtdna.com/site/order/oligoentry | |
Potassium Chloride | Fisher Scientific | BP366-500 | KCl, for CE buffer |
HEPES (1 M) | Fisher Scientific | 15630-080 | For CE and NE buffer |
EDTA (0.5M), pH 8.0 | Life Technologies | R1021 | For CE, NE, and annealing buffer |
Sodium Chloride | Fisher Scientific | BP358-1 | NaCl, for NE buffer |
Tris-HCl (1M), pH 8.0 | Invitrogen | BP1756-100 | For annealing buffer |
Phosphate Buffered Saline (1x) | Fisher Scientific | MT21040CM | PBS, for cell wash |
DL-Dithiothreitol solution (1 M) | Sigma | 646563 | Reducing agent |
Protease Inhibitor Cocktail | Thermo Scientific | 87786 | Prevents catabolism of TFs |
Phosphatase Inhibitor Cocktail | Thermo Scientific | 78420 | Prevents dephosphorylation of TFs |
Nonidet P-40 Substitute | IBI Scientific | IB01140 | NP-40, for nuclear extraction |
BCA Protein Assay Kit | Thermo Scientific | 23225 | For measuring protein concentration |
Odyssey EMSA Buffer Kit | Licor | 829-07910 | Contains all necessary EMSA buffers |
TBE Gels, 6%, 12 Wells | Invitrogen | EC6265BOX | For EMSA |
TBE Buffer (10x) | Thermo Scientific | B52 | For EMSA |
FactorFinder Starting Kit | Miltenyi Biotec | 130-092-318 | Contains all necessary DAPA buffers |
Licor Odyssey CLx | Licor | | Recommended scanner for DAPA/EMSA |
Antibiotic-Antimycotic | Gibco | 15240-062 | Contains 10,000 units/ml of penicillin, 10,000 µg/ml of streptomycin, and 25 µg/ml of Fungizone® Antimycotic |
Fetal Bovine Serum | Gibco | 26140-079 | FBS, for culture media |
RPMI 1640 Medium | Gibco | 22400-071 | Contains L-glutamine and 25 mM HEPES |
Copyright © 2025 MyJoVE Corporation. All rights reserved.