Summary

This protocol guides bioinformatics beginners through an introductory CUT&RUN analysis pipeline that enables users to complete an initial analysis and validation of CUT&RUN sequencing data. Completing the analysis steps described here, combined with downstream peak annotation, will allow users to draw mechanistic insights into chromatin regulation.

Abstract

The CUT&RUN technique facilitates detection of protein-DNA interactions across the genome. Typical applications of CUT&RUN include profiling changes in histone tail modifications or mapping transcription factor chromatin occupancy. Widespread adoption of CUT&RUN is driven, in part, by technical advantages over conventional ChIP-seq, including lower cell input requirements, lower sequencing depth requirements, and increased sensitivity with reduced background signal owing to the absence of cross-linking agents that otherwise mask antibody epitopes. Adoption has also been accelerated by the generous sharing of reagents by the Henikoff lab and by the development of commercial kits for beginners. As technical adoption of CUT&RUN increases, sequencing analysis and validation become critical bottlenecks that predominantly wet lab teams must overcome to adopt the technique fully. CUT&RUN analysis typically begins with quality control checks on raw sequencing reads to assess sequencing depth, read quality, and potential biases. Reads are then aligned to a reference genome assembly, and several bioinformatics tools are subsequently used to annotate genomic regions of protein enrichment, confirm data interpretability, and draw biological conclusions. Although multiple in silico analysis pipelines have been developed to support CUT&RUN data analysis, their complex multi-module structure and reliance on several programming languages make them difficult for bioinformatics beginners who wish to understand the CUT&RUN analysis procedure and customize their own pipelines. Here, we provide a single-language, step-by-step CUT&RUN analysis pipeline protocol designed for users with any level of bioinformatics experience.
This protocol includes critical quality checks to validate that the sequencing data are suitable for biological interpretation. We expect that following this introductory protocol, combined with downstream peak annotation, will allow users to draw biological insights from their own CUT&RUN datasets.
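The quality control checks described above (sequencing depth, read quality, and base-composition bias) are normally read from FastQC reports. As a minimal illustration of what such a report summarizes, and not as part of the Easy-Shells_CUTnRUN pipeline, the Python sketch below computes the read count, mean read length, mean base quality, and GC fraction from a FASTQ stream. It assumes uncompressed four-line records with Phred+33 quality encoding; the function names are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for each four-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"Sequence and quality lengths differ in {header!r}")
        yield header[1:], seq, qual


def summarize_fastq(handle: TextIO, phred_offset: int = 33) -> dict:
    """Return read count, mean read length, mean base quality and GC fraction."""
    n_reads = n_bases = qual_sum = gc_bases = 0
    for _, seq, qual in read_fastq(handle):
        n_reads += 1
        n_bases += len(seq)
        # Phred+33 encoding: quality score = ASCII code minus 33.
        qual_sum += sum(ord(c) - phred_offset for c in qual)
        gc_bases += sum(1 for base in seq.upper() if base in "GC")
    return {
        "reads": n_reads,
        "mean_length": n_bases / n_reads if n_reads else 0.0,
        "mean_quality": qual_sum / n_bases if n_bases else 0.0,
        "gc_fraction": gc_bases / n_bases if n_bases else 0.0,
    }


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n####\n")
    print(summarize_fastq(demo))
    # {'reads': 2, 'mean_length': 4.0, 'mean_quality': 21.0, 'gc_fraction': 0.75}
```

For gzip-compressed FASTQ files, `gzip.open(path, "rt")` returns a text handle that can be passed to `summarize_fastq` unchanged. The read count reflects sequencing depth directly, while a low mean quality or an unexpected GC fraction would prompt a closer look at the full FastQC report.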

Introduction

The ability to measure interactions between proteins and genomic DNA is fundamental to understanding the biology of chromatin regulation. Effective assays that measure chromatin occupancy for a given protein provide at least two key pieces of information: i) genomic localization and ii) protein abundance at a given genomic region. Tracking changes in the recruitment and localization of a protein of interest in chromatin can identify its direct target loci and reveal its mechanistic roles in chromatin-based biological processes such as transcriptional regulation, DNA repair, or DNA replication. The techniques available today to profile protein-DNA in....

Protocol

NOTE: Information on the CUT&RUN FASTQ files in GSE126612 is provided in Table 1. Information on the software applications used in this study is listed in the Table of Materials.
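The FASTQ files described in Table 1 belong to GEO series GSE126612, and the corresponding sequencing runs can be retrieved with the SRA Toolkit listed in the Table of Materials. The Python sketch below only assembles, and optionally runs, the `prefetch` and `fasterq-dump` commands for a list of run accessions; it is an illustration under stated assumptions, not the download step of the published pipeline. The accession `SRR0000000` is a hypothetical placeholder, and the sketch assumes the SRA Toolkit is on the PATH and that both commands run from the same working directory so that `fasterq-dump` can use the data fetched by `prefetch`.

```python
import shutil
import subprocess
from typing import List

RUN_PREFIXES = ("SRR", "ERR", "DRR")  # SRA / ENA / DDBJ run accession prefixes


def sra_download_commands(accession: str, outdir: str = "fastq") -> List[List[str]]:
    """Return the SRA Toolkit commands that fetch one run and write FASTQ files."""
    if not accession.startswith(RUN_PREFIXES):
        raise ValueError(
            f"{accession!r} is not a run accession; GEO series IDs (GSE...) "
            "must first be mapped to their SRR runs"
        )
    return [
        ["prefetch", accession],
        # --split-files writes separate _1/_2 FASTQ files for paired-end runs
        ["fasterq-dump", "--split-files", "--outdir", outdir, accession],
    ]


def download_runs(accessions: List[str], outdir: str = "fastq",
                  dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run is True, execute) the commands for each run."""
    commands = [cmd for acc in accessions for cmd in sra_download_commands(acc, outdir)]
    if not dry_run:
        for tool in ("prefetch", "fasterq-dump"):
            if shutil.which(tool) is None:
                raise RuntimeError(f"{tool} not found on PATH; install the SRA Toolkit first")
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing command
    return commands


if __name__ == "__main__":
    # Hypothetical placeholder; substitute the runs for the files in Table 1.
    for cmd in download_runs(["SRR0000000"]):
        print(" ".join(cmd))
```

Leaving `dry_run=True` prints the commands for review before anything is downloaded, which makes it easy to check the accessions against Table 1 first.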

1. Downloading the Easy-Shells_CUTnRUN pipeline from its GitHub page

  1. Open a terminal from the operating system.
    NOTE: If users are unsure how to open a terminal in macOS or Windows, review this webpage (https://discovery.cs.illinois.edu/guides/Syst.......

Representative Results

Quality and adapter trimming retains reads with high sequencing quality
High-throughput sequencing techniques are prone to generating sequencing errors such as sequence 'mutations' in reads. Furthermore, sequencing adapter dimers can be enriched in sequencing datasets due to poor adapter removal during library preparation. Excessive sequencing errors, such as read mutations, generation of reads shorter than required for proper mapping, and enrichment of adapter dimers, can increase read map.......
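The trimming logic behind this step can be sketched in a few lines. Trim_Galore, which wraps Cutadapt, by default trims 3' bases below Phred 20, removes the Illumina adapter sequence `AGATCGGAAGAGC` (including partial matches as short as 1 bp at the 3' end), and discards reads shorter than 20 bp. The Python sketch below mimics that order of operations with Cutadapt's BWA-style quality-trimming algorithm, but it is a simplified illustration, not a replacement for either tool: it uses exact adapter matching only, whereas Cutadapt tolerates mismatches (default maximum error rate 0.1), and it handles single reads rather than read pairs. The function names are hypothetical.

```python
from typing import Optional, Tuple

# First 13 bp of the Illumina adapter that Trim Galore searches for by default.
ILLUMINA_ADAPTER = "AGATCGGAAGAGC"


def quality_trim_index(qual: str, cutoff: int = 20, phred_offset: int = 33) -> int:
    """BWA-style 3' quality trimming: return how many 5' bases to keep."""
    running, best, keep = 0, 0, len(qual)
    for i in range(len(qual) - 1, -1, -1):
        running += cutoff - (ord(qual[i]) - phred_offset)
        if running < 0:
            break  # bases further 5' are good enough; stop scanning
        if running > best:
            best, keep = running, i
    return keep


def adapter_trim_index(seq: str, adapter: str = ILLUMINA_ADAPTER, min_overlap: int = 1) -> int:
    """Return where an exact adapter match starts, or len(seq) if there is none."""
    pos = seq.find(adapter)
    if pos != -1:
        return pos  # full adapter inside the read
    # Otherwise look for the longest adapter prefix sitting at the very 3' end.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)


def trim_read(seq: str, qual: str, cutoff: int = 20, min_length: int = 20,
              adapter: str = ILLUMINA_ADAPTER) -> Optional[Tuple[str, str]]:
    """Quality-trim, then adapter-trim; return None if the read becomes too short."""
    keep = quality_trim_index(qual, cutoff)
    seq, qual = seq[:keep], qual[:keep]
    keep = adapter_trim_index(seq, adapter)
    seq, qual = seq[:keep], qual[:keep]
    if len(seq) < min_length:
        return None  # e.g. adapter dimers collapse to (almost) nothing
    return seq, qual
```

Adapter dimers show why the length filter matters: once the adapter is removed, nothing informative remains, so the read is dropped instead of being passed to the aligner.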

Discussion

The ability to map protein occupancy on chromatin is fundamental to conducting mechanistic studies in chromatin biology. As laboratories adopt new wet lab techniques to profile chromatin, analyzing the sequencing data from those experiments becomes a common bottleneck for wet lab scientists. Therefore, we describe an introductory step-by-step protocol that enables bioinformatics beginners to overcome this bottleneck and to initiate analysis and quality control checks of their own CUT&.......

Acknowledgements

All illustrated figures were created with BioRender.com. CAI acknowledges support provided through an Ovarian Cancer Research Alliance Early Career Investigator Award, a Forbeck Foundation Accelerator Grant, and the Minnesota Ovarian Cancer Alliance National Early Detection Research Award.

....

Materials

Name | Company | Catalog Number | Comments
bedGraphToBigWig | ENCODE | https://hgdownload.soe.ucsc.edu/admin/exe/ | Software to compress and convert read-count bedGraph files to bigWig
bedtools-2.31.1 | The Quinlan Lab @ the U. of Utah | https://bedtools.readthedocs.io/en/latest/index.html | Software to process BAM/BED/bedGraph files
bowtie2 2.5.4 | Johns Hopkins University | https://bowtie-bio.sourceforge.net/bowtie2/index.shtml | Software to build Bowtie 2 indexes and perform alignment
CollectInsertSizeMetrics (Picard) | Broad Institute | https://github.com/broadinstitute/picard | Software to perform insert size distribution analysis
Cutadapt | NBIS | https://cutadapt.readthedocs.io/en/stable/index.html | Software to perform adapter trimming
deepTools v3.5.1 | Max Planck Institute | https://deeptools.readthedocs.io/en/develop/index.html | Software to perform Pearson correlation coefficient analysis, principal component analysis, and heatmap/average plot analysis
FastQC Version 0.12.0 | Babraham Bioinformatics | https://github.com/s-andrews/FastQC | Software to check the quality of FASTQ files
Intervene v0.6.1 | Computational Biology & Gene Regulation - Mathelier group | https://intervene.readthedocs.io/en/latest/index.html | Software to perform Venn diagram analysis using peak files
MACS v2.2.9.1 | Chan Zuckerberg Initiative | https://github.com/macs3-project/MACS/tree/macs_v2 | Software to call peaks
MACS v3.0.2 | Chan Zuckerberg Initiative | https://github.com/macs3-project/MACS/tree/master | Software to call peaks
Samtools-1.21 | Wellcome Sanger Institute | https://github.com/samtools/samtools | Software to process SAM/BAM files
SEACR v1.3 | Howard Hughes Medical Institute | https://github.com/FredHutch/SEACR | Software to call peaks
SRA Toolkit Release 3.1.1 | NCBI | https://github.com/ncbi/sra-tools | Software to download SRR files from GEO
Trim_Galore v0.6.10 | Babraham Bioinformatics | https://github.com/FelixKrueger/TrimGalore | Software to perform quality and adapter trimming
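Of the analyses listed above, the insert size distribution (Picard CollectInsertSizeMetrics) is one of the simplest to reason about, because it is derived from the template length (TLEN) column of paired-end alignments. The Python sketch below, which is an illustration and not Picard's implementation, collects absolute TLEN values from SAM text such as the output of `samtools view`, counting each properly paired primary alignment pair once, and reports the median with a binned histogram. The bin width and function names are hypothetical choices; Picard additionally separates pairs by orientation and applies its own filtering options.

```python
from collections import Counter
from statistics import median
from typing import Dict, Iterable, List

# SAM FLAG bits used below (from the SAM format specification)
PROPER_PAIR = 0x2
UNMAPPED = 0x4
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def fragment_lengths(sam_lines: Iterable[str]) -> List[int]:
    """Collect |TLEN| once per properly paired, primary read pair."""
    lengths = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\r\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])  # FLAG and TLEN columns
        if not flag & PROPER_PAIR or flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if flag & FIRST_IN_PAIR and tlen != 0:  # count each pair only once
            lengths.append(abs(tlen))
    return lengths


def length_summary(lengths: List[int], bin_width: int = 10) -> Dict[str, object]:
    """Pair count, median fragment length and a histogram with bin_width-bp bins."""
    histogram = Counter((n // bin_width) * bin_width for n in lengths)
    return {
        "pairs": len(lengths),
        "median": median(lengths) if lengths else None,
        "histogram": dict(sorted(histogram.items())),
    }
```

Because `fragment_lengths` accepts any iterable of lines, an open SAM file or `sys.stdin` piped from `samtools view aligned.bam` (a hypothetical file name) can be passed directly.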

Keywords: Genetics, cleavage under targets and release using nuclease, CUT&RUN, protein-DNA interaction, analysis, validation

This article has been published

Copyright © 2024 MyJoVE Corporation. All rights reserved