Introductory Analysis and Validation of CUT&#38;RUN Sequencing Data

Junwoo Lee; Biji Chatterjee; Nakyung Oh; Dhurjhoti Saha; Yue Lu; Blaine Bartholomew; Charles A. Ishak

doi:10.3791/67359

A subscription to JoVE is required to view this content. Sign in or start your free trial.

Summary

This protocol guides bioinformatics beginners through an introductory CUT&RUN analysis pipeline that enables users to complete an initial analysis and validation of CUT&RUN sequencing data. Completing the analysis steps described here, combined with downstream peak annotation, will allow users to draw mechanistic insights into chromatin regulation.

Abstract

The CUT&RUN technique facilitates detection of protein-DNA interactions across the genome. Typical applications of CUT&RUN include profiling changes in histone tail modifications or mapping transcription factor chromatin occupancy. Widespread adoption of CUT&RUN is driven, in part, by technical advantages over conventional ChIP-seq that include lower cell input requirements, lower sequencing depth requirements, and increased sensitivity with reduced background signal due to a lack of cross-linking agents that otherwise mask antibody epitopes. Widespread adoption of CUT&RUN has also been achieved through the generous sharing of reagents by the Henikoff lab and the development of commercial kits to accelerate adoption for beginners. As technical adoption of CUT&RUN increases, CUT&RUN sequencing analysis and validation become critical bottlenecks that must be surmounted to enable complete adoption by predominantly wet lab teams. CUT&RUN analysis typically begins with quality control checks on raw sequencing reads to assess sequencing depth, read quality, and potential biases. Reads are then aligned to a reference genome sequence assembly, and several bioinformatics tools are subsequently employed to annotate genomic regions of protein enrichment, confirm data interpretability, and draw biological conclusions. Although multiple in silico analysis pipelines have been developed to support CUT&RUN data analysis, their complex multi-module structure and usage of multiple programming languages render the platforms difficult for bioinformatics beginners who may lack familiarity with multiple programming languages but wish to understand the CUT&RUN analysis procedure and customize their analysis pipelines. Here, we provide a single-language step-by-step CUT&RUN analysis pipeline protocol designed for users with any level of bioinformatics experience. This protocol includes completing critical quality checks to validate that the sequencing data is suitable for biological interpretation. We expect that following the introductory protocol provided in this article combined with downstream peak annotation will allow users to draw biological insights from their own CUT&RUN datasets.

Introduction

The ability to measure interactions between proteins and genomic DNA is fundamental to understanding the biology of chromatin regulation. Effective assays that measure chromatin occupancy for a given protein provide at least two key pieces of information: i) genomic localization and ii) protein abundance at a given genomic region. Tracking the recruitment and localization changes of a protein of interest in chromatin can reveal direct target loci of the protein and reveal mechanistic roles of that protein in chromatin-based biological processes such as regulation of transcription, DNA repair, or DNA replication. The techniques available today to profile protein-DNA interactions are enabling researchers to explore regulation at unprecedented resolution. Such technical advances have been enabled through the introduction of new chromatin profiling techniques that include the development of Cleavage Under Targets and Release Using Nuclease (CUT&RUN) by the Henikoff laboratory. CUT&RUN offers several technical advantages over conventional chromatin immunoprecipitation (ChIP) that include lower cell input requirements, lower sequencing depth requirements, and increased sensitivity with reduced background signal due to a lack of cross-linking agents that otherwise mask antibody epitopes. Adopting this technique to study chromatin regulation requires a thorough understanding of the principle underlying the technique, and an understanding of how to analyze, validate, and interpret CUT&RUN data.

The CUT&RUN procedure begins with binding of cells to Concanavalin A conjugated to magnetic beads to enable manipulation of low cell numbers throughout the procedure. Isolated cells are permeabilized using a mild detergent to facilitate introduction of an antibody that targets the protein of interest. Micrococcal nuclease (MNase) is then recruited to the bound antibody using a Protein A or Protein A/G tag tethered to the enzyme. Calcium is introduced to initiate enzymatic activity. MNase digestion results in mono-nucleosomal DNA-protein complexes. Calcium is subsequently chelated to end the digestion reaction, and short DNA fragments from the MNase digestion are released from nuclei, then subjected to DNA purification, library preparation, and high-throughput sequencing¹ (Figure 1).

In silico approaches to map and quantify protein occupancy across the genome have developed in parallel with the wet lab approaches used to enrich those DNA-protein interactions. Identification of regions of enriched signals (peaks) is one of the most critical steps in the bioinformatics analysis. Initial ChIP-seq analysis methods used algorithms such as MACS² and SICER³, which employed statistical models to distinguish bona fide protein-DNA binding sites from background noise. However, the lower background noise and higher resolution of CUT&RUN data render some peak calling programs employed in ChIP-seq analysis unsuitable for CUT&RUN analysis⁴. This challenge highlights the need for new tools better suited for the analysis of CUT&RUN data. SEACR⁴ represents one such tool recently developed to enable peak calling from CUT&RUN data while overcoming limitations associated with tools typically employed toward ChIP-seq analysis.

Biological interpretations from CUT&RUN sequencing data are drawn from the outputs downstream of peak calling in the analysis pipeline. Several functional annotation programs can be implemented to predict the potential biological relevance of the called peaks from CUT&RUN data. For example, the Gene Ontology (GO) project provides well-established functional identification of genes of interest⁵^,⁶^,⁷. Various software tools and resources facilitate GO analysis to reveal genes and gene sets enriched amongst CUT&RUN peaks⁸^,⁹^,¹⁰^,¹¹^,¹²^,¹³^,¹⁴. Furthermore, visualization software such as Deeptools¹⁵, Integrative genomics viewer (IGV)¹⁶, and UCSC Genome Browser¹⁷ enable visualization of signal distribution and patterns at regions of interest across the genome.

The ability to draw biological interpretations from CUT&RUN data depends critically upon validation of data quality. Critical components to validate include the assessment of: i) CUT&RUN library sequencing quality, ii) replicate similarity, and iii) signal distribution at peak centers. Completing the validation of all three components is crucial to ensure the reliability of CUT&RUN library samples and downstream analysis results. Therefore, it is essential to establish introductory CUT&RUN analysis guides to enable bioinformatics beginners and wet lab researchers to conduct such validation steps as part of their standard CUT&RUN analysis pipelines.

Alongside the development of wet lab CUT&RUN experiment, various in silico CUT&RUN analysis pipelines, such as CUT&RUNTools 2.0¹⁸^,¹⁹, nf-core/cutandrun²⁰, and CnRAP²¹, have been developed to support CUT&RUN data analysis. These tools provide powerful approaches to analyzing single-cell and bulk CUT&RUN and CUT&Tag datasets. However, the relatively complex modular program structure and required familiarity with multiple programming languages to conduct these analysis pipelines may hinder adoption by bioinformatics beginners who seek to thoroughly understand the CUT&RUN analysis steps and customize their own pipelines. Circumvention of this barrier requires a new introductory CUT&RUN analysis pipeline that is provided in simple step-by-step scripts encoded using a simple single programming language.

In this article, we describe a simple single-language CUT&RUN analysis pipeline protocol that provides step-by-step scripts supported with detailed descriptions to enable new and novice users to conduct CUT&RUN sequencing analysis. Programs used in this pipeline are publicly available by the original developer groups. Major steps described in this protocol include read alignment, peak calling, functional analysis and, most critically, validation steps to assess sample quality to determine data suitability and reliability for biological interpretation (Figure 2). Furthermore, this pipeline provides users the opportunity to cross-reference analysis results against publicly available CUT&RUN datasets. Ultimately, this CUT&RUN analysis pipeline protocol serves as an introductory guide and reference for bioinformatic analysis beginners and wet lab researchers.

Access restricted. Please log in or start a trial to view this content.

Protocol

NOTE: Information for CUT&RUN fastq files in GSE126612 are available in Table 1. Information related to the software applications used in this study are listed in the Table of Materials.

1. Downloading Easy-Shells_CUTnRUN pipeline from its Github page

Open terminal from operating system.
NOTE: If user is not sure how to open terminal in macOS and Windows, review this webpage (https://discovery.cs.illinois.edu/guides/System-Setup/terminal/). For Linux, review this webpage (https://www.geeksforgeeks.org/how-to-open-terminal-in-linux/).
Download the compressed analysis pipeline from Github by typing wget https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/archive/refs/heads/main.zip -O ~/Desktop/Easy-Shells_CUTnRUN.zip in the terminal.
After downloading the zip file, decompress the downloaded zip file by typing unzip ~/Desktop/Easy-Shells_CUTnRUN.zip -d ~/Desktop/ in the terminal.
After the decompression, delete the zip file by typing rm ~/Desktop/Easy-Shells_CUTnRUN.zip in the terminal and change the folder name by typing mv ~/Desktop/Easy-Shells_CUTnRUN-master ~/Desktop/Easy-Shells_CUTnRUN.
After removing the zipped file, type chmod +x ~/Desktop/Easy-Shells_CUTnRUN/script/*.sh in the terminal to set the executable permission for all the shell scripts within the working directory. From now on, just type the path and the name of these shell scripts in terminal or drag the scripts into terminal and enter to run these shell scripts in terminal.
NOTE: The Bash shell is typically pre-installed on most Linux distributions. However, recent macOS versions no longer provide pre-installed Bash shell. If the system does not have Bash, install Bash shell first. Visit links below for instructions describing how to install the Bash shell in Linux OS (https://ioflood.com/blog/install-bash-shell-linux/) and macOS (https://www.cs.cornell.edu/courses/cs2043/2024sp/styled-3/#:~:text=The%20first%20thing%20you%20will,you%20will%20see%20the%20following:). These step-by-step shell scripts are written to create one folder ~/Desktop/GSE126612 to perform most of this CUT&RUN analysis within this directory without any need for modification. If the user understands how to use these shell scripts, users may revise and customize these shell scripts to analyze other CUT&RUN datasets and modify options as per project-specific needs. To read and edit these shell scripts, consider using Visual studio Code (https://code.visualstudio.com/) as one option for an easy-to-use program available for major Operating Systems.

2. Installing the programs required for Easy Shells CUTnRUN

Among the shell scripts with the name of Script_01_installation_***.sh, find out the shell script of which name includes operating system type of user's system. Currently, Easy Shells CUTnRUN supports the installation script for macOS, Debian/Ubuntu, and CentOS/RPM-based systems.
Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, users may see the following: /path/to/bash (or a similar message such as /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing chsh -s $(which bash) in the terminal. If the terminal uses Bash shell as default, skip this step.
In the terminal, operate the installation shell script by typing ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_01_installation_***.sh or drag the shell script file into the terminal and enter.
Read Test_README.md file in /path/to/SEACR-1.3/Testfiles folder. Follow the instruction within the README file to clarify whether the SEACR in user's system works properly.
NOTE: It is crucial to validate the SEACR function with test files given by SEACR Github page to obtain proper peak calling results from CUT&RUN data. Therefore, follow the instruction of Test_README.md in /path/to/SEACR-1.3/Testfiles immediately after the SEACR installation. Although Easy Shells CUTnRUN provides installation shell scripts for some operating systems, these scripts may not work in some users' system to install all the programs required for Easy Shells CUTnRUN. If there is any problem in installation, review the original website of the uninstalled program, or request assistance using the Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).

3. Downloading the publicly available CUT&RUN dataset from Sequence Read Archive (SRA)

Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, users may see the following: /path/to/bash (or a similar message such as /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing chsh -s $(which bash) in the terminal. If the terminal uses Bash shell as default, skip this step.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_02_download-fastq.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE:This script will: (i) Create one folder (~/Desktop/GSE126612/fastq) and download a list of SRA files written in a text file (~/Desktop/Easy-Shells_CUTnRUN/sample_info/SRR_list.txt) within the fastq folder. As an example, the SRR_list.txt includes the fastq files of subset of GSE126612 CUT&RUN samples. (ii) Download the raw fastq files within the fastq folder. (iii) Create one folder (~/Desktop/GSE126612/log/fastq) and write down a log file (download-fastq_log.txt) and downloaded sample information file (SRR_list_info.txt) within this log folder.
After running the script, check the log file. If there is any error message within the log file, fix the error and try step 3.3 again. If there is any problem to solve the issue, ask help in Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
NOTE: To facilitate practice of this CUT&RUN analysis pipeline, the following publicly-available samples are retrieved from SRA: one sample from mock control (IgG), three samples of a chromatin architecture and transcription factor protein (CTCF), four samples corresponding to an 'active' histone mark (H3K27Ac), and three samples corresponding to regions of transcriptional initiation marked by RNA polymerase II (RNAPII-S5P). Sequencing was performed as paired-end, therefore two files are paired per sample.

4. Initial quality check for the raw sequencing files

Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, users may see the following: /path/to/bash (or a similar message such as /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing chsh -s $(which bash) in the terminal. If the terminal uses Bash shell as default, skip this step.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_03_fastQC.sh in the terminal or drag the shell script into the terminal and enter.
NOTE: This shell script will: (i) Run FastQC program for all raw fastq files in ~/Desktop/GSE126612/fastq folder and save the quality check report files into the ~/Desktop/GSE126612/fastqc.1st folder. (ii) Write down a log file (fastqc.1st.log.SRR-number.txt) per a FastQC run into a log folder (~/Desktop/GSE126612/log/fastqc.1st).
After completion of running the shell script, review the log file to clarify the success of the run. If there is any error message within the log file, correct the error and repeat step 4.3. If there is any problem to solve the issue, request assistance using the Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
NOTE: Among the output files, fastqc.html files include user-friendly quality check results. If there is severe quality issue(s), discuss with bioinformatics colleagues to determine data suitability for downstream analysis. Similar quality control reports are used to confirm improved data quality after adapter trimming. To use this script for other datasets, edit the path of the working and output directories to meet user's needs. A notable difference when interpreting QC of CUT&RUN compared to ChIP-seq reads is that duplicate reads in CUT&RUN do not necessarily indicate PCR duplicates. This is because recruited MNase will digest in the same or similar locations within experimental groups.

5. Quality and adapter trimming for raw sequencing files

Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, users may see the following: /path/to/bash (or a similar message such as /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing chsh -s $(which bash) in the terminal. If the terminal uses Bash shell as default, skip this step.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_04_trimming.sh in the terminal or drag the Script_04_trimming.sh script into the terminal and enter.
NOTE: This shell script will: (i) Run the Trim-Galore program for all the raw fastq files in ~/Desktop/GSE126612/fastq to perform adapter and quality trimming. (ii) Create one folder (~/Desktop/GSE126612/trimmed) and save the Trim-Galore output files within the trimmed folder. (iii) Create one log folder (~/Desktop/GSE126612/log/trim_galore) and write down a log file trim_galore_log_RSS-number.txt per a Trim-Galore run.
After completion of the run, review the log file carefully. If there is any error message within the log file, correct the error and repeat step 5.3. If there is any problem to solve the issue, request assistance using the Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
After completion of this process, compare the .html output files with the fastqc.html files created in 4.3. Revise the path of input and output directories to perform the trimming step for any fastq files located elsewhere.

6. Downloading the bowtie2 index for the reference genomes for actual and spike-in control samples

Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, users may see the following: /path/to/bash (or a similar message such as /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing chsh -s $(which bash) in the terminal. If the terminal uses Bash shell as default, skip this step.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_05_bowtie2-index.sh in the terminal or drag the shell script into the terminal and enter.
NOTE: This script will: (i) Download Bowtie2 indexes for actual sample reference genomes (human; hg19; used in original publication²²) and Spike-in control reference genomes (budding yeast; R64-1-1) into bowtie2-index folder (~/Desktop/Easy-Shells_CUTnRUN/bowtie2-index). (iii) Write down a log file (bowtie2-index-log.txt) into a log directory (~/Desktop/GSE126612/log/bowtie2-index).
After completion of the run, check the log file. If there is any error message, correct the error and repeat step 6.3. If there is any problem to solve the issue, request assistance using the Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
NOTE: Currently, Bowtie2 indexes for various reference genomes are provided in Bowtie2 website (https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml). Users can edit Script_05_bowtie2-index.sh to download any Bowtie2 index to meet the user's requirement. If the user cannot locate the Bowtie2 index of reference genome of interest, locate the reference genome sequence fasta files from:
1. Ensembl ftp (https://ftp.ensembl.org/pub/current_fasta/)
2. UCSC webpage (https://hgdownload.soe.ucsc.edu/downloads.html)
3. or other species-specific databases.
  After locating the reference genome sequence fasta files, create a Bowtie2 index for the downloaded reference genome by following "The bowtie2-build indexer" section (https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#the-bowtie2-build-indexer) of the Bowtie2 website.

7. Mapping trimmed CUT&RUN sequencing reads to the reference genomes

Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, users may see the following: /path/to/bash (or a similar message such as /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing chsh -s $(which bash) in the terminal. If the terminal uses Bash shell as default, skip this step.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_06_bowtie2-mapping.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This shell script will: (1) Run the bowtie2 program to map all the adapter and quality trimmed fastq files to both experimental (human; hg19) and spike-in control (budding yeast; R64-1-1) reference genomes independently. (ii) Run samtools view function to compress the mapped read pairs files as bam format. (iii) Create one folder (~/Desktop/GSE126612/bowtie2-mapped) and save the compressed mapped read pairs file within the bowtie2-mapped folder. (iv) Create one folder (~/Desktop/GSE126612/log/bowtie2-mapped) and write down the log of mapping process as text file bowtie2_log_hg19_SRR-number.txt for read pairs mapped on hg19 reference genome and bowtie2_log_R64-1-1_SRR-number.txt for read pairs mapped on R64-1-1) to indicate mapping efficiency within the bowtie2-mapping log folder.
After completion of the run, check the log file. If there is any error message within the log file, correct the error and run the shell script again. If there is any problem to solve the issue, request assistance using the Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
NOTE: This shell script runs bowtie2 with options to map paired-end sequencing files to find concordantly mapped read pairs with 10 bp-700 bp fragment lengths. Discover option descriptions by typing bowtie2 --help in terminal or by visiting the bowtie2 website (https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#the-bowtie2-aligner) to understand and change options as needed. Use this shell script to map any other fastq files by changing the path and name format of the fastq files and Bowtie2 indexes.

8. Sorting and filtering the mapped read pairs files

Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, users may see the following: /path/to/bash (or a similar message such as /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing "chsh -s $(which bash)" in the terminal. If the terminal uses Bash shell as default, skip this step.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_07_filter-sort-bam.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script will: (i) Run samtools view function for all the compressed mapped read pair files in ~/Desktop/GSE126612/bowtie2-mapped folder to filter out read pairs mapped at non-canonical chromosome regions, publicly annotated blacklist and TA repeat regions. (ii) Execute the samtools sort function to sort the filtered bam files by fragments' names or coordinate within the same directory. (iii) Write down a log file per an input bam file in ~/Desktop/GSE126612/log/filter-sort-bam directory.
After completion of the run, review the log files carefully. If there is any error message in the log files, correct the error and try to run the shell script again. If there is any problem to solve the issue, request assistance using the Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
NOTE: The resulting bam files (output) sorted by fragments' names will serve as input files to create fragment BED and raw readcounts bedGraph files. The bam files sorted by coordinate will serve as input files to generate fragment BEDPE files. All the BED, bedGraph, and BEDPE will be used for peak calling and visualization in downstream analysis.All the annotation bed files for canonical chromosome regions (chr1~22, chrX, chrY and chrM), publicly annotated blacklist regions²³ and TA repeat regions¹⁸ are located in ~/Desktop/Easy-Shells_CUTnRUN/blacklist directory. If needed, use this directory to add additional blacklist files. Use this shell script to perform same functions for other mapped read pairs bam files by changing path and the name of the bam files. Type samtools view --help and samtools sort --help in terminal for more description about these functions.

9. Convert mapped read pairs to fragment BEDPE, BED and raw readcounts bedGraph files

Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, users may see the following: /path/to/bash (or a similar message such as /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing chsh -s $(which bash) in the terminal. If the terminal uses Bash shell as default, skip this step.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_08_bam-to-BEDPE-BED-bedGraph.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script will: (i) Run macs3 filterdup and awk function to convert bam files sorted by coordinate to fragment BEDPE files of which fragment lengths are shorter than 1kb, and save the BEDPE files in ~/Desktop/GSE126612/BEDPE. (ii) Create a log directory (~/Desktop/GSE126612/log/bam-to-BEDPE) and write down a log file per mapped reads fragments file. (iii) Run bedtools bamtobed and awk, cut, sort functions to convert bam files sorted by fragments' names to fragment BED files of which fragment lengths are shorter than 1 kb. (iv) Create one folder (~/Desktop/GSE126612/bam-to-bed) and save the fragment BED files within the bam-to-bed folder. (v) Write down a log file per mapped reads fragments BED file into a log directory (~/Desktop/GSE126612/log/bam-to-bed). (vi) Execute bedtools genomecov function to generate raw readcounts bedGraph files using the fragment BED files in one folder (~/Desktop/GSE126612/bedGraph).
After completion of the run, check the log files carefully. If there is any problem to solve the issue, request assistance using the Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
NOTE: Output raw readcounts bedGraph files will be used as input files for the SEACR peak caller program with normalization option in section 12 and scaled fractional readcount (SFRC) normalization²² in section 10. The fragment BED files will serve as input files for Spike-in normalized Reads Per Million mapped reads in the negative Control (SRPMC) normalization²⁴^,²⁵ in section 10.To capture short fragments (>100 bp) only for CUT&RUN data of chromatin-associated factors, change the fragment filtration step in this script and proceed normalization step. To compare the CUT&RUN signals between short and regular size fragments within same sample, SFRC normalization may be helpful to reduce potential down-sampling effect caused by capturing short fragments only. Use this shell script to perform the same processes for other paired-end sequenced sorted bam files by changing the path and name format of bam and bed files.

10. Converting raw readcounts bedGraph files to normalized bedGraph and bigWig files

Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, users may see the following: /path/to/bash (or a similar message such as /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing chsh -s $(which bash) in the terminal. If the terminal uses Bash shell as default, skip this step.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_09_normalization_SFRC.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script is written to: (i) Run for-loop with awk function to create SFRC normalized bedGraph files using raw readcounts bedGraph files within ~/Desktop/GSE126612/bedGraph. (ii) Execute bedGraphToBigWig function to create compressed format (.bw) of the SFRC normalized bedGraph files in ~/Desktop/GSE126612/bigWig. (iii) Write down one log file to record the normalization factor used for the SFRC calculation per a run and save the log file within ~/Desktop/GSE126612/log/SFRC.
After completion of the run, check the log files. If there is any error message, correct the error and run the shell script again. If there is any problem to solve the issue, request assistance using the Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
NOTE: The scaled fractional readcount normalization was used in original publication²² of GSE126612 CUT&RUN dataset. The formula of the normalization at bin i is same as below:

Since this normalization method does not include normalization with negative control (for example, IgG sample) nor spike-in control, this approach may not be ideal to observe genome-wide signal difference between samples. However, since this method is theoretically similar with other total readcounts-based normalization (for example, Count Per Million), it would be good enough to observe local signal difference between samples.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_09_normalization_SRPMC.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script will: (i) Run for-loop with bedtools genomecov function to create SRPMC normalized bedgraph files in ~/Desktop/GSE126612/bedGraph using fragment BED files in ~/Desktop/GSE126612/bam-to-bed. (ii) Write down a log file to record the normalization factors used for the SRPMC normalization per a run in ~/Desktop/GSE126612/log/SRPMC. (iii) Execute bedGraphToBigWig function to create compressed format (.bw) of the normalized bedGraph files and save the normalized bigWig files into ~/Desktop/GSE126612/bigWig folder.
After completion of the run, review the log files carefully. If there is any error message within the log files, correct the error and run the shell script again. If there is any problem to solve the issue, request assistance using the Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
NOTE: The formula of SRPMC normalization was developed to normalize actual sample readcounts with both negative control (IgG sample, for example) and spike-in control by combining RPM (Reads Per Million mapped reads) normalization factor, RPS (ratio Reads Per Spike-in read) and relative signal ratio to control²⁴^,²⁵. The definition of RPS is same as below:

By applying RPS for both actual sample and negative control sample, the relative signal ratio (RS) to control for actual sample can be calculated as below:

And the definition of RPM normalization factor (RPM:NF) is same as below:

From here, SRPMC normalization factor (SRPMC:NF) has come out by combining the RS and RPM:NF together:

And this formula can be simplified as below:

Therefore, SRPMC method normalizes reads by the (1) ratio of spike-in reads between control and sample, and (2) RPM normalized control reads. Since this normalization factor considers spike-in reads and makes control reads comparable between samples together, this method would be appropriate to observe genome-wide difference between samples and reduce batch effect in total reads of actual samples and controls in different batch experiments. These normalized bedGraph files will become input files to call peaks using SEACR in section 11. And these normalized bigWig files will be used in loci visualization through IGV and creating heatmap and average plot via Deeptools. It is strongly suggested using a genome browser to visualize the landscape pattern of the CUT&RUN dataset using the normalized bigWig files at representative genomic regions to evaluate data quality. CUT&RUN samples that display noisy background signal patterns that resemble the IgG control are likely appropriate to omit for downstream analyses. Use these shell scripts to normalize other reads bed files and raw readcounts bedGraph files by changing the path and file names for both input and output bed and bedgraph files. Edit these scripts to apply other normalization calculations by changing the factors and formula within this script.

11. Validating fragment size distribution

Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, users may see the following: /path/to/bash (or a similar message such as /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing chsh -s $(which bash) in the terminal. If the terminal uses Bash shell as default, skip this step.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_10_insert-size-analysis.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script is written to: (i) Run picard.jar CollectInsertSizeMetrics function using mapped read pairs bam files in ~/Desktop/GSE126612/filtered-bam folder to identify insert size distribution. (ii) Create one folder (~/Desktop/GSE126612/insert-size-distribution) and save the insert size distribution analysis results into the created folder. (iii) Write down a log file per an input bam file in ~/Desktop/GSE126612/log/insert-size-distribution folder.
After completion of the run, check the log files carefully. If there is any error message within the log files, correct the error and try to run the shell script again. If there is any problem to solve the issue, request assistance using the Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
NOTE: In general, the insert size analysis (Output) for CUT&RUN samples shows major peaks at mono- (100-300 bp) and di- (300-500 bp) nucleosomal size ranges. Technical errors/limitations (such as over-/under-digestion of MNase during CUT&RUN sample preparation or improper size-selection during library preparation) can cause the enrichment of same or larger than tri-nucleosomal (500-700 bp) and same or shorter than sub-nucleosomal (<100 bp) fragments.Sometimes the absence of mono-nucleosomal size peaks with the enrichment of the long (>500 bp) and short fragments (<100 bp) may be due to library size selection ranges chosen at the wet lab stage, or low sequencing depth. Compare sequencing depth ('total sequenced bases' / 'total reference genome size'), genomic landscape overview using normalized readcounts bigWig files in section 10, and the insert size distribution pattern together to clarify the quality of processed CUT&RUN samples. The dashed lines in the histograms represent the 'cumulative fraction' of reads with an insert size greater than or equal to the value on the x-axis. This dashed line enables identification of the distribution of insert sizes in input mapped reads file. Progression along the x-axis is associated with increasing insert size. The dashed line identifies the proportion of mapped read pairs in the input bam file that have an insert size at least as large as indicated on the intersecting x-axis position. Therefore, interpretation begins at 1 on the left, indicating all reads have an insert size greater than or equal to the smallest size, and decreases towards 0 as the insert size increases.

12. Calling peaks using MACS2, MACS3 and SEACR

Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, users may see the following: /path/to/bash (or a similar message such as /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing chsh -s $(which bash) in the terminal. If the terminal uses Bash shell as default, skip this step.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_11_peak-calling_MACS.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script is written to: (i) Run macs2 callpeak and macs3 callpeak functions with and without IgG control using fragment BEDPE files to call peaks and save the peak calling results in output directories (~/Desktop/GSE126612/MACS2 and ~/Desktop/GSE126612/MACS3). (ii) Write down the log of these peak callings as text file in log directory (~/Desktop/GSE126612/log/MACS2 and ~/Desktop/GSE126612/log/MACS3)
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_11_peak-calling_SEACR.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script is written to: (i) Run SEACR_1.3.sh script with and without IgG control, with stringent and relaxed options using raw readcounts bedGraph and normalized bedGraph files to call peaks. (ii) Create output directory (~/Desktop/GSE126612/SEACR-peaks) and save the peak calling results by SEACR. (iii) Write down the log of these peak callings as text file in log directory (~/Desktop/GSE126612/log/SEACR).
After completion of running shell scripts, check the log files carefully. If there is any error message in the log files, correct the error first. Some programs may not call peaks for IgG control sample with IgG control option together, thus, omit the error message regarding IgG control sample with IgG control option. If there is any problem to solve the issue, request assistance using the Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
NOTE: These two shell scripts perform peak calling for CUT&RUN samples using three peak callers (MACS2, MACS3 and SEACR) with various options: with/without IgG control option, using raw readcounts bedGraph files with normalization option of peak caller or normalized readcounts bedGraph files without normalization option of peak caller, and stringent and relaxed SEACR peak calling options. Since the peak calling output files are not sufficient to be used directly in downstream analyses, Easy Shells CUTnRUN includes one script to process these called peaks output files to create new peak files which includes chromosome, start, end and name of peaks.Through intensive peak calling approaches, Easy Shells CUTnRUN provides an opportunity to choose the peak calling program best suited for a user's CUT&RUN project by comparing the peaks called across three peak callers. In addition, this CUT&RUN analysis pipeline also provides an opportunity to select peak calling options best suited for a user's CUT&RUN project. These comparisons will be done by Venn diagram, and visualization as heatmap and average plot.

13. Creating called peak bed files

Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, users may see the following: /path/to/bash (or a similar message such as /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing chsh -s $(which bash) in the terminal. If the terminal uses Bash shell as default, skip this step.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_12_make-peak-bed.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script is written to: (i) Run awk function using bed files in ~/Desktop/GSE126612/SEACR folder to create two types of SEACR peak bed files ~/Desktop/GSE126612/peak-bed_SEACR folder. The whole peak bed files include the start and end of each peak, and the focused peak bed files include the start and bed of highest signal bin within each peak. (ii) Run awk function using ***_peaks.xls files in ~/Desktop/GSE126612/MACS2 and ~/Desktop/GSE126612/MACS3 folders to create whole peak bed files which includes start and end of the each peak called by MACS2 and MACS3 in ~/Desktop/GSE126612/peak-bed_MACS2 and ~/Desktop/GSE126612/peak-bed_MACS3 folders. (iii) Run awk function using ***_summits.bed files in ~/Desktop/GSE126612/MACS2 and ~/Desktop/GSE126612/MACS3 folders to create focused peak bed files which includes start and end of the most significant bin within each peak. (iv) Log files are written in text file format in ~/Desktop/GSE126612/log/peak-bed folder.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_13_filter-peaks.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script is written to: (i) Run bedtools intersect function using peak bed files which are called without IgG control option to remove peaks overlapped with IgG control peaks. (ii) The filtered peak bed files are saved in ~/Desktop/GSE126612/peak-bed-filtered_MACS2, ~/Desktop/GSE126612/peak-bed-filtered_MACS3, and ~/Desktop/GSE126612/peak-bed-filtered_SEACR folders. (iii) A log file log_filter-peaks.txt is created in ~/Desktop/GSE126612/log/filter-peaks folder.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_14_cat-merge-peak-bed_MACS.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script is written to: (i) Run cat and sort functions to concatenate the replicates' MACS2 and MACS3 whole peak bed files as one peak bed file and sort the concatenated peak bed file in ~/Desktop/GSE126612/bed-for-comparison folder. (ii) Run bedtools merge function using the concatenated whole peak bed files to merge peaks which overlap each other. (iii) A log file log_cat-merged-peak-bed_MACS.txt is written in log folder ~/Desktop/GSE126612/log/cat-merged-peak-bed.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_14_cat-merge-peak-bed_SEACR.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script is written to: (i) Run cat and sort functions to concatenate the replicates' SEACR whole peak bed files as one peak bed file and sort the concatenated peak bed file in ~/Desktop/GSE126612/bed-for-comparison folder. (ii) Run bedtools merge function using the concatenated whole peak bed files to merge peaks which overlap each other. (iii) A log file log_cat-merged-peak-bed_SEACR.txt is written in log folder ~/Desktop/GSE126612/log/cat-merged-peak-bed.
After completion of the run the shell scripts, review the log files carefully. If there is any error message in the log files, correct the error and run the script(s) again. If there is any problem to solve the issue, request assistance using the Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
NOTE: The whole peak regions peak bed files will be used as input files of Venn diagram analysis to compare similarity between peak calling options, peak calling methods, replicates, and genomic landscape observations near peak regions. The merged whole peak regions peak bed files will be used for Principal component (PC) analysis and Pearson coefficient correlation analysis using deeptools. The focused peak bed files will be used for Heatmap and average plot analysis using Deeptools.

14. Validating similarity between replicates using Pearson correlation and Principal component (PC) analysis.

Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, users may see the following: /path/to/bash (or a similar message such as /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing chsh -s $(which bash) in the terminal. If the terminal uses Bash shell as default, skip this step.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_15_correlation_plotCorrelation.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script is written to: (i) Run multiBamSummary BED-file function using the bam files of replicates, which were sorted by coordinate, and merged whole peak bed files for CTCF, H3K27Ac and RNAPII-S5P to generate matrix files for Pearson correlation analysis in Desktop/GSE126612/deeptools_multiBamSummary folder. (ii) Run plotCorrelation function using the matrix files to perform Pearson correlation coefficient calculation and heatmap clustering and save the result in ~/Desktop/GSE126612/deeptools_plotCorrelation folder. (iii) Write down a log file log_plotCorrelation.txt in ~/Desktop/GSE126612/log/correlation folder.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_15_correlation_plotPCA.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script is written to: (i) Run multiBamSummary BED-file function using the bam files, which were sorted by coordinate, and merged whole peak bed files, which includes all the CTCF, H3K27ac and RNAPII-S5P peaks, to generate matrix files for Principal component analysis (PCA) in Desktop/GSE126612/deeptools_multiBamSummary folder. (ii) Run plotPCA function using the matrix files to perform PCA and save the result in ~/Desktop/GSE126612/deeptools_plotPCA folder. (iii) Write down a log file log_plotPCA.txt in ~/Desktop/GSE126612/log/correlation folder.
After completion of running shell scripts, check the log files. If there is any error message, correct the error and run the shell scripts again. If there is any problem to solve the issue, request assistance using the Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
NOTE: In principle, properly prepared and processed replicates show higher Pearson correlation coefficient values within same clustering group and close positioning in Principal component analysis. Any replicate, which shows a lower Pearson correlation coefficient, and long distance from other replicates in the Principal component plot, may represent a potential outlier among the replicates. This shell script is applicable for any bam format mapped reads data. Change the path and file name of bigwig files to meet project-specific requirements.

15. Validating similarity between replicates, peak calling methods and options using Venn diagram

Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, there may be something like /path/to/bash (for example, /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing chsh -s $(which bash) in the terminal. If the terminal uses Bash shell as default one, consider skipping this step
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_16_venn-diagram_methods.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script is written to: (i) Run intervene venn function using whole peak region peak bed files to find overlaps between the peaks called by various options (with/without IgG control option, with/without normalization, and stringent/relaxed peak calling options for SEACR). (ii) Create one folder (~/Desktop/GSE126612/intervene_methods) and save the Venn diagram analysis results in this folder. (iii) Write down one log file log_intervene_methods.txt in ~/Desktop/GSE126612/log/intervene folder.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_16_venn-diagram_replicates.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script is written to: (i) Run intervene venn function using whole peak region peak bed files to find overlaps between the replicates' peaks. (ii) Create one folder (~/Desktop/GSE126612/intervene_replicates) and save the Venn diagram analysis results in this folder. (iii) Write down one log file log_intervene_replicates.txt in ~/Desktop/GSE126612/log/intervene folder.
After the finish of running the shell scripts, review the log files. If there is any error message, correct the error and run the shell scripts again. If there is any issue in usage of Easy Shells CUTnRUN analysis pipeline, ask help in Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
NOTE: These Venn diagram analysis results give insight to choose the most appropriate peak calling options, methods, and replicates with high reproducibility for downstream analysis. It may be preferred to choose the peak calling options and methods which show the highest called peak numbers with good overlap with other peak calling methods and options.

16. Analyzing heatmaps and average plots to visualize called peaks.

Open the terminal and type echo $SHELL to check the default shell in the active terminal. If Bash shell is the default shell in current terminal, there may be something like /path/to/bash (for example, /bin/bash) in the terminal.
If the default shell is not Bash, set Bash shell as default shell by typing chsh -s $(which bash) in the terminal. If the terminal uses Bash shell as default one, consider skipping this step
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_27_plotHeatmap_focused.sh in the terminal or drag the shell script file into the terminal and enter.
NOTE: This script is written to: (i) Run computeMatrix reference-point function using normalized bigWig files and focused peak bed files to make normalized readcounts matrixes at the center of the focused peaks in ~/Desktop/GSE126612/deeptools_computeMatrix folder. (ii) Run plotHeatmap function using the normalized readcounts matrix to generate heatmaps and average plots which visualize normalized readcounts distribution pattern at the focused peak locations. (iii) Create one folder (~/Desktop/GSE126612/deeptools_plotHeatmap) and save the plotHeatmap output files within this folder. (iv) Write down one log file log_plotHeatmap_focused.txt in ~/Desktop/GSE126612/log/plotHeatmap folder.
Type ~/Desktop/Easy-Shells_CUTnRUN/scripts/Script_27_plotHeatmap_whole.sh in the terminal or drag the shell script file into the terminal and enter.
This script is written to: (i) Run computeMatrix reference-point function using normalized bigWig files and whole peak bed files to make normalized readcounts matrixes at the center of the whole peaks in ~/Desktop/GSE126612/deeptools_computeMatrix folder. (ii) Run plotHeatmap function using the normalized readcounts matrix to generate heatmaps and average plots which visualize normalized readcounts distribution pattern at the whole peak locations. (iii) Create one folder (~/Desktop/GSE126612/deeptools_plotHeatmap) and save the plotHeatmap output files within this folder. (iv) Write down one log file log_plotHeatmap_whole.txt in ~/Desktop/GSE126612/log/plotHeatmap folder.
After the finish of running the shell scripts, review the log files. If there is any error message, correct the error and run the shell scripts again. If there is any issue in usage of Easy Shells CUTnRUN analysis pipeline, ask help in Easy Shells CUTnRUN github issues webpage (https://github.com/JunwooLee89/Easy-Shells_CUTnRUN/issues).
NOTE: Ideally, the peak summit locations of MACS2/3 peaks and the focused peak locations of SEACR ones exhibit a sharp and focused signal distribution at the center of the plots. However, if the peak calling algorithm does not work properly for CUT&RUN data, less focused 'noisy' signal distribution may appear in the plots. Therefore, using the number of called peaks and the peak signal distribution patterns of the output plots will guide determination of peak validity for further CUT&RUN analysis that include downstream peak annotation.

Access restricted. Please log in or start a trial to view this content.

Results

Quality and adapter trimming retains reads with high sequencing quality
High-throughput sequencing techniques are prone to generating sequencing errors such as sequence 'mutations' in reads. Furthermore, sequencing adapter dimers can be enriched in sequencing datasets due to poor adapter removal during library preparation. Excessive sequencing errors, such as read mutations, generation of reads shorter than required for proper mapping, and enrichment of adapter dimers, can increase read map...

Access restricted. Please log in or start a trial to view this content.

Discussion

The ability to map protein occupancy on chromatin is fundamental to conducting mechanistic studies in the field of chromatin biology. As laboratories adopt new wet lab techniques to profile chromatin, the ability to analyze sequencing data from those wet lab experiments becomes a common bottleneck for wet lab scientists. Therefore, we describe an introductory step-by-step protocol to enable bioinformatics beginners to overcome the analysis bottleneck, and initiate analysis and quality control checks of their own CUT&...

Access restricted. Please log in or start a trial to view this content.

Disclosures

The authors declare no disclosures.

Acknowledgements

All illustrated figures were created with BioRender.com. CAI acknowledges support provided through an Ovarian Cancer Research Alliance Early Career Investigator Award, a Forbeck Foundation Accelerator Grant, and the Minnestoa Ovarian Cancer Alliance National Early Detection Research Award.

Access restricted. Please log in or start a trial to view this content.

Materials

Name	Company	Catalog Number	Comments
bedGraphToBigWig	ENCODE	https://hgdownload.soe.ucsc.edu/admin/exe/	Software to compress and convert readcounts bedGraph to bigWig
bedtools-2.31.1	The Quinlan Lab @ the U. of Utah	https://bedtools.readthedocs.io/en/latest/index.html	Software to process bam/bed/bedGraph files
bowtie2 2.5.4	Johns Hopkins University	https://bowtie-bio.sourceforge.net/bowtie2/index.shtml	Software to build bowtie index and perform alignment
CollectInsertSizeMetrics (Picard)	Broad institute	https://github.com/broadinstitute/picard	Software to perform insert size distribution analysis
Cutadapt	NBIS	https://cutadapt.readthedocs.io/en/stable/index.html	Software to perform adapter trimming
Deeptoolsv3.5.1	Max Planck Institute	https://deeptools.readthedocs.io/en/develop/index.html	Software to perform Pearson coefficient correlation analysis, Principal component analysis, and Heatmap/average plot analysis
FastQC Version 0.12.0	Babraham Bioinformatics	https://github.com/s-andrews/FastQC	Software to check quality of fastq file
Intervenev0.6.1	Computational Biology & Gene regulation - Mathelier group	https://intervene.readthedocs.io/en/latest/index.html	Software to perform venn diagram analysis using peak files
MACSv2.2.9.1	Chan Zuckerberg initiative	https://github.com/macs3-project/MACS/tree/macs_v2	Software to call peaks
MACSv3.0.2	Chan Zuckerberg initiative	https://github.com/macs3-project/MACS/tree/master	Software to call peaks
Samtools-1.21	Wellcome Sanger Institute	https://github.com/samtools/samtools	Software to process sam/bam files
SEACRv1.3	Howard Hughes Medial institute	https://github.com/FredHutch/SEACR	Software to call peaks
SRA Toolkit Release 3.1.1	NCBI	https://github.com/ncbi/sra-tools	Software to download SRR from GEO
Trim_Galore v0.6.10	Babraham Bioinformatics	https://github.com/FelixKrueger/TrimGalore	Software to perform quality and atapter trimming

References

Hainer, S. J., Fazzio, T. G. High-resolution chromatin profiling using CUT&RUN. Curr Protoc Mol Biol. 126 (1), e85(2019).
Zhang, Y., et al. Model-based analysis of ChiP-Seq (MACS). Genome Biology. 9 (9), R137(2008).
Xu, S., Grullon, S., Ge, K., Peng, W. Stem cell transcriptional networks: Methods and Protocols. , Springer. New York, NY. (2014).
Meers, M. P., Tenenbaum, D., Henikoff, S. Peak calling by sparse enrichment analysis for cut&run chromatin profiling. Epigenetics Chromatin. 12 (1), 42(2019).
Ashburner, M., et al. Gene ontology: Tool for the unification of biology. The gene ontology consortium. Nat Genet. 25 (1), 25-29 (2000).
Harris, M. A., et al. The gene ontology (GO) database and informatics resource. Nucleic Acids Res. 32 (Database issue), D258-D261 (2004).
The Gene Ontology Consortium. The gene ontology resource: 20 years and still going strong. Nucleic Acids Res. 47 (D1), D330-D338 (2019).
Conesa, A., et al. Blast2go: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 21 (18), 3674-3676 (2005).
Carbon, S., et al. AmiGO: Online access to ontology and annotation data. Bioinformatics. 25 (2), 288-289 (2009).
Eden, E., Navon, R., Steinfeld, I., Lipson, D., Yakhini, Z. Gorilla: A tool for discovery and visualization of enriched go terms in ranked gene lists. BMC Bioinformatics. 10, 48(2009).
Huang Da, W., Sherman, B. T., Lempicki, R. A. Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37 (1), 1-13 (2009).
Huang Da, W., Sherman, B. T., Lempicki, R. A. Systematic and integrative analysis of large gene lists using david bioinformatics resources. Nat Protoc. 4 (1), 44-57 (2009).
Ge, S. X., Jung, D., Yao, R. ShinyGO: A graphical gene-set enrichment tool for animals and plants. Bioinformatics. 36 (8), 2628-2629 (2020).
Tang, D., et al. SRplot: A free online platform for data visualization and graphing. PLoS One. 18 (11), e0294236(2023).
Ramírez, F., et al. Deeptools2: A next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44 (W1), W160-W165 (2016).
Robinson, J. T., et al. Integrative genomics viewer. Nat Biotechnol. 29 (1), 24-26 (2011).
Kent, W. J., et al. The human genome browser at ucsc. Genome Res. 12 (6), 996-1006 (2002).
Yu, F., Sankaran, V. G., Yuan, G. -C. CUT&RUNTools 2.0: A pipeline for single-cell and bulk-level CUT&RUN and CUT&Tag data analysis. Bioinformatics. 38 (1), 252-254 (2021).
Zhu, Q., Liu, N., Orkin, S. H., Yuan, G. -C. CUT&RUNTools: A flexible pipeline for CUT&RUN processing and footprint analysis. Genome Biol. 20 (1), 192(2019).
Chris Cheshire, C. -W., et al. Nf-core/cutandrun: Nf-core/cutandrun v3.2.2 iridium ibis. , At https://github.com/nf-core/cutandrun/tree/3.2.2 (2024).
Kong, N. R., Chai, L., Tenen, D. G., Bassal, M. A. A modified CUT&RUN protocol and analysis pipeline to identify transcription factor binding sites in human cell lines. STAR Protoc. 2 (3), 100750(2021).
Meers, M. P., Bryson, T. D., Henikoff, J. G., Henikoff, S. Improved CUT&RUN chromatin profiling tools. eLife. 8, e46314(2019).
Amemiya, H. M., Kundaje, A., Boyle, A. P. The encode blacklist: Identification of problematic regions of the genome. Sci Rep. 9 (1), 9354(2019).
Deberardine, M. BRgenomics for analyzing high-resolution genomics data in R. Bioinformatics. 39 (6), btad331(2023).
Deberardine, M., Booth, G. T., Versluis, P. P., Lis, J. T. The nelf pausing checkpoint mediates the functional divergence of cdk9. Nat Commun. 14 (1), 2762(2023).
Andrews, S. Fastqc: A quality control tool for high throughput sequence data. , At http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2010).
Krueger, F., James, F. O., Ewels, P. A., Afyounian, E., Schuster-Boeckler, B. FelixKrueger/TrimGalore: v0.6.7 - DOI via Zenodo. , (2021).
Mcgaughey, D. Easy bam downsampling. , Available from: https://davemcg.github.io/post/easy-bam-downsampling/ (2018).
Positionbaseddownsamplesam (picard). , GATK Team. At https://gatk.broadinstitute.org/hc/en-us/articles/360041850311-PositionBasedDownsampleSam-Picard (2020).

Access restricted. Please log in or start a trial to view this content.

Reprints and Permissions

Request permission to reuse the text or figures of this JoVE article

Request Permission

Explore More Articles

CUT RUN Next generation Sequencing Cancer Initiation Repetitive DNA Elements Epigenetic Factors Chromatin Changes Transcription Factors Bioinformatics Chronic Double stranded RNA Immunotherapy TP53 Protein Protein DNA Interactions Histone Modifications Sequencing Sensitivity

This article has been published

Video Coming Soon

Keep me updated:

Introductory Analysis and Validation of CUT&RUN Sequencing Data

In This Article

Summary

Abstract

Introduction

Protocol

Results

Discussion

Disclosures

Acknowledgements

Materials

References

Reprints and Permissions

Explore More Articles