Introductory Analysis and Validation of CUT&RUN Sequencing Data

December 13th, 2024

DOI: 10.3791/67359-v

Junwoo Lee¹, Biji Chatterjee¹,², Nakyung Oh¹, Dhurjhoti Saha¹, Yue Lu¹, Blaine Bartholomew¹, Charles A. Ishak¹,³

¹Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, ²Department of Genomic Medicine, University of Texas MD Anderson Cancer Center, ³Department of Gynecologic Oncology and Reproductive Medicine, University of Texas MD Anderson Cancer Center

Transcript

Our research explores the role of repetitive DNA elements in cancer initiation and progression. We ask whether these elements are directly involved in mechanisms of cancer initiation and, if so, whether we can use this information to improve patient outcomes. Many next-generation sequencing technologies and analysis pipelines have been developed to capture the features and changes of epigenetic factors, such as transcriptional or post-translational chromatin changes, binding at specific locations on chromatin, and sequencing of DNA or RNA captured by sensor proteins, at both bulk and single-cell resolution.

The technologies used to advance research in our field are those that can map the genomic positions of modified histones or transcription factors on chromatin. For our purposes, the most relevant of these is a technique called CUT&RUN. Many publicly available CUT&RUN analysis pipelines are relatively complex for bioinformatics beginners who want to understand the flow of the analysis and develop their own analysis pipeline for their project.

Our CUT&RUN analysis pipeline, by contrast, is developed in a step-by-step manner in a single programming language, which makes the flow much easier to understand and easier to modify, so that users can obtain publication-quality data and build up experience in bioinformatics. We continue to focus on how chronic double-stranded RNA expression affects biomimicry evasion and immunotherapy-resistant cancer development, and how the TP53 protein can regulate chronic double-stranded RNA expression and biomimicry evasion. Ultimately, we want to understand how we can cure immunotherapy-resistant cancer by understanding and targeting chronic double-stranded RNA expression and the biomimicry evasion process.

To begin, download the compressed analysis pipeline from GitHub by typing the command in the terminal. After downloading, enter the command to decompress the zip file. Once the decompression is complete, remove the zip file and change the decompressed folder name.

Then, set executable permissions for all the shell scripts in the working directory. From the available options, locate the appropriate installation shell script. Look for a script named Script_01_installation.sh.
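As a minimal sketch of the permission step, the Python function below adds the execute bits to every .sh file in a directory, similar in effect to running chmod +x on the scripts. The function name make_scripts_executable is hypothetical and is not part of the pipeline.

```python
import stat
from pathlib import Path


def make_scripts_executable(workdir):
    """Add execute permission to every *.sh file directly inside workdir.

    Hypothetical helper for illustration; returns the names of the files changed.
    """
    changed = []
    for script in sorted(Path(workdir).glob("*.sh")):
        mode = script.stat().st_mode
        # Keep the existing permission bits and add execute for user, group, and others.
        script.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        changed.append(script.name)
    return changed
```

Calling make_scripts_executable(".") from inside the pipeline folder would cover Script_01_installation.sh and any other .sh files stored at the top level of that folder.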

Open the terminal and type echo $SHELL to check the default shell used in the active terminal. If bash is not the default shell, set it by typing chsh -s followed by the path to bash (for example, /bin/bash) in the terminal. Type the FASTQ download script in the terminal, or drag the script file into the terminal and press Enter.
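The shell check can also be expressed programmatically. The sketch below reads the SHELL environment variable, which normally holds the path of the user's login shell, and reports whether it points to bash. The function name default_shell_is_bash is hypothetical.

```python
import os


def default_shell_is_bash(env=None):
    """Return True when the SHELL variable points to a bash executable.

    Hypothetical helper; env defaults to the current process environment.
    """
    env = os.environ if env is None else env
    shell_path = env.get("SHELL", "")
    # Compare only the executable name so /bin/bash and /usr/local/bin/bash both match.
    return os.path.basename(shell_path) == "bash"
```

Passing a dictionary instead of relying on os.environ keeps the check easy to try with different values.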

After running the script, check the log file for error messages. Now, enter the trimming script in the terminal and execute it. Next, type the Bowtie2 index script in the terminal and execute it.
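Checking a log file by eye is easy to get wrong when the file is long. The following sketch scans log text for a few common failure keywords. The keyword list and the function name find_error_lines are assumptions for illustration; the actual log format of the pipeline is not shown in this transcript.

```python
import re

# Assumed failure keywords; adjust to whatever the tools in use actually print.
ERROR_PATTERN = re.compile(
    r"\b(error|exception|failed|command not found)\b", re.IGNORECASE
)


def find_error_lines(log_text):
    """Return (line_number, line) pairs for log lines that match a failure keyword."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits
```

An empty result does not prove that the run succeeded, so it is still worth confirming that the expected output files exist.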

Then, run the Bowtie2 mapping script and press Enter. Type the BAM filtering and sorting script in the terminal and execute it. Next, convert BAM to BEDPE, BED, and bedGraph formats by typing the appropriate command in the terminal and pressing Enter.
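To illustrate what the BEDPE-to-BED conversion does conceptually, the sketch below collapses each read pair into a single fragment interval. It assumes the tab-separated BEDPE column order written by bedtools (chrom1, start1, end1, chrom2, start2, end2, then further columns). The function name bedpe_to_fragments and the 1,000 bp length cutoff are hypothetical choices, not values taken from the pipeline.

```python
def bedpe_to_fragments(bedpe_lines, max_length=1000):
    """Collapse BEDPE read pairs into sorted (chrom, start, end) fragments.

    Pairs on different chromosomes, unmapped mates (chrom '.'), and fragments
    longer than max_length are skipped. max_length is an assumed cutoff.
    """
    fragments = []
    for line in bedpe_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete BEDPE record
        chrom1, start1, end1, chrom2, start2, end2 = fields[:6]
        if chrom1 != chrom2 or chrom1 == ".":
            continue
        # The fragment spans from the leftmost to the rightmost mate coordinate.
        start = min(int(start1), int(start2))
        end = max(int(end1), int(end2))
        if end - start > max_length:
            continue
        fragments.append((chrom1, start, end))
    return sorted(fragments)
```

Each returned tuple corresponds to one line of a three-column BED file.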

Perform normalization using the SRPMC method. Then, run the insert-size-analysis script. Perform peak-calling using the MACS command in the terminal.
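An insert-size analysis is, at its core, a histogram of fragment lengths. The sketch below bins fragment lengths from (chrom, start, end) tuples. The function name fragment_length_histogram and the 10 bp bin size are assumptions, and the pipeline's own script may summarize the data differently.

```python
from collections import Counter


def fragment_length_histogram(fragments, bin_size=10):
    """Count fragments per length bin; keys are the lower edge of each bin in bp."""
    counts = Counter()
    for _chrom, start, end in fragments:
        length = end - start
        # Round the length down to the nearest multiple of bin_size.
        counts[(length // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))
```

A histogram like this makes it straightforward to see which fragment sizes dominate a sample and to compare samples against one another.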

Afterward, type the command for peak-calling using SEACR and press Enter. Create peak BED files by using the appropriate command in the terminal, and then filter the peaks. Next, enter the command to merge MACS peaks.
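Merging peaks from either caller can be thought of as taking the union of overlapping intervals. The sketch below assumes that merging means joining peaks on the same chromosome that overlap or touch, which is how bedtools merge behaves by default. The function name merge_peaks is hypothetical, and the pipeline's merge step may apply additional rules.

```python
def merge_peaks(peaks):
    """Merge overlapping or touching (chrom, start, end) peaks into a sorted list."""
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

The same function applies to peaks pooled from several replicates, because the input order does not matter.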

Type the command in the terminal to merge SEACR peaks. Generate a correlation plot and run the command for principal component analysis in the terminal. Then, create a Venn diagram comparing different methods using the appropriate command.
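The numbers behind a two-set Venn diagram of peaks come from counting overlaps. The sketch below is one simple way to compute them, assuming a peak counts as shared if it overlaps any peak in the other set by at least one base. The function name venn_counts is hypothetical, and the pipeline may define overlap differently.

```python
def venn_counts(peaks_a, peaks_b):
    """Count peaks unique to each set and peaks overlapping the other set.

    Shared counts are reported per set because one peak in A can overlap
    several peaks in B, so the two shared numbers need not be equal.
    """
    def overlaps_any(peak, others):
        chrom, start, end = peak
        return any(c == chrom and s < end and start < e for c, s, e in others)

    shared_a = sum(overlaps_any(p, peaks_b) for p in peaks_a)
    shared_b = sum(overlaps_any(p, peaks_a) for p in peaks_b)
    return {
        "only_a": len(peaks_a) - shared_a,
        "only_b": len(peaks_b) - shared_b,
        "shared_a": shared_a,
        "shared_b": shared_b,
    }
```

This version compares every peak against every other peak, which is adequate for small examples but slow for genome-wide peak sets, where a sorted sweep or an interval tree is a better fit.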

Enter the command to generate a Venn diagram for replicates. Next, generate a focused peak-center heat map. Finally, generate a whole peak-center heat map.
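Heat maps centered on peaks generally start from a fixed-width window around each peak midpoint. The sketch below shows that windowing step only. The function name peak_center_windows and the 1,000 bp flank are assumptions, and the sketch does not distinguish between the focused and whole variants mentioned in the narration.

```python
def peak_center_windows(peaks, flank=1000):
    """Return (chrom, start, end) windows of +/- flank bp around each peak center."""
    windows = []
    for chrom, start, end in peaks:
        center = (start + end) // 2
        # Clamp the left edge at zero so windows never start before the chromosome.
        windows.append((chrom, max(0, center - flank), center + flank))
    return windows
```

Signal values collected across these windows, one row per peak, form the matrix that a heat map displays.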

Keywords: CUT&RUN, Next-generation Sequencing, Cancer Initiation, Repetitive DNA Elements, Epigenetic Factors, Chromatin Changes, Transcription Factors, Bioinformatics, Chronic Double-stranded RNA, Immunotherapy