Performing structure-function analysis on repetitive and disordered transcription factors is difficult. By coupling transcriptomics with the right cellular context, this approach better uncovers important structure-function relationships. Using RNA sequencing as a functional output allows effective assessment of all of the genes regulated by a single protein in one experiment, making detection of partial function more likely.
Partial function detection is particularly important for oncogenic fusion transcription factors, as we don't know how these proteins function and their sequencing can lead to better therapies in fusion-driven cancers. Although we will focus on the disordered EWS domain in EWS/FLI, EWS is involved in other fusions that contain disordered domains with poorly defined functions. For cDNA expression construct transduction, first quickly thaw the frozen virus with cDNA constructs in a 37 degrees Celsius water bath and gently mix 2.5 microliters of eight milligrams per milliliter polybrene into each vial.
Next, remove the medium from one 50 to 70% confluent 10 centimeter cell culture plate per viral construct and gently pipette the entire two milliliter volume of construct down the side of the plate. Rock the plate to spread the virus evenly across the cells and place the plate in a 37 degrees Celsius tissue culture incubator for two hours, rocking the plate every 30 minutes to prevent any areas of the plate from drying out. At the end of the incubation, add five milliliters of medium supplemented with fetal bovine serum, antibiotics, sodium pyruvate, and polybrene.
After overnight incubation in the cell culture incubator, replace the supernatant with selection medium and return the cells to the cell culture incubator for an additional seven to 10 days to allow for selection and cDNA construct expression. At the end of the selection period, collect the cells into a 15 milliliter conical tube for counting and allocate five to 10 times 10 to the fifth cells into a new tube for RNA sequencing and two times 10 to the sixth cells into another new tube for protein extraction. Sediment the cells by centrifugation and resuspend the pellets in one milliliter of cold PBS for a five minute centrifugation.
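As a rough aid for the allocation step above, the volume of cell suspension to transfer into each tube follows directly from the cell count. The sketch below is only an illustration: the density and total volume are hypothetical numbers, and the function name is not part of the protocol.

```python
def aliquot_volume_ml(target_cells, cells_per_ml):
    """Return the volume of suspension (in mL) that contains target_cells."""
    if cells_per_ml <= 0:
        raise ValueError("cells_per_ml must be positive")
    return target_cells / cells_per_ml


# Hypothetical count: 1.5e6 cells per mL in a 4 mL suspension (assumed values).
density = 1.5e6
total_ml = 4.0

rna_ml = aliquot_volume_ml(5e5, density)      # lower end of the RNA sequencing range
protein_ml = aliquot_volume_ml(2e6, density)  # protein extraction aliquot

# Check that the plate yielded enough cells for both aliquots.
enough_cells = rna_ml + protein_ml <= total_ml

print(f"RNA-seq tube: {rna_ml:.2f} mL, protein tube: {protein_ml:.2f} mL, "
      f"enough cells: {enough_cells}")
```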
Then flash freeze both pellets in liquid nitrogen and store the cells at minus 80 degrees Celsius. To validate the knockdown of proteins of interest and the expression of the panel of constructs, blot the protein lysate samples with the appropriate primary and secondary antibodies according to standard Western blot analysis protocols. To assess RNA quality and quantity, use the lysis buffer from a silica spin column based extraction kit to lyse the RNA sequencing cell samples, apply the lysates to a genomic DNA removal column, and centrifuge at greater than 13,000 revolutions per minute for 30 to 60 seconds.
Next, proceed with silica spin column purification and wash the RNA on the column according to kit instructions. Then elute the RNA in 30 microliters of elution buffer and analyze at least 2.5 micrograms of RNA on a spectrophotometer, using the 260 to 280 nanometer absorbance ratio to assess the RNA quantity and sample quality. For fastq file analysis, use PuTTY to open a terminal to the high performance computing environment and create an analysis directory called project.
Navigate to the path/to/project directory and create a directory for the compressed raw fastq.gz files called fastq and a second directory called trimmed. Use an appropriate secure file transfer program to transfer the compressed raw fastq.gz files from local storage to the path/to/project/fastq directory and check that there is an R1 and an R2 file for each sample. Navigate to path/to/project/fastq and use the Trim Galore command as indicated to trim the low quality reads from the fastq.gz files.
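The pairing check and the trimming call described above can be sketched as follows. This is a minimal illustration, not the command used in the protocol, which is not given in this transcript. The file naming pattern (sample_R1.fastq.gz and sample_R2.fastq.gz) and both function names are assumptions; `--paired` and `--output_dir` are standard Trim Galore options.

```python
from pathlib import Path


def paired_fastqs(fastq_dir):
    """Map each sample name to its (R1, R2) files; fail if a mate is missing.

    Assumes files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
    """
    fastq_dir = Path(fastq_dir)
    pairs = {}
    for r1 in sorted(fastq_dir.glob("*_R1.fastq.gz")):
        sample = r1.name[: -len("_R1.fastq.gz")]
        r2 = fastq_dir / f"{sample}_R2.fastq.gz"
        if not r2.exists():
            raise FileNotFoundError(f"missing R2 file for sample {sample}")
        pairs[sample] = (r1, r2)
    # Also catch R2 files that have no matching R1 file.
    for r2 in fastq_dir.glob("*_R2.fastq.gz"):
        sample = r2.name[: -len("_R2.fastq.gz")]
        if sample not in pairs:
            raise FileNotFoundError(f"missing R1 file for sample {sample}")
    return pairs


def trim_galore_command(r1, r2, out_dir):
    """Build a paired-end Trim Galore call as an argument list."""
    return ["trim_galore", "--paired", "--output_dir", str(out_dir),
            str(r1), str(r2)]


# Hypothetical sample name, shown only to illustrate the resulting command.
print(" ".join(trim_galore_command("fastq/sampleA_R1.fastq.gz",
                                   "fastq/sampleA_R2.fastq.gz",
                                   "trimmed")))
```

The argument list can be passed to `subprocess.run` on a system where Trim Galore is installed; building it separately makes the command easy to inspect before any reads are processed.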
Navigate to the path/to/project directory and create a new directory called STAR output. Navigate to the path/to/project/trimmed directory and use the command as indicated to run STAR to align the trimmed fastq.gz files.
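The exact STAR command is not reproduced in this transcript, so the following is only a sketch of what a paired-end alignment call with per-gene counting might look like. The function name, thread count, genome index path, and sample file names are hypothetical; the STAR options themselves (`--runThreadN`, `--genomeDir`, `--readFilesIn`, `--readFilesCommand`, `--quantMode GeneCounts`, `--outSAMtype`, `--outFileNamePrefix`) are standard.

```python
def star_command(sample, r1, r2, genome_dir, out_dir, threads=8):
    """Build a STAR paired-end alignment call as an argument list.

    --quantMode GeneCounts makes STAR write <prefix>ReadsPerGene.out.tab,
    the per-gene count table used in the next step.
    """
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--readFilesIn", str(r1), str(r2),
        "--readFilesCommand", "zcat",          # inputs are gzip-compressed
        "--quantMode", "GeneCounts",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", f"{out_dir}/{sample}_",
    ]


# Hypothetical inputs; the file names follow Trim Galore's usual
# paired-end output pattern and should be adjusted to the actual files.
cmd = star_command(
    sample="sampleA",
    r1="sampleA_R1_val_1.fq.gz",
    r2="sampleA_R2_val_2.fq.gz",
    genome_dir="/path/to/star_index",   # assumed location of a prebuilt index
    out_dir="../STAR_output",           # assumed directory name
)
print(" ".join(cmd))
```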
Locate the required output for the next steps, which contains the counts per transcript, at the indicated location and use the command to read in each ReadsPerGene.out.tab file. For the first column, use only the characters before the period in the ENSEMBL gene ID column for ease of downstream processing. Then use the command to compile the counts of all the samples into the data frame called totcts and save this new table of raw count data as a tab-delimited text file.
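The protocol performs this compilation with its own commands, which are not shown in the transcript. As an illustration of the same idea, the sketch below reads STAR ReadsPerGene.out.tab files, drops the four summary rows that start with N_, trims each ENSEMBL ID at the period, and writes one tab-delimited table. The function names and the choice of the unstranded count column are assumptions; the correct column depends on the library preparation.

```python
import csv


def read_star_counts(path, column=1):
    """Read one ReadsPerGene.out.tab file into {gene_id: count}.

    column 1 holds unstranded counts; columns 2 and 3 hold the two
    stranded variants. Summary rows (N_unmapped, ...) are skipped.
    """
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("N_"):
                continue
            gene_id = row[0].split(".")[0]  # keep characters before the period
            # Sum in case two versioned IDs collapse to the same gene ID.
            counts[gene_id] = counts.get(gene_id, 0) + int(row[column])
    return counts


def compile_counts(sample_files, column=1):
    """Combine per-sample counts into (sample names, {gene: [counts]})."""
    per_sample = {s: read_star_counts(p, column) for s, p in sample_files.items()}
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    table = {g: [per_sample[s].get(g, 0) for s in samples] for g in genes}
    return samples, table


def write_counts(path, samples, table):
    """Save the compiled raw counts as a tab-delimited text file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["gene_id"] + samples)
        for gene, row in table.items():
            writer.writerow([gene] + row)
```

A call such as `compile_counts({"sampleA": "sampleA_ReadsPerGene.out.tab"})` (hypothetical file name) returns the sample order together with a table playing the role of totcts.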
To define the differential expression profile for each construct using DESeq2, input the experimental DESeq2 design and use the DESeqDataSetFromMatrix function to construct a DESeq dataset, estimate the size factors, and run DESeq2. To evaluate the quality of the analysis, use DESeq2 to extract the regularized log normalized counts. When extracting the results for each transcriptional profile from the DESeq2 results, perform pairwise comparisons in reference to either the knockdown condition or the baseline empty vector.
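DESeq2 is an R package and the protocol runs it directly. To illustrate what the size factor step does, here is a small sketch of median-of-ratios normalization, the method DESeq2 uses by default. It is not a substitute for DESeq2; the function names and the toy counts are made up for the example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors from {gene: [count per sample]}.

    For each gene, the geometric mean across samples serves as a reference;
    a sample's size factor is the median of its count-to-reference ratios.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has nonzero counts in every sample")
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each sample's counts by its size factor."""
    return {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}


# Toy data: the second sample was sequenced about twice as deeply.
toy = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 3]}
sf = size_factors(toy)
print([round(f, 3) for f in sf])
```

After dividing by these factors, genes g1 to g3 have equal normalized counts in both samples, which is the behavior the normalization is meant to achieve when the only difference is sequencing depth.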
Further amend these results with the HGNC gene symbols and extract the data from the DESeq2 data as a single file with the ENSEMBL gene ID, HGNC symbol, baseMean expression, and the differential expression data for all of the constructs, with log2 fold change and raw and adjusted P values. To assess successful batch normalization and inter-sample similarity, use the regularized log normalized counts to check the sample clustering with principal component analysis and sample-to-sample distance plots. Use the regularized log normalized counts to extract the 1,000 most variable genes into a matrix and use a heat map to perform an unsupervised hierarchical clustering of the samples based on these genes.
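Selecting the most variable genes amounts to ranking genes by their variance across samples and keeping the top of the list. The sketch below shows that ranking on a toy table; in the protocol the input would be the regularized log normalized counts. The function name and example values are illustrative assumptions.

```python
from statistics import variance


def most_variable_genes(expr, n=1000):
    """Return the n gene IDs with the highest sample variance.

    expr maps each gene ID to its expression values across samples
    (at least two samples are needed to compute a variance).
    """
    ranked = sorted(expr, key=lambda gene: variance(expr[gene]), reverse=True)
    return ranked[:n]


# Toy example with three genes and three samples.
toy_expr = {
    "geneA": [1.0, 1.0, 1.0],   # constant, variance 0
    "geneB": [1.0, 5.0, 9.0],   # variance 16
    "geneC": [2.0, 3.0, 4.0],   # variance 1
}
top_two = most_variable_genes(toy_expr, n=2)
print(top_two)
```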
To extract the clusters of interest from the dendrogram, decide at what level of the dendrogram the clusters of interest appear and set K equal to the number of clusters at that level. To determine which clusters are of interest, re-plot the heat map ordered by cluster and export the list of genes associated with each cluster in a table, then use an appropriate bioinformatic tool to identify the biological roles for the different clusters of genes identified and compare between the classes. In this representative analysis, an effective knockdown and rescue with the positive and negative constructs can be observed.
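Cutting a dendrogram into K clusters is equivalent to stopping the agglomerative merging once K clusters remain. The sketch below illustrates this with a deliberately simple, slow implementation; average linkage with Euclidean distance is an assumption and may differ from the settings of the heat map tool used in the protocol. All names and toy values are hypothetical, and for 1,000 genes a dedicated clustering library would be the practical choice.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cut_into_k(profiles, k):
    """Merge the closest clusters (average linkage) until k remain.

    profiles maps gene ID to a list of values; returns {gene: cluster number}.
    """
    clusters = [[gene] for gene in profiles]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(profiles[a], profiles[b])
                            for a in clusters[i] for b in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return {gene: number + 1
            for number, members in enumerate(clusters) for gene in members}


def genes_by_cluster(labels):
    """Invert {gene: cluster} into {cluster: [genes]} for export as a table."""
    table = {}
    for gene, cluster in labels.items():
        table.setdefault(cluster, []).append(gene)
    return table


toy_profiles = {"g1": [0.0, 0.0], "g2": [0.1, 0.0], "g3": [5.0, 5.0],
                "g4": [5.2, 5.0], "g5": [10.0, 0.0]}
labels = cut_into_k(toy_profiles, k=3)
print(genes_by_cluster(labels))
```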
Note that DAF-rescued cells failed to form colonies, suggesting impaired oncogenic transformation. Following completion of the replicate validation, phenotypic assays, and initial RNA sequencing data processing, gene counts can be obtained for all of the samples for batch normalization and analysis. DESeq2 without batch normalization can result in confounding batch effects, likely due to biological variability introduced by the passage of cells in culture and differences in the processing of each batch.
Following batch normalization, DESeq2 can be used to generate transcriptional profiles for the constructs of interest, relative to the baseline. Principal component analysis for these data suggests that the transcriptional profile of DAF is intermediate between wild-type EWS/FLI and Delta 22, confirming partial function. Moreover, hierarchical clustering of the 1,000 most variable genes across samples shows that DAF fails to repress EWS/FLI target genes and only partially retains gene activation activity.
ToppGene analysis suggests that the classes of genes that DAF activates are functionally distinct from those EWS/FLI-activated targets where DAF is non-functional. Interestingly, DAF is most able to rescue GGAA-microsatellite activated genes, but unable to rescue activated genes near a high-affinity site. Pairing the transcriptomic output with the relevant phenotypic assays completes the structure-function analysis.
Researchers can also use other techniques to study the mechanistic drivers of different transcriptional functions.