In this protocol, we describe a procedure containing two parts. First, a monolayer 2D keratinocyte differentiation method in vitro by contact inhibition. Second, its molecular characterization by RNA-seq.
The 2D in vitro differentiation method is straightforward and easy to perform. It can be used to study epidermal differentiation, especially when a large number of cells are necessary. For example, in epigenomic analysis.
The detailed RNA-seq analysis pipeline is clear and transparent for researchers with basic bioinformatics skills, and it can be used for any bulk RNA-seq analysis. When performing the differentiation protocol for the very first time, keep in mind that the seeding density can be tricky. Try out multiple seeding densities to find out which one works best with your cell line.
Begin by preparing keratinocyte growth medium or KGM and 500 milliliters each of proliferation medium and differentiation medium according to manuscript directions. Seed primary keratinocytes at a density of 5 to 20, 000 cells per centimeter squared and add enough proliferation medium to cover the cells. Two days after seeding, refresh the cells with the proliferation medium.
Check the cells regularly under the microscope and refresh medium every other day. When the cells reach 90%confluency, induce differentiation by changing the medium to the differentiation medium. To collect cells for further RNA analyses, wash the cells twice with DPBS and then add lysis buffer.
Collect cells on differentiation days zero, two, four, and seven. After RNA isolation, double-stranded cDNA will be converted and RNA-seq libraries will be prepared for next generation sequencing. Following the sequencing, the sequence reads will be downloaded and mapped to the human genome.
After downloading and installing R packages, generate account table from the readspergene.out. tab files and write a sample data file containing all filenames, the day of differentiation and other relevant sample data. Refer to the supplementary coding files for an example.
Then use the count table and sample data to generate a deseq2 object containing both the countable and sample data. For gene expression normalization, normalize the count table in the deseq2 object using either deseq2RLD or VST normalization. While RLD normalization is preferred, VST normalization is much faster.
Use the dist function in R to plot the sample distance based on the normalized read count intensities, then perform hclust clustering based on sample distance. Next, plot the heat map using pheatmap function. Finally, generate a principle component analysis or PCA plot of the normalized read count intensities using the plotPCA function of deseq2.
PCA serves as a tool in exploratory data analysis and can be used to visualize distance and connection between different samples. Perform highly variable gene expression analysis with the rowVars function which extracts the top 500 highly variable genes by ordering on the variance between samples from different time points. These genes have the highest standard deviation of their normalized intensity across days of differentiation.
Perform k-means clustering on the highly variable genes to cluster them by different expression patterns. Visualize the genes in a heat map with the pheatmap package. The intensity plotted in the heat map is the deseq2 normalized intensity with the median value subtracted.
Generate a list of expressed background genes that includes all genes with more than 10 counts in a single sample and make a list with genes from each cluster. Finally, perform Gene Ontology analysis using GOrilla. Use the background gene list as the background set and the list of the highly variable genes in clusters as the target set, then click on search enriched GO terms.
Alternatively, more advanced R users can use the package clusterProfiler to automate GO term enrichment analysis. Keratinocyte lines derived from five individuals were used for differentiation in RNA-seq analyses. Principal component analysis showed that keratinocytes undergoing differentiation had connected but distinct overall gene expression profiles.
Highly variable genes were clustered to visualize gene expression dynamics and patterns during differentiation. Each cluster of genes was represented by the keratinocyte differentiation hallmark genes and Gene Ontology annotation showed a clear difference in gene functions. In the second experiment, cell morphology and gene expression differences were compared between keratinocytes from healthy controls and cell lines derived from patients carrying P63 mutations.
Principal component analysis showed that the control cell lines clearly followed a differentiation pattern, but the gene expression pattern of mutant cells during differentiation stays largely similar to that of proliferating or undifferentiated cells. In the clustering analysis, genes in cluster one were downregulated in control cells and partially downregulated in patient cells carrying R204W and R279H mutations, but remained high in cells carrying R304W. These genes likely play roles in cell proliferation as shown by Gene Ontology analysis.
Cluster two genes were first introduced and subsequently downregulated in control cells. These genes are likely involved in keratinocyte differentiation as epidermal differentiation and keratinization functions were highly enriched for this cluster. Genes in cluster three were only induced at the end of differentiation in control cells which indicates that they may play a role in the outermost layer of the epidermis.
When analyzing RNA-seq data, it is important to know beforehand what to expect. This will help you both with interpreting the quality control and the PCA plot and in assessing if your results are making sense. Our methods can be used to study gene transcription both in epidermal biology as well as in other biological systems.