CorExplorer finds tumor-associated gene expression factors in a mathematically principled way and combines interactive visualizations with clinical and database information for biological discovery. CorEx performs information maximization in a way that requires relatively few samples to find clusters in high-dimensional data. The hierarchy of latent factors that it produces facilitates biological understanding.
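As a rough illustration of the information-theoretic quantity involved: CorEx stands for Total Correlation Explanation, and it looks for latent factors that account for as much total correlation among the variables as possible. The sketch below only computes empirical total correlation (the sum of per-variable entropies minus the joint entropy) on made-up binary data. It is not the CorEx optimization itself, and the function names and toy values are assumptions chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(observations):
    """Shannon entropy, in bits, of the empirical distribution of hashable items."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def total_correlation(samples):
    """Sum of single-variable entropies minus the joint entropy.

    samples: list of equal-length tuples, one tuple per sample.
    """
    columns = list(zip(*samples))
    return sum(entropy(col) for col in columns) - entropy(samples)


# Made-up toy data: two "genes" that always move together,
# and two that vary independently of each other.
coupled = [(0, 0), (1, 1), (0, 0), (1, 1)]
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(total_correlation(coupled))      # 1 bit of shared information
print(total_correlation(independent))  # no shared information
```

A group of genes with high total correlation is the kind of structure a single latent factor can summarize, which is why such groups show up as factors in the graph.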
This method has already proven interesting when applied to cancer biology. Its information-theoretic foundations make it applicable to any system with many variables and little prior knowledge about their relationships. Begin by navigating to the CorExplorer homepage.
Under Quick Links, click the plus (expand) button to see a summary of the CorEx factor graph that was trained on the cancer data of interest. For example, here, the factor graphs for TCGA ovarian cancer data are shown. After inspecting the factor graphs, click lung TCGA-LUAD to access the CorExplorer page for lung cancer RNA sequencing. Use the CorExplorer Factor Graph window to explore the CorEx factor graph for a gene of interest.
While moving the mouse cursor over the factor graph display window, zoom into the factor graph using the mouse trackpad to view the details of the graph, such as the most important genes in each factor and the connections between the nodes at different layers. To locate a target gene, click the Gene menu and type the gene name to select it in the drop-down list. Press Return on the keyboard to make the view zoom to the factor with which the gene of interest is most strongly correlated.
Reposition the mouse over the graph display and scroll to zoom out to see the higher-layer node and the neighboring factors connected to the factor most closely associated with the gene. Note that only genes with a weight greater than the threshold indicated on the min Link Weight slider are shown. To view all of the genes associated with the factor, click on the appropriate node and select Load Additional Genes in the pop-up window.
When Done appears, close the pop-up window. In the header section, click and drag the min Link Weight slider to 0.05 to allow the genes to appear in weight order. To identify associations with a biological function, in the Annotation window, uncheck the false discovery rate sort option so that the Factor drop-down menu is sorted by factor number rather than by false discovery rate.
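The effect of the min Link Weight slider can be pictured as a simple threshold-and-sort over gene weights. The sketch below is only an illustration of that idea, not CorExplorer's code: the function name `visible_genes`, the gene labels, and the weights are all hypothetical.

```python
def visible_genes(gene_weights, min_link_weight):
    """Return (gene, weight) pairs above the threshold, heaviest first.

    gene_weights: dict mapping gene name to its weight in one factor.
    """
    kept = [(gene, w) for gene, w in gene_weights.items() if w > min_link_weight]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights for one factor (not real CorExplorer values).
factor_weights = {"GENE_A": 0.62, "GENE_B": 0.31, "GENE_C": 0.07, "GENE_D": 0.02}

print(visible_genes(factor_weights, 0.5))   # only the strongest gene remains
print(visible_genes(factor_weights, 0.05))  # three genes, in weight order
```

Lowering the threshold reveals more of the genes in a factor, while raising it leaves only those most strongly tied to the factor.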
Scroll and click to select the factor of interest in the Annotation window drop-down menu to reveal the enrichment annotations for the factor. Then click an enrichment annotation to immediately view the associated genes, highlighted in yellow on the graph display. Note that factors disappear or appear as different GO terms are selected, according to whether or not they are enriched for genes with the selected annotation.
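For readers unfamiliar with enrichment statistics, the following sketch shows one common way such values are obtained: a one-sided hypergeometric test for the overlap between a factor's genes and an annotation's genes, followed by a Benjamini-Hochberg adjustment to obtain false discovery rates. The narration does not say which test CorExplorer uses, so this is an assumption, and all function names and counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_factor, n_overlap):
    """P(overlap >= n_overlap) when n_factor genes are drawn at random from
    n_background genes, of which n_annotated carry the annotation."""
    upper = min(n_annotated, n_factor)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_factor - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_background, n_factor)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 10 background genes, 3 annotated, a 3-gene factor
# in which all 3 genes carry the annotation.
p = enrichment_p_value(10, 3, 3, 3)
print(p)  # 1 / 120

# Hypothetical raw p-values for three annotations tested on one factor.
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

Under this reading, a factor is shown for a selected GO term only if its adjusted value for that term is small enough.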
To filter factors of interest using survival and cluster quality, from the Dataset drop-down menu, select TCGA_OVCA to go to the CorExplorer page for the TCGA ovarian cancer RNA sequencing. In the Survival window, note the factor with the largest survival differential, and select this factor from the Factor drop-down menu in the Factor Graph window. Click and drag the Link Weight slider to 0.5 and note the number of genes in the factor.
Expand the list of factors in the Survival window and click on the next best factor in the drop-down menu to view its associated survival curves. The significant GO and KEGG annotations will be shown. To gain a better understanding of the biological role of genes in this factor, select the Factor layer at the top of the Factor Graph window and move the mouse over the window while zooming out to reveal the entire cluster and associated factors.
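Survival curves of the kind shown in the Survival window are commonly drawn with the Kaplan-Meier estimator, computed separately for each patient group. The narration does not state which estimator CorExplorer uses, so the sketch below is an assumption, and the follow-up times and event flags are made up.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up duration for each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival_probability) at each observed death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up follow-up data for one patient group: the second patient is censored.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Computing one such curve per patient group and comparing them gives the kind of survival differential used to rank factors in the Survival window.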
To understand the relative significance of the factors linked to the cluster node, uncheck Sort by p-val in the Survival window and click on each of the factor numbers in succession to view them, noting the factors that display a survival association. In the Add Window menu, select PPI and click Add to add a PPI graph window to the display area. In the PPI graph window, select a factor layer of interest to display the significant protein-protein interactions.
Click the View at StringDB link to connect to the STRINGdb online database and click Continue. Then open the Analysis tab to obtain an online GO analysis for the PPI network genes. The top cellular component will be displayed.
Return to the CorExplorer tab and PPI window and select another factor. Click the View at StringDB link again. A different top cellular component will be displayed.
Then, in the PPI window, select another factor for a STRING database analysis. To find commonalities and differences in gene expression variation across tumor types, click on the CorExplorer heading to return to the front page and click Search to access a page that searches all of the datasets on the CorExplorer site. In the Gene Search box, enter a gene name of interest and click Search.
For example, as demonstrated, FLT1 is found with a relatively high weight in multiple different factors. Searching for the BRCA1 gene in the lung cancer dataset reveals the gene to be most strongly associated with CorEx factor 26. The GO term enrichment for this factor is extremely strong, with DNA repair exhibiting a false discovery rate of only 1 × 10⁻¹⁹.
The selection also draws attention to the second-level cluster L2_8, which has six closely related factors as children. The DNA repair protein-protein interaction network is strongly connected, further supporting the tightly linked functionality of the genes in factor 26. The associated survival graphs suggest a possible association with patient survival that would have to be confirmed in a larger dataset.
Starting with the survival assessment allows dissection of the factors, and thus the particular gene expression groups, that correlate with improved survival. Adding a Protein-Protein Interaction window for each factor in turn helps identify possible explanations for their associations with survival. It is important to check the heat maps for each factor to confirm that the gene expression pattern is of adequate quality to support biological interpretations.
Heat maps that show strong, clear variations in gene expression patterns may exhibit either coordinated expression of the factor genes, ranging from high to low, or more complex patterns, in which low expression of some genes is correlated with high expression of others. The overarching goal of this procedure is to guide personalized therapy by mapping an out-of-sample tumor onto the CorExplorer factors to identify potential tumor-specific therapies.