Method Article
Here, we present a protocol to explore biomarkers and survival predictors of breast cancer through comprehensive analysis of pooled clinical datasets derived from a variety of publicly accessible databases, proceeding step by step through expression, correlation, and survival analysis.
In recent years, emerging databases have been designed to lower the barriers to working with intricate cancer genomic datasets, thereby helping investigators analyze and interpret genes, samples, and clinical data across different types of cancer. Herein, we describe a practical operating procedure, taking ID1 (Inhibitor of DNA binding proteins 1) as an example, to characterize the expression patterns of biomarkers and survival predictors of breast cancer based on pooled clinical datasets derived from online databases, including ONCOMINE, bcGenExMiner v4.0 (Breast cancer gene-expression miner v4.0), GOBO (Gene expression-based Outcome for Breast cancer Online), HPA (The Human Protein Atlas), and Kaplan-Meier plotter. The analysis begins with querying the expression pattern of the gene of interest (e.g., ID1) in cancerous samples vs. normal samples. Then, correlation analysis between ID1 and clinicopathological characteristics in breast cancer is performed. Next, the expression profiles of ID1 are stratified according to different subgroups. Finally, the association between ID1 expression and survival outcome is analyzed. The procedure simplifies the integration of multidimensional data types at the gene level from different databases and the testing of hypotheses regarding recurrence and the genomic context of gene alteration events in breast cancer. This method can improve the credibility and representativeness of the conclusions, thereby presenting an informative perspective on a gene of interest.
Breast cancer is a heterogeneous disease whose prognosis and treatment strategies differ across molecular subtypes, and whose pathogenesis and development are probably associated with disparate molecular mechanisms1,2,3. However, identifying a therapeutic target usually takes years, or even decades, from initial discovery in basic research to clinical use4. Genome-wide application of high-throughput sequencing technology to cancer genomes has greatly advanced the search for valuable biomarkers and therapeutic targets5.
The overwhelming amount of cancer genomics data generated by large-scale cancer genomics platforms, such as the ICGC (International Cancer Genome Consortium) and TCGA (The Cancer Genome Atlas), poses a great challenge for researchers performing data exploration, integration, and analytics, particularly for users lacking intensive training in informatics and computation6,7,8,9,10. In recent years, emerging databases (e.g., ONCOMINE, bcGenExMiner v4.0, and Kaplan-Meier plotter) have been designed and developed to lower the barriers to working with intricate cancer genomic datasets, thereby helping investigators analyze and interpret genes, samples, and clinical data across various types of cancer11. The goal of this protocol is to describe a research strategy that integrates multiple levels of gene information from a series of open-access databases, which are widely used by researchers, to identify potential biomarkers and prognostic factors for breast cancer.
The ONCOMINE database is a web-based data-mining platform with cancer microarray information, designed to facilitate the discovery of novel biomarkers and therapeutic targets11. Currently, there are more than 48 million gene expression measurements from 65 gene expression datasets in this database11,12. The bcGenExMiner v4.0 (a free tool for non-profit institutions), also called breast cancer Gene-Expression Miner, is a user-friendly web-based application comprising DNA microarray results from 3,414 recovered breast cancer patients and 1,209 patients who experienced a pejorative event13. It is designed to improve gene prognostic analysis performance using R statistical software and packages.
The GOBO is a multifunctional, user-friendly online tool with microarray information (e.g., Affymetrix U133A) from a 51-sample breast cancer cell line set and a 1,881-sample breast tumor data set, which allows a wide array of analyses14. A variety of applications are available in the GOBO database, including rapid analysis of gene expression profiles in different molecular subtypes of breast tumors and cell lines, screening for co-expressed genes to create potential metagenes, and correlation analysis between outcome and the expression levels of single genes, sets of genes, or gene signatures in the breast cancer data set15.
The Human Protein Atlas is an open-access program designed for scientists to explore the human proteome, and it has already contributed to a large number of publications in the field of human biology and disease. The Human Protein Atlas is recognized as a European core resource for the life science community16,17.
The Kaplan-Meier plotter is an online tool that integrates gene expression and clinical data, allowing assessment of the prognostic effect of 54,675 genes based on 10,461 cancer samples, which include 1,065 gastric, 2,437 lung, 1,816 ovarian, and 5,143 breast cancer patients with mean follow-ups of 33, 49, 40, and 69 months, respectively18. Data on gene expression, relapse-free survival (RFS), and overall survival (OS) are downloadable from this database19,20.
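To make the kind of survival assessment described above more concrete, the following is a minimal sketch of the Kaplan-Meier (product-limit) estimator in plain Python. It is not the Kaplan-Meier plotter's implementation. The function names, the toy cohort, and the choice of a median expression cutoff are all assumptions made for illustration only.

```python
from collections import Counter
from statistics import median


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times  -- follow-up time per patient (e.g., months)
    events -- 1 if the event (e.g., relapse) was observed, 0 if censored
    """
    observed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # each patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median (assumed cutoff)."""
    cutoff = median(expression)
    return ["high" if x > cutoff else "low" for x in expression]


# Hypothetical toy cohort: the numbers are invented, not taken from any database.
times = [5, 8, 12, 12, 20, 30]
events = [1, 0, 1, 1, 0, 1]
expression = [2.1, 3.5, 1.0, 4.2, 0.8, 3.9]

curve = kaplan_meier(times, events)
labels = split_by_median(expression)

# One curve per expression group, as a prognostic comparison would require.
group_curves = {}
for group in ("high", "low"):
    idx = [i for i, lab in enumerate(labels) if lab == group]
    group_curves[group] = kaplan_meier([times[i] for i in idx],
                                       [events[i] for i in idx])
```

For the pooled toy cohort, the estimated survival falls to 5/6 at month 5, to 5/12 at month 12, and to 0 at month 30. A real analysis would additionally compare the two group curves with a significance test such as the log-rank test, which this sketch omits.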
Here, we describe a practical procedure for using multiple publicly accessible databases to compare, analyze, and visualize patterns of alteration in the expression of a gene of interest across multiple cancer studies, with the goal of summarizing its expression profiles, prognostic value, and potential biological functions in breast cancer. For example, recent studies have indicated that ID proteins have oncogenic properties in tumors and are associated with malignant features, including cellular transformation, immortalization, enhanced proliferation, and metastasis21,22,23. However, each member of the ID family plays a distinct role in different types of solid tumors, and their roles in breast cancer remain unclear24. In a previous study that used this method, we found that ID1 was a meaningful prognostic indicator in breast cancer25. Therefore, the protocol takes ID1 as an example to introduce the data mining methods.
The analysis starts with querying the expression pattern of the gene of interest in cancerous samples vs. normal samples in ONCOMINE. Then, expression correlation analysis of the gene of interest in breast cancer is performed using bc-GenExMiner v4.0, GOBO, and ONCOMINE. Next, the expression profiles of ID1 are stratified according to different subgroups using the same three databases. Finally, the association between ID1 expression and survival outcome is analyzed using bc-GenExMiner v4.0, the Human Protein Atlas, and Kaplan-Meier plotter. The procedure is shown as a flowchart in Figure 1.
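The first three steps of this workflow (tumor vs. normal comparison, correlation, and subgroup stratification) can be sketched on local data as follows. This is only an illustration of the underlying calculations, not what the web databases run internally. The choice of a log2 mean difference, Welch's t statistic, and Pearson's r is an assumption, and all function names and values are hypothetical.

```python
import math
import statistics


def log2_mean_difference(tumor, normal):
    """Mean tumor expression minus mean normal expression (inputs already log2-scaled)."""
    return statistics.mean(tumor) - statistics.mean(normal)


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def group_means(values, labels):
    """Mean expression per subgroup label (e.g., a molecular subtype)."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: statistics.mean(vals) for label, vals in groups.items()}


# Invented log2 expression values; they do not reflect real ID1 measurements.
tumor = [6.1, 5.8, 6.4, 5.9]
normal = [7.2, 7.5, 7.0, 7.3]

difference = log2_mean_difference(tumor, normal)   # step 1: effect size
t_stat = welch_t(tumor, normal)                    # step 1: test statistic
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])          # step 2: correlation
by_subtype = group_means([1.0, 3.0, 5.0], ["A", "A", "B"])  # step 3: subgroups
```

Converting the t statistic into a p-value requires the t distribution, which the standard library does not provide; a statistics package would normally handle that step.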
1. Expression Pattern Analysis
2. Expression Correlation Analysis
3. Subgroup Analysis
4. Survival Analysis
Representative results of data mining and integrative analysis of a breast cancer biomarker were obtained using ID1, a member of the inhibitor of DNA-binding family, as reported in a previous study25.
As demonstrated in Figure 2, differences in ID1 mRNA expression between tumor and normal tissues in multiple types of cancer were analyzed using the ONCOM...
Comprehensive analysis of public databases may indicate the underlying function of the gene of interest and reveal the potential link between this gene and clinicopathological parameters in a specific cancer27,31. Exploration and analysis based on a single database might provide limited or isolated perspectives due to potential selection bias or, to a certain extent, due to variation in data quality, including data collection and the analytical...
The authors have nothing to disclose.
This work was partly supported by the Natural Science Foundation of Guangdong Province, China (No. 2018A030313562), the Teaching Reform Project of Guangdong Clinical Teaching Base (No. 2016JDB092), the National Natural Science Foundation of China (No. 81600358), and the Youth Innovative Talent Project of Colleges and Universities in Guangdong Province, China (No. 2017KQNCX073).
Name | Company | Catalog Number | Comments |
A personal computer or computing device with an Internet browser with JavaScript enabled | Microsoft | 051690762553 | We support and test the following browsers: Google Chrome, Firefox 3.0 and above, Safari, and Internet Explorer 9.0 and above |
Adobe Flash player | Adobe Systems Inc. | It can be freely downloaded from http://get.adobe.com/flashplayer/. | This browser plug-in is required for visualizing networks on the network analysis tab. |
Chrome Browser | Google Inc. | It can be freely downloaded from https://www.google.cn/chrome/ | This is necessary for viewing PDF files including the Pathology Reports and many of the downloadable files. |
Java Runtime Environment | Oracle Corporation | It can be downloaded from http://www.java.com/getjava/. | |
Office 365 ProPlus for Faculty | Microsoft | 2003BFFD8117EA68 | This is necessary for viewing the Pathology Reports and for viewing many of the downloadable files. |
Vectr Online | Vectr Labs Inc. | It can be freely used from https://vectr.com/new | This is necessary for visualizing and editing many of the downloadable files and pictures. |
This article has been published
Copyright © 2025 MyJoVE Corporation. All rights reserved