JoVE Logo

In This Article

  • Summary
  • Abstract
  • Introduction
  • Protocol
  • Results
  • Discussion
  • Disclosures
  • Acknowledgements
  • Materials
  • References
  • Reprints and Permissions

Summary

Here, we present a protocol to explore biomarkers and survival predictors of breast cancer through a comprehensive analysis of pooled clinical datasets from several publicly accessible databases, using a stepwise strategy of expression, correlation, and survival analyses.

Abstract

In recent years, emerging databases have been designed to lower the barriers to approaching intricate cancer genomic datasets, thereby facilitating the analysis and interpretation of genes, samples, and clinical data across different types of cancer. Herein, we describe a practical operating procedure, taking ID1 (Inhibitor of DNA binding 1) as an example, to characterize biomarker expression patterns and survival predictors of breast cancer based on pooled clinical datasets derived from online accessible databases, including ONCOMINE, bc-GenExMiner v4.0 (Breast cancer gene-expression miner v4.0), GOBO (Gene expression-based Outcome for Breast cancer Online), HPA (The Human Protein Atlas), and the Kaplan-Meier plotter. The analysis begins by querying the expression pattern of the gene of interest (e.g., ID1) in cancerous vs. normal samples. Next, the correlation between ID1 and clinicopathological characteristics in breast cancer is analyzed. The expression profiles of ID1 are then stratified by subgroup. Finally, the association between ID1 expression and survival outcome is analyzed. This procedure simplifies the integration of multidimensional gene-level data from different databases and the testing of hypotheses regarding the recurrence and genomic context of gene alteration events in breast cancer. The method can improve the credibility and representativeness of the conclusions, thereby providing an informative perspective on a gene of interest.

Introduction

Breast cancer is a heterogeneous disease whose prognosis and treatment strategies differ across molecular subtypes, and whose pathogenesis and progression are probably associated with distinct molecular mechanisms1,2,3. However, bringing a therapeutic target from initial discovery in basic research to clinical use usually takes years, or even decades4. Genome-wide application of high-throughput sequencing to cancer genomes has greatly accelerated the search for valuable biomarkers and therapeutic targets5.

The overwhelming amount of cancer genomics data generated by large-scale platforms such as the ICGC (International Cancer Genome Consortium) and TCGA (The Cancer Genome Atlas) poses a great challenge for data exploration, integration, and analysis, particularly for users without intensive training in informatics and computation6,7,8,9,10. In recent years, emerging databases (e.g., ONCOMINE, bc-GenExMiner v4.0, and the Kaplan-Meier plotter) have been designed and developed to lower the bar for approaching intricate cancer genomic datasets, thereby facilitating the analysis and interpretation of genes, samples, and clinical data across various types of cancer11. The goal of this protocol is to describe a research strategy that integrates multiple levels of gene information from a series of open-access databases, each widely recognized by researchers, to identify potential biomarkers and prognostic factors for breast cancer.

The ONCOMINE database is a web-based data-mining platform built on cancer microarray data and designed to facilitate the discovery of novel biomarkers and therapeutic targets11. It currently contains more than 48 million gene expression measurements from 65 gene expression datasets11,12. bc-GenExMiner v4.0 (Breast cancer Gene-Expression Miner v4.0; free for non-profit institutions) is a user-friendly web-based application comprising DNA microarray results from 3,414 breast cancer patients who recovered and 1,209 who experienced a pejorative event13. It uses R statistical software and packages to improve the performance of gene prognostic analyses.

GOBO is a multifunctional, user-friendly online tool built on microarray data (e.g., Affymetrix U133A) from a 51-sample breast cancer cell line set and an 1,881-sample breast tumor dataset, and it supports a wide array of analyses14. Its applications include rapid analysis of gene expression profiles across molecular subtypes of breast tumors and cell lines, screening for co-expressed genes to create potential metagenes, and correlation analysis between outcome and the expression levels of single genes, gene sets, or gene signatures in breast cancer datasets15.
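GOBO's own documentation defines how its metagenes are computed, and the sketch here does not reproduce it. As a rough illustration of the general idea only, a metagene can be summarized as the per-sample average of z-scored expression values across a set of co-expressed genes. The gene names, values, and the dictionary input layout are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one gene's expression values across samples."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A constant gene carries no information; treat it as all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def metagene(expression):
    """Average the per-gene z-scores sample by sample.

    expression: dict mapping a gene symbol to a list of values, one per
    sample, with every list in the same sample order (hypothetical layout).
    """
    scaled = [zscore(vals) for vals in expression.values()]
    n_samples = len(scaled[0])
    return [mean(gene[i] for gene in scaled) for i in range(n_samples)]


# Hypothetical toy data: three co-expressed genes measured in four samples.
toy = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [5.0, 5.5, 6.0, 6.5],
}
score = metagene(toy)
print([round(x, 3) for x in score])
```

Standardizing first keeps a highly expressed gene from dominating the average simply because of its scale.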

The Human Protein Atlas is an open-access program designed for scientists to explore the human proteome, and it has already contributed to a large number of publications on human biology and disease. It is recognized as a European core resource for the life science community16,17.

The Kaplan-Meier plotter is an online tool that integrates gene expression and clinical data to assess the prognostic effect of 54,675 genes across 10,461 cancer samples, including 1,065 gastric, 2,437 lung, 1,816 ovarian, and 5,143 breast cancer patients with mean follow-up times of 33, 49, 40, and 69 months, respectively18. Gene expression, relapse-free survival (RFS), and overall survival (OS) data can be downloaded from this database19,20.
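Survival curves such as those produced by the Kaplan-Meier plotter are Kaplan-Meier (product-limit) estimates. A minimal sketch of the estimator, assuming downloaded survival data have been reduced to one follow-up time and one event flag per patient (1 = event such as relapse, 0 = censored). This input layout and the six-patient toy cohort are hypothetical and are not the tool's export format.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time per patient (e.g., months)
    events: 1 if the event (e.g., relapse) occurred, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical toy cohort of six patients.
curve = kaplan_meier([5, 8, 8, 12, 15, 20], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

Each step multiplies the running survival probability by the fraction of at-risk patients who did not have an event at that time, so censored patients shrink the at-risk count without lowering the curve.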

Here, we describe a practical operating procedure that uses multiple publicly accessible databases to compare, analyze, and visualize alterations in the expression of a gene of interest across multiple cancer studies, with the goal of summarizing its expression profile, prognostic value, and potential biological functions in breast cancer. For example, recent studies have indicated that ID proteins have oncogenic properties in tumors and are associated with malignant features, including cellular transformation, immortalization, enhanced proliferation, and metastasis21,22,23. However, each member of the ID family plays distinct roles in different types of solid tumors, and their roles in breast cancer remain unclear24. In a previous study using this method, we found that ID1 was a meaningful prognostic indicator in breast cancer25. Therefore, this protocol takes ID1 as an example to introduce the data mining methods.

The analysis starts by querying the expression pattern of the gene of interest in cancerous vs. normal samples in ONCOMINE. Next, expression correlation analysis of the gene of interest in breast cancer is performed using bc-GenExMiner v4.0, GOBO, and ONCOMINE. The expression profiles of ID1 are then stratified by subgroup using the same three databases. Finally, the association between ID1 expression and survival outcome is analyzed using bc-GenExMiner v4.0, The Human Protein Atlas, and the Kaplan-Meier plotter. The procedure is summarized in the flowchart in Figure 1.

Protocol

1. Expression Pattern Analysis

  1. Go to the ONCOMINE web interface26.
  2. Obtain the relative expression levels of ID1 in various types of malignancies by typing ID1 into the Search box.
  3. Select Analysis Type from the Primary Filters menu. Then select Cancer vs. Normal Analysis and then Breast Cancer vs. Normal Analysis.
  4. Select Gene Summary View from the OTHER VIEWS menu. Set the P-value threshold to 0.01. Download the figures.
    NOTE: The fold-change threshold is 2, as described in a previous study27.
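The two thresholds in step 1 can be written down explicitly when tabulating results outside ONCOMINE. A minimal sketch, assuming each cancer vs. normal comparison has been recorded by hand with its fold change expressed as a linear tumor/normal ratio, so that under-expression at the same threshold means a ratio of 0.5 or less. The Comparison record, the classify helper, and the study names are hypothetical, and ONCOMINE's own filtering may treat the threshold boundaries differently.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One cancer vs. normal comparison (hypothetical record layout)."""
    dataset: str
    fold_change: float  # assumed ratio: mean tumor / mean normal (linear scale)
    p_value: float


def classify(comp, p_threshold=0.01, fc_threshold=2.0):
    """Label a comparison 'up', 'down', or None if it misses the thresholds."""
    if comp.p_value > p_threshold:
        return None
    if comp.fold_change >= fc_threshold:
        return "up"
    if comp.fold_change <= 1.0 / fc_threshold:
        return "down"
    return None


comparisons = [
    Comparison("study_1", 3.1, 0.001),   # passes both thresholds, higher in tumor
    Comparison("study_2", 0.4, 0.005),   # passes both thresholds, lower in tumor
    Comparison("study_3", 2.5, 0.03),    # fails the P-value threshold
    Comparison("study_4", 1.4, 0.0001),  # fails the fold-change threshold
]
labels = {c.dataset: classify(c) for c in comparisons}
print(labels)
```

Requiring both criteria avoids counting comparisons that are statistically significant but biologically small, or large but poorly supported.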

2. Expression Correlation Analysis

  1. Go to the bc-GenExMiner v4.0 web interface28.
  2. Select CORRELATION from the ANALYSIS menu and press the EXHAUSTIVE button. Type ID1 into the search box. Press the Submit button and then the Start analysis button.
    NOTE: By default, the expression correlation analysis includes all patients; more accurate results for individual breast cancer subtypes can be obtained with the Molecule subtype filter.
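Correlation between the expression of two genes is commonly measured with Pearson's r. A minimal sketch, assuming Pearson's r (the coefficient that bc-GenExMiner actually reports should be checked in its documentation), with hypothetical toy values chosen to show why the subtype filter in the NOTE can matter: opposite correlations in two subtypes can cancel out in the pooled cohort. The helper names and the partner gene are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with >= 2 samples")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pearson_by_subtype(x, y, subtypes):
    """Compute r separately within each subtype label."""
    result = {}
    for label in sorted(set(subtypes)):
        idx = [i for i, s in enumerate(subtypes) if s == label]
        result[label] = pearson_r([x[i] for i in idx], [y[i] for i in idx])
    return result


# Hypothetical values: ID1 and a partner gene in six tumors of two subtypes.
id1 = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
partner = [2.0, 4.0, 6.0, 6.0, 4.0, 2.0]
subtypes = ["LumA", "LumA", "LumA", "Basal", "Basal", "Basal"]

overall = pearson_r(id1, partner)
per_subtype = pearson_by_subtype(id1, partner, subtypes)
print(overall, per_subtype)  # 0.0 {'Basal': -1.0, 'LumA': 1.0}
```

In this toy case the pooled r is zero even though the two genes are perfectly correlated, in opposite directions, within each subtype.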

3. Subgroup Analysis

  1. Subgroup analysis in bc-GenExMiner v4.0
    1. Go to the bc-GenExMiner v4.0 web interface28.
    2. Select EXPRESSION from the ANALYSIS menu and press the EXHAUSTIVE button. Type ID1 into the search box, then press the Submit button and the Start analysis button.
    3. Click the Nodal status (LN) and Scarff Bloom & Richardson grade status (SBR) thumbnails to view the full images. In the SBR images, press the button below the figure to display the P-values. Download the figures.
  2. Subgroup analysis in Gene expression-based Outcome for Breast Cancer Online (GOBO)
    1. Go to the GOBO web interface14.
    2. Enter the gene symbol of interest (ID1) in the input box to upload the gene set.
    3. Set Define gene/probe identifiers to Gene Symbol and set Tumor selection to All. Select Node status and Grade stratified under Multivariate parameters, and leave the other items at their defaults. Submit the query and download the figures.
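The subgroup comparisons in step 3 amount to grouping expression values by a clinicopathological label and testing for a difference between groups. A minimal sketch, assuming a Welch two-sample t statistic for a two-level variable such as nodal status; the web tools may use different tests, and the helper names and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def stratify(values, labels):
    """Group expression values by a clinicopathological label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return groups


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical ID1 expression values and nodal status for eight tumors.
expression = [5.1, 5.4, 4.9, 5.6, 6.2, 6.8, 6.5, 7.0]
nodal = ["N-", "N-", "N-", "N-", "N+", "N+", "N+", "N+"]

groups = stratify(expression, nodal)
t, df = welch_t(groups["N-"], groups["N+"])
print(f"t = {t:.2f}, df = {df:.1f}")
```

For a p-value, scipy.stats.ttest_ind(a, b, equal_var=False) computes the same statistic together with a two-sided p-value. SBR grade has three levels, so it would need a multi-group comparison with correction for multiple testing, which this sketch does not cover.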

4. Survival Analysis

  1. Survival analysis in bc-GenExMiner v4.0
    1. Go to the bc-GenExMiner v4.0 web interface28.
    2. Select PROGNOSTIC from the ANALYSIS menu and press the EXHAUSTIVE button. Type ID1 into the search box, then press the Submit button and the Start analysis button.
    3. In the Exhaustive prognostic analysis, select Nm, ERm, MR in the Population and event criteria and press the Submit button to obtain more information. Press the Kaplan-Meier curve thumbnails to export the full graphs.
      NOTE: N (+, -, m): nodal status (+: positive, -: negative, m: mixed); ER (+, -, m): oestrogen receptor status (+: positive, -: negative, m: mixed); MR: metastatic relapse
  2. Survival analysis in The Human Protein Atlas (HPA)
    1. Go to the Human Protein Atlas web interface29.
    2. Type ID1 into the search box and click the Search button. Select the Pathology sub-atlas.
      NOTE: The mRNA expression levels across the 17 cancer types are shown in the RNA Expression overview section. Every cancer tissue label of the box plot is clickable to access a detailed page providing survival analysis data and RNA expression levels.
    3. Click the Breast Cancer label to open the detailed page showing the interactive survival scatter plot and survival analysis. Download the figures.
  3. Survival analysis in the Kaplan-Meier Plotter
    1. Go to the Kaplan-Meier Plotter web interface30. Click Start KM plotter for breast cancer in the mRNA gene chip zone.
    2. Type ID1 into the search bar and select the green item in the candidate menu.
    3. Select RFS as the survival type and leave the other items at their defaults. Click Draw Kaplan-Meier plot and download the figures.
      NOTE: The survival type, cutoff type, follow-up threshold, and probe set options can be changed as required. Subgroup prognostic analyses by ER, PR, HER-2, lymph node status, grade, TP53 status, and molecular subtype can be obtained by changing the settings in the Restrict analysis to subtypes box1. Likewise, treatment can be filtered in the Restrict analysis to selected cohorts box.
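A survival comparison like the one in step 4.3 splits patients into high- and low-expression groups at a cutoff and compares the groups with a log-rank test. A minimal sketch, assuming a median cutoff and a standard two-group log-rank test; the function names, the per-patient input lists, and the eight-patient toy cohort are hypothetical, and the Kaplan-Meier plotter's own cutoff options and statistics may differ.

```python
from math import erfc, sqrt
from statistics import median


def split_by_median(expression):
    """Label each patient 1 (high) if above the median, else 0 (low)."""
    cut = median(expression)
    return [1 if value > cut else 0 for value in expression]


def logrank(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed = expected = var = 0.0
    for t in event_times:
        # Patients still under follow-up at time t.
        at_risk = [(ti, e, g) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for ti, e, _ in at_risk if ti == t and e)
        d1 = sum(1 for ti, e, g in at_risk if ti == t and e and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = (observed - expected) ** 2 / var
    # Chi-square with 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical cohort: expression, follow-up in months, event flag (1 = relapse).
expression = [2.1, 8.4, 3.0, 7.7, 1.5, 9.2, 2.8, 6.9]
times = [60, 12, 55, 20, 70, 8, 48, 30]
events = [0, 1, 0, 1, 0, 1, 1, 1]

groups = split_by_median(expression)
chi2, p = logrank(times, events, groups)
print(groups, f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A median split is used here because it is fixed before looking at outcomes; scanning many cutoffs and keeping the one with the smallest p-value tends to overstate significance.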

Results

Representative results of data mining and integrative analysis of a breast cancer biomarker are presented using ID1, a member of the inhibitor of DNA-binding (ID) family, as reported in our previous study25.

As demonstrated in Figure 2, the differences in ID1 mRNA expression between tumor and normal tissues in multiple types of cancer were analyzed using the ONCOM...

Discussion

Comprehensive analysis of public databases may indicate the underlying function of a gene of interest and reveal potential links between this gene and clinicopathological parameters in specific cancers27,31. Exploration and analysis based on a single database might provide limited or isolated perspectives because of potential selection bias or, to some extent, variation in data quality, including data collection and the analytical...

Disclosures

The authors have nothing to disclose.

Acknowledgements

This work was partly supported by the Natural Science Foundation of Guangdong Province, China (No. 2018A030313562), the Teaching Reform Project of Guangdong Clinical Teaching Base (No. 2016JDB092), the National Natural Science Foundation of China (No. 81600358), and the Youth Innovative Talent Project of Colleges and Universities in Guangdong Province, China (No. 2017KQNCX073).

Materials

Name | Company | Catalog Number | Comments
A personal computer or computing device with an Internet browser with JavaScript enabled | Microsoft | 051690762553 | We support and test the following browsers: Google Chrome, Firefox 3.0 and above, Safari, and Internet Explorer 9.0 and above.
Adobe Flash Player | Adobe Systems Inc. | | It can be freely downloaded from http://get.adobe.com/flashplayer/. This browser plug-in is required for visualizing networks on the network analysis tab.
Chrome Browser | Google Inc. | | It can be freely downloaded from https://www.google.cn/chrome/. This is necessary for viewing PDF files, including the Pathology Reports and many of the downloadable files.
Java Runtime Environment | Oracle Corporation | | It can be downloaded from http://www.java.com/getjava/.
Office 365 ProPlus for Faculty | Microsoft | 2003BFFD8117EA68 | This is necessary for viewing the Pathology Reports and many of the downloadable files.
Vectr Online | Vectr Labs Inc. | | It can be used free of charge at https://vectr.com/new. This is necessary for visualizing and editing many of the downloadable files and pictures.

References

  1. van 't Veer, L. J., et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 415 (6871), 530-536 (2002).
  2. Loi, S., et al. Definition of clinically distinct molecular subtypes in estrogen receptor-positive breast carcinomas through genomic grade. Journal of Clinical Oncology. 25 (10), 1239-1246 (2007).
  3. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature. 490 (7418), 61-70 (2012).
  4. Emerson, J. W., Dolled-Filhart, M., Harris, L., Rimm, D. L., Tuck, D. P. Quantitative assessment of tissue biomarkers and construction of a model to predict outcome in breast cancer using multiple imputation. Cancer Informatics. 7, 29-40 (2009).
  5. Yu, H., et al. Integrative genomic and transcriptomic analysis for pinpointing recurrent alterations of plant homeodomain genes and their clinical significance in breast cancer. Oncotarget. 8 (8), 13099-13115 (2017).
  6. He, W., et al. TCGA dataset-based construction and integrated analysis of aberrantly expressed long noncoding RNA mediated competing endogenous RNA network in gastric cancer. Oncology Reports. (2018).
  7. Liu, J., et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell. 173 (2), 400-416.e11 (2018).
  8. Esgueva, R., et al. Next-generation prostate cancer biobanking: toward a processing protocol amenable for the International Cancer Genome Consortium. Diagnostic Molecular Pathology. 21 (2), 61-68 (2012).
  9. Joly, Y., Dove, E. S., Knoppers, B. M., Bobrow, M., Chalmers, D. Data sharing in the post-genomic world: the experience of the International Cancer Genome Consortium (ICGC) Data Access Compliance Office (DACO). PLoS Computational Biology. 8 (7), e1002549 (2012).
  10. Zhang, J., et al. International Cancer Genome Consortium Data Portal--a one-stop shop for cancer genomics data. Database (Oxford). 2011, bar026 (2011).
  11. Rhodes, D. R., et al. ONCOMINE: a cancer microarray database and integrated data-mining platform. Neoplasia. 6 (1), 1-6 (2004).
  12. Rhodes, D. R., et al. Oncomine 3.0: genes, pathways, and networks in a collection of 18,000 cancer gene expression profiles. Neoplasia. 9 (2), 166-180 (2007).
  13. Jezequel, P., et al. bc-GenExMiner: an easy-to-use online platform for gene prognostic analyses in breast cancer. Breast Cancer Research and Treatment. 131 (3), 765-775 (2012).
  14. GOBO: Gene expression-based Outcome for Breast cancer Online. Available from: http://co.bmc.lu.se/gobo/gsa.plb (2018).
  15. Ringner, M., Fredlund, E., Hakkinen, J., Borg, A., Staaf, J. GOBO: gene expression-based outcome for breast cancer online. PLoS One. 6 (3), e17911 (2011).
  16. Ponten, F., Jirstrom, K., Uhlen, M. The Human Protein Atlas--a tool for pathology. Journal of Pathology. 216 (4), 387-393 (2008).
  17. Ponten, F., Schwenk, J. M., Asplund, A., Edqvist, P. H. The Human Protein Atlas as a proteomic resource for biomarker discovery. Journal of Internal Medicine. 270 (5), 428-446 (2011).
  18. Gyorffy, B., et al. An online survival analysis tool to rapidly assess the effect of 22,277 genes on breast cancer prognosis using microarray data of 1,809 patients. Breast Cancer Research and Treatment. 123 (3), 725-731 (2010).
  19. Stevinson, C., Lawlor, D. A. Searching multiple databases for systematic reviews: added value or diminishing returns? Complementary Therapies in Medicine. 12 (4), 228-232 (2004).
  20. Yin, J., et al. Integrating multiple genome annotation databases improves the interpretation of microarray gene expression data. BMC Genomics. 11, 50 (2010).
  21. Patel, D., Morton, D. J., Carey, J., Havrda, M. C., Chaudhary, J. Inhibitor of differentiation 4 (ID4): From development to cancer. Biochimica et Biophysica Acta. 1855 (1), 92-103 (2015).
  22. Kamalian, L., et al. Increased expression of Id family proteins in small cell lung cancer and its prognostic significance. Clinical Cancer Research. 14 (8), 2318-2325 (2008).
  23. Cruz-Rodriguez, N., et al. High expression of ID family and IGJ genes signature as predictor of low induction treatment response and worst survival in adult Hispanic patients with B-acute lymphoblastic leukemia. Journal of Experimental and Clinical Cancer Research. 35, 64 (2016).
  24. Yang, H. Y., et al. Expression and prognostic value of Id protein family in human breast carcinoma. Oncology Reports. 23 (2), 321-328 (2010).
  25. Zhou, X. L., et al. Prognostic values of the inhibitor of DNA-binding family members in breast cancer. Oncology Reports. 40 (4), 1897-1906 (2018).
  26. ONCOMINE. Available from: https://www.oncomine.org (2018).
  27. Lin, H. Y., Zeng, L., Liang, Y. K., Wei, X. L., Chen, C. F. GATA3 and TRPS1 are distinct biomarkers and prognostic factors in breast cancer: database mining for GATA family members in malignancies. Oncotarget. 8 (21), 34750-34761 (2017).
  28. bc-GenExMiner v4.0. Available from: http://bcgenex.centregauducheau.fr/BCGEM/GEM-requete.php (2018).
  29. The Human Protein Atlas. Available from: https://www.proteinatlas.org (2018).
  30. Kaplan-Meier Plotter. Available from: http://kmplot.com/analysis (2018).
  31. Zhu, Y. F., Dong, M. Expression of TUSC3 and its prognostic significance in colorectal cancer. Pathology-Research and Practice. 214 (9), 1497-1503 (2018).
  32. Nelson, J. C., et al. Validation sampling can reduce bias in health care database studies: an illustration using influenza vaccination effectiveness. Journal of Clinical Epidemiology. 66 (8 Suppl), S110-S121 (2013).
  33. Haibe-Kains, B., Desmedt, C., Sotiriou, C., Bontempi, G. A comparative study of survival models for breast cancer prognostication based on microarray data: does a single gene beat them all? Bioinformatics. 24 (19), 2200-2208 (2008).
  34. Yang, C., et al. Understanding genetic toxicity through data mining: the process of building knowledge by integrating multiple genetic toxicity databases. Toxicology Mechanisms and Methods. 18 (2-3), 277-295 (2008).
  35. Cannata, N., Merelli, E., Altman, R. B. Time to organize the bioinformatics resourceome. PLoS Computational Biology. 1 (7), e76 (2005).
  36. Wren, J. D., Bateman, A. Databases, data tombs and dust in the wind. Bioinformatics. 24 (19), 2127-2128 (2008).

Reprints and Permissions

Request permission to use text or images from this JoVE article.


Keywords: Data Mining, Integrative Analysis, Biomarker, Breast Cancer, Public Databases, Survival Predictor, ONCOMINE, BC Gene Expression Miner, Gene Expression, Molecular Subtypes, Correlation Analysis, Scarff Bloom Richardson Grade, GOBO Web Interface

Copyright © 2025 MyJoVE Corporation. All rights reserved.