Establishing a Competing Risk Regression Nomogram Model for Survival Data

Lunpo Wu; Chenyang Ge; Hongjuan Zheng; Haiping Lin; Wei Fu; Jianfei Fu

doi:10.3791/60684


Summary

Presented here is a protocol to build nomograms based on the Cox proportional hazards regression model and the competing risk regression model. The competing risk approach is the more appropriate choice when competing events are present in the survival analysis.

Abstract

The Kaplan–Meier method and the Cox proportional hazards regression model are the most common analyses in the survival framework. They are relatively easy to apply and interpret and can be depicted visually. However, when competing events (e.g., cardiovascular and cerebrovascular accidents, treatment-related deaths, traffic accidents) are present, the standard survival methods should be applied with caution, because real-world data may otherwise be misinterpreted. It may be desirable to distinguish the different kinds of events that can lead to failure and to treat them differently in the analysis. Here, the methods focus on using the competing risk regression model to identify significant prognostic factors or risk factors when competing events are present. Additionally, nomograms based on a proportional hazards regression model and a competing risk regression model are established to help clinicians make individual assessments and risk stratifications and to explain the impact of controversial factors on prognosis.

Introduction

Time-to-event (survival) analysis is common in clinical studies. Survival data measure the time span from a start time until the occurrence of the event of interest, but the occurrence of the event of interest is often precluded by another event. If more than one type of end point is present, they are called competing risks end points. In this case, the standard hazard analysis (i.e., the Cox proportional cause-specific hazards model) often does not work well because individuals experiencing another type of event are censored. In the subdistribution approach, individuals who experience a competing event instead remain in the risk set, as the competing risks are usually not independent. Fine and Gray¹ therefore studied regression model estimation for the subdistribution of a competing risk. In a competing risk setting, three different types of events can be discriminated.

One of these is overall survival (OS), which demonstrates a direct clinical benefit of new treatment methods for a disease. OS measures the survival time from the time of origin (i.e., time of diagnosis or treatment) to the time of death due to any cause and generally evaluates the absolute risk of death, thereby failing to differentiate the causes of death (e.g., cancer-specific death (CSD) or non-cancer-specific death (non-CSD))². OS is, therefore, considered the most important endpoint. The events of interest are often cancer related, while non-cancer-specific events, which include heart disease, traffic accidents, and other unrelated causes, are considered competing events. Patients with malignant disease and a favorable prognosis, who are expected to survive longer, are often at a greater risk of non-CSD. That is, OS is diluted by other causes of death and fails to reflect the real effectiveness of clinical treatment. Therefore, OS may not be the optimal measure for assessing the outcomes of disease³. Such biases can be corrected by the competing risk regression model.

There are two main methods for competing risk data: cause-specific hazard models (Cox models) and subdistribution hazard models (competing risk models). In the following protocol, we present two methods to generate nomograms based on the cause-specific hazard model and the subdistribution hazard model. The cause-specific hazard model can be fitted with the Cox proportional hazards model, which treats subjects who experience the competing event as censored at the time the competing event occurred. In the subdistribution hazard model introduced by Fine and Gray¹ in 1999, three different types of events can be discriminated, and individuals who experience a competing event remain in the risk set indefinitely.
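The difference between the two risk sets can be made concrete with a small sketch. The data below are hypothetical, and the example assumes no censoring before the last event of interest, so that the censoring weights used by the Fine and Gray estimator are all equal to 1 and plain counts suffice.

```r
# Toy data (hypothetical): follow-up in months and event type
# 0 = censored, 1 = event of interest, 2 = competing event
time   <- c(3, 5, 8, 12, 20, 25)
status <- c(2, 1, 2, 1, 1, 0)

# Times at which the event of interest occurs
event.times <- sort(unique(time[status == 1]))

# Cause-specific risk set: subjects still under observation at time t
# (subjects with an earlier competing event have been censored out)
n.cause.specific <- sapply(event.times, function(t) sum(time >= t))

# Subdistribution risk set: subjects still under observation at t, plus
# subjects who had a competing event before t (they are never removed)
n.subdistribution <- sapply(event.times, function(t)
  sum(time >= t | (time < t & status == 2)))

data.frame(event.times, n.cause.specific, n.subdistribution)
```

In this toy cohort the subdistribution risk set is larger at every event time, because the two subjects with competing events keep contributing to the denominator.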

A nomogram is a mathematical representation of the relationship between three or more variables⁴. Medical nomograms take biological and clinical variables (e.g., tumor grade and patient age) and generate the probability of a clinical event (e.g., cancer recurrence or death), which is depicted graphically as a statistical prognostic model for a given individual. Generally, a nomogram is formulated based on the results of the Cox proportional hazards model⁵,⁶,⁷,⁸,⁹,¹⁰.

However, when competing risks are present, a nomogram based on the Cox model might fail to perform well. Though several previous studies¹¹,¹²,¹³,¹⁴ have applied the competing risk nomogram to estimate the probability of CSD, few studies have described how to establish the nomogram based on a competing risk regression model, and there is no existing package available to accomplish this. Therefore, the method presented below will provide a step-by-step protocol to establish a specific competing-risk nomogram based on a competing risk regression model as well as a risk score estimation to aid clinicians in treatment decision-making.
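To illustrate why treating competing events as censored can mislead, the following sketch compares the naive estimate (one minus Kaplan–Meier) with the cumulative incidence of the event of interest. The data are hypothetical, there are no tied times, and the calculation is written out in base R rather than with a survival package.

```r
# Toy data (hypothetical): 0 = censored, 1 = event of interest, 2 = competing event
time   <- c(3, 5, 8, 12, 20, 25)
status <- c(2, 1, 2, 1, 1, 0)
ord    <- order(time)
time   <- time[ord]
status <- status[ord]

# Number at risk just before each observed time (valid because there are no ties)
n.risk <- length(time) - seq_along(time) + 1

# Naive approach: Kaplan-Meier with competing events treated as censored
km.naive  <- cumprod(1 - (status == 1) / n.risk)
naive.inc <- 1 - km.naive

# Cumulative incidence: probability of being event-free just before t,
# multiplied by the hazard of the event of interest at t, summed over time
surv.all    <- cumprod(1 - (status != 0) / n.risk)
surv.before <- c(1, surv.all[-length(surv.all)])
cif1        <- cumsum(surv.before * (status == 1) / n.risk)

data.frame(time, status, naive.inc, cif1)
```

In this example the naive estimate is never below the cumulative incidence, and the gap widens as competing events accumulate.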


Protocol

The research protocol was approved by the Ethics Committee of Jinhua Hospital, Zhejiang University School of Medicine. For this experiment, the cases were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. SEER is an open-access database that includes demographic, incidence and survival data from 18 population-based cancer registries. We registered on the SEER website and signed a letter of assurance to acquire the research data (12296-Nov2018).

1. Data source

Obtain cases from the databases as well as permission (if any) to use the cases from the registries.
NOTE: The cohort data are uploaded in Supplementary File 1. Readers who already have survival data with competing risks can skip this section.

2. Installing and loading packages and importing data

NOTE: Perform the following procedures based on R software (version 3.5.3) using the packages rms¹⁵ and cmprsk¹⁶ (http://www.r-project.org/).

Install rms and cmprsk R packages.
>install.packages("rms")
>install.packages("cmprsk")
Load the R packages.
>library("rms")
>library("cmprsk")
Import the cohort data.
>Dataset<-read.csv("…/Cohort Data.csv") # cohort data is the example
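Before fitting any model, it can help to confirm how the status variables are coded. The sketch below uses a small hypothetical data frame in place of the imported cohort; the column name `cause` and its labels are assumptions made for illustration and are not taken from the supplementary file.

```r
# Hypothetical stand-in for the imported cohort (column names are assumptions)
Dataset.demo <- data.frame(
  Survivalmonths = c(12, 30, 45, 60, 88, 95),
  cause = c("breast cancer", "alive", "other", "alive", "breast cancer", "other"),
  stringsAsFactors = FALSE
)

# Competing risk coding: 0 = censored, 1 = event of interest, 2 = competing event
Dataset.demo$fstatus <- ifelse(Dataset.demo$cause == "alive", 0,
                        ifelse(Dataset.demo$cause == "breast cancer", 1, 2))

# Overall survival coding for the Cox model: 1 = death from any cause
Dataset.demo$status <- as.integer(Dataset.demo$fstatus > 0)

# Tabulate the codes to check them against the expected event counts
table(Dataset.demo$fstatus)
```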

3. Nomogram based on the Cox Proportional Hazards Regression model

Establish the Cox Proportional Hazards Regression model.
NOTE: The independent variables (X) include categorical variables (dummy variables, such as race) and continuous variables (such as age). The factors significant in the univariable analysis will be selected for the use in multivariable analysis.
1. Fit the Cox proportional hazards model to the data. Establish the Cox proportional hazards regression model using the function cph. The simplified format in R is shown below:
  > dd <- datadist(Dataset)   # rms needs the predictor distributions to build a nomogram
  > options(datadist="dd")
  > f0 <- cph(Surv(Survivalmonths, status) ~ factor1+ factor2+…,
  x=TRUE, y=TRUE, surv=TRUE, data=Dataset)
  NOTE: Death was set as the status in the example code.
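Develop a Cox Regression Nomogram using the commands detailed below.
> surv <- Survival(f0)   # returns a function(times, lp) that gives the predicted survival
> nom <- nomogram(f0, fun=list(function(x) surv(24, x)…), funlabel=c("2-year predicted survival rate"…), maxscale=100, fun.at=fun.at)   # fun.at: a user-defined vector of probabilities to mark on the axis
> plot(nom)
NOTE: Take the 2-year predicted survival rate as an example.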
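The function supplied to the nomogram maps a linear predictor to a predicted survival probability. A minimal sketch of that mapping under the proportional hazards assumption is given below. The baseline survival and the linear predictors are hypothetical values, and the code shows the general Cox relationship rather than the internals of rms.

```r
# Hypothetical values: baseline 24-month survival and three linear predictors
S0.24 <- 0.90
lp    <- c(low = -0.5, reference = 0, high = 1.0)

# Under proportional hazards, S(t | x) = S0(t)^exp(lp)
surv.24 <- S0.24^exp(lp)
round(surv.24, 3)
```

A larger linear predictor gives a lower predicted survival, which is why the survival axis of a nomogram runs opposite to the total points axis.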

4. Nomogram based on the Competing Risk Regression Model

Establish the Competing Risk Regression Model.
1. Fit the competing risk regression model. In the example, the factors significant in the univariable analysis are included. Readers may instead include the factors that they consider important, in which case the univariable selection can be skipped.
  NOTE: The status variable is coded as 1 for the event of interest and as 2 for the competing risk event; censored observations are coded as 0, the default censoring code of crr. To facilitate the analysis, Scrucca et al.¹⁷ provide an R function factor2ind(), which creates a matrix of indicator variables from a factor.
2. For categorical variables, carefully code them numerically when including them in the competing risk model. That is, for a categorical variable with J levels, create J-1 dummy variables or indicator variables.
3. To establish a competing risk regression model, first place prognostic variables into a matrix. Use the function cbind() to concatenate the variables by columns and fit them into the competing risk regression model.
  > x <- cbind(factor2ind(factor1, "1"), factor2ind(factor2, "1")…)
  > mod <- crr(Survivalmonths, fstatus, failcode=1, cov1=x)   # failcode=1 models the event of interest; set failcode=2 to model the competing event
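If factor2ind() is not at hand, the J-1 indicator coding described in step 2 can be sketched with base R. This is an alternative shown for illustration only, not the function used in the protocol, and the factor values are hypothetical. Note that the column names it produces ("race2", "race3") differ from the "race:2", "race:3" names used later in the protocol.

```r
# Hypothetical three-level factor (labels are illustrative only)
race <- factor(c("1", "2", "3", "2", "1", "3"))

# model.matrix() builds an intercept column plus J - 1 indicator columns;
# dropping the intercept leaves the indicators, with level "1" as the reference
race.ind <- model.matrix(~ race)[, -1, drop = FALSE]

colnames(race.ind)
dim(race.ind)
```

Several such matrices can then be combined with cbind() in the same way as the factor2ind() output.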
Plot the competing risk nomogram.
NOTE: The beta value (β value) is the regression coefficient of a covariate (X) in the formula of the Cox proportional hazards regression. The X.score (the combined effect of the covariates, expressed in points) and X.real (the predicted cumulative incidence function at a specific timepoint, for example, 60 months) are first calculated from the Cox regression model, and a nomogram is established.
1. Use the function nomogram to construct the Cox nomogram object nom (as listed in step 3.2).
2. Replace X.beta and X.point as well as total.points, X.real, and X.score with the values from the competing risk regression model.
  1. Get the baseline cif, that is cif(min). See Supplementary file 2 for details.
    > x0 <- as.matrix(x)   # covariate matrix used in crr()
    > lhat <- matrix(0, nrow = length(mod$uftime), ncol = nrow(x0))
    > # cumulative subdistribution hazard for each patient (one column per patient)
    > for (j in 1:nrow(x0)) lhat[, j] <- cumsum(exp(sum(x0[j, ] * mod$coef)) * mod$bfitj)
    > lhat <- cbind(mod$uftime, 1 - exp(-lhat))   # column 1: time; remaining columns: cif
    > suv <- as.data.frame(lhat)
    > colnames(suv)[1] <- "time"
    > line24 <- which(suv$time == 24)
    > cif.min24 <- min(suv[line24, -1])   # smallest 24-month cif in the cohort
  2. Replace the X.beta and X.point.
    > lmaxbeta<-which.max(abs(mod$coef))
    > maxbeta<-abs(mod$coef[lmaxbeta])
    > race0<-0
    > names(race0)<-"race:1"
    > race.beta<-c(race0,mod$coef[c("race:2","race:3")])
    > race.beta.min<-race.beta[which.min(race.beta)]
    > race.beta1<-race.beta-race.beta.min
    > race.scale<-(race.beta1/maxbeta*100) # how the scale is calculated
    > nom$Race$Xbeta<-race.beta1
    > nom$Race$points<-race.scale
    NOTE: Take race as an example.
  3. Replace the total X.point and X.real.
    > nom$total.points$x<-c(0,50,100, …)
    > real.2y<-c(0.01,0.1,0.2,…)
    NOTE: Choose the replacement values according to the minimum and maximum of the total points and of the predicted cumulative incidence.
  4. Calculate the X.score and plot the nomogram.
    > score.2y<-log(log((1-real.2y),(1-cif.min24)))/(maxbeta/100)
    > # the name in backticks must match the funlabel passed to nomogram() in step 3
    > nom$`2-year survival`$x<-score.2y
    > nom$`2-year survival`$x.real<-real.2y
    > nom$`2-year survival`$fat<-as.character(real.2y)
    > plot(nom)
    NOTE: X.score=log(log((1-X.real),(1-cif0)))/(maxbeta/100). This relationship between X.score and X.real follows from the intrinsic properties of the competing risk model (crr). cif0 denotes the baseline cif, which can be calculated with the predict.crr function.
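The replacement steps above can be checked on a small worked example. The sketch below uses a mock list in place of a real crr() fit. It borrows the component names coef, uftime and bfitj from the protocol, but all numbers are made up, and the variable "grade" is a hypothetical second factor. It verifies that the points scale and the X.score formula reproduce the cumulative incidence obtained directly from the model.

```r
# Mock of the pieces of a crr() fit used in step 4.2 (all values are made up)
mod.demo <- list(
  coef   = c("race:2" = 0.40, "race:3" = -0.20, "grade:2" = 0.80),
  uftime = c(6, 12, 24, 36),              # unique event times (months)
  bfitj  = c(0.010, 0.015, 0.020, 0.010)  # jumps of the baseline cumulative hazard
)

# Points: shift each factor so its lowest level scores 0, then scale so that
# the largest absolute coefficient is worth 100 points
maxbeta      <- max(abs(mod.demo$coef))
race.beta    <- c("race:1" = 0, mod.demo$coef[c("race:2", "race:3")])
grade.beta   <- c("grade:1" = 0, mod.demo$coef["grade:2"])
race.points  <- (race.beta - min(race.beta)) / maxbeta * 100
grade.points <- (grade.beta - min(grade.beta)) / maxbeta * 100

# cif(min): 24-month cumulative incidence of the lowest-risk profile (0 points)
H0.24     <- sum(mod.demo$bfitj[mod.demo$uftime <= 24])
lp.min    <- min(race.beta) + min(grade.beta)
cif.min24 <- 1 - exp(-H0.24 * exp(lp.min))

# X.score for chosen X.real values, and the inverse mapping back to X.real
real.2y  <- c(0.05, 0.10, 0.20)
score.2y <- log(log(1 - real.2y, base = 1 - cif.min24)) / (maxbeta / 100)
back.2y  <- 1 - (1 - cif.min24)^exp(score.2y * maxbeta / 100)

# Check for one patient (race 2, grade 2): nomogram route vs direct calculation
patient.points <- unname(race.points["race:2"] + grade.points["grade:2"])
cif.nomogram   <- 1 - (1 - cif.min24)^exp(patient.points * maxbeta / 100)
cif.direct     <- 1 - exp(-H0.24 * exp(0.40 + 0.80))
c(points = patient.points, nomogram = cif.nomogram, direct = cif.direct)
```

The two routes agree because total points are a linear rescaling of the linear predictor, measured from the lowest-risk profile.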

5. Subgroup analysis based on the Group Risk Score (GRS)

Calculate the risk score (RS)
NOTE: Calculate the risk score for each patient by totalling the points of every variable. Cut-off values are used to classify the cohort. Taking 3 subgroups as an example, use the package meta to draw a forest plot.
1. Install and load the R packages
  > install.packages("meta")
  > library("meta")
2. Obtain the GRS and divide the cohort into 3 subgroups.
  > d1<-Dataset
  > d1$X<-nom$X$points
  > #For example, d1$race[d1$race==1]<-nom$race$point[1]
  > d1$RS<-d1$race + d1$marry + d1$histology + d1$grademodify + d1$Tclassification + d1$Nclassification
  > d1$GRS<- cut(d1$RS, quantile(d1$RS, seq(0, 1,1/3)), include.lowest = TRUE, labels = 1:3)
3. Draw the forest plot. Get the HR, LCI and UCI via the function crr.
  > subgroup<-crr(ftime, fstatus, cov1, failcode=1)   # fit within one risk subgroup
  > HR<- summary(subgroup)$conf.int[1]
  > LCI<- summary(subgroup)$conf.int[3]
  > UCI<- summary(subgroup)$conf.int[4]
  > # repeat for each subgroup and combine the results so that HR, LCI and UCI each hold 3 values
  > LABxx<-c("Low Risk", "Medium Risk", "High Risk")
  > xx<-metagen(log(HR), lower = log(LCI), upper = log(UCI), studlab = LABxx, sm = "HR")
  > forest(xx, col.square = "black", hetstat =TRUE, leftcols = "studlab")
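The grouping and the log-scale quantities used in this section can be sketched in base R. All numbers below are hypothetical, and the coefficients and standard errors stand in for the output of three subgroup models rather than results from the cohort.

```r
# Hypothetical total points (RS) for nine patients, split into tertiles
RS  <- c(10, 25, 40, 55, 70, 85, 100, 120, 150)
GRS <- cut(RS, quantile(RS, seq(0, 1, 1/3)), include.lowest = TRUE, labels = 1:3)
table(GRS)

# Hypothetical coefficient and standard error of one factor in each subgroup
beta <- c(0.10, 0.35, 0.60)
se   <- c(0.20, 0.15, 0.25)
z    <- qnorm(0.975)

# Hazard ratio and 95% confidence limits, one value per subgroup
HR  <- exp(beta)
LCI <- exp(beta - z * se)
UCI <- exp(beta + z * se)

# On the log scale the interval is symmetric, so the standard error
# can be recovered from the confidence limits
se.back <- (log(UCI) - log(LCI)) / (2 * z)
data.frame(subgroup = levels(GRS), HR, LCI, UCI)
```

This is why the hazard ratios and their limits are log-transformed before they are passed to the forest plot routine.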


Results

Survival characteristics of the example cohort
In the example cohort, a total of 8,550 eligible patients were included in the analysis and the median follow-up time was 88 months (range, 1 to 95 months). A total of 679 (7.94%) patients were younger than 40 years old and 7,871 (92.06%) patients were older than 40. At the end of follow-up, 7,483 (87.52%) patients were still alive, 662 (7.74%) had died of breast cancer, and 405 (4.74%) patients had died of other causes (competing risks).
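The reported percentages can be reproduced from the counts given above. The short check below uses only those counts; the vector names are chosen here for illustration.

```r
# Counts reported for the example cohort
n.total <- 8550
outcome <- c(alive = 7483, breast.cancer.death = 662, other.death = 405)
age     <- c(younger.than.40 = 679, older.than.40 = 7871)

# Each breakdown should add up to the full cohort
stopifnot(sum(outcome) == n.total, sum(age) == n.total)

# Percentages rounded to two decimals, as in the text
outcome.pct <- round(100 * outcome / n.total, 2)
age.pct     <- round(100 * age / n.total, 2)
outcome.pct
age.pct
```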


Discussion

The overall goal of the current study was to establish a specific competing-risk nomogram that could describe real-world diseases and to develop a convenient individual assessment model for clinicians to approach treatment decisions. Here, we provide a step-by-step tutorial for establishing nomograms based on the Cox regression model and competing risk regression model and further performing subgroup analysis. Zhang et al.¹⁸ introduced an approach to create a competing-risk nomogram, but the main ...


Disclosures

None

Acknowledgements

The study was supported by grants from the general program of the Zhejiang Province Natural Science Foundation (grant number LY19H160020) and the key program of the Jinhua Municipal Science & Technology Bureau (grant numbers 2016-3-005, 2018-3-001d and 2019-3-013).


Materials

Name	Company	Catalog Number	Comments
no	no	no

References

Fine, J. P., Gray, R. J. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association. 94 (446), 496-509 (1999).
Fu, J., et al. Real-world impact of non-breast cancer-specific death on overall survival in resectable breast cancer. Cancer. 123 (13), 2432-2443 (2017).
Kim, H. T. Cumulative incidence in competing risks data and competing risks regression analysis. Clinical Cancer Research. 13 (2 Pt 1), 559-565 (2007).
Balachandran, V. P., Gonen, M., Smith, J. J., DeMatteo, R. P. Nomograms in oncology: more than meets the eye. Lancet Oncology. 16 (4), 173-180 (2015).
Han, D. S., et al. Nomogram predicting long-term survival after d2 gastrectomy for gastric cancer. Journal of Clinical Oncology. 30 (31), 3834-3840 (2012).
Karakiewicz, P. I., et al. Multi-institutional validation of a new renal cancer-specific survival nomogram. Journal of Clinical Oncology. 25 (11), 1316-1322 (2007).
Liang, W., et al. Development and validation of a nomogram for predicting survival in patients with resected non-small-cell lung cancer. Journal of Clinical Oncology. 33 (8), 861-869 (2015).
Valentini, V., et al. Nomograms for predicting local recurrence, distant metastases, and overall survival for patients with locally advanced rectal cancer on the basis of European randomized clinical trials. Journal of Clinical Oncology. 29 (23), 3163-3172 (2011).
Iasonos, A., Schrag, D., Raj, G. V., Panageas, K. S. How to build and interpret a nomogram for cancer prognosis. Journal of Clinical Oncology. 26 (8), 1364-1370 (2008).
Chisholm, J. C., et al. Prognostic factors after relapse in nonmetastatic rhabdomyosarcoma: a nomogram to better define patients who can be salvaged with further therapy. Journal of Clinical Oncology. 29 (10), 1319-1325 (2011).
Brockman, J. A., et al. Nomogram Predicting Prostate Cancer-specific Mortality for Men with Biochemical Recurrence After Radical Prostatectomy. European Urology. 67 (6), 1160-1167 (2015).
Zhou, H., et al. Nomogram to Predict Cause-Specific Mortality in Patients With Surgically Resected Stage I Non-Small-Cell Lung Cancer: A Competing Risk Analysis. Clinical Lung Cancer. 19 (2), 195-203 (2018).
Fu, J., et al. De-escalating chemotherapy for stage II colon cancer. Therapeutic Advances in Gastroenterology. 12, 1756284819867553 (2019).
Chen, D., Li, J., Chong, J. K. Hazards regression for freemium products and services: a competing risks approach. Journal of Statistical Computation and Simulation. 87 (9), 1863-1876 (2017).
Harrell, F. E., Jr. rms: Regression Modeling Strategies. R package version 5.1-2. Available from: https://CRAN.R-project.org/package=rms (2018).
Gray, B. cmprsk: Subdistribution Analysis of Competing Risks. R package version 2.2-7. Available from: https://CRAN.R-project.org/package=cmprsk (2014).
Scrucca, L., Santucci, A., Aversa, F. Regression modeling of competing risk using R: an in depth guide for clinicians. Bone Marrow Transplantation. 45 (9), 1388-1395 (2010).
Zhang, Z., Geskus, R. B., Kattan, M. W., Zhang, H., Liu, T. Nomogram for survival analysis in the presence of competing risks. Annals of Translational Medicine. 5 (20), 403 (2017).
Geskus, R. B. Cause-specific cumulative incidence estimation and the fine and gray model under both left truncation and right censoring. Biometrics. 67 (1), 39-49 (2011).
Fu, J., et al. Young-onset breast cancer: a poor prognosis only exists in low-risk patients. Journal of Cancer. 10 (14), 3124-3132 (2019).
de Glas, N. A., et al. Performing Survival Analyses in the Presence of Competing Risks: A Clinical Example in Older Breast Cancer Patients. Journal of the National Cancer Institute. 108 (5), (2016).
