Forecasting Hepatocellular Carcinoma Mortality using a Weighted Regression Model to Estimate Cohort Effects in Taiwan

I-Shiang Tzeng; Chan-Yen Kuo; Chia-Chi Wang

doi:10.3791/62253

Summary

We describe a multistage method for measuring cohort effects in age-period data, which allows the influence of age and period to be removed in many situations without sacrificing data quality. The protocol demonstrates the strategy and provides a weighted regression model for analyzing hepatocellular carcinoma data.

Abstract

To eliminate the influence of age and period in age-period contingency table data, a multistage method was adopted to evaluate the cohort effect. The most common primary malignant tumor of the liver is hepatocellular carcinoma (HCC). HCC is associated with liver cirrhosis of alcoholic and viral etiologies. In epidemiology, long-term trends in HCC mortality are described (or forecast) using an age-period-cohort (APC) model. The number of HCC deaths in each cohort determined its weight. The confidence interval (CI) of the weighted mean was narrower than that of the equally weighted estimate. Because the narrower CI reflects less uncertainty, the weighted mean estimate was used for forecasting. With the multistage method, we recommend weighted mean estimation based on a regression model to evaluate the cohort effect in age-period contingency table data.

Introduction

The most common primary malignant tumor of the liver is hepatocellular carcinoma (HCC). Its mortality rate ranks fifth in men and eighth in women (6% of men and 3% of women) ¹ among all malignant tumors worldwide. In Taiwan, it is the most common cancer in men and the second most common cancer in women (21.8% of men and 14.2% of women) ². It is estimated that since 2000, the annual number of HCCs diagnosed worldwide is 564,000, among which 398,000 are men and 166,000 are women ³. In epidemiology, the most common way to explain the relationship between age, period, and cohort (APC) variables is that age and period influence each other to create a unique generational experience for the disease trend investigated.

Even though this conceptualization retains the exact linear relationship age + cohort = period, exposure (predictor) is not an inherent factor in a birth cohort. Instead, we propose that there is a cohort effect when changes cause different distributions of disease. Nevertheless, because age + cohort = period, the three variables are linearly dependent; unless additional restrictions are imposed, it is impossible to estimate the linear effects of age, period, and cohort in an age-period-cohort (APC) model. In this study, we clarify this problem and the restrictions we imposed in our previous publications ⁴,⁵,⁶,⁷.
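
The identification problem can be illustrated with a minimal sketch. It assumes equal-width age and period groups indexed from zero, so that the cohort index is fixed by the age and period indices; the function name and the numeric values are hypothetical and chosen only for illustration.

```python
n_age, n_period = 10, 8  # group counts used later in the Results


def linear_predictor(mu, alpha, beta, gamma):
    """Return mu + alpha*age + beta*period + gamma*cohort for every table cell."""
    values = []
    for a in range(n_age):
        for p in range(n_period):
            c = p - a + (n_age - 1)  # cohort index implied by age and period
            values.append(mu + alpha * a + beta * p + gamma * c)
    return values


delta = 3        # arbitrary shift in the linear trends
K = n_age - 1    # constant offset in the cohort index

fit_1 = linear_predictor(mu=5, alpha=2, beta=1, gamma=4)
# Shift the three slopes and absorb the leftover constant in the intercept.
fit_2 = linear_predictor(mu=5 - delta * K, alpha=2 + delta,
                         beta=1 - delta, gamma=4 + delta)

print(fit_1 == fit_2)  # True: two different parameter sets, identical fit
```

Because two different sets of linear slopes give exactly the same fitted values, the data alone cannot separate them, which is why a restriction is needed.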

With minimal assumptions about the contingency table data, the multistage method ⁸ evaluates the cohort effect in three stages. Median polish is the core technique of the multistage method. Because median polish does not depend on a specific distribution or framework, it can be used for various types of data, such as ratios, logarithmic ratios, and counts.

Median polish operates on data arranged in a two-way contingency table ⁹. The procedure eliminates the cumulative effects of age (i.e., row) and period (i.e., column) by iteratively subtracting the median from each row and each column. This procedure is often used in epidemiological data analysis ¹⁰. One advantage of this technique is that no assumptions about the distribution or structure of the data in the two-way contingency table are required. Therefore, the technique can be applied broadly to any type of data contained in such a table, such as suicide data ¹¹. The APC model has also been used to describe the long-term trends of disease incidence or mortality ⁵. APC models often assume that age, period, and cohort have additive effects on the logarithmic transformation of disease/mortality rates. To evaluate cohort effects, the described protocol generates an APC model for complete hepatocellular carcinoma (HCC) mortality analysis with weighted regression, thereby supporting reliable predictions and moderate assessment of treatment effects.
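
As a hedged sketch in our own notation (the symbols below are assumptions and do not appear in the original protocol), the additive assumption and the residual-based cohort estimate can be written as follows, where m_{ap} is the mortality rate for age group a = 1, ..., A and period group p = 1, ..., P, r_{ap} is the residual left after median polish, and gamma_c is the effect of the cohort indexed by c:

```latex
\log m_{ap} = \mu + \alpha_a + \beta_p + r_{ap},
\qquad
r_{ap} = \gamma_c + \varepsilon_{ap},
\qquad
c = p - a + A .
```

With this indexing, c runs from 1 to A + P - 1, so each diagonal of the age-period table corresponds to one birth cohort.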

Protocol

1. Data sources

To demonstrate the calculations, we used annual data on HCC mortality from 1976 to 2015 for men and women in Taiwan. Statistical Package for the Social Sciences (SPSS) version 24.0 for Windows and Microsoft Excel were used to execute the protocols for this study.

Have the HCC physician classify the patients' clinical symptoms, laboratory tests and medical imaging results to give a diagnosis code according to the International Classification of Disease (ICD) Code, ICD 150.
Ensure that the data file (saved as a CSV) contains the year (i.e., period), age, cohort, death number, mid-year population number, and mortality as columns.
1. Click File | Import Data | CSV data | Open. Make sure there is a check in the box next to Read variable names from the first row of the data and click OK. Ensure that the data file is imported into SPSS.
Construct contingency table data crossed by age-period groups in SPSS. In general, define the row variable as age and the column variable as period. If the data are recorded by single period year (or single age year), aggregate them into period groups (or age groups). Then cross-tabulate the values by age group across the period groups.
1. Click Analyze | Descriptive Statistics | Crosstabs and select the age variable in the box next to Row(s) and the period variable in the box next to Column(s). Click Cells and make sure that there is a check in the box next to Observed. A contingency table of death number (or mid-year population number, or mortality) can be produced in SPSS by the above steps.
2. Export the contingency table data in CSV format for analysis in other software. Click File | Export Data | Ensure Desired Data Format as CSV | Location. This non-editable field displays the save location for the exported file.
3. File name: Click Select to change the file name.
4. Export as type: Select a CSV file type from the drop-down menu. Click variables to display the available variables and to select the variable tables. By default, all variables from the source data set are retained for the exported file. Researchers can use the tables to specify which source variables to include in the exported file. Click Export.
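
For readers working outside SPSS, the grouping and cross-tabulation described above can be sketched in plain Python. This is an illustration under stated assumptions, not the protocol's SPSS procedure: the record layout (year, age, deaths, mid-year population), the helper names, and the sample values are all hypothetical.

```python
from collections import defaultdict


def age_group(age):
    """Lower bound of the five-year age group; ages 85 and above are pooled."""
    return min(age // 5 * 5, 85)


def period_group(year, start=1976):
    """First year of the five-year period that contains `year`."""
    return start + (year - start) // 5 * 5


def build_table(records):
    """Sum deaths and population per (age group, period) cell and return rates."""
    deaths = defaultdict(int)
    population = defaultdict(int)
    for year, age, d, pop in records:
        cell = (age_group(age), period_group(year))
        deaths[cell] += d
        population[cell] += pop
    return {cell: deaths[cell] / population[cell] for cell in deaths}


# Made-up records: (year, age, deaths, mid-year population)
records = [
    (1976, 40, 2, 1000),
    (1977, 44, 3, 1000),
    (1981, 41, 4, 2000),
    (1983, 90, 5, 500),
]
table = build_table(records)
for cell in sorted(table):
    print(cell, table[cell])
```

Each key of `table` is an (age group, period) pair, which corresponds to one cell of the contingency table with age in rows and period in columns.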

2. Model setting

NOTE: The multistage method was proposed by Keyes and Li ⁸ and begins with a graphical investigation. A median polish analysis is then performed to eliminate the cumulative effects of age and period; finally, the residuals from the median polish phase are regressed on cohort category in a linear regression model to evaluate the cohort effects using the data in the contingency table.

Graphical representation as the first phase
1. Create a line plot of age groups and period groups. To inspect patterns across age groups or birth cohorts, also plot the birth cohorts across ages or periods in the line charts.
2. Import a CSV file with contingency table mortality data. Click File | Open | Browse to select a CSV file from a folder. Remember to choose All Files in the drop-down list next to File name box.
3. Click Open to open the CSV file. Highlight the rows and columns of the mortality contingency data and click Insert | Charts | Line Graph.
Median polish analysis as the second phase
1. Iteratively subtract the median from each row and each column to eliminate the cumulative effect of age and period. After the median polishing phase, keep the residuals for the regression procedure to evaluate the cohort effects.
2. Compute the overall median and residual table. Import a CSV file with contingency table mortality data (refer to 2.1.1.2).
3. Apply LN (natural logarithm) to each cell of the contingency table mortality data. Click Formulas | Mathematical & Trigonometry Function and select LN.
4. Number: Enter the location label for each cell. Ensure LN has been applied to each cell of the contingency table mortality data. Click Formulas | More Functions | Statistics and select MEDIAN.
5. Number1: Enter the first cell location label.
6. Number2: Enter the last cell location label. Ensure that the resulting median value is stored in the upper left-hand margin of the contingency table. Ensure a residual table is created by taking the difference between the original value (i.e., the LN mortality data) and the overall median.
7. Compute the row medians (i.e., the medians of each age group) and ensure that a row median value is computed for the respective age group. Click Formulas | More Functions | Statistics | Select MEDIAN.
  1. Number 1: Enter the first cell location label of the row.
  2. Number 2: Enter the last cell location label of the row. Ensure the resulting row median values are stored in the left-hand margin of the contingency table.
8. Create a new residual table by subtracting the row medians. Ensure a new set of residual values is created in which each cell is the value in that row minus the row median. Type = and subtract the left-hand margin median cell from each cell in that row.
9. Compute the column medians (i.e., the medians of each period group) and ensure that a column median value is computed for the respective period group. Click Formulas | More Functions | Statistics | Select MEDIAN.
  1. Number 1: Enter the first cell location label of the column. Number 2: Enter the last cell location label of the column. Ensure the resulting column median values are stored in the upper margin of the contingency table.
10. Create a new residual table by subtracting the column medians. Ensure a new set of residual values is created in which each cell is the value in that column minus the column median. Type = and subtract the upper margin median cell from each cell in that column.
11. Repeat Steps 2.1.2.7 to 2.1.2.10 until the row and column medians approximate zero. Click Formulas | More Functions | Statistics | Select MEDIAN. Make sure the row and column medians are approximately zero. Save the final residual table in CSV format.
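
The second phase can also be sketched in a few lines of Python. This is a minimal illustration of median polish as described in the steps above (log transform, subtract the overall median, then alternate row and column medians until they are approximately zero); the function name, tolerance, iteration cap, and example table are assumptions, not part of the Excel protocol.

```python
import math
from statistics import median


def median_polish_residuals(rates, max_iter=50, tol=1e-9):
    """Residuals of log rates after removing row (age) and column (period) medians."""
    resid = [[math.log(v) for v in row] for row in rates]
    overall = median(v for row in resid for v in row)
    resid = [[v - overall for v in row] for row in resid]
    for _ in range(max_iter):
        # Subtract each row's median (age effect).
        row_med = [median(row) for row in resid]
        resid = [[v - m for v in row] for row, m in zip(resid, row_med)]
        # Subtract each column's median (period effect).
        col_med = [median(col) for col in zip(*resid)]
        resid = [[v - m for v, m in zip(row, col_med)] for row in resid]
        # Stop once the medians just removed were all approximately zero.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return resid


# Made-up table: purely additive age and period effects on the log scale ...
age_eff = [0.0, 0.5, 1.0]
period_eff = [0.0, 0.2, 0.4]
rates = [[math.exp(-8 + a + p) for p in period_eff] for a in age_eff]
rates[0][2] *= 2  # ... except one cell, which is doubled

resid = median_polish_residuals(rates)
for row in resid:
    print([round(v, 3) for v in row])
```

In this toy table the additive age and period structure is removed entirely, and only the doubled cell keeps a residual close to log(2). Residuals of this kind are what the third phase regresses on cohort category.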
Regression procedure with weight as the third phase
NOTE: The residuals for each cohort were used as the dependent variable, with the death number as the weight. Linear regressions were then run to calculate the cohort effects.
1. Ensure Kutools for Excel is installed; its Transpose Table Dimensions tool quickly converts cross tables to flat lists. Import the CSV file with the contingency table residual data (refer to 2.1.2.11).
2. Select the table to be converted to a list. Click Kutools | Modify | Transpose Table Dimensions. In the Transpose Table Dimensions dialog box, make sure there is a check in the box next to Cross table to list, and select the Results range in which to store the residuals in list format.
3. Insert a residual column into the initial data file (refer to 1.2) using the residual list format data (refer to 2.1.3.1). First, insert a supporting column in the residual list format data: type = followed by the age and period cells joined with & and press Enter. Use the supporting column to look up the age and period group labels of the residual list and insert the corresponding residual into the initial data file (refer to 1.2).
4. Click Formulas | Lookup & Reference | Select VLOOKUP. Set VLOOKUP (cell location label of age & cell location label of the period, first cell location label of the supporting column: the last cell location label of the residual column, 4, 0). Make sure the selected range includes the supporting, age, period, and residual columns (i.e., the residual is the 4th column of the list).
5. Ensure that the residuals looked up from the residual list format data (refer to 2.1.3.1) have been inserted in the initial data file (refer to 1.2) before the next step. Fit the regression model by unweighted least squares and analyze the residuals.
6. Click Analyze | Regression | Linear. Transfer the independent variable, cohort category (i.e., 17 birth cohorts), into the Independent(s): box and the dependent variable, Residuals, into the Dependent: box. Click OK. Ensure that the results for the unweighted cohort effects are generated.
7. Ensure that the residuals looked up from the residual list format data (refer to 2.1.3.1) have been inserted in the initial Excel data file (refer to 1.2) before the next step. Fit the regression model by weighted least squares and analyze the residuals. Click Analyze | Regression | Linear.
8. Transfer the independent variable, cohort category (i.e., 17 birth cohorts), into the Independent(s): box and the dependent variable, Residuals, into the Dependent: box. Transfer the death number into the WLS Weight: box. Click OK. Ensure that the results for the weighted average of the cohort effect are generated.
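
The third phase can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SPSS procedure itself: it assumes the cohort category is dummy-coded with no intercept, in which case the least-squares normal equations decouple and each coefficient is the (weighted) mean residual of that cohort. The function names and the 2 x 2 example values are hypothetical.

```python
from collections import defaultdict


def table_to_list(resid, deaths):
    """Flatten cross tables into (cohort_index, residual, weight) rows."""
    n_age = len(resid)
    rows = []
    for a, (resid_row, death_row) in enumerate(zip(resid, deaths)):
        for p, (r, d) in enumerate(zip(resid_row, death_row)):
            cohort = p - a + (n_age - 1)  # oldest cohort gets index 0
            rows.append((cohort, r, d))
    return rows


def cohort_effects(rows, weighted=True):
    """Least-squares fit of residuals on cohort dummies (no intercept).

    With one dummy per cohort, each coefficient equals the mean residual
    of that cohort, weighted by death number when `weighted` is True.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for cohort, r, w in rows:
        w = w if weighted else 1.0
        num[cohort] += w * r
        den[cohort] += w
    return {c: num[c] / den[c] for c in sorted(num)}


# Made-up 2 x 2 residual table (rows = age groups, columns = periods)
resid = [[0.1, 0.3],
         [-0.2, 0.5]]
deaths = [[10, 30],
          [20, 40]]

rows = table_to_list(resid, deaths)
unweighted = cohort_effects(rows, weighted=False)
weighted = cohort_effects(rows, weighted=True)
print(unweighted)
print(weighted)
```

In the example, the middle cohort spans two cells; the weighted estimate is pulled toward the cell with more deaths, which is the intended effect of using death number as the weight. A cohort whose cells all have zero deaths would need separate handling, since its weighted mean is undefined.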

Results

The mortality data were arranged in 10 five-year age groups (40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, and 85+) and 8 five-year time periods (1976-1980, 1981-1985, 1986-1990, 1991-1995, 1996-2000, 2001-2005, 2006-2010, and 2011-2015). The number of cohort groups was obtained by subtracting one from the total number of age and period groups: 10 (five-year age groups) + 8 (five-year time periods) - 1 = 17 birth cohorts, with the birth cohort groups denoted by mid-cohort years as 1891, 1896, 1901, 1906, ...
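
The cohort arithmetic above can be checked with a short sketch. One assumption is ours: the open-ended 85+ group is treated as 85-89, so its midpoint is 87; the variable names are hypothetical.

```python
age_lower = list(range(40, 90, 5))         # 40, 45, ..., 85 (85 stands for 85+)
period_start = list(range(1976, 2016, 5))  # 1976, 1981, ..., 2011

# One birth cohort per diagonal of the age-period table.
n_cohorts = len(age_lower) + len(period_start) - 1

# Mid-cohort year = mid-period year minus mid-age.
# Assumption: 85+ is treated as 85-89, so its midpoint is 87.
oldest = (period_start[0] + 2) - (age_lower[-1] + 2)
mid_cohort_years = [oldest + 5 * k for k in range(n_cohorts)]

print(n_cohorts)
print(mid_cohort_years)
```

Under that assumption the oldest cohort is centered on 1891 (1978 minus 87) and the youngest on 1971 (2013 minus 42), in five-year steps.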

Discussion

Due to the time trend of HCC mortality, conventional models underestimate some important features hidden in the data (such as cohort effects), and conventional analyses that use simple linear extrapolation of the observed logarithmic age correction rate show significantly reduced accuracy in their predictions. It is clear that this trend has continued for 35 years and will trend upwards in the next few years if we directly observe the long-term trend of HCC mortality in Taiwan from 1976 to 2015 (Figu...

Disclosures

The authors have nothing to disclose.

Acknowledgements

This work was supported by Taipei Tzu Chi Hospital TCRD-TPE-109-RT-8 (2/3) and TCRD-TPE-109-39 (2/2).

Materials

Name	Company	Catalog Number	Comments
not applicable	not applicable	not applicable	not applicable

References

Kuntz, E., Kuntz, H. D. Hepatology: Principles and Practice. 774 (2006).
McGlynn, K. A., et al. International trends and patterns of primary liver cancer. International Journal of Cancer. 94 (2), 290-296 (2001).
Bosch, F. X., Ribes, J., Diaz, M., Cleries, R. Primary liver cancer: worldwide incidence and trends. Gastroenterology. 127, 5-16 (2004).
Tzeng, I. S., Ng, C. Y., Chen, J. Y., Chen, L. S., Wu, C. C. Using weighted regression model for estimating cohort effect in age-period contingency table data. Oncotarget. 9 (28), 19826-19835 (2018).
Tzeng, I. S., Lee, W. C. Forecasting hepatocellular carcinoma mortality in Taiwan using an age-period-cohort model. Asia-Pacific Journal of Public Health. 27, 65-73 (2015).
Tzeng, I. S., et al. Predicting emergency departments visit rates from septicemia in Taiwan using an age-period-cohort model, 1998 to 2012. Medicine. 95, 5598 (2016).
Chen, S. H., et al. Period and Cohort Analysis of Rates of Emergency Department Visits Due to Pneumonia in Taiwan, 1998-2012. Risk Management and Healthcare Policy. 13, 1459-1466 (2020).
Keyes, K. M., Li, G. A multiphase method for estimating cohort effects in age-period contingency table data. Annals of Epidemiology. 20, 779-785 (2010).
Tukey, J. Exploratory Data Analysis. Reading, MA. (1977).
Selvin, S. Statistical Analysis of Epidemiologic Data. (1996).
Légaré, G., Hamel, D. An age-period-cohort approach to analyzing trends in suicide in Quebec between 1950 and 2009. Canadian Journal of Public Health. 104, 118-123 (2013).
Lavanchy, D. Hepatitis B virus epidemiology, disease burden, treatment, and current and emerging prevention and control measures. Journal of Viral Hepatitis. 11, 97-107 (2004).
Chang, M. H., et al. Universal hepatitis B vaccination in Taiwan and the incidence of hepatocellular carcinoma in children. Taiwan Childhood Hepatoma Study Group. New England Journal of Medicine. 336, 1855-1859 (1997).
Lu, F. T., Ni, Y. H. Elimination of mother-to-infant transmission of hepatitis B virus: 35 years of experience. Pediatric Gastroenterology, Hepatology & Nutrition. 23 (4), 311-318 (2020).
Chien, Y. C., Jan, C. F., Kuo, H. S., Chen, C. J. Nationwide hepatitis B vaccination program in Taiwan: effectiveness in the 20 years after it was launched. Epidemiologic Reviews. 28, 126-135 (2006).
Ahmad, O. B., et al. Age standardization of rates: a new WHO standard. Geneva: GPE Discussion Paper Series. World Health Organization. 31 (2005).
da Silva, C. P., Emídio, E. S., de Marchi, M. R. Method validation using weighted linear regression models for quantification of UV filters in water samples. Talanta. 131, 221-227 (2015).
Dawes, R. M. The robust beauty of improper linear models in decision making. American Psychologist. 34, 571-582 (1979).
Dawes, R. M., Corrigan, B. Linear models in decision making. Psychological Bulletin. 81, 95-106 (1974).
Einhorn, H. J., Hogarth, R. M. Unit weighting schemes for decision making. Organizational Behavior and Human Performance. 13, 171-192 (1975).
Wang, W., et al. Association of hepatitis B virus DNA level and follow-up interval with hepatocellular carcinoma recurrence. JAMA Network Open. 3 (4), 203707 (2020).
Holford, T. R. The estimation of age, period and cohort effects for vital rates. Biometrics. 39, 311-324 (1983).
