Validation of Mortality Risk Stratification Models for Cardiovascular Disease
Risk stratification models are effective tools for the management of cardiovascular diseases. Although several risk scores have been proposed, the relevance and superiority of these predictive models have not been fully validated in an independent population that was not drawn from clinical trials. Of 3,026 consecutive patients initially hospitalized in our institution from April 2004 to December 2009, we studied the 2,472 who had complete data for all 4 risk score models. Risk scores were calculated for each patient using the Seattle Heart Failure Model (SHFM), the Acute Decompensated Heart Failure National Registry regression model, the American Heart Association Get With The Guidelines-Heart Failure score, and the Association of Health Aging and Body Composition Heart Failure score. The predictive ability for the composite end point, including total death, heart transplantation, and left ventricular assist device implantation, was assessed by calculating the area under the receiver operating characteristic curve for each model. During the follow-up period after admission (median 924.5 days), the combined end point occurred in 295 patients (11.9%), including 27 in-hospital deaths (1.1%). Compared with the other 3 risk score models, the SHFM risk score demonstrated a greater area under the curve for the combined end point at the overall, in-hospital, 30-day, and 1-, 2-, and 3-year follow-up points (0.744 to 0.890). The survival rate predicted by the SHFM demonstrated an excellent correlation with the actual survival rate (R² = 0.990). In conclusion, these results suggest that the SHFM risk score is the most suitable for the discrimination and calibration of mortality risk stratification in patients with cardiovascular disease.
Cardiovascular disease is one of the leading causes of morbidity and mortality, imposing a substantial healthcare cost in most countries. It is, therefore, important to assess the risk status of patients to support decision making and the effective management of patient care. Several predictive risk models have recently been proposed in an attempt to improve risk stratification: the Seattle Heart Failure Model (SHFM)1; the Acute Decompensated Heart Failure National Registry (ADHERE)2; Get With The Guidelines-Heart Failure (GWTG-HF)3; and the Association of Health Aging and Body Composition Heart Failure score (ABC).4 These existing risk models were derived from limited populations, mainly from clinical trials. Consequently, it remains unclear whether these risk models can provide a standardized approach to estimating the risk in all patients with cardiovascular disease in the “real world.” The purpose of the present study was to evaluate the prognostic accuracy of these 4 risk models in predicting overall, in-hospital, 30-day, and 1-, 2-, and 3-year survival in our large cohort of patients with cardiovascular disease.
Methods
We studied 3,026 consecutive patients initially admitted to our institution from April 2004 to December 2009. The data prospectively collected from the medical records included the clinical characteristics, medical history, therapy, laboratory tests, and follow-up information. In addition, deaths were determined by conducting a telephone survey of family members and local hospitals. We calculated the risk scores for each patient using the published models: (1) SHFM, (2) ADHERE, (3) GWTG-HF, and (4) ABC. The variables required for SHFM scoring were age, gender, New York Heart Association class, body weight, left ventricular ejection fraction, systolic blood pressure (SBP), etiology of cardiomyopathy, medication (angiotensin-converting enzyme inhibitors, β blockers, angiotensin II receptor blocker, statin, allopurinol, or K-sparing diuretics), diuretic dosage, laboratory values, and implanted device status.1 Specifically, the SHFM score was determined as follows: SHFM score = age/10 × ln(1.09) + (if male) ln(1.089) + New York Heart Association class × ln(1.60) + 100/ejection fraction × ln(1.03) + (if ischemic heart disease) ln(1.354) + (if SBP <160 mm Hg) SBP/10 × ln(0.877) + (if SBP ≥160 mm Hg) 160/10 × ln(0.877) + 100/cholesterol × ln(2.206) + (if angiotensin-converting enzyme inhibitor treated) ln(0.77) + (if angiotensin II receptor blocker treated) ln(0.85) + (if β blocker treated) ln(0.66) + (if K-sparing diuretics treated) ln(0.74) + (if statin treated) ln(0.63) + diuretic/kg × ln(1.178) + (if sodium <138) (138 − sodium) × ln(1.05) + (if hemoglobin <16) (16 − hemoglobin) × ln(1.124) + (if hemoglobin >16) (hemoglobin − 16) × ln(1.336) + percentage of lymphocytes/5 × ln(0.897) + uric acid × ln(1.064) + (if cardiac resynchronization therapy implanted) ln(1.00) + (if implantable cardioverter-defibrillator implanted) ln(0.73) + (if cardiac resynchronization therapy-defibrillator implanted) ln(0.79), where ln denotes the natural logarithm.
Survival at time t for score s was calculated by the following equation: $\mathrm{Survival}(t) = e^{(-0.0405 \times t) \times e^{s}}$.
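As a rough illustration of how the score and the survival equation fit together, the following Python sketch transcribes the terms listed above. The `Patient` field names, the example values, and the choice of years as the unit of t are assumptions made for illustration (the text does not state the time unit); this is not the authors' implementation.

```python
import math
from dataclasses import dataclass, replace


@dataclass
class Patient:
    # Field names are hypothetical; units follow the Methods text and Table 1.
    age: float                 # years
    male: bool
    nyha_class: int            # 1 to 4
    ejection_fraction: float   # %
    ischemic: bool
    sbp: float                 # mm Hg
    cholesterol: float         # mg/dl
    sodium: float              # mEq/L
    hemoglobin: float          # g/dl
    lymphocytes_pct: float     # %
    uric_acid: float           # mg/dl
    diuretic_mg_per_kg: float  # furosemide-equivalent daily dose per kg
    acei: bool = False
    arb: bool = False
    beta_blocker: bool = False
    k_sparing: bool = False
    statin: bool = False
    crt: bool = False
    icd: bool = False
    crt_d: bool = False


def shfm_score(p: Patient) -> float:
    """Sum the weighted terms exactly as listed in the text."""
    ln = math.log

    def flag(present: bool, hazard_ratio: float) -> float:
        return ln(hazard_ratio) if present else 0.0

    s = p.age / 10 * ln(1.09)
    s += flag(p.male, 1.089)
    s += p.nyha_class * ln(1.60)
    s += 100 / p.ejection_fraction * ln(1.03)
    s += flag(p.ischemic, 1.354)
    s += min(p.sbp, 160) / 10 * ln(0.877)      # SBP is capped at 160 mm Hg
    s += 100 / p.cholesterol * ln(2.206)
    s += flag(p.acei, 0.77) + flag(p.arb, 0.85) + flag(p.beta_blocker, 0.66)
    s += flag(p.k_sparing, 0.74) + flag(p.statin, 0.63)
    s += p.diuretic_mg_per_kg * ln(1.178)
    s += max(138 - p.sodium, 0) * ln(1.05)     # only sodium below 138 counts
    s += max(16 - p.hemoglobin, 0) * ln(1.124)
    s += max(p.hemoglobin - 16, 0) * ln(1.336)
    s += p.lymphocytes_pct / 5 * ln(0.897)
    s += p.uric_acid * ln(1.064)
    s += flag(p.crt, 1.00) + flag(p.icd, 0.73) + flag(p.crt_d, 0.79)
    return s


def shfm_survival(score: float, t_years: float) -> float:
    """Survival(t) = exp(-0.0405 * t * exp(score)); t in years is an assumption."""
    return math.exp(-0.0405 * t_years * math.exp(score))


# Hypothetical patient, not taken from the study data.
example = Patient(age=62, male=True, nyha_class=2, ejection_fraction=60,
                  ischemic=False, sbp=125, cholesterol=182, sodium=141,
                  hemoglobin=13.1, lymphocytes_pct=25.4, uric_acid=6.0,
                  diuretic_mg_per_kg=0.5)
untreated = shfm_score(example)
treated = shfm_score(replace(example, beta_blocker=True))
print(shfm_survival(untreated, 1.0), shfm_survival(treated, 1.0))
```

Because each treatment enters as the logarithm of a ratio below 1, switching a treatment flag on lowers the score and raises the predicted survival at every time point.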
The ADHERE regression model requires information on blood urea nitrogen levels, SBP, heart rate, and age.2 The GWTG-HF risk score also uses age, blood urea nitrogen, SBP, heart rate, sodium concentration, and the presence of chronic obstructive pulmonary disease.3 The ABC includes age, SBP, heart rate, creatinine, albumin, fasting glucose, history of coronary artery disease, smoking status, and the presence of left ventricular hypertrophy.4 The left ventricular ejection fraction was determined by echocardiography or left ventriculography. The Minnesota code criteria were applied for the diagnosis of left ventricular hypertrophy from the electrocardiograms. From the data obtained, the ejection fraction was missing in 7.7%, the heart rate at admission was missing in 6.7%, an electrocardiogram was missing for 4.1%, the serum high-density lipoprotein cholesterol level was missing for 4.1%, smoking habits were missing for 2.8%, and other variables were missing for <2%. The diuretic dose was converted to the furosemide equivalent dose as follows: furosemide 40 mg = torasemide 20 mg = azosemide 60 mg = indapamide 2 mg = trichlormethiazide 2 mg. To evaluate the risk score precisely, we did not replace the missing covariates with imputed values, such as the cohort mean. Therefore, 92% of the patients had all variables for the SHFM (n = 2,793), 93% for ADHERE (n = 2,823), 93% for GWTG-HF (n = 2,810), and 87% for ABC (n = 2,633). Finally, 82% of the patients (n = 2,472) had all the variables for these 4 models and were analyzed for the present study.
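The furosemide-equivalent conversion stated above can be sketched in a few lines of Python. The function names and the example doses are hypothetical; only the equivalence table comes from the text.

```python
# Dose of each drug (mg) considered equivalent to furosemide 40 mg, per the text.
EQUIVALENT_TO_FUROSEMIDE_40_MG = {
    "furosemide": 40.0,
    "torasemide": 20.0,
    "azosemide": 60.0,
    "indapamide": 2.0,
    "trichlormethiazide": 2.0,
}


def furosemide_equivalent_mg(drug: str, dose_mg: float) -> float:
    """Convert a daily diuretic dose to furosemide-equivalent milligrams."""
    return dose_mg * 40.0 / EQUIVALENT_TO_FUROSEMIDE_40_MG[drug]


def daily_dose_per_kg(prescriptions, weight_kg: float) -> float:
    """Total furosemide-equivalent daily dose per kg of body weight."""
    total = sum(furosemide_equivalent_mg(drug, mg) for drug, mg in prescriptions)
    return total / weight_kg


# Hypothetical patient on furosemide 20 mg plus torasemide 10 mg, weighing 80 kg.
print(daily_dose_per_kg([("furosemide", 20), ("torasemide", 10)], 80))
```

The per-kilogram value is the quantity that enters the diuretic term of the SHFM score.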
The discrimination of the risk score was assessed by calculating the area under the receiver operating characteristic curve (AUC) for each of the risk models at different points of follow-up, and the AUCs of the models were compared using the Hanley and McNeil approach.5 The calibration of model performance was assessed using the Hosmer-Lemeshow statistic. We also compared the predicted mortality with the observed composite end point, including death, heart transplantation, or implantation of a left ventricular assist device. All analyses were performed using the Statistical Package for Social Sciences, version 17.0, for Windows (SPSS, Chicago, Illinois). A p value of <0.05 (2 tailed) was considered statistically significant.
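The discrimination analysis can be illustrated with a short Python sketch. It computes the AUC as a Mann-Whitney probability, uses the standard error formula from Hanley and McNeil's earlier work, and forms the z statistic for two AUCs derived from the same cases. The correlation `r` between the two AUCs is normally looked up from the table in the Hanley and McNeil paper; the value used here is a placeholder assumption, and the study itself used SPSS rather than this code.

```python
import math


def auc_mann_whitney(scores_event, scores_no_event):
    """Probability that a random event case scores higher than a random
    non-event case, counting ties as one half."""
    wins = 0.0
    for e in scores_event:
        for n in scores_no_event:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(scores_event) * len(scores_no_event))


def hanley_mcneil_se(auc, n_event, n_no_event):
    """Standard error of a single AUC (Hanley and McNeil formula)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_event - 1) * (q1 - auc ** 2)
                + (n_no_event - 1) * (q2 - auc ** 2)) / (n_event * n_no_event)
    return math.sqrt(variance)


def compare_correlated_aucs(auc1, auc2, se1, se2, r):
    """z statistic and two-sided p value for two AUCs from the same cases."""
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# In-hospital AUCs from Table 2 (SHFM 0.890 vs ABC 0.702) with 27 events among
# 2,472 patients; r = 0.4 is a hypothetical value chosen only for illustration.
se_shfm = hanley_mcneil_se(0.890, 27, 2472 - 27)
se_abc = hanley_mcneil_se(0.702, 27, 2472 - 27)
print(compare_correlated_aucs(0.890, 0.702, se_shfm, se_abc, r=0.4))
```

Accounting for the correlation matters because both scores are computed on the same patients; ignoring it (r = 0) would overstate the variance of the difference and make the comparison more conservative.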
The patients' identifying information was removed before analysis. The authors had access to all the data, take complete responsibility for its integrity, and have read and agreed to the report as written.
Results
The baseline patient characteristics are listed in Table 1. The average length of hospital stay was 23.0 ± 1.3 days. During 6,687.3 patient-years of follow-up (median 924.5 days), 291 (11.8%) of 2,472 patients died (annual mortality rate 5.7%; 95% confidence interval 4.5% to 7.0%). In addition, 4 patients underwent heart transplantation and/or left ventricular assist device support. Therefore, the combined end point occurred in 295 patients (11.9%). In-hospital death occurred in 27 patients (1.1%). The total number of patients experiencing the combined end point by the 30-day and 1-, 2-, and 3-year follow-up points was 18 (0.8%), 113 (5.2%), 189 (11.2%), and 247 (19.3%), respectively.
Table 1. Patient characteristics (n = 2,472)
| Characteristic | Value |
|---|---|
| Age (years) | 61.6 |
| Men | 63.8% |
| New York Heart Association class | |
| I | 60% |
| II | 21% |
| III | 11% |
| IV | 8% |
| Hypertension | 51.2% |
| Ejection fraction | 60.2 |
| Myocardial ischemia | 37.5% |
| | 14.9% |
| | 7.8% |
| | 5.4% |
| | 9.5% |
| Arrhythmia | 25.4% |
| | 8.0% |
| | 5.0% |
| | 4.0% |
| | 3.7% |
| | 4.7% |
| Cardiomyopathy | 15.5% |
| | 7.3% |
| | 3.3% |
| | 3.2% |
| | 1.7% |
| Valvular disease | 6.8% |
| | 2.4% |
| | 1.9% |
| | 1.6% |
| | 0.9% |
| Pulmonary artery disease | 5.7% |
| | 2.1% |
| | 1.9% |
| | 2.7% |
| Diabetes mellitus | 24.6% |
| Chronic obstructive pulmonary disease | 8.4% |
| Atrial fibrillation | 23.4% |
| Smoking | 30.1% |
| Systolic blood pressure at admission (mm Hg) | 124.7 |
| Diastolic blood pressure at admission (mm Hg) | 73.0 |
| Heart rate (beats/min) | 77 |
| Creatinine (mg/dl) | 1.1 |
| Sodium (mEq/L) | 141.1 |
| Blood urea nitrogen (mg/dl) | 18.6 |
| Uric acid (mg/dl) | 6.0 |
| Total cholesterol (mg/dl) | 181.9 |
| High-density lipoprotein (mg/dl) | 48.0 |
| Albumin (g/dl) | 3.9 |
| Hemoglobin (g/dl) | 13.1 |
| Fasting blood glucose (mg/dl) | 118.2 |
| Lymphocytes (%) | 25.4 |
| Cardiac device | 15.2% |
| | 4.0% |
| | 0.6% |
| | 1.0% |
| | 9.6% |
| Angiotensin-converting enzyme inhibitor | 36.7% |
| Angiotensin-receptor blocker | 29.4% |
| β Blockers | 36.4% |
| Aldosterone antagonist | 18.1% |
| Statins | 31.0% |
| Amiodarone | 4.4% |
| Warfarin | 27.7% |
| Loop diuretics | 29.0% |
| Daily diuretic use (mg/kg) (if used, furosemide equivalent) | 29.3 |
The AUC for the combined end point in the models is shown in Figure 1. The values of AUC with the 95% confidence interval and p values are summarized in Table 2. Compared to the other models, the SHFM risk score demonstrated a significantly greater AUC for overall outcomes (p = 0.028 vs ADHERE, p = 0.018 vs GWTG-HF, and p < 0.001 vs ABC), in-hospital death (p = 0.039 vs ADHERE, p = 0.045 vs GWTG-HF, and p < 0.001 vs ABC), and mortality at 2 years (p = 0.040 vs ADHERE, p = 0.042 vs GWTG-HF, and p = 0.018 vs ABC). We noted a significant difference in AUC between the SHFM and ABC for combined mortality at 3 years (p = 0.034 vs ABC). The SHFM also showed a nonsignificant tendency toward greater AUCs for the 30-day and 1-year mortality compared to the other models. Both the ADHERE and GWTG-HF risk scores demonstrated significantly greater AUCs for overall combined end points compared to ABC (ADHERE vs ABC, p <0.001; and GWTG-HF vs ABC, p <0.001). Kaplan-Meier curves for the risk score models categorized by quintiles are shown in Figure 2. All models demonstrated excellent risk stratification.

Figure 1.
AUCs for combined end point of death, heart transplantation, or left ventricular assist device (LVAD) implantation for SHFM, ADHERE, GWTG-HF, and ABC, for (A) in-hospital death, and combined end points at (B) 30 days and (C) 1- and (D) 2 years of follow-up.
Table 2. Comparison of area under receiver operating characteristic curve (AUC) for Seattle Heart Failure Model (SHFM), Acute Decompensated Heart Failure National Registry (ADHERE), Get With The Guidelines-Heart Failure (GWTG-HF), and Association of Health Aging and Body Composition Heart Failure Score (ABC)
| Variable | SHFM | ADHERE | GWTG-HF | ABC |
|---|---|---|---|---|
| In-hospital death | 0.890⁎ | 0.792 | 0.805 | 0.702 |
| Combined end point | | | | |
| Overall | 0.747⁎ | 0.714† | 0.711† | 0.642 |
| 30 days | 0.866 | 0.801 | 0.807 | 0.769 |
| 1 year | 0.777 | 0.736 | 0.740 | 0.715 |
| 2 years | 0.746⁎ | 0.698 | 0.701 | 0.679 |
| 3 years | 0.744‡ | 0.709 | 0.712 | 0.694 |

⁎p <0.05, SHFM vs ADHERE, GWTG-HF, or ABC;
†p <0.05, ADHERE or GWTG-HF vs ABC;
‡p <0.05, SHFM vs ABC.

Figure 2.
Kaplan-Meier curves for quintiles of risk score models during 2-year period in (A) SHFM, (B) GWTG-HF, (C) ADHERE, and (D) ABC.
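For readers who want to reproduce curves of this kind, the sketch below shows one way to split patients into risk-score quintiles and compute a Kaplan-Meier estimate for a group. It is a minimal pure-Python illustration with hypothetical function names and toy data, not the SPSS procedure used in the study.

```python
def quintile_index(scores):
    """Return a quintile label (0 = lowest scores, 4 = highest) per patient."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(scores)
    return labels


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each patient
    events : True if the end point occurred at that time, False if censored
    Returns a list of (time, survival) pairs at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Toy data: four patients, one censored at t = 2.
print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
```

Applying `kaplan_meier` separately to the patients in each quintile yields one curve per risk group, which is the layout described for Figure 2.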
The predicted survival and observed survival during follow-up are compared in Figure 3. With the SHFM risk score, the predicted survival rate at 30 days and 1 and 2 years was 99.2%, 93.2%, and 87.1%, and the observed survival rate was 99.4%, 94.7%, and 88.7%, respectively. A good correlation between the predicted and observed survival was noted (R² = 0.990). Figure 3 shows good calibration of the predicted and observed end point probabilities across deciles of predicted risk using the Hosmer-Lemeshow test at 30 days and 1 and 2 years of follow-up.

Figure 3.
(A) Comparison of predicted and observed survival for SHFM. Predicted (blue) versus observed (white) survival rate at each day plotted during follow-up period of ≤3 years. Calibration plots for composite outcome at (B) 30 days and (C) 1 and (D) 2 years for SHFM. Predicted (blue) versus observed (white) mortality according to decile of risk shown. Hosmer-Lemeshow chi-square was 7.21 (p = 0.51), 11.15 (p = 0.19), and 5.04 (p = 0.74) at 30 days and 1 and 2 years, respectively.
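The calibration check can be sketched as follows. The first function computes a Hosmer-Lemeshow chi-square from predicted probabilities grouped by rank (deciles by default); the second converts a chi-square value to a p value using the closed form that exists for an even number of degrees of freedom. Using 8 degrees of freedom (10 groups minus 2) is an assumption, since the text does not state it, although it gives p values close to those reported above for 30 days and 1 year.

```python
import math


def hosmer_lemeshow(predicted, outcomes, groups=10):
    """Hosmer-Lemeshow chi-square over rank-ordered groups of predicted risk."""
    pairs = sorted(zip(predicted, outcomes))
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / n_g
        denom = n_g * mean_p * (1 - mean_p)
        if denom > 0:
            chi2 += (observed - expected) ** 2 / denom
    return chi2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Chi-square values reported for Figure 3, with df = 8 assumed.
print(round(chi2_sf_even_df(7.21, 8), 2))   # about 0.51
print(round(chi2_sf_even_df(11.15, 8), 2))  # about 0.19
```

A large p value here means no detectable difference between predicted and observed event counts across the risk groups, which is why nonsignificant results are read as good calibration.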
Discussion
In the present study, we compared the 4 risk score models for the prediction of mortality in our large cohort of patients with cardiovascular disease. All models were validated using the same data set to ensure a proper comparison. Our results showed that the SHFM was superior to the other models in predicting not only the short-term outcome (e.g., in-hospital mortality) but also the long-term (2-year) and overall follow-up outcomes (AUCs of 0.744 to 0.890). Thus, the SHFM is a suitable tool for risk stratification in the general population of patients with cardiovascular disease.
Risk score models are important tools, not only for guiding the physician's treatment plan, but also for evaluating cost-effectiveness in public health. Although several models have been developed for this purpose, few studies have compared such models in their ability to predict patient outcomes.6, 7, 8 In addition, these models were mainly derived from clinical trial data, in which the patient population might have been limited because of strict enrollment criteria, resulting in the exclusion of patients with severe conditions, such as liver dysfunction and severe renal insufficiency. The outcome from these models could therefore be different in clinical settings. For application in the “real world,” risk models should be validated using a broader patient population. Furthermore, because risk score models require a number of covariates from clinical information, many validation studies have had a great deal of missing data and have imputed values for the missing covariates. For example, in the study by May et al,9 many values were missing for several variables (New York Heart Association 72.1%; lymphocytes 34.7%; uric acid 66.2%; ejection fraction 25.0%; total cholesterol 19.8%), which were estimated using multiple imputations. This could have resulted in an underestimation of the dispersion and led to incorrect inferences. Thus, we performed a complete case analysis, including only the patients for whom all required variables were available. The present study should be considered representative of a general patient population with cardiovascular disease.
Several explanations for the superiority of the SHFM can be provided. First, the SHFM has been validated in several databases. The model was originally derived from the Prospective Randomized Amlodipine Survival Evaluation database10 and validated in 5 other study populations, including patients with a wide age range (14 to 100 years), ejection fraction (1% to 75%), and heart failure severity (New York Heart Association class I to IV).1 This could explain why the SHFM was the most applicable to the present study population, a broad sample of patients hospitalized for cardiovascular disease. Previous studies of the SHFM have reported that it is a good risk prediction model for patients with severe heart failure,9, 11 including patients who are potential candidates for, or recipients of, a left ventricular assist device.12, 13, 14 Our results indicate that the SHFM is also an adequate risk prediction model in those with milder heart failure or no heart failure. Second, the SHFM risk model requires information about medications and clinical devices. The inclusion of this information could contribute to the better prediction of mortality than clinical characteristics alone, because medications and devices are critically altered by physicians to improve the chances of survival of their patients. Other risk prediction models do not use information pertaining to medications and clinical devices. Third, blood pressure data have a different effect on the SHFM risk score than on the score for the GWTG and ADHERE. 
In the SHFM, the inclusion of data regarding blood pressure elevation increases the risk score, because hypertension is known to be a common and powerful contributor to all the major cardiovascular diseases.15, 16 In contrast, a lower systolic blood pressure actually increases the risk score in the GWTG and ADHERE, consistent with the finding that lower systolic blood pressure at admission correlated significantly with greater mortality from acute congestive heart failure.17, 18 Both models were developed to predict the short-term outcome in patients with acute heart failure and greater in-hospital mortality (ADHERE, 4.0%; GWTG, 2.9%; the present study, 1.1%) and provided adequate risk stratification for in-hospital mortality.2, 3 Nevertheless, the SHFM was significantly better in predicting in-hospital mortality than the GWTG, despite the greater AUC of the GWTG score for in-hospital mortality in our study than in the original study (0.81 vs 0.75, respectively).3 The AUC of the ADHERE was not reported in the original study.2 In contrast, the ABC score model did not provide accurate predictions in the present study, although the elevation of blood pressure increases the risk scores for both the ABC and the SHFM. The different patient populations, including the older age range (73.6 ± 3 years) and female predominance (53.1%) in the ABC population, might explain the apparent discrepancies.
Several limitations should be mentioned for the present study. First, the risk score we calculated used the data obtained on the initial admission to our hospital. During a long follow-up period, the risk score should be recalculated after changes in clinical status or medications and devices. Nonetheless, our results indicate that a SHFM score calculated at the initial hospitalization was accurate in predicting the mortality in patients with cardiovascular disease. Second, the present study shares the limitations of all observational nonrandomized studies; however, it was a wide-ranging study and diligent in patient ascertainment. Third, it is possible that our findings might not be applicable to other settings, because the SHFM risk score was created using a United States population. Even in the original study of the SHFM, the question was raised about the need to recalibrate for different ethnic populations. However, the present study has demonstrated the SHFM is an excellent predictive model in the Japanese population, as well as in the United States. Fourth, we did not study all risk score models. For example, the Heart Failure Survival Score is a clinical prognostic model derived and validated in 2 cohorts of patients with a mean age of >75 years.19 However, the Heart Failure Survival Score requires a peak oxygen consumption value, which, although a good index for predicting the prognosis, is not applicable to all patients with cardiovascular disease, particularly for patients for whom heart failure is not a factor. In fact, oxygen consumption data were available for <5% of the patients in our study. Therefore, because of the lack of easily obtainable oxygen consumption information, we did not evaluate the Heart Failure Survival Score risk model. Likewise, in the present study, we did not include the Enhanced Feedback for Effective Cardiac Treatment model,20 because of the large amount of missing data for the respiratory rate. 
This was the case, not only for low-risk patients, but also for high-risk patients, at our institution. Although we could impute a respiratory rate of <20 for almost all low-risk patients, the calculated risk score might not be accurate if we had imputed a speculative respiratory rate for the high-risk patients. This point needs to be examined in a future study.
References
1. The Seattle Heart Failure Model: prediction of survival in heart failure. Circulation. 2006;113:1424–1433
2. Risk stratification for in-hospital mortality in acutely decompensated heart failure: classification and regression tree analysis. JAMA. 2005;293:572–580
3. A validated risk score for in-hospital mortality in patients with heart failure from the American Heart Association Get With The Guidelines program. Circ Cardiovasc Qual Outcomes. 2010;3:25–32
4. Incident heart failure prediction in the elderly: the health ABC heart failure score. Circ Heart Fail. 2008;1:125–133
5. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology. 1983;148:839–843
6. Estimation of cardiovascular risk: a comparison between the Framingham and the SCORE model in people under 60 years of age. Eur J Cardiovasc Prev Rehabil. 2008;15:562–566
7. Evaluation of 6 prognostic models used to calculate mortality rates in elderly heart failure patients with a fatal heart failure admission. Congest Heart Fail. 2010;16:196–201
8. Multidimensional Prognostic Index based on a comprehensive geriatric assessment predicts short-term mortality in older patients with heart failure. Circ Heart Fail. 2010;3:14–20
9. Validation of the Seattle Heart Failure Model in a community-based heart failure population and enhancement by adding B-type natriuretic peptide. Am J Cardiol. 2007;100:697–700
10. Effect of amlodipine on morbidity and mortality in severe chronic heart failure (Prospective Randomized Amlodipine Survival Evaluation Study Group). N Engl J Med. 1996;335:1107–1114
11. Utility of the Seattle Heart Failure Model in patients with advanced heart failure. J Am Coll Cardiol. 2009;53:334–342
12. Can the Seattle Heart Failure Model be used to risk-stratify heart failure patients for potential left ventricular assist device therapy? J Heart Lung Transplant. 2009;28:231–236
13. Evaluation of risk indices in continuous-flow left ventricular assist device patients. Ann Thorac Surg. 2009;88:1889–1896
14. Predictive value of the Seattle Heart Failure Model in patients undergoing left ventricular assist device placement. J Heart Lung Transplant. 2010;29:1021–1025
15. Impact of high-normal blood pressure on the risk of cardiovascular disease. N Engl J Med. 2001;345:1291–1297
16. Framingham Study insights on the hazards of elevated blood pressure. JAMA. 2008;300:2545–2547
17. Survival of patients with a new diagnosis of heart failure: a population based study. Heart. 2000;83:505–510
18. Systolic blood pressure at admission, clinical characteristics, and outcomes in patients hospitalized with acute heart failure. JAMA. 2006;296:2217–2226
19. Risk stratification in middle-aged patients with congestive heart failure: prospective comparison of the Heart Failure Survival Score (HFSS) and a simplified two-variable model. Eur J Heart Fail. 2001;3:577–585
20. Predicting mortality among patients hospitalized for heart failure: derivation and validation of a clinical model. JAMA. 2003;290:2581–2587
This work was supported in part by Grants-in-Aid 17790480 and 19590802 from the Japanese Ministry of Education, Culture, Sports, Science and Technology, Tokyo, Japan.
PII: S0002-9149(11)01357-9
doi:10.1016/j.amjcard.2011.03.062
© 2011 Elsevier Inc. All rights reserved.
