The notes for the topics on this page can be found in the lectures 16 and 17 folder on Canvas.
To examine goodness of fit using deviance we will use gof_deviance()
from catfun, to conduct a Hosmer-Lemeshow test we will use hoslem.test()
from ResourceSelection.
gof_deviance()
takes two arguments: (1) a working logistic regression model and (2) a saturated working logistic regression model.
library(catfun)
cad <- tibble(
gender = c(rep("male", 6), rep("female", 6)),
agecat = c(rep(c("<60", "60-69", "70+"), 4)),
event = c(rep(c(0, 0, 0, 1, 1, 1), 2)),
count = c(110, 131, 126, 13, 37, 45, 411, 110, 189, 41, 24, 12)
)
saturated <- glm(event ~ gender*agecat, weight = count, data = cad, family = "binomial")
working <- glm(event ~ gender + agecat, weight = count, data = cad, family = "binomial")
gof_deviance(working, saturated)
## # A tibble: 2 x 6
## model deviance df.residual stat df p.value
## <chr> <dbl> <int> <dbl> <int> <dbl>
## 1 working 964. 8 15.1 2 0.000535
## 2 saturated 949. 6 NA NA NA
hoslem.test()
requires three arguments: The first argument is a vector of the observed outcomes, the second argument is a vector of the predicted outcomes, and the third is the number of quintiles for conducting the test.
library(ResourceSelection)
library(readxl)
evans_county <- read_excel("C:/Users/niwi8/OneDrive/Documents/p8120_ta/data/evanscounty.xlsx")
evans_model <- glm(chd ~ cat + age + ecg + smk + chl + hpt, data = evans_county, family = "binomial")
hoslem.test(evans_model$y, fitted(evans_model), g = 10)
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: evans_model$y, fitted(evans_model)
## X-squared = 5.4074, df = 8, p-value = 0.7133
To obtain goodness of fit based on deviance in SAS you can specify scale = none
in the model
argument.
data cad;
input gender $ agecat $ events trials;
cards;
male <60 13 123
male 60-69 37 168
male 70+ 45 171
female <60 41 452
female 60-69 24 134
female 70+ 12 201
;
run;
proc logistic data = cad;
class gender (ref = 'female')
agecat (ref = '<60') / param = ref;
model events / trials = gender agecat / cl scale = none;
run;
## The LOGISTIC Procedure
##
## Model Information
##
## Data Set WORK.CAD
## Response Variable (Events) events
## Response Variable (Trials) trials
## Model binary logit
## Optimization Technique Fisher's scoring
##
##
## Number of Observations Read 6
## Number of Observations Used 6
## Sum of Frequencies Read 1249
## Sum of Frequencies Used 1249
##
##
## Response Profile
##
## Ordered Binary Total
## Value Outcome Frequency
##
## 1 Event 172
## 2 Nonevent 1077
##
##
## Class Level Information
##
## Design
## Class Value Variables
##
## gender female 0
## male 1
##
## agecat 60-69 1 0
## 70+ 0 1
## <60 0 0
##
##
## Model Convergence Status
##
## Convergence criterion (GCONV=1E-8) satisfied.
##
##
## Deviance and Pearson Goodness-of-Fit Statistics
##
## Criterion Value DF Value/DF Pr > ChiSq
##
## Deviance 15.0659 2 7.5330 0.0005
## Pearson 14.2936 2 7.1468 0.0008
##
## Number of events/trials observations: 6
##
##
## Model Fit Statistics
##
## Intercept and Covariates
## Intercept Log Full Log
## Criterion Only Likelihood Likelihood
##
## AIC 1003.161 972.169 52.479
## SC 1008.291 992.689 73.000
## -2 Log L 1001.161 964.169 44.479
##
##
## Testing Global Null Hypothesis: BETA=0
##
## Test Chi-Square DF Pr > ChiSq
##
## Likelihood Ratio 36.9918 3 <.0001
## Score 37.9118 3 <.0001
## Wald 36.0422 3 <.0001
##
##
## Type 3 Analysis of Effects
##
## Wald
## Effect DF Chi-Square Pr > ChiSq
##
## gender 1 16.5123 <.0001
## agecat 2 9.4288 0.0090
##
##
## Analysis of Maximum Likelihood Estimates
##
## Standard Wald
## Parameter DF Estimate Error Chi-Square Pr > ChiSq
##
## Intercept 1 -2.4553 0.1542 253.5990 <.0001
## gender male 1 0.7091 0.1745 16.5123 <.0001
## agecat 60-69 1 0.6501 0.2123 9.3785 0.0022
## agecat 70+ 1 0.3765 0.2092 3.2385 0.0719
##
##
## Odds Ratio Estimates
##
## Point 95% Wald
## Effect Estimate Confidence Limits
##
## gender male vs female 2.032 1.443 2.861
## agecat 60-69 vs <60 1.916 1.264 2.904
## agecat 70+ vs <60 1.457 0.967 2.196
##
##
## Association of Predicted Probabilities and Observed Responses
##
## Percent Concordant 54.2 Somers' D 0.267
## Percent Discordant 27.6 Gamma 0.326
## Percent Tied 18.2 Tau-a 0.063
## Pairs 185244 c 0.633
##
##
## Parameter Estimates and Wald Confidence Intervals
##
## Parameter Estimate 95% Confidence Limits
##
## Intercept -2.4553 -2.7575 -2.1531
## gender male 0.7091 0.3671 1.0511
## agecat 60-69 0.6501 0.2340 1.0661
## agecat 70+ 0.3765 -0.0336 0.7865
To conduct a Hosmer-Lemeshow test in SAS you specify lackfit
in the model
argument.
proc import datafile = 'C:/Users/niwi8/OneDrive/Documents/p8120_ta/data/evanscounty.xlsx'
out = evanscounty
dbms = xlsx;
run;
proc logistic data = evanscounty descending;
model chd = cat age ecg smk chl hpt / lackfit; run;
## The LOGISTIC Procedure
##
## Model Information
##
## Data Set WORK.EVANSCOUNTY
## Response Variable chd chd
## Number of Response Levels 2
## Model binary logit
## Optimization Technique Fisher's scoring
##
##
## Number of Observations Read 609
## Number of Observations Used 609
##
##
## Response Profile
##
## Ordered Total
## Value chd Frequency
##
## 1 1 71
## 2 0 538
##
## Probability modeled is chd='1'.
##
##
## Model Convergence Status
##
## Convergence criterion (GCONV=1E-8) satisfied.
##
##
## Model Fit Statistics
##
## Intercept
## Intercept and
## Criterion Only Covariates
##
## AIC 440.558 414.394
## SC 444.970 445.277
## -2 Log L 438.558 400.394
##
##
## Testing Global Null Hypothesis: BETA=0
##
## Test Chi-Square DF Pr > ChiSq
##
## Likelihood Ratio 38.1645 6 <.0001
## Score 38.9267 6 <.0001
## Wald 34.9941 6 <.0001
##
##
## Analysis of Maximum Likelihood Estimates
##
## Standard Wald
## Parameter DF Estimate Error Chi-Square Pr > ChiSq
##
## Intercept 1 -6.7747 1.1402 35.3015 <.0001
## cat 1 0.5978 0.3520 2.8843 0.0894
## age 1 0.0322 0.0152 4.5097 0.0337
## ecg 1 0.3695 0.2936 1.5839 0.2082
## smk 1 0.8348 0.3052 7.4802 0.0062
## chl 1 0.00875 0.00326 7.1864 0.0073
## hpt 1 0.4392 0.2908 2.2806 0.1310
##
##
## Odds Ratio Estimates
##
## Point 95% Wald
## Effect Estimate Confidence Limits
##
## cat 1.818 0.912 3.624
## age 1.033 1.002 1.064
## ecg 1.447 0.814 2.573
## smk 2.304 1.267 4.191
## chl 1.009 1.002 1.015
## hpt 1.551 0.877 2.743
##
##
## Association of Predicted Probabilities and Observed Responses
##
## Percent Concordant 70.5 Somers' D 0.411
## Percent Discordant 29.5 Gamma 0.411
## Percent Tied 0.0 Tau-a 0.085
## Pairs 38198 c 0.705
##
##
## Partition for the Hosmer and Lemeshow Test
##
## chd = 1 chd = 0
## Group Total Observed Expected Observed Expected
##
## 1 61 0 1.72 61 59.28
## 2 61 2 2.65 59 58.35
## 3 61 5 3.46 56 57.54
## 4 61 6 4.22 55 56.78
## 5 61 7 5.21 54 55.79
## 6 61 6 6.10 55 54.90
## 7 61 5 7.50 56 53.50
## 8 61 9 9.19 52 51.81
## 9 61 12 11.86 49 49.14
## 10 60 19 19.08 41 40.92
##
##
## Hosmer and Lemeshow Goodness-of-Fit Test
##
## Chi-Square DF Pr > ChiSq
##
## 5.1028 8 0.7465