Goodness of Fit

The notes for the topics on this page can be found in the lectures 16 and 17 folder on Canvas.

R

To examine goodness of fit using deviance we will use gof_deviance() from catfun, to conduct a Hosmer-Lemeshow test we will use hoslem.test() from ResourceSelection.

Deviance

gof_deviance() takes two arguments: (1) a working logistic regression model and (2) a saturated working logistic regression model.

library(catfun)

cad <- tibble(
  gender = c(rep("male", 6), rep("female", 6)), 
  agecat = c(rep(c("<60", "60-69", "70+"), 4)), 
  event = c(rep(c(0, 0, 0, 1, 1, 1), 2)), 
  count = c(110, 131, 126, 13, 37, 45, 411, 110, 189, 41, 24, 12)
)

saturated <- glm(event ~ gender*agecat, weight = count, data = cad, family = "binomial")
working <- glm(event ~ gender + agecat, weight = count, data = cad, family = "binomial")

gof_deviance(working, saturated)

## # A tibble: 2 x 6
##   model     deviance df.residual  stat    df   p.value
##   <chr>        <dbl>       <int> <dbl> <int>     <dbl>
## 1 working       964.           8  15.1     2  0.000535
## 2 saturated     949.           6  NA      NA NA

Hosmer-Lemeshow

hoslem.test() requires three arguments: The first argument is a vector of the observed outcomes, the second argument is a vector of the predicted outcomes, and the third is the number of quintiles for conducting the test.

library(ResourceSelection)
library(readxl)

evans_county <- read_excel("C:/Users/niwi8/OneDrive/Documents/p8120_ta/data/evanscounty.xlsx")

evans_model <- glm(chd ~ cat + age + ecg + smk + chl + hpt, data = evans_county, family = "binomial")

hoslem.test(evans_model$y, fitted(evans_model), g = 10)

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  evans_model$y, fitted(evans_model)
## X-squared = 5.4074, df = 8, p-value = 0.7133

SAS

Deviance

To obtain goodness of fit based on deviance in SAS you can specify scale = none in the model argument.

data cad; 
  input gender $ agecat $ events trials; 
  cards; 
  male <60 13 123
  male 60-69 37 168
  male 70+ 45 171
  female <60 41 452
  female 60-69 24 134
  female 70+ 12 201
  ; 
run; 
            
proc logistic data = cad; 
  class gender (ref = 'female')
        agecat (ref = '<60') / param = ref; 
  model events / trials = gender agecat / cl scale = none; 
run;

##                                          The LOGISTIC Procedure
## 
##                                            Model Information
## 
##                             Data Set                       WORK.CAD        
##                             Response Variable (Events)     events          
##                             Response Variable (Trials)     trials          
##                             Model                          binary logit    
##                             Optimization Technique         Fisher's scoring
## 
## 
##                                 Number of Observations Read           6
##                                 Number of Observations Used           6
##                                 Sum of Frequencies Read            1249
##                                 Sum of Frequencies Used            1249
## 
## 
##                                             Response Profile
##  
##                                    Ordered     Binary           Total
##                                      Value     Outcome      Frequency
## 
##                                          1     Event              172
##                                          2     Nonevent          1077
## 
## 
##                                         Class Level Information
##  
##                                                             Design
##                                     Class      Value      Variables
## 
##                                     gender     female      0       
##                                                male        1       
## 
##                                     agecat     60-69       1      0
##                                                70+         0      1
##                                                <60         0      0
## 
## 
##                                         Model Convergence Status
## 
##                              Convergence criterion (GCONV=1E-8) satisfied.          
## 
## 
##                             Deviance and Pearson Goodness-of-Fit Statistics
##  
##                      Criterion          Value       DF     Value/DF     Pr > ChiSq
## 
##                      Deviance         15.0659        2       7.5330         0.0005
##                      Pearson          14.2936        2       7.1468         0.0008
## 
##                                 Number of events/trials observations: 6
## 
## 
##                                          Model Fit Statistics
##  
##                                                       Intercept and Covariates
##                                        Intercept            Log       Full Log
##                          Criterion          Only     Likelihood     Likelihood
## 
##                          AIC            1003.161        972.169         52.479
##                          SC             1008.291        992.689         73.000
##                          -2 Log L       1001.161        964.169         44.479
## 
## 
##                                 Testing Global Null Hypothesis: BETA=0
##  
##                         Test                 Chi-Square       DF     Pr > ChiSq
## 
##                         Likelihood Ratio        36.9918        3         <.0001
##                         Score                   37.9118        3         <.0001
##                         Wald                    36.0422        3         <.0001
## 
## 
##                                        Type 3 Analysis of Effects
##  
##                                                        Wald
##                                Effect      DF    Chi-Square    Pr > ChiSq
## 
##                                gender       1       16.5123        <.0001
##                                agecat       2        9.4288        0.0090
## 
## 
##                                 Analysis of Maximum Likelihood Estimates
##  
##                                                      Standard          Wald
##                Parameter           DF    Estimate       Error    Chi-Square    Pr > ChiSq
## 
##                Intercept            1     -2.4553      0.1542      253.5990        <.0001
##                gender    male       1      0.7091      0.1745       16.5123        <.0001
##                agecat    60-69      1      0.6501      0.2123        9.3785        0.0022
##                agecat    70+        1      0.3765      0.2092        3.2385        0.0719
## 
## 
##                                          Odds Ratio Estimates
##                                                    
##                                                    Point          95% Wald
##                        Effect                   Estimate      Confidence Limits
## 
##                        gender male vs female       2.032       1.443       2.861
##                        agecat 60-69 vs <60         1.916       1.264       2.904
##                        agecat 70+   vs <60         1.457       0.967       2.196
## 
## 
##                       Association of Predicted Probabilities and Observed Responses
## 
##                            Percent Concordant      54.2    Somers' D    0.267
##                            Percent Discordant      27.6    Gamma        0.326
##                            Percent Tied            18.2    Tau-a        0.063
##                            Pairs                 185244    c            0.633
## 
## 
##                            Parameter Estimates and Wald Confidence Intervals
##  
##                         Parameter            Estimate     95% Confidence Limits
## 
##                         Intercept             -2.4553      -2.7575      -2.1531
##                         gender    male         0.7091       0.3671       1.0511
##                         agecat    60-69        0.6501       0.2340       1.0661
##                         agecat    70+          0.3765      -0.0336       0.7865

Hosmer-Lemeshow

To conduct a Hosmer-Lemeshow test in SAS you specify lackfit in the model argument.

proc import datafile = 'C:/Users/niwi8/OneDrive/Documents/p8120_ta/data/evanscounty.xlsx'
            out = evanscounty
            dbms = xlsx; 
run; 
            
proc logistic data = evanscounty descending; 
  model chd = cat age ecg smk chl hpt / lackfit; run;

##                                          The LOGISTIC Procedure
## 
##                                            Model Information
## 
##                          Data Set                      WORK.EVANSCOUNTY        
##                          Response Variable             chd                  chd
##                          Number of Response Levels     2                       
##                          Model                         binary logit            
##                          Optimization Technique        Fisher's scoring        
## 
## 
##                                 Number of Observations Read         609
##                                 Number of Observations Used         609
## 
## 
##                                             Response Profile
##  
##                                  Ordered                          Total
##                                    Value     chd              Frequency
## 
##                                        1     1                       71
##                                        2     0                      538
## 
##                                     Probability modeled is chd='1'.
## 
## 
##                                         Model Convergence Status
## 
##                              Convergence criterion (GCONV=1E-8) satisfied.          
## 
## 
##                                           Model Fit Statistics
##  
##                                                               Intercept
##                                                Intercept            and
##                                  Criterion          Only     Covariates
## 
##                                  AIC             440.558        414.394
##                                  SC              444.970        445.277
##                                  -2 Log L        438.558        400.394
## 
## 
##                                 Testing Global Null Hypothesis: BETA=0
##  
##                         Test                 Chi-Square       DF     Pr > ChiSq
## 
##                         Likelihood Ratio        38.1645        6         <.0001
##                         Score                   38.9267        6         <.0001
##                         Wald                    34.9941        6         <.0001
## 
## 
##                                Analysis of Maximum Likelihood Estimates
##  
##                                                  Standard          Wald
##                   Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
## 
##                   Intercept     1     -6.7747      1.1402       35.3015        <.0001
##                   cat           1      0.5978      0.3520        2.8843        0.0894
##                   age           1      0.0322      0.0152        4.5097        0.0337
##                   ecg           1      0.3695      0.2936        1.5839        0.2082
##                   smk           1      0.8348      0.3052        7.4802        0.0062
##                   chl           1     0.00875     0.00326        7.1864        0.0073
##                   hpt           1      0.4392      0.2908        2.2806        0.1310
## 
## 
##                                           Odds Ratio Estimates
##                                                     
##                                             Point          95% Wald
##                                Effect    Estimate      Confidence Limits
## 
##                                cat          1.818       0.912       3.624
##                                age          1.033       1.002       1.064
##                                ecg          1.447       0.814       2.573
##                                smk          2.304       1.267       4.191
##                                chl          1.009       1.002       1.015
##                                hpt          1.551       0.877       2.743
## 
## 
##                      Association of Predicted Probabilities and Observed Responses
## 
##                            Percent Concordant     70.5    Somers' D    0.411
##                            Percent Discordant     29.5    Gamma        0.411
##                            Percent Tied            0.0    Tau-a        0.085
##                            Pairs                 38198    c            0.705
## 
## 
##                                Partition for the Hosmer and Lemeshow Test
##  
##                                                  chd = 1                 chd = 0
##                      Group       Total    Observed    Expected    Observed    Expected
## 
##                          1          61           0        1.72          61       59.28
##                          2          61           2        2.65          59       58.35
##                          3          61           5        3.46          56       57.54
##                          4          61           6        4.22          55       56.78
##                          5          61           7        5.21          54       55.79
##                          6          61           6        6.10          55       54.90
##                          7          61           5        7.50          56       53.50
##                          8          61           9        9.19          52       51.81
##                          9          61          12       11.86          49       49.14
##                         10          60          19       19.08          41       40.92
## 
## 
##                                 Hosmer and Lemeshow Goodness-of-Fit Test
##  
##                                    Chi-Square       DF     Pr > ChiSq
## 
##                                        5.1028        8         0.7465