Analysis of Categorical Data

This class provides a comprehensive overview of methods for analyzing binary and other discrete response data, with applications to epidemiological and clinical studies. It is a second-level course that presumes some knowledge of applied statistics and epidemiology. Topics include 2 × 2 tables, m × 2 tables, tests of independence, measures of association, power and sample size determination, stratification and matching in design and analysis, and logistic regression analysis.

Statistical computing is essential to the successful implementation of these methods. This course has traditionally used SAS, but starting this semester (Spring 2019), code to carry out the same analyses in R will also be provided as supplemental material. As this is a pilot phase, we welcome any feedback you have as you work through the materials on this website.

Both SAS and R have advantages and disadvantages. Being able to draw on the strengths of each is a valuable skill, which is why we provide instruction in both.

Use caution when comparing output between SAS and R. Functions in R and procedures in SAS don’t always use the same default settings, so the same analysis can produce different answers in SAS and in R. We have tried to note throughout where this might occur and, where possible, how to change the default settings to get matching answers.
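One common example of such a default is the Yates continuity correction for the chi-square test on a 2 × 2 table. R's chisq.test() applies it by default (correct = TRUE), while the SAS PROC FREQ CHISQ output lists the uncorrected and continuity-adjusted statistics on separate rows, so it is easy to end up comparing two different numbers. The sketch below is only an illustration: it is written in plain Python (not one of the course languages) so the arithmetic is visible, the counts are hypothetical, and the function name chi_square_2x2 is made up for this example.

```python
import math


def chi_square_2x2(table, yates=False):
    """Pearson chi-square statistic and p-value (1 df) for a 2 x 2 table.

    table is [[a, b], [c, d]]; yates=True applies the continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Shrink each |observed - expected| by 0.5, never below zero.
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows are exposed / unexposed, columns are cases / non-cases.
table = [[12, 8], [5, 15]]

uncorrected_stat, uncorrected_p = chi_square_2x2(table, yates=False)
corrected_stat, corrected_p = chi_square_2x2(table, yates=True)

print(f"Uncorrected: chi-square = {uncorrected_stat:.4f}, p = {uncorrected_p:.4f}")
print(f"Yates:       chi-square = {corrected_stat:.4f}, p = {corrected_p:.4f}")
```

For these made-up counts the two versions fall on opposite sides of the conventional 0.05 threshold, which is why it matters to confirm that both programs are reporting the same statistic before comparing results.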

Note that this is not a course on statistical programming. If you would like to learn more about programming, a variety of courses at CUMC teach R or SAS. Please ask Dr. Mauro or your TAs if you would like recommendations.

Policy on help

If you need help with R or SAS, please email Nick (ntw2117@cumc.columbia.edu) directly and copy both Dr. Mauro (cmm2212@cumc.columbia.edu) and Anjile (ja3237@cumc.columbia.edu). Please attach screenshots of both your code and the accompanying error message; we need both to troubleshoot the problem.