This document is a guide to SAS for P8120: Analysis of Categorical Data. It is not exhaustive; please consult one of the TAs, Dr. Mauro, or the online SAS user guide if you need more help.
SAS is available on all computers at the CUMC library or can be purchased through the IT department in the Hammer Library for $85 (Windows computers only). In addition, there is now a free version of SAS that works on multiple operating systems called SAS University Edition. Instructions for download can be found here. Another free option is SAS OnDemand, which some students may already be using in P8483. Instructions to download and use the software can be found here.
SAS is a powerful statistical software package and programming language commonly used in universities, government, and the private sector. Programming in SAS can roughly be broken down into two parts: data steps (begun by a `data` statement) and statistical procedures (begun by a `proc` statement). Almost all code blocks in SAS begin with one of these two statements, and almost all end with `run;`. Within a code block, every statement ends with a semicolon.
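As a rough illustration of these two kinds of blocks, the sketch below pairs a data step with a procedure. The dataset name `demo`, its variables, and its values are made up for illustration.

```sas
/* Data step: creates a dataset called demo (a hypothetical name) */
data demo;
    input group $ count;   /* group is character ($), count is numeric */
    cards;
a 10
b 20
;
run;

/* Procedure step: prints the dataset to the output window */
proc print data = demo;
run;
```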
When starting a new session the basic workflow is:
Always remember to save your work often.
Importing data in SAS can be done a couple of different ways. We recommend using the import procedure.
The import procedure is the most straightforward way of importing data into SAS. It takes two required arguments, `out` and `datafile`, and a third, `dbms`, that is optional but recommended. The `out` argument specifies the name of the newly created SAS dataset; the `datafile` argument specifies, in quotation marks, the complete path of the file containing the data to be imported; the `dbms` argument specifies the file type to be parsed. The most commonly used identifiers are `csv`, `tab`, `xlsx`, and `dlm`. If importing a csv file, this argument is not necessary. Further help can be found here.
If using SAS Studio, you must first upload the data into SAS Studio and then reference that file path.
Syntax
proc import out = <SAS dataset name>
datafile = "filepath"
dbms = <identifier> replace;
run;
Example
proc import out = azt
datafile = "C:\Users\niwi8\OneDrive\Documents\p8120_ta\p8120\data\AZT.csv"
dbms = csv replace;
run;
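After an import, it can help to confirm that the data were read as expected. A minimal sketch, assuming the `azt` dataset from the example above was created successfully:

```sas
/* List variable names, types, and the number of observations */
proc contents data = azt;
run;

/* Print the first five rows only */
proc print data = azt (obs = 5);
run;
```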
The data step is where all data manipulation in SAS occurs; this includes importing data and creating datasets by “hand”. Sometimes you will be asked to conduct analyses on group-level instead of individual-level data. In these instances, you will need to create a dataset using a data step. The first statement in any data step is `data`, followed by the name of the dataset you are creating. The next statement should be `input`, where you specify variable names and types (for this course you only need to worry about numeric and character variables; the name of a character variable should be followed by a `$`). After `input`, the next statement is `cards`, followed by a semicolon. After `cards`, you begin writing your data; there should be exactly as many columns as there are variables you created, and after the last line of data you must go to the next line and include a semicolon.
Syntax
data <dataset name>;
input <a character variable> $ <a numeric variable>;
cards;
some_data 123
;
run;
Example
data vietnam;
input service $ sleep $ count;
cards;
yes yes 173
yes no 160
no yes 599
no no 851
;
run;
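Because each row of a hand-entered dataset like `vietnam` represents a group rather than one person, procedures need to be told which variable holds the counts. A minimal sketch using the `weight` statement in `proc freq`, assuming the `vietnam` data step above has been run. The `order = data` option is an optional addition here; it keeps the levels in the order they were typed instead of alphabetical order.

```sas
proc freq data = vietnam order = data;
    tables service*sleep;   /* 2 x 2 table: service in rows, sleep in columns */
    weight count;           /* each row stands for count individuals */
run;
```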
You will be asked to include SAS output in homework assignments. SAS has a built-in output delivery system (ODS) that can save output as an rtf file to be used in a Word document. The basic usage is:
ods rtf file = "C://my_computer/contains_some_file/some_file.rtf";
<insert SAS procedure>
ods rtf close;
You are not required to use this method when including SAS output in assignments.
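For a concrete illustration of the ODS pattern, the sketch below wraps a single procedure in the two ODS statements. The file path is a hypothetical placeholder, and the procedure assumes the `vietnam` dataset from the data step example; substitute your own path and procedure.

```sas
/* Open an rtf destination; the path is a placeholder */
ods rtf file = "C:\Users\your_name\Documents\hw_output.rtf";

proc freq data = vietnam;
    tables service*sleep;
    weight count;
run;

/* Close the destination so the file is written and can be opened */
ods rtf close;
```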
Use the following table to locate procedures for hypothesis tests. If you are unsure of the usage of a procedure, refer to the online SAS guide. Lecture numbers are linked to examples.
Hypothesis test | SAS procedure | Arguments needed | Lecture number |
---|---|---|---|
Confidence intervals, proportions | proc freq | Same as normal and exact | 3 |
One sample test proportions, normal approximation | proc freq | binomial(p = <insert null>) | 4 |
One sample test proportions, exact | proc freq | exact binomial | 4 |
Chi-squared | proc freq | chisq expected | 8 |
Fisher’s exact | proc freq | Same as chi-squared | 8 |
Likelihood ratio test | proc freq | Same as chi-squared | 9 |
Two sample test of proportions | proc freq | expected riskdiff(equal var = null) | 9 |
Logistic regression | proc logistic | | 13, 14, & 15 |
GOF & Hosmer Lemeshow | proc logistic | scale = none / lackfit | 16 & 17 |
Breslow-Day test | proc freq | cmh | 19 |
Cochran-Mantel-Haenszel test | proc freq | cmh | 20 |
McNemar’s test | proc freq | agree | 21 |
Conditional logistic regression | proc logistic | strata | 22 |
Cochran-Armitage Trend test | proc freq | trend | 23 |
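To show where the arguments in the table go, here is a minimal sketch of a one-sample test of a proportion. The dataset `responses` and its counts are hypothetical. In `proc freq`, most of the listed arguments are options placed after the slash on the `tables` statement, while `exact binomial` is a statement of its own.

```sas
/* Hypothetical one-sample data: 30 "yes" and 20 "no" responses */
data responses;
    input outcome $ count;
    cards;
yes 30
no 20
;
run;

proc freq data = responses order = data;
    /* Normal-approximation test of H0: P(outcome = yes) = 0.5; */
    /* the binomial option tests the first level of the variable */
    tables outcome / binomial(p = 0.5);
    exact binomial;   /* additionally request the exact test */
    weight count;
run;
```

With `order = data`, the first level is the first one typed in the data step ("yes" here), so the null proportion refers to that level.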
Use the following table to locate procedures for measures of association. If you are unsure of the usage of a procedure, refer to the online SAS guide. Lecture numbers are linked to examples.
Measure | SAS procedure | Arguments needed | Lecture number |
---|---|---|---|
Odds ratio (2 x k table) | proc freq | relrisk | 5 & 7 |
Risk ratio (2 x k table) | proc freq | relrisk | 5 & 7 |
Risk difference | proc freq | riskdiff | 5 & 7 |
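A minimal sketch of requesting these measures for a 2 x 2 table, assuming the `vietnam` dataset from the data step example has been created. The `order = data` option is an assumption added here so that the first row and column are the "yes" levels.

```sas
proc freq data = vietnam order = data;
    /* relrisk: odds ratio and relative risks; riskdiff: risk differences */
    tables service*sleep / relrisk riskdiff;
    weight count;
run;
```

SAS reports the relative risk and risk difference separately for each column of the table, so check which column corresponds to the outcome of interest before reading off an estimate.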
Use the following table to locate procedures for power. If you are unsure of the usage of a procedure, refer to the online SAS guide. Lecture numbers are linked to examples.
Power analysis | SAS procedure | Notes | Lecture |
---|---|---|---|
Power from sample size | proc power | power = . | 10 |
Sample size from power (balanced) | proc power | npergroup = . | 10 |
Sample size from power (unbalanced) | proc power | must specify groupweights | 10 |
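The sketch below illustrates the first and last rows of the table for a two-sample comparison of proportions. The group proportions, sample size, target power, and allocation ratio are all hypothetical values chosen for illustration, and the Pearson chi-square test (`test = pchi`) is an assumption about which test is wanted.

```sas
/* Power for a two-sample test of proportions with 100 subjects per group */
proc power;
    twosamplefreq test = pchi
        groupproportions = (0.30 0.45)   /* hypothetical proportions */
        npergroup = 100
        power = .;                       /* the period marks the quantity to solve for */
run;

/* Total sample size for 80% power with 1:2 allocation between groups */
proc power;
    twosamplefreq test = pchi
        groupproportions = (0.30 0.45)
        groupweights = (1 2)
        ntotal = .
        power = 0.80;
run;
```

For the balanced sample-size case, the same pattern applies with `npergroup = .` and a fixed `power` value.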