Getting started

This document is a guide to SAS for P8120: Analysis of Categorical Data. It is not exhaustive; if you need more help, please consult one of the TAs, Dr. Mauro, or the online SAS user guide.

SAS is available on all computers at the CUMC library or can be purchased through the IT department in the Hammer Library for $85 (Windows computers only). In addition, there is now a free version of SAS that works on multiple operating systems called SAS University Edition. Instructions for download can be found here. Another free option is SAS OnDemand, which some students may already be using in P8483. Instructions to download and use the software can be found here.

SAS is a powerful statistical software package and programming language commonly used in universities, government, and the private sector. Programming in SAS can roughly be broken down into two kinds of steps: data steps (begun by a data statement) and statistical procedures (begun by a proc statement). Almost all code blocks in SAS begin with one of these two statements, and almost all code blocks end with run;. Within a code block, every statement ends with a semicolon.
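As a minimal sketch of this structure, the program below pairs one data step with one proc step. The dataset name scores and its values are made up for illustration only.

```sas
/* Data step: create a small dataset by hand (hypothetical values). */
data scores;
  input id group $ score;
  cards;
1 a 12
2 b 15
3 a 9
;
run;

/* Proc step: print the dataset created above. */
proc print data = scores;
run;
```

Each block starts with data or proc, ends with run;, and every statement in between ends with a semicolon.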

Workflow

When starting a new session the basic workflow is:

  • open a new program
  • type some code
  • run code
  • export necessary output

Always remember to save your work often.

  • To open a new program: File > New Program
  • To run code: highlight the code of interest and click the Run button (the running-person icon) on the toolbar
  • To save a program: File > Save As…

Importing data and manipulation

Data can be imported into SAS in a couple of different ways. We recommend using the import procedure.

The import procedure is the most straightforward way of importing data into SAS. It takes two required arguments, out and datafile, and a third, dbms, that is optional but recommended. The out argument specifies the name of the newly created SAS dataset. The datafile argument specifies, in quotation marks, the complete path of the file to be imported. The dbms argument specifies the file type to be parsed; the most commonly used identifiers are csv, tab, xlsx, and dlm. If importing a csv file, this argument is not necessary. Adding replace after the dbms identifier overwrites any existing SAS dataset with the same name. Further help can be found here.

If using SAS Studio, you must first upload the data into SAS Studio and then reference that file path.

Syntax

proc import out = <SAS dataset name>
            datafile = "filepath"
            dbms = <identifier> replace;
run; 

Example

proc import out = azt
            datafile = "C:\Users\niwi8\OneDrive\Documents\p8120_ta\p8120\data\AZT.csv"
            dbms = csv replace; 
run; 
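For SAS Studio, the same import might look like the sketch below. The path /home/u12345678/p8120/AZT.csv is a hypothetical upload location, not a real account; replace it with the path of your own uploaded file. The proc contents and proc print steps are one way to confirm that the import worked.

```sas
/* Assumption: AZT.csv was uploaded to a folder named p8120 under a
   hypothetical SAS Studio home directory; substitute your own path. */
proc import out = azt
            datafile = "/home/u12345678/p8120/AZT.csv"
            dbms = csv replace;
  getnames = yes;   /* read variable names from the first row */
run;

/* List variable names and types of the imported dataset. */
proc contents data = azt;
run;

/* Print the first five rows as a quick check. */
proc print data = azt (obs = 5);
run;
```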

The data step in SAS is where all data manipulation occurs; this includes importing data and creating datasets by “hand”. Sometimes you will be asked to conduct analyses on group-level instead of individual-level data. In these instances, you will need to create a dataset using a data step. The first statement in any data step is data followed by the name of the dataset you are creating. The next statement should be input, where you specify variable names and types (for this course you only need to worry about numeric and character variables; the name of a character variable should be followed by a $). After input comes cards followed by a semicolon. After cards, you begin writing your data: each line should have exactly as many columns as there are variables in the input statement, and after the last line of data you must go to the next line and include a semicolon.

Syntax

data <dataset name>;
  input <a character variable> $ <a numeric variable>;
  cards;
  some_data 123
  ;
run;

Example

data vietnam; 
  input service $ sleep $ count; 
  cards; 
  yes yes 173
  yes no 160
  no yes 599
  no no 851
  ; 
run; 
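Because each row of vietnam is a cell count rather than one person, procedures that use it need a weight statement. A minimal sketch, assuming the vietnam dataset from the example above already exists in the current session; the order = data option is an addition here that keeps the rows and columns in the order they were typed:

```sas
/* Assumes the vietnam dataset from the example above exists in the
   current session. Each row is one cell of the 2 x 2 table, so the
   weight statement tells proc freq how many people the row represents. */
proc freq data = vietnam order = data;
  tables service * sleep / chisq expected;
  weight count;
run;
```

Without weight count, proc freq would treat the dataset as four individuals instead of 1,783.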

Output delivery system (ODS)

You will be asked to include SAS output in homework assignments. SAS has a built-in output delivery system (ODS) that can save output as an rtf file, which can then be opened or pasted into a Word document. The basic usage is:

ods rtf file = "C://my_computer/contains_some_file/some_file.rtf";

<insert SAS procedure>

ods rtf close; 

You are not required to use this method when including SAS output in assignments.
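If you do use ODS, a filled-in version of the pattern above might look like the following sketch. The output path is hypothetical and must point to a folder that exists on your machine; the counts are the same as in the vietnam example.

```sas
/* Same counts as the vietnam example. */
data vietnam;
  input service $ sleep $ count;
  cards;
yes yes 173
yes no 160
no yes 599
no no 851
;
run;

/* Hypothetical output path: change it to a folder on your machine. */
ods rtf file = "C:\Users\your_name\Documents\hw1_output.rtf";

/* Everything printed between the two ods statements goes to the file. */
proc freq data = vietnam;
  tables service * sleep;
  weight count;
run;

ods rtf close;
```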

Hypothesis tests

Use the following table to locate procedures for hypothesis tests. If you are unsure of the usage of a procedure, refer to the online SAS guide. Lecture numbers are linked to examples.

Hypothesis test                                   | SAS procedure | Arguments needed                    | Lecture number
--------------------------------------------------+---------------+-------------------------------------+---------------
Confidence intervals, proportions                 | proc freq     | Same as normal and exact            | 3
One sample test proportions, normal approximation | proc freq     | binomial(p = <insert null>)         | 4
One sample test proportions, exact                | proc freq     | exact binomial                      | 4
Chi-squared                                       | proc freq     | chisq expected                      | 8
Fisher’s exact                                    | proc freq     | Same as chi-squared                 | 8
Likelihood ratio test                             | proc freq     | Same as chi-squared                 | 9
Two sample test of proportions                    | proc freq     | expected riskdiff(equal var = null) | 9
Logistic regression                               | proc logistic |                                     | 13, 14, & 15
GOF & Hosmer Lemeshow                             | proc logistic | scale = none / lackfit              | 16 & 17
Breslow-Day test                                  | proc freq     | cmh                                 | 19
Cochran-Mantel-Haenszel test                      | proc freq     | cmh                                 | 20
McNemar’s test                                    | proc freq     | agree                               | 21
Conditional logistic regression                   | proc logistic | strata                              | 22
Cochran-Armitage Trend test                       | proc freq     | trend                               | 23
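
As one illustration of how the arguments in this table are used, the sketch below runs a one sample test of a proportion with both the normal approximation and the exact test. The null value 0.40 is made up for illustration, and level = 'yes' is an added option that picks which category the proportion refers to.

```sas
/* Same counts as the vietnam example. */
data vietnam;
  input service $ sleep $ count;
  cards;
yes yes 173
yes no 160
no yes 599
no no 851
;
run;

/* Test H0: P(sleep = yes) = 0.40 (hypothetical null value).
   binomial(...) gives the normal-approximation test and confidence limits;
   the exact statement adds the exact binomial p-values. */
proc freq data = vietnam;
  tables sleep / binomial(p = 0.40 level = 'yes');
  exact binomial;
  weight count;
run;
```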

Measures of association

Use the following table to locate procedures for measures of association. If you are unsure of the usage of a procedure, refer to the online SAS guide. Lecture numbers are linked to examples.

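Measure                  | SAS procedure | Arguments needed | Lecture number
-------------------------+---------------+------------------+---------------
Odds ratio (2 x k table) | proc freq     | relrisk          | 5 & 7
Risk ratio (2 x k table) | proc freq     | relrisk          | 5 & 7
Risk difference          | proc freq     | riskdiff         | 5 & 7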
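
A minimal sketch of requesting these measures for a 2 x 2 table, using the counts from the vietnam example. The order = data option is an assumption about what you want: it makes "yes" the first row and first column, which determines the group and outcome that the reported ratios and differences refer to.

```sas
/* Same counts as the vietnam example. */
data vietnam;
  input service $ sleep $ count;
  cards;
yes yes 173
yes no 160
no yes 599
no no 851
;
run;

/* relrisk reports the odds ratio and the column 1 and column 2 relative
   risks; riskdiff reports the difference in row proportions. */
proc freq data = vietnam order = data;
  tables service * sleep / relrisk riskdiff;
  weight count;
run;
```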

Power

Use the following table to locate procedures for power. If you are unsure of the usage of a procedure, refer to the online SAS guide. Lecture numbers are linked to examples.

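Power analysis                      | SAS procedure | Notes                     | Lecture
------------------------------------+---------------+---------------------------+--------
Power from sample size              | proc power    | power = .                 | 10
Sample size from power (balanced)   | proc power    | npergroup = .             | 10
Sample size from power (unbalanced) | proc power    | must specify groupweights | 10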
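
The three rows above can be sketched as follows, assuming a comparison of two proportions with the Pearson chi-squared test (twosamplefreq test = pchi); the lecture may use a different analysis statement. All proportions, sample sizes, and weights are hypothetical. The quantity set to a period is the one SAS solves for.

```sas
/* All proportions, sample sizes, and weights below are hypothetical. */

/* Power from sample size: power = . asks SAS to solve for power. */
proc power;
  twosamplefreq test = pchi
    groupproportions = (0.30 0.45)
    npergroup = 100
    power = .;
run;

/* Sample size from power, balanced groups: solve for n per group. */
proc power;
  twosamplefreq test = pchi
    groupproportions = (0.30 0.45)
    power = 0.80
    npergroup = .;
run;

/* Sample size from power, unbalanced groups (1:2 allocation):
   with groupweights, solve for the total sample size instead. */
proc power;
  twosamplefreq test = pchi
    groupproportions = (0.30 0.45)
    groupweights = (1 2)
    power = 0.80
    ntotal = .;
run;
```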