SAS

Getting started

This document is intended as a guide to SAS for P8120: Analysis of Categorical Data. It is not exhaustive; please consult one of the TAs, Dr. Mauro, or the online SAS user guide if you need more help.

SAS is available on all computers at the CUMC library and can be purchased through the IT department in the Hammer Library for $85 (Windows computers only). In addition, there is now a free version of SAS, called SAS University Edition, that works on multiple operating systems. Instructions for download can be found here. Another free option is SAS OnDemand, which some students may already be using in P8483. Instructions to access and use the software can be found here.

SAS is a powerful statistical software package and programming language commonly used in universities, government, and the private sector. A SAS program can roughly be broken down into two kinds of steps: data steps (begun by a data statement) and procedure steps (begun by a proc statement). Almost all code blocks in SAS begin with one of these two statements and end with run;. Every statement within a code block, including the data, proc, and run statements themselves, must end with a semicolon.
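As a minimal sketch of the two kinds of steps, the program below starts from the sashelp.class dataset that ships with SAS. The dataset name class and the variable age_months are hypothetical names chosen only for illustration.

```sas
/* Data step: create a new dataset named class (hypothetical name) */
/* by copying the built-in sashelp.class dataset                    */
data class;
  set sashelp.class;
  age_months = age * 12;   /* new numeric variable; every statement ends with ; */
run;

/* Procedure step: print the first five observations of the new dataset */
proc print data = class (obs = 5);
run;
```

The data step ends at its run; statement, and the proc step then works on the dataset the data step just created.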

Workflow

When starting a new session, the basic workflow is:

open a new program
type some code
run the code
export the output of the necessary procedures

Always remember to save your work often.

To open a new program: File > New Program
To run code: highlight the code of interest and click the Run icon (the running figure) on the toolbar
To save a program: File > Save As…

Importing and manipulating data

Importing data in SAS can be done a couple of different ways. We recommend using the import procedure.

The import procedure is the most straightforward way of importing data into SAS. The procedure requires two arguments, out and datafile, and a third, optional argument, dbms, is recommended. The out argument specifies the name of the newly created SAS dataset; the datafile argument specifies, in quotation marks, the complete path of the file containing the data to be imported; the dbms argument specifies the type of file to be parsed. The most commonly used identifiers are csv, tab, xlsx, and dlm. If dbms is omitted, SAS determines the file type from the file extension, so this argument is not strictly necessary when importing a file ending in .csv. The replace option tells SAS to overwrite an existing dataset with the same name. Further help can be found here.

If using SAS Studio, you must first upload the data into SAS Studio and then reference that file path.

Syntax

proc import out = <SAS dataset name>
            datafile = "filepath"
            dbms = <identifier> replace;
run;

Example

proc import out = azt
            datafile = "C:\Users\niwi8\OneDrive\Documents\p8120_ta\p8120\data\AZT.csv"
            dbms = csv replace; 
run;
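The same procedure handles the other identifiers listed above. Below is a minimal sketch for an Excel workbook, assuming a hypothetical file path and a worksheet named Sheet1; the sheet and getnames statements apply to spreadsheet imports, and proc contents is used afterwards to check what SAS created.

```sas
/* Import one worksheet of an Excel workbook (path and sheet name are hypothetical) */
proc import out = my_data
            datafile = "C:\my_computer\contains_some_file\some_file.xlsx"
            dbms = xlsx replace;
  sheet = "Sheet1";   /* which worksheet to read */
  getnames = yes;     /* use the first row as variable names */
run;

/* List the variable names and types SAS assigned during the import */
proc contents data = my_data;
run;
```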

The data step in SAS is where all data manipulation occurs; it can also be used to import data and to create datasets by “hand”. Sometimes you will be asked to conduct analyses on group-level (aggregated count) data rather than individual-level data. In these instances, you will need to create a dataset using a data step. The data statement is followed by the name of the dataset you are creating. The next statement should be input, where you specify variable names and types (for this course you only need to worry about numeric and character variables; the name of a character variable should be followed by a $). After input comes the cards statement, followed by a semicolon. After cards, you begin writing your data, one observation per line, with one value for each variable listed in input, separated by spaces. After the last line of data, go to the next line and include a single semicolon.

Syntax

data <dataset name>;
  input <a character variable> $ <a numeric variable>;
  cards; 
  some_data 123
  ; 
run;

Example

data vietnam; 
  input service $ sleep $ count; 
  cards; 
  yes yes 173
  yes no 160
  no yes 599
  no no 851
  ; 
run;
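Because each row of vietnam holds a cell count rather than a single person, procedures have to be told to weight each row by that count. A minimal sketch using the weight statement in proc freq to produce the 2 x 2 table of service by sleep:

```sas
/* Group-level data: one row per cell, with the cell count stored in count */
data vietnam;
  input service $ sleep $ count;
  cards;
yes yes 173
yes no 160
no yes 599
no no 851
;
run;

/* The weight statement tells proc freq that each row represents count people */
proc freq data = vietnam;
  tables service*sleep;
  weight count;
run;
```

The table total should be 173 + 160 + 599 + 851 = 1783. Without the weight statement, each cell would show a frequency of 1.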

Output delivery system (ODS)

You will be asked to include SAS output in homework assignments. SAS has a built-in output delivery system (ODS) that can save output as an RTF file, which can then be opened in Microsoft Word. The basic usage is:

ods rtf file = "C:/my_computer/contains_some_file/some_file.rtf";

<insert SAS procedure>

ods rtf close;

You are not required to use this method when including SAS output in assignments.
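A minimal sketch of the ODS pattern, using the vietnam counts from the data step example and a hypothetical output path. Everything printed between the ods rtf file statement and ods rtf close is written to the RTF file.

```sas
/* Group-level data from the data step example */
data vietnam;
  input service $ sleep $ count;
  cards;
yes yes 173
yes no 160
no yes 599
no no 851
;
run;

/* Start sending output to an RTF file (hypothetical path) */
ods rtf file = "C:/my_computer/contains_some_file/vietnam_table.rtf";

proc freq data = vietnam;
  tables service*sleep;
  weight count;
run;

/* Close the RTF destination so the file is complete and can be opened in Word */
ods rtf close;
```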

Hypothesis tests

Use the following table to locate procedures for hypothesis tests. If you are unsure of the usage of a procedure, refer to the online SAS guide. Lecture numbers are linked to examples.

Hypothesis test	SAS procedure	Arguments needed	Lecture number
Confidence intervals, proportions	`proc freq`	Same as the normal approximation and exact tests below	3
One sample test proportions, normal approximation	`proc freq`	`binomial(p = <insert null>)`	4
One sample test proportions, exact	`proc freq`	`exact binomial`	4
Chi-squared	`proc freq`	`chisq expected`	8
Fisher’s exact	`proc freq`	Same as chi-squared (reported automatically for 2 x 2 tables); `exact fisher` for larger tables	8
Likelihood ratio test	`proc freq`	Same as chi-squared	9
Two sample test of proportions	`proc freq`	`expected riskdiff(equal var = null)`	9
Logistic regression	`proc logistic`		13, 14, & 15
Goodness of fit (GOF) & Hosmer-Lemeshow	`proc logistic`	`scale = none` / `lackfit`	16 & 17
Breslow-Day test	`proc freq`	`cmh`	19
Cochran-Mantel-Haenszel test	`proc freq`	`cmh`	20
McNemar’s test	`proc freq`	`agree`	21
Conditional logistic regression	`proc logistic`	`strata`	22
Cochran-Armitage trend test	`proc freq`	`trend`	23
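To show where the arguments in the table go, here is a minimal sketch using the vietnam counts from the data step example. The proc freq options follow the slash in the tables statement; the proc logistic call uses a freq statement for the counts and event = "yes" to model the probability that sleep is yes. Which variable is treated as the outcome, the null value of 0.5, and the reference level "no" are assumptions made only for illustration.

```sas
data vietnam;
  input service $ sleep $ count;
  cards;
yes yes 173
yes no 160
no yes 599
no no 851
;
run;

/* Chi-squared test with expected counts plus a two-sample test of        */
/* proportions; for a 2 x 2 table, chisq also reports Fisher's exact test */
/* and the likelihood ratio chi-square                                    */
proc freq data = vietnam;
  tables service*sleep / chisq expected riskdiff(equal var = null);
  weight count;
run;

/* One-sample test of the proportion with sleep = "yes" against 0.5:     */
/* binomial gives the normal approximation, exact binomial the exact test */
proc freq data = vietnam;
  tables sleep / binomial(p = 0.5 level = "yes");
  exact binomial;
  weight count;
run;

/* Logistic regression of sleep on service, treating count as a frequency */
proc logistic data = vietnam;
  class service (ref = "no") / param = ref;
  model sleep (event = "yes") = service;
  freq count;
run;
```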

Measures of association

Use the following table to locate procedures for measures of association. If you are unsure of the usage of a procedure, refer to the online SAS guide. Lecture numbers are linked to examples.

Measure	SAS procedure	Arguments needed	Lecture number
Odds ratio (2 x 2 table)	`proc freq`	`relrisk`	5 & 7
Risk ratio (2 x 2 table)	`proc freq`	`relrisk`	5 & 7
Risk difference	`proc freq`	`riskdiff`	5 & 7
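A minimal sketch for the measures in this table, again using the vietnam counts. The order = data option is an assumption about the desired layout: it keeps the levels in the order they were entered, so yes is the first row and the first column, and relrisk and riskdiff then compare the first row with the second.

```sas
data vietnam;
  input service $ sleep $ count;
  cards;
yes yes 173
yes no 160
no yes 599
no no 851
;
run;

/* order = data keeps "yes" as the first row and column */
proc freq data = vietnam order = data;
  tables service*sleep / relrisk riskdiff;   /* odds ratio, risk ratios, risk difference */
  weight count;
run;
```

With this ordering, the reported odds ratio should equal (173 * 851) / (160 * 599), or about 1.54.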

Power

Use the following table to locate procedures for power. If you are unsure of the usage of a procedure, refer to the online SAS guide. Lecture numbers are linked to examples.

Power analysis	SAS procedure	Notes	Lecture number
Power from sample size	`proc power`	`power = .`	10
Sample size from power (balanced)	`proc power`	`npergroup = .`	10
Sample size from power (unbalanced)	`proc power`	must specify `groupweights` and solve for `ntotal = .`	10
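A minimal sketch of the three rows above for a two-sample comparison of proportions, using the twosamplefreq statement of proc power with the Pearson chi-squared test. The proportions 0.3 and 0.5, the 100 subjects per group, and the 1:2 group weights are hypothetical values; the missing value (.) marks the quantity SAS should solve for, and alpha defaults to 0.05.

```sas
/* Power from sample size: solve for power with 100 per group */
proc power;
  twosamplefreq test = pchi
    groupproportions = (0.3 0.5)
    npergroup = 100
    power = .;
run;

/* Sample size from power (balanced): solve for n per group at 80% power */
proc power;
  twosamplefreq test = pchi
    groupproportions = (0.3 0.5)
    npergroup = .
    power = 0.8;
run;

/* Sample size from power (unbalanced): group 2 twice the size of group 1; */
/* with groupweights, solve for the total sample size instead              */
proc power;
  twosamplefreq test = pchi
    groupproportions = (0.3 0.5)
    groupweights = (1 2)
    ntotal = .
    power = 0.8;
run;
```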