
One-Way Frequency Tables using SAS

PROC FREQ

See www.stattutorials.com/SASDATA for files mentioned in this tutorial

© TexaSoft, 2006

 

These SAS statistics tutorials briefly explain the use and interpretation of standard statistical analysis techniques for Medical, Pharmaceutical, Clinical Trials, Marketing or Scientific Research. The examples include how-to instructions for SAS software.

 

 

Creating One-Way Frequency Tables with PROC FREQ

 

Data that are collected as counts require a specific kind of data analysis. It doesn’t make sense to calculate means and standard deviations on categorical data. Instead, categorical data are analyzed by creating frequency and crosstabulation tables. The primary procedure within SAS for this kind of analysis is PROC FREQ.

 

This tutorial covers the creation and analysis of a single variable frequency table using the PROC FREQ procedure.

 

The syntax for PROC FREQ is:

 

PROC FREQ <options>; TABLES specification; <statements>;

 

Commonly used options in PROC FREQ are:

 

      DATA =      (Specify which data set to use)

      ORDER=FREQ  (Output data in frequency order)

 

A commonly used statement with PROC FREQ is:

 

BY varlist  (Specify BY list to create subsetted analyses)
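As a rough sketch of how BY is typically used: BY-group processing expects the input to be sorted by the BY variables, so a PROC SORT step usually comes first. The example borrows the MYDATA.SOMEDATA data set and STATUS variable used later in this tutorial. GENDER and the output data set name SORTED are assumptions made only for illustration.

```sas
/* Sketch only: GENDER is a hypothetical BY variable that is      */
/* assumed to exist in SOMEDATA. SORTED is a hypothetical name.   */
PROC SORT DATA=MYDATA.SOMEDATA OUT=SORTED;
  BY GENDER;
RUN;

/* One STATUS frequency table is produced per value of GENDER.    */
PROC FREQ DATA=SORTED;
  BY GENDER;
  TABLES STATUS;
RUN;
```

Sorting into a separate data set with OUT= leaves the original data set in its stored order.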

 

The TABLES statement specifies which tables will be produced. For example, to obtain counts of the number of subjects in each GROUP category, use the code:

 

PROC FREQ; TABLES GROUP;

 

To produce a chi-square test for goodness of fit, use code such as

 

proc freq;

  tables color / chisq nocum testp=(0.5625 0.1875 0.1875 0.0625);

 

(See details about these options later in the tutorial.)


 

Creating a One-Way Frequency Table

 

When only one variable is used in the TABLES statement, PROC FREQ produces a frequency table. For example, using the data from the SOMEDATA SAS data set, the following code produces a frequency table using data in the STATUS variable: (PROCFREQ1.SAS)

 

* ASSUMES YOU HAVE A SAS LIBRARY NAMED MYDATA;

ODS RTF;

PROC FREQ DATA=MYDATA.SOMEDATA; TABLES STATUS;

TITLE 'Simple Example of PROC FREQ';

RUN;

PROC FREQ DATA=MYDATA.SOMEDATA ORDER=FREQ; TABLES STATUS;

TITLE 'Simple Example of PROC FREQ';

RUN;

ODS RTF CLOSE;

 

The output for this job is:

 

Socioeconomic Status

STATUS  Frequency  Percent  Cumulative  Cumulative
                             Frequency     Percent
--------------------------------------------------
1               3     6.00           3        6.00
2               7    14.00          10       20.00
3               6    12.00          16       32.00
4               8    16.00          24       48.00
5              26    52.00          50      100.00

 

The Frequency column gives the number of times the STATUS variable took on the value shown in the STATUS column. The Percent column is that count as a percentage of the total (50). The Cumulative Frequency and Cumulative Percent columns give running totals of the count and percent through each value of STATUS. Use this type of analysis to discover the distribution of the categories in your data set. For example, in these data, over half of the subjects fall into the STATUS=5 category. If you had hoped for a representative sample in each category, this shows you that the criterion was not met.
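If you want the counts in a data set rather than only in printed output, the following sketch may help. It assumes the same MYDATA.SOMEDATA data set; STATUSCOUNTS is a hypothetical output name. The OUT= option on the TABLES statement saves the table, and OUTCUM adds the cumulative columns.

```sas
/* Sketch: save the one-way table for STATUS to a data set.       */
/* STATUSCOUNTS is a hypothetical name chosen for this example.   */
PROC FREQ DATA=MYDATA.SOMEDATA;
  TABLES STATUS / OUT=STATUSCOUNTS OUTCUM;
RUN;

/* The output data set holds COUNT and PERCENT, and with OUTCUM   */
/* also CUM_FREQ and CUM_PCT, one row per value of STATUS.        */
PROC PRINT DATA=STATUSCOUNTS;
RUN;
```

Having the table as a data set makes it easier to check the percent and cumulative columns or to reuse the counts in a later step.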

 

Exercise: The ORDER=FREQ option orders the table by descending frequency. Change the PROC FREQ line to read

 

PROC FREQ DATA=MYDATA.SOMEDATA ORDER=FREQ; TABLES STATUS;

 

and rerun the program to get the output sorted by frequency. This helps you identify which categories have the most and fewest counts.

 

                                     

Socioeconomic Status

STATUS  Frequency  Percent  Cumulative  Cumulative
                             Frequency     Percent
--------------------------------------------------
5              26    52.00          26       52.00
4               8    16.00          34       68.00
2               7    14.00          41       82.00
3               6    12.00          47       94.00
1               3     6.00          50      100.00

 

Suppose your data have already been summarized into counts. In this case, use the WEIGHT statement to tell PROC FREQ which variable holds the count for each category. For example (PROCFREQ2.SAS):

 

DATA CDS;

     INPUT @1 CATEGORY $9. @10 NUMBER 3.;

DATALINES;

JAZZ     252

POP       49

CLASSICAL 59

RAP       21

GOSPEL    44

JAZZ      21

;

ODS RTF;

PROC FREQ DATA=CDS ORDER=FREQ; WEIGHT NUMBER;

  TITLE3 'READ IN SUMMARIZED DATA';

  TABLES CATEGORY;

RUN;

ODS RTF CLOSE;

 

This program produces the following table:

 

CATEGORY   Frequency  Percent  Cumulative  Cumulative
                                Frequency     Percent
-----------------------------------------------------
JAZZ             273    61.21         273       61.21
CLASSICAL         59    13.23         332       74.44
POP               49    10.99         381       85.43
GOSPEL            44     9.87         425       95.29
RAP               21     4.71         446      100.00

 

Notice that although the data were summarized, there were two observations in the data set for “JAZZ” which were combined into a single category in the table.
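One way to see what WEIGHT does is to expand the summarized rows back into one observation per CD and run PROC FREQ without a WEIGHT statement. The sketch below assumes the CDS data set created above; CDS_LONG and the loop counter I are hypothetical names introduced for illustration.

```sas
/* Sketch: write each summarized row NUMBER times, so that every  */
/* CD becomes its own observation. CDS_LONG is a hypothetical     */
/* data set name.                                                 */
DATA CDS_LONG;
  SET CDS;
  DO I = 1 TO NUMBER;
    OUTPUT;
  END;
  DROP I NUMBER;
RUN;

/* No WEIGHT statement is needed now: each row counts once.       */
PROC FREQ DATA=CDS_LONG ORDER=FREQ;
  TABLES CATEGORY;
RUN;
```

Under these assumptions the resulting table should match the weighted one, with the two JAZZ rows contributing 252 + 21 = 273 observations.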

 

 

Testing Goodness of Fit in a One-Way Table

 

A goodness-of-fit test of a single population is a test to determine if the distribution of observed frequencies in the sample data closely matches the expected number of occurrences under a hypothetical distribution of the population. The data observations must be independent and each data value can be counted in one and only one category. It is also assumed that the number of observations is fixed. The hypotheses being tested are

 

Ho: The population follows the hypothesized distribution.
Ha: The population does not follow the hypothesized distribution.

 

A Chi-Square statistic is calculated, and a decision can be made based on the p-value associated with that statistic. A low p-value indicates rejection of the null hypothesis. That is, a low p-value indicates that the data do not follow the hypothesized, or theoretical, distribution.
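For reference, the usual Pearson form of the statistic is shown here. The symbols are not from the original text: O_i is the observed count in category i, E_i is the expected count under the hypothesized distribution, n is the total sample size, p_i is the hypothesized proportion for category i, and k is the number of categories.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n\,p_i,
\qquad \text{df} = k - 1
```

Large differences between observed and expected counts inflate the statistic, which in turn produces a small p-value.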

 

The data for this example come from Zar (1999), page 465. According to a genetic theory, crossbred pea plants show a 9:3:3:1 ratio of yellow smooth, yellow wrinkled, green smooth, and green wrinkled offspring. Out of 250 plants, under the theoretical ratio (distribution) of 9:3:3:1, you would expect about

 

(9/16)x250=140.625 yellow smooth peas (56.25%)
(3/16)x250=46.875 yellow wrinkled peas (18.75%)
(3/16)x250=46.875 green smooth peas (18.75%)
(1/16)x250=15.625 green wrinkled peas (6.25%)

 

After growing 250 of these pea plants, you observe that

 

152 have yellow smooth peas
39 have yellow wrinkled peas
53 have green smooth peas
6 have green wrinkled peas

 

You can perform this analysis using the following SAS program (PROCFREQ3.SAS):

 

DATA GENE;

     INPUT @1 COLOR $13. @15 NUMBER 3.;

DATALINES;

YELLOWSMOOTH  152

YELLOWWRINKLE  39

GREENSMOOTH    53

GREENWRINKLE    6

;

* HYPOTHESIZING A 9:3:3:1 RATIO;

PROC FREQ DATA=GENE ORDER=DATA; WEIGHT NUMBER;

  TITLE3 'GOODNESS OF FIT ANALYSIS';

  TABLES COLOR / CHISQ NOCUM TESTP=(0.5625 0.1875 0.1875 0.0625);

RUN;

 

  • The CHISQ option requests that a Chi-Square test be performed.

  • The TESTP=() option specifies the hypothesized proportions to be tested. (You could have used the TESTF=() option with expected frequencies instead.)

  • The NOCUM option suppresses cumulative frequencies.

  • The ORDER=DATA option causes SAS to display the categories in the same order as they are entered in the input data set. This matters here because the TESTP= values are matched to the categories in the order in which they appear in the table.
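As a sketch of the TESTF= alternative, the same test could be requested with expected frequencies instead of proportions. This assumes the GENE data set from the program above; the four values are the expected counts for 250 plants under the 9:3:3:1 ratio, and they should sum to the sample size.

```sas
/* Sketch: same goodness-of-fit test, stated as expected counts.  */
/* 140.625 + 46.875 + 46.875 + 15.625 = 250 (the sample size).    */
PROC FREQ DATA=GENE ORDER=DATA;
  WEIGHT NUMBER;
  TABLES COLOR / CHISQ NOCUM TESTF=(140.625 46.875 46.875 15.625);
RUN;
```

As with TESTP=, the values are listed in the order in which the categories appear in the table, so ORDER=DATA is kept.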

 

The result of this analysis is:

 

                                   

COLOR          Frequency  Percent     Test
                                   Percent
------------------------------------------
YELLOWSMOOTH         152    60.80    56.25
YELLOWWRINKLE         39    15.60    18.75
GREENSMOOTH           53    21.20    18.75
GREENWRINKLE           6     2.40     6.25

 

 

 

Chi-Square Test for Specified Proportions

Chi-Square     8.9724
DF                  3
Pr > ChiSq     0.0297

Sample Size = 250

 

In this case, the p-value for the Chi-Square test (0.0297) is less than 0.05, so we reject the null hypothesis and conclude that the peas do not come from a population having the 9:3:3:1 phenotypic ratio.
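The statistic can also be checked by hand. The DATA step below is only an illustrative sketch, not part of the original program: it hard-codes the observed counts and hypothesized proportions from this example, and the array and variable names (OBS, P, EXPECTED, CHISQ, PVALUE) are made up for the purpose. It sums (observed - expected)**2 / expected over the four categories and uses the PROBCHI function for the p-value.

```sas
/* Sketch: recompute the goodness-of-fit statistic by hand.       */
/* All names here are hypothetical and chosen for illustration.   */
DATA _NULL_;
  ARRAY OBS[4] _TEMPORARY_ (152 39 53 6);
  ARRAY P[4]   _TEMPORARY_ (0.5625 0.1875 0.1875 0.0625);
  N = 250;
  CHISQ = 0;
  DO I = 1 TO 4;
    EXPECTED = N * P[I];                        /* e.g. 140.625   */
    CHISQ = CHISQ + (OBS[I] - EXPECTED)**2 / EXPECTED;
  END;
  DF = 4 - 1;                                   /* categories - 1 */
  PVALUE = 1 - PROBCHI(CHISQ, DF);              /* upper tail     */
  PUT CHISQ= 8.4 DF= PVALUE= 6.4;               /* written to log */
RUN;
```

The values written to the log should agree with the Chi-Square and Pr > ChiSq entries reported by PROC FREQ. Most of the statistic comes from the green wrinkled category, where 6 plants were observed against 15.625 expected.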

 

  

End of tutorial

See http://www.stattutorials.com/SAS

 


© Copyright TexaSoft, 1996-2007

This site is not affiliated with SAS(r) or SAS Institute