Statistical Modeling
Dr. Courtney Brown
Assignment #8
For this assignment, you will be extending the previous assignment using dummy variables to learn how to construct graphs using SAS. As with all assignments in this course, remember that this is an assignment of scientific writing, so explain your results clearly so that anyone can understand your findings. In your previous assignment, you constructed intercept and slope dummy variables and tested whether the estimated parameter values were different from one another. Now you want to construct plots that illustrate the different slopes and intercepts. If you have not already done so, be sure to read the assigned article by Gerald Wright, "Linear Models for Evaluating Conditional Relationships."
The program below is similar to the one you used for your last assignment, but it now contains language to produce "print ready" graphs that plot the regression model's predicted values for the dependent variable against the variable you used to construct the slope dummy variables (in the case below, partyid). The graphs are constructed by first outputting a new data set that is a copy of the panel80 data set but which has one extra variable. This new variable holds the predicted values for your dependent variable (carfeel3) as obtained by your regression model. You want to construct a graph that plots the predicted values for carfeel3 for African Americans and white Americans on the same graph, but in a way that you can still tell them apart. This type of plot is called an "overlay". You also want to put a regression line through the African American data and another regression line through the white American data. Be sure to change the intercept and slope dummy variables in the program below to match those that you used for your previous assignment.
Write up a few pages that explain this graph. You can use the text of your previous assignment, and then just add a couple more pages to explain the graph. Be sure to include the graph as well, of course. Remember that figures get placed at the end of all your pages, after the tables. Note that partyid is the only variable on the horizontal axis for the graph, but the predicted values for your dependent variable are based on the values of all of the independent variables in the regression model, not just partyid. This is why the B and W values in the plot are scattered around their respective regression lines rather than being right on the lines. Think about this when you look at your graph until it makes sense.
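To see why the points scatter, it can help to compute a few predicted values by hand. The sketch below uses made-up coefficient values (they are assumptions for illustration, not estimates from the panel80 data) for the white-respondent part of the model. The data set name scatterdemo and the three respondents are also hypothetical. All three respondents share partyid=4 but differ in gender and age.

```sas
* Hypothetical coefficients and respondents, for illustration only;
data scatterdemo;
   input partyid gender age;
   b_white    = 80;    * assumed coefficient on the white intercept dummy;
   b_wpartyid = -6;    * assumed coefficient on the white slope dummy;
   b_gender   = 2;     * assumed coefficient on gender;
   b_age      = 0.1;   * assumed coefficient on age;
   pred = b_white + b_wpartyid*partyid + b_gender*gender + b_age*age;
   datalines;
4 1 25
4 2 25
4 2 65
;
proc print data=scatterdemo; var partyid gender age pred;
run;
```

With these assumed values the three predictions are 60.5, 62.5, and 66.5. All three respondents sit at partyid=4 on the horizontal axis, yet their predicted values differ because gender and age differ. That is the vertical scatter you see around each regression line.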
Finally, after you get your one plot to look great, try to make one additional figure by changing a variable on the plot statement below. You may also want to try to change something on one or more of the symbol statements. For example, you may want to plot the real carfeel3 data rather than the predicted (pcarfeel3). Just try something to get another plot that is cool and fun. This is the only way you will really learn how to do this stuff. Play around and try things. I cannot emphasize this enough.
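As one possible starting point (a sketch, not the required answer), the step below puts the actual carfeel3 values on the PLOT statement in place of the predicted ones. It assumes you have already run the full program in the same SAS session, so that newpanel80, wonlypid, bonlypid, the SYMBOL3 and SYMBOL4 statements, and AXIS1 and AXIS2 all exist. The title text is only a suggestion.

```sas
* Assumes the main program has already been run in this session;
proc gplot data=newpanel80;
   plot carfeel3*wonlypid=3 carfeel3*bonlypid=4 / overlay
        vaxis=axis1 haxis=axis2 vminor=0 hminor=0;
   title 'Figure 2: Actual Feelings for Carter, September 1980';
run;
quit;
```

Because SYMBOL3 and SYMBOL4 use interpol=R, SAS draws a simple regression line of carfeel3 on partyid through each group's points. Compare how tightly the actual values cluster around those lines with what you saw for the predicted values.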
By the way, the "vorigin=3 in" in the second line of the program (in the GOPTIONS statement) places the bottom of the graph three inches up from the bottom of the printed page. This is nice when you look at it on a piece of paper. But it sometimes makes it difficult to see the plot on the screen since the plot is often up too high. To get around that, you can change the statement to read "vorigin=0 in" until you are ready to print the plot on a piece of paper. Then change the value from 0 back to 3.
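For example, the on-screen version of the statement would look like the line below. Everything is copied from the program's GOPTIONS statement except the vorigin value.

```sas
* Same GOPTIONS statement as in the program, with vorigin set to 0 for viewing on the screen;
GOPTIONS lfactor=10 hsize=6 in vsize=6 in horigin=1 in vorigin=0 in;
```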
Here is some help in interpreting the variable values.
Feeling thermometers: 0 to 100, with 50 being neutral.
Liberal/conservative scales: 1=extreme liberal, 7=extreme conservative.
Inter1-Inter3: respondent's interest in the campaign, low to high.
P1 through P4: This refers to the panel wave, January, July, Sept. & Nov.
Expectation to vote: 5 will vote, 1 no.
Education: years of education
Income: not in thousands of dollars, but a scale, low to high.
Frequency of church attendance: low to high
R : This refers to the respondent.
Generally, all of the variables go from low to high. Thus, if you see a variable
and you do not know the coding scheme, assume that a small number means less
and a larger number means more. The other codes are in the variable labels.
Most of the variables for this data set originated as a panel study supplied by the Inter-university Consortium for Political and Social Research (ICPSR). Emory University is a member of the ICPSR. I have added some contextual variables to the survey data set by extracting these contextual data from a separate ICPSR data set.
libname windata 'e:\';
GOPTIONS lfactor=10 hsize=6 in vsize=6 in horigin=1 in vorigin=3 in;
options nocenter ls=120;
**********************************************************;
* CLASS, NOTE THAT IF YOU BEGIN A LINE WITH AN ASTERISK *
* THEN YOU CAN PUT NOTES IN YOUR PROGRAM FILES. THIS IS
* LIKE A COMMENT CARD IN SPSS. HOWEVER, REMEMBER
* TO EVENTUALLY PUT A FINAL SEMICOLON AT THE END OF YOUR COMMENTS.;
***********************************************************;
* NOTE THAT I INDENT SOME STATEMENTS. THIS
* IS JUST FOR NEATNESS.;
***********************************************************;
* COPYRIGHT (c) Courtney Brown 2005, All Rights Reserved;
* Permission granted to use this file and computer code for any nonprofit and
* educational purposes, including classroom instruction.
* No further permission required.
* Please cite source as "From www.courtneybrown.com";
***********************************************************;
DATA panel80; SET windata.panel80;
   if ((vote eq 1) or (vote eq 2));
   if (age le 43) then generation = 'youth';
   if (age gt 43) then generation = 'older';
   if (partyid le 2) then party = 'Dem.';
   if ((partyid ge 3) and (partyid le 5)) then party = 'Ind.';
   if (partyid ge 6) then party = 'Rep.';
   gender=sex;
   if ((race eq 1) or (race eq 2)); * This gets rid of the "other" category in race;
   if (race eq 1) then white=1; else white=0; * This creates the intercept dummy variable for whites;
   if (race eq 2) then black=1; else black=0; * This creates the intercept dummy variable for African Americans;
   wpartyid=white*partyid; * This is one way of creating the whites only slope dummy variable for partyid;
   bpartyid=black*partyid; * This is one way of creating the blacks only slope dummy variable for partyid;
proc format;
   value VoteFmt 1 ='Reagan'
                 2 ='Carter'
                 3 ='Clark'
                 4 ='Anderson';
   value GenderFmt 1 ='Male'
                   2 ='Female';
proc print; var white black wpartyid bpartyid partyid;
proc means;
proc contents;
proc reg;
   model carfeel3 = white black wpartyid bpartyid gender age / stb tol noint;
   test white=black;
   test wpartyid=bpartyid;
   output out=newpanel80 p=pcarfeel3;
   title 'Carter Feelings';
data newpanel80; set newpanel80;
   *The next few lines create whites only and blacks only plotting variables out of the partyid slope dummy variables;
   *Remember that blacks=0 in wpartyid, and whites=0 in bpartyid;
   *Those zeros are now recoded to missing values, for blacks in wonlypid and for whites in bonlypid;
   *The new variables are wonlypid and bonlypid;
   *We create these new variables so that each plot leaves out the respondents of the other race, since SAS does not plot missing values;
   wonlypid=wpartyid; if wonlypid=0 then wonlypid=.;
   bonlypid=bpartyid; if bonlypid=0 then bonlypid=.;
*Now we create some useful symbol statements that will be used in our graphs.;
symbol1 color=black v=NONE f=centb i=join;
symbol2 color=black f=centb v='.' height=2 interpol=R;
symbol3 color=black f=centb v='W' height=1 interpol=R;
symbol4 color=black f=centb v='B' height=1 interpol=R;
*Now we make our plots;
proc gplot data=newpanel80;
   axis1 color=black
         value=(h=1.5 f=swissb c=black)
         label=(h=1.3 a=90 r=0 f=swissb c=black 'Respondent feeling for Carter');
   axis2 color=black order=(0 to 8 by 1) minor=none
         value=(h=1.5 f=swissb c=black)
         label=(h=1.3 f=swissb c=black 'Partisan Identification');
   PLOT pcarfeel3*wonlypid=3 pcarfeel3*bonlypid=4 / overlay
        vaxis=axis1 haxis=axis2 vminor=0 hminor=0;
   TITLE 'Figure 1: Actual and Predicted Feelings for Carter, September 1980';
run;
quit;