Statistical Modeling
Dr. Courtney Brown
Assignment #4
You will be working with multiple regression using the same data set that you used in a previous assignment. As with all assignments in this course, remember that this is an assignment in scientific writing, so explain your results clearly enough that anyone can understand your findings.
If needed, first download the survey data set for the Reagan vs. Carter election in 1980. The data set is zipped, and it is called "panel80.df." It is an R data frame. This data set is called a "panel study" because four waves of interviews took place throughout 1980 to capture the sense of the campaign during (1) the entry into the primary season, (2) the post-primary season, (3) the main campaign in September, and (4) the period after the vote in November. After you download the data set, you will need to extract it since it is "zipped." In Windows Explorer, right-click the zipped file and choose "Extract." Put the extracted data set in your R_Working_Directory. You can then run the R script given later in this assignment by pasting it into the R console.
For this assignment, you will try to explain both feelings for President Jimmy Carter and feelings for challenger Ronald Reagan using the variables CARFEEL3 and REAFEEL3. These are your dependent variables, and they are feeling thermometers asked of the survey respondents in September of 1980. Your task is to find interesting sets of independent variables that explain these two dependent variables. Use the variable labels to find new and interesting independent variables to include in your multiple regressions. Find a story to tell from your exploratory analysis of these data. Then write up what you find in a three- to five-page analysis. Include both unstandardized and standardized regression coefficients in your tables, and discuss which variables have statistically significant coefficients. What are the fits of your models, and what do these fits mean? Be sure to interpret the statistics. Don't just report them.
Here is some help in interpreting the variable values.
Feeling thermometers: 0 to 100, with 50 being neutral.
Liberal/conservative scales: 1=extreme liberal, 7=extreme conservative.
INTER1 through INTER4: respondent's interest in the campaign, low to high.
P1 through P4: the panel wave (January, July, September, and November).
Expectation to vote: 5 = will vote, 1 = will not vote.
Education: years of education.
Income: a scale from low to high, not thousands of dollars.
Frequency of church attendance: low to high.
R: the respondent.
Generally, all of the variables go from low to high. Thus, if you see a variable and you do not know the coding scheme, assume that a small number means less and a larger number means more. The other codes are in the variable labels.
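Before running any regressions, it can help to check a variable's coding directly. The following is only a minimal sketch: it uses a small made-up data frame (`toy`, hypothetical) that borrows two variable names from the panel data, and the recentered variable `CARFEEL3.c` is an illustrative name, not something in the data set. With the real data you would substitute `mydata` for `toy`.

```r
# Made-up stand-in for the panel data; replace 'toy' with 'mydata' in practice.
toy <- data.frame(
  CARFEEL3 = c(85, 50, 30, NA, 60, 15),   # feeling thermometer, 0 to 100
  PARTYID  = c(1, 4, 7, 2, NA, 6)         # 1 strong Democrat ... 7 strong Republican
)

summary(toy$CARFEEL3)                 # range, quartiles, and the number of NAs
table(toy$PARTYID, useNA = "ifany")   # frequency of each code, including missing

# Recenter the thermometer so that 0 means "neutral" (50 on the original scale).
toy$CARFEEL3.c <- toy$CARFEEL3 - 50
range(toy$CARFEEL3.c, na.rm = TRUE)
```

Recentering does not change the slope coefficients in a regression. It only shifts the intercept, which can make the intercept easier to describe in a write-up.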
Most of the variables for this data set originated as a panel study supplied by the Inter-university Consortium for Political and Social Research (ICPSR). Emory University is a member of the ICPSR. I have added some contextual variables to the survey data set by extracting these contextual data from a separate ICPSR data set.
Here are the variables in the data set:
V3543= NEIGHBOR #1-VOTE F PRES REF=3543 ID=763
V3547= NEIGHBOR #2-VOTE F PRES REF=3547 ID=763
V3551= NEIGHBOR #3-VOTE F PRES REF=3551 ID=763
INTER1= INTEREST IN POLITICS FOR R,P1
INTER2= INTEREST IN POLITICS FOR R,P2
INTER3= INTEREST IN POLITICS FOR R,P3
INTER4= INTEREST IN POLITICS FOR R,P4
INFO1= INFORMATION LEVEL FROM NEWS FOR R,P1
INFO2= INFORMATION LEVEL FROM NEWS FOR R,P2
DEMCAND1= FEELING THERM. FOR ALL DEM. CANDS,P1
DEMCAND2= FEELING THERM. FOR ALL DEM CANDS, P2
DEMCAND3= FEELING THERM. FOR ALL DEM CANDS, P3
REPCAND1= FEELING THERM FOR ALL REP CANDS, P1
REPCAND2= FEELING THERM FOR ALL REP CANDS, P2
REPCAND3= FEELING THERM FOR ALL REP CANDS, P3
DEMPART1= FEELING THERM FOR DEM PARTY, P1
DEMPART2= FEELING THERM FOR DEM PARTY, P2
DEMPART3= FEELING THERM FOR DEM PARTY, P3
REPPART1= FEELING THERM FOR REP PARTY, P1
REPPART2= FEELING THERM FOR REP PARTY, P2
REPPART3= FEELING THERM FOR REP PARTY, P3
PARTIES1= FEELING THERM FOR BOTH PARTIES,P1
PARTIES2= FEELING THERM FOR BOTH PARTIES,P2
PARTIES3= FEELING THERM FOR BOTH PARTIES,P3
INDFEEL1= FEELING THERM FOR INDEPENDENTS,P1
INDFEEL2= FEELING THERM FOR INDEPENDENTS,P2
INDFEEL3= FEELING THERM FOR INDEPENDENTS,P3
CARFEEL1= FEELING THERM FOR CARTER, P1
CARFEEL2= FEELING THERM FOR CARTER, P2
CARFEEL3= FEELING THERM FOR CARTER, P3
REAFEEL1= FEELING THERM FOR REAGAN, P1
REAFEEL2= FEELING THERM FOR REAGAN, P2
REAFEEL3= FEELING THERM FOR REAGAN, P3
KENFEEL1= FEELING THERM FOR KENNEDY, P1
KENFEEL2= FEELING THERM FOR KENNEDY, P2
KENFEEL3= FEELING THERM FOR KENNEDY, P3
NEWV121= LIB/CON SCALE FOR R, P1
NEWV2125= LIB/CON SCALE FOR R, P2
NEWV3213= LIB/CON SCALE FOR R, P3
NEWV122= LIB/CON SCALE FOR CARTER, P1
NEWV2126= LIB/CON SCALE FOR CARTER, P2
NEWV3214= LIB/CON SCALE FOR CARTER, P3
NEWV123= LIB/CON SCALE FOR REAGAN, P1
NEWV2127= LIB/CON SCALE FOR REAGAN, P2
NEWV3215= LIB/CON SCALE FOR REAGAN, P3
NEWV130= LIB/CON SCALE FOR REPS, P1
NEWV2134= LIB/CON SCALE FOR REPS, P2
NEWV3224= LIB/CON SCALE FOR REPS, P3
NEWV131= LIB/CON SCALE FOR DEMS, P1
NEWV2135= LIB/CON SCALE FOR DEMS, P2
NEWV3225= LIB/CON SCALE FOR DEMS, P3
PARTYID1= PARTY ID, P1
PARTYID2= PARTY ID, P2
PARTYID3= PARTY ID, P3
PARTYID4= PARTY ID, P4
PARTYID= 1 STRONG DEMOCRAT, 7 STRONG REPUBLICAN
NEWV251= EXPECTATION TO VOTE FOR R,P1
NEWV2272= EXPECTATION TO VOTE FOR R,P2
NEWV3081= EXPECTATION TO VOTE FOR R,P3
COMMUN1= R COMMUNICATED ABOUT CAMPAIGN,P1
COMMUN2= R COMMUNICATED ABOUT CAMPAIGN,P2
COMMUN4= R COMMUNICATED ABOUT CAMPAIGN,P4
COMMALL= R COMMUNICATED ABOUT CAMPAIGN,ALL
EDUC= EDUCATION OF R
INC= INCOME OF R
REL= RELIGION OF R
RELFREQ= FREQ OF CHURCH ATTENDANCE FOR R
STATUS= STATUS (INC+ED) OF R
STATE= STATE OF RESIDENCE FOR R
PRESTOCO= TOTAL PRESIDENTIAL VOTE, COUNTY
PRESTOST= TOTAL PRES. VOTE, STATE 9
CONGTOCO= TOTAL CONGRESSIONAL VOTE,COUNTY
PDEMCONT= PROP. PRES DEM VOTE, COUNTY
PREPCONT= PROP. PRES REP VOTE, COUNTY
PDEMSTAT= PROP. PRES DEM VOTE, STATE
PREPSTAT= PROP. PRES REP VOTE, STATE
CDEMCONT= PROP. CONG. DEM VOTE, COUNTY
CREPCONT= PROP. CONG. REP VOTE, COUNTY
RFRIENDS= ALL 3 NEIGHBORS INTEND VOTE REAGAN
DFRIENDS= ALL 3 NEIGHBORS INTEND VOTE CARTER
MFRIENDS= 3 NEIGHBORS SPLIT IN VOTE INTENTION
AGE
SEX= 1 IS MALE AND 2 IS FEMALE
RACE= 1 IS WHITE, 2 IS BLACK, 3 IS OTHER
REGION= THE SOLID SOUTH IS 4
DIDVOTE= R VOTED 1 IS YES AND 2 IS NO
VOTE= 1 REAGAN 2 CARTER 3 CLARK 4 ANDERSON
PARTREG= PARTY REGISTRATION 1 DEM 2 IND 3 REP
VOTEVALI= VOTER VALIDATION 1 VALIDATED 2 NO
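Several of the variables above (for example SEX, RACE, REGION, VOTE, and PARTREG) are category codes rather than quantities, so entering them in a regression as plain numbers would treat the codes as amounts. One common way to handle this, sketched below under stated assumptions, is to convert such a variable to a factor so that `lm()` builds indicator variables for it. The data frame `toy` is made up, the REGION codes 1 to 3 are assumed only to fill the toy data, and the names `RACE.f` and `SOUTH` are hypothetical.

```r
# Made-up stand-in for the panel data; replace 'toy' with 'mydata' in practice.
set.seed(1)
toy <- data.frame(
  REAFEEL3 = round(runif(60, 0, 100)),
  RACE     = sample(1:3, 60, replace = TRUE),   # 1 white, 2 black, 3 other
  REGION   = sample(1:4, 60, replace = TRUE)    # 4 is the solid South
)

# Treat RACE as categories; "white" becomes the reference group.
toy$RACE.f <- factor(toy$RACE, levels = 1:3,
                     labels = c("white", "black", "other"))

# A 0/1 indicator for the solid South (all other REGION codes are pooled).
toy$SOUTH <- as.numeric(toy$REGION == 4)

toy.model <- lm(REAFEEL3 ~ RACE.f + SOUTH, data = toy)
coef(toy.model)   # one coefficient per non-reference race category, plus SOUTH
```

Each race coefficient is then read as the average difference in the thermometer score relative to the reference group, holding the other variables constant.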
Here is some R code to get you started. The rest is up to you. Further below is a SAS program that may give you some additional ideas. But use R for this assignment.
# First we get our data.
mydata <- read.table("panel80.df")
# attach(mydata) # In case you want to work with the variables directly
names(mydata) # This shows us all the variable names.
# options(scipen=20) # suppress "scientific" notation
options(scipen=0) # Brings things back to normal (0 is the default)
reagan.model <- lm(REAFEEL3 ~ INC + AGE + PARTYID, data=mydata)
summary(reagan.model)
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(reagan.model) # These are diagnostic plots.
dev.new() # Opens a new graphics window (windows() does the same, but only on Windows)
carter.model <- lm(CARFEEL3 ~ INC + AGE + PARTYID, data=mydata)
summary(carter.model)
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(carter.model) # These are diagnostic plots.
mysubsetdata<-subset(mydata, select=c(REAFEEL3, CARFEEL3, INC, AGE, PARTYID)) #This keeps only the variables that we are using.
cor(mysubsetdata, use = "pairwise.complete.obs") # A correlation matrix for the variables in the regression
cov(mysubsetdata, use = "pairwise.complete.obs") # A covariance matrix for the variables in the regression
# Now let's get the standardized regression coefficients
sdvariables <- sapply(mysubsetdata, sd, na.rm = TRUE) # This gets the standard deviations of all the variables. (sd() no longer accepts a whole data frame.)
sdvariables # This prints out the standard deviations, which is not very useful but nice to see.
mystandardizeddata <- as.data.frame(scale(mysubsetdata, center=FALSE, scale=sdvariables) ) # standardize variables
var(mystandardizeddata, use = "pairwise.complete.obs") # Note that the variance-covariance matrix = correlation matrix
carter.model2 <- lm(CARFEEL3 ~ INC + AGE + PARTYID, data=mystandardizeddata)
summary(carter.model2)
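As a cross-check on the rescaling approach in the script above, a standardized slope can also be computed directly as the unstandardized slope times sd(x)/sd(y), using only the cases that `lm()` actually kept. The sketch below is one possible way to assemble the unstandardized coefficients, standardized coefficients, p-values, and fit statistics in one place. It runs on a made-up data frame (`toy`) with invented values, and the object names `fit`, `used`, and `results` are hypothetical. With the real data you would substitute `mydata`.

```r
# Made-up stand-in for the panel data; replace 'toy' with 'mydata' in practice.
set.seed(42)
n <- 200
toy <- data.frame(INC     = sample(1:20, n, replace = TRUE),
                  AGE     = sample(18:90, n, replace = TRUE),
                  PARTYID = sample(1:7, n, replace = TRUE))
toy$CARFEEL3 <- 70 - 5 * toy$PARTYID + 0.1 * toy$AGE + rnorm(n, sd = 15)
toy$AGE[c(3, 17)] <- NA                      # a few missing values, as in survey data

fit  <- lm(CARFEEL3 ~ INC + AGE + PARTYID, data = toy)
used <- model.frame(fit)                     # only the complete cases lm() kept

b    <- coef(fit)[-1]                        # unstandardized slopes (drop intercept)
sds  <- sapply(used, sd)                     # SDs on the estimation sample
beta <- b * sds[names(b)] / sds["CARFEEL3"]  # standardized slopes

results <- data.frame(b = b, beta = beta,
                      p.value = summary(fit)$coefficients[-1, "Pr(>|t|)"])
round(results, 3)
summary(fit)$r.squared                       # share of variance in CARFEEL3 explained
summary(fit)$adj.r.squared                   # the same, penalized for extra predictors
```

Because the standard deviations here come from the estimation sample rather than from all available cases on each variable, the results may differ slightly from the rescaling approach when some variables have more missing values than others.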
* Below is the SAS code that also works for this assignment;
libname windata 'e:\';
GOPTIONS lfactor=10 hsize=6 in vsize=6 in horigin=1 in vorigin=1 in;
options nocenter ls=120;
**********************************************************;
* CLASS, NOTE THAT IF YOU BEGIN A LINE WITH AN ASTERISK *
* THEN YOU CAN PUT NOTES IN YOUR PROGRAM FILES. THIS IS
* LIKE A COMMENT CARD IN SPSS. HOWEVER, REMEMBER
* TO EVENTUALLY PUT A FINAL SEMICOLON AT THE END OF YOUR COMMENTS.;
***********************************************************;
* NOTE THAT I INDENT SOME STATEMENTS. THIS
* IS JUST FOR NEATNESS.;
***********************************************************;
* COPYRIGHT (c) Courtney Brown 2005, All Rights Reserved;
* Permission granted to use this file and computer code for any nonprofit and
* educational purposes, including classroom instruction.
* No further permission required.
* Please cite source as "From www.courtneybrown.com";
***********************************************************;
DATA panel80;SET windata.panel80;
proc reg;
model carfeel3 = inc age partyid / stb;
title 'Carter Feelings';
proc reg;
model reafeel3 = inc age partyid / stb;
title 'Reagan Feelings';
run;
quit;