Statistical Modeling

Dr. Courtney Brown

Assignment #3

You will be working with cross-tabulation tables and a new data set in this assignment. Remember that this is an assignment in scientific writing, so explain your results clearly enough that anyone can understand your findings.

First, download the survey data set for the 1980 Reagan vs. Carter election. The data set is called "panel80.df," it is an R data frame, and it is distributed as a zipped file. It is called a "panel study" because four waves of interviews took place during 1980 to capture the course of the campaign: (1) the entry into the primary season, (2) the post-primary season, (3) the main campaign in September, and (4) after the vote in November. After you download the data set, you will need to extract it from the zipped file. In Windows Explorer, right-click the zipped file and choose "Extract." Put the extracted data set in your R working directory (R_Working_Directory). Now you can run the R script below: copy and paste it into the R console to get started.
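If you prefer to stay inside R, the extraction step can also be done with base R's unzip() function. The sketch below is a hypothetical helper, ensure_panel80(), not part of the course script. It assumes you pass in the name of the archive you downloaded; "panel80.zip" in the commented usage line is a placeholder, not a confirmed file name.

```r
# Hypothetical helper (not part of the course script): make sure panel80.df
# is present in the current R working directory before running read.table().
ensure_panel80 <- function(zipfile = NULL, target = "panel80.df") {
  # Already extracted? Nothing to do.
  if (file.exists(target)) return(normalizePath(target))
  if (is.null(zipfile) || !file.exists(zipfile)) {
    stop("Cannot find ", target, " in ", getwd(),
         "; extract the downloaded archive into this directory first.")
  }
  # Extract into the working directory (same effect as "Extract" in Explorer).
  utils::unzip(zipfile, exdir = getwd())
  if (!file.exists(target)) {
    stop("The archive did not place ", target, " directly in ", getwd())
  }
  normalizePath(target)
}

getwd()   # confirm which directory R is using
# ensure_panel80("panel80.zip")   # placeholder archive name
```

If the archive unpacks into a subfolder, the helper stops with an error rather than guessing, so you can move the file by hand.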

Here is some help in interpreting the variable values.
Feeling thermometers: 0 to 100, with 50 being neutral.
Liberal/conservative scales: 1=extreme liberal, 7=extreme conservative.
INTER1-INTER4: respondent's interest in the campaign, low to high.
P1 through P4: the panel wave (January, July, September, and November).
Expectation to vote: 5 = will vote, 1 = will not vote.
Education: years of education
Income: not in thousands of dollars, but a scale from low to high.
Frequency of church attendance: low to high
R: the respondent.
Generally, all of the variables go from low to high. Thus, if you see a variable and do not know its coding scheme, assume that a smaller number means less and a larger number means more. The other codes are given in the variable labels.
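As a small illustration of these codings, a feeling thermometer can be read relative to its neutral point of 50. The helper below, thermometer_side(), is hypothetical; it simply labels scores as cool, neutral, or warm and assumes missing answers are stored as NA.

```r
# Hypothetical helper: classify feeling-thermometer scores (0 to 100)
# relative to the neutral point of 50. NA stays NA.
thermometer_side <- function(x) {
  side <- ifelse(x < 50, "cool", ifelse(x > 50, "warm", "neutral"))
  factor(side, levels = c("cool", "neutral", "warm"))
}

thermometer_side(c(15, 50, 85, NA))
# With the survey data, for example:
# table(thermometer_side(mydata$CARFEEL1), useNA = "ifany")
```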

Most of the variables in this data set come from a panel study distributed by the Inter-university Consortium for Political and Social Research (ICPSR), of which Emory University is a member. I have added some contextual variables to the survey data set by extracting them from a separate ICPSR data set.

Here are the variables in the data set:

V3543= NEIGHBOR #1-VOTE F PRES REF=3543 ID=763
V3547= NEIGHBOR #2-VOTE F PRES REF=3547 ID=763
V3551= NEIGHBOR #3-VOTE F PRES REF=3551 ID=763
INTER1= INTEREST IN POLITICS FOR R,P1
INTER2= INTEREST IN POLITICS FOR R,P2
INTER3= INTEREST IN POLITICS FOR R,P3
INTER4= INTEREST IN POLITICS FOR R,P4
INFO1= INFORMATION LEVEL FROM NEWS FOR R,P1
INFO2= INFORMATION LEVEL FROM NEWS FOR R,P2
DEMCAND1= FEELING THERM. FOR ALL DEM. CANDS,P1
DEMCAND2= FEELING THERM. FOR ALL DEM CANDS, P2
DEMCAND3= FEELING THERM. FOR ALL DEM CANDS, P3
REPCAND1= FEELING THERM FOR ALL REP CANDS, P1
REPCAND2= FEELING THERM FOR ALL REP CANDS, P2
REPCAND3= FEELING THERM FOR ALL REP CANDS, P3
DEMPART1= FEELING THERM FOR DEM PARTY, P1
DEMPART2= FEELING THERM FOR DEM PARTY, P2
DEMPART3= FEELING THERM FOR DEM PARTY, P3
REPPART1= FEELING THERM FOR REP PARTY, P1
REPPART2= FEELING THERM FOR REP PARTY, P2
REPPART3= FEELING THERM FOR REP PARTY, P3
PARTIES1= FEELING THERM FOR BOTH PARTIES,P1
PARTIES2= FEELING THERM FOR BOTH PARTIES,P2
PARTIES3= FEELING THERM FOR BOTH PARTIES,P3
INDFEEL1= FEELING THERM FOR INDEPENDENTS,P1
INDFEEL2= FEELING THERM FOR INDEPENDENTS,P2
INDFEEL3= FEELING THERM FOR INDEPENDENTS,P3
CARFEEL1= FEELING THERM FOR CARTER, P1
CARFEEL2= FEELING THERM FOR CARTER, P2
CARFEEL3= FEELING THERM FOR CARTER, P3
REAFEEL1= FEELING THERM FOR REAGAN, P1
REAFEEL2= FEELING THERM FOR REAGAN, P2
REAFEEL3= FEELING THERM FOR REAGAN, P3
KENFEEL1= FEELING THERM FOR KENNEDY, P1
KENFEEL2= FEELING THERM FOR KENNEDY, P2
KENFEEL3= FEELING THERM FOR KENNEDY, P3
NEWV121= LIB/CON SCALE FOR R, P1
NEWV2125= LIB/CON SCALE FOR R, P2
NEWV3213= LIB/CON SCALE FOR R, P3
NEWV122= LIB/CON SCALE FOR CARTER, P1
NEWV2126= LIB/CON SCALE FOR CARTER, P2
NEWV3214= LIB/CON SCALE FOR CARTER, P3
NEWV123= LIB/CON SCALE FOR REAGAN, P1
NEWV2127= LIB/CON SCALE FOR REAGAN, P2
NEWV3215= LIB/CON SCALE FOR REAGAN, P3
NEWV130= LIB/CON SCALE FOR REPS, P1
NEWV2134= LIB/CON SCALE FOR REPS, P2
NEWV3224= LIB/CON SCALE FOR REPS, P3
NEWV131= LIB/CON SCALE FOR DEMS, P1
NEWV2135= LIB/CON SCALE FOR DEMS, P2
NEWV3225= LIB/CON SCALE FOR DEMS, P3
PARTYID1= PARTY ID, P1
PARTYID2= PARTY ID, P2
PARTYID3= PARTY ID, P3
PARTYID4= PARTY ID, P4
PARTYID= 1 STRONG DEMOCRAT, 7 STRONG REPUBLICAN
NEWV251= EXPECTATION TO VOTE FOR R,P1
NEWV2272= EXPECTATION TO VOTE FOR R,P2
NEWV3081= EXPECTATION TO VOTE FOR R,P3
COMMUN1= R COMMUNICATED ABOUT CAMPAIGN,P1
COMMUN2= R COMMUNICATED ABOUT CAMPAIGN,P2
COMMUN4= R COMMUNICATED ABOUT CAMPAIGN,P4
COMMALL= R COMMUNICATED ABOUT CAMPAIGN,ALL
EDUC= EDUCATION OF R
INC= INCOME OF R
REL= RELIGION OF R
RELFREQ= FREQ OF CHURCH ATTENDANCE FOR R
STATUS= STATUS (INC+ED) OF R
STATE= STATE OF RESIDENCE FOR R
PRESTOCO= TOTAL PRESIDENTIAL VOTE, COUNTY
PRESTOST= TOTAL PRES. VOTE, STATE
CONGTOCO= TOTAL CONGRESSIONAL VOTE,COUNTY
PDEMCONT= PROP. PRES DEM VOTE, COUNTY
PREPCONT= PROP. PRES REP VOTE, COUNTY
PDEMSTAT= PROP. PRES DEM VOTE, STATE
PREPSTAT= PROP. PRES REP VOTE, STATE
CDEMCONT= PROP. CONG. DEM VOTE, COUNTY
CREPCONT= PROP. CONG. REP VOTE, COUNTY
RFRIENDS= ALL 3 NEIGHBORS INTEND VOTE REAGAN
DFRIENDS= ALL 3 NEIGHBORS INTEND VOTE CARTER
MFRIENDS= 3 NEIGHBORS SPLIT IN VOTE INTENTION
AGE= AGE OF R
SEX= 1 IS MALE AND 2 IS FEMALE
RACE= 1 IS WHITE, 2 IS BLACK, 3 IS OTHER
REGION= THE SOLID SOUTH IS 4
DIDVOTE= R VOTED 1 IS YES AND 2 IS NO
VOTE= 1 REAGAN 2 CARTER 3 CLARK 4 ANDERSON
PARTREG= PARTY REGISTRATION 1 DEM 2 IND 3 REP
VOTEVALI= VOTER VALIDATION 1 VALIDATED 2 NO
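The numeric codes listed above for SEX, RACE, VOTE, and PARTREG can be turned into labeled factors so that tables print readable category names. This is a minimal sketch, assuming the codes are exactly as listed and that any other value (for example, a missing-data code) should become NA. The helper label_codes() and the new column names ending in _f are hypothetical.

```r
# Hypothetical helper: attach the codebook labels to the numeric codes.
# factor() turns any code not listed in 'levels' into NA.
label_codes <- function(df) {
  df$SEX_f     <- factor(df$SEX, levels = 1:2, labels = c("Male", "Female"))
  df$RACE_f    <- factor(df$RACE, levels = 1:3,
                         labels = c("White", "Black", "Other"))
  df$VOTE_f    <- factor(df$VOTE, levels = 1:4,
                         labels = c("Reagan", "Carter", "Clark", "Anderson"))
  df$PARTREG_f <- factor(df$PARTREG, levels = 1:3,
                         labels = c("Dem", "Ind", "Rep"))
  df
}

# Small made-up example (not real survey data); RACE = 9 is not a listed code.
toy <- data.frame(SEX = c(1, 2, 2), RACE = c(1, 2, 9),
                  VOTE = c(1, 2, 4), PARTREG = c(3, 1, 2))
label_codes(toy)
```

With the real data, something like mynewdata <- label_codes(mynewdata) before calling CrossTable() would put the category names into the table headings.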

For this assignment, you will search for an interesting relationship between two or more variables by constructing cross-tabulation tables with variations of the program below. Try many different variables and variable combinations, and note which relationships have significant chi-square tests. Find a story to tell from your exploratory analysis of these data, and then write up what you find in a three- to five-page analysis. Include at least one revealing cross-tabulation table in your analysis (more than one is fine), and discuss the relevant percentages. When you describe a relationship between two variables, compare the percentages down a column when you are using row percentages (each row sums to 100 percent), and compare them across a row when you are using column percentages.

You may also want to convert some variables into new variables that have only a few categories. To get you started, the code below splits income at its mean; you might instead try splitting it at the median. Remember to remove the missing values when you do this. Understand that it is normally not good practice to reduce the variation in a variable by creating categories out of continuous data. Sometimes, however, it is heuristically helpful, and we do it here to practice using tables and to begin recoding variables in R. If you use only variables that are categorical to begin with, this is not an issue, of course.
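The median split and the row-versus-column percentage comparison described above can be sketched in base R. The helper median_split() is hypothetical, and the income and vote vectors are invented numbers used only to show the mechanics; with the real data you would apply it to mynewdata$INC and mynewdata$VOTE. One design choice to note: values exactly at the median are put in the lower group here.

```r
# Hypothetical helper: split a numeric variable at its median.
# Missing values are ignored when the median is computed and stay NA
# in the result, so table() leaves them out.
median_split <- function(x) {
  m <- median(x, na.rm = TRUE)
  factor(ifelse(x <= m, "At or below median", "Above median"),
         levels = c("At or below median", "Above median"))
}

# Toy data (invented, for illustration only)
income <- c(3, 7, 5, NA, 9, 2, 6, 8)
vote   <- factor(c(2, 1, 2, 1, 1, 2, 1, 1), levels = 1:2,
                 labels = c("Reagan", "Carter"))

inc_cat <- median_split(income)
tab <- table(inc_cat, vote)   # the NA income row is excluded by default
tab
# Row percentages (each row sums to 100): compare down a column,
# e.g. the Reagan percentage in the two income groups.
round(100 * prop.table(tab, margin = 1), 1)
# Column percentages (each column sums to 100): compare across a row.
round(100 * prop.table(tab, margin = 2), 1)
```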

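# First, load a library that builds nice cross-tabulation tables.
# (Run install.packages("gmodels") once if it is not yet installed.)
library(gmodels)
# Now we get our data. panel80.df must be in your R working directory (see getwd()).
mydata <- read.table("panel80.df")
names(mydata) # This shows us all the variable names.
# Keep only Reagan (1) and Carter (2) voters. This drops votes for minor candidates
# and respondents with a missing VOTE value.
mynewdata <- mydata[ which(mydata$VOTE <= 2), ]
# Now we create an income category variable based on the mean for use in a table:
# 1 = below the mean, 2 = at or above the mean; missing incomes stay NA.
mynewdata$incomecategories <- ifelse(mynewdata$INC < mean(mynewdata$INC, na.rm = TRUE), 1, 2) # One way to recode variables.
# Sex by vote, with the chi-square test and the expected cell counts.
CrossTable(mynewdata$SEX, mynewdata$VOTE, chisq=TRUE, expected = TRUE, format="SAS")
# Income category by vote.
CrossTable(mynewdata$incomecategories, mynewdata$VOTE, chisq=TRUE, expected = TRUE, format="SAS")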
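CrossTable() prints observed and expected counts along with the chi-square statistic. To see where those numbers come from, the sketch below recomputes them in base R: each expected count is (row total * column total) / N, and Pearson's statistic sums (observed - expected)^2 / expected over all cells, with (rows - 1) * (columns - 1) degrees of freedom. The 2 x 2 counts are invented for illustration. Note that chisq.test() applies Yates' continuity correction to 2 x 2 tables by default, so correct = FALSE is used here to get the plain Pearson statistic; compare it with the uncorrected value in the CrossTable() output.

```r
# Invented 2x2 table of counts (rows: sex, columns: vote), for illustration only
obs <- matrix(c(120,  80,
                 95, 105),
              nrow = 2, byrow = TRUE,
              dimnames = list(SEX = c("Male", "Female"),
                              VOTE = c("Reagan", "Carter")))

n        <- sum(obs)
expected <- outer(rowSums(obs), colSums(obs)) / n   # row total * column total / N
chi_sq   <- sum((obs - expected)^2 / expected)      # Pearson chi-square statistic
dof      <- (nrow(obs) - 1) * (ncol(obs) - 1)       # degrees of freedom
p_value  <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

# Cross-check against base R (no continuity correction, to match chi_sq)
check <- chisq.test(obs, correct = FALSE)
c(by_hand = chi_sq, chisq.test = unname(check$statistic))
```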

 

* Below is the SAS code that performs the analysis for this assignment;

libname windata 'e:\';
GOPTIONS lfactor=10 hsize=6 in vsize=6 in horigin=1 in vorigin=1 in;
options nocenter ls=120;
**********************************************************;
* CLASS, NOTE THAT IF YOU BEGIN A LINE WITH AN ASTERISK *
* THEN YOU CAN PUT NOTES IN YOUR PROGRAM FILES. THIS IS
* LIKE A COMMENT CARD IN SPSS. HOWEVER, REMEMBER
* TO EVENTUALLY PUT A FINAL SEMICOLON AT THE END OF YOUR COMMENTS.;
***********************************************************;
* NOTE THAT I INDENT SOME STATEMENTS. THIS
* IS JUST FOR NEATNESS.;
***********************************************************;
* COPYRIGHT (c) Courtney Brown 2005, All Rights Reserved;
* Permission granted to use this file and computer code for any nonprofit and
* educational purposes, including classroom instruction.
* No further permission required.
* Please cite source as "From www.courtneybrown.com";
***********************************************************;
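DATA panel80;SET windata.panel80;
   * Keep only respondents who voted for Reagan (1) or Carter (2);
   if ((vote eq 1) or (vote eq 2));
   * SAS treats a missing numeric value as smaller than any number,
     so skip missing values before comparing. The LENGTH statement
     fixes the width of the new character variables;
   length generation $ 5 party $ 4;
   if not missing(age) then do;
      if (age le 43) then generation = 'youth';
      if (age gt 43) then generation = 'older';
   end;
   if not missing(partyid) then do;
      if (partyid le 2) then party = 'Dem.';
      if ((partyid ge 3) and (partyid le 5)) then party = 'Ind.';
      if (partyid ge 6) then party = 'Rep.';
   end;
   gender=sex;
run;
proc format;
   value VoteFmt 1 ='Reagan'
                 2 ='Carter'
                 3 ='Clark'
                 4 ='Anderson';
   value GenderFmt 1 ='Male'
                   2 ='Female';
run;
proc means data=panel80;
run;
proc contents data=panel80;
run;
proc freq data=panel80;
   tables gender*vote generation*vote gender*party generation*party / chisq;
   format gender GenderFmt. vote VoteFmt.;
   title 'Gender Gap';
run;
quit;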
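For those working in R rather than SAS, the DATA step recodes (age split at 43, party identification collapsed into three groups, labels for gender and vote) can be approximated as follows. This is a sketch, assuming the R data frame mynewdata from the R script with the upper-case column names listed in the codebook; recode_like_sas() is a hypothetical helper and the toy data frame is invented. In SAS a missing numeric value compares as smaller than any number, whereas in R any comparison with NA returns NA, so missing ages and party IDs stay missing here.

```r
# Toy stand-in for mynewdata (invented values, for illustration only)
toy <- data.frame(AGE     = c(25, 43, 44, 70, NA),
                  PARTYID = c(1, 4, 7, NA, 2),
                  SEX     = c(1, 2, 1, 2, 2),
                  VOTE    = c(2, 1, 1, 1, 2))

# Hypothetical helper mirroring the SAS DATA step recodes
recode_like_sas <- function(d) {
  # Age 43 or younger = "youth", older than 43 = "older"; NA stays NA
  d$generation <- factor(ifelse(d$AGE <= 43, "youth", "older"),
                         levels = c("youth", "older"))
  # PARTYID 1-2 = Dem., 3-5 = Ind., 6-7 = Rep.: intervals (0,2], (2,5], (5,7]
  d$party <- cut(d$PARTYID, breaks = c(0, 2, 5, 7),
                 labels = c("Dem.", "Ind.", "Rep."))
  d$gender <- factor(d$SEX, levels = 1:2, labels = c("Male", "Female"))
  d$vote   <- factor(d$VOTE, levels = 1:2, labels = c("Reagan", "Carter"))
  d
}

out <- recode_like_sas(toy)
table(out$generation, out$vote)   # rows with NA generation are left out
# With the real data, for example:
# d <- recode_like_sas(mynewdata)
# CrossTable(d$generation, d$vote, chisq = TRUE, expected = TRUE, format = "SAS")
```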