Statistical Modeling

Dr. Courtney Brown

Assignment #1

In this assignment, you will analyze the on-year and off-year pattern of political participation in voting during U.S. congressional elections. In on-year elections (every four years), the congressional elections coincide with the presidential elections. More people go out to vote in those elections. In the off-year elections (the elections between the on-year elections), fewer people vote. Thus, congressional electoral mobilization as a proportion of the total eligible voters is greater in on-year elections than in off-year elections due to the structure of the electoral calendar.

You need to do this assignment in two parts. First, you need to obtain the data that are used for this assignment. The data set is named "usparty.txt," and you can get it HERE. The SAS code that was used to create this data set can be found HERE, and you will find it useful as a reference. After unzipping the data set, place it in your R_Working_Directory. You will then conduct your analysis on the data for the years from 1950 to 1988. Here is a partial codebook for the variables in that data set.

YEAR
RCONG= REPUBLICAN CONGRESSIONAL VOTE
DCONG= DEMOCRATIC CONGRESSIONAL VOTE
RPRES= REPUBLICAN PRESIDENTIAL VOTE
DPRES= DEMOCRATIC PRESIDENTIAL VOTE
G= 1 ON YEAR ELECTION, 0 OFF YEAR
PRCONT= 1 REP CONTROLS PRES., 0 DEM CONTROLS
PDCONT= 1 DEM CONTROLS PRES., 0 REP CONTROLS
TOTPRES= TOTAL PRESIDENTIAL VOTE
TOTCONG= TOTAL CONGRESSIONAL VOTE
RREP= NUMBER OF REPUBLICAN REPRESENTATIVES
DREP= NUMBER OF DEMOCRATIC REPRESENTATIVES
TR= VOTE FOR T. ROOSEVELT, BULL MOOSE, 1912
INFLAT= PRICE INDEX, 1967 IS 100
PREVINFL= PRICE INDEX FOR THE PREVIOUS YEAR
PREVGNP= GNP FOR THE PREVIOUS YEAR
GNP= GROSS NATIONAL PRODUCT IN 1958 DOLLARS
TOTPOP18= TOTAL POPULATION 18 YEARS OR OLDER
ELIGIBLE
MRCONG= MOBILIZED REP CONG VOTE PROP
MDCONG= MOBILIZED DEM CONG VOTE PROP
MRPRES= MOBILIZED REP PRES VOTE PROP
MDPRES= MOBILIZED DEM PRES VOTE PROP
MTOTCONG= MOBILIZED TOTAL CONG VOTE PROP
MTOTPRES= MOBILIZED TOTAL PRES VOTE PROP
MTR=
VRCONG=
VDCONG=
VRPRES=
VDPRES=
VTOTCONG=
VTOTPRES=
VTR=
LMRCONG= LAG OF MOBIL. REP CONG VOTE PROP
LMDCONG= LAG OF MOBIL. DEM CONG VOTE PROP
LRREP=
LDREP=
REPRATIO=
DRATIO=
RRATIO=
REPSHIFT=
MRATIO1=
MRATIO2=
ON= 1 FOR AN ON-YEAR ELECTION, 0 FOR AN OFF-YEAR ELECTION
ONR1=
ONR2=
OND1=
OND2=
ON1=
ON2=
OFF=
OFF1=
OFF2=
PRESDIF=
RPRESDIF=
LRPDIF=
LPRESDIF=
LMTCONG=
DIFOYTC=
SHIFT=
FDR=
CPOINT=
CPCGNP= CHANGE IN YEARLY PER CAPITA GNP
CGNP= CHANGE IN YEARLY GNP
LGNP=
LINFLAT=
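
The codebook lists G and ON as indicators of on-year elections. As a quick cross-check, the same kind of indicator can be derived from YEAR alone, since presidential elections fall in years divisible by four. The short sketch below is only an illustration: the vector `years` and the name `on_flag` are made up here and are not read from usparty.txt.

```r
# Hypothetical vector of election years (not read from usparty.txt).
years <- seq(1950, 1988, by = 2)

# Presidential (on-year) elections fall in years divisible by 4.
on_flag <- as.integer(years %% 4 == 0)

# Show each year next to its derived indicator.
data.frame(YEAR = years, ON = on_flag)
```

If the real data set is loaded, a derived flag of this kind could be compared against the ON column to confirm that the two agree.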

You are to look at measures of central tendency and variation for on-year and off-year congressional mobilization between the years 1950 and 1970, and compare these numbers for the same variables between 1972 and 1988. You will be using the R programming language to do this assignment. In past years I had students use SAS, and I include the SAS code further below for reference should it be of interest to you. Immediately below is an R script that will get you started, and you should closely study this script as you begin to learn R. You will need to finish the program to get the on-year and off-year means for the years 1972-1988.

# First we get our data.
mydata <- read.table("usparty.txt")
names(mydata) # Lets us see all the variable names.
#attach(mydata) # This puts the variable names in memory. We will not be using this.
mysubsetdata<-subset(mydata, select=c(YEAR, MTOTCONG)) #This keeps only the two variables that we need.
summary(mysubsetdata) # Since no variables are listed, a summary for all variables in the data frame is printed.
mysubsetdata #This prints out all the variable values.

# Now let us look at what the on-year/off-year pattern looks like.
# Note that to reference a variable, you need to put the data set name followed by a $ before the variable name.
# We can reference the variables directly without using the data set name and the $ if we used the attach command above.
# But we do not want to use the attach command now because we will be creating other data sets that contain the same variable names.
plot(mysubsetdata$YEAR, mysubsetdata$MTOTCONG, xlab="", ylab="", ylim=c(0.2,0.8), pch=19, type="o")
title(xlab="Year", ylab="Congressional Mobilization", main="Figure 1: Plot of U.S. Congressional Mobilization", cex=1.5, col="black", font=2)
# Now let us get the overall mean for congressional mobilization
mean(mysubsetdata$MTOTCONG) # Here we are getting the mean only of the MTOTCONG variable.

dev.new() # This opens a new graphics window so the next plot does not erase the previous plot. (windows() also works, but only on Windows.)

# Now let us work with just the years 1950 through 1970.
my5070data <- subset(mysubsetdata, YEAR >= 1950 & YEAR <= 1970)
my5070data
# attach(my5070data) # Let us avoid variable confusion with these data sets by not doing this.
plot(my5070data$YEAR, my5070data$MTOTCONG, xlab="", ylab="", ylim=c(0.2,0.8), pch=19, type="o", axes=FALSE)
axis(1, at=c(1952, 1956, 1960, 1964, 1968)) # This defines the X axis tick marks.
axis(2, yaxs="r") # This defines the Y axis.
box()
title(xlab="Year", ylab="Congressional Mobilization", main="Figure 2: Plot of U.S. Congressional Mobilization, 1950-70", cex=1.5, col="black", font=2)
mean(my5070data$MTOTCONG)

dev.new() # Opens a new graphics window on any platform; windows() works only on Windows.

# Now let us work with just the years 1972 through 1988.
my7288data <- subset(mysubsetdata, YEAR >= 1972 & YEAR <= 1988)
my7288data
# attach(my7288data) # Again, let us avoid variable confusion with these data sets by not doing this.
plot(my7288data$YEAR, my7288data$MTOTCONG, xlab="", ylab="", ylim=c(0.2,0.8), pch=19, type="o", axes=FALSE)
axis(1, at=c(1972, 1976, 1980, 1984, 1988)) # This defines the X axis tick marks.
axis(2, yaxs="r") # This defines the Y axis.
box()
title(xlab="Year", ylab="Congressional Mobilization", main="Figure 3: Plot of U.S. Congressional Mobilization, 1972-88", cex=1.5, col="black", font=2)
mean(my7288data$MTOTCONG)

# Now let us work with on-year and off-year separately to get the means
myON5070data <- subset(mysubsetdata, YEAR == 1952 | YEAR == 1956 | YEAR == 1960 | YEAR == 1964 | YEAR == 1968)
mean(myON5070data$MTOTCONG)
myOFF5070data <- subset(mysubsetdata, YEAR == 1950 | YEAR == 1954 | YEAR == 1958 | YEAR == 1962 | YEAR == 1966 | YEAR == 1970)
mean(myOFF5070data$MTOTCONG)
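
The assignment asks for measures of variation as well as central tendency, and the script above computes only means. Here is one possible way to get means and standard deviations for every period and election type in a single pass with tapply. This is a sketch under stated assumptions: the data frame `toy`, its added columns TYPE and PERIOD, and all of the MTOTCONG values are invented for illustration and are not the real data.

```r
# Toy data: the MTOTCONG values below are invented for illustration only.
toy <- data.frame(
  YEAR     = seq(1950, 1988, by = 2),
  MTOTCONG = c(0.41, 0.58, 0.42, 0.56, 0.43, 0.59, 0.45, 0.58, 0.44, 0.56,
               0.44, 0.51, 0.36, 0.50, 0.35, 0.48, 0.38, 0.48, 0.34, 0.45)
)

# Label each election as on-year or off-year, and by period.
toy$TYPE   <- ifelse(toy$YEAR %% 4 == 0, "on", "off")
toy$PERIOD <- ifelse(toy$YEAR <= 1970, "1950-70", "1972-88")

# Mean and standard deviation of mobilization for each period/type cell.
cell_means <- tapply(toy$MTOTCONG, list(toy$PERIOD, toy$TYPE), mean)
cell_sds   <- tapply(toy$MTOTCONG, list(toy$PERIOD, toy$TYPE), sd)
cell_means
cell_sds
```

Each result is a small two-by-two table with periods as rows and election types as columns, which makes the on-year versus off-year comparison easy to read across the two periods.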

 

# Below is the SAS code that does the analysis;

libname windata 'c:\';
GOPTIONS lfactor=10 hsize=6 in vsize=6 in horigin=1 in vorigin=1 in;
options nocenter;
**********************************************************;
* CLASS, NOTE THAT IF YOU BEGIN A LINE WITH AN ASTERISK *
* THEN YOU CAN PUT NOTES IN YOUR PROGRAM FILES. THIS IS
* LIKE A COMMENT CARD IN SPSS. HOWEVER, REMEMBER
* TO EVENTUALLY PUT A FINAL SEMICOLON AT THE END OF YOUR COMMENTS.;
***********************************************************;
* NOTE THAT I INDENT SOME STATEMENTS. THIS
* IS JUST FOR NEATNESS.;
***********************************************************;
* COPYRIGHT (c) Courtney Brown 2004, All Rights Reserved;
* Permission granted to use this file and computer code for any nonprofit and
* educational purposes, including classroom instruction.
* No further permission required.
* Please cite source as "From www.courtneybrown.com";
***********************************************************;
DATA USPARTY;SET windata.USPARTY;
DATA ON5070;SET USPARTY;
IF ((YEAR EQ 1952) OR (YEAR EQ 1956) OR (YEAR EQ 1960) OR (YEAR EQ 1964) OR (YEAR EQ 1968));
DATA ON7288;SET USPARTY;
IF ((YEAR EQ 1972) OR (YEAR EQ 1976) OR (YEAR EQ 1980) OR (YEAR EQ 1984) OR (YEAR EQ 1988));
DATA OFF5070;SET USPARTY;
IF ((YEAR EQ 1950) OR (YEAR EQ 1954) OR (YEAR EQ 1958) OR (YEAR EQ 1962) OR (YEAR EQ 1966) OR (YEAR EQ 1970));
DATA OFF7486;SET USPARTY;
IF ((YEAR EQ 1974) OR (YEAR EQ 1978) OR (YEAR EQ 1982) OR (YEAR EQ 1986));
PROC MEANS DATA=ON5070;
PROC MEANS DATA=ON7288;
PROC MEANS DATA=OFF5070;
PROC MEANS DATA=OFF7486;
PROC UNIVARIATE DATA=ON5070;
run;
quit;
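
For those following along in R, note that PROC MEANS by default reports the number of non-missing values, the mean, the standard deviation, the minimum, and the maximum for each numeric variable. A rough R counterpart might look like the sketch below. The helper name `proc_means_like` and the input values are hypothetical and are used only for illustration.

```r
# Hypothetical helper: the five statistics PROC MEANS prints by default.
proc_means_like <- function(x) {
  x <- x[!is.na(x)]  # count and summarize non-missing values only
  c(N = length(x), Mean = mean(x), StdDev = sd(x), Min = min(x), Max = max(x))
}

# Invented values standing in for a numeric column such as MTOTCONG.
example_values <- c(0.50, 0.60, 0.55, NA, 0.65)
proc_means_like(example_values)
```

Applied to a column of one of the subsets, a helper like this would give a compact summary comparable to one row of the PROC MEANS output.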