Mathematical Modeling of Social Phenomena
Dr. Courtney Brown
Assignment #3
In this assignment, you will analyze the on-year and off-year pattern of political participation in voting during U.S. congressional elections. In on-year elections (every four years), the congressional elections coincide with the presidential elections. More people go out to vote in those elections. In the off-year elections (the elections between the on-year elections), fewer people vote. Thus, congressional electoral mobilization as a proportion of the total eligible voters is greater in on-year elections than in off-year elections due to the structure of the electoral calendar.
You need to do this assignment in two parts. First, get the data for this assignment and read it into R. The data set (as an R data frame) can be found HERE. Second, analyze the data for the years from 1950 to 1970; the analysis is described further below. For this assignment, you will also need to look at this SAS code that was originally used to create this data set. You do not need to run the SAS code. Just look at it for reference. You will use it as described below.
Here are the variables for the data set:
YEAR
RCONG= REPUBLICAN CONGRESSIONAL VOTE
DCONG= DEMOCRATIC CONGRESSIONAL VOTE
RPRES= REPUBLICAN PRESIDENTIAL VOTE
DPRES= DEMOCRATIC PRESIDENTIAL VOTE
G= 1 ON YEAR ELECTION, 0 OFF YEAR
PRCONT= 1 REP CONTROLS PRES., 0 DEM CONTROLS
PDCONT= 1 DEM CONTROLS PRES., 0 REP CONTROLS
TOTPRES= TOTAL PRESIDENTIAL VOTE
TOTCONG= TOTAL CONGRESSIONAL VOTE
RREP= NUMBER OF REPUBLICAN REPRESENTATIVES
DREP= NUMBER OF DEMOCRATIC REPRESENTATIVES
TR= VOTE FOR T. ROOSEVELT, BULL MOOSE, 1912
INFLAT= PRICE INDEX, 1967 IS 100
PREVINFL= PRICE INDEX FOR THE PREVIOUS YEAR
PREVGNP= GNP FOR THE PREVIOUS YEAR
GNP= GROSS NATIONAL PRODUCT IN 1958 DOLLARS
TOTPOP18= TOTAL POPULATION 18 YEARS OR OLDER
ELIGIBLE
MRCONG= MOBILIZED REP CONG VOTE PROP
MDCONG= MOBILIZED DEM CONG VOTE PROP
MRPRES= MOBILIZED REP PRES VOTE PROP
MDPRES= MOBILIZED DEM PRES VOTE PROP
MTOTCONG= MOBILIZED TOTAL CONG VOTE PROP
MTOTPRES= MOBILIZED TOTAL PRES VOTE PROP
MTR=
VRCONG=
VDCONG=
VRPRES=
VDPRES=
VTOTCONG=
VTOTPRES=
VTR=
LMRCONG= LAG OF MOBIL. REP CONG VOTE PROP
LMDCONG= LAG OF MOBIL. DEM CONG VOTE PROP
LRREP=
LDREP=
REPRATIO=
DRATIO=
RRATIO=
REPSHIFT=
MRATIO1=
MRATIO2=
ON= 1 FOR AN ON-YEAR ELECTION, 0 FOR AN OFF-YEAR ELECTION
ONR1=
ONR2=
OND1=
OND2=
ON1=
ON2=
OFF=
OFF1=
OFF2=
PRESDIF=
RPRESDIF=
LRPDIF=
LPRESDIF=
LMTCONG=
DIFOYTC=
SHIFT=
FDR=
CPOINT=
CPCGNP= CHANGE IN YEARLY PER CAPITA GNP
CGNP= CHANGE IN YEARLY GNP
LGNP=
LINFLAT=
Many of these variables have variable labels. Look at the SAS code linked above and see if you can fill in the remainder of the variable labels. That is, read the SAS code to see how each variable is constructed so that you can describe these variables with your own variable labels and produce your own "codebook" for the data set. This is quite a common problem in data analysis: researchers often need to work out how variables were defined in one language so that they can use them in another. When you are done, print out a complete list of the variable names with as many of the variable labels as you can figure out, and include that as part of your assignment.
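One possible way to lay out such a codebook in R is sketched below. It is only an illustration: the data frame demo is a stand-in with placeholder values (in practice you would use the names of your own data frame), and the object names demo, known, and codebook are hypothetical. The two labels shown are copied from the variable list above.

```r
# Stand-in for the real data frame; only the column names matter here.
# The values are placeholders, not real data.
demo <- data.frame(YEAR = 1950, G = 0, MTOTCONG = 0.41, LMTCONG = NA, MTR = NA)

# Labels copied from the variable list above. Add your own as you work
# them out from the SAS code.
known <- c(
  G        = "1 on-year election, 0 off-year",
  MTOTCONG = "Mobilized total congressional vote proportion"
)

# One row per variable; variables without a known label get NA first.
codebook <- data.frame(
  variable = names(demo),
  label    = unname(known[names(demo)]),
  stringsAsFactors = FALSE
)

# Mark the labels that still need to be worked out.
codebook$label[is.na(codebook$label)] <- "(label not yet determined)"

print(codebook, row.names = FALSE)
```

Keeping the labels in a named vector makes it easy to add one label at a time without retyping the variable list.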
You will be writing an R program for this assignment, and it will do the same thing as the SAS program below. This program parallels what was done in assignment 2 for this class. The only real difference is that the slope obtained by regressing total congressional mobilization on its lag is not used in making the plot of the difference equation on the data (although the intercept is). Instead, a slope of "-1" is used. Why would that be so? You will need to explain this in your write-up.
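For reference, the recurrence that the SAS program below iterates in its DATA TRAJECT step, Y2 = (A*Y1) + B, could be sketched in R roughly as follows. The values of A, B, and the starting Y1 are copied from that SAS code; the names a, b, y, traj, traject, and equil are illustrative choices and are not part of the data set.

```r
a <- -1          # slope fixed at -1, as in the SAS DATA TRAJECT step
b <- 0.9931      # intercept value used in the SAS code
y <- 0.5714173   # starting value Y1 from the SAS code

years <- seq(1950, 1970, by = 2)
traj  <- numeric(length(years))

# Iterate the first-order linear difference equation once per election.
for (i in seq_along(years)) {
  y <- a * y + b
  traj[i] <- y
}

traject <- data.frame(YEAR = years, Y2 = traj)

# Equilibrium value, computed the same way as EQUIL in the SAS code.
equil <- b / (1 - a)

print(traject)
print(equil)
```

Printing traject and comparing successive values of Y2 with equil may help when thinking about the question posed above.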
Your assignment is to write a program in R that does the same thing that was done with all of your data sets during assignment 2. But again, this time you will want to assign a slope of -1 to the difference equation ... that is to say, after you figure out why you would want to do that in the first place. Thus, you will need two plots for this assignment. The first plot will be a plot of the first differences of total congressional mobilization from 1950-1970, and the second plot will be a plot of the on-year/off-year values together with the difference equation model on top, just like in assignment 2.
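A minimal sketch of how a first-differences plot might be produced is given below. It uses a made-up alternating series in place of the real MTOTCONG values (0.58 and 0.42 are placeholders, not results), and the names year, mob, and d are hypothetical. With the real data you would substitute the columns of your 1950-1970 subset.

```r
# Synthetic stand-in for MTOTCONG, 1950-1970 (NOT the real data):
# on-years (divisible by 4) are set high, off-years low.
year <- seq(1950, 1970, by = 2)
mob  <- ifelse(year %% 4 == 0, 0.58, 0.42)

# First differences: each value minus the value from the previous election.
# diff() returns one fewer element than its input.
d <- diff(mob)

# Plot the differences against the later year of each pair.
plot(year[-1], d, type = "b",
     xlab = "Year", ylab = "First difference of mobilization")
abline(h = 0, lty = 2)  # reference line at zero
```

Because diff() drops the first observation, the differences are plotted against year[-1] rather than year.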
What have you learned about congressional voting and mathematical modeling? Why did I pick the years from 1950 to 1970 for you to conduct your analysis? Why not end in 1972, or 1980? (You need to know a bit about history to answer this.) Why is this relevant to the capabilities of a first-order linear difference equation with constant coefficients?
NOTE: The concept of this interesting assignment was originally developed by Professor John Sprague at Washington University in St. Louis.
The R program that you write will be simpler than the SAS program since you will not need to merge data sets as is done below with SAS. Again, follow what you did with assignment 2. To get you started with the R program, put the data set in your R_Working_Directory (after unzipping it), and then read the data into R using the lines below. The code goes so far as to get you the value of the intercept while restricting the slope to be -1. After that, you are on your own.
mydata <- read.table("usparty.df")
attach(mydata)
summary(mydata)
# The following line selects only the data for the years 1950 through 1970, and keeps only the variables YEAR, MTOTCONG, and LMTCONG.
my5070data <- subset(mydata, YEAR >= 1950 & YEAR <= 1970, select=c(YEAR, MTOTCONG, LMTCONG))
my5070data
first.model <- lm(MTOTCONG ~ LMTCONG, data=my5070data) # This regression does NOT restrict the slope parameter. Compare these results to the second model.
summary(first.model)
second.model <- lm(MTOTCONG ~ offset(-LMTCONG), data=my5070data) # The "offset(-LMTCONG)" statement sets the slope parameter to -1.
summary(second.model)
# You finish the program. Remember that the second.model is the one you want to use, not the first.model. You will need to explain why.
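To see what the offset term in the second model does, the following sketch may help. It uses a small made-up data frame (toy, with placeholder values rather than the real usparty data) and checks that the intercept estimated with the slope fixed at -1 is simply the mean of MTOTCONG + LMTCONG. The names toy, fit, b_hat, and b_check are hypothetical.

```r
# Synthetic stand-in data (NOT the real usparty values).
toy <- data.frame(
  MTOTCONG = c(0.42, 0.57, 0.43, 0.58, 0.41),
  LMTCONG  = c(0.58, 0.42, 0.57, 0.43, 0.58)
)

# Slope on LMTCONG is fixed at -1 through the offset; only the
# intercept is estimated.
fit   <- lm(MTOTCONG ~ offset(-LMTCONG), data = toy)
b_hat <- unname(coef(fit)["(Intercept)"])

# With the slope fixed at -1, the model is MTOTCONG + LMTCONG = intercept + error,
# so least squares gives the mean of that sum.
b_check <- mean(toy$MTOTCONG + toy$LMTCONG)

print(c(b_hat = b_hat, b_check = b_check))
print(all.equal(b_hat, b_check))
```

The same check can be run on the 1950-1970 subset to confirm the intercept reported by summary(second.model).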
# Below is the SAS code that will do the analysis of the data;
libname windata 'c:\';
GOPTIONS lfactor=10 hsize=6 in vsize=6 in horigin=1 in vorigin=1 in;
options nocenter;
**********************************************************;
* CLASS, NOTE THAT IF YOU BEGIN A LINE WITH AN ASTERISK *
* THEN YOU CAN PUT NOTES IN YOUR PROGRAM FILES. THIS IS
* LIKE A COMMENT CARD IN SPSS. HOWEVER, REMEMBER
* TO EVENTUALLY PUT A FINAL SEMICOLON AT THE END OF YOUR COMMENTS.;
***********************************************************;
* NOTE THAT I INDENT SOME STATEMENTS. THIS
* IS JUST FOR NEATNESS.;
***********************************************************;
* COPYRIGHT (c) Courtney Brown 2004, All Rights Reserved;
* Permission granted to use this file and computer code for any nonprofit and
* educational purposes, including classroom instruction.
* No further permission required.
* Please cite source as "From www.courtneybrown.com";
***********************************************************;
DATA USPARTY;SET windata.USPARTY;
LMTCONG=LAG(MTOTCONG);
IF ((YEAR GE 1950) AND (YEAR LE 1970));
PROC REG;
MODEL MTOTCONG=LMTCONG;
RESTRICT LMTCONG=-1;
DATA TRAJECT;
A=-1;
B=0.9931;
Y1=0.5714173;
DO YEAR=1950 TO 1970 BY 2;
Y2=(A*Y1)+B;
OUTPUT;
Y1=Y2;
END;
PROC SORT;
BY YEAR;
DATA COMBINE;
MERGE TRAJECT USPARTY;
BY YEAR;
DATA COMBINE;SET COMBINE;
EQUIL=B/(1-A);
PROC PRINT; VAR A B Y1 Y2 EQUIL YEAR MRCONG MDCONG MTOTCONG;
PROC REG;
MODEL MTOTCONG=LMTCONG;
RESTRICT LMTCONG=-1;
symbol1 color=black v=NONE f=centb i=join;
symbol2 color=black f=centb v='.' height=2 interpol=R;
symbol3 color=black f=centb v='.' height=2;
PROC GPLOT;
axis1 color=black order=0.3 to 0.7 by 0.1
value=(h=1.5 f=swissb c=black)
label=(h=1.3 a=90 r=0 f=swissb c=black 'Congressional Mobilization');
axis2 color=black
value=(h=1.5 f=swissb c=black)
label=(h=1.3 f=swissb c=black 'Year');
PLOT MTOTCONG*YEAR=3 y2*YEAR=1/ overlay
vaxis=axis1 haxis=axis2 vminor=0 hminor=0;
TITLE 'Figure 1: Congressional Mobilization, 1950-70';
* The code below does some interesting analysis with means, but it is not needed for this assignment;
* But the final "run" and "quit" statements are needed;
DATA ON;SET COMBINE;
IF ((YEAR EQ 1952) OR (YEAR EQ 1956) OR (YEAR EQ 1960) OR (YEAR EQ 1964) OR
(YEAR EQ 1968));
DATA OFF;SET COMBINE;
IF ((YEAR EQ 1950) OR (YEAR EQ 1954) OR (YEAR EQ 1958) OR (YEAR EQ 1962) OR
(YEAR EQ 1966) OR (YEAR EQ 1970));
PROC MEANS DATA=ON;
PROC MEANS DATA=OFF;
run;
quit;