4  Instrumental Variables

knitr::opts_chunk$set(
    echo = TRUE,
    message = FALSE,
    warning = FALSE,
    comment = NA
)

rm(list = ls())

library(haven)
dinas <- read.csv("dinas.csv")

4.1 Lecture Slides & Data



#install.packages("downloadthis")
library(downloadthis)

download_link(
  link = "https://bayreuth-politics.github.io/CI22/data/dinas.csv",
  output_name = "dinas",
  output_extension = ".csv",
  button_label = "Lab 7 Data",
  button_type = "success",
  has_icon = TRUE,
  self_contained = TRUE
)




4.2 Recap

The Problem of Unobservables

So far, we have discussed randomised experiments and selection on observables. But what about cases in which we do not (or cannot) observe relevant covariates? In such cases, the conditional independence assumption does not hold. Take, for example, the following scenario: we would like to estimate the effect of D on Y, but we are unable to observe the confounding variable U. Since U affects both the independent and the dependent variable of interest, any naive estimate of the effect of D will be biased.
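To see this concretely, here is a minimal simulated sketch (all variable names and coefficients are made up for illustration): U drives both D and Y, the true effect of D on Y is 1, and the naive regression overstates it.

set.seed(123)
n <- 10000
U <- rnorm(n)                    # unobserved confounder
D <- 0.5 * U + rnorm(n)          # treatment partly driven by U
Y <- 1 * D + 0.5 * U + rnorm(n)  # true effect of D on Y is 1
coef(lm(Y ~ D))["D"]             # about 1.2: upward bias from omitting U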

4.3 Instrumental Variables

An instrumental variable (IV) design helps us circumvent this problem. If D is partly determined by Z, our instrument, we can estimate the effect of D on Y despite being unable to observe U. For this to work, Z must be as-if randomly assigned and must affect Y only through D, i.e. affect the outcome only through the treatment. We can then exploit the variation in D that is randomly induced by Z. Note, however, that an IV only allows us to estimate the effect for compliers - that is, those units whose D responds to Z. This local average treatment effect - or complier average causal effect - is as follows:

\[ LATE = \frac{E[Y_i|Z_i=1] - E[Y_i|Z_i=0]}{E[D_i|Z_i=1] - E[D_i|Z_i=0]} = \frac{ITT_Y}{ITT_D} \]
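Continuing the simulated sketch from the recap (again with made-up coefficients), we can add a binary instrument Z that shifts D but enters Y only through D. The ratio of the two intention-to-treat effects then recovers the true effect:

set.seed(123)
n <- 10000
U <- rnorm(n)
Z <- rbinom(n, 1, 0.5)               # as-if random instrument
D <- 0.5 * U + 0.8 * Z + rnorm(n)    # first stage: Z shifts D
Y <- 1 * D + 0.5 * U + rnorm(n)      # exclusion: Z affects Y only via D
ITT_Y <- mean(Y[Z == 1]) - mean(Y[Z == 0])
ITT_D <- mean(D[Z == 1]) - mean(D[Z == 0])
ITT_Y / ITT_D                        # close to the true effect of 1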

4.4 Two-Stage Least Squares (2SLS)

In practice, IV designs are usually estimated via 2SLS regression. Compared to manually calculating the treatment effect, these estimators provide correct and robust measures of uncertainty and allow for the inclusion of covariates. The principle is simple: in the first stage, the treatment is regressed on the instrument and, possibly, covariates. The predicted values of the treatment are then used in place of the actual treatment in the second stage. Importantly, both stages must include exactly the same covariates (see the sketch below the equations).

First Stage : \[ D_i = \alpha_1 + \phi Z_i + \beta_1 X_{1i} + \gamma_1 X_{2i} + e_{1i} \]

Second Stage: \[ Y_i = \alpha_2 + \lambda \hat{D}_i + \beta_2 X_{1i} + \gamma_2 X_{2i} + e_{2i} \]
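Reusing the simulated data from above (with a made-up covariate X1 added to both stages), the two stages look like this by hand:

set.seed(123)
n  <- 10000
U  <- rnorm(n); X1 <- rnorm(n)
Z  <- rbinom(n, 1, 0.5)
D  <- 0.5 * U + 0.8 * Z + 0.3 * X1 + rnorm(n)
Y  <- 1 * D + 0.5 * U + 0.4 * X1 + rnorm(n)
stage1 <- lm(D ~ Z + X1)              # first stage
stage2 <- lm(Y ~ fitted(stage1) + X1) # second stage: same covariates
coef(stage2)["fitted(stage1)"]        # close to the true effect of 1

The point estimate from this manual procedure is correct, but its standard errors are not - we return to this below.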


4.5 IV Assumptions

For an IV design to be valid, several assumptions have to be met. In practice, this can be very hard to achieve. The five assumptions are:

  1. Monotonicity: There are no defiers
  2. Exclusion Restriction: The instrument affects the outcome only through the treatment
  3. Non-Zero Complier Proportion: The instrument affects the treatment
  4. Random Assignment of Z: The instrument is unrelated to potential outcomes
  5. SUTVA: No interference between units and no hidden versions of the treatment

Satisfying these assumptions usually requires good knowledge of the context and of the particular mechanisms at play. In practice, this can be very hard to achieve. It is especially true for the exclusion restriction, which cannot be tested statistically. Accordingly, we must be able to make a convincing case for the assumption to hold. If there are good reasons to believe that it does not, the IV design is likely invalid.


Before starting this seminar

  1. Create a folder called “lab7”

  2. Download the data (you can use the button above, or read the csv file directly from GitHub):

  3. Open an R script (or Markdown file) and save it in our “lab7” folder.

  4. Set your working directory using the setwd() function or by clicking on "More" in RStudio's Files pane. For example: setwd("~/Desktop/Causal Inference/2022/Lab7")

  5. Let’s install and load the packages that we will be using in this lab:

library(stargazer) # generate formatted regression tables
library(texreg) # generate formatted regression tables
library(tidyverse) # to conduct some tidy operations
library(plm) # conduct one-way and two-way fixed effects
library(estimatr) # OLS and IV estimation with robust standard errors
library(lmtest) # coeftest() for tests with adjusted variance-covariance matrices
library(multiwayvcov) # to compute clustered variance-covariance matrices
library(ivpack) # IV estimation helpers
library(ivreg) # two-stage least squares (2SLS)
library(modelsummary) # regression tables
library(fixest) # fast fixed-effects estimation, including IV
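If any of these packages are missing on your machine, install them first (note, as a caveat: ivpack has at times been archived on CRAN, so it may need to be installed from the CRAN archive):

# install.packages(c("stargazer", "texreg", "tidyverse", "plm", "estimatr",
#                    "lmtest", "multiwayvcov", "ivreg", "modelsummary", "fixest"))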

5 Seminar Overview

In this seminar, we will cover the following topics:
1. Manually estimate the treatment effect using an instrumental variable and the lm() function
2. Run an IV regression using ivreg(), iv_robust() and feols()
3. Present the output of 2SLS regressions
4. Manually calculate the Wald estimator
5. Use placebo tests to support the validity of the IV design
6. Check for weak instruments


5.1 Does Choice Bring Loyalty?

Today we will work with data from Elias Dinas’ paper Does Choice Bring Loyalty?. In this paper, the author seeks to understand the foundations of partisan strength. There is a long-standing debate over party identification (PID) in the literature: some scholars claim that party identification strengthens with age, while others, including the author, suggest that the act of voting for a party brings about loyalty and strengthens political attachment. A straightforward but naive empirical strategy would be to estimate the effect of having voted in one election on the strength of party identification a couple of years further down the line. However, both PID and vote choice are predicted by similar confounders, so such a research design would inevitably face the problem of unobserved covariates.

To address the research question without using such a naive design, the author takes advantage of a comprehensive panel dataset. The original data include four waves - 1965/1973/1982/1997 - of which the author uses two (1965 and 1973). In a smart move, Dinas then exploits the timing of elections and the characteristics of the panel participants: elections took place in 1968 and in 1972, and the voting age at the time was 21. Respondents born in 1947 (76% of the sample) share a crucial characteristic: they turned 21 in 1968. Those who turned 21 before election day were able to vote in 1968, while those who did not were only eligible to vote in 1972. This allows the author to exploit respondents’ birthdays - which are as-if random - to estimate the causal effect of voting on the strength of party identification. Obviously, however, not everyone who was eligible to vote in 1968 actually did vote.



Besides various covariates, we will be using the following key variables:

Variable Description
eligible68 Dummy for eligibility to vote in 1968 election (Instrument).
voted68 Dummy indicating whether participant voted in 1968 election (Treatment)
strngpid73 Strength of party identification in 1973 on an ordinal scale (Outcome)
knowledge65 Political knowledge in 1965
strngpid65 Strength of party identification in 1965
elig2false Dummy variable for placebo tests: 0 for young eligible and 1 for old eligible participants
v7 Numerical code for school (which we’ll use for clustering SEs)


Now let’s load the data. There are two ways to do this:

You can load the dinas dataset from your laptop using the read.csv() function.

# Set your working directory
#setwd("~/Desktop/Causal Inference/2022/Lab7")
# 
library(haven)
#dinas <- read.csv("~/dinas.csv")

Or you can download the data from the course website at the following URL: https://bayreuth-politics.github.io/CI22/data/dinas.csv.


Exercise 1: Use the head() function to familiarise yourself with the data set.

Reveal Answer
head(dinas)

This looks wild. The data set includes numerous variables, most of which are not labelled very well. For now, let’s focus on the key variables presented above.


Exercise 2: Regress the outcome (strngpid73) on the treatment (voted68) using lm(). Does the OLS provide a causal estimate?

Reveal Answer
ols <- lm(strngpid73 ~  voted68, data= dinas)
summary(ols)


The naive OLS provides an estimate of 0.19: having voted in the 1968 election is associated with an increase of 0.19 on the PID strength scale. However, this is not a causal estimate: as we know, there are various factors that are likely to affect both the outcome and the treatment. All the OLS gives us is the size of the bivariate association between these two variables.

Let’s also look at the visual relationship between these two variables:

# Using ggplot
ggplot(dinas, aes(x=voted68, y=strngpid73)) + 
  geom_point()+
  geom_smooth(method=lm) +
  xlab("Voted in 1968") + 
  ylab("PID Strength in 1973")

We can actually see a slight but significant increase in PID strength for those who voted in 1968. The decisive question is: Can we causally say that having voted is the reason for this?
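Note that both variables are discrete, so most points in the scatterplot overlap. A jittered version (a purely cosmetic tweak) makes the distribution of observations easier to see:

ggplot(dinas, aes(x = voted68, y = strngpid73)) +
  geom_jitter(width = 0.1, height = 0.1, alpha = 0.3) +
  geom_smooth(method = lm) +
  xlab("Voted in 1968") +
  ylab("PID Strength in 1973")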



5.2 IV Regression: 2SLS

We now know that a simple OLS doesn’t provide a causal estimate. Let us now try to estimate the true treatment effect using an instrumental variable design. Following the author, we will use respondents’ eligibility to vote in the 1968 election (eligible68) as the instrument: that is, we exploit the randomness of respondents’ birthdays, which determine their eligibility to vote in 1968. To do so, let’s look at the first and second stage separately.

Exercise 3: Investigate the relationship between treatment (voted68) and instrument (eligible68)

There are several ways to do this. Feel free to pick the option you deem most appropriate.

Reveal Answer
table(dinas$eligible68, dinas$voted68)


This looks OK. Of the 623 respondents who were eligible to vote in 1968, 373 did so, while 250 decided not to vote. However, 25 respondents indicated that they voted although they were not eligible. How can that be? Most likely, they simply reported that they voted even though they did not - whether intentionally or because they misremembered the election, which can happen in such a long panel. It’s a bit annoying, but there’s not much we can do about it.


Let’s now calculate the first stage.

Exercise 4: Regress the treatment on the instrument and extract the predicted values

Note: Make sure to add the argument na.action=na.exclude to your lm() function in order to deal with missing values. You can use predicted_values <- predict(OLS_model) to extract the predicted values.

Reveal Answer
# Calculating the first stage. Note that `na.action = na.exclude` deals with NAs
# so we can use the predicted values in the second stage
first <- lm(voted68 ~ eligible68, data = dinas, na.action = na.exclude)

# Extracting predicted values
vote_pred <- predict(first)

# Displaying regression output
summary(first)


We can see that the instrument (eligible68) is indeed a strong and significant predictor of the treatment. That is what we expect and hope for. It’s also convincing to think that eligibility - i.e. respondents’ birthdays - is fully random.

Unfortunately, the first stage cannot tell you whether an instrument is appropriate. It can, however, flag inappropriate instruments. A common problem in IV designs is weak instruments: if your instrument is only weakly correlated with the endogenous variable (i.e. the treatment), the IV estimates are likely to be biased. The F-statistic of the first stage can be used to identify weak instruments. As a rule of thumb, your instrument is likely to be problematic if the F-statistic of your first-stage regression is below 10.

Going back to the regression output, we see that our F-Statistic here is about 110 - so nowhere near the conventional threshold. Our instrument is strongly correlated with the treatment as it should be - but note that this does not necessarily mean that it is a valid instrument.
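If you don’t want to scan the printed output, the first-stage F-statistic can also be pulled out directly from the summary object:

summary(first)$fstatistic  # value, numerator df, denominator df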


Let’s now proceed to test the exclusion restriction.

Exercise 5: Test the exclusion restriction for the instrument.

Hint: Show that the instrument affects the outcome only through the treatment.

Reveal Answer


If you regressed the outcome on the instrument (and the treatment), this might help you familiarise yourself with the data - but it does not provide a test of the exclusion restriction. In fact, it is impossible to test the exclusion restriction statistically. All we can do is rely on theory and build a convincing case that alternative channels are not at play. The problem with a regression of Y on Z (and D) is that we still cannot observe further confounders or account for their effects, so we cannot tell whether any association we see reflects a violated exclusion restriction or mere confounding.

Let’s plot the relationship between the outcome and the instrument nonetheless. As stated above, we cannot confirm that the assumption holds, but we might spot patterns suggesting that the exclusion restriction is violated.

Exercise 6: Plot the relationship between the outcome and instrument.

There are several ways to do this. Feel free to pick the option you deem most appropriate.

Reveal Answer
# Using ggplot
ggplot(dinas, aes(x=eligible68, y=strngpid73)) + 
  geom_point()+
  geom_smooth(method=lm) +
  xlab("Elegibility in 1968") + 
  ylab("PID Strength in 1973")


This looks as expected: there is no clear, significant association between the two variables. Recall that eligibility itself should not affect the strength of party identification, except by enabling respondents to vote in 1968 - only voting should affect the outcome.



Let’s now return to our IV model by calculating the second stage of our 2SLS model.

Exercise 7: Regress the outcome on the predicted values from the first stage

Reveal Answer
# Calculating the second stage (note: the SEs here are incorrect)
second_wrongSE <- lm(strngpid73 ~ vote_pred, data = dinas)

# Displaying regression output
summary(second_wrongSE)


The second stage uses the predicted values of the treatment from the first stage. The output indicates that - once we instrument for voting in 1968 - the decision to cast a vote in 1968 does not have a significant effect on party identification.

However, by calculating the two stages separately we have not adjusted the standard errors: the second-stage lm() treats the predicted values as if they were data and ignores the estimation error from the first stage. Hypothesis tests based on such biased measures of uncertainty are likely to mislead.
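For intuition, here is a minimal sketch of the correction in the simple homoskedastic case, restricted to complete cases for simplicity. The key point is that the correct 2SLS residuals use the actual treatment, not its fitted values:

d  <- na.omit(dinas[, c("strngpid73", "voted68", "eligible68")])
fs <- lm(voted68 ~ eligible68, data = d)
ss <- lm(strngpid73 ~ fitted(fs), data = d)
# correct 2SLS residuals: actual treatment, 2SLS coefficients
u  <- d$strngpid73 - cbind(1, d$voted68) %*% coef(ss)
X  <- cbind(1, fitted(fs))
sqrt(sum(u^2) / (nrow(d) - 2) * solve(crossprod(X))[2, 2])  # correct SE
summary(ss)$coefficients[2, 2]  # the (wrong) SE from the manual second stage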


5.3 IV Regression using 2SLS in one step


There are several packages that we can use to obtain a two-stage least squares instrumental variables estimator in one step. Let’s now conduct 2SLS using ivreg(), iv_robust() and feols(). See the syntax for each of these functions below:

Exercise 8: Conduct a two-stage least squares instrumental variable regression using strngpid73 as the outcome, voted68 as the endogenous predictor and eligible68 as the instrument. Use the ivreg() and iv_robust() functions. Store the models in a list (list()) and report them in a table. Interpret the results.

Variable Description
O Outcome variable
E Endogenous variable
I Instrument variable
FE Fixed-effects variable
ivreg(O ~ E | I, data = data) # ivreg package

iv_robust(O ~ E | I, data = data) # estimatr package

feols(O ~ 1 | FE | E ~ I, data = data) # fixest package (the IV part comes last)


Reveal Answer
## ivreg ## 
ivreg_model <- ivreg(strngpid73 ~ voted68 | eligible68, data = dinas)

ivreg_model_clustered <- cluster.vcov(ivreg_model, dinas$v7) # computes the clustered variance-covariance matrix

iv_clustered <- coeftest(ivreg_model, ivreg_model_clustered)

## iv_robust ## 
iv_robust_model <- iv_robust(strngpid73 ~ voted68 | eligible68, data = dinas, cluster = v7) # cluster SEs by school

ivmodels <- list(ivreg_model, iv_robust_model)
rows <- tribble(~term,  ~ OLS1,  ~OLS2,
                'Covariates', 'No', 'No') # add one row reporting covariates
attr(rows, 'position') <- c(5)  ### Change location accordingly  

title <- 'Two-stage Least Squares Models' # add the title to your model

coeffs <- c('(Intercept)'= 'Intercept',
                     'voted68' = 'Voted') # rename coefficients 

# regression table 
modelsummary(ivmodels, estimate = "{estimate}{stars}",coef_map = coeffs, gof_omit = 'DF|se_type', add_rows = rows, title = title)

We find that both functions generate the same point estimates - note that the SEs differ because the first model in the table shows the unclustered ivreg output for comparison. The difference is not huge, yet it can make a difference for hypothesis testing. The Local Average Treatment Effect is 0.319. [Note that modelsummary struggles to process manually clustered regression output - feel free to use stargazer or screenreg instead.]
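As a sketch of that last point: screenreg() from texreg can take the clustered SEs and p-values from the coeftest output via its override arguments (the model name below is just a label):

screenreg(ivreg_model,
          override.se = iv_clustered[, 2],
          override.pvalues = iv_clustered[, 4],
          custom.model.names = "2SLS (clustered SEs)")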


In 2SLS we can include covariates to obtain the covariate-adjusted LATE. Let’s add some covariates to the 2SLS. We can also add additional instruments to our model.

Exercise 9: Use the ivreg() function and include the following covariates: col1 and col2. Use the same endogenous treatment variable voted68. Include eligible68 and as.factor(knowledge65) as instruments. Report the results of this estimation using the summary() function, including the argument described in the table below. Report the F-statistic for this specification. Are the instruments we are using strong or weak?

Function/argument Description
summary() Generic function to produce result summaries of fitting functions
diagnostics When set to TRUE, provides a number of diagnostic tests


Reveal Answer
ivreg_covariates <- ivreg(strngpid73 ~ col1 + col2 + voted68 | 
             col1 + col2 + as.factor(knowledge65) + eligible68, data = dinas)

summary_ivreg <- summary(ivreg_covariates, diagnostics = TRUE)
summary_ivreg
# Add clustered robust standard errors
ivreg_covariates_clustered <- cluster.vcov(ivreg_covariates, dinas$v7)
coeftest(ivreg_covariates, ivreg_covariates_clustered)

Based on the modified specification, we observe that voting in 1968 has a positive and statistically significant effect on partisanship strength. The summary function also reports several diagnostic tests once we set the diagnostics argument to TRUE.

If there are more instruments than causal parameters, the model is overidentified. If there are as many instruments as causal parameters, the model is just identified. However, the more instruments we include, the harder it is to meet the exclusion restriction. One test that we can conduct is the Sargan test, which compares the overidentified model against a model with a subset of the instruments (under the assumption that at least one instrument is valid). The null hypothesis is that all instruments are valid. In our case, we do not reject it - and conclude that all instruments are valid.

The weak instruments test checks whether the instruments are only weakly correlated with the endogenous explanatory variable - which would be problematic for the IV design. Its null hypothesis is that the instruments insufficiently predict the treatment, which we can reject in this case. The Wu-Hausman test compares IV against OLS: its null hypothesis is that the OLS estimates are consistent (i.e. that there is no endogeneity). Here we do not reject the null, so the IV model is no more consistent than OLS - and OLS would then be preferable, since it is more efficient.
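If you want the diagnostic tests as a standalone table, the summary object stores them (at least in the ivreg/AER implementation) in a diagnostics component:

summary_ivreg$diagnostics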


Besides 2SLS, we can also obtain the Local Average Treatment Effect (LATE) by dividing the difference in the conditional expectations of the outcome given the instrument (the reduced form) by the difference in the conditional expectations of treatment take-up given the instrument (the first stage). Put simply, we calculate the difference in mean outcomes between units assigned to the treatment and those not assigned, and divide this by the difference in treatment take-up between those two groups (i.e. the compliance rates).

Variable/Average Description
Y Outcome
Z Instrument
D Endogenous treatment
Y[Z=1] Average outcome among units offered the treatment
Y[Z=0] Average outcome among units not offered the treatment
D[Z=1] Proportion of units receiving the treatment among those assigned to treatment
D[Z=0] Proportion of units receiving the treatment among those not assigned to treatment

The Wald Estimator is then:

\[\tau=\frac{Y[Z=1]-Y[Z=0]}{D[Z=1]-D[Z=0]}\]

Exercise 10: Manually calculate the Wald Estimator. Use the mean(x, na.rm = T) to calculate the means of each group. You can use the following syntax to obtain the conditional means.

mean(data$outcome[data$instrument == 1], na.rm = TRUE) # average outcome among those assigned to treatment (instrument == 1) vs. not assigned (== 0)

mean(data$endogenous_variable[data$instrument == 1], na.rm = TRUE) # treatment take-up among those assigned to treatment (instrument == 1) vs. not assigned (== 0)


Reveal Answer
#Numerator
mean(dinas$strngpid73[dinas$eligible68==1], na.rm=T)
mean(dinas$strngpid73[dinas$eligible68==0], na.rm=T)
#Denominator
mean(dinas$voted68[dinas$eligible68==1], na.rm=T)
mean(dinas$voted68[dinas$eligible68==0], na.rm=T)

Then, \(\tau\) is equal to:

(mean(dinas$strngpid73[dinas$eligible68==1], na.rm=T) - mean(dinas$strngpid73[dinas$eligible68==0], na.rm=T)) /
  (mean(dinas$voted68[dinas$eligible68==1], na.rm=T) - mean(dinas$voted68[dinas$eligible68==0], na.rm=T))

We see that the Wald estimate is 0.32 - the same (up to rounding) as the estimate obtained from the ivreg() function, as it must be in this just-identified model without covariates.


How would you compute the Wald estimator for a binary endogenous variable and a binary instrument when covariates are included - or for non-binary instruments or endogenous variables?


Reveal Hint 1

Remember that, in a regression of \(Y_i\) on your variable of interest \(X_{1i}\) and a control variable \(X_{2i}\), the coefficient on \(X_{1i}\) is equal to (where \(\tilde{X}_{1i}\) is the residual from regressing \(X_{1i}\) on \(X_{2i}\)):

\[\beta_1 = \frac{Cov(Y_i, \tilde{X_{1i}})}{V(\tilde{X_{1i}})}\]



Reveal Hint 2

The 2SLS estimator is the ratio of the reduced form divided by the first stage, where \(\tilde{Z_i}\) is the residual from the regression of \(Z_i\) on the covariate(s). (The variances are the same, thus they cancel out).

\[\lambda_{\text{2SLS}} = \frac{Cov(Y_i, \tilde{Z_i})}{Cov(D_i, \tilde{Z_i})}\]

Here we can use the cov() function. You can see its arguments below:

Function/argument Description
cov(x, y) Calculates the covariance between two variables x and y
use character indicating how missing values should be treated
pairwise.complete.obs Determines how the parameters of the covariance function are computed. More details below

Setting use to pairwise.complete.obs computes the means and variances of x and y using all non-missing observations. In other words, the covariance between the two variables is calculated using only those observations for which both variables have non-missing values.

tau_cov <- cov(dinas$eligible68, dinas$strngpid73, use = "pairwise.complete.obs") /
  cov(dinas$eligible68, dinas$voted68, use = "pairwise.complete.obs")
tau_cov
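As a sketch of the covariate-adjusted case (using col1 and col2 from Exercise 9): residualise the instrument on the covariates first, then take the same covariance ratio:

d <- na.omit(dinas[, c("strngpid73", "voted68", "eligible68", "col1", "col2")])
z_tilde <- resid(lm(eligible68 ~ col1 + col2, data = d))
cov(d$strngpid73, z_tilde) / cov(d$voted68, z_tilde)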


Keep in mind that using IV we can ‘only’ estimate the Local Average Treatment Effect. This means that we are estimating the causal effect for one particular group of treated units, which are the compliers.

Exercise 11: Calculate the proportions of compliers, defiers, always-takers, and never-takers. Give the variables some labels so we can easily identify each group. You can use the factor() function; see a description of the syntax below. Assign the labels “Not Eligible” and “Eligible” to the eligible68 variable, and “Not Voted” and “Voted” to the voted68 variable. Lastly: why do we actually impose the monotonicity assumption in IV?

Function/argument Description
factor To encode a vector as a factor
levels An optional vector of the unique values
labels An optional character vector of labels for the levels
data$variable = factor(data$variable, levels = c(1, 2, 3),
                       labels = c("One", "Two", "Three"))


Reveal Answer
dinas$eligible68n <- factor(dinas$eligible68,
                            levels = c(0, 1),
                            labels = c("Not Eligible", "Eligible"))
dinas$voted68n <- factor(dinas$voted68,
                         levels = c(0, 1),
                         labels = c("Not Voted", "Voted"))


table(dinas$eligible68n, dinas$voted68n)

From the table above, we can see that 132 respondents were not eligible and did not vote; this group is composed of never-takers and compliers. The 25 respondents who were not eligible but voted anyway comprise always-takers and defiers. The 250 respondents who were eligible but did not vote are never-takers plus defiers. Finally, the 373 respondents who were eligible and indeed voted are always-takers plus compliers.

By imposing the monotonicity assumption, we rule out the existence of defiers. This means that the 25 respondents who were not eligible to vote but voted anyway must all be always-takers (25/157 = 0.16). Similarly, the 250 eligible respondents who did not vote must all be never-takers (250/(373+250) = 0.40). The share who vote in the treatment group is therefore 1 - 0.4 = 0.6 (always-takers plus compliers), while the share who vote in the control group is 0.16 (always-takers only). The proportion of compliers is the difference: 0.6 - 0.16 = 0.44 - exactly the denominator of the Wald estimator.
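The same arithmetic in code (under monotonicity, these shares identify the groups):

# always-takers: share voting among the non-eligible
at <- mean(dinas$voted68[dinas$eligible68 == 0], na.rm = TRUE)
# never-takers: share not voting among the eligible
nt <- 1 - mean(dinas$voted68[dinas$eligible68 == 1], na.rm = TRUE)
# compliers: the remainder, i.e. the first-stage difference ITT_D
c(always_takers = at, never_takers = nt, compliers = 1 - at - nt)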


There are several diagnostics that we can conduct in order to probe the validity of an instrument. In particular, we can conduct what is called a placebo test. To test whether the difference in partisan strength is driven by the age gap rather than by voting itself, the author suggests the following: split all eligible voters into two groups - the “old” eligibles, born before May 1947, and the “young” eligibles, born in June 1947 or later. It is important to stress that both groups were eligible to vote, so there should be no difference between them. The younger group is then treated as a placebo control group.

Exercise 12: Conduct a placebo test. Use the lm() function and as the main outcome the partisan strength measured in 1973, strngpid73, as well as in 1965, strngpid65. Use elig2false as the placebo treatment variable. Remember to cluster the standard errors. What’s your interpretation of the findings?


Reveal Answer
plac <- lm(strngpid73 ~ elig2false, data=dinas)
# Cluster standard errors
plac.vcovCL <- cluster.vcov(plac, dinas$v7)
coeftest(plac, plac.vcovCL)
plac2 <- lm(strngpid65 ~ eligible68, data=dinas)
plac2.vcovCL <- cluster.vcov(plac2, dinas$v7)
coeftest(plac2, plac2.vcovCL)

We see that there is no statistically significant difference between young and old eligible voters. This suggests that age does not drive the differences in partisanship strength we uncovered earlier.


Exercise 13: Try to think of possible ways in which the exclusion restriction could be violated in this setting. Through which other paths could the instrument affect the outcome, apart from the endogenous treatment?


6 HOMEWORK (Answers will be provided next week)

  1. Should you include all non-endogenous covariates in the first stage? Why or why not?
  2. What is the main identification assumption of instrumental variable estimation? How can you test it?
  3. Can you use more than one exogenous variable (multiple Zs) for one endogenous variable (D)?
  4. What’s the difference between the ITT and LATE from IV? Discuss with reference to compliers.
  5. What’s the forbidden regression? Why is it forbidden?