Lab 6: Instrumental Variables

Lecture Slides & Data






Recap

The Problem of Unobservables

So far, we have discussed randomised experiments and selection on observables. But what about cases in which we cannot observe all relevant confounders? In such cases, the conditional independence assumption does not hold. Take, for example, the following scenario: we would like to estimate the effect of D on Y, but we cannot observe the confounding variable U. Since U affects both the treatment D and the outcome Y, any naive estimate of the effect of D will be biased.
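To make the bias concrete, the following R sketch simulates this scenario with made-up data. It assumes a true effect of D equal to 2 and an unobserved U that raises both the probability of treatment and the outcome; these numbers are illustrative assumptions, not values from the lab data.

```r
# Hypothetical simulation: U confounds the effect of D on Y
set.seed(123)
n <- 10000
u <- rnorm(n)                          # unobserved confounder U
d <- as.numeric(u + rnorm(n) > 0)      # treatment D is more likely when U is high
y <- 2 * d + 3 * u + rnorm(n)          # assumed true effect of D on Y is 2

naive <- unname(coef(lm(y ~ d))["d"])          # U omitted: biased upwards
adjusted <- unname(coef(lm(y ~ d + u))["d"])   # feasible only if U were observed

round(c(naive = naive, adjusted = adjusted), 2)
```

Regressing Y on D alone attributes part of the effect of U to D, so `naive` lands well above 2. Controlling for U recovers roughly 2, but in real applications U is, by definition, not available.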


Instrumental Variables

An instrumental variable (IV) design helps us circumvent this problem. If D is partly determined by Z, our instrument, we can estimate the effect of D on Y despite being unable to observe U. To do so, Z must be assigned as-if at random and must affect the outcome Y only through the treatment D. We can then exploit the variation in D that is induced by Z and is therefore unrelated to U. Note, however, that IVs only allow us to estimate the effect for compliers, that is, those units whose D is affected by Z. The local average treatment effect (LATE), or complier average causal effect (CACE), is then:

\[ LATE = \frac{E[Y_i|Z_i=1] - E[Y_i|Z_i=0]}{E[D_i|Z_i=1] - E[D_i|Z_i=0]} = \frac{ITT_Y}{ITT_D} \]
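The LATE formula can be checked on simulated data. The sketch below assumes a randomly assigned binary instrument and three compliance types (compliers, always-takers and never-takers, with no defiers), with a hypothetical effect of 2 for compliers and 5 for always-takers.

```r
# Hypothetical data: binary instrument Z, binary treatment D,
# three compliance types and no defiers (so monotonicity holds)
set.seed(42)
n <- 20000
z <- rbinom(n, 1, 0.5)
type <- sample(c("complier", "always", "never"), n, replace = TRUE,
               prob = c(0.5, 0.2, 0.3))
d <- ifelse(type == "always", 1, ifelse(type == "never", 0, z))

# Assumed effects: 2 for compliers, 5 for always-takers;
# always-takers also have a higher baseline outcome
effect <- ifelse(type == "complier", 2, 5)
y <- 1 + effect * d + 3 * (type == "always") + rnorm(n)

# Wald estimator: ITT on the outcome divided by ITT on the treatment
itt_y <- mean(y[z == 1]) - mean(y[z == 0])
itt_d <- mean(d[z == 1]) - mean(d[z == 0])   # estimated share of compliers
late <- itt_y / itt_d

round(c(itt_y = itt_y, itt_d = itt_d, late = late), 2)
```

The ratio recovers roughly 2, the compliers' effect. Always-takers and never-takers do not change their treatment status with Z, so they cancel out of both differences, and the always-takers' larger effect never enters the estimate.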

Two-Stage Least Squares (2SLS)

In practice, IV designs are usually estimated with 2SLS regressions. Unlike manual calculations of the treatment effect, these estimators provide correct and robust measures of uncertainty and allow for the inclusion of covariates. The principle is simple: in the first stage, the treatment is regressed on the instrument and, possibly, covariates. The fitted values of the treatment from the first stage then replace the observed treatment in the second-stage regression of the outcome. Importantly, both stages must always include exactly the same covariates.

First stage: \[ D_i = \alpha_1 + \phi Z_i + \beta_1 X_{1i} + \gamma_1 X_{2i} + e_{1i} \]

Second stage: \[ Y_i = \alpha_2 + \lambda \hat{D}_i + \beta_2 X_{1i} + \gamma_2 X_{2i} + e_{2i} \]
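The two stages can be run by hand with lm(), as in the following sketch on simulated data. The data-generating process (an unobserved confounder u, two covariates x1 and x2 and a true effect of 2) is an assumption chosen for illustration; the instrument z is randomised and enters the outcome equation only through d.

```r
# Hypothetical data with an unobserved confounder and two covariates
set.seed(2022)
n <- 20000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
u <- rnorm(n)                                   # unobserved confounder
z <- rbinom(n, 1, 0.5)                          # randomised instrument
d <- as.numeric(0.8 * z + 0.5 * u + 0.3 * x1 + 0.2 * x2 + rnorm(n) > 0.5)
y <- 1 + 2 * d + 0.5 * x1 - 0.4 * x2 + 1.5 * u + rnorm(n)  # true effect is 2

# First stage: treatment on the instrument and the covariates
first <- lm(d ~ z + x1 + x2)
d_hat <- fitted(first)

# Second stage: outcome on the fitted treatment and the SAME covariates
second <- lm(y ~ d_hat + x1 + x2)

ols <- unname(coef(lm(y ~ d + x1 + x2))["d"])   # naive OLS, biased by u
tsls <- unname(coef(second)["d_hat"])           # manual 2SLS point estimate
round(c(ols = ols, tsls = tsls), 2)
```

The point estimate `tsls` should be close to 2, while naive OLS is biased upwards. The standard errors reported by `summary(second)` are not valid, however, because they treat `d_hat` as observed data rather than as an estimate; this is why dedicated functions such as ivreg() or iv_robust() are preferable in practice.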


IV Assumptions

For an IV design to be valid, several assumptions have to be met. In practice, this can be very hard to achieve. The five assumptions are:

  1. Monotonicity: There are no defiers

  2. Exclusion Restriction: The instrument affects the outcome only through the treatment

  3. Non-Zero Complier Proportion: The instrument affects the treatment

  4. Random Assignment of Z: The instrument is unrelated to potential outcomes

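  5. SUTVA (Stable Unit Treatment Value Assumption): There is no interference between units and no hidden variation in the treatment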

Making a convincing case for these assumptions usually requires good knowledge of the context and of the mechanisms at play. This is particularly true of the exclusion restriction, which cannot be tested statistically. We must therefore be able to argue convincingly that it holds. If there are good reasons to believe that it does not hold, the IV design is likely invalid.
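Unlike the exclusion restriction, assumption 3 can be checked in the data: the instrument should predict the treatment in the first stage. The minimal base-R sketch below uses simulated data with a hypothetical first-stage coefficient of 0.5 and compares the first stage with and without the instrument using an F-test. A common rule of thumb treats a first-stage F-statistic below about 10 as a sign of a weak instrument.

```r
# Hypothetical data: does the instrument shift the treatment?
set.seed(7)
n <- 2000
z <- rbinom(n, 1, 0.5)
x <- rnorm(n)
d <- 0.5 * z + 0.3 * x + rnorm(n)

# First stage with and without the instrument
restricted <- lm(d ~ x)
full <- lm(d ~ z + x)
f_test <- anova(restricted, full)
f_stat <- f_test$F[2]                # first row of the anova table is NA

# With a single instrument, F equals the squared t-statistic on z
t_stat <- summary(full)$coefficients["z", "t value"]
round(c(F = f_stat, t_squared = t_stat^2), 2)
```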


Before starting this seminar

  1. Create a folder called “lab6”

  2. Download the data (using the download link on the course page), or read the csv file directly from GitHub as shown below:

  3. Open an R script (or R Markdown file) and save it in your “lab6” folder.

  4. Set your working directory using the setwd() function or by clicking on “More” in the RStudio Files pane. For example: setwd("~/Desktop/Causal Inference/2022/Lab6")

  5. Let’s install and load the packages that we will be using in this lab:

library(stargazer) # generate formatted regression tables
library(texreg) # generate formatted regression tables
library(tidyverse) # to conduct some tidy operations
library(plm) # conduct one-way and two-way fixed effects
library(estimatr) # OLS and IV (iv_robust()) with robust standard errors
library(lmtest) # coeftest() for inference with a custom variance-covariance matrix
library(multiwayvcov) # cluster-robust variance-covariance matrices
library(ivpack) # robust and clustered SEs for IV models
library(ivreg) # 2SLS estimation with ivreg()
library(modelsummary) # generate formatted regression tables
library(fixest) # fast fixed-effects estimation, including IV with feols()

Seminar Overview

In this seminar, we will cover the following topics:
1. Manually estimate the treatment effect using an instrumental variable and the lm() function
2. Run an IV regression using ivreg(), iv_robust() and feols()
3. Present the output of 2SLS regressions
4. Manually calculate the Wald estimator
5. Use placebo tests to support the validity of the IV design
6. Check for weak instruments


Does Choice Bring Loyalty?

Today we will work with data from Elias Dinas’ paper “Does Choice Bring Loyalty?”. In this paper, the author seeks to understand the foundations of partisan strength. There is a general debate in the literature over party identification (PID). Some scholars claim that party identification strengthens with age. Others, including the author, suggest that voting for a party brings about loyalty and strengthens political attachment. A straightforward but naive empirical strategy would be to estimate the effect of having voted in one election on the strength of party identification a few years later. However, PID and the decision to vote are predicted by many of the same confounders, so such a research design would inevitably face the problem of unobserved covariates.

To address the research question without relying on such a naive design, the author takes advantage of a comprehensive panel dataset. The original data include four waves (1965, 1973, 1982 and 1997), of which the author uses two (1965 and 1973). Dinas then makes smart use of the timing of elections and the characteristics of the panel participants: elections took place in 1968 and in 1972. To isolate the effect of voting, Dinas exploits the age of respondents. Importantly, respondents who were born in 1947 (76% of the sample) share a very important characteristic. What is it? They turned 21, which was the voting age at the time, in 1968. Those who turned 21 before election day were able to vote in 1968; those who did not could only vote for the first time in 1972. This allows the author to exploit respondents’ birthdays, which are as good as random, to causally estimate the effect of voting on the strength of party identification. Obviously, however, not everyone who was eligible to vote in 1968 did vote, which is why eligibility serves as an instrument for voting rather than as the treatment itself.
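The eligibility rule can be sketched in R. The birth dates and turnout values below are hypothetical, and the lab data already contain the coded instrument eligible68, so this only illustrates the logic. It assumes that respondents who had turned 21 on or before election day (5 November 1968) were eligible; the coding used in the paper may differ.

```r
# Hypothetical birth dates of respondents born in 1947
birth <- as.Date(c("1947-01-15", "1947-06-30", "1947-11-05",
                   "1947-11-06", "1947-12-20"))

# Assumption: born on or before 5 November 1947 means 21 by election day 1968
cutoff <- as.Date("1947-11-05")
eligible68_sketch <- as.integer(birth <= cutoff)
eligible68_sketch

# Hypothetical turnout: not every eligible respondent voted
voted68_sketch <- c(1, 0, 1, 0, 0)
share_voted <- mean(voted68_sketch[eligible68_sketch == 1])
share_voted
```

In this toy example, no ineligible respondent votes and not every eligible respondent does, so eligibility shifts voting without fully determining it.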



Besides various covariates, we will be using the following key variables:

Variable Description
eligible68 Dummy for eligibility to vote in 1968 election (Instrument).
voted68 Dummy indicating whether participant voted in 1968 election (Treatment)
strngpid73 Strength of party identification in 1973 on an ordinal scale (Outcome)
knowledge65 Political knowledge in 1965
strngpid65 Strength of party identification in 1965
elig2false Dummy variable for placebo tests: 0 for young eligible and 1 for old eligible participants
v7 Numerical code for school (which we’ll use for clustering SEs)


Now let’s load the data. There are two ways to do this:

You can load the dinas dataset from your laptop using the read.csv() function.

# Set your working directory (adjust the path to your own "lab6" folder)
# setwd("~/Desktop/Causal Inference/2022/Lab6")

# Read the csv file saved in the working directory
dinas <- read.csv("dinas.csv")

Alternatively, you can download the data from the course website, or pass the following URL directly to read.csv(): https://dpir-ci.github.io/CI22/data/dinas.csv.


Exercise 1: Use the head() function to familiarise yourself with the data set.

head(dinas)
    v1     v2      v3      v4 v5  v6   v7 v8        v9 v10 v11    m12_1 m12_2
1 7779 Feb-91 student student  2 421 3131  9 524750426  90   7       82    11
2 7779 Feb-91 student student  3 422 3131  9 524750426  65   7       91    19
3 7779 Feb-91 student student  5 424 3131 10 524750426  75   7       84      
4 7779 Feb-91 student student  6 425 3131 10 524750426  70   7 social s    11
5 7779 Feb-91 student student  8 427 3131 10 524750426  80   7       85    94
6 7779 Feb-91 student student 10 725 5371  9 987875681 100  13       91      
  m12_3 m13_1 m13_2 m13_3      v14    m15_1    m15_2 m15_3    m16_1    m16_2
1    20    83    NA    NA liked it                                          
2          83    NA    NA liked it better -                homework         
3          83    NA    NA liked it learning                to do it some -th
4          85    NA    NA liked it  feeling                homework         
5          82    NA    NA liked it being wi like -so                        
6          83    NA    NA liked it learning learning                        
    m16_3 v17      v18    m19_1    m19_2 m19_3    m20_1    m20_2    m20_3
1          no   better  teenage  changed       get good understa         
2          no   better adults d                athlete,                  
3 dislike  no about th teenager teens mo       don-t be going to         
4          no about th judge al                 parties don-t be good per
5          no   better  changed                 parties dating,g get good
6          no   better authorit                athlete, good com         
     m20_4    m21_1    m21_2    m21_3 m21_4      v22      v23      v24     v25
1          insincer                          several          no,not m        
2          very sma not goin isn-t at       no leadi                          
3           extreme those no too good        several          yes,memb crowd 1
4 being fr dull - p                         no leadi                          
5    other very sma positive                one main yes,memb                 
6          don-t be  doesn-t not intr        several          yes,memb crowd 1
  m26_1 m26_2 m26_3 m27_1 m27_2 m28_1 m28_2 m29_1 m29_2 m30_1 m30_2      v31
1    NA    NA    NA     5    NA    55    60    NA    NA    62    NA some tre
2    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA some tre
3    NA    NA    NA    19    NA    12     1    51    NA    18    NA some tre
4    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA treat ev
5     5    17    51    NA    NA    NA    NA    NA    NA    NA    NA some tre
6    NA    NA    NA    58    NA     4    NA    60    NA    62    NA treat ev
  v32      v33      v34 v35      v36 v37      v38      v39    m40_1    m40_2
1  no                    no a good d  no no,extra yes,most shows in obligati
2  no                    no a good d  no no,extra yes,most voting -         
3 yes yes,help no,didn-  no a good d yes yes,extr yes,most voting i         
4  no                    no     some  no yes,extr yes,most to elect voting d
5  no                    no a good d  no no,extra yes,most choose p         
6  no                    no a good d yes no,extra yes,most to choos         
       v41      v42 v43 v44 v45 v46 v47 v48 v49 v50      v51 m52_1 m52_2 m52_3
1 yes, bot     both      no  no  no  no       4   4 6 semest    21    NA    NA
2       no           no yes yes  no  no       2   2 4 semest    11    71    NA
3       no          yes  no  no  no  no       2   2 4 semest    22    24    21
4 yes,outs successf     yes  no  no  no       2   2 2 semest    71    85    NA
5       no          yes yes yes  no  no       4   4 6 semest    14    22    NA
6       no          yes  no  no yes  no       3   3 6 semest    11    43    NA
  m52_4 m53_1 m53_2 m53_3 m54_1 m54_2 m54_3 m54_4      v55      v56      v57
1    NA    42    32    NA    61    20    23    40 disagree    agree    agree
2    NA    70    NA    NA    22    13    16    NA disagree    agree    agree
3    NA    34    NA    NA    23    14    29    43 disagree depends,    agree
4    NA    32    NA    NA    42    19    21    NA disagree    agree    agree
5    NA    30    NA    NA    22    20    30    10 disagree    agree    agree
6    NA    40    49    NA    14    NA    NA    NA    agree    agree disagree
       v58      v59 v60 m61_1 m61_2 m61_3      v62      v63 m64_1 m64_2
1 disagree disagree yes    20    11    60     some  yes,2 -   342   212
2 disagree disagree yes    30    11    20 a good d  yes,2 -   342    NA
3 disagree    agree yes    11    82    NA     some yes,almo   342   212
4    agree depends,  no    NA    NA    NA           yes,2 -   343    NA
5 disagree    agree yes    11    20    82     some  yes,few   343    NA
6    agree disagree yes    30    11    NA a good d yes,almo    17    NA
       v65      v66      v67 v68     v69      v70      v71 v72      v73
1       no                       yes,2 - other ki by mysel     yes,read
2 yes,almo mainly n by mysel     yes,3 - other ki with fam  no yes,read
3 yes,almo mainly n by mysel     yes,2 - mainly n with fam yes yes,read
4 yes,almo other ki by mysel          no                       no,do no
5  yes,2 - mainly n by mysel          no                       yes,read
6 yes,almo mainly n with fam yes yes,2 - mainly n with fam yes yes,read
     m74_1    m74_2    m74_3 m74_4      v75      v76      v77      v78    m79_1
1     time newsweek saturday       newspape yes,seve yes,seve  yes,few civil ri
2     life     look    other          radio yes,seve  yes,few yes,once r only s
3     life                            radio yes,seve yes,seve  yes,few civil ri
4                                     radio       no       no       no         
5     look     look saturday       magazine  yes,few  yes,few       no cuba.  c
6 newsweek                         televisi  yes,few yes,seve       no  nuclear
     m79_2    m79_3      v80    m81_1    m81_2      v82    m83_1 m83_2      v84
1 medicare congress national well-qua more acc state go oth resp       strong d
2                   national know mor well-qua local go know les       not very
3 space. s          national    other                                  strong d
4                   national well-qua  approve local go disappro       not very
5 viet nam civil ri national know mor                                  not very
6 demonstr civil ri state go efficien          local go poorly-q       yes, dem
  m85_1 m85_2 m85_3 m85_4      v86 m87_1 m87_2 m87_3 m87_4      v88      v89
1   709   705   206    NA reps lot    NA    NA   205    NA  johnson democrat
2   402    NA   305    NA reps lit    NA    NA   805   819  johnson about ha
3    NA    NA   119   719             NA    NA    NA    NA  johnson democrat
4    NA    NA    NA    NA reps lit    NA    NA   200    NA  johnson republic
5    NA    NA    NA    NA dems lit    NA    NA    NA    NA goldwate democrat
6   606    NA   606    NA reps mor    NA    NA   200    NA  johnson democrat
       v90      v91      v92      v93      v94      v95      v96 v97      v98
1 good dea not very not much  most of know wha for bene 1 pty al yes yes, bot
2 good dea not very     some  most of don-t kn oth,depe  control  no yes, bot
3 good dea not very     some about al know wha for bene  control     yes, bot
4 not much not very          about al know wha for bene  control     yes, bot
5 not much not very     some  most of don-t kn  few big  control yes yes, bot
6 good dea hardly a     some  some of don-t kn for bene 1 pty al yes yes, bot
       v99     v100     v101     v102     v103     v104     v105     v106
1 yes,live          mother m each par each par pretty m not so w no -skip
2 yes,live          father m  parents each par pretty m about av no -skip
3 yes,live           parents each par each par pretty m extremel      yes
4 yes,live          father m father m each par pretty m extremel no -skip
5 no,doesn mother s mother m mother m each par pretty m about av no -skip
6 yes,live           parents father m  parents disagree extremel      yes
      v107   m108_1 m108_2     v109   m110_1   m110_2     v111     v112
1                          about sa                   pretty c somewhat
2                          better - more ind          pretty c very muc
3 disagree  further        worse -i more mat oth refe very clo very muc
4                          worse -i decrease          very clo very muc
5                          worse -i oth chan understa pretty c very muc
6 disagree automobi        worse -i increase          very clo somewhat
      v113     v114     v115     v116     v117     v118     v119     v120  v121
1 strong d voted fo very clo somewhat strong d voted fo much inf better n  some
2 strong r voted fo pretty c very muc not very voted fo much inf feel fre      
3 strong d voted fo very clo somewhat strong d voted fo much inf feel fre a lot
4 strong r voted fo pretty c somewhat strong d voted fo much inf feel fre  some
5 not very voted fo very clo very muc strong r voted fo much inf feel fre a lot
6 strong d voted fo very clo somewhat not very voted fo some inf feel fre  some
     v122     v123     v124 v125     v126 v127 v128 v129 v130 v131 v132 v133
1   never about av about ri   no            70   30   99   30   99   85   99
2 once in a lot to about ri   no            30   85   50   85   40   85   99
3 once in pretty m about ri  yes parent d   85   85   85   70   85   85   85
4 once in pretty m about ri   no            50   40   50   60   50   50   50
5 once in about av about ri   no            50   85   70   70   70   85   99
6 once in pretty m about ri   no            40   60   50   85   50   70   50
  v134   v135   v136 v137 v138   v139 v140    v141     v142     v143     v144
1   85 #NAME? #NAME?   no      #NAME?  yes most of internat national local af
2   40 #NAME? #NAME?   no      #NAME?  yes some of internat national state af
3   85 #NAME? #NAME?  yes      #NAME?  yes most of national internat state af
4   50        #NAME?   no      #NAME?  yes some of national local af state af
5   40 #NAME? #NAME?  yes      #NAME?  yes some of internat local af national
6   85 #NAME? #NAME?       yes #NAME?  yes some of internat national local af
      v145     v146     v147     v148     v149     v150     v151     v152
1 state af very act pretty s mostly g to chang often gi strong o depends,
2 local af somewhat pretty s depends, depends, depends, middle o  hard to
3 local af somewhat pretty s mostly g depends, depends, strong o  hard to
4 internat somewhat pretty s mostly g things w depends, middle o depends,
5 state af somewhat pretty s mostly g things w often gi strong o  hard to
6 state af somewhat pretty s mostly g things w often gi middle o  hard to
      v153     v154     v155 v156     v157     v158     v159    v160     v161
1 most peo try to b would tr  six yugoslav        9  correct germany democrat
2 most peo other. d would tr  six yugoslav        9  correct germany democrat
3 most peo try to b would tr four yugoslav        9  correct germany republic
4 most peo just loo would tr four yugoslav       10  correct germany democrat
5 most peo just loo would tr four don-t kn don-t kn don-t kn germany don-t kn
6 most peo try to b would tr  six  any oth        9  correct germany democrat
     v162     v163     v164     v165     v166 v167     v168     v169     v170
1 plan to 4 - 5 yr private, coeducat 500 - 99  716 4 -or fi private, coeducat
2 plan to 4 - 5 yr private, coeducat 2000 - 3   85 4 -or fi private, coeducat
3 plan to 4 - 5 yr  public, coeducat 10,000 -  232 4 -or fi  public, coeducat
4 plan to 4 - 5 yr private, coeducat 500 - 99  786 4 -or fi private, coeducat
5 plan to 4 - 5 yr private, coeducat 1000 - 1  213 4 -or fi private, coeducat
6 plan to                                       NA 4 -or fi  public, coeducat
      v171 v172    v173   m174_1   m174_2   m174_3 m174_4 v175 v176 v177
1 500 - 99  716 2 named  parents scholars work ear   loan    9   72    3
2 2000 - 3   85 2 named  parents                             8   62   28
3 10,000 -  232 2 named  parents work ear                    8   65    9
4 500 - 99  786 2 named  parents                             4   19   NA
5 1000 - 1  213 2 named  parents                             8   64    8
6 1000 - 1   22         work ear  parents                    8   65   18
      v178 v179     v180 v181     v182     v183 v184     v185     v186     v187
1  college    a        8  211 worked f all from   no no,don-t                  
2  college    b        2  150      240 part fro  yes yes,less    never no - cod
3  college    b        5   50 worked f part fro  yes yes,less    twice no - cod
4  college    b work for  150 worked f all from  yes yes,abou    never no - cod
5  college    c work for   75 didn-t w part fro  yes yes,twic once - n no - cod
6 commerci    b don-t wo   NA       26 all from   no no,don-t                  
  v188     v189   v190   v191   v192   v193   v194   v195   v196     v197
1   16 yes,seve #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? #NAME?         
2   11       no #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? #NAME?         
3   20       no #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? politica
4   10       no #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? #NAME?         
5   16       no #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? sports-r
6    4       no #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? #NAME?         
    v198     v199     v200     v201     v202 v203 v204 v205 v206     v207
1        no,but c   jewish few time bible wr    9   68   28   15 some col
2        no,but c protesta few time bible go    9   76   28   17 bachelor
3 #NAME? no,but c   jewish few time bible wr    9   84   28   19 bachelor
4        no,but c methodis almost e bible wr    9   84    5   19 bachelor
5 #NAME? no,but c presbyte almost e bible wr    8   59   23   13 bachelor
6        no,can-t  baptist almost e bible go    1    9   68   12 4 grades
      v208     v209 v210 v211 v212 v213 v214 v215 v216 v217 v218 v219     v220
1 12 grade        3    7    1   NA   NA   NA   15   NA   NA   NA   NA      may
2 some col no broth   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA    april
3 h.s. -na        2    5   NA   NA   NA   NA   14   NA   NA   NA   NA      may
4 12 grade        1   20   NA   NA   NA   NA   NA   NA   NA   NA   NA december
5 12 grade        1   NA   NA   NA   NA   NA   19   NA   NA   NA   NA    april
6 3 grades        5   15   12   11   NA   NA   13   10   NA   NA   NA     june
  v221 v222 v223 v224 v225     v226 v227     v228     v229    v230   v231  v232
1    4 1947  155  151   13            52 suburbs, suburban yes,r-s female white
2   29 1947  113  152                                      yes,r-s   male white
3    5 1947  114  114   14            14                   yes,r-s female white
4   NA 1947  152  152             9                        yes,r-s   male white
5    3 1947  316  152      less tha                        yes,r-s female white
6    5 1947  142  142      all my l                        yes,r-s   male negro
      v233     v234     v235     v236     v237 v238     v239 v240 v241     v242
1 yes,neat yes,good yes,self yes,expr               yes,coop             #NAME?
2 yes,neat          yes,self yes,didn                                  high -i.
3 yes,neat                   yes,expr                                    #NAME?
4                            yes,expr descript      yes,coop             #NAME?
5 yes,neat           yes,not yes,didn descript      yes,coop             #NAME?
6                                                                        #NAME?
      v243     v244     v245     v246 v247     v248     v249 v250     v251 v252
1 no cours high -2. accurate high -di               high per      6 correc   NA
2 no cours low  -0. accurate high -di                             6 correc   NA
3 no cours -2.2.0.0 no diffe med -agr               high per      4 correc   NA
4 no cours -2.2.0.0 accurate low -agr      high sel               4 correc   NA
5 no cours -2.2.0.0 no defin med -agr      high sel               1 correc   NA
6 one cour -2.2.0.0 accurate med -agr      high sel high per      5 correc   NA
  v253 v254 v255     v256 v257     v258 v259     v260 v261 v262     v263  v264
1   NA   70   30 97-100 d   30 97-100 d   85 97-100 d   85    2 mail -st 30032
2   NA   30   85       50   85       40   85 97-100 d   40    3  student  3171
3   NA   85   85       85   70       85   85       85   85    5  student  4087
4   NA   50   40       50   60       50   50       50   50    6  student  3172
5   NA   50   85       70   70       70   85 97-100 d   40    8 mail -st 30041
6   NA   40   60       50   85       50   70       50   85   10  student  3992
  v265    v266 v267 v268 v269     v270     v271 v272     v273 v274     v275
1   NA                NA       vermont 30,000-4    4 female r   NA         
2  313 january   16  110 9873 maryland 30,000-4    3   male r   25 male roo
3  487   april   27  110 4077  florida rural -u   14 female r   25  husband
4  313 january   15  110 9873 maryland 50,000-9    3   male r   25   father
5   NA                NA         texas 10,000-2    5 female r   NA child,se
6  425   april   10  120 6936 arkansas 100,000-    7   male r   25 male roo
  v276     v277 v278     v279 v280 v281 v282 v283  v307    v308     v309
1                                                 a lot some of don-t kn
2   23 male roo   23                              a lot some of know wha
3   27                                            a lot most of know wha
4   58   mother   52 oth male   28                 some some of know wha
5    2                                             some most of know wha
6   22                                            a lot some of don-t kn
      v310     v311 v312     v331     v332     v333     v334    v335     v336
1  few big  yes,2 -  212                                                     
2  few big yes,almo  342                                                     
3  few big  yes,3 -  269                                                     
4 for bene  yes,2 -  343                                                     
5 for bene  yes,2 -  529 govt shl govt shl govt shl govt shl                 
6  few big yes,almo   17          govt shl govt shl govt shl protect stop cri
  v337    v338 v339    v340     v341 v342     v343     v344     v345     v346
1                                         govt shl govt shl                  
2                                                                            
3                                         govt shl govt shl                  
4                                                                            
5                                         mnrty gr                           
6      protect      protect stop cri      govt shl govt shl mnrty gr govt shl
      v347     v348     v349     v350     v351 v352     v353     v354     v355
1                                     make use                                
2                                     make use                                
3                                                                             
4                                                                     set pena
5                                                            make use set pena
6 govt shl govt shl govt shl mnrty gr make use      make use make use         
      v356     v357     v358     v359 v360     v361     v362     v363     v364
1 yes,at l                                                            women me
2 yes,once                                                                    
3 yes,at l bus to a bus to a                                          women me
4 yes, les                                                            women me
5       no keep chi          keep chi                        keep chi         
6 yes, les bus to a bus to a keep chi      bus to a bus to a keep chi  women-s
     v365     v366     v367    v368     v369 v370     v371     v372 v373
1                                                                       
2                                                          court sy     
3                  women me                                people s     
4         women me                  women me               less cen     
5                                                                       
6 women-s women me  women-s women-s  women-s      change f shld spe     
      v374     v375     v376     v377     v378 v379     v380     v381     v382
1                                                   extremel slightly slightly
2                                                    liberal moderate conserva
3                                                    liberal  liberal conserva
4                                                   slightly slightly slightly
5                                     change f      conserva slightly slightly
6 change f no chang change f change f no chang      moderate  liberal slightly
      v383     v384     v385     v386     v387   v388   v389 v390 v391   v392
1  liberal slightly slightly  liberal moderate #NAME? #NAME?           #NAME?
2 slightly moderate conserva  liberal slightly #NAME?         yes      #NAME?
3  liberal  liberal conserva slightly slightly #NAME? #NAME?  yes      #NAME?
4  liberal  liberal moderate  liberal moderate        #NAME?   no      #NAME?
5 slightly extremel slightly  liberal conserva #NAME? #NAME?           #NAME?
6 conserva  liberal conserva moderate slightly #NAME? #NAME?        no #NAME?
     v393 v394     v395     v396 v397     v398     v399     v400  v401     v402
1 no,shld   no                    yes less fai began to          agree disagree
2 no,shld   no could ha lost too  yes became m less fai    agree agree    agree
3 no,shld   no could ha           yes became m ina.,cod disagree agree    agree
4 no,shld   no shld onl           yes became o ina.,cod disagree agree disagree
5 yes,did   no                    yes          ina.,cod          agree disagree
6 no,shld   no shld bom sent few  yes soldiers ina.,cod    agree agree    agree
   v403     v404     v405     v406     v407     v408     v409     v410 v411
1 agree disagree disagree disagree    agree    agree disagree    agree   NA
2 agree    agree disagree disagree disagree    agree disagree    agree  269
3 agree disagree    agree disagree disagree    agree disagree disagree  342
4 agree disagree    agree disagree    agree    agree disagree    agree  219
5 agree disagree    agree disagree disagree    agree disagree    agree   NA
6 agree    agree disagree disagree disagree disagree disagree disagree  311
  v412 v413     v414     v415     v416     v417     v418     v419    v420
1   NA   NA                            pretty s mostly g things w        
2   NA   NA                            pretty s mostly g things w usually
3  244   NA                            pretty s mostly g things w have to
4   NA   NA                            pretty s mostly g things w have to
5   NA   NA best pos best pos best pos sometime bad luck to chang        
6   NA   NA          best pos          sometime bad luck to chang have to
     v421     v422     v423     v424     v425     v426     v427     v428
1                  can-t be try to b would tr                           
2 have to  hard to can-t be try to b would ta       40 50 degre 50 degre
3 have to  hard to can-t be just loo would ta       70       70       60
4 have to  hard to can-t be try to b would tr       30 50 degre 50 degre
5                  can-t be try to b would tr                           
6 have to change m can-t be just loo would ta 50 degre       40 50 degre
      v429     v430     v431     v432     v433     v434    v435     v436
1                                                                       
2       85       40       30       60 50 degre       30      15 50 degre
3 50 degre       40       40       85       70       40      30       70
4       60       40 50 degre 50 degre 97-100 d 50 degre      15 50 degre
5                                                                       
6 97-100 d 50 degre  <actual 50 degre  <actual       85 <actual 50 degre
      v437     v438     v439     v440     v441    v442     v443    v444 v445
1                                                                           
2 50 degre 50 degre 50 degre       40 50 degre      15       70 <actual   85
3       70 50 degre 50 degre       40       40      30       70      30   70
4 50 degre 50 degre 50 degre 50 degre 50 degre      40 50 degre      35   40
5                                                                           
6 50 degre 50 degre       60  <actual       60 <actual       15 <actual   70
      v446 v447    v448     v449 v450     v451     v452     v453     v454
1                                                                        
2       60   30 <actual       60   40       70 too much too litt just abo
3       60   70      40 50 degre   30       70 just abo too litt too litt
4 50 degre   60      40       85   40 50 degre too much too litt just abo
5                                                                        
6 97-100 d   85      15            85 97-100 d too much too litt just abo
      v455     v456     v457     v458     v459     v460     v461     v462
1                                                                        
2 too much just abo too much too much too much just abo too litt too much
3 too much just abo too much too much too much too litt too litt just abo
4 just abo too litt just abo too litt too much just abo too litt just abo
5                                                                        
6 too litt too litt too litt too much too much too litt just abo too litt
      v463     v464     v465     v466     v467     v468     v469     v470
1                                                                        
2 just abo too litt just abo just abo too much just abo just abo too much
3 just abo too litt too litt just abo too much too litt just abo too much
4 too litt too litt too litt too much just abo just abo too much too litt
5                                                                        
6 just abo just abo too litt just abo too much too litt too litt too litt
      v471     v472 v473     v474 v475    v476    v477     v478     v479
1                                                                       
2 just abo just abo  six yugoslav    9 correct germany democrat democrat
3 too litt too litt four yugoslav    9 correct germany democrat democrat
4 just abo just abo  six yugoslav    9 correct germany democrat independ
5                                                                       
6 too much too litt  six yugoslav    9 correct germany republic independ
      v480     v481     v482     v483     v484 v485 v486     v487 v488     v489
1                   oth, min                     NA                            
2 not very          weak dem      yes dem; rep   70                    democrat
3   strong          strong d no, neve            NA                    democrat
4          democrat ind-demo                     NA  yes            72 democrat
5                   ind-inde                     NA                            
6           neither ind-inde                     NA      yes, dem   71 not sure
  v490 v491 v492 v493 v494 v495     v496 v497 v498 v499 v500     v501    v502
1                       NA  yes reps mor   NA   NA        NA lot more   voted
2  yes            200   NA  yes reps mor   NA   NA  201  305 lot more   voted
3  yes            400   NA  yes reps mor   NA   NA  201   NA lot more   voted
4   no                  NA  yes reps mor  104  705  811   NA little m   voted
5                       NA  yes reps mor   NA   NA        NA lot more   voted
6  yes  953             NA  yes reps mor  106   NA  206   NA little m did not
      v503     v504 v505     v506     v507 v508     v509 v510 v511     v512
1 voted fo                                                         yes, vot
2 voted fo           yes democrat differen      mostly d  yes      yes, vot
3 voted fo           yes          differen      mostly d  yes      yes, vot
4 voted fo                                                yes       not old
5 voted fo                                                         no, didn
6 would vo not regi                                        no       not old
      v513  v514 v515 v516 v517 v518 v519     v520 v521     v522 v523 v524 v525
1 humphrey        yes 1968  102 1972  101           yes                      NA
2    nixon        yes 1972  101           all 3 ch   no                      NA
3 humphrey        yes 1968  102           all 3 ch  yes 1970-197  131        NA
4          nixon   no                                no                      NA
5                 yes 1972                           no                      NA
6                  no                                no                      NA
  v526 v527 v528 v529 v530 v531 v532 v533 v534 v535 v536 v537 v538 v539 v540
1  yes                  NA  yes                       no        NA        NA
2   no                  NA  yes 1972  101             no        NA        NA
3   no                  NA   no                       no        NA        NA
4   no                  NA   no                      yes 1972  170 1972  171
5   no                  NA  yes                       no        NA        NA
6   no                  NA   no                       no        NA        NA
      v541 v542 v543 v544 v545 v546 v547     v548 v549 v550 v551 v552     v553
1      yes 1969           1972           no, neve                  NA      yes
2 no, neve                               no, neve                  NA no, neve
3      yes 1972  445                     no, neve                  NA no, neve
4 no, neve                               no, neve                  NA no, neve
5 no, neve                               no, neve                  NA no, neve
6 no, neve                               no, neve                  NA      yes
  v554     v555 v556     v557     v558     v559     v560     v561  v562
1 1967 peace;an 1969 peace;an      yes     1965  private     1967      
2                             no, neve                                 
3                             no, neve                                 
4                             no, neve                                 
5                                  yes 1972-197 ecology, 1972-197 other
6 1965 pro-civi               no, neve                                 
      v563 v564 v565 v566 v567 v568     v569     v570     v571     v572
1 never ma                  NA                                         
2 never ma                  NA                                         
3  married    3   no        NA      democrat strong d yes, vot mcgovern
4 never ma                  NA                                         
5  married    4             NA                                         
6 never ma                  NA                                         
      v573     v574 v575 v576 v577 v578 v579 v580 v581 v582     v583     v584
1                     NA   NA        NA   NA             NA                  
2                     NA   NA        NA   NA             NA somewhat independ
3 yes,pret yes,occa   80   NA  yes   80   NA             NA somewhat democrat
4                     NA   NA        NA   NA             NA somewhat democrat
5                     NA   NA        NA   NA             NA                  
6                     NA   NA        NA   NA             NA somewhat independ
      v585     v586     v587     v588     v589     v590     v591     v592
1 not very                                              strong d         
2 yes, rep                   pretty c somewhat independ yes, rep         
3 strong d yes, she mcgovern pretty c very muc democrat strong r 1 yes, h
4 not very yes, she    nixon pretty c somewhat republic strong r 1 yes, h
5 no, neit                                              no, neit         
6 yes, dem                   pretty c somewhat independ yes, dem         
      v593     v594 v595 v596 v597     v598     v599 v600     v601 v602 v603
1                          NA   NA                          better   NA   NA
2          pretty c   no   NA   NA     same               about th   NA   NA
3 mcgovern pretty c  yes   31   NA worse -i increase        better   54   NA
4    nixon very clo  yes   44   NA     same               about th   NA   NA
5                          NA   NA                           worse   NA   NA
6          pretty c   no   NA   NA better -  members      about th   NA   NA
  v604     v605 v606 v607     v608 v609 v610 v611    v612    v613 v614     v615
1   NA                                         NA                   no         
2   NA r not li 1200      2 - 3 ti             NA                  yes     army
3   NA r not li 1000      2 - 3 ti       350 1000 once yr once yr   no         
4   NA r living                                NA                  yes air forc
5   NA                                         NA                   no         
6   NA r not li   80      once a m             NA                  yes     army
  v616 v617 v618     v619 v620 v621     v622 v623 v624 v625    v626 v627
1   NA   NA                                                 lived 7   NA
2   69   71  yes            no      very dis   no           lived 5   47
3   NA   NA                                                 lived 2   43
4   70   73   no definite   no      very dis   no           lived 3   52
5   NA   NA                                                 lived 6   46
6   67   71  yes           yes   18 somewhat   no           lived 7   71
      v628     v629 v630     v631     v632 v633     v634     v635 v636     v637
1 self rep four yea   71 self rep one year   12 other sm two year   NA         
2 other sm four yea   52 self rep one year   47 other sm one year   49 other sm
3 other sm five yea   43 non-smsa one year   NA                     NA         
4 non-smsa four yea   43 other sm one year   49 other sm one year   NA         
5 non-smsa one year   43 other sm one year   49 non-smsa one year   47 non-smsa
6 self rep two year   NA                     49                     61         
      v638 v639     v640     v641     v642 v643     v644 v645     v646 v647
1 one year   NA          one year  staying   NA           yes college,     
2 three ye   NA          one year  staying   NA           yes college,     
3            NA                    staying   NA           yes college,     
4            NA                    staying   NA           yes college,     
5 one year   52 self-rep one year thinking   NA           yes college,     
6            40                   thinking   21 self-rep   no              
      v648    v649    v650 v651     v652     v653 v654 v655     v656     v657
1 r attend      NA      NA  yes bachelor                 NA                  
2 r attend 5052518  142316  yes bachelor master-s        NA      six no,r not
3 r attend 2322518      NA  yes bachelor                 NA 4 -throu no,r not
4 r attend 7862112 6782515  yes bachelor                 NA     five yes,r co
5 r attend      NA      NA   no                          NA                  
6 r did no      NA      NA                               NA                  
      v658     v659    v660     v661 v662 v663 v664 v665 v666    v667     v668
1                                                                             
2 business no speci b -+,-- dormitor    1    4    4    2    1 parents         
3 sociolog english; c -+,-- apartmen none none none    1   12 parents         
4 psycholo no speci b -+,-- dormitor none none    4   30    2 parents fellowsh
5                                                                             
6                                                                             
  v669     v670 v671     v672 v673 v674    v675     v676    v677 v678 v679
1                               NA                            NA   NA     
2   no               somewhat   NA                            NA   NA     
3  yes my moral      somewhat   12  yes college spouse d 2322518   NA  yes
4   no               very sat   NA                            NA   NA     
5                               12  yes college spouse d      NA   NA  yes
6                               NA                            NA   NA     
      v680 v681 v682     v683     v684 v685 v686 v687 v688    v689 v690 v691
1                              wkg now   NA        58   18                NA
2                              wkg now   NA        18   16 someone        NA
3 bachelor           4 -throu  wkg now   NA        48   19 someone        NA
4                              wkg now   NA        58   19 someone        NA
5 bachelor                    housewif   NA        NA   NA                NA
6                              wkg now   NA       239   13 someone        NA
  v692     v693 v694 v695 v696 v697 v698 v699 v700    v701     v702    v703
1                                NA   NA        NA         r has no        
2   48 one year  yes             NA   NA        NA         r has no        
3   40        3                  NA   NA        NA         r has sp working
4   40 one year   no             NA   NA        NA student r has no        
5                                NA   NA        NA         r has sp working
6   40 one year   no             NA   NA        NA         r has no        
  v704 v705 v706    v707 v708 v709 v710 v711 v712 v713 v714 v715 v716 v717 v718
1   NA        NA                NA   NA                       NA             NA
2   NA        NA                NA   NA                       NA             NA
3   NA        13 someone        NA   40    5                  NA             NA
4   NA        NA                NA   NA                       NA             19
5   NA        64                NA   NA                       NA             NA
6   NA        NA                NA   NA                       NA             NA
      v719 v720 v721 v722 v723 v724 v725 v726 v727 v728 v729     v730     v731
1            NA        NA        NA        NA                                 
2            NA        NA        NA        NA        no      not a me not a me
3            NA        NA        NA        NA        no      fairly a not a me
4 one year   NA        NA        NA        NA        no      not a me not a me
5            NA        NA        NA        NA                                 
6            NA        NA        NA        NA        no      not a me not a me
      v732     v733     v734     v735     v736     v737     v738    v739
1                                                                       
2 not a me fairly a not a me not a me not a me not a me not a me belongs
3 not a me not a me not a me not a me not a me not a me not a me belongs
4 not a me not a me not a me not a me not a me not a me not a me belongs
5                                                                       
6 not a me not a me not a me not a me not a me not a me not a me belongs
     v740     v741 v742 v743     v744     v745     v746 v747     v748     v749
1                                                              jewish    never
2 country fairly a            $25,000 $7,000 t rent -or      no prefe         
3                             $15,000 $7,000 t  own -or        jewish few time
4                             $35,000  $10,000 living h  one methodis few time
5                                                            presbyte few time
6                            $3,000 t $3,000 t rent -or      oth prot every we
      v750     v751   v752  v753     v754     v755     v756     v757     v758
1 bible go          female white high -di                   high sel         
2 bible go no, midd   male white med -agr low -agr          high sel         
3          yes,midd female white low -agr high -di          high sel low pers
4 bible wr yes,midd   male white med -agr high -di          high sel         
5 bible wr          female white med -agr                   low self         
6 bible go  yes,wkg   male black med -agr low -agr low opin low self low pers
      v759     v760     v761     v762 v763 v764 v765 v766     v767 v768 v769
1 low poli            #NAME?            NA   NA           bachelor        99
2          6 correc most cos broad un   NA   NA           bachelor   40   50
3          5 correc   #NAME? broad un   NA   NA           bachelor   70   70
4          6 correc   #NAME? broad un   NA   NA           bachelor   30   50
5                     #NAME?            NA   NA           some col        99
6 low poli 5 correc   #NAME? broad un   NA   NA           no colle   50   40
  v770 v771 v772 v773 v774 v775 eligible68 pid65 pid73 nixon72 mcgovern repelig
1                                        1     0    NA       0        1       0
2   50   40   50   50   50   50          1     5     1       0        1       0
3   60   40   70   70   50   50          1     0     0       0        1       0
4   50   40   50   50   50   50          0     5     2       1        0       0
5                                        1     5     3       1        0       1
6   50   50   50   50   50   60          1     2     3       0        0       0
  demelig black strngpid65 strngpid73 rep65 voted68 dem65 dem73 rep73 polintf
1       1     0          3         NA     0       1     1    NA    NA       1
2       1     0          2          2     1       1     0     1     0       0
3       1     0          3          3     0       1     1     1     0       0
4       0     0          2          1     1       0     0     1     0       0
5       0     0          2          0     1       0     0     0     0       0
6       0     1          1          0     0       0     1     0     0       1
  polintm pidm pidf col1 col2 col3 col4 gn1 vtf1 vtm1 plf1 plf2 plf3 plm1 plm2
1       1    0    0    0    1    0    0   0    1    1    0    1    0    0    1
2       0    5    6    0    1    0    0   1    1    1    1    0    0    1    0
3       1    0    0    0    1    0    0   0    1    1    1    0    0    0    1
4       1    0    6    0    1    0    0   1    0    1    1    0    0    0    1
5       0    6    5    0    1    0    0   0    0    1    1    0    0    1    0
6       1    1    0    0    0    1    0   1    1    1    0    1    0    0    1
  plm3 pm1 pm2 pm3 pm4 pm5 pm6 pf1 pf2 pf3 pf4 pf5 pf6 strngonly73 elig2false
1    0   1   0   0   0   0   0   1   0   0   0   0   0          NA          1
2    0   0   0   0   0   0   1   0   0   0   0   0   0           0          1
3    0   1   0   0   0   0   0   1   0   0   0   0   0           1          1
4    0   1   0   0   0   0   0   0   0   0   0   0   0           0         NA
5    0   0   0   0   0   0   0   0   0   0   0   0   1           0          1
6    0   0   1   0   0   0   0   1   0   0   0   0   0           0          0
  newnixon68 newhumphrey68 newrepelig68 newdemelig68 eldem68 elrep68 hum68dem65
1          0             1            0            1       1       0          1
2          1             0            1            0       0       1          0
3          0             1            0            1       1       0          1
4          1             0            0            0       0       0          0
5          0             0            0            0       0       1          0
6          0             0            0            0       1       0          0
  hum68rep65 newhum68dem65 newhum68rep65 newnx68dem65 newnx68rep65 nx68dem65
1          0             1             0            0            0         0
2          0             0             0            0            1         0
3          0             1             0            0            0         0
4          0             0             0            0            0         0
5          0             0             0            0            0         0
6          0             0             0            0            0         0
  nx68rep65 consistent incons consel inconsel elhumix72 newnixix72 newhumix72
1         0          1      0      1        0         0          0          0
2         1          1      0      1        0         0          0          0
3         0          1      0      1        0         0          0          0
4         1          1      0      0        0         0          1          0
5         0          0      0      0        0         1          0          0
6         0          0      0      0        0         0          0          0
  newrepelig68nix72 newdemelig68nix72 elhummc72 newnixmc72 newhummc72
1                 0                 0         1          0          1
2                 0                 0         1          1          0
3                 0                 0         1          0          1
4                 0                 0         0          0          0
5                 0                 0         0          0          0
6                 0                 0         0          0          0
  newrepelig68mc72 newdemelig68mc72     v314     v323 knowledge65     instr
1                0                1  yes,2 -  most of           7 0.5987159
2                1                0  yes,2 -  most of           7 0.5987159
3                0                1 yes,almo  some of           5 0.5987159
4                0                0       no  some of           5 0.1592357
5                0                0  yes,2 - only now           2 0.5987159
6                0                0  yes,2 -  most of           6 0.5987159

This looks wild. The data set includes many variables, most of which are poorly labeled. For now, let’s focus on the key variables presented above.


Exercise 2: Regress the outcome (strngpid73) on the treatment (voted68) using lm(). Does the OLS provide a causal estimate?

# Naive OLS: regress PID strength in 1973 on turnout in 1968
ols <- lm(strngpid73 ~ voted68, data = dinas)
summary(ols)

Call:
lm(formula = strngpid73 ~ voted68, data = dinas)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.6187 -0.6187  0.3813  0.5741  1.5741 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.42593    0.04897  29.117  < 2e-16 ***
voted68      0.19276    0.06847   2.815  0.00499 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9521 on 772 degrees of freedom
  (16 observations deleted due to missingness)
Multiple R-squared:  0.01016,   Adjusted R-squared:  0.008881 
F-statistic: 7.927 on 1 and 772 DF,  p-value: 0.004995


The naive OLS yields an estimate of 0.19: respondents who voted in the 1968 election have, on average, a party identification strength 0.19 points higher on the PID scale than those who did not. However, this is not a causal estimate. Various factors are likely to affect both the outcome and the treatment, so all the OLS tells us is the size of the bivariate association between these two variables.
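A small simulation can illustrate why confounding matters. The sketch below uses made-up data, not the Dinas panel. An unobserved variable `u` makes units more likely to be treated and also raises the outcome, while the true effect of the treatment is set to zero by assumption. The naive OLS slope nonetheless comes out clearly positive. Controlling for `u`, which we cannot do in practice, removes the bias. All names and parameter values are hypothetical.

```r
# Simulated, hypothetical data for illustration only (not the dinas data set)
set.seed(42)
n <- 20000

u <- rnorm(n)                           # unobserved confounder
d <- as.numeric(u + rnorm(n) > 0)       # treatment: units with high u are more likely to be treated
beta <- 0                               # true causal effect of d on y, set to zero by assumption
y <- beta * d + 0.5 * u + rnorm(n)      # u also raises the outcome

naive    <- lm(y ~ d)        # u is omitted, so its influence is loaded onto d
adjusted <- lm(y ~ d + u)    # only possible if u were observed

round(c(naive = coef(naive)[["d"]], adjusted = coef(adjusted)[["d"]]), 2)
```

The naive slope is positive even though `beta` is zero. This is the same problem the Dinas OLS estimate may suffer from.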

Let’s also look at the visual relationship between these two variables:

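# Using ggplot2
library(ggplot2)

ggplot(dinas, aes(x = voted68, y = strngpid73)) +
  geom_point() +                 # both variables are discrete, so many points overlap
  geom_smooth(method = lm) +     # fitted OLS line with confidence band
  xlab("Voted in 1968") +
  ylab("PID Strength in 1973")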

We can see a small but statistically significant increase in PID strength among those who voted in 1968. The question is whether we can causally attribute this difference to having voted.



IV Regression: 2SLS

We now know that a simple OLS does not provide a causal estimate. Let us now try to estimate the causal effect of voting using an instrumental variable design. Following the author, we use respondents’ eligibility to vote in the 1968 election (eligible68) as the instrument. In other words, we exploit the as-if random timing of respondents’ birthdays, which determines whether they were old enough to vote in 1968. To do so, let’s look at the first and second stage separately.

Exercise 3: Investigate the relationship between treatment (voted68) and instrument (eligible68)

There are several ways to do this. Feel free to pick the option you deem most appropriate.

# Cross-tabulate eligibility (rows) against reported turnout in 1968 (columns)
table(dinas$eligible68, dinas$voted68)
   
      0   1
  0 132  25
  1 250 373


This looks reasonable. Of the 623 respondents who were eligible to vote in 1968, 373 did so and 250 did not. However, 25 respondents reported voting even though they were not eligible. How can that be? Most likely, they reported having voted when they did not. They may have done so intentionally, or they may have misremembered the election, which can easily happen in a panel spanning this many years. It’s a bit annoying, but there’s not much we can do about it.
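The table already contains the first stage. As a quick check, the sketch below recomputes, from the counts above, the share of respondents who report voting among the ineligible and among the eligible. Their difference is ITT_D, the denominator of the LATE formula. The two shares should match the intercept (0.15924) and the intercept plus slope of the first-stage regression that follows. The object names are illustrative.

```r
# Counts copied from the eligibility-by-turnout table above
# (rows: eligible68 = 0/1; columns: voted68 = 0/1)
tab <- matrix(c(132, 25,
                250, 373),
              nrow = 2, byrow = TRUE,
              dimnames = list(eligible68 = c("0", "1"), voted68 = c("0", "1")))

# Share of respondents who report voting, by eligibility status
share_voted <- tab[, "1"] / rowSums(tab)
share_voted

# ITT_D: difference in voting rates between eligible and ineligible respondents
itt_d <- share_voted[["1"]] - share_voted[["0"]]
round(itt_d, 4)   # 0.4395
```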


Let’s now calculate the first stage.

Exercise 4: Regress the treatment on the instrument and extract the predicted values

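Note: Add the argument na.action=na.exclude to your lm() call. Observations with missing values are still dropped when the model is fitted, but predict() then returns NA for them. The vector of predicted values therefore has one entry per row of dinas and can be used in the second stage. You can use predicted_values <- predict(OLS_model) to extract the predicted values.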
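The difference between na.omit (the usual default) and na.exclude can be seen on a toy data frame with a single missing value. The data below are hypothetical.

```r
# Hypothetical toy data: the treatment is missing for the third row
df <- data.frame(z = c(0, 1, 0, 1, 1, 0),
                 d = c(0, 1, NA, 1, 0, 0))

fit_omit    <- lm(d ~ z, data = df, na.action = na.omit)
fit_exclude <- lm(d ~ z, data = df, na.action = na.exclude)

length(predict(fit_omit))     # 5: the incomplete row is dropped
length(predict(fit_exclude))  # 6: an NA is kept in its place

# With na.exclude the predictions line up row by row with df,
# so they can be stored as a new column
df$d_hat <- predict(fit_exclude)
df
```

With `fit_omit`, the last assignment would fail because five predictions cannot be placed into a six-row data frame.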

# First stage: regress the treatment on the instrument.
# na.action = na.exclude pads the predicted values with NA for incomplete rows,
# so they stay aligned with the rows of dinas for the second stage
first <- lm(voted68 ~ eligible68, data = dinas, na.action = na.exclude)

# Extracting predicted values
vote_pred <- predict(first)

# Displaying regression output
summary(first)

Call:
lm(formula = voted68 ~ eligible68, data = dinas, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.5987 -0.5987  0.4013  0.4013  0.8408 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.15924    0.03738    4.26  2.3e-05 ***
eligible68   0.43948    0.04183   10.51  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4684 on 778 degrees of freedom
  (10 observations deleted due to missingness)
Multiple R-squared:  0.1243,    Adjusted R-squared:  0.1231 
F-statistic: 110.4 on 1 and 778 DF,  p-value: < 2.2e-16


We can see that the instrument (eligible68) is indeed a strong and significant predictor of the treatment: being eligible raises the probability of having voted in 1968 by about 44 percentage points. That is what we hope for and expect. It is also plausible that eligibility, which depends only on respondents’ birthdays, is as-if random.

Unfortunately, the first stage cannot tell you whether an instrument is appropriate. It can, however, tell you something about inappropriate instruments. A common problem in IV designs is weak instruments. If your instrument is only weakly correlated with the endogenous variable (i.e. the treatment), the IV estimate is likely to be biased (towards the OLS estimate) and its standard errors unreliable. The F-statistic of the first stage can be used to identify weak instruments. As a rule of thumb, your instrument is likely to be problematic if the F-statistic of your first-stage regression is below 10.

Going back to the regression output, we see that our F-statistic is about 110, well above the conventional threshold of 10. As it should be, our instrument is strongly correlated with the treatment. Note, however, that this does not mean that it is a valid instrument.
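With a single instrument and no covariates, the first-stage F-statistic is simply the squared t-statistic of the instrument. In the output above, 10.51 squared is roughly 110. The sketch below checks this on simulated data (hypothetical variables) and shows where summary() stores both numbers.

```r
# Simulated, hypothetical data: binary instrument z and binary treatment d
set.seed(1)
n <- 800
z <- rbinom(n, 1, 0.8)                   # binary instrument (e.g. eligibility)
d <- rbinom(n, 1, 0.15 + 0.45 * z)       # take-up probability depends on z

first_stage <- summary(lm(d ~ z))
f_stat <- first_stage$fstatistic[["value"]]            # overall F-statistic
t_stat <- first_stage$coefficients["z", "t value"]     # t-statistic of the instrument

c(F = f_stat, t_squared = t_stat^2)
```

Once covariates are added, the overall F no longer isolates the instrument, and a partial F-test of the excluded instruments is needed instead.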


Let’s now proceed to test the exclusion restriction.

Exercise 5: Test the exclusion restriction for the instrument.

Hint: Show that the instrument affects the outcome only through the treatment.



Regressing the outcome on the instrument (and the treatment) may help you familiarise yourself with the data, but it does not test the exclusion restriction. In fact, the exclusion restriction cannot be tested statistically. All we can do is rely on theory and build a convincing case that the instrument does not affect the outcome through other channels. The problem with a regression of Y on Z (and D) is that we still cannot observe further confounders and account for their effects, so we cannot rule out that they drive whatever association we find.
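A regression of Y on both D and Z can even be misleading. The simulated sketch below (hypothetical data) is built so that Z affects Y only through D, which means the exclusion restriction holds by construction. The coefficient on Z in `lm(y ~ d + z)` is nevertheless clearly negative. Conditioning on D compares units with different values of the unobserved `u`: among the treated, those with z = 0 needed a higher `u` to take up the treatment.

```r
# Simulated, hypothetical data: exclusion restriction holds by construction
set.seed(2024)
n <- 100000
u <- rnorm(n)                                   # unobserved confounder
z <- rbinom(n, 1, 0.5)                          # randomly assigned instrument
d <- as.numeric(0.8 * z + u + rnorm(n) > 0.5)   # treatment depends on z and u
y <- 0.3 * d + u + rnorm(n)                     # z enters y only through d

# Controlling for d opens a path from z to y through u
fit <- lm(y ~ d + z)
round(coef(summary(fit))["z", ], 3)
```

A non-zero coefficient on Z here does not indicate a violation of the exclusion restriction, and a zero coefficient would not confirm it.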

Let’s plot the relationship between the outcome and the instrument nonetheless. As stated above, a plot cannot show that the assumption holds, but it could reveal patterns suggesting that the exclusion restriction is likely to be violated.

Exercise 6: Plot the relationship between the outcome and instrument.

There are several ways to do this. Feel free to pick the option you deem most appropriate.

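# Using ggplot2
library(ggplot2)

ggplot(dinas, aes(x = eligible68, y = strngpid73)) +
  geom_point() +                 # both variables are discrete, so many points overlap
  geom_smooth(method = lm) +     # fitted OLS line with confidence band
  xlab("Eligibility in 1968") +
  ylab("PID Strength in 1973")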


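This looks as expected. There is no clear and significant association between the two variables. Recall that eligibility itself should affect party identification strength only through voting in 1968. Under the exclusion restriction, the association between outcome and instrument (the reduced form, ITT_Y) is simply the effect of voting scaled by the first stage. A weak association is therefore consistent with a small or imprecisely estimated effect of voting.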
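Dividing the reduced form (ITT_Y) by the first stage (ITT_D) gives the LATE formula from the recap. The simulated sketch below (made-up data and hypothetical names, not the Dinas panel) shows that, with a single binary instrument and no covariates, this Wald ratio is numerically identical to the coefficient from a manual two-stage regression.

```r
# Simulated, hypothetical data
set.seed(7)
n <- 5000
u <- rnorm(n)                                       # unobserved confounder
z <- rbinom(n, 1, 0.6)                              # binary instrument
d <- as.numeric(0.5 + 1.2 * z + u + rnorm(n) > 1)   # treatment depends on z and u
y <- 0.3 * d + 0.5 * u + rnorm(n)                   # true effect of d is 0.3

# Wald estimator: reduced form (ITT_Y) divided by first stage (ITT_D)
itt_y <- mean(y[z == 1]) - mean(y[z == 0])
itt_d <- mean(d[z == 1]) - mean(d[z == 0])
wald  <- itt_y / itt_d

# Manual 2SLS: regress y on the fitted values from the first stage
d_hat <- fitted(lm(d ~ z))
tsls  <- coef(lm(y ~ d_hat))[["d_hat"]]

c(wald = wald, tsls = tsls)
```

The two numbers agree because the first-stage slope on a binary instrument equals ITT_D, and the second-stage slope then equals ITT_Y divided by that slope.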



Let’s now return to our IV model by calculating the second stage of our 2SLS model.

Exercise 7: Regress the outcome on the predicted values from the first stage

# Calculating the second stage (the standard errors reported by lm() are not valid here)
second_wrongSE <- lm(strngpid73 ~ vote_pred, data = dinas)

# Displaying regression output
summary(second_wrongSE)

Call:
lm(formula = strngpid73 ~ vote_pred, data = dinas)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.5525 -0.5525  0.4475  0.4475  1.5871 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.3623     0.1055  12.918   <2e-16 ***
vote_pred     0.3176     0.1952   1.627    0.104    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9554 on 772 degrees of freedom
  (16 observations deleted due to missingness)
Multiple R-squared:  0.003417,  Adjusted R-squared:  0.002126 
F-statistic: 2.647 on 1 and 772 DF,  p-value: 0.1042


The second stage uses the predicted values of the treatment from the first stage. The output indicates that, once we instrument for voting in 1968, the decision to cast a vote in 1968 does not have a statistically significant effect on party identification strength (estimate 0.32, p = 0.10).

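However, when we estimate the two stages separately, the standard errors of the second stage are not adjusted. lm() treats the predicted values as if they were observed data and computes the residuals using the predicted rather than the actual treatment. As a result, the standard errors and all other measures of uncertainty are wrong, and hypothesis tests that rely on them are likely to be misleading.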
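To make the problem concrete, here is a sketch on simulated data (hypothetical variables). Like lm(), it assumes homoskedastic errors. It recomputes the residuals using the actual treatment and rescales the naive standard error accordingly, which gives the conventional 2SLS standard error.

```r
# Simulated, hypothetical data
set.seed(7)
n <- 5000
u <- rnorm(n)
z <- rbinom(n, 1, 0.6)
d <- as.numeric(0.5 + 1.2 * z + u + rnorm(n) > 1)
y <- 0.3 * d + 0.5 * u + rnorm(n)

d_hat  <- fitted(lm(d ~ z))      # first stage
second <- lm(y ~ d_hat)          # manual second stage
b      <- coef(second)

# lm() uses residuals based on d_hat; 2SLS needs residuals based on the actual d
res_wrong <- residuals(second)
res_right <- y - (b[[1]] + b[[2]] * d)

df_res      <- n - 2
sigma_wrong <- sqrt(sum(res_wrong^2) / df_res)
sigma_right <- sqrt(sum(res_right^2) / df_res)

# Rescale the naive standard error by the ratio of residual standard deviations
se_naive <- coef(summary(second))["d_hat", "Std. Error"]
se_2sls  <- se_naive * sigma_right / sigma_wrong

# Same number from the textbook formula sigma^2 * (X_hat' X_hat)^{-1}
X_hat      <- cbind(1, d_hat)
se_formula <- unname(sqrt(diag(sigma_right^2 * solve(crossprod(X_hat))))[2])

c(naive = se_naive, corrected = se_2sls, formula = se_formula)
```

The point estimate is unchanged; only the residual variance used for the standard error differs. Dedicated IV functions handle this, and robust or clustered variants of it, automatically.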


IV Regression: 2SLS in One Step


There are several packages we can use to obtain a two-stage least squares instrumental variables estimator. Let’s now conduct 2SLS using ivreg() (ivreg package), iv_robust() (estimatr package) and feols() (fixest package). The syntax for each of these functions is shown below:

Exercise 8: Estimate a two-stage least squares instrumental variable model using strngpid73 as the outcome, voted68 as the endogenous predictor and eligible68 as the instrument. Use the ivreg() and iv_robust() functions. Store these models in a list (list()) and report them using the modelsummary() function. Interpret the results.

Variable  Description
O         Outcome variable
E         Endogenous variable
I         Instrument variable
FE        Fixed effect variable
ivreg(O ~ E | I, data = data)            # ivreg package

iv_robust(O ~ E | I, data = data)        # estimatr package

feols(O ~ 1 | FE | E ~ I, data = data)   # fixest package: outcome ~ exogenous | fixed effects | endogenous ~ instrument


# Packages for ivreg(), cluster.vcov(), coeftest(), iv_robust(), tribble() and modelsummary()
library(ivreg); library(multiwayvcov); library(lmtest)
library(estimatr); library(tibble); library(modelsummary)

## ivreg ## 
ivreg_model <- ivreg(strngpid73 ~ voted68 | eligible68, data = dinas)

# Clustered variance-covariance matrix (clusters defined by v7); the point estimates are unchanged
ivreg_model_clustered <- cluster.vcov(ivreg_model, dinas$v7)

# Coefficient table using the clustered standard errors
iv_clustered <- coeftest(ivreg_model, ivreg_model_clustered)

## iv_robust ## 
iv_robust_model <- iv_robust(strngpid73 ~ voted68 | eligible68, data = dinas, cluster = v7) # cluster standard errors by v7

ivmodels <- list(ivreg_model, iv_robust_model)
rows <- tribble(~term, ~OLS1, ~OLS2,
                'Covariates', 'No', 'No') # add one row reporting covariates
attr(rows, 'position') <- c(5)  # change location accordingly

title <- 'Two-stage Least Squares Models' # add the title to your model

coeffs <- c('(Intercept)' = 'Intercept',
            'voted68' = 'Voted') # rename coefficients

# regression table
modelsummary(ivmodels, estimate = "{estimate}{stars}", coef_map = coeffs,
             gof_omit = 'DF|se_type', add_rows = rows, title = title)
Two-stage Least Squares Models
                        Model 1    Model 2
Intercept               1.362***   1.362***
                        (0.106)    (0.095)
Voted                   0.319      0.319+
                        (0.196)    (0.175)
Covariates              No         No
Num.Obs.                774        774
R2                      0.006      0.006
R2 Adj.                 0.005      0.005
Std.Errors                         by: v7
statistic.endogeneity
p.value.endogeneity
statistic.weakinst
p.value.weakinst
statistic.overid
p.value.overid

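Both functions produce the same point estimates. The standard errors differ because the iv_robust() model (Model 2) clusters them by v7, while the ivreg() model in the table (Model 1) uses conventional standard errors. The Local Average Treatment Effect is 0.319. Among compliers, that is, respondents who voted in 1968 because they were eligible, voting is estimated to increase PID strength in 1973 by about 0.32 points. The estimate is not statistically significant at the 5% level with either set of standard errors; with clustered standard errors it reaches only the 10% level (+). In your assignments, remember to explain in detail what the coefficient means substantively.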


In 2SLS we can include covariates to estimate a covariate-adjusted LATE. Let’s add some covariates to the 2SLS model. We can also add further instruments.
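The two stages in the recap can also be written in matrix form as beta = (X' P_Z X)^{-1} X' P_Z y. Here X holds the intercept, the covariates and the endogenous treatment; Z holds the intercept, the same covariates and the excluded instruments; and P_Z is the projection onto Z. The sketch below (simulated data, hypothetical names) checks that this formula and a two-step lm() give identical coefficients, provided the covariate enters both stages.

```r
# Simulated, hypothetical data: one covariate, two excluded instruments
set.seed(11)
n  <- 3000
x1 <- rnorm(n)                                  # exogenous covariate
z1 <- rbinom(n, 1, 0.5)                         # excluded instrument 1
z2 <- rnorm(n)                                  # excluded instrument 2
u  <- rnorm(n)                                  # unobserved confounder
d  <- 0.6 * z1 + 0.4 * z2 + 0.3 * x1 + u + rnorm(n)
y  <- 0.5 * d + 0.2 * x1 + u + rnorm(n)

X <- cbind(1, x1, d)        # regressors: intercept, covariate, endogenous treatment
Z <- cbind(1, x1, z1, z2)   # instruments: intercept, covariate (its own instrument), excluded instruments

# P_Z X computed without forming the n x n projection matrix
PZ_X <- Z %*% solve(crossprod(Z), crossprod(Z, X))
# 2SLS: (X' P_Z X)^{-1} X' P_Z y
beta_2sls <- solve(crossprod(PZ_X, X), crossprod(PZ_X, y))

# Two-step version: the first stage must also include the covariate x1
d_hat <- fitted(lm(d ~ x1 + z1 + z2))
beta_two_step <- coef(lm(y ~ x1 + d_hat))

cbind(matrix = drop(beta_2sls), two_step = beta_two_step)
```

If x1 were left out of the first stage, the two-step coefficients would no longer match the 2SLS estimator. This is why both stages must include the same covariates.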

Exercise 9: Use the ivreg() function and include the covariates col1 and col2. Keep voted68 as the endogenous treatment variable. Use col1, col2, as.factor(knowledge65) and eligible68 as instruments. Because col1 and col2 act as their own instruments, the excluded instruments are the knowledge65 dummies and eligible68. Report the results of this estimation with summary(), adding the argument listed in the table below. What is the F-statistic for this specification? Are the instruments we are using strong or weak?

Function/argument   Description
summary()           Generic function that produces summaries of the results of model-fitting functions
diagnostics         When set to TRUE, reports a number of diagnostic tests


Answer
ivreg_covariates <- ivreg(strngpid73 ~ col1 + col2 + voted68 | 
             col1 + col2 + as.factor(knowledge65) + eligible68, data = dinas)

summary_ivreg <- summary(ivreg_covariates, diagnostics = TRUE)
summary_ivreg

Call:
ivreg(formula = strngpid73 ~ col1 + col2 + voted68 | col1 + col2 + 
    as.factor(knowledge65) + eligible68, data = dinas)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.8142 -0.6389  0.1858  0.7256  1.8114 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.44974    0.09496  15.267   <2e-16 ***
col1        -0.26116    0.14418  -1.811   0.0705 .  
col2        -0.17531    0.07732  -2.267   0.0236 *  
voted68      0.36445    0.18310   1.990   0.0469 *  

Diagnostic tests:
                 df1 df2 statistic p-value    
Weak instruments   8 763    15.967  <2e-16 ***
Wu-Hausman         1 769     0.816   0.367    
Sargan             7  NA     4.225   0.754    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9527 on 770 degrees of freedom
Multiple R-Squared: 0.0115, Adjusted R-squared: 0.007647 
Wald test:  2.81 on 3 and 770 DF,  p-value: 0.03861 
# Add clustered robust standard errors
ivreg_covariates_clustered <- cluster.vcov(ivreg_covariates, dinas$v7)
coeftest(ivreg_covariates, ivreg_covariates_clustered)

t test of coefficients:

             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.449743   0.083194 17.4260  < 2e-16 ***
col1        -0.261162   0.124099 -2.1045  0.03566 *  
col2        -0.175310   0.073302 -2.3916  0.01701 *  
voted68      0.364451   0.158050  2.3059  0.02138 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We observe that voting in 1968 has a positive effect on partisan strength (0.364), statistically significant at the 5% level with both conventional and clustered standard errors. Setting the diagnostics argument of summary() to TRUE also reports several diagnostic tests.

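If there are more instruments than endogenous regressors (causal parameters), the model is overidentified; if there are exactly as many, it is just identified. However, the more instruments we include, the harder it is to argue that all of them satisfy the exclusion restriction. For an overidentified model we can run the Sargan test (also called the Sargan-Hansen test) of the overidentifying restrictions. Its null hypothesis is that all instruments are valid, that is, uncorrelated with the error term of the outcome equation. Intuitively, it checks whether different subsets of the instruments lead to estimates that are consistent with one another. For this reason it cannot be computed for a just-identified model, and failing to reject does not prove that the instruments are valid. In our case the Sargan statistic is 4.225 with 7 degrees of freedom (p = 0.754), so we fail to reject the null that the instruments are valid.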
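To make the logic of the Sargan test concrete, here is a minimal sketch in base R of the textbook n*R^2 version: estimate 2SLS, compute the residuals of the outcome equation using the observed treatment, regress them on all instruments and exogenous covariates, and compare n*R^2 with a chi-squared distribution whose degrees of freedom equal the number of excluded instruments minus the number of endogenous regressors. The data are simulated and every name (`sargan_test`, `z1`, `z2`, `x`, `y_valid`, `y_invalid`) is hypothetical; the output of `summary(..., diagnostics = TRUE)` in AER may differ from this sketch in small-sample details.

```r
# Minimal sketch: Sargan test of overidentifying restrictions by hand.
# Simulated data; all function and variable names are hypothetical.
sargan_test <- function(y, d, z, x) {
  z <- as.matrix(z)                    # excluded instruments, one per column
  d_hat <- fitted(lm(d ~ z + x))       # first stage
  b <- coef(lm(y ~ d_hat + x))         # second stage (point estimates only)
  # Residuals of the outcome equation must use the observed d, not d_hat
  e <- y - (b[["(Intercept)"]] + b[["d_hat"]] * d + b[["x"]] * x)
  aux <- lm(e ~ z + x)                 # regress residuals on all instruments
  stat <- length(y) * summary(aux)$r.squared
  df <- ncol(z) - 1                    # overidentifying restrictions (one endogenous d)
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

set.seed(123)
n  <- 20000
x  <- rnorm(n)                                      # exogenous covariate
z1 <- rbinom(n, 1, 0.5)                             # instrument 1
z2 <- rbinom(n, 1, 0.5)                             # instrument 2
u  <- rnorm(n)                                      # unobserved confounder
d  <- 0.8 * z1 + 0.8 * z2 + 0.5 * x + u + rnorm(n)  # endogenous treatment
y_valid   <- 1 + d + 0.5 * x + u + rnorm(n)         # both instruments excluded
y_invalid <- y_valid + 0.5 * z2                     # z2 violates the exclusion restriction

res_valid   <- sargan_test(y_valid, d, cbind(z1, z2), x)
res_invalid <- sargan_test(y_invalid, d, cbind(z1, z2), x)
res_valid$p.value    # usually not significant (uniform under the null)
res_invalid$p.value  # essentially zero
```

The invalid instrument is caught only because z1 and z2 point to different estimates; a violation shared in the same way by every instrument would not be detected.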

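The weak instruments test is an F-test of the null hypothesis that the excluded instruments are jointly irrelevant in the first stage, i.e. that they are uncorrelated (or only weakly correlated) with the endogenous treatment. Here the F-statistic is 15.967 (p < 2e-16), so we reject that null; by the common rule of thumb that a first-stage F below 10 signals weak instruments, our instruments are not weak. Note that this test only concerns instrument relevance: it says nothing about independence or the exclusion restriction. The Wu-Hausman test compares the IV estimates with the OLS estimates. Its null hypothesis is that the treatment is exogenous, in which case both OLS and IV are consistent and OLS is more efficient; a significant result indicates that OLS is inconsistent and IV is needed. Here p = 0.367, so we cannot reject the null: the IV and OLS estimates are not statistically distinguishable, and we find no evidence that voted68 is endogenous.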
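Both diagnostics can be reproduced with ordinary regressions. The sketch below, on simulated data with hypothetical names, computes the first-stage F-statistic for the excluded instruments by comparing the first stage with and without them via `anova()`, and a regression-based (control-function) version of the Wu-Hausman test, which adds the first-stage residual to the outcome equation and tests whether its coefficient is zero. These follow the textbook constructions and may differ from AER's output in small-sample details.

```r
# Minimal sketch: weak-instrument F-test and Wu-Hausman test by hand.
# Simulated data; all variable names are hypothetical.
set.seed(2024)
n  <- 5000
x  <- rnorm(n)                                      # exogenous covariate
z1 <- rbinom(n, 1, 0.5)
z2 <- rbinom(n, 1, 0.5)
u  <- rnorm(n)                                      # unobserved confounder
d  <- 0.8 * z1 + 0.8 * z2 + 0.5 * x + u + rnorm(n)  # d is endogenous through u
y  <- 1 + d + 0.5 * x + u + rnorm(n)

# Weak instruments: joint F-test of the excluded instruments in the first stage
fs_restricted <- lm(d ~ x)
fs_full       <- lm(d ~ z1 + z2 + x)
fs_test       <- anova(fs_restricted, fs_full)
first_stage_F <- fs_test$F[2]        # rule of thumb: worry if below about 10

# Wu-Hausman (control-function form): add the first-stage residual to the
# outcome equation; a non-zero coefficient indicates that d is endogenous
v_hat <- resid(fs_full)
cf    <- lm(y ~ d + x + v_hat)
wu_hausman_p <- summary(cf)$coefficients["v_hat", "Pr(>|t|)"]

first_stage_F
wu_hausman_p
```

If you shrink the instrument coefficients (say to 0.05), the first-stage F typically drops far below 10, which is the weak-instrument situation the test is designed to flag.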


We can obtain the Local Average Treatment Effect (LATE) by dividing the difference in the conditional expectations of the outcome given the instrument (the reduced form) by the difference in the conditional expectations of treatment take-up given the instrument (the first stage). Put more simply, we calculate the difference in the mean outcome between units assigned to the treatment and units not assigned to it, and then divide this number by the difference in compliance (take-up) rates.

Variable/Average   Description
Y                  Outcome
Z                  Instrument
D                  Endogenous treatment
Y[Z=1]             Average outcome for units offered the treatment (Z = 1)
Y[Z=0]             Average outcome for units not offered the treatment (Z = 0)
D[Z=1]             Proportion of units receiving the treatment among those offered it
D[Z=0]             Proportion of units receiving the treatment among those not offered it

The Wald Estimator is then:

\[\tau=\frac{Y[Z=1]-Y[Z=0]}{D[Z=1]-D[Z=0]}\]
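As a sanity check on the formula, the sketch below simulates a population of compliers, always-takers and never-takers (no defiers) in which the treatment effect differs by type and always-takers also have a higher baseline outcome, so a naive comparison of treated and untreated units is confounded. The function name `wald_estimator` and all simulated quantities are hypothetical. The Wald ratio should recover the complier effect (set to 2 here), not the average effect among all treated units.

```r
# Minimal sketch of the Wald estimator on simulated (hypothetical) data.
wald_estimator <- function(y, d, z) {
  itt_y <- mean(y[z == 1], na.rm = TRUE) - mean(y[z == 0], na.rm = TRUE)  # reduced form
  itt_d <- mean(d[z == 1], na.rm = TRUE) - mean(d[z == 0], na.rm = TRUE)  # first stage
  itt_y / itt_d
}

set.seed(42)
n <- 100000
z <- rbinom(n, 1, 0.5)                                 # as-if random binary instrument
type <- sample(c("complier", "always", "never"), n, replace = TRUE,
               prob = c(0.5, 0.2, 0.3))
d <- ifelse(type == "always", 1, ifelse(type == "never", 0, z))  # monotonicity: no defiers
effect   <- ifelse(type == "complier", 2, 0.5)         # LATE (compliers) = 2
baseline <- ifelse(type == "always", 3, 0)             # always-takers differ at baseline
y <- 1 + baseline + effect * d + rnorm(n)

tau   <- wald_estimator(y, d, z)                       # close to 2
naive <- mean(y[d == 1]) - mean(y[d == 0])             # inflated by the baseline gap
c(wald = tau, naive = naive)
```

In this setup the naive difference is roughly 2.67, while the Wald estimator stays close to the complier effect of 2.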

Exercise 10: Manually calculate the Wald estimator. Use mean(x, na.rm = TRUE) to calculate the mean of each group. You can use the following syntax to obtain the conditional means:

mean(data$outcome[data$instrument == 1], na.rm = TRUE)  # mean outcome among units with Z = 1 (use == 0 for Z = 0)

mean(data$endogenous_variable[data$instrument == 1], na.rm = TRUE)  # share treated (e.g. voted) among units with Z = 1


Answer
#Numerator
mean(dinas$strngpid73[dinas$eligible68==1], na.rm=T)
[1] 1.5488
mean(dinas$strngpid73[dinas$eligible68==0], na.rm=T)
[1] 1.407643
#Denominator
mean(dinas$voted68[dinas$eligible68==1], na.rm=T)
[1] 0.5987159
mean(dinas$voted68[dinas$eligible68==0], na.rm=T)
[1] 0.1592357

Then \(\tau\) is equal to:

(mean(dinas$strngpid73[dinas$eligible68==1], na.rm=T) - mean(dinas$strngpid73[dinas$eligible68==0], na.rm=T)) / (mean(dinas$voted68[dinas$eligible68==1], na.rm=T) - mean(dinas$voted68[dinas$eligible68==0], na.rm=T))
[1] 0.3211901

The Wald estimate is 0.32, very close to the 0.319 obtained with ivreg(). The small difference arises because each mean above uses all non-missing values of the variable involved, whereas ivreg() uses only the 774 complete cases. In your assignments, remember to explain what 0.32 means in as much detail as possible.


How would you compute the Wald estimator for a binary endogenous variable and a binary instrument when the model also includes covariates?


Hint 1

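Remember that in a regression of \(Y_i\) on your variable of interest \(X_{1i}\) and a control variable \(X_{2i}\), the coefficient on \(X_{1i}\) is equal to the following, where \(\tilde{X_{1i}}\) is the residual from regressing \(X_{1i}\) on \(X_{2i}\):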

\[\beta_1 = \frac{Cov(Y_i, \tilde{X_{1i}})}{V(\tilde{X_{1i}})}\]
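This "regression anatomy" result can be checked numerically. The minimal sketch below uses simulated data with hypothetical names: it residualises `x1` on `x2` and confirms that the covariance ratio matches the coefficient from the full regression.

```r
# Minimal sketch: check the regression anatomy formula on simulated data.
set.seed(7)
n  <- 1000
x2 <- rnorm(n)                       # control variable
x1 <- 0.6 * x2 + rnorm(n)            # variable of interest, correlated with x2
y  <- 1 + 2 * x1 - 1.5 * x2 + rnorm(n)

x1_tilde <- resid(lm(x1 ~ x2))       # part of x1 not explained by x2
beta1_anatomy <- cov(y, x1_tilde) / var(x1_tilde)
beta1_lm      <- coef(lm(y ~ x1 + x2))[["x1"]]
c(beta1_anatomy, beta1_lm)           # identical up to floating-point error
```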



Hint 2

The 2SLS estimator is the ratio of the reduced form to the first stage. Let \(\tilde{Z_i}\) be the residual from the regression of \(Z_i\) on the covariate(s). Both the reduced-form and the first-stage coefficients are then covariances with \(\tilde{Z_i}\) divided by the same variance, \(V(\tilde{Z_i})\), which cancels out:

\[\lambda_{\text{2SLS}} = \frac{Cov(Y_i, \tilde{Z_i})}{Cov(D_i, \tilde{Z_i})}\]

Here we can use the cov() function, whose arguments are described below:

Function/argument         Description
cov(x, y)                 Calculates the covariance between two variables x and y
use                       Character string indicating how missing values should be treated
"pairwise.complete.obs"   A value of use: each covariance is computed from the observations that are non-missing for both variables (details below)

Setting use = "pairwise.complete.obs" tells cov() to compute each covariance from the observations for which both variables are non-missing. With only two variables, as here, this simply drops every row in which x or y is missing, so the two cov() calls below can use slightly different sets of rows.

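# No covariates here, so the residualised instrument is just eligible68 (demeaning does not change a covariance)
tau_cov <- cov(dinas$eligible68, dinas$strngpid73, use = "pairwise.complete.obs") /
  cov(dinas$eligible68, dinas$voted68, use = "pairwise.complete.obs")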
tau_cov
[1] 0.3205741
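The code above uses eligible68 without covariates. With covariates, Hint 2 suggests residualising the instrument first. The minimal sketch below does this on simulated data (hypothetical names `x`, `z`, `d`, `y`), where the instrument is as-if random only conditional on a covariate `x`, and checks that the covariance ratio equals the point estimate from running the two stages by hand with the same covariate in both. The standard errors from such a manual second stage are not valid, which is why ivreg() or iv_robust() should be used for inference.

```r
# Minimal sketch: covariate-adjusted Wald/2SLS estimator via a residualised instrument.
set.seed(1)
n <- 50000
x <- rnorm(n)                                       # observed covariate
z <- rbinom(n, 1, plogis(0.8 * x))                  # instrument as-if random given x
u <- rnorm(n)                                       # unobserved confounder
d <- as.numeric(0.5 * z + 0.3 * u + rnorm(n, sd = 0.5) > 0.4)  # binary treatment
y <- 1 + 1.5 * d + 0.7 * x + u + rnorm(n)           # true effect of d is 1.5

# Residualise the instrument on the covariate(s)
z_tilde <- resid(lm(z ~ x))

# Ratio of covariances: reduced form over first stage
lambda_cov <- cov(y, z_tilde) / cov(d, z_tilde)

# Same point estimate from two manual stages with x in both
d_hat <- fitted(lm(d ~ z + x))
lambda_2sls <- coef(lm(y ~ d_hat + x))[["d_hat"]]

c(lambda_cov, lambda_2sls)                          # equal up to floating-point error
```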


We know that with IV we can only estimate the Local Average Treatment Effect, that is, the causal effect for one particular subgroup of units: the compliers.

Exercise 11: Calculate the proportions of compliers, defiers, always-takers, and never-takers. First, label the variables so that each group is easy to identify, using the factor() function (its syntax is described below). Label eligible68 as "Not Eligible" and "Eligible", and voted68 as "Not Voted" and "Voted". Finally, why do we impose the monotonicity assumption in IV?

Function/argument   Description
factor()            Encodes a vector as a factor
levels              An optional vector of the unique values
labels              An optional character vector of labels for the levels
data$variable <- factor(data$variable, levels = c(1, 2, 3, 4, 5),
                        labels = c("One", "Two", "Three", "Four", "Five"))


Answer
dinas$eligible68n=factor(dinas$eligible68,
                        levels=c(0,1),
                        labels=c( "Not Eligible", "Eligible"))
dinas$voted68n=factor(dinas$voted68,
                     levels=c(0,1),
                     labels=c("Not Voted","Voted"))


table(dinas$eligible68n, dinas$voted68n)
              
               Not Voted Voted
  Not Eligible       132    25
  Eligible           250   373

From the table above, we can see that 132 respondents were not eligible and did not vote; this group is composed of never-takers and compliers. The 25 respondents who were not eligible but voted anyway are always-takers or defiers. The 250 respondents who were eligible but did not vote are never-takers or defiers. Finally, the 373 respondents who were eligible and voted are always-takers or compliers.

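By imposing the monotonicity assumption, we rule out defiers. The 25 respondents who were not eligible but voted anyway must then be always-takers, and because eligibility is as-if random, the share of always-takers in the whole sample can be estimated from the not-eligible group: 25/(132+25) = 25/157 ≈ 0.16. Similarly, the 250 respondents who were eligible but did not vote must be never-takers, so the share of never-takers is 250/(250+373) = 250/623 ≈ 0.40. The remaining share, 1 - 0.16 - 0.40 ≈ 0.44, are compliers. This matches the denominator of the Wald estimator: 0.599 - 0.159 ≈ 0.44. This is also why we impose monotonicity: without it, the non-eligible voters could be a mix of always-takers and defiers (and the eligible non-voters a mix of never-takers and defiers), the compliance shares would not be identified, and the Wald ratio would no longer equal the effect for compliers.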
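The compliance-type shares can also be computed directly from the 2x2 table of eligible68n by voted68n. In this minimal sketch the counts are copied from that table output, and the function name `compliance_shares` is hypothetical. Under monotonicity, the always-taker share is P(D = 1 | Z = 0), read off the not-eligible row, and the never-taker share is P(D = 0 | Z = 1), read off the eligible row.

```r
# Counts from table(dinas$eligible68n, dinas$voted68n) in the lab output
counts <- matrix(c(132, 25,
                   250, 373),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Not Eligible", "Eligible"),
                                 c("Not Voted", "Voted")))

# Compliance-type shares under monotonicity (no defiers)
compliance_shares <- function(tab) {
  always <- tab[["Not Eligible", "Voted"]] / sum(tab["Not Eligible", ])  # P(D = 1 | Z = 0)
  never  <- tab[["Eligible", "Not Voted"]] / sum(tab["Eligible", ])      # P(D = 0 | Z = 1)
  c(always_takers = always, never_takers = never,
    compliers = 1 - always - never)
}

shares <- compliance_shares(counts)
round(shares, 3)   # approx. 0.159, 0.401, 0.439
```

The complier share equals the first-stage difference 373/623 - 25/157, the Wald denominator.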


There are several diagnostics that we can conduct to assess the validity of an instrument. In particular, we can conduct a placebo test. In this study, to test whether the differences in partisan strength are driven by the age gap, the author does the following. The author splits all eligible voters into two groups, the "young" eligibles and the "old" eligibles. The old eligibles are those born before May 1947, and the young eligibles are those born since June 1947. It is important to stress that both groups were eligible to vote. The younger group is then treated as if it had not been eligible to vote in 1968 (the placebo treatment).

Exercise 12: Conduct placebo tests using the lm() function. First, regress partisan strength measured in 1973 (strngpid73) on the placebo treatment variable elig2false. Second, regress partisan strength measured in 1965 (strngpid65), before the 1968 election, on eligible68. Remember to cluster the standard errors. How do you interpret the results?


Answer
plac <- lm(strngpid73 ~ elig2false, data=dinas)
# Cluster standard errors
plac.vcovCL <- cluster.vcov(plac, dinas$v7)
coeftest(plac, plac.vcovCL)

t test of coefficients:

             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.571111   0.046317 33.9208   <2e-16 ***
elig2false  -0.079683   0.085554 -0.9314    0.352    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plac2 <- lm(strngpid65 ~ eligible68, data=dinas)
plac2.vcovCL <- cluster.vcov(plac2, dinas$v7)
coeftest(plac2, plac2.vcovCL)

t test of coefficients:

             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.782609   0.067535 26.3953   <2e-16 ***
eligible68  -0.025852   0.081025 -0.3191   0.7498    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

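In the first test, the placebo treatment elig2false has no statistically significant effect on partisan strength in 1973 (-0.080, p = 0.352): young and old eligibles do not differ, which suggests that age is not driving the differences in partisan strength. In the second test, eligibility in 1968 is not significantly associated with partisan strength measured in 1965 (-0.026, p = 0.750), before the treatment could have had any effect, which is consistent with eligibility being as-if random with respect to prior partisanship.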


Exercise 13: Think of potential ways in which the exclusion restriction could be violated in this setting. Through which other paths, apart from the endogenous treatment, could the instrument affect the outcome? We will discuss this at the end of the lab.


HOMEWORK (We will provide the answers next week)

  1. Should you include all non-endogenous covariates in the first stage? Why or why not?
  2. What is the main identification assumption of instrumental variable estimation? How can you test it?
  3. Can you use more than one instrument (multiple Zs) for a single endogenous variable (D)?
  4. What's the difference between the ITT and the LATE from IV? Discuss with reference to compliers.
  5. What’s the forbidden regression? Why is it forbidden?

Copyright © 2022 Felipe Torres Raposo & Kenneth Stiller.