Lab 6: Instrumental Variables
Week 6 HT 2022
Recap
The Problem of Unobservables
So far, we have discussed randomised experiments and selection on observables. But what about cases in which we do not, or cannot, observe the relevant covariates? In such cases, the conditional independence assumption does not hold. Take, for example, the following scenario: we would like to estimate the effect of D on Y, but are unable to observe the confounding variable U. Since U affects both the treatment D and the outcome Y, any naive estimate of the effect of D will be biased.
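To make the bias concrete, here is a minimal simulation sketch. The data are simulated and all coefficients (0.8, 1.5 and a true effect of 1) are assumptions chosen for illustration; nothing here comes from the Dinas data.

```r
# Simulated illustration of confounding by an unobserved variable U.
# All numbers below are illustrative assumptions.
set.seed(42)
n <- 10000
u <- rnorm(n)                       # confounder (unobserved in practice)
d <- 0.8 * u + rnorm(n)             # treatment depends on U
y <- 1.0 * d + 1.5 * u + rnorm(n)   # true effect of D on Y is 1

naive    <- unname(coef(lm(y ~ d))["d"])      # omits U
adjusted <- unname(coef(lm(y ~ d + u))["d"])  # infeasible when U is unobserved

round(c(naive = naive, adjusted = adjusted), 2)
```

The naive coefficient is pushed well above 1 because D picks up part of the effect of U, while the regression that controls for U recovers a value close to the true effect.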
Instrumental Variables
An instrumental variable (IV) design helps us circumvent this problem. If D is partly determined by Z, our instrument, we can estimate the effect of D on Y despite being unable to observe U. To do so, Z must be assigned as-if randomly and must affect the outcome Y only through the treatment D. We can then take advantage of the variation in D that Z randomly introduces. Because only this variation is used, IVs only allow us to estimate the effect for compliers, that is, those units whose D is affected by Z. The local average treatment effect (LATE), or complier average causal effect (CACE), is then as follows:
\[ LATE = \frac{E[Y_i|Z_i=1] - E[Y_i|Z_i=0]}{E[D_i|Z_i=1] - E[D_i|Z_i=0]} = \frac{ITT_Y}{ITT_D} \]
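The ratio above can be computed by hand. The sketch below uses simulated data with three compliance types; the shares (50% compliers, 20% always-takers, 30% never-takers) and the complier effect of 2 are assumptions made purely for illustration.

```r
# Wald estimator on simulated data with compliers, always-takers and never-takers.
# Shares and effect sizes are illustrative assumptions.
set.seed(1)
n <- 50000
type <- sample(c("complier", "always", "never"), n, replace = TRUE,
               prob = c(0.5, 0.2, 0.3))
z <- rbinom(n, 1, 0.5)                          # randomly assigned instrument
d <- ifelse(type == "always", 1,
            ifelse(type == "never", 0, z))      # compliers follow Z; no defiers
effect <- ifelse(type == "complier", 2, 4)      # heterogeneous treatment effects
y <- rnorm(n) + (type == "always") + effect * d # Z affects Y only through D

itt_y <- mean(y[z == 1]) - mean(y[z == 0])      # reduced form (ITT_Y)
itt_d <- mean(d[z == 1]) - mean(d[z == 0])      # first stage (ITT_D)
late  <- itt_y / itt_d

round(c(itt_y = itt_y, itt_d = itt_d, late = late), 2)
```

In this setup ITT_D estimates the share of compliers, and the ratio recovers the effect among compliers (2), not the larger effect assumed for always-takers.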
Two-Stage Least Squares (2SLS)
In practice, IV designs are often estimated using 2SLS regressions. Unlike a manual calculation of the treatment effect, dedicated 2SLS estimators report correct standard errors and allow for the inclusion of covariates. The principle is simple: in the first stage, the treatment is regressed on the instrument and, possibly, covariates. The predicted values of the treatment are then used in place of the observed treatment in the second stage, where the outcome is regressed on them. Importantly, both stages must include exactly the same covariates.
First Stage : \[ D_i = \alpha_1 + \phi Z_i + \beta_1 X_{1i} + \gamma_1 X_{2i} + e_{1i} \]
Second Stage: \[ Y_i = \alpha_2 + \lambda \hat{D}_i + \beta_2 X_{1i} + \gamma_2 X_{2i} + e_{2i} \]
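The two stages can be sketched with two calls to lm(). The data below are simulated and every coefficient is an illustrative assumption. The point estimate from this manual procedure matches the IV estimator, but the standard errors reported by the second lm() are not valid, because they are based on residuals computed with the predicted rather than the observed treatment.

```r
# Manual 2SLS on simulated data (all coefficients are illustrative assumptions).
set.seed(7)
n  <- 20000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
u  <- rnorm(n)                                   # unobserved confounder
z  <- rbinom(n, 1, 0.5)                          # instrument
d  <- 0.5 * z + 0.3 * x1 - 0.2 * x2 + 0.7 * u + rnorm(n)
y  <- 1.0 * d + 0.4 * x1 + 0.6 * x2 + 1.2 * u + rnorm(n)  # true effect is 1

# First stage: treatment on instrument and covariates
first <- lm(d ~ z + x1 + x2)
d_hat <- fitted(first)

# Second stage: outcome on predicted treatment and the SAME covariates
second <- lm(y ~ d_hat + x1 + x2)
lambda_2sls <- unname(coef(second)["d_hat"])

# Cross-check against the just-identified IV formula (W'X)^{-1} W'y
X <- cbind(1, d, x1, x2)
W <- cbind(1, z, x1, x2)
beta_iv <- solve(t(W) %*% X, t(W) %*% y)

c(manual_2sls = lambda_2sls, iv_formula = beta_iv[2, 1])
```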
IV Assumptions
For an IV design to be valid, several assumptions have to be met. In practice, this can be very hard to achieve. The five assumptions are:
Monotonicity: There are no defiers
Exclusion Restriction: The instrument affects the outcome only through the treatment
Non-Zero Complier Proportion: The instrument affects the treatment
Random Assignment of Z: The instrument is unrelated to potential outcomes
SUTVA
Satisfying these assumptions usually requires good knowledge of the context and of the particular mechanisms at work. This is particularly the case for the exclusion restriction, which cannot be tested statistically. Accordingly, we must be able to make a convincing case that the assumption holds. If there are good reasons to believe that it does not hold, the IV design is likely invalid.
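Unlike the exclusion restriction, the relevance of the instrument (a non-zero complier proportion) can be checked in the data. A common diagnostic is the F-statistic for the instrument in the first stage, often compared against a rule-of-thumb threshold of 10. The sketch below uses simulated data; the helper name first_stage_f and all coefficients are assumptions for illustration.

```r
# First-stage F-statistic for a single instrument (simulated data).
# first_stage_f() is a hypothetical helper written for this illustration.
first_stage_f <- function(d, z, x) {
  restricted <- lm(d ~ x)        # first stage without the instrument
  full       <- lm(d ~ z + x)    # first stage with the instrument
  anova(restricted, full)$F[2]   # F-test for excluding z
}

set.seed(3)
n <- 2000
x <- rnorm(n)
z <- rbinom(n, 1, 0.5)
d_strong <- 0.50 * z + 0.3 * x + rnorm(n)  # instrument clearly shifts treatment
d_weak   <- 0.02 * z + 0.3 * x + rnorm(n)  # instrument barely shifts treatment

f_strong <- first_stage_f(d_strong, z, x)
f_weak   <- first_stage_f(d_weak, z, x)
round(c(strong = f_strong, weak = f_weak), 1)
```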
Before starting this seminar
Create a folder called “lab6”
Download the data (dinas.csv), or read the CSV file directly from GitHub.
Open an R script (or Markdown file) and save it in our “lab6” folder.
Set your working directory using the setwd() function or by clicking on "More". For example: setwd("~/Desktop/Causal Inference/2022/Lab6")
Let’s install and load the packages that we will be using in this lab:
library(stargazer) # generate formatted regression tables
library(texreg) # generate formatted regression tables
library(tidyverse) # to conduct some tidy operations
library(plm) # conduct one-way and two-way fixed effects
library(estimatr) # OLS and IV estimation with robust standard errors
library(lmtest) # coefficient tests with a user-supplied variance-covariance matrix
library(multiwayvcov) # cluster-robust variance-covariance matrices (to cluster SEs)
library(ivpack) # Calculates IV models
library(ivreg) # 2SLS estimation with ivreg()
library(modelsummary) # regression tables
library(fixest) # fixed-effects and IV estimation with feols()
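The library() calls above fail for any package that is not yet installed. A minimal sketch for finding out which ones are missing is shown below; missing_packages is a hypothetical helper name, and the install line is left commented out so that nothing is installed without your say-so.

```r
# Packages used in this lab
lab_packages <- c("stargazer", "texreg", "tidyverse", "plm", "estimatr",
                  "lmtest", "multiwayvcov", "ivpack", "ivreg",
                  "modelsummary", "fixest")

# Hypothetical helper: return the packages that are not installed.
# system.file() returns "" when a package cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

to_install <- missing_packages(lab_packages)
to_install
# install.packages(to_install)  # uncomment to install the missing packages
```

If a package is not available from your default repository, the install step will report it; in that case check the package's own installation instructions.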
Seminar Overview
In this seminar, we will cover the following topics:
1. Manually estimate the treatment effect using an instrumental variable and the lm() function
2. Run an IV regression using ivreg(), iv_robust() and feols()
3. Present the output of 2SLS regressions
4. Manually calculate the Wald estimator
5. Use placebo tests to support the validity of the IV design.
6. Check for weak instruments
Does Choice Bring Loyalty?
Today we will work with data from Elias Dinas’ paper “Does Choice Bring Loyalty?”. In this paper, the author seeks to understand the foundation of partisan strength. There is a general debate over party identification (PID) in the literature. Some scholars claim that party identification strengthens with age. Others, including the author, suggest that voting for a party brings about loyalty and strengthens political attachment. A straightforward but naive empirical strategy would be to estimate the effect of having voted in one election on the strength of party identification a couple of years further down the line. However, both PID and vote choices are predicted by similar confounders, so such a research design would inevitably face the problem of unobserved covariates.
To address the research question without using such a naive design, the author takes advantage of a comprehensive panel dataset. The original data include four waves (1965, 1973, 1982 and 1997), of which the author uses two (1965 and 1973). In a smart move, Dinas then makes use of the timing of elections and the characteristics of participants in the panel. Elections took place in 1968 and in 1972. To identify the effect of voting, Dinas exploits the age of respondents. Importantly, respondents who were born in 1947 (76% of the sample) share a very important characteristic. What is it? They turned 21, which was the voting age at the time, in 1968. Those who turned 21 before election day were able to vote in 1968; those who did not were only eligible to vote in 1972. This allows the author to exploit respondents’ birthdays, which are as-if random, to causally estimate the effect of voting on the strength of party identification. Of course, not everyone who was eligible to vote in 1968 actually voted.
Besides various covariates, we will be using the following key variables:
Variable | Description |
---|---|
eligible68 | Dummy for eligibility to vote in 1968 election (Instrument). |
voted68 | Dummy indicating whether participant voted in 1968 election (Treatment) |
strngpid73 | Strength of party identification in 1973 on an ordinal scale (Outcome) |
knowledge65 | Political knowledge in 1965 |
strngpid65 | Strength of party identification in 1965 |
elig2false | Dummy variable for placebo tests: 0 for young eligible and 1 for old eligible participants |
v7 | Numerical code for school (which we’ll use for clustering SEs) |
Now let’s load the data. There are two ways to do this:
You can load the dinas dataset from your laptop using the read.csv() function.
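# Set your working directory (uncomment and adjust the path)
# setwd("~/Desktop/Causal Inference/2022/Lab6")

library(haven) # only needed for Stata/SPSS files; read.csv() is base R
# Load the data from a local copy (uncomment and adjust the path)
# dinas <- read.csv("~/dinas.csv")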
Or you can download the data from the course website at the following URL: https://dpir-ci.github.io/CI22/data/dinas.csv.
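A minimal sketch of this second option is shown below. It assumes that the URL above is reachable from your machine; the tryCatch() wrapper and the check on the key variables are additions for illustration and are not part of the original lab code.

```r
# Read the data directly from the course website (requires internet access)
dinas_url <- "https://dpir-ci.github.io/CI22/data/dinas.csv"

# Key variables described in the table above
key_vars <- c("eligible68", "voted68", "strngpid73", "knowledge65",
              "strngpid65", "elig2false", "v7")

# Return NULL instead of stopping if the file cannot be read
dinas <- tryCatch(read.csv(dinas_url), error = function(e) NULL)

if (is.null(dinas)) {
  message("Could not read ", dinas_url, "; check your internet connection.")
} else {
  # Key variables that are NOT in the data; character(0) means all are present
  print(setdiff(key_vars, names(dinas)))
}
```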
Exercise 1: Use the head() function to familiarise yourself with the data set.
head(dinas)
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 m12_1 m12_2
1 7779 Feb-91 student student 2 421 3131 9 524750426 90 7 82 11
2 7779 Feb-91 student student 3 422 3131 9 524750426 65 7 91 19
3 7779 Feb-91 student student 5 424 3131 10 524750426 75 7 84
4 7779 Feb-91 student student 6 425 3131 10 524750426 70 7 social s 11
5 7779 Feb-91 student student 8 427 3131 10 524750426 80 7 85 94
6 7779 Feb-91 student student 10 725 5371 9 987875681 100 13 91
m12_3 m13_1 m13_2 m13_3 v14 m15_1 m15_2 m15_3 m16_1 m16_2
1 20 83 NA NA liked it
2 83 NA NA liked it better - homework
3 83 NA NA liked it learning to do it some -th
4 85 NA NA liked it feeling homework
5 82 NA NA liked it being wi like -so
6 83 NA NA liked it learning learning
m16_3 v17 v18 m19_1 m19_2 m19_3 m20_1 m20_2 m20_3
1 no better teenage changed get good understa
2 no better adults d athlete,
3 dislike no about th teenager teens mo don-t be going to
4 no about th judge al parties don-t be good per
5 no better changed parties dating,g get good
6 no better authorit athlete, good com
m20_4 m21_1 m21_2 m21_3 m21_4 v22 v23 v24 v25
1 insincer several no,not m
2 very sma not goin isn-t at no leadi
3 extreme those no too good several yes,memb crowd 1
4 being fr dull - p no leadi
5 other very sma positive one main yes,memb
6 don-t be doesn-t not intr several yes,memb crowd 1
m26_1 m26_2 m26_3 m27_1 m27_2 m28_1 m28_2 m29_1 m29_2 m30_1 m30_2 v31
1 NA NA NA 5 NA 55 60 NA NA 62 NA some tre
2 NA NA NA NA NA NA NA NA NA NA NA some tre
3 NA NA NA 19 NA 12 1 51 NA 18 NA some tre
4 NA NA NA NA NA NA NA NA NA NA NA treat ev
5 5 17 51 NA NA NA NA NA NA NA NA some tre
6 NA NA NA 58 NA 4 NA 60 NA 62 NA treat ev
v32 v33 v34 v35 v36 v37 v38 v39 m40_1 m40_2
1 no no a good d no no,extra yes,most shows in obligati
2 no no a good d no no,extra yes,most voting -
3 yes yes,help no,didn- no a good d yes yes,extr yes,most voting i
4 no no some no yes,extr yes,most to elect voting d
5 no no a good d no no,extra yes,most choose p
6 no no a good d yes no,extra yes,most to choos
v41 v42 v43 v44 v45 v46 v47 v48 v49 v50 v51 m52_1 m52_2 m52_3
1 yes, bot both no no no no 4 4 6 semest 21 NA NA
2 no no yes yes no no 2 2 4 semest 11 71 NA
3 no yes no no no no 2 2 4 semest 22 24 21
4 yes,outs successf yes no no no 2 2 2 semest 71 85 NA
5 no yes yes yes no no 4 4 6 semest 14 22 NA
6 no yes no no yes no 3 3 6 semest 11 43 NA
m52_4 m53_1 m53_2 m53_3 m54_1 m54_2 m54_3 m54_4 v55 v56 v57
1 NA 42 32 NA 61 20 23 40 disagree agree agree
2 NA 70 NA NA 22 13 16 NA disagree agree agree
3 NA 34 NA NA 23 14 29 43 disagree depends, agree
4 NA 32 NA NA 42 19 21 NA disagree agree agree
5 NA 30 NA NA 22 20 30 10 disagree agree agree
6 NA 40 49 NA 14 NA NA NA agree agree disagree
v58 v59 v60 m61_1 m61_2 m61_3 v62 v63 m64_1 m64_2
1 disagree disagree yes 20 11 60 some yes,2 - 342 212
2 disagree disagree yes 30 11 20 a good d yes,2 - 342 NA
3 disagree agree yes 11 82 NA some yes,almo 342 212
4 agree depends, no NA NA NA yes,2 - 343 NA
5 disagree agree yes 11 20 82 some yes,few 343 NA
6 agree disagree yes 30 11 NA a good d yes,almo 17 NA
v65 v66 v67 v68 v69 v70 v71 v72 v73
1 no yes,2 - other ki by mysel yes,read
2 yes,almo mainly n by mysel yes,3 - other ki with fam no yes,read
3 yes,almo mainly n by mysel yes,2 - mainly n with fam yes yes,read
4 yes,almo other ki by mysel no no,do no
5 yes,2 - mainly n by mysel no yes,read
6 yes,almo mainly n with fam yes yes,2 - mainly n with fam yes yes,read
m74_1 m74_2 m74_3 m74_4 v75 v76 v77 v78 m79_1
1 time newsweek saturday newspape yes,seve yes,seve yes,few civil ri
2 life look other radio yes,seve yes,few yes,once r only s
3 life radio yes,seve yes,seve yes,few civil ri
4 radio no no no
5 look look saturday magazine yes,few yes,few no cuba. c
6 newsweek televisi yes,few yes,seve no nuclear
m79_2 m79_3 v80 m81_1 m81_2 v82 m83_1 m83_2 v84
1 medicare congress national well-qua more acc state go oth resp strong d
2 national know mor well-qua local go know les not very
3 space. s national other strong d
4 national well-qua approve local go disappro not very
5 viet nam civil ri national know mor not very
6 demonstr civil ri state go efficien local go poorly-q yes, dem
m85_1 m85_2 m85_3 m85_4 v86 m87_1 m87_2 m87_3 m87_4 v88 v89
1 709 705 206 NA reps lot NA NA 205 NA johnson democrat
2 402 NA 305 NA reps lit NA NA 805 819 johnson about ha
3 NA NA 119 719 NA NA NA NA johnson democrat
4 NA NA NA NA reps lit NA NA 200 NA johnson republic
5 NA NA NA NA dems lit NA NA NA NA goldwate democrat
6 606 NA 606 NA reps mor NA NA 200 NA johnson democrat
v90 v91 v92 v93 v94 v95 v96 v97 v98
1 good dea not very not much most of know wha for bene 1 pty al yes yes, bot
2 good dea not very some most of don-t kn oth,depe control no yes, bot
3 good dea not very some about al know wha for bene control yes, bot
4 not much not very about al know wha for bene control yes, bot
5 not much not very some most of don-t kn few big control yes yes, bot
6 good dea hardly a some some of don-t kn for bene 1 pty al yes yes, bot
v99 v100 v101 v102 v103 v104 v105 v106
1 yes,live mother m each par each par pretty m not so w no -skip
2 yes,live father m parents each par pretty m about av no -skip
3 yes,live parents each par each par pretty m extremel yes
4 yes,live father m father m each par pretty m extremel no -skip
5 no,doesn mother s mother m mother m each par pretty m about av no -skip
6 yes,live parents father m parents disagree extremel yes
v107 m108_1 m108_2 v109 m110_1 m110_2 v111 v112
1 about sa pretty c somewhat
2 better - more ind pretty c very muc
3 disagree further worse -i more mat oth refe very clo very muc
4 worse -i decrease very clo very muc
5 worse -i oth chan understa pretty c very muc
6 disagree automobi worse -i increase very clo somewhat
v113 v114 v115 v116 v117 v118 v119 v120 v121
1 strong d voted fo very clo somewhat strong d voted fo much inf better n some
2 strong r voted fo pretty c very muc not very voted fo much inf feel fre
3 strong d voted fo very clo somewhat strong d voted fo much inf feel fre a lot
4 strong r voted fo pretty c somewhat strong d voted fo much inf feel fre some
5 not very voted fo very clo very muc strong r voted fo much inf feel fre a lot
6 strong d voted fo very clo somewhat not very voted fo some inf feel fre some
v122 v123 v124 v125 v126 v127 v128 v129 v130 v131 v132 v133
1 never about av about ri no 70 30 99 30 99 85 99
2 once in a lot to about ri no 30 85 50 85 40 85 99
3 once in pretty m about ri yes parent d 85 85 85 70 85 85 85
4 once in pretty m about ri no 50 40 50 60 50 50 50
5 once in about av about ri no 50 85 70 70 70 85 99
6 once in pretty m about ri no 40 60 50 85 50 70 50
v134 v135 v136 v137 v138 v139 v140 v141 v142 v143 v144
1 85 #NAME? #NAME? no #NAME? yes most of internat national local af
2 40 #NAME? #NAME? no #NAME? yes some of internat national state af
3 85 #NAME? #NAME? yes #NAME? yes most of national internat state af
4 50 #NAME? no #NAME? yes some of national local af state af
5 40 #NAME? #NAME? yes #NAME? yes some of internat local af national
6 85 #NAME? #NAME? yes #NAME? yes some of internat national local af
v145 v146 v147 v148 v149 v150 v151 v152
1 state af very act pretty s mostly g to chang often gi strong o depends,
2 local af somewhat pretty s depends, depends, depends, middle o hard to
3 local af somewhat pretty s mostly g depends, depends, strong o hard to
4 internat somewhat pretty s mostly g things w depends, middle o depends,
5 state af somewhat pretty s mostly g things w often gi strong o hard to
6 state af somewhat pretty s mostly g things w often gi middle o hard to
v153 v154 v155 v156 v157 v158 v159 v160 v161
1 most peo try to b would tr six yugoslav 9 correct germany democrat
2 most peo other. d would tr six yugoslav 9 correct germany democrat
3 most peo try to b would tr four yugoslav 9 correct germany republic
4 most peo just loo would tr four yugoslav 10 correct germany democrat
5 most peo just loo would tr four don-t kn don-t kn don-t kn germany don-t kn
6 most peo try to b would tr six any oth 9 correct germany democrat
v162 v163 v164 v165 v166 v167 v168 v169 v170
1 plan to 4 - 5 yr private, coeducat 500 - 99 716 4 -or fi private, coeducat
2 plan to 4 - 5 yr private, coeducat 2000 - 3 85 4 -or fi private, coeducat
3 plan to 4 - 5 yr public, coeducat 10,000 - 232 4 -or fi public, coeducat
4 plan to 4 - 5 yr private, coeducat 500 - 99 786 4 -or fi private, coeducat
5 plan to 4 - 5 yr private, coeducat 1000 - 1 213 4 -or fi private, coeducat
6 plan to NA 4 -or fi public, coeducat
v171 v172 v173 m174_1 m174_2 m174_3 m174_4 v175 v176 v177
1 500 - 99 716 2 named parents scholars work ear loan 9 72 3
2 2000 - 3 85 2 named parents 8 62 28
3 10,000 - 232 2 named parents work ear 8 65 9
4 500 - 99 786 2 named parents 4 19 NA
5 1000 - 1 213 2 named parents 8 64 8
6 1000 - 1 22 work ear parents 8 65 18
v178 v179 v180 v181 v182 v183 v184 v185 v186 v187
1 college a 8 211 worked f all from no no,don-t
2 college b 2 150 240 part fro yes yes,less never no - cod
3 college b 5 50 worked f part fro yes yes,less twice no - cod
4 college b work for 150 worked f all from yes yes,abou never no - cod
5 college c work for 75 didn-t w part fro yes yes,twic once - n no - cod
6 commerci b don-t wo NA 26 all from no no,don-t
v188 v189 v190 v191 v192 v193 v194 v195 v196 v197
1 16 yes,seve #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? #NAME?
2 11 no #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? #NAME?
3 20 no #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? politica
4 10 no #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? #NAME?
5 16 no #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? sports-r
6 4 no #NAME? #NAME? #NAME? #NAME? #NAME? #NAME? #NAME?
v198 v199 v200 v201 v202 v203 v204 v205 v206 v207
1 no,but c jewish few time bible wr 9 68 28 15 some col
2 no,but c protesta few time bible go 9 76 28 17 bachelor
3 #NAME? no,but c jewish few time bible wr 9 84 28 19 bachelor
4 no,but c methodis almost e bible wr 9 84 5 19 bachelor
5 #NAME? no,but c presbyte almost e bible wr 8 59 23 13 bachelor
6 no,can-t baptist almost e bible go 1 9 68 12 4 grades
v208 v209 v210 v211 v212 v213 v214 v215 v216 v217 v218 v219 v220
1 12 grade 3 7 1 NA NA NA 15 NA NA NA NA may
2 some col no broth NA NA NA NA NA NA NA NA NA NA april
3 h.s. -na 2 5 NA NA NA NA 14 NA NA NA NA may
4 12 grade 1 20 NA NA NA NA NA NA NA NA NA december
5 12 grade 1 NA NA NA NA NA 19 NA NA NA NA april
6 3 grades 5 15 12 11 NA NA 13 10 NA NA NA june
v221 v222 v223 v224 v225 v226 v227 v228 v229 v230 v231 v232
1 4 1947 155 151 13 52 suburbs, suburban yes,r-s female white
2 29 1947 113 152 yes,r-s male white
3 5 1947 114 114 14 14 yes,r-s female white
4 NA 1947 152 152 9 yes,r-s male white
5 3 1947 316 152 less tha yes,r-s female white
6 5 1947 142 142 all my l yes,r-s male negro
v233 v234 v235 v236 v237 v238 v239 v240 v241 v242
1 yes,neat yes,good yes,self yes,expr yes,coop #NAME?
2 yes,neat yes,self yes,didn high -i.
3 yes,neat yes,expr #NAME?
4 yes,expr descript yes,coop #NAME?
5 yes,neat yes,not yes,didn descript yes,coop #NAME?
6 #NAME?
v243 v244 v245 v246 v247 v248 v249 v250 v251 v252
1 no cours high -2. accurate high -di high per 6 correc NA
2 no cours low -0. accurate high -di 6 correc NA
3 no cours -2.2.0.0 no diffe med -agr high per 4 correc NA
4 no cours -2.2.0.0 accurate low -agr high sel 4 correc NA
5 no cours -2.2.0.0 no defin med -agr high sel 1 correc NA
6 one cour -2.2.0.0 accurate med -agr high sel high per 5 correc NA
v253 v254 v255 v256 v257 v258 v259 v260 v261 v262 v263 v264
1 NA 70 30 97-100 d 30 97-100 d 85 97-100 d 85 2 mail -st 30032
2 NA 30 85 50 85 40 85 97-100 d 40 3 student 3171
3 NA 85 85 85 70 85 85 85 85 5 student 4087
4 NA 50 40 50 60 50 50 50 50 6 student 3172
5 NA 50 85 70 70 70 85 97-100 d 40 8 mail -st 30041
6 NA 40 60 50 85 50 70 50 85 10 student 3992
v265 v266 v267 v268 v269 v270 v271 v272 v273 v274 v275
1 NA NA vermont 30,000-4 4 female r NA
2 313 january 16 110 9873 maryland 30,000-4 3 male r 25 male roo
3 487 april 27 110 4077 florida rural -u 14 female r 25 husband
4 313 january 15 110 9873 maryland 50,000-9 3 male r 25 father
5 NA NA texas 10,000-2 5 female r NA child,se
6 425 april 10 120 6936 arkansas 100,000- 7 male r 25 male roo
v276 v277 v278 v279 v280 v281 v282 v283 v307 v308 v309
1 a lot some of don-t kn
2 23 male roo 23 a lot some of know wha
3 27 a lot most of know wha
4 58 mother 52 oth male 28 some some of know wha
5 2 some most of know wha
6 22 a lot some of don-t kn
v310 v311 v312 v331 v332 v333 v334 v335 v336
1 few big yes,2 - 212
2 few big yes,almo 342
3 few big yes,3 - 269
4 for bene yes,2 - 343
5 for bene yes,2 - 529 govt shl govt shl govt shl govt shl
6 few big yes,almo 17 govt shl govt shl govt shl protect stop cri
v337 v338 v339 v340 v341 v342 v343 v344 v345 v346
1 govt shl govt shl
2
3 govt shl govt shl
4
5 mnrty gr
6 protect protect stop cri govt shl govt shl mnrty gr govt shl
v347 v348 v349 v350 v351 v352 v353 v354 v355
1 make use
2 make use
3
4 set pena
5 make use set pena
6 govt shl govt shl govt shl mnrty gr make use make use make use
v356 v357 v358 v359 v360 v361 v362 v363 v364
1 yes,at l women me
2 yes,once
3 yes,at l bus to a bus to a women me
4 yes, les women me
5 no keep chi keep chi keep chi
6 yes, les bus to a bus to a keep chi bus to a bus to a keep chi women-s
v365 v366 v367 v368 v369 v370 v371 v372 v373
1
2 court sy
3 women me people s
4 women me women me less cen
5
6 women-s women me women-s women-s women-s change f shld spe
v374 v375 v376 v377 v378 v379 v380 v381 v382
1 extremel slightly slightly
2 liberal moderate conserva
3 liberal liberal conserva
4 slightly slightly slightly
5 change f conserva slightly slightly
6 change f no chang change f change f no chang moderate liberal slightly
v383 v384 v385 v386 v387 v388 v389 v390 v391 v392
1 liberal slightly slightly liberal moderate #NAME? #NAME? #NAME?
2 slightly moderate conserva liberal slightly #NAME? yes #NAME?
3 liberal liberal conserva slightly slightly #NAME? #NAME? yes #NAME?
4 liberal liberal moderate liberal moderate #NAME? no #NAME?
5 slightly extremel slightly liberal conserva #NAME? #NAME? #NAME?
6 conserva liberal conserva moderate slightly #NAME? #NAME? no #NAME?
v393 v394 v395 v396 v397 v398 v399 v400 v401 v402
1 no,shld no yes less fai began to agree disagree
2 no,shld no could ha lost too yes became m less fai agree agree agree
3 no,shld no could ha yes became m ina.,cod disagree agree agree
4 no,shld no shld onl yes became o ina.,cod disagree agree disagree
5 yes,did no yes ina.,cod agree disagree
6 no,shld no shld bom sent few yes soldiers ina.,cod agree agree agree
v403 v404 v405 v406 v407 v408 v409 v410 v411
1 agree disagree disagree disagree agree agree disagree agree NA
2 agree agree disagree disagree disagree agree disagree agree 269
3 agree disagree agree disagree disagree agree disagree disagree 342
4 agree disagree agree disagree agree agree disagree agree 219
5 agree disagree agree disagree disagree agree disagree agree NA
6 agree agree disagree disagree disagree disagree disagree disagree 311
v412 v413 v414 v415 v416 v417 v418 v419 v420
1 NA NA pretty s mostly g things w
2 NA NA pretty s mostly g things w usually
3 244 NA pretty s mostly g things w have to
4 NA NA pretty s mostly g things w have to
5 NA NA best pos best pos best pos sometime bad luck to chang
6 NA NA best pos sometime bad luck to chang have to
v421 v422 v423 v424 v425 v426 v427 v428
1 can-t be try to b would tr
2 have to hard to can-t be try to b would ta 40 50 degre 50 degre
3 have to hard to can-t be just loo would ta 70 70 60
4 have to hard to can-t be try to b would tr 30 50 degre 50 degre
5 can-t be try to b would tr
6 have to change m can-t be just loo would ta 50 degre 40 50 degre
v429 v430 v431 v432 v433 v434 v435 v436
1
2 85 40 30 60 50 degre 30 15 50 degre
3 50 degre 40 40 85 70 40 30 70
4 60 40 50 degre 50 degre 97-100 d 50 degre 15 50 degre
5
6 97-100 d 50 degre <actual 50 degre <actual 85 <actual 50 degre
v437 v438 v439 v440 v441 v442 v443 v444 v445
1
2 50 degre 50 degre 50 degre 40 50 degre 15 70 <actual 85
3 70 50 degre 50 degre 40 40 30 70 30 70
4 50 degre 50 degre 50 degre 50 degre 50 degre 40 50 degre 35 40
5
6 50 degre 50 degre 60 <actual 60 <actual 15 <actual 70
v446 v447 v448 v449 v450 v451 v452 v453 v454
1
2 60 30 <actual 60 40 70 too much too litt just abo
3 60 70 40 50 degre 30 70 just abo too litt too litt
4 50 degre 60 40 85 40 50 degre too much too litt just abo
5
6 97-100 d 85 15 85 97-100 d too much too litt just abo
v455 v456 v457 v458 v459 v460 v461 v462
1
2 too much just abo too much too much too much just abo too litt too much
3 too much just abo too much too much too much too litt too litt just abo
4 just abo too litt just abo too litt too much just abo too litt just abo
5
6 too litt too litt too litt too much too much too litt just abo too litt
v463 v464 v465 v466 v467 v468 v469 v470
1
2 just abo too litt just abo just abo too much just abo just abo too much
3 just abo too litt too litt just abo too much too litt just abo too much
4 too litt too litt too litt too much just abo just abo too much too litt
5
6 just abo just abo too litt just abo too much too litt too litt too litt
v471 v472 v473 v474 v475 v476 v477 v478 v479
1
2 just abo just abo six yugoslav 9 correct germany democrat democrat
3 too litt too litt four yugoslav 9 correct germany democrat democrat
4 just abo just abo six yugoslav 9 correct germany democrat independ
5
6 too much too litt six yugoslav 9 correct germany republic independ
v480 v481 v482 v483 v484 v485 v486 v487 v488 v489
1 oth, min NA
2 not very weak dem yes dem; rep 70 democrat
3 strong strong d no, neve NA democrat
4 democrat ind-demo NA yes 72 democrat
5 ind-inde NA
6 neither ind-inde NA yes, dem 71 not sure
v490 v491 v492 v493 v494 v495 v496 v497 v498 v499 v500 v501 v502
1 NA yes reps mor NA NA NA lot more voted
2 yes 200 NA yes reps mor NA NA 201 305 lot more voted
3 yes 400 NA yes reps mor NA NA 201 NA lot more voted
4 no NA yes reps mor 104 705 811 NA little m voted
5 NA yes reps mor NA NA NA lot more voted
6 yes 953 NA yes reps mor 106 NA 206 NA little m did not
v503 v504 v505 v506 v507 v508 v509 v510 v511 v512
1 voted fo yes, vot
2 voted fo yes democrat differen mostly d yes yes, vot
3 voted fo yes differen mostly d yes yes, vot
4 voted fo yes not old
5 voted fo no, didn
6 would vo not regi no not old
v513 v514 v515 v516 v517 v518 v519 v520 v521 v522 v523 v524 v525
1 humphrey yes 1968 102 1972 101 yes NA
2 nixon yes 1972 101 all 3 ch no NA
3 humphrey yes 1968 102 all 3 ch yes 1970-197 131 NA
4 nixon no no NA
5 yes 1972 no NA
6 no no NA
v526 v527 v528 v529 v530 v531 v532 v533 v534 v535 v536 v537 v538 v539 v540
1 yes NA yes no NA NA
2 no NA yes 1972 101 no NA NA
3 no NA no no NA NA
4 no NA no yes 1972 170 1972 171
5 no NA yes no NA NA
6 no NA no no NA NA
v541 v542 v543 v544 v545 v546 v547 v548 v549 v550 v551 v552 v553
1 yes 1969 1972 no, neve NA yes
2 no, neve no, neve NA no, neve
3 yes 1972 445 no, neve NA no, neve
4 no, neve no, neve NA no, neve
5 no, neve no, neve NA no, neve
6 no, neve no, neve NA yes
v554 v555 v556 v557 v558 v559 v560 v561 v562
1 1967 peace;an 1969 peace;an yes 1965 private 1967
2 no, neve
3 no, neve
4 no, neve
5 yes 1972-197 ecology, 1972-197 other
6 1965 pro-civi no, neve
v563 v564 v565 v566 v567 v568 v569 v570 v571 v572
1 never ma NA
2 never ma NA
3 married 3 no NA democrat strong d yes, vot mcgovern
4 never ma NA
5 married 4 NA
6 never ma NA
v573 v574 v575 v576 v577 v578 v579 v580 v581 v582 v583 v584
1 NA NA NA NA NA
2 NA NA NA NA NA somewhat independ
3 yes,pret yes,occa 80 NA yes 80 NA NA somewhat democrat
4 NA NA NA NA NA somewhat democrat
5 NA NA NA NA NA
6 NA NA NA NA NA somewhat independ
v585 v586 v587 v588 v589 v590 v591 v592
1 not very strong d
2 yes, rep pretty c somewhat independ yes, rep
3 strong d yes, she mcgovern pretty c very muc democrat strong r 1 yes, h
4 not very yes, she nixon pretty c somewhat republic strong r 1 yes, h
5 no, neit no, neit
6 yes, dem pretty c somewhat independ yes, dem
v593 v594 v595 v596 v597 v598 v599 v600 v601 v602 v603
1 NA NA better NA NA
2 pretty c no NA NA same about th NA NA
3 mcgovern pretty c yes 31 NA worse -i increase better 54 NA
4 nixon very clo yes 44 NA same about th NA NA
5 NA NA worse NA NA
6 pretty c no NA NA better - members about th NA NA
v604 v605 v606 v607 v608 v609 v610 v611 v612 v613 v614 v615
1 NA NA no
2 NA r not li 1200 2 - 3 ti NA yes army
3 NA r not li 1000 2 - 3 ti 350 1000 once yr once yr no
4 NA r living NA yes air forc
5 NA NA no
6 NA r not li 80 once a m NA yes army
v616 v617 v618 v619 v620 v621 v622 v623 v624 v625 v626 v627
1 NA NA lived 7 NA
2 69 71 yes no very dis no lived 5 47
3 NA NA lived 2 43
4 70 73 no definite no very dis no lived 3 52
5 NA NA lived 6 46
6 67 71 yes yes 18 somewhat no lived 7 71
v628 v629 v630 v631 v632 v633 v634 v635 v636 v637
1 self rep four yea 71 self rep one year 12 other sm two year NA
2 other sm four yea 52 self rep one year 47 other sm one year 49 other sm
3 other sm five yea 43 non-smsa one year NA NA
4 non-smsa four yea 43 other sm one year 49 other sm one year NA
5 non-smsa one year 43 other sm one year 49 non-smsa one year 47 non-smsa
6 self rep two year NA 49 61
v638 v639 v640 v641 v642 v643 v644 v645 v646 v647
1 one year NA one year staying NA yes college,
2 three ye NA one year staying NA yes college,
3 NA staying NA yes college,
4 NA staying NA yes college,
5 one year 52 self-rep one year thinking NA yes college,
6 40 thinking 21 self-rep no
v648 v649 v650 v651 v652 v653 v654 v655 v656 v657
1 r attend NA NA yes bachelor NA
2 r attend 5052518 142316 yes bachelor master-s NA six no,r not
3 r attend 2322518 NA yes bachelor NA 4 -throu no,r not
4 r attend 7862112 6782515 yes bachelor NA five yes,r co
5 r attend NA NA no NA
6 r did no NA NA NA
v658 v659 v660 v661 v662 v663 v664 v665 v666 v667 v668
1
2 business no speci b -+,-- dormitor 1 4 4 2 1 parents
3 sociolog english; c -+,-- apartmen none none none 1 12 parents
4 psycholo no speci b -+,-- dormitor none none 4 30 2 parents fellowsh
5
6
v669 v670 v671 v672 v673 v674 v675 v676 v677 v678 v679
1 NA NA NA
2 no somewhat NA NA NA
3 yes my moral somewhat 12 yes college spouse d 2322518 NA yes
4 no very sat NA NA NA
5 12 yes college spouse d NA NA yes
6 NA NA NA
v680 v681 v682 v683 v684 v685 v686 v687 v688 v689 v690 v691
1 wkg now NA 58 18 NA
2 wkg now NA 18 16 someone NA
3 bachelor 4 -throu wkg now NA 48 19 someone NA
4 wkg now NA 58 19 someone NA
5 bachelor housewif NA NA NA NA
6 wkg now NA 239 13 someone NA
v692 v693 v694 v695 v696 v697 v698 v699 v700 v701 v702 v703
1 NA NA NA r has no
2 48 one year yes NA NA NA r has no
3 40 3 NA NA NA r has sp working
4 40 one year no NA NA NA student r has no
5 NA NA NA r has sp working
6 40 one year no NA NA NA r has no
v704 v705 v706 v707 v708 v709 v710 v711 v712 v713 v714 v715 v716 v717 v718
1 NA NA NA NA NA NA
2 NA NA NA NA NA NA
3 NA 13 someone NA 40 5 NA NA
4 NA NA NA NA NA 19
5 NA 64 NA NA NA NA
6 NA NA NA NA NA NA
v719 v720 v721 v722 v723 v724 v725 v726 v727 v728 v729 v730 v731
1 NA NA NA NA
2 NA NA NA NA no not a me not a me
3 NA NA NA NA no fairly a not a me
4 one year NA NA NA NA no not a me not a me
5 NA NA NA NA
6 NA NA NA NA no not a me not a me
v732 v733 v734 v735 v736 v737 v738 v739
1
2 not a me fairly a not a me not a me not a me not a me not a me belongs
3 not a me not a me not a me not a me not a me not a me not a me belongs
4 not a me not a me not a me not a me not a me not a me not a me belongs
5
6 not a me not a me not a me not a me not a me not a me not a me belongs
v740 v741 v742 v743 v744 v745 v746 v747 v748 v749
1 jewish never
2 country fairly a $25,000 $7,000 t rent -or no prefe
3 $15,000 $7,000 t own -or jewish few time
4 $35,000 $10,000 living h one methodis few time
5 presbyte few time
6 $3,000 t $3,000 t rent -or oth prot every we
v750 v751 v752 v753 v754 v755 v756 v757 v758
1 bible go female white high -di high sel
2 bible go no, midd male white med -agr low -agr high sel
3 yes,midd female white low -agr high -di high sel low pers
4 bible wr yes,midd male white med -agr high -di high sel
5 bible wr female white med -agr low self
6 bible go yes,wkg male black med -agr low -agr low opin low self low pers
v759 v760 v761 v762 v763 v764 v765 v766 v767 v768 v769
1 low poli #NAME? NA NA bachelor 99
2 6 correc most cos broad un NA NA bachelor 40 50
3 5 correc #NAME? broad un NA NA bachelor 70 70
4 6 correc #NAME? broad un NA NA bachelor 30 50
5 #NAME? NA NA some col 99
6 low poli 5 correc #NAME? broad un NA NA no colle 50 40
v770 v771 v772 v773 v774 v775 eligible68 pid65 pid73 nixon72 mcgovern repelig
1 1 0 NA 0 1 0
2 50 40 50 50 50 50 1 5 1 0 1 0
3 60 40 70 70 50 50 1 0 0 0 1 0
4 50 40 50 50 50 50 0 5 2 1 0 0
5 1 5 3 1 0 1
6 50 50 50 50 50 60 1 2 3 0 0 0
demelig black strngpid65 strngpid73 rep65 voted68 dem65 dem73 rep73 polintf
1 1 0 3 NA 0 1 1 NA NA 1
2 1 0 2 2 1 1 0 1 0 0
3 1 0 3 3 0 1 1 1 0 0
4 0 0 2 1 1 0 0 1 0 0
5 0 0 2 0 1 0 0 0 0 0
6 0 1 1 0 0 0 1 0 0 1
polintm pidm pidf col1 col2 col3 col4 gn1 vtf1 vtm1 plf1 plf2 plf3 plm1 plm2
1 1 0 0 0 1 0 0 0 1 1 0 1 0 0 1
2 0 5 6 0 1 0 0 1 1 1 1 0 0 1 0
3 1 0 0 0 1 0 0 0 1 1 1 0 0 0 1
4 1 0 6 0 1 0 0 1 0 1 1 0 0 0 1
5 0 6 5 0 1 0 0 0 0 1 1 0 0 1 0
6 1 1 0 0 0 1 0 1 1 1 0 1 0 0 1
plm3 pm1 pm2 pm3 pm4 pm5 pm6 pf1 pf2 pf3 pf4 pf5 pf6 strngonly73 elig2false
1 0 1 0 0 0 0 0 1 0 0 0 0 0 NA 1
2 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
3 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1
4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 NA
5 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
6 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0
newnixon68 newhumphrey68 newrepelig68 newdemelig68 eldem68 elrep68 hum68dem65
1 0 1 0 1 1 0 1
2 1 0 1 0 0 1 0
3 0 1 0 1 1 0 1
4 1 0 0 0 0 0 0
5 0 0 0 0 0 1 0
6 0 0 0 0 1 0 0
hum68rep65 newhum68dem65 newhum68rep65 newnx68dem65 newnx68rep65 nx68dem65
1 0 1 0 0 0 0
2 0 0 0 0 1 0
3 0 1 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
6 0 0 0 0 0 0
nx68rep65 consistent incons consel inconsel elhumix72 newnixix72 newhumix72
1 0 1 0 1 0 0 0 0
2 1 1 0 1 0 0 0 0
3 0 1 0 1 0 0 0 0
4 1 1 0 0 0 0 1 0
5 0 0 0 0 0 1 0 0
6 0 0 0 0 0 0 0 0
newrepelig68nix72 newdemelig68nix72 elhummc72 newnixmc72 newhummc72
1 0 0 1 0 1
2 0 0 1 1 0
3 0 0 1 0 1
4 0 0 0 0 0
5 0 0 0 0 0
6 0 0 0 0 0
newrepelig68mc72 newdemelig68mc72 v314 v323 knowledge65 instr
1 0 1 yes,2 - most of 7 0.5987159
2 1 0 yes,2 - most of 7 0.5987159
3 0 1 yes,almo some of 5 0.5987159
4 0 0 no some of 5 0.1592357
5 0 0 yes,2 - only now 2 0.5987159
6 0 0 yes,2 - most of 6 0.5987159
This looks wild. The data set includes a large number of variables, many of which are poorly labelled. For now, let’s focus on the key variables presented above.
Exercise 2: Regress the outcome (strngpid73) on the treatment (voted68) using lm(). Does the OLS provide a causal estimate?
Reveal Answer
ols <- lm(strngpid73 ~ voted68, data= dinas)
summary(ols)
Call:
lm(formula = strngpid73 ~ voted68, data = dinas)
Residuals:
Min 1Q Median 3Q Max
-1.6187 -0.6187 0.3813 0.5741 1.5741
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.42593 0.04897 29.117 < 2e-16 ***
voted68 0.19276 0.06847 2.815 0.00499 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9521 on 772 degrees of freedom
(16 observations deleted due to missingness)
Multiple R-squared: 0.01016, Adjusted R-squared: 0.008881
F-statistic: 7.927 on 1 and 772 DF, p-value: 0.004995
The naive OLS provides an estimate of 0.19, which means that having voted in the 1968 election is associated with party identification that is 0.19 points stronger on the PID strength scale in 1973. However, this is not a causal estimate: as we know, various factors are likely to affect both the outcome and the treatment. All the OLS tells us is the size of the bivariate association between these two variables.
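To see why an omitted confounder makes the naive regression misleading, here is a minimal simulation sketch. Everything in it is hypothetical: the variables u, d and y and the true effect of 0.3 are assumptions chosen for illustration and have nothing to do with the dinas data.

```r
# Hypothetical simulated data (not the dinas data): an unobserved
# confounder u drives both the treatment d and the outcome y.
set.seed(123)
n <- 10000
u <- rnorm(n)                  # confounder we normally cannot observe
d <- u + rnorm(n)              # treatment depends on u
y <- 0.3 * d + u + rnorm(n)    # assumed true effect of d on y is 0.3

# Naive regression omits u, so the coefficient on d absorbs part of u's effect
naive_ols <- coef(lm(y ~ d))["d"]

# Infeasible in practice: a regression that controls for the confounder
oracle_ols <- coef(lm(y ~ d + u))["d"]

round(c(naive = unname(naive_ols), oracle = unname(oracle_ols)), 2)
```

Under this setup the naive coefficient should land well above 0.3, while the regression that controls for u should be close to it. With real data we cannot run the second regression, which is exactly the problem the instrument is meant to solve.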
Let’s also look at the visual relationship between these two variables:
# Using ggplot
ggplot(dinas, aes(x=voted68, y=strngpid73)) +
geom_point()+
geom_smooth(method=lm) +
xlab("Voted in 1968") +
ylab("PID Strength in 1973")
We can see a small but statistically significant increase in PID strength among those who voted in 1968. The question is: can we say that having voted is the cause of this difference?
IV Regression: 2SLS
We now know that a simple OLS doesn’t provide a causal estimate. Let us now try to estimate the treatment effect using an instrumental variable design. Following the author, we will use the eligibility of respondents to vote in the 1968 election (eligible68) as the instrument: that is, we exploit the as-if randomness of respondents’ birthdays, which determine their eligibility to vote in 1968. To do so, let’s look at the first and second stage separately.
Exercise 3: Investigate the relationship between treatment (voted68) and instrument (eligible68).
There are several ways to do this. Feel free to pick the option you deem most appropriate.
Reveal Answer
table(dinas$eligible68, dinas$voted68)
0 1
0 132 25
1 250 373
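Raw counts are hard to compare across groups of different sizes, so it can help to turn them into turnout rates within each eligibility group. The sketch below uses only the four counts from the table above; with the real data, prop.table(table(dinas$eligible68, dinas$voted68), margin = 1) gives the same proportions.

```r
# Counts copied from the cross-tabulation above
# rows: eligible68 (0/1), columns: voted68 (0/1)
tab <- matrix(c(132,  25,
                250, 373),
              nrow = 2, byrow = TRUE,
              dimnames = list(eligible68 = c("0", "1"),
                              voted68    = c("0", "1")))

# Row proportions: share of non-voters and voters within each eligibility group
turnout <- prop.table(tab, margin = 1)
round(turnout, 3)

# Difference in turnout between eligible and non-eligible respondents
diff_turnout <- turnout["1", "1"] - turnout["0", "1"]
round(diff_turnout, 3)
```

The difference in turnout between the two groups is what a regression of the treatment on the instrument estimates as its slope.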
Let’s now calculate the first stage.
Exercise 4: Regress the treatment on the instrument and extract the predicted values
Note: Make sure to add the argument na.action=na.exclude to your lm() function in order to deal with missing values. You can use predicted_values <- predict(OLS_model) to extract the predicted values.
Reveal Answer
# Calculating the first stage. Note `na.action=na.exclude` deals with NAs so we can use the predicted values for the second stage
first=lm(voted68~eligible68, data=dinas, na.action=na.exclude)
# Extracting predicted values
vote_pred=predict(first)
# Displaying regression output
summary(first)
Call:
lm(formula = voted68 ~ eligible68, data = dinas, na.action = na.exclude)
Residuals:
Min 1Q Median 3Q Max
-0.5987 -0.5987 0.4013 0.4013 0.8408
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.15924 0.03738 4.26 2.3e-05 ***
eligible68 0.43948 0.04183 10.51 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4684 on 778 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.1243, Adjusted R-squared: 0.1231
F-statistic: 110.4 on 1 and 778 DF, p-value: < 2.2e-16
We can see that the instrument (eligible68) is indeed a strong and significant predictor of the treatment. That is what we hope for and expect. It is also plausible that eligibility, which depends only on respondents’ birthdays, is as good as randomly assigned.
Unfortunately, the first stage cannot tell you whether an instrument is appropriate. However, it can tell you something about inappropriate instruments. A common problem in IV designs is weak instruments: if your instrument is only weakly correlated with the endogenous variable (i.e. the treatment), the IV estimate is likely to be biased. The F-statistic of the first stage can be used to identify weak instruments. As a rule of thumb, your instrument is likely to be problematic if the F-statistic of your first-stage regression is below 10.
Going back to the regression output, we see that our F-statistic is about 110, well above the conventional threshold of 10. Our instrument is strongly correlated with the treatment, as it should be, but note that this alone does not make it a valid instrument.
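If you want to pull the first-stage F-statistic out of a model object rather than read it off the printed summary, the following sketch shows one way. Because the first stage only involves two binary variables, the 780 complete cases can be rebuilt from the cross-tabulation of eligible68 and voted68 in Exercise 3. The names z and d are illustrative stand-ins for those two variables.

```r
# Rebuild the complete cases from the Exercise 3 cross-tabulation.
# Cell order: (z=0,d=0), (z=0,d=1), (z=1,d=0), (z=1,d=1)
counts <- c(132, 25, 250, 373)
z <- rep(c(0, 0, 1, 1), times = counts)   # stand-in for eligible68
d <- rep(c(0, 1, 0, 1), times = counts)   # stand-in for voted68

first_stage <- lm(d ~ z)

# summary()$fstatistic holds the F value and its degrees of freedom
f_stat <- summary(first_stage)$fstatistic
f_stat

# With a single instrument, F is the squared t statistic of the instrument
t_z <- summary(first_stage)$coefficients["z", "t value"]
all.equal(unname(f_stat["value"]), t_z^2)
```

This should reproduce the first-stage slope and the F-statistic of about 110 reported above. With several instruments the F-statistic of interest is the joint test on the instruments only, which no longer coincides with a single squared t statistic.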
Let’s now proceed to test the exclusion restriction.
Exercise 5: Test the exclusion restriction for the instrument.
Hint: Show that the instrument affects the outcome only through the treatment.
Reveal Answer
If you have regressed the outcome on the instrument (and the treatment), this might help familiarise yourself with the data - but it does not provide a test of the exclusion restriction. In fact, it is impossible to statistically test the exclusion restriction. All we can do is rely on theory and build a convincing case for alternative effects not taking place. The problem with a regression of Y on Z (and D) is that we still cannot observe further confounders and account for their effects. We can’t know if their effect does not come into play in such a regression.
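A small simulation can make this point concrete. In the sketch below the exclusion restriction holds by construction, yet a regression of the outcome on the instrument and the treatment still returns a clearly non-zero coefficient on the instrument. All names and effect sizes are hypothetical assumptions, not the dinas data.

```r
# Hypothetical simulation (not the dinas data). By construction the
# instrument z affects y ONLY through d, so the exclusion restriction holds.
set.seed(2022)
n <- 10000
u <- rnorm(n)                   # unobserved confounder of d and y
z <- rbinom(n, 1, 0.5)          # as-if random instrument
d <- 0.5 * z + u + rnorm(n)     # treatment depends on z and u
y <- 0.3 * d + u + rnorm(n)     # outcome has no direct z term

# Attempted "test" of the exclusion restriction: regress y on z and d
naive_test <- lm(y ~ z + d)
round(coef(summary(naive_test)), 3)
```

Conditioning on the treatment links the instrument to the unobserved confounder, so the coefficient on z is pushed away from zero even though z has no direct effect on y. A non-zero coefficient in such a regression therefore tells us nothing about whether the exclusion restriction is violated.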
Let’s plot the relationship between the outcome and the instrument nonetheless. As stated above, we can’t tell whether the assumption holds, but we could find that the exclusion restriction is likely to be violated.
Exercise 6: Plot the relationship between the outcome and instrument.
There are several ways to do this. Feel free to pick the option you deem most appropriate.
Reveal Answer
# Using ggplot
ggplot(dinas, aes(x=eligible68, y=strngpid73)) +
geom_point()+
geom_smooth(method=lm) +
xlab("Eligibility in 1968") +
ylab("PID Strength in 1973")
This looks as expected. There is no clear and significant association between the two variables. Recall that eligibility itself should not affect party identification strength unless respondents have voted in 1968 as only voting should affect the outcome.
Let’s now return to our IV model by calculating the second stage of our 2SLS model.
Exercise 7: Regress the outcome on the predicted values from the first stage
Reveal Answer
# Calculating the second stage
second_wrongSE=lm(strngpid73~vote_pred, data=dinas)
# Displaying regression output
summary(second_wrongSE)
Call:
lm(formula = strngpid73 ~ vote_pred, data = dinas)
Residuals:
Min 1Q Median 3Q Max
-1.5525 -0.5525 0.4475 0.4475 1.5871
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.3623 0.1055 12.918 <2e-16 ***
vote_pred 0.3176 0.1952 1.627 0.104
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9554 on 772 degrees of freedom
(16 observations deleted due to missingness)
Multiple R-squared: 0.003417, Adjusted R-squared: 0.002126
F-statistic: 2.647 on 1 and 772 DF, p-value: 0.1042
The second stage uses the predicted values for the treatment from the first stage. The output indicates that, once we instrument for voting in 1968, the decision to cast a vote in 1968 does not have a statistically significant effect on party identification strength at conventional levels.
However, by calculating the two stages separately we have not adjusted the standard errors: the second-stage lm() treats the predicted values as if they were observed data and computes its residuals from them rather than from the actual treatment. Hypothesis tests that rely on these uncorrected standard errors can therefore be misleading.
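The sketch below illustrates where the standard errors of the manual approach go wrong. It uses hypothetical simulated data (not the dinas data) and the conventional homoskedastic 2SLS formula, in which the residual variance is computed from the observed treatment rather than from its predicted values.

```r
# Hypothetical simulated data (not the dinas data)
set.seed(1968)
n <- 5000
u <- rnorm(n)                   # unobserved confounder
z <- rbinom(n, 1, 0.5)          # instrument
d <- 0.5 * z + u + rnorm(n)     # treatment
y <- 0.3 * d + u + rnorm(n)     # outcome

# Manual two stages
d_hat  <- fitted(lm(d ~ z))
second <- lm(y ~ d_hat)
b      <- coef(second)

# Standard error printed by the second-stage lm(): residuals are based on d_hat
se_naive <- summary(second)$coefficients["d_hat", "Std. Error"]

# 2SLS residuals must use the observed treatment d, not d_hat
res_2sls   <- y - b["(Intercept)"] - b["d_hat"] * d
sigma2     <- sum(res_2sls^2) / (n - 2)
se_correct <- sqrt(sigma2 / sum((d_hat - mean(d_hat))^2))

c(estimate = unname(b["d_hat"]), se_naive = se_naive, se_correct = se_correct)
```

The point estimate from the manual second stage is fine (it equals cov(z, y) / cov(z, d)); only the uncertainty around it is off. Dedicated IV functions apply this correction automatically.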
IV Regression using 2SLS in one step
There are several packages that we could use to obtain a two-stage least squares instrumental variables estimator. Let’s now conduct 2SLS using the ivreg(), iv_robust() and feols() functions. The syntax for each of these functions is shown below.
Exercise 8: Conduct a two-stage least squares instrumental variable regression using strngpid73 as the outcome, voted68 as the endogenous predictor and eligible68 as the instrument. Use the ivreg() and iv_robust() functions. Store these models in a list (list()) and report them using the modelsummary() function. Interpret the results.
Variable | Description |
---|---|
O | Outcome variable |
E | Endogenous variable. |
I | Instrument variable. |
FE | Fixed Effect variable |
ivreg(O ~ E | I, data = data ) # ivreg package
iv_robust(O ~ E | I, data = data) # estimatr package
feols(O ~ 1 | FE | E ~ I, data = data) # fixest package
Reveal Answer
## ivreg ##
ivreg_model <- ivreg(strngpid73 ~ voted68 | eligible68, data = dinas)
ivreg_model_clustered <- cluster.vcov(ivreg_model, dinas$v7) # Cluster-robust variance-covariance matrix (clustered by v7); the model itself is not re-estimated
iv_clustered <- coeftest(ivreg_model, ivreg_model_clustered)
## iv_robust ##
iv_robust_model <- iv_robust(strngpid73 ~ voted68 | eligible68, data = dinas, cluster = v7) # cluster standard errors by v7
ivmodels <- list(ivreg_model, iv_robust_model)
rows <- tribble(~term, ~ OLS1, ~OLS2,
'Covariates', 'No', 'No') # add one row reporting covariates
attr(rows, 'position') <- c(5) ### Change location accordingly
title <- 'Two-stage Least Squares Models' # add the title to your model
coeffs <- c('(Intercept)'= 'Intercept',
'voted68' = 'Voted') # rename coefficients
# regression table
modelsummary(ivmodels, estimate = "{estimate}{stars}",coef_map = coeffs, gof_omit = 'DF|se_type', add_rows = rows, title = title)
| | Model 1 | Model 2 |
|---|---|---|
| Intercept | 1.362*** | 1.362*** |
| | (0.106) | (0.095) |
| Voted | 0.319 | 0.319+ |
| | (0.196) | (0.175) |
| Covariates | No | No |
| Num.Obs. | 774 | 774 |
| R2 | 0.006 | 0.006 |
| R2 Adj. | 0.005 | 0.005 |
| Std.Errors | | by: v7 |
| statistic.endogeneity | | |
| p.value.endogeneity | | |
| statistic.weakinst | | |
| p.value.weakinst | | |
| statistic.overid | | |
| p.value.overid | | |
We find that both functions generate the same point estimates. The standard errors differ because iv_robust() clusters them by v7, whereas the ivreg() model is reported with conventional standard errors. The Local Average Treatment Effect is 0.319. Remember in your assignments to explain in detail what the coefficient means substantively.
In 2SLS we can include covariates to obtain the covariate-adjusted LATE. Let’s add some covariates to the 2SLS. We can also add additional instruments to our model.
Exercise 9: Use the ivreg() function and include the following covariates: col1 and col2. Use the same endogenous treatment variable voted68. Include col1, col2 and eligible68 as instruments. Report the results of this estimation using the summary() function. Pass the arguments in the table below to the summary function. Report the F-statistic for this specification. Are the instruments that we are using strong or weak?
Function/argument | Description |
---|---|
summary() | Generic function to produce result summaries of fitted models |
diagnostics | When set to TRUE, reports a number of diagnostic tests |
Reveal Answer
ivreg_covariates <- ivreg(strngpid73 ~ col1 + col2 + voted68 |
col1 + col2 + as.factor(knowledge65) + eligible68, data = dinas)
summary_ivreg <- summary(ivreg_covariates, diagnostics = TRUE)
summary_ivreg
Call:
ivreg(formula = strngpid73 ~ col1 + col2 + voted68 | col1 + col2 +
as.factor(knowledge65) + eligible68, data = dinas)
Residuals:
Min 1Q Median 3Q Max
-1.8142 -0.6389 0.1858 0.7256 1.8114
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.44974 0.09496 15.267 <2e-16 ***
col1 -0.26116 0.14418 -1.811 0.0705 .
col2 -0.17531 0.07732 -2.267 0.0236 *
voted68 0.36445 0.18310 1.990 0.0469 *
Diagnostic tests:
df1 df2 statistic p-value
Weak instruments 8 763 15.967 <2e-16 ***
Wu-Hausman 1 769 0.816 0.367
Sargan 7 NA 4.225 0.754
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9527 on 770 degrees of freedom
Multiple R-Squared: 0.0115, Adjusted R-squared: 0.007647
Wald test: 2.81 on 3 and 770 DF, p-value: 0.03861
# Add clustered robust standard errors
ivreg_covariates_clustered <- cluster.vcov(ivreg_covariates, dinas$v7)
coeftest(ivreg_covariates, ivreg_covariates_clustered)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.449743 0.083194 17.4260 < 2e-16 ***
col1 -0.261162 0.124099 -2.1045 0.03566 *
col2 -0.175310 0.073302 -2.3916 0.01701 *
voted68 0.364451 0.158050 2.3059 0.02138 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We observe that voting in 1968 has a positive and statistically significant effect on partisanship strength. The summary function also reports several diagnostic tests once we set the diagnostics argument equal to TRUE.
If there are more instruments than causal parameters, the model is overidentified. If there are exactly as many instruments as causal parameters, the model is just identified. However, the more instruments we include, the harder it is to meet the exclusion restriction. One test that we can conduct is the Sargan test of overidentifying restrictions. This test compares the overidentified model with a model that uses a subset of the instruments and asks whether they differ by more than sampling variation. The null hypothesis is that all instruments are valid. In our case, the Sargan test is not significant (p = 0.754), so we cannot reject that null.
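For intuition, the overidentification statistic reported as Sargan in the output can be sketched by hand: take the 2SLS residuals, regress them on all instruments, and multiply the resulting R-squared by the sample size. The example uses hypothetical simulated data with two instruments, z1 and z2, that are valid by construction. The names and effect sizes are assumptions for illustration.

```r
# Hypothetical simulated data: two valid instruments for one treatment
set.seed(73)
n  <- 5000
u  <- rnorm(n)                              # unobserved confounder
z1 <- rbinom(n, 1, 0.5)                     # first instrument
z2 <- rnorm(n)                              # second instrument
d  <- 0.5 * z1 + 0.5 * z2 + u + rnorm(n)    # treatment
y  <- 0.3 * d + u + rnorm(n)                # outcome

# 2SLS point estimates via the manual two stages
d_hat <- fitted(lm(d ~ z1 + z2))
b     <- coef(lm(y ~ d_hat))

# 2SLS residuals use the observed treatment d
res_2sls <- y - b["(Intercept)"] - b["d_hat"] * d

# Sargan statistic: n * R^2 from regressing the residuals on all instruments
aux     <- lm(res_2sls ~ z1 + z2)
sargan  <- n * summary(aux)$r.squared
df_over <- 2 - 1                            # instruments minus endogenous regressors
p_value <- pchisq(sargan, df = df_over, lower.tail = FALSE)

c(statistic = sargan, df = df_over, p.value = p_value)
```

A large statistic (small p-value) would suggest that at least one instrument is correlated with the residuals. Note that the test can only be computed when the model is overidentified, and that it cannot detect a situation in which all instruments are invalid in the same way.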
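The weak instruments test is an F-test of the instruments in the first stage. Its null hypothesis is that the instruments are weak, i.e. that they have a low correlation with the endogenous explanatory variable. Here the statistic is 15.97 and the null is rejected, so the instruments are not weak by the rule of thumb of 10. Note that this test says nothing about whether the instruments affect the outcome directly, so it cannot support the exclusion restriction. The Wu-Hausman test checks for endogeneity by comparing the OLS and IV estimates. The null hypothesis is that the OLS estimates are consistent. In this case the test is not significant (p = 0.367), so we cannot reject that OLS is consistent; under that null, OLS would be preferred because it is more efficient.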
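A regression-based version of the endogeneity test can be sketched in base R: add the first-stage residuals to the outcome regression and check whether their coefficient differs from zero. The data below are hypothetical and simulated so that the treatment is endogenous by construction, which means the test should reject here, unlike in the lab output.

```r
# Hypothetical simulated data (not the dinas data)
set.seed(65)
n <- 5000
u <- rnorm(n)                   # unobserved confounder
z <- rbinom(n, 1, 0.5)          # instrument
d <- 0.5 * z + u + rnorm(n)     # d is endogenous because of u
y <- 0.3 * d + u + rnorm(n)     # outcome

# First-stage residuals: the part of d that is not explained by z
v_hat <- resid(lm(d ~ z))

# Augmented regression: a significant coefficient on v_hat signals endogeneity
aug <- lm(y ~ d + v_hat)
round(coef(summary(aug)), 3)

# The coefficient on d in the augmented regression equals the 2SLS estimate
all.equal(unname(coef(aug)["d"]), cov(z, y) / cov(z, d))
```

With a single endogenous regressor, the test on v_hat carries the same information as the Wu-Hausman row of the diagnostics table. The standard errors of this augmented regression should not be used for inference on d itself.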
We can obtain the Local Average Treatment Effect (LATE) by dividing the difference in the conditional expectations of the outcome given the instrument (the reduced form) by the difference in the conditional expectations of treatment take-up given the instrument (the first stage). Put more simply, we calculate the difference in the mean outcome between units assigned to the treatment and units not assigned to the treatment, and then divide this number by the difference in treatment take-up rates between the two groups.
Variable/Average | Description |
---|---|
Y | Outcome |
Z | Instrument |
D | Endogenous treatment |
Y[Z=1] | Average outcome for units offered the treatment |
Y[Z=0] | Average outcome for units not offered the treatment |
D[Z=1] | Proportion of units receiving the treatment for those assigned to the treatment |
D[Z=0] | Proportion of units receiving the treatment for those not offered the treatment |
The Wald Estimator is then:
\[\tau=\frac{Y[Z=1]-Y[Z=0]}{D[Z=1]-D[Z=0]}\]
Exercise 10: Manually calculate the Wald Estimator. Use mean(x, na.rm = T) to calculate the means of each group. You can use the following syntax to obtain the conditional means.
mean(data$outcome[data$endogenous_variable == 1], na.rm = TRUE) # 1 for those that voted, 0 for those that didn't vote
mean(data$outcome[data$instrument == 1], na.rm = TRUE) # 1 for those that were eligible, 0 for those that were not eligible.
Reveal Answer
#Numerator
mean(dinas$strngpid73[dinas$eligible68==1], na.rm=T)
[1] 1.5488
mean(dinas$strngpid73[dinas$eligible68==0], na.rm=T)
[1] 1.407643
#Denominator
mean(dinas$voted68[dinas$eligible68==1], na.rm=T)
[1] 0.5987159
mean(dinas$voted68[dinas$eligible68==0], na.rm=T)
[1] 0.1592357
Then, \(\tau\) is equal to:
(mean(dinas$strngpid73[dinas$eligible68==1], na.rm=T) - mean(dinas$strngpid73[dinas$eligible68==0], na.rm=T)) / (mean(dinas$voted68[dinas$eligible68==1], na.rm=T) - mean(dinas$voted68[dinas$eligible68==0], na.rm=T))
[1] 0.3211901
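We see that the Wald estimate is 0.32, which is very close to the estimate obtained from the ivreg() function (0.319). The small difference most likely arises because each conditional mean above drops missing values separately, whereas ivreg() only uses observations that are complete on all three variables. In your assignments, remember to state what the 0.32 means in as much detail as possible.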
How would you compute the Wald estimator for a binary endogenous variable and a binary instrument when the model also includes covariates?
Reveal Hint 1
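Remember that in a regression of \(Y_i\) on your variable of interest \(X_{1i}\) and a control variable \(X_{2i}\), the coefficient on \(X_{1i}\) is equal to:
\[\beta_1 = \frac{Cov(Y_i, \tilde{X}_{1i})}{V(\tilde{X}_{1i})}\]
where \(\tilde{X}_{1i}\) is the residual from a regression of \(X_{1i}\) on \(X_{2i}\).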
Reveal Hint 2
The 2SLS estimator is the ratio of the reduced form to the first stage, where \(\tilde{Z}_i\) is the residual from the regression of \(Z_i\) on the covariate(s). (Both coefficients have the variance of \(\tilde{Z}_i\) in the denominator, so the variances cancel out.)
\[\lambda_{\text{2SLS}} = \frac{Cov(Y_i, \tilde{Z}_i)}{Cov(D_i, \tilde{Z}_i)}\]
Here we can use the cov() function. You can see the arguments of this function below:
Function/argument | Description |
---|---|
cov(x, y) | Calculates the covariance between two variables x and y |
use | character indicating how missing values should be treated |
pairwise.complete.obs | Determines how the parameters of the covariance function are computed. More details below |
Setting use equal to "pairwise.complete.obs" computes the covariance between x and y using only those observations for which both variables are non-missing.
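A toy example shows what the use argument does. The vectors x and y below are made up for illustration and have missing values in different positions.

```r
# Toy vectors with missing values in different positions
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 6, NA, 10)

# Default behaviour (use = "everything"): any NA makes the result NA
cov(x, y)

# Only pairs where both x and y are observed enter the calculation
cov_pairwise <- cov(x, y, use = "pairwise.complete.obs")

# Same result as dropping incomplete pairs by hand
ok <- complete.cases(x, y)
cov_manual <- cov(x[ok], y[ok])

c(pairwise = cov_pairwise, manual = cov_manual)
```

Only the first three pairs are complete, so both calculations are based on those three observations.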
tau_cov =cov(dinas$eligible68,dinas$strngpid73, use = "pairwise.complete.obs")/
cov(dinas$eligible68,dinas$voted68, use = "pairwise.complete.obs")
tau_cov
[1] 0.3205741
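The calculation above does not yet include any covariates. The sketch below shows how the covariance ratio could be extended to a covariate-adjusted version: residualise the instrument on the covariate first, then take the ratio. It uses hypothetical simulated data with a single covariate x. All names and effect sizes are assumptions, not the dinas data.

```r
# Hypothetical simulated data with one observed covariate x
set.seed(47)
n <- 5000
x <- rnorm(n)                                   # observed covariate
u <- rnorm(n)                                   # unobserved confounder
z <- rbinom(n, 1, plogis(0.5 * x))              # binary instrument, depends on x
d <- as.numeric(0.8 * z + 0.5 * x + u + rnorm(n) > 0.5)   # binary treatment
y <- 0.3 * d + 0.4 * x + u + rnorm(n)           # outcome

# Residualise the instrument on the covariate
z_tilde <- resid(lm(z ~ x))

# Covariate-adjusted ratio: reduced form over first stage
tau_cov_adj <- cov(y, z_tilde) / cov(d, z_tilde)

# Manual 2SLS with the same covariate in both stages
d_hat    <- fitted(lm(d ~ z + x))
tau_2sls <- unname(coef(lm(y ~ d_hat + x))["d_hat"])

c(ratio = tau_cov_adj, two_stage = tau_2sls)
```

With one instrument and one endogenous variable, the two numbers coincide, which is the result stated in Hint 2. As before, the manual second stage is only useful for the point estimate, not for its standard error.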
We know that with IV we can only estimate the Local Average Treatment Effect. This means that we are estimating the causal effect for one particular subgroup of units: the compliers.
Exercise 11: Calculate the proportions of compliers, defiers, always-takers, and never-takers. Give some labels to the variables so we can easily identify each group. You can use the factor() function; a description of the syntax is given below. Give the following labels to the eligible68 variable: “Not eligible” and “Eligible”. For the voted68 variable, use “Not voted” and “Voted”. Finally, why do we impose the monotonicity assumption in IV?
Function/argument | Description |
---|---|
factor | To encode a vector as a factor |
levels | An optional vector of the unique values |
labels | An optional character vector of labels for the levels |
data$variable = factor(data$variable, levels = c(1, 2, 3,..,5),
labels = c("One", "Two", "Three"..."Five"))
Reveal Answer
dinas$eligible68n=factor(dinas$eligible68,
levels=c(0,1),
labels=c( "Not Eligible", "Eligible"))
dinas$voted68n=factor(dinas$voted68,
levels=c(0,1),
labels=c("Not Voted","Voted"))
table(dinas$eligible68n, dinas$voted68n)
Not Voted Voted
Not Eligible 132 25
Eligible 250 373
From the table above, we can see that 132 respondents were not eligible and didn’t vote; this group is composed of never-takers and compliers. The 25 respondents who were not eligible and voted anyway are always-takers and defiers. The 250 respondents who were eligible but didn’t vote are never-takers and defiers. Finally, the 373 respondents who were eligible and voted are always-takers and compliers.
By imposing the monotonicity assumption, we rule out the existence of defiers. This means that the 25 respondents who were not eligible and voted anyway are all always-takers, so the share of always-takers is 25/(132+25) = 0.16. Similarly, the 250 eligible respondents who didn’t vote are all never-takers, so the share of never-takers is 250/(250+373) = 0.40. Because eligibility is as good as random, these shares apply to both groups, and the share of compliers is what remains: 1 - 0.16 - 0.40 = 0.44. This is the same as the denominator of the Wald estimator, 0.60 - 0.16 = 0.44, i.e. the first-stage coefficient.
There are several diagnostics that we could conduct in order to assess the validity of an instrument. In particular, we can conduct what is called a placebo test. In this study, to test whether the differences in partisan strength are driven by the age gap, the author splits all eligible voters into two groups: the “young” eligibles and the “old” eligibles. The old eligibles are those born before May 1947 and the young eligibles are those born from June 1947 onwards. It is important to stress that both groups were eligible to vote. The younger group is then treated as if it had not been eligible to vote in 1968.
Exercise 12: Conduct a placebo test. Use the lm() function with partisan strength measured in 1973 (strngpid73) and in 1965 (strngpid65) as the outcomes. Use elig2false as the placebo treatment variable. Remember to cluster the standard errors. How do you interpret this?
Reveal Answer
plac <- lm(strngpid73 ~ elig2false, data=dinas)
# Cluster standard errors
plac.vcovCL <- cluster.vcov(plac, dinas$v7)
coeftest(plac, plac.vcovCL)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.571111 0.046317 33.9208 <2e-16 ***
elig2false -0.079683 0.085554 -0.9314 0.352
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plac2 <- lm(strngpid65 ~ eligible68, data=dinas)
plac2.vcovCL <- cluster.vcov(plac2, dinas$v7)
coeftest(plac2, plac2.vcovCL)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.782609 0.067535 26.3953 <2e-16 ***
eligible68 -0.025852 0.081025 -0.3191 0.7498
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that there is no statistically significant difference in 1973 partisan strength between young and old eligible voters. This evidence suggests that age is not driving the differences in partisanship strength. Likewise, eligibility in 1968 does not predict partisan strength in 1965, before anyone in the sample could have voted in the 1968 election (coefficient of -0.026, p = 0.75).
Exercise 13: Think of potential ways in which the exclusion restriction could be violated in this setting. Through which other paths could the instrument affect the outcome, apart from the endogenous treatment? We will discuss this at the end of the lab.
HOMEWORK (We will provide the answers next week)
- Should you include all non-endogenous covariates in the first stage? Why or why not?
- What is the main identification assumption of instrumental variable estimation? How can you test it?
- Can you use more than one exogenous variable (multiple Zs) for one endogenous variable (D)?
- What’s the difference between ITT and LATE from IV? Discuss w/ reference to compliers.
- What’s the forbidden regression? Why is it forbidden?