Lab 7: Sharp Regression Discontinuity
Week 7 HT 2022
Recap
Estimation
There are different ways that we could estimate the causal effect from RDD:
- Linear
- Linear with different slopes
- High-order polynomial (or non-linear)
- Non-parametric
- Local polynomial
Choice of Kernel function and Polynomial Order.
Whenever we conduct RD we choose the following:
- Polynomial order \(p\)
- Kernel function \(K(\cdot)\), which determines how observations within the bandwidth are weighted. There are different options: “Uniform”, “Triangular”, and “Epanechnikov”.
- Bandwidth size \(h\): there is a trade-off between bias and efficiency. The closer we stay to the cut-off, the less bias in the estimator, but the more variance, since fewer observations are used.
- We conduct weighted least squares and obtain the intercept of the chosen polynomial order above and below the cut-off.
- Calculate the sharp point estimate by subtracting the intercepts of these polynomials: \(\hat{\tau}_{SRD} = \hat{\alpha}_{above} - \hat{\alpha}_{below}\)
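As a toy illustration of these steps, here is a base-R sketch on simulated data (the data-generating process, bandwidth, and true jump of 0.5 are all made up for the example):

```r
set.seed(1)

# Simulated sharp RD: running variable x, cut-off at 0, true jump of 0.5
x <- runif(1000, -1, 1)
y <- 0.5 * (x >= 0) + 0.3 * x + rnorm(1000, sd = 0.2)

h <- 0.25                      # bandwidth
w <- pmax(0, 1 - abs(x) / h)   # triangular kernel weights

# Weighted least squares (local linear, p = 1) on each side of the cut-off
fit_above <- lm(y ~ x, weights = w, subset = x >= 0 & w > 0)
fit_below <- lm(y ~ x, weights = w, subset = x < 0 & w > 0)

# Sharp RD point estimate: difference of the two intercepts at the cut-off
tau_srd <- unname(coef(fit_above)[1] - coef(fit_below)[1])
tau_srd   # close to the true jump of 0.5
```

This is exactly what `rdrobust` (used later in this lab) does internally, plus bias correction and robust inference.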
Falsification tests:
We learned that we can conduct multiple sensitivity and falsification tests:
- Sensitivity: To check that the results of our estimation are robust to different specifications
- Continuity: To check whether covariates do not jump at the threshold.
- Sorting: To check that units do not sort around the threshold.
- Placebo cut-offs: To check that outcomes do not change abruptly at an arbitrary threshold (to test the continuity assumption)
Before starting this seminar
Create a folder called “lab7”
Download the data (you can use the button or the one at the top, or read csv files directly from github):
Open an R script (or Markdown file) and save it in our “lab7” folder.
Set your working directory using the setwd() function or by clicking on “More”. For example setwd("~/Desktop/Causal Inference/2022/Lab7")
Let’s install and load the packages that we will be using in this lab:
library(stargazer) # generate formatted regression tables
library(texreg) # generate formatted regression tables
library(tidyverse) # to conduct some tidy operations
library(ggplot2)
#install.packages(c("rdrobust", "rddensity"))
#install.packages("rdd")
library(rdd) # to conduct the McCrary sorting test
library(rdrobust) # to conduct non-parametric RD estimates
library(rddensity) # to conduct a density test and density plots
Seminar Overview
In this seminar, we will cover the following topics:
1. Conduct regression discontinuity using global polynomial estimation with the lm() function
2. Calculate the LATE using the non-parametric estimator rdrobust()
3. Conduct robustness/falsification tests such as balance tests, placebo outcomes, and density tests
Islamic Political Rule and Women’s Empowerment - Meyerson (2014)
In this paper, Meyerson studies the effect of Islamic parties’ control of local governments on women’s rights, focusing on the educational attainment of young women. He implements a sharp RD design based on close elections in Turkey. The challenge is to compare municipalities where support for Islamic parties was high enough to win the election with those that elected a secular mayor.
You would expect municipalities controlled by Islamic parties to differ systematically from those controlled by a secular mayor, particularly if religious conservatism affects women’s educational outcomes. However, we can use RD to isolate the treatment effect of interest from all systematic differences between treated and untreated units.
We can compare municipalities where the Islamic party barely won the election with municipalities where it barely lost. This reveals the (local) causal effect of Islamic party control on women’s educational attainment. One condition to meet in this setup is that parties cannot systematically manipulate the vote share they obtain.
The data come from the 1994 mayoral elections in Turkey. The unit of analysis is the municipality, and the running variable is the margin of victory. The outcome of interest is the educational attainment of women who were (potentially) in high school during 1994-2000, calculated as the percentage of the cohort of women aged 15 to 20 in 2000 who had completed high school by 2000.
We will be using the following variables:
Variable | Description |
---|---|
margin | The margin of victory of the Islamic party in the 1994 election. If the margin is positive, an Islamic party won |
school_men | secondary school completion rate for men aged between 15 and 20 |
school_women | the secondary school completion rate for women aged 15-20 |
log_pop | log of the municipality population in 1994 |
sex_ratio | gender ratio of the municipality in 1994 |
log_area | log of the municipality area in 1994 |
Now let’s load the data. There are two ways to do this:
You can load the islamic_women.csv dataset from your laptop using the read.csv()
function. You can call this data set educ
# Set your working directory
#setwd("~/Desktop/Causal Inference/2022/Lab7")
#
library(readr)
#educ <- read.csv("~/islamic_women.csv")
head(educ)
# A tibble: 6 × 6
margin school_men school_women log_pop sex_ratio log_area
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 -0.356 0.379 0.230 8.00 0.977 11.2
2 -0.543 0.309 0.254 7.36 1.12 10.6
3 -0.204 0.226 0.227 13.2 0.975 NA
4 -0.450 0.171 0.159 7.62 1.03 10.9
5 -0.201 0.173 0.182 7.65 1.18 11.5
6 -0.506 0.266 0.25 7.31 1.02 10.3
Let’s start by visualising the data. We will use the plot()
function to do this. This is the simplest scatter plot we can produce.
Exercise 1: Generate a plot using the plot(X,Y)
function. Replace X with educ$margin
and Y with educ$school_women
.
Reveal Answer
plot(educ$margin, educ$school_women)
This is a very simple plot that shows the raw relationship between the margin of victory and the outcome variable. However, it conveys some important information. For example, most values of the margin of victory fall roughly between -0.5 and 0.3. Also, the outcome variable, school attainment, mostly ranges from 0 to 40%. Now let’s generate a slightly fancier plot.
Exercise 2: Generate a scatter plot using ggplot()
function. Use the functions below to add some additional features into this plot.
ggplot(aes(x = running_variable, y = outcome, colour = outcome), data = data) +
# Make points small and semi-transparent since there are lots of them
geom_point(size = 0.5, alpha = 0.5, position = position_jitter(width = 0, height = 0.25, seed = 1234)) +
# Add vertical line
geom_vline(xintercept = 0) +
# Add labels
labs(x = "Label X", y = "Label Y") +
# Turn off the color legend, since it's redundant
guides(color = FALSE)
Reveal Answer
# Let's check if this is a sharp RD.
ggplot(educ, aes(x = margin, y = school_women, colour =school_women)) +
# Make points small and semi-transparent since there are lots of them
geom_point(size = 0.5, alpha = 0.5,
position = position_jitter(width = 0, height = 0.25, seed = 1234)) +
# Add vertical line
geom_vline(xintercept = 0) +
# Add labels
labs(x = "Vote share", y = "Women's Educational Attainment (Proportion)") +
# Turn off the color legend, since it's redundant
guides(color = FALSE)
We can now see more clearly that most municipalities elected a secular mayor. Remember that we are using the margin of victory of the Islamic parties: if the margin is positive, an Islamic party won the election in that municipality.
Exercise 3: Let’s generate a dummy variable that is equal to “Treated” if the margin of victory is equal or greater than zero and “Untreated” otherwise. You can use the mutate()
function and the ifelse()
functions to do this. Remember to use the pipe operator to store the variable in your existing data frame. See the syntax below for more information
Function/argument | Description |
---|---|
data <- data %>% | Pipe operator to assign the new operation to a new or existing data frame |
mutate(new_variable = ifelse(variable >= condition, "Treated", "Untreated")) | If the condition is met, the new variable takes the value “Treated”, and “Untreated” otherwise |
data <- data %>%
mutate(newvariable = ifelse(variable >= 0 , "Treated", "Untreated"))
Reveal Answer
educ <- educ %>%
mutate(treat = ifelse(margin >= 0 , "Treated", "Untreated"))
Now that we have created our treatment condition variable, let’s generate an additional plot that conveys information about the distribution of the running variable for both treated and untreated municipalities.
Exercise 4: Generate a plot looking at the distribution margin of victory variable. Using the ggplot()
function let’s set the x
argument equal to the margin of victory variable. Set fill
equal the new treatment variable treat
. There is no need to add the y
argument given that we expect to generate a histogram that will give us the number of observations for each value in the margin of victory variable. Add the geom_histogram()
function. Set the argument binwidth
in this function equal to 0.01 and set the colour
argument equal to “white”. Let’s add the geom_vline()
function and add a vertical line by setting the xintercept
argument equal to zero. Add the labs()
function and set x
label equal to “Margin of victory” and the y
label equal to “count”, and the fill
argument equal to “Treatment Status”.
Function/argument | Description |
---|---|
ggplot(aes(x = variable, fill = condition), data = data) | Map data components into the graph |
geom_histogram(binwidth = 0.01, color = “white”) | To generate a histogram |
geom_vline(xintercept = 0) | To add a vertical line that goes through zero |
labs(x = “Label X”, y = “Label Y”, fill = “Label Legend”) | To add labels |
ggplot(educ, aes(x = running_variable, fill = treat_status_variable)) +
geom_histogram(binwidth = 0.01, color = "white") +
geom_vline(xintercept = 0) +
labs(x = "Running variable", y = "Count", fill = "Treatment Status")
Reveal Answer
ggplot(educ, aes(x = margin, fill = treat)) +
geom_histogram(binwidth = 0.01, color = "white") +
geom_vline(xintercept = 0) +
labs(x = "Margin of Victory", y = "Count", fill = "Treatment Status")
In this plot, we can see the distribution of the running variable (margin of victory). Again, we can see that in the majority of the municipalities a secular mayor was elected.
Global Parametric Estimation:
As we discussed in the lecture, there are different ways to estimate the causal effect using RD. These approaches differ in which observations they use and in how they estimate the average outcome for units just above and just below the cut-off. One way to estimate the effect of the intervention in RD is using OLS, with the running variable as the main predictor. In this case, we use the running variable measured as the distance from the cut-off. Stated formally: \(\tilde{X} = X - c\).
In this case, the regression below the cut-off (to the left) would be equal to:
\[Y= \alpha_l + \beta_l \tilde{X} + \epsilon\] Whereas the regression above the cut-off is equal to:
\[Y= \alpha_r + \beta_r \tilde{X} + \epsilon\] It’s important to point out that for all estimations the treatment effect is equal to the difference between the intercepts of the regressions above and below the cut-off.
\[\tau = \alpha_r - \alpha_l\]
In this case, we need to subset the data into observations above and below the threshold. Then, regress the outcome on the running variable in each subset. Finally, subtract the intercepts from each regression. Let’s do that in the following exercise.
Exercise 5: Run a regression using only margin
variable as the predictor. Set your outcome variable equal to school_women
variable. Set the data
argument equal to educ
(unless you called your data frame differently). Add the subset
function inside of the lm()
function and set it equal to margin >= 0
for the regression above the cut-off and margin < 0 for the regression below the cut-off
. Use the summary()
function to report your results, but also store the output this function into an object. You can retrieve the intercept from the object that you stored the output from the summary()
function this way:
object <- summary(lm(outcome ~ variable, data = data, subset = variable >= condition))
# intercept
object$coefficient[1]
Reveal Answer
# above
above <- summary(lm(school_women~margin, data = educ, subset = margin >= 0))
above$coefficients[1]
[1] 0.1564529
# below
below <- summary(lm(school_women~margin, data = educ, subset = margin < 0))
below$coefficients[1]
[1] 0.1620016
# 0.156453 - 0.162002
tau_rd = above$coefficients[1] - below$coefficients[1]
tau_rd
[1] -0.00554867
Based on this approach, the intercepts of the two regressions yield the estimated average outcome at the cut-off point for the treated and untreated units. The difference between the intercepts is the causal effect: whenever an Islamic party is in power (at the municipal level), there is a 0.55 percentage-point decrease in women’s educational attainment.
However, a more direct way of estimating the treatment effect is to run a pooled regression on both sides of the cut-off point, using the following specification: \(Y = \alpha + \tau D + \beta \tilde{X} + \epsilon\)
Where \(\tau\) is the coefficient of interest. Here again the LATE is the difference between the two intercepts: \(\tau = \alpha_r -\alpha_l\). When \(D\) switches off, and we control for the values of the forcing variable \(\tilde{X}\), we recover the regression line below the threshold. Conversely, for units above the cut-off, \(D\) switches on, and we recover the regression line above the cut-off. The pooled regression effectively reweighs the information from both sides to produce a single \(\tau\) coefficient.
However, this constrains the slopes of the regression lines to be the same on both sides of the cut-off (\(\beta_l = \beta_r\)). This is not fully consistent with the idea of RD designs, since estimates should be calculated by approximating the cut-off from each side separately.
Exercise 6: Calculate the effect of the intervention using a regression model including the margin
and treat
variables. Use thelm()
function to conduct this analysis. Store this regression into an object and call it global2
. Use the summary()
to inspect your results. Interpret the coefficient of interest.
Reveal Answer
# a more direct way is to run a pooled regression on both sides of the cut-off (constraining the slopes)
global2 <- lm(school_women~treat+margin, data = educ)
summary(global2)
Call:
lm(formula = school_women ~ treat + margin, data = educ)
Residuals:
Min 1Q Median 3Q Max
-0.17051 -0.07650 -0.00691 0.06527 0.51306
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.142358 0.005494 25.912 <2e-16 ***
treatUntreated 0.017347 0.007467 2.323 0.0202 *
margin -0.019321 0.010967 -1.762 0.0782 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09543 on 2627 degrees of freedom
(574 observations deleted due to missingness)
Multiple R-squared: 0.008805, Adjusted R-squared: 0.00805
F-statistic: 11.67 on 2 and 2627 DF, p-value: 9.016e-06
Note that R takes “Treated” as the baseline category here, so the reported coefficient is for treatUntreated. Since it is positive (0.017), untreated (secular) municipalities have, on average, 1.7 percentage points higher attainment. In other words, using this estimation approach, Islamic party control of a municipality is associated with a 1.7 percentage-point decrease in women’s educational attainment.
We can also allow the regression function to differ on both sides of the cut-off by including interaction terms between \(D\) and \(\tilde{X}\). This would be as follows:
\[Y = \alpha_l + \tau D + \beta_0\tilde{X} + \beta_1 (D \times\tilde{X})+ \epsilon\]
Let’s do that in the following exercise.
Exercise 7: Calculate the effect of the intervention using a regression model including the margin
and treat
variables and the interaction between these two variables. Use thelm()
function to conduct this analysis Store this regression into an object and call it global3
. Use the summary()
to inspect your results. Interpret the coefficient of interest.
Reveal Answer
# a more direct way is to run a pooled regression on both sides of the cut-off (constraining the slopes)
global3 <- lm(school_women~treat+margin + treat*margin, data = educ)
summary(global3)
Call:
lm(formula = school_women ~ treat + margin + treat * margin,
data = educ)
Residuals:
Min 1Q Median 3Q Max
-0.16896 -0.07655 -0.00723 0.06561 0.51348
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.156453 0.007439 21.030 < 2e-16 ***
treatUntreated 0.005549 0.008561 0.648 0.51697
margin -0.156501 0.050110 -3.123 0.00181 **
treatUntreated:margin 0.144062 0.051352 2.805 0.00506 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09531 on 2626 degrees of freedom
(574 observations deleted due to missingness)
Multiple R-squared: 0.01177, Adjusted R-squared: 0.01064
F-statistic: 10.42 on 3 and 2626 DF, p-value: 8.144e-07
Here, the coefficient on treatUntreated is again positive (0.0055), matching the difference in intercepts from Exercise 5. This means that, on average, Islamic party control of the municipal government is associated with a small (0.55 percentage-point) decrease in women’s educational attainment, although the coefficient is not statistically significant.
The main limitation of these estimations is that we are relying on observations that are far away from the cut-off, and the evidence against using a global polynomial approach is quite substantial. According to Cattaneo et al. (2019), this estimation technique does not provide accurate point estimators or inference procedures with good statistical properties.
Exercise 7 (no coding required): Think about how the global polynomial approach weights each observation when calculating the coefficient of interest.
Reveal Answer
OLS will estimate \(\tau\) using all observations across the range of the score. This means that an observation very far from the cut-off receives the same weight as one very close to it. In the worst-case scenario, if observations are clustered far from the cut-off, the estimate of \(\tau\) will be heavily influenced by those values rather than by observations near the threshold.
We can also use a high-order polynomial to retrieve the LATE. However, the evidence against using high-order polynomials is quite robust: they lead to noisy estimates, they are sensitive to the degree of the polynomial, and they produce confidence intervals with poor coverage.
Exercise 8: Conduct third-order polynomial regression function. Include in this model the treat
and the margin
variables. Also, add the margin
variable raised at power 2 and then at the power of 3. We also need to include the I()
function or insulate function for the margin
variable that is raised at the power of 2 and 3. The I()
function insulates whatever is inside this function. It creates a new predictor that is the product of the margin variable by itself two and three times. Store this output in an object and call that object global4
. Then, use the summary()
to check the results of this specification. Interpret the results.
Reveal Answer
global4 <- lm(school_women~ treat + margin+I(margin^2)+ I(margin^3) , data = educ)
summary(global4)
Call:
lm(formula = school_women ~ treat + margin + I(margin^2) + I(margin^3),
data = educ)
Residuals:
Min 1Q Median 3Q Max
-0.17226 -0.07581 -0.00592 0.06335 0.50378
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.160107 0.006060 26.418 < 2e-16 ***
treatUntreated -0.021484 0.008893 -2.416 0.01577 *
margin -0.179024 0.023219 -7.710 1.77e-14 ***
I(margin^2) -0.134706 0.037469 -3.595 0.00033 ***
I(margin^3) 0.190585 0.046510 4.098 4.30e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09434 on 2625 degrees of freedom
(574 observations deleted due to missingness)
Multiple R-squared: 0.03201, Adjusted R-squared: 0.03054
F-statistic: 21.7 on 4 and 2625 DF, p-value: < 2.2e-16
Using a high-order polynomial function, the coefficient on treatUntreated is now negative (-0.021): untreated municipalities have lower attainment, so this specification implies that women’s educational attainment increases, on average, by roughly 2.1 percentage points when an Islamic party is in power.
Finally for this section, we can also use only observations that are close to the cut-off (above and below). Let’s run the unconstrained model from Exercise 7, but using only observations within 0.5 points of the threshold.
Exercise 9: Run the same model used in Exercise 7
, but subset your data taking only observations that are above and below 0.5 points from the threshold. Store the results from this regression into an object and call it local
. Use the summary()
to check your results. If you don’t remember how to subset data in the lm()
function. See the syntax below
lm(outcome ~ variable1 + variable2 + variable1 * variable2, data=data,
subset=(running_variable>=-0.5 & running_variable<=0.5))
Reveal Answer
local <- lm(school_women ~ margin + treat + treat * margin, data=educ,
subset=(margin>=-0.5 & margin<=0.5))
summary(local)
Call:
lm(formula = school_women ~ margin + treat + treat * margin,
data = educ, subset = (margin >= -0.5 & margin <= 0.5))
Residuals:
Min 1Q Median 3Q Max
-0.18414 -0.07517 -0.00407 0.06421 0.50304
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.159465 0.007898 20.190 < 2e-16 ***
margin -0.190408 0.060455 -3.150 0.00166 **
treatUntreated -0.010577 0.009179 -1.152 0.24930
margin:treatUntreated 0.118165 0.062268 1.898 0.05787 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.09352 on 2174 degrees of freedom
(574 observations deleted due to missingness)
Multiple R-squared: 0.02567, Adjusted R-squared: 0.02432
F-statistic: 19.09 on 3 and 2174 DF, p-value: 3.224e-12
We can see that, using this local specification, the coefficient on treatUntreated is negative (-0.011). This suggests that being under the control of an Islamic party leads, on average, to an increase of roughly 1 percentage point in women’s educational attainment, although the estimate is not significant at conventional levels.
It is important to stress that modern RD empirical work employs local polynomial methods. In this case, we estimate the average outcomes for treated and untreated units using observations near the cut-off. This approach tends to be more robust and less sensitive to boundary and overfitting problems. In local polynomial point estimation, we still use linear regression, but within a specific bandwidth around the threshold. In the following section, we will look at how to implement this approach using a non-parametric estimation strategy.
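Before moving on, it can also help to visualise the discontinuity with binned means. The rdrobust package (used below) ships a companion rdplot() function; a sketch, assuming the educ data frame loaded earlier:

```r
library(rdrobust)

# Binned scatter plot of the outcome against the running variable,
# with a polynomial fit on each side of the cut-off (rdplot's default)
rdplot(educ$school_women, educ$margin, c = 0,
       x.label = "Margin of victory",
       y.label = "Women's educational attainment")
```

The jump (or lack thereof) at the cut-off in this plot previews what the formal estimates below will tell us.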
Non-Parametric Estimation
Let’s now estimate the LATE using a non-parametric estimator. Conveniently, we can easily do so by using the rdrobust
package. As the name indicates, the package allows us to estimate robust measurements of uncertainty such as standard errors and confidence intervals. It is based on theoretical and technical work by Calonico, Cattaneo and Titiunik. rdrobust
estimates robust bias-corrected confidence intervals that address the undersmoothing problem that conventional confidence intervals face in RDDs. In other words, a small bias would be required for conventional intervals to be valid, which might not be the case. Moreover, robust intervals also address the poor performance of (non-robust) bias-corrected confidence intervals.
As suggested by the authors and somewhat counter-intuitively, we therefore use the point estimate provided by the conventional estimator, but robust standard errors, confidence intervals and p-values to make statistical inferences.
The rdrobust
command has the following minimal syntax. You can use the same bandwidth on both sides or specify two different ones. We will work with further arguments later. Note that you do not need to centre the running variable on the cut-off; you simply pass the cut-off value via the c argument.
robust_model = rdrobust(data$dependent_var, data$running_var,
                        c=[cutoff], h=[bandwidth])
Exercise 10: Estimate the LATE using rdrobust
with a bandwidth of 5% on either side of the cutoff. Interpret your result.
Reveal Answer
robust_5=rdrobust(educ$school_women, educ$margin, c=0, h=0.05)
summary(robust_5)
Call: rdrobust
Number of Obs. 2630
BW type Manual
Kernel Triangular
VCE method NN
Number of Obs. 2315 315
Eff. Number of Obs. 148 109
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 0.050 0.050
BW bias (b) 0.050 0.050
rho (h/b) 1.000 1.000
Unique Obs. 2313 315
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.023 0.027 0.848 0.396 [-0.030 , 0.075]
Robust - - 0.774 0.439 [-0.049 , 0.112]
=============================================================================
Our point estimate is 0.023. That is, a victory of an Islamic party would be associated with an increase of 2.3 percentage points in the rate of women who complete school. However, the p-value and confidence intervals indicate that the estimate is not statistically significant. Therefore, based on this model we would conclude that winning an election does not have an effect on women’s schooling.
A bandwidth of 5% seems reasonable, but we should check different ones, too. Let’s see what happens if we halve the bandwidth.
Exercise 11: Estimate the same model as before with a bandwidth of 2.5%. Report and interpret your results.
Reveal Answer
robust_25=rdrobust(educ$school_women, educ$margin, c=0, h=0.025)
summary(robust_25)
Call: rdrobust
Number of Obs. 2630
BW type Manual
Kernel Triangular
VCE method NN
Number of Obs. 2315 315
Eff. Number of Obs. 68 62
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 0.025 0.025
BW bias (b) 0.025 0.025
rho (h/b) 1.000 1.000
Unique Obs. 2313 315
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.027 0.040 0.691 0.490 [-0.050 , 0.105]
Robust - - -0.333 0.739 [-0.145 , 0.103]
=============================================================================
Well, we can see that the number of observations used to estimate the effect has been reduced, which is to be expected but worth keeping in mind. The point estimate did not change much and, as before, we find no significant effect at the cut-off.
Bandwidths of 5% or 2.5% around the cutoff seem somewhat reasonable in this case - but so would several others. How can we know what bandwidth we should use to estimate our effect?
Recall the trade-off we face when choosing bandwidths, as discussed in the lecture. On the one hand, narrower bandwidths are associated with less biased estimates: the closer we get to the cut-off, the more comparable the units, with their position in the running variable being as-if random. On the other hand, the wider the bandwidth, the smaller the variance. As in several cases before, the structure of our data, such as the number of observations, plays an important role. Even if small bandwidths are desirable, it can be hard (or even impossible) to estimate a robust and significant effect if the number of observations around the cut-off is very small, even if there is a true effect.
Luckily, the rdrobust
package provides a remedy for this. The package allows us to request bandwidths that are optimal given the data input. The rdrobust
command then picks the bandwidth that minimises the mean squared error - in other words, the MSE-optimal bandwidth. Note that this is the default bandwidth if you don’t specify any. Let’s try to find out what this would be in our case.
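If you want to inspect the bandwidth-selection step on its own, the rdrobust package also provides the rdbwselect() function; a sketch, assuming the educ data frame from above:

```r
library(rdrobust)

# Report the data-driven bandwidth selectors side by side
# (all = TRUE lists every available selector, including mserd)
bws <- rdbwselect(educ$school_women, educ$margin, c = 0, all = TRUE)
summary(bws)
```

This lets you compare the MSE-optimal bandwidth against alternative selectors before committing to an estimate.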
Exercise 12: Estimate the LATE using rdrobust
and MSE-optimal bandwidths. To specify the model, replace the h
argument with bwselect="mserd"
.
Reveal Answer
robust_mserd=rdrobust(educ$school_women, educ$margin, c=0, bwselect="mserd")
summary(robust_mserd)
Call: rdrobust
Number of Obs. 2630
BW type mserd
Kernel Triangular
VCE method NN
Number of Obs. 2315 315
Eff. Number of Obs. 529 266
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 0.172 0.172
BW bias (b) 0.286 0.286
rho (h/b) 0.603 0.603
Unique Obs. 2313 315
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.030 0.014 2.116 0.034 [0.002 , 0.058]
Robust - - 1.776 0.076 [-0.003 , 0.063]
=============================================================================
This now looks very different. We estimate a LATE of about 3 percentage points, which is significant at the 90% level. Note that the optimal bandwidth has been estimated to be 17.2 percentage points on either side of the cut-off, with a separate optimal bandwidth for the bias correction. Note also that the MSE-optimal bandwidth is optimal in statistical terms; we should always assess the bandwidth against our theory as well. Here, we’d be comparing parties’ winning margins of up to 17 percentage points.
Note that, so far, we have used a single bandwidth for data below and above the cut-off. We can also specify different ones - both manually and in terms of optimal bandwidths. As the structure of our data might differ, different bandwidths might be optimal. We can specify two different MSE-optimal bandwidth selectors by specifying bwselect="msetwo"
.
Exercise 13: Estimate the LATE using rdrobust
and two MSE-optimal bandwidths. Interpret your results and compare them to the model with a single optimal bandwidth.
Reveal Answer
robust_msetwo=rdrobust(educ$school_women, educ$margin, c=0, bwselect="msetwo")
summary(robust_msetwo)
Call: rdrobust
Number of Obs. 2630
BW type msetwo
Kernel Triangular
VCE method NN
Number of Obs. 2315 315
Eff. Number of Obs. 607 267
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 0.200 0.174
BW bias (b) 0.323 0.297
rho (h/b) 0.619 0.584
Unique Obs. 2313 315
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.030 0.014 2.134 0.033 [0.002 , 0.057]
Robust - - 1.810 0.070 [-0.002 , 0.062]
=============================================================================
This now looks pretty similar to the single-optimal-bandwidth model, which is a good sign. In fact, the optimal bandwidth for data points above the cut-off remains virtually unchanged, while for those below the cut-off the bandwidth is slightly wider. The point estimate and measures of uncertainty also remain virtually unchanged. If specifying two optimal bandwidths altered the results substantially, that would be an indicator that the data should be inspected closely for the cause of the diverging bandwidths.
Kernels
One of the advantages of the non-parametric RDD is that observations can be weighted based on their proximity to the cut-off. This is based on the idea that observations closer to the cut-off provide better comparisons than those further away. Think of victory margins as a running variable: if we look at fractions of percentage points around the cut-off, it is fair to say that victory is as-if random. The farther away we move, the less plausible this statement becomes.
As you learned in the lecture, there are different ways to do so. The default kernel used by the rdrobust
package is a triangular kernel, which continuously assigns higher weight to observations closer to the cut-off. This is the kernel we have been using so far, as we didn’t explicitly specify one. As the figure below shows, other possible options include the Epanechnikov and uniform kernels. These can be specified via the kernel
argument.
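To make these weighting schemes concrete, here is a small base-R sketch of the three kernels as functions of the normalised distance \(u = |x - c|/h\) (the function names are ours):

```r
# Kernel weights as a function of normalised distance u = |x - c| / h;
# observations outside the bandwidth (|u| > 1) always get weight zero
uniform      <- function(u) 0.5 * (abs(u) <= 1)
triangular   <- function(u) (1 - abs(u)) * (abs(u) <= 1)
epanechnikov <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

u <- c(0, 0.5, 1)  # at the cut-off, halfway out, and at the edge
triangular(u)      # 1.00 0.50 0.00 -- weight falls linearly with distance
uniform(u)         # 0.50 0.50 0.50 -- equal weight everywhere inside h
epanechnikov(u)    # 0.7500 0.5625 0.0000
```

The triangular and Epanechnikov kernels both down-weight distant observations; the uniform kernel treats every observation inside the bandwidth equally, which is why it is less conservative.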
Exercise 14: Estimate the LATE using rdrobust
and a single MSE-optimal bandwidth, but specifying a uniform kernel. Interpret your results.
Reveal Answer
robust_uniform=rdrobust(educ$school_women, educ$margin, c=0, bwselect="mserd", kernel="uniform")
summary(robust_uniform)
Call: rdrobust
Number of Obs. 2630
BW type mserd
Kernel Uniform
VCE method NN
Number of Obs. 2315 315
Eff. Number of Obs. 474 251
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 0.154 0.154
BW bias (b) 0.283 0.283
rho (h/b) 0.546 0.546
Unique Obs. 2313 315
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.032 0.014 2.360 0.018 [0.005 , 0.059]
Robust - - 2.044 0.041 [0.001 , 0.063]
=============================================================================
Compared to the triangular kernel, our point estimate is slightly larger, and our robust measures of uncertainty show that the effect is significant at the 95% level. Note that the optimal bandwidth is slightly narrower in the model with a uniform kernel. Be aware, though, that this model is less conservative: it assigns the same weight to each observation regardless of its distance to the cut-off, which introduces growing bias as the bandwidth increases.
Falsification Checks
Sensitivity
If you apply an RDD, you should always make sure that results are robust to different specifications of the model. This involves sensitivity checks: verifying that results hold across model specifications and, in particular, across different bandwidths.
While there are packages that help produce useful plots that visualise robustness across bandwidths, they can be cumbersome to adjust and are not very flexible. Hence, one avenue to create such a plot can be to loop rdrobust
models over different bandwidths. This solution involves somewhat more coding, but gives you a lot of flexibility as to what you want to display and highlight.
Exercise 15: Create a plot that shows estimates and confidence intervals for estimates of the effect across different bandwidths.
There are several ways to do this. Feel free to play around a bit and try to come up with your own way.
Reveal Hint
Hint: You could use the following approach:
- Create a data frame with all variables you need for the plot and an observation for each bandwidth.
- Extract the values from the rdrobust
output which you need for your plot.
- Loop the regression and the extraction of output over the bandwidths indicated in the initial data frame.
- Save the output in your loop to the respective row in the data frame.
- Plot the output from the newly created data frame.
Reveal Answer
bandwidth <- seq(from = 0.05, to = 1, by = 0.05) #create a vector with values for each bandwidth you want to estimate
coefficient<- NA
se <- NA
obs <- NA
bw <- NA
ci_u <- NA
ci_l <- NA
data_extract <- data.frame(bandwidth, coefficient, se, obs, bw, ci_u, ci_l) # create a data.frame with all variables you want to include in your dataset
# create a loop for each bandwidth that is indicated by 'i'
for(i in bandwidth){
rdbw <- rdrobust(educ$school_women, educ$margin, c=0, h=i) # run the model
# extract the model output (make sure to extract *robust* statistics)
data_extract$coefficient[data_extract$bandwidth==i] <- rdbw$coef[3]
data_extract$se[data_extract$bandwidth==i] <- rdbw$se[3]
data_extract$obs[data_extract$bandwidth==i] <- (rdbw$N_h[1] + rdbw$N_h[2])
data_extract$bw[data_extract$bandwidth==i] <- (rdbw$bws[1, 1])
data_extract$ci_l[data_extract$bandwidth==i] <- rdbw$ci[3,1]
data_extract$ci_u[data_extract$bandwidth==i] <- rdbw$ci[3,2]
}
# Make sure the coefficient (and all other values) are numeric
data_extract$coefficient <- as.numeric(data_extract$coefficient)
# Plot the estimates across bandwidths
ggplot(data = data_extract,
aes(x = bandwidth, y = coefficient)) +
geom_point(size = 0.8) +
geom_ribbon(aes(ymin = ci_l, ymax = ci_u), alpha = 0.2) +
geom_hline(aes(yintercept = 0), col = "red", linetype = 2) +
coord_cartesian(ylim = c(-.05, 0.15)) +
theme_minimal() +
labs(y = "LATE at Discontinuity", x = "Bandwidths (Vote Margin)")
Sorting
The key identification assumption of RDDs is that there is no sorting on the running variable. That is, the running variable must be continuous around the threshold. If this is not the case, we have a problem: there is then a good chance that the observations close to the threshold (on either side of it) are not assigned as-if randomly. In other words, the presence of sorting is usually interpreted as empirical evidence of self-selection or non-random sorting of units into control and treatment status. Let's check if sorting is a problem here. We can do this using density checks, i.e. by looking at the density of the running variable.
There are several ways to check for sorting. Let's start with McCrary's density test. We can use the DCdensity
command to conduct both a visual test and statistical test. The syntax is the following:
DCdensity(running_var, cutpoint = [cutoff], plot = TRUE, ext.out = TRUE)
title(xlab="X Lab",ylab="Y Lab")
abline(v=0, lty=1) # Adding a vertical line at the cutoff
Exercise 16: Examine the density of the running variable using the DCdensity
command.
Reveal Answer
DCdensity(educ$margin, cutpoint = 0, plot = TRUE, ext.out = TRUE)
$theta
[1] -0.09537487
$se
[1] 0.1466241
$z
[1] -0.6504719
$p
[1] 0.5153874
$binsize
[1] 0.008611425
$bw
[1] 0.1654581
$cutpoint
[1] 0
$data
         cellmp    cellval
1  -1.003231050 0.04365594
2  -0.994619625 0.04365594
3  -0.986008199 0.00000000
4  -0.977396774 0.00000000
5  -0.968785349 0.04365594
...    [rows 6-232 omitted: one row per bin across the support of the running variable]
233 0.994619625 0.04365594
title(xlab="Winning Margin",ylab="Density") # add labels (always add labels. Really. Always.)
abline(v=0, lty=1)
The test provides relatively clear visual evidence that sorting is not a problem in this case. There is a clear overlap of the confidence intervals at the cut-off, so the continuity assumption holds. The command also provides statistical tests, along with information on the individual cells - or bins. For now, we focus on the inference test for sorting. The p-value of about 0.52
makes clear that we do not reject the null hypothesis of continuity.
Let’s further explore the structure of our data and check if sorting really isn’t a problem here. Recall, we want to make sure the density of units should be continuous near the cut-off. We can use the rddensity
package to do so. This package applies a different test than the one we used before. It is also somewhat more powerful and provides additional information.
The command is straightforward. You only have to insert the running variable as argument and, if different from 0, the cutoff. Note that you can optionally specify bandwidths, as above, using the h
argument. Otherwise, an optimal bandwidth will be specified automatically. You can then save this as an object and run the summary
command to see the output.
Exercise 17: Examine the density of the running variable using the rddensity
command. Interpret your findings. Are they equivalent to those in the previous exercise?
Reveal Answer
rdd <- rddensity(educ$margin, c=0)
summary(rdd)
Manipulation testing using local polynomial density estimation.
Number of obs = 2660
Model = unrestricted
Kernel = triangular
BW method = estimated
VCE method = jackknife
c = 0 Left of c Right of c
Number of obs 2332 328
Eff. Number of obs 769 314
Order est. (p) 2 2
Order bias (q) 3 3
BW est. (h) 0.25 0.282
Method T P > |T|
Robust -0.9203 0.3574
P-values of binomial tests (H0: p=0.5).
Window Length / 2 <c >=c P>|T|
0.004 11 9 0.8238
0.009 18 26 0.2912
0.013 32 34 0.9022
0.017 42 48 0.5984
0.021 52 58 0.6338
0.026 68 63 0.7269
0.030 76 73 0.8699
0.034 97 82 0.2954
0.038 116 89 0.0691
0.043 134 93 0.0078
This output looks clearer. Note that the test is based on quadratic local polynomial functions. We also get a different bandwidth - recall that we are solely looking at the density of the running variable here.
Importantly, the test reveals a p-value of 0.357
. We again do not reject the null hypothesis of continuity around the cut-off. Both tests indicating the same result gives us confidence in doing so.
The second part of the output shows the results of binomial tests for the allocation of observations around the cut-off for different window lengths. The number of observations on each side is indicated, as well as individual p-values, with the null hypothesis being an equal distribution on either side of the cut-off. Note that this can give us an indication, but modest imbalance is not evidence of sorting.
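As a quick sanity check, these binomial p-values can be reproduced with base R's binom.test(), using the counts from the table above (this assumes rddensity runs a standard two-sided binomial test with p = 0.5, which is what its output suggests):

```r
# Counts from the widest window (0.043): 134 observations below the cut-off,
# 93 at or above it. H0: an observation falls on either side with probability 0.5.
bt <- binom.test(x = 134, n = 134 + 93, p = 0.5)
bt$p.value  # should be close to the 0.0078 reported in the last row above
```

This makes explicit what the test is doing: within a narrow window, treatment should be as good as a coin flip, so a strongly lopsided count is suspicious.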
Let’s now plot this density test, too. The rdplotdensity
function allows us to do so easily. It uses the results of the test conducted in rddensity
. The syntax is simple:
rdplotdensity([rdd_object], [running_var])
We can also add some useful arguments. Feel free to adjust your plot:
Argument | Description |
---|---|
plotRange | Indicates starting and end point of plot |
plotN | Number of grid points used on either side of the cutoff |
CIuniform | True or False. If true, CI are displayed as continuous bands |
Exercise 18: Use the rdplotdensity
function to conduct a visual density check.
Reveal Answer
plot_rdd <- rdplotdensity(rdd,educ$margin, plotRange = c(-.2, 0.2), CIuniform = TRUE)
Again, we find visual evidence for continuity around the cutoff. Now we can be confident about sorting not being a problem with this data set.
Balance
One important falsification test is examining whether, near the cut-off, treated units are similar to control units in terms of observable characteristics. The logic of this test is that if units cannot manipulate the score they receive, there should be no systematic differences between untreated and treated units with similar values of the score.
Exercise 19: Let’s check for balance around the cutoff. Use the rdrobust()
function. Set the cut-off argument c
equal to 0, and the bwselect
argument equal to “mserd”. In this case, the outcome variables are the covariates that we want to check for balance. Check if there is balance for the following covariates: log_pop
, school_men
, sex_ratio
, log_area
. Remember to use the summary()
.
Reveal Answer
summary(rdrobust(educ$log_pop, educ$margin, c=0, bwselect="mserd"))
Call: rdrobust
Number of Obs. 2660
BW type mserd
Kernel Triangular
VCE method NN
Number of Obs. 2332 328
Eff. Number of Obs. 385 232
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 0.122 0.122
BW bias (b) 0.195 0.195
rho (h/b) 0.627 0.627
Unique Obs. 2330 328
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.013 0.290 0.045 0.964 [-0.555 , 0.581]
Robust - - 0.005 0.996 [-0.676 , 0.680]
=============================================================================
summary(rdrobust(educ$school_men, educ$margin, c=0, bwselect="mserd"))
Call: rdrobust
Number of Obs. 2632
BW type mserd
Kernel Triangular
VCE method NN
Number of Obs. 2317 315
Eff. Number of Obs. 373 222
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 0.121 0.121
BW bias (b) 0.187 0.187
rho (h/b) 0.646 0.646
Unique Obs. 2315 315
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.013 0.014 0.916 0.360 [-0.015 , 0.040]
Robust - - 0.747 0.455 [-0.021 , 0.046]
=============================================================================
summary(rdrobust(educ$sex_ratio, educ$margin, c=0, bwselect="mserd"))
Call: rdrobust
Number of Obs. 2659
BW type mserd
Kernel Triangular
VCE method NN
Number of Obs. 2331 328
Eff. Number of Obs. 333 206
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 0.106 0.106
BW bias (b) 0.191 0.191
rho (h/b) 0.554 0.554
Unique Obs. 2329 328
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.017 0.019 0.920 0.357 [-0.020 , 0.055]
Robust - - 1.006 0.314 [-0.021 , 0.065]
=============================================================================
summary(rdrobust(educ$log_area, educ$margin, c=0, bwselect="mserd"))
Call: rdrobust
Number of Obs. 2642
BW type mserd
Kernel Triangular
VCE method NN
Number of Obs. 2320 322
Eff. Number of Obs. 352 214
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 0.112 0.112
BW bias (b) 0.185 0.185
rho (h/b) 0.607 0.607
Unique Obs. 2318 322
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.029 0.234 0.124 0.901 [-0.431 , 0.489]
Robust - - 0.045 0.964 [-0.534 , 0.559]
=============================================================================
# We advise you to create a loop to conduct this analysis.
We observe that there is balance across all the covariates of interest.
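The loop suggested in the code comment above could look something like this (a sketch assuming the educ data frame is loaded as before; it simply repeats the rdrobust call for each covariate):

```r
# Loop the balance check over the covariates instead of copy-pasting the call.
# Covariate names are the ones used in the exercise.
covariates <- c("log_pop", "school_men", "sex_ratio", "log_area")
for (cov in covariates) {
  cat("\n--- Balance check for:", cov, "---\n")
  summary(rdrobust(educ[[cov]], educ$margin, c = 0, bwselect = "mserd"))
}
```

Looping also makes it harder to accidentally change an argument between covariates, which keeps the balance checks strictly comparable.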
Placebo cut-offs
Another falsification test is to look for treatment effects at artificial, or placebo, cut-off values. Recall from the lecture that one of the identifying assumptions is the continuity (the absence of abrupt changes) of the regression functions for the treated and control units around the cut-off. While we cannot empirically test this assumption at the cut-off itself, we can test continuity at other points. Although continuity elsewhere does not imply that the assumption holds at the cut-off, it helps to rule out discontinuities away from this point.
This test means looking at the outcome variable but using different cut-offs, where we should not expect changes in the outcome. Thus, the estimates from this test should be near zero (and not statistically significant). One important step in conducting this test is splitting the sample into observations above the cut-off and those below. We do this to avoid "contamination" from the real treatment effect; it also ensures that the analysis of each placebo cut-off is conducted using only observations with the same treatment status.
Exercise 20: Conduct placebo cut-off test using the following placebo values: -0.2 and -0.25 for the placebo tests below the cut-off. Conduct the same test above the cut-off using the following placebo values: 0.20, and 0.25. Use the rdrobust()
function to perform this test. Set the argument bwselect
equal to “mserd”. Replace the cut-off argument c
with the values listed before. Use the summary()
to report the results. Remember to create two new data frames. You can use the filter function from tidyverse to do this.
Function/argument | Description |
---|---|
new_data <- data | Creates a new data frame using the conditions set after the pipe operator %>% |
filter(variable == "condition") | Subset the data based on a logical operation |
new_data <- data %>%
filter(variable >= 0)
Reveal Answer
educ_above <- educ %>%
filter(margin >= 0)
educ_below <- educ %>%
filter(margin < 0)
summary(rdrobust(educ_above$school_women, educ_above$margin, c=0.2, bwselect = "mserd")) # -0.044
Call: rdrobust
Number of Obs. 315
BW type mserd
Kernel Triangular
VCE method NN
Number of Obs. 280 35
Eff. Number of Obs. 38 14
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 0.053 0.053
BW bias (b) 0.093 0.093
rho (h/b) 0.577 0.577
Unique Obs. 280 35
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional -0.044 0.034 -1.287 0.198 [-0.112 , 0.023]
Robust - - -1.022 0.307 [-0.122 , 0.038]
=============================================================================
summary(rdrobust(educ_above$school_women, educ_above$margin, c=0.25, bwselect="mserd")) #-0.025
Call: rdrobust
Number of Obs. 315
BW type mserd
Kernel Triangular
VCE method NN
Number of Obs. 294 21
Eff. Number of Obs. 24 10
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 0.066 0.066
BW bias (b) 0.119 0.119
rho (h/b) 0.560 0.560
Unique Obs. 294 21
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional -0.025 0.067 -0.380 0.704 [-0.156 , 0.105]
Robust - - -0.323 0.747 [-0.185 , 0.133]
=============================================================================
summary(rdrobust(educ_below$school_women, educ_below$margin, c=-0.2, bwselect="mserd")) # 0.063
Call: rdrobust
Number of Obs. 2315
BW type mserd
Kernel Triangular
VCE method NN
Number of Obs. 1707 608
Eff. Number of Obs. 105 96
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 0.034 0.034
BW bias (b) 0.064 0.064
rho (h/b) 0.542 0.542
Unique Obs. 1705 608
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional 0.063 0.030 2.081 0.037 [0.004 , 0.123]
Robust - - 2.172 0.030 [0.007 , 0.142]
=============================================================================
summary(rdrobust(educ_below$school_women, educ_below$margin, c=-0.25, bwselect="mserd")) # -0.008
Call: rdrobust
Number of Obs. 2315
BW type mserd
Kernel Triangular
VCE method NN
Number of Obs. 1560 755
Eff. Number of Obs. 430 308
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 0.106 0.106
BW bias (b) 0.160 0.160
rho (h/b) 0.663 0.663
Unique Obs. 1558 755
=============================================================================
Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
=============================================================================
Conventional -0.008 0.015 -0.506 0.613 [-0.037 , 0.022]
Robust - - -0.371 0.711 [-0.041 , 0.028]
=============================================================================
We find that for almost all the placebo cut-off values, the coefficients are nearly zero and not statistically significant (with the exception of -0.2). This evidence suggests that the continuity assumption holds in this case: our outcome of interest does not jump abruptly at artificial cut-offs.