Lab 5: Difference in Difference

Lecture Slides & Data

Recap

The intuition of the DD strategy is to combine two simpler approaches. The first difference (before and after) eliminates unit-specific fixed effects. Then, the second difference eliminates time fixed effects. With this approach, we get an unbiased estimate of the intervention.

We learned that we can break down the difference between treated units and untreated units in post-treatment as the Average Treatment Effect amongst the Treated (ATT), different time trends and selection bias. However, we can impose additional estimation assumptions to retrieve a credible estimate of the effect of treatment. The key assumption in diff-in-diff studies is the so-called Parallel Trends or Common Trends assumption. This assumption states that in the absence of the treatment/policy, we should expect to see that treated units would follow similar trends as the untreated ones. Unfortunately, we cannot test whether this assumption holds, but we can at least conduct some tests that would be indicated that the Parallel Trends holds.

We also learned that we can calculate the ATT in different ways. We learned that we can manually calculate the difference-in-difference estimator.

Group-period interactions: Here the treatment variable is equal to 1 for all the years since the unit received the treatment. Then, our coefficient of interest is captured by the interaction between the treatment and the time variable.

If you go back to the Golden Dawn, we can see how we retrieve the beta coefficient from the interaction.

\[gdper_{mt} = \beta_0 + \beta_1evertr_m + \beta_2post_t + \beta_3evertr_m \times post_t + u_{mt}\]

	Post = 0	Post = 1
Treat = 0	$\beta_0 + u_{mt}$	$\beta_0 + \beta_2 + u_{mt}$
Treat = 1	$\beta_0 +\beta_1 + u_{mt}$	$\beta_0 + \beta_1 + \beta_2 + \beta_3 + u_{mt}$

Then, we can get our estimate by the calculating the difference of the outcome variable for both treated and untreated units and, and then subtract these differences:

\[((\beta_0 + \beta_1 + \beta_2 + \beta_3 + u_{mt}) - (\beta_0 + \beta_1 + u_mt)) -((\beta_0 + \beta_2 + u_{mt})-(\beta_0 + u_{mt}))\] \[(\beta_2 + \beta_3) -(\beta_2)\] \[\beta_3\] - Unit and time dummies and a treatment indicator: Again, in the case of the Golden Dawn example, we can represent this estimation strategy using the following regression model:

\[gdper_{mt} = \beta_1treatment_{mt} + \alpha_m \text{unit dummy} + \gamma_t \text{time dummy} + u_{mt}\] For treated before treated $Treatment = 0$:

\[gdper_{mt} = \beta_1\times 0 + \alpha_m \times 1 + \gamma_t \times + u_{mt}\] \[gdper_{mt} = \alpha_m + \gamma_t + u_{mt}\] For treated after treated $Treatment = 1$:

\[gdper_{mt} = \beta_1 + \alpha_m + \gamma_t + u_{mt}\] Then, we can take the difference before and after:

\[(\beta_1 + \alpha_m + \gamma_t + u_{mt}) - (\alpha_m + \gamma_t + u_{mt})\] \[\beta_1\] Again we are levering the parallel trends assumption by assuming that time-trends are the same for both treated and untreated units.

Finally, we discussed inference and in particular standard errors. Given that we have repeated observations, this type of data exhibits serially correlated regressors and residuals, we need to make the appropriate adjustments to calculate standard errors. One way is to address this is to use clustered standard errors.

Before starting this seminar

Create a folder called “lab5”
Download the data (you can use this button or the one at the top, or read csv files directly from github):
Open an R script (or Markdown file) and save it in our “lab5” folder.
Set your working directory using the setwd() function or by clicking on “More“. For example setwd(“~/Desktop/Causal Inference/2022/Lab5”)
Let’s install an load the packages that we will be using in this lab:

library(jtools) # generate plots with regression coefficients
library(stargazer) # generate formated regression tables 
library(texreg) # generate formatted regression tables
library(tidyverse) # to conduct some tidy operations
library(ggplot2) # to generate plots 
library(plm) # conduct one-way and two-way fixed effects 
library(estimatr) #  to conduct ols and provides robust standard errors
library(lmtest) # calculates the variance-covariance matrix to be clustered by group.
library(sandwich) # to calculate heteroscedasticity-Consistent Covariance Matrix Estimation
library(haven) # upload dta files 
#install.packages("wesanderson")
library("wesanderson") # Wes Anderson palette (Let's make our plots as Indie as possible)
#install.packages("modelsummary")
library(modelsummary)

Seminar Overview

In this seminar, we will cover the following topics:
1. “Manually” calculate the difference-in-difference estimator"
2. Obtaining the difference in difference estimator using the lm() function
3. Check for parallel trends
4. Conduct a fixed effect in difference in difference estimation using both lm_robust() and plm() function.
4. Conduct a placebo test

Waking Up the Golden Dawn: Does Exposure to the Refugee Crisis Increase Support for Extreme-Right Parties? (Dinas et al, 2018)

The main question of this paper is the following: Did the influx of refugees in Greece increase support for the right-wing Golden Dawn party in 2015?.

The authors exploit that the Aegean islands close to the Turkish border experienced sudden and drastic increases in the number of Syrian refugees while other islands slightly farther away—but with otherwise similar institutional and socio-economic characteristics—did not.

We can see here in the Figure in the left that level of exposure to refugees across the Aegean islands. The figure on the right shows how sudden the influx of refugees was.

Timing

The refugee crisis started in the spring of 2015
Greece held an election September, 2015, right after the first wave of refugee arrival, where in the data is coded as 1 for the post variable and coded as year 2016
The previous election had taken place only eight months prior in January 2015, before significant number of refugees arrived, which is coded in the data as year 2015.
Two more elections were held in 2012. The election in May 2012 is coded as 2012 and the election in June 2013 is coded as 2013.
The unit of analysis are municipalities

Now let’s familiarise ourselves with the data. A description of some of the variables that we will use today is below:

Variable	Description
`Year`	Election Year from 2012 to 2016
`municipality`	Municipality id
`post`	A dummy variable that switches on year 2016
`treatment`	A dummy variable indicating if the municipality received refugees on the 2016 election
`evertr`	Treatment variable indicating if the municipality received refugees
`gdper`	Vote share for the Golden Dawn at the municipality level as percentage of total votes

Now let’s load the data. There are two ways to do this:

You can load the dataset from your laptop using the read_dta() function. We will call this data frame greekislands.

# Set your working directory
#setwd("~/Desktop/Causal Inference/2022/Lab5")
# 
#setwd("/cloud/project")

Or you can download the data from the course website from following url: https://github.com/dpir-ci/CI22/blob/gh-pages/docs/data/greekislands.dta?raw=true.

Let’s start by checking our data as always.

Exercise 1: Use the head() function to familiarise yourself with the data set.

Reveal Answer

head(greekislands)

# A tibble: 6 x 9
   year  muni treatment distance gdper  post  gddif evertr post2
  <dbl> <dbl>     <dbl>    <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>
1  2012     1         0     240.  7.98     0 NA          0     0
2  2013     1         0     240.  7.28     0 -0.705      0     0
3  2015     1         0     240.  6.36     0 NA          0     1
4  2016     1         0     240.  7.62     1  1.25       0     1
5  2012     2         0     318.  2.58     0 NA          0     0
6  2013     2         0     318.  4.28     0  1.70       0     0

This is what we would expect. Recall that the evertr variable indicates municipalities that are treated at some point - independent of the timing -, while the treatment variable marks municipalities after they were treated.

As we can see, the data set covers multiple election years. Before working with the data, let’s make sure we know how many and which elections are included in the data.

Exercise 2: How many elections, i.e. years are covered by the data?

Reveal Answer

unique(greekislands$year)

[1] 2012 2013 2015 2016

Using the unique command, we find that four elections are covered by the data: The elections that took place in 2012, 2013, 2015, and 2016.

Let’s have a look at general voting patterns of the Golden Dawn over time, irrespective of treatment status of municipalities.

Exercise 3: Plot the vote share of the Golden Dawn Party (gdper) over time.

There are several ways to do this. For instance, you could plot the dispersion across municipalities by using the boxplot command or, alternatively, calculate average values per year. Feel free to pick the option you deem most appropriate.

Reveal Answer

# First option: Boxplot
boxplot(gdper ~ year, 
        data = greekislands, 
        xlab = "Election Year", 
        ylab = "Golden Dawn Vote Share", 
        las = 2) # This argument rotates the axis labels

# Second Option: Plot averages
# Calculating and storing means
average_data <- greekislands %>% 
            group_by(year) %>% 
            summarize(gd_averages = mean(gdper))

# Plotting means using ggplot

ggplot(average_data, aes(x = year, y = gd_averages)) + 
  geom_point() + geom_line() + xlab("Election Year") + ylab("Average GD Vote Share") + ylim(0,6.5)

We can se that the vote share for the Golden Dawn party has remained somewhat stable between the 2012 and 2013 elections, and dropped in the 2015 election to an average value (per municipality) of 4.46%. In the 2016 election, the party’s vote share rose substantively - to a new high of 6%.

Difference-in-Differences

Being aware of the general trends of the Golden Dawn’s vote share is an important information about context and the party’s history. However, it cannot tell us anything about the treatment effect we seek to analyse: The arrival of refugees in some Greek municipalities in the summer of 2015.

A naive observer might propose identifying this effect by looking at the differences between treated and untreated units in the post-treatment periods. Would this, however, be an appropriate representation of a possible treatment effect? It clearly would not! Comparing post-treatment differences doesn’t allows us to account for unit/municipality-specific effects and voting patterns. Treatment, after all, was not assigned randomly. We would not be able to say what the effect of the treatment is unless we can make a statement about how the treatment changed the outcome or resulted in the diversion from a previous trajectory. Using a difference-in-differences design allows us to do so.

Exercise 4: Estimate the treatment effect by calculating difference-in-differences between 2015 and 2016 using the mean() function.

Hint: Calculate the differences between treated and untreated units for the years 2015 and 2016 first.

Reveal Answer

# Difference in means between treated and untreated in 2016 (post-treatment).
post_difference <- mean(greekislands$gdper[greekislands$evertr == 1 & greekislands$year == 2016]) - 
   mean(greekislands$gdper[greekislands$evertr == 0 & greekislands$year == 2016])
post_difference

[1] 2.744838

# Difference in means between treated and untreated in 2015 (pre-treatment).
pre_difference <- mean(greekislands$gdper[greekislands$evertr == 1 & greekislands$year == 2015]) - 
   mean(greekislands$gdper[greekislands$evertr == 0 & greekislands$year == 2015])
pre_difference

[1] 0.6212737

# Now calculate the difference between the two differences above
diff_in_diff <- post_difference - pre_difference
diff_in_diff

[1] 2.123564

The difference in the Golden Dawn’s vote share between treated and untreated municipalities has increased in 2016. The difference-in-differences amount to 2.12. This suggests that the treatment (i.e. the arrival of refugees) increased the vote share of the golden dawn by roughly 2%points in the affected municipalities.

While it is important to understand what exactly the difference-in-difference means, we usually do not have to calculate it manually. In fact, we can simply use an OLS to estimate differences-in-differences.

Exercise 5: Estimate the difference-in-difference between 2015 and 2016 using an OLS regression.

Hint

You can run a simple OLS with the interaction term of evertr and a dummy variable for the post-treatment period (post) as independent variables. However, you should restrict the data to the years 2015 and 2016, specifying data = greekislands[greekislands$year>=2015,] as argument for your OLS.

Reveal Answer

# You can use `'post'. Howeverv, if you ever need to create a similar variable, it's pretty simple:
greekislands$post_treatment <- greekislands$year == 2016

ols_did <- lm_robust(gdper ~ evertr * post_treatment, data = greekislands[greekislands$year>=2015,])
summary(ols_did)


Call:
lm_robust(formula = gdper ~ evertr * post_treatment, data = greekislands[greekislands$year >= 
    2015, ])

Standard error type:  HC2 

Coefficients:
                          Estimate Std. Error t value  Pr(>|t|) CI Lower
(Intercept)                 4.3854     0.2221 19.7409 1.045e-47   3.9471
evertr                      0.6213     0.7996  0.7769 4.382e-01  -0.9561
post_treatmentTRUE          1.2737     0.3190  3.9935 9.333e-05   0.6445
evertr:post_treatmentTRUE   2.1236     1.3052  1.6270 1.054e-01  -0.4511
                          CI Upper  DF
(Intercept)                  4.824 188
evertr                       2.199 188
post_treatmentTRUE           1.903 188
evertr:post_treatmentTRUE    4.698 188

Multiple R-squared:  0.1762 ,   Adjusted R-squared:  0.163 
F-statistic: 9.036 on 3 and 188 DF,  p-value: 1.279e-05

The estimate for the interaction term, i.e. treated units after they were treated, corresponds to the difference-in-differences. As you can see, the OLS provides exactly the same point estimate as the manual calculation of the difference-in-differences, 2.12 - however, the OLS also provides measures of uncertainty, showing us that the estimate is significant to conventional levels. Further, the OLS output provides quite some intuitive estimates: The intercept corresponds to the Golden Dawn’s vote share in untreated municipalities in 2015. Treated municipalities had, on average, a higher vote share for the party by roughly 0.62%points. The parties vote share increased by about 1.27%points in untreated municipalities in 2016.

Note that there are multiple ways to calculate the difference-in-differences. Fixed effects are usually the preferred approach as they provide the most flexible (e.g. in terms of multiple time periods) and most efficient estimator. Let’s therefore calculate a fixed effects model for the same difference-in-differences as in Exercise 5.

Exercise 6: Estimate the difference-in-differences between 2015 and 2016 using a Fixed Effects regression.

Hint: You can use either lm_robust(), plm() or lm() with dummy variables.

Reveal Answer

# lm_robust
fe_robust <- lm_robust(gdper ~ evertr*post_treatment, fixed_effects= factor(muni), data=greekislands[greekislands$year>=2015,])
summary(fe_robust)


Call:
lm_robust(formula = gdper ~ evertr * post_treatment, data = greekislands[greekislands$year >= 
    2015, ], fixed_effects = factor(muni))

Standard error type:  HC2 

Coefficients: (1 not defined because the design matrix is rank deficient)
                          Estimate Std. Error t value  Pr(>|t|) CI Lower
evertr                          NA         NA      NA        NA       NA
post_treatmentTRUE           1.274    0.09139  13.938 1.333e-24    1.092
evertr:post_treatmentTRUE    2.124    0.39686   5.351 6.152e-07    1.336
                          CI Upper DF
evertr                          NA NA
post_treatmentTRUE           1.455 94
evertr:post_treatmentTRUE    2.912 94

Multiple R-squared:  0.9651 ,   Adjusted R-squared:  0.9292
Multiple R-squared (proj. model):  0.7791 , Adjusted R-squared (proj. model):  0.5511 
F-statistic (proj. model):    NA on 2 and 94 DF,  p-value: NA

# plm
fe_plm <- plm(gdper ~ evertr*post_treatment, model = "within", index = c("muni"), data=greekislands[greekislands$year>=2015,])
summary(fe_plm)

Oneway (individual) effect Within Model

Call:
plm(formula = gdper ~ evertr * post_treatment, data = greekislands[greekislands$year >= 
    2015, ], model = "within", index = c("muni"))

Balanced Panel: n = 96, T = 2, N = 192

Residuals:
    Min.  1st Qu.   Median  3rd Qu.     Max. 
-1.28954 -0.27782  0.00000  0.27782  1.28954 

Coefficients:
                          Estimate Std. Error t-value  Pr(>|t|)    
post_treatmentTRUE        1.273734   0.099336 12.8225 < 2.2e-16 ***
evertr:post_treatmentTRUE 2.123564   0.280964  7.5581  2.67e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    176.35
Residual Sum of Squares: 38.957
R-Squared:      0.77909
Adj. R-Squared: 0.55113
F-statistic: 165.754 on 2 and 94 DF, p-value: < 2.22e-16

fe_dummy <- lm(gdper ~ evertr * post_treatment + factor(muni), data = greekislands[greekislands$year>=2015,])
summary(fe_dummy)


Call:
lm(formula = gdper ~ evertr * post_treatment + factor(muni), 
    data = greekislands[greekislands$year >= 2015, ])

Residuals:
    Min      1Q  Median      3Q     Max 
-1.2895 -0.2778  0.0000  0.2778  1.2895 

Coefficients: (1 not defined because of singularities)
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                6.35368    0.45792  13.875  < 2e-16 ***
evertr                    -2.69291    0.65892  -4.087 9.20e-05 ***
post_treatmentTRUE         1.27373    0.09934  12.822  < 2e-16 ***
factor(muni)2             -3.78604    0.64377  -5.881 6.20e-08 ***
factor(muni)3             -1.55751    0.64377  -2.419 0.017472 *  
factor(muni)4             -3.10872    0.64377  -4.829 5.31e-06 ***
factor(muni)5             -1.22008    0.64377  -1.895 0.061138 .  
factor(muni)6              1.06490    0.64377   1.654 0.101431    
factor(muni)7             -1.13881    0.64377  -1.769 0.080143 .  
factor(muni)8             -4.10152    0.64377  -6.371 6.88e-09 ***
factor(muni)9             -1.05344    0.64377  -1.636 0.105109    
factor(muni)10            -2.66002    0.64377  -4.132 7.80e-05 ***
factor(muni)11             0.54228    0.64377   0.842 0.401730    
factor(muni)12            -2.99074    0.64377  -4.646 1.10e-05 ***
factor(muni)13            -6.99054    0.64377 -10.859  < 2e-16 ***
factor(muni)14            -2.18372    0.64377  -3.392 0.001016 ** 
factor(muni)15            -4.63345    0.64377  -7.197 1.48e-10 ***
factor(muni)16            -4.98236    0.64377  -7.739 1.12e-11 ***
factor(muni)17            -5.23368    0.64377  -8.130 1.70e-12 ***
factor(muni)18            -3.42812    0.64377  -5.325 6.86e-07 ***
factor(muni)19            -3.48572    0.64377  -5.415 4.69e-07 ***
factor(muni)20             0.73748    0.64377   1.146 0.254884    
factor(muni)21            -0.23429    0.64377  -0.364 0.716720    
factor(muni)22             2.13230    0.64377   3.312 0.001315 ** 
factor(muni)23            -1.23974    0.64377  -1.926 0.057158 .  
factor(muni)24            -3.75036    0.64377  -5.826 7.92e-08 ***
factor(muni)25            -0.93581    0.64377  -1.454 0.149375    
factor(muni)26             0.44822    0.64377   0.696 0.487996    
factor(muni)27            -3.71715    0.64377  -5.774 9.93e-08 ***
factor(muni)28            -2.61289    0.64377  -4.059 0.000102 ***
factor(muni)29            -2.46477    0.64377  -3.829 0.000232 ***
factor(muni)30            -4.21976    0.64377  -6.555 2.97e-09 ***
factor(muni)31             0.26073    0.64377   0.405 0.686390    
factor(muni)32            -3.01851    0.64377  -4.689 9.29e-06 ***
factor(muni)33            -2.79332    0.64377  -4.339 3.60e-05 ***
factor(muni)34            -0.28907    0.64377  -0.449 0.654451    
factor(muni)35            -0.43482    0.64377  -0.675 0.501058    
factor(muni)36            -2.67040    0.64377  -4.148 7.35e-05 ***
factor(muni)37            -0.17064    0.64377  -0.265 0.791537    
factor(muni)38             2.30562    0.64377   3.581 0.000543 ***
factor(muni)39            -0.01353    0.64377  -0.021 0.983281    
factor(muni)40            -3.86558    0.64377  -6.005 3.59e-08 ***
factor(muni)41            -1.47426    0.64377  -2.290 0.024255 *  
factor(muni)42            -1.19513    0.64377  -1.856 0.066522 .  
factor(muni)43             2.65292    0.64377   4.121 8.12e-05 ***
factor(muni)44            -2.72179    0.64377  -4.228 5.47e-05 ***
factor(muni)45            -1.81245    0.64377  -2.815 0.005936 ** 
factor(muni)46             1.70164    0.64377   2.643 0.009621 ** 
factor(muni)47             1.11290    0.64377   1.729 0.087143 .  
factor(muni)48            -1.44698    0.64377  -2.248 0.026936 *  
factor(muni)49            -0.92296    0.64377  -1.434 0.154981    
factor(muni)50            -1.76604    0.64377  -2.743 0.007285 ** 
factor(muni)51            -3.26407    0.64377  -5.070 1.99e-06 ***
factor(muni)52            -3.88610    0.64377  -6.036 3.11e-08 ***
factor(muni)53            -1.08901    0.64377  -1.692 0.094034 .  
factor(muni)54             9.36023    0.64377  14.540  < 2e-16 ***
factor(muni)55            -4.05763    0.64377  -6.303 9.38e-09 ***
factor(muni)56            -3.62824    0.64377  -5.636 1.81e-07 ***
factor(muni)57            -0.74727    0.64377  -1.161 0.248673    
factor(muni)58            -4.08326    0.64377  -6.343 7.83e-09 ***
factor(muni)59            -1.07415    0.64377  -1.669 0.098538 .  
factor(muni)60            -3.56730    0.64377  -5.541 2.73e-07 ***
factor(muni)61            -3.65170    0.64377  -5.672 1.55e-07 ***
factor(muni)62            -4.22268    0.64377  -6.559 2.91e-09 ***
factor(muni)63            -0.84999    0.64377  -1.320 0.189933    
factor(muni)64            -2.15650    0.64377  -3.350 0.001165 ** 
factor(muni)65            -1.77899    0.64377  -2.763 0.006883 ** 
factor(muni)66            -0.99105    0.64377  -1.539 0.127055    
factor(muni)67             0.55694    0.64377   0.865 0.389177    
factor(muni)68            -2.68379    0.64377  -4.169 6.81e-05 ***
factor(muni)69            -0.13410    0.64377  -0.208 0.835441    
factor(muni)70             2.82223    0.64377   4.384 3.04e-05 ***
factor(muni)71            -2.23698    0.64377  -3.475 0.000775 ***
factor(muni)72             4.37066    0.64377   6.789 1.01e-09 ***
factor(muni)73            -0.05003    0.64377  -0.078 0.938224    
factor(muni)74            -2.14293    0.64377  -3.329 0.001247 ** 
factor(muni)75            -3.96392    0.64377  -6.157 1.81e-08 ***
factor(muni)76            -4.56502    0.64377  -7.091 2.45e-10 ***
factor(muni)77             2.60295    0.64377   4.043 0.000108 ***
factor(muni)78            -2.79773    0.64377  -4.346 3.51e-05 ***
factor(muni)79             0.45699    0.64377   0.710 0.479545    
factor(muni)80            -1.29930    0.64377  -2.018 0.046413 *  
factor(muni)81            -1.49319    0.64377  -2.319 0.022537 *  
factor(muni)82             3.19897    0.64377   4.969 3.01e-06 ***
factor(muni)83            -1.54715    0.64377  -2.403 0.018213 *  
factor(muni)84            -0.44015    0.64377  -0.684 0.495840    
factor(muni)85            -4.15859    0.64377  -6.460 4.59e-09 ***
factor(muni)86             0.32742    0.64377   0.509 0.612231    
factor(muni)87            -2.77789    0.64377  -4.315 3.94e-05 ***
factor(muni)88            -3.95815    0.64377  -6.148 1.89e-08 ***
factor(muni)89            -5.22449    0.64377  -8.115 1.83e-12 ***
factor(muni)90            -1.68049    0.64377  -2.610 0.010525 *  
factor(muni)91                  NA         NA      NA       NA    
factor(muni)92             1.09935    0.64377   1.708 0.090996 .  
factor(muni)93            -1.90673    0.64377  -2.962 0.003873 ** 
factor(muni)94            -2.30515    0.64377  -3.581 0.000545 ***
factor(muni)95            -3.84034    0.64377  -5.965 4.27e-08 ***
factor(muni)96            -2.36403    0.64377  -3.672 0.000399 ***
evertr:post_treatmentTRUE  2.12356    0.28096   7.558 2.67e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6438 on 94 degrees of freedom
Multiple R-squared:  0.9651,    Adjusted R-squared:  0.9292 
F-statistic: 26.83 on 97 and 94 DF,  p-value: < 2.2e-16

All three fixed effects estimators provide the same estimate - which corresponds to the one we manually calculated at the beginning of this lab. Note that we could also use the first differences estimator - in the case of two time periods it is equivalent to the fixed effects estimator. However, the fixed effects estimator provides much more flexibility when more than two time periods are being analysed.

Generalised Diff-in-Diff

Let’s now extend our analysis by including all pre-treatment periods in our analysis. The easiest way to do so is running a two-way fixed effects regression.

Exercise 7: Estimate the difference-in-differences using a two-way Fixed Effects regression with all time periods and treatment as independent variable.

Reveal Answer

twoway_FE <- plm(gdper ~ treatment, effect="twoways",model="within",index=c("muni","year"), data=greekislands)
summary(twoway_FE)

Twoways effects Within Model

Call:
plm(formula = gdper ~ treatment, data = greekislands, effect = "twoways", 
    model = "within", index = c("muni", "year"))

Balanced Panel: n = 96, T = 4, N = 384

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-4.5964070 -0.5205535 -0.0089621  0.4459476  7.0013287 

Coefficients:
          Estimate Std. Error t-value  Pr(>|t|)    
treatment  2.08700    0.39328  5.3066 2.253e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    380.22
Residual Sum of Squares: 345.92
R-Squared:      0.090211
Adj. R-Squared: -0.22693
F-statistic: 28.1603 on 1 and 284 DF, p-value: 2.2526e-07

As you can see, the estimate is now slightly smaller than the one we got by comparing 2015 and 2016 only. This is not surprising as the regression considers all pre-treatment periods now and, therefore, attributes some of the variance to other factors such as unit-specific heterogeneity. However, it is noteworthy that the change is negligible. We still estimate that the treatment effect, or ATT, is roughly above 2%points.

Exercise 8: Calculate robust standard errors for (i) the plm FE model, (ii) the two-way FE model and present the regression output in a single table. Include the simple OLS model in the table.

Note: There is no need to adjust standard errors after using lm_robust() as the command automatically does that.

Reveal Answer

### ADJUSTING STANDARD ERRORS
## plm FE model
fe_plm_se <- coeftest(fe_plm, vcov = vcovHC(fe_plm, type = "HC1",
                                         cluster = "group"))
fe_plm_se


t test of coefficients:

                          Estimate Std. Error t value  Pr(>|t|)    
post_treatmentTRUE        1.273734   0.091318 13.9483 < 2.2e-16 ***
evertr:post_treatmentTRUE 2.123564   0.382751  5.5482 2.649e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## two-way FE model

twoway_FE_se <- coeftest(twoway_FE, vcov = vcovHC(twoway_FE, type = "HC1",
                                         cluster = c("group", "time")))
twoway_FE_se


t test of coefficients:

          Estimate Std. Error t value  Pr(>|t|)    
treatment  2.08700    0.34699  6.0147 5.535e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## CREATING A REGRESSION TABLE

screenreg(l=list(ols_did, fe_plm_se, twoway_FE_se), custom.header = list("OLS (2-period)" = 1, "FE (2-period)" = 2, "2-way FE" = 3), custom.coef.map = list("treatment" = "Treatment", "evertr:post_treatmentTRUE" = "Treatment (2-periods)"))


==========================================================
                       OLS (2-period  FE (2-per   2-way FE
                       -------------  ---------  ---------
                       Model 1        Model 2    Model 3  
----------------------------------------------------------
Treatment                                         2.09 ***
                                                 (0.35)   
Treatment (2-periods)    2.12          2.12 ***           
                       [-0.45; 4.70]  (0.38)              
----------------------------------------------------------
R^2                      0.18                             
Adj. R^2                 0.16                             
Num. obs.              192                                
RMSE                     2.21                             
==========================================================
*** p < 0.001; ** p < 0.01; * p < 0.05 (or Null hypothesis value outside the confidence interval).

Pararell trends

As we know, we cannot test whether the parallel trends holds, but we can conduct some visual inspects and statistical analyses. One way to do this is to plot the outcome variable for both treated and untreated units over time before the intervention.

Let’s first get the mean vote share for the Golden Dawn for the treated and untreated units. Let’s do using Tidyverse using the group_by() and summarise() functions.

Exercise 9: Calculate the mean vote share for the Golden Dawn for the treated and the untreated units for all the elections. Store this into a new data frame called plot.parallel. Use tidy to create this new data frame.The syntax and the description of the functions below:

new.data <- data %>% 
  group_by(variable_1, variable_2) %>% 
  summarise(new_variable = mean(outcome, na,rm = TRUE)) %>% 
  mutate(condition = ifelse(evertr == 1, "Treatment", "Control") )

Function	Description
`new data <- dataframe`	Assignment operator where the new data frame will be stored
`group_by()`	Group observations by a variable or set of variables
`summarise()`	Creates a new data frame with one ore more rows of each combination of grouping variables
`mutate()`	Allows to create new variable or modify existing ones

Reveal Answer

plot.parallel <- greekislands %>% 
  group_by(evertr, year) %>% 
  summarise(vote = mean(gdper)) %>% 
  mutate(condition = ifelse(evertr == 1, "Treatment", "Control"))

head(plot.parallel)

# A tibble: 6 x 4
# Groups:   evertr [2]
  evertr  year  vote condition
   <dbl> <dbl> <dbl> <chr>    
1      0  2012  5.49 Control  
2      0  2013  5.34 Control  
3      0  2015  4.39 Control  
4      0  2016  5.66 Control  
5      1  2012  6.16 Treatment
6      1  2013  6.02 Treatment

We can see now that we have the average vote share for treated and untreated for every election year. For example, we can see that the average vote share for the Golden Dawn in untreated islands was 5.33 for the 2013 election.

Right, so we see that we have the average vote share for treated and untreated units for each election from 2012 to 2016. Now, let’s plot the trends before and after the intervention.

Exercise 10: Plot the parallel trends. Set vote share for the Golden Dawn in the y axis, year in the x-axis. Connect the data points of the two groups using a line. Place the legend of your plot at the bottom. Change the default colour and use the Wes Anderson Palette. Below you will find all the functions necessary to generate this plot. Remember to use the plus sign between functions.

Function	Description
`ggplot(x = , y =, colour = )`	map data components into the graph
`geom_point()`	To generate a scatterplot
`scale_color_manual(values=wes_palette("Royal1"))`	Replace the default palette
`theme(legend.position = "bottom")`	To place the legend at the bottom
`geom_line(aes(x=year,y=vote, color = condition, group = condition))`	To connect the dots with lines by group

Reveal Answer

ggplot(aes(x = year, y = vote, colour = condition), data = plot.parallel) +
  geom_point() + scale_color_manual(values=wes_palette("Royal1")) + theme(legend.position = "bottom") + geom_line(aes(x=year,y=vote, color = condition, group = condition))

The plot suggests that parallel trends assumption would hold in this case. The vote share for the Golden Dawn follows a similar path for treated and untreated units for at least three elections before the intervention.

Now let’s look at the leads to identify any anticipatory effects. Let’s imagine that the Golden Dawn back in 2012 believed that there was going to be a major humanitarian crisis in the future. Then, they thought that they could exploit this situation to increase their electoral gains. In that case, we wouldn’t be able to disentangle whether changes in vote share are due to the previous campaigning efforts on the part of the Golden Dawn or due to the influx of assylum seekers to Greece. We can use leads to identify if there are any anticipatory effects. If we find systematic differences between treated and untreated units, this would suggest that units in one or both groups are responding to the treatment before receiving it.

Exercise 11: Create dummy year variables equal to 1 for every year and only for the treated municipality. Call this variable leads, plus the year. For example, the lead2012, will take value 1 only for treated municipalities and only for observations of these municipalities in the year 2012. You can see an example below. use the mutate() function to create these new variables. You can also use the ifelse() function to create these dummy variables. The syntax of the ifelse() function is the following: new variable = ifelse(condition, "value if condition is met", "value if the condition is not met"). Create these dummy variables for the elections in 2012, 2013, and 2015.

greekislands <- greekislands %>% 
  mutate(lead2012 = ifelse(evertr == 1 & year == 2012, 1, 0))

Reveal Answer

table(greekislands$year)


2012 2013 2015 2016 
  96   96   96   96

greekislands <- greekislands %>% 
  mutate(lead2012 = ifelse(evertr == 1 & year == 2012, 1, 0),
         lead2013 = ifelse(evertr == 1 & year == 2013, 1, 0),
         lead2015 = ifelse(evertr == 1 & year == 2015, 1, 0))

Now that we have created these dummy variables, we can run a two-way fixed effect model and see if they are anticipatory effects.

Exercise 12: Conduct the same two-way fixed-effect model that we used before, but rather than using the treatment variable, replace this variable with the new leads variables that you created. Ran separate estimations for each lead. Store the outputs of these regressions into different objects. Does the evidence suggest that there are any anticipatory effects? Are the results of these three models statistically significant? You can use the summary() or screenreg() functions to take a look at your results.

Reveal Answer

lead2012 <- plm(gdper ~ lead2012, model = "within", effect = "twoways", 
           index = c("muni", "year"), data = greekislands)

lead2013 <- plm(gdper ~ lead2013, model = "within", effect = "twoways", 
           index = c("muni", "year"), data = greekislands)

lead2015 <- plm(gdper ~ lead2015, model = "within", effect = "twoways", 
           index = c("muni", "year"), data = greekislands)

summary(lead2015) # as an example.

Twoways effects Within Model

Call:
plm(formula = gdper ~ lead2015, data = greekislands, effect = "twoways", 
    model = "within", index = c("muni", "year"))

Balanced Panel: n = 96, T = 4, N = 384

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-4.600977 -0.530094  0.017942  0.464218  7.089811 

Coefficients:
         Estimate Std. Error t-value Pr(>|t|)  
lead2015 -0.74442    0.40995 -1.8159  0.07044 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    380.22
Residual Sum of Squares: 375.86
R-Squared:      0.011477
Adj. R-Squared: -0.33311
F-statistic: 3.29745 on 1 and 284 DF, p-value: 0.070442

The evidence gathered from the three models suggests there are no anticipatory effects. We don’t find any statistically significant dynamics between treated and untreated municipalities on vote share for the Golden Dawn before the 2016 election, which is good!

Now, let’s plot all the two-way fixed effects models where we used the plm() function into a single figure.

Exercise 13: Plot the coefficients from the leads models, plus the two-way fixed model for the 2016 election that you used in Question 7 (“twoway_FE”). Use the plot_coef() function to generate this plot. Add the argument scale and set it equal to TRUE. Also, include the argument robust and set it equal to TRUE. In addition to the plot_coefs() include the coord_flip() function. This function will flip the Cartesian coordinates of the plot, so we have the models (years) in the x-axis and the coefficients in the y axis. Remember to add plus sign operator between two functions. You can also add the xlab("Year") function to a label in the x-axis.

Reveal Answer

plot_coefs(lead2012, lead2013, lead2015, twoway_FE, scale = TRUE, robust = TRUE) + 
coord_flip() + ylab("Year")

Again, we can observe that pre-treatment coefficients are nearly zero and they are not statistically significant, but then for the 2016 election, we find that the vote share for Golden Dawn has risen substantially. Thus, we found strong evidence that the increase in vote share is caused by the influx of refugees into the Greek islands.

Placebo test

We can conduct a placebo test to evaluate whether the parallel trend holds. We are trying to prove that there is no clear difference in trending tendencies between treated and untreated municipalities.

The steps to conduct are the following:

Use data for periods that came before the treatment was implemented/happened.
2 Create a dummy variable for each before treatment that is equal to 1 only for that specific year and only for treated units. (As we did before).
Estimate the difference-in-difference using plm() or lm_robust() function.
If you find statistically significant results, this may suggest a violation of parallel trends.

Exercise 14: Drop all the observations of the year of the intervention (2016). Do this using the filter() function. Then, create a fake treatment variable and call it post2 and set it equal to 1 for all observations in year 2015. This variable would indicate as the hypothetical case that the municipality would received refugees in 2015. You can see an example below.

greekislands <- greekislands %>% 
  filter(variable != 2016) %>% 
  mutate(new variable = year variable == "year")

## or alternatively 

greekislands <- greekislands %>% 
  filter(variable != 2016) %>% 
  mutate(new.variable = ifelse(year variable == "year", 1, 0))

Reveal Answer

table(greekislands$year)


2012 2013 2015 2016 
  96   96   96   96

greekislands <- greekislands %>%
  filter(year != 2016) %>% 
  mutate(post2 = ifelse(year == "2015", 1, 0))

Great! Now let’s use this dummy variable to perform the placebo test. We are essentially going to test is that there are no differential trends between treated and untreated, as we did before when we looked at anticipatory trends.

Exercise 15: Conduct the same two-way fixed effect model using the lm() and use the post2 variable. Store all the models in a list and plug this list inside of the modelsummary(list) function to report your results. Did you find statistically significant differences between treated and untreated units pre-treatment?. You can see an example of how to store multiple models in a list list(). Also, subset the data so one model will only conduct the placebo test using the 2012 and 2015 elections, and another model only using the observations from the 2013 and 2015 elections.

models <- list(
    "plm "= lm(outcome ~ treatment, model = "within", effect = "twoways", index = c("muni", "year"), data = data[data$year != year,]), 
    "lm" = lm(outcome ~ outcome, data = data, subset=(year!=year)), # set year equal to year that you want to exclude. 
    "lm_robust" = lm_robust(outcome ~ treatment, data = data[data$year != year,], cluster = unit)) # set year equal to year that you want to exclude.

Reveal Answer

models <- list(
     "1215"=lm_robust(gdper ~ evertr + post2+evertr*post2, data = greekislands, subset=(year!=2013), cluster=muni), # placebo test 2012 and 2015
    "1314" = lm_robust(gdper ~ evertr + post2+evertr*post2, data = greekislands, subset=(year!=2012),cluster=muni)) # placeblo test 2013 and 2014


modelsummary(models)

	1215	1314
(Intercept)	5.491	5.339
	(0.334)	(0.261)
evertr	0.668	0.684
	(1.127)	(0.651)
post2	-1.106	-0.954
	(0.227)	(0.126)
evertr × post2	-0.047	-0.063
	(0.601)	(0.403)
Num.Obs.	192	192
R2	0.048	0.054
R2 Adj.	0.033	0.038
Std.Errors	by: muni	by: muni

	Post = 0	Post = 1
Treat = 0	\(\beta_0 + u_{mt}\)	\(\beta_0 + \beta_2 + u_{mt}\)
Treat = 1	\(\beta_0 +\beta_1 + u_{mt}\)	\(\beta_0 + \beta_1 + \beta_2 + \beta_3 + u_{mt}\)

Lab 5: Difference in Difference

Week 5 HT 2022

Lecture Slides & Data

Recap

Seminar Overview

Waking Up the Golden Dawn: Does Exposure to the Refugee Crisis Increase Support for Extreme-Right Parties? (Dinas et al, 2018)

Timing

Difference-in-Differences

Generalised Diff-in-Diff

Pararell trends

Placebo test