Link Search Menu Expand Document
Lab 5: Difference in Difference

Lecture Slides & Data



Recap

The intuition of the DD strategy is to combine two simpler approaches. The first difference (before and after) eliminates unit-specific fixed effects. Then, the second difference eliminates time fixed effects. With this approach, we get an unbiased estimate of the intervention.

We learned that we can break down the difference between treated units and untreated units in post-treatment as the Average Treatment Effect amongst the Treated (ATT), different time trends and selection bias. However, we can impose additional estimation assumptions to retrieve a credible estimate of the effect of treatment. The key assumption in diff-in-diff studies is the so-called Parallel Trends or Common Trends assumption. This assumption states that in the absence of the treatment/policy, we should expect to see that treated units would follow similar trends as the untreated ones. Unfortunately, we cannot test whether this assumption holds, but we can at least conduct some tests that would be indicated that the Parallel Trends holds.

We also learned that we can calculate the ATT in different ways. We learned that we can manually calculate the difference-in-difference estimator.

  • Group-period interactions: Here the treatment variable is equal to 1 for all the years since the unit received the treatment. Then, our coefficient of interest is captured by the interaction between the treatment and the time variable.

If you go back to the Golden Dawn, we can see how we retrieve the beta coefficient from the interaction.

\[gdper_{mt} = \beta_0 + \beta_1evertr_m + \beta_2post_t + \beta_3evertr_m \times post_t + u_{mt}\]

Post = 0 Post = 1
Treat = 0 \(\beta_0 + u_{mt}\) \(\beta_0 + \beta_2 + u_{mt}\)
Treat = 1 \(\beta_0 +\beta_1 + u_{mt}\) \(\beta_0 + \beta_1 + \beta_2 + \beta_3 + u_{mt}\)

Then, we can get our estimate by the calculating the difference of the outcome variable for both treated and untreated units and, and then subtract these differences:

\[((\beta_0 + \beta_1 + \beta_2 + \beta_3 + u_{mt}) - (\beta_0 + \beta_1 + u_mt)) -((\beta_0 + \beta_2 + u_{mt})-(\beta_0 + u_{mt}))\] \[(\beta_2 + \beta_3) -(\beta_2)\] \[\beta_3\] - Unit and time dummies and a treatment indicator: Again, in the case of the Golden Dawn example, we can represent this estimation strategy using the following regression model:

\[gdper_{mt} = \beta_1treatment_{mt} + \alpha_m \text{unit dummy} + \gamma_t \text{time dummy} + u_{mt}\] For treated before treated \(Treatment = 0\):

\[gdper_{mt} = \beta_1\times 0 + \alpha_m \times 1 + \gamma_t \times + u_{mt}\] \[gdper_{mt} = \alpha_m + \gamma_t + u_{mt}\] For treated after treated \(Treatment = 1\):

\[gdper_{mt} = \beta_1 + \alpha_m + \gamma_t + u_{mt}\] Then, we can take the difference before and after:

\[(\beta_1 + \alpha_m + \gamma_t + u_{mt}) - (\alpha_m + \gamma_t + u_{mt})\] \[\beta_1\] Again we are levering the parallel trends assumption by assuming that time-trends are the same for both treated and untreated units.

Finally, we discussed inference and in particular standard errors. Given that we have repeated observations, this type of data exhibits serially correlated regressors and residuals, we need to make the appropriate adjustments to calculate standard errors. One way is to address this is to use clustered standard errors.

Before starting this seminar

  1. Create a folder called “lab5”

  2. Download the data (you can use this button or the one at the top, or read csv files directly from github):

  3. Open an R script (or Markdown file) and save it in our “lab5” folder.

  4. Set your working directory using the setwd() function or by clicking on “More“. For example setwd(“~/Desktop/Causal Inference/2022/Lab5”)

  5. Let’s install an load the packages that we will be using in this lab:


library(jtools) # generate plots with regression coefficients
library(stargazer) # generate formated regression tables 
library(texreg) # generate formatted regression tables
library(tidyverse) # to conduct some tidy operations
library(ggplot2) # to generate plots 
library(plm) # conduct one-way and two-way fixed effects 
library(estimatr) #  to conduct ols and provides robust standard errors
library(lmtest) # calculates the variance-covariance matrix to be clustered by group.
library(sandwich) # to calculate heteroscedasticity-Consistent Covariance Matrix Estimation
library(haven) # upload dta files 
#install.packages("wesanderson")
library("wesanderson") # Wes Anderson palette (Let's make our plots as Indie as possible)
#install.packages("modelsummary")
library(modelsummary)

Seminar Overview

In this seminar, we will cover the following topics:
1. “Manually” calculate the difference-in-difference estimator"
2. Obtaining the difference in difference estimator using the lm() function
3. Check for parallel trends
4. Conduct a fixed effect in difference in difference estimation using both lm_robust() and plm() function.
4. Conduct a placebo test


Waking Up the Golden Dawn: Does Exposure to the Refugee Crisis Increase Support for Extreme-Right Parties? (Dinas et al, 2018)

The main question of this paper is the following: Did the influx of refugees in Greece increase support for the right-wing Golden Dawn party in 2015?.

The authors exploit that the Aegean islands close to the Turkish border experienced sudden and drastic increases in the number of Syrian refugees while other islands slightly farther away—but with otherwise similar institutional and socio-economic characteristics—did not.

We can see here in the Figure in the left that level of exposure to refugees across the Aegean islands. The figure on the right shows how sudden the influx of refugees was.

Timing

  • The refugee crisis started in the spring of 2015
  • Greece held an election September, 2015, right after the first wave of refugee arrival, where in the data is coded as 1 for the post variable and coded as year 2016
  • The previous election had taken place only eight months prior in January 2015, before significant number of refugees arrived, which is coded in the data as year 2015.
  • Two more elections were held in 2012. The election in May 2012 is coded as 2012 and the election in June 2013 is coded as 2013.
  • The unit of analysis are municipalities

Now let’s familiarise ourselves with the data. A description of some of the variables that we will use today is below:

Variable Description
Year Election Year from 2012 to 2016
municipality Municipality id
post A dummy variable that switches on year 2016
treatment A dummy variable indicating if the municipality received refugees on the 2016 election
evertr Treatment variable indicating if the municipality received refugees
gdper Vote share for the Golden Dawn at the municipality level as percentage of total votes

Now let’s load the data. There are two ways to do this:

You can load the dataset from your laptop using the read_dta() function. We will call this data frame greekislands.

# Set your working directory
#setwd("~/Desktop/Causal Inference/2022/Lab5")
# 
#setwd("/cloud/project")

Or you can download the data from the course website from following url: https://github.com/dpir-ci/CI22/blob/gh-pages/docs/data/greekislands.dta?raw=true.


Let’s start by checking our data as always.

Exercise 1: Use the head() function to familiarise yourself with the data set.

Reveal Answer

head(greekislands)
# A tibble: 6 x 9
   year  muni treatment distance gdper  post  gddif evertr post2
  <dbl> <dbl>     <dbl>    <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>
1  2012     1         0     240.  7.98     0 NA          0     0
2  2013     1         0     240.  7.28     0 -0.705      0     0
3  2015     1         0     240.  6.36     0 NA          0     1
4  2016     1         0     240.  7.62     1  1.25       0     1
5  2012     2         0     318.  2.58     0 NA          0     0
6  2013     2         0     318.  4.28     0  1.70       0     0

This is what we would expect. Recall that the evertr variable indicates municipalities that are treated at some point - independent of the timing -, while the treatment variable marks municipalities after they were treated.


As we can see, the data set covers multiple election years. Before working with the data, let’s make sure we know how many and which elections are included in the data.

Exercise 2: How many elections, i.e. years are covered by the data?

Reveal Answer

unique(greekislands$year)
[1] 2012 2013 2015 2016


Using the unique command, we find that four elections are covered by the data: The elections that took place in 2012, 2013, 2015, and 2016.


Let’s have a look at general voting patterns of the Golden Dawn over time, irrespective of treatment status of municipalities.

Exercise 3: Plot the vote share of the Golden Dawn Party (gdper) over time.

There are several ways to do this. For instance, you could plot the dispersion across municipalities by using the boxplot command or, alternatively, calculate average values per year. Feel free to pick the option you deem most appropriate.

Reveal Answer

# First option: Boxplot
boxplot(gdper ~ year, 
        data = greekislands, 
        xlab = "Election Year", 
        ylab = "Golden Dawn Vote Share", 
        las = 2) # This argument rotates the axis labels

# Second Option: Plot averages
# Calculating and storing means
average_data <- greekislands %>% 
            group_by(year) %>% 
            summarize(gd_averages = mean(gdper))

# Plotting means using ggplot

ggplot(average_data, aes(x = year, y = gd_averages)) + 
  geom_point() + geom_line() + xlab("Election Year") + ylab("Average GD Vote Share") + ylim(0,6.5)


We can se that the vote share for the Golden Dawn party has remained somewhat stable between the 2012 and 2013 elections, and dropped in the 2015 election to an average value (per municipality) of 4.46%. In the 2016 election, the party’s vote share rose substantively - to a new high of 6%.


Difference-in-Differences


Being aware of the general trends of the Golden Dawn’s vote share is an important information about context and the party’s history. However, it cannot tell us anything about the treatment effect we seek to analyse: The arrival of refugees in some Greek municipalities in the summer of 2015.

A naive observer might propose identifying this effect by looking at the differences between treated and untreated units in the post-treatment periods. Would this, however, be an appropriate representation of a possible treatment effect? It clearly would not! Comparing post-treatment differences doesn’t allows us to account for unit/municipality-specific effects and voting patterns. Treatment, after all, was not assigned randomly. We would not be able to say what the effect of the treatment is unless we can make a statement about how the treatment changed the outcome or resulted in the diversion from a previous trajectory. Using a difference-in-differences design allows us to do so.

Exercise 4: Estimate the treatment effect by calculating difference-in-differences between 2015 and 2016 using the mean() function.

Hint: Calculate the differences between treated and untreated units for the years 2015 and 2016 first.

Reveal Answer

# Difference in means between treated and untreated in 2016 (post-treatment).
post_difference <- mean(greekislands$gdper[greekislands$evertr == 1 & greekislands$year == 2016]) - 
   mean(greekislands$gdper[greekislands$evertr == 0 & greekislands$year == 2016])
post_difference
[1] 2.744838
# Difference in means between treated and untreated in 2015 (pre-treatment).
pre_difference <- mean(greekislands$gdper[greekislands$evertr == 1 & greekislands$year == 2015]) - 
   mean(greekislands$gdper[greekislands$evertr == 0 & greekislands$year == 2015])
pre_difference
[1] 0.6212737
# Now calculate the difference between the two differences above
diff_in_diff <- post_difference - pre_difference
diff_in_diff
[1] 2.123564

 

The difference in the Golden Dawn’s vote share between treated and untreated municipalities has increased in 2016. The difference-in-differences amount to 2.12. This suggests that the treatment (i.e. the arrival of refugees) increased the vote share of the golden dawn by roughly 2%points in the affected municipalities.


While it is important to understand what exactly the difference-in-difference means, we usually do not have to calculate it manually. In fact, we can simply use an OLS to estimate differences-in-differences.

Exercise 5: Estimate the difference-in-difference between 2015 and 2016 using an OLS regression.

Hint

You can run a simple OLS with the interaction term of evertr and a dummy variable for the post-treatment period (post) as independent variables. However, you should restrict the data to the years 2015 and 2016, specifying data = greekislands[greekislands$year>=2015,] as argument for your OLS.


Reveal Answer

# You can use `'post'. Howeverv, if you ever need to create a similar variable, it's pretty simple:
greekislands$post_treatment <- greekislands$year == 2016

ols_did <- lm_robust(gdper ~ evertr * post_treatment, data = greekislands[greekislands$year>=2015,])
summary(ols_did)

Call:
lm_robust(formula = gdper ~ evertr * post_treatment, data = greekislands[greekislands$year >= 
    2015, ])

Standard error type:  HC2 

Coefficients:
                          Estimate Std. Error t value  Pr(>|t|) CI Lower
(Intercept)                 4.3854     0.2221 19.7409 1.045e-47   3.9471
evertr                      0.6213     0.7996  0.7769 4.382e-01  -0.9561
post_treatmentTRUE          1.2737     0.3190  3.9935 9.333e-05   0.6445
evertr:post_treatmentTRUE   2.1236     1.3052  1.6270 1.054e-01  -0.4511
                          CI Upper  DF
(Intercept)                  4.824 188
evertr                       2.199 188
post_treatmentTRUE           1.903 188
evertr:post_treatmentTRUE    4.698 188

Multiple R-squared:  0.1762 ,   Adjusted R-squared:  0.163 
F-statistic: 9.036 on 3 and 188 DF,  p-value: 1.279e-05
The estimate for the interaction term, i.e. treated units after they were treated, corresponds to the difference-in-differences. As you can see, the OLS provides exactly the same point estimate as the manual calculation of the difference-in-differences, 2.12 - however, the OLS also provides measures of uncertainty, showing us that the estimate is significant to conventional levels. Further, the OLS output provides quite some intuitive estimates: The intercept corresponds to the Golden Dawn’s vote share in untreated municipalities in 2015. Treated municipalities had, on average, a higher vote share for the party by roughly 0.62%points. The parties vote share increased by about 1.27%points in untreated municipalities in 2016.


Note that there are multiple ways to calculate the difference-in-differences. Fixed effects are usually the preferred approach as they provide the most flexible (e.g. in terms of multiple time periods) and most efficient estimator. Let’s therefore calculate a fixed effects model for the same difference-in-differences as in Exercise 5.

Exercise 6: Estimate the difference-in-differences between 2015 and 2016 using a Fixed Effects regression.

Hint: You can use either lm_robust(), plm() or lm() with dummy variables.

Reveal Answer

# lm_robust
fe_robust <- lm_robust(gdper ~ evertr*post_treatment, fixed_effects= factor(muni), data=greekislands[greekislands$year>=2015,])
summary(fe_robust)

Call:
lm_robust(formula = gdper ~ evertr * post_treatment, data = greekislands[greekislands$year >= 
    2015, ], fixed_effects = factor(muni))

Standard error type:  HC2 

Coefficients: (1 not defined because the design matrix is rank deficient)
                          Estimate Std. Error t value  Pr(>|t|) CI Lower
evertr                          NA         NA      NA        NA       NA
post_treatmentTRUE           1.274    0.09139  13.938 1.333e-24    1.092
evertr:post_treatmentTRUE    2.124    0.39686   5.351 6.152e-07    1.336
                          CI Upper DF
evertr                          NA NA
post_treatmentTRUE           1.455 94
evertr:post_treatmentTRUE    2.912 94

Multiple R-squared:  0.9651 ,   Adjusted R-squared:  0.9292
Multiple R-squared (proj. model):  0.7791 , Adjusted R-squared (proj. model):  0.5511 
F-statistic (proj. model):    NA on 2 and 94 DF,  p-value: NA
# plm
fe_plm <- plm(gdper ~ evertr*post_treatment, model = "within", index = c("muni"), data=greekislands[greekislands$year>=2015,])
summary(fe_plm)
Oneway (individual) effect Within Model

Call:
plm(formula = gdper ~ evertr * post_treatment, data = greekislands[greekislands$year >= 
    2015, ], model = "within", index = c("muni"))

Balanced Panel: n = 96, T = 2, N = 192

Residuals:
    Min.  1st Qu.   Median  3rd Qu.     Max. 
-1.28954 -0.27782  0.00000  0.27782  1.28954 

Coefficients:
                          Estimate Std. Error t-value  Pr(>|t|)    
post_treatmentTRUE        1.273734   0.099336 12.8225 < 2.2e-16 ***
evertr:post_treatmentTRUE 2.123564   0.280964  7.5581  2.67e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    176.35
Residual Sum of Squares: 38.957
R-Squared:      0.77909
Adj. R-Squared: 0.55113
F-statistic: 165.754 on 2 and 94 DF, p-value: < 2.22e-16
fe_dummy <- lm(gdper ~ evertr * post_treatment + factor(muni), data = greekislands[greekislands$year>=2015,])
summary(fe_dummy)

Call:
lm(formula = gdper ~ evertr * post_treatment + factor(muni), 
    data = greekislands[greekislands$year >= 2015, ])

Residuals:
    Min      1Q  Median      3Q     Max 
-1.2895 -0.2778  0.0000  0.2778  1.2895 

Coefficients: (1 not defined because of singularities)
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                6.35368    0.45792  13.875  < 2e-16 ***
evertr                    -2.69291    0.65892  -4.087 9.20e-05 ***
post_treatmentTRUE         1.27373    0.09934  12.822  < 2e-16 ***
factor(muni)2             -3.78604    0.64377  -5.881 6.20e-08 ***
factor(muni)3             -1.55751    0.64377  -2.419 0.017472 *  
factor(muni)4             -3.10872    0.64377  -4.829 5.31e-06 ***
factor(muni)5             -1.22008    0.64377  -1.895 0.061138 .  
factor(muni)6              1.06490    0.64377   1.654 0.101431    
factor(muni)7             -1.13881    0.64377  -1.769 0.080143 .  
factor(muni)8             -4.10152    0.64377  -6.371 6.88e-09 ***
factor(muni)9             -1.05344    0.64377  -1.636 0.105109    
factor(muni)10            -2.66002    0.64377  -4.132 7.80e-05 ***
factor(muni)11             0.54228    0.64377   0.842 0.401730    
factor(muni)12            -2.99074    0.64377  -4.646 1.10e-05 ***
factor(muni)13            -6.99054    0.64377 -10.859  < 2e-16 ***
factor(muni)14            -2.18372    0.64377  -3.392 0.001016 ** 
factor(muni)15            -4.63345    0.64377  -7.197 1.48e-10 ***
factor(muni)16            -4.98236    0.64377  -7.739 1.12e-11 ***
factor(muni)17            -5.23368    0.64377  -8.130 1.70e-12 ***
factor(muni)18            -3.42812    0.64377  -5.325 6.86e-07 ***
factor(muni)19            -3.48572    0.64377  -5.415 4.69e-07 ***
factor(muni)20             0.73748    0.64377   1.146 0.254884    
factor(muni)21            -0.23429    0.64377  -0.364 0.716720    
factor(muni)22             2.13230    0.64377   3.312 0.001315 ** 
factor(muni)23            -1.23974    0.64377  -1.926 0.057158 .  
factor(muni)24            -3.75036    0.64377  -5.826 7.92e-08 ***
factor(muni)25            -0.93581    0.64377  -1.454 0.149375    
factor(muni)26             0.44822    0.64377   0.696 0.487996    
factor(muni)27            -3.71715    0.64377  -5.774 9.93e-08 ***
factor(muni)28            -2.61289    0.64377  -4.059 0.000102 ***
factor(muni)29            -2.46477    0.64377  -3.829 0.000232 ***
factor(muni)30            -4.21976    0.64377  -6.555 2.97e-09 ***
factor(muni)31             0.26073    0.64377   0.405 0.686390    
factor(muni)32            -3.01851    0.64377  -4.689 9.29e-06 ***
factor(muni)33            -2.79332    0.64377  -4.339 3.60e-05 ***
factor(muni)34            -0.28907    0.64377  -0.449 0.654451    
factor(muni)35            -0.43482    0.64377  -0.675 0.501058    
factor(muni)36            -2.67040    0.64377  -4.148 7.35e-05 ***
factor(muni)37            -0.17064    0.64377  -0.265 0.791537    
factor(muni)38             2.30562    0.64377   3.581 0.000543 ***
factor(muni)39            -0.01353    0.64377  -0.021 0.983281    
factor(muni)40            -3.86558    0.64377  -6.005 3.59e-08 ***
factor(muni)41            -1.47426    0.64377  -2.290 0.024255 *  
factor(muni)42            -1.19513    0.64377  -1.856 0.066522 .  
factor(muni)43             2.65292    0.64377   4.121 8.12e-05 ***
factor(muni)44            -2.72179    0.64377  -4.228 5.47e-05 ***
factor(muni)45            -1.81245    0.64377  -2.815 0.005936 ** 
factor(muni)46             1.70164    0.64377   2.643 0.009621 ** 
factor(muni)47             1.11290    0.64377   1.729 0.087143 .  
factor(muni)48            -1.44698    0.64377  -2.248 0.026936 *  
factor(muni)49            -0.92296    0.64377  -1.434 0.154981    
factor(muni)50            -1.76604    0.64377  -2.743 0.007285 ** 
factor(muni)51            -3.26407    0.64377  -5.070 1.99e-06 ***
factor(muni)52            -3.88610    0.64377  -6.036 3.11e-08 ***
factor(muni)53            -1.08901    0.64377  -1.692 0.094034 .  
factor(muni)54             9.36023    0.64377  14.540  < 2e-16 ***
factor(muni)55            -4.05763    0.64377  -6.303 9.38e-09 ***
factor(muni)56            -3.62824    0.64377  -5.636 1.81e-07 ***
factor(muni)57            -0.74727    0.64377  -1.161 0.248673    
factor(muni)58            -4.08326    0.64377  -6.343 7.83e-09 ***
factor(muni)59            -1.07415    0.64377  -1.669 0.098538 .  
factor(muni)60            -3.56730    0.64377  -5.541 2.73e-07 ***
factor(muni)61            -3.65170    0.64377  -5.672 1.55e-07 ***
factor(muni)62            -4.22268    0.64377  -6.559 2.91e-09 ***
factor(muni)63            -0.84999    0.64377  -1.320 0.189933    
factor(muni)64            -2.15650    0.64377  -3.350 0.001165 ** 
factor(muni)65            -1.77899    0.64377  -2.763 0.006883 ** 
factor(muni)66            -0.99105    0.64377  -1.539 0.127055    
factor(muni)67             0.55694    0.64377   0.865 0.389177    
factor(muni)68            -2.68379    0.64377  -4.169 6.81e-05 ***
factor(muni)69            -0.13410    0.64377  -0.208 0.835441    
factor(muni)70             2.82223    0.64377   4.384 3.04e-05 ***
factor(muni)71            -2.23698    0.64377  -3.475 0.000775 ***
factor(muni)72             4.37066    0.64377   6.789 1.01e-09 ***
factor(muni)73            -0.05003    0.64377  -0.078 0.938224    
factor(muni)74            -2.14293    0.64377  -3.329 0.001247 ** 
factor(muni)75            -3.96392    0.64377  -6.157 1.81e-08 ***
factor(muni)76            -4.56502    0.64377  -7.091 2.45e-10 ***
factor(muni)77             2.60295    0.64377   4.043 0.000108 ***
factor(muni)78            -2.79773    0.64377  -4.346 3.51e-05 ***
factor(muni)79             0.45699    0.64377   0.710 0.479545    
factor(muni)80            -1.29930    0.64377  -2.018 0.046413 *  
factor(muni)81            -1.49319    0.64377  -2.319 0.022537 *  
factor(muni)82             3.19897    0.64377   4.969 3.01e-06 ***
factor(muni)83            -1.54715    0.64377  -2.403 0.018213 *  
factor(muni)84            -0.44015    0.64377  -0.684 0.495840    
factor(muni)85            -4.15859    0.64377  -6.460 4.59e-09 ***
factor(muni)86             0.32742    0.64377   0.509 0.612231    
factor(muni)87            -2.77789    0.64377  -4.315 3.94e-05 ***
factor(muni)88            -3.95815    0.64377  -6.148 1.89e-08 ***
factor(muni)89            -5.22449    0.64377  -8.115 1.83e-12 ***
factor(muni)90            -1.68049    0.64377  -2.610 0.010525 *  
factor(muni)91                  NA         NA      NA       NA    
factor(muni)92             1.09935    0.64377   1.708 0.090996 .  
factor(muni)93            -1.90673    0.64377  -2.962 0.003873 ** 
factor(muni)94            -2.30515    0.64377  -3.581 0.000545 ***
factor(muni)95            -3.84034    0.64377  -5.965 4.27e-08 ***
factor(muni)96            -2.36403    0.64377  -3.672 0.000399 ***
evertr:post_treatmentTRUE  2.12356    0.28096   7.558 2.67e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6438 on 94 degrees of freedom
Multiple R-squared:  0.9651,    Adjusted R-squared:  0.9292 
F-statistic: 26.83 on 97 and 94 DF,  p-value: < 2.2e-16


All three fixed effects estimators provide the same estimate - which corresponds to the one we manually calculated at the beginning of this lab. Note that we could also use the first differences estimator - in the case of two time periods it is equivalent to the fixed effects estimator. However, the fixed effects estimator provides much more flexibility when more than two time periods are being analysed.


Generalised Diff-in-Diff

Let’s now extend our analysis by including all pre-treatment periods in our analysis. The easiest way to do so is running a two-way fixed effects regression.

Exercise 7: Estimate the difference-in-differences using a two-way Fixed Effects regression with all time periods and treatment as independent variable.

Reveal Answer

twoway_FE <- plm(gdper ~ treatment, effect="twoways",model="within",index=c("muni","year"), data=greekislands)
summary(twoway_FE)
Twoways effects Within Model

Call:
plm(formula = gdper ~ treatment, data = greekislands, effect = "twoways", 
    model = "within", index = c("muni", "year"))

Balanced Panel: n = 96, T = 4, N = 384

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-4.5964070 -0.5205535 -0.0089621  0.4459476  7.0013287 

Coefficients:
          Estimate Std. Error t-value  Pr(>|t|)    
treatment  2.08700    0.39328  5.3066 2.253e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    380.22
Residual Sum of Squares: 345.92
R-Squared:      0.090211
Adj. R-Squared: -0.22693
F-statistic: 28.1603 on 1 and 284 DF, p-value: 2.2526e-07


As you can see, the estimate is now slightly smaller than the one we got by comparing 2015 and 2016 only. This is not surprising as the regression considers all pre-treatment periods now and, therefore, attributes some of the variance to other factors such as unit-specific heterogeneity. However, it is noteworthy that the change is negligible. We still estimate that the treatment effect, or ATT, is roughly above 2%points.


Exercise 8: Calculate robust standard errors for (i) the plm FE model, (ii) the two-way FE model and present the regression output in a single table. Include the simple OLS model in the table.

Note: There is no need to adjust standard errors after using lm_robust() as the command automatically does that.

Reveal Answer

### ADJUSTING STANDARD ERRORS
## plm FE model
fe_plm_se <- coeftest(fe_plm, vcov = vcovHC(fe_plm, type = "HC1",
                                         cluster = "group"))
fe_plm_se

t test of coefficients:

                          Estimate Std. Error t value  Pr(>|t|)    
post_treatmentTRUE        1.273734   0.091318 13.9483 < 2.2e-16 ***
evertr:post_treatmentTRUE 2.123564   0.382751  5.5482 2.649e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## two-way FE model

twoway_FE_se <- coeftest(twoway_FE, vcov = vcovHC(twoway_FE, type = "HC1",
                                         cluster = c("group", "time")))
twoway_FE_se

t test of coefficients:

          Estimate Std. Error t value  Pr(>|t|)    
treatment  2.08700    0.34699  6.0147 5.535e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## CREATING A REGRESSION TABLE

screenreg(l=list(ols_did, fe_plm_se, twoway_FE_se), custom.header = list("OLS (2-period)" = 1, "FE (2-period)" = 2, "2-way FE" = 3), custom.coef.map = list("treatment" = "Treatment", "evertr:post_treatmentTRUE" = "Treatment (2-periods)"))

==========================================================
                       OLS (2-period  FE (2-per   2-way FE
                       -------------  ---------  ---------
                       Model 1        Model 2    Model 3  
----------------------------------------------------------
Treatment                                         2.09 ***
                                                 (0.35)   
Treatment (2-periods)    2.12          2.12 ***           
                       [-0.45; 4.70]  (0.38)              
----------------------------------------------------------
R^2                      0.18                             
Adj. R^2                 0.16                             
Num. obs.              192                                
RMSE                     2.21                             
==========================================================
*** p < 0.001; ** p < 0.01; * p < 0.05 (or Null hypothesis value outside the confidence interval).



Copyright © 2022 Felipe Torres Raposo & Kenneth Stiller.