knitr::opts_chunk$set(
  echo = TRUE,
  message = TRUE,
  warning = TRUE
)
library(tidyverse)
library(magrittr)
library(tidymodels)
rm(list = ls())
2 Experiments
1. Data: Party Brands
Today we are using data from Noam Lupu’s work on party brands in Argentina. We use it to familiarize ourselves with survey experiments by replicating some of his results.
What we need to know about the paper:
- Two main theories on the dynamics of partisanship:
- Partisanship as a social identity (stable)
- Partisanship as rational maximisation of expected utility (unstable)
What is Lupu trying to test empirically in this paper?
- That partisanship as a social identity does not necessarily imply partisan stability
- His theory of comparative fit, i.e. the idea that people perceive their group as distinct from other groups
- Testable implication: when parties converge, their brands are diluted; it then becomes more difficult for voters to distinguish the parties from one another, and party attachments subsequently weaken.
3 Details of the experiment
- A survey experiment
- Manipulation of information about political parties, suggesting that the parties either diverge or converge
- Between-subjects design with three treatment arms, plus a control
- Information was gleaned from party manifestos and official statements
- Control: party symbols with the names of the party leaders
- Outcomes of interest:
- Partisanship: the proportion of respondents who said they identified with a political party
- Partisan attachment on a 0-10 scale
We load the data, in .csv format, as follows:
You can use the following shortcuts to insert an R chunk: Ctrl + Alt + I or Cmd + Option + I
=read.csv("http://www.spyroskosmidis.net/wp-content/uploads/2018/01/lupubrands.csv")
brands
# tidy version:
<- as_tibble(brands) brands_tidy
What variables does the data include?
head(brands) #Checking first 6 rows of the data
# tidy version:
glimpse(brands_tidy)
brands_tidy %>%
  head() %>%
  knitr::kable()
Among these variables, let’s focus on the treatment variables and the outcome. The four experimental conditions are captured by the categorical variable formulario:
- 1 = Control group
- 2 = Platform Information
- 3 = Alliance Switching
- 4 = Full Text
This variable is recoded into the dummies frm2 (which takes the value 1 if the respondent received the platform information treatment), frm3 (alliance switching), and frm4 (full text). Note that the omitted dummy represents the control group.
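As an illustration only (the frm2-frm4 dummies already come with the data; the *_check variables below are hypothetical), the same dummies could be rebuilt from formulario:
#Rebuild the treatment dummies from formulario (illustration; frm2-frm4 already exist in the data)
brands$frm2_check <- as.numeric(brands$formulario == 2) #platform information
brands$frm3_check <- as.numeric(brands$formulario == 3) #alliance switching
brands$frm4_check <- as.numeric(brands$formulario == 4) #full text
table(brands$frm2, brands$frm2_check) #the two codings should coincide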
For now, however, let’s use the original, categorical variable. As is conventional, we want the control group to take the value 0.
brands$treat=brands$formulario-1 #Assigning value 0 to the control group
# tidy version:
brands_tidy <- brands_tidy %>%
  mutate(treat = formulario - 1)
Did we define the treatment variable correctly? Let’s check again:
head(cbind(brands$formulario, brands$treat, brands$frm2,brands$frm3,brands$frm4))
assignment <- brands_tidy %>%
  select(formulario, treat, frm2, frm3, frm4)
head(assignment)
Everything looks good! Taking the first row as an example: when formulario==3 (column 1), the treatment variable we generated takes the value 2 (formulario-1) and the dummy for value 3 (frm3) takes the value 1, whereas the others are 0.
The Model: OLS Estimations
Lupu is interested in the effect of the three treatment conditions on partisanship. Among the two outcomes, pid
(whether the individual identifies with a party) and pidstr
(a 10 point scale representing the strength of the respondent’s party identification), we will focus below mainly on the latter.
#3. Let's begin with a simple OLS model with no covariates:
summary(lm(pidstr~treat, data=brands))
# tidy version:
lin_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(pidstr ~ treat, data = brands_tidy)
tidy(lin_fit)
Does this regression make sense? Hint: class(brands$treat).
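Running the hint makes the issue explicit (a one-line check, not part of the original script):
class(brands$treat) #numeric at this point, so lm() fits a single linear slope across the four arms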
#Let's assign labels to each group. This will help with the output!
brands$treat[brands$treat==0]="Control"
brands$treat[brands$treat==1]="Platform"
brands$treat[brands$treat==2]="Switch"
brands$treat[brands$treat==3]="Full"
# tidy version:
brands_tidy <- brands_tidy %>%
  mutate(treat = recode(
    treat,
    `0` = "Control",
    `1` = "Platform",
    `2` = "Switch",
    `3` = "Full"
  ))
Re-running the regression from above and telling R to model treat as a factor variable:
summary(lm(pidstr~factor(treat), data=brands))
# tidy version:
lin_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(pidstr ~ factor(treat), data = brands_tidy)
tidy(lin_fit)
The OLS regression results look like mere mean differences. Let’s see if that is true:
pid.control=mean(brands$pidstr[brands$treat=="Control"], na.rm=T)
pid.platform=mean(brands$pidstr[brands$treat=="Platform"],na.rm=T)
pid.switch=mean(brands$pidstr[brands$treat=="Switch"],na.rm=T)
pid.full=mean(brands$pidstr[brands$treat=="Full"],na.rm=T)

ate.platform=pid.platform-pid.control
ate.switch=pid.switch-pid.control
ate.full=pid.full-pid.control
# tidy version:
pid.control <- brands_tidy %>% # E_Y_X_control
  filter(treat == "Control") %>%
  summarise(conditional_mean = mean(pidstr, na.rm = TRUE))
pid.platform <- brands_tidy %>% # E_Y_X_platform
  filter(treat == "Platform") %>%
  summarise(conditional_mean = mean(pidstr, na.rm = TRUE))
pid.switch <- brands_tidy %>% # E_Y_X_switch
  filter(treat == "Switch") %>%
  summarise(conditional_mean = mean(pidstr, na.rm = TRUE))
pid.full <- brands_tidy %>% # E_Y_X_full (note: assigned to pid.full, not pid.switch)
  filter(treat == "Full") %>%
  summarise(conditional_mean = mean(pidstr, na.rm = TRUE))

ate.platform=pid.platform-pid.control
ate.switch=pid.switch-pid.control
ate.full=pid.full-pid.control

ate.platform
ate.switch
ate.full
The ATEs we calculated just now are identical to the OLS slopes we calculated before. So why use OLS at all?
Firstly, we prefer models that provide measures of dispersion for the estimates: by simply subtracting group means we cannot tell whether the ATEs are statistically significant. Secondly, an advantage of OLS is that we can include pre-treatment covariates.
Still, can you think of other ways to estimate statistical uncertainty without using OLS?
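One option, sketched below for the platform arm against the control group, is a simple two-sample t-test, which returns the same difference in means together with a confidence interval; randomization (permutation) inference would be another route.
#Difference in means with uncertainty, without OLS: platform arm vs. control group only
t.test(pidstr ~ treat, data = subset(brands, treat %in% c("Control", "Platform")))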
Let’s see what happens when we add covariates:
summary(lm(pidstr~factor(treat)+age+income+educgrp+info, data=brands))
Are the results the same?
#Re-estimate the main model
atemod=lm(pidstr~factor(treat), data=brands)
summary(atemod)
nobs(atemod)
#Add covariates
atemodcont=lm(pidstr~factor(treat)+age+income+educgrp+info, data=brands)
summary(atemodcont)
nobs(atemodcont)
#Why do we lose so many observations? Is this a fair comparison of the ATE?
#Let's constrain the estimation to the N of the model with the added covariates
esample=rownames(as.matrix(resid(atemodcont)))

atemodsample=lm(pidstr~factor(treat), data=brands[esample,])
summary(atemodsample)
nobs(atemodsample)
#install.packages("stargazer")
library(stargazer)
stargazer(atemodcont, atemodsample, type = "text")
# sort of a tidy version:
constrain <- brands_tidy %>%
  filter(row.names(brands_tidy) %in% esample)
atemodsample=lm(pidstr~factor(treat), data=constrain)
summary(atemodsample)
nobs(atemodsample)
Randomization Checks
Let’s do some randomization checks. Is the mean value of age similar across treatment groups?
t.test(age ~ frm2, data=brands, subset = frm3!=1 & frm4!=1) #compare each arm with the control group only
#and, similarly, for the other treatments
t.test(age ~ frm3, data=brands, subset = frm2!=1 & frm4!=1)
t.test(age ~ frm4, data=brands, subset = frm2!=1 & frm3!=1)
How about income?
t.test(income ~ frm2, data=brands, subset = frm3!=1 & frm4!=1)
t.test(income ~ frm3, data=brands, subset = frm2!=1 & frm4!=1)
t.test(income ~ frm4, data=brands, subset = frm2!=1 & frm3!=1)
Or gender?
ctrl_plat <- brands$treat %in% c("Control", "Platform") #restrict each check to one arm vs. control
table(brands$frm2[ctrl_plat], brands$female[ctrl_plat])
prop.test(table(brands$frm2[ctrl_plat], brands$female[ctrl_plat]))
ctrl_switch <- brands$treat %in% c("Control", "Switch")
prop.test(table(brands$frm3[ctrl_switch], brands$female[ctrl_switch]))
ctrl_full <- brands$treat %in% c("Control", "Full")
prop.test(table(brands$frm4[ctrl_full], brands$female[ctrl_full]))
# tidy version - need to double check:
library(broom)
brands_tidy %>%
  filter(treat %in% c("Switch", "Control")) %>%
  group_by(frm4, female) %>%
  summarise(cases = n()) %>%
  mutate(pop = sum(cases)) %>%
  rowwise() %>%
  mutate(tst = list(broom::tidy(prop.test(
    cases, pop, conf.level = 0.95
  )))) %>%
  tidyr::unnest(tst)
An equivalent to a t-test is the Kolmogorov-Smirnov test, which compares distributions (it only works for continuous variables):
\[ D = \max_{1 \leq i \leq N} (F(Y_i) - \frac{i - 1}{N}, \frac{i}{N} - F(Y_i)) \]
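For instance (a quick sketch, reusing the age variable and the recoded treat labels from above), we could compare the age distribution in the platform arm with the control group:
#Two-sample Kolmogorov-Smirnov test: does the age distribution differ between the platform arm and the control group?
ks.test(brands$age[brands$treat == "Platform"], brands$age[brands$treat == "Control"])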
4 Non-compliance in Experiments
4.1 What Experiment?
You already know various types of experiments, such as survey experiments, lab experiments, field experiments.
How do natural experiments fit in?
4.2 What Experiment?
- IR: Rotating Presidency of the Council of the EU (Carnegie/Marinov 2017)
- CP: Population size & electoral system (Eggers 2015)
- many others…
- IR/CP: Election Observation (Hyde 2007)
4.3 1. Data: Election Observation
Today we are using data from Susan Hyde’s work on the observer effect in international politics, drawing on evidence from Armenia.
Research Question: Do international monitoring missions have an effect on electoral outcomes?
Hyde’s reasoning: Cross-sectional comparisons are not suitable as there will be endogeneity
Hyde’s solution: Moving to the microlevel! (What effect do observers have on election-day fraud?)
- 2003 Presidential elections in Armenia
The Experiment: Comparing the incumbent’s vote share across polling stations that were and were not visited by OSCE observer missions
Outcome: Hyde concludes there was a 5.9% difference in the first round and a 2% difference in the second round (simple differences in means).
4.4 Let’s check this
- Do we buy that?
What might be the problems associated with this natural experiment?
4.5 Let’s check this
First, as always, we load the file (this time from Stata)…
library(readstata13)
hyde2007 <- read.dta13(file = '/cloud/project/hyde2007_uploadR.dta')
…and look at the data, that is at some variables of interest:
head(cbind(hyde2007$pollingstation, hyde2007$urban, hyde2007$voterspersqkm, hyde2007$kocharian, hyde2007$mon_voting, hyde2007$mon_voting_R2, hyde2007$KocharianR2, hyde2007$osce_assignment))
Exercise: For the first round results, (i) calculate the difference in means of the incumbent’s vote share (kocharian) between observed and unobserved polling stations, and (ii) find out whether randomization worked regarding the rural-urban divide (urban).
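A minimal sketch for part (i); the next two subsections work through both parts in more detail.
with(hyde2007, mean(kocharian[mon_voting == 1], na.rm = TRUE) - mean(kocharian[mon_voting == 0], na.rm = TRUE)) #difference in means, observed minus unobserved stations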
4.6 ATE & Randomization: Round 1
plot(density(hyde2007$kocharian[hyde2007$mon_voting==1], na.rm=T, from=0, to=1)) #observed stations
lines(density(hyde2007$kocharian[hyde2007$mon_voting==0], na.rm=T, from=0, to=1), lty=2) #unobserved stations (dashed)
4.7 ATE & Randomization: Round 1
To calculate the difference in means, we could use
- a t.test
t.test(hyde2007$kocharian ~ hyde2007$mon_voting)
- an OLS regression
ols1=lm(kocharian ~ mon_voting, data=hyde2007)
summary(ols1)
We can do the same (using a prop.test instead of a t.test) in order to check whether randomization worked.
prop.test(table(hyde2007$urban, hyde2007$mon_voting))
ols_urban=lm(urban ~ mon_voting, data=hyde2007)
summary(ols_urban)
What do we conclude? Do we have reason to think this might affect the validity of the results?
4.8 Round 2
Let’s quickly do the same for Round 2:
ols2=lm(KocharianR2 ~ mon_voting_R2, data=hyde2007)
summary(ols2)
This looks like a somewhat weaker effect of the observers…
ols2_urban=lm(urban ~ mon_voting_R2, data=hyde2007)
summary(ols2_urban)
Round 2 seems to be even less balanced in terms of the representation of urban and rural polling stations.
- What else could be relevant in Round 2? Why should we account for this?
4.9 Round 2
- Hyde concludes that her “results suggest that if first-round monitoring took place then second-round monitoring had only a marginal additional deterrent effect.” (p.56). She claims that first-round observation has a persistent effect.
- Your thoughts on this?
4.10 Round 2
Let’s see whether monitoring in the two rounds was independent. First, we check the number of polling stations observed per round.
sum(with(hyde2007, mon_voting==0 & mon_voting_R2==0))
sum(with(hyde2007, mon_voting==1 & mon_voting_R2==0))
sum(with(hyde2007, mon_voting==0 & mon_voting_R2==1))
sum(with(hyde2007, mon_voting==1 & mon_voting_R2==1))
Well, in Round 2, stations that had been monitored in Round 1 were observed more often than stations that had not been. Let’s check whether this difference is statistically significant.
prop.test(table(hyde2007$mon_voting_R2, hyde2007$mon_voting))
ols2_full=lm(mon_voting_R2 ~ urban + mon_voting, data=hyde2007)
summary(ols2_full)
What do we conclude?
- Round 2 monitoring was not randomized either.
- If we apply experimental standards (which is what we should do), we conclude that Round 2 monitoring was not independent from Round 1 monitoring.
4.11 2. Non-Compliance
One of the reasons for the imbalance between rural and urban polling stations could be that observers did not adhere to their assigned polling stations. Let’s now assume the OSCE did a good job and assigned polling stations randomly (osce_assignment).
In that case, one-sided non-compliance would be a problem: polling stations that were supposed to be monitored were, in fact, not monitored on election day. Let’s check Hyde’s results once we consider non-compliance to be a problem.
attach(hyde2007) # it helps to avoid using the dollar sign
Let’s start with a (somewhat naive) simple difference in means:
#As-treated Analysis
(mean(KocharianR2[mon_voting_R2==1], na.rm = TRUE) - mean(KocharianR2[mon_voting_R2==0], na.rm = TRUE))
library(tidyverse)
hyde2007 <- hyde2007 %>% # remove any NAs
  drop_na(KocharianR2)
Let’s look at a simple crosstab, showing assigned treatment and actual treatment.
#Compliance Status - Binary
tab.1 <- table(osce_assignment, mon_voting_R2)
print(tab.1)
Now, we calculate the ITT.y. We can do this using differences in means or simply running an OLS.
#ITT_Y
itt.y <- (mean(KocharianR2[osce_assignment==1], na.rm = TRUE) - mean(KocharianR2[osce_assignment==0], na.rm = TRUE))
itt.y
# ITT_Y using OLS
library(sandwich)
library(lmtest)
itt_fit <- lm(KocharianR2 ~ osce_assignment)
summary(itt_fit)
#coeftest(itt_fit, vcovHC(itt_fit)) # Heteroscedasticity-Consistent Covariance Matrix Estimation
Accordingly, we calculate the ITT.d:
#ITT_D
itt.d <- (mean(mon_voting_R2[osce_assignment==1], na.rm = TRUE) - mean(mon_voting_R2[osce_assignment==0], na.rm = TRUE))
itt.d
## ITT_D using OLS
itt_d_fit <- lm(mon_voting_R2 ~ osce_assignment)
summary(itt_d_fit)
# coeftest(itt_d_fit, vcovHC(itt_d_fit)) # Heteroscedasticity-Consistent Covariance Matrix Estimation
Let’s quickly have a look at the compliance status and the broad picture.
#Let's remember the compliance status: [1] 883 [2] 258 [3] 0 [4] 623
tab.1 <- table(osce_assignment, mon_voting_R2)
print(tab.1)
#Always-Takers
# Under one-sided non-compliance there are no Always takers, so this should be zero
# 0 / (883 + 0) = 0
at <- tab.1[3]/(tab.1[1] + tab.1[3])
at
#Never Takers
# 258/(258 + 623) = 0.29
# This means that, of the polling stations assigned to treatment, 29% were not monitored
nt <- tab.1[2]/(tab.1[2] + tab.1[4])
nt
#Compliers (No Defiers By Assumption)
1 - at - nt
Finally, using the ITT.y and the ITT.d, we can calculate the CACE:
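In formula terms (the standard Wald/IV ratio), where Z denotes random assignment (osce_assignment) and D the treatment actually received (mon_voting_R2):
\[ CACE = \frac{ITT_Y}{ITT_D} = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]} \]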
#CACE: ITT_Y divided by ITT_D # COMPLIER AVERAGE CAUSAL EFFECT
(mean(KocharianR2[osce_assignment==1], na.rm = TRUE) - mean(KocharianR2[osce_assignment==0], na.rm = TRUE))/(mean(mon_voting_R2[osce_assignment==1], na.rm = TRUE) - mean(mon_voting_R2[osce_assignment==0], na.rm = TRUE))
cace <- itt.y/itt.d # This should be the same!
print(cace)
We can also assess the significance of the CACE. To do so, we calculate the standard error of the CACE (this is, in fact, an approximation):
\[ SE(CACE) \approx \frac{SE(ITT_Y)}{ITT_D} \]
se_cace <- summary(itt_fit)$coefficients[4]/itt_d_fit$coefficients[2] # SE of ITT_Y divided by the ITT_D estimate
print(se_cace)
We then calculate the p-value as we did last week:
t_cace <- cace / se_cace
print(t_cace)
p <- (1 - pnorm(abs(t_cace), 0, 1)) * 2
p
As we will see in a few weeks’ time, we can also use a 2SLS regression to calculate the CACE.
library(AER)
#Via TSLS
summary(ivreg(KocharianR2 ~ mon_voting_R2|osce_assignment, data=hyde2007))
4.12 So?
What do we make of Susan Hyde’s paper?
Is this a “bad” experiment? Why?
Can there be “perfect” natural experiments?