Lab 3: Selection on Observables
Week 3 HT 2022
Recap
Selection on Observables
Sometimes we cannot randomise treatment. While randomising treatment is the clearly preferable approach - as we learned in the lectures - we still might want to estimate a treatment effect for some interventions or causes we cannot influence. In fact, this might be the case for some of the most interesting research questions in social science.
Simply using observational data comes with all kinds of problems - remember the fundamental problem of causal inference. To mitigate these, we need to assume that, conditional on observable covariates, potential outcomes are the same for treated and untreated units. That is, we must check if treated and untreated units are different in ways we can observe.
Two assumptions are crucial to such designs based on selection on observables:
- Conditional Independence Assumption
\[ ((Y_i(1), Y_i(0)) \perp D_i, X_i\]
This assumptions maintains that potential outcomes are independent of treatment (\(D_i\)), conditional on covariates (\(X_i\)). That is, for the same outcome of the covariates (or the same \(X_i\)), treatment is as good as random.
- Common Support Assumption
\[ 0 < Pr(D_i=1| X_i=x) < 1\]
This assumptions describes that the probability of treatment is neither 0 nor 1 for a given \(X\). There must be variation in the observations we compare to each other: For each level of \(X_i\), there are both treated and untreated units.
Note that these assumptions are not the same as the ones we saw under random assignment. Under random assignment, the assumption is:
\[ (Y_i(1), Y_i(0)) \perp D_i\] which implies that \(E[Y_{0i} | D_i = 1] = E[Y_{0i} | D_i = 0]\) and \(E[Y_{1i} | D_i = 1] = E[Y_{1i} | D_i = 0]\). In other words, the potential outcomes of all units are the same, regardless of treatment status.
Now, once we select on observables, the conditional independence assumption implies that \(E[Y_{0i} | D_i = 1, X_i = x] = E[Y_{0i} | D_i = 0, X_i = x]\) and \(E[Y_{1i} | D_i = 1, X_i = x] = E[Y_{1i} | D_i = 0, X_i = x]\). In other words, the potential outcomes of all units with the same \(X_i\) are the same, regardless of treatment status.
Accordingly, we can derive our estimator as:
\[\Gamma^{SOO} = E[Y_i(1)| X_i = x] - E[Y_i(0)| X_i = x]\]
Taking the weighted average across \(X_i\), we can - in principle - calculate the ATE. However, doing this in practice inevitably involves some degree of uncertainty and imprecision. The conditional independence assumption is a very strong assumption - it hardly can be satisfied in practice. First, we cannot account for unobserved variables. As soon as we are working with continuous variables, we are likely to introduce bias as we need to make trade-offs regarding the similarity of observations. One major - more general - obstacle is the curse of dimensionality. While we can relatively easily satisfy the condition for one variable (especially if it is binary), this becomes exponentially more difficult if we increase the number of variables: We would need a meaningful number of observation for each combination of outcomes.
Selection on observables should therefore used with caution. Some treat it as a last-resort approach - using it if and only if randomisation and other identification strategies are not available. However, there might well be cases in which matching can be the most credible design to test hypotheses.
There are two approaches to selecting on observables:
1. Multivariate Regression
This one should be familiar to you. Running a multivariate regression means that the estimation of a treatment effect and conditioning on observables happen in a single step.
This is a parametric technique, which often involves extrapolation. That is, outcomes of \(X\) that are being compared are not necessarily close to each other.
2. Matching
As opposed to regressions, balancing coviariates and the estimating of a treatment effect are separated when we apply Matching techniques. This allows us to influence how balance is being achieved. In other words, we can make adjustments to the process of making units more comparable. We can, for instance, introduce much more narrow restrictions and only compare observations that are identical with respect to certain covariates. Think of binary variables and combinations thereof. Matching is a non-parametric estimation technique, which means that assumptions about the functional form we are implicitly making when running regressions do not need to be satisfied here.
There are different ways to match observations, such as using the most similar observation (‘nearest neighbour’), defining bandwidths or propensity scores. We will use a few of these approaches in this lab.
Before starting this seminar
Create a folder called “lab3”
Download the data (you can use this button or the one at the top, or read csv files directly from github):
Open an R script (or Markdown file) and save it in our “lab3” folder.
Set your working directory using the setwd() function or by clicking on “More“. For example setwd(“~/Desktop/Causal Inference/2022/Lab3”)
Let’s install an load packages that we will be using in this lab:
library(jtools) # generate plots with regression coefficients
library(stargazer) # generate formated regression tables
library(texreg) # generate formatted regression tables
library(nnet) # If you want to use glm() to check for balance
library(tidyverse) # to conduct some tidy operations
library(ggplot2) # to generate plots
library(Matching) #provides functions for multivariate and propensity score matching and for finding optimal balance based on a genetic search algorithm.
require(cobalt) # Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification
Seminar Overview
In this seminar, we will cover the following topics:
1. See that failure to condition on relevant covariates can result in heavily biased (even outrigth wrong) estimates.
2. Run various multivariate regressions, conditioning on observable covariates.
3. Test and plot visual evidence for the common support assumption.
4. Learn how to match observations.
5. Conduct pre and post-balance checks using the Matching
package.
6. Apply propensity score matching, both without and with replacement.
7. Use Mahalanobis distances to match observations.
The Effect of Daughters on Voting on Women’s Issues
Today We will be working with data from Ebonya Washington’s work on Female Socialization: How Daughters Affect Their Legislator Fathers’ Voting on Women’s Issue. In this paper, Washington seeks to answer a behavioural question: Does having a daughter alter the political attitudes of members of Congress? To address this question, she develops an identification strategy to reveal the causal effect of having a daughter on liberal voting with regard to issues such as reproductive rights.
Personal circumstances are well known to affect legislators opinions and even voting behaviour. Relatively famous is the case of Ronald Reagan: After being diagnosed with Alzheimer’s his inner circle started advocating in favour of stemm cell research in the hope of finding a cure, although Reagan staunchly rejected any use of stemm cells for ideological reasons throughout his political career.
This paper looks at the behavioural effect children exert on their parents’ political attitudes. Washington specifically looks at the shift in attitudes towards women’s issues that are associated with raising a daughter. She finds clear evidence that parenting a daughter results in a more liberal voting record of members of Congress with regard to several issues, but particularly so for reproductive rights.
Importantly, Washington finds that - conditional on the total number of children - the effect is strong and significant. Obviously, members of Congress differ in several other characteristics as well, not last in terms of political predisposition. She therefore takes into account further personal as well as political characteristics of legislators.
The identification strategy of this paper rests on a few important assumptions:
- The gender of legislators’ children is entirely random. In other words, there is no sex selection. While in some societies there can be a structural preference of parents for one sex over the other, this would pose a critical problem to this study.
- Voting patterns (in popular elections) are not correlated with children’s gender. Correlation would be a problem as voting preferences in elections to Congress would determine the winning candidate and the number of daughters of elected candidates. Washington shows that this seems not to be the case, with remaining imbalances underestimating the treatment effect rather than the other way around.
- There is no selection on unobserved variables.
Washington concludes that she finds strong support for the behavioural effect hypothesis, which indeed is most pronounced for reproductive rights.
Her data set includes the voting records of all the Representatives in the US Congress during the 105th Congress, comprising the years 1997–1998. A description of the variables is listed below:
Variable | Description |
---|---|
nowtot | Congressional voting record scores as compiled by the National Organisation of Women (NOW). A higher level in this variable indicates a more liberal record (the outcome). |
ngirls | The number of daughters a legislator has (the treatment) |
totchi | The total number of children of a legislator |
repub | Binary variable for membership in the Republican party |
female | Binary variable equal to 1 if legislator is female |
white | Binary variable equal to 1 if legislator is white |
reld3 | Binary variable equal to 1 if legislator is catholic |
age | Continuous variable indicating legislators’ age |
srvlng | Service length in Congress (in years) |
demvote | Democratic vote share in legislator’s district |
propgirls | Proportion of legislator’s children that are female |
Now let’s load the data. There are two ways to do this:
You can load the brands dataset from your laptop using the load()
function. If you stored your data in the same directory that set your working directory, you only need to write the name of the dataset, plus the extension (.dta).
# Set your working directory
#setwd("~/Desktop/Causal Inference/2022/Lab3")
#
load("daughters.rdata") # loading your data from your laptop
Or you can download the data from the course website from following url: https://github.com/dpir-ci/CI22/raw/gh-pages/docs/data/daughters.rdata.
Let’s start this lab by inspecting the data.
head(daughters)
# A tibble: 6 x 14
ngirls totchi repub srvlng female white age demvote aauw rtl nowtot nboys
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 0 0 7 0 1 58 0.570 100 90 90 0
2 1 3 0 15 0 1 54 0.600 88 95 90 2
3 0 0 1 1 0 1 31 0.430 0 0 0 0
4 2 2 0 1 0 1 51 0.520 100 95 95 0
5 2 2 0 7 0 1 39 0.590 100 94 90 0
6 3 7 1 27 0 1 68 0.280 0 0 0 4
# ... with 2 more variables: reld3 <dbl>, propgirls <dbl>
Descriptive Statistics
That looks as expected. Let us start with a few descriptive statistics to become familiar with the data.
Exercise 1: How many children does the average US representative have?
Reveal Answer
mean(daughters$totchi, na.rm = TRUE)
[1] 2.493088
# or use the summary() command
summary(daughters$totchi)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 2.000 2.000 2.493 3.000 10.000 1
Simply using the mean
function, we can calculate the average number of children per observation, i.e. Representative. As we see, during that term a legislator had on average 2.49
children.
Exercise 2: How many daughters does the average US representative have?
Reveal Answer
average_daughters <- mean(daughters$ngirls, na.rm = TRUE)
print(average_daughters)
[1] 1.274194
# or use the summary() command
summary(daughters$ngirls)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 0.000 1.000 1.274 2.000 7.000 1
Following the same logic as above, we can calculate the average number of daughters - feel free to store the number in an object. A legislator had on average 1.27
daughters.
Exercise 3: What is the proportion of daughters among US representatives? What would you expect?
Reveal Answer
share_daughters <- sum(daughters$ngirls , na.rm = TRUE)/sum(daughters$totchi, na.rm = TRUE)
print(share_daughters)
[1] 0.5110906
# or use the summary() command
summary(daughters$propgirls)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.0000 0.3333 0.5000 0.5200 0.6667 1.0000 61
We can calculate the proportion of daughters by calculating both the total number of daughters and the total number of children using the sum
command. Alternatively, we can calculate the average proportion of daughters among legislators. Note that - while they should roughly match - these are not the same parameters. We see that 51%
of all children were female, while the average proportion of daughters is 52%
- this is roughly what we’d expect when gender is random.
Exercise 4: What is the NOW score for the average Republican member? And Democrat?
Hint: Make sure to exclude NAs before calculating the average NOW scores!
Reveal Answer
# Republicans
now_rep <- mean(daughters$nowtot[daughters$repub=="1"], na.rm = TRUE)
now_rep
[1] 12.04846
# Democrats
now_demo <- mean(daughters$nowtot[daughters$repub=="0"], na.rm = TRUE)
now_demo
[1] 73.87255
The average NOW score for Republicans is 12
, while the average score for Democrats is 74
. This is a massive difference, which we would expect to see in Congress, especially when it comes to rather polarised issues.
Exercise 5: Run a multivariate regression of the NOW score on all the legislator characteristics that you think contribute to explain voting records, excluding any variable on the number or gender of children
Reveal Answer
# One possible regression could look like this:
ols_covariates <- lm(nowtot ~ repub + female + age + reld3 + white + demvote + srvlng, data=daughters)
summary(ols_covariates)
Call:
lm(formula = nowtot ~ repub + female + age + reld3 + white +
demvote + srvlng, data = daughters)
Residuals:
Min 1Q Median 3Q Max
-51.640 -9.889 -0.333 10.001 59.447
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.95838 9.39890 0.634 0.52646
repub -47.14717 2.10973 -22.348 < 2e-16 ***
female 12.57095 2.74017 4.588 5.91e-06 ***
age 0.02396 0.11021 0.217 0.82797
reld3 -4.33507 1.82870 -2.371 0.01821 *
white 8.73260 3.17143 2.754 0.00615 **
demvote 102.63605 10.10667 10.155 < 2e-16 ***
srvlng -0.02605 0.13663 -0.191 0.84888
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 16.87 on 423 degrees of freedom
(4 observations deleted due to missingness)
Multiple R-squared: 0.7902, Adjusted R-squared: 0.7868
F-statistic: 227.6 on 7 and 423 DF, p-value: < 2.2e-16
We can see clear differences across legislator characteristics and partisan variables. From the regression we can see that party affiliation, gender, race, and religion, and the preferences of the voter in the legislator’s district are strong determinants of legislator’s voting record. Republicans (repub
) are much less likely to vote for liberal policies in Congress. Similarly, higher democratic vote shares (demvote
) on a continuous scale are associated with more liberal voting of legislators. Female
representatives tend to have more liberal voting records as do white
representatives. Being catholic (reld3
) is associated with more conservative voting records - which we would expect to be the case.
Randomness of Gender
Let us now assess the randomness of children’s gender. This assumption is crucial to the research design’s identification strategy.
In countries like China and India, there is a strong parental preference for boys. Access to prenatal scanning techniques and selective abortion practices have strongly skewed the gender mix towards males. The likelihood of giving birth to a boy rather than a girl is further strongly correlated with parental characteristics. For instance, it is higher among the Indian middle classes than lower down the socio-economic scale. It is also higher in Northern Indian states than in more southerly states. This implies a causal relation from parental characteristics (including, probably, ideological preferences) to the gender of the children. It is important to understand whether US legislators display similar children gender-preferences. Further, are these preferences potentially correlated with the legislator’s characteristics (such as party affiliation, race etc)?
Exercise 6: Test the hypothesis that the gender mix among children of US legislators is 50%. Can we reject the null hypothesis?
Hint: Use the t-test
command and change the alternative hypothesis you want to test by adding the argument mu = 0.5
.
Reveal Answer
t.test(daughters$propgirls,mu = 0.5)
One Sample t-test
data: daughters$propgirls
t = 1.217, df = 373, p-value = 0.2244
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
0.4876701 0.5523775
sample estimates:
mean of x
0.5200238
The mu
argument alters the alternative (‘null’) hypothesis we are testing. If we do not specify it, the default null hypothesis will be a difference or value of 0
. When we specify mu = 0.5
, the t-test shows whether we can reject the null hypothesis of a mean value of 0.5
. The t-test fails to reject the null hypothesis at the 10% level as the p-value is 0.22
.
Exercise 7: Can we say that the number of daughters is a randomly allocated variable for legislators with two children? And for those with one child? How about across partisan lines?
Reveal Answer
# Two-children only
t.test(daughters$propgirls[daughters$totchi==2],mu = 0.5)
One Sample t-test
data: daughters$propgirls[daughters$totchi == 2]
t = 0.38936, df = 137, p-value = 0.6976
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
0.4556672 0.5660719
sample estimates:
mean of x
0.5108696
# One-child only
t.test(daughters$propgirls[daughters$totchi==1],mu = 0.5)
One Sample t-test
data: daughters$propgirls[daughters$totchi == 1]
t = 1.6125, df = 39, p-value = 0.1149
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
0.4681975 0.7818025
sample estimates:
mean of x
0.625
# Reps v Democrats
t.test(daughters$propgirls[daughters$repub==0],
daughters$propgirls[daughters$repub==1])
Welch Two Sample t-test
data: daughters$propgirls[daughters$repub == 0] and daughters$propgirls[daughters$repub == 1]
t = 1.2576, df = 348.43, p-value = 0.2094
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.02355635 0.10710714
sample estimates:
mean of x mean of y
0.5421402 0.5003648
Using the same syntax as before, you can just subset the data to consider legislators with exactly two children ([daughters$totchi==2]
). The mean is now even slightly closer to 50% and we again fail to reject the null hypothesis of a mean of 0.5. We cannot reject the null hypothesis for legislators with one child only either. To compare Democrats and Republicans, we use a two-sample t-test as we have done before in previous classes. With a p-value of 0.2094
, we maintain the null hypothesis of no difference between Republicans and Democrats.
Remember: The finding that gender is indeed random is crucial to the paper’s identification strategy.
(Multivariate) Regressions
Before we run any regressions, let us make sure we include observations with missing values. This way, we can make fair comparisons:
daughters <- daughters %>%
drop_na()
Exercise 8: Run a bivariate regression of NOW score on the number of daughters.
Are you surprised by your findings? In this regression, do you think that the independent variable of interest is uncorrelated with the error term?
Reveal Answer
### To run a regression of NOW score on the number of daughters:
lm_nowtot_bivariate <- lm(nowtot ~ ngirls, data=daughters)
summary(lm_nowtot_bivariate)
Call:
lm(formula = nowtot ~ ngirls, data = daughters)
Residuals:
Min 1Q Median 3Q Max
-43.250 -34.744 -9.744 40.256 60.256
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.250 3.232 13.383 <2e-16 ***
ngirls -1.753 1.765 -0.993 0.321
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 36.79 on 369 degrees of freedom
Multiple R-squared: 0.002667, Adjusted R-squared: -3.53e-05
F-statistic: 0.9869 on 1 and 369 DF, p-value: 0.3211
The estimate of the relation is negative, which is surprising. However, we cannot say that the independent variable in this regression is uncorrelated with the error term. The number of daughters is clearly positively correlated with the number of children, which in turn is positively correlated with ideological preferences. For instance, very religious and conservative people tend to have larger families and therefore more daughters.
Exercise 9: Run a multivariate regression of NOW score on the number of daughters controlling for the number of children.
Are you surprised by your findings? In this regression, do you think that the independent variable of interest is uncorrelated with the error term?
Reveal Answer
# To run a regression of NOW score on the number of daughters, controlling for the number of children:
lm_nowtot_multivariate <- lm(nowtot ~ ngirls + totchi, data=daughters)
summary(lm_nowtot_multivariate)
Call:
lm(formula = nowtot ~ ngirls + totchi, data = daughters)
Residuals:
Min 1Q Median 3Q Max
-54.20 -32.20 -10.32 35.86 88.67
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 57.321 4.227 13.560 < 2e-16 ***
ngirls 5.378 2.235 2.406 0.0166 *
totchi -8.499 1.714 -4.958 1.09e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 35.67 on 368 degrees of freedom
Multiple R-squared: 0.06513, Adjusted R-squared: 0.06005
F-statistic: 12.82 on 2 and 368 DF, p-value: 4.155e-06
The point estimate is now strongly positive, which is consistent with our predictions. We have seen that the gender mix among US legislators is random. Therefore, we can conclude that the number of daughters (after we have controlled for the number of children) is uncorrelated with the error term of this regression.
Note that conditioning on the number of children - or ‘controlling’ for it - did not only alter the magnitude of the estimated effect of the number of daughters but also its direction. If we had (wrongly) disregarded the total number of children, we’d have a completely false impression of the true effect size and direction.
Exercise 10: Run a multivariate regression of NOW score on all relevant variables.
Run the regression and compile a table showing all three regression models using the stargazer
, screenreg
or multisum
package.
Reveal Answer
# To run a regression of NOW score on the number of daughters, controlling for the number of children:
lm_nowtot_all <- lm(nowtot ~ ngirls + totchi + age + repub + female + white + reld3 + demvote + srvlng, data=daughters)
summary(lm_nowtot_all)
Call:
lm(formula = nowtot ~ ngirls + totchi + age + repub + female +
white + reld3 + demvote + srvlng, data = daughters)
Residuals:
Min 1Q Median 3Q Max
-52.137 -9.278 -0.407 8.814 61.286
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.77303 10.09259 0.770 0.44170
ngirls 2.69925 1.04319 2.587 0.01006 *
totchi -2.47275 0.84987 -2.910 0.00384 **
age 0.04931 0.12407 0.397 0.69127
repub -46.69068 2.24967 -20.754 < 2e-16 ***
female 12.73937 2.81649 4.523 8.28e-06 ***
white 8.32240 3.31740 2.509 0.01255 *
reld3 -4.77126 1.97183 -2.420 0.01603 *
demvote 100.71891 10.58918 9.511 < 2e-16 ***
srvlng 0.04046 0.14044 0.288 0.77342
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 16.46 on 361 degrees of freedom
Multiple R-squared: 0.8047, Adjusted R-squared: 0.7998
F-statistic: 165.3 on 9 and 361 DF, p-value: < 2.2e-16
# To produce a table
library(stargazer)
stargazer(lm_nowtot_bivariate, lm_nowtot_multivariate, lm_nowtot_all, title = "Models", column.labels=c("Bivariate","Multivariate", "Complete"), align = TRUE, type = "text")
Models
========================================================================================
Dependent variable:
--------------------------------------------------------------------
nowtot
Bivariate Multivariate Complete
(1) (2) (3)
----------------------------------------------------------------------------------------
ngirls -1.753 5.378** 2.699**
(1.765) (2.235) (1.043)
totchi -8.499*** -2.473***
(1.714) (0.850)
age 0.049
(0.124)
repub -46.691***
(2.250)
female 12.739***
(2.816)
white 8.322**
(3.317)
reld3 -4.771**
(1.972)
demvote 100.719***
(10.589)
srvlng 0.040
(0.140)
Constant 43.250*** 57.321*** 7.773
(3.232) (4.227) (10.093)
----------------------------------------------------------------------------------------
Observations 371 371 371
R2 0.003 0.065 0.805
Adjusted R2 -0.00004 0.060 0.800
Residual Std. Error 36.789 (df = 369) 35.667 (df = 368) 16.459 (df = 361)
F Statistic 0.987 (df = 1; 369) 12.818*** (df = 2; 368) 165.272*** (df = 9; 361)
========================================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
We can see that even after controlling for other characteristics an increase in the number of daughters is still associated with a meaningfully and significantly more liberal voting record. By conditioning on the total number of children we have exogenised ngirls
, which suffered from endogeneity in the bivariate model.
Matching
Generally, when applying Matching techniques you should follow these steps:
Run probit (or logit) with the treatment as outcome and (relevant) covariates as predictors
Predict probabilities – this will create the propensity score. Note here that some (Sekhon e.g.) suggest that the logit score is superior to the propensity score.
Match each participant to one or more non-participants using the propensity score. You can do this in a variety of ways: Nearest Neighbour Matching (NN), Caliper Matching, Mahalanobis Distance, Exact Matching etc. The default of the
Matching
package we use today is NN.
Check balance after matching
Estimate the ATT
You can also use Matching for continuous treatments. However, to simplify the example in this lab we will turn the continuous treatment variable (ngirls
) into a binary treatment variable that distinguishes legislators with at least one daughter from those that do not have any daughters at all. To do so, we create the binary variable treat
:
# Creating a binary treatment variable.
daughters <- daughters %>%
mutate(treat = ifelse(ngirls > 0, 1, 0))
table(daughters$treat)
0 1
59 312
Let us start with a pre-matching balance check. Note that this is essentially the same as randomization checks but in the context of Matching we call this process “Balance Check”. Since the treatment is now binary, we are going to run a logistic regression (or probit or less conservatively OLS).
Exercise 11: Run a logistic regression with the binary treatment variable (treat
) as outcome.
Make sure to include all covariates from the complete multivariate regression model.
Reveal Answer
pre_matching <- glm(treat ~ totchi + age + repub + female + white + reld3 + demvote + srvlng,
data = daughters, family = binomial("logit"))
summary(pre_matching)
Call:
glm(formula = treat ~ totchi + age + repub + female + white +
reld3 + demvote + srvlng, family = binomial("logit"), data = daughters)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.6325 0.2479 0.4585 0.6491 1.2078
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.10354 1.72165 -0.641 0.5215
totchi 0.76629 0.16640 4.605 4.12e-06 ***
age 0.01556 0.02248 0.692 0.4888
repub -0.11453 0.38568 -0.297 0.7665
female 0.37737 0.54813 0.688 0.4912
white 0.28492 0.55719 0.511 0.6091
reld3 -0.03414 0.34187 -0.100 0.9204
demvote 0.72026 1.82422 0.395 0.6930
srvlng -0.05790 0.02464 -2.350 0.0188 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 325.04 on 370 degrees of freedom
Residual deviance: 288.25 on 362 degrees of freedom
AIC: 306.25
Number of Fisher Scoring iterations: 5
As expected the total number of children is strongly associated with having at least one daughter. Interestingly, there is a somewhat weak correlation between previous time served in Congress and having a daughter. Apart from this, however, all other covariates are not significant predictors of the treat
variable.
Now, we are going to create a data frame by predicting the outcome from the regression we just ran. The first column will be the propensity score, i.e. the predicted values or the probability of receiving the treatment given the covariates; and the second will be a binary indicator for those treated (treat==1)
. We use the Matching package
to do this.
library(Matching)
ps.df <- data.frame(pr_score = predict(pre_matching, type = "response"),
treat = pre_matching$model$treat)
head(ps.df, n=5)
pr_score treat
1 0.8681191 1
2 0.8610881 1
3 0.7925949 1
4 0.9835946 1
5 0.9620695 1
As you can see, the data frame shows the calculated propensity scores for each observation as well as the treatment status, i.e. whether legislators had at least one daughter.
The Common Support Assumption requires that for each value of \(X\), we can compare treated and untreated units. Using the propensity score, we have reduced the dimensionality of our covariates. That is, instead of having to pay attention to individual covariates, we look at the probability of being assigned to treatment. This makes it easier for us to check the assumption. We can do this by plotting the propensity score by treatment status.
Exercise 12: Plot the distribution of the propensity score by treatment status.
Hint: You can use the the sm
package’s sm.density.compare
command to the do so. The first argument is the propensity score, while the second indicates treatment status.
Reveal Answer
require(sm)
sm.density.compare(ps.df$pr_score, ps.df$treat, xlab="Propensity Score")
title(main="Propensity score by treatment status")
text(0.6,10, "Control", col="red")
text(0.6,8, "Treated", col="green")
We can see that treated observations are associated with a larger propensity score than untreated units. This is what we would expect. Moreover, there is a significant overlap: For a large range of propensity scores there are both treated and untreated units.
We are going to use Matching
, a package with two important functions:
- The
MatchBalance()
function gives you balance before and after matching - The
Match()
performs propensity score and distance (i.e. mahalanobis) Matching
Let’s start with Step 1 of our list: Run a logit or probit with the treatment as outcome. Let’s check balance before matching with this package. One parameter that we will look at is the standarised difference in means between treated and untreated units. Below you can see the formula to calculate this estimator:
\[smd = \frac{\bar{X}_t- \bar{X}_c}{\sqrt{\frac{SD^2_{1}}{n_1} + \frac{SD^2_{1}}{n_2}}}\] But let’s use the MatchBalance()
to check for balance. The syntax is the following: MatchBalance(treat assignment variable ~ covariates, data= dataset)
. We will then include the match.out
argument into this function to check for balance between before and after matching.
bal.bm=MatchBalance(treat~ repub + srvlng + female + white + age +
demvote + totchi + reld3 ,data=daughters)
What does this tell us?
We can see from this output at the end what are the variables that there is unbalance. For example, we see that there is unbalance on the totchi
covariate.
Now, let’s match! We use propensity score matching (PSM). By default, we estimate NN – what does this mean? NN sets \(C(\bar{P}_i)=min|\bar{P_i}-\bar{P}_j|\). Basically, the non-participant(s) \(\bar{P}_j\) closest to our participant \(\bar{P}_i\) will be selected as a match. The estimator has the option to perform matching with or without replacement. Today, we will conduct first matching without replacement. It’s good to know though that this method has the disadvantage that the final estimate will usually depend on the initial ordering of the treated observations for which the matches were selected. How many neighbours should you use? It’s usually better to oversample – use more neighbours. This, however, involves a trade-off between bias and variance: it trades lower variance with increased bias. Generally, in choosing between different matching estimators, the one with the best balance is the most satisfactory one.
To retrieve the propensity score, we estimate a logit model predicting assignment to treatment – just what we did above in pre_matching
. We’ll call this ps
though it’s exactly the same model as pre_matching
.
Exercise 13: regress treatment variable treat
on totchi
, repub
, srvlng
, female
, white
, age
, demvote
, and reld3
. using the glm()
function. The syntax is the following: glm(treatment assignment variable ~ covariate 1 + covariate 2...covariate n, data = data, family = binomial("logit"))
. Store the output from the glm()
function into an object called ps
.
Reveal Answer
ps <- glm(treat ~ repub + srvlng + female + white +
age + demvote + reld3 + totchi , data = daughters, family = binomial("logit"))
Next, we are going to match each treatment observation with a untreated unit, without replacement (that means, we are only going to use unique pairs of treatment-control, by contrast to matching with replacement, where the same individual can be control for several treated units). Let’s use the Match()
function!.
The syntax is the following: Match(Y = df$outcome, Tr = df$treat, X = ps$fitted, replace = FALSE)
.
When we set the replace
argument equal to FALSE
we are setting our matching algorithm without replacement. The second argument of this function is treatment status (Tr
) has to be equal to the treatment assignment variable daughters$treat
. The X
is the argument that determines the vector that will use to match the units. When we are conducting PSM, we set X equal to the propensity score obtained from the predicted values from the logistic regression ps$fitted
Exercise 14: Conduct matching using Match()
without replacement. Match using propensity scores. Store the output of the Match()
function into an object. Then, use summary()
function to report the results. We need to set seed to replicate our results. Set a seed equal to 1111 set.seed(1111)
. This is to replicate the results from matching.
Reveal Answer
set.seed(1111)
match.ps<- Match(Y=daughters$nowtot, Tr=daughters$treat, X=ps$fitted, replace=FALSE)
summary(match.ps)
Estimate... -0.16949
SE......... 6.5292
T-stat..... -0.025959
p.val...... 0.97929
Original number of observations.............. 371
Original number of treated obs............... 312
Matched number of observations............... 59
Matched number of observations (unweighted). 59
We got that our point estimate is -0.17
. Does this seem good? Let’s compare this point estimate to the regression estimates from OLS.
Exercise 15: Regress using lm()
function setting as a dependent variable nowtot
and indepent variables treat
, totchi
, repub
, srvlng
, female
, white
, age
, demvote
, and reld3
.Store the output from lm()
into an object. Use the summary function summary()
to report the results. What the estimate for the treat
variable.
Reveal Answer
multivariate <- lm(nowtot ~ treat + totchi + repub + srvlng + female + white + age + demvote + reld3, data=daughters)
summary(multivariate)
Call:
lm(formula = nowtot ~ treat + totchi + repub + srvlng + female +
white + age + demvote + reld3, data = daughters)
Residuals:
Min 1Q Median 3Q Max
-52.052 -9.092 0.325 8.907 59.732
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.00863 10.22942 0.587 0.5573
treat 4.09385 2.45198 1.670 0.0959 .
totchi -1.35926 0.68149 -1.995 0.0468 *
repub -46.88293 2.25988 -20.746 < 2e-16 ***
srvlng 0.03889 0.14183 0.274 0.7841
female 12.84109 2.83150 4.535 7.85e-06 ***
white 7.94178 3.33734 2.380 0.0178 *
age 0.04104 0.12479 0.329 0.7424
demvote 100.64213 10.64798 9.452 < 2e-16 ***
reld3 -4.79761 1.98234 -2.420 0.0160 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 16.55 on 361 degrees of freedom
Multiple R-squared: 0.8026, Adjusted R-squared: 0.7977
F-statistic: 163.1 on 9 and 361 DF, p-value: < 2.2e-16
We see that the point estimates substantially differ. From matching using nearest neighbour without replacement we got -0.17
, whereas from OLS we got 4.09
.
Now that we’ve done the matching, let’s check balance after matching. For that, let’s use the MatchBalance()
function. Just to remember the syntax is the following: MatchBalance(treat assignment variable ~ covariates, data= dataset)
, but now let’s include the match.out
argument into the function to check before and after matching. MatchBalance(treat assignment variable ~ covariates, match.out = output Match, data= dataset).
Exercise: 16: Check for balance using the MatchBalance()
for the Nearest Neighbour matching without replacement.
Reveal Answer
After matching, we are unable to achieve balance for the totchi
covariate. [Why?]
We can see that the MatchBalance
reports four difference sets of balance statistics for each variable. It contains the means of each group, standardised empirical quantile-quantile plots or so-called empirical-QQ plots (these plots try to depict if two datasets come from the same population with a common distribution). MatchBalance()
also reports a summary statistics from the raw empirical-QQ plots. It provides variance ratio of the treatment over the control, and the t test of difference in means. Finally, it also provides summary statistics of the Empirical Cumulative Distribution Function (eCDF), which is the formal name of a cumulative distribution of a sample.
We can plot a balance plot using the love.plot()
and bal.tab()
function. The bal.tab()
function calculates the standardised difference in means. The love.plot()
function generates a plot that displays covariate balance before and after matching. Some of the arguments are stat
which determines what statistics you want to report, one option is mean.diffs
which reports the mean differences for each covariate. We can set a threshold
to set an optional value as a threshold marker. We can also set that the statistics are presented in absolute value by adding the abs
function equal TRUE
. Finally, we can also set the order of which the statistics are presented by including the var.order
argument. If we set var.order
equal to “unadjusted” the statistics will be ordered by the balance statistic of the unadjusted sample. See the example below. Always remember to add the object of the Match()
function at the beginning of the bal.tab()
function.
love.plot(bal.tab(match.ps, treat~ repub + srvlng + female + white + age + demvote + totchi + reld3,
data = daughters),stat = "mean.diffs",
threshold = .1, var.order = "unadjusted" ,abs=T)
Ideally, all covariates would be on the 0 line. The further away the absolute mean differences are, the worse the balance. Now we have seen what happens with the means of the covariates. Let’s see next what happens with their distributions. Remember, it can well be that the means are the same but the distributions are statistically different. We’ll test that like we did last week with a Kolomogorov-Smirnov Statistic.
Exercise 17: Check for balance using the MatchBalance()
for the Nearest Neighbour matching without replacement, but rather than reporting mean differences, check for balance in their cumulative distribution by setting stat = "ks.statistics"
.
Reveal Answer
love.plot(bal.tab(match.ps, treat~ repub + srvlng + female + white + age + demvote + totchi + reld3,
data = daughters),stat = "ks.statistics",
threshold = .1, var.order = "unadjusted" ,abs=T)
Again, ideally everything would be on 0 (that would mean the covariate distributions are identical between the treatment and the control). Note also that kolomogorov-smirnov statistics are only available for continuous covariates.
Propensity Score, With Replacement
Now, let’s use propensity score WITH replacement. Remember, this means that a single control unit can be matched to several treated units. Matching without replacement can yield very bad matches if the number of comparison observations comparable to the treated observations is small.
How many neighbours should you use? It’s usually better to oversample – use more neighbours. This, however, involves a trade-off between bias and variance
To do matching with replacement we only need to change set the argument replace
of the Match()
function equal to TRUE
. Let run Matching with replacement.
Exercise 18: Conduct Matching using the Match()
function, set the replace
argument equal to TRUE
. Use the summary()
to report your results. Interpret the output from the Match()
function. With matching with replacement, do you get closer to the multivariate regression benchmark? Do you think that matching with replacement leads to more or less bias to matching with replacement?. Does it leads to less or variance compared to without matching replacement? Remember to set the treatment status (Tr
) function equal to daughters$treat
. Set X
equal to the propensity score obtained from the predicted values from the logistic regression ps$fitted
.
Reveal Answer
match.ps2 <- Match(Y=daughters$nowtot, Tr=daughters$treat, X=ps$fitted,
replace=T)
summary(match.ps2)
Estimate... -3.9583
AI SE...... 9.5929
T-stat..... -0.41263
p.val...... 0.67988
Original number of observations.............. 371
Original number of treated obs............... 312
Matched number of observations............... 312
Matched number of observations (unweighted). 316
We found that the estimate is now -3.93
. Very far from the multivariate estimate. We even got a negative coefficient!
Regarding the output from the Match()
function, we also have AI SE which is matching corrected standard error Abadie and Imbens, 2022. The T-stat and p.val are the standard t-test and p-value of the point estimate and they are interpreted as usual.
In terms of variance-bias trade off. Matching with replacement has the advantage of always matching each treated unit to the closest untreated unit. However, it produces larger bias reduction than matching with replacement.
Exercise 19: Let’s check if we actually achieved balance using the MatchBalance()
function on a set of covariates. One thing that we need to do is subset our data so we can conduct our balance test only a data frame thaat contains only covariates. Let’s drop the outcome variables and some covariates from the analysis. Let’s also remove columns 1, 9, 10, 11, 12, 14 which are either outcomes or are variables that have been created as a function of other variables. You can use the squared brackets to remove these columns. You can use the following syntax: data.frame[ ,-c(1,9:12, 14)]. Then, use the bal.tab()
function to report the covariate balance. The syntax is the following: bal.tab(object output Match function, treatment assignment variable ~ covariates, data = dataframe (excluding columns 1,9,10,11, 12 and 14))
. Remember to install/load the cobalt
library to use bal.tab()
function. Identify in which variables there is unbalance before and after matching.
Reveal Answer
balancepost=MatchBalance(treat~., match.out=match.ps2, data=daughters[,-c(1, 9:12, 14)])
We observe that before matching there is unbalance for the following variable(s): totchi
which is the total number of children. After matching we got unbalance for the following covariates: srvlng
, age
, and totchi
.
Let’s see the results using the bal.tab()
function
bal.tab(match.ps2, treat~., data=daughters[,-c(1, 9:12, 14)])
The bal.tab()
reports substantially less information. It reports the standardised difference between the average of the treated and untreated units after matching. It also provides the numbers of observations in the treatment and control group, and the effective sample size, which is the number of observations matched in both groups. Finally it reports "Weighted matching’, which is when you introduce weighting within strata and proportional weighting within strata as alternative weighting methods.
Now plot a balance plot for Matching with replacement.
Exercise 20: Generate a balance plot before and after matching using the love.plot()
and bal.tab()
functions . Include the following arguments into the function: Set the stat
argument equal to “variance.ratios”. As you may imagine, balance ratios are essentially the ratio of the variance of a covariate for the treated group over the variance of the same covariate for the control group. Also, let’s set the order of the variables by the size of the variance ratio for the adjusted measures by setting the var.order
argument equal to “adjusted”. Also set the abs
argument equal to TRUE
.
Reveal Answer
daughters_cov=daughters[,-c(1, 9:12, 14)]
var_names <- data.frame(old = c("age", "educ", "black", "hispan",
"married", "nodegree", "re74", "re75"),
new = c("Age", "Years of Education", "Black",
"Hispanic", "Married", "No Degree",
"Earnings 1974", "Earnings 1975"))
love.plot(bal.tab(match.ps2, treat~., data=daughters[,-c(1, 9:12, 14)]), stat = "variance.ratios", var.order = "adjusted", abs = TRUE)
We don’t see to much improvement in terms balance for variance.
So far the closest we’ve got to the multivariate regression benchmark has been using matching without replacement. Let’s try Caliper Matching Now. Caliper Matching is a variant of Nearest Neighbour method that attempts to avoid ‘bad’ matches (\(\bar{P}_j\) that are far from our \(\bar{P}_i\). In order to do this, Caliper Matching imposes a tolerance on the maximum distance between treated and untreated units. If, for some participants, no matches are found within the specified caliper (distance) then they will be excluded from the analysis. Theoretically, Caliper should be better than NN, since it corrects for ‘bad’ matches. Compared to NN, Caliper (also Kernel, LLR) use weighted averages. Caliper for example uses the weighted average over multiple persons within the caliper. A caliper for matching is usually defined as a fraction of the standard deviation of the logit of the propensity score.
Using a caliper should improve the quality of matching and also enforces common support because untreated units far from the treated units are discarded.
Let’s estimate Caliper with a distance of 0.1 standard deviations.
Exercise 21: Perform matching setting a caliper of 0.1. To do this, add the following argument to the Match()
function: caliper = 0.1
. Remember to place a comma between the arguments. Set the argument replacement
equals to FALSE (or F). Let’s use the same outcome. nowtot
. Use the summary()
to report your results.
Reveal Answer
match.ps.cal0.1=Match(Y=daughters$nowtot, Tr=daughters$treat,
X=ps$fitted, caliper=0.1, replace=F)
summary(match.ps.cal0.1)
Estimate... 4.7273
SE......... 6.9561
T-stat..... 0.67959
p.val...... 0.49677
Original number of observations.............. 371
Original number of treated obs............... 312
Matched number of observations............... 55
Matched number of observations (unweighted). 55
Caliper (SDs)........................................ 0.1
Number of obs dropped by 'exact' or 'caliper' 257
We got4.73
That’s `better than before (assuming that there is, in fact, a positive effect of having daughters). At least we got a positive coefficient. How’s the balance though?
Exercise 22: Perform balance a covariate evaluation using the MatchBalance()
function. Include the following variable: “repub”, “srvlng”, “female”, “white”, “age” , “totchi”, “demvote”,“reld3”. Identify which variables are unbalance before and after matching. Remember to store the output from the Match()
function into a new object. Then set the argument match.out
equal to new output. Identify which variable are unbalance before and after matching.
Reveal Answer
bal.ps.cal0.1=MatchBalance(treat~ repub + srvlng + female + white + age +
demvote + reld3 + totchi,
match.out=match.ps.cal0.1,data=daughters)
We find that using a caliper of 0.1 we are finally able to balance the covariate totchi
!
Exercise 23: Again, plot the results of your balance covariate evaluation as you did before. Use the love.plot(bal.tab(....))
functions to generate this plot. Report the difference in means by setting stat = "mean.diffs"
. Include a threshold by adding the threshold
argument equal to .1. Order the unadjusted difference in means in a decreasing order by adding the argument var.order = "unadjusted"
. Finally, include the abs = TRUE
argument and set it equal to TRUE, so you report only positive values of the difference in means. This is the syntax putting everything together: love.plot(bal.tab(output matching, treat variable + covariates, data = data, stat= “means.diffs”, threshold = .1, var.order = "unadjusted, abs = T).
Reveal Answer
love.plot(bal.tab(match.ps.cal0.1, treat ~ repub + srvlng + female + white + age + demvote + reld3 + totchi,
data = daughters),stat = "mean.diffs",
threshold = .1, var.order = "unadjusted" ,abs=T)
This looks pretty decent! Finally, we achieve balance in most of the covariates!
Exercise 24: Try again by changing the caliper equal to 0.25. Perform matching using the Match()
function. Do this matching without replacement, so set the replace
argument equal to TRUE (T).
Reveal Answer
match.ps.cal25 <- Match(Y = daughters$nowtot, Tr = daughters$treat, X = ps$fitted,
caliper = 0.25, replace = F)
summary(match.ps.cal25)
Estimate... 4.6364
SE......... 6.8657
T-stat..... 0.67529
p.val...... 0.49949
Original number of observations.............. 371
Original number of treated obs............... 312
Matched number of observations............... 55
Matched number of observations (unweighted). 55
Caliper (SDs)........................................ 0.25
Number of obs dropped by 'exact' or 'caliper' 257
Again, we find a point estimate way larger than our regression benchmark The estimated coefficient is now 4.63
. It doesn’t change substantially once we used a caliper equal to 0.10. It seems that most observations are tightly clustered around a single propensity score. If you see from the previous matching with caliper 0.1, 55 units were matched, and with a 0.25 caliper it is the same number! So it seems that making the caliper bigger made a difference. Note the rather small number of observations that can be matched even with a caliper of 0.25
. The estimate of our coefficient seems about right (under the assumption that there is an effect), but we would still reject the null hypothesis given the p-value
. We’d conclude that we’d need more observations here!
Note also that our findings from caliper matching indicate that NN matching included quite a lot of matches of observations that were rather distinct in terms of their proposensity scores: While they were the nearest available neighbours, they were not actually good neighbours. Being aware of the variance-bias trade-off, it seems that NN matching did decrease the variance by introducing a meaningful bias.
Exercise 24: Let’s examine again how well the matching procedure did in producing balance. Check for balance using the MatchBalance()
function. Remember to store the output of the Match()
function into an object and replace this object equal to the match.out
argument.
Reveal Answer
# let's check for balance before and after matching using a 0.25 caliper.
bal.ps.cal25 = MatchBalance(treat ~ repub + srvlng + female + white + age +
demvote + reld3 + totchi,
match.out = match.ps.cal25, data = daughters)
***** (V1) repub *****
Before Matching After Matching
mean treatment........ 0.53846 0.54545
mean control.......... 0.49153 0.52727
std mean diff......... 9.4 3.6181
mean raw eQQ diff..... 0.050847 0.018182
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.023468 0.0090909
med eCDF diff........ 0.023468 0.0090909
max eCDF diff........ 0.046936 0.018182
var ratio (Tr/Co)..... 0.98066 0.99469
T-test p-value........ 0.51323 0.84218
***** (V2) srvlng *****
Before Matching After Matching
mean treatment........ 8.5865 9.7273
mean control.......... 10.78 9.3455
std mean diff......... -29.042 4.9089
mean raw eQQ diff..... 2.1525 1.0364
med raw eQQ diff..... 2 0
max raw eQQ diff..... 8 4
mean eCDF diff........ 0.061969 0.034091
med eCDF diff........ 0.060354 0.027273
max eCDF diff........ 0.13521 0.090909
var ratio (Tr/Co)..... 0.54418 0.82865
T-test p-value........ 0.12162 0.77354
KS Bootstrap p-value.. 0.154 0.842
KS Naive p-value...... 0.32452 0.97691
KS Statistic.......... 0.13521 0.090909
***** (V3) female *****
Before Matching After Matching
mean treatment........ 0.125 0.054545
mean control.......... 0.084746 0.090909
std mean diff......... 12.152 -15.867
mean raw eQQ diff..... 0.050847 0.036364
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 1
mean eCDF diff........ 0.020127 0.018182
med eCDF diff........ 0.020127 0.018182
max eCDF diff........ 0.040254 0.036364
var ratio (Tr/Co)..... 1.3907 0.624
T-test p-value........ 0.32993 0.41498
***** (V4) white *****
Before Matching After Matching
mean treatment........ 0.87179 0.87273
mean control.......... 0.84746 0.87273
std mean diff......... 7.268 0
mean raw eQQ diff..... 0.016949 0
med raw eQQ diff..... 0 0
max raw eQQ diff..... 1 0
mean eCDF diff........ 0.012169 0
med eCDF diff........ 0.012169 0
max eCDF diff........ 0.024337 0
var ratio (Tr/Co)..... 0.85267 1
T-test p-value........ 0.63372 1
***** (V5) age *****
Before Matching After Matching
mean treatment........ 52.628 52
mean control.......... 52.203 51.164
std mean diff......... 4.7262 9.3791
mean raw eQQ diff..... 1.4915 1.4545
med raw eQQ diff..... 1 1
max raw eQQ diff..... 7 6
mean eCDF diff........ 0.033654 0.032536
med eCDF diff........ 0.029444 0.018182
max eCDF diff........ 0.087082 0.090909
var ratio (Tr/Co)..... 0.67918 0.79821
T-test p-value........ 0.77901 0.65186
KS Bootstrap p-value.. 0.674 0.936
KS Naive p-value...... 0.84606 0.97691
KS Statistic.......... 0.087082 0.090909
***** (V6) demvote *****
Before Matching After Matching
mean treatment........ 0.49929 0.48182
mean control.......... 0.51017 0.49945
std mean diff......... -8.5332 -13.764
mean raw eQQ diff..... 0.016271 0.034364
med raw eQQ diff..... 0.01 0.02
max raw eQQ diff..... 0.08 0.12
mean eCDF diff........ 0.02367 0.058442
med eCDF diff........ 0.018307 0.054545
max eCDF diff........ 0.071056 0.14545
var ratio (Tr/Co)..... 0.9728 1.1032
T-test p-value........ 0.55409 0.42878
KS Bootstrap p-value.. 0.848 0.504
KS Naive p-value...... 0.96361 0.60572
KS Statistic.......... 0.071056 0.14545
***** (V7) reld3 *****
Before Matching After Matching
mean treatment........ 0.27244 0.25455
mean control.......... 0.27119 0.27273
std mean diff......... 0.28019 -4.1358
mean raw eQQ diff..... 0 0.018182
med raw eQQ diff..... 0 0
max raw eQQ diff..... 0 1
mean eCDF diff........ 0.00062473 0.0090909
med eCDF diff........ 0.00062473 0.0090909
max eCDF diff........ 0.0012495 0.018182
var ratio (Tr/Co)..... 0.98906 0.95667
T-test p-value........ 0.98437 0.82801
***** (V8) totchi *****
Before Matching After Matching
mean treatment........ 3.0417 2.2909
mean control.......... 2.1186 2.1818
std mean diff......... 64.22 10.406
mean raw eQQ diff..... 0.9661 0.14545
med raw eQQ diff..... 1 0
max raw eQQ diff..... 4 1
mean eCDF diff........ 0.092302 0.020779
med eCDF diff........ 0.026782 0.018182
max eCDF diff........ 0.29933 0.072727
var ratio (Tr/Co)..... 2.2119 1.1826
T-test p-value........ 1.1455e-08 0.25628
KS Bootstrap p-value.. < 2.22e-16 0.728
KS Naive p-value...... 0.00027523 0.99864
KS Statistic.......... 0.29933 0.072727
Before Matching Minimum p.value: < 2.22e-16
Variable Name(s): totchi Number(s): 8
After Matching Minimum p.value: 0.25628
Variable Name(s): totchi Number(s): 8
We find that balance for all covariate!
Mahalanobis Distance
One of the most common methods of multivariate matching is based on Mahalanobis distance.
Like propensity score matching, Mahalanobis distance matching is built on the notion of distance between observations of pretreatment covariates. It uses the complete variance covariance matrix, which means that the relationship between variables is included in the analysis. The contribution to the distance calculation of two highly correlated covariates will then be lower than that of less correlated ones. In essence Mahalanobis distance matching scales the distance by the inverse of the covariance matrix. Here is the mathematical formula to compute the Mahalanobis distance:
\[MD(X_i, X_j) = \sqrt{(X_i - X_j)'\Sigma^{-1}(X_i - X_j)}\]
For exact match, \[MD = (X_i, X_j) = 0\]
Exercise 25: Conduct matching using Mahalanobis distance as the matching algorithm. First, create a data frame that only includes the covariates of the daughters
dataset. There are different ways to do this, you can as we did before use the square brackers, in the second argument of the square brackets (after the comma) add the concatenate function c()
with all the variables that you want to remove. You need to put the column number of the variable that you want to remove. For example, if want to remove the first variable, you need to put 1. After you include the column numbers, then add a negative sign -
before each column number. Finally, assign the new data frame into a new dataframe using the assignment operator <-
. An example of this would be as follows: df.cov <- df[,c(-1, -2, -3)]
. Alternatively, you could use the select()
from the dplyr
package. The syntax is the following:
df_cov <- df %>%
select(-c(var1, var2, var3))
Exercise 26: Use the Match()
function to conduct matching using Mahalabonis. Set Y
argument equal to our outcome of interest daughters$nowtot
. Set the treatment assignment argument Tr
equal to daughters$treat
. For th X
argument set it equal to the data frame that only includes the covariates. What is the ATT using Mahalanobis.
Reveal Answer
daughters_cov <- daughters %>%
dplyr::select(-c(repub, ngirls, aauw, rtl, nboys, nowtot, propgirls))
match.maha <- Match(Y=daughters$nowtot,Tr=daughters$treat,X=daughters_cov,
BiasAdjust=F,estimand="ATT",M=1, Weight = 2)
summary(match.maha)
Estimate... -0.59295
AI SE...... 5.1488
T-stat..... -0.11516
p.val...... 0.90832
Original number of observations.............. 371
Original number of treated obs............... 312
Matched number of observations............... 312
Matched number of observations (unweighted). 312
Mahalanobis does seem to suffer from the same problem we encountered above. Its point estimate (-0.59
) seems to indicate that - given the rather small number of observations - it matches observations that are rather bad matches.
And now again, let’s check for balance:
Exercise 27: Check for balance using the data set that only includes the covariates. Remember to store the output from the Match function and set it equal to match.out.
Reveal Answer
bal.maha <- MatchBalance(daughters$treat~.,match.out = match.maha, data=daughters_cov)
Again, we keep getting unbalance for the totchi
variable.
Exercise 28: As a final exercise, plot the balance plot using the love.plot(bal.tab(....))
as we did before.
Reveal Answer
love.plot(bal.tab(match.maha, treat~.,data=daughters_cov),stat = "mean.diffs",
threshold = 0.1, var.order = "unadjusted" ,abs=T)
Below we have a table that summarises the results from all the matching techniques and the multivariate regression. As we can see, Capiler matching did relatively a good job compared to the other matching techniques, if we assume there is a true significant effect about the size of the OLS estimator. Pay attention to the random order of observations and the variance-bias trade off when doing matching. Often, the output of different estimators will give you an idea of what is ‘wrong’ with your data if you get vastly divergent results.
Matching technique | Estimation |
---|---|
Multivariate | 4.09 |
NN without rep | -0.18 |
NN with rep | -3.95 |
Caliper (0.1) | 4.73 |
Caliper (0.25) | 4.63 |
Mahalanobis | -0.59 |