rm(list = ls())
knitr::opts_chunk$set(echo = TRUE)
3 Matching
#1. Data: Job Training Data
Today we are using a subset of the original LaLonde (1986) job-training data (males only). We use it to familiarize ourselves with matching techniques.
We load the data, in .csv format:
lalonde = read.csv("http://www.spyroskosmidis.net/wp-content/uploads/2018/01/lalonde.csv")
What variables does the data include?
head(lalonde) #Checking first 6 rows of the data
- treat = dummy for treatment (job-training) or control
- age = participant age
- educ = participant education
- black, hispan = dummies for participant race (Black, Hispanic)
- married = participant’s marital status
- nodegree = participant has no school degree
- re74 = real earnings in 1974
- re75 = real earnings in 1975
- unem74 = unemployed in 1974
- unem75 = unemployed in 1975
- re78 = real earnings in 1978 (outcome variable)
The data comes from a labour market experiment that randomized participants into treatment (on-the-job training) and control groups, with the aim of estimating the effect of training on participants’ earnings. The experimental benchmark is $1794 – i.e. this is the effect of the treatment on earnings. Let’s start with an exercise. The model below shows the treatment effect with no covariates.
summary(lm(re78~treat,data=lalonde))
summary(lm(re78 ~ treat + age + educ + black + hispan + nodegree +
           married + re74 + re75 + unem74 + unem75, data = lalonde))
As we can see, using the full comparison group the estimate (difference in means) is -$8,498. What does this suggest about the earnings of those treated?
#2. Exercise 1
Try to retrieve the experimental benchmark by adding covariates into the above regression.
#3. Observational Data
We are trying to recover the experimental benchmark effect from observational data, and we are going to do that with matching. Before we proceed, though, let’s run some randomization checks. Note that in the context of matching this process is called a “balance check”. Since the treatment is binary, we run a logistic regression (or probit, or, less conservatively, OLS).
random <- glm(treat ~ age + educ + black + hispan + nodegree +
              married + re74 + re75 + unem74 + unem75,
              data = lalonde, family = binomial("logit"))
summary(random)
What do we notice?
Now, we are going to create a data frame with the predicted probabilities of treatment from the regression we just ran. The first column will be the propensity score and the second will be a binary indicator for those treated (==1).
ps.df <- data.frame(pr_score = predict(random, type = "response"),
                    treat = random$model$treat)
head(ps.df, n = 100)
Before we proceed, let’s install and load one more package.
#install.packages("sm")#This is a package that allows you to quickly plot two densities.
#One can do this in many different ways.
require(sm)
Now, let’s run the density plot
sm.density.compare(ps.df$pr_score, ps.df$treat, xlab="Propensity Score")
title(main="Propensity score by treatment status")
text(0.6,10, "Control", col="red")
text(0.6,8, "Treated", col="green")
#4. Matching
Generally, you should follow these steps:
- Run probit (or logit) with the treatment as outcome and (relevant) covariates as predictors
- Predict probabilities – this will create the propensity score. Note here that some (e.g. Sekhon) suggest that the logit score is superior to the propensity score (see the sketch after this list).
- Match each participant to one or more non-participants using the propensity score. You can do this in a variety of ways: Nearest Neighbour Matching (NN), Caliper Matching, Mahalanobis Distance, Exact Matching etc. The default of the Matching package we use today is NN.
- Check balance after matching
- Estimate ATT
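To see the difference between the two scores, here is a minimal sketch using the random model we fitted above (the object names are ours, not the handout’s):

# A sketch: the propensity score is the predicted probability, while the
# "logit score" is the linear predictor x'beta on the logit scale
p.score     <- predict(random, type = "response")
logit.score <- predict(random, type = "link")
head(cbind(p.score, logit.score))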
Let’s start by installing and loading some useful packages.
#install.packages("Matching")
#install.packages("cobalt")
library(Matching) #provides functions for multivariate and propensity score matching and for finding optimal balance based on a genetic search algorithm.
require(cobalt) # Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification
Think back to our random model – was the treatment randomly assigned? No. This is why we’re doing matching. We are going to use Matching, a package with three important functions:
- MatchBalance gives you balance before and after matching
- Match performs propensity score and distance (e.g. Mahalanobis) matching
- GenMatch performs genetic matching
Let’s start with the first of these, MatchBalance, and check balance before matching. The key statistic here is the standardized mean difference:
\[smd = \frac{\bar{X}_t - \bar{X}_c}{\sqrt{\frac{SD^2_{t}}{n_t} + \frac{SD^2_{c}}{n_c}}}\]
bal.bm = MatchBalance(treat ~ age + educ + black + hispan + nodegree +
                      married + re74 + re75 + unem74 + unem75, data = lalonde)
What does this tell us?
Now, let’s match! We use propensity score matching (PSM). By default, we estimate NN – what does this mean? NN sets \(C(\bar{P}_i)=\min_j|\bar{P}_i-\bar{P}_j|\). Basically, the non-participant(s) \(\bar{P}_j\) closest to our participant \(\bar{P}_i\) will be selected as a match. The estimator has the option to perform matching with or without replacement. Today, we will only use matching without replacement. It’s good to know, though, that this method has the disadvantage that the final estimate will usually depend on the initial ordering of the treated observations for which the matches were selected. How many neighbours should you use? It’s usually better to oversample – use more neighbours. This, however, involves a trade-off between bias and variance: it lowers variance at the cost of increased bias. Generally, in choosing between different matching estimators, the one with the best balance is the most satisfactory one.
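To make the NN definition concrete, here is an illustrative sketch (not the package’s actual algorithm) that finds the nearest control, by propensity score, for the first treated unit:

# Illustrative only: nearest-neighbour match for one treated unit,
# using the fitted propensity scores from the `random` model above
pscore   <- random$fitted.values
treated  <- which(lalonde$treat == 1)
controls <- which(lalonde$treat == 0)
best     <- controls[which.min(abs(pscore[controls] - pscore[treated[1]]))]
pscore[c(treated[1], best)]  # the treated unit's score and its closest control's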
To retrieve the propensity score, we estimate a logit model predicting assignment to treatment – just what we did above in random. We’ll call this ps, though it’s exactly the same model as random.
ps <- glm(treat ~ age + educ + black + hispan + nodegree +
          married + re74 + re75 + unem74 + unem75,
          data = lalonde, family = binomial())
Formula to calculate the conditional probability of receiving the treatment (the predicted probabilities): \[P(W_i = 1 \mid X_i = x_i) = E(W_i \mid X_i = x_i) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}} = \frac{1}{1 + e^{-x_i'\beta}}\]
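As a quick check of the formula, a sketch that reproduces the predicted probabilities by hand from the linear predictor:

# Sketch: inverse-logit of the linear predictor equals predict(type = "response")
xb <- predict(ps, type = "link")  # x_i'beta
p  <- 1 / (1 + exp(-xb))          # conditional probability of treatment
all.equal(as.numeric(p), as.numeric(predict(ps, type = "response")))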
Next, we are going to match each treated observation to a control, without replacement (that is, we only use unique treatment-control pairs, by contrast to matching with replacement, where the same individual can serve as the control for several treated units).
set.seed(1111)
match.ps <- Match(Y = lalonde$re78, Tr = lalonde$treat, X = ps$fitted,
                  replace = FALSE)
summary(match.ps)
We got $1642. Pretty good, right?
Now that we’ve done the matching, let’s check balance after matching.
bal.ps = MatchBalance(treat ~ age + educ + black + hispan + nodegree +
                      married + re74 + re75 + unem74 + unem75,
                      match.out = match.ps, data = lalonde)
It’s getting better, but how about age? To see what is happening, let’s first plot the standardized mean differences with a threshold of 0.1. Note that the package gives you the standardized mean differences for both the unmatched and the matched data. Have a look at the code below to see how the standardized mean difference is calculated for the unmatched age variable.
sddiffage = (mean(lalonde$age[lalonde$treat==1]) - mean(lalonde$age[lalonde$treat==0])) /
            sd(lalonde$age[lalonde$treat==1])
abs(sddiffage*100)
The package gives you something very similar. The following plot by cobalt does not multiply it by 100.
love.plot(bal.tab(match.ps, treat ~ age + educ + black + hispan +
                  nodegree + married + re74 + re75 + unem74 + unem75,
                  data = lalonde), stat = "mean.diffs",
          threshold = .1, var.order = "unadjusted", abs = T)
Ideally, all covariates would be on the 0 line. The further away the absolute mean differences are, the worse the balance. Now we have seen what happens with the means of the covariates. Let’s see next what happens with their distributions. Remember, it can well be that the means are the same but the distributions are statistically different. We’ll test that, as we did last week, with a Kolmogorov-Smirnov statistic.
love.plot(bal.tab(match.ps, treat ~ age + educ + black + hispan +
                  nodegree + married + re74 + re75 + unem74 + unem75,
                  data = lalonde),
          stat = "ks.statistics", var.order = "unadjusted")
Again, ideally everything would be on 0 (that would mean the covariate distributions are identical between treatment and control). Note also that Kolmogorov-Smirnov statistics are only available for continuous covariates.
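If you want the raw statistic for a single covariate, here is a sketch using the matched row indices that Match stores (index.treated and index.control); ks.test will warn about ties since age is discrete:

# Sketch: a direct KS test on age in the matched sample
ks.test(lalonde$age[match.ps$index.treated],
        lalonde$age[match.ps$index.control])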
#4.1 Propensity Score, With Replacement
Now, let’s use propensity score WITH replacement. Remember, this means that a single control unit can be matched to several treated units. Matching without replacement can yield very bad matches if the number of comparison observations comparable to the treated observations is small. It keeps variance low at the cost of potential bias. Matching with replacement keeps bias low at the cost of a larger variance since you are using the same subjects again and again.
How many neighbors should you use? It’s usually better to oversample – use more neighbors. This, however, involves a trade-off between bias and variance: it lowers variance at the cost of increased bias.
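As a sketch of oversampling (the object name match.ps.m3 is ours, not the handout’s), Match’s M argument sets the number of matches per treated unit:

# Sketch: one-to-three nearest-neighbour matching with replacement
match.ps.m3 <- Match(Y = lalonde$re78, Tr = lalonde$treat, X = ps$fitted,
                     M = 3, replace = TRUE)
summary(match.ps.m3)

For now, though, we stick with a single neighbour and simply allow replacement: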
set.seed(7777)
match.ps2 <- Match(Y = lalonde$re78, Tr = lalonde$treat, X = ps$fitted,
                   replace = T)
summary(match.ps2)
$1314, not great, not terrible. And how’s the balance?
balancepost = MatchBalance(treat ~ ., match.out = match.ps2, data = lalonde[,-c(1,11)])
Let’s plot the standardized mean differences:
love.plot(bal.tab(match.ps2, treat ~ ., data = lalonde[,-c(1,11)]), stat = "mean.diffs",
          threshold = .1, var.order = "unadjusted", abs = T)
#5. Exercise 2
So far the closest we’ve got to the experimental benchmark was $1642. Let’s try Caliper Matching now. Caliper Matching is a variant of the NN method that attempts to avoid ‘bad’ matches (\(\bar{P}_j\) that are far from our \(\bar{P}_i\)). To do this, Caliper Matching imposes a tolerance on the maximum distance between the two. If, for some participants, no matches are found within the specified caliper (distance), they are excluded from the analysis. Theoretically, Caliper should be better than NN, since it corrects for ‘bad’ matches. Compared to NN, Caliper (also Kernel and LLR) matching uses weighted averages – Caliper, for example, uses the weighted average over the multiple persons within the caliper.
Let’s estimate Caliper with a distance of 0.1 standard deviations.
set.seed(1111)
match.ps.cal0.1 = Match(Y = lalonde$re78, Tr = lalonde$treat,
                        X = ps$fitted, caliper = 0.1, replace = F)
summary(match.ps.cal0.1)
We got $1374. That’s worse than before. How’s the balance though?
bal.ps.cal0.1 = MatchBalance(treat ~ age + educ + black + hispan + nodegree +
                             married + re74 + re75 + unem74 + unem75,
                             match.out = match.ps.cal0.1, data = lalonde)
And let’s plot that:
love.plot(bal.tab(match.ps.cal0.1, treat ~ age + educ + black + hispan +
                  nodegree + married + re74 + re75 + unem74 + unem75,
                  data = lalonde), stat = "mean.diffs",
          threshold = .1, var.order = "unadjusted", abs = T)
Does this look good? Remember we wanted to get as close as possible to $1794. Try again by changing the caliper.
# let's set the seed equal to 5555
# let's try a caliper equal to 0.25
# let's call the output of our Match function "match.ps.cal25"
set.seed(5555)
match.ps.cal25 <- Match(Y = lalonde$re78, Tr = lalonde$treat, X = ps$fitted,
                        caliper = 0.25, replace = F)
summary(match.ps.cal25)
# let's check for balance before and after matching using a 0.25 caliper
bal.ps.cal25 = MatchBalance(treat ~ age + educ + black + hispan + nodegree +
                            married + re74 + re75 + unem74 + unem75,
                            match.out = match.ps.cal25, data = lalonde)
# let's generate the balance plot
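One way to do it (a sketch mirroring the love.plot calls above, not code from the original handout):

love.plot(bal.tab(match.ps.cal25, treat ~ age + educ + black + hispan +
                  nodegree + married + re74 + re75 + unem74 + unem75,
                  data = lalonde), stat = "mean.diffs",
          threshold = .1, var.order = "unadjusted", abs = T)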
#6. Mahalanobis Distance
Like propensity score matching, Mahalanobis distance matching is built on the notion of distance between observations of pretreatment covariates. It uses the complete variance covariance matrix, which means that the relationship between variables is included in the analysis. The contribution to the distance calculation of two highly correlated covariates will then be lower than that of less correlated ones. In essence Mahalanobis distance matching scales the distance by the inverse of the covariance matrix.
\[MD(X_i, X_j) = \sqrt{(X_i - X_j)'\Sigma^{-1}(X_i - X_j)}\]
For an exact match, \(MD(X_i, X_j) = 0\).
library(tidyverse)
lalonde_cov <- lalonde %>%
  dplyr::select(-X, -treat, -re78)
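Before matching, we can check the distance formula directly; here is a sketch using stats::mahalanobis, which returns the squared distance:

# Sketch: squared Mahalanobis distance between the covariates of units 1 and 2
S   <- cov(lalonde_cov)                       # covariance matrix of the covariates
md2 <- mahalanobis(lalonde_cov[1, ], unlist(lalonde_cov[2, ]), S)
sqrt(md2)                                     # MD(X_1, X_2)

Now let’s run the Mahalanobis match itself: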
set.seed(2222)
match.maha <- Match(Y = lalonde$re78, Tr = lalonde$treat, X = lalonde_cov,
                    BiasAdjust = F, estimand = "ATT", M = 1, Weight = 2)
summary(match.maha)
With Mahalanobis, we get $1715, slightly better than the caliper = 0.1 match and both NN matches.
And checking balance:
bal.maha <- MatchBalance(lalonde$treat ~ ., match.out = match.maha, data = lalonde_cov, ks = FALSE)
Visualizing that in a plot:
love.plot(bal.tab(match.maha, treat ~ ., data = lalonde[,-c(1,11)]), stat = "mean.diffs",
          threshold = 0.1, var.order = "unadjusted", abs = T)
To sum up, here is how the estimators compare to the experimental benchmark:

Matching technique | Estimate ($) | Diff. to benchmark ($) |
---|---|---|
Benchmark | 1794 | 0 |
Regression | -8506 | 10300 |
Regression + covariates | 1068 | 726 |
NN without replacement | 1642 | 152 |
NN with replacement | 1714 | 80 |
Caliper (0.1) | 1374 | 420 |
Caliper (0.25) | 1839 | 45 |
Mahalanobis | 1715 | 79 |