Lab 3: Selection on Observables

Lecture Slides & Data






Recap

Selection on Observables

Sometimes we cannot randomise treatment. While randomising treatment is the clearly preferable approach - as we learned in the lectures - we still might want to estimate a treatment effect for some interventions or causes we cannot influence. In fact, this might be the case for some of the most interesting research questions in social science.

Simply using observational data comes with all kinds of problems - remember the fundamental problem of causal inference. To mitigate these, we need to assume that, conditional on observable covariates, potential outcomes are independent of treatment status. That is, we must adjust for all the ways in which treated and untreated units differ that we can observe - and assume that no relevant unobserved differences remain.

Two assumptions are crucial to such designs based on selection on observables:

  • Conditional Independence Assumption

\[ (Y_i(1), Y_i(0)) \perp D_i \mid X_i\]

This assumption maintains that potential outcomes are independent of treatment (\(D_i\)), conditional on covariates (\(X_i\)). That is, for units with the same value of the covariates (the same \(X_i\)), treatment is as good as random.

  • Common Support Assumption

\[ 0 < Pr(D_i=1| X_i=x) < 1\]

This assumption states that the probability of treatment is strictly between 0 and 1 for every value of \(X_i\). There must be variation in the observations we compare to each other: for each level of \(X_i\), there are both treated and untreated units.

Note that these assumptions are not the same as the ones we saw under random assignment. Under random assignment, the assumption is:

\[ (Y_i(1), Y_i(0)) \perp D_i\] which implies that \(E[Y_{0i} | D_i = 1] = E[Y_{0i} | D_i = 0]\) and \(E[Y_{1i} | D_i = 1] = E[Y_{1i} | D_i = 0]\). In other words, the potential outcomes of treated and untreated units are, on average, the same regardless of treatment status.

Now, once we select on observables, the conditional independence assumption implies that \(E[Y_{0i} | D_i = 1, X_i = x] = E[Y_{0i} | D_i = 0, X_i = x]\) and \(E[Y_{1i} | D_i = 1, X_i = x] = E[Y_{1i} | D_i = 0, X_i = x]\). In other words, among units with the same \(X_i\), potential outcomes are, on average, the same regardless of treatment status.

Accordingly, we can derive our estimator as:

\[\Gamma^{SOO} = E[Y_i(1)| X_i = x] - E[Y_i(0)| X_i = x]\]

Taking the weighted average across \(X_i\), we can - in principle - calculate the ATE. However, doing this in practice inevitably involves some degree of uncertainty and imprecision. The conditional independence assumption is a very strong assumption - it can hardly ever be fully satisfied in practice. First, we cannot account for unobserved variables. Second, as soon as we work with continuous variables, we are likely to introduce bias because we need to make trade-offs regarding how similar compared observations have to be. A major - and more general - obstacle is the curse of dimensionality. While we can relatively easily satisfy the condition for one variable (especially if it is binary), this becomes exponentially more difficult as we increase the number of variables: we would need a meaningful number of observations for each combination of covariate values.
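To fix ideas, here is a minimal sketch of this subclassification logic using simulated data (not the lab's dataset): condition on a single binary covariate x, take the difference in means within each value of x, and average those differences weighted by the share of units at each value.

# Simulated example: treatment d depends on a binary covariate x,
# so the naive difference in means is biased, but averaging the
# within-x differences recovers the true effect (2) in expectation.
set.seed(123)
n <- 1000
x <- rbinom(n, 1, 0.5)                    # observed covariate
d <- rbinom(n, 1, 0.3 + 0.4 * x)          # treatment probability depends on x
y <- 2 * d + 3 * x + rnorm(n)             # true treatment effect = 2

naive  <- mean(y[d == 1]) - mean(y[d == 0])            # ignores x
within <- tapply(y[d == 1], x[d == 1], mean) -
          tapply(y[d == 0], x[d == 0], mean)           # effect within each x
w      <- prop.table(table(x))                         # share of units per x
soo    <- sum(within * w)                              # weighted average across x
c(naive = naive, soo = soo)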

Selection on observables should therefore be used with caution. Some treat it as a last-resort approach - using it if and only if randomisation and other identification strategies are not available. However, there may well be cases in which matching is the most credible design to test hypotheses.

There are two approaches to selecting on observables:

1. Multivariate Regression

This one should be familiar to you. Running a multivariate regression means that the estimation of a treatment effect and conditioning on observables happen in a single step.

This is a parametric technique, which often involves extrapolation. That is, the values of \(X\) that are being compared are not necessarily close to each other.

2. Matching

As opposed to regression, balancing covariates and estimating the treatment effect are separate steps when we apply matching techniques. This allows us to influence how balance is achieved. In other words, we can make adjustments to the process of making units more comparable. We can, for instance, impose much narrower restrictions and only compare observations that are identical with respect to certain covariates - think of binary variables and combinations thereof. Matching is a non-parametric estimation technique, which means that the assumptions about functional form we implicitly make when running regressions do not need to hold here.

There are different ways to match observations, such as using the most similar observation (‘nearest neighbour’), defining bandwidths or propensity scores. We will use a few of these approaches in this lab.


Before starting this seminar

  1. Create a folder called “lab3”

  2. Download the data (you can use this button or the one at the top, or load the file directly from GitHub):

  3. Open an R script (or Markdown file) and save it in your “lab3” folder.

  4. Set your working directory using the setwd() function or by clicking on “More”. For example: setwd("~/Desktop/Causal Inference/2022/Lab3")

  5. Let’s install and load the packages that we will be using in this lab:

library(jtools) # generate plots with regression coefficients
library(stargazer) # generate formatted regression tables 
library(texreg) # generate formatted regression tables
library(nnet)  # If you want to use glm() to check for balance
library(tidyverse) # to conduct some tidy operations
library(ggplot2) # to generate plots 
library(Matching) #provides functions for multivariate and propensity score matching and for finding optimal balance based on a genetic search algorithm.
require(cobalt) # Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification

Seminar Overview

In this seminar, we will cover the following topics:
1. See that failure to condition on relevant covariates can result in heavily biased (even outright wrong) estimates.
2. Run various multivariate regressions, conditioning on observable covariates.
3. Test and plot visual evidence for the common support assumption.
4. Learn how to match observations.
5. Conduct pre- and post-matching balance checks using the Matching package.
6. Apply propensity score matching, both without and with replacement.
7. Use Mahalanobis distances to match observations.


The Effect of Daughters on Voting on Women’s Issues

Today we will be working with data from Ebonya Washington’s paper Female Socialization: How Daughters Affect Their Legislator Fathers’ Voting on Women’s Issues. In this paper, Washington seeks to answer a behavioural question: does having a daughter alter the political attitudes of members of Congress? To address this question, she develops an identification strategy to reveal the causal effect of having a daughter on liberal voting with regard to issues such as reproductive rights.

Personal circumstances are well known to affect legislators’ opinions and even their voting behaviour. A relatively famous case is that of Ronald Reagan: after he was diagnosed with Alzheimer’s, his inner circle started advocating in favour of stem cell research in the hope of finding a cure, although Reagan had staunchly rejected any use of stem cells for ideological reasons throughout his political career.

This paper looks at the behavioural effect children exert on their parents’ political attitudes. Washington specifically looks at the shift in attitudes towards women’s issues that are associated with raising a daughter. She finds clear evidence that parenting a daughter results in a more liberal voting record of members of Congress with regard to several issues, but particularly so for reproductive rights.

Importantly, Washington finds that - conditional on the total number of children - the effect is strong and significant. Obviously, members of Congress differ in several other characteristics as well, not least in terms of political predisposition. She therefore takes further personal as well as political characteristics of legislators into account.

The identification strategy of this paper rests on a few important assumptions:

  • The gender of legislators’ children is entirely random. In other words, there is no sex selection. While in some societies there can be a structural preference of parents for one sex over the other, this would pose a critical problem to this study.
  • Voting patterns (in popular elections) are not correlated with children’s gender. Correlation would be a problem as voting preferences in elections to Congress would determine the winning candidate and the number of daughters of elected candidates. Washington shows that this seems not to be the case, with remaining imbalances underestimating the treatment effect rather than the other way around.
  • There is no selection on unobserved variables.

Washington concludes that she finds strong support for the behavioural effect hypothesis, which indeed is most pronounced for reproductive rights.

Her data set includes the voting records of all the Representatives in the US Congress during the 105th Congress, comprising the years 1997–1998. A description of the variables is listed below:

Variable Description
nowtot Congressional voting record scores as compiled by the National Organisation of Women (NOW). A higher level in this variable indicates a more liberal record (the outcome).
ngirls The number of daughters a legislator has (the treatment)
totchi The total number of children of a legislator
repub Binary variable for membership in the Republican party
female Binary variable equal to 1 if legislator is female
white Binary variable equal to 1 if legislator is white
reld3 Binary variable equal to 1 if legislator is catholic
age Continuous variable indicating legislators’ age
srvlng Service length in Congress (in years)
demvote Democratic vote share in legislator’s district
propgirls Proportion of legislator’s children that are female

Now let’s load the data. There are two ways to do this:

You can load the daughters dataset from your laptop using the load() function. If you stored the data in the same directory that you set as your working directory, you only need to write the name of the dataset, plus the extension (.rdata).

# Set your working directory
#setwd("~/Desktop/Causal Inference/2022/Lab3")
# 
load("daughters.rdata") # loading your data from your laptop

Or you can download the data from the course website from following url: https://github.com/dpir-ci/CI22/raw/gh-pages/docs/data/daughters.rdata.
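For example, a one-liner along these lines should also work (a sketch; it assumes you are connected to the internet):

# load the .rdata file directly from GitHub instead of from disk
load(url("https://github.com/dpir-ci/CI22/raw/gh-pages/docs/data/daughters.rdata"))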


Let’s start this lab by inspecting the data.

head(daughters)
# A tibble: 6 x 14
  ngirls totchi repub srvlng female white   age demvote  aauw   rtl nowtot nboys
   <dbl>  <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>  <dbl> <dbl>
1      0      0     0      7      0     1    58   0.570   100    90     90     0
2      1      3     0     15      0     1    54   0.600    88    95     90     2
3      0      0     1      1      0     1    31   0.430     0     0      0     0
4      2      2     0      1      0     1    51   0.520   100    95     95     0
5      2      2     0      7      0     1    39   0.590   100    94     90     0
6      3      7     1     27      0     1    68   0.280     0     0      0     4
# ... with 2 more variables: reld3 <dbl>, propgirls <dbl>

Descriptive Statistics


That looks as expected. Let us start with a few descriptive statistics to become familiar with the data.

Exercise 1: How many children does the average US representative have?


Reveal Answer

mean(daughters$totchi, na.rm = TRUE)
[1] 2.493088
# or use the summary() command
summary(daughters$totchi)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   2.000   2.000   2.493   3.000  10.000       1 
Simply using the mean function, we can calculate the average number of children per observation, i.e. Representative. As we see, during that term a legislator had on average 2.49 children.


Exercise 2: How many daughters does the average US representative have?


Reveal Answer

average_daughters <- mean(daughters$ngirls, na.rm = TRUE)
print(average_daughters)
[1] 1.274194
# or use the summary() command
summary(daughters$ngirls)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   0.000   1.000   1.274   2.000   7.000       1 
Following the same logic as above, we can calculate the average number of daughters - feel free to store the number in an object. A legislator had on average 1.27 daughters.


Exercise 3: What is the proportion of daughters among US representatives? What would you expect?


Reveal Answer

share_daughters <- sum(daughters$ngirls , na.rm = TRUE)/sum(daughters$totchi, na.rm = TRUE)
print(share_daughters)
[1] 0.5110906
# or use the summary() command
summary(daughters$propgirls)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
 0.0000  0.3333  0.5000  0.5200  0.6667  1.0000      61 
We can calculate the proportion of daughters by calculating both the total number of daughters and the total number of children using the sum command. Alternatively, we can calculate the average proportion of daughters among legislators. Note that - while they should roughly match - these are not the same parameters. We see that 51% of all children were female, while the average proportion of daughters is 52% - this is roughly what we’d expect when gender is random.


Exercise 4: What is the NOW score for the average Republican member? And Democrat?

Hint: Make sure to exclude NAs before calculating the average NOW scores!


Reveal Answer

# Republicans
now_rep <- mean(daughters$nowtot[daughters$repub=="1"], na.rm = TRUE)
now_rep
[1] 12.04846
# Democrats
now_demo <- mean(daughters$nowtot[daughters$repub=="0"], na.rm = TRUE)
now_demo
[1] 73.87255
The average NOW score for Republicans is 12, while the average score for Democrats is 74. This is a massive difference, which we would expect to see in Congress, especially when it comes to rather polarised issues.


Exercise 5: Run a multivariate regression of the NOW score on all the legislator characteristics that you think help explain voting records, excluding any variable on the number or gender of children.


Reveal Answer

# One possible regression could look like this:
ols_covariates <- lm(nowtot ~ repub + female + age + reld3 + white + demvote + srvlng, data=daughters)
summary(ols_covariates)

Call:
lm(formula = nowtot ~ repub + female + age + reld3 + white + 
    demvote + srvlng, data = daughters)

Residuals:
    Min      1Q  Median      3Q     Max 
-51.640  -9.889  -0.333  10.001  59.447 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5.95838    9.39890   0.634  0.52646    
repub       -47.14717    2.10973 -22.348  < 2e-16 ***
female       12.57095    2.74017   4.588 5.91e-06 ***
age           0.02396    0.11021   0.217  0.82797    
reld3        -4.33507    1.82870  -2.371  0.01821 *  
white         8.73260    3.17143   2.754  0.00615 ** 
demvote     102.63605   10.10667  10.155  < 2e-16 ***
srvlng       -0.02605    0.13663  -0.191  0.84888    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 16.87 on 423 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared:  0.7902,    Adjusted R-squared:  0.7868 
F-statistic: 227.6 on 7 and 423 DF,  p-value: < 2.2e-16

We can see clear differences across legislator characteristics and partisan variables. From the regression we can see that party affiliation, gender, race, religion, and the preferences of voters in the legislator’s district are strong determinants of legislators’ voting records. Republicans (repub) are much less likely to vote for liberal policies in Congress. Similarly, higher Democratic vote shares in the district (demvote) are associated with more liberal voting by legislators. Female representatives tend to have more liberal voting records, as do white representatives. Being Catholic (reld3) is associated with a more conservative voting record - which we would expect to be the case.



Randomness of Gender


Let us now assess the randomness of children’s gender. This assumption is crucial to the research design’s identification strategy.

In countries like China and India, there is a strong parental preference for boys. Access to prenatal scanning techniques and selective abortion practices have strongly skewed the gender mix towards males. The likelihood of giving birth to a boy rather than a girl is also strongly correlated with parental characteristics: it is, for instance, higher among the Indian middle classes than lower down the socio-economic scale, and higher in northern Indian states than in more southerly ones. This implies a causal relation from parental characteristics (including, probably, ideological preferences) to the gender of the children. It is important to understand whether US legislators display similar child gender preferences, and whether such preferences are potentially correlated with legislators’ characteristics (such as party affiliation, race, etc.).

Exercise 6: Test the hypothesis that the gender mix among children of US legislators is 50%. Can we reject the null hypothesis?

Hint: Use the t.test() command and set the mean under the null hypothesis by adding the argument mu = 0.5.


Reveal Answer

t.test(daughters$propgirls,mu = 0.5) 

    One Sample t-test

data:  daughters$propgirls
t = 1.217, df = 373, p-value = 0.2244
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
 0.4876701 0.5523775
sample estimates:
mean of x 
0.5200238 

The mu argument sets the value of the mean under the null hypothesis. If we do not specify it, the default null value is 0. When we specify mu = 0.5, the t-test shows whether we can reject the null hypothesis that the mean is 0.5. With a p-value of 0.22, we fail to reject this null hypothesis even at the 10% level.


Exercise 7: Can we say that the number of daughters is a randomly allocated variable for legislators with two children? And for those with one child? How about across partisan lines?


Reveal Answer

# Two-children only
t.test(daughters$propgirls[daughters$totchi==2],mu = 0.5) 

    One Sample t-test

data:  daughters$propgirls[daughters$totchi == 2]
t = 0.38936, df = 137, p-value = 0.6976
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
 0.4556672 0.5660719
sample estimates:
mean of x 
0.5108696 
# One-child only
t.test(daughters$propgirls[daughters$totchi==1],mu = 0.5) 

    One Sample t-test

data:  daughters$propgirls[daughters$totchi == 1]
t = 1.6125, df = 39, p-value = 0.1149
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
 0.4681975 0.7818025
sample estimates:
mean of x 
    0.625 
# Reps v Democrats
  t.test(daughters$propgirls[daughters$repub==0],
       daughters$propgirls[daughters$repub==1])

    Welch Two Sample t-test

data:  daughters$propgirls[daughters$repub == 0] and daughters$propgirls[daughters$repub == 1]
t = 1.2576, df = 348.43, p-value = 0.2094
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.02355635  0.10710714
sample estimates:
mean of x mean of y 
0.5421402 0.5003648 

Using the same syntax as before, you can simply subset the data to legislators with exactly two children ([daughters$totchi==2]). The mean is now even slightly closer to 50% and we again fail to reject the null hypothesis of a mean of 0.5. We cannot reject the null hypothesis for legislators with only one child either. To compare Democrats and Republicans, we use a two-sample t-test, as we have done in previous classes. With a p-value of 0.2094, we fail to reject the null hypothesis of no difference between Republicans and Democrats.

Remember: The finding that gender is indeed random is crucial to the paper’s identification strategy.



(Multivariate) Regressions

Before we run any regressions, let us drop observations with missing values. This way, all models are estimated on the same sample and we can make fair comparisons:

daughters <- daughters %>% 
  drop_na()


Exercise 8: Run a bivariate regression of NOW score on the number of daughters.

Are you surprised by your findings? In this regression, do you think that the independent variable of interest is uncorrelated with the error term?


Reveal Answer

### To run a regression of NOW score on the number of daughters:
lm_nowtot_bivariate <- lm(nowtot ~ ngirls, data=daughters)  
summary(lm_nowtot_bivariate)   

Call:
lm(formula = nowtot ~ ngirls, data = daughters)

Residuals:
    Min      1Q  Median      3Q     Max 
-43.250 -34.744  -9.744  40.256  60.256 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   43.250      3.232  13.383   <2e-16 ***
ngirls        -1.753      1.765  -0.993    0.321    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 36.79 on 369 degrees of freedom
Multiple R-squared:  0.002667,  Adjusted R-squared:  -3.53e-05 
F-statistic: 0.9869 on 1 and 369 DF,  p-value: 0.3211

The estimate of the relation is negative, which is surprising. We also cannot say that the independent variable in this regression is uncorrelated with the error term: the number of daughters is clearly positively correlated with the number of children, which in turn is correlated with ideological preferences. For instance, very religious and conservative people tend to have larger families and therefore more daughters.
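A quick check with the data already loaded confirms this mechanical relationship (the correlation will be strongly positive, since daughters are a subset of children):

# correlation between the number of daughters and the total number of children
cor(daughters$ngirls, daughters$totchi)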


Exercise 9: Run a multivariate regression of NOW score on the number of daughters controlling for the number of children.

Are you surprised by your findings? In this regression, do you think that the independent variable of interest is uncorrelated with the error term?


Reveal Answer

  # To run a regression of NOW score on the number of daughters, controlling for the number of children:
lm_nowtot_multivariate <- lm(nowtot ~ ngirls + totchi, data=daughters)
summary(lm_nowtot_multivariate)   

Call:
lm(formula = nowtot ~ ngirls + totchi, data = daughters)

Residuals:
   Min     1Q Median     3Q    Max 
-54.20 -32.20 -10.32  35.86  88.67 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   57.321      4.227  13.560  < 2e-16 ***
ngirls         5.378      2.235   2.406   0.0166 *  
totchi        -8.499      1.714  -4.958 1.09e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 35.67 on 368 degrees of freedom
Multiple R-squared:  0.06513,   Adjusted R-squared:  0.06005 
F-statistic: 12.82 on 2 and 368 DF,  p-value: 4.155e-06

The point estimate is now strongly positive, which is consistent with our predictions. We have seen that the gender mix among US legislators is random. Therefore, we can conclude that the number of daughters (after we have controlled for the number of children) is uncorrelated with the error term of this regression.

Note that conditioning on the number of children - or ‘controlling’ for it - not only altered the magnitude of the estimated effect of the number of daughters but also its direction. If we had (wrongly) disregarded the total number of children, we would have a completely false impression of the true effect’s size and direction.


Exercise 10: Run a multivariate regression of NOW score on all relevant variables.

Run the regression and compile a table showing all three regression models using stargazer, screenreg (from the texreg package), or a similar function.


Reveal Answer

  # To run a regression of NOW score on the number of daughters, controlling for all relevant covariates:
lm_nowtot_all <- lm(nowtot ~ ngirls + totchi + age + repub + female +  white +  reld3 + demvote + srvlng, data=daughters)
summary(lm_nowtot_all)   

Call:
lm(formula = nowtot ~ ngirls + totchi + age + repub + female + 
    white + reld3 + demvote + srvlng, data = daughters)

Residuals:
    Min      1Q  Median      3Q     Max 
-52.137  -9.278  -0.407   8.814  61.286 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   7.77303   10.09259   0.770  0.44170    
ngirls        2.69925    1.04319   2.587  0.01006 *  
totchi       -2.47275    0.84987  -2.910  0.00384 ** 
age           0.04931    0.12407   0.397  0.69127    
repub       -46.69068    2.24967 -20.754  < 2e-16 ***
female       12.73937    2.81649   4.523 8.28e-06 ***
white         8.32240    3.31740   2.509  0.01255 *  
reld3        -4.77126    1.97183  -2.420  0.01603 *  
demvote     100.71891   10.58918   9.511  < 2e-16 ***
srvlng        0.04046    0.14044   0.288  0.77342    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 16.46 on 361 degrees of freedom
Multiple R-squared:  0.8047,    Adjusted R-squared:  0.7998 
F-statistic: 165.3 on 9 and 361 DF,  p-value: < 2.2e-16
# To produce a table
library(stargazer)
stargazer(lm_nowtot_bivariate, lm_nowtot_multivariate, lm_nowtot_all, title = "Models", column.labels=c("Bivariate","Multivariate", "Complete"), align = TRUE, type = "text")

Models
========================================================================================
                                            Dependent variable:                         
                    --------------------------------------------------------------------
                                                   nowtot                               
                         Bivariate           Multivariate               Complete        
                            (1)                   (2)                     (3)           
----------------------------------------------------------------------------------------
ngirls                    -1.753                5.378**                 2.699**         
                          (1.765)               (2.235)                 (1.043)         
                                                                                        
totchi                                         -8.499***               -2.473***        
                                                (1.714)                 (0.850)         
                                                                                        
age                                                                      0.049          
                                                                        (0.124)         
                                                                                        
repub                                                                  -46.691***       
                                                                        (2.250)         
                                                                                        
female                                                                 12.739***        
                                                                        (2.816)         
                                                                                        
white                                                                   8.322**         
                                                                        (3.317)         
                                                                                        
reld3                                                                   -4.771**        
                                                                        (1.972)         
                                                                                        
demvote                                                                100.719***       
                                                                        (10.589)        
                                                                                        
srvlng                                                                   0.040          
                                                                        (0.140)         
                                                                                        
Constant                 43.250***             57.321***                 7.773          
                          (3.232)               (4.227)                 (10.093)        
                                                                                        
----------------------------------------------------------------------------------------
Observations                371                   371                     371           
R2                         0.003                 0.065                   0.805          
Adjusted R2              -0.00004                0.060                   0.800          
Residual Std. Error  36.789 (df = 369)     35.667 (df = 368)       16.459 (df = 361)    
F Statistic         0.987 (df = 1; 369) 12.818*** (df = 2; 368) 165.272*** (df = 9; 361)
========================================================================================
Note:                                                        *p<0.1; **p<0.05; ***p<0.01

We can see that even after controlling for other characteristics an increase in the number of daughters is still associated with a meaningfully and significantly more liberal voting record. By conditioning on the total number of children we have exogenised ngirls, which suffered from endogeneity in the bivariate model.



Matching

Generally, when applying Matching techniques you should follow these steps:

  1. Run probit (or logit) with the treatment as outcome and (relevant) covariates as predictors

  2. Predict probabilities – this will create the propensity score. Note here that some (e.g. Sekhon) suggest that the logit score is superior to the propensity score.

  3. Match each participant to one or more non-participants using the propensity score. You can do this in a variety of ways: Nearest Neighbour Matching (NN), Caliper Matching, Mahalanobis Distance, Exact Matching etc. The default of the Matching package we use today is NN.

  4. Check balance after matching

  5. Estimate the ATT


You can also use Matching for continuous treatments. However, to simplify the example in this lab we will turn the continuous treatment variable (ngirls) into a binary treatment variable that distinguishes legislators with at least one daughter from those that do not have any daughters at all. To do so, we create the binary variable treat:


# Creating a binary treatment variable.
daughters <- daughters %>% 
 mutate(treat = ifelse(ngirls > 0, 1, 0))

table(daughters$treat)

  0   1 
 59 312 


Let us start with a pre-matching balance check. Note that this is essentially the same as a randomisation check, but in the context of matching we call this process a “balance check”. Since the treatment is now binary, we are going to run a logistic regression (or a probit, or, less conservatively, OLS).

Exercise 11: Run a logistic regression with the binary treatment variable (treat) as outcome.

Make sure to include all covariates from the complete multivariate regression model.


Reveal Answer

pre_matching  <- glm(treat  ~ totchi + age + repub + female +  white +  reld3 + demvote + srvlng,
                     data = daughters,   family  = binomial("logit"))
summary(pre_matching)

Call:
glm(formula = treat ~ totchi + age + repub + female + white + 
    reld3 + demvote + srvlng, family = binomial("logit"), data = daughters)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.6325   0.2479   0.4585   0.6491   1.2078  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.10354    1.72165  -0.641   0.5215    
totchi       0.76629    0.16640   4.605 4.12e-06 ***
age          0.01556    0.02248   0.692   0.4888    
repub       -0.11453    0.38568  -0.297   0.7665    
female       0.37737    0.54813   0.688   0.4912    
white        0.28492    0.55719   0.511   0.6091    
reld3       -0.03414    0.34187  -0.100   0.9204    
demvote      0.72026    1.82422   0.395   0.6930    
srvlng      -0.05790    0.02464  -2.350   0.0188 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 325.04  on 370  degrees of freedom
Residual deviance: 288.25  on 362  degrees of freedom
AIC: 306.25

Number of Fisher Scoring iterations: 5

As expected, the total number of children is strongly associated with having at least one daughter. Interestingly, there is also a weak negative association between length of service in Congress and having a daughter. Apart from this, none of the other covariates are significant predictors of the treat variable.


Now, we are going to create a data frame with the predicted values from the regression we just ran. The first column will be the propensity score, i.e. the predicted probability of receiving the treatment given the covariates; the second will be a binary indicator for treated units (treat==1). We will also load the Matching package, which we will use shortly.

library(Matching)

ps.df <- data.frame(pr_score = predict(pre_matching, type = "response"),
                     treat = pre_matching$model$treat)
head(ps.df, n=5)
   pr_score treat
1 0.8681191     1
2 0.8610881     1
3 0.7925949     1
4 0.9835946     1
5 0.9620695     1


As you can see, the data frame shows the calculated propensity scores for each observation as well as the treatment status, i.e. whether legislators had at least one daughter.

The Common Support Assumption requires that for each value of \(X\), we can compare treated and untreated units. Using the propensity score, we have reduced the dimensionality of our covariates. That is, instead of having to pay attention to individual covariates, we look at the probability of being assigned to treatment. This makes it easier for us to check the assumption. We can do this by plotting the propensity score by treatment status.

Exercise 12: Plot the distribution of the propensity score by treatment status.

Hint: You can use the sm package’s sm.density.compare command to do so. The first argument is the propensity score, while the second indicates treatment status.


Reveal Answer

require(sm)
sm.density.compare(ps.df$pr_score, ps.df$treat, xlab="Propensity Score")
title(main="Propensity score by treatment status")
text(0.6,10, "Control", col="red")
text(0.6,8, "Treated", col="green")


We can see that treated observations tend to have higher propensity scores than untreated units, which is what we would expect. Moreover, there is substantial overlap: for a large range of propensity scores there are both treated and untreated units.
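If you prefer ggplot2 (which we loaded at the start of the lab), a rough equivalent is a pair of overlaid histograms - just one possible way to sketch the same comparison:

# overlap check: distribution of the propensity score by treatment status
ggplot(ps.df, aes(x = pr_score, fill = factor(treat))) +
  geom_histogram(position = "identity", alpha = 0.5, bins = 30) +
  labs(x = "Propensity score", fill = "Treated (at least one daughter)")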


We are going to use Matching, a package with two important functions:

  • The MatchBalance() function gives you balance before and after matching
  • The Match() function performs propensity score and distance (e.g. Mahalanobis) matching

Let’s start with Step 1 of our list - running a logit or probit with the treatment as outcome - but first check balance before matching with this package. One statistic we will look at is the standardised difference in means between treated and untreated units. Below you can see the formula used to calculate it:

\[smd = \frac{\bar{X}_t- \bar{X}_c}{\sqrt{\frac{SD^2_{t}}{n_t} + \frac{SD^2_{c}}{n_c}}}\] But let’s use MatchBalance() to check for balance. The syntax is the following: MatchBalance(treatment assignment variable ~ covariates, data = dataset). Later on, we will also include the match.out argument in this function to compare balance before and after matching.

bal.bm=MatchBalance(treat~ repub + srvlng + female + white + age + 
                  demvote + totchi + reld3 ,data=daughters)

What does this tell us?

From this output we can see, at the end of each block, which variables are imbalanced before matching. For example, we see that there is imbalance on the totchi covariate.
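As a quick illustration, you can compute the quantity in the formula above by hand for totchi (a sketch following the formula as written; MatchBalance() reports its standardised differences on a different scale, so do not expect the numbers to coincide exactly):

# raw standardised difference in means for totchi, before matching
x_t <- daughters$totchi[daughters$treat == 1]
x_c <- daughters$totchi[daughters$treat == 0]
(mean(x_t) - mean(x_c)) /
  sqrt(var(x_t) / length(x_t) + var(x_c) / length(x_c))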

Now, let’s match! We use propensity score matching (PSM). By default, we estimate NN – what does this mean? NN sets \(C(\bar{P}_i)=\min_j|\bar{P}_i-\bar{P}_j|\): the non-participant(s) \(\bar{P}_j\) closest to our participant \(\bar{P}_i\) will be selected as a match. The estimator has the option to perform matching with or without replacement. Today, we will first conduct matching without replacement. It is good to know, though, that this method has the disadvantage that the final estimate will usually depend on the initial ordering of the treated observations for which the matches were selected. How many neighbours should you use? It is usually better to oversample – use more neighbours. This, however, involves a trade-off between bias and variance: it achieves lower variance at the cost of increased bias. Generally, in choosing between different matching estimators, the one with the best balance is the most satisfactory one.

To retrieve the propensity score, we estimate a logit model predicting assignment to treatment – just what we did above in pre_matching. We’ll call this ps though it’s exactly the same model as pre_matching.

Exercise 13: Regress the treatment variable treat on totchi, repub, srvlng, female, white, age, demvote, and reld3 using the glm() function. The syntax is the following: glm(treatment assignment variable ~ covariate 1 + covariate 2 ... covariate n, data = data, family = binomial("logit")). Store the output from the glm() function in an object called ps.


Reveal Answer

ps <- glm(treat ~ repub + srvlng + female + white + 
                age + demvote + reld3 + totchi ,    data    = daughters,    family  = binomial("logit"))


Next, we are going to match each treated observation with an untreated unit, without replacement (that means we only use unique treatment-control pairs, in contrast to matching with replacement, where the same individual can serve as the control for several treated units). Let’s use the Match() function!

The syntax is the following: Match(Y = df$outcome, Tr = df$treat, X = ps$fitted, replace = FALSE).

Setting the replace argument equal to FALSE means our matching algorithm runs without replacement. The Tr argument is the treatment status and has to be set equal to the treatment assignment variable daughters$treat. The X argument is the vector used to match the units. When conducting PSM, we set X equal to the propensity score obtained from the predicted values of the logistic regression, ps$fitted.

Exercise 14: Conduct matching using Match() without replacement, matching on the propensity scores. Store the output of the Match() function in an object, then use the summary() function to report the results. Set a seed equal to 1111 with set.seed(1111) so that the matching results are replicable.


Reveal Answer

set.seed(1111) 
match.ps<- Match(Y=daughters$nowtot, Tr=daughters$treat,    X=ps$fitted, replace=FALSE)
summary(match.ps)

Estimate...  -0.16949 
SE.........  6.5292 
T-stat.....  -0.025959 
p.val......  0.97929 

Original number of observations..............  371 
Original number of treated obs...............  312 
Matched number of observations...............  59 
Matched number of observations  (unweighted).  59 

Our point estimate is -0.17. Does this seem plausible? Let’s compare this point estimate to the regression estimate from OLS.


Exercise 15: Using the lm() function, regress nowtot on treat, totchi, repub, srvlng, female, white, age, demvote, and reld3. Store the output from lm() in an object and use summary() to report the results. What is the estimate for the treat variable?


Reveal Answer

multivariate <- lm(nowtot ~ treat + totchi + repub + srvlng + female + white + age + demvote + reld3, data=daughters)

summary(multivariate)   

Call:
lm(formula = nowtot ~ treat + totchi + repub + srvlng + female + 
    white + age + demvote + reld3, data = daughters)

Residuals:
    Min      1Q  Median      3Q     Max 
-52.052  -9.092   0.325   8.907  59.732 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   6.00863   10.22942   0.587   0.5573    
treat         4.09385    2.45198   1.670   0.0959 .  
totchi       -1.35926    0.68149  -1.995   0.0468 *  
repub       -46.88293    2.25988 -20.746  < 2e-16 ***
srvlng        0.03889    0.14183   0.274   0.7841    
female       12.84109    2.83150   4.535 7.85e-06 ***
white         7.94178    3.33734   2.380   0.0178 *  
age           0.04104    0.12479   0.329   0.7424    
demvote     100.64213   10.64798   9.452  < 2e-16 ***
reld3        -4.79761    1.98234  -2.420   0.0160 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 16.55 on 361 degrees of freedom
Multiple R-squared:  0.8026,    Adjusted R-squared:  0.7977 
F-statistic: 163.1 on 9 and 361 DF,  p-value: < 2.2e-16

We see that the point estimates substantially differ. From matching using nearest neighbour without replacement we got -0.17, whereas from OLS we got 4.09.


Now that we’ve done the matching, let’s check balance after matching, again using the MatchBalance() function. As a reminder, the syntax is MatchBalance(treatment assignment variable ~ covariates, data = dataset); this time we include the match.out argument to compare balance before and after matching: MatchBalance(treatment assignment variable ~ covariates, match.out = output of Match(), data = dataset).

Exercise 16: Check for balance using MatchBalance() for the nearest neighbour matching without replacement.


Reveal Answer

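The call mirrors the pre-matching check, now passing the matching output from Exercise 14 to the match.out argument (a sketch using the match.ps object created above):

bal.am <- MatchBalance(treat ~ repub + srvlng + female + white + age +
                         demvote + totchi + reld3,
                       match.out = match.ps, data = daughters)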
After matching, we are unable to achieve balance for the totchi covariate. [Why?]

We can see that MatchBalance() reports several sets of balance statistics for each variable: the means of each group; summary statistics from standardised and raw empirical quantile-quantile (eQQ) plots, which assess whether two samples come from the same distribution; the variance ratio of the treatment over the control group; the t-test of the difference in means; and summary statistics of the empirical cumulative distribution function (eCDF), which is the formal name for the cumulative distribution of a sample.
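As an aside, you can draw a raw empirical QQ plot for a single covariate yourself with base R; here is a quick sketch for totchi:

# empirical QQ plot of totchi, treated vs untreated legislators;
# points on the 45-degree line indicate identical distributions
qqplot(daughters$totchi[daughters$treat == 1],
       daughters$totchi[daughters$treat == 0],
       xlab = "totchi (treated)", ylab = "totchi (control)")
abline(0, 1)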


We can plot a balance plot using the love.plot() and bal.tab() functions. The bal.tab() function calculates the standardised differences in means, and love.plot() generates a plot that displays covariate balance before and after matching. Some useful arguments: stat determines which statistic to report (e.g. mean.diffs reports the mean difference for each covariate); threshold adds an optional threshold marker; setting abs = TRUE reports the statistics in absolute value; and var.order sets the order in which the covariates are presented - if we set var.order equal to “unadjusted”, they are ordered by the balance statistic of the unadjusted sample. See the example below. Always remember to pass the object returned by Match() as the first argument of bal.tab().

love.plot(bal.tab(match.ps, treat~ repub + srvlng + female + white + age + demvote + totchi + reld3,
                  data = daughters),stat = "mean.diffs", 
          threshold = .1, var.order = "unadjusted" ,abs=T)

Ideally, all covariates would sit on the 0 line; the further away the absolute mean differences are, the worse the balance. Now we have seen what happens with the means of the covariates. Let’s see next what happens with their distributions. Remember, it can well be that the means are the same but the distributions are quite different. We test that, as we did last week, with a Kolmogorov-Smirnov statistic.

Exercise 17: Check for balance for the nearest neighbour matching without replacement again, but rather than reporting mean differences, check for balance in the cumulative distributions by setting stat = "ks.statistics".


Reveal Answer

love.plot(bal.tab(match.ps, treat~ repub + srvlng + female + white + age + demvote + totchi + reld3,
                  data = daughters),stat = "ks.statistics", 
          threshold = .1, var.order = "unadjusted" ,abs=T)

Again, ideally everything would be at 0 (that would mean the covariate distributions are identical between treatment and control). Note also that Kolmogorov-Smirnov statistics are only available for continuous covariates.


Propensity Score, With Replacement

Now, let’s use propensity score matching WITH replacement. Remember, this means that a single control unit can be matched to several treated units. Matching without replacement can yield very bad matches if the number of control observations comparable to the treated observations is small.

How many neighbours should you use? It’s usually better to oversample – use more neighbours. This, however, involves a trade-off between bias and variance

To match with replacement we only need to set the replace argument of the Match() function equal to TRUE. Let’s run matching with replacement.

Exercise 18: Conduct matching using the Match() function, setting the replace argument equal to TRUE. Use summary() to report your results and interpret the output from the Match() function. With matching with replacement, do you get closer to the multivariate regression benchmark? Do you think that matching with replacement leads to more or less bias than matching without replacement? Does it lead to more or less variance? Remember to set the treatment status (Tr) equal to daughters$treat, and X equal to the propensity score obtained from the predicted values of the logistic regression, ps$fitted.


Reveal Answer

match.ps2 <- Match(Y=daughters$nowtot,  Tr=daughters$treat, X=ps$fitted,
                  replace=T)

summary(match.ps2)

Estimate...  -3.9583 
AI SE......  9.5929 
T-stat.....  -0.41263 
p.val......  0.67988 

Original number of observations..............  371 
Original number of treated obs...............  312 
Matched number of observations...............  312 
Matched number of observations  (unweighted).  316 

We find that the estimate is now -3.96 - very far from the multivariate estimate. We even got a negative coefficient!

Regarding the output from the Match() function, we now also get AI SE, which is the matching-corrected standard error of Abadie and Imbens (2006). The T-stat and p.val are the standard t-statistic and p-value of the point estimate and are interpreted as usual.

In terms of the bias-variance trade-off, matching with replacement has the advantage of always matching each treated unit to the closest untreated unit, so it achieves greater bias reduction than matching without replacement - typically at the cost of higher variance, since some control units are re-used.


Exercise 19: Let’s check whether we actually achieved balance using the MatchBalance() function on a set of covariates. First we need to subset our data so that we conduct the balance test on a data frame that contains only covariates. Let’s drop the outcome variables and some covariates from the analysis: remove columns 1, 9, 10, 11, 12 and 14, which are either outcomes or variables created as functions of other variables. You can use square brackets to remove these columns, with the following syntax: data.frame[ ,-c(1,9:12, 14)]. Then, use the bal.tab() function to report the covariate balance. The syntax is the following: bal.tab(object returned by Match(), treatment assignment variable ~ covariates, data = dataframe (excluding columns 1, 9, 10, 11, 12 and 14)). Remember to install/load the cobalt library to use the bal.tab() function. Identify in which variables there is imbalance before and after matching.


Reveal Answer

balancepost=MatchBalance(treat~., match.out=match.ps2, data=daughters[,-c(1, 9:12, 14)])

We observe that before matching there is imbalance for the following variable: totchi, the total number of children. After matching, there is imbalance for the following covariates: srvlng, age, and totchi.

Let’s see the results using the bal.tab() function

bal.tab(match.ps2, treat~., data=daughters[,-c(1, 9:12, 14)])

The bal.tab() output reports substantially less information. It reports the standardised difference between the averages of the treated and untreated units after matching. It also provides the number of observations in the treatment and control groups and the effective sample size of the matched sample, which accounts for how control observations are weighted when they are re-used as matches.


Now plot a balance plot for Matching with replacement.

Exercise 20: Generate a balance plot before and after matching using the love.plot() and bal.tab() functions. Include the following arguments: set the stat argument equal to “variance.ratios” - as you may imagine, variance ratios are the ratio of the variance of a covariate in the treated group over the variance of the same covariate in the control group. Also, order the variables by the size of the variance ratio for the adjusted measures by setting the var.order argument equal to “adjusted”, and set the abs argument equal to TRUE.


Reveal Answer

daughters_cov=daughters[,-c(1, 9:12, 14)]


love.plot(bal.tab(match.ps2, treat~., data=daughters_cov), stat = "variance.ratios", var.order = "adjusted", abs = TRUE)  

We don’t see much improvement in terms of balance in the variances.


So far, the closest we’ve got to the multivariate regression benchmark has been matching without replacement. Let’s try caliper matching now. Caliper matching is a variant of the nearest neighbour method that attempts to avoid ‘bad’ matches (\(\bar{P}_j\) that are far from our \(\bar{P}_i\)). To do this, caliper matching imposes a tolerance on the maximum distance between treated and untreated units. If, for some participants, no matches are found within the specified caliper (distance), they are excluded from the analysis. Theoretically, caliper matching should be better than NN, since it corrects for ‘bad’ matches. Compared to NN, caliper (and also kernel and LLR) matching uses weighted averages; caliper matching, for example, uses the weighted average over multiple persons within the caliper. A caliper is usually defined as a fraction of the standard deviation of the logit of the propensity score.

Using a caliper should improve the quality of matching and also enforces common support because untreated units far from the treated units are discarded.

Let’s estimate Caliper with a distance of 0.1 standard deviations.

Exercise 21: Perform matching setting a caliper of 0.1. To do this, add the argument caliper = 0.1 to the Match() function (remember to place a comma between arguments). Set the replace argument equal to FALSE (or F). Use the same outcome, nowtot, and use summary() to report your results.


Reveal Answer

match.ps.cal0.1=Match(Y=daughters$nowtot,   Tr=daughters$treat, 
                          X=ps$fitted, caliper=0.1, replace=F)
summary(match.ps.cal0.1)

Estimate...  4.7273 
SE.........  6.9561 
T-stat.....  0.67959 
p.val......  0.49677 

Original number of observations..............  371 
Original number of treated obs...............  312 
Matched number of observations...............  55 
Matched number of observations  (unweighted).  55 

Caliper (SDs)........................................   0.1 
Number of obs dropped by 'exact' or 'caliper'  257 

We got 4.73. That’s better than before (assuming that there is, in fact, a positive effect of having daughters) - at least we got a positive coefficient. How’s the balance though?


Exercise 22: Perform a covariate balance evaluation using the MatchBalance() function. Include the following variables: repub, srvlng, female, white, age, totchi, demvote and reld3. Remember to store the output from the Match() function in a new object and to set the match.out argument equal to that object. Identify which variables are imbalanced before and after matching.


Reveal Answer

bal.ps.cal0.1=MatchBalance(treat~ repub + srvlng + female + white + age + 
             demvote + reld3 + totchi,
             match.out=match.ps.cal0.1,data=daughters)

We find that using a caliper of 0.1 we are finally able to balance the covariate totchi!


Exercise 23: Again, plot the results of your covariate balance evaluation as you did before, using the love.plot(bal.tab(....)) functions. Report the difference in means by setting stat = "mean.diffs". Include a threshold by setting the threshold argument equal to .1. Order the unadjusted differences in means in decreasing order by adding the argument var.order = "unadjusted". Finally, set the abs argument equal to TRUE, so you report only absolute values of the differences in means. Putting everything together, the syntax is: love.plot(bal.tab(matching output, treat variable ~ covariates, data = data), stat = "mean.diffs", threshold = .1, var.order = "unadjusted", abs = T).


Reveal Answer

love.plot(bal.tab(match.ps.cal0.1, treat ~ repub + srvlng + female + white + age +  demvote + reld3 + totchi,
                  data = daughters),stat = "mean.diffs", 
          threshold = .1, var.order = "unadjusted" ,abs=T)

This looks pretty decent! Finally, we achieve balance in most of the covariates!


Exercise 24: Try again, changing the caliper to 0.25. Perform matching using the Match() function, again without replacement, so set the replace argument equal to FALSE (F).


Reveal Answer

match.ps.cal25 <- Match(Y = daughters$nowtot, Tr = daughters$treat, X = ps$fitted,
                        caliper = 0.25, replace = F)

summary(match.ps.cal25)

Estimate...  4.6364 
SE.........  6.8657 
T-stat.....  0.67529 
p.val......  0.49949 

Original number of observations..............  371 
Original number of treated obs...............  312 
Matched number of observations...............  55 
Matched number of observations  (unweighted).  55 

Caliper (SDs)........................................   0.25 
Number of obs dropped by 'exact' or 'caliper'  257 

Again, we find a point estimate slightly larger than our regression benchmark: the estimated coefficient is now 4.63, and it does not change substantially compared to the 0.10 caliper. It seems that most observations are tightly clustered in terms of their propensity scores: with a caliper of 0.1, 55 units were matched, and with a 0.25 caliper it is the same number, so making the caliper bigger made no difference here. Note the rather small number of observations that can be matched even with a caliper of 0.25. The point estimate seems about right (under the assumption that there is an effect), but given the p-value we would still fail to reject the null hypothesis. We’d conclude that we’d need more observations here!

Note also that our findings from caliper matching indicate that NN matching included quite a lot of matches between observations that were rather distinct in terms of their propensity scores: while they were the nearest available neighbours, they were not actually good neighbours. With the bias-variance trade-off in mind, it seems that NN matching decreased the variance at the cost of introducing a meaningful bias.


Exercise 25: Let’s examine again how well the matching procedure did in producing balance. Check for balance using the MatchBalance() function. Remember to store the output of the Match() function in an object and to pass this object to the match.out argument.


Reveal Answer

# let's check for balance before and after matching using a 0.25 caliper. 

bal.ps.cal25 = MatchBalance(treat ~ repub + srvlng + female + white + age +
                            demvote + reld3 + totchi,
                            match.out = match.ps.cal25, data = daughters)

***** (V1) repub *****
                       Before Matching       After Matching
mean treatment........    0.53846           0.54545 
mean control..........    0.49153           0.52727 
std mean diff.........        9.4            3.6181 

mean raw eQQ diff.....   0.050847          0.018182 
med  raw eQQ diff.....          0                 0 
max  raw eQQ diff.....          1                 1 

mean eCDF diff........   0.023468         0.0090909 
med  eCDF diff........   0.023468         0.0090909 
max  eCDF diff........   0.046936          0.018182 

var ratio (Tr/Co).....    0.98066           0.99469 
T-test p-value........    0.51323           0.84218 


***** (V2) srvlng *****
                       Before Matching       After Matching
mean treatment........     8.5865            9.7273 
mean control..........      10.78            9.3455 
std mean diff.........    -29.042            4.9089 

mean raw eQQ diff.....     2.1525            1.0364 
med  raw eQQ diff.....          2                 0 
max  raw eQQ diff.....          8                 4 

mean eCDF diff........   0.061969          0.034091 
med  eCDF diff........   0.060354          0.027273 
max  eCDF diff........    0.13521          0.090909 

var ratio (Tr/Co).....    0.54418           0.82865 
T-test p-value........    0.12162           0.77354 
KS Bootstrap p-value..      0.154             0.842 
KS Naive p-value......    0.32452           0.97691 
KS Statistic..........    0.13521          0.090909 


***** (V3) female *****
                       Before Matching       After Matching
mean treatment........      0.125          0.054545 
mean control..........   0.084746          0.090909 
std mean diff.........     12.152           -15.867 

mean raw eQQ diff.....   0.050847          0.036364 
med  raw eQQ diff.....          0                 0 
max  raw eQQ diff.....          1                 1 

mean eCDF diff........   0.020127          0.018182 
med  eCDF diff........   0.020127          0.018182 
max  eCDF diff........   0.040254          0.036364 

var ratio (Tr/Co).....     1.3907             0.624 
T-test p-value........    0.32993           0.41498 


***** (V4) white *****
                       Before Matching       After Matching
mean treatment........    0.87179           0.87273 
mean control..........    0.84746           0.87273 
std mean diff.........      7.268                 0 

mean raw eQQ diff.....   0.016949                 0 
med  raw eQQ diff.....          0                 0 
max  raw eQQ diff.....          1                 0 

mean eCDF diff........   0.012169                 0 
med  eCDF diff........   0.012169                 0 
max  eCDF diff........   0.024337                 0 

var ratio (Tr/Co).....    0.85267                 1 
T-test p-value........    0.63372                 1 


***** (V5) age *****
                       Before Matching       After Matching
mean treatment........     52.628                52 
mean control..........     52.203            51.164 
std mean diff.........     4.7262            9.3791 

mean raw eQQ diff.....     1.4915            1.4545 
med  raw eQQ diff.....          1                 1 
max  raw eQQ diff.....          7                 6 

mean eCDF diff........   0.033654          0.032536 
med  eCDF diff........   0.029444          0.018182 
max  eCDF diff........   0.087082          0.090909 

var ratio (Tr/Co).....    0.67918           0.79821 
T-test p-value........    0.77901           0.65186 
KS Bootstrap p-value..      0.674             0.936 
KS Naive p-value......    0.84606           0.97691 
KS Statistic..........   0.087082          0.090909 


***** (V6) demvote *****
                       Before Matching       After Matching
mean treatment........    0.49929           0.48182 
mean control..........    0.51017           0.49945 
std mean diff.........    -8.5332           -13.764 

mean raw eQQ diff.....   0.016271          0.034364 
med  raw eQQ diff.....       0.01              0.02 
max  raw eQQ diff.....       0.08              0.12 

mean eCDF diff........    0.02367          0.058442 
med  eCDF diff........   0.018307          0.054545 
max  eCDF diff........   0.071056           0.14545 

var ratio (Tr/Co).....     0.9728            1.1032 
T-test p-value........    0.55409           0.42878 
KS Bootstrap p-value..      0.848             0.504 
KS Naive p-value......    0.96361           0.60572 
KS Statistic..........   0.071056           0.14545 


***** (V7) reld3 *****
                       Before Matching       After Matching
mean treatment........    0.27244           0.25455 
mean control..........    0.27119           0.27273 
std mean diff.........    0.28019           -4.1358 

mean raw eQQ diff.....          0          0.018182 
med  raw eQQ diff.....          0                 0 
max  raw eQQ diff.....          0                 1 

mean eCDF diff........ 0.00062473         0.0090909 
med  eCDF diff........ 0.00062473         0.0090909 
max  eCDF diff........  0.0012495          0.018182 

var ratio (Tr/Co).....    0.98906           0.95667 
T-test p-value........    0.98437           0.82801 


***** (V8) totchi *****
                       Before Matching       After Matching
mean treatment........     3.0417            2.2909 
mean control..........     2.1186            2.1818 
std mean diff.........      64.22            10.406 

mean raw eQQ diff.....     0.9661           0.14545 
med  raw eQQ diff.....          1                 0 
max  raw eQQ diff.....          4                 1 

mean eCDF diff........   0.092302          0.020779 
med  eCDF diff........   0.026782          0.018182 
max  eCDF diff........    0.29933          0.072727 

var ratio (Tr/Co).....     2.2119            1.1826 
T-test p-value........ 1.1455e-08           0.25628 
KS Bootstrap p-value.. < 2.22e-16             0.728 
KS Naive p-value...... 0.00027523           0.99864 
KS Statistic..........    0.29933          0.072727 


Before Matching Minimum p.value: < 2.22e-16 
Variable Name(s): totchi  Number(s): 8 

After Matching Minimum p.value: 0.25628 
Variable Name(s): totchi  Number(s): 8 

We find balance for all covariates!
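If you want to double-check this visually, you could reuse the love plot syntax from Exercise 23 for the 0.25-caliper match. A minimal sketch:

# balance plot for the 0.25-caliper match, analogous to Exercise 23
love.plot(bal.tab(match.ps.cal25, treat ~ repub + srvlng + female + white + age +
                    demvote + reld3 + totchi, data = daughters),
          stat = "mean.diffs", threshold = .1, var.order = "unadjusted", abs = T)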


Mahalanobis Distance

One of the most common methods of multivariate matching is based on Mahalanobis distance.

Like propensity score matching, Mahalanobis distance matching is built on a notion of distance between observations' pretreatment covariates. It uses the complete variance-covariance matrix, which means that the relationships between the variables are taken into account: the contribution of two highly correlated covariates to the distance calculation is lower than that of less correlated ones. In essence, Mahalanobis distance matching scales the distance by the inverse of the covariance matrix. Here is the mathematical formula for the Mahalanobis distance:

\[MD(X_i, X_j) = \sqrt{(X_i - X_j)'\Sigma^{-1}(X_i - X_j)}\]

For an exact match, \[MD(X_i, X_j) = 0\]
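To make the formula concrete, here is a minimal sketch using base R's mahalanobis() function, which returns the squared distance. The objects X, x_i, x_j and Sigma are simulated placeholders, not variables from the daughters data:

# simulate a small covariate matrix: 50 observations, 4 covariates
set.seed(123)
X <- matrix(rnorm(200), ncol = 4)
Sigma <- cov(X)          # variance-covariance matrix of the covariates
x_i <- X[1, ]            # covariate vector of observation i
x_j <- X[2, ]            # covariate vector of observation j

# mahalanobis() returns the squared distance, so take the square root
md <- sqrt(mahalanobis(x_i, center = x_j, cov = Sigma))
md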

Exercise 26: Conduct matching using the Mahalanobis distance as the matching algorithm. First, create a data frame that only includes the covariates of the daughters dataset. There are different ways to do this. As we did before, you can use square brackets: in the second argument of the square brackets (after the comma), add the concatenate function c() with the column numbers of all the variables you want to remove, putting a negative sign - before each column number. For example, to remove the first variable, you would put -1. Finally, assign the result to a new data frame using the assignment operator <-. An example would be: df.cov <- df[, c(-1, -2, -3)]. Alternatively, you could use select() from the dplyr package. The syntax is the following:

df_cov <- df %>%
  select(-c(var1, var2, var3))

Exercise 27: Use the Match() function to conduct matching using the Mahalanobis distance. Set the Y argument equal to our outcome of interest, daughters$nowtot. Set the treatment assignment argument Tr equal to daughters$treat. Set the X argument equal to the data frame that only includes the covariates, and set the Weight argument equal to 2 so that Match() uses the Mahalanobis distance metric. What is the ATT using Mahalanobis matching?


Reveal Answer

daughters_cov <- daughters %>%
  dplyr::select(-c(repub, ngirls, aauw, rtl, nboys, nowtot, propgirls))

# Weight = 2 tells Match() to use the Mahalanobis distance metric
match.maha <- Match(Y = daughters$nowtot, Tr = daughters$treat, X = daughters_cov,
                    BiasAdjust = F, estimand = "ATT", M = 1, Weight = 2)

summary(match.maha)

Estimate...  -0.59295 
AI SE......  5.1488 
T-stat.....  -0.11516 
p.val......  0.90832 

Original number of observations..............  371 
Original number of treated obs...............  312 
Matched number of observations...............  312 
Matched number of observations  (unweighted).  312 

Mahalanobis matching does seem to suffer from the same problem we encountered above. Its point estimate (-0.59) suggests that, given the rather small pool of control observations, it matches observations that are actually rather poor matches.


And now again, let’s check for balance:

Exercise 28: Check for balance using the data set that only includes the covariates. Remember to store the output from the Match() function and pass it to the match.out argument.


Reveal Answer

bal.maha <- MatchBalance(daughters$treat ~ ., match.out = match.maha, data = daughters_cov)


Again, we find imbalance for the totchi variable.

Exercise 29: As a final exercise, produce the balance plot using love.plot(bal.tab(....)) as we did before.


Reveal Answer

love.plot(bal.tab(match.maha, treat ~ ., data = daughters_cov), stat = "mean.diffs",
          threshold = 0.1, var.order = "unadjusted", abs = T)


Below is a table that summarises the results from all the matching techniques and the multivariate regression. As we can see, caliper matching did a relatively good job compared to the other matching techniques, if we assume there is a true effect of about the size of the OLS estimate. Pay attention to the order of observations and the variance-bias trade-off when doing matching. Often, comparing the output of different estimators will give you an idea of what is 'wrong' with your data if you get vastly divergent results.

Matching technique    Estimate
Multivariate           4.09
NN without rep        -0.18
NN with rep           -3.95
Caliper (0.1)          4.73
Caliper (0.25)         4.63
Mahalanobis           -0.59
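If you want to pull these numbers together programmatically, here is a minimal sketch using the $est component that Match() objects store. Only the objects created in this part of the lab are shown; add your NN matching objects from earlier in the same way:

# collect point estimates from the matching objects created above
estimates <- c("Caliper (0.1)"  = match.ps.cal0.1$est,
               "Caliper (0.25)" = match.ps.cal25$est,
               "Mahalanobis"    = match.maha$est)
round(estimates, 2)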

Copyright © 2022 Felipe Torres Raposo & Kenneth Stiller.