Our goal is to identify the average treatment effect on the treated (ATT), for cohort g at event time e ≡ t − g, which is defined by:
$$ \text{ATT}_{g,e} \equiv \mathbb{E}[Y_{i,g+e}(g) - Y_{i,g+e}(\infty) \mid G_i = g] $$
We may also be interested in the average ATT across treated cohorts for a given event time:
$$ \text{ATT}_{e} \equiv \sum_g \omega_{g,e} \text{ATT}_{g,e}, \quad \omega_{g,e} \equiv \frac{\sum_i 1\{G_i=g\}}{\sum_i 1\{G_i < \infty\}} $$
Lastly, we may be interested in the average across certain event times of the average ATT across cohorts:
$$ \text{ATT}_{E} \equiv \frac{1}{|E|} \sum_{e \in E} \text{ATT}_{e} $$
where $E$ is a set of event times, e.g., $E = \{1, 2, 3\}$.
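The two aggregation formulas above can be illustrated with a short sketch. It is not part of DiDforBigData: the function names, the use of `math.inf` to mark never-treated units, and the numbers in the usage example are all hypothetical. It also assumes every treated cohort has an ATT for every event time considered, so that the weights sum to one.

```python
import math
from collections import Counter

def cohort_weights(cohorts):
    """omega_g: share of each cohort among treated units (G_i < infinity).

    cohorts: dict mapping unit id -> cohort G_i (math.inf = never treated;
             this encoding is an assumption of the sketch).
    """
    treated = [g for g in cohorts.values() if g != math.inf]
    counts = Counter(treated)
    return {g: n / len(treated) for g, n in counts.items()}

def att_by_event_time(att_ge, weights):
    """ATT_e = sum over g of omega_g * ATT_{g,e}, for each e found in att_ge."""
    att_e = {}
    for (g, e), value in att_ge.items():
        att_e[e] = att_e.get(e, 0.0) + weights[g] * value
    return att_e

def att_over_event_set(att_e, event_set):
    """ATT_E = simple average of ATT_e over the event times in E."""
    return sum(att_e[e] for e in event_set) / len(event_set)

# Hypothetical inputs: three units in cohort 2005, one in 2007, one never treated.
cohorts = {"u1": 2005, "u2": 2005, "u3": 2005, "u4": 2007, "u5": math.inf}
att_ge = {(2005, 1): 2.0, (2005, 2): 3.0, (2007, 1): 4.0, (2007, 2): 5.0}

weights = cohort_weights(cohorts)            # cohort shares among treated units
att_e = att_by_event_time(att_ge, weights)   # one value per event time
print(att_e, att_over_event_set(att_e, {1, 2}))
```

The never-treated unit does not enter the weights, which matches the denominator $\sum_i 1\{G_i < \infty\}$.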
Control group: For the treated cohort $G_i = g$, let $C_{g,e}$ denote the corresponding set of units $i$ that belong to a control group.
Base event time: We consider a reference event time $b$ from before treatment, which satisfies $b < 0$.
Difference-in-differences: The difference-in-differences estimand is defined by
$$ \text{DiD}_{g,e} \equiv \mathbb{E}[Y_{i,g+e} - Y_{i,g+b} \mid G_i = g] - \mathbb{E}[Y_{i,g+e} - Y_{i,g+b} \mid i \in C_{g,e}] $$
Throughout this section, our goal is to identify $\text{ATT}_{g,e}$ for some treated cohort $g$ and some event time $e$. We take the base event time $b < 0$ as given.
Parallel Trends:
$$ \mathbb{E}[Y_{i,g+e}(\infty) - Y_{i,g+b}(\infty) \mid G_i = g] = \mathbb{E}[Y_{i,g+e}(\infty) - Y_{i,g+b}(\infty) \mid i \in C_{g,e}] $$
This says that, in the absence of treatment, the treatment and control groups would have experienced the same average change in their outcomes between event time $b$ and event time $e$.
No Anticipation:
$$ \mathbb{E}[Y_{i,g+b}(g) \mid G_i = g] = \mathbb{E}[Y_{i,g+b}(\infty) \mid G_i = g] $$
This says that, at base event time $b$, the observed outcome for the treated cohort would have been the same if it had instead been assigned to never receive treatment.
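We prove that $\text{DiD}_{g,e}$ identifies $\text{ATT}_{g,e}$ in three steps:
Step 1: Add and subtract $Y_{i,g+b}(\infty)$ in the ATT definition:
$$ \begin{aligned} \text{ATT}_{g,e} &\equiv \mathbb{E}[Y_{i,g+e}(g) - Y_{i,g+e}(\infty) \mid G_i = g] \\ &= \mathbb{E}[Y_{i,g+e}(g) - Y_{i,g+b}(\infty) \mid G_i = g] - \mathbb{E}[Y_{i,g+e}(\infty) - Y_{i,g+b}(\infty) \mid G_i = g] \end{aligned} $$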
Step 2: Assume that Parallel Trends holds. Then, we can replace the conditioning set $G_i = g$ with the conditioning set $i \in C_{g,e}$ in the second term:
$$ \begin{aligned} \text{ATT}_{g,e} &= \mathbb{E}[Y_{i,g+e}(g) - Y_{i,g+b}(\infty) \mid G_i = g] - \mathbb{E}[Y_{i,g+e}(\infty) - Y_{i,g+b}(\infty) \mid G_i = g] \\ &= \mathbb{E}[Y_{i,g+e}(g) - Y_{i,g+b}(\infty) \mid G_i = g] - \mathbb{E}[Y_{i,g+e}(\infty) - Y_{i,g+b}(\infty) \mid i \in C_{g,e}] \end{aligned} $$
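Step 3: Assume that No Anticipation holds. Then, we can replace $Y_{i,g+b}(\infty)$ with $Y_{i,g+b}(g)$ if the conditioning set is $G_i = g$:
$$ \begin{aligned} \text{ATT}_{g,e} &= \mathbb{E}[Y_{i,g+e}(g) - Y_{i,g+b}(\infty) \mid G_i = g] - \mathbb{E}[Y_{i,g+e}(\infty) - Y_{i,g+b}(\infty) \mid i \in C_{g,e}] \\ &= \mathbb{E}[Y_{i,g+e}(g) - Y_{i,g+b}(g) \mid G_i = g] - \mathbb{E}[Y_{i,g+e}(\infty) - Y_{i,g+b}(\infty) \mid i \in C_{g,e}] \end{aligned} $$
where the final expression is $\text{DiD}_{g,e}$: the observed outcomes of cohort $g$ are $Y_{i,t}(g)$, and the observed outcomes of control units, which remain untreated at both $g+b$ and $g+e$, are taken to equal $Y_{i,t}(\infty)$.
Thus, we have shown that $\text{DiD}_{g,e} = \text{ATT}_{g,e}$ if Parallel Trends and No Anticipation hold.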
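The identification result can be checked numerically on constructed potential outcomes. The sketch below is purely illustrative and assumes a hypothetical data-generating process: untreated outcomes are a unit-specific level plus a common linear time trend (so Parallel Trends holds exactly), and treatment adds a constant effect from period $g$ onward (so No Anticipation holds). None of the function names come from DiDforBigData.

```python
from statistics import mean

def untreated(level, t):
    # Y_{i,t}(inf): unit-specific level plus a common time trend,
    # so every unit shares the same untreated change over time.
    return level + 0.5 * t

def treated(level, t, g, effect):
    # Y_{i,t}(g): identical to the untreated outcome before g
    # (no anticipation), shifted by `effect` from period g onward.
    return untreated(level, t) + (effect if t >= g else 0.0)

def att_and_did(g, e, b, effect, treated_levels, control_levels):
    """Return (ATT_{g,e}, DiD_{g,e}) for the constructed potential outcomes."""
    post, base = g + e, g + b
    # ATT uses the treated cohort's counterfactual, which is never observed in real data.
    att = mean(treated(l, post, g, effect) - untreated(l, post)
               for l in treated_levels)
    # DiD uses only outcomes that would actually be observed.
    did_treated = mean(treated(l, post, g, effect) - treated(l, base, g, effect)
                       for l in treated_levels)
    did_control = mean(untreated(l, post) - untreated(l, base)
                       for l in control_levels)
    return att, did_treated - did_control

att, did = att_and_did(g=2005, e=2, b=-1, effect=3.0,
                       treated_levels=[10.0, 12.0, 14.0],
                       control_levels=[1.0, 2.0])
print(att, did)  # the two values coincide under this construction
```

The treated and control groups have very different outcome levels here, which does not matter: only the untreated changes need to agree.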
DiDge(...) Command
$\text{DiD}_{g,e}$ is estimated in DiDforBigData by the DiDge(...) command, which is documented here.
All: The largest valid control group is $C_{g,e} \equiv \{i : G_i > \max\{g, g+e\}\}$. To use this control group, specify control_group = "all" in the DiDge(...) command. This option is selected by default.
Two alternatives can be specified.
Never-treated: The never-treated control group is defined by $C_{g,e} \equiv \{i : G_i = \infty\}$. To use this control group, specify control_group = "never-treated" in the DiDge(...) command.
Future-treated: The future-treated control group is defined by $C_{g,e} \equiv \{i : G_i > \max\{g, g+e\} \text{ and } G_i < \infty\}$. To use this control group, specify control_group = "future-treated" in the DiDge(...) command.
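The three control group definitions can be expressed as set filters. This is a standalone sketch, not the package's code: the function name `control_group`, the dictionary input, and the use of `math.inf` for never-treated units are assumptions made for illustration. Only the option strings mirror the control_group values described above.

```python
import math

def control_group(cohorts, g, e, option="all"):
    """Return the set of unit ids in C_{g,e}.

    cohorts: dict mapping unit id -> treatment cohort G_i, with math.inf
             marking never-treated units (an assumed encoding).
    """
    cutoff = max(g, g + e)
    if option == "all":
        # Every unit not yet treated by max{g, g+e}, including never-treated.
        return {i for i, G in cohorts.items() if G > cutoff}
    if option == "never-treated":
        return {i for i, G in cohorts.items() if G == math.inf}
    if option == "future-treated":
        # Not yet treated by max{g, g+e}, but treated eventually.
        return {i for i, G in cohorts.items() if cutoff < G < math.inf}
    raise ValueError(f"unknown control group option: {option!r}")

cohorts = {"a": 2005, "b": 2007, "c": 2010, "d": math.inf}
for option in ("all", "never-treated", "future-treated"):
    print(option, sorted(control_group(cohorts, g=2005, e=3, option=option)))
```

In this example, unit "b" (cohort 2007) is excluded from the "all" and "future-treated" groups for $e = 3$ because it is already treated by period $g + e = 2008$.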
Base event time: The base event time can be specified using the base_event argument in DiDge(...), where base_event = -1 by default.
The DiDge(...) command performs the following sequence of steps:
Step 1. Define the $(g,e)$-specific sample of treated and control units, $S_{g,e} \equiv \{G_i = g\} \cup \{i \in C_{g,e}\}$. Drop any observations that do not satisfy $i \in S_{g,e}$.
Step 2. Construct the within-$i$ differences $\Delta Y_{i,g+e} \equiv Y_{i,g+e} - Y_{i,g+b}$ for each $i \in S_{g,e}$.
Step 3. Estimate the simple linear regression $\Delta Y_{i,g+e} = \alpha_{g,e} + \beta_{g,e} 1\{G_i = g\} + \epsilon_{i,g+e}$ by OLS for $i \in S_{g,e}$.
The OLS estimate of $\beta_{g,e}$ is equivalent to the sample analog of $\text{DiD}_{g,e}$. The standard error provided by OLS for $\beta_{g,e}$ is equivalent to the standard error from a two-sample test of equal means for the null hypothesis $\mathbb{E}[\Delta Y_{i,g+e} \mid G_i = g] = \mathbb{E}[\Delta Y_{i,g+e} \mid i \in C_{g,e}]$. Under Parallel Trends and No Anticipation, this is equivalent to testing that $\text{ATT}_{g,e} = 0$.
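The three steps above can be sketched in a few lines. This is an illustration of the procedure as described, not the DiDge(...) implementation. The function name `did_ge`, the dictionary-based inputs, and the "all" control group are choices made here, and the standard error is the classical (homoskedastic) OLS one, which matches the pooled-variance two-sample test. The package may use a different variance estimator.

```python
import math
from statistics import mean

def did_ge(outcomes, cohorts, g, e, b=-1):
    """Estimate DiD_{g,e} and its classical OLS standard error.

    outcomes: dict mapping (unit id, period) -> observed outcome
    cohorts:  dict mapping unit id -> cohort G_i (math.inf = never treated)
    """
    cutoff = max(g, g + e)
    # Step 1: keep cohort g and the "all" control group; record the dummy 1{G_i = g}.
    dummy = {i: (1.0 if G == g else 0.0)
             for i, G in cohorts.items() if G == g or G > cutoff}
    ids = [i for i in dummy if (i, g + e) in outcomes and (i, g + b) in outcomes]
    # Step 2: within-unit differences relative to the base event time.
    d = [dummy[i] for i in ids]
    dy = [outcomes[(i, g + e)] - outcomes[(i, g + b)] for i in ids]
    # Step 3: simple linear regression of dy on the treatment dummy.
    d_bar, dy_bar = mean(d), mean(dy)
    sxx = sum((x - d_bar) ** 2 for x in d)
    beta = sum((x - d_bar) * (y - dy_bar) for x, y in zip(d, dy)) / sxx
    alpha = dy_bar - beta * d_bar
    ssr = sum((y - alpha - beta * x) ** 2 for x, y in zip(d, dy))
    se = math.sqrt(ssr / (len(ids) - 2) / sxx)
    return beta, se

# Hypothetical data: two units treated in 2005, three valid controls.
cohorts = {"t1": 2005, "t2": 2005, "c1": math.inf, "c2": math.inf, "c3": 2010}
outcomes = {("t1", 2004): 0.0, ("t1", 2006): 5.0, ("t2", 2004): 1.0, ("t2", 2006): 8.0,
            ("c1", 2004): 0.0, ("c1", 2006): 1.0, ("c2", 2004): 2.0, ("c2", 2006): 4.0,
            ("c3", 2004): 3.0, ("c3", 2006): 6.0}
print(did_ge(outcomes, cohorts, g=2005, e=1))
```

Because the only regressor is a binary indicator, the slope is the difference between the treated and control means of $\Delta Y_{i,g+e}$. The sketch requires both groups to be non-empty and at least three units in total.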
DiD(...) Command
DiDforBigData uses the DiD(...) command to estimate $\text{DiD}_{g,e}$ for all available cohorts $g$ across a range of possible event times $e$; DiD(...) is documented here.
DiD(...) uses the control_group and base_event arguments the same way as DiDge(...).
DiD(...) also uses the min_event and max_event arguments to choose the minimum and maximum event times $e$ of interest. If these arguments are not specified, it assumes all possible event times are of interest.
In practice, DiD(...) completes the following steps:
Step 1. Determine all possible combinations of $(g,e)$ available in the data. The min_event and max_event arguments allow the user to restrict the minimum and maximum event times $e$ of interest.
Step 2. In parallel, for each $(g,e)$ combination, construct the corresponding control group $C_{g,e}$ the same way as DiDge(...). Drop any $(g,e)$ combination for which the control group is empty.
Step 3. Within each $(g,e)$-specific process, define the $(g,e)$-specific sample of treated and control units, $S_{g,e} \equiv \{G_i = g\} \cup \{i \in C_{g,e}\}$. Drop any observations that do not satisfy $i \in S_{g,e}$.
Step 4. Within each $(g,e)$-specific process, construct the within-$i$ differences $\Delta Y_{i,g+e} \equiv Y_{i,g+e} - Y_{i,g+b}$ for each $i$ that remains in the sample.
Step 5. Within each $(g,e)$-specific process, estimate $\Delta Y_{i,g+e} = \alpha_{g,e} + \beta_{g,e} 1\{G_i = g\} + \epsilon_{i,g+e}$ by OLS.
The OLS estimate of $\beta_{g,e}$ is equivalent to the sample analog of $\text{DiD}_{g,e}$. The standard error provided by OLS for $\beta_{g,e}$ is equivalent to the standard error from a two-sample test of equal means for the null hypothesis $\mathbb{E}[\Delta Y_{i,g+e} \mid G_i = g] = \mathbb{E}[\Delta Y_{i,g+e} \mid i \in C_{g,e}]$. Under Parallel Trends and No Anticipation, this is equivalent to testing that $\text{ATT}_{g,e} = 0$. Note that $\text{ATT}_{g,e} = 0$ is tested as a single hypothesis for each $(g,e)$ combination; no adjustment for multiple hypothesis testing is applied.
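Steps 1 and 2 of this procedure (enumerating the $(g,e)$ combinations and dropping those with an empty control group) can be sketched as follows. This is a serial illustration rather than the package's parallel implementation. The function name `enumerate_ge` and its inputs are hypothetical, it uses the "all" control group, and it omits further checks such as whether the base period $g + b$ is observed.

```python
import math

def enumerate_ge(cohorts, periods, min_event=None, max_event=None):
    """List the (g, e) pairs that have a non-empty "all" control group.

    cohorts: dict mapping unit id -> cohort G_i (math.inf = never treated)
    periods: iterable of calendar periods observed in the data
    """
    combos = []
    treated_cohorts = sorted({G for G in cohorts.values() if G != math.inf})
    for g in treated_cohorts:
        for t in sorted(periods):
            e = t - g  # event time implied by calendar period t
            if min_event is not None and e < min_event:
                continue
            if max_event is not None and e > max_event:
                continue
            cutoff = max(g, g + e)
            # Keep (g, e) only if some unit is still untreated at the cutoff.
            if any(G > cutoff for G in cohorts.values()):
                combos.append((g, e))
    return combos

# With no never-treated units, late event times lose their control group.
cohorts = {"a": 2005, "b": 2007}
print(enumerate_ge(cohorts, range(2004, 2009)))
print(enumerate_ge(cohorts, range(2004, 2009), min_event=0))
```

In the usage example, cohort 2007 never has a valid comparison group because no unit is treated later, so all of its $(g,e)$ pairs are dropped.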
Aside from estimating each $\text{DiD}_{g,e}$, DiD(...) also estimates $\text{DiD}_e$ for each $e$ included in the event times of interest.
To do so, DiD(...) completes the following steps:
Step 1. At the end of the $(g,e)$-specific estimation in parallel described above, it returns the various $(g,e)$-specific samples of the form $S_{g,e} \equiv \{G_i = g\} \cup \{i \in C_{g,e}\}$.
Step 2. It defines an indicator for each cohort $g$, then stacks all of the samples $S_{g,e}$ that have the same $e$. Note that the same $i$ can appear multiple times due to membership in both $S_{g_1,e}$ and $S_{g_2,e}$, so the distinct observations are distinguished by the indicators for $g$.
Step 3. It estimates $\Delta Y_{i,g+e} = \sum_g \alpha_{g,e} + \sum_g \beta_{g,e} 1\{G_i = g\} + \epsilon_{i,g+e}$ by OLS for the stacked sample across $g$, with the terms indexed by $g$ understood to apply to the rows stacked from $S_{g,e}$, as marked by the indicators from Step 2.
Step 4. It constructs $\text{DiD}_e = \sum_g \omega_{g,e} \beta_{g,e}$, where $\omega_{g,e} \equiv \frac{\sum_i 1\{G_i=g\}}{\sum_i 1\{G_i < \infty\}}$. Since each $\beta_{g,e}$ is an estimate of the corresponding $\text{ATT}_{g,e}$, it follows that $\text{DiD}_e$ is an estimate of the weighted average $\text{ATT}_e \equiv \sum_g \omega_{g,e} \text{ATT}_{g,e}$.
Step 5. To test the null hypothesis that $\text{ATT}_e = 0$, it defines $\bar\beta_e = (\beta_{g,e})_g$ and $\bar\omega_e = (\omega_{g,e})_g$. Note that $\text{DiD}_e = \bar\omega_e' \bar\beta_e$. To get the standard error for $\text{DiD}_e$, it uses that $\text{Var}(\text{DiD}_e) = \bar\omega_e' \text{Var}(\bar\beta_e) \bar\omega_e$, where $\text{Var}(\bar\beta_e)$ is the usual (heteroskedasticity-robust) variance-covariance matrix of the OLS coefficients. Since the same unit $i$ appears on multiple rows of the stacked sample, we must cluster on $i$ when estimating $\text{Var}(\bar\beta_e)$. Finally, the standard error corresponding to the null hypothesis of $\text{ATT}_e = 0$ is $\sqrt{\text{Var}(\text{DiD}_e)}$.
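The quadratic form in Step 5 is simple to compute once the coefficient vector, the weights, and the clustered variance-covariance matrix are available. The sketch below takes all three as given inputs. It does not estimate the clustered matrix, and the function name and the numbers in the usage example are hypothetical.

```python
import math

def did_e_with_se(beta, omega, vcov):
    """Return (DiD_e, standard error) for one event time e.

    beta:  dict mapping cohort g -> estimated beta_{g,e}
    omega: dict mapping cohort g -> weight omega_{g,e}
    vcov:  dict mapping (g, h) -> Cov(beta_{g,e}, beta_{h,e}), assumed to be
           the cluster-robust matrix described in Step 5
    """
    cohorts = sorted(beta)
    estimate = sum(omega[g] * beta[g] for g in cohorts)
    # Var(DiD_e) = omega' V omega, written out as a double sum.
    variance = sum(omega[g] * vcov[(g, h)] * omega[h]
                   for g in cohorts for h in cohorts)
    return estimate, math.sqrt(variance)

# Hypothetical inputs for two cohorts at a single event time.
beta = {2005: 2.0, 2007: 4.0}
omega = {2005: 0.75, 2007: 0.25}
vcov = {(2005, 2005): 1.0, (2005, 2007): 0.5,
        (2007, 2005): 0.5, (2007, 2007): 4.0}
print(did_e_with_se(beta, omega, vcov))
```

The off-diagonal entries matter: they are generally nonzero because the same control units enter several cohort-specific samples, which is why clustering on $i$ is needed.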
A similar approach is used to estimate $\text{DiD}_E$, the average $\text{DiD}_e$ across a set of event times $E$. It again uses the fact that these average DiD parameters can be represented as a linear combination of the OLS coefficients $\beta_{g,e}$ with appropriate weights, which yields the standard error for the estimate of $\text{ATT}_E$.
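The linear-combination representation can be written out from the definitions of $\text{ATT}_E$ and $\text{DiD}_e$ given earlier. The stacked vectors $\bar\beta_E$ and $\bar\omega_E$ below are notation introduced here for illustration and do not come from the package documentation:

$$ \text{DiD}_E = \frac{1}{|E|} \sum_{e \in E} \text{DiD}_e = \sum_{e \in E} \sum_g \frac{\omega_{g,e}}{|E|} \beta_{g,e} = \bar\omega_E' \bar\beta_E, \qquad \text{Var}(\text{DiD}_E) = \bar\omega_E' \text{Var}(\bar\beta_E) \bar\omega_E $$

Here $\bar\beta_E = (\beta_{g,e})_{g, e \in E}$ and $\bar\omega_E = (\omega_{g,e} / |E|)_{g, e \in E}$. Since the same unit $i$ can now appear in samples for different $e$ as well as different $g$, $\text{Var}(\bar\beta_E)$ would presumably again need to be clustered on $i$.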