Randomization Inference for Differences-in-Differences with Few Treated Clusters
Inference using difference-in-differences with clustered data requires care. Previous research has shown that, when there are few treated clusters, t tests based on a cluster-robust variance estimator (CRVE) severely over-reject, different variants of the wild cluster bootstrap can over-reject or under-reject dramatically, and procedures based on randomization inference show promise. We demonstrate that randomization inference (RI) procedures based on estimated coefficients, such as the one proposed by Conley and Taber (2011), fail whenever the treated clusters are atypical. We propose an RI procedure based on t statistics that fails only when the treated clusters are both atypical and few in number. We also propose a bootstrap-based alternative to randomization inference, which mitigates the discrete nature of RI P values when the number of clusters is small. Two empirical examples demonstrate that alternative procedures can yield dramatically different inferences.
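
To make the discreteness of RI P values concrete, the following is a minimal sketch of coefficient-based randomization inference on cluster-level data. It is a deliberately simplified illustration and not the procedure proposed in this paper: the function names, the reduction of each cluster to a single pre-to-post change, and the numbers are all hypothetical assumptions.

```python
from itertools import combinations
from statistics import mean

def did_estimate(changes, treated):
    """Mean pre-to-post change of treated clusters minus that of controls."""
    treated = set(treated)
    t = [c for g, c in enumerate(changes) if g in treated]
    u = [c for g, c in enumerate(changes) if g not in treated]
    return mean(t) - mean(u)

def ri_p_value(changes, treated):
    """Share of treatment reassignments with |estimate| >= the actual |estimate|.

    Returns the P value and the number of possible reassignments.
    """
    actual = abs(did_estimate(changes, treated))
    # Enumerate every way of assigning treatment to the same number of clusters.
    assignments = list(combinations(range(len(changes)), len(treated)))
    # Small tolerance so the actual assignment always counts as "at least as extreme".
    extreme = sum(abs(did_estimate(changes, a)) >= actual - 1e-12
                  for a in assignments)
    return extreme / len(assignments), len(assignments)

# Hypothetical pre-to-post changes for 8 clusters; cluster 6 is the treated one.
changes = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, 1.5, -0.3]
p, n = ri_p_value(changes, treated=[6])
print(f"P value = {p:.3f} from {n} possible assignments")
```

With one treated cluster out of eight there are only eight possible assignments, so the P value can never fall below 1/8 no matter how large the estimated effect is. Under these assumptions, that is the sense in which RI P values are discrete when clusters are few.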