Biased and Debiased Machine Learning in Causal Inference
The application of machine learning methods to the estimation of treatment effects is a burgeoning area of causal inference. The regression function and the propensity score used to estimate the average treatment effect can themselves be estimated with machine learning methods such as random forests. However, naive application of machine learning methods to estimate treatment effects leads to biased estimates due to regularization bias. The proof below examines the asymptotic behavior of the naive estimator.
Proof Process
Define the Model and Estimator:
- Assume a partially linear regression model $Y = D\theta_0 + g_0(X) + U$, where $Y$ is the outcome, $D$ is the treatment indicator, and $X$ represents covariates, with $E[U \mid X, D] = 0$.
- The regression function $g_0(X)$ and the propensity score $m_0(X) = E[D \mid X]$ (so that $D = m_0(X) + V$ with $E[V \mid X] = 0$) are modeled non-parametrically using machine learning methods.
Assumptions:
- Treatment assignment is unconfounded given the covariates $X$.
- The residuals (errors) $U$ and $V$ satisfy the usual assumptions (e.g., mean zero, finite variance).
Estimator Definition:
- Split the data into two groups, a main group and an auxiliary group.
- Estimate $g_0$ using the second (auxiliary) group with a machine learning method, yielding $\hat{g}$.
- Regress $Y_i - \hat{g}(X_i)$ on $D_i$ in the first (main) group to obtain the estimator $\hat{\theta}$ for the treatment effect.
Rewrite the Estimator:
Express the estimator in a form that separates the effect of $\hat{g}$ and the residuals:

$$\hat{\theta} = \Big(\frac{1}{n}\sum_{i \in I} D_i^2\Big)^{-1} \frac{1}{n}\sum_{i \in I} D_i \big(Y_i - \hat{g}(X_i)\big),$$

where $I$ indexes the main group.
Decompose into Asymptotic Terms:
Decompose $\sqrt{n}(\hat{\theta} - \theta_0)$ into terms that reveal the contributions of the residuals and of the estimation error from $\hat{g}$:

$$\sqrt{n}(\hat{\theta} - \theta_0) = \Big(\frac{1}{n}\sum_{i \in I} D_i^2\Big)^{-1} \frac{1}{\sqrt{n}}\sum_{i \in I} D_i U_i \;+\; \Big(\frac{1}{n}\sum_{i \in I} D_i^2\Big)^{-1} \frac{1}{\sqrt{n}}\sum_{i \in I} D_i \big(g_0(X_i) - \hat{g}(X_i)\big) =: a + b.$$
Analyze Asymptotic Distribution:
- First Term: Under the usual regularity conditions, the term $a$ involving the residuals $U_i$ converges in distribution to a normal distribution.
- Second Term: The term $b$ involving $g_0(X_i) - \hat{g}(X_i)$ represents the bias due to regularization in the machine learning method. This term does not have mean zero and, because $\hat{g}$ converges at a rate slower than $1/\sqrt{n}$, it can diverge to infinity.
Conclusion on Bias:
- The second term introduces bias because the machine learning method’s regularization bias does not vanish asymptotically.
- This results in the estimator being asymptotically biased, even if the first term converges to a normal distribution.
Detailed Example
Model Setup:
- Let $Y_i = D_i\theta_0 + g_0(X_i) + U_i$, with $D_i = m_0(X_i) + V_i$.
- Estimate $g_0$ using a machine learning method (e.g., random forest).
Estimator:
Split data into two parts.
Use part one to estimate $\hat{g}$.
Use part two to calculate $\hat{\theta}$:

$$\hat{\theta} = \Big(\sum_{i \in I_2} D_i^2\Big)^{-1} \sum_{i \in I_2} D_i \big(Y_i - \hat{g}(X_i)\big),$$

where $I_2$ indexes part two.
Asymptotic Analysis:
Rewrite the estimator to separate terms:

$$\sqrt{n}(\hat{\theta} - \theta_0) = \Big(\frac{1}{n}\sum_i D_i^2\Big)^{-1} \frac{1}{\sqrt{n}}\sum_i D_i U_i \;+\; \Big(\frac{1}{n}\sum_i D_i^2\Big)^{-1} \frac{1}{\sqrt{n}}\sum_i D_i \big(g_0(X_i) - \hat{g}(X_i)\big).$$
Identify Bias Term:
- The second term, involving $g_0(X_i) - \hat{g}(X_i)$, represents the bias due to regularization in the machine learning method.
Conclusion:
- The bias term does not converge to zero, leading to an asymptotically biased estimator.
- This illustrates why naive application of machine learning methods without addressing regularization bias can lead to incorrect estimates of treatment effects.
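The conclusion above can be illustrated with a small simulation. The following sketch is a hypothetical numpy-only example: heavily penalized ridge regression stands in for a regularized machine learning estimate of $\hat{g}$, and the data-generating process (true effect $\theta_0 = 1$, treatment confounded by $X$) is invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta0 = 4000, 1.0

# Partially linear model: Y = D*theta0 + g0(X) + U, with D = m0(X) + V.
X = rng.normal(size=n)
D = X + rng.normal(size=n)                    # treatment is confounded by X
Y = D * theta0 + 2.0 * X + rng.normal(size=n)  # g0(X) = 2X

# Split: estimate g_hat on the auxiliary half, estimate theta on the main half.
aux, main = np.arange(n // 2), np.arange(n // 2, n)

def ridge_fit(x, y, alpha):
    """1-D ridge regression of y on x (stand-in for a regularized ML method)."""
    Z = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.solve(Z.T @ Z + alpha * np.eye(2), Z.T @ y)
    return lambda x_new: beta[0] + beta[1] * x_new

# Heavy regularization shrinks g_hat toward zero -> regularization bias.
g_hat = ridge_fit(X[aux], Y[aux], alpha=1e6)

resid = Y[main] - g_hat(X[main])
theta_naive = (D[main] @ resid) / (D[main] @ D[main])
print(theta_naive)  # far from theta0 = 1: the shrunken g_hat leaves g0(X) in the residual
```

Because the shrunken $\hat{g}$ barely removes $g_0(X)$, the residual stays correlated with the confounded treatment and the naive estimate is pushed well away from the truth.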
Steps to Obtain a Debiased Estimator
To obtain a debiased estimator using machine learning methods, we follow a systematic approach that addresses the regularization bias inherent in machine learning models. Here is a detailed process:
1. Set Up the Problem
Define the outcome $Y$, the treatment indicator $D$, and the covariates $X$. The objective is to estimate the average treatment effect (ATE) $\theta_0$.
2. Split the Data
Divide the dataset into two parts to avoid overfitting and ensure valid inference:
- Part 1: Used to estimate the propensity score.
- Part 2: Used to estimate the regression function.
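A minimal sketch of the split (hypothetical numpy example; the random permutation simply guards against any ordering in the dataset):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

idx = rng.permutation(n)      # shuffle observation indices
part1 = idx[: n // 2]         # Part 1: used for the propensity score
part2 = idx[n // 2 :]         # Part 2: used for the regression function
```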
3. Estimate the Propensity Score
Using Part 1 of the data, estimate the propensity score $m_0(X) = E[D \mid X]$, obtaining $\hat{m}$. This can be done using a machine learning model such as logistic regression, random forests, or other methods.
4. Estimate the Regression Function
Using Part 2 of the data, estimate the regression function of the outcome on the covariates, obtaining $\hat{g}$, using a machine learning model such as random forests, gradient boosting machines, or any other suitable method.
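Steps 3 and 4 can be sketched together as follows (hypothetical numpy example: ridge regression is a stand-in for both machine learning learners, the data-generating process is invented, and the treatment here is continuous, so the "propensity score" is the treatment regression $E[D \mid X]$):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))
D = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)           # D = m0(X) + V
Y = D * 1.0 + X @ np.array([2.0, 0.0, 1.0]) + rng.normal(size=n)

part1, part2 = np.arange(n // 2), np.arange(n // 2, n)

def ridge(X_train, y_train, alpha=1.0):
    """Ridge regression with intercept; stand-in for any ML learner."""
    Z = np.column_stack([np.ones(len(X_train)), X_train])
    beta = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ y_train)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta

m_hat = ridge(X[part1], D[part1])   # Step 3: propensity score on Part 1
g_hat = ridge(X[part2], Y[part2])   # Step 4: regression function on Part 2
```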
5. Calculate Residuals
For the treated and control groups in Part 2 of the data, calculate the residuals: the treatment residual $\hat{V}_i = D_i - \hat{m}(X_i)$ and the outcome residual $Y_i - \hat{g}(X_i)$.
6. Debiasing Step
Estimate the treatment effect using the residuals and propensity scores. Calculate the debiased estimate by adjusting for the propensity score, i.e., by regressing the outcome residual on the treatment residual:

$$\check{\theta} = \Big(\frac{1}{n}\sum_i \hat{V}_i^2\Big)^{-1} \frac{1}{n}\sum_i \hat{V}_i \big(Y_i - \hat{g}(X_i)\big).$$
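Steps 2 through 6 combined into one runnable sketch (hypothetical numpy example with $\theta_0 = 1$; ridge regression stands in for the ML learners, and both nuisance functions are fit on Part 1 so that the Part 2 residuals are out of sample):

```python
import numpy as np

rng = np.random.default_rng(7)
n, theta0 = 4000, 1.0
X = rng.normal(size=(n, 3))
D = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)              # D = m0(X) + V
Y = D * theta0 + X @ np.array([2.0, 0.0, 1.0]) + rng.normal(size=n)

part1, part2 = np.arange(n // 2), np.arange(n // 2, n)

def ridge(X_train, y_train, alpha=1.0):
    """Ridge regression with intercept; stand-in for any ML learner."""
    Z = np.column_stack([np.ones(len(X_train)), X_train])
    beta = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ y_train)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta

m_hat = ridge(X[part1], D[part1])          # propensity score
g_hat = ridge(X[part1], Y[part1])          # outcome regression

V = D[part2] - m_hat(X[part2])             # treatment residual
W = Y[part2] - g_hat(X[part2])             # outcome residual

# Debiased estimate: regress the outcome residual on the treatment residual.
theta_check = (V @ W) / (V @ V)
print(theta_check)  # close to theta0 = 1
```

Because the residual-on-residual regression uses an orthogonalized score, first-order errors in $\hat{m}$ and $\hat{g}$ cancel instead of accumulating.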
7. Variance Estimation
Estimate the variance of the debiased estimator to construct confidence intervals. This step involves calculating the standard error of the debiased estimate, $\mathrm{se}(\check{\theta}) = \hat{\sigma}/\sqrt{n}$, where

$$\hat{\sigma}^2 = \frac{\frac{1}{n}\sum_i \hat{\psi}_i^2}{\big(\frac{1}{n}\sum_i \hat{V}_i^2\big)^2}, \qquad \hat{\psi}_i = \hat{V}_i \big(Y_i - \hat{g}(X_i) - \check{\theta}\hat{V}_i\big).$$
8. Construct Confidence Intervals
Using the standard error, construct the confidence interval for the treatment effect estimate, e.g., the 95% interval $\check{\theta} \pm 1.96 \cdot \mathrm{se}(\check{\theta})$.
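Steps 7 and 8 extend the same ridge-based sketch (hypothetical numpy example): the standard error is computed from the estimated score $\hat{\psi}_i$ and plugged into a normal-approximation interval.

```python
import numpy as np

rng = np.random.default_rng(7)
n, theta0 = 4000, 1.0
X = rng.normal(size=(n, 3))
D = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)              # D = m0(X) + V
Y = D * theta0 + X @ np.array([2.0, 0.0, 1.0]) + rng.normal(size=n)

part1, part2 = np.arange(n // 2), np.arange(n // 2, n)

def ridge(X_train, y_train, alpha=1.0):
    Z = np.column_stack([np.ones(len(X_train)), X_train])
    beta = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ y_train)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta

m_hat = ridge(X[part1], D[part1])
g_hat = ridge(X[part1], Y[part1])

V = D[part2] - m_hat(X[part2])             # treatment residual
W = Y[part2] - g_hat(X[part2])             # outcome residual
theta_check = (V @ W) / (V @ V)            # debiased estimate

# Step 7: sandwich variance from the estimated orthogonal score psi_i.
psi = V * (W - theta_check * V)
J = np.mean(V * V)
se = np.sqrt(np.mean(psi ** 2) / J ** 2 / len(part2))

# Step 8: normal-approximation 95% confidence interval.
ci_low, ci_high = theta_check - 1.96 * se, theta_check + 1.96 * se
print(f"theta = {theta_check:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```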