Improving Inverse Probability Weighting by Post-calibrating Its Propensity Scores

causal inference

Calibrating propensity scores can improve effect estimation for expressive, non-linear statistical estimators.


Rom Gutman

Technion - Israel Institute of Technology

Ehud Karavani

IBM Research

Yishai Shimoni

IBM Research


April 15, 2024


Theoretical guarantees for causal inference using propensity scores are partially based on the scores behaving like conditional probabilities. However, scores between zero and one do not necessarily behave like probabilities, especially when output by flexible statistical estimators. We perform a simulation study to assess the error in estimating the average treatment effect before and after applying a simple and well-established postprocessing method to calibrate the propensity scores. We observe that post-calibration reduces the error in effect estimation and that larger improvements in calibration result in larger improvements in effect estimation. Specifically, expressive tree-based estimators, which are often poorly calibrated initially, tend to show larger improvements than logistic regression-based models. Given the improvement in effect estimation and that post-calibration is computationally cheap, we recommend its adoption when modeling propensity scores with expressive models.
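The pipeline the abstract describes (fit an expressive propensity model, post-calibrate its scores, then estimate the effect by inverse probability weighting) can be sketched as follows. This is an illustrative Python/scikit-learn example on simulated data, not the paper's code; cross-fitted isotonic regression via `CalibratedClassifierCV` is assumed here as one common choice of "simple and well-established" postprocessing, and the simulated effect size (2.0) is arbitrary.

```python
# Illustrative sketch (assumed setup, not the paper's code):
# post-calibrate an expressive propensity model before IPW.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))

# Treatment assignment depends on covariates (confounding).
p_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
a = rng.binomial(1, p_true)

# Outcome with a constant treatment effect of 2.0 (arbitrary choice).
y = 2.0 * a + X[:, 0] + rng.normal(size=n)

# Expressive tree-based propensity model, post-calibrated with
# cross-fitted isotonic regression (one standard calibration method).
cal_model = CalibratedClassifierCV(
    GradientBoostingClassifier(random_state=0), method="isotonic", cv=5
).fit(X, a)
ps = cal_model.predict_proba(X)[:, 1]
ps = np.clip(ps, 1e-3, 1 - 1e-3)  # guard against extreme weights

# Inverse probability weighting for the average treatment effect (ATE).
w = a / ps + (1 - a) / (1 - ps)
ate = (np.average(y[a == 1], weights=w[a == 1])
       - np.average(y[a == 0], weights=w[a == 0]))
print(round(ate, 2))  # should land near the simulated effect of 2.0
```

The unweighted difference in means is biased upward here because X[:, 0] drives both treatment and outcome; weighting by the calibrated scores removes most of that bias.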

Of note: post-calibration procedures are now available in the widely used R package WeightIt (v1.0.0).


@article{gutman2024improving,
  title={Improving Inverse Probability Weighting by Post-calibrating Its Propensity Scores},
  author={Gutman, Rom and Karavani, Ehud and Shimoni, Yishai},
  year={2024}
}