import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def leave_one_out_importance(estimator, X, a, y):
    """Refit `estimator` on all covariates ("full") and once per left-out covariate,
    recording outcome-prediction metrics for `y` given the covariates and exposure `a`."""
    results = []
    for col in ["full"] + X.columns.tolist():
        # "full" matches no column, so errors="ignore" keeps the complete covariate set
        curX = X.drop(columns=col, errors="ignore")
        curXa = curX.join(a)
        estimator.fit(curXa, y)
        y_pred = estimator.predict(curXa)
        result = {
            "covariate": col,
            "r2": r2_score(y, y_pred),
            "mse": mean_squared_error(y, y_pred),
            "mae": mean_absolute_error(y, y_pred),
        }
        results.append(result)
    results = pd.DataFrame(results)
    return results


def relative_explained_variation(estimator, X, a, y, metric="mse"):
    """Harrell: https://www.fharrell.com/post/addvalue/"""
    importance = leave_one_out_importance(estimator, X, a, y)
    importance = importance.set_index("covariate")
    importance = importance / importance.loc["full"]
    importance = importance.drop(index="full")
    # importance = importance[metric]
    return importance


def decrease_in_explain_variation(estimator, X, a, y, metric="mse"):
    """https://stackoverflow.com/q/31343563"""
    importance = leave_one_out_importance(estimator, X, a, y)
    importance = importance.set_index("covariate")
    importance = (importance.loc["full"] - importance) / importance.loc["full"]
    importance = importance.drop(index="full")
    # importance = importance[metric]
    importance = importance.abs()
    return importance
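Concretely, the `decrease_in_explain_variation` score reported for covariate \(j\) under a metric \(m\) (here MSE, MAE, or \(R^2\)) is

\[ \mathrm{importance}_j = \left| \frac{m_{\mathrm{full}} - m_{-j}}{m_{\mathrm{full}}} \right|, \]

where \(m_{\mathrm{full}}\) is the metric of the model fit on all covariates plus the exposure and \(m_{-j}\) is the metric with covariate \(j\) left out; `relative_explained_variation` instead reports the ratio \(m_{-j} / m_{\mathrm{full}}\).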
In [10]:
# i = leave_one_out_importance(LinearRegression(), X, a, y)
# i = i.set_index("covariate")
# i
# relative_explained_variation(LinearRegression(), X, a, y)
feature_importance = decrease_in_explain_variation(LinearRegression(), X, a, y)
feature_importance
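Here `X` (covariates), `a` (exposure), and `y` (outcome) are defined earlier in the notebook. For a self-contained illustration, a minimal sketch with synthetic data (the column names and coefficients below are made up for this example) could look like:

# Hypothetical synthetic data, only to make this cell runnable in isolation.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame(rng.normal(size=(n, 3)), columns=["x1", "x2", "x3"])
a = pd.Series(rng.binomial(1, 0.5, size=n), name="a")
y = (2.0 * X["x1"] + 0.5 * X["x2"] + 1.5 * a + rng.normal(size=n)).rename("y")

feature_importance = decrease_in_explain_variation(LinearRegression(), X, a, y)
# x1 should show the largest mean decrease in MSE; x3 a decrease near zero.
print(feature_importance)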
Figure 1: Encoding covariate-outcome importance information as marker opacity, marker size, and covariate order. Opacity and size make less important covariates less salient by rendering them more transparent or smaller, respectively. Y-axis order moves less important covariates further down the plot, concentrating the more important covariates in one region of the figure.

Figure 2: Outcome-informed Love plot. On the left, a standard Love plot: covariates on the y-axis sorted by unadjusted ASMD, ASMD on the x-axis, unadjusted ASMDs as orange triangles and inverse-propensity-weighted ASMDs as blue circles. On the right, an outcome-informed Love plot, encoding the importance of each covariate for outcome prediction in three visual channels: size (less important covariates are smaller), opacity (less important covariates are more transparent), and y-axis order (less important covariates are further down the panel). Outcome-informed Love plots reduce clutter and highlight the more important covariates on which to examine balance.

Figure 3: Outcome-informed ASMD score. A) Inverse-propensity-weighted ASMD, with a 0.1 threshold reference (dashed). B) Covariate-outcome importance score (mean decrease in mean squared error). C) Outcome-informed ASMD, generated by multiplying the two scores above. The ASMD emphasizes covariate-exposure associations (\(X_A, X_{AY}\)), the outcome-importance score emphasizes covariate-outcome associations (\(X_Y, X_{AY}\)), and the outcome-informed ASMD emphasizes their interaction (\(X_{AY}\)).

Figure 4: Love plot augmented by outcome-informed ASMD. On the left, an outcome-informed Love plot augmented by covariate-outcome importance (similar to Figure 6, right, but without ordering). On the right, an outcome-informed Love plot augmented by the combined outcome-importance ASMD score. While the former may overemphasize purely prognostic variables (\(X_Y\)), the latter downweights them and instead emphasizes confounders that have both large covariate-outcome importance and large ASMD.

Figure 5: Encoding covariate-outcome importance information as marker opacity, marker size, and covariate order. Opacity and size make less important covariates less salient by rendering them more transparent or smaller, respectively. Y-axis order moves less important covariates further down the plot, concentrating the more important covariates in one region of the figure.

Figure 6: Outcome-informed Love plot. On the left, a standard Love plot: covariates on the y-axis sorted by unadjusted ASMD, ASMD on the x-axis, unadjusted ASMDs as orange triangles and inverse-propensity-weighted ASMDs as blue circles. On the right, an outcome-informed Love plot, encoding the combined importance of covariates for outcome prediction and exposure imbalance in three visual channels: size (less important covariates are smaller), opacity (less important covariates are more transparent), and y-axis order (less important covariates are further down the panel). Outcome-informed Love plots reduce clutter and highlight the more important covariates on which to examine balance.
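The combined score in Figure 3C is just the element-wise product of the weighted ASMD and the covariate-outcome importance, and Figures 1, 2, 5, and 6 encode importance as marker size, opacity, and y-axis order. A rough sketch of both ideas, assuming hypothetical `asmd_unweighted` and `asmd_weighted` Series indexed by covariate name (aligned with `feature_importance`); this is not the notebook's actual plotting code:

import matplotlib.pyplot as plt

importance = feature_importance["mse"]              # mean decrease in MSE per covariate
outcome_informed_asmd = asmd_weighted * importance  # Figure 3C: element-wise product

# Encode importance as marker size and opacity, and sort the y-axis by it (Figures 1/5).
order = importance.sort_values().index
sizes = 20 + 200 * importance.loc[order] / importance.max()
alphas = 0.2 + 0.8 * importance.loc[order] / importance.max()

fig, ax = plt.subplots()
for i, cov in enumerate(order):
    ax.scatter(asmd_unweighted.loc[cov], i, marker="^", s=sizes.loc[cov],
               alpha=alphas.loc[cov], color="tab:orange")
    ax.scatter(asmd_weighted.loc[cov], i, marker="o", s=sizes.loc[cov],
               alpha=alphas.loc[cov], color="tab:blue")
ax.set_yticks(range(len(order)))
ax.set_yticklabels(order)
ax.axvline(0.1, linestyle="--", color="grey")  # 0.1 ASMD reference threshold
ax.set_xlabel("ASMD")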