EconML includes interpretability tools for better understanding treatment effects.
The official Interpretability docs describe it as follows:
SingleTreeCateInterpreter trains a single shallow decision tree for the treatment effect θ(X) you learned from any of our available CATE estimators on a small set of feature X that you are interested to learn heterogeneity from. The model will split on the cutoff points that maximize the treatment effect difference in each leaf. Finally each leaf will be a subgroup of samples that respond to a treatment differently from other leaves.
Treatment effects can be complex, but we are often interested in simple rules that distinguish which users respond positively to a proposed change, which stay neutral, and which respond negatively.
EconML's SingleTreeCateInterpreter provides interpretability by training a single decision tree on the treatment effects output by any EconML estimator.
from econml.cate_interpreter import SingleTreeCateInterpreter
import matplotlib.pyplot as plt

# fit a shallow tree to the CATE estimates of the fitted estimator `est`
intrp = SingleTreeCateInterpreter(include_model_uncertainty=True, max_depth=2, min_samples_leaf=10)
intrp.interpret(est, X_test)
# plot the learned tree
plt.figure(figsize=(25, 5))
intrp.plot(feature_names=X.columns, fontsize=12)
In the figure below, we can see that users in dark red (income < 0.48) respond strongly to the discount, while users in white respond only mildly.
Differences between SingleTreeCateInterpreter and SingleTreePolicyInterpreter:
- the former splits the population by treatment effect, so that the gaps between subgroups are as large as possible;
- the latter finds the boundary between who should and who should not receive a coupon.
For interpreting this model, see the Interpretability docs; it finds the groups that should, or should not, receive a coupon:
Instead of fitting a tree to learn groups that have a different treatment effect (which is what SingleTreeCateInterpreter in the previous block does), SingleTreePolicyInterpreter tries to split the samples into different treatment groups. So in the case of binary treatments it tries to create sub-groups such that all samples within the group have either all positive effect or all negative effect. Thus it tries to separate responders from non-responders, as opposed to trying to find groups that have different levels of response. This way you can construct an interpretable personalized policy where you treat the groups with a positive effect and don't treat the group with a negative effect. Our policy tree provides the recommended treatment at each leaf node.
We would like to make policy decisions that maximize revenue rather than demand. In this scenario, revenue is computed as Rev = demand · price; as the price decreases, revenue increases only when θ(X) + 1 < 0. We therefore set sample_treatment_costs=-1 here, to learn which kind of customers we should give a small discount to in order to maximize revenue. The econml docs describe sample_treatment_costs as:
sample_treatment_costs is the cost of treatment. Policy will treat if effect is above this cost.
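To make the θ(X) + 1 < 0 condition concrete, here is a short derivation under this case study's log-log demand model (a sketch; f(X) denotes the baseline log-demand term):

log(demand) = θ(X) · log(price) + f(X)
Rev = price · demand = exp((θ(X) + 1) · log(price) + f(X))
d log(Rev) / d log(price) = θ(X) + 1

so lowering the price raises revenue exactly when θ(X) + 1 < 0.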
from econml.cate_interpreter import SingleTreePolicyInterpreter

# learn a depth-2 policy tree from the fitted estimator, treating only when
# the effect is above the (here negative) treatment cost
intrp = SingleTreePolicyInterpreter(risk_level=0.05, max_depth=2, min_samples_leaf=1, min_impurity_decrease=0.001)
intrp.interpret(est, X_test, sample_treatment_costs=-1)
plt.figure(figsize=(25, 5))
intrp.plot(feature_names=X.columns, treatment_names=["Discount", "No-Discount"], fontsize=12)
The EconML library includes policy interpretability tools such as SingleTreePolicyInterpreter, which weighs treatment costs against treatment effects to learn simple rules about which customers it is profitable to target.
From the figure below, we can see that the model recommends giving a discount to people whose income is below a certain cutoff, and no discount to everyone else.
Now let's compare our policy against other baseline policies.
Our model tells us which users to give a small discount to, and in this experiment we set the discount level at 10% for those users.
Since the model is misspecified, we would not expect good results for large discounts.
Here, because we know the ground truth, we can evaluate the value of this policy.
# define function to compute revenue; demand_fn is the data-generating
# function defined earlier in the notebook
def revenue_fn(data, discount_level1, discount_level2, baseline_T, policy):
    policy_price = baseline_T * (1 - discount_level1) * policy + baseline_T * (1 - discount_level2) * (1 - policy)
    demand = demand_fn(data, policy_price)
    rev = demand * policy_price
    return rev
policy_dic = {}
# our policy above
policy = intrp.treat(X)
policy_dic["Our Policy"] = np.mean(revenue_fn(train_data, 0, 0.1, 1, policy))
## previous strategy
policy_dic["Previous Strategy"] = np.mean(train_data["price"] * train_data["demand"])
## give everyone discount
policy_dic["Give Everyone Discount"] = np.mean(revenue_fn(train_data, 0.1, 0, 1, np.ones(len(X))))
## don't give discount
policy_dic["Give No One Discount"] = np.mean(revenue_fn(train_data, 0, 0.1, 1, np.ones(len(X))))
## follow our policy, but give a -10% discount (i.e., a price increase) to the group not recommended for a discount
policy_dic["Our Policy + Give Negative Discount for No-Discount Group"] = np.mean(revenue_fn(train_data, -0.1, 0.1, 1, policy))
## give everyone -10% discount
policy_dic["Give Everyone Negative Discount"] = np.mean(revenue_fn(train_data, -0.1, 0, 1, np.ones(len(X))))
# get policy summary table
res = pd.DataFrame.from_dict(policy_dic, orient="index", columns=["Revenue"])
res["Rank"] = res["Revenue"].rank(ascending=False)
One step here that is a bit puzzling is intrp.treat(X). What does it output? A 0/1 label for each person saying which treatment arm the SingleTreePolicyInterpreter recommends, where "each person" means each row of the training X; how big the discount is defaults to 10% here.
There is also one rather audacious scenario, "Our Policy + Give Negative Discount for No-Discount Group", which dares to raise prices on the people not recommended for a discount; naturally revenue goes up, and it ranks #1.
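A quick way to inspect that output (a small sketch, assuming the fitted intrp from above):

policy = intrp.treat(X)
print(policy[:10])    # 0/1 recommended-arm label per row of X
print(policy.mean())  # share of users assigned to arm 1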
The resulting policy summary table:
This is an extension of the case above ("Smart marketing case 1: (econml) user responses under different discounts"); the data-processing steps are the same as above, so I won't repeat them.
Link: Case Study - Customer Segmentation at An Online Media Company - EconML + DoWhy.ipynb
Similar dowhy + econml cases can also be found in:
- [2.1]
- [2.2]
- [2.3]
Because econml and dowhy are integrated very well, they can be used together seamlessly.
dowhy's main role here is to contribute its causal-graph capabilities.
By defining these assumptions, DoWhy can generate a causal graph for us and use that graph to identify the causal effect first.
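The code below references Y/T/X/W and confounder_names; here is a plausible setup reconstructed from the feature table later in this post (an assumption, not copied verbatim from the notebook):

# assumed setup: the confounders are the W-typed columns from the feature table below
confounder_names = ["account_age", "age", "avg_hours", "days_visited",
                    "friends_count", "has_membership", "is_US", "songs_purchased"]
Y = train_data["log_demand"]   # hypothetical column names following the case study
T = train_data["log_price"]
X = train_data[["income"]]
W = train_data[confounder_names]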
from econml.dml import LinearDML
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import PolynomialFeatures
from networkx.drawing.nx_pydot import to_pydot
from IPython.display import Image, display

# initiate an EconML cate estimator
est = LinearDML(model_y=GradientBoostingRegressor(), model_t=GradientBoostingRegressor(),
                featurizer=PolynomialFeatures(degree=2, include_bias=False))
# fit through dowhy
est_dw = est.dowhy.fit(Y, T, X=X, W=W, outcome_names=["log_demand"], treatment_names=["log_price"],
                       feature_names=["income"], confounder_names=confounder_names, inference="statsmodels")
# Visualize causal graph
try:
    # Try pretty printing the graph. Requires pydot and pygraphviz
    display(Image(to_pydot(est_dw._graph._graph).create_png()))
except Exception:
    # Fall back on default graph view
    est_dw.view_model()
LinearDML is the regular econml linear estimator; here a causal graph is additionally modeled directly on top of the estimator (est.dowhy.fit).
For the causal graph, X/W/T all have to be defined, as in the figure below.
Once the causal graph is built, we can explore whether deeper relationships exist between the variables (front door, back door, IV):
identified_estimand = est_dw.identified_estimand_
print(identified_estimand)
The output is as follows:
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor1 (Default)
Estimand expression:
d/d[log_price] (Expectation(log_demand | is_US, has_membership, days_visited, age, income, account_age, avg_hours, songs_purchased, friends_count))
Estimand assumption 1, Unconfoundedness: If U→{log_price} and U→log_demand then P(log_demand|log_price,is_US,has_membership,days_visited,age,income,account_age,avg_hours,songs_purchased,friends_count,U) = P(log_demand|log_price,is_US,has_membership,days_visited,age,income,account_age,avg_hours,songs_purchased,friends_count)
### Estimand : 2
Estimand name: iv
No such variable found!
### Estimand : 3
Estimand name: frontdoor
No such variable found!
Because the confounders W were defined, the treatment → outcome relationship here generally has back-door paths.
There is no front-door path to block.
Once we understand the causal relationships between the variables, we can obtain the concrete treatment effect:
lineardml_estimate = est_dw.estimate_
print(lineardml_estimate)
*** Causal Estimate ***
## Identified estimand
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
d/d[log_price] (Expectation(log_demand | songs_purchased, avg_hours, days_visited, has_membership, account_age, age, income, is_US, friends_count))
Estimand assumption 1, Unconfoundedness: If U→{log_price} and U→log_demand then P(log_demand|log_price,songs_purchased,avg_hours,days_visited,has_membership,account_age,age,income,is_US,friends_count,U) = P(log_demand|log_price,songs_purchased,avg_hours,days_visited,has_membership,account_age,age,income,is_US,friends_count)
## Realized estimand
b: log_demand~log_price+songs_purchased+avg_hours+days_visited+has_membership+account_age+age+income+is_US+friends_count | income
Target units: ate
## Estimate
Mean value: -0.997133982212133
Effect estimates: [-1.09399505 -1.48371714 -0.83401226 ... -1.3358834 -1.91959094
-0.40863328]
We can see the effect value here: -0.997133982212133 (the mean of the per-sample CATE estimates, i.e., the ATE).
What does this effect mean?
It is the T → Y effect: how the outcome Y (demand) changes between T = 1 and T = 0.
And how does it differ from the estimates further below?
Those describe how the effect varies with X (Y ~ (X|T)), which is exactly the elasticity.
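As a quick sanity check (a sketch; the exact number depends on the fitted model), the "Mean value" above is just the average of the per-sample effect estimates:

import numpy as np
te_train = est_dw.effect(X)  # per-sample effect estimates (T: 0 -> 1) on the training X
print(np.mean(te_train))     # ≈ -0.9971, matching the "Mean value" above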
Estimating the elasticity: this part is the same as the linear-estimation content in [2.4].
# Get treatment effect and its confidence interval
te_pred = est_dw.effect(X_test).flatten()
te_pred_interval = est_dw.effect_interval(X_test)

# Compare the estimate and the truth; truth_te_estimate / truth_te_lower /
# truth_te_upper come from the ground-truth computation earlier in the notebook
plt.figure(figsize=(10, 6))
plt.plot(X_test.flatten(), te_pred, label="Sales Elasticity Prediction")
plt.plot(X_test.flatten(), truth_te_estimate, "--", label="True Elasticity")
plt.fill_between(
    X_test.flatten(),
    te_pred_interval[0].flatten(),
    te_pred_interval[1].flatten(),
    alpha=0.2,
    label="95% Confidence Interval",
)
plt.fill_between(
    X_test.flatten(),
    truth_te_lower,
    truth_te_upper,
    alpha=0.2,
    label="True Elasticity Range",
)
plt.xlabel("Income")
plt.ylabel("Songs Sales Elasticity")
plt.title("Songs Sales Elasticity vs Income")
plt.legend(loc="lower right")
Mostly the same as [2.5], except that while constructing the causal forest estimator we additionally build the causal graph:
from econml.dml import CausalForestDML

# initiate an EconML cate estimator
est_nonparam = CausalForestDML(model_y=GradientBoostingRegressor(), model_t=GradientBoostingRegressor())
# fit through dowhy
est_nonparam_dw = est_nonparam.dowhy.fit(Y, T, X=X, W=W, outcome_names=["log_demand"], treatment_names=["log_price"],
                                         feature_names=["income"], confounder_names=confounder_names, inference="blb")
For details, see: 因果推断笔记——因果图建模之微软开源的dowhy(一)
The main refutation methods, roughly:
- Add a random common cause: does the estimated causal effect change after adding a random variable as a confounder? (expected: no)
- Placebo treatment: does the causal effect change when the true treatment is replaced by an independent random variable? (expected: the effect goes to zero)
- Dummy outcome: does the causal effect change when the true outcome is replaced by an independent random variable? (expected: the effect goes to zero)
- Simulated outcome: does the causal effect change when the dataset is replaced by a simulated one generated from a process close to the given data's generating process? (expected: it matches the effect parameter of the data-generating process)
- Add an unobserved common cause: how sensitive is the causal effect to an additional confounder correlated with both treatment and outcome? (expected: not overly sensitive)
- Data subset validation: does the causal effect change when the dataset is replaced by a random subset? (expected: no)
- Bootstrap validation: does the causal effect change when the dataset is replaced by bootstrap samples of itself? (expected: no; see the sketch after the code below)
# Add Random Common Cause
res_random = est_nonparam_dw.refute_estimate(method_name="random_common_cause")
print(res_random)

# Add Unobserved Common Cause
res_unobserved = est_nonparam_dw.refute_estimate(
    method_name="add_unobserved_common_cause",
    confounders_effect_on_treatment="linear",
    confounders_effect_on_outcome="linear",
    effect_strength_on_treatment=0.1,
    effect_strength_on_outcome=0.1,
)
print(res_unobserved)

# Replace Treatment with a Random (Placebo) Variable
res_placebo = est_nonparam_dw.refute_estimate(
    method_name="placebo_treatment_refuter", placebo_type="permute",
    num_simulations=3,
)
print(res_placebo)

# Remove a Random Subset of the Data
res_subset = est_nonparam_dw.refute_estimate(
    method_name="data_subset_refuter", subset_fraction=0.8,
    num_simulations=3,
)
print(res_subset)
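The list above also mentions bootstrap validation, which the original snippet does not exercise; a minimal sketch, assuming DoWhy's bootstrap_refuter method name:

# Replace the Dataset with Bootstrap Samples of Itself
res_bootstrap = est_nonparam_dw.refute_estimate(
    method_name="bootstrap_refuter",
    num_simulations=3,
)
print(res_bootstrap)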
See [2.7] and [2.8].
Interpretability with SHAP
Just as SHAP can explain black-box predictive machine-learning models, we can also use it to explain black-box effect-heterogeneity models. This approach explains why the heterogeneous causal-effect model produced larger or smaller effect values for particular segments of the population: which features drive the difference?
The question is easy to answer when the model is succinctly described, as in the case of a linear heterogeneity model, where one can simply inspect the model's coefficients. It becomes hard once more expressive models, such as random forests and causal forests, are used to model effect heterogeneity. SHAP values are a great help in understanding the leading factors behind the effect heterogeneity that the model picked up from the training data.
Our package offers seamless integration with the SHAP library. Every CATE estimator has a method shap_values, which returns the SHAP value explanations of the estimator's output for every treatment and outcome pair. These values can then be visualized with the rich visualization toolkit of the SHAP library. Moreover, whenever possible our library invokes fast, specialized algorithms from the SHAP library for each final-model type, which can greatly reduce computation time.
## Ignore warnings
import warnings
warnings.filterwarnings("ignore")
from econml.dml import CausalForestDML, LinearDML, NonParamDML
from econml.dr import DRLearner
from econml.metalearners import DomainAdaptationLearner, XLearner
from econml.iv.dr import LinearIntentToTreatDRIV
import numpy as np
import scipy.special
import matplotlib.pyplot as plt
import shap
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.linear_model import Lasso
np.random.seed(123)
n_samples = 5000
n_features = 10
true_te = lambda X: (X[:, 0]>0) * X[:, 0]
X = np.random.normal(0, 1, size=(n_samples, n_features))
W = np.random.normal(0, 1, size=(n_samples, n_features))
T = np.random.binomial(1, scipy.special.expit(X[:, 0]))
y = true_te(X) * T + 5.0 * X[:, 0] + np.random.normal(0, .1, size=(n_samples,))
X_test = X[:min(100, n_samples)].copy()
X_test[:, 0] = np.linspace(np.percentile(X[:, 0], 1), np.percentile(X[:, 0], 99), min(100, n_samples))
# causal forest estimator
est = CausalForestDML(random_state=123)
est.fit(y, T, X=X, W=W)

# linear estimator
est = LinearDML(random_state=123)
est.fit(y, T, X=X, W=W)

# non-parametric DML
est = NonParamDML(model_y=RandomForestRegressor(min_samples_leaf=20, random_state=123),
                  model_t=RandomForestRegressor(min_samples_leaf=20, random_state=123),
                  model_final=RandomForestRegressor(min_samples_leaf=20, random_state=123),
                  random_state=123)
est.fit(y.ravel(), T.ravel(), X=X, W=W)

# doubly robust learner
est = DRLearner(model_regression=RandomForestRegressor(min_samples_leaf=20, random_state=123),
                model_propensity=RandomForestClassifier(min_samples_leaf=20, random_state=123),
                model_final=RandomForestRegressor(min_samples_leaf=20, random_state=123),
                random_state=123)
est.fit(y.ravel(), T.ravel(), X=X, W=W)

# meta-learner: DomainAdaptationLearner
est = DomainAdaptationLearner(models=RandomForestRegressor(min_samples_leaf=20, random_state=123),
                              final_models=RandomForestRegressor(min_samples_leaf=20, random_state=123),
                              propensity_model=RandomForestClassifier(min_samples_leaf=20, random_state=123))
est.fit(y.ravel(), T.ravel(), X=X)

# meta-learner: XLearner
# XLearner.shap_values uses a slow shap exact explainer, as there is no well defined final model
# for the XLearner method.
est = XLearner(models=RandomForestRegressor(min_samples_leaf=20, random_state=123),
               cate_models=RandomForestRegressor(min_samples_leaf=20, random_state=123),
               propensity_model=RandomForestClassifier(min_samples_leaf=20, random_state=123))
est.fit(y.ravel(), T.ravel(), X=X)

# instrumental variables
est = LinearIntentToTreatDRIV(model_y_xw=RandomForestRegressor(min_samples_leaf=20, random_state=123),
                              model_t_xwz=RandomForestClassifier(min_samples_leaf=20, random_state=123),
                              flexible_model_effect=RandomForestRegressor(min_samples_leaf=20, random_state=123),
                              random_state=123)
# no separate instrument is generated in this toy example, so the treatment is
# passed as its own instrument purely for illustration
est.fit(y.ravel(), T.ravel(), Z=T.ravel(), X=X, W=W)
Let's output one of the plots:
est = CausalForestDML(random_state=123)
est.fit(y, T, X=X, W=W)
shap_values = est.shap_values(X[:20])
shap.plots.beeswarm(shap_values['Y0']['T0'])
For a detailed treatment of SHAP, see: 机器学习模型可解释性进行到底 —— SHAP值理论(一)
Feature density scatter plot: beeswarm
In the figure below, each row is one feature and the x-axis is the SHAP value. Features are ordered by mean absolute SHAP value, i.e., by their importance to the model; wide regions indicate that many samples cluster there.
Each dot is one sample: the redder the dot, the larger the feature's own value; the bluer, the smaller.
It can be read as a feature-importance ranking plot, from which we can see that X0 ranks highest in importance; moreover, the larger X0 is, the more positive its contribution to the model output (positive SHAP value).
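To read that ranking off programmatically (a small sketch using the shap.Explanation object returned above; the ordering matches what the beeswarm shows):

import numpy as np
sv = shap_values['Y0']['T0']               # Explanation for outcome Y0, treatment T0
mean_abs = np.abs(sv.values).mean(axis=0)  # mean |SHAP| per feature
for idx in np.argsort(mean_abs)[::-1][:5]:
    print(f"X{idx}: mean |SHAP| = {mean_abs[idx]:.4f}")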
# data generation: multiple treatments and multiple outcomes
np.random.seed(123)
n_samples = 5000
n_features = 10
n_treatments = 2
n_outputs = 3
true_te = lambda X: np.hstack([(X[:, [0]] > 0) * X[:, [0]],
                               np.ones((X.shape[0], n_treatments - 1)) * np.arange(1, n_treatments).reshape(1, -1)])
X = np.random.normal(0, 1, size=(n_samples, n_features))
W = np.random.normal(0, 1, size=(n_samples, n_features))
T = np.random.normal(0, 1, size=(n_samples, n_treatments))
for t in range(n_treatments):
    T[:, t] = np.random.binomial(1, scipy.special.expit(X[:, 0]))
y = np.sum(true_te(X) * T, axis=1, keepdims=True) + 5.0 * X[:, [0]] + np.random.normal(0, .1, size=(n_samples, 1))
y = np.tile(y, (1, n_outputs))
for j in range(n_outputs):
    y[:, j] = (j + 1) * y[:, j]
X_test = X[:min(100, n_samples)].copy()
X_test[:, 0] = np.linspace(np.percentile(X[:, 0], 1), np.percentile(X[:, 0], 99), min(100, n_samples))
est = CausalForestDML(n_estimators=400, random_state=123)
est.fit(y, T, X=X, W=W)
shap_values = est.shap_values(X[:200])
plt.figure(figsize=(25, 15))
for j in range(n_treatments):
    for i in range(n_outputs):
        plt.subplot(n_treatments, n_outputs, i + j * n_outputs + 1)
        plt.title("Y{}, T{}".format(i, j))
        shap.plots.beeswarm(shap_values['Y' + str(i)]['T' + str(j)], plot_size=None, show=False)
plt.tight_layout()
plt.show()
Three outcomes and two treatments T are generated here, and the CausalForestDML causal forest estimator is used directly, so 3 × 2 = 6 combinations are obtained:
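The returned structure is a nested dict keyed by outcome and then treatment; a quick way to enumerate all six explanations (a sketch over the objects above):

for y_key, t_dict in shap_values.items():
    for t_key, expl in t_dict.items():
        print(y_key, t_key, expl.values.shape)  # e.g. Y0 T0 (200, 10)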
This is the same as: 因果推断与反事实预测——利用DML进行价格弹性计算(二十四)
Here we revisit an official econml case, recorded before in 因果推断笔记——因果图建模之微软开源的EconML(五).
The github link is: Case Study - Customer Segmentation at An Online Media Company.ipynb
Another closely related post: 因果推断笔记——DML :Double Machine Learning案例学习(十六)
This section does not excerpt the case itself; it only revisits some important details about the elasticity coefficient in it.
The data format is:
| Feature Name | Type | Details |
| --- | --- | --- |
| account_age | W | user's account age |
| age | W | user's age |
| avg_hours | W | the average hours user was online per week in the past |
| days_visited | W | the average number of days user visited the website per week in the past |
| friend_count | W | number of friends user connected in the account |
| has_membership | W | whether the user had membership |
| is_US | W | whether the user accesses the website from the US |
| songs_purchased | W | the average songs user purchased per week in the past |
| income | X | user's income |
| price | T | the price user was exposed during the discount season (baseline price * small discount) |
| demand | Y | songs user purchased during the discount season |
The dataset has ~10,000 observations and includes 9 continuous and categorical variables representing user characteristics and online-behavior history, such as age, log income, previous purchases, and previous weekly online hours.
So here:
- other variables: W — account_age through songs_purchased — the confounders
- income — X — the feature of interest — user income
- demand — Y — the outcome — sales
- price — T — the treatment, i.e., the discount, taking values [1, 0.9, 0.8] according to the formula below