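I need to build a model with no (or almost no) false negatives. To that end, I plotted the precision-recall curve and determined that the threshold should be set to 0.11.
My question is: how do I define the threshold at model training time? Defining it only later, at evaluation time, seems pointless, because it would not carry over to new data.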
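import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# X (features) and y (binary labels) are assumed to be defined earlier
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
rfc_model = RandomForestClassifier(random_state=101)
rfc_model.fit(X_train, y_train)
rfc_preds = rfc_model.predict(X_test)  # predictions at the default 0.5 threshold

# Positive-class probabilities; computed once, outside the loop
predicted_proba = rfc_model.predict_proba(X_test)[:, 1]

recall_precision_vals = []
for val in np.linspace(0, 1, 101):
    predicted = (predicted_proba >= val).astype('int')
    recall_sc = recall_score(y_test, predicted)
    # zero_division=0 silences the warning when no sample is predicted positive
    precis_sc = precision_score(y_test, predicted, zero_division=0)
    recall_precision_vals.append({
        'Threshold': val,
        'Recall val': recall_sc,
        'Precis val': precis_sc
    })
recall_prec_df = pd.DataFrame(recall_precision_vals)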
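To make concrete what I mean by "determining the threshold" from the sweep, here is a minimal sketch in plain Python. The helper `highest_threshold_with_recall`, the target recall, and the labels and scores are all made up for illustration; none of them are part of my real code or of scikit-learn.

```python
# Hypothetical helper (illustration only): return the highest threshold on a
# 0.00..1.00 grid whose recall on held-out data still meets the target.
def highest_threshold_with_recall(y_true, scores, min_recall=1.0):
    positives = sum(1 for y in y_true if y == 1)
    if positives == 0:
        raise ValueError("y_true contains no positive samples")
    best = None
    for i in range(101):
        thr = i / 100
        true_pos = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= thr)
        if true_pos / positives >= min_recall:
            best = thr  # recall never increases with thr, so keep the last hit
    return best

# Made-up labels and positive-class probabilities, for illustration only
y_val = [0, 0, 1, 0, 1, 1, 0, 1]
p_val = [0.02, 0.10, 0.11, 0.30, 0.45, 0.60, 0.05, 0.90]

threshold = highest_threshold_with_recall(y_val, p_val, min_recall=1.0)

# The chosen threshold is then applied, unchanged, to scores for new samples
new_scores = [0.05, 0.20]
new_preds = [int(s >= threshold) for s in new_scores]
```

What I am unsure about is where this last step should live, so that the same cutoff is always used when the trained model sees new data.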
Any ideas?