Tumor model code (naive Bayes model)
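import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# df is assumed to be a pandas DataFrame that already holds the tumor data;
# the loading step is not shown here.
# Features: '最大周长' (maximum perimeter) and '最大凹陷度' (maximum concavity).
x = df[['最大周长','最大凹陷度']]
# Target: '肿瘤性质' (tumor type).
y = df['肿瘤性质']

# Hold out 70% of the rows for testing; fix the seed for a reproducible split.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.7, random_state=42)

# Train a Gaussian naive Bayes classifier and predict on the test set.
model = GaussianNB()
model.fit(x_train, y_train)
y_test_pred = model.predict(x_test)
print('y_test_pred:', y_test_pred)

# Fraction of test samples whose prediction matches the true label.
accuracy = accuracy_score(y_test_pred, y_test)
print('accuracy:', accuracy)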
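The listing above uses df without showing how it is loaded. As a runnable stand-in, the sketch below builds a comparable DataFrame from scikit-learn's bundled breast cancer dataset. Treating its 'worst perimeter' and 'worst concavity' columns as counterparts of '最大周长' and '最大凹陷度' is an assumption, and the names df_demo, x_demo, y_demo and demo_model are hypothetical; the original data file may differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: this bundled dataset stands in for the unspecified df.
data = load_breast_cancer(as_frame=True)
df_demo = data.frame  # feature columns plus 'target' (0 = malignant, 1 = benign)

# Assumed English counterparts of '最大周长' and '最大凹陷度'.
x_demo = df_demo[['worst perimeter', 'worst concavity']]
y_demo = df_demo['target']

# Same split settings as the listing: 70% test, fixed seed.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.7, random_state=42
)

demo_model = GaussianNB()
demo_model.fit(x_tr, y_tr)
demo_accuracy = accuracy_score(y_te, demo_model.predict(x_te))
print('accuracy:', demo_accuracy)
```

The printed accuracy depends on the data and the split, so it will not necessarily match what the original df produces.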
x = df[['最大周长','最大凹陷度']]
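Selects two columns from the DataFrame df, "最大周长" (maximum perimeter) and "最大凹陷度" (maximum concavity), as the input features x.
Note the double brackets: the inner pair is a list of column names, which is how multiple columns are selected at once.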
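To see why the double brackets matter, here is a minimal sketch on a hypothetical toy DataFrame (the name toy and its columns are illustrative, not from the tumor data). A single label returns a one-dimensional Series, while a list of labels returns a two-dimensional DataFrame, which is the shape scikit-learn estimators expect for features.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

single = toy["a"]         # one label -> pandas Series (1-D)
multi = toy[["a", "b"]]   # list of labels -> pandas DataFrame (2-D)
one_col = toy[["a"]]      # list with one label -> still a DataFrame

print(type(single).__name__, single.shape)    # Series (3,)
print(type(multi).__name__, multi.shape)      # DataFrame (3, 2)
print(type(one_col).__name__, one_col.shape)  # DataFrame (3, 1)
```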
y = df['肿瘤性质']
Selects the "肿瘤性质" (tumor type) column from df as the target variable y, i.e. the label to predict.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.7, random_state=42)
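Uses the train_test_split function to split the dataset into a training set and a test set.
test_size=0.7 means 70% of the data goes to the test set and 30% to the training set (a larger test share than the more usual 20 to 30%).
random_state=42 fixes the random seed so the split is the same on every run (reproducibility).
model = GaussianNB()
Creates an instance of the Gaussian naive Bayes classifier.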
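The effect of test_size and random_state in the train_test_split call above can be checked on a small example. This is a sketch on hypothetical toy arrays (X and y here are not the tumor data): with 10 samples and test_size=0.7, 7 rows go to the test set and 3 to the training set, and repeating the call with the same seed returns the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features, alternating labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.7, random_state=42
)
print(len(X_train), len(X_test))  # 3 7

# Same random_state -> identical split on a second call.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.7, random_state=42
)
print(np.array_equal(X_test, X_test2))  # True
```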
model.fit(x_train, y_train)
Trains the model on the training data (x_train, y_train).
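For a sense of what fitting does, the sketch below trains GaussianNB on a hypothetical six-row toy dataset (the names X, y and clf are illustrative). After fit, the estimator stores the class labels, the estimated class priors, and the per-class mean of each feature, which it later uses to evaluate a Gaussian likelihood per feature.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: two well-separated classes, two features.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0],
              [10.0, 20.0], [11.0, 21.0], [12.0, 22.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = GaussianNB().fit(X, y)

print(clf.classes_)      # class labels seen during fit
print(clf.class_prior_)  # estimated P(class); 0.5 each for this balanced toy set
print(clf.theta_)        # per-class feature means, shape (n_classes, n_features)
```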
y_test_pred = model.predict(x_test)
Uses the trained model to predict on the test set x_test, producing the predictions y_test_pred.
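As a small illustration of what predict returns, the sketch below (again on hypothetical toy data, with illustrative names X, y, clf and X_new) compares predict with predict_proba. The predicted label is the class with the highest posterior probability.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: two well-separated classes, two features.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0],
              [10.0, 20.0], [11.0, 21.0], [12.0, 22.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = GaussianNB().fit(X, y)

# Two new points, one near each class.
X_new = np.array([[2.0, 3.0], [11.0, 21.0]])
pred = clf.predict(X_new)         # hard class labels
proba = clf.predict_proba(X_new)  # posterior probability per class, rows sum to 1

print(pred)  # [0 1]
# predict agrees with taking the most probable class from predict_proba.
print(np.array_equal(pred, clf.classes_[proba.argmax(axis=1)]))  # True
```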
print('y_test_pred:', y_test_pred)
Prints the model's predictions.
accuracy = accuracy_score(y_test_pred, y_test)
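Computes the prediction accuracy by comparing the predictions (y_test_pred) with the true labels (y_test), i.e. the fraction of test samples predicted correctly. scikit-learn documents the argument order as accuracy_score(y_true, y_pred); the call here passes the predictions first, which yields the same value because accuracy is symmetric in its two arguments.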
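The accuracy computation can be verified by hand on a few labels. In this sketch the arrays y_true and y_pred are hypothetical; four of the five predictions match, so the accuracy is 0.8, and swapping the arguments does not change the result.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels: 4 of 5 predictions are correct.
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

acc = accuracy_score(y_true, y_pred)
manual = np.mean(y_true == y_pred)        # fraction of matching positions
swapped = accuracy_score(y_pred, y_true)  # argument order does not affect accuracy

print(acc)  # 0.8
```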
print('accuracy:', accuracy)
Prints the model's accuracy.