1. Download the wine quality dataset:
UCI Machine Learning Repository => Wine Quality Data Set
https://archive.ics.uci.edu/ml/datasets/wine+quality
Two datasets are included, related to red and white vinho verde wine samples from the north of Portugal. The goal is to model wine quality based on physicochemical tests.
https://archive-beta.ics.uci.edu/dataset/186/wine+quality
This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. This allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.
from urllib.request import urlretrieve

# Download the white wine data
url = "https://archive.ics.uci.edu" + \
    "/ml/machine-learning-databases/wine-quality" + \
    "/winequality-white.csv"
savepath = "winequality-white.csv"
urlretrieve(url, savepath)

# Download the red wine data
url = "https://archive.ics.uci.edu" + \
    "/ml/machine-learning-databases/wine-quality" + \
    "/winequality-red.csv"
savepath = "winequality-red.csv"
urlretrieve(url, savepath)
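The two download calls above differ only in the wine color, so they can be folded into a loop. The following is a minimal sketch, not the post's own code: `build_url` and `download_all` are hypothetical helper names, and skipping files that already exist is an added assumption to avoid repeated downloads.

```python
import os
from urllib.request import urlretrieve

# Base URL taken from the download code above.
BASE_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality"


def build_url(color):
    """Return the download URL for 'red' or 'white' (hypothetical helper)."""
    return f"{BASE_URL}/winequality-{color}.csv"


def download_all(colors=("white", "red")):
    """Download each file unless it already exists locally (hypothetical helper)."""
    saved = []
    for color in colors:
        savepath = f"winequality-{color}.csv"
        if not os.path.exists(savepath):
            urlretrieve(build_url(color), savepath)
        saved.append(savepath)
    return saved


# Usage (needs network access):
# download_all()
```

Calling `download_all()` a second time would then reuse the local files instead of fetching them again.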
2. Print CSV
import pandas as pd
# Read the wine data
wine = pd.read_csv("winequality-white.csv", sep=";", encoding="utf-8")
print(wine)
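The file uses semicolons rather than commas between fields, which is why `sep=";"` is passed to `read_csv`. Besides printing the whole frame, it can help to check the shape and how often each quality score occurs. The sketch below uses a hypothetical three-row sample with only a few columns so that it runs without the downloaded file; with the real data, the same calls would apply to `wine` as loaded above.

```python
import io

import pandas as pd

# Hypothetical sample in the same ';'-separated layout as the CSV
# (only a few columns are shown, and the values are made up).
sample = io.StringIO(
    '"fixed acidity";"pH";"alcohol";"quality"\n'
    "7.0;3.00;8.8;6\n"
    "6.3;3.30;9.5;6\n"
    "8.1;3.26;10.1;5\n"
)

wine = pd.read_csv(sample, sep=";")

# Number of rows and columns.
print(wine.shape)

# How many wines received each quality score, ordered by score.
quality_counts = wine["quality"].value_counts().sort_index().to_dict()
print(quality_counts)
```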
3. Classify without modifying the data
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
# Read the data
wine = pd.read_csv("winequality-white.csv", sep=";", encoding="utf-8")
# Split the data into labels and features ---(*1)
y = wine["quality"]
x = wine.drop("quality", axis=1)
# Split into training and test sets ---(*2)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2)
# Train ---(*3)
model = RandomForestClassifier()
model.fit(x_train, y_train)
# Evaluate ---(*4)
y_pred = model.predict(x_test)
print(classification_report(y_test, y_pred))
print("accuracy=", accuracy_score(y_test, y_pred))
accuracy= 0.6520408163265307
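The score above will change from run to run, because both `train_test_split` and `RandomForestClassifier` use random numbers when no seed is given. A minimal sketch of one way to make the result repeatable is shown below. It assumes synthetic stand-in data from `make_classification` so that it runs without the CSV; `run_once`, the seed value, `n_estimators=50`, and the use of `stratify` are illustrative choices, not part of the original code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 11 features, like the wine CSV.
x, y = make_classification(n_samples=300, n_features=11, n_informative=5,
                           n_classes=3, random_state=0)


def run_once(seed):
    """Split, train, and score with every random step tied to `seed`."""
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=seed, stratify=y)
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(x_train, y_train)
    return accuracy_score(y_test, model.predict(x_test))


first = run_once(42)
second = run_once(42)

# Both runs use the same seed, so the two scores are identical.
print(first == second)
```

With the wine data, the same `random_state` arguments could be added to the existing calls; `stratify=y` is optional and keeps the class proportions similar in the training and test sets.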
4. Classify after modifying the data
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
# Read the data --- (*1)
wine = pd.read_csv("winequality-white.csv", sep=";", encoding="utf-8")
# Split the data into labels and features
y = wine["quality"]
x = wine.drop("quality", axis=1)
# Relabel y: 0 for quality <= 4, 1 for 5 to 7, 2 for 8 and above --- (*2)
newlist = []
for v in list(y):
    if v <= 4:
        newlist += [0]
    elif v <= 7:
        newlist += [1]
    else:
        newlist += [2]
y = newlist
# Split into training and test sets --- (*3)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
# Train --- (*4)
model = RandomForestClassifier()
model.fit(x_train, y_train)
# Evaluate --- (*5)
y_pred = model.predict(x_test)
print(classification_report(y_test, y_pred))
print("accuracy=", accuracy_score(y_test, y_pred))
accuracy= 0.9346938775510204
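The relabeling loop above can also be written with `pandas.cut`, which assigns each score to a bin. Since the middle class merges three score levels, it may contain most of the samples, so it can be worth comparing the reported score with the share of the most frequent class. The sketch below is an alternative under stated assumptions: `relabel` and `majority_baseline` are hypothetical helper names, and the ten scores are made up for illustration.

```python
import numpy as np
import pandas as pd


def relabel(quality):
    """Map scores to 0 (<= 4), 1 (5 to 7), 2 (>= 8), like the loop above."""
    # Right-closed intervals: (-inf, 4], (4, 7], (7, inf)
    bins = [-np.inf, 4, 7, np.inf]
    return pd.cut(quality, bins=bins, labels=[0, 1, 2]).astype(int)


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class."""
    return labels.value_counts(normalize=True).max()


# Hypothetical scores for illustration; in the post this would be wine["quality"].
quality = pd.Series([3, 4, 5, 6, 6, 7, 8, 9, 5, 6])
labels = relabel(quality)

print(labels.tolist())
print(majority_baseline(labels))
```

If the model's score is only slightly above this baseline on the real data, most of the gain over step 3 probably comes from the coarser labels rather than from better predictions.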