
Building an AI model that judges whether a wine tastes good

EasyCoding 2023. 4. 24. 21:33

1. Download the wine quality dataset:

Source: UCI Machine Learning Repository, Wine Quality Data Set

https://archive.ics.uci.edu/ml/datasets/wine+quality 

 

The repository describes it as two datasets, related to red and white vinho verde wine samples from the north of Portugal, with the goal of modeling wine quality based on physicochemical tests.

 

https://archive-beta.ics.uci.edu/dataset/186/wine+quality 

 

The dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which allows sharing and adapting it for any purpose, provided that appropriate credit is given.

from urllib.request import urlretrieve

# Folder on the UCI repository that holds the Wine Quality CSV files
base = "https://archive.ics.uci.edu" + \
       "/ml/machine-learning-databases/wine-quality/"

# Download the white-wine and red-wine files into the current directory
for name in ["winequality-white.csv", "winequality-red.csv"]:
    urlretrieve(base + name, name)
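Running the download code again fetches both files every time. A minimal sketch, assuming you only want to download a file when it is not already on disk; `download_if_missing` and `BASE_URL` are hypothetical names, not part of the original post:

```python
import os
from urllib.request import urlretrieve

# Same folder on the UCI repository as in the download code above
BASE_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/"

def download_if_missing(url, savepath):
    """Hypothetical helper: fetch url into savepath only if savepath does not exist yet."""
    if not os.path.exists(savepath):
        urlretrieve(url, savepath)
    return savepath

# Usage (makes a network request the first time only):
# for name in ("winequality-white.csv", "winequality-red.csv"):
#     download_if_missing(BASE_URL + name, name)
```

The existence check is deliberately simple: it does not detect a partially downloaded file, so delete the file by hand if a download was interrupted.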

2. Read and print the CSV file

import pandas as pd

# Read the wine data (the UCI file uses ";" as the column separator)
wine = pd.read_csv("winequality-white.csv", sep=";", encoding="utf-8")
print(wine)
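The `sep=";"` argument matters because the UCI files separate fields with semicolons and quote the header names. A small sketch on a made-up three-column excerpt (illustrative values only, not rows from the real file) shows what happens with and without it:

```python
import io
import pandas as pd

# Made-up excerpt in the same layout as the UCI files:
# quoted header names, fields separated by ";" (only 3 of the 12 columns shown)
sample = '"fixed acidity";"alcohol";"quality"\n7.0;8.8;6\n6.3;9.5;6\n8.1;10.1;5\n'

# Default sep="," finds no commas, so every line ends up in a single column
wrong = pd.read_csv(io.StringIO(sample))
# With sep=";" each field gets its own column and the header quotes are stripped
right = pd.read_csv(io.StringIO(sample), sep=";")

print(wrong.shape)           # (3, 1)
print(right.shape)           # (3, 3)
print(list(right.columns))   # ['fixed acidity', 'alcohol', 'quality']
```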

3. Classifying without changing the data

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

# Read the data (the file is semicolon-separated)
wine = pd.read_csv("winequality-white.csv", sep=";", encoding="utf-8")

# Separate the label (quality) from the feature columns ---(*1)
y = wine["quality"]
x = wine.drop("quality", axis=1)

# Split into a training set (80%) and a test set (20%) ---(*2)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2)

# Train a random forest classifier ---(*3)
model = RandomForestClassifier()
model.fit(x_train, y_train)

# Evaluate on the test set ---(*4)
y_pred = model.predict(x_test)
print(classification_report(y_test, y_pred))
print("accuracy=", accuracy_score(y_test, y_pred))

accuracy= 0.6520408163265307

Because train_test_split shuffles the data randomly, this accuracy changes slightly from run to run.
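To get the same number on every run, both `train_test_split` and `RandomForestClassifier` accept a `random_state` argument, and `stratify=y` keeps the class proportions equal in the training and test sets. The sketch below uses synthetic data from `make_classification` as a stand-in for the wine table (an assumption so it runs without the CSV file); `run_once` is a hypothetical name:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def run_once(seed=42):
    # Synthetic stand-in: 11 feature columns and 3 classes (not the real wine data)
    x, y = make_classification(n_samples=300, n_features=11, n_informative=5,
                               n_classes=3, random_state=0)
    # Fixed random_state -> same split every run; stratify keeps class ratios equal
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=seed, stratify=y)
    # Fixed random_state -> same bootstrap samples and feature choices in the forest
    model = RandomForestClassifier(random_state=seed)
    model.fit(x_train, y_train)
    return accuracy_score(y_test, model.predict(x_test))

print(run_once(), run_once())  # both calls print the same accuracy
```

With the real data, the same change amounts to passing `random_state=...` (and optionally `stratify=y`) in steps (*2) and (*3).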

4. Classifying after regrouping the labels

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

# Read the data --- (*1)
wine = pd.read_csv("winequality-white.csv", sep=";", encoding="utf-8")
# Separate the label (quality) from the feature columns
y = wine["quality"]
x = wine.drop("quality", axis=1)

# Regroup the quality scores into three classes --- (*2)
#   quality <= 4 -> 0 (low), 5 to 7 -> 1 (medium), 8 or higher -> 2 (high)
newlist = []
for v in list(y):
    if v <= 4:
        newlist.append(0)
    elif v <= 7:
        newlist.append(1)
    else:
        newlist.append(2)
y = newlist

# Split into a training set (80%) and a test set (20%) --- (*3)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Train a random forest classifier --- (*4)
model = RandomForestClassifier()
model.fit(x_train, y_train)

# Evaluate on the test set --- (*5)
y_pred = model.predict(x_test)
print(classification_report(y_test, y_pred))
print("accuracy=", accuracy_score(y_test, y_pred))

accuracy= 0.9346938775510204
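Most wines in the dataset score in the middle of the scale, so class 1 probably makes up the bulk of the regrouped labels, and part of the jump in accuracy may come from that imbalance rather than from a better model. A minimal sketch, assuming you want to check this: `to_three_classes` is a hypothetical vectorized version of the regrouping loop in step 4 (using `pd.cut`), and `majority_baseline_accuracy` (also hypothetical) scores scikit-learn's `DummyClassifier`, which always predicts the most frequent training class:

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier

def to_three_classes(quality):
    """Regroup scores like the loop in step 4: <=4 -> 0, <=7 -> 1, otherwise 2."""
    # right=True (the default) makes the bins (-inf, 4], (4, 7], (7, inf)
    return pd.cut(pd.Series(quality), bins=[-np.inf, 4, 7, np.inf],
                  labels=[0, 1, 2]).astype(int)

def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of a model that always predicts the most frequent training class."""
    dummy = DummyClassifier(strategy="most_frequent")
    # DummyClassifier ignores the features, so one placeholder column is enough
    dummy.fit(np.zeros((len(y_train), 1)), y_train)
    return dummy.score(np.zeros((len(y_test), 1)), y_test)

print(to_three_classes([3, 4, 5, 7, 8, 9]).tolist())               # [0, 0, 1, 1, 2, 2]
print(majority_baseline_accuracy([1, 1, 1, 0, 2], [1, 1, 0, 2]))   # 0.5
```

On the real data, call `majority_baseline_accuracy(y_train, y_test)` after step (*3) and compare the result with the random forest's accuracy; the per-class recall in `classification_report` for classes 0 and 2 is also worth checking.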