About the dataset

The dataset contains samples of the red variant of the Portuguese "Vinho Verde" wine, from the north of Portugal. The goal is to model wine quality based on physicochemical tests.

The classes are ordered and imbalanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. We are also not sure whether all input variables are relevant, so it could be interesting to test feature selection methods.
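The imbalance mentioned above can be checked by counting how often each quality score occurs. The sketch below is one minimal way to do that in plain Python; the function name `class_distribution` and the toy scores are made up for illustration and are not taken from the dataset.

```python
from collections import Counter


def class_distribution(scores):
    """Return {score: (count, share)} with scores in ascending order."""
    counts = Counter(scores)
    total = sum(counts.values())
    return {score: (n, n / total) for score, n in sorted(counts.items())}


# Made-up scores for illustration only; NOT values from the wine dataset.
toy_scores = [5, 5, 6, 6, 6, 5, 7, 4, 6, 5]

distribution = class_distribution(toy_scores)
for score, (n, share) in distribution.items():
    print(f"quality {score}: {n} samples ({share:.0%})")
```

Once the dataset is loaded, the same function should accept the quality column directly, e.g. `class_distribution(red_wine_data["quality"])`, assuming the column is named `quality`.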

To load the dataset in your Jupyter notebook, use the following code:

import pandas as pd
red_wine_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/Wine_Dataset/Training_set_redwine.csv')
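After loading, it can help to confirm that the file has the columns you expect before modelling. The sketch below assumes the CSV header uses the variable names given in the Data Description section; that is an assumption, not something this page confirms, and `missing_columns` is a hypothetical helper.

```python
# Variable names as listed in the data description.
# ASSUMPTION: the CSV header uses these exact names (ignoring case and spaces).
EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]


def missing_columns(columns, expected=EXPECTED_COLUMNS):
    """Return the expected names that do not appear in `columns`."""
    present = {str(name).strip().lower() for name in columns}
    return [name for name in expected if name.lower() not in present]


# Toy header for illustration; a real call would pass red_wine_data.columns.
toy_header = ["fixed acidity", "pH", "alcohol", "quality"]
missing = missing_columns(toy_header)
print("Missing columns:", missing)
```

With the real data, `missing_columns(red_wine_data.columns)` returning an empty list would suggest the header matches the description.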

Data Description

Input variables (based on physicochemical tests):

fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol

Output variable (based on sensory data):

quality (score between 0 and 10)
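Since it is unclear whether all of the input variables listed above are relevant, a simple first check is to rank them by the absolute Pearson correlation with `quality`. This is only a rough filter-style sketch, not a method prescribed by the dataset: it looks at one variable at a time and captures linear relationships only. The helper names and the toy values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treat as 0
    return cov / sqrt(var_x * var_y)


def rank_features(columns, target="quality"):
    """Sort input variables by |correlation with target|, strongest first."""
    y = columns[target]
    scores = {
        name: pearson(values, y)
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up values for illustration only; NOT rows from the wine dataset.
toy = {
    "alcohol": [9.0, 10.0, 11.0, 12.0],
    "volatile acidity": [0.8, 0.6, 0.7, 0.3],
    "quality": [4, 5, 6, 7],
}

ranking = rank_features(toy)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

With the real data, the function can be called as `rank_features(red_wine_data.to_dict("list"))`, assuming every column is numeric. Variables near the bottom of the ranking are candidates to drop, but a low linear correlation alone does not prove a variable is irrelevant.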

Feel free to look up any terms you don't understand.

Reference

This dataset was downloaded from the UCI Machine Learning Repository:

https://archive.ics.uci.edu/ml/datasets/Wine+Quality
