Red Wine Quality
Wine is an alcoholic drink typically made from fermented grapes. Yeast consumes the sugar in the grapes and converts it to ethanol, carbon dioxide, and heat.
White wine is primarily made with white grapes, and the skins are separated from the juice before the fermentation process. Red wine is made with darker red or black grapes, and the skins remain on the grapes during the fermentation process.
“Wine is bottled poetry.” The wine connoisseurs at a wine factory in Portugal are debating the quality of red and white wines, and they have decided to turn to data science for help. They hired you as their data scientist because you are the best data scientist in the world. Can you help them out?
Submissions are evaluated using Accuracy Score. How do we do it?
Once you generate and submit the target variable predictions on the evaluation dataset, your submission will be compared with the true values of the target variable.
The true (actual) values of the target variable are hidden on the DPhi Practice platform so that your model's performance can be evaluated on unseen data. Finally, an accuracy score for your model will be generated and displayed.
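As a rough illustration of what an accuracy score measures, the sketch below compares predicted labels with true labels and returns the fraction that match. The function name `accuracy` and the sample labels are assumptions made for this example; the platform's own scoring code is not shown in this description.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical quality labels, for illustration only.
true_quality = [5, 6, 5, 7]
predicted_quality = [5, 6, 6, 7]

score = accuracy(true_quality, predicted_quality)
print(score)  # 3 of 4 predictions match, so this prints 0.75
```

Because only exact matches count, predicting 6 for a wine whose true quality is 5 is penalised just as much as predicting 8.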
About the dataset
The dataset contains samples of the red variant of the Portuguese "Vinho Verde" wine, from the north of Portugal. The goal is to model wine quality based on physicochemical tests.
The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.
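One way to see the imbalance in practice is to count how often each quality score occurs and check how well a model would do by always predicting the most common score. The sketch below uses a small, made-up list of scores (an assumption for illustration; the real distribution comes from the training set).

```python
from collections import Counter

# Hypothetical quality scores; real values come from the training set.
quality = [5, 5, 6, 5, 6, 6, 5, 7, 4, 5, 6, 8]

counts = Counter(quality)
total = len(quality)

# Count and share of samples for each class, in score order.
for label in sorted(counts):
    print(label, counts[label], round(counts[label] / total, 2))

# Accuracy obtained by always predicting the most common class.
majority_label, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / total
print("majority class:", majority_label)
print("baseline accuracy:", round(baseline_accuracy, 2))
```

A model is only useful if its accuracy clearly exceeds this majority-class baseline, which can already be fairly high when a few classes dominate.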
To load the dataset in your Jupyter notebook, use the command below:
import pandas as pd
red_wine_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/Wine_Dataset/Training_set_redwine.csv')
Input variables (based on physicochemical tests):
- fixed acidity
- volatile acidity
- citric acid
- residual sugar
- free sulfur dioxide
- total sulfur dioxide
- alcohol
Output variable (based on sensory data):
- quality (score between 0 and 10)
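Before training a model, the input variables are usually separated from the output variable. The sketch below does this on a tiny hand-made DataFrame. The column names are assumed to match the variable names listed above, and the values are illustrative only; check `red_wine_data.columns` for the actual headers in the CSV.

```python
import pandas as pd

# Tiny hand-made frame standing in for the real data (illustrative values).
sample = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 11.2],
    "volatile acidity": [0.70, 0.88, 0.28],
    "citric acid": [0.00, 0.00, 0.56],
    "residual sugar": [1.9, 2.6, 1.9],
    "free sulfur dioxide": [11.0, 25.0, 17.0],
    "total sulfur dioxide": [34.0, 67.0, 60.0],
    "alcohol": [9.4, 9.8, 9.8],
    "quality": [5, 5, 6],
})

target_column = "quality"  # assumed name of the output column

X = sample.drop(columns=[target_column])  # input variables
y = sample[target_column]                 # output variable

print(X.shape, y.shape)
```

The same two lines that build `X` and `y` should work on the loaded training data, provided the target column is named as assumed here.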
Feel free to google anything you don't understand.
This dataset was downloaded from the UCI Machine Learning Repository.
To participate in this challenge, you have to either create a team of at least members or join an existing team.