Datathon

Ended

Data Sprint #3: Abalone

Predict the age of Abalone from physical measurements

Easy | 766 Submissions

Context

What is Abalone?

Abalone is a common name for any of a group of small to very large sea snails, marine gastropod molluscs in the family Haliotidae. Other common names are ear shells, sea ears, and muttonfish or muttonshells in Australia, ormer in the UK, perlemoen in South Africa, and paua in New Zealand.

Source: importexport

Objective

The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope -- a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age.

Your objective is to determine the age of Abalone from the physical measurements.


Evaluation Criteria

Submissions are evaluated using Root-Mean-Squared-Error (RMSE).

How do we do it? 

Once you generate and submit the target variable predictions on the testing dataset, your submissions will be compared with the true values of the target variable. 

The True or Actual values of the target variable are hidden on the DPhi Practice platform so that we can evaluate your model's performance on testing data. Finally, a Root-Mean-Squared-Error (RMSE) for your model will be generated and displayed.
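
To estimate your score before submitting, you can compute the same metric on a held-out part of the training data. RMSE is the square root of the mean of the squared differences between the true and predicted values. The snippet below is a minimal sketch; the function name rmse and the toy numbers are made up for illustration and are not taken from the platform.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values, invented for illustration only
example_true = [10, 8, 12]
example_pred = [9, 8, 14]
example_score = rmse(example_true, example_pred)
print(round(example_score, 4))
```

A lower RMSE means the predictions are closer to the true values; because the errors are squared, large misses are penalised more heavily than small ones.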


Timeline

Start Date: 21st August 2020, 21:00 hours IST / 17:30 hours CEST

End Date: 24th August 2020, 21:00 hours IST / 17:30 hours CEST


Would you rather understand the problem through code?

Don't worry! Getting-started code is available for this challenge.


Problem Setter: Manish KC

About the Data

The data set has 9 columns which have information related to physical measurements of abalones and the number of rings (representing age).

To load the training data in your Jupyter notebook, use the following command:

import pandas as pd

abalone_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/abalone_data/training_set_label.csv")


Data Description

Sex: Sex (M: Male, F: Female, I: Infant)

Length: Longest Shell measurement (millimetres - mm)

Diameter: Diameter - perpendicular to length (mm)

Height: Height - with meat in shell (mm)

Whole weight: Weight of whole abalone (grams)

Shucked weight: Weight of meat (grams)

Viscera weight: Gut weight after bleeding (grams)

Shell weight: Shell weight - after being dried (grams)

Rings: Number of rings - adding 1.5 gives the age in years (e.g. 4 rings = 5.5 years)
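
The two columns that usually need some handling are Sex (a text category) and Rings (a count that converts to age). The sketch below shows one possible way to treat them on a tiny hand-made frame. It assumes the CSV column names match the names listed above; the helper names rings_to_age and encode_sex and the toy values are hypothetical.

```python
import pandas as pd

def rings_to_age(rings):
    """Convert a ring count to age in years (rings + 1.5)."""
    return rings + 1.5

def encode_sex(df):
    """Replace the text column Sex with one indicator column per category."""
    return pd.get_dummies(df, columns=["Sex"])

# Toy rows, invented for illustration only (not real abalone records)
toy = pd.DataFrame({
    "Sex": ["M", "F", "I"],
    "Length": [0.455, 0.530, 0.330],
    "Rings": [15, 9, 7],
})

toy_age = rings_to_age(toy["Rings"])
encoded = encode_sex(toy)
print(list(encoded.columns))
```

One-hot encoding is only one option for the Sex column; tree-based models can also work with a simple integer code.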


Test Dataset

Load the test data and name it test_data. You can load it using the following command:

test_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/abalone_data/testing_set_label.csv')

The target column is deliberately omitted from the test data because you need to predict it.
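
As a starting point, a plain linear baseline can be fitted on the training data and applied to the test data. The sketch below uses ordinary least squares from NumPy; it is not the organisers' method, only a simple reference model. It assumes the target column is named Rings and that a Sex column is present, and the helper names and toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

def build_features(df, columns=None):
    """One-hot encode Sex and return an all-numeric feature frame."""
    X = pd.get_dummies(df, columns=["Sex"], dtype=float)
    if columns is not None:
        # Align with the training layout; categories absent here become 0
        X = X.reindex(columns=columns, fill_value=0.0)
    return X.astype(float)

def fit_baseline(train_df, target="Rings"):
    """Fit least-squares coefficients (intercept first) on the training frame."""
    X = build_features(train_df.drop(columns=[target]))
    A = np.column_stack([np.ones(len(X)), X.to_numpy()])
    y = train_df[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, list(X.columns)

def predict_baseline(coef, columns, df):
    """Predict the target for a frame that has no target column."""
    X = build_features(df, columns)
    A = np.column_stack([np.ones(len(X)), X.to_numpy()])
    return A @ coef

# Toy rows, invented for illustration: Rings = 20 * Length + 2 exactly
toy_train = pd.DataFrame({
    "Sex": ["M", "M", "F", "F", "I", "I"],
    "Length": [0.3, 0.5, 0.4, 0.6, 0.2, 0.1],
    "Rings": [8.0, 12.0, 10.0, 14.0, 6.0, 4.0],
})
toy_test = pd.DataFrame({"Sex": ["I", "M"], "Length": [0.25, 0.45]})

coef, feature_columns = fit_baseline(toy_train)
toy_predictions = predict_baseline(coef, feature_columns, toy_test)
```

With the real data, the same two calls would take abalone_data and test_data in place of the toy frames. A linear model is unlikely to be the strongest choice here, but it gives a reference RMSE to beat.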


Acknowledgement

This dataset was downloaded from the UCI Machine Learning Repository.

 


File Format

Your submission should be in CSV format.

Predictions

The file should have a header row containing the column name 'prediction'.
Please see the instructions to save a prediction file under the “Data” tab.
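
A file in this format can be produced with pandas. The sketch below is one possible way to do it; the helper name make_submission, the file name submission.csv, and the example values are assumptions made for illustration, so follow the platform's own instructions if they differ.

```python
import pandas as pd

def make_submission(predictions):
    """Wrap predictions in a one-column frame with the header 'prediction'."""
    return pd.DataFrame({"prediction": list(predictions)})

# Example values, invented for illustration only
example = make_submission([9.2, 11.0, 7.5])

# With no path, to_csv returns the CSV text, which is handy for a quick check
csv_text = example.to_csv(index=False)
print(csv_text)

# To write the file for upload (hypothetical file name):
# example.to_csv("submission.csv", index=False)
```

Passing index=False keeps the row index out of the file, so the header row contains only 'prediction'.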

To participate in this challenge, you must either create a team or join an existing one.