Million Song Dataset
Songs, like any other audio signal, have distinctive fundamental frequencies, timbre components, and other properties. Each song is unique in these respects, which is why patterns can be learned from them.
Your task is to use machine learning models to predict the release year (between 1922 and 2011) of a song that is described by 90 attributes of average timbre and covariance.
Once you generate and submit predictions of the target variable on the evaluation dataset, your submission will be compared with the true values of the target variable.
The true (actual) values of the target variable are hidden on the DPhi Practice platform so that we can evaluate your model's performance on unseen data. Finally, a root-mean-squared error (RMSE) score for your model will be generated and displayed.
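To make the metric concrete, here is a minimal sketch of how an RMSE score can be computed locally before submitting. The function name `rmse` and the sample years are hypothetical and used only for illustration; the platform's own scoring code is not shown in this description and may differ in details.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Average the squared differences, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical release years, for illustration only.
actual = [1990, 2005, 1978]
predicted = [1992, 2001, 1980]
score = rmse(actual, predicted)
```

A lower score is better: it means the predicted years are, on average, closer to the true release years, with large misses penalized more heavily than small ones.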
About the dataset
This dataset contains 90 attributes of average timbre and covariance. The target variable is each song's year of release, between 1922 and 2011.
Download the training set from the following link: https://drive.google.com/file/d/1EjnfKFByNtRbcumGF-cDVLHVM5VPnb4h/view. Unzip the file and load the training data in your Jupyter notebook with the command below:
import pandas as pd
songs_data = pd.read_csv("Training_set_songs.csv")
- TA01 to TA12 – Timbre averages
- TC01 to TC78 – Timbre covariances
- Year – Release year
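The column layout above can be used to separate the 90 features from the target. The sketch below assumes the CSV header uses exactly the names listed (TA01 to TA12, TC01 to TC78, Year); that is an assumption, so check `songs_data.columns` on the real file. A small synthetic frame stands in for Training_set_songs.csv so the example runs on its own.

```python
import numpy as np
import pandas as pd

# Assumed column names, built from the list above: 12 averages + 78 covariances.
feature_cols = [f"TA{i:02d}" for i in range(1, 13)] + [f"TC{i:02d}" for i in range(1, 79)]

# Synthetic stand-in for the training file (random values, illustration only).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(5, len(feature_cols))), columns=feature_cols)
demo["Year"] = [1985, 1999, 2003, 1972, 2010]

# Fail early if the expected columns are not all present.
missing = [c for c in feature_cols + ["Year"] if c not in demo.columns]
if missing:
    raise KeyError(f"Missing expected columns: {missing}")

X = demo[feature_cols]   # 90 timbre attributes
y = demo["Year"]         # target: release year
```

With the real data, replacing `demo` with the loaded training frame should give the same split, provided the header matches the assumed names.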
Download the testing set from the following link: https://drive.google.com/file/d/1EjnfKFByNtRbcumGF-cDVLHVM5VPnb4h/view. Unzip the file and load the testing data in your Jupyter notebook with the command below:
test_data = pd.read_csv("Testing_set_songs.csv")
The target column (Year) is deliberately omitted from the testing set because you need to predict it.
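As a starting point, the workflow of fitting on the training set and predicting the missing target for the testing set might look like the sketch below. It uses an ordinary least-squares linear baseline via `numpy.linalg.lstsq`, chosen only for brevity and not prescribed by the challenge. The data is synthetic, the column names are assumed to match the list above, and the output column name `prediction` is hypothetical, since the required submission format is not stated here.

```python
import numpy as np
import pandas as pd

# Assumed column names (TA01..TA12, TC01..TC78), as described in the dataset section.
feature_cols = [f"TA{i:02d}" for i in range(1, 13)] + [f"TC{i:02d}" for i in range(1, 79)]
rng = np.random.default_rng(42)

def make_demo(n_rows):
    """Synthetic stand-in for the real CSV files (random values only)."""
    return pd.DataFrame(rng.normal(size=(n_rows, len(feature_cols))), columns=feature_cols)

train = make_demo(200)
hidden_weights = rng.normal(size=len(feature_cols))
raw_years = 1990 + train[feature_cols].to_numpy() @ hidden_weights
train["Year"] = np.clip(np.round(raw_years), 1922, 2011).astype(int)
test = make_demo(20)  # no Year column, like the testing set

def design_matrix(frame):
    """Feature matrix with a leading column of ones for the intercept."""
    values = frame[feature_cols].to_numpy(dtype=float)
    return np.hstack([np.ones((len(values), 1)), values])

# Fit the linear baseline by least squares on the training rows.
weights, *_ = np.linalg.lstsq(
    design_matrix(train), train["Year"].to_numpy(dtype=float), rcond=None
)

# Predict, round to whole years, and clip to the stated 1922-2011 range.
predictions = np.clip(np.round(design_matrix(test) @ weights), 1922, 2011).astype(int)
submission = pd.DataFrame({"prediction": predictions})  # hypothetical column name
```

Clipping to 1922 and 2011 uses the year range given in the task statement; any stronger model can be swapped in for the least-squares step without changing the rest of the flow.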
This dataset is adapted from:
T. Bertin-Mahieux. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. 2019. Available at: http://archive.ics.uci.edu/ml/datasets/YearPredictionMSD.
To participate in this challenge, you must either create a team of at least members or join an existing team.