Data Sprint #10: Bike Share Data

Predict the future bike shares





Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, users can easily rent a bike at one location and return it at another. Currently, there are over 500 bike-sharing programs around the world, comprising over 500 thousand bicycles. There is great interest in these systems today because of their important role in traffic, environmental and health issues.

Apart from the interesting real-world applications of bike sharing systems, the characteristics of the data they generate make them attractive for research. Unlike other transport services such as bus or subway, these systems explicitly record the duration of travel and the departure and arrival positions. This feature turns a bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most of the important events in the city could be detected by monitoring these data.


Your objective is to predict the future bike shares.

Evaluation Criteria

Submissions are evaluated using Root Mean Squared Log Error (RMSLE).

How do we do it? 

Once you generate and submit the target variable predictions on the testing dataset, your submissions will be compared with the true values of the target variable. 

The True or Actual values of the target variable are hidden on the DPhi Practice platform so that we can evaluate your model's performance on testing data. Finally, a Root Mean Squared Log Error (RMSLE) for your model will be generated and displayed.
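For local validation, the metric can be sketched as follows. This is a minimal illustration of the standard RMSLE definition (the square root of the mean squared difference between log(1 + prediction) and log(1 + actual)); the platform's exact implementation is not shown on this page, and the function name `rmsle` is a hypothetical helper, not part of any DPhi API.

```python
import math


def rmsle(y_true, y_pred):
    """Root Mean Squared Log Error over two equal-length sequences.

    RMSLE = sqrt( mean( (log(1 + pred) - log(1 + actual))^2 ) )
    Values must be greater than -1, otherwise log1p is undefined.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one value is required")

    squared_log_errors = [
        (math.log1p(pred) - math.log1p(actual)) ** 2
        for actual, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Hypothetical hold-out values, for illustration only.
score = rmsle([16, 40, 32], [20, 35, 30])
```

Because the error is measured on a log scale, the metric penalises relative rather than absolute differences, and negative predictions at or below -1 cannot be scored at all.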


Start Date: 16th October 2020, 21:00 hours IST / 17:30 hours CET (please locate your time here)

End Date: 19th October 2020, 21:00 hours IST / 17:30 hours CET (please locate your time here)

Problem Setter: Manish KC

Contributor: Nisrin Dhoondia


The baseline notebook is available here.

About the Data

The bike-sharing rental process is highly correlated with environmental and seasonal settings. For instance, weather conditions, precipitation, day of the week, season, hour of the day, etc. can affect rental behavior. The core data set is the two-year historical log for 2011 and 2012 from the Capital Bikeshare system, Washington D.C., USA.

To load the training data in your Jupyter notebook, use the command below:

import pandas as pd

bike_share_data = pd.read_csv("")

Data Description

- instant: record index

- dteday : date

- season : season (1: spring, 2: summer, 3: fall, 4: winter)

- yr : year (0: 2011, 1: 2012)

- mnth : month (1 to 12)

- hr : hour (0 to 23)

- holiday : whether the day is a holiday or not

- weekday : day of the week

- workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0.

- weathersit : weather situation

    - 1: Clear, Few clouds, Partly cloudy

    - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist

    - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds

    - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

- temp : Normalized temperature in Celsius. The values are divided by 41 (max)

- atemp : Normalized feeling temperature in Celsius. The values are divided by 50 (max)

- hum : Normalized humidity. The values are divided by 100 (max)

- windspeed : Normalized wind speed. The values are divided by 67 (max)

- cnt: count of total rental bikes including both casual and registered
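The normalized columns can be rescaled for inspection or plotting. The sketch below takes the divisors listed above at face value (41, 50, 100 and 67) and multiplies them back in; the helper name `denormalize` and the sample values are made up for illustration and are not part of the dataset or the baseline notebook.

```python
import pandas as pd

# Divisors as stated in the data description above (assumed to be exact).
SCALE_FACTORS = {"temp": 41, "atemp": 50, "hum": 100, "windspeed": 67}


def denormalize(df):
    """Return a copy of df with the normalized columns scaled back up."""
    restored = df.copy()
    for column, factor in SCALE_FACTORS.items():
        if column in restored.columns:
            restored[column] = restored[column] * factor
    return restored


# Hypothetical rows standing in for the real training data.
sample = pd.DataFrame({"temp": [0.5, 1.0], "hum": [0.25, 0.8], "cnt": [16, 40]})
restored = denormalize(sample)
```

Tree-based models are unaffected by this rescaling, so it is mainly useful for making plots and summary statistics easier to read.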

Saving Prediction File & Sample Submission

You can find more details on how to save a prediction file here:

Sample submission: You should submit a CSV file with a header row and the sample submission can be found below.







Note that the header name should be prediction; otherwise the submission will throw an evaluation error.
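A minimal sketch of writing such a file with pandas is shown here. The prediction values, the output path and the choice to clip negative values to zero are all assumptions made for illustration; only the `prediction` header and the CSV format come from this page.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical model outputs; in practice these come from your model.
raw_predictions = np.array([12.4, -3.0, 87.9])

# Assumption: counts cannot be negative, and RMSLE takes a logarithm,
# so negative outputs are clipped to zero before saving.
submission = pd.DataFrame({"prediction": np.clip(raw_predictions, 0, None)})

# Hypothetical output location; any writable path works.
path = os.path.join(tempfile.gettempdir(), "submission.csv")

# index=False is assumed so the file holds only the 'prediction' column.
submission.to_csv(path, index=False)

# Read the file back to confirm the header row is what is expected.
check = pd.read_csv(path)
```

Reading the file back before uploading is a cheap way to catch a wrong header or an accidental index column.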

Test Dataset

Load the test data (name it test_data). You can load it using the command below:

test_data = pd.read_csv('')

The target column is deliberately omitted here, as you need to predict it.
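To illustrate how the training and test frames relate, the sketch below builds two tiny synthetic frames (the values are invented, not taken from the real files), confirms which column is missing from the test frame, and produces a naive per-hour prediction. Averaging on the log1p scale is a design choice made here because the metric is RMSLE; it is not the method of the baseline notebook.

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-ins for the real files (values are made up).
train = pd.DataFrame({
    "hr": [0, 0, 8, 8, 17, 17],
    "cnt": [10, 20, 200, 300, 400, 500],
})
test_data = pd.DataFrame({"hr": [8, 0, 17, 3]})  # hour 3 never occurs in train

# Columns present in train but absent from test: the target to predict.
target_cols = sorted(set(train.columns) - set(test_data.columns))

# Per-hour average on the log1p scale, mapped back with expm1.
log_cnt = np.log1p(train["cnt"])
hourly = np.expm1(log_cnt.groupby(train["hr"]).mean())

# Fallback for hours not seen in training: the same statistic over all rows.
overall = np.expm1(log_cnt.mean())

predictions = test_data["hr"].map(hourly).fillna(overall)
```

A real model would use the remaining features as well; this only shows the shape of the workflow from training frame to one prediction per test row.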


This data has been sourced from the UCI Machine Learning Repository.




File Format

Your submission should be in CSV format.


This file should have a header row called 'prediction'.
Please see the instructions to save a prediction file under the “Data” tab.
