About the Data

The bike-sharing rental process is highly correlated with environmental and seasonal settings. For instance, weather conditions, precipitation, day of the week, season, and hour of the day can all affect rental behavior. The core data set is a two-year historical log, covering 2011 and 2012, from the Capital Bikeshare system in Washington, D.C., USA.

To load the training data in your Jupyter notebook, use the command below:

import pandas as pd

bike_share_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/bike_data/bike_train.csv")

Data Description

- instant: record index

- dteday : date

- season : season (1: spring, 2: summer, 3: fall, 4: winter)

- yr : year (0: 2011, 1: 2012)

- mnth : month (1 to 12)

- hr : hour (0 to 23)

- holiday : whether the day is a holiday or not

- weekday : day of the week

- workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0

- weathersit : weather situation (1 to 4):

- 1: Clear, Few clouds, Partly cloudy

- 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist

- 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds

- 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

- temp : Normalized temperature in Celsius. The values are divided by 41 (max)

- atemp : Normalized feeling temperature in Celsius. The values are divided by 50 (max)

- hum : Normalized humidity. The values are divided by 100 (max)

- windspeed : Normalized wind speed. The values are divided by 67 (max)

- cnt: count of total rental bikes including both casual and registered
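Since temp, atemp, hum and windspeed are stored as fractions of their stated maxima, it can help to convert them back to original units when exploring the data. The following is a minimal sketch, assuming the scaling is exactly the division described above. The helper name denormalize, the SCALE mapping and the tiny sample frame are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Scaling constants taken from the data description above.
SCALE = {"temp": 41, "atemp": 50, "hum": 100, "windspeed": 67}


def denormalize(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the normalized columns multiplied back by their maxima."""
    out = df.copy()
    for col, max_value in SCALE.items():
        if col in out.columns:  # skip columns that are not present
            out[col] = out[col] * max_value
    return out


# Hand-made frame standing in for bike_share_data (illustrative values only).
sample = pd.DataFrame(
    {"temp": [0.5, 1.0], "atemp": [0.5, 1.0], "hum": [0.5, 1.0], "windspeed": [0.0, 1.0]}
)
restored = denormalize(sample)
```

On the real data, the same call would be denormalize(bike_share_data). The original frame is left unchanged because the helper works on a copy.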

Saving Prediction File & Sample Submission

You can find more details on how to save a prediction file here: https://discuss.dphi.tech/t/how-to-submit-predictions/548

Sample submission: You should submit a CSV file with a header row. A sample submission is shown below.

prediction

110

45

12

56

Etc.

Note that the header name must be prediction; otherwise the submission will throw an evaluation error.
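One way to produce a file in this shape with pandas is sketched below. The prediction values are placeholders copied from the sample above, and the file name submission.csv is an assumption, not something the platform is known to require. The linked discussion post remains the authoritative guide.

```python
import pandas as pd

# Placeholder predictions; in practice these come from your model's output
# on test_data, one value per test row and in the same row order.
predictions = [110, 45, 12, 56]

# A single column whose header is exactly "prediction".
submission = pd.DataFrame({"prediction": predictions})

# index=False keeps the row index out of the file, so the CSV has one column only.
# The file name "submission.csv" is an assumption for this sketch.
submission.to_csv("submission.csv", index=False)
```

Leaving out index=False would add an unnamed index column to the file, which no longer matches the single-column sample.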

Test Dataset

Load the test data and name it test_data. You can load it using the command below:

test_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/bike_data/bike_test.csv')

The target column is deliberately omitted here because you need to predict it.
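Because the test data lacks the target, the training frame has to be split into features and target before fitting anything. The sketch below assumes the target is cnt and that the test file shares the remaining training columns. The small train and test frames are hypothetical stand-ins for bike_share_data and test_data.

```python
import pandas as pd

TARGET = "cnt"  # assumed target: the total rental count

# Small stand-ins for bike_share_data and test_data (illustrative values only).
train = pd.DataFrame({"hr": [0, 8, 17], "temp": [0.24, 0.30, 0.62], "cnt": [16, 210, 450]})
test = pd.DataFrame({"hr": [9, 18], "temp": [0.32, 0.60]})

# The target should be absent from the test frame.
assert TARGET not in test.columns

# Keep only columns present in both frames, in the training column order.
feature_cols = [c for c in train.columns if c != TARGET and c in test.columns]

X_train, y_train = train[feature_cols], train[TARGET]
X_test = test[feature_cols]
```

Selecting both frames by the same feature_cols list keeps the column order identical, so a model fitted on X_train can score X_test directly.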

Acknowledgement

This data has been sourced from the UCI Machine Learning Repository.

Data Sprint #10: Bike Share Data
