Invertebrate - Predictive Modeling
Consider the dataset invertebrate. It holds 280 records of the biodiversity of flying nocturnal invertebrates (e.g., insects such as moths, beetles, and mosquitoes), based on 280 sampling events in 2017 in a province in France. The outcome of interest is the Shannon-Wiener index (SWI), which is a measure of biodiversity.
Within the context of this project, it suffices to know that SWI is a non-negative metric that, in practice, is usually smaller than 4.5. Low values denote low diversity, while higher values denote higher diversity. In addition to SWI, a number of explanatory variables were collected. An overview of all variables in the data is given in the variable description table below.
The goal of this analysis is to build a model of SWI as a function of SWF, temperature, size, management, and duration.
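As a starting point, a plain linear regression of SWI on the five explanatory variables gives a baseline to compare more elaborate models against. The sketch below fits it by ordinary least squares with NumPy. The column names are an assumption based on the variable description (the CSV headers may be capitalized differently, so check invertebrate_data.columns first), and fit_linear_baseline / predict_linear_baseline are hypothetical helper names, not part of any platform API.

```python
import numpy as np
import pandas as pd

# Assumed column names, taken from the variable description; adjust them
# if the CSV headers are spelled or capitalized differently.
FEATURES = ["SWF", "Temperature", "Size", "Management", "Duration"]
TARGET = "SWI"


def fit_linear_baseline(df: pd.DataFrame) -> np.ndarray:
    """Fit SWI ~ intercept + features by ordinary least squares."""
    X = df[FEATURES].to_numpy(dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    X = np.column_stack([np.ones(len(X)), X])
    y = df[TARGET].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_linear_baseline(coef: np.ndarray, df: pd.DataFrame) -> np.ndarray:
    """Predict SWI for the rows of df; df does not need an SWI column."""
    X = df[FEATURES].to_numpy(dtype=float)
    X = np.column_stack([np.ones(len(X)), X])
    preds = X @ coef
    # SWI cannot be negative, so clip predictions at zero.
    return np.clip(preds, 0.0, None)
```

For example, coef = fit_linear_baseline(invertebrate_data) followed by predict_linear_baseline(coef, eval_eval) yields one SWI prediction per evaluation row. Clipping at zero reflects that SWI is non-negative; whether a linear model is adequate is worth checking, e.g. by comparing its RMSE with that of simply predicting the training mean.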
You will build the model using the Invertebrate_Dataset and submit your predictions for the new_test data in the format described in the "How to Submit" section.
Submissions are evaluated on the Root-Mean-Squared-Error (RMSE) between your model's predicted values and the true values of SWI on the evaluation dataset described in the submission guidelines below.
In this exercise, your predictions will be evaluated against the true values of the target variable (SWI) for the new_test data.
How does the RMSE evaluation work?
Once you generate and submit your predictions of the target variable for the evaluation dataset, they are compared with the true values of the target variable.
The true (actual) values of the target variable are hidden on the DPhi Practice platform so that we can evaluate your model's performance on the evaluation data. Finally, the RMSE for your model is generated and displayed.
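RMSE is the square root of the mean squared difference between true and predicted values: RMSE = sqrt((1/n) · Σ (y_i − ŷ_i)²), where y_i is the true SWI of row i, ŷ_i is its prediction, and n is the number of rows. The platform's own implementation is not shown here, so the function below is only a sketch of this standard definition, useful for scoring predictions locally; rmse is a hypothetical helper name.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-squared error between true and predicted SWI values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute RMSE of empty inputs")
    # Square each residual, average them, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For example, rmse([2.0, 2.0], [0.0, 4.0]) returns 2.0. Because errors are squared before averaging, a few large misses raise the RMSE more than many small ones.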
About the dataset
To load the dataset in your Jupyter notebook, use the command below:
import pandas as pd
invertebrate_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/Invertebrate/Invertebrate_dataset.csv")
The variables in the dataset are described below:
- SWI - The Shannon-Wiener index for (flying nocturnal) invertebrate diversity on the patch (non-negative, larger values denote higher diversity).
- SWF - An (adjusted) Shannon-Wiener index for floristic diversity on the patch. The interpretation of this metric is the same as for SWI (non-negative, larger values denote higher diversity).
- Temperature - Temperature at the sampling event (in degrees Celsius).
- Size - The size of the sampling patch (in m²).
- Management - The number of years that the patch has been subject to nature management.
- Duration - The duration of a sampling event (in minutes).
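Before modeling, it can help to confirm that a loaded file matches this description. The check below is a sketch assuming the column names are spelled exactly as in the list above. It reports missing columns, missing values, and negative values in variables that cannot be negative (the two Shannon-Wiener indices, patch size, years of management, and duration; temperature may legitimately be below 0 °C). check_invertebrate_frame is a hypothetical helper name.

```python
from typing import List

import pandas as pd

# Assumed CSV headers, taken from the variable description.
FEATURE_COLUMNS = ["SWF", "Temperature", "Size", "Management", "Duration"]
TARGET_COLUMN = "SWI"
# Variables that cannot be negative: both Shannon-Wiener indices, an area,
# a count of years and a duration. Temperature is deliberately excluded.
NON_NEGATIVE = ["SWI", "SWF", "Size", "Management", "Duration"]


def check_invertebrate_frame(df: pd.DataFrame, has_target: bool = True) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    expected = FEATURE_COLUMNS + ([TARGET_COLUMN] if has_target else [])
    missing = [c for c in expected if c not in df.columns]
    if missing:
        # Without the expected columns the remaining checks are meaningless.
        return [f"missing columns: {missing}"]
    problems: List[str] = []
    if df[expected].isna().to_numpy().any():
        problems.append("missing values present")
    for col in NON_NEGATIVE:
        if col in expected and (df[col] < 0).any():
            problems.append(f"negative values in {col}")
    return problems
```

Call it as check_invertebrate_frame(invertebrate_data) for the training data and check_invertebrate_frame(eval_eval, has_target=False) for the evaluation data, which has no SWI column.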
Load the evaluation data and name it eval_eval, using the command below:
eval_eval = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/Invertebrate/Invertebrate_new_test_data.csv')
The target column (SWI) is deliberately absent from this file, as you need to predict it.
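Because the evaluation file has no SWI column and the true values are hidden on the platform, you cannot compute the RMSE for new_test yourself. One common way to estimate it beforehand is to hold out part of the training data, fit on the rest, and score the held-out rows. The sketch below does this for a linear least-squares baseline with a single random split; the column names, the 20% holdout fraction, and the helper name holdout_rmse are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Assumed column names, taken from the variable description.
FEATURES = ["SWF", "Temperature", "Size", "Management", "Duration"]
TARGET = "SWI"


def holdout_rmse(df: pd.DataFrame, test_fraction: float = 0.2, seed: int = 0) -> float:
    """Estimate out-of-sample RMSE of an OLS baseline on one random holdout split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    n_test = max(1, int(round(test_fraction * len(df))))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def design(rows: pd.DataFrame) -> np.ndarray:
        # Intercept column followed by the five explanatory variables.
        X = rows[FEATURES].to_numpy(dtype=float)
        return np.column_stack([np.ones(len(X)), X])

    train, test = df.iloc[train_idx], df.iloc[test_idx]
    coef, *_ = np.linalg.lstsq(design(train), train[TARGET].to_numpy(dtype=float), rcond=None)
    # Clip at zero because SWI is non-negative.
    preds = np.clip(design(test) @ coef, 0.0, None)
    errors = test[TARGET].to_numpy(dtype=float) - preds
    return float(np.sqrt(np.mean(errors ** 2)))
```

With 280 records, a 20% holdout is only 56 rows, so the estimate can vary between seeds; averaging over several seeds, or using k-fold cross-validation, gives a steadier figure.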
To participate in this challenge, you must either create a team of at least members or join an existing team.