About the Data

The dataset consists of morphological features that describe the body structure of four different classes of microorganisms. These microorganisms are found in the Sukhna and Dhanas lakes in Chandigarh, India. The images were captured by microscopic imaging of whole-mounted glass slides. Some of the features are described below.

Solidity: The ratio of the area of an object to the area of its convex hull, computed as Area/ConvexArea.
Eccentricity: The ratio of the length of the major axis to the length of the minor axis of an object.
EquivDiameter: Diameter of a circle with the same area as the region.
Extrema: Extrema points in the region. The format of the vector is [top-left top-right right-top right-bottom bottom-right bottom-left left-bottom left-top].
Filled Area: Number of on pixels in FilledImage, returned as a scalar.
Extent: Ratio of the pixel area of a region to the area of its bounding box.
Orientation: The overall direction of the shape. The value ranges from -90 degrees to 90 degrees.
Euler number: Number of objects in the region minus the number of holes in those objects.
Bounding box: Position and size of the smallest box (rectangle) which bounds the object.
Convex hull: Smallest convex shape/polygon that contains the object.
Major axis: The longest line that can be drawn through the object. Its length (in pixels) is the largest dimension of the object.
Minor axis: The axis perpendicular to the major axis. Its length (in pixels) is that of the shortest line connecting a pair of points on the contour.
Perimeter: Number of pixels around the border of the region.
Centroid: Centre of mass of the region. It is a measure of the object's location in the image.
Area: Total number of pixels in a region/shape.
microorganism: The class of the microorganism; this is the target variable.
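
Several of the ratio features above follow directly from simpler measurements. The sketch below is illustrative only: the function names and the toy numbers are hypothetical, and it does not claim to reproduce how the dataset authors computed their values. It works through Solidity, Extent, and EquivDiameter for a made-up region.

```python
import math


def solidity(area: float, convex_area: float) -> float:
    """Area / ConvexArea, as defined in the feature list."""
    return area / convex_area


def extent(area: float, bbox_width: float, bbox_height: float) -> float:
    """Region area divided by the area of its bounding box."""
    return area / (bbox_width * bbox_height)


def equiv_diameter(area: float) -> float:
    """Diameter of a circle whose area equals the region's area."""
    return math.sqrt(4.0 * area / math.pi)


# Toy region (hypothetical numbers): 300 pixels inside a 20 x 25 bounding box,
# with a convex hull covering 400 pixels.
print(solidity(300, 400))             # 0.75
print(extent(300, 20, 25))            # 0.6
print(round(equiv_diameter(300), 2))  # 19.54
```

Solidity and Extent both fall between 0 and 1, so a value near 1 indicates a region that nearly fills its convex hull or bounding box.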

To load the training data in your Jupyter notebook, use the command below:

import pandas as pd

mo_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/sukhna_dhanas/train_set_label.csv")
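
Once the data is loaded, a common next step is to separate the feature columns from the target. A minimal sketch follows, using a tiny in-memory stand-in for the CSV so that it runs offline. The two feature column names and all values are made up for illustration; only the microorganism column name comes from the description above, on the assumption that it matches the CSV header.

```python
import io

import pandas as pd

# Stand-in for the real CSV; rows and feature columns are hypothetical.
toy_csv = io.StringIO(
    "Solidity,Extent,microorganism\n"
    "0.91,0.62,1\n"
    "0.75,0.48,3\n"
    "0.88,0.55,2\n"
)
toy_data = pd.read_csv(toy_csv)

# Features are every column except the target; the target is the class label.
X = toy_data.drop(columns=["microorganism"])
y = toy_data["microorganism"]

print(X.shape)     # (3, 2)
print(y.tolist())  # [1, 3, 2]
```

With the real data, the same two lines would be applied to mo_data instead of toy_data.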

Saving Prediction File & Sample Submission

You can find more details on how to save a prediction file here: https://discuss.dphi.tech/t/how-to-submit-predictions/548

Sample submission: Submit a CSV file with a header row. A sample submission is shown below.

prediction
1
1
3
2
4
4
.
.
etc.

Note that the header name must be prediction; otherwise, the evaluation will throw an error.
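
The format above can be produced with pandas along these lines. This is a sketch under stated assumptions: the prediction values are placeholders rather than real model output, and the CSV is written to an in-memory buffer so the result can be inspected without touching disk.

```python
import io

import pandas as pd

# Hypothetical predictions; in practice these come from your trained model.
predictions = [1, 1, 3, 2, 4, 4]

# A single column whose header is exactly "prediction".
submission = pd.DataFrame({"prediction": predictions})

# index=False keeps the row index out of the file, so the only column
# written is "prediction".
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
print(buffer.getvalue())
```

To save an actual file, pass a file name to to_csv in place of the buffer. Leaving out index=False would add an unnamed index column next to prediction, which does not match the sample format.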

Test Dataset

Load the test data and name it test_data, using the command below.

test_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/sukhna_dhanas/test_set_label.csv')

The target column is deliberately absent from the test data, since it is what you need to predict.

Acknowledgement

The dataset is sourced from Mendeley Data.

Dhindsa, Anaahat; Bhatia, Sanjay; Agrawal, Sunil; Sohi, BS (2020), “Classification of Microorganisms of Sukhna and Dhanas Lakes”, Mendeley Data, V2, doi: 10.17632/bcnv3n43wg.2

Data Sprint #19: Classification of Microorganisms of Sukhna and Dhanas Lakes
