Data Sprint #22: Concrete Crack Image Classification

Build a model to detect cracked concrete



754 Submissions


Cracking in Concrete

Definition: a complete or incomplete separation of either concrete or masonry into two or more parts produced by breaking or fracturing - ACI Concrete Terminology

Concrete is the most important material in civil engineering. Concrete provides structures with strength, rigidity, and resistance to deformation. These characteristics, however, leave concrete structures without the flexibility to move in response to environmental or volume changes. Cracking is usually the first sign of distress in concrete. It is, however, possible for deterioration to exist before cracks appear. Cracking can occur in both hardened and fresh (plastic) concrete as a result of volume changes and repeated loading. [Source of Information: giatecscientific]

Problem Statement

Imagine you are a civil engineer who also knows data science and machine learning. Your state or local government asks you to find all the cracked concrete and replace it. With your knowledge of machine learning and deep learning, you decide to build a system that alerts you when cracked concrete is detected.


Build a machine learning or deep learning model that detects cracked concrete.

Evaluation Criteria

Submissions are evaluated using the accuracy score.
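As a minimal sketch, assuming the standard definition of accuracy (the fraction of predictions that match the actual labels), the metric can be computed as shown here. The function name and the example labels are illustrative and are not taken from the platform.

```python
def accuracy_score(y_true, y_pred):
    """Return the fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical labels, for illustration only
actual = ["Positive", "Negative", "Positive", "Negative"]
predicted = ["Positive", "Negative", "Negative", "Negative"]
score = accuracy_score(actual, predicted)  # 3 of the 4 predictions match
```

Because every image counts equally, a wrong 'Positive' and a wrong 'Negative' lower the score by the same amount.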

How do we do it? 

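Once we release the data, anyone can download it, build a model, and make a submission. We give competitors a set of data (training data) with both the independent and dependent variables. In this challenge, the independent variables are the images and the dependent variable is the ‘Negative’ or ‘Positive’ label.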

We also release another set of data (test dataset) with just the independent variables, and we hide the dependent variable that corresponds to this set. You submit the predicted values of the dependent variable for this set, and we compare them against the actual values.

The predictions are evaluated based on the evaluation metric defined in the datathon.
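Because the actual values for the test dataset are hidden, one common way to estimate the metric before submitting is to hold out part of the training data for validation. The sketch below illustrates this under assumptions: the function name, the 20% validation fraction, and the example filenames are hypothetical and are not prescribed by the datathon.

```python
import random

def train_validation_split(items, validation_fraction=0.2, seed=0):
    """Shuffle items reproducibly and split them into (train, validation) lists."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

# Hypothetical (filename, label) pairs standing in for the training data
examples = [(f"img_{i}.jpg", "Positive" if i % 2 else "Negative") for i in range(10)]
train_part, validation_part = train_validation_split(examples)
```

A model fitted on `train_part` can then be scored on `validation_part` with the same metric the platform uses.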


The baseline notebook is available here.

About the Data

The dataset contains images of concrete surfaces with and without cracks. The data was collected from various METU campus buildings. The dataset is divided into two classes for image classification: negative (no crack) and positive (crack).

The dataset is generated from 458 high-resolution images (4032x3024 pixels) using the method proposed by Zhang et al. (2016).

The high-resolution images vary in surface finish and illumination conditions.

No data augmentation, such as random rotation or flipping, is applied.

The dataset can be downloaded from the given link:

From the above link you will be able to download a zip file named ‘’. After you extract this zip file, you will get four items:

  • train - contains all the cracked and uncracked concrete images to be used for training your model. This folder holds two sub-folders: ‘Negative’ contains concrete images with no crack, and ‘Positive’ contains cracked concrete images.
  • test - contains concrete images. For each of these images you are required to predict ‘Negative’ if the concrete is not cracked or ‘Positive’ if the concrete is cracked.
  • Testing_set_concrete_crack.csv - gives the order in which the predictions for each image must be submitted on the platform. Make sure your predictions follow the same order of image filenames as given in this file.
  • sample_submission - a CSV file that contains a sample submission for the data sprint.
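The folder layout described above can be turned into labeled training examples by using each sub-folder name as the label. The following is a minimal sketch using only the Python standard library; the function name `list_labeled_images` and the demo filenames are hypothetical, and the demo builds a throwaway directory that mimics the layout rather than reading the real dataset.

```python
import tempfile
from pathlib import Path

def list_labeled_images(train_dir, labels=("Negative", "Positive")):
    """Return (path, label) pairs, using each sub-folder name as the label."""
    pairs = []
    for label in labels:
        folder = Path(train_dir) / label
        for path in sorted(folder.iterdir()):
            if path.is_file():
                pairs.append((path, label))
    return pairs

# Demo on a temporary directory that mimics the described layout;
# the image filenames are hypothetical placeholders.
with tempfile.TemporaryDirectory() as root:
    train = Path(root) / "train"
    for label, name in [("Negative", "a.jpg"), ("Positive", "b.jpg")]:
        (train / label).mkdir(parents=True, exist_ok=True)
        (train / label / name).touch()
    labeled = [(p.name, label) for p, label in list_labeled_images(train)]
```

With the real data, `list_labeled_images("train")` would be called on the extracted folder, and the resulting pairs passed to whichever image-loading library you prefer.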


This data is sourced from Mendeley Data.

2018 – Özgenel, Ç.F., Gönenç Sorguç, A. “Performance Comparison of Pretrained Convolutional Neural Networks on Crack Detection in Buildings”, ISARC 2018, Berlin




File Format

Your submission should be in CSV format.


This file should have a header row called 'prediction'.
Please see the instructions to save a prediction file under the “Data” tab.
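As a hedged sketch of producing such a file, the code below writes one prediction per row under a 'prediction' header, following the image order listed in Testing_set_concrete_crack.csv. Several details are assumptions, not facts from the platform: that the order file has a header row with the image filename in its first column, that the submission has a single column, and the names `read_test_order`, `write_submission`, and `submission.csv`. The demo uses made-up filenames in a temporary directory.

```python
import csv
import tempfile
from pathlib import Path

def read_test_order(order_csv):
    """Return filenames in file order (assumes a header row, filenames in column 1)."""
    with open(order_csv, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the assumed header row
        return [row[0] for row in reader if row]

def write_submission(filenames, prediction_by_name, out_csv):
    """Write one prediction per row, in the given order, under a 'prediction' header."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["prediction"])
        for name in filenames:
            # Raises KeyError if any listed image has no prediction
            writer.writerow([prediction_by_name[name]])

# Demo with hypothetical filenames and predictions
with tempfile.TemporaryDirectory() as root:
    order_csv = Path(root) / "Testing_set_concrete_crack.csv"
    order_csv.write_text("filename\nimg_2.jpg\nimg_1.jpg\n")
    predictions = {"img_1.jpg": "Negative", "img_2.jpg": "Positive"}
    out_csv = Path(root) / "submission.csv"
    write_submission(read_test_order(order_csv), predictions, out_csv)
    submission_lines = out_csv.read_text().splitlines()
```

Looking predictions up by filename, rather than relying on the order in which images were read from disk, keeps the rows aligned with the order file.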

To participate in this challenge, you must either create a team or join an existing one.