Data Sprint #2 - Engineering Graduates Employment Outcomes

Predict The Salary of An Indian Engineering Graduate





What is Engineering?

Engineering is the use of scientific principles to design and build machines, structures, and other items, including bridges, tunnels, roads, vehicles, and buildings. The discipline of engineering encompasses a broad range of more specialized fields of engineering, each with a more specific emphasis on particular areas of applied mathematics, applied science, and types of application.


Engineering is a broad discipline that is often broken down into several sub-disciplines. Although an engineer will usually be trained in a specific discipline, he or she may become multi-disciplined through experience. Engineering is often characterized as having four main branches: chemical engineering, civil engineering, electrical engineering, and mechanical engineering. [Reference: Wikipedia]

Engineering Graduates in India

India has a total of 6,214 engineering and technology institutions, in which around 2.9 million students are enrolled. Every year, on average, 1.5 million students earn an engineering degree, but because many lack the skills required for technical jobs, fewer than 20 percent find employment in their core domain. [source of information: BWEDUCATION]


A relevant question is what determines the salary and the jobs these engineers are offered right after graduation. Various factors play a role: college grades, candidate skills, the proximity of the college to industrial hubs, the candidate's specialization, and market conditions in specific industries. Based on these factors, your objective is to predict the salary of an engineering graduate in India.


The data can be used not only to build an accurate salary predictor but also to understand what influences salaries and job titles in the labour market. It is up to you to explore.

Evaluation Criteria

Submissions are evaluated using Root-Mean-Squared-Error (RMSE).
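For reference, RMSE is the square root of the average squared difference between predicted and actual values: RMSE = sqrt((1/n) * sum of (y_i - ŷ_i)^2) over the n test rows. The sketch below computes it in Python with NumPy; the function name rmse is our own illustration, not something the platform exposes, and the salaries in the example are made-up numbers.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-squared error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    # Square each residual, average the squares, then take the square root.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy example with made-up salaries in INR (not taken from the dataset)
actual = [300000, 450000, 250000]
predicted = [320000, 400000, 260000]
print(rmse(actual, predicted))  # ≈ 31622.78
```

Because errors are squared before averaging, RMSE is expressed in the same unit as Salary (INR) and penalises a few large misses more than many small ones.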

How do we do it? 

Once you generate predictions of the target variable for the test dataset and submit them, they are compared with the true values of the target variable.

The true (actual) values of the target variable are hidden on the DPhi Practice platform so that your model's performance can be evaluated on unseen test data. The platform then computes and displays the Root-Mean-Squared-Error (RMSE) for your model.
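Because the true test values stay hidden until you submit, one common way to estimate RMSE beforehand is to hold out part of the training data for validation. Below is a minimal sketch, assuming the training data is a pandas DataFrame with a Salary column; the helper name holdout_rmse_of_mean_baseline, the 20% split and the constant mean-salary baseline are illustrative choices, and the toy DataFrame contains random numbers rather than real data.

```python
import numpy as np
import pandas as pd


def holdout_rmse_of_mean_baseline(df, target="Salary", valid_frac=0.2, seed=0):
    """Estimate RMSE locally by holding out part of the training data.

    The 'model' here is a constant baseline that predicts the mean target
    of the training split; swap in a real model in its place.
    """
    rng = np.random.default_rng(seed)
    # Shuffle row positions, then split them into validation and training parts.
    idx = rng.permutation(len(df))
    n_valid = int(round(len(df) * valid_frac))
    valid_idx, train_idx = idx[:n_valid], idx[n_valid:]
    train, valid = df.iloc[train_idx], df.iloc[valid_idx]

    # Baseline prediction: the mean target value seen in the training split.
    pred = np.full(len(valid), train[target].mean())
    return float(np.sqrt(np.mean((valid[target].to_numpy() - pred) ** 2)))


# Synthetic stand-in for eng_grad_data (random values, not real salaries)
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "CollegeGPA": rng.uniform(50, 90, size=200),
    "Salary": rng.normal(300000, 100000, size=200),
})
print(holdout_rmse_of_mean_baseline(toy))
```

A real model should beat this constant baseline on the held-out rows; comparing against it is a quick sanity check before submitting.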


Start Date: 14th August 2020, 21:00 hours IST / 17:30 hours CET

End Date: 17th August 2020, 9:00 hours IST / 5:30 hours CET

Would you like to understand the problem through code?

Here is your getting-started code.

Problem Setter: Manish KC

About the dataset

The dataset contains 33 attributes in addition to the target variable, Salary, which is the salary of an engineering graduate in India.

To load the training data in your Jupyter notebook, use the command below:

import pandas as pd

# Replace the empty string with the path or URL of the training CSV
eng_grad_data = pd.read_csv("")

Data Description
  • ID: A unique ID to identify a candidate
  • Salary: Annual CTC (cost to company) offered to the candidate, in INR
  • Gender: Candidate's gender
  • DOB: Date of birth of the candidate
  • 10percentage: Overall percentage of marks obtained in the grade 10 examinations
  • 10board: The school board whose curriculum the candidate followed in grade 10
  • 12graduation: Year of graduation from senior secondary school (grade 12)
  • 12percentage: Overall percentage of marks obtained in the grade 12 examinations
  • 12board: The school board whose curriculum the candidate followed in grade 12
  • CollegeID: Unique ID identifying the university/college the candidate attended for their undergraduate degree
  • CollegeTier: Each college has been annotated as 1 or 2. The annotations have been computed from the average AMCAT scores obtained by the students in the college/university. Colleges with an average score above a threshold are tagged as 1 and others as 2.
  • Degree: Degree obtained/pursued by the candidate
  • Specialization: Specialization pursued by the candidate
  • CollegeGPA: Aggregate GPA at graduation
  • CollegeCityID: A unique ID identifying the city in which the college is located
  • CollegeCityTier: The tier of the city in which the college is located, annotated based on the city's population
  • CollegeState: Name of the state in which the college is located
  • GraduationYear: Year of graduation (Bachelor's degree)
  • English: Score in AMCAT's English section
  • Logical: Score in AMCAT Logical ability section
  • Quant: Score in AMCAT's Quantitative ability section
  • Domain: Score in AMCAT's domain module
  • ComputerProgramming: Score in AMCAT's Computer programming section
  • ElectronicsAndSemicon: Score in AMCAT's Electronics & Semiconductor Engineering section
  • ComputerScience: Score in AMCAT's Computer Science section
  • MechanicalEngg: Score in AMCAT's Mechanical Engineering section
  • ElectricalEngg: Score in AMCAT's Electrical Engineering section
  • TelecomEngg: Score in AMCAT's Telecommunication Engineering section
  • CivilEngg: Score in AMCAT's Civil Engineering section
  • conscientiousness: Score in the conscientiousness section of AMCAT's personality test
  • agreeableness: Score in the agreeableness section of AMCAT's personality test
  • extraversion: Score in the extraversion section of AMCAT's personality test
  • nueroticism: Score in the neuroticism section of AMCAT's personality test
  • openess_to_experience: Score in the openness-to-experience section of AMCAT's personality test
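The CollegeTier rule described above (each college's average AMCAT score compared against a threshold) can be sketched with a pandas groupby. This is only an illustration: the actual threshold is not stated, and the dataset does not expose a single overall AMCAT score per student, so the column amcat_score, the helper annotate_college_tier and the threshold of 500 below are all hypothetical.

```python
import numpy as np
import pandas as pd


def annotate_college_tier(scores, threshold):
    """Tag each college 1 if its students' average score exceeds `threshold`, else 2.

    `scores` needs the columns 'CollegeID' and 'amcat_score' (hypothetical name).
    """
    # Average score of all students from the same college.
    avg = scores.groupby("CollegeID")["amcat_score"].mean()
    # Above the threshold -> tier 1, otherwise tier 2.
    return pd.Series(np.where(avg > threshold, 1, 2), index=avg.index, name="CollegeTier")


# Toy scores (made up): college averages are 550, 430 and 600
toy = pd.DataFrame({
    "CollegeID": [101, 101, 202, 202, 303],
    "amcat_score": [520, 580, 410, 450, 600],
})
print(annotate_college_tier(toy, threshold=500))  # expected tiers: 101 -> 1, 202 -> 2, 303 -> 1
```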

Note: for context, AMCAT is a job portal; the scores above come from its assessment test.

Test Dataset

Load the test data and name it test_data, using the command below:

test_data = pd.read_csv('')

The target column (Salary) is deliberately omitted from the test data, since you need to predict it.
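Since the test data lacks the target, a typical first step is to separate the training data into features and target and check that the test data has the same feature columns. A minimal sketch, assuming the column names from the data description above; split_features_target is a hypothetical helper, and dropping ID as a non-predictive identifier is our assumption rather than a rule of the challenge.

```python
import pandas as pd


def split_features_target(train, test, target="Salary", id_col="ID"):
    """Split off the target and align the feature columns of train and test."""
    y = train[target]
    # Drop the target (absent from test) and the ID (an identifier, assumed not predictive).
    feature_cols = [c for c in train.columns if c not in (target, id_col)]
    missing = set(feature_cols) - set(test.columns)
    if missing:
        raise KeyError(f"test data lacks columns: {sorted(missing)}")
    return train[feature_cols], y, test[feature_cols]


# Tiny made-up frames standing in for eng_grad_data and test_data
train = pd.DataFrame({"ID": [1, 2], "CollegeGPA": [70.0, 80.0], "Salary": [300000, 400000]})
test = pd.DataFrame({"ID": [3], "CollegeGPA": [75.0]})
X_train, y_train, X_test = split_features_target(train, test)
print(list(X_train.columns), list(X_test.columns))  # ['CollegeGPA'] ['CollegeGPA']
```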


We would like to thank ‘Aspiring Minds Research’ for making this dataset available publicly.



File Format

Your submission should be in CSV format.


The file should have a header row, with the predictions in a column named 'prediction'.
Please see the instructions to save a prediction file under the “Data” tab.
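A minimal sketch for writing the submission file with pandas, assuming one prediction per test row in the same order as test_data; the helper write_submission and the file name submission.csv are hypothetical, so follow the instructions under the “Data” tab where they differ.

```python
import pandas as pd


def write_submission(predictions, path="submission.csv"):
    """Write predictions to a CSV whose only column is headed 'prediction'."""
    sub = pd.DataFrame({"prediction": predictions})
    # index=False stops pandas from writing its row index as an extra column.
    sub.to_csv(path, index=False)
    return sub


# Made-up predictions for two test rows
write_submission([310000.0, 285000.0])
```

Passing index=False matters: without it, pandas writes the row index as an additional unnamed first column.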

To participate in this challenge, you must either create a team or join an existing team.