National Institute of Technology Karnataka, Surathkal

Semantic Segmentation


Image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
This project aims to build a learning model that classifies each pixel of an image into one of several classes, so that an extended version of it can be useful in applications such as autonomous vehicles and medical image diagnostics.


The project trains a visual recognition model on images paired with ground-truth outputs, in which the objects present in each image have been labeled with one of several classes that share common characteristics.


The project is built using ResNet in the TensorFlow v1.1.0 framework and trained on the PASCAL VOC dataset. All the code is written in Python 2.7.


This project is built on a fully convolutional variant of ResNet with atrous (dilated) convolutions and atrous spatial pyramid pooling.
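The two architectural ideas above can be illustrated with a minimal 1-D sketch in plain Python. This is not the project's TensorFlow code. An atrous convolution spaces its kernel taps `rate` samples apart, which enlarges the receptive field without adding parameters. Atrous spatial pyramid pooling runs several such convolutions with different rates in parallel on the same input and fuses their outputs. The function names, the toy kernels and rates, and fusion by summation are illustrative assumptions. The real network applies 2-D convolutions to multi-channel feature maps.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with 'same' zero padding.

    Kernel taps are spaced `rate` samples apart, so a 3-tap kernel at rate r
    covers a window of 2*r + 1 samples with the same 3 weights.
    Computed as cross-correlation, as deep-learning frameworks do.
    Assumes an odd kernel length.
    """
    half = (len(kernel) - 1) // 2 * rate  # offset of the centre tap
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            pos = i + j * rate - half
            if 0 <= pos < len(signal):  # positions outside the signal count as zero
                acc += w * signal[pos]
        out.append(acc)
    return out


def aspp1d(signal, kernels, rates):
    """Toy atrous spatial pyramid pooling (hypothetical 1-D analogue).

    Runs one atrous branch per (kernel, rate) pair on the same input and
    fuses the branch outputs by element-wise summation.
    """
    branches = [atrous_conv1d(signal, k, r) for k, r in zip(kernels, rates)]
    return [sum(values) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    print(atrous_conv1d(impulse, [1, 2, 3], rate=2))  # [0.0, 3.0, 0.0, 2.0, 0.0, 1.0, 0.0]
    print(aspp1d(impulse, [[1, 2, 3], [1, 2, 3]], [1, 2]))
```

For an impulse input, the nonzero outputs of the 3-tap kernel land `rate` positions apart. Those gaps are the "holes" that give atrous convolution its name. Each ASPP branch therefore sees the same input at a different effective scale.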
The model is trained on mini-batches of images and their corresponding ground-truth masks, with a softmax classifier at the top. During training, the masks are downsampled to match the size of the network's output. During inference, bilinear upsampling is applied so that the output has the same size as the input. The final segmentation mask is computed by taking the argmax over the logits. The project proceeded in the following phases:
Phase 1 (end of September) - We looked for papers, datasets, and models for the problem.
Phase 2 (mid-December) - This was the learning phase, in which we learned the required libraries (such as Matplotlib and NumPy), frameworks, and architectures (ResNet, a residual CNN architecture).
Phase 3 (end of January) - We created a baseline model for semantic image segmentation.
Phase 4 (mid-February) - We fixed the implementation of the batch normalization layer so that it supports both training and inference. If the --is-training flag is provided, the running means and variances are updated. Otherwise, they are kept intact. We also fixed the evaluation procedure, and as a result the score on the validation set increased to 80.1%.
Phase 5 (end of March) - This was the most challenging phase. We optimized our baseline model by rewriting the training script to follow the original optimization setup: SGD with momentum, weight decay, a learning rate with polynomial decay, and different learning rates for different layers. We also added a training script with multi-scale inputs, in which the input is resized to 0.5 and 0.75 of the original resolution and the losses are aggregated. Finally, we wrote code using OpenCV that captures images from a webcam as well as loading static images, and displays the resulting mask as output.
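The Phase 5 optimization setup and the training-time mask downsampling can be sketched in plain Python. This sketch is not taken from the project's scripts, and it rests on several assumptions. The hyperparameter values (base learning rate 2.5e-4, decay power 0.9, momentum 0.9, weight decay 5e-4) are commonly used DeepLab-style defaults, not values confirmed by the project. The two layer groups and their 1x/10x learning-rate multipliers are also assumed. The scale set (1.0, 0.75, 0.5) assumes the full-resolution input is kept alongside the two resized copies, and the `predict` callable is a hypothetical stand-in for the network.

```python
import math

BASE_LR = 2.5e-4      # assumed base learning rate
POWER = 0.9           # assumed polynomial-decay power
MOMENTUM = 0.9        # assumed momentum coefficient
WEIGHT_DECAY = 5e-4   # assumed L2 weight-decay coefficient
# Hypothetical per-group multipliers: pre-trained backbone vs. newly added classifier.
LR_MULT = {"backbone": 1.0, "classifier": 10.0}


def poly_lr(step, max_steps, base_lr=BASE_LR, power=POWER):
    """Polynomial decay: base_lr * (1 - step / max_steps) ** power."""
    return base_lr * (1.0 - float(step) / max_steps) ** power


def lr_for(group, step, max_steps):
    """Learning rate for one parameter group: the decayed rate times the group's multiplier."""
    return poly_lr(step, max_steps) * LR_MULT[group]


def sgd_momentum_step(weights, grads, velocity, lr):
    """One SGD-with-momentum update, with L2 weight decay folded into the gradient."""
    new_w, new_v = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + WEIGHT_DECAY * w     # gradient of the penalty (WEIGHT_DECAY / 2) * w**2
        v = MOMENTUM * v + g         # accumulate velocity
        new_v.append(v)
        new_w.append(w - lr * v)
    return new_w, new_v


def downsample_mask(mask, out_h, out_w):
    """Nearest-neighbour resize of an integer label mask (labels are never blended)."""
    h, w = len(mask), len(mask[0])
    return [[mask[r * h // out_h][c * w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def pixel_cross_entropy(logits, mask):
    """Mean softmax cross-entropy over all pixels; logits[r][c] is a list of class scores."""
    total, n = 0.0, 0
    for row_scores, row_labels in zip(logits, mask):
        for scores, label in zip(row_scores, row_labels):
            m = max(scores)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(s - m) for s in scores))
            total += log_z - scores[label]
            n += 1
    return total / n


def multiscale_loss(predict, mask, scales=(1.0, 0.75, 0.5)):
    """Sum of per-scale losses.

    `predict(scale)` is a hypothetical callable that runs the network on the
    input resized by `scale` and returns logits of shape [H'][W'][num_classes].
    The ground-truth mask is downsampled to each output size before the loss.
    """
    total = 0.0
    for s in scales:
        logits = predict(s)
        small = downsample_mask(mask, len(logits), len(logits[0]))
        total += pixel_cross_entropy(logits, small)
    return total
```

A training loop would call `lr_for(group, step, max_steps)` once per step for each parameter group and pass the result to `sgd_momentum_step`. Masks are resized with nearest-neighbour sampling because interpolating class indices would produce labels that do not exist. A real PASCAL VOC pipeline must also skip the 255 "void" label, which this sketch omits.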


On the PASCAL VOC test set, the model achieves a mean intersection-over-union (mIoU) of 79.7%. Evaluating a single-scale converted pre-trained model on the PASCAL VOC validation set gives an mIoU of 86.9%.
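The evaluation behind these numbers can be sketched in four steps. First, bilinearly upsample the per-class logits to the input size. Second, take the per-pixel argmax. Third, accumulate a confusion matrix over the whole dataset. Finally, average the per-class IoU = TP / (TP + FP + FN). This plain-Python version is a sketch under several assumptions. Logits are stored channel-first as nested lists, and the resize uses a corner-aligned coordinate mapping. Label 255 is treated as "void" and ignored, as in PASCAL VOC. Classes that appear in neither the predictions nor the ground truth are left out of the mean. The function names are hypothetical.

```python
def bilinear_resize(channel, out_h, out_w):
    """Bilinear resize of one 2-D score map, using an assumed corner-aligned mapping."""
    h, w = len(channel), len(channel[0])

    def coords(i, n_out, n_in):
        # Map output index i to two neighbouring input indices and a blend weight.
        if n_out == 1 or n_in == 1:
            return 0, 0, 0.0
        x = i * (n_in - 1) / float(n_out - 1)
        x0 = int(x)
        x1 = min(x0 + 1, n_in - 1)
        return x0, x1, x - x0

    out = []
    for r in range(out_h):
        r0, r1, fr = coords(r, out_h, h)
        row = []
        for c in range(out_w):
            c0, c1, fc = coords(c, out_w, w)
            top = channel[r0][c0] * (1 - fc) + channel[r0][c1] * fc
            bottom = channel[r1][c0] * (1 - fc) + channel[r1][c1] * fc
            row.append(top * (1 - fr) + bottom * fr)
        out.append(row)
    return out


def predict_mask(logits, out_h, out_w):
    """Upsample each class's logits to the input size, then take the per-pixel argmax."""
    up = [bilinear_resize(ch, out_h, out_w) for ch in logits]
    classes = range(len(up))
    return [[max(classes, key=lambda k: up[k][r][c]) for c in range(out_w)]
            for r in range(out_h)]


def mean_iou(preds, gts, num_classes, ignore_label=255):
    """Dataset-level mean IoU from a confusion matrix; conf[g][p] counts pixels."""
    conf = [[0] * num_classes for _ in range(num_classes)]
    for pred, gt in zip(preds, gts):
        for prow, grow in zip(pred, gt):
            for p, g in zip(prow, grow):
                if g != ignore_label:  # skip "void" pixels
                    conf[g][p] += 1
    ious = []
    for k in range(num_classes):
        tp = conf[k][k]
        fp = sum(conf[j][k] for j in range(num_classes)) - tp  # predicted k, truly other
        fn = sum(conf[k]) - tp                                 # truly k, predicted other
        if tp + fp + fn > 0:
            ious.append(tp / float(tp + fp + fn))
    return sum(ious) / len(ious) if ious else 0.0
```

Summing the confusion matrix over all images before computing IoU, rather than averaging per-image scores, keeps small images from dominating the result. For PASCAL VOC, `num_classes` would be 21 (20 object classes plus background).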

ResNet-101 delivers better segmentation results along object boundaries. We think the identity mappings in ResNet-101 have an effect similar to hypercolumn features, which exploit features from intermediate layers to localize boundaries better.
However, much of the dataset focuses on humans, so the masks for people were much clearer than those for other objects.



The model can be improved later so that other objects can also be distinguished clearly.
In addition to the existing object classes, new classes can be added so that they can also be segmented and their masks generated.
Now that our baseline model is implemented, we can try to segment images in real time and create an API for it.


We learned a lot while working on this project. Some of the things we learned:
● Creating and training the model.
● Optimizing the model and testing it on various datasets.
● Using the TensorFlow framework and CNN architectures (ResNet).
● Generating the mask image for an input image.
● Writing a script that captures images from a webcam using OpenCV and generates their mask images.


Image segmentation depends on many factors, e.g., pixel color, texture, intensity, similarity between images, image content, and the problem domain. Therefore, no single method works for all types of images, and not all methods perform well on a particular type of image. Hence, it is advisable to use a hybrid solution that combines multiple methods for image segmentation problems.




● Asis Kumar Rout
● Awanit Ranjan
● Avdhesh Yadav
● Dhruvik Navadiya