Semantic Segmentation
PROBLEM STATEMENT:
Image segmentation is the process of partitioning a digital image into multiple segments (sets of
pixels). The goal of segmentation is to simplify and/or change the representation of an image into
something that is more meaningful and easier to analyze. Image segmentation is typically used to
locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation
is the process of assigning a label to every pixel in an image such that pixels with the same label
share certain characteristics.
This project aims to build a learning model that classifies each pixel of an image into one of
several classes, so that an extended version can support applications such as autonomous
vehicles and medical image diagnostics.
PROPOSED SOLUTION:
The project trains a visual recognition model on images paired with annotated outputs, in which the objects present in each image are classified into several classes that share common characteristics.
TECHNOLOGY USED
The project is built using ResNet in the TensorFlow v1.1.0 framework on the PASCAL VOC dataset. All code is written in Python 2.7.
METHODOLOGY
This project is built on a fully convolutional variant of ResNet with atrous (dilated) convolutions
and atrous spatial pyramid pooling.
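To illustrate what an atrous (dilated) convolution does, here is a minimal 1-D NumPy sketch. It is not the project's TensorFlow code; the function name atrous_conv1d and the toy inputs are assumptions made for illustration. The kernel taps are spaced `rate` samples apart, so the receptive field grows without adding weights.

```python
import numpy as np


def atrous_conv1d(signal, kernel, rate):
    """Valid 1-D cross-correlation with kernel taps spaced `rate` samples apart."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Number of input samples covered by one output value.
    span = (len(kernel) - 1) * rate + 1
    n_out = len(signal) - span + 1
    if n_out <= 0:
        raise ValueError("signal is shorter than the dilated kernel")
    out = np.empty(n_out)
    for i in range(n_out):
        # Take every `rate`-th sample inside the window.
        taps = signal[i:i + span:rate]
        out[i] = np.dot(taps, kernel)
    return out


signal = np.arange(8.0)             # toy input: 0, 1, ..., 7
kernel = np.array([1.0, 1.0, 1.0])  # toy 3-tap kernel

dense = atrous_conv1d(signal, kernel, rate=1)    # ordinary convolution, window of 3
dilated = atrous_conv1d(signal, kernel, rate=2)  # same 3 weights, window of 5
```

With rate=2 the same three weights cover five input samples, which is why dilation enlarges the field of view at no extra parameter cost.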
The model is trained on a mini-batch of images and corresponding ground truth masks with the
softmax classifier at the top. During training, the masks are downsampled to match the size of the
output from the network; during inference, to acquire the output of the same size as the input,
bilinear upsampling is applied. The final segmentation mask is computed using argmax over the
logits. The project was implemented in the following phases:
Phase 1 (end of September) - We searched for papers, datasets, and models relevant to the
problem.
Phase 2 (mid-December) - This was the learning phase, in which we studied the required
libraries (such as matplotlib and NumPy), frameworks, and architecture (the Residual Network, a
CNN architecture).
Phase 3 (end of January) - We created a baseline model for semantic image segmentation.
Phase 4 (mid-February) - We fixed the implementation of the batch normalization layer so that it
supports both the training and inference steps. If the flag --is-training is provided, the running
means and variances are updated; otherwise, they are kept intact. We also fixed the evaluation
procedure. As a result, the performance score on the validation set increased to 80.1%.
Phase 5 (end of March) - This was the most challenging phase. We optimized our baseline model
by rewriting the training script train.py to follow the original optimization setup: SGD with
momentum, weight decay, a learning rate with polynomial decay, and different learning rates for
different layers. We added a training script with multi-scale inputs, train_msc.py, in which the
input is resized to 0.5 and 0.75 of the original resolution and the losses are aggregated. We also
wrote code using OpenCV to capture images from the webcam, rather than only static images,
and display the mask of each captured image as output.
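The polynomial learning-rate decay mentioned in Phase 5 can be sketched as follows. This is a minimal illustration, not the code in train.py: the function name poly_learning_rate, the base rate, the step count, and the default power of 0.9 are all assumptions chosen for the example.

```python
def poly_learning_rate(base_lr, step, max_steps, power=0.9):
    """Polynomial decay: base_lr * (1 - step / max_steps) ** power."""
    if not 0 <= step <= max_steps:
        raise ValueError("step must lie in [0, max_steps]")
    return base_lr * (1.0 - float(step) / max_steps) ** power


# Hypothetical schedule: the values below are illustrative only.
base_lr = 0.1
max_steps = 100
schedule = [poly_learning_rate(base_lr, s, max_steps) for s in range(max_steps + 1)]
```

The rate starts at base_lr, falls smoothly, and reaches zero at the final step. A power of 1.0 would give a straight linear ramp.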
RESULTS
On the PASCAL VOC test set, the model achieves a mean intersection-over-union of 79.7%.
Evaluating a single-scale converted pre-trained model on the PASCAL VOC validation dataset
yields 86.9%.
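For readers unfamiliar with the metric, mean intersection-over-union can be computed from a confusion matrix as in the sketch below. This is an illustrative NumPy version, not the project's evaluation script. The function name mean_iou and the toy labels are assumptions, and the sketch assumes every pixel carries a valid class id (it does not handle ignored or void pixels).

```python
import numpy as np


def mean_iou(pred, truth, num_classes):
    """Mean IoU over the classes that appear in the prediction or the ground truth."""
    pred = np.asarray(pred).ravel()
    truth = np.asarray(truth).ravel()
    # Encode each (truth, pred) pair as one index and count the occurrences.
    index = truth * num_classes + pred
    confusion = np.bincount(index, minlength=num_classes ** 2)
    confusion = confusion.reshape(num_classes, num_classes)
    intersection = np.diag(confusion).astype(float)
    # union = pixels predicted as the class + pixels labeled as the class - overlap
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    present = union > 0  # skip classes that appear in neither array
    return float((intersection[present] / union[present]).mean())


# Toy 2x2 example with two classes.
truth = np.array([[0, 0],
                  [1, 1]])
pred = np.array([[0, 1],
                 [1, 1]])
score = mean_iou(pred, truth, num_classes=2)
```

In this toy case class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3, so the mean is 7/12.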
ResNet-101 delivers better segmentation results along object boundaries. We think the identity
mapping of ResNet-101 has an effect similar to hyper-column features, which exploit features
from intermediate layers to better localize boundaries.
However, the dataset focuses heavily on humans, so the masks of humans were much clearer
than those of other objects.
DEMO
FUTURE WORK
The model can be improved so that other objects are also segmented clearly.
Beyond the existing object classes, new classes can be added so that they too can be segmented
and their masks generated.
Now that the baseline model is implemented, we can try to segment images in real time and
create an API for it.
KEY LEARNINGS
We learned a lot while working on this project, including:
● Creating and training the model.
● Optimizing the model and testing it with various datasets.
● Understanding the TensorFlow framework and CNN architecture (ResNet).
● Generating the mask image of an input image.
● Writing a script to capture images from a webcam using OpenCV and generate their mask
images.
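The mask-generation step described under METHODOLOGY (argmax over the logits) can be sketched in NumPy as below. This is a simplified illustration rather than the project's inference code: the function name logits_to_mask, the toy logits, and the three-color palette are assumptions, and the bilinear upsampling step is omitted.

```python
import numpy as np


def logits_to_mask(logits, palette):
    """Turn per-pixel class scores (H, W, C) into label and RGB mask images."""
    labels = np.argmax(logits, axis=-1)  # (H, W) class id per pixel
    color_mask = palette[labels]         # (H, W, 3) color looked up per class id
    return labels, color_mask


# Toy logits for a 2x2 image with 3 classes (illustrative values only).
logits = np.array([[[2.0, 0.1, 0.3], [0.2, 1.5, 0.1]],
                   [[0.0, 0.2, 3.0], [1.0, 0.9, 0.8]]])

# Hypothetical palette: one RGB color per class id.
palette = np.array([[0, 0, 0],
                    [128, 0, 0],
                    [0, 128, 0]], dtype=np.uint8)

labels, color_mask = logits_to_mask(logits, palette)
```

Because argmax only picks the highest score per pixel, applying softmax first would not change the resulting mask.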
CONCLUSION
Image segmentation depends on many factors, such as pixel color, texture, intensity, similarity of images, image content, and problem domain. Therefore, no single method suits all types of images, nor do all methods perform well on a particular type of image. Hence, a hybrid solution that combines multiple methods is a good choice for image segmentation problems.
REFERENCES
1. https://medium.com/datadriveninvestor/deep-learning-for-image-segmentation-d10d19131113
2. https://towardsdatascience.com/summary-of-segnet-a-deep-convolutional-encoder-decoder-architecture-for-image-segmentation-75b2805d86f5
TEAM
● Asis Kumar Rout asisrout7@gmail.com
● Awanit Ranjan awanitranjan000512@gmail.com
● Avdhesh Yadav avdheshyadavdow@gmail.com
● Dhruvik Navadiya navadiyadhruvik2@gmail.com