National Institute of Technology Karnataka, Surathkal

Mood Based Movie Recommender


To design a general framework for building real-time Convolutional Neural Networks (CNNs). The real-time CNN should be trained to predict human emotions, and movies should then be recommended based on those emotions.


We used depthwise separable convolutional neural networks and built a miniature version of the Xception architecture, which was proposed by François Chollet, the creator of Keras. This miniature architecture has already been used in published research. The CNNs were trained on the FER-2013 dataset, which has the following classes: {“angry”, “disgust”, “fear”, “happy”, “sad”, “surprise”, “neutral”}. We used OpenCV to extract the Region of Interest (ROI) from the real-time feed and passed it to the trained mini-Xception model for classification. We reduced the number of classes to four basic emotions {“Happy”, “Sad”, “Neutral”, “Angry”} and subjectively assigned some movies to each category. A web application, built with the Python Flask framework, hosts the whole project. It integrates the prediction model to serve recommended movies to the user: it asks the user for a real-time snapshot, which is passed to the model in the back end, and a list of movies is then retrieved from the database based on the model’s result.
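A minimal sketch of how the seven FER-2013 class scores could be reduced to the four retained emotions. The text does not say how the reduction was done; this sketch assumes the three unused classes are simply dropped and the remaining scores renormalised. The function name `reduce_to_basic_emotions` and the example scores are hypothetical.

```python
# Class order as listed for FER-2013 in the text above.
FER_CLASSES = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
# The four emotions kept for movie recommendation.
BASIC_EMOTIONS = ["happy", "sad", "neutral", "angry"]


def reduce_to_basic_emotions(probabilities):
    """Map 7 class probabilities to the 4 retained emotions.

    Assumption (not from the project): the unused classes are dropped
    and the remaining probabilities are renormalised to sum to 1.
    """
    if len(probabilities) != len(FER_CLASSES):
        raise ValueError("expected one probability per FER-2013 class")
    by_class = dict(zip(FER_CLASSES, probabilities))
    kept = {name: by_class[name] for name in BASIC_EMOTIONS}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("retained classes have zero total probability")
    reduced = {name: value / total for name, value in kept.items()}
    predicted = max(reduced, key=reduced.get)
    return predicted, reduced


# Hypothetical model output for one face crop.
scores = [0.05, 0.05, 0.10, 0.50, 0.10, 0.10, 0.10]
emotion, reduced = reduce_to_basic_emotions(scores)
print(emotion)
```

Other reductions are possible, for example merging “fear” into “sad”; the choice affects which movie category a borderline face falls into.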


Python, TensorFlow, Keras, OpenCV, Pandas, scikit-learn, NumPy, Flask, HTML, CSS, Bootstrap, MySQL


1. In the beginning phase, our focus was on building a basic face detection program in OpenCV using Haar cascades. Training such a classifier requires a large number of positive images (images with faces) and negative images (images without faces); the Haar features it relies on are very similar to convolutional kernels. We used the default pretrained cascade, haarcascade_frontalface_default.xml.
2. Once this basic building block was done, we turned our attention to building a Convolutional Neural Network for training and to finding a dataset containing a large number of images.
3. For the dataset, we chose FER-2013, which was used in the Facial Expression Recognition Kaggle contest. It contained 35,685 grayscale images of 48 x 48 pixels.
4. After some brainstorming and learning about CNNs, we found that depthwise separable CNNs were better at classifying images and also reduced overfitting on the training data.
5. We found a relevant architecture called “Xception”, which was developed at Google and published by François Chollet, the creator of Keras.
6. The full Xception deep neural network was too expensive to train with the computational power we had access to, so we decided to implement a miniature version, mini-Xception, whose architecture was quite elementary and which had also been cited in the literature.
7. Our final architecture is a fully-convolutional neural network that contains 4 residual depth-wise separable convolutions where each convolution is followed by a batch normalization operation and a ReLU activation function.
8. This architecture has approximately 60,000 parameters, which corresponds to an 80× reduction compared to the original CNN.
9. We used batch normalization to address the problem of internal covariate shift. It allows much higher learning rates; the batch normalization paper reports reaching the same accuracy with 14 times fewer training steps.
10. We carried out real-time emotion detection using the trained model. Our application reported the predicted percentage for each emotion class.
11. Our next target was to build a web application with all of the programs running in the back end. The web app had the following functionality: taking a real-time snapshot of the person, detecting the emotion, and displaying the resulting emotion together with a list of movies, which were subjectively assigned to each emotion and stored in a MySQL database.
12. Every time an emotion was detected, a random list of movies for the corresponding emotion was displayed on the web app.
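Step 12 could be sketched as below. The project stored its movies in MySQL; to stay self-contained, this sketch uses Python’s built-in sqlite3 module with an in-memory database instead. The table name `movies`, its columns, the placeholder titles, and the function `random_movies` are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the movie database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (title TEXT NOT NULL, emotion TEXT NOT NULL)")
    # Placeholder rows: five dummy titles per emotion.
    rows = [
        (f"{emotion} movie {i}", emotion)
        for emotion in ("happy", "sad", "neutral", "angry")
        for i in range(1, 6)
    ]
    conn.executemany("INSERT INTO movies (title, emotion) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def random_movies(conn, emotion, limit=3):
    """Return up to `limit` randomly chosen titles for the given emotion."""
    cursor = conn.execute(
        "SELECT title FROM movies WHERE emotion = ? ORDER BY RANDOM() LIMIT ?",
        (emotion, limit),
    )
    return [title for (title,) in cursor.fetchall()]


conn = build_demo_db()
picks = random_movies(conn, "sad")
print(picks)
```

The query is parameterised so the detected emotion is never concatenated into the SQL string. In MySQL the equivalent ordering function is RAND() rather than RANDOM().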


● Our miniature implementation of the Xception architecture performed well on the FER-2013 dataset, reaching about 74% accuracy after 56 epochs.
● The architecture reduces the number of parameters by 80× while still obtaining favourable results, thanks to depthwise separable convolutions.
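The parameter saving from depthwise separable convolutions can be checked with a short calculation. A standard convolution with a k × k kernel, C_in input channels, and C_out output channels has k·k·C_in·C_out weights; a depthwise separable one has k·k·C_in weights (depthwise) plus C_in·C_out weights (pointwise). The layer sizes below are illustrative, not those of the project’s model, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (biases ignored):
    one k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Illustrative layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = separable_conv_params(3, 64, 128)
print(standard, separable, round(standard / separable, 2))
```

For this single layer the ratio is about 8.4. The 80× figure quoted above compares whole networks, so it is not expected to match a single-layer ratio.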



● A survey could be conducted among thousands of people, recording the movie each person would currently prefer to watch, the reason why, and a snapshot of the person. Using this dataset along with word embeddings, sentiment analysis could be performed on the responses, and the current CNN architecture could then be trained on emotions labelled from the results of that analysis. This would make for a more accurate movie recommender.
● We observed certain misclassifications, such as people wearing glasses being classified as “angry”. This happens because the label “angry” is highly activated when the model believes a person is frowning, and frowning features get confused with darker glasses frames. This introduces bias into the model. We believe that uncovering such behaviours is extremely important when creating robust classifiers, and that visualization techniques such as guided back-propagation will be invaluable for uncovering model biases.


● Using OpenCV for Face Detection
● Workings of DepthWise Separable Convolutions
● Understanding the Intricacies of Deep Learning
● Structuring of ML Projects
● Creating Basic Web Applications on Flask


GitHub Repository
Face Detection Using OpenCV
A Comprehensive Guide to Convolutional Neural Networks
Xception: Deep Learning with Depth Wise Separable Convolutions
Batch Normalization : Accelerating Deep Neural Network Training
Basic Flask Introduction


Mentors : -
● Souhard Kataria
● Aashay Maheshwarkar

Members : -
● Dwaipayan Munshi
● Omanshu Mahawar
● Ritik Pansuriya