Mood Based Movie Recommender
PROBLEM STATEMENT:
To design a general framework for building real-time Convolutional Neural Networks (CNNs). The real-time CNN should be trained to predict human emotions, and the system should recommend movies based on those emotions.
PROPOSED SOLUTION:
We used depthwise separable convolutional neural networks and built a miniature version of the Xception architecture proposed by Francois Chollet, the creator of Keras. This mini-Xception architecture has already been used in published research. The network was trained on the FER-2013 dataset, which has the classes {“angry”, “disgust”, “fear”, “happy”, “sad”, “surprise”, “neutral”}. We used OpenCV to extract the face Region of Interest (ROI) from the real-time feed and passed it to the trained mini-Xception model for classification. We reduced the number of classes to four basic emotions {“Happy”, “Sad”, “Neutral”, “Angry”} and subjectively assigned a set of movies to each one.
A web application, built with the Python Flask framework, hosts the whole project. It integrates the prediction model to serve recommended movies to the user: it asks the user for a real-time snapshot, passes the snapshot to the model in the back end, and retrieves a list of movies from the database based on the result produced by the model.
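The reduction from seven FER-2013 classes to four emotions can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the three unused classes are simply dropped and the remaining probabilities are renormalized, and the names `FER_CLASSES`, `KEPT_CLASSES`, `reduce_to_four`, and `top_emotion` are hypothetical.

```python
# Hypothetical sketch: collapse a 7-class FER-2013 prediction to the 4 emotions
# used for recommendations. Dropping and renormalizing is an assumption; the
# class order below follows the list given in the text.
FER_CLASSES = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
KEPT_CLASSES = ["happy", "sad", "neutral", "angry"]


def reduce_to_four(probabilities):
    """Keep only the four target emotions and rescale them to sum to 1."""
    if len(probabilities) != len(FER_CLASSES):
        raise ValueError("expected one probability per FER-2013 class")
    by_name = dict(zip(FER_CLASSES, probabilities))
    kept = {name: by_name[name] for name in KEPT_CLASSES}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("all kept classes have zero probability")
    return {name: value / total for name, value in kept.items()}


def top_emotion(probabilities):
    """Return the most likely of the four emotions."""
    reduced = reduce_to_four(probabilities)
    return max(reduced, key=reduced.get)


if __name__ == "__main__":
    # Made-up model output, in FER_CLASSES order.
    example = [0.10, 0.05, 0.05, 0.50, 0.10, 0.10, 0.10]
    for name, value in reduce_to_four(example).items():
        print(f"{name}: {value:.1%}")
    print("predicted:", top_emotion(example))
```

An alternative design would merge related classes (for example, folding one of the dropped classes into a kept one) instead of discarding them; the text does not say which approach was taken.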
TECHNOLOGY USED
Python, TensorFlow, Keras, OpenCV, Pandas, scikit-learn, NumPy, Flask, HTML, CSS, Bootstrap, MySQL
METHODOLOGY
1. In the first phase, we focused on building a basic face detection program in OpenCV
using Haar cascades. Training a Haar cascade classifier requires many positive images
(images with faces) and negative images (images without faces); it relies on Haar
features, which are similar to convolutional kernels. We used the pretrained default
cascade, haarcascade_frontalface_default.xml.
2. Once this basic building block was done, we turned our attention to building a
Convolutional Neural Network for training and to finding a dataset that contains a
large number of images.
3. For the dataset, we chose FER-2013, which was used in the Facial Expression
Recognition Kaggle contest. It contains 35,887 grayscale images of 48 x 48 pixels.
4. After some brainstorming and reading about CNNs, we found that depthwise separable
CNNs classify images well with far fewer parameters, which also helps reduce
overfitting to the training data.
5. We found a relevant architecture called “Xception”, published by Francois Chollet of
Google, the creator of Keras.
6. The full Xception model was too expensive to train with the computational power we
had access to, so we decided to implement a miniature version, mini-Xception, whose
architecture is much simpler and has also been cited in published research.
7. Our final architecture is a fully-convolutional neural network that contains 4 residual
depth-wise separable convolutions where each convolution is followed by a batch
normalization operation and a ReLU activation function.
8. This architecture has approximately 60,000 parameters, which corresponds to an 80×
reduction compared with the original CNN.
9. We used batch normalization to address the problem of internal covariate shift. It
allows much higher learning rates; the batch normalization paper reports reaching the
same accuracy with 14 times fewer training steps.
10. We carried out real-time emotion detection with the trained model. Our application
reported the predicted percentage for each emotion class.
11. Our next target was to build a web application with all of these programs running in
the back end. The web app provides the following functionality: taking a real-time
snapshot of the person, detecting the emotion, and displaying the detected emotion
together with a list of movies, which were subjectively assigned to each emotion and
stored in a MySQL database.
12. Every time an emotion is detected, a random list of movies assigned to that emotion
is displayed on the web app.
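The random selection in the final step can be sketched as follows. This is a minimal sketch under stated assumptions: an in-memory dictionary stands in for the MySQL table, and the placeholder titles, `MOVIES_BY_EMOTION`, and `recommend_movies` are hypothetical names rather than the project's code.

```python
import random

# Hypothetical stand-in for the MySQL table: emotion -> movie titles.
# The titles are placeholders, not the project's actual assignments.
MOVIES_BY_EMOTION = {
    "happy": ["Happy Movie A", "Happy Movie B", "Happy Movie C", "Happy Movie D"],
    "sad": ["Sad Movie A", "Sad Movie B", "Sad Movie C"],
    "neutral": ["Neutral Movie A", "Neutral Movie B", "Neutral Movie C"],
    "angry": ["Angry Movie A", "Angry Movie B", "Angry Movie C"],
}


def recommend_movies(emotion, count=3, rng=None):
    """Return up to `count` distinct movies chosen at random for `emotion`."""
    rng = rng or random  # allow a seeded random.Random for reproducible picks
    key = emotion.lower()
    if key not in MOVIES_BY_EMOTION:
        raise KeyError(f"no movies assigned to emotion {emotion!r}")
    titles = MOVIES_BY_EMOTION[key]
    # sample() never repeats a title; cap count at the number available.
    return rng.sample(titles, min(count, len(titles)))


if __name__ == "__main__":
    print(recommend_movies("Happy", count=2))
```

With a real database, a similar effect might be obtained with a query such as `SELECT title FROM movies WHERE emotion = %s ORDER BY RAND() LIMIT 3`, where the table and column names are again assumptions.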
RESULTS
● Our miniature implementation of the Xception architecture performed well on the
FER-2013 dataset, reaching about 74% accuracy after 56 epochs.
● By using depthwise separable convolutions, the architecture reduces the parameter
count by 80× while still obtaining favourable results.
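Where the saving comes from can be illustrated with a per-layer weight count. The sketch below is an illustration under stated assumptions, not a measurement of the project's model: the function names and the example layer size (3 x 3 kernel, 64 input channels, 128 output channels) are hypothetical, and bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one kernel x kernel x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (depthwise step + 1x1 pointwise step)."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer size, chosen only for illustration.
    kernel, c_in, c_out = 3, 64, 128
    standard = standard_conv_params(kernel, c_in, c_out)
    separable = separable_conv_params(kernel, c_in, c_out)
    print("standard conv weights: ", standard)
    print("separable conv weights:", separable)
    print(f"reduction factor: {standard / separable:.2f}x")
```

For this example layer the separable form needs roughly 8 times fewer weights. The 80× figure above compares whole architectures, so it is not expected to match a single-layer ratio.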
DEMO
FUTURE WORK
● A survey could be conducted among thousands of people, recording the movie each person
would currently prefer to watch and the reason why, along with a snapshot of the
person. Using this dataset together with word embeddings, a sentiment analysis could
be carried out, and the current CNN architecture could then be trained on the emotions
given by the results of that analysis. This would give a more accurate movie
recommender.
● We observed certain misclassifications, such as people wearing glasses being
classified as “angry”. This happens because the label “angry” is highly activated when
the model believes a person is frowning, and frowning features get confused with
darker glasses frames. This leads to bias in the machine learning process. We believe
that uncovering such behaviours is extremely important when creating robust
classifiers, and that visualization techniques such as guided back-propagation will
be invaluable for uncovering model biases.
KEY LEARNINGS
● Using OpenCV for Face Detection
● Workings of DepthWise Separable Convolutions
● Understanding the Intricacies of Deep Learning
● Structuring of ML Projects
● Creating Basic Web Applications on Flask
REFERENCES
GitHub Repository
Face Detection Using OpenCV
A Comprehensive Guide to Convolutional Neural Networks
Xception: Deep Learning with Depthwise Separable Convolutions
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Basic Flask Introduction
TEAM
Mentors : -
● Souhard Kataria
● Aashay Maheshwarkar
Members : -
● Dwaipayan Munshi (dwaipayanmunshi2001@gmail.com)
● Omanshu Mahawar (omanshumahawar.181co237@nitk.edu.in)
● Ritik Pansuriya (ritik.rjt@gmail.com)