OCR - A Solution for the Visually Impaired
PROBLEM STATEMENT:
Visually impaired people find navigation and reading quite challenging in their day-to-day lives. This project attempts to ease one of those challenges by designing a prototype of a wearable device: an Optical Character Recognition (OCR) tool that reads text from images and instantly converts it to speech, so that users are not limited to material available in Braille. The Efficient and Accurate Scene Text detector (EAST), a text-detection architecture widely used in OCR applications, is employed in this project.
PROPOSED SOLUTION:
Overview
A camera takes ‘snapshots’ of the object placed in front of it. This produces a single image, which is then sent for processing to detect any text written on it. Once the image is pre-processed, it is fed as input to the pre-trained detection model, and bounding boxes are drawn around the detected text. Each detected region is then sent for further processing to recognize the individual characters, with the output of the detection model serving as the input to the recognition model. Because both stages reuse pre-trained models rather than training from scratch, the system benefits from transfer learning. In this way, every character in the detected textual data is extracted and returned; the final output is the text detected in the image, as a string.
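As a concrete illustration, the detection stage can be sketched with OpenCV's DNN module. This is a minimal sketch, not the project's exact code: the input size, score and NMS thresholds, and file names below are illustrative assumptions, and the pre-trained EAST graph is commonly distributed as a frozen TensorFlow file.

    import cv2
    import numpy as np

    # Load the frozen EAST graph (file name is an assumption; the pre-trained
    # model is commonly distributed as "frozen_east_text_detection.pb").
    net = cv2.dnn.readNet("frozen_east_text_detection.pb")

    image = cv2.imread("scene.jpg")
    orig_h, orig_w = image.shape[:2]
    new_w, new_h = 320, 320                        # EAST needs multiples of 32
    ratio_w, ratio_h = orig_w / new_w, orig_h / new_h

    # Mean values are the ones published for the EAST training data.
    blob = cv2.dnn.blobFromImage(image, 1.0, (new_w, new_h),
                                 (123.68, 116.78, 103.94), swapRB=True, crop=False)
    net.setInput(blob)
    scores, geometry = net.forward(["feature_fusion/Conv_7/Sigmoid",
                                    "feature_fusion/concat_3"])

    # Decode the score/geometry maps into axis-aligned boxes
    # (box rotation is ignored in this sketch for brevity).
    boxes, confidences = [], []
    num_rows, num_cols = scores.shape[2:4]
    for y in range(num_rows):
        for x in range(num_cols):
            score = float(scores[0, 0, y, x])
            if score < 0.5:
                continue
            off_x, off_y = x * 4.0, y * 4.0        # each cell covers 4x4 input pixels
            top, right, bottom, left = (geometry[0, k, y, x] for k in range(4))
            start_x, start_y = int(off_x - left), int(off_y - top)
            end_x, end_y = int(off_x + right), int(off_y + bottom)
            boxes.append([start_x, start_y, end_x - start_x, end_y - start_y])
            confidences.append(score)

    # Non-maximum suppression, then scale the surviving boxes back to the
    # original image and draw them.
    indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    detections = []
    for i in np.array(indices).flatten():
        x, y, w, h = boxes[i]
        box = (int(x * ratio_w), int(y * ratio_h), int(w * ratio_w), int(h * ratio_h))
        detections.append(box)
        cv2.rectangle(image, box[:2], (box[0] + box[2], box[1] + box[3]), (0, 255, 0), 2)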
Details
The EAST model architecture performs the text detection on scene images; the details of its implementation can be found in [1]. The output of this stage is then sent for text recognition, for which PyTesseract [2] is used.
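A minimal sketch of this recognition step, assuming the Tesseract-OCR engine is installed on the system and that "image" and "detections" come from the detection sketch above:

    import pytesseract

    # Recognize each detected region; "--psm 7" tells Tesseract to treat the
    # crop as a single line of text.
    texts = []
    for (x, y, w, h) in detections:
        x, y = max(x, 0), max(y, 0)     # clamp boxes at the image border
        roi = image[y:y + h, x:x + w]
        if roi.size == 0:
            continue
        texts.append(pytesseract.image_to_string(roi, config="--psm 7").strip())

    detected_text = " ".join(t for t in texts if t)
    print(detected_text)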
METHODOLOGY
The whole project is divided into four phases:
Phase 1: Text Detection
Phase 2: Text Recognition
Phase 3: Text to Speech Conversion
Phase 4: Prototyping a wearable device by implementing on the Raspberry Pi
Phase 1: Real-time scene text detection was performed using the EAST architecture described above.
Phase 2: Text recognition was subsequently performed using the “pytesseract” Python tool, a wrapper
for Google’s Tesseract-OCR Engine.
Phase 3: “pyttsx3”, a text-to-speech conversion library in Python, was used for this phase. Unlike
many alternative libraries that perform the same function, it works offline.
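A minimal sketch of this phase, assuming "detected_text" holds the string produced by the recognition step:

    import pyttsx3

    engine = pyttsx3.init()           # initializes an offline TTS engine
    engine.setProperty("rate", 150)   # speaking rate in words per minute
    engine.say(detected_text)         # queue the recognized text
    engine.runAndWait()               # block until playback finishes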
Phase 4: The pre-trained models were then loaded onto the Raspberry Pi 4, and real scene images were
captured using the Raspberry Pi Camera Module V2. The converted speech was then played through
earphones plugged into the Raspberry Pi.
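Image capture on the Pi can be sketched with the “picamera” library (included with Raspberry Pi OS); the resolution and warm-up delay below are illustrative assumptions:

    from time import sleep
    from picamera import PiCamera

    camera = PiCamera()
    camera.resolution = (1280, 720)
    camera.start_preview()
    sleep(2)                       # give the sensor time to set exposure
    camera.capture("scene.jpg")    # this file feeds the detection stage above
    camera.stop_preview()
    camera.close()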
RESULTS
Some of the results obtained are attached:
[Result figures omitted here; each shows a captured scene with its detected text, captioned “After Audio Conversion”.]
FUTURE WORK
An end-to-end model that detects text and reads it aloud in a single stage, to improve speed and accuracy over the phased implementation described above. The system could also be adapted into a PDF reader.
KEY LEARNINGS
● Neural Networks
● Raspberry Pi Platform
● Different deep learning architectures
REFERENCES
External references for a better understanding of the project, as cited in the sections above:
● [1] Zhou et al., “EAST: An Efficient and Accurate Scene Text Detector”: https://arxiv.org/pdf/1704.03155.pdf
● [2] pytesseract, a Python wrapper for Google’s Tesseract-OCR Engine: https://pypi.org/project/pytesseract/
TEAM
● Atreya Majumdar (atreyamaj@gmail.com)
● Divyansh Bansal (divyanshbansal0612@gmail.com)
● Rahasya Barkur (rahasyabarkur1999@gmail.com)
● Devishi Suresh (devishisureshkambiranda.171ec115@nitk.edu.in)