OCR - A Solution for the Visually Impaired
PROBLEM STATEMENT:
Visually impaired people find navigation and reading quite challenging in their day-to-day lives. This project attempts to ease one of those challenges by designing a prototype of a wearable device: an Optical Character Recognition (OCR) tool that reads text from images and instantly converts it to speech, so that users are not limited to material available in Braille. The Efficient and Accurate Scene Text detector (EAST), a text-detection architecture widely used in OCR applications, is employed in this project.
PROPOSED SOLUTION:
Overview
A camera takes ‘snapshots’ of the object placed in front of it. This produces a single image, which is then sent for processing to detect any text written on it. Once the image is pre-processed, it is fed as input to the pre-trained detection model, and bounding boxes are drawn around the detected text. Each detected region is then sent for further processing to recognize the individual characters, with the output of the detection model serving as the input to the recognition model. Because both stages reuse pre-trained models rather than training from scratch, the system benefits from transfer learning. In this way, every character in the detected textual data is extracted and returned; the final output is the text detected in the image, as a string.
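As a concrete illustration, the detection stage can be sketched with OpenCV's DNN module. This is a minimal sketch, not the project's exact code: the input size, score and NMS thresholds, and file names below are illustrative assumptions, and the pre-trained EAST graph is commonly distributed as a frozen TensorFlow file.

    import cv2
    import numpy as np

    # Load the frozen EAST graph (file name is an assumption; the pre-trained
    # model is commonly distributed as "frozen_east_text_detection.pb").
    net = cv2.dnn.readNet("frozen_east_text_detection.pb")

    image = cv2.imread("scene.jpg")
    orig_h, orig_w = image.shape[:2]
    new_w, new_h = 320, 320                        # EAST needs multiples of 32
    ratio_w, ratio_h = orig_w / new_w, orig_h / new_h

    # Mean values are the ones published for the EAST training data.
    blob = cv2.dnn.blobFromImage(image, 1.0, (new_w, new_h),
                                 (123.68, 116.78, 103.94), swapRB=True, crop=False)
    net.setInput(blob)
    scores, geometry = net.forward(["feature_fusion/Conv_7/Sigmoid",
                                    "feature_fusion/concat_3"])

    # Decode the score/geometry maps into axis-aligned boxes
    # (box rotation is ignored in this sketch for brevity).
    boxes, confidences = [], []
    num_rows, num_cols = scores.shape[2:4]
    for y in range(num_rows):
        for x in range(num_cols):
            score = float(scores[0, 0, y, x])
            if score < 0.5:
                continue
            off_x, off_y = x * 4.0, y * 4.0        # each cell covers 4x4 input pixels
            top, right, bottom, left = (geometry[0, k, y, x] for k in range(4))
            start_x, start_y = int(off_x - left), int(off_y - top)
            end_x, end_y = int(off_x + right), int(off_y + bottom)
            boxes.append([start_x, start_y, end_x - start_x, end_y - start_y])
            confidences.append(score)

    # Non-maximum suppression, then scale the surviving boxes back to the
    # original image and draw them.
    indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    detections = []
    for i in np.array(indices).flatten():
        x, y, w, h = boxes[i]
        box = (int(x * ratio_w), int(y * ratio_h), int(w * ratio_w), int(h * ratio_h))
        detections.append(box)
        cv2.rectangle(image, box[:2], (box[0] + box[2], box[1] + box[3]), (0, 255, 0), 2)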
Details
The EAST model architecture performs the text detection on scene images; the details of its implementation can be found in [1]. The output of this stage is then sent for text recognition, for which PyTesseract [2] is used.
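A minimal sketch of this recognition step, assuming the Tesseract-OCR engine is installed on the system and that "image" and "detections" come from the detection sketch above:

    import pytesseract

    # Recognize each detected region; "--psm 7" tells Tesseract to treat the
    # crop as a single line of text.
    texts = []
    for (x, y, w, h) in detections:
        x, y = max(x, 0), max(y, 0)     # clamp boxes at the image border
        roi = image[y:y + h, x:x + w]
        if roi.size == 0:
            continue
        texts.append(pytesseract.image_to_string(roi, config="--psm 7").strip())

    detected_text = " ".join(t for t in texts if t)
    print(detected_text)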
METHODOLOGY
The whole project is divided into four phases:
Phase 1: Text Detection
Phase 2: Text Recognition
Phase 3: Text to Speech Conversion
Phase 4: Prototyping a wearable device by implementing on the Raspberry Pi
Phase 1: Real-time scene text detection was performed using the EAST architecture described above.
Phase 2: Text recognition was subsequently performed using the “pytesseract” Python tool, a wrapper
for Google’s Tesseract-OCR Engine.
Phase 3: “pyttsx3”, a text-to-speech conversion library in Python, was used for this phase. Unlike
many alternative libraries that perform the same function, it works offline.
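A minimal sketch of this phase, assuming "detected_text" holds the string produced by the recognition step:

    import pyttsx3

    engine = pyttsx3.init()           # initializes an offline TTS engine
    engine.setProperty("rate", 150)   # speaking rate in words per minute
    engine.say(detected_text)         # queue the recognized text
    engine.runAndWait()               # block until playback finishes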
Phase 4: The pre-trained models were then loaded onto the Raspberry Pi 4, and real scene images were
captured using the Raspberry Pi Camera Module V2. The converted speech was then played through
earphones plugged into the Raspberry Pi.
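Image capture on the Pi can be sketched with the “picamera” library (included with Raspberry Pi OS); the resolution and warm-up delay below are illustrative assumptions:

    from time import sleep
    from picamera import PiCamera

    camera = PiCamera()
    camera.resolution = (1280, 720)
    camera.start_preview()
    sleep(2)                       # give the sensor time to set exposure
    camera.capture("scene.jpg")    # this file feeds the detection stage above
    camera.stop_preview()
    camera.close()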
RESULTS
Some of the results obtained are attached:
[Result figures omitted here; each shows a captured scene with its detected text, captioned “After Audio Conversion”.]
FUTURE WORK
An end-to-end model that detects text and reads it aloud in a single stage, to improve speed and accuracy over the phased implementation described above. The system could also be adapted into a PDF reader.
KEY LEARNINGS
● Neural Networks
● Raspberry Pi Platform
● Different deep learning architectures
REFERENCES
External references for a better understanding of the project, as cited in the sections above:
● [1] Zhou et al., “EAST: An Efficient and Accurate Scene Text Detector”: https://arxiv.org/pdf/1704.03155.pdf
● [2] pytesseract, a Python wrapper for Google’s Tesseract-OCR Engine: https://pypi.org/project/pytesseract/
TEAM
● Atreya Majumdar (atreyamaj@gmail.com)
● Divyansh Bansal (divyanshbansal0612@gmail.com)
● Rahasya Barkur (rahasyabarkur1999@gmail.com)
● Devishi Suresh (devishisureshkambiranda.171ec115@nitk.edu.in)