Build Your Own Optical Character Recognition

This challenge is to build your own Optical Character Recognition (OCR) tool. OCR tools date back to work that began in 1914 aimed at creating reading devices for the blind. These days they’re used to extract text from images and videos either for information archival purposes or in apps like Google Translate that can both detect text in an image or video and translate it to another language!

The Challenge - Building An OCR Tool

The goal of this coding challenge is to build a tool that you can present with an image and it will detect and extract any text in the image.

Step Zero

As always we start at the beginning! Create a new project in your tech stack and programming language of choice and then proceed to step 1.

You could build this OCR tool as a command line tool, GUI tool or an API. The choice is yours!

Step 1

In this step your goal is to load an image and detect whether it contains any text. I suggest you leverage your programming language’s support for PNGs, but feel free to support any other image formats you will find useful.

Once you have loaded the image it’s time to determine if it contains any text and where that text is. You can make this coding challenge as easy or as complex as you like. If you’re up for a true challenge read up about text detection algorithms on Google Scholar and implement one from scratch. You’ll learn a lot!

Or, if you’d prefer to learn how to put together a solution using off-the-shelf tools, check out OpenCV, which can be used to identify text in the image. You might want to convert the image to a binary image (only black or white) before doing so.
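If you'd like to see what the conversion to a binary image involves before reaching for a library, here is a minimal from-scratch sketch using Otsu's method to pick the threshold. It assumes the image has already been decoded into rows of 8-bit greyscale values (0 to 255) and that the text is brighter than the background; the function names are illustrative and not part of any library.

```python
def otsu_threshold(gray):
    """Return the grey level (0-255) that maximises between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg = 0  # number of pixels at or below the current level
    sum_bg = 0     # sum of their grey values
    for level in range(256):
        weight_bg += hist[level]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(gray, threshold=None):
    """Return rows of 0/1, where 1 marks pixels brighter than the threshold."""
    if threshold is None:
        threshold = otsu_threshold(gray)
    return [[1 if value > threshold else 0 for value in row] for row in gray]


# Assumption: bright text on a dark background. Swap the 1 and 0 for dark text.
gray = [[10, 10, 200], [10, 220, 10]]
print(binarize(gray))  # [[0, 0, 1], [0, 1, 0]]
```

A single global threshold like this can struggle when an image mixes bright and dim text, so you may need to experiment with local (adaptive) thresholds.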

To test your code create a simple image, for example a plain rectangle with text in various locations, then check that your solution correctly finds all the text.
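One way to set up such a check without any image libraries is sketched below. It assumes a binary image stored as rows of 0/1 and uses filled boxes as stand-ins for lines of text; the detector is a simple row scan that groups consecutive rows containing foreground pixels into one region. Both helper names are hypothetical, and a real detector would need to be considerably smarter.

```python
def make_test_image(width, height, boxes):
    """Build a binary image with filled boxes standing in for lines of text."""
    image = [[0] * width for _ in range(height)]
    for left, top, right, bottom in boxes:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                image[y][x] = 1
    return image


def find_text_regions(binary):
    """Return (left, top, right, bottom) for each run of rows containing ink."""
    regions = []
    band = None  # [left, top, right, bottom] of the region being built
    for y, row in enumerate(binary):
        columns = [x for x, pixel in enumerate(row) if pixel]
        if columns:
            if band is None:
                band = [columns[0], y, columns[-1], y]
            else:
                band[0] = min(band[0], columns[0])
                band[2] = max(band[2], columns[-1])
                band[3] = y
        elif band is not None:
            regions.append(tuple(band))
            band = None
    if band is not None:
        regions.append(tuple(band))
    return regions


expected = [(2, 1, 8, 3), (5, 7, 15, 9)]
image = make_test_image(20, 12, expected)
print(find_text_regions(image) == expected)  # True
```

Note that this row scan merges anything sharing the same rows into one region, so it only works for text that is already horizontal.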

Step 2

Once you have identified text in the image you might need to perform some transformations. In this step your goal is to de-skew the text if need be. The aim is to remove distortions so all the text is aligned in the same plane.

Again you can build this from scratch or leverage the power of existing libraries like OpenCV.
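If you go the from-scratch route, one classic way to estimate the skew is with projection profiles: try a range of candidate angles, project the foreground pixels onto rows for each, and keep the angle that produces the sharpest row histogram. The sketch below assumes a binary image as rows of 0/1 and a skew of at most a few degrees; the function names, the angle range and the step size are illustrative choices. It only estimates the angle, so rotating the image back by the negative of that angle is left to you.

```python
import math


def projection_score(points, angle_degrees):
    """Sum of squared row counts after undoing a skew of angle_degrees."""
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    counts = {}
    for x, y in points:
        row = round(y * cos_t - x * sin_t)
        counts[row] = counts.get(row, 0) + 1
    # Well-aligned text piles pixels into few rows, which maximises this sum.
    return sum(count * count for count in counts.values())


def estimate_skew(binary, max_angle=15.0, step=0.5):
    """Return the candidate angle (degrees) that best aligns the text rows.

    A positive result means y grows as x grows (image coordinates, y down).
    """
    points = [(x, y) for y, row in enumerate(binary)
              for x, pixel in enumerate(row) if pixel]
    if not points:
        return 0.0
    steps = int(max_angle / step)
    candidates = [i * step for i in range(-steps, steps + 1)]
    return max(candidates, key=lambda angle: projection_score(points, angle))


# Three synthetic "lines of text" drawn with a 5 degree skew.
image = [[0] * 100 for _ in range(70)]
for y0 in (10, 30, 50):
    for x in range(100):
        image[round(y0 + x * math.tan(math.radians(5)))][x] = 1
print(estimate_skew(image, max_angle=10, step=1.0))
```

This brute-force search is slow on large images, so you may want to downsample first or search coarsely and then refine around the best angle.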

To test step 1 and step 2 I suggest you now use this image:

Coding Challenge OCR Test Image.png

There are six lines of text in it. Some are easy to see as they’re white, while others are harder to see as they’re dark grey. Some of the lines are skewed and will need straightening out.

Step 3

In this step your goal is to identify character bounds. Again if you want to do the coding challenge on hard mode, Google Scholar will suggest some papers you can dig into. If you’re building on the shoulders of giants you can once again leverage OpenCV.

I’d suggest you render the test image with boxes around all the detected characters in order to verify your solution works.
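As a starting point for the from-scratch route, connected-component labelling is one simple way to get character bounds. The sketch below assumes a de-skewed binary image as rows of 0/1 and treats each 8-connected blob of foreground pixels as one character; the function names and the text rendering are illustrative only.

```python
from collections import deque


def character_boxes(binary):
    """Return sorted (left, top, right, bottom) boxes, one per 8-connected blob."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill over the blob
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)


def render_with_boxes(binary, boxes):
    """Draw ink as '#', background as '.', and empty box-edge cells as '+'."""
    canvas = [["#" if pixel else "." for pixel in row] for row in binary]
    for left, top, right, bottom in boxes:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                on_edge = x in (left, right) or y in (top, bottom)
                if on_edge and canvas[y][x] == ".":
                    canvas[y][x] = "+"
    return ["".join(row) for row in canvas]


image = [
    [0, 1, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
]
boxes = character_boxes(image)
print(boxes)  # [(1, 0, 1, 2), (5, 0, 6, 2)]
print("\n".join(render_with_boxes(image, boxes)))
```

Bear in mind that this approach splits characters made of separate parts, such as the dot on an "i", and merges characters that touch, so expect to add some post-processing.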

Step 4

In this step your goal is to identify the characters you have detected. There are many approaches to doing so, from using matrix matching through to deep-learning with neural networks. This is a fun coding challenge all by itself! In fact if you want to do this step alone you can tackle the Kaggle Digit Recognizer.
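To give a feel for the matrix matching end of that spectrum, here is a minimal sketch that compares a glyph bitmap against a set of templates and picks the one with the most agreeing pixels. It assumes each detected character has already been cropped and scaled to the template size; the three tiny 3x5 templates and all the function names are made up for illustration.

```python
def parse_bitmap(rows):
    """Turn rows like '#.#' into lists of 1 (ink) and 0 (background)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def match_score(glyph, template):
    """Fraction of pixels on which two equally sized bitmaps agree."""
    total = agree = 0
    for glyph_row, template_row in zip(glyph, template):
        for a, b in zip(glyph_row, template_row):
            total += 1
            agree += (a == b)
    return agree / total if total else 0.0


def classify(glyph, templates):
    """Return the label of the template that best matches the glyph."""
    return max(templates, key=lambda label: match_score(glyph, templates[label]))


# Hypothetical templates; a real set would cover every character you expect.
TEMPLATES = {
    "I": parse_bitmap(["###", ".#.", ".#.", ".#.", "###"]),
    "L": parse_bitmap(["#..", "#..", "#..", "#..", "###"]),
    "T": parse_bitmap(["###", ".#.", ".#.", ".#.", ".#."]),
}

# An "I" with one pixel missing still matches "I" best (14 of 15 pixels agree).
noisy_i = parse_bitmap(["###", ".#.", ".#.", "...", "###"])
print(classify(noisy_i, TEMPLATES))  # I
```

Matrix matching like this is sensitive to font, size and alignment, which is one reason learned classifiers tend to be preferred for anything beyond a single known font.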

Going Further

If you want to take this coding challenge further a fun next step is to extract text from video!

Help Others by Sharing Your Solutions!

If you think your solution is an example other developers can learn from please share it, put it on GitHub, GitLab or elsewhere. Then let me know - ping me a message on the Discord Server, via Twitter or LinkedIn or just post about it there and tag me. Alternatively please add a link to it in the Coding Challenges Shared Solutions GitHub repo.

Get The Challenges By Email

If you would like to receive the coding challenges by email, you can subscribe to the weekly newsletter on Substack here: