- Deep Learning based Text Detection Using OpenCV (C++/Python)
- Deep Learning based Text Recognition (OCR) using Tesseract and OpenCV
- OpenCV Text Detection (EAST text detector)
- Object detection with deep learning and OpenCV
Deep Learning based Text Detection Using OpenCV (C++/Python)

It is capable of (1) running at near real-time at 13 FPS on 720p images and (2) obtaining state-of-the-art text detection accuracy. To discover how to apply text detection with OpenCV, just keep reading!

Detecting text in constrained, controlled environments can typically be accomplished with heuristic-based approaches, such as exploiting gradient information or the fact that text is typically grouped into paragraphs and characters appear on a straight line. Natural scene text detection is different, though, and much more challenging. Due to the proliferation of cheap digital cameras, not to mention the fact that nearly every smartphone now has a camera, we need to be highly concerned with the conditions the image was captured under, and furthermore, what assumptions we can and cannot make.

With the release of OpenCV 3.4.2 and OpenCV 4, we can apply deep learning-based text detection with the EAST model. The EAST pipeline is capable of predicting words and lines of text at arbitrary orientations on 720p images and, furthermore, can run at 13 FPS, according to the authors. Perhaps most importantly, since the deep learning model is end-to-end, it can sidestep computationally expensive sub-algorithms that other text detectors typically apply, including candidate aggregation and word partitioning. To build and train such a deep learning model, the EAST method utilizes novel, carefully designed loss functions. For more details on EAST, including architecture design and training methods, be sure to refer to the authors' publication.

You may wish to add your own images collected with your smartphone, or ones you find online. If you have any improvements to the method, please do feel free to share them in the comments below. Before we get started, I want to point out that you will need at least OpenCV 3.4.2 to follow along.
To begin, we import our required packages and modules. In order to perform text detection using OpenCV and the EAST deep learning model, we need to extract the output feature maps of two layers: one giving the probability of a region containing text, and one giving the geometry used to derive the bounding box coordinates of that text. We load the neural network into memory using cv2.dnn.readNet.

We need to filter out weak text detections by ignoring areas that do not have a sufficiently high probability. The EAST text detector naturally reduces volume size as the image passes through the network: the output maps are 4x smaller than our input image, so we multiply the coordinates by four to bring them back into the coordinate space of our original image.

The final step is to apply non-maxima suppression to our bounding boxes to suppress weak, overlapping bounding boxes and then display the resulting text predictions. As I mentioned in the previous section, I could not use OpenCV's built-in non-maxima suppression (cv2.dnn.NMSBoxes) in my OpenCV 4 install. This scene contains a Spanish stop sign. As you can tell, EAST is quite accurate and relatively fast.

We begin by importing our packages; everything else is the same as in the previous section. The decoding function is used to extract the bounding box coordinates of text regions along with the probability of each region containing text. Our command line arguments are parsed next. Our frame is resized, maintaining aspect ratio. From there, we grab the dimensions and compute the scaling ratios. We then resize the frame again (the dimensions must be a multiple of 32), this time ignoring aspect ratio since we have stored the ratios for safekeeping. By optimizing our for loops with Cython, we should be able to increase the speed of our text detection pipeline.

To download the source code to this tutorial, and start applying text detection to your own images, just enter your email address in the form below.
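The decoding and 4x rescaling described above can be sketched as a pure-NumPy helper. The function name and threshold are my own; the geometry layout (four edge distances plus a rotation angle per cell) and the x4 output stride follow the EAST output format.

```python
import numpy as np

def decode_predictions(scores, geometry, min_confidence=0.5):
    """Turn EAST score/geometry maps into (startX, startY, endX, endY) boxes.

    scores:   shape (1, 1, H, W), text/no-text probabilities
    geometry: shape (1, 5, H, W), distances to box edges plus rotation angle
    """
    boxes, confidences = [], []
    num_rows, num_cols = scores.shape[2:4]
    for y in range(num_rows):
        scores_data = scores[0, 0, y]
        d_top, d_right, d_bottom, d_left = geometry[0, 0:4, y]
        angles = geometry[0, 4, y]
        for x in range(num_cols):
            # Ignore cells without a sufficiently high text probability.
            if scores_data[x] < min_confidence:
                continue
            # The output maps are 4x smaller than the network input.
            offset_x, offset_y = x * 4.0, y * 4.0
            cos, sin = np.cos(angles[x]), np.sin(angles[x])
            h = d_top[x] + d_bottom[x]
            w = d_right[x] + d_left[x]
            end_x = int(offset_x + cos * d_right[x] + sin * d_bottom[x])
            end_y = int(offset_y - sin * d_right[x] + cos * d_bottom[x])
            boxes.append((int(end_x - w), int(end_y - h), end_x, end_y))
            confidences.append(float(scores_data[x]))
    return boxes, confidences
```

After decoding, the boxes would be pruned with non-maxima suppression (e.g. cv2.dnn.NMSBoxes, or imutils.object_detection.non_max_suppression as a fallback) and scaled by the stored width/height ratios before drawing.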
Object detection with deep learning and OpenCV
In order to obtain the bounding box (x, y)-coordinates for an object in an image, we need to instead apply object detection. Object detection can not only tell us what is in an image but also where the object is. When combined, these methods can be used for super fast, real-time object detection on resource-constrained devices, including the Raspberry Pi, smartphones, etc. This will enable us to pass input images through the network and obtain the output bounding box (x, y)-coordinates of each object in the image.

SSDs, originally developed by Google, strike a balance between the two. The algorithm is more straightforward, and I would argue better explained in the original seminal paper, than Faster R-CNNs. We can also enjoy a much faster FPS throughput than Girshick et al. To learn more about SSDs, please refer to Liu et al.

When building object detection networks we normally use an existing network architecture, such as VGG or ResNet, and then use it inside the object detection pipeline. The problem is that these network architectures can be very large, on the order of hundreds of megabytes. Networks of that size are unsuitable for resource-constrained devices due to their sheer size and the resulting number of computations. Instead, we can use MobileNets (Howard et al.). MobileNets differ from traditional CNNs through the usage of depthwise separable convolution (Figure 2 above). The trade-off is that we sacrifice accuracy: MobileNets are normally not as accurate as their larger big brothers. For more details on MobileNets, please see Howard et al.

If we combine both the MobileNet architecture and the Single Shot Detector (SSD) framework, we arrive at a fast, efficient, deep learning-based method for object detection. I urge you to start there while also supplying some query images of your own. On Lines 41 and 42 we set the input to the network and compute the forward pass for the input, storing the result as detections.
Computing the forward pass and the associated detections could take a while depending on your model and input size, but for this example it will be relatively quick on most CPUs. We start by looping over our detections, keeping in mind that multiple objects can be detected in a single image. We also apply a check to the confidence (i.e., probability) associated with each detection. If the confidence is high enough (i.e., above our threshold), we draw the prediction on the image.
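That loop can be sketched against the (1, 1, N, 7) layout of OpenCV's SSD output, where each row holds (batch index, class id, confidence, x1, y1, x2, y2) with the coordinates normalized to [0, 1]. The helper name and the 0.5 threshold below are my own choices.

```python
import numpy as np

def filter_detections(detections, frame_w, frame_h, conf_threshold=0.5):
    """Keep confident detections, scaling boxes back to pixel coordinates."""
    results = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence < conf_threshold:   # drop weak detections
            continue
        class_id = int(detections[0, 0, i, 1])
        # Coordinates are normalized; rescale to the original frame size.
        box = detections[0, 0, i, 3:7] * np.array(
            [frame_w, frame_h, frame_w, frame_h])
        results.append((class_id, float(confidence), box.astype(int)))
    return results
```

Each surviving tuple would then be passed to cv2.rectangle and cv2.putText to draw the box and its class label on the frame.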
OpenCV Text Detection (EAST text detector)
The importance of image processing has increased a lot in recent years. Especially with the growing market of smartphones, people have started producing a huge amount of photos and videos, which are continuously streamed on social platforms. Thus the increased interest of the industry in this kind of problem is completely justified. Machine learning obviously plays a very significant role in this field. Automatic text detection and character recognition is just one example. One can cite other sophisticated applications, such as identification of animal species or plants, detection of human beings or, more generally, extraction of any kind of information of commercial use.

This field has been the object of very intensive study in past decades. Actually, at present, the problem of character recognition from black-and-white documents is considered solved. It is pretty common practice to scan a sheet of paper and use some standard software to convert it to a text file. In any case, those are the easy cases: the images are grayscale with very good contrast, there are no specific issues in detecting single character contours, and there are few problems due to lighting or shadows.

A completely different scenario starts being depicted when we deal with natural scenes, for example a photo taken by a Twitter user and then posted on the social platform. In this case the problem has not been solved at all. Actually, there are still quite big issues in processing this kind of image. Of course, the target of my project is not to find a final solution to this kind of open problem, but in any case it is still worth trying and practicing with such a fascinating topic. The post is organized as follows.

As you can see, together with the text at the bottom, the background image is quite complex and overwhelming. The quote and the name of the author are also printed in two different font sizes, which adds some additional challenge to the task. After having loaded the image, it needs to be preprocessed.
Specifically, it goes through two preprocessing steps. After image cleaning, object detection is performed: contours are identified and a rectangle is drawn around candidate objects. The result of this process is shown in the following figure.
Deep Learning based Text Recognition (OCR) using Tesseract and OpenCV
The method of extracting text from images is also called Optical Character Recognition (OCR), or sometimes simply text recognition. Tesseract was developed as proprietary software by Hewlett-Packard Labs. Since then it has been actively developed by Google and many open-source contributors. Tesseract acquired maturity with version 3.x. Tesseract 3.x is based on traditional computer vision algorithms.

In the past few years, deep learning based methods have surpassed traditional machine learning techniques by a huge margin in terms of accuracy in many areas of computer vision. Handwriting recognition is one of the prominent examples. So it was just a matter of time before Tesseract, too, had a deep learning based recognition engine.

The Tesseract library ships with a handy command line tool called tesseract. We can use this tool to perform OCR on images, with the output stored in a text file. The usage is covered in Section 2, but let us first start with installation instructions. Later in the tutorial, we will discuss how to install language and script files for languages other than English.

Tesseract 4 is included with Ubuntu 18.04. Due to certain dependencies, only Tesseract 3 is available from the official release channels for older Ubuntu versions. If you have one of those, you will have to compile Tesseract from source. We will use Homebrew to install Tesseract on macOS. By default, Homebrew installs Tesseract 3, but we can nudge it to install the latest version from the Tesseract git repo using the following command.

In the very basic usage, we specify the following: the language is chosen to be English and the OCR engine mode is set to 1 (i.e., LSTM only). In Python, we use the pytesseract module. It is simply a wrapper around the command line tool, with the command line options specified using the config argument.

Tesseract is a general-purpose OCR engine, but it works best when we have clean black text on a solid white background in a common font.
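As a small illustration of those options, the helper below builds the Tesseract option string used both on the command line and through pytesseract. The image path in the commented call is a placeholder, and passing language and engine flags via config is one of several ways pytesseract accepts them.

```python
def tesseract_config(lang="eng", oem=1, psm=3):
    # --oem 1 selects the LSTM-only engine; --psm 3 is fully automatic
    # page segmentation. These mirror the CLI flags discussed above.
    return f"-l {lang} --oem {oem} --psm {psm}"

print(tesseract_config())  # -l eng --oem 1 --psm 3

# With Tesseract and pytesseract installed, OCR would look like:
# import pytesseract
# import cv2
# img = cv2.imread("image.jpg")  # path is a placeholder
# text = pytesseract.image_to_string(img, config=tesseract_config())
```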
It also works well when the text is approximately horizontal and the text height is at least 20 pixels. If the text has a surrounding border, it may be detected as some random text. For example, if you scanned a book with a high-quality scanner, the results would be great. But if you took a photo of a passport with a complex guilloche pattern in the background, the text recognition may not work as well. In such cases, there are several tricks that we need to employ to make reading such text possible. We will discuss those advanced tricks in our next post.

Even though there is a slight slant in the text, Tesseract does a reasonable job with very few mistakes. The text structure in book pages is very well defined. A slightly more difficult example is a receipt, which has a non-uniform text layout and multiple fonts. You can see there is some background clutter and the text is surrounded by a rectangle.