Optical Character Recognition From Beginner to Expert Using Python | Tesseract - Complete Tutorial
- Added on 26 Dec 2021
- In this tutorial you will learn both the concepts and the practical implementation of optical character recognition with Python and Tesseract.
Tesseract is one of the most commonly used character recognition tools. It was originally developed at Hewlett-Packard and later sponsored by Google. Tesseract extracts the text contained in your digital images, either from the command line or through API wrappers. It is more than an OCR engine that pulls written text out of an image: it also handles more advanced character recognition jobs. Among them: getting bounding boxes for recognized characters, converting images into different output formats, using your own custom configurations, getting orientation and script detection reports, and getting tables of detailed (verbose) analysis data. Tesseract supports Unicode (UTF-8) and works with more than 100 languages, which is very helpful whenever you need a language other than English.
After watching the tutorial and working through all of its implementations, you should be well equipped to handle optical character recognition tasks!
Highly recommended for enthusiastic Pythonistas all over the world :)
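As a rough illustration of the character bounding boxes listed among the features above, here is a minimal sketch, not the code from the video. It assumes the `pytesseract` and `Pillow` packages and a working Tesseract install. The names `CharBox`, `parse_boxes`, and `boxes_for_image` are hypothetical helpers made up for this example. Tesseract's box output has one line per character in the form `char left bottom right top page`, with the origin at the bottom-left of the image, so the sketch flips the y axis to get the top-left-origin rectangles that most drawing libraries expect.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    """One recognized character with a top-left-origin rectangle."""
    char: str
    left: int
    top: int
    right: int
    bottom: int


def parse_boxes(box_text: str, image_height: int) -> List[CharBox]:
    """Parse Tesseract box lines ("char left bottom right top page").

    Tesseract measures y from the bottom edge of the image, so each y value
    is flipped against image_height to get top-left-origin coordinates.
    """
    boxes = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        x1, y1, x2, y2 = (int(p) for p in parts[1:5])
        boxes.append(CharBox(parts[0], x1, image_height - y2, x2, image_height - y1))
    return boxes


def boxes_for_image(path: str) -> List[CharBox]:
    """Run Tesseract on an image file and return its character boxes.

    Imports are local so the parser above works without Tesseract installed.
    """
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        return parse_boxes(pytesseract.image_to_boxes(image), image.height)
```

Each returned `CharBox` can be passed to a drawing routine such as a rectangle call in an imaging library, using `(left, top)` and `(right, bottom)` as the two corners.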
Chapters
=========
1) Introduction to Tesseract and installation: 0:01:24
2) Introduction to Pytesseract and installation: 0:06:48
3) Configure tesseract path: 0:12:34
4) Check available languages: 0:14:17
5) Extract text from an image
5.1) Simple text extraction: 0:15:51
5.2) Specified language text extraction: 0:18:37
5.3) Multiple image text extraction: 0:32:05
5.4) Timeout text extraction: 0:35:53
6) Get and draw bounding boxes around characters: 0:40:19
7) Get report of verbose data: 0:48:22
8) Orientation and script detection: 0:51:49
9) Working with output formats
9.1) PDF: 0:57:02
9.2) HOCR: 0:59:21
9.3) XML: 1:00:40
10) Assigning Custom Configurations: 1:02:26
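Chapters 5 and 10 (language selection, timeouts, and custom configurations) can be sketched roughly as follows. This is an assumption-laden outline, not the code shown in the video: `build_config`, `join_languages`, and `extract_text` are hypothetical helper names, and the sketch assumes `pytesseract` and `Pillow` are installed. `--psm` (page segmentation mode) and `--oem` (OCR engine mode) are standard Tesseract command-line options, and several languages are combined with `+`.

```python
from typing import Optional, Sequence


def build_config(psm: Optional[int] = None, oem: Optional[int] = None) -> str:
    """Build a Tesseract config string such as "--oem 3 --psm 6"."""
    parts = []
    if oem is not None:
        parts.append(f"--oem {oem}")
    if psm is not None:
        parts.append(f"--psm {psm}")
    return " ".join(parts)


def join_languages(codes: Sequence[str]) -> str:
    """Combine language codes the way Tesseract expects, e.g. "eng+fra"."""
    return "+".join(codes)


def extract_text(path: str,
                 languages: Sequence[str] = ("eng",),
                 psm: Optional[int] = None,
                 oem: Optional[int] = None,
                 timeout: int = 0) -> str:
    """Extract text from one image file.

    pytesseract raises RuntimeError if the timeout (in seconds) is exceeded;
    a timeout of 0 means no limit. Imports are local so the helpers above
    can be used without Tesseract installed.
    """
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        return pytesseract.image_to_string(
            image,
            lang=join_languages(languages),
            config=build_config(psm=psm, oem=oem),
            timeout=timeout,
        )
```

Every language passed in must have its trained data installed; `pytesseract.get_languages()` lists what is available on the current machine.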
Download the project
====================
Google Drive : - drive.google.com/drive/folder...
References
==========
Tesseract: github.com/tesseract-ocr/tess...
Pytesseract: github.com/madmaze/pytesseract
Multiple config options: www.py4u.net/discuss/10850
Getting bounding box coordinates: stackoverflow.com/questions/2...
Social Media
============
Facebook: / sintax.tech.blog
Linkedin: / sineth-sankalpa-9aa4331ab
Subscribe 'The Sineth' and hit on the bell icon.
/ @thesineth
Thanks for watching ❤
great job !!
❤️❤️
Thank you so much. I searched for this everywhere and found yours. Very grateful 😇😇
Great work sineth 💐
great detail explanation
thank you for sharing
I will try this out.
Great work sinna ❤
Thank you so much,Sir.
Great work
Bro, nice. It really helped me a lot. Nice video ✌🤟👍
Nice work
Sineth, thanks a lot for your video. I wish you would continue making more videos about OCR, especially, if possible, about handwritten text.
Thank you very much! Interesting tutorials are being readied.
🔥🔥
This is nice work 👏 👌
Thank you!
Heyyy, thanks a lot. I had a project on this and searched everywhere for a well-defined, beginner-friendly video. Yours was great. Can you do text extraction from video? It would be helpful :)
Definitely.
Hi,
In "information about orientation and script detection" field,
I'm getting an error which says
"TesseractError: (1, 'Warning, detects only orientation with -l eng Error, OSD requires a model for the legacy engine')"
What can i do to run that block??
Thx
Hello Sir
This is a good, informative video. Can you please let me know whether we can extract text from passports or IDs using this library?
Make a video on how to extract text if the image is blurred. ✌🤞
❤️❤️❤️
Hi Sineth! I'm trying to download tesseract on my Mac but I'm not sure which program to use. Could you help me out here?
Hi Kaitlyn !
Try out this one: gist.github.com/krissdap/1fb995bfd95c727eb7b4eb6d66ab7207
❤️🔥
♥️💪🔥
Does it work with a lited papers?
Any kind of an image is supported.
Can you help with conversion from image to excel direct.. ??
Yes. Contact me: sinethsankalpabkss@gmail.com
@@TheSineth emailed you
Can tesseract read pdf?
Hey sir, thanks for the project and tutorial but I am having some problems in code. I would like to contact you through mail, can you share your mail id please.
Thank you very much!
Contact: sinethsankalpabkss@gmail.com
Which algorithm does this project use?