Optical Character Recognition From Beginner to Expert Using Python | Tesseract - Complete Tutorial

  • Added 26 Dec 2021
  • In this tutorial you will learn both the concepts and the practical implementation of optical character recognition (OCR) with Python and Tesseract.
    Tesseract is one of the most widely used OCR engines. It was originally developed at Hewlett-Packard, and Google later sponsored its development. It extracts the text in your digital images, either from the command line or through an API. Beyond plain text extraction, it supports more advanced character recognition tasks: getting bounding boxes for recognized characters, converting images to different output formats, applying your own custom configurations, producing orientation and script detection reports, and returning tables of detailed recognition data. Tesseract supports Unicode (UTF-8) and more than 100 languages, which helps whenever you need to work with a language other than English.
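As a rough illustration of the "tables of detailed recognition data" mentioned above, here is a minimal sketch. It assumes the tab-separated layout that pytesseract's `image_to_data` returns by default (a header row with columns such as `conf` and `text`). The helper names `parse_tsv`, `confident_words`, and `ocr_words` are hypothetical and are not taken from the video's project files.

```python
import csv
import io


def parse_tsv(tsv_text):
    """Parse tab-separated text (header row plus data rows) into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def confident_words(rows, min_conf=60.0):
    """Keep non-empty words whose confidence is at least min_conf."""
    words = []
    for row in rows:
        text = (row.get("text") or "").strip()
        # Rows for pages, blocks, and lines have empty text, so they are skipped.
        if text and float(row["conf"]) >= min_conf:
            words.append(text)
    return words


def ocr_words(image_path, lang="eng", min_conf=60.0):
    """Run Tesseract on an image file and return the confident words.

    Assumes pytesseract, Pillow, and the Tesseract binary are installed.
    """
    import pytesseract
    from PIL import Image

    tsv_text = pytesseract.image_to_data(Image.open(image_path), lang=lang)
    return confident_words(parse_tsv(tsv_text), min_conf)
```

Keeping the parsing separate from the OCR call means the filtering logic can be tried on saved output without Tesseract installed. The threshold of 60 is an arbitrary starting point, not a recommended value.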
    After watching the video and working through all of its implementations, you should be well equipped to work as an optical character recognition expert!
    Highly recommended for enthusiastic Pythonistas all over the world :)
    Chapters
    =========
    1) Introduction to Tesseract and installation: 0:01:24
    2) Introduction to Pytesseract and installation: 0:06:48
    3) Configure tesseract path: 0:12:34
    4) Check available languages: 0:14:17
    5) Extract text from an image
    5.1) Simple text extraction: 0:15:51
    5.2) Specified language text extraction: 0:18:37
    5.3) Multiple image text extraction: 0:32:05
    5.4) Timeout text extraction: 0:35:53
    6) Get and draw bounding boxes around characters: 0:40:19
    7) Get report of verbose data: 0:48:22
    8) Orientation and script detection: 0:51:49
    9) Working with output formats
    9.1) PDF: 0:57:02
    9.2) HOCR: 0:59:21
    9.3) XML: 1:00:40
    10) Assigning Custom Configurations: 1:02:26
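For the bounding box chapter, a minimal sketch of one common approach follows. It assumes the line format produced by pytesseract's `image_to_boxes` ("char left bottom right top page", with y measured from the bottom edge of the image) and flips the coordinates so they can be drawn with OpenCV, whose origin is the top-left corner. The function names are hypothetical, and the video may do this differently.

```python
def parse_char_boxes(boxes_text, image_height):
    """Convert image_to_boxes-style lines into top-left-origin rectangles.

    Each input line is assumed to look like "char left bottom right top page".
    Returns (char, x1, y1, x2, y2) tuples with y measured from the top edge.
    """
    boxes = []
    for line in boxes_text.splitlines():
        parts = line.split(" ")
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        char = parts[0]
        left, bottom, right, top = (int(v) for v in parts[1:5])
        boxes.append((char, left, image_height - top, right, image_height - bottom))
    return boxes


def draw_char_boxes(image_path, output_path):
    """Draw a rectangle around every recognized character and save the result.

    Assumes OpenCV (cv2), pytesseract, and the Tesseract binary are installed.
    """
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    height = image.shape[0]
    # OpenCV loads images as BGR; convert to RGB before handing them to Tesseract.
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    boxes_text = pytesseract.image_to_boxes(rgb)
    for _char, x1, y1, x2, y2 in parse_char_boxes(boxes_text, height):
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 1)
    cv2.imwrite(output_path, image)
```

The y-flip is the step that is easiest to miss: without it the rectangles appear mirrored vertically relative to the text.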
    Download the project
    ====================
    Google Drive: drive.google.com/drive/folder...
    References
    ==========
    Tesseract: github.com/tesseract-ocr/tess...
    Pytesseract: github.com/madmaze/pytesseract
    Multiple config options: www.py4u.net/discuss/10850
    Getting bounding box coordinates: stackoverflow.com/questions/2...
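Regarding the multiple config options reference, here is a minimal sketch of combining several Tesseract options into the single `config` string that pytesseract passes to the command line. `--oem`, `--psm`, and `-c name=value` are standard Tesseract command-line options. The helper names `build_config` and `read_digits` are hypothetical, and the digit whitelist is only an illustrative choice.

```python
def build_config(psm=3, oem=3, variables=None):
    """Build a Tesseract config string from an OCR engine mode, a page
    segmentation mode, and optional "-c name=value" variables."""
    parts = [f"--oem {oem}", f"--psm {psm}"]
    for name, value in (variables or {}).items():
        parts.append(f"-c {name}={value}")
    return " ".join(parts)


def read_digits(image_path):
    """Extract text while restricting recognition to digits.

    Assumes pytesseract, Pillow, and the Tesseract binary are installed.
    """
    import pytesseract
    from PIL import Image

    # --psm 6 treats the image as a single uniform block of text.
    config = build_config(psm=6, variables={"tessedit_char_whitelist": "0123456789"})
    return pytesseract.image_to_string(Image.open(image_path), config=config)
```

Building the string in one place keeps the options readable when several of them are combined, which is where hand-written config strings tend to go wrong.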
    Social Media
    ============
    Facebook: / sintax.tech.blog
    LinkedIn: / sineth-sankalpa-9aa4331ab
    Subscribe to 'The Sineth' and hit the bell icon.
    / @thesineth
    Thanks for watching ❤

Comments • 35

  • @washiniranasinghe3856 (2 years ago, +1)

    great job !!
    ❤️❤️

  • @pravallika527 (1 year ago)

    Thank you so much. I searched for this everywhere and found yours. Very grateful 😇😇

  • @shyamalikannangara8665

    Great work sineth 💐

  • @tony-go-code (1 year ago, +1)

    great detail explanation
    thank you for sharing
    I will try this out.

  • @madhushankha..5379 (2 years ago)

    Great work sinna ❤

  • @uminhtetoo (1 year ago)

    Thank you so much, Sir.

  • @nethramandari3611 (2 years ago)

    Great work

  • @HirenThakkar45 (1 year ago)

    Bro, nice. It really helped me a lot. Nice video ✌🤟👍

  • @thiwankaarunalu9211 (2 years ago)

    Nice work

  • @alexnieto5036 (1 year ago)

    Sineth, thanks a lot for your video. I wish you would continue making more videos about OCR, especially, if possible, about handwritten text.

    • @TheSineth (1 year ago)

      Thank you very much! Interesting tutorials are being readied.

  • @chamathkaadihetti7902 (2 years ago, +1)

    🔥🔥

  • @winkfordmboma4560 (1 year ago)

    This is nice work 👏 👌

  • @khushipitroda385 (2 years ago, +2)

    Hey, thanks a lot. I had a project on this and searched everywhere for a well-explained, beginner-friendly video. Yours was great. Can you do text extraction from video? It would be helpful :)

  • @ArunKumar-ov4oe (29 days ago)

    Hi,
    In the "information about orientation and script detection" section,
    I'm getting an error which says
    "TesseractError: (1, 'Warning, detects only orientation with -l eng Error, OSD requires a model for the legacy engine')"
    What can I do to run that block?

  • @Redstonedust-rc9nr (1 year ago)

    Thx

  • @muazzamali7050 (8 months ago)

    Hello Sir,
    It is a good, informative video. Can you please let me know whether we can extract text from passports or IDs using this library?

  • @HirenThakkar45 (1 year ago)

    Make a video on how to extract text if the image is blurred. ✌🤞

  • @nethramandari3611 (2 years ago)

    ❤️❤️❤️

  • @kaitlynlarocco6992 (2 years ago)

    Hi Sineth! I'm trying to download tesseract on my Mac but I'm not sure which program to use. Could you help me out here?

    • @TheSineth (2 years ago)

      Hi Kaitlyn !
      Try out this one: gist.github.com/krissdap/1fb995bfd95c727eb7b4eb6d66ab7207

  • @sumedikanishadi3388 (2 years ago)

    ❤️‍🔥

  • @dilmithwalgampaya2234 (2 years ago)

    ♥️💪🔥

  • @kebabsharif9627 (1 year ago)

    Does it work with a lited papers?

    • @TheSineth (1 year ago)

      Any kind of image is supported.

  • @winkfordmboma4560 (1 year ago)

    Can you help with converting an image directly to Excel?

  • @user-pp1vi7wj1w (6 days ago)

    Can Tesseract read PDFs?

  • @shreyajaiswal300 (1 year ago)

    Hey sir, thanks for the project and tutorial, but I am having some problems with the code. I would like to contact you by email. Can you share your email address, please?