Step-By-Step Handwriting Words Recognition With PyTorch
- Added: 19. 03. 2023
- In this tutorial, we will extend the previous tutorial to build a custom PyTorch model using the IAM Dataset for recognizing handwritten text. This dataset is commonly used as a benchmark for OCR systems and can provide a valuable foundation for constructing your own OCR system. We will be using several machine learning libraries and techniques to preprocess the data, augment it, and train a deep learning model.
During this tutorial, we will cover the following:
- An overview of the IAM Dataset and handwritten text recognition;
- Code walkthrough for importing required modules and libraries;
- Downloading and extracting the dataset using the download_and_unzip function;
- Preprocessing the dataset, including data parsing, vocabulary set creation, and computing the maximum label length;
- Data augmentation techniques to improve model performance;
- A deep dive into PyTorch model training with a custom CTC loss function and callbacks;
- Evaluation metrics such as Character Error Rate (CER) and Word Error Rate (WER) to monitor training progress;
- Saving and exporting the trained PyTorch model in ONNX format.
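The data-parsing step listed above can be sketched with the standard library alone. This is a minimal sketch, not the tutorial's exact code: it assumes the standard IAM `words.txt` layout (comment lines start with `#`, the first field is a word id such as `a01-000u-00-00`, the second field is `ok` or `err`, and the last field is the transcription) and word images stored as `words/a01/a01-000u/a01-000u-00-00.png`. The function name `parse_iam_words` is hypothetical.

```python
from pathlib import Path


def parse_iam_words(lines, words_dir):
    """Return (dataset, vocab, max_len) from the lines of IAM words.txt.

    Assumes the standard IAM layout: '#' starts a comment, field 0 is the
    word id, field 1 is 'ok' or 'err', and the last field is the label.
    """
    dataset, vocab, max_len = [], set(), 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        parts = line.rstrip("\n").split(" ")
        if parts[1] == "err":
            continue  # skip words whose segmentation IAM marks as erroneous
        word_id, label = parts[0], parts[-1]
        writer = word_id.split("-")[0]            # e.g. 'a01'
        form = "-".join(word_id.split("-")[:2])   # e.g. 'a01-000u'
        image_path = Path(words_dir) / writer / form / f"{word_id}.png"
        dataset.append((str(image_path), label))
        vocab.update(label)                 # every character joins the vocabulary
        max_len = max(max_len, len(label))  # longest transcription seen so far
    return dataset, sorted(vocab), max_len
```

In practice you would pass the result of `open("words.txt").readlines()`. The vocabulary size determines the number of output classes (plus one for the CTC blank), and the maximum label length is typically used when padding labels.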
By the end of this tutorial, you will have a good understanding of how to train a custom PyTorch model for recognizing handwritten text using the IAM Dataset. Join me on this exciting journey of handwriting recognition with PyTorch!
Text Version Tutorial: pylessons.com/pytorch-wrapper
GitHub: github.com/pythonlessons/mltu...
pypi: pypi.org/project/mltu/
#machinelearning #python #pytorch #ocr #tensorflow
Thank you so much! But could you please tell me how I can test it with my own inputs? I've already trained it on a different dataset.
The text version of the tutorial has a Google Drive link at the end with the trained model, but I am unable to get it running. Can I get some help?
Thanks for the video! I want to use your code, but I have a large word dataset. Should I change anything in your code when training?
I'm not sure; it depends on your dataset and on how it trains, but I don't think you should need to make big changes.
This works well with images from the dataset, but if I pass other word images that are not from the dataset, it can't predict them. The same thing happens with the TensorFlow model. Am I doing something wrong?
No. Your examples should be at least similar to the examples in the training data. Usually you would need to combine several large datasets and train the model on them, so that the model becomes more robust.
Thank you so much, very well explained! But when I try to download the dataset, I get the error "HTTPError: Bad Gateway". Please help me with this if possible.
Hello, the link is not working anymore. I'll try to find a new link when I have time.
This works if the image contains only one word or sentence (like the one in your TensorFlow video), but what should I do if I want to train it on a document such as a form or an invoice?
Predicting straight from a large document is a much harder task: you would need a much larger dataset and a model that you would have to train for months. If you have those kinds of resources, it's up to you :) This is why solutions break the task into smaller steps.
Find a way to crop each word. I've done this on a website with a live view: using OpenCV, it finds possible words and crops only those regions from each frame. You can then also straighten the image and apply erosion followed by dilation. OpenCV has a lot of tools to help with that. The main functions I used to find individual words are dilate, findContours, boundingRect and contourArea; there are more for preparing the image.
Hello, I'm trying to do the first step: setting up the minimum required to start with the code. That is just importing the libraries, but I get error after error. Did you already have those libraries installed before, or did you install them for this video?
I would like to create a system capable of recognizing handwritten text. Do you recommend PyTorch or TensorFlow?
Hello, it depends on which OS you use and whether you have a GPU on your machine. PyTorch is easier to learn and easier to run on all operating systems. TensorFlow is harder to learn, and with the latest versions it's pretty hard to install it on Windows with GPU support. People who program on Windows are shifting to PyTorch because of the easier setup.
Hello. Thank you for the tutorial! I attempted to run the code, but I get a 502 Bad Gateway error for the dataset link provided. Has the link changed?
You're welcome. No, everything works just fine for me: fki.tic.heia-fr.ch/databases/download-the-iam-handwriting-database
@PyLessons I ended up adjusting the path in the training code to point to my local copy of the dataset instead of downloading it, and it seems to be working fine so far. Thank you for the help and for the great tutorial and source code!
@PyLessons It still doesn't work in my case, and the same goes for the new link you shared. Could you kindly check it, please? I cannot find the words.txt file even after unzipping the dataset.
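Following the local-copy approach described above, here is a minimal sketch of skipping the download when the dataset is already on disk. The helper `prepare_dataset`, its `url` parameter and the use of `words.txt` as the "dataset is present" marker are all hypothetical, and the download branch assumes a ZIP archive, which may not match how the IAM files are actually packaged. As far as I know, the official IAM site distributes the transcriptions (including `words.txt`) in a separate archive (`ascii.tgz`) from the word images (`words.tgz`), which may explain a missing `words.txt`.

```python
import urllib.request
import zipfile
from io import BytesIO
from pathlib import Path


def prepare_dataset(dataset_dir, url=None, marker="words.txt"):
    """Return dataset_dir, downloading only when the marker file is missing."""
    dataset_dir = Path(dataset_dir)
    marker_path = dataset_dir / marker
    if marker_path.exists():
        return dataset_dir  # local copy found: no network access at all
    if url is None:
        raise FileNotFoundError(
            f"{marker_path} not found: download and extract the dataset manually, "
            "or pass a url"
        )
    # Hypothetical download branch: assumes `url` points to a ZIP archive.
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with zipfile.ZipFile(BytesIO(data)) as archive:
        archive.extractall(dataset_dir)
    if not marker_path.exists():
        raise FileNotFoundError(f"{marker} is not at the top level of the extracted archive")
    return dataset_dir
```

Calling it with only a local path, for example `prepare_dataset("path/to/your/IAM_Words")`, never touches the network and fails with a clear message if `words.txt` is not where the parsing code expects it.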
Hi. Can I get your trained model by any chance?
Can you please tell me how we can use our own inputs after training the model on the datasets?
It's pretty simple: I provided another file where I test the model; just modify it.
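For testing on your own images, inference comes down to running the exported ONNX model and decoding its CTC output. This is a minimal sketch, not the tutorial's own test file. It assumes `onnxruntime` and `numpy` are installed, that the model's first output has shape (1, time steps, classes) with the CTC blank as the last class index (`len(vocab)`), and that `image` is already resized and laid out the way the model was exported. `load_session`, `predict_word` and `ctc_greedy_decode` are hypothetical names.

```python
def ctc_greedy_decode(probs, vocab):
    """Greedy CTC decoding of one sequence of per-time-step class scores.

    Assumes class index len(vocab) is the CTC blank.
    """
    blank = len(vocab)
    # Most likely class at each time step ("best path").
    best_path = [max(range(len(step)), key=step.__getitem__) for step in probs]
    chars, previous = [], None
    for idx in best_path:
        # Collapse repeats, then drop blanks; a blank between two equal
        # characters is what lets the model emit double letters such as "ll".
        if idx != previous and idx != blank:
            chars.append(vocab[idx])
        previous = idx
    return "".join(chars)


def load_session(onnx_path):
    # Imported here so the decoder above can be used without onnxruntime.
    import onnxruntime as ort
    return ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])


def predict_word(session, image, vocab):
    """Run one preprocessed image through the model and decode the text.

    Assumption: `image` already matches the exported model's input size/layout.
    """
    import numpy as np
    batch = np.expand_dims(image, axis=0).astype(np.float32)  # add batch dimension
    input_name = session.get_inputs()[0].name
    scores = session.run(None, {input_name: batch})[0]  # assumed (1, T, classes)
    return ctc_greedy_decode(scores[0].tolist(), vocab)
```

For example, with `vocab = "abc"` and best-path indices `0, 0, 3, 0, 3, 2` (3 being the blank), the decoder returns `"aac"`. Creating the session once and reusing it for every image avoids reloading the model.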
Hey, any idea how I can use Nougat to make it work better with maths and other things?
No idea
Great video !
Question: what if we want to extract text from an image that is not handwritten? Will the same model work?
Thanks! Yes, it should work :)
@PyLessons What if we want to extract sentences? Will the model be able to put the words in sequence?
@aspboss1973 When I tried it, longer sentences were harder to train on. It's much easier to use other techniques to separate the words in a sentence, predict each word and then combine them.
@PyLessons So this technique won't be able to capture sentences that are 8-10 words long.
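The suggestion above (separate the words, predict each one, then combine) needs a final step that puts the word predictions back into reading order. This is a minimal sketch, assuming each word comes with an `(x, y, w, h)` box from whatever detector or OpenCV cropping you use. `boxes_to_text` and its `line_overlap` threshold are hypothetical and need tuning, and the sketch does not handle skewed or multi-column pages.

```python
def boxes_to_text(boxes, words, line_overlap=0.5):
    """Join per-word predictions into text lines.

    boxes: (x, y, w, h) per word; words: predicted string for each box.
    Two words share a line if their vertical overlap is at least
    line_overlap times the smaller of the two heights.
    """
    items = sorted(zip(boxes, words), key=lambda item: item[0][1])  # top to bottom
    lines = []
    for (x, y, w, h), word in items:
        for line in lines:
            overlap = min(y + h, line["y"] + line["h"]) - max(y, line["y"])
            if overlap >= line_overlap * min(h, line["h"]):
                line["items"].append((x, word))
                break
        else:
            # No existing line overlaps enough: start a new one.
            lines.append({"y": y, "h": h, "items": [(x, word)]})
    # Within each line, order words left to right by their x coordinate.
    return "\n".join(" ".join(word for _, word in sorted(line["items"])) for line in lines)
```

Because each word is recognized on its own, sentence length no longer matters to the recognizer; the quality of the word boxes does.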
ModuleNotFoundError: No module named 'mltu.torch.losses'
I have already installed mltu==1.0.1, but it still doesn't work.
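A `ModuleNotFoundError` right after `pip install` can mean that the package went into a different Python environment than the one running the script, or that the installed version does not contain that submodule. Below is a small diagnostic sketch using only the standard library. `diagnose` is a hypothetical helper, and it only reports what is installed; it cannot tell you which mltu version introduced `mltu.torch.losses`.

```python
import importlib.util
import sys
from importlib import metadata


def diagnose(dist_name, module_name):
    """Report the interpreter, a distribution's version and whether a module is importable."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # not installed for *this* interpreter
    try:
        # find_spec imports the parent packages first, so a missing or
        # broken parent package raises ImportError instead of returning None.
        found = importlib.util.find_spec(module_name) is not None
    except ImportError:
        found = False
    return {"python": sys.executable, "version": version, "module_found": found}


if __name__ == "__main__":
    print(diagnose("mltu", "mltu.torch.losses"))
```

If `python` is not the interpreter you ran `pip` with, install using `python -m pip install ...` from that same interpreter.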
@PyLessons When I try to execute the fit method, I get this error:
UnboundLocalError: cannot access local variable 'loss_info' where it is not associated with a value
I assume you are using the latest version of the mltu package. You found a bug in my latest release, thanks! I'm going to fix it ASAP.
@PyLessons Thanks to you! Your content is really helpful.
Thanks!
I released a bug fix; now you can install version 1.2.2 and everything should be fine.
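For context, Python raises this `UnboundLocalError` when a function reads a local variable that was only assigned on a code path that never ran, for example inside a loop over an empty data loader. The sketch below is a generic illustration of that pattern and its usual fix. It is not the actual mltu code, since the thread does not say what the bug was, and both function names are hypothetical.

```python
def last_batch_loss_buggy(batches):
    for batch in batches:
        loss_info = sum(batch) / len(batch)
    # If `batches` is empty, the loop body never runs and the name is unbound.
    return loss_info


def last_batch_loss_fixed(batches):
    loss_info = None  # initialise before the loop so the name always exists
    for batch in batches:
        loss_info = sum(batch) / len(batch)
    return loss_info
```

The caller then has to handle the `None` case (for example, an empty validation set) explicitly.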
If I want to contact you regarding a task, how can I do that?
Hello, thank you very much for your content. Could you please tell me whether I can use this code to identify handwritten text on a full page?
Hello, you're welcome. Attach some kind of handwritten-text object detector and try to solve the task that way.
@@PyLessons thank you very much 🙌
Hello👋 Can you please share the links to the latest available datasets? It would be a great help, because my project deadline is within a week😅
You may not be able to access the dataset website from your location; try using a VPN to access the dataset.
Thanks.
Thank you for your support!
How can I modify the code so that it processes the data only once? I want to improve the model, and it's time-consuming: when I train it, stop it and train it again, I have to wait for the data to be processed.
It processes the data only once, since caching is enabled (cache set to True stores the images in memory). If you don't want to use augmentors, you can remove those lines, but the model trains better with them.
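The in-memory cache only lasts for a single run, so restarting training repeats the preprocessing. One option is to save the deterministic preprocessing results to disk. This is a minimal sketch using `pickle`, and `cached` and the cache-file path are hypothetical. Delete the cache whenever the preprocessing changes, keep applying random augmentations on the fly so they differ every epoch, and only unpickle files you created yourself.

```python
import pickle
from pathlib import Path


def cached(cache_file, build_fn):
    """Return build_fn()'s result, computing it only if cache_file does not exist."""
    cache_file = Path(cache_file)
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)  # reuse the result from an earlier run
    result = build_fn()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as f:
        pickle.dump(result, f)
    return result
```

For example, wrapping the dataset parsing and vocabulary step as `cached("cache/prepared.pkl", build)` makes later runs load the saved result instead of recomputing it.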
@PyLessons How long does it take to finish 1000 epochs? On my computer it takes 4-5 minutes per epoch.
By the way, thanks for the tutorial. I have a text-recognition task now, and your tutorial is very helpful.
@ruckydelmoro2500 You can do the math :) But it won't take 1000 epochs if you use validation data, because early stopping will kick in.
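The math for the question above works out as follows: at 4-5 minutes per epoch, 1000 epochs would take 4,000-5,000 minutes, or roughly 67-83 hours. Early stopping avoids most of that by ending training once the validation metric stops improving. Below is a minimal sketch of the idea, assuming a lower-is-better metric such as validation CER. This `EarlyStopping` class is hypothetical and is not mltu's callback API.

```python
class EarlyStopping:
    """Signal a stop when the metric has not improved for `patience` epochs."""

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")      # lower is better, e.g. validation CER
        self.bad_epochs = 0

    def step(self, value):
        """Record one epoch's metric; return True if training should stop."""
        if value < self.best - self.min_delta:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop you would call `step(val_cer)` after each validation pass and `break` as soon as it returns True, usually keeping the checkpoint saved at the best epoch.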
What are your specs?
As I remember, I used an i7-7700K CPU and a 1080 Ti GPU :)
@PyLessons Just to let you know, I've created a Korean dataset with 50k images and trained on it using your script, getting an average CER of 0.063.
My code to create the dataset was written in a rush, so the dataset looks horrible and took 15 hours to create, but somehow the model is able to recognize most things I write down. Next is to train the English model with Korean and see what happens.
Sounds great! Good job, keep doing this stuff :) If you have any suggestions for improvements, let me know!
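For reference, CER is the character-level edit distance (insertions, deletions and substitutions) between the prediction and the reference, divided by the reference length. WER is the same calculation at the word level. An average CER of 0.063 therefore corresponds to roughly 6 character edits per 100 reference characters. Below is a minimal sketch of these standard definitions; mltu's own metric code may differ in details such as how values are averaged over a batch.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference, prediction):
    return edit_distance(reference, prediction) / max(len(reference), 1)


def wer(reference, prediction):
    ref_words = reference.split()
    return edit_distance(ref_words, prediction.split()) / max(len(ref_words), 1)
```

For example, `cer("hello", "helo")` is 0.2 (one deletion over five characters).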
I couldn't install mltu.
Why? Which OS and which Python version?
@PyLessons I'm using Python 3.9.16 on macOS; following the tutorial, I was trying to install mltu==1.0.3.
@PyLessons I tried without specifying the version, but it's still not working. It says:
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
Why not use a Colab notebook and share it with us? It would be very useful for us.
You don't learn as much when using Colab. I've found it's better practice to use a pure Python script; if you want to go step by step, experiment in a debugger.
I need your help. How can we contact you?
You can find my contacts on www.pylessons.com