UiPath Machine Learning Model Training - Best Practices | RPA | Artificial Intelligence

  • Published 11.07.2024
  • Artificial Intelligence and Machine Learning combined with Robotic Process Automation (#RPA) expand the automation capabilities of robots.
    So, how best can we train the ML models we create to help us with complicated activities such as prediction, data extraction, classification, etc.?
    This video takes you through some common questions that help us better understand best practices around ML model training in #UiPath.
    ▬ Contents of this video ▬▬▬▬▬▬▬▬▬▬
    0:00 - Introduction
    1:09 - How AI Robot Licenses Work
    3:28 - Selecting Major & Minor Versions in Training Pipelines
    7:48 - Importance of Evaluating Your ML Models
    10:56 - Best Frequency to Schedule Training Pipelines
    12:08 - Use of Validation Station for Initial Training of Document Understanding ML Models
    15:16 - Should We Submit All Documents We Process for Retraining (Even the Ones That Give Higher Confidence Levels)?
    18:51 - I Already Have a Trained Model. A Production Scope Change Gives Me New Invoices in a Language the Model Is Not Trained For. How to Handle This Scenario?
    21:24 - Is It OK to Maintain Multiple Data Manager Sessions for a Single ML Model?
    Related Documentation Links:
    docs.uipath.com/document-unde...
    #UiPathCommunity
  • Science & Technology

Comments • 36

  • @hemantbonde6348
    @hemantbonde6348 2 years ago +1

    Great! Thanks for sharing!

  • @vijayalakshmiboya1904
    @vijayalakshmiboya1904 2 years ago +1

    Excellent and Awesome Video on ML pipelines and models used in AI

  • @bharatguptag1994bharat

    Hi Lahiru,
    Thanks a lot for your detailed videos on each document understanding topic.
    I have one doubt. You mentioned it is always good to train from the lowest minor version, but:
    1) I trained on 500 docs, and after the training pipeline the package version changed from 24.0 to 24.1, which is OK.
    2) After 1 month I got 1,000 invoices to train. Can I train only those 1,000 invoices on 24.1 to get 24.2, or do I need to train the 500 old ones plus the 1,000 new ones (1,500 total) on 24.0 to get 24.2?
    3) And the Auto Fine-tune feature works on this same logic, right? It will train the 1,500 invoices on version 24.0 to get 24.2?

  • @anusha2690
    @anusha2690 1 year ago +1

    Hi Sir
    After labelling in Document Manager, while exporting we now have 4 options in the export:
    1) Current search results
    2) All Labelled
    3) Schema
    4) ALL
    What is the difference between ALL and All Labelled?
    Could you please explain this?

    • @LahiruFernando
      @LahiruFernando  1 year ago +1

      Hi again :)
      In simple terms, it's like this:
      - Current Search Results: exports only the items you see in the current search (the one selected in the top dropdown)
      - All Labeled: exports all the labeled documents from all batches you have in Document Manager
      - Schema: exports only the schema of the model
      - ALL: exports everything in Document Manager (labeled and non-labeled)

  • @anusha2690
    @anusha2690 1 year ago +1

    After creating pipelines, can we delete a pipeline if we are not using that version in the ML Skill?
    Is it mandatory to use the same version in the Pipeline and the ML Skill?

    • @LahiruFernando
      @LahiruFernando  1 year ago +1

      Hello @anusha
      Yes, you can delete the pipeline. After a pipeline is successful, deleting it will only delete the pipeline record; it will not delete the latest updated package version. To delete the updated package version, you can go into the Packages screen and delete the new version from there.
      It is not mandatory to use the same version in the pipeline as in the skill. If you want to roll back to a previous skill version, you can do that too.

    • @anusha2690
      @anusha2690 1 year ago +1

      @@LahiruFernando
      Thankyou so much for your reply
      I have one more query
      I created dataset1, labelled 10 invoices, and created a pipeline, but the results were not accurate.
      Then I created dataset2, imported the JSON files generated through the ML Extractor Trainer, and created a pipeline. This time the results were 85% accurate.
      To reach 100%, I repeated the second step: creating dataset3, importing JSON files, and creating a pipeline. But the results dropped to 20% accuracy.
      As you said in this video, I tried using the lowest minor version number. The results were still not accurate.
      Another thing is I am getting more accurate results when I use the highest minor version number (4) than the lowest (0).
      But I am not getting 100% accuracy in any way.
      Please guide me.
      I will be very thankful to you

    • @LahiruFernando
      @LahiruFernando  1 year ago +1

      Hi @@anusha2690
      Sorry for the late reply...
      It is not easy to get 100% accuracy, but you can get closer to it.
      Basically, when you have multiple datasets, you need to select the root folder (the one containing the exports) where you have all the Data Manager training exports and the updates received in the Action Center "fine-tune" folder.
      Once you do a training run on that entire set, it should give you a higher accuracy level.
      That's what we do at enterprise level too:
      first get the Data Manager exports, then point each training run at the root folder containing the export and fine-tune folders, so everything is considered.
      Is that how you did it?
      It should not decrease the accuracy unless something is wrong...

  • @shreyjain7959
    @shreyjain7959 2 years ago +1

    Great video Lahiru!! Cleared some of my doubts (especially the minor version one!). But I still have a couple more if it's all right with you😅.
    1.) In the second part of your Training UiPath Document Understanding ML Models series, you create a separate data manager. But in this video you advise to keep one single data manager. Can you shed some light on this?
    2.) Let's say I ran an initial train run with 20 docs in a single data manager. After a few weeks I get some more docs. So should I add those docs to the same data manager or a different one? And if it's the same one should I add as a different batch?
    3.) Also Once I label and train the docs in the data manager, and then use the present validation station to further label the docs, after that, is the model ready to be used? Is that the final step? Where does the evaluation part come in?
    4.) Suppose I have 15 docs with all the elements I want to extract and 5 docs which do not contain some of the elements. Should I have a separate model for those 5 docs or should I train all 20 docs in a single data manager?
    Thank you for your time and help!

    • @shreyjain7959
      @shreyjain7959 2 years ago +1

      @Lahiru Fernando Can you please answer my question?

    • @LahiruFernando
      @LahiruFernando  2 years ago

      Hi again, I missed this one as well (not sure why it didn't show up). Anyways, thank you so much for your patience and for the awesome feedback. Let me answer your questions one by one...
      1. So about the Data Manager - the answer is we need to use the same Data Manager session to improve the accuracy as explained in the video. My previous comment also explains this, so I guess you already understand.. but feel free to ping me if you have any questions..
      2. You need to add those to the same Data Manager session. You can decide whether you want to maintain it as a separate batch in the same DM session. Ideally we use batches to separate the documents for easy reference and better organization. You can either have a separate batch, or add to the previous depending on your need. Either way is fine. In case you decide to add to a different batch, when you export, make sure you select the filter as "All" or "All Labeled" to export all the labeled documents from all the batches you have for the training. If you wish to export only a specific batch, you can select that batch from the filter and then export it.
      3. So, once you have done the initial training, you can run the Validation Station and see how good your model is at predicting the values. You can also use that information in the Validation Station to further enhance the model by submitting it for continuous retraining. The evaluation comes in as a separate step. Based on my experience, we do an evaluation run after the initial training we do in Data Manager. In an evaluation run, we use another sample set of documents (ones not used for training) to evaluate the accuracy of the model. The evaluation gives you the accuracy percentage of the model. Using that, we can decide whether we need more training, and we can also use the accuracy percentage to build our logic (for deciding whether we need manual verification or not).
      4. No. Whether the fields are available or not, we use a single Data Manager session. But ensure you have at least 10 documents with all the fields (this is the minimum requirement; however, the higher the better). Once you have the minimum, you can still include the other documents that do not have all the fields and train on the fields they do have.
      I hope this is helpful :)
      and again, feel free to ping me anytime.. and next time I will make sure I reply to you much faster :)
      Have a great day my friend
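      The threshold-based routing described above (confidence decides whether a document goes straight to export or to manual verification) can be sketched in a few lines of Python. This is only an illustrative sketch; the field structure, the 0.85 threshold, and the `route_document` helper are hypothetical assumptions, not part of any UiPath API:

```python
# Toy sketch of confidence-based routing, as discussed in the answer above.
# The 0.85 threshold and the field dictionaries are illustrative assumptions.

CONFIDENCE_THRESHOLD = 0.85  # tune based on your evaluation pipeline results

def route_document(extracted_fields):
    """Return 'export' if every field meets the threshold, else 'validation_station'."""
    if all(f["confidence"] >= CONFIDENCE_THRESHOLD for f in extracted_fields):
        return "export"            # go straight to Export Extraction Results
    return "validation_station"    # queue for manual verification

# Example: one low-confidence field forces manual review
fields = [
    {"name": "invoice_number", "confidence": 0.97},
    {"name": "total_amount", "confidence": 0.62},
]
print(route_document(fields))  # validation_station
```

      In practice the threshold would come from the evaluation pipeline's accuracy figures rather than being hard-coded.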

    • @shreyjain7959
      @shreyjain7959 2 years ago +1

      @@LahiruFernando thank you for solving my doubts! Also I wanted to ask: if I have a smaller number of documents, what should I do to increase the accuracy? Should I keep retraining on the same set with the minor version as 0 (as you did in one of your videos), or are there other ways?
      Again thanks for your time and effort!

    • @LahiruFernando
      @LahiruFernando  2 years ago

      @@shreyjain7959 Hey... Happy to know that I was able to answer your queries :)
      So about the documents, it is always better to have a large dataset to train on. Fewer documents means we are looking at a very small set of patterns. In that case, if something new comes up, it might not extract values as expected. So it is always good practice to have at least 10 to 15 documents from each vendor (each vendor with its own layout) and train on them. Let's say, for example, you have 10 vendors, and each vendor has their own format.
      If we use only one vendor to train, data from other vendors might not be extracted properly, as the model does not know those formats. So what we can do is use at least 10 documents from each vendor (10 vendors × 10 documents = 100) and train on those.
      With this approach, it is trained on many documents, and has a better chance of capturing variations.

    • @shreyjain7959
      @shreyjain7959 2 years ago +1

      @@LahiruFernando Ok got it! Thanks

  • @hemantbonde6348
    @hemantbonde6348 2 years ago +1

    Have a few questions:
    1. In the trial version we have only 2 AI Robot licenses, so we can create only 2 ML Skills and 1 training or evaluation pipeline. What happens if we try to create more than 2 ML Skills, or more than 1 training or evaluation pipeline?
    2. I created a custom Document Understanding model and initially trained it in Data Manager using 10 documents. Now suppose I run the process for 10 documents in UiPath with this logic: if a document reaches the confidence threshold, it goes directly to the Export Extraction Results activity; if it doesn't, it goes to Present Validation Station. My question: if, out of 10 documents, only 2 went to Present Validation Station, can I import the output folder returned by Present Validation Station for those 2 documents into Data Manager? Or does it always need 10 documents here as well?
    3. Can you show an example of evaluating (evaluation pipeline) the ML model?

    • @LahiruFernando
      @LahiruFernando  2 years ago +2

      And here are my answers :)
      1. You can create more than 2 ML Skills. As I explained, 1 AI Robot can handle 2 ML Skills. So, since we have 2 AI Robots, you can create up to 4 ML Skills without any issue. However, the limitation is that you will not be able to run any evaluation or training pipelines; you need to keep 1 AI Robot license completely free for that. The moment you create your 3rd ML Skill in Trial, it will utilize 50% of the 2nd AI Robot license, hence it will not be free for the training pipelines. That's the limitation.
      2. Very good question. The only requirement in Data Manager is that you initially need 10 unique documents with unique values. Thereafter, no matter the number of documents (whether it's 1 or 2, etc.), you can add them to the Data Manager session. The important thing is to add them to the same DM session where you have the other documents. Over time it will grow in size, and you will have a much larger dataset to train on each time.
      3. For sure. Let me plan something around this :)
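      The license arithmetic from point 1 can be written out as a toy calculation. The 1-robot-serves-2-skills ratio and the "1 fully free robot per pipeline" rule come from the answer above; the helper function itself is a hypothetical sketch, not a UiPath API:

```python
# Toy model of the AI Robot license maths from the answer above.
# Assumes 1 AI Robot license serves 2 deployed ML Skills, and that a
# training/evaluation pipeline needs 1 completely free AI Robot license.
import math

SKILLS_PER_ROBOT = 2

def can_run_pipeline(total_robots, deployed_skills):
    """True if at least one AI Robot license is completely free for pipelines."""
    robots_in_use = math.ceil(deployed_skills / SKILLS_PER_ROBOT)
    return total_robots - robots_in_use >= 1

# Trial tenant: 2 AI Robots
print(can_run_pipeline(2, 2))  # True  - 1 robot serves both skills, 1 stays free
print(can_run_pipeline(2, 3))  # False - the 3rd skill occupies half the 2nd robot
```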

    • @hemantbonde6348
      @hemantbonde6348 2 years ago +1

      @@LahiruFernando Thanks for answering my questions

  • @anusha2690
    @anusha2690 1 year ago +1

    Hi sir
    I have one more query
    I created dataset1, labelled 10 invoices, and created a pipeline, but the results were not accurate.
    Then I created dataset2, imported the JSON files generated through the ML Extractor Trainer, and created a pipeline. This time the results were 85% accurate.
    To reach 100%, I repeated the second step: creating dataset3, importing JSON files, and creating a pipeline. But the results dropped to 20% accuracy.
    As you said in this video, I tried using the lowest minor version number. The results were still not accurate.
    Another thing is I am getting more accurate results when I use the highest minor version number (4) than the lowest (0).
    But I am not getting 100% accuracy in any way.
    Please guide me.
    I will be very thankful to you

    • @LahiruFernando
      @LahiruFernando  1 year ago +1

      Hi my friend,
      I believe I answered your previous comment on the same thing.
      Let me know if that helps. If not, I'm more than happy to have a call with you to discuss and guide you through the flow.

    • @anusha2690
      @anusha2690 1 year ago +1

      Hi sir, can we have a call for 10 min? I have a few doubts about the minor version to be used while creating pipelines and ML Skills, and some other doubts.

    • @LahiruFernando
      @LahiruFernando  1 year ago +1

      @@anusha2690 Hi again..
      Of course I can connect with you for a call.
      Send me an email, and I'll share my number so we can connect on Whatsapp or anything to discuss.
      Here is my email: lahirufernando90@gmail.com

    • @anusha2690
      @anusha2690 1 year ago

      @@LahiruFernando Hi sir.
      I have sent you an email

  • @anusha2690
    @anusha2690 1 year ago +1

    Hi sir
    Can you please tell me the importance of the fine-tune folder in the dataset?

    • @LahiruFernando
      @LahiruFernando  1 year ago +1

      Hello 👋
      Yes, it's a great question.
      The fine-tune folder is created when you pass validated data from the Extractor Trainer activity to AI Center. This folder contains all the information about documents manually verified in Action Center, used to improve accuracy.
      So, during our training pipelines, we consider both the fine-tune and export folders (initial training folders) to improve our skill.
      Hope this helps..
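      Before starting a training pipeline on a combined dataset, a quick sanity check that the root folder actually contains both subfolders can save a failed run. The folder names `export` and `fine-tune` follow the naming used in this thread and are assumptions about your layout; `check_dataset_root` is a hypothetical helper, not a UiPath API:

```python
# Quick sanity check on a dataset root before starting a training pipeline.
# Assumes the root holds the Data Manager "export" folder and the Action
# Center "fine-tune" folder, as described in the answer above.
from pathlib import Path

def check_dataset_root(root):
    """Return the list of expected subfolders missing from the dataset root."""
    root = Path(root)
    expected = ["export", "fine-tune"]
    return [name for name in expected if not (root / name).is_dir()]

# Example usage (path is illustrative):
# missing = check_dataset_root("/datasets/invoices")
# if missing:
#     print("Dataset root is missing:", ", ".join(missing))
```

      An empty result means the root folder at least has the expected layout; pipeline errors about folder structure would then point at the contents of those folders instead.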

    • @anusha2690
      @anusha2690 1 year ago +1

      Hi sir,
      Thank you for your reply
      When we are creating a pipeline, we select only the export folder, right?
      Then how does the fine-tune folder help?
      Does the fine-tune folder just act as support for the export folder?

    • @LahiruFernando
      @LahiruFernando  1 year ago +1

      @@anusha2690 We select only the export folder when we have just the initial export data. But when we have both fine-tune and export, we select the folder that contains both of those folders as the dataset for training.

    • @anusha2690
      @anusha2690 1 year ago +1

      Thank you sir. I got your point now

    • @anusha2690
      @anusha2690 1 year ago +1

      @@LahiruFernando When I select the folder which contains both, I am getting the below error:
      Training Job failed, error: Document type purchase_orders not valid, check that document type data is in dataset folder and follows folder structure.