The EASIEST! way to do Text Classification with spaCy and Classy Classification

  • Added on 25 July 2024
  • Join this channel to get access to perks:
    / @python-programming
    If you enjoy this video, please subscribe.
    ✅Be my Patron: / wjbmattingly
    ✅PayPal: www.paypal.com/cgi-bin/webscr...
    repo: github.com/wjbmattingly/fewsh...
    If there's a specific video you would like to see or a tutorial series, let me know in the comments and I will try and make it.
    If you liked this video, check out www.PythonHumanities.com, where I have Coding Exercises, Lessons, on-site Python shells where you can experiment with code, and a text version of the material discussed here.
    You can follow me at:
    / wjb_mattingly
  • Science & Technology

Comments • 44

  • @python-programming
    @python-programming  2 years ago

    Repo: github.com/wjbmattingly/fewshot-text

  • @giantdutchviking
    @giantdutchviking 10 months ago

    Thanks for making this vid, been learning Python for a bit and this stuff makes Python shine!

  • @wdonno
    @wdonno 2 years ago +1

    You are reading my mind! Looking forward to this!

  • @shahidmahmood7252
    @shahidmahmood7252 2 years ago +1

    Good knowledge, shared wonderfully. Looks like a great module. Now thinking of all the applications in works of English literature. thanks!

  • @VitthalGusinge
    @VitthalGusinge 2 years ago +2

    I have been searching for the best NER algorithms for my use case for the last two days. Can't wait to see what you have here.

    • @python-programming
      @python-programming  2 years ago +1

      This won't focus on NER, but there is a few-shot NER from the same company called concise_concepts. I have tested it and found it good for some labels and bad for others.

  • @DK-rl1sf
    @DK-rl1sf 1 year ago

    Thank you for this tutorial. I tried saving the trained model using nlp.to_disk('D:/ABC'). But when I load it back using spacy.load('D:/ABC') in a fresh Jupyter Notebook, I get the error "[E002] Can't find factory for 'text_categorizer' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. ...". I am still in the same conda environment so I can't be missing dependencies. What is causing this problem?
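
One likely cause of the E002 error above, assuming the package registers its spaCy factory when it is imported: the fresh notebook never ran `import classy_classification`, so spaCy has no factory named 'text_categorizer' when it rebuilds the pipeline from disk. The sketch below reproduces the mechanism with a hypothetical stand-in component called `demo_categorizer` (not part of any library). It needs only spaCy and a blank English pipeline.

```python
import tempfile

import spacy
from spacy.language import Language


# Hypothetical stand-in for a third-party component. Registering the factory
# is what lets spaCy resolve the name stored in the saved config.cfg.
@Language.factory("demo_categorizer")
def create_demo_categorizer(nlp, name):
    def component(doc):
        # A real component would set doc._.cats or similar; this one is a no-op.
        return doc

    return component


nlp = spacy.blank("en")
nlp.add_pipe("demo_categorizer")

with tempfile.TemporaryDirectory() as model_dir:
    nlp.to_disk(model_dir)
    # Loading works here because the factory above is registered in this
    # process. In a fresh kernel without that registration, spacy.load
    # raises E002 for the unknown factory name.
    reloaded = spacy.load(model_dir)

print(reloaded.pipe_names)
```

If the assumption holds, the analogous fix in the fresh notebook would be to run `import classy_classification` before calling `spacy.load('D:/ABC')`.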

  • @nguyenngochai6245
    @nguyenngochai6245 2 years ago +2

    Thank you very much for sharing! Love it.
    May I ask whether it would be possible to add more classes to the data? It would be even more awesome if it could be done for other non-English language models.

    • @python-programming
      @python-programming  2 years ago +1

      Yes, it will be possible to add other classes, and you can use any language model on Hugging Face.

    • @nguyenngochai6245
      @nguyenngochai6245 2 years ago

      @python-programming Thank you for your instant reply!
      I have successfully tried it with the "ja_core_news_lg" model, but I could not get a satisfactory result out of the Japanese sentence-transformers model. Do you have any tips for choosing the appropriate models?

    • @python-programming
      @python-programming  2 years ago +1

      @nguyenngochai6245 No problem! I will test it out today.

  • @luiztauffer8513
    @luiztauffer8513 1 year ago +1

    This is gold material, thanks so much for putting this out in such a comprehensive way!
    @Python Tutorials for Digital Humanities, in one of your videos you mentioned that you do research in History, is that right? I'm curious to know how people are using text classification methods such as this in History research. Do you have any material you could point me to?

    • @python-programming
      @python-programming  1 year ago +1

      Thanks!! Yes, my background is a PhD in medieval history but I mostly work with archival material at Smithsonian and USHMM. A lot of the publications you can find in history with text classification deal with sentiment analysis. You can find articles on Digital Humanities Quarterly and the Oxford Digital Humanities journal.

  • @kosemekars
    @kosemekars 2 years ago +3

    Best text-related ML channel on youtube

  • @Filipkasic
    @Filipkasic 2 years ago

    Is there a way to utilize this model without having to define what the keywords are but simply to provide a list of them without any definition?

  • @Hypothermia1337
    @Hypothermia1337 2 years ago +1

    Hello Dr. Mattingly, do you know if it's possible to fine-tune a pre-trained model? I'm really not familiar with that, but I need to tweak a model with a few exceptions.
    Yours sincerely

    • @python-programming
      @python-programming  2 years ago

      It is! If you want to fine-tune a language model, that can be done via Gensim or the Transformers library from Hugging Face. If you want to fine-tune NER, you will have some problems, namely catastrophic forgetting.

  • @lisagilyarovskaya5593
    @lisagilyarovskaya5593 2 years ago

    Thank you very much for this video, I was looking for something exactly like this! I was wondering if there is any way to save the model config to disk once the pipe with support samples has been added. Do you have any ideas on that?

  • @victordeleon9988
    @victordeleon9988 2 years ago +1

    Great video, thanks a lot. Do you recommend any models in Spanish besides those already available in spaCy?

    • @python-programming
      @python-programming  2 years ago

      No problem! It depends on what you are trying to do, there are some great BERT models for Spanish. You can find them on HuggingFace's website.

    • @victordeleon9988
      @victordeleon9988 2 years ago +1

      @python-programming Great, thanks a lot, your channel is awesome.

    • @python-programming
      @python-programming  2 years ago

      @victordeleon9988 Thanks!!

  • @rf1890
    @rf1890 2 years ago +1

    I was trying to identify "local indicators of climate change impacts" (what changes people observe in their environment... not city people... :D ) in a database of scientific articles. Results are OK. It's hard, but it might be useful as a pre-scan.

  • @maxwellmandela
    @maxwellmandela 2 years ago +1

    great stuff!

  • @szachynakubie4955
    @szachynakubie4955 2 years ago

    thank you

  • @CoreyMalcom
    @CoreyMalcom 1 year ago

    This is a really good tutorial. Thank you!
    I have not been able to get it running so far. When I call "nlp.add_pipe( )" with the text_categorizer, the kernel crashes and restarts. Any clue as to why this would be happening? I have a fresh environment with spaCy and classy_classification newly installed.

    • @python-programming
      @python-programming  1 year ago

      Thanks! Hmmm that is odd. What is your OS? Mind DMing me on Twitter with some pics?

    • @CoreyMalcom
      @CoreyMalcom 1 year ago +1

      @python-programming Sent. Thanks for looking at this. Will be really helpful.

    • @python-programming
      @python-programming  1 year ago

      @CoreyMalcom No problem! I am in the middle of traveling. Will try to respond tomorrow.

  • @youTanod
    @youTanod 2 years ago +1

    Thank you very much for this useful video. This is exactly what I need.
    I tried it with real data, but I get this warning message, what should I do?
    UserWarning: The least populated class in y has only 1 members, which is less than n_splits=2.

    • @python-programming
      @python-programming  2 years ago

      Can you paste what your support data dictionary looks like?

    • @youTanod
      @youTanod 2 years ago

      @python-programming drive.google.com/file/d/1WcXuI2a7x_EvTreG5GWOE3lyj3Y9CAPc/view?usp=sharing
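
The UserWarning quoted in this thread is the message scikit-learn emits when stratified cross-validation is asked for more splits than the smallest class has members, so a label with a single support sample would trigger it. A minimal sketch for spotting such labels before building the pipeline follows. The helper name `underpopulated_labels` and the sample dictionary are hypothetical, and the `{label: [texts]}` layout is an assumption about how the support data is organized.

```python
def underpopulated_labels(support, n_splits=2):
    """Return {label: sample_count} for labels with fewer samples than n_splits."""
    return {
        label: len(samples)
        for label, samples in support.items()
        if len(samples) < n_splits
    }


# Hypothetical support set: "negative" has only one example.
support = {
    "positive": ["great service", "loved it"],
    "negative": ["terrible experience"],
}

too_small = underpopulated_labels(support)
print(too_small)
```

Adding at least one more example to each label reported here should make the warning go away.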

  • @ezrakassa3472
    @ezrakassa3472 2 years ago +2

    Can't wait. Is it multi-class or binary classification, though? I am hoping for multi-class classification, since you already did a detailed video on binary classification.

    • @python-programming
      @python-programming  2 years ago

      This will be binary, but it works for multi-class just as well. Remember that when you use few-shot classification, you are not doing traditional supervised learning. Instead, you are using the vectors of a support set (not a training set) to automatically identify sentences with similar vectors. The similarities are then scored so that you know how strongly something belongs to a certain category.
      The more classes you have, the more support samples you need. I recommend using it to get a quick sense of your data and to generate a starting data set quickly, which you can then use to train a new model via supervised learning.
      This video is meant to serve as my transition into multi-class classification on this channel =), so those videos should be coming out shortly. We will use spaCy (simpler) and Keras (more advanced). Multi-class text classification will also receive a whole chapter in my forthcoming book on spaCy ML.
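
The support-set idea described in this reply can be illustrated with a toy sketch. It is not how classy_classification works internally: bag-of-words counts stand in for real sentence embeddings, and the mean cosine similarity to each label's support samples stands in for the scoring step. All function names and sample sentences below are made up for the illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(text, support):
    """Map each label to the mean similarity between text and its support samples."""
    vec = embed(text)
    return {
        label: sum(cosine(vec, embed(sample)) for sample in samples) / len(samples)
        for label, samples in support.items()
    }


# Hypothetical support set with two samples per label.
support = {
    "weather": ["the rain is heavy today", "sunny weather with light wind"],
    "food": ["the soup tastes great", "fresh bread and cheese for lunch"],
}

scores = score("heavy rain and wind today", support)
best = max(scores, key=scores.get)
print(best, scores)
```

Extending this to more classes only means adding more keys to `support`, which is also why each extra class needs its own support samples.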

  • @gangs0846
    @gangs0846 7 months ago +1

    Is this still relevant compared to using GPT for classification?

    • @python-programming
      @python-programming  7 months ago +1

      That is a great question. Yes, though GPT-4 is better at few-shot classification than this approach. I still think this is useful for getting a quick classifier up and running locally to help with annotation.

    • @gangs0846
      @gangs0846 7 months ago +1

      @python-programming Thank you, sir.

  • @trashyAIguy
    @trashyAIguy 1 year ago

    Cool! I'll use it in my trashy ai to make it less trashy 🤣 to make it understand intentions