K-Means Clustering Text Documents: Python in Excel Tutorial (Free Files)

  • Published 23 Jul 2024
  • ⬇️ Get the files and follow along: bit.ly/4aLsMg7
    Your boss hands you a pile of documents and asks you to do some "data magic." What do you do? Use the k-means clustering algorithm!
    In this video, I will teach you a powerful technique for working with text documents using Python in Excel:
    1️⃣ Preprocess your text documents using the mighty TF-IDF.
    2️⃣ Cluster the documents using k-means.
    3️⃣ Use a machine learning model to help interpret the clusters.
    ☕ If you found this content useful and would like to support the channel, you can buy me a coffee: www.buymeacoffee.com/DaveOnData
    --------------------------------------------------------------------------------------------
    LEARN MORE
    --------------------------------------------------------------------------------------------
    My free crash course on k-means clustering (files included):
    bit.ly/ClusterAnalysisWithPython
    My free crash course on decision trees (files included):
    bit.ly/DecisionTreesWithPython
    My free crash course on tuning decision trees (files included):
    bit.ly/TuningDecisionTreesWit...
    --------------------------------------------------------------------------------------------
    VIDEO CHAPTERS
    --------------------------------------------------------------------------------------------
    00:00 Intro
    01:42 Tokenization
    05:31 Document Vectors
    06:40 The Naïve Bayes Algorithm
    10:50 The Math of Naïve Bayes
    18:10 Training the Naïve Bayes Model in Excel
    24:46 Testing the Naïve Bayes Model in Excel
    28:06 What’s Next?
    #pythoninexcel #pythonexcel #pythonforexcel
  • Science & Technology

Comments • 5

  • @kristoferbrown8007 3 months ago +1

    "Studebaker" 😂 Dating yourself my friend. TF-IDF + K-Means + Decision Tree = Magic, though it still requires a good bit of extrapolation in order to understand the results. I find this piece to be most intimidating, and a barrier to diving right in. It seems like one must accumulate a certain threshold of experience in order to interpret the results properly. Just my 2 cents as I follow along with your videos. 👍

    • @DaveOnData 2 months ago

      Interpreting cluster assignments of text documents can be particularly challenging! As you correctly point out, there's no substitute for experience.
      Hopefully, my next video on topic modeling using LDA will prove useful to you as an alternative as you begin your journey.

  • @michaelt312 2 months ago +2

    Just my brain spinning. My assumption is that I could extract a particular set of notes in an EHR into a CSV file. I can then use this process to report on particular phrases?
    Sorry, work has me buried so just now seeing your video and barely able to pay attention. But will revisit it soon.
    Hope all is well.

    • @DaveOnData 2 months ago +1

      @michaelt312 - Correct! For example, you can use n-grams to perform an analysis on words that frequently occur in a sequence (e.g., "united states").

    • @michaelt312 2 months ago

      @DaveOnData, I'll be re-watching this evening. Have a ticket in for the extraction since I don't have access to this particular part of Epic. I'm really looking forward to this. Hopefully will prove a theory...
      I'll report back with what I can.