K-Means Clustering Text Documents: Python in Excel Tutorial (Free Files)
- Added July 23, 2024
- ⬇️ Get the files and follow along: bit.ly/4aLsMg7
Your boss hands you a pile of documents and asks you to do some "data magic." What do you do? Use the k-means clustering algorithm!
In this video, I will teach you a powerful technique for working with text documents using Python in Excel:
1️⃣ Preprocess your text documents using the mighty TF-IDF.
2️⃣ Cluster the documents using k-means.
3️⃣ Use a machine learning model to help interpret the clusters.
☕ If you found this content useful and would like to support the channel, you can buy me a coffee: www.buymeacoffee.com/DaveOnData
--------------------------------------------------------------------------------------------
LEARN MORE
--------------------------------------------------------------------------------------------
My free crash course on k-means clustering (files included):
bit.ly/ClusterAnalysisWithPython
My free crash course on decision trees (files included):
bit.ly/DecisionTreesWithPython
My free crash course on tuning decision trees (files included):
bit.ly/TuningDecisionTreesWit...
--------------------------------------------------------------------------------------------
VIDEO CHAPTERS
--------------------------------------------------------------------------------------------
00:00 Intro
01:42 Tokenization
05:31 Document Vectors
06:40 The Naïve Bayes Algorithm
10:50 The Math of Naïve Bayes
18:10 Training the Naïve Bayes Model in Excel
24:46 Testing the Naïve Bayes Model in Excel
28:06 What’s Next?
#pythoninexcel #pythonexcel #pythonforexcel - Science and Technology
"Studebaker" 😂 Dating yourself my friend. TF-IDF + K-Means + Decision Tree = Magic, though it still requires a good bit of extrapolation in order to understand the results. I find this piece to be most intimidating, and a barrier to diving right in. It seems like one must accumulate a certain threshold of experience in order to interpret the results properly. Just my 2 cents as I follow along with your videos. 👍
Interpreting cluster assignments of text documents can be particularly challenging! As you correctly point out, there's no substitute for experience.
Hopefully, my next video on topic modeling using LDA will prove useful to you as an alternative as you begin your journey.
Just my brain spinning. My assumption is that I could extract a particular set of notes from an EHR into a CSV file. Could I then use this process to report on particular phrases?
Sorry, work has me buried so just now seeing your video and barely able to pay attention. But will revisit it soon.
Hope all is well.
@michaelt312 - Correct! For example, you can use n-grams to perform an analysis on words that frequently occur in a sequence (e.g., "united states").
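As a rough illustration of the n-gram idea in the reply above, here is a small pure-Python sketch that counts two-word sequences (bigrams) across a set of notes. The `ngram_counts` helper and the sample notes are hypothetical, invented for this example; a real analysis would load the notes from the exported CSV file.

```python
import re
from collections import Counter


def ngram_counts(texts, n=2):
    """Count every run of n consecutive words across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep only simple word tokens.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


# Hypothetical notes (assumption: stand-ins for rows of a CSV export).
notes = [
    "Patient traveled to the United States last month.",
    "Returned from the United States with a cough.",
    "No travel outside the United Kingdom.",
]

bigrams = ngram_counts(notes, n=2)
print(bigrams["united states"])
print(bigrams.most_common(3))
```

Sorting the counts surfaces the phrases that recur most often, which can then be reviewed or used as features for clustering.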
@DaveOnData, I'll be re-watching this evening. Have a ticket in for the extraction since I don't have access to this particular part of Epic. I'm really looking forward to this. Hopefully will prove a theory...
I'll report back with what I can.