Excel 101: Basic Text Analysis and word clouds using pivot tables and other Excel tricks

Sdílet
Vložit
  • čas přidán 21. 06. 2018
  • This video shows how to combine individual text cells into one paragraph, parse cells with multiple words in to individual values, as well as create a word cloud using an external website (no affiliation).
    Please subscribe to our page for more random tidbits of knowledge and reach out to us if you'd really like to kick up your analytics game. www.adaptalytics.com

Komentáře • 44

  • @nigerianprince5389
    @nigerianprince5389 Před 2 lety +3

    Nonchalant brilliance. Love the video mate. Very helpful

    • @adaptalytics
      @adaptalytics  Před 2 lety

      Glad you enjoyed it and found it helpful! That was the whole point of sharing. Cheers!

  • @darrah495
    @darrah495 Před 6 lety +1

    Clever thinking. Appreciate the content!

    • @adaptalytics
      @adaptalytics  Před 6 lety

      Glad you liked it. If you have questions about other Excel related analyses, let us know and we'll make a video to help!

  • @ninonacario7785
    @ninonacario7785 Před rokem

    Hey man! I just got the best ever kudos at my job because of this! Thank you so much and you are so awesome!

    • @adaptalytics
      @adaptalytics  Před rokem

      Great to hear! We share stuff like this in hopes of helping people so cheers to your success!

  • @MohamedAshraf-vh3jo
    @MohamedAshraf-vh3jo Před 2 lety +2

    THANK YOU SO MUCH!

  • @JCA51698
    @JCA51698 Před 3 lety +2

    Great video. I'm actually in an Excel file right now and used Word to make a word cloud out of the text in the spreadsheet. Will have to try this. Good speaking voice too, by the way.

    • @adaptalytics
      @adaptalytics  Před 3 lety

      Appreciate the kind words and I'm glad it was helpful!

  • @SuperNunera
    @SuperNunera Před 5 lety

    Awesome m8, ty!

    • @adaptalytics
      @adaptalytics  Před 5 lety

      Happy to help! Let us know if you're stuck on something and we'll see if we can help out with another video.

  • @BeVal
    @BeVal Před 2 lety +1

    Firstly, and respectfully, sir, you have a very beautiful masculine voice. Secondly, thank you for the approach

    • @adaptalytics
      @adaptalytics  Před 2 lety

      Firstly, thank you for the kind words and I'm glad it was helpful. :-)

  • @GreySpyder
    @GreySpyder Před 2 lety +1

    omg this is so useful

  • @ivopetrusa
    @ivopetrusa Před 3 lety

    Hi! Very useful. I tried doing the same but I experienced a lot of words like "a, the, and, but" etcetera. Do you have a speedy way of excluding these words so only the useful words stay in the pareto or wordcloud?

    • @adaptalytics
      @adaptalytics  Před 3 lety +7

      Sure. Those are called "stop words". The easiest/fastest way to handle them is to find a list of stop words that someone has already compiled (rather than compiling them yourself). Once you find that list (Google search, ask a friend, whatever) then you can bump your entire word list up against the stop word list (if using Excel, the "MATCH" function is easiest) and throw out any word in your list that was found in the stop word list. Often times the stop word list isn't complete (at least for your dataset) so you usually have to add a couple more to it. Hope that helps!

  • @imranansari-om5jy
    @imranansari-om5jy Před rokem

    nice video and really helpful, wish had a font size little bigger , would have been easy to read.

    • @adaptalytics
      @adaptalytics  Před rokem

      Great feedback, thanks. I'll adjust accordingly next time. 🙂

  • @KTsPridiction
    @KTsPridiction Před 4 lety

    It is a great trick to analyse our data... If data is too much high then it will not be cover in single cell... Is there any option to enter n number of words in single cell

    • @adaptalytics
      @adaptalytics  Před 4 lety +1

      According to this article (support.office.com/en-us/article/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3) as of today, Excel can only handle 32,767 characters in an individual cell. If you have a little more than this, you could use other tools such as Microsoft Word to do a character count to determine where your limit is and then do a couple recursive cut & paste commands from Word to Excel to handle this. However, if you're dealing with this large a volume of data, then Excel probably isn't the best platform to be analyzing text with. You might want to look into Python (if you're the coding type) or maybe Rapid Miner or SAS if you prefer a slightly more UI-based approach. Hope that helps!

  • @piotr3024
    @piotr3024 Před 2 lety

    Well explained, thanks for sharing- just one downside of cell capacity limitation to 32767 characters

    • @adaptalytics
      @adaptalytics  Před 2 lety

      Thanks for the comment and for sharing info on Excel's cell character cap. This tutorial was targeted to the average user that likely isn't dealing with a ton of data. Excel definitely has its limitations on data volume and processing constraints.

  • @nehathacker2009
    @nehathacker2009 Před 2 lety

    thank you for posting this. How to do this with sentences in the row and with like a million data records? Need a paragraph but cell character limit sets in after 250 rows.

    • @adaptalytics
      @adaptalytics  Před 2 lety

      If you've got a million records then Excel probably isn't the best tool for you. I'd recommend something like Rapid Miner (UI-based) or Python (code-based).

  • @sharminshila3789
    @sharminshila3789 Před 6 měsíci

    What is the formula you use around 4:36 into the video? and afterwards, what did you change in the formula?

    • @adaptalytics
      @adaptalytics  Před 6 měsíci

      The formula at 4:36 was an error. :-( The intent was to create two calculations - the first (column C) is a running cumulative that sums all values in the pivot table as it moves down the page. The second (column D) divides the cumulative on each row by the absolute total at the bottom of the pivot table. This second calculation is what computes the value needed to create the Pareto principle: "X% of the dataset makes up Y% of the total" - this is what I reference at 5:44 and eventually get to by 6:32 - "there are 1,000 unique words and 117 of them constitute half (50%) of total words in the data set". This is the Pareto principle. The Pareto principle is easiest understood when analyzing tax dollars paid into any government - it's often that 5% of the taxpayers pay 80% of the total tax dollars.
      Hope helps clarify things!

    • @sharminshila3789
      @sharminshila3789 Před 6 měsíci +1

      @@adaptalytics Thanks is not enough but thank you so very much for your amazing explanation. Its help me a lot and your video just save me. Thanks a lot. 😇

  • @chriscockrell9495
    @chriscockrell9495 Před 4 měsíci

    How do I do this with grouping 2 words?
    This uses space delimited separation of words. I’m interested in doing this with 2 or 3 words. Themes.
    Notes:
    Word Cloud
    Data Tab, text to column, transpose, and pivot table.

    • @adaptalytics
      @adaptalytics  Před 4 měsíci +1

      The most effective way I've seen that done is a "lookup table"/ "reference table" that contains a list of all of the unique phrases (this is typically used with localities such as "New York", "San Francisco", etc.) that you're interested in, and then some type COUNTIF function that looks through the array and counts the number of times each phrase is found. Hope that helps!

    • @chriscockrell9495
      @chriscockrell9495 Před 4 měsíci

      @@adaptalytics Thanks. That seems like a good option.
      I took the list of words in my article and made a big column, like you did. The rows where all sequential so I grabbed the word and I anded (&&) " " and then && the next word and it gives me a 2 word column. Then I just pivot tabled it like you did.
      Seems like I need to learn more on making databases from webpages. Maybe I need to study webscrapping and data structures.

  • @afkhoso
    @afkhoso Před 6 měsíci

    this was new to me, glad i came across the video, but the stop words make it ... useless. im in dire need of having those removed to make sense of all my text, done a few searches but it's not easy/straight forward. there goes another afternoon of research, trial and error.

    • @adaptalytics
      @adaptalytics  Před 6 měsíci

      The simplest way I've done this in the past is to create a secondary list of stop words and write a formula that looks at this list and if the word is found, returns a Boolean value. For example, create an attribute column named "Is Stop Word" that returns a Y if it's found in your list of stop words and an N if it's not found. To create this initial list of stop words, sort the total words high to low by volume (as I've done around the 4:00 mark) and then scroll down with your human eyeballs and type a Y by any of the words that you'd like to exclude. Once you've done this for the top 100 words or so, copy this list elsewhere in the workbook (I usually have a tab named "Lookups" for odds and ends like this), filter to ONLY the words you've flagged as Y and delete this other words - this is your initial list of stop words. Go back to your dataset and reference this list and then once you've got the new stop word column created, use it to filter OUT the stop words in your pivot table.
      Another way to do this is to start by looking at short words using the LEN function. It's not perfect but it's a start. Most stop words are 2 to 4 characters, whereas most other words are longer, so you can filter your initial list to only those words that are 4 characters or less and see if that yields a high volume of stop words.
      Hope that helps!

  • @alexus4983
    @alexus4983 Před rokem

    When I paste the last row to a new sheet it doesn’t paste all the words from the text analysis. Did I miss a step or something? Any tips would helpful!

    • @adaptalytics
      @adaptalytics  Před rokem

      You probably forgot to "paste values". Pasting "normally" would try to paste the same formula, which would error out since the data isn't there in the new workbook. Look at timestamp 2:45 - I did it quickly and used a shortcut so I'll take the blame for why you missed it. :-( Try copying, going to the new workbook, right-clicking in the cell, selecting "Paste Special" and then selecting "values". If that doesn't make sense, just do a quick internet search for "how to paste values in Excel" or something similar and you'll figure it out. 🙂 Thanks for watching!

  • @frankconte9572
    @frankconte9572 Před 2 lety

    What is the formula you use around 49 seconds into the video?

    • @adaptalytics
      @adaptalytics  Před 2 lety

      It's an old-school form of concatenation. I'm just concatenating the first two rows, separated by a space (in double quotes). The more formal way to do this would be =CONCATENATE(AH5," ",AH6). The second formula though, is slightly different, as I concatenate the first formula with the next row (rather than just the prior row with the next row). Hope that helps!

    • @frankconte9572
      @frankconte9572 Před 2 lety

      @@adaptalytics Thanks that seemed to work. My intuition told me to use concatenate but -- because of the small print -- I wasn't able to read your innovation.

  • @vvishal1754
    @vvishal1754 Před 4 lety

    Please tell the formula u used 6:00

    • @adaptalytics
      @adaptalytics  Před 4 lety

      If you're talking about how I quickly counted the number of unique words, then that wasn't a formula, but was a native Excel behavior that's available if you right click the bottom task bar (called the status bar) and make sure that "Count" is selected. Doing so will count any content highlighted at any time - it's very useful! Here's a video I found that explains it in more detail: czcams.com/video/xBNIv14rynQ/video.html - hope that helps!

  • @badal3416
    @badal3416 Před 5 měsíci

    Can you please share the dataset

    • @adaptalytics
      @adaptalytics  Před 3 měsíci

      Unfortunately no, but there are many sample data sets out there to download for free. Try looking on Kaggle. Best of luck, cheers!