10 Million Rows of Data Analyzed Using Excel's Data Model
- Published 30 Jul 2024
- The Excel Data Model (also referred to as Power Pivot) can handle millions of rows of data and can replace the need for millions of LOOKUP formulas
Link to download CSV Files and Demo Excel File
aasolutions.sharepoint.com/:f...
00:00 Intro
00:42 Using Power Query to Consolidate files from a Folder
03:40 Changing Data Types to reduce the file size and improve performance
05:40 Recommend 64 Bit Excel
05:50 The Data Model and creating relationships - avoiding 100 Million Formulas!
07:00 Creating a Pivot Table / Pivot Chart
07:45 Adding a Slicer
08:06 Presenting the data in a 3D Map / Globe visual.
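For reference, the folder-consolidation step at 00:42 produces M code along these lines. This is a condensed sketch (the UI wizard generates extra helper queries), and the folder path is hypothetical:

```powerquery
let
    // Point Power Query at a folder of identically structured CSV files
    Source = Folder.Files("C:\SalesData"),
    CsvOnly = Table.SelectRows(Source, each Text.EndsWith([Extension], ".csv")),
    // Parse each file and promote its first row to column headers
    Parsed = Table.AddColumn(CsvOnly, "Data",
        each Table.PromoteHeaders(Csv.Document([Content]))),
    // Stack every file into one table, then Close & Load To... the Data Model
    Combined = Table.Combine(Parsed[Data])
in
    Combined
```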
The team and I at Access Analytic develop Power BI and Excel solutions for clients in Australia and deliver training around the World. accessanalytic.com.au/
Did you know I've written a book "Power BI for the Excel Analyst"?
pbi.guide/book/
Connect with me
wyn.bio.link/
Wow! You have changed my understanding of Excel capabilities!!!
I used to use MS Access to do what you did with Power Query in seconds. I wonder if Access has much use still!
Brilliant video: super clear, concise, well produced and useful!
Thank you!
Thank you for the very kind comments. Access is indeed being slowly made obsolete by several other Microsoft technologies
This is a powerful example from the Big Data era.
Amazing video! Thank you Wyn!
Glad you liked it Iván
Short and precise explanation. This is amazing and thank you for providing data set for practice.
Thank you Byregowda, kind of you to leave a comment
What a beautiful way of explaining how to handle big data. Loved this !!
Thank you Shiraj, glad you enjoyed it
That is SUPER AWESOME! 😍 Thank you so much, Wyn!
No worries, you're welcome Dimitris
This is AMAZING!!!! Still in awe.
Thanks for the brilliant content.
You’re welcome
I've seen this a few times before, but no one has ever drilled down into the performance ROI... thank you, this is going to help a lot!
You’re welcome
Thank you, as many say, short and precise. Perfect
Cheers Mark
I can't say how much I am thankful for this video. I struggled for two days and everything I tried ended with "Not Responding". You saved me, thank youuuuuu🤩
Glad it helped
Very nice, Wyn! Thank you.
Cheers Houston
OMG, I am sitting here Wide-Eyed and Overwhelmed with Joy...Thank you so much for this.....I have been racking my brain on how to reduce and control our data to a professional and organized manner!...AMAZING!!!
Great to know
SUBSCRIBED AND FOLLOWING
You took the words out of me! 😁 following too.
Thanks @@natah1284
I’m currently working on +11k rows of data and currently experiencing challenges. Glad I came across your YT channel 😃
Good to know, thanks for the kind comments
Pretty awesome you are right!! Thank you for your videos are so easy to follow!
Thank you so much
This helped me out TREMENDOUSLY! Thank you!
That’s great Joe. Thanks for letting me know
Thank you for producing this excellent video - very easy to follow and very informative
You’re welcome Tim
Very well explained and easy to understand. Thank you.
Thank you
Just amazing. Your decimal rounding to reduce space worked brilliantly for my data model (a data model pull from SSAS). Thank you so much.
Glad to help 😀
Well done. Thank you for the instruction.
Thanks
Tremendous. Thanks a lot for knowledge sharing.
Thank you so much Raghavendra
This is fantastic. I feel like I have a new superpower. Great explanation video.
That’s great Phoebe! Thanks for letting me know.
Power Query, Data model, very efficient. It became much easier to work with such heavy data. Thank you so much!👍
Thanks for leaving a comment Luciano
Fantastic, thank you for sharing your knowledge
You’re welcome Ed, thanks for leaving a comment
Ohmygosh, that, my friend, was simply amazing!! 👏👏👏👏
I can't explain to you how many files I have that are so large that they respond slowly and then eventually crash! I've been trying to save these large files as .xlsb, and it helps, but it isn't fixing the ultimate problem. I cannot wait to try out this technique to see how it affects my monthly reports. Thank you so much for taking the time to go through this exercise. I am officially a subscriber!
That’s great, glad to help flag what Excel is capable of
You have no clue the amount of sleep I have lost trying to figure out how to control this amount of data
Great job! Incredibly powerful!
Cheers
Amazing!! Working up a prototype model in MS Access but wanted to report it in excel with pivots and slicers. Was ready to give up and hand it over to the BI developers but tried your method. 6m rows and extremely responsive. Also used your whole number trick and file size only 22mb. Thanks heaps!
Awesome, I really appreciate you letting me know you found it useful
Excellent tutorial, thank you!
Cheers!
You saved me a lot of time. Wonderful explanation.
You’re welcome Hussein
Thank you very much Access Analytic for sharing your knowledge in handling big data using Power Query. God bless
Thank you
Really awesome !!
This was mega! Thanks for sharing your knowledge mate! Appreciate it.
No worries, thanks for taking the time to leave a kind comment
Really nice intro to Data Model and why I should probably start using it. My base item set is 150k entries, and there are various dimension tables which can be attached, the numbers get big quickly and generic Excel M.O. starts choking immediately when you throw index(match()) at all of that. Thank you!
P.S. it took me far longer to understand those manager names than I would like to admit. :p
😄. Glad it helps Lee
Awesome! Thanks for your sharing.
No worries
Thanks, lesson completed. Greetings from Costa Rica.
No worries 🇨🇷
Great tutorial! Thank you
You're welcome Salvador
I forgot about the decimal trick. That was awesome. So just by trimming the decimals to rounded numbers, you can reduce your file size. I am going to try that at work
Excellent
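The rounding trick works because the Data Model dictionary-compresses each column, and whole numbers have far fewer distinct values than long decimals. A minimal M sketch of the step (table and column names here are hypothetical):

```powerquery
let
    Source = Excel.CurrentWorkbook(){[Name = "Sales"]}[Content],
    // Round to whole numbers and store as integers -
    // fewer distinct values means much better compression
    Rounded = Table.TransformColumns(Source,
        {{"Amount", each Number.Round(_, 0), Int64.Type}})
in
    Rounded
```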
this looks super cool, thank you very much :)
You’re welcome
The 3 tips in Excel video took me here. Thank you so much for the short but very informative tutorial video. I was trying to compare almost 9,000 rows with a couple of hundred rows, and Excel kept saying “Not responding.” I used the transpose function, which is very very helpful, without knowing it’s Power Query.
You’re welcome
Thank you so much Wyn for this video. I was able to take a 132 MB report and consolidate it down to just 9 MB! Now everyone in my company who does not have Excel 365 can view it as well as slice and dice.
Excellent to hear Phil. Thanks for letting me know this was useful.
@phil danley how did you consolidate it down to 9 MB??
The Power Pivot (Data Model) engine performs some amazing compression when columns contain non unique values
@@BOOMGG-w7d Wyn explained it better than I could. My answer would be "magic" LOL. I don't know how the compression works but it does.
Good Video. Very helpful and now ready to try it.
Thanks for leaving a comment Ed. Good luck with it!
This is great! Thanks 😊
You’re welcome
Amazing!
Wyn... this was amazing... I completely forgot about 3D Maps... great video!
Thanks for taking the time to leave a kind comment, I appreciate it.
Awesome, awesome, Thank You Very Much :)
You're welcome Elfrid
That was awesome.
Thanks :)
One of your best videos 😊
Thank you
fantastic video thanks!
You're welcome Simon
Great !!! thank you
You’re welcome
Thanks for the video
You’re welcome
Awesome!
Thanks 😀
Amazing stuff
Thank you
Thanks for this excellent video. Could you please tell me how you added calendar, location table and cost center table on Queries and Connections section at the beginning of the video? Thanks in Advance
Hi Debarshi,
Those were in an Excel file and I used Get Data from Excel Workbook.
Awesome...
Cheers 😃
I love your video and was able to use every part of it, however, I do not understand how you incorporated the additional tables (cost centers, etc). I get the ability to build the references but how did you get them into the mix?
Hi Robert, watch this explanation of Data Models and let me know if that helps czcams.com/video/RV47yX70NN8/video.html
You saved my life 🙏
Glad to help :)
This is great and very helpful! Thank you for making this. May I know what PC specification that you used? I want to buy a laptop that can be used to analyze big data.
Glad it helps Ivena. I'm using an XPS 17 9700 with 32 GB RAM and an i7 processor
Super👍👍
😄
Thanks a ton! Got to see a practical Big Data example handled through Excel and it's amazing! 😊👍👌
Thanks Vijay
Eish!?!? Excellent presentation, goodness!
Thanks ( I think ) Willy 😀
Wyn, your content is always helpful and timely. Can you comment here about how to best use the same dataset but doing the analysis in Power BI? Since you don't have a Table, what is the connector with PBI?
Thanks Bernie. I’m not quite sure I follow. You can do it the same way in Power BI by pulling in the folder of CSV files.
superb...
Thank you Saleem
Thanks....
You’re welcome
Very informative. I may be a little bit of topics but what software you use for making your video. Thank you!
Camtasia and a green screen
Fantastic! Thank you! However I couldn't figure out where consolidation tables came from. Is there a video about that and a link to it???❓❓
You're welcome Atomic Blue Life
This was excellent! One question please. I have many CSV files like in this video, but their column structure is slightly different from each other. Some of them have a few extra columns that I don't need. Is it possible to tell the query which columns to take (I can only include the columns that I know exist in all files)? Thank you!
As long as the first file in your folder has all of the columns you need then it won’t matter that other files have extra columns you don’t need.
Make sure you use the Choose Columns or Right-Click Remove other columns option
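In M, that column selection might look like the sketch below (file path and column names are made up). The optional `MissingField.Ignore` argument also guards against files where one of the listed columns is absent:

```powerquery
let
    Source = Csv.Document(File.Contents("C:\Data\2024-01.csv")),
    Promoted = Table.PromoteHeaders(Source),
    // Keep only the columns common to every file; ignore any that are missing
    Kept = Table.SelectColumns(Promoted,
        {"Date", "CostCentre", "Amount"}, MissingField.Ignore)
in
    Kept
```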
@@AccessAnalytic Thank you
Thanks for your video! What if my data size is far more than that and I need to update it with new data every day? If I simply refresh the Power Query, it takes an hour to load. Is it possible for Power Query to only add the new data to the existing data rather than refresh all of it? Millions of thanks!
Hi Warren, there's no way to do that with Excel currently. What is your data source?
Great video! Do you have a video that show how to edit the data in massive data sets of up to 10 million rows? I've tried in vain to do it with a million rows using IF formulae (i.e if a record has "x" add the value "y" to field) but excel just gives up under the sheer weight of the formula...
Thanks. No video, but it really depends on how many columns you have, what the data source is, how much RAM you have, and whether you have 64-bit Excel. I just tested on 5 million rows from 10 CSV files in a folder, adding an IF column (if the value begins with A, grab column B, else 0). It ran in 30 seconds.
32GB RAM, 64 Bit Excel, 5 columns of data.
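The conditional column described in that test can be written as a single M step (column names are hypothetical, and `Source` stands for the consolidated query):

```powerquery
// If ColA begins with "A", take the value from ColB, otherwise 0
WithAdjusted = Table.AddColumn(Source, "Adjusted",
    each if Text.StartsWith([ColA], "A") then [ColB] else 0,
    type number)
```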
Please, Dr., can you explain how to search for a string in big data? Thank you. You are a great teacher!
I don't understand, could you provide more details please?
Awesome. I would like to ask you a question: if I create the model and everything in 64-bit Excel, will my coworkers with 32-bit Excel have any problems? Right now with 32-bit Excel and a model I'm having a problem where it consumes more than 2 GB of RAM, with all the tables totalling no more than 50k rows. Thanks
I wouldn’t rely on it working well on 32 Bit, but it could be ok if they are just slicing and dicing.
Newer versions of 32 Bit Excel can now utilise 4GB Ram. 64 bit can utilise all the spare RAM on your machine
@@AccessAnalytic thank you very much for answering so quickly. It seems I can ask the IT department to change my version to 64-bit, but I was a little worried about that and about the macros I developed. Like you wrote, the people will only slice and see the data. It’s important that this particular tool works on both architectures. Again, thanks.
The only real way of knowing is to test it out
@@AccessAnalytic I will do it and I’ll let you know. Thanks
I just came across this video to understand the capabilities of Power Query, and it has widened my horizons. Can I get a download link to the other data for the connections you created before the 10-million-row one? I want to use that to understand it perfectly, if you don't mind.
I don't have the precise ones but I have extra data sets including the well known AdventureWorks one here under "Dummy Data Sets" accessanalytic.com.au/free-excel-stuff/free-excel-templates/
I loved this video. Incredible work. I have a very large CSV file with 1.2 million rows and approx 300 columns. I want to extract only few particular columns from this data. How can it be done? Many thanks in advance.
Thanks Amit, that should be quite straightforward, use Get Data > From File > From Text/CSV connect to the file, then Ctrl Click on the columns to keep and then right-click REMOVE OTHER COLUMNS, then close and load to Data Model
Hi, thanks for your reply. Is it a good idea to select columns manually if I have around 300 columns? In that case I have to do a lot of scrolling. Any idea if I can extract the desired column header names from some other Excel or CSV file, and then keep only those columns in my original large CSV file? Thanks.
Hi @@amitchaudhary6, yes it's possible but involves writing some M code. However, the Choose Columns button makes it really easy to tick the columns you want to keep / untick the ones you want to remove.
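A sketch of that M approach, assuming a separate single-column query (here called `HeaderList`, with a `ColumnName` column) that holds the names of the columns to keep:

```powerquery
let
    // Pull the list of column names to keep from the HeaderList query
    KeepThese = HeaderList[ColumnName],
    Source = Csv.Document(File.Contents("C:\Data\big.csv")),
    Promoted = Table.PromoteHeaders(Source),
    // Keep only the listed columns; ignore any names not present in the file
    Trimmed = Table.SelectColumns(Promoted, KeepThese, MissingField.Ignore)
in
    Trimmed
```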
Thanks for your inputs. It is exactly what I was looking for. 👍
300 columns?!
How do I get it to use the Exact formula so that it consolidates against a unique code from two sets of databases by also recognising lower case and upper case differences. For example AbC and ABC need to be treated as two unique codes.
I don’t know sorry. The DAX engine encodes ABC and abc the same
So I have a table in MS Access that is close to 4 million records, that I would love exported into excel. Unfortunately excel would only allow a little over 1 million records. Would this work in that instance?
You could certainly pull the data into the Excel data model and then create Pivot tables to analyse it
How can I export the table created by Power Query to one CSV file?
Check out this video. czcams.com/video/op6f-3uUFYg/video.html
If you’re using Power BI you can also use bravo.bi ( it’s free )
I’ve a video showing bravo.bi use here at 9:33 czcams.com/video/g4oZ0pOpn-4/video.html
If you want to export the 10 MM rows as a csv after transforming the data, is there a way to do it?
Yep Export Power Query Tables to CSV using DAX Studio. Even 5 Million records!
czcams.com/video/op6f-3uUFYg/video.html
Hi, thanks to how can export this data file to Power BI or any other tools?
You could publish the entire file to Power BI,
or put the Power Query code into Power BI desktop and export using DAX studio
Or copy the M code to a Dataflow
There’s no direct way to export power query code currently
Need help: All my CSV data is in the same format. The row header has a unique ID, the column headers are dates (7 days), and the fields contain different KPIs.
After loading all the data I need to add an Average and a COUNTIF > x function at the end.
I tried the average: first take the sum of the 7 dates and divide by 7. I got the result, but when I use Power Pivot with this average column, only Count works; an error occurs and Sum doesn't work on this data.
I’d recommend posting screenshots and a sample file here aka.ms/excelcommunity
is there a video on how you created the calendar, costcenter, and location table?
Here’s a video on the Calendar table. czcams.com/video/LfKm3ATibpE/video.html
The other 2 tables I just created manually in excel and pulled in using Power Query
May I ask your computer spec (CPU, RAM) to handle 250MB+ Excel files?
Currently using Dell XPS 32GB RAM i7-10875H CPU @ 2.30GHz (that video was done with Surface Book 16 GB RAM)
Hey! I need your help. I am trying to convert a large XML file into Excel unfortunately whenever I try to convert it I am only getting incomplete data in excel form. e.g Emails are there but the name is missing or vice-versa. Could you please help me out.
Thanks in advance
I’d recommend posting the issue with screenshots and sample data here aka.ms/excelcommunity
Hey, after I press OK to add as connection only, after a while it gives an error: the refresh operation failed because the source database or the table does not exist, or because you do not have access to the source.
More Details:
OLE DB or ODBC error: The connection could not be refreshed. This problem could occur if the connection was copied from another workbook or the workbook was created in a newer version of Excel..
An error occurred while processing the 'Actuals' table.
The current operation was cancelled because another operation in the transaction failed.
Hi @Haasan Tariq, I think I've seen that error when the source data (Excel file) has N/A# or REF# type errors in it. Are you connecting to an Excel file on your network?
Is there an easy way to extract data in an Excel data model into a CSV? Thank you!
How about this? czcams.com/video/op6f-3uUFYg/video.html
Hi teacher, do all CSV files need to contain headers, or should only the first CSV file contain headers?
All contain same headers
Is there a way to make the loading of data faster? I am currently handling like 15 million rows, and every week I'm adding like a million more until the end of the year. I'm moving to Power BI because of this issue, but I wonder if there is a trick for this.
Sounds like you need to move to storing your data in a SQL database. A shorter term alternative if you have Power BI is to use Dataflows
this video of mine may be useful
czcams.com/video/g4oZ0pOpn-4/video.html
Thank you for the video. Very informative. You have an extra zero on the far right in your first graphic "Ten Million rows of data and 100,000,0000".. I think it should be "100,000,000".
Thanks Tom, well spotted on the mistake. After I’d published it was too late to change 😬.
Please make more videos on excel
Here's 28 others 😁
czcams.com/play/PLlHDyf8d156Xnoph4CbOiMrqQKiJZ8mhn.html.
I only have the consolidation file... how can the other files be inserted in this Diagram View?
Check out this video czcams.com/video/RV47yX70NN8/video.html Essentially click the Get Data button, connect to your other files and Load to... Connection Only... Data Model
So cool, thank you!
You're welcome Al
does it have to be CSV? I have a large excel sheet that I need to do this for.
It can be Excel. Refresh will take a little longer
Within this, is there a way to change the interval? As in, can I make it so it only pulls every 4 hours, weekly, daily etc? Also, will the data update automatically if it's through a folder? Meaning, if I add another Excel file to the same path, will Power Pivot recognize there is more data? Do you have a video on how to make the fields?
There's no simple way to schedule a refresh, for that something like Power BI could be a better solution. There are techniques with VBA or Power Automate or a third party tool called Power Update. ALL files in a folder will be reloaded into Power Pivot each time the refresh runs. When you say make the fields, can you explain a little more what you mean please
@@AccessAnalytic hmm, let me just explain what I need to do. I was able to get Power Pivot to work, but all I want is to take 2 columns and make a graph without summing all the data. I'm using this for engineering, so I just want to see how a pump in a system changes every hour each day. The only math I would need is taking 2 columns and dividing one by the other to get velocity. I also want to be able to change the time interval (daily, hourly, every 4 hours). How would I go about doing all these things? Thanks for the quick response - I appreciate it
Sounds like you might need a Time Table : czcams.com/video/-q7v56p192M/video.html
And a Calendar table:
czcams.com/video/LfKm3ATibpE/video.html
@@AccessAnalytic thank you so much, you are such a helpful person. Any chance you can upload the time table as an Excel file? I do not have Power BI.
@@cela9482 - done 😁 accessanalytic.com.au/free-excel-stuff/free-excel-templates/
MY PROBLEM is to increase the row limit of Microsoft 365 from 1,048,576 to 100,000,000. This video did not help me. Any suggestions?
Not possible to increase the rows on the sheet grid. Why do you need that many rows in the grid?
So I was using mobile numbers. When I loaded them into the data model and then pivoted, the 10-digit numbers were getting shortened to 5 digits
Is it loaded as Text data type in Power Query? Which field ( row/col/values) are you putting it in in Pivot
Hi, when I click on 3D Maps, and drag the city/county/date over. Nothing shows up. Why?
I don’t know sorry. Country first then city second?
Is this possible in the older version of Excel?
Excel 2016 onwards
How can I get the data for Calendar, Location and CostCenter?
Hi, you can get a Calendar and other datasets here accessanalytic.com.au/free-excel-stuff/free-excel-templates/
I don't have the Location and CostCenter tables available sorry.
What laptop / specs are you working with?
That was on my 16GB RAM, 64-bit Office machine. This one with 24 million rows was 32GB, Core i7 www.linkedin.com/posts/wynhopkins_excel-datamodel-activity-6902207760197390336-n2Lq
Great, how to make the date table?
Here you go What is a Date Table / Calendar table in Power BI / Excel
czcams.com/video/LfKm3ATibpE/video.html
Hi, how about converting a text file with 2 million rows to Excel?
You can load it to the Excel data model and then create Pivot Table reports
If I use Mac, can I do this from my Excel in Mac?
I don’t believe Power Pivot exists in Mac. Also Power Query is limited
How can I remove duplicates with more than 1 million records? I mean, I have two email lists. I put the first in one column, and then I want to remove duplicates from the second sheet only, without changing the order. How can I do it?
Merge columns and remove duplicates
Not 100% clear on what you want to do. Do you want to create a stacked column of both lists ( appending list one to list 2 ) then remove duplicates?
@@AccessAnalytic I want to remove duplicates while keeping the first appearance, across more than a million rows of data.
If you pull in both lists as connection only then add an identifying column e.g = 1 and = 2 then append the 2 tables
Then sort by the identifying column and then ( to be safe ) add an index column to lock in the order, then remove duplicates on the email column
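Those steps could be sketched in M like this, assuming two connection-only queries named Emails1 and Emails2, each with an Email column (all names here are hypothetical):

```powerquery
let
    // Tag each list so list 1 rows sort ahead of list 2 rows
    Tagged1 = Table.AddColumn(Emails1, "Source", each 1, Int64.Type),
    Tagged2 = Table.AddColumn(Emails2, "Source", each 2, Int64.Type),
    Combined = Table.Combine({Tagged1, Tagged2}),
    // Sort, buffer to lock the order in, add an index to preserve it,
    // then keep only the first occurrence of each email address
    Sorted = Table.Buffer(Table.Sort(Combined, {{"Source", Order.Ascending}})),
    Indexed = Table.AddIndexColumn(Sorted, "Index", 0, 1),
    Deduped = Table.Distinct(Indexed, {"Email"})
in
    Deduped
```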
❓Why do errors happen when loading? And what we can do with them?
Lots of reasons: different column headings, errors in the data. You need to work your way through the Power Query applied steps. When consolidating multiple files it can be difficult to work out which file has the issue.