How we optimize Power Query and Power BI Dataflows

  • Published 19 Jan 2022
  • Let's go into Adam's mind as he optimizes Power Query to make some Power BI Dataflows run faster and reduce resource usage. The result is INSANE AMAZING!
    Power Query M function reference
    docs.microsoft.com/powerquery...
    📢 Become a member: guyinacu.be/membership
    *******************
    Want to take your Power BI skills to the next level? We have training courses available to help you with your journey.
    🎓 Guy in a Cube courses: guyinacu.be/courses
    *******************
    LET'S CONNECT!
    *******************
    -- / guyinacube
    -- / awsaxton
    -- / patrickdba
    -- guyinacube.com
    **Gear**
    🛠 Check out my Tools page - guyinacube.com/tools/
    #PowerBI #PowerQuery #GuyInACube
  • Science & Technology

Comments • 78

  • @GuyInACube • 2 years ago +14

    One thing not mentioned in the video is to be aware and careful about SQL Injection.
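
    A minimal sketch of the mitigation idea in M (the names and values here are illustrative, not from the video): double up any single quotes in a value before splicing it into native SQL text, so a value cannot break out of its string literal.

    ```
    let
        // Values that will be spliced into a native SQL "WHERE ... IN (...)" clause
        CurrencyList = {"USD", "EUR", "O'Brien"},  // note the embedded single quote
        // Double any single quotes so a value cannot terminate its string literal early
        EscapeSql = (value as text) as text => Text.Replace(value, "'", "''"),
        Quoted = List.Transform(CurrencyList, each "'" & EscapeSql(_) & "'"),
        InClause = Text.Combine(Quoted, ", ")
    in
        // InClause is "'USD', 'EUR', 'O''Brien'"
        InClause
    ```

    This only guards against quote breakout; allow-listing the values, or relying on query folding instead of hand-built SQL, is safer still.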

    • @Plackyu • 2 years ago +1

      Would love to see a video on SQL injection + Power BI and how to mitigate potential issues

    • @kkkkkkkkkkkkkkkk-k • 2 years ago

      Can Best Practice Analyzer catch SQL injection?

  • @ItsNotAboutTheCell • 2 years ago +13

    M-ind bending! Love the Advanced Editor and some neat tricks in here too for those performance folks! Keep up the PQ videos! One happy CAT!😻😻😻

  • @daryllynch998 • 2 years ago +9

    Hi Adam, thanks for the video. I use this approach all the time. One thing that I do slightly differently is to use the List.Buffer function because sometimes I want PQ to read the list only once to get the full set. I find it sometimes helps performance. One suggestion each time is to consider the query folding scenario. In this situation the following statement works:
    #"Filter Rows" = Table.SelectRows( Source, each List.Contains( CurrencyList, [Source Column] ) )
    This leads to a folded query in most data sources. I also feel this can produce more effective T-SQL because the filter is applied to the main table without the need for a left or inner join.
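
    A self-contained sketch of that pattern in M (the server, database, table, and column names are made up for illustration):

    ```
    let
        // In practice this list would come from another query's column;
        // List.Buffer reads it once so PQ doesn't re-evaluate it per row
        CurrencyList = List.Buffer({"USD", "EUR", "GBP"}),
        Source = Sql.Database("Server", "Database"),
        FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
        // List.Contains inside Table.SelectRows can fold to a WHERE ... IN (...) clause
        #"Filter Rows" = Table.SelectRows(FactSales, each List.Contains(CurrencyList, [CurrencyCode]))
    in
        #"Filter Rows"
    ```

    Whether this actually folds depends on the connector and the shape of the list, so it's worth checking with View Native Query.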

  • @JCMCodaste • 2 years ago +2

    This is a great video! Thank you, Adam. Good to see that you guys are playing more with dataflows now! I will take it as a good sign of their "health" within the Power BI roadmap! (at my own risk ;)
    It would be great to have more videos on how to optimize dataflows deployment when linking them across different workspaces (also where the latter are part of a deployment pipeline!). Choosing between the two different connectors in those specific situations is still making me scratch my head, to be honest. Happy 2022!

  • @pabeader1941 • 2 years ago +1

    Added this to my BI Goodies playlist. Good job and good timing for me. I have a project where this is going to come in handy, not so much for the performance gain as for an easy way to get Excel into PQ.

  • @michelleleroux1728 • 2 years ago

    Learning to optimise is my goal for 2022! Thanks, Adam.

  • @tkadosh • 2 years ago

    Really amazing and efficient way... Congratulations

  • @louism.4980 • 1 month ago

    Very insightful, thank you!

  • @klaustrampedach7312 • 2 years ago

    Great video! 😊 The trick also works with a non-native query. Might come in handy 😎

  • @antoniogarrido3058 • 2 years ago

    Ingenious solution! 👍 Really powerful dataflow trick!

  • @jls14 • 2 years ago

    Excellent video 👍. If the Advanced Editor is scary, you can also do the text combine as a new step. That way you don't have to do so much text editing. But SQL injection is pretty cool. Fantastic explanation as always 👍.

  • @romualdulcyfer • 2 years ago

    That's impressive!! I have some merges in my PQ (e.g. during data cleansing) which I need to optimize somehow after your video.

  • @bijilabs7599 • 2 years ago

    Wow😍 You guys are awesome!

  • @michalmolka • 2 years ago

    Really great video!

  • @dbszepesi • 2 years ago

    Nice, more of this please!

  • @Anthony_Lecoq • 2 years ago

    Great stuff Adam ;)

  • @billcoleman2009 • 2 years ago +2

    Nice. You could probably hook straight to the table, use selectcolumns() & filter steps and let query folding do its thing to keep it clean for incremental refresh etc - that should work, right?

  • @manawaajirioghene6231

    This is such a great video

  • @matthieumamet3497 • 2 years ago

    Great video ❤

  • @cirilolhoycuenco6261 • 2 years ago

    This is so cool!

  • @TheAalouis • 2 years ago

    Great job explaining the refresh protocol columns. I have a pretty expensive relative join in my dataflow script (find a keyword in a text field and join on that). Maybe this is something for your 2022 goal! :)

  • @blank3786 • 2 years ago

    You guys are amazing

  • @trevorc7734 • 1 year ago

    Brilliant!!

  • @danmeiss • 2 years ago +2

    Adam's version of "unplugged". See the thought process behind problem solving...

  • @wmfexcel • 2 years ago

    WOW! It makes a huge difference!
    May I know if there is any performance boost if we apply the filters instead of doing it by merging queries?

  • @MrSparkefrostie • 11 months ago +1

    For anyone running into issues where your list converts to a table, this should help if your query is SQL; it will need some tweaks if you use a different source. I am guessing it changes to a table as it's a live source and not sitting with load disabled:
    Text.Combine(Sql.Database("Server", "Database", [Query = "SELECT CAST([NeededColumn] as VARCHAR) AS NeededColumn
    FROM [dbo].[NeededTable]
    WHERE Conditions = Conditions
    "])[NeededColumn], ", ")

    • @GuyInACube • 11 months ago

      Appreciate the callout on that. Will need to look at that more 👊

    • @MrSparkefrostie • 11 months ago

      @@GuyInACube All good, this issue was driving me...bananas....

  • @ekaterinazamosha4149 • 2 years ago

    wow !!!!! Thank you!

  • @kebincui • 2 months ago

    Super ! 👍👍❤❤

  • @coolblue5929 • 2 years ago

    I've been doing this for a long time, initially in datasets (and Excel) and then in dataflows. I wrote custom functions to convert from tables to SQL clauses (IN or VALUES) and a custom multi-replace function to hack the SQL text. I also use parameters of the form :from and :to, for example, to fold range filters back into the native SQL, also using the multi-replace function. Things change quickly with monthly updates, but the biggest problem with this approach has been finding the right pattern to satisfy the formula firewall. It would be great to complement this video with a structured commentary on this, along with a careful explanation of the potential problems with SQL injection.

    • @coolblue5929 • 2 years ago +2

      I would also suggest that, if doing such a trivial merge breaks M then M is seriously broken.

  • @noahhadro8213 • 2 years ago

    Awesome video. How would I do this if I wanted to filter that table but was not using a native query? So I connect to a table in SQL, and then I want to filter the table based on a column being IN a list of items from an Excel spreadsheet?

  • @ReadySteadyExcel • 2 years ago

    Awesome thumbnail!!

  • @remyschrader9286 • 10 months ago

    In the Advanced Editor:
    PromotionKeysAsList = Promotion[PromotionKey],
    CurrencyKeysAsList = Currency[CurrencyKey]
    Values from a column in a table return as a list from NeedValuesAsListType = TableName[ColumnName]

  • @sirojiddinsobirov5508 • 2 years ago

    Insane Amazing!!!!!

  • @sidharh100 • 2 years ago +1

    Hey Adam, I was just wondering, if you could possibly make a video regarding Significance Testing in Power Bi?

  • @kasmirasmarzo • 2 years ago

    So useful. I’m doing merges everywhere and because I’m not a ‘real data scientist’ (I’m in regulation!) I have an itty bitty laptop. I’m constantly running out of memory.

  • @vikramanmohan121 • 2 years ago +1

    I have done this for multiple SQL queries, where one SQL statement is executed using a parameter and becomes an input for the next.
    One thing I learnt that will make things easier is to store the query in a variable and pass it to the source; you can paste your SQL query from your code editor directly onto the variable in Power Query.
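
    A sketch of what that can look like in M (the server, database, and column names are hypothetical):

    ```
    let
        // Output of an earlier query, e.g. Text.Combine over a key column
        CurrencyKeys = "6, 19, 29",
        // Keeping the SQL text in its own step lets you paste it straight from a code editor
        SqlText = "SELECT SalesOrderNumber, OrderDate " &
                  "FROM dbo.FactInternetSales " &
                  "WHERE CurrencyKey IN (" & CurrencyKeys & ")",
        Source = Sql.Database("Server", "Database", [Query = SqlText])
    in
        Source
    ```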

  • @Kierslee • 6 months ago

    Adam, where can I find the details regarding the time and resources to run the refresh (cpu, memory, etc.)?

  • @visheshjjain • 2 years ago

    Instead of filtering, if you want a column from an external file in your fact table, is there any other way, apart from merge, to do it in PQ?
    Thank you!

  • @Korallis1 • 2 years ago

    OK, so this is all great and everything, but what about when you have to merge in order to get the data you require for creating relationships? Is it better to just create a SQL query to provide the exact data you're after? I have such a complex model structure, and as such I have a lot of merges, but the source is almost always SQL.

  • @olivierfiliatrault6027

    Amazing! One question: will the resulting query fold?

  • @emilymorrison4263 • 2 years ago

    How do you approach this if the merge is on two or more fields?

  • @LifeatAustralia • 1 year ago

    Amazing video, Guy in a Cube 🙂 Would you please tell me how to manage merge queries when merging two tables from two different API calls? Is there any way to optimize API calls?

  • @mikemagill68 • 6 months ago

    Q. Awesome video. This is exactly the challenge I'm facing currently, but I'm pulling in a SQL table with c. 54 million rows and then need to eliminate rows based on an external table with c. 10,000 rows. I totally get what you've done, but can you inject a SQL WHERE clause with a comma-separated list of 10,000 values? Is there an upper limit?

  • @fluffigverbimmelt • 2 years ago

    Nice, I might be able to use this on a DB where the DBA only lets us access via a NOLOCK hint (yeah, don't get me started on that)

  • @bbangel90 • 1 year ago

    I have a Power Apps dataflow which transforms and loads data into a Dataverse table. This operation now takes around 50 minutes to complete, and I suspect the issue is not in the calculations in Power Query, but rather in the load operation to the Dataverse table. How can I go about optimizing/improving the data loading performance?

  • @DanielWeikert • 2 years ago +1

    Where can I see which operation is "expensive" and should be avoided? Is there a reference doc for that? Thanks

    • @Narses3 • 2 years ago +1

      Dataflow = slow.
      Understand what each step is doing, and what isn't folding to the source.
      An inner merge (filter) isn't folding to the source = all the data is being queried before then being filtered by the mashup engine.
      There is no silver bullet, but if you understand what each step does (and isn't doing) then you can optimize; as with a lot of things, it comes with experience.

  • @rakeshverma-fi2ju • 2 years ago

    Hey everyone,
    We are refreshing a dataset against a dataflow that is using incremental refresh, and the C drive of our gateway servers is hitting 0 MB of free space (from 400 GB). Is there anything we are doing wrong?

  • @Narses3 • 2 years ago

    Nice video concept to explain the mashup containers; also a nice showcase of other people Frankensteining queries/dataflows, potentially over time and from different people.
    Not sure if the exact problem you solved for the customer was as simple as this one, but if it was, then the team that manages dataflows/Power BI really should have seen this a mile off. I guess not everyone knows everything to do with Power BI. I appreciate things are obvious in hindsight, but I would be really surprised if a problem like this really did make its way to your desk.

    • @davidlopez-fe2lb • 2 years ago

      To your point Jonathan, I feel like the "architect" team should have guidelines on how to build things, so monsters like this don't get created. We just brought Power BI Desktop + Service into our org and we've set no guidelines on how to build, so we'll see plenty of monsters like this. It's just a classic org problem of migrating processes over to the new platform ASAP, a "we don't care how it gets done as long as it's done" scenario. Luckily I watch Guy in a Cube weekly, and so far I have minimized the monsters I've built.

    • @ItsNotAboutTheCell • 2 years ago +3

      Can confirm, with the constant release of new features, it's often difficult to keep up with each nuance of the product, especially for those who aren't in Power BI for 40 hours a week (as simple or as complex as the topic may be). I always appreciate the opportunity to see the product with "fresh eyes" though and see some light bulb moments when people learn something new that can help them in the future.

  • @gravestoner2488 • 1 year ago

    All my data comes in Excel sheets; I have one sheet per month for two different fact sheets. Is there a good way to combine these automatically without merging?

  • @fsanfo • 11 months ago

    I have a query that takes 20 minutes to run in my Oracle PL/SQL client, and I discovered that Power Query Online evaluates queries for 10 minutes only. So what do you recommend? Retrieve the data in chunks, or something else? I'm a Pro user on the Power BI service.

  • @9zQx86LT • 2 years ago

    Can this work with cosmos db?

  • @WojennyMlotek • 2 years ago

    Yep, more optimization please

  • @MrSparkefrostie • 11 months ago

    What I am looking for is to apply predicates against an upstream dataflow, maybe it does this already, time will tell

  • @pawewrona9749 • 2 years ago

    Was laughing really hard when I heard that Adam and Patrick had a "race competition"

  • @mohammedameen9477 • 2 years ago

    Awesome video.
    How can we solve this when we have two tables we want to merge and the data is coming from a flat file like CSV or TXT? I would want to perform a left outer join on these two tables based on some condition like
    IF (Table1[Column] = Table2[Column], Table2[Column], 0)
    Suppose we have only one column from Table2 that we want to bring into Table1. We can convert that column into a list, use List.Select, and write a condition, but when we want to bring in two or more columns, how will we do that?
    Can you help me with this?
    Thanks :)

    • @olemew • 2 years ago

      A CSV file is not SQL-based, so you can't do that

  • @biexbr • 2 years ago

    Mannn, I do this a Lot! And I mean a lot lot.

  • @asjones987 • 2 years ago

    Nice example, but curious about the syntax in the WHERE clause. What are the "&" doing around PromotionList and CurrencyList?

    • @dipeshbhundia1290 • 2 years ago

      Concatenation to access the variables

    • @pabeader1941 • 2 years ago +1

      They are there to 'build' the actual string that is being sent to the data source. & is the concatenate operator. It's used to combine strings together.
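
      For illustration, a tiny self-contained M sketch of that concatenation (the names and key values are made up):

      ```
      let
          // Comma-separated text built earlier, e.g. with Text.Combine over a key column
          PromotionList = "1, 2, 5",
          CurrencyList = "6, 19",
          // Each & splices the variable's text into the native SQL string
          SqlText = "SELECT * FROM dbo.FactInternetSales " &
                    "WHERE PromotionKey IN (" & PromotionList & ") " &
                    "AND CurrencyKey IN (" & CurrencyList & ")"
      in
          SqlText
      ```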

  • @matthiask4602 • 2 years ago

    This is almost like Inception. Just with a list.

  • @matthiask4602 • 2 years ago +1

    finally a GIAC M-Code video. #avoidthewait

  • @jenniferpavey1 • 2 years ago

    This!! Omg

  • @Baldur1005 • 2 years ago

    I know this is against Roche's Maxim, but refresh time-outs are not worth it. I do group-bys and merges in DAX, and if I want to enrich a table I use a relationship or a lookup function. Doing those in DAX hurts performance by a small margin and increases model size, but time-outs are much, much worse.

  • @Phoenixspin • 1 year ago

    The thumbnail makes me think of monkey brains. (Indiana Jones reference)

  • @Paul-pv8mo • 2 years ago

    🌈 P r o m o S M!!!

  • @PaulBailey3 • 2 years ago

    That's 🍌s