Microsoft Fabric: How to append only incremental data using a Data Pipeline in a Lakehouse

  • Published Sep 10, 2024

Comments • 25

  • @brianmorante1621
    @brianmorante1621 1 year ago +1

    I have been looking for this. You explained this so I can easily understand! This helped my team. Thank you.

  • @adefwebserver
    @adefwebserver 1 year ago +1

    Thanks! You are my "Go To" guy on this stuff :)

    • @AmitChandak
      @AmitChandak  1 year ago

      Thanks! 🙏
      Hope you're enjoying the series!
      czcams.com/video/p-v0I5S-ybs/video.html
      Please Share
      Please find 370+ videos, blogs, and files (70+ hours of content) organized as a course: Get Fabricated with Microsoft Fabric, Power BI, SQL, Power Query, DAX
      biworld.graphy.com/courses/Get-Fabricated-Learn-Microsoft-Fabric-Power-BI-SQL-Power-Query-DAX-Dataflow-Gen2-Data-Pipeline-from-Amit-Chandak-649506b9e4b06f333017b4f5

  • @Jhonhenrygomez1
    @Jhonhenrygomez1 2 months ago

    Hello, is there a way in Fabric to do incremental loading without a watermark, i.e. without relying on a date field? With a data source such as PostgreSQL, a tool can identify changes "automatically" by consuming the WAL, so incremental loading needs less manual processing. I want to know whether Fabric supports this, because with the approach you showed, a 200 GB table (for example) would take a long time to refresh and must have a date field to drive the incremental load.

  • @directxrajesh
    @directxrajesh 5 months ago +3

    Since there is no upsert, how do we handle updates to existing data at the source?

  • @clvc699
    @clvc699 3 months ago +1

    Could you do this with files (parquet) in the lakehouse using incremental data?

    • @AmitChandak
      @AmitChandak  3 months ago +1

      For that, you need to use PySpark with append mode.
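      A minimal sketch of that append pattern in a Fabric notebook (where spark is predefined); the path and table name here are placeholders, not from the video:

        # Read the new incremental parquet files from the lakehouse Files area
        df = spark.read.parquet("Files/incoming/orders/")

        # Append them to an existing Delta table in the Tables area
        df.write.format("delta").mode("append").saveAsTable("orders")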

    • @heyrobined
      @heyrobined 2 months ago

      @@AmitChandak Please make a tutorial on that, because on-premises data can't be loaded directly to the workspace and requires external staging storage. So the only way is to load files, but there is no append option there. Can you show a way?

  • @anushav3342
    @anushav3342 10 months ago +1

    Hi Amit! How do I work with REST API data to append incremental data into Fabric? Do I need to adjust any of the steps, or do I follow the same procedure?

    • @AmitChandak
      @AmitChandak  10 months ago +1

      If you can pass filters to the REST API (between two dates, or >= a date), then we can implement the same logic; see the sketch below. Just let me know what you can pass.
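      A rough Python sketch of that idea; the endpoint URL and the modified_after parameter are hypothetical, not from the video:

        import requests

        # Watermark from the last successful load
        # (e.g. looked up from a lakehouse table)
        last_loaded = "2024-09-01"

        # Ask the API only for rows changed after the watermark
        resp = requests.get(
            "https://api.example.com/orders",
            params={"modified_after": last_loaded},
        )
        new_rows = resp.json()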

  • @moeeljawad5361
    @moeeljawad5361 1 month ago

    Hello Amit, that is wonderful, thanks for sharing. At 18:42 you mentioned that using the lookup activity is not best practice if the table is very large, and you mentioned a table approach; can you elaborate more on that? Would it be possible to have a script activity after the copy activity that queries the copied table, stores the maximum date in a table in the lakehouse, and then to look up directly from that small table, along the lines of the sketch below?
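    A sketch of that watermark-table pattern in a Fabric notebook (where spark is predefined); the table and column names are illustrative, and the watermark table is assumed to already exist:

      # After each copy, store the max loaded date in a tiny watermark table
      spark.sql("""
          INSERT INTO watermark
          SELECT 'orders' AS table_name, MAX(order_date) AS last_loaded
          FROM orders
      """)

      # The next run's lookup reads this one-row table
      # instead of scanning the large fact table
      wm = spark.sql(
          "SELECT last_loaded FROM watermark WHERE table_name = 'orders'"
      ).first()[0]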

    • @dekta4
      @dekta4 6 days ago

      Hi Amit, yes, please elaborate on why the lookup (get max date) is a risky practice. What is the best approach instead?

  • @PrabhakaranManoharan2794

    Hi Amit. That was a great tutorial. Can we get a video on the same scenario when the data source is .csv/Excel files rather than SQL Server?

    • @AmitChandak
      @AmitChandak  1 year ago +1

      Thanks 🙏
      If the Excel or CSV contains only incremental data next time, you can use the append functionality of the pipeline or Dataflow Gen2 (a notebook equivalent is sketched below).
      If they contain full data, you can follow the same process, but the query will not fold at the source.
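      For the incremental-file case, a notebook equivalent is a plain append, assuming the CSV drop lands in the lakehouse Files area (path and table name are placeholders):

        # Read the incremental CSV drop from the Files area
        df = spark.read.option("header", True).csv("Files/drops/sales_2024_09.csv")

        # Append it to the existing lakehouse table
        df.write.format("delta").mode("append").saveAsTable("sales")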

  • @remek5758
    @remek5758 7 months ago

    Do you happen to know if mapping data flow will be available at some point in Fabric?

  • @sansophany4952
    @sansophany4952 1 year ago +1

    Thanks for sharing, this is very helpful. I wonder, is it possible to do real-time data ingestion (a real-time pipeline) to a lakehouse or warehouse?

    • @AmitChandak
      @AmitChandak  1 year ago +1

      Yes, please explore streaming datasets, eventstreams & Kusto:
      learn.microsoft.com/en-us/fabric/real-time-analytics/

    • @sansophany4952
      @sansophany4952 1 year ago

      @@AmitChandak Thank you, I will go through that.

  • @christianharrington8145
    @christianharrington8145 1 year ago +1

    Thanks, great video! 🙂
    I wonder about the strategy when data can be modified. For example, if you load a purchasing document or any other document, some attributes or measures already loaded to the warehouse might change. In this case, it's not only a matter of adding new records, but also of updating them. Since there is no concept of a unique primary key in Fabric so far (that might change though), I wonder how to achieve this?
    That reminds me of the data warehouse 101 old days where we needed to reverse documents that had changed: say the original doc had qty 100, and now you load the same doc changed to qty 90, so you needed to add a record with qty -100 and another one with qty +90.
    There might be some easier solution for sure. Any clue? 😊
    (and there are deletes as well!)
    Thanks!

    • @AmitChandak
      @AmitChandak  1 year ago +1

      Yes, a unique key is something for which I have seen ideas already in place; it should be there soon. For now, I have managed updates to the fact table using the source key (a merge sketch follows below).
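      One way to express that source-key update in a Fabric notebook is a Delta merge, since lakehouse tables are Delta tables. The names fact_purchases, doc_id, and updates_df are illustrative, not from the video:

        from delta.tables import DeltaTable

        target = DeltaTable.forName(spark, "fact_purchases")

        # updates_df is an assumed DataFrame holding the incoming changed rows.
        # Upsert on the source document key: update matches, insert the rest.
        (target.alias("t")
            .merge(updates_df.alias("s"), "t.doc_id = s.doc_id")
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute())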

  • @saikrishnanimmagadda6469

    Hi Amit,
    I have been actively following your instructional videos on data ingestion via Data Pipeline, specifically from On-premises to Fabric. While attempting to implement the process, I consistently encounter the following error message. I am reaching out to seek your expert guidance in resolving this issue. Your insights and assistance would be greatly appreciated in helping me overcome this obstacle in my data ingestion efforts.
    Thank you in advance for your time and support.
    ERROR [08S01] [Microsoft][ODBC PostgreSQL Wire Protocol driver]Socket closed. ERROR [HY000] [Microsoft][ODBC PostgreSQL Wire Protocol driver]Can't connect to server on 'xxxxxxx' ERROR [01S00] [Microsoft][ODBC PostgreSQL Wire Protocol driver]Invalid attribute in connection string: sslmode.
    Kind Regards,
    Sai.

    • @AmitChandak
      @AmitChandak  1 year ago

      Please un-check the encrypted flag and try again.