Say goodbye to messy JSON headaches with VARIANT

  • Added 26 Jul 2024
  • Try it out today on Databricks: docs.databricks.com/en/semi-s...
    Read more about it on our blog: www.databricks.com/blog/intro...
    If you're curious about the implementation check out the talk: • Variant Data Type - Ma...
    Or read about it on GitHub: github.com/apache/spark/blob/...
  • Science & Technology

Comments • 18

  • @afrikaniz3d
    @afrikaniz3d 1 month ago +6

    Only note for these videos, since they're not Shorts, is that it would be more beneficial to use the full wide (1920 x 1080) format, so it's more readable at all resolutions.

    • @Databricks
      @Databricks  1 month ago

      I completely hear you, trying to figure out the best way to film for multiple platforms at once when some define 'short' as

  • @TheDataArchitect
    @TheDataArchitect 1 month ago +1

    That's awesome.

  • @the_class_apart
    @the_class_apart 26 days ago

    Wow, this is amazing. I wanted to understand how the variant data type is different from the Struct type?
    Also, a second question: how does it work with an array of JSON?

    • @Databricks
      @Databricks  25 days ago

      Variant can hold a mix of structs and arrays. The difference is the flexibility you get compared to those two fixed types.
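
      A minimal sketch of what that flexibility looks like in practice, assuming a Databricks runtime with VARIANT support; the view, column names, and sample rows are made up, while parse_json and the ':' path syntax are the documented way to create and read VARIANT values:

        # Each row carries a differently shaped JSON document, yet both land
        # in the same VARIANT column - no upfront schema required.
        spark.sql("""
            CREATE OR REPLACE TEMP VIEW events AS
            SELECT parse_json(raw) AS payload
            FROM VALUES
                ('{"user": {"id": 1, "tags": ["a", "b"]}}'),
                ('{"user": {"id": 2}, "clicks": [10, 20]}')
            AS t(raw)
        """)

        # Fields are pulled out with the ':' path syntax and cast as needed;
        # rows that lack a given path return NULL instead of breaking a fixed schema.
        spark.sql("""
            SELECT payload:user.id::int AS user_id,
                   payload:user.tags    AS tags,
                   payload:clicks       AS clicks
            FROM events
        """).show()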

  • @matthiasmueller9340
    @matthiasmueller9340 1 month ago +1

    How can I specify the required runtime version when using serverless sql warehouse?

    • @Databricks
      @Databricks  1 month ago +3

      Variant types will be coming to serverless early/mid July, no need to select a runtime - Holly

  • @fernalication
    @fernalication 27 days ago

    I ended up writing a custom function to handle data in batches, recursively exploding lists and normalizing dictionaries. Not having a schema, plus frontend developers saving elements as lists, then dictionaries, and then as bananas, was tricky. I will give this one a try 😅

    • @Databricks
      @Databricks  26 days ago +1

      Hope this simplifies things! Would love to hear if you notice performance gains too. Holly
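
      For the "lists, then dictionaries, then bananas" case described above, a hedged sketch of how every shape can land in the same VARIANT column and be inspected at read time; the sample values are made up, and schema_of_variant is assumed to be available on a recent Databricks runtime:

        # Three rows with three different shapes: an array, an object, and a bare string.
        # All parse into one VARIANT column, and schema_of_variant reports each row's shape.
        spark.sql("""
            SELECT parse_json(raw)                    AS payload,
                   schema_of_variant(parse_json(raw)) AS shape
            FROM VALUES ('["a", "b"]'), ('{"a": 1}'), ('"bananas"') AS t(raw)
        """).show(truncate=False)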

  • @gravenguan
    @gravenguan 1 month ago

    How does parse_json handle schema evolution? From my knowledge, parsing the schema on the fly isn't recommended for prod tables; it's safer to define the schema first.

    • @Databricks
      @Databricks  1 month ago +1

      I agree, but with a lot of JSON data you don't know the schema upfront and so can't define it. It's worth noting this is different from inferring the schema, which looks at the first 1000 rows and is brittle to upstream changes - Holly (a short sketch of the difference follows at the end of this thread)

    • @gravenguan
      @gravenguan 1 month ago +1

      @@Databricks We used parse_json for dev and exploration purposes as well, thanks for the clarification

    • @Databricks
      @Databricks  1 month ago

      @@gravenguan No worries! Hope this clarifies for other users too
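
      To make the distinction above concrete, a small sketch contrasting a fixed schema with parse_json when an upstream field appears later; this is a hypothetical example, where from_json is standard PySpark and parse_json plus the ':' extraction assume a VARIANT-capable Databricks runtime:

        from pyspark.sql import functions as F

        # Two JSON documents: the second gains a field that wasn't known upfront.
        rows = spark.createDataFrame(
            [('{"id": 1}',), ('{"id": 2, "new_field": "added upstream"}',)],
            ["raw"],
        )

        parsed = rows.select(
            # Fixed schema: any field not declared up front is silently dropped.
            F.from_json("raw", "id INT").alias("fixed"),
            # VARIANT: the whole document is kept, so later upstream additions survive.
            F.expr("parse_json(raw)").alias("flexible"),
        )

        parsed.select(
            F.col("fixed.id").alias("id"),
            F.expr("flexible:new_field::string").alias("new_field"),
        ).show()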

  • @TheDataArchitect
    @TheDataArchitect 1 month ago

    Who's the speaker?

    • @Databricks
      @Databricks  1 month ago

      Holly Smith - FYI it's also me in the comments for my videos, so fire away with any technical follow-on questions - Holly

    • @TheDataArchitect
      @TheDataArchitect 1 month ago +1

      @@Databricks Awesome thanks

  • @nagendrasrinivas-cj7sr
    @nagendrasrinivas-cj7sr 1 month ago +2

    this is clearly copied from snowflake

    • @Databricks
      @Databricks  1 month ago +2

      Variants in their various forms have been around for many decades. We're big fans of open source so anyone can use the implementation in other projects or products.