- 2,962
- 16,255,619
Databricks
United States
Joined 1 Jul 2014
Databricks is the Data and AI company. More than 10,000 organizations worldwide - including Block, Comcast, Condé Nast, Rivian, and Shell, and over 60% of the Fortune 500 - rely on the Databricks Data Intelligence Platform to take control of their data and put it to work with AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of the lakehouse, Apache Spark™, Delta Lake, and MLflow.
Make your records Unique with Generated Identity Columns
Check out the docs here: docs.databricks.com/en/delta/generated-columns.html#use-identity-columns-in-delta-lake
Find more examples in this blog: www.databricks.com/blog/2022/08/08/identity-columns-to-generate-surrogate-keys-are-now-available-in-a-lakehouse-near-you.html
1,002 views
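The linked docs cover Delta Lake identity columns. A minimal sketch of the DDL (the table and column names here are illustrative, not from the video):

```sql
-- Hypothetical table: GENERATED ALWAYS AS IDENTITY assigns a unique
-- surrogate key to each inserted row.
CREATE TABLE customers (
  customer_id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
  name STRING,
  email STRING
) USING DELTA;

-- Inserts omit the identity column; Delta generates the values.
INSERT INTO customers (name, email) VALUES ('Ada', 'ada@example.com');
```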
How to enforce data quality across columns within a table in Databricks
1.4K views · 14 hours ago
Check out the docs here: docs.databricks.com/en/delta/generated-columns.html
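One way to enforce quality rules that span multiple columns is a CHECK constraint; a hedged sketch (table and columns are made-up examples):

```sql
-- Hypothetical orders table: the constraint rejects any row where the
-- end date precedes the start date, enforcing a cross-column rule.
ALTER TABLE orders
  ADD CONSTRAINT valid_dates CHECK (end_date >= start_date);
```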
Databricks Clean Rooms
1.9K views · 19 hours ago
Data clean rooms allow businesses to easily collaborate on data in a secure environment, where multiple parties can safely combine sensitive data without compromising privacy or security. By implementing stringent protocols and advanced technologies, data clean rooms enable organizations to share data securely while ensuring compliance with privacy and regulatory requirements. In an era where d...
AI/BI: Intelligent Analytics for Real-World Data
2.7K views · 1 day ago
In this video you will learn about AI/BI which features two complementary capabilities: Dashboards and Genie. Dashboards provide a low-code experience to help analysts quickly build highly interactive data visualizations for their business teams using natural language, and Genie allows business users to converse with their data to ask questions and self-serve their own analytics. Databricks AI/...
AI-Powered Data Warehousing on Databricks SQL
1.3K views · 1 day ago
In this video, you will learn how you can leverage Databricks SQL's data warehousing capabilities to call AI functions, query models, and utilize the context-aware Databricks Assistant for seamless and efficient data analysis. This powerful combination makes it easier for analysts to unlock valuable insights and drive impactful decisions.
LakeFlow Demo
4.1K views · 14 days ago
Databricks LakeFlow is a new solution that contains everything you need to build and operate production data pipelines. It includes new native, highly scalable connectors for databases including MySQL, Postgres, SQL Server and Oracle and enterprise applications like Salesforce, Microsoft Dynamics, NetSuite, Workday, ServiceNow and Google Analytics. Users can transform data in batch and streamin...
Say goodbye to messy JSON headaches with VARIANT
3.3K views · 14 days ago
Try it out today on Databricks: docs.databricks.com/en/semi-structured/variant.html Read more about it on our blog: www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark If you're curious about the implementation check out the talk: czcams.com/video/jtjOfggD4YY/video.html Or read about it on GitHub: github.com/apache/spark/blob/master/common/variant/README.md
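Per the linked VARIANT docs, JSON is ingested with parse_json() and fields are extracted with the `:` path syntax; a small sketch (the table and JSON payload are illustrative):

```sql
-- A VARIANT column accepts arbitrary JSON without a declared schema.
CREATE TABLE events (raw VARIANT) USING DELTA;

INSERT INTO events
  SELECT parse_json('{"user": {"id": 42}, "tags": ["a", "b"]}');

-- Path extraction works on nested objects and arrays alike.
SELECT raw:user.id::INT     AS user_id,
       raw:tags[0]::STRING  AS first_tag
FROM events;
```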
Data Intelligence Day Seoul 2024
567 views · 14 days ago
Data Intelligence Day Seoul, Korea took place on 23 April 2024 and gathered over 1,200 industry leaders and data and AI experts. Watch Data Intelligence Day Seoul On Demand: events.databricks.com/KoreaDIDays2024
An Introduction to DBRX
4.3K views · 21 days ago
Learn from Naveen Rao, VP of Generative AI at Databricks, as he explains DBRX, a new, open source foundation model that sets the standard for production quality and price/performance. With up to 3x faster inference, DBRX outperforms all other open models on quality benchmarks, allowing enterprises to quickly build their own custom LLMs efficiently and with full control. Read more about ...
Demo: How Do I Use DBRX?
1.9K views · 21 days ago
Watch how to use DBRX on Databricks to build and customize GenAI applications using your own enterprise data. Read more about DBRX here: www.databricks.com/blog/announcing-dbrx-new-standard-efficient-open-source-customizable-llms?
What's Next for Apache Spark™ Including the Upcoming Release of Apache Spark 4.0
7K views · 21 days ago
Reynold Xin, Co-founder and Chief Architect, Databricks, shares the latest innovation coming out of the Apache Spark™ open source project, including a preview of the anticipated release of Spark 4.0. Speakers: Reynold Xin, Co-founder and Chief Architect, Databricks; Tareef Kawaf, President, Posit Software, PBC
The Evolution of Delta Lake from Data + AI Summit 2024
2.4K views · 21 days ago
Shant Hovsepian, Chief Technology Officer of Data Warehousing at Databricks, explains why Delta Lake is the most adopted open lakehouse format. Includes:
- Delta Lake UniForm GA (support for and compatibility with Hudi, Apache Iceberg, Delta)
- Delta Lake Liquid Clustering
- Delta Lake production-ready catalog (Iceberg REST API)
- The growth and strength of the Delta ecosystem
- Delta Kernel
- D...
Setting up PAT and Secret Scope
427 views · 21 days ago
A quick video on how to set up a Personal Access Token, a Secret Scope, and a Secret with Azure Key Vault.
Increase your column sizes without rewriting the entire table
793 views · 21 days ago
Docs: docs.databricks.com/en/delta/type-widening.html
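Per the linked type-widening docs, widening must first be enabled on the table; then a column's type can be widened in place. A hedged sketch (table and column names are illustrative):

```sql
-- Enable type widening on an existing Delta table.
ALTER TABLE sales SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true');

-- Widen INT -> BIGINT as a metadata change, without rewriting
-- the existing data files.
ALTER TABLE sales ALTER COLUMN quantity TYPE BIGINT;
```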
Announcing Delta Lake 4.0 with Liquid Clustering. Presented by Shant Hovsepian at Data + AI Summit
5K views · 21 days ago
Open Sourcing Unity Catalog Live Onstage with Matei Zaharia at Data + AI Summit 2024
1.2K views · 21 days ago
Databricks LakeFlow: A Unified, Intelligent Solution for Data Engineering. Presented by Bilal Aslam
9K views · 21 days ago
Recap of Announcements at Data + AI Summit 2024 with Ali Ghodsi, Co-Founder and CEO, Databricks
982 views · 21 days ago
Announcing Databricks Clean Rooms with Live Demo. Presented by Matei Zaharia and Darshana Sivakumar
1.3K views · 21 days ago
Data Sharing and Cross-Organization Collaboration. Presented by Matei Zaharia at Data + AI Summit
460 views · 21 days ago
Announcing Unity Catalog Metrics with Live Demo. Matei Zaharia and Zeashan Pappa at Data + AI Summit
1.2K views · 21 days ago
Evolving Data Governance With Unity Catalog Presented by Matei Zaharia at Data + AI Summit 2024
3.3K views · 21 days ago
Unity Catalog Demo of New Features with Zeashan Pappa at Data + AI Summit 2024
1.4K views · 21 days ago
How Data Intelligence is Delivering Big Wins at Texas Rangers. Alexander Booth at Data + AI Summit
494 views · 21 days ago
The Future of Lakehouse Format Interoperability with Ali Ghodsi and Ryan Blue at Data + AI Summit
682 views · 21 days ago
How to Make Small Language Models Work. Yejin Choi Presents at Data + AI Summit 2024
4K views · 21 days ago
Data + AI Summit Keynote 2024 - Day 2 Opening Remarks with Ali Ghodsi
288 views · 21 days ago
Building an Enterprise Data & AI Catalog with Databricks Unity Catalog
1.3K views · 21 days ago
Patrick Wendell, Co-founder and VP of Engineering on Building Production-Quality AI Systems
2K views · 21 days ago
The Best Data Warehouse is a Lakehouse
4.8K views · 21 days ago
Nice video. You showcased an example of DuckDB integration with Unity Catalog. Can you please help me validate whether my understanding of the behaviour of the open-sourced Unity Catalog (UC), captured in the points below, is correct? 1. Within UC, we can apply data security changes like data masking. When this UC is accessed from DuckDB, the columns will appear masked there too. 2. Similarly, other security config such as row-level security and column-level security will also be visible in DuckDB. 3. Similar to the "attach accounts_prod" command in DuckDB, we can integrate UC with other lakehouse implementations such as Microsoft Fabric and even on-prem Delta Lake (or at least such integration is on the roadmap). 4. Such tables are hosted/managed within Databricks but are accessed from DuckDB too, which is the reverse of what is done in the case of an "external table".
Thank you for your great presentation. Could you please provide the source code or share the link to the Git repository?
Epic Brit accent
This sounds backwards
system.billing.usage does not contain a cluster_id column
Would it be possible to generate this on a DLT streaming table?
Lovely! .... more detailed demo please 🤓 many thanks
No module named 'databricks.vector_search'
I can't find the "launch Genie" button. Do I need to enable anything else?
The video is playing back strangely interlaced, like a column was out in the stream or what not.
How do I install a private package on serverless compute? I used an init_script with normal compute to add private package access details in /etc/pip.conf, BUT with serverless, init scripts can't be used.
I got stuck because I'm only receiving 3 rows of responses, but my table has 800.
Thanks for sharing - great content Naveen
are you hiring?
Epic
Databricks AI is great, but a showstopper for adoption in many big organisations is data privacy and residency. My understanding is that the data leaves the organisation's tenant.
Brilliant 👌🏽
Awesome!
When I choose Timeseries as the profile type and set all the required fields it seems to be working fine, but when I open the dashboard it throws an error like "Table or View {catalog}.{schema}.my_table_profile_metrics cannot be found". Should I create my_table_profile_metrics myself, or is it part of the process?
Awesome. Can I use German to ask these questions?
curious to know, too
Did I miss something, or did you not mention how to activate the AI features? When I try to use ai_analyze_sentiment() within my serverless Europe West env, I get an error saying AI_FUNCTION_HTTP_REQUEST_ERROR.
Wow, this is amazing. I wanted to understand how the VARIANT data type is different from the Struct type. Also, a second question: how does it work with an array of JSON?
Variant can be a mix of structs and arrays. The difference is the flexibility that you can have compared to the other two.
People who don't understand the video at all (possibly didn't even finish watching) comment that Feifei Li is "creepy". That's when you know most of the world has gone crazy.
Love these shorts
I ended up writing a custom function to handle data in batches, recursively exploding lists and normalizing dictionaries. Not having a schema, and frontend developers saving elements as lists, then dictionaries, and then as bananas, was tricky. I will give this one a try 😅
Hope this simplifies things! Would love to hear if you notice performance gains too. Holly
Hello, I couldn't follow along because of the Jupyter notebook. What do you recommend I follow in order to replicate what you did in this video? Thank you.
Well to be honest, it is not really free. You still need to pay for the AWS resources set up through CloudFormation. That stack cost me ~7 USD per day.
Awesome!
oh great.
Great video Jason, thanks for putting it together. Would you be able to share the notebooks as well?
Unfortunately, can't understand shit of what he's saying!
Getting an issue when trying to create the serving endpoint. It's saying Served entity creation aborted for served entity `audio_transcription_chatbot_model-1`, config version 1, since the update timed out. Have not been able to figure out why.
Got it, cyber pulse-taking!
Very inspiring! My mind is going at 1,000 miles an hour with ideas for our startup and clients from this!
Where can I access your code or workbook? It would be nice to run your code.
Holden and team are incredibly engaging and very easy to understand!
Great feature, please also include low code features in order to be more beneficial as Data factory also has for ETL
awesome
Excellent presentation, beginning 3.5-4.0 Billion years ago and explaining all the way to now (AI, non-physical-spatial). Excellent. Thank you. 👏
Who's the speaker?
Holly Smith - FYI it's also me in the comments for my videos so fire away with any technical follow on questions - Holly
@@Databricks Awesome thanks
AI can do everything you need to do in times of studying and understanding AI.
Awesome 👏🏾
🎉
How does parse_json handle schema evolution? From my knowledge, parsing the schema on the fly is not recommended for prod tables; it's safer to define the schema first.
I agree, but with a lot of JSON data you don't know the schema upfront and so can't define it. It's worth noting this is different from inferring the schema which looks at the first 1000 rows and is brittle to upstream changes - Holly
@@Databricks We used parse_json for dev and exploration purposes as well, thanks for the clarification
@@gravenguan No worries! Hope this clarifies for other users too
this is clearly copied from snowflake
Variants in their various forms have been around for many decades. We're big fans of open source so anyone can use the implementation in other projects or products.