Data Council
Building an Ecosystem for Open Foundation Models, Together
In this talk, Ce Zhang shares experiences in building the open source foundation model ecosystem through collaboration with the community. He delves into how balancing data quality, model architecture and infrastructure presents both opportunities and challenges. He also discusses navigating the extensive scale and cost of GPU clusters and optimizing their usage. Most importantly, he explores how data quality can be reasoned about in a structured manner to boost model quality.
This video provides a unique perspective on managing technical issues in open source ecosystems and is a must-watch for anyone interested in what happens behind the scenes of data science and AI development.
👉 Sign up for our "No BS" Newsletter to get the latest technical data & AI content: hubs.li/Q02vz6xC0
#opensource #gpu #dataquality
ABOUT DATA COUNCIL:
Data Council brings together the brightest minds in data to share industry knowledge, technical architectures and best practices in building cutting-edge data & AI systems and tools.
FIND US:
Twitter: datacouncilai
LinkedIn: www.linkedin.com/company/datacouncil-ai/
Website: www.datacouncil.ai/
259 views

Videos

Stochastic | AI Launchpad '24
252 views · 2 months ago
Stochastic is an end-to-end AI platform for enterprise knowledge work that provides personalized AI agents with zero setup or coding. ABOUT THE SPEAKER: Glenn Ko, Co-founder & CEO, Stochastic. AI LAUNCHPAD: Data Council and Zero Prime Ventures partnered to give six AI-first startups a chance to present brief demos on stage to top investors and elite founders during Data Council's annual conference i...
sea.dev | AI Launchpad '24
278 views · 2 months ago
sea.dev is breaking the constraints of existing data systems and NL2SQL with graph-based tools that allow LLM apps to act reliably on fintech data. ABOUT THE SPEAKERS: Matt Arderne, Co-founder, sea.dev; Marya Bazzi, Co-founder, sea.dev; Vladimirs Murevics, Co-founder, sea.dev. AI LAUNCHPAD: Data Council and Zero Prime Ventures partnered to give six AI-first startups a chance to present brief demos on sta...
Phaselab | AI Launchpad '24
87 views · 2 months ago
Phaselab builds smart automation to make companies’ data privacy programs more effective and efficient. ABOUT THE SPEAKER: Josh Schwartz, Co-founder & CEO, Phaselab. AI LAUNCHPAD: Data Council and Zero Prime Ventures partnered to give six AI-first startups a chance to present brief demos on stage to top investors and elite founders during Data Council's annual conference in Austin in 2024. 👉 Sign up fo...
Parea | AI Launchpad '24
280 views · 2 months ago
Parea builds developer tools for evaluating, testing and monitoring LLM-powered applications. ABOUT THE SPEAKER: Joel Alexander, Co-founder, Parea. AI LAUNCHPAD: Data Council and Zero Prime Ventures partnered to give six AI-first startups a chance to present brief demos on stage to top investors and elite founders during Data Council's annual conference in Austin in 2024. 👉 Sign up for our “No BS” News...
InQuery | AI Launchpad '24
236 views · 2 months ago
InQuery simplifies data lakehouse maintenance, saving your data team time and money. ABOUT THE SPEAKERS: Erick Enriquez, Co-founder & CEO, InQuery; Khalil Miri, Co-founder & CTO, InQuery. AI LAUNCHPAD: Data Council and Zero Prime Ventures partnered to give six AI-first startups a chance to present brief demos on stage to top investors and elite founders during Data Council's annual conference in Aust...
Dataland | AI Launchpad '24
204 views · 2 months ago
Dataland is the AI-powered internal tools platform. It is the easiest way to deliver high-quality internal tools to your business users. ABOUT THE SPEAKER: Arthur Wu, Co-founder, Dataland. AI LAUNCHPAD: Data Council and Zero Prime Ventures partnered to give six AI-first startups a chance to present brief demos on stage to top investors and elite founders during Data Council's annual conference in Aus...
Rising Tides with Radical Transparency: Why and How to Open Source Your Data Platform
126 views · 2 months ago
Join Tim Castillo from Dagster Labs for an insightful journey into how they successfully open-sourced their data platform. Discover the hurdles, cultural shifts and innovative implementations behind this strategic decision. Data engineers, analytics engineers and data platform engineers: learn how to leverage open source to enhance your projects and contribute to the data community. 👉 Sign u...
Case Studies from a Methodologist on an Experimentation Platform
315 views · 2 months ago
Dive into the world of A/B testing with Microsoft's Experimentation Platform Team. Join Laura Cosgrove for an exclusive tech talk where she uncovers the secrets behind Microsoft’s cutting-edge statistical evaluation and simulation frameworks. In this video, discover the power of Microsoft's variance reduction estimator and its game-changing impact on service efficacy. Ready to elevate your A/B ...
A 101 in Time Series Analytics with Apache Arrow, Pandas and Parquet
1.1K views · 2 months ago
Dive deep into the world of databases and analytics in this talk from Zoe Steinkamp of InfluxData. Learn how you can unleash the potential of Apache Arrow and Apache Parquet for efficient, scalable handling of time-series data. Equip your toolbox with cutting-edge open-source technologies and industry-standard analytics libraries to build the foundation of a high-performance analytics applicati...
Unified Stream/Batch Execution with Ibis
524 views · 2 months ago
This talk is a deep dive into the powerful world of Ibis, as Voltron Data showcases their recent work merging batch and streaming concepts and introducing an Apache Flink backend. This comprehensive tutorial will provide you with invaluable insights for working with data across a variety of platforms. Watch the full video to explore the potential of a unified approach for both batch...
How Beam Uses Code-Based Dashboards to Scale Analytics Products
317 views · 2 months ago
In this talk, Emilio Tamez unravels the magic behind dashboards-as-code. From Python scripts to modular design, Beam is breaking down the barriers between complexity and simplicity. The dashboards-as-code methodology has allowed Beam to incrementally approach their goals by building boilerplate dashboards as a series of code-defined, standardized modules which can be arranged into a dashboard i...
Building Responsible and Trustworthy Generative AI Products at LinkedIn
541 views · 2 months ago
Dive into the heart of LinkedIn's commitment to ethical AI development, where revolutionary Generative AI meets responsibility. Listen in to this insightful exploration as Daniel Olmedilla unveils the foundational principles and architecture guiding LinkedIn's AI journey. With a special focus on their cutting-edge Generative AI products and features, this talk gives an exclusive look into Linke...
What Makes for an Effective Data Practitioner in 2024?
412 views · 2 months ago
Listen in as Marck Vaisman shares insights from his years of experience and demystifies the complexities of the data practitioner role, while providing a roadmap for skill development across all levels. Whether you're a seasoned leader aiming to upskill your team or a novice stepping into the realm of data, this video offers valuable guidance to propel your career in the right direction. 👉 Sign...
Is Kubernetes a Database?
506 views · 2 months ago
Uncover how Kubernetes extends beyond stateless apps and now supports stateful workloads and database management with Custom Resources. In this video, discover the potential to eliminate traditional databases by transforming the Kubernetes API into a potent database and metastore. Don't miss this chance to learn how leveraging Kubernetes can revolutionize your tech projects. 👉 Sign up for our "...
How Developers Should Think About the Emerging AI Stack | Together, Pinecone, Anthropic
579 views · 2 months ago
From Playgrounds to Production: The Evolution of AI Evaluation at Coda
99 views · 2 months ago
Events Sourcing with Kafka at Scale
149 views · 2 months ago
Creating a Competitive Advantage in the Age of Intelligence as a Service
106 views · 2 months ago
Build Faster, More Responsive Analytics with a Semantic Layer | Cube Workshop
279 views · 2 months ago
Streaming CDC data from PostgreSQL to Snowflake, challenges and solutions
463 views · 2 months ago
OttoBot: Productionizing LLM Models
145 views · 2 months ago
Building a User-Level Targeting Platform
137 views · 2 months ago
Data Culture 2.0: Leveraging AI to Build Human Connections and Expand Your Influence
98 views · 2 months ago
Beyond Kafka: Cutting Costs and Complexity with WarpStream and S3
269 views · 2 months ago
Ten Years of Building Open Source Standards
249 views · 2 months ago
Move Fast and Don't Break Things -- How to Build a Data Platform that Scales with your Organization
316 views · 2 months ago
Redefining Database Workloads: The Future with Modern Object Storage
98 views · 2 months ago
Beyond MLOps: Building AI systems with Metaflow
648 views · 2 months ago
How to Align AI Capabilities with Product Strategy so You Can Innovate
219 views · 2 months ago

Comments

  • @chrismcgrath7610 · 2 days ago

    2nd legendary talk. I can't remember how many years it's been since I last actually watched a tech video at 1x speed and had my attention completely captured; I enjoyed it, this was fascinating. This guy is in the Venn diagram of smart people who know how to present/communicate properly and are willing to do the prep work, vs. the many other smart people who suck at communication/presentation or aren't willing to do the prep work.

  • @Anhar001 · 3 days ago

    All this jank just to solve an issue which is basically Python. Just write a fully statically compiled binary and shove it on an NFS, then use rsync between the dev machines and the NFS. Have a shell script watch the binary for changes and relaunch it when the file changes. Look ma, I just replaced the entire solid with a few bash scripts 😂

  • @jimshtepa5423 · 3 days ago

    10:55 what's wrong with Uzbekistan?))))

  • @krishnapraveen777 · 4 days ago

    Chad engineer

  • @hemantishwaran5741 · 4 days ago

    It’s great for ggplot and webpages. But if you ever write a textbook, go straight to LaTeX from the command line.

  • @malware_creations2606

    Also, I've read that Kafka has an issue with consumer lag. How do you handle that?

  • @zuowang5185 · 14 days ago

    Is there an updated version of the logging pipeline 4 years later?

  • @bluejinux · 17 days ago

    One of the best presentations on the purpose of the data warehouse and the data lakehouse, and where the future of data is going.

  • @randomhandle307 · 20 days ago

    Very nice. Thanks

  • @AndreaMontes_ · 23 days ago

    I'm rewatching this talk; the speaker is quite good. Taking some notes to prepare my own talk.

  • @hannahnelson4569 · 28 days ago

    Very cool talk! The idea of learning heuristics was very cool! I didn't quite understand the criterion for splitting down multiple paths! I will check out the source code! Thank you for hosting this talk!

  • @fb-gu2er · 1 month ago

    Backend in Python? Yikes

  • @guykerem7874 · 1 month ago

    One of the best talks on data in 2024. Thank you Abhi! You never miss a chance to inspire and impress

  • @tessafelice2181 · 1 month ago

    I love the name MotherDuck. I feel it’s a respectful tribute to the female source of life and code.

  • @CreativeInspireP380 · 1 month ago

    This was an extremely informative talk - especially the section on challenges - and one I wish would receive more attention due to how useful it is as an overview to quite a few complex and highly relevant issues. It would be nice if it were re-elaborated and presented in a non-live presentation format.

  • @the-ghost-in-the-machine1108

    thanks

  • @nosh3019 · 1 month ago

    Great talk 🎉

  • @jayleejw1801 · 1 month ago

    The amount of background noise in this video is absurd.

  • @tratkotratkov126 · 1 month ago

    Great, very much needed and promising project! However, it is not quite clear what you mean when you talk about data versioning (DV): do you version the data as LakeFS does, or are you just versioning the source code that produces this data? Also, I find the diagrams in the presentation (Virtual/Physical layers) confusing and not easy to grasp at first glance. It would be nice in the next iteration if you used some real-world/practical entities to describe demo objects, like customer, product, sales etc., instead of just “source”, and wrapped the demo in a quick story like “Meet Alex, the data engineer at TechCorp, a rapidly growing tech company. Alex is responsible for managing the company’s data pipelines, ensuring that data from various sources is clean, consistent, and available for analysis” etc. You get the idea. Finally, I would suggest you switch the order of, and the time you spend on, the theory and the demo parts: show your fantastic open source project demo first, and how easy it is to implement the 3 concepts in a meaningful story, then after each segment just mention the theoretical part, but don’t allow the theory to consume 75% of your presentation unless you want to be considered one of the many Data Governance “gurus” who present on this channel. Wishing you all good luck with this fantastic project!

  • @LucasCardoso-mw4ok · 1 month ago

    Hi! Nice video. I'm a little concerned about how I can get my development data from Copilot.

  • @KC53557 · 1 month ago

    A good example of not getting AI right is the creation of the Maga loon and Jan 6.

  • @68sahil56 · 1 month ago

    30:29

  • @68sahil56 · 1 month ago

    18:19

  • @VipulVaibhaw · 1 month ago

    Fantastic talk!

  • @allthingsdata · 1 month ago

    Loved it.

  • @AshishKumar-ll2mt · 1 month ago

    Looks like this field never took off the way it should have

  • @yogeshbharadwaj6200 · 1 month ago

    Very nice demo..Tks..

  • @compilation_exe3821 · 1 month ago

    Amazing

  • @timothymcglynn1935 · 1 month ago

    HI 👋

  • @HikarusVibrator · 1 month ago

    Can someone explain to me how you’re supposed to do a major-version DB upgrade with a Debezium connector? It’s such an unbelievable pain that it’s a total dealbreaker. Unless I’m missing something.

  • @Eriddoch · 2 months ago

    Dang, Miriah you are an AMAZING speaker, and as someone who works on data engineering systems but doesn't own them (MLOps), this is really valuable.

  • @420_gunna · 2 months ago

    bullshit buzzwords "cognitive analytics" vomit and a saccharine exhortative tone "quantum computing + graphene + ai" come on

  • @paoloogr · 2 months ago

    Nice talk! Thanks.

  • @ex-cursion · 2 months ago

    I loved this and wish there was more of it. Thank you! But as noted: 'invoice reconciliation is boring'. I feel like the survival of our species will pivot not on our curiosity, but on our capacity to constrain our desire for novelty enough to solve boring problems.

  • @matthewborn · 2 months ago

    This is an excellent talk. Thank you, Abhi!

  • @malcolmgdavis · 2 months ago

    Pointer vs. Value discussion: Based on the Method vs. Function discussion, ADT should be strictly adhered to. Operations that modify the ADT are modeled as functions that take the old state as an argument and return the new state as part of the result. In other words, a function should enforce immutability. The ADT approach helps with concurrency, making the code cleaner and easier to read. As an API user, I shouldn't worry about the state changing when I pass a structure. Of course, the pure ADT model's problem is memory consumption. That's why ADT models are generally implemented in VMs that can routinely find old structures without references and remove them from memory.

  • @malcolmgdavis · 2 months ago

    The method vs. function debate is absurd. The presenter needs to learn or spend time with OO programming. Class methods don't have to be logically connected to states. I developed in C during the 80s. The problem with structs is that the data is the point of coupling. The class hides data. In OO, the focus is on behavior and not the state. The OO state can be anywhere and can change. The strategy allows the implementation of the module to be changed without disturbing the client programs.

  • @1988YUVAL · 2 months ago

    Very interesting presentation. Looks like a very well thought out solution for managing data transformations. I wonder if it will take off like dbt.

  • @Jack-lg9mq · 2 months ago

    Good presentation. Also nice to see that Jimmi Simpson is expanding his horizons.

  • @mattbahr228 · 2 months ago

    Awesome presentation!

  • @wonlee4138 · 2 months ago

    Thanks for the great presentation!

  • @prashant776 · 2 months ago

    Really good and informative. Congratulations to PeerDB on securing their recent seed round. I see a lot of potential in PeerDB as organisations look to stream their data to warehouses. I had a very unique need; I wish PeerDB had been around back then, it would have been a wonderful choice.

  • @AndreaMontes_ · 2 months ago

    Great speaker 👏👏

  • @thrawn01 · 2 months ago

    This was super useful, I learned a lot. Thank you!

  • @IbraheemFaiq · 2 months ago

    Great

  • @samhughes1747 · 2 months ago

    I really enjoyed this. It was high-level, but hey, a hype-free, facts-only talk about working with generative models? I'll take it!

  • @Shikara_Animals · 2 months ago

    Best teacher ❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤

  • @VijayasarathyMuthu · 2 months ago

    You should include Lightdash.


  • @clarkylifehacks8220 · 2 months ago

    This is great. Not the same context (not data), but I do 3 of the 4 roles under incident management; it can get messy!