Data Exchange Podcast (Episode 240): Chang She of LanceDB

  • Added 5 September 2024
  • Episode Notes: thedataexchang...
    In this episode we discuss Lance, an open-source columnar data format that tackles the unique challenges posed by modern AI and machine learning workloads.
    *Sections*
    Introduction to Lance and the Challenge of Unstructured Data - 00:00:05
    Overcoming Limitations of Existing Formats (Parquet, ORC) - 00:02:56
    Lance: A New Data Format for AI Workloads - 00:06:05
    Efficient Metadata Handling and Wide Data Support in Lance - 00:07:20
    Integrated Vector Indexing for AI Applications - 00:09:15
    LanceDB: A Scalable Vector Database Built on Lance Format - 00:10:39
    Real-World Use Cases: Images, Videos, and Large-Scale Datasets - 00:12:31
    Lance as a "One-Stop Shop" for AI Data Lakes - 00:13:49
    Comparison to Meta's Nimble: Similarities and Differences - 00:15:18
    Open Source Ecosystem and Community Contributions - 00:18:48
    Key Use Cases: Data Exploration, Training, and Vector Search - 00:21:31
    Addressing the Limitations of Traditional Vector Search Systems - 00:24:32
    Exploratory Data Analysis for Unstructured Data with Lance - 00:28:02
    Multimodal Embeddings and Vector Search - 00:35:51
    Feature Stores and Their Evolving Role in AI - 00:41:34
    Putting LanceDB's Vector Search to the Test - 00:44:14
    Embedding Pipelines, Ecosystem Integrations, and Deployment - 00:50:27
    Open Source and Enterprise Offerings from LanceDB - 00:53:45
    The Future of Lance: New Encodings, Integrations, and Governance - 00:56:38
