Building a global data supply chain to improve protein design and protect biodiversity

  • Added May 31, 2023
  • Presented on May 10th 2023 by Phil Lorenz
    Abstract
    With more than 99.9% of biodiversity remaining unknown, the ground-truth genome and protein sequences available for deep learning models are highly unrepresentative. We therefore present a data-centric approach to protein design through a knowledge graph sourced from environmental metagenomics and metadata collection across five continents. Leveraging this data, we describe ZymCtrl, a conditional language model for the controllable generation of artificial enzymes, and show case studies with performance validation on specific protein classes, including fluorinases and gene-editing systems.

Comments • 1

  • @obsidian161, 1 year ago

    Prof. Chris Snow from CSU brought me here! Go Rams!