Building a global data supply chain to improve protein design and protect biodiversity
- Added on May 31, 2023
- Presented on May 10, 2023, by Phil Lorenz
Abstract
With more than 99.9% of biodiversity still unknown, the ground-truth genome and protein sequences available to deep learning models are highly unrepresentative. We therefore present a data-centric approach to protein design, built on a knowledge graph sourced from environmental metagenomics and metadata collection across five continents. Leveraging this data, we describe ZymCtrl, a conditional language model for the controllable generation of artificial enzymes, and present case studies with performance validation on specific protein classes, including fluorinases and gene-editing systems.
Prof. Chris Snow from CSU brought me here! Go Rams!