MIA: Zitnik Lab, Multimodal protein language models for deciphering protein function

  • Added June 9, 2024
  • Models, Inference and Algorithms
    May 29, 2024
    Broad Institute of MIT and Harvard
    Marinka Zitnik
    Assistant Professor of Biomedical Informatics, Harvard Medical School
    Owen Queen
    Research Associate, Harvard Medical School
    Yepeng Huang
    PhD Student, Harvard Medical School
    Understanding the relationship between a protein's amino acid sequence and its structure or function is a long-standing challenge with far-reaching implications for therapeutic development, as the effects of drugs are often directly linked to proteins. Current protein language models (PLMs) capture evolutionary relationships based on sequences but fall short of directly acquiring protein functions from multimodal molecular data, including protein sequences and structures, peptides, and domains. We develop a multimodal protein language model that integrates textual protein descriptions with a sequence-structure PLM to create a more comprehensive and functionally insightful model of proteins. This integration promises to bridge the current gap in PLMs, moving from an understanding of the structural aspects of proteins to a functional view of the vast protein space. The model allows scientists to express their queries in natural language and interact with protein models in an open-ended manner. Among other uses, it supports text-based prediction of protein targets, multimodal protein captioning, and Q&A sessions with scientists of varying levels of expertise. Trained on a new dataset of protein-text instructions, the model can generalize to new phenotypes in a zero-shot manner, making it versatile for diverse tasks even when functional annotations are scarce. We conclude with an outlook on a future in which “AI scientists” act as generative agents capable of skeptical learning and reasoning to empower biomedical research.
    For more information visit: www.broadinsti...
    Copyright Broad Institute, 2024. All rights reserved.
