Site-wise mutation effects enable combinatorial protein variant design

  • Added June 6, 2023
  • Tuesday June 6th, 4-5 pm EST | David Ding - Postdoctoral Scholar, UC Berkeley
    Abstract: Design and natural evolution of proteins are profoundly impacted by the amount and frequency of epistasis, i.e., how substitutions affect each other. Recent efforts in developing large deep learning models with hundreds of millions of parameters trained on structural and sequence databases have made progress in predicting protein variant effects by utilizing the ability of such models to fit increasingly complicated protein fitness landscapes with specific interactions between substitutions. However, for most proteins, it remains unclear how important biological epistasis is for prediction and design of combinatorial variants. Here, we systematically examined 8 combinatorial protein variant effect datasets for the complexity of their fitness landscape. We start by measuring the effect of ~10,000 combinatorial variants at ten binding residues in the antitoxin ParD3 on its in vivo ability to neutralize its cognate toxin, ParE3. Using this and two additional datasets in this protein, we show that a simple logistic regression model considering only site-wise amino acid preferences without interactions between residues can explain much and, in some datasets, virtually all of the combinatorial mutation effects (R²: 83-98%). As a result, this minimal model with ~60-200 parameters can be trained on a small number (~200-1000) of observations to predict unobserved combinatorial variant effects well (Pearson r ~ 0.80-0.98). We find that such site-wise preferences are affected by mutations at neighboring residues, leading us to develop an unsupervised strategy, which we call ALMS for ‘assessment of local mutations from structure’, for design of functional and diverse sequences using structural microenvironment information in data-poor regimes. ALMS outperforms not just random library sampling approaches but also high-capacity neural networks that can model specific dependencies between residues. Finally, we observe that these results generalize to observed combinatorial variant effects in 7 other proteins (R² ~ 78-95%). These results demonstrate that simple, site-wise supervised and unsupervised approaches enable the design of combinatorial protein variants, including therapeutically relevant ones.
    Preprint: www.biorxiv.org/content/10.11...
    Google Scholar: scholar.google.com/citations?...
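
The site-wise model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-hot (position × amino acid) encoding, fits a logistic regression by plain gradient descent in NumPy, and uses synthetic, purely additive data in place of the ParD3 measurements. All function names, the number of sites, and the hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seqs):
    """Encode equal-length sequences as flattened (site x amino acid) indicators."""
    n_sites = len(seqs[0])
    X = np.zeros((len(seqs), n_sites * len(AMINO_ACIDS)))
    for row, seq in enumerate(seqs):
        for site, aa in enumerate(seq):
            X[row, site * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return X


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_sitewise(X, y, lr=0.5, n_steps=2000, l2=1e-3):
    """Fit y ~ sigmoid(X @ w + b) by gradient descent on cross-entropy loss.

    There are no interaction terms: each weight is the preference for one
    amino acid at one site.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        residual = sigmoid(X @ w + b) - y  # gradient of the loss w.r.t. the logit
        w -= lr * (X.T @ residual / len(y) + l2 * w)
        b -= lr * residual.mean()
    return w, b


def predict(X, w, b):
    return sigmoid(X @ w + b)


# Synthetic demonstration (assumed data, NOT measurements from the talk):
# fitness is generated from hidden site-wise preferences with no epistasis.
rng = np.random.default_rng(0)
n_sites = 4  # hypothetical; the abstract's ParD3 library varies ten residues
true_w = rng.normal(size=n_sites * len(AMINO_ACIDS))


def random_seqs(n):
    return ["".join(rng.choice(list(AMINO_ACIDS), size=n_sites)) for _ in range(n)]


train_seqs, test_seqs = random_seqs(500), random_seqs(200)
X_train, X_test = one_hot(train_seqs), one_hot(test_seqs)
y_train = sigmoid(X_train @ true_w)
y_test = sigmoid(X_test @ true_w)

w, b = fit_sitewise(X_train, y_train)
pearson_r = np.corrcoef(predict(X_test, w, b), y_test)[0, 1]
print(f"parameters: {w.size + 1}, held-out Pearson r: {pearson_r:.3f}")
```

With 20 amino acids per site, the parameter count grows linearly with the number of variable sites (20 weights per site plus an intercept), which is in line with the small model sizes quoted in the abstract. A real dataset with epistasis would generally fit less well than this idealized additive example.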

Comments • 1

  • @obsoletepowercorrupts, 2 months ago

    This could be used for Ribosomopathies _(abnormalities in rRNA genes, ribosomal component proteins)_ in anaemia and bone marrow, to pre-emptively reverse engineer them by computer. So one makes a new, custom, ribosome organelle computationally. Comparison across genus and species could be for hypothetical empirical emulation _(and as an aside, sexual selection for immune system traits could be postulated, for inbreeding and outbreeding or phenotype)._ Cancer susceptibility and mitigating factors such as premature cell death _(not solely mutation from RNA interference)_ could be predicted. Extension to sickle cell prediction would be a worthwhile investigation.
    For instance: Narrow down and isolate the Mutation Preference Inference from RNA interference by means of singular value decomposition used in _(including Gaussian)_ noise elimination with a logistic fit, also by linear regression and derive if it is in a sparse matrix or dense matrix. Express the molecular Schrödinger electron density _(Gaussian probability estimation, in a Rosenblatt-Parzen window treating the kernel as its hypercube)_ as kissing-spheres _(or sphere packing via combinatorial optimisation)._ For learning _(educationally but also for machine-learning),_ that could be done in R (cran-project) for better graphics of the polynomials but other than that, Python would be a similar enough language. OpenCL for heterogeneous low power computation _(low power implant medical devices for instance)_ might be an option although DLib and Eigen maths libraries for C++ would augment it.
    As an (extra) aside, discovery of the mechanisms of diploidisation in the (sans meiosis) parasexual cycle _(fungi and prokaryote, in mitosis)_ could be researched using singular value decomposition for theoretical soil emulation and fungi creation (or a plant) for antibiotic discovery. Then apply that to neofunctionalisation, subfunctionalisation and genome downsizing (post-polyploidisation diploidisation), for instance, as relevant to DNA repetition and gene deletion _(and extraneous gene copies with alleles in Eukaryota's taxonomic groups)._
    My comment has no hate in it and I do no harm. I am not appalled or afraid, boasting or envying or complaining... Just saying. Psalms23: Giving thanks and praise to the Lord and peace and love. Also, I'd say Matthew6.