Deciphering mutations in actionable genes by integrating structural and evolutionary epistatic features.

Authors Federica Luppino
University Technische Universität Dresden
Examination Date 2024-11-22
Open Access true
Print Publication Date 2024-11-22
Online Publication Date 2024-11-22
Abstract Despite the rapid advancement of sequencing technologies, and although the wide adoption of Whole Genome Sequencing (WGS) and Whole Exome Sequencing (WES) has increased the number of disease diagnoses (A. C. Lionel, et al. 2018; D. J. Stavropoulos, et al. 2016; J. C. Taylor, et al. 2015), most genetic variants remain without a clear interpretation. One of the main difficulties in assessing sequencing results is the abundance of Single Nucleotide Variants (SNVs), around 4 million, that each healthy individual carries. Nearly all of these mutations produce no phenotype, that is, they have a benign or neutral effect. Only a handful of those variants are potentially pathogenic, namely disease-causing. For this reason, computational Variant Effect Predictor (VEP) tools are used to prioritize variants worth investigating for medical consideration. Furthermore, evidence from computational tools is considered among the sources for variant effect assessment under the American College of Medical Genetics and Genomics (ACMG) and Association for Molecular Pathology (AMP) guidelines. In addition, such software tools can be recognized as medical devices according to the second article of the Medical Device Regulation (MDR) of the European Union (Regulation (EU) 2017/745). Building a computational tool that predicts variant pathogenicity with high accuracy might therefore have a direct impact on the healthcare system. Since 2001, more than 100 VEP tools have been developed. Yet their thresholds for classifying a variant as pathogenic are often set for high sensitivity, which results in a high false positive rate, namely the misclassification of benign variants as pathogenic (C. Cubuk, et al. 2021). 
During my PhD, I developed Deciphering Mutations in Actionable Genes (DeMAG), a supervised classifier for interpreting missense mutations, namely SNVs that alter the protein sequence, in the 59 actionable genes identified by the ACMG Secondary Findings (SF) v2.0 list (S. S. Kalia, et al. 2017). DeMAG is trained with a Gradient Boosting Machine (GBM) model that employs only 13 conservation-based and structural features derived from AlphaFold 3D models and manually curated Multiple Sequence Alignments (MSAs). DeMAG achieves the best performance on clinical data among popular VEP tools, balancing sensitivity and specificity and reaching the highest Matthews Correlation Coefficient (MCC). This improvement stems from the assembly of a balanced, high-quality training set and from the design of the partners score, a feature that captures epistasis both in the sequence and in the 3D space of the protein. Here, epistasis refers to residue co-evolution in the sequence and residue spatial proximity in the 3D structure of the protein. The feature is a probabilistic score, obtained with a mixture discriminant analysis, that predicts pathogenicity based on the phenotypic effect of co-evolving and spatially close residues. The partners score is a general framework for studying genotype and phenotype interactions. For example, such interactions might occur between hetero- or homoproteins forming a complex as tertiary structure, and genetic variants occurring at interfaces, already known to be disease-causing, might be enriched for the same phenotypic effect. The framework of the partners score might not be limited to protein sequence; for example, interactions in the 3D genome might reveal regions enriched with the same phenotypic effect. DeMAG was trained on only a small set of genes, yet without further training it generalizes well to an additional 257 genes that have enough clinical data. 
Because I did not manually curate MSAs for those new genes, I could observe that the partners score from protein 3D models seems necessary for reaching high performance, whereas the partners score obtained from long-range interactions, as derived from the co-evolution analysis, does not seem crucial for variant effect prediction. DeMAG is a supervised method designed specifically for clinical translation. It therefore focuses on clinically actionable genes and balances its accuracy between the pathogenic and benign classes, acknowledging the importance of minimizing both false negatives and false positives to avoid under- and overdiagnosis, which is critical for reducing health costs and patients' psychological burden. Unsupervised general VEPs are powerful tools for investigating the functional effect of genetic variants, as demonstrated by their higher correlation, compared with supervised tools, with data from Multiplexed Assay of Variant Effect (MAVE) and Deep Mutational Scanning (DMS) experiments. Nevertheless, for targeted clinical applications, I endorse the development of specialized tools that can leverage the existing wealth of data and knowledge to minimize prediction errors. To make DeMAG readily available, I developed a web application at https://demag.org/demag_app/ that provides predictions for all amino acid substitutions in the 59 genes and the additional 257 genes, together with the training and testing datasets. Moreover, the app displays all the features of DeMAG, highlighting the value annotated for the query mutation in relation to the distribution of each feature for the pathogenic and benign mutations in the training set. This provides more insight than a bare prediction label.
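The trade-off described in the abstract, where a threshold tuned for high sensitivity inflates the false positive rate, can be illustrated with a minimal Python sketch of sensitivity, specificity, and the MCC computed from a confusion matrix. The function name and the two sets of counts below are hypothetical and chosen only for illustration; they are not results from DeMAG or from any other tool.

```python
from math import sqrt


def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts.

    Here "positive" means pathogenic and "negative" means benign.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts on 100 pathogenic and 100 benign variants.
# A predictor whose threshold favours sensitivity (many false positives):
sens_a, spec_a, mcc_a = confusion_metrics(tp=95, fp=40, tn=60, fn=5)
# A predictor that balances the two error types:
sens_b, spec_b, mcc_b = confusion_metrics(tp=85, fp=15, tn=85, fn=15)

print(f"sensitive: sens={sens_a:.2f} spec={spec_a:.2f} MCC={mcc_a:.2f}")
print(f"balanced:  sens={sens_b:.2f} spec={spec_b:.2f} MCC={mcc_b:.2f}")
```

In this made-up example the sensitivity-biased predictor detects more pathogenic variants but has a lower MCC than the balanced one, because the MCC penalizes both false positives and false negatives.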
Affiliated With Tóth-Petróczy
Publication Status Published
Alternative Full Text URL https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-948126
Display Publisher Download Only false
Visible On MPI-CBG Website false
PDF Downloadable true
Created By thuem
Added Date 2025-02-11
Last Edited By thuem
Last Edited Date 2025-02-11 17:35:50.521
Library ID 8906
Entry Complete true
eDoc Compliant true
Include in Edoc Report true
In Pure false
Ready for eDoc Export true
Author Affiliations Complete false