| Abstract |
Background: Translation of mRNA into proteins, a process essential to life, is error-prone. Translation errors, called phenotypic mutations, can change the protein sequence from what is encoded in the genome. The most important types of phenotypic mutations are amino acid misincorporations, stop codon readthrough, and ribosomal frameshifting, the last of which is the focus of this thesis. Phenotypic mutations are stochastic: At a given position in the mRNA, a fraction of ribosomes makes the error, while the others continue with canonical translation. Producing these stochastic non-canonical proteins is generally deleterious, but in some cases, phenotypic mutations have evolved to happen at high rates and perform an adaptive function. Phenotypic mutations can thus be seen either as random errors that need to be avoided or mitigated, or as opportunities to change the meaning of mRNAs, diversify the proteome, and help evolve new functions. Understanding how phenotypic mutations affect organisms requires an overview of where and how frequently they occur. However, most research has focused on a few specific cases. We still lack a systematic understanding of which mRNA sequences cause errors, and how such error-prone sequences are distributed through natural coding sequences. Hypothesis and Aims: I hypothesized that phenotypic mutations are frequent enough to shape the proteome and play a role in protein evolution. To test this hypothesis, I needed to identify the genes and sequence positions where these errors occur, and with what probability. I further hypothesized that a small sequence context is sufficient to define phenotypic mutation rates. My project focused on ribosomal frameshifting as a phenotypic mutation that is potentially very deleterious, but adaptive in specific cases. Based on the hypotheses above, the first aim was to identify short mRNA sequences that by themselves cause frameshifting. 
The identified frameshift-stimulating sequences were the basis for the second aim: I screened prokaryotic genomes for effective frameshift sites and identified the frameshifted proteins they would produce. An investigation of the distribution of sites, as well as a functional analysis of the frameshift proteins, would reveal how ribosomal frameshifting contributes to protein evolution. Additionally, in two collaborative projects, I investigated amino acid misincorporations and stop codon readthrough. For both error types, we aimed to identify sequences that cause them, and then to assess the extent of errors in genomes. We also looked for evidence of genome adaptation to avoid, mitigate, or utilize these errors, to understand how phenotypic mutations change the proteome. Materials and Methods: To investigate amino acid misincorporations, we developed a bioinformatic pipeline and Python package, deTEL, to identify amino acid misincorporations in mass spectrometry data. We used it to re-process hundreds of public proteomics (mass spectrometry) datasets from E. coli and S. cerevisiae. From the identified misincorporations, we calculated misincorporation rates for codon and amino acid pairs. To find evidence of naturally occurring stop codon readthrough, I analyzed wild-type E. coli by mass spectrometry-based proteomics to identify sequences that would only be produced by readthrough. I also analyzed the distribution and usage of stop codons in the E. coli genome to assess genomic adaptation to this error type. To investigate frameshifting, I combined two approaches: First, I developed a computational model, called SLIPPERRS, that predicts frameshifting sites (10 nucleotides in length) from the mRNA sequence. SLIPPERRS predicts frameshifting by modeling tRNA arrival at the ribosome and codon-anticodon binding. 
Second, I designed a quantitative assay, based on a prokaryotic in vitro system, that measures frameshifting probabilities by two orthogonal readouts: fluorescence and quantitative mass spectrometry. Guided by model predictions, I measured the frameshift probabilities of around 160 10-mers. I then used the library of frameshift sites to analyze frameshifting in prokaryotic genomes. I screened the genomes of 200 Gammaproteobacteria species for effective frameshift sites. I predicted the hypothetical proteins produced by frameshifting at these sites and analyzed them in silico, using predictions of disorder, domain, and 3D structure. I compared products of efficient and inefficient frameshift sites to find evidence for or against genomic adaptation to frameshifting. Results: By re-analyzing publicly available proteomics data, we detected tens of thousands of amino acid misincorporation events in E. coli and S. cerevisiae. Error rates for different types of misincorporations range from 10⁻⁷ to 10⁻³. Error rates differ significantly between mass spectrometry studies and between species, highlighting the need for data aggregation and species-specific studies. Misincorporations are so frequent that 20% of protein molecules or more will contain an error, suggesting that misincorporations are a ubiquitous phenomenon. Using mass spectrometry-based proteomics, I looked for evidence of stop codon readthrough in wild-type E. coli and detected readthrough events in 15 genes. Readthrough is more frequent at lower growth temperatures, and in genes that use the UGA stop codon. In the E. coli genome, UGA is avoided where readthrough would have severe consequences, and often has ‘backup’ stop codons, indicating selection against deleterious readthrough. To investigate ribosomal frameshifting, I developed a mechanistic model and a quantitative assay. Using the assay, I identified dozens of previously uncharacterized 10-mers that cause efficient frameshifting. 
Further, I found codon patterns and sequence traits that are predictive of ribosomal frameshifting. The model identifies efficient frameshift (FS) sites with more than 70% accuracy, making it one of the first general models to predict frameshifting from the mRNA sequence. The bioinformatic study revealed that efficient FS sites are common in Gammaproteobacteria genomes: Approximately 100 genes per species contain an FS site. The proteins produced by these sites are shorter and less disordered than expected, indicating selection against the consequences of frameshifting. In about 40% of cases, the frameshift adds at least one ordered domain to the protein. I present one case where ribosomal frameshifting seems to be adaptive, as it ‘repairs’ a pseudogene destroyed by a genomic frameshift mutation. Conclusions: The projects on amino acid misincorporations, stop codon readthrough, and ribosomal frameshifting presented here show that phenotypic mutations are a common phenomenon in prokaryotes. They occur throughout the proteome at seemingly non-programmed, non-adaptive sites, at a rate and extent that is clearly detectable. It appears that genomes have evolved to avoid the deleterious consequences of phenotypic mutations. In the case of ribosomal frameshifting, however, organisms seem to make adaptive use of the error in at least some cases. These results show that phenotypic mutations are an important contributor to the regulation and evolution of the proteome. |