Computational Genomics / Bioinformatics
The genome revolution has produced an abundance of protein sequence data. Traditional homology-based computational methods make it possible to establish evolutionary relationships among large numbers of these proteins. Yet among any set of new protein sequences, say from the complete genome sequence of a newly sequenced organism, a significant fraction of the proteins cannot be assigned functions by traditional methods. A new sequence may have no recognizable homologs in other organisms, or it may have recognizable homologs whose cellular functions are themselves unknown.

The critical need to make some kind of functional inference for the vast numbers of proteins that could not be annotated by traditional homology methods led, beginning in 1999, to new ideas for inferring 'functional linkages' between proteins that are not related to each other by homology. These 'non-homology' or 'genomic context' methods included the 'Phylogenetic Profile' method and the 'Rosetta-Stone' method (both pioneered principally by Edward Marcotte and Matteo Pellegrini when they were postdocs with Eisenberg and Yeates), among others.

Subsequent work has aimed to extend those ideas. One recent extension of Phylogenetic Profiles, developed by Peter Bowers and Shawn Cokus, applies logic analysis to uncover proteins whose presence or absence across organisms is related to the presence or absence of two other proteins taken in logical combination (for example, a protein C that is present only in organisms where proteins A and B are both present). These kinds of higher-order relationships are expected to be abundant in the cell, but they are not detected by the original Phylogenetic Profile method, which looks for direct similarity between the profiles of just two proteins at a time.
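The difference between pairwise and logic-based profile comparison can be illustrated with a short sketch. It assumes binary presence (1) / absence (0) profiles across a set of sequenced genomes and scores relationships with an entropy-based uncertainty coefficient, U(C|X), the fraction of the uncertainty in protein C's profile that is explained by profile(s) X. The toy profiles, the small subset of logic functions, and the function names (`entropy`, `uncertainty`, `best_logic`) are hypothetical choices for illustration; this is not the published implementation, and the scoring used in the original papers may differ.

```python
from collections import Counter
from math import log2


def entropy(*profiles):
    """Shannon entropy (bits) of the joint distribution of one or more
    binary profiles, estimated from counts across genomes."""
    rows = list(zip(*profiles))  # one tuple of 0/1 states per genome
    n = len(rows)
    return -sum((k / n) * log2(k / n) for k in Counter(rows).values())


def uncertainty(c, *predictors):
    """Uncertainty coefficient U(C | predictors) = I(C; predictors) / H(C):
    the fraction of the entropy of C explained by the predictor profiles."""
    h_c = entropy(c)
    if h_c == 0:
        return 0.0  # a constant profile has nothing left to explain
    mutual_info = h_c + entropy(*predictors) - entropy(c, *predictors)
    return mutual_info / h_c


# Hypothetical subset of two-input logic functions, for illustration only.
LOGIC = {
    "A AND B": lambda a, b: a & b,
    "A OR B": lambda a, b: a | b,
    "A XOR B": lambda a, b: a ^ b,
    "A AND NOT B": lambda a, b: a & (1 - b),
    "NOT A AND B": lambda a, b: (1 - a) & b,
}


def best_logic(a, b, c):
    """Return the logic function of (A, B) whose output agrees with C in
    the most genomes, together with the number of agreeing genomes."""
    scores = {
        name: sum(f(x, y) == z for x, y, z in zip(a, b, c))
        for name, f in LOGIC.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


if __name__ == "__main__":
    # Hypothetical presence (1) / absence (0) profiles over eight genomes.
    a = (1, 1, 0, 0, 1, 1, 0, 0)
    b = (1, 0, 1, 0, 1, 0, 1, 0)
    c = (1, 0, 0, 0, 1, 0, 0, 0)  # present only where A and B both are

    # Pairwise comparisons, as in a two-proteins-at-a-time approach.
    print(f"U(C|A)   = {uncertainty(c, a):.2f}")
    print(f"U(C|B)   = {uncertainty(c, b):.2f}")
    # Joint comparison against the pair (A, B).
    print(f"U(C|A,B) = {uncertainty(c, a, b):.2f}")
    print(best_logic(a, b, c))
```

With these hypothetical profiles, neither A nor B alone explains C well (U is about 0.38 for each), but the pair explains it completely (U = 1.00), and `best_logic` reports that C agrees with A AND B in all eight genomes. A comparison that only looks at two profiles at a time would rank C as, at best, weakly linked to A or B.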

Our computational genomics work has touched on many other subjects as well: disulfide bonding in thermophiles, repetitive protein sequences, genomic encoding of unusual amino acids such as selenocysteine and pyrrolysine, detection of protein targeting sequences, and the function of bacterial microcompartments.

References:

  • Jorda J, Yeates TO. (2011). Widespread disulfide bonding in proteins from thermophilic archaea. Archaea. 2011. 2011:409156.
  • Fan C, Cheng S, Liu Y, Escobar CM, Crowley CS, Jefferson RE, Yeates TO, Bobik TA. (2010). Short N-terminal sequences package proteins into bacterial microcompartments. Proc. Natl. Acad. Sci. U.S.A. Apr 2010. 107(16):7509-14.
  • Sprinzak E, Cokus SJ, Yeates TO, Eisenberg D, Pellegrini M. (2009). Detecting coordinated regulation of multi-protein complexes using logic analysis of gene expression. BMC Syst Biol. 2009. 3:115.
  • Beeby M, Bobik TA, Yeates TO. (2009). Exploiting genomic patterns to discover new supramolecular protein assemblies. Protein Sci. Jan 2009. 18(1):69-79.
  • Chaudhuri BN, Yeates TO. (2005). A computational method to predict genetically encoded rare amino acids in proteins. Genome Biol. 2005. 6(9):R79.
  • Beeby M, O'Connor BD, Ryttersgaard C, Boutz DR, Perry LJ, Yeates TO. (2005). The genomics of disulfide bonding and protein stabilization in thermophiles. PLoS Biol. Sep 2005. 3(9):e309.
  • Bowers PM, O'Connor BD, Cokus SJ, Sprinzak E, Yeates TO, Eisenberg D. (2005). Utilizing logical relationships in genomic data to decipher cellular processes. FEBS J. Oct 2005. 272(20):5110-8.
  • Bowers PM, Pellegrini M, Thompson MJ, Fierro J, Yeates TO, Eisenberg D. (2004). Prolinks: a database of protein functional linkages derived from coevolution. Genome Biol. 2004. 5(5):R35.
  • O'Connor BD, Yeates TO. (2004). GDAP: a web tool for genome-wide protein disulfide bond prediction. Nucleic Acids Res. Jul 2004. 32(Web Server issue):W360-4.
  • Bowers PM, Cokus SJ, Eisenberg D, Yeates TO. (2004). Use of logic relationships to decipher protein network organization. Science. Dec 2004. 306(5705):2246-9.
  • Strong M, Graeber TG, Beeby M, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D. (2003). Visualization and interpretation of protein networks in Mycobacterium tuberculosis based on hierarchical clustering of genome-wide functional linkage maps. Nucleic Acids Res. Dec 2003. 31(24):7099-109.
  • Mallick P, Boutz DR, Eisenberg D, Yeates TO. (2002). Genomic evidence that the intracellular proteins of archaeal microbes contain disulfide bonds. Proc. Natl. Acad. Sci. U.S.A. Jul 2002. 99(15):9679-84.
  • Eisenberg D, Marcotte EM, Xenarios I, Yeates TO. (2000). Protein function in the post-genomic era. Nature. Jun 2000. 405(6788):823-6.
  • Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. (1999). Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl. Acad. Sci. U.S.A. Apr 1999. 96(8):4285-8.
  • Pellegrini M, Marcotte EM, Yeates TO. (1999). A fast algorithm for genome-wide analysis of proteins with repeated sequences. Proteins. Jun 1999. 35(4):440-6.
  • Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D. (1999). Detecting protein function and protein-protein interactions from genome sequences. Science. Jul 1999. 285(5428):751-3.
  • Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D. (1999). A census of protein repeats. J. Mol. Biol. Oct 1999. 293(1):151-60.
  • Pellegrini M, Yeates TO. (1999). Searching for frameshift evolutionary relationships between protein sequence families. Proteins. Nov 1999. 37(2):278-83.
  • Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D. (1999). A combined algorithm for genome-wide prediction of protein function. Nature. Nov 1999. 402(6757):83-6.

A diagram illustrating a method for predicting mechanisms of protein targeting (e.g. to bacterial microcompartments) by special N- or C-terminal sequence extensions. (Adapted from Fan et al. 2010)


A diagram illustrating the idea of logic analysis of phylogenetic profiles. (Adapted from Bowers et al. 2004)