Miscellaneous Projects and Collaborations

Project started: Aug 2002

This is basically a catchall for all of the smaller projects and collaborations that we have in the lab, which have not yet become worthy of their own thumbnail ;-) Because this category is in a state of constant flux, the best way to get more info on these projects is to contact Dan.

motif-x

http://motif-x.med.harvard.edu

Project started: Nov 2005

Proteins can be written as strings of letters representing the 20 amino acids. Thus, proteins, like paragraphs, can be subdivided into sentences (protein domains), words (protein motifs), and letters (amino acids). In 2005, we developed the motif-x algorithm (and corresponding web tool) to computationally extract overrepresented motifs from large-scale phosphorylation data sets, in the first attempt to discover kinase motifs using a substrate-driven, rather than kinase-driven, approach. The algorithm is an iterative strategy that builds successive motifs through comparison to a dynamic statistical background. Since its release, motif-x has become a standard proteomic analysis tool, with over 250 journal citations and over 25,000 web site visits from ~100 countries. motif-x has also proven itself to be a general motif extraction algorithm that can be used on any PTM data set, and can even be used to discover overrepresented patterns in FASTA-formatted protein or DNA sequences. More information on the motif-x algorithm and web tool can be obtained through the Schwartz & Gygi (2005) and Chou & Schwartz (2011) publications at right, and by visiting the motif-x web site at the URL listed above.
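
To give a feel for what "building a motif against a dynamic background" can mean, here is a toy Python sketch. It is not the published motif-x implementation: the function names, the binomial test, the pseudocount, the significance threshold and the toy sequences are all assumptions made for illustration. It builds a single motif greedily by fixing the most significant residue/position pair, then shrinking both the foreground and the background to the sequences that match it, and repeating.

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def build_motif(fg, bg, alpha, min_count):
    """Greedy single-motif sketch (hypothetical helper, not the motif-x code).

    fg, bg: lists of equal-width sequences aligned on a central modified residue.
    alpha: significance cutoff; min_count: minimum foreground occurrences.
    """
    width = len(fg[0])
    center = width // 2
    motif = ["."] * width
    motif[center] = fg[0][center]      # the central residue is fixed by design
    fixed = {center}
    while fg and bg:
        best = None                    # (p-value, position, residue)
        n = len(fg)
        for pos in range(width):
            if pos in fixed:
                continue
            for res in sorted({s[pos] for s in fg}):
                k = sum(s[pos] == res for s in fg)
                if k < min_count:
                    continue
                p = sum(s[pos] == res for s in bg) / len(bg)
                if p == 0:             # assumed pseudocount for unseen residues
                    p = 1 / (len(bg) + 1)
                pval = binom_sf(k, n, p)
                if pval < alpha and (best is None or pval < best[0]):
                    best = (pval, pos, res)
        if best is None:
            break                      # nothing significant is left to add
        _, pos, res = best
        motif[pos] = res
        fixed.add(pos)
        # The "dynamic" part: both sets shrink to sequences matching the motif.
        fg = [s for s in fg if s[pos] == res]
        bg = [s for s in bg if s[pos] == res]
    return "".join(motif)

# Toy data (invented): 7-mers centered on a serine.
FG = ["RRASPEL", "RRGSLKD", "RRTSAGV", "RRESMNQ",
      "RRLSQTA", "RRDSVHG", "AKKSTAE", "GTMSDWK"]
BG = ["RRLSGTA", "RKESPLD", "ARDSKVE", "GLTSEAK", "DEASLQN",
      "KTGSVDP", "LAQSTEG", "EDVSAKL", "TGPSDLV", "NVKSQGE"]

print(build_motif(FG, BG, alpha=0.05, min_count=2))  # prints RR.S...
```

The loose cutoff of 0.05 is only there because the toy data set is tiny; a real analysis would use far more sequences and a much stricter threshold, and would go on to search the remaining sequences for further motifs.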

scan-x

http://scan-x.med.harvard.edu

Project started: Oct 2008

Over the past decade, a staggering amount of post-translational modification (PTM) data has been added to the literature, largely due to technical advances in PTM enrichment and MS/MS instrumentation. In 2009 we published scan-x, a sister program to motif-x, which aims to harness the large-scale PTM data available in the literature to yield PTM predictions. Precomputed scan-x runs for the prediction of phosphorylation sites in yeast, fly, mouse and human proteins, and of acetylation sites in human proteins, are available on the scan-x web site. More information on the scan-x algorithm and web tool can be obtained through the Schwartz, Chou, and Church (2009) and Chou & Schwartz (2011) publications at right, and by visiting the scan-x web site at the URL listed above.

ProPeL

Project started: Aug 2010

The traditional and most popular means of determining kinase specificity motifs has been the use of combinatorial peptide library-based approaches. These strategies, however, have several notable limitations, including the need for large quantities of active recombinant kinase, the inability to use MS/MS technologies for peptide sequencing, and the high cost of peptide library synthesis. The Proteomic Peptide Library (ProPeL) project is aimed at alleviating these limitations by moving the kinase reaction from the inside of a test tube to the inside of living bacteria. In short, ProPeL uses the natively expressed proteome of living bacteria as an "in vivo" peptide library for an expressed eukaryotic kinase. After lysing the bacteria, digesting proteins with trypsin, and enriching for phosphorylated peptides, we identify the phosphorylation sites using MS/MS and visualize motifs using the pLogo software. These motifs can then be used to make further phosphorylation predictions. For more info on the ProPeL strategy see our initial proof of concept paper (Chou et al., PLOS ONE, 2012) at right.

virPTM

http://virptm.hms.harvard.edu

Project started: Aug 2010

Although there are numerous databases documenting post-translational modifications in a wide variety of organisms spanning all domains of life, we realized a couple of years ago that no such database existed for viruses - hence the viral post-translational modification (virPTM) database. At present the database only contains information on phosphorylation, but in the future it will also contain other PTMs. Additionally, the virPTM database is a repository of phosphorylation predictions carried out using scan-x. Our main idea for the virPTM project can be summarized (albeit simply) as follows: read viral proteomes -> make PTM predictions for a variety of enzymes -> uncover host enzymes that interact with the virus -> inhibit host enzymes -> prevent viral replication. For more info on the virPTM database see our publication in Science Signaling at right, and visit the current web site at the URL listed above.

pLogo

http://plogo.uconn.edu/

Project started: Jun 2011

Linear biological sequence motifs are often represented as consensus sequences with a single nucleotide or amino acid at each position (e.g., TATAAA or RRxS). Although easy to write on paper, these representations are typically oversimplifications of true motif profiles, as there often exist numerous functional motif instances that do not fit the consensus. To present a more comprehensive view of biological sequence motifs, Schneider and Stephens introduced the "sequence logo" in 1990. In the pLogo (or probability logo) project we extend the concept of the sequence logo through a new visualization scheme and interactive web framework in which residue/nucleotide heights are drawn relative to their statistical significance. We are constantly working on ways to make the pLogo visualization method better and the corresponding pLogo web tool more functional for those who use it. For more info, or to make your own pLogos, visit the pLogo web site at the URL listed above.
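
As a small illustration of what "height relative to statistical significance" could look like in practice, the Python sketch below scores one residue at one position as the log-odds of its binomial probability of being over- versus underrepresented. This is a sketch under stated assumptions rather than the pLogo web tool's code: the function names and the example numbers are invented, and the background frequency p is assumed to lie strictly between 0 and 1.

```python
from math import comb, log10

def binom_pmf(i, n, p):
    """P(X == i) for X ~ Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

def residue_height(k, n, p):
    """Signed significance score for one residue at one position.

    k: foreground sequences carrying the residue at this position
    n: total foreground sequences
    p: background frequency of the residue at this position (0 < p < 1)
    Positive values mean overrepresented, negative mean underrepresented.
    """
    over = sum(binom_pmf(i, n, p) for i in range(k, n + 1))   # P(X >= k)
    under = sum(binom_pmf(i, n, p) for i in range(0, k + 1))  # P(X <= k)
    return -log10(over / under)

# Invented example: a residue seen in 6 of 8 foreground sequences
# versus one seen in 0 of 8, both with a background frequency of 0.2.
print(round(residue_height(6, 8, 0.2), 2))
print(round(residue_height(0, 8, 0.2), 2))
```

With a score like this, a residue that appears about as often as the background predicts sits near zero, so only the residues that truly stand out get tall letters above or below the axis.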