Introduction and overview

TCUP estimates the taxonomic composition of a sample of microorganisms from peptides identified by mass spectrometry. The programs in the TCUP package are meant to be used as part of a larger proteotyping workflow like this:

Overview of how TCUP fits into the greater picture of the proteotyping workflow.

In the above picture, the components of TCUP are represented as the two boxes inside the dashed light green area in the lower center of the picture. Before TCUP can be used to estimate the taxonomic composition or detect expressed antibiotic resistance proteins in a sample, the following steps must be performed after the mass spectrometry:

  1. Match tandem mass spectra with peptide sequences (i.e. protein identification).
  2. Map the identified peptides to a database of reference genome sequences.

Step 1 can be performed using any mass spectrometry search engine, such as X!Tandem, Mascot, MyriMatch, Comet, or Tide. We recommend using the largest possible database of non-redundant protein sequences in this step, to maximize the similarity between the identified peptide sequences and the actual peptides in the sample. Step 2 can be performed using any sequence aligner/mapper that can map protein sequences to nucleotide references and produce results in blast8 tabular format. We recommend using BLAT (or pBLAT).
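
To make the blast8 tabular format mentioned in Step 2 more concrete, here is a minimal Python sketch that reads such output. It assumes the standard 12 tab-separated columns of BLAST tabular output (query, target, percent identity, alignment length, mismatches, gap opens, query start/end, target start/end, e-value, bit score). The names `Blast8Hit`, `parse_blast8` and `EXAMPLE_LINES` are hypothetical, and the example rows are invented for illustration; this is not how TCUP itself parses its input.

```python
from typing import Iterable, Iterator, NamedTuple


class Blast8Hit(NamedTuple):
    """One alignment row in blast8 tabular format (12 columns)."""
    query: str          # peptide identifier
    target: str         # reference sequence identifier
    identity: float     # percent identity
    length: int         # alignment length
    mismatches: int
    gap_opens: int
    query_start: int
    query_end: int
    target_start: int
    target_end: int
    evalue: float
    bitscore: float


def parse_blast8(lines: Iterable[str]) -> Iterator[Blast8Hit]:
    """Yield one Blast8Hit per non-empty, non-comment line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
        yield Blast8Hit(
            fields[0], fields[1], float(fields[2]),
            int(fields[3]), int(fields[4]), int(fields[5]),
            int(fields[6]), int(fields[7]), int(fields[8]), int(fields[9]),
            float(fields[10]), float(fields[11]),
        )


# Invented example rows (hypothetical peptide and reference names).
EXAMPLE_LINES = [
    "peptide_1\tcontig_A\t100.00\t12\t0\t0\t1\t12\t3001\t3036\t1.0e-05\t30.0",
    "peptide_2\tcontig_B\t91.67\t12\t1\t0\t1\t12\t450\t485\t2.0e-03\t24.5",
]

for hit in parse_blast8(EXAMPLE_LINES):
    print(hit.query, hit.target, hit.identity)
```

In practice the lines would come from the file written by the aligner, for example by passing an open file handle to `parse_blast8`.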

TCUP uses several reference data sources (the two main ones are depicted in the picture above) to estimate a sample’s taxonomic composition and to detect expressed antibiotic resistance proteins. More information on how to prepare these reference databases is available in Preparing databases for use with TCUP.