Running TCUP¶
Before running TCUP, ensure that the requisite databases have been created (see Preparing databases for use with TCUP). The input data to TCUP are mappings of peptides from against the reference genomes database you used to create the taxref DB (see Preparing databases for use with TCUP).
Note
IMPORTANT: The FASTA headers of the peptides need to contain the peptide length. It should be encoded at the end of the header as a single integer after an underscore character (“_”) at the end of the first space separated part of the header. E.g. the following header:
>peptidename_15 some other information
belongs to a peptide called peptidename that is 15 amino acids long.
The mappings of peptides to reference genome sequences must be in blast8 tabular format (without column headers or comments). We recommend using BLAT or pBLAT. On Windows, BLAST can be used.
Note
All examples of command invocations in this section are aimed towards Linux. Where notable differences exist, they will be pointed out.
BLAT/pBLAT¶
The peptides in the sample need to mapped to the reference genomes. We
recommend the use of BLAT for this, as it performs very well with translated
mapping to large genome sequences. It is important to tell BLAT to do
protein-to-translated-dna mapping with the -t=dnax -q=prot
arguments. We
have successfully used the following command line options with tandem mass
spectrometry data:
blat <ref_db.fasta> <sample.fasta> -t=dnax -q=prot -minScore=10 -stepSize=5 -tileSize=5 -minIdentity=90 -out=blast8 <outfilename.blast8>
For the antibiotic resistance detection, the high sensitivity afforded by the above settings are not required, also the mapping is now protein-to-protein. Instead of using the above command line for antibiotic resistance detection, we recommend using the default settings, like so:
blat <resfinder.fasta> <sample.fasta> -out=blast8 <outfilename.blast8>
BLAST¶
We do not recommend using BLAST, but if no other option is available, it can be done using the following settings for taxonomic composition estimation:
tblastn.exe -query <query> -db <db> -out <outfile> -outfmt 6
For antibiotic resistance detection, we need to do protein-to-protein mapping using the following command line:
blastp.exe -query <query> -db <db> -out <outfile> -outfmt 6
Note
TCUP has not yet been optimized for use with BLAST, and the results might be be unreliable if BLAST is used for peptide to genome mapping.
Taxonomic composition¶
A typical invocation might look something like this:
taxonomic_composition \
--taxref-db <TAXREF_DB> \
--annotation-db <ANNOTATION_DB> \
--write-xlsx <OUTPUT_XLSX_FILENAME> \
--output <OUTPUT_TXT_FILENAME>
<INPUT_BLAST8_FILENAME>
where <TAXREF_DB>
is the path to the taxonomy reference database file,
<ANNOTATION_DB>
is the path to the annotation database file,
<OUTPUT_XLSX_FILENAME>
is desired filename for the output in Excel format,
<OUTPUT_TXT_FILENAME>
is the desired filename for the output in text
format, and <INPUT_BLAST8_FILENAME>
is the filename of a file in BLAST8
format containing the mapping results of sample peptides against the reference
genomes.
Running taxonomic_composition --help
will show a full list of all available
options and their default values.
Antibiotic resistance¶
A typical invocation might look something like this:
antibiotic_resistance \
--resfinder <RESFINDER_DB> \
--output <OUTPUT_TXT_FILENAME> \
<INPUT_BLAST8_FILENAME>
where <RESFINDER_DB>
is the path to the ResFinder sqlite3 database file,
<OUTPUT_TXT_FILENAME>
is the desired filename for the text format output,
and <INPUT_BLAST8_FILENAME>
is the filename of a file in BLAST8 format
containing the mapping results of peptides against the ResFinder database.
Running antibiotic_resistance --help
will show a full list of all available
options and their default values.