| annotate.protein_id {specL} | R Documentation |
This function assigns protein identifiers to a list of tandem mass spectra that have a peptide sequence assigned.
annotate.protein_id(data, file = NULL, fasta = read.fasta(file = file,
as.string = TRUE, seqtype = "AA"), digestPattern = "(([RK])|(^)|(^M))")
data |
list of records containing mZ and peptide sequences. |
file |
file name of a FASTA file. |
fasta |
a fasta object as returned by the |
digestPattern |
a regex pattern which can be used by the |
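To illustrate what the default digestPattern expresses, here is a minimal base R sketch. It assumes the pattern is simply prepended to the peptide sequence before matching; that is an assumption made for illustration, not a statement about the package internals. The protein names and sequences are made up.

```r
# Default pattern from the usage section: the peptide must be preceded by
# R or K (tryptic cleavage site), or sit at the protein start, or follow
# an initial methionine.
digestPattern <- "(([RK])|(^)|(^M))"

# Hypothetical toy proteins, for illustration only.
proteins <- c(protA = "LGGNEQVTRAAAAKGAGSSEPVTGLDAK", # K precedes the peptide
              protB = "GAGSSEPVTGLDAKAAAAK",          # peptide at protein start
              protC = "AAGAGSSEPVTGLDAK",             # A precedes the peptide
              protD = "MGAGSSEPVTGLDAK")              # initial M precedes it
peptide <- "GAGSSEPVTGLDAK"

# Assumption: pattern and peptide are concatenated into one regex.
hits <- grep(paste0(digestPattern, peptide), proteins)
matched <- names(proteins)[hits]
print(matched)
```

Under this reading, protC contains the peptide as a substring but is not reported, because the residue in front of it is not a cleavage site.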
The protein sequences are read by the read.fasta function
of the seqinr package. The protein identifier is written
to the proteinInformation variable.
If the function is called on a multi-core architecture it uses mclapply.
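The following sketch shows how a per-spectrum annotation step could be spread over cores with mclapply from the parallel package. It is not the package's implementation: the toy records, the plain substring match, and the n_cores variable are assumptions chosen for illustration.

```r
library(parallel)  # ships with base R

# Hypothetical stand-ins for the 'fasta' and 'data' arguments.
fasta <- list(protA = "LGGNEQVTRAAAAKGAGSSEPVTGLDAK",
              protB = "MTPVISGGPYEYR")
psms <- list(list(peptideSequence = "GAGSSEPVTGLDAK"),
             list(peptideSequence = "TPVISGGPYEYR"),
             list(peptideSequence = "NOTPRESENT"))

# Forking is not available on Windows, so use a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each record is processed independently, which is what makes the
# loop easy to parallelise.
annotated <- mclapply(psms, function(x) {
  idx <- grep(x$peptideSequence, unlist(fasta), fixed = TRUE)
  x$proteinInformation <- paste(names(fasta)[idx], collapse = ",")
  x
}, mc.cores = n_cores)

ids <- vapply(annotated, function(x) x$proteinInformation, character(1))
print(ids)
```

A record whose peptide is not found ends up with an empty proteinInformation string, which matches the nchar(...) > 0 check used in the example at the end of this page.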
It is recommended to load the FASTA file prior to running
annotate.protein_id using
myFASTA <- read.fasta(file = file,
as.string = TRUE,
seqtype = "AA")
instead of providing the FASTA file name to the function.
It returns a list object.
Jonas Grossmann and Christian Panse, 2014
?read.fasta of the seqinr package.
http://www.uniprot.org/help/fasta-headers
# annotate.protein_id
# our Fasta sequence
# sep = "" keeps paste() from inserting spaces into the header and sequence
irtFASTAseq <- paste(">zz|ZZ_FGCZCont0260|",
"iRT_Protein_with_AAAAK_spacers concatenated Biognosys\n",
"LGGNEQVTRAAAAKGAGSSEPVTGLDAKAAAAKVEATFGVDESNAKAAAAKYILAGVENS",
"KAAAAKTPVISGGPYEYRAAAAKTPVITGAPYEYRAAAAKDGLDAASYYAPVRAAAAKAD",
"VTPADFSEWSKAAAAKGTFIIDPGGVIRAAAAKGTFIIDPAAVIRAAAAKLFLQFGAQGS",
"PFLK\n", sep = "")
# be realistic, do it from file
Tfile <- file()
cat(irtFASTAseq, file = Tfile)
# use read.fasta from seqinr
fasta.irtFASTAseq <- seqinr::read.fasta(Tfile, as.string = TRUE, seqtype = "AA")
close(Tfile)
# annotate with proteinID
# -> here we find all PSMs from the one proteinID above
peptideStd <- specL::annotate.protein_id(peptideStd,
fasta=fasta.irtFASTAseq)
# show indices of all PSMs that have a proteinInformation entry
which(unlist(lapply(peptideStd,
function(x){nchar(x$proteinInformation)>0})))