| findORFs {ORFik} | R Documentation |
Find all Open Reading Frames (ORFs) on the input sequences
in ONLY 5'- 3' direction (+), but within all three possible reading frames.
For each sequence of the input vector IRanges with START and
STOP positions (inclusive) will be returned as
IRangesList. Returned coordinates are relative to the
input sequences.
findORFs(seqs, startCodon = startDefinition(1), stopCodon = stopDefinition(1), longestORF = TRUE, minimumLength = 0)
seqs |
(DNAStringSet or character) DNA sequences to search for Open Reading Frames. Can be both uppercase or lowercase. Easiest call to get seqs if you want only regions from a fasta/fasta index pair is: seqs = ORFik:::txSeqsFromFa(grl, faFile), where grl is a GRanges/List of regions. |
startCodon |
(character) Possible START codons to search for. Check
|
stopCodon |
(character) Possible STOP codons to search for. Check
|
longestORF |
(logical) Default TRUE. Keep only the longest ORF per
unique (seqname, strand, stopcodon) combination, you can also use function
|
minimumLength |
(integer) Default is 0. Minimum length of ORF, without counting 3bp for START and STOP codons. For example minimumLength = 8 will result in size of ORFs to be at least START + 8*3 (bp) + STOP = 30 bases. Use this param to restrict search. |
If you want antisence strand too, do:
#positive strands
pos <- findORFs(seqs)
#negative strands (DNAStringSet only if character)
neg <- findORFs(reverseComplement(DNAStringSet(seqs)))
relist(c(GRanges(pos, strand = "+"), GRanges(neg, strand = "-")),
skeleton = merge(pos, neg))
(IRangesList) of ORFs locations by START and STOP sites grouped by input seqeunces. In a list of sequences, only the indices of the sequences that had ORFs will be returned, e.g. 3 sequences where only 1 and 3 has ORFs, will return size 2 IRangesList with names c("1", "3"). If there are a total of 0 ORFs, an empty IRangesList will be returned.
[findMapORFs()], [findORFsFasta()], [startDefinition()], [stopDefinition()]
Other findORFs: findMapORFs,
findORFsFasta,
startDefinition,
stopDefinition
findORFs("ATGTAA")
findORFs("ATGTTAA") # not in frame anymore
findORFs("ATGATGTAA") # two ORFs
findORFs("ATGATGTAA", longestORF = TRUE) # only longest of two above
findORFs(c("ATGTAA", "ATGATGTAA"))