gene_df |
Data object containing the genes
(see gene_input for options on how
the genes can be stored within the object).
Can be one of the following formats:
matrix : A sparse or dense matrix.
data.frame : A data.frame,
data.table, or tibble.
list : A list or character vector.
Genes, transcripts, proteins, SNPs, or genomic ranges
can be provided in any format
(HGNC, Ensembl, RefSeq, UniProt, etc.) and will be
automatically converted to gene symbols unless
specified otherwise with the ... arguments.
Note: If you set method="homologene", you
must either supply genes in gene symbol format (e.g. "Sox2")
OR set standardise_genes=TRUE.
|
gene_input |
Which aspect of gene_df to
get gene names from:
"rownames" : From row names of data.frame/matrix.
"colnames" : From column names of data.frame/matrix.
<column name> : From a column in gene_df,
e.g. "gene_names".
|
gene_output |
How to return genes.
Options include:
"rownames" : As row names of gene_df.
"colnames" : As column names of gene_df.
"columns" : As new columns "input_gene", "ortholog_gene"
(and "input_gene_standard" if standardise_genes=TRUE)
in gene_df.
"dict" : As a dictionary (named list) where the names
are input_gene and the values are ortholog_gene.
"dict_rev" : As a reversed dictionary (named list)
where the names are ortholog_gene and the values are input_gene.
|
standardise_genes |
If TRUE AND
gene_output="columns", a new column "input_gene_standard"
will be added to gene_df containing standardised HGNC symbols
identified by gorth.
|
input_species |
Name of the input species (e.g., "mouse","fly").
Use map_species to return a full list
of available species.
|
output_species |
Name of the output species (e.g. "human","chicken").
Use map_species to return a full list
of available species.
|
method |
R package to use for gene mapping:
"gprofiler" : Slower but more species and genes.
"homologene" : Faster but fewer species and genes.
"babelgene" : Faster but fewer species and genes.
Also gives consensus scores for each gene mapping based on
several different data sources.
|
drop_nonorths |
Drop genes that don't have an ortholog
in the output_species.
|
non121_strategy |
How to handle genes that don't have
1:1 mappings between input_species:output_species.
Options include:
"drop_both_species" or "dbs" or 1 :
Drop genes that have duplicate
mappings in either the input_species or output_species
(DEFAULT).
"drop_input_species" or "dis" or 2 :
Only drop genes that have duplicate
mappings in the input_species.
"drop_output_species" or "dos" or 3 :
Only drop genes that have duplicate
mappings in the output_species.
"keep_both_species" or "kbs" or 4 :
Keep all genes regardless of whether
they have duplicate mappings in either species.
"keep_popular" or "kp" or 5 :
Return only the most "popular" interspecies ortholog mappings.
This procedure tends to yield a greater number of returned genes
but at the cost of many of them not being true biological 1:1 orthologs.
"sum","mean","median","min" or "max" :
When gene_df is a matrix and gene_output="rownames",
these options will aggregate many-to-one gene mappings
(input_species-to-output_species)
after dropping any duplicate genes in the output_species.
|
mthreshold |
Maximum number of ortholog names per gene to show.
Passed to gorth.
Only used when method="gprofiler" (DEFAULT : Inf).
|
as_sparse |
Convert gene_df to a sparse matrix.
Only works if gene_df is one of the following classes:
matrix
Matrix
data.frame
data.table
tibble
If gene_df is a sparse matrix to begin with,
it will be returned as a sparse matrix
(so long as gene_output="rownames" or "colnames").
|
sort_rows |
Sort gene_df rows alphanumerically.
|
verbose |
Print messages.
|
... |
Additional arguments to be passed to
gorth or homologene.
NOTE: To return only the most "popular"
interspecies ortholog mappings,
supply mthreshold=1 here AND set method="gprofiler" above.
This procedure tends to yield a greater number of returned genes but at
the cost of many of them not being true biological 1:1 orthologs.
For more details, please see
here.
|
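
The non121_strategy options described above can be hard to picture, so here is a minimal base-R sketch on a toy mapping table. It is not the package's implementation: `orth_map`, `filter_non121`, and the gene names are hypothetical, and the sketch assumes that "duplicate mappings" in a species means a gene identifier from that species appears in more than one row of the mapping table. Only the four drop/keep options are covered; "keep_popular" and the aggregation options are left out. The last lines illustrate the "dict" and "dict_rev" shapes described under gene_output.

```r
# Toy ortholog map (hypothetical gene names, for illustration only).
# A1 and A2 both map to A   -> duplicate in the output species.
# C1 maps to both C and C2  -> duplicate in the input species.
orth_map <- data.frame(
  input_gene    = c("A1", "A2", "B1", "C1", "C1", "D1"),
  ortholog_gene = c("A",  "A",  "B",  "C",  "C2", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: filter a mapping table according to one strategy.
# ASSUMPTION: a gene has "duplicate mappings" if its identifier occurs
# in more than one row of the relevant column.
filter_non121 <- function(map, strategy = "drop_both_species") {
  # Values that occur more than once in a vector.
  dup_vals <- function(x) unique(x[duplicated(x)])
  dup_in  <- map$input_gene %in% dup_vals(map$input_gene)
  dup_out <- map$ortholog_gene %in% dup_vals(map$ortholog_gene)
  keep <- switch(strategy,
    drop_both_species   = !dup_in & !dup_out,
    drop_input_species  = !dup_in,
    drop_output_species = !dup_out,
    keep_both_species   = rep(TRUE, nrow(map)),
    stop("Unknown strategy: ", strategy)
  )
  map[keep, , drop = FALSE]
}

# Strict 1:1 mappings only.
one2one <- filter_non121(orth_map, "drop_both_species")

# "dict": names are input genes, values are ortholog genes.
dict <- stats::setNames(as.list(one2one$ortholog_gene), one2one$input_gene)
# "dict_rev": names are ortholog genes, values are input genes.
dict_rev <- stats::setNames(as.list(one2one$input_gene), one2one$ortholog_gene)
```

On this toy table, "drop_both_species" keeps only the B1 and D1 rows, while "drop_input_species" and "drop_output_species" each remove just one of the two problem groups. That is why the stricter default returns fewer genes.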