fit_model {ttgsea}                                            R Documentation
Description

From the result of GSEA, enrichment scores can be predicted for the unique tokens or words appearing in the names of gene sets by using deep learning. The function "text_token" tokenizes the text and the function "token_vector" encodes the tokens as integer sequences; the encoded sequences are then fed to the embedding layer of the model.
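The tokenize-encode-pad pipeline that the description refers to can be illustrated with the helper packages listed under See Also; the following is a minimal sketch of that pipeline, not the internal implementation of "text_token" and "token_vector" (the texts, sizes, and variable names here are illustrative):

library(text2vec)
library(keras)

texts <- c("metabolism of rna", "cell cycle mitotic")

# build an n-gram vocabulary and cap it at a maximum number of tokens
it <- itoken(texts, tokenizer = word_tokenizer)
vocab <- create_vocabulary(it, ngram = c(1L, 2L))
vocab <- prune_vocabulary(vocab, vocab_term_max = 1000)

# map each word to its integer index and pad every sequence to a fixed length
index <- setNames(seq_len(nrow(vocab)), vocab$term)
seqs <- lapply(strsplit(texts, " "), function(w) unname(index[w]))
x <- pad_sequences(seqs, maxlen = 30)
dim(x)  # one padded integer row per text, ready for an embedding layer

The See Also entry textstem::lemmatize_strings suggests that the packaged pipeline also lemmatizes the text before building the vocabulary.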
Usage

fit_model(gseaRes, text, score, model, ngram_min = 1, ngram_max = 2,
          num_tokens, length_seq, epochs, batch_size,
          use_generator = TRUE, ...)
Arguments

gseaRes
    a table of GSEA results, with one row per gene set and columns
    containing the text and the enrichment scores

text
    name of the column containing the text data

score
    name of the column containing the enrichment score

model
    deep learning model; the input dimension and input length of its
    embedding layer must be equal to "num_tokens" and "length_seq",
    respectively (see the model sketch after this table)

ngram_min
    minimum size of an n-gram (default: 1)

ngram_max
    maximum size of an n-gram (default: 2)

num_tokens
    maximum number of tokens; must be equal to the input dimension of
    "layer_embedding" in the "model"

length_seq
    length of the input sequences; must be equal to the input length of
    "layer_embedding" in the "model"

epochs
    number of training epochs

batch_size
    batch size

use_generator
    if TRUE, batches are produced by "sampling_generator" and the model
    is trained with "fit_generator"; otherwise "fit" is used without a
    generator (see the generator sketch after this table)

...
    additional arguments passed to "fit" or "fit_generator"
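The compatibility constraint on the "model" argument can be made concrete. The following is a minimal sketch, written against the keras R API, of a model whose embedding layer matches "num_tokens" and "length_seq"; it is an illustration rather than the packaged implementation, although the bi_gru() constructor used in the example below appears to build a similar bidirectional GRU architecture:

library(keras)

num_tokens <- 1000
embedding_dims <- 50
length_seq <- 30
num_units <- 32

model <- keras_model_sequential() %>%
  # input_dim must equal num_tokens and input_length must equal length_seq
  layer_embedding(input_dim = num_tokens,
                  output_dim = embedding_dims,
                  input_length = length_seq) %>%
  bidirectional(layer_gru(units = num_units)) %>%
  layer_dense(units = 1)

model %>% compile(loss = "mse", optimizer = "adam")

Likewise, when "use_generator" is TRUE the training data are streamed in batches through "sampling_generator"; that function is documented on its own help page, but the general keras pattern is a function that returns one batch per call, for example (a sketch, not the packaged function):

# returns a closure that keras calls repeatedly during training;
# each call yields one randomly sampled batch as list(inputs, targets)
make_sampling_generator <- function(x, y, batch_size) {
  function() {
    rows <- sample(nrow(x), batch_size)
    list(x[rows, , drop = FALSE], y[rows])
  }
}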
Value

model
    trained deep learning model

tokens
    information on the tokens produced during tokenization

token_pred
    predictions for every token; each row contains a token and its
    predicted enrichment score

token_gsea
    list containing, for each token, the GSEA results corresponding to
    that token

num_tokens
    maximum number of tokens

length_seq
    length of the input sequences
Author(s)

Dongmin Jung
See Also

keras::fit_generator, keras::layer_embedding, keras::pad_sequences,
textstem::lemmatize_strings, text2vec::create_vocabulary,
text2vec::prune_vocabulary
Examples

library(reticulate)
library(ttgsea)
if (keras::is_keras_available() && reticulate::py_available()) {
  library(fgsea)
  data(examplePathways)
  data(exampleRanks)
  # drop the numeric prefix from the pathway names and replace
  # underscores with spaces so the names can be tokenized as text
  names(examplePathways) <- gsub("_", " ",
                                 substr(names(examplePathways), 9, 1000))
  set.seed(1)
  fgseaRes <- fgsea(examplePathways, exampleRanks)

  num_tokens <- 1000
  length_seq <- 30
  batch_size <- 32
  embedding_dims <- 50
  num_units <- 32
  epochs <- 1

  # fit a bidirectional GRU on the pathway names, predicting the NES column
  ttgseaRes <- fit_model(fgseaRes, "pathway", "NES",
                         model = bi_gru(num_tokens,
                                        embedding_dims,
                                        length_seq,
                                        num_units),
                         num_tokens = num_tokens,
                         length_seq = length_seq,
                         epochs = epochs,
                         batch_size = batch_size,
                         use_generator = FALSE)
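  # inspect the components described under Value: the trained model and
  # the per-token predictions (one row per token with its predicted score)
  summary(ttgseaRes$model)
  head(ttgseaRes$token_pred)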
}