The ReactomeGSA package is a client to the web-based Reactome Analysis System. Essentially, it performs a gene set analysis using the latest version of the Reactome pathway database as a backend.
This vignette shows how the ReactomeGSA package can be used to perform a pathway analysis of cell clusters in single-cell RNA-sequencing data.
To cite this package, use
Griss J. ReactomeGSA, https://github.com/reactome/ReactomeGSA (2019)
The ReactomeGSA package can be directly installed from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
if (!require(ReactomeGSA))
    BiocManager::install("ReactomeGSA")
#> Loading required package: ReactomeGSA
# install the ReactomeGSA.data package for the example data
if (!require(ReactomeGSA.data))
    BiocManager::install("ReactomeGSA.data")
For more information, see https://bioconductor.org/install/.
As an example, we load single-cell RNA-sequencing data of B cells extracted from the dataset published by Jerby-Arnon et al. (Cell, 2018).
Note: This is not a complete Seurat object. To decrease the size, the object only contains gene expression values and cluster annotations.
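A minimal sketch of loading the example data. It assumes that the object is named jerby_b_cells (the name used in the analyse_sc_clusters call in this vignette) and that it ships as a dataset in the ReactomeGSA.data package. The requireNamespace guard is only there so the snippet does not fail when the data package is not installed.

```r
# Assumption: the example object is called jerby_b_cells and is provided
# as a dataset by the ReactomeGSA.data package.
if (requireNamespace("ReactomeGSA.data", quietly = TRUE)) {
  data("jerby_b_cells", package = "ReactomeGSA.data")
}

# TRUE if the object is now available in the workspace
data_loaded <- exists("jerby_b_cells")
data_loaded
```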
The pathway analysis comes at the very end of a scRNA-seq workflow. This means that quality control has already been performed, the data has been normalized, and the cells have already been clustered.
The ReactomeGSA package can now be used to get pathway-level expression values for every cell cluster. This is achieved by calculating the mean gene expression for every cluster and then submitting this data to a gene set variation analysis.
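The averaging step can be illustrated on a toy matrix. This is a sketch in base R with made-up values and hypothetical gene and cell names, not the package's internal implementation. It only shows what "mean gene expression for every cluster" means.

```r
# Toy expression matrix: 3 genes x 6 cells (values are made up)
expr <- matrix(c(1, 2, 3, 4, 5, 6,
                 2, 2, 2, 8, 8, 8,
                 0, 1, 0, 1, 0, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               paste0("cell", 1:6)))

# Cluster assignment of every cell (hypothetical)
clusters <- factor(c("1", "1", "1", "2", "2", "2"))

# Mean expression of every gene within each cluster
cluster_means <- sapply(levels(clusters), function(cl) {
  rowMeans(expr[, clusters == cl, drop = FALSE])
})
colnames(cluster_means) <- paste0("Cluster.", levels(clusters))
cluster_means
```

The result has one row per gene and one column per cluster. A matrix of this shape is what is then passed on to the gene set analysis.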
All of this is wrapped in the single analyse_sc_clusters function.
library(ReactomeGSA)
gsva_result <- analyse_sc_clusters(jerby_b_cells, verbose = TRUE)
#> Calculating average cluster expression...
#> Converting expression data to string... (This may take a moment)
#> Conversion complete
#> Submitting request to Reactome API...
#> Compressing request data...
#> Reactome Analysis submitted succesfully
#> Converting dataset Seurat...
#> Mapping identifiers...
#> Performing gene set analysis using ssGSEA
#> Analysing dataset 'Seurat' using ssGSEA
#> Retrieving result...
The resulting object is a standard ReactomeAnalysisResult object.
gsva_result
#> ReactomeAnalysisResult object
#> Reactome Release: 72
#> Results:
#> - Seurat:
#> 1720 pathways
#> 12305 fold changes for genes
#> No Reactome visualizations available
#> ReactomeAnalysisResult
pathways returns the pathway-level expression values per cell cluster:
pathway_expression <- pathways(gsva_result)
# simplify the column names by removing the default dataset identifier
colnames(pathway_expression) <- gsub("\\.Seurat", "", colnames(pathway_expression))
pathway_expression[1:3,]
#> Name Cluster.1 Cluster.10 Cluster.11
#> R-HSA-1059683 Interleukin-6 signaling 0.09545353 0.07958761 0.1330523
#> R-HSA-109606 Intrinsic Pathway for Apoptosis 0.10809265 0.10399420 0.1169451
#> R-HSA-109703 PKB-mediated events 0.17906778 0.11160170 0.1198267
#> Cluster.12 Cluster.13 Cluster.2 Cluster.3 Cluster.4 Cluster.5
#> R-HSA-1059683 0.09148179 0.09802880 0.1029586 0.09398238 0.1051934 0.09402404
#> R-HSA-109606 0.11811004 0.13732430 0.1051371 0.10962570 0.1131350 0.10520629
#> R-HSA-109703 0.14712571 0.09568951 0.1217289 0.12551990 0.1112811 0.10844562
#> Cluster.6 Cluster.7 Cluster.8 Cluster.9
#> R-HSA-1059683 0.08332742 0.1021174 0.1265275 0.09876770
#> R-HSA-109606 0.10518480 0.1160139 0.1196084 0.11431412
#> R-HSA-109703 0.17851610 0.1699756 0.1689482 0.05316363
A simple approach to find the most relevant pathways is to assess the maximum difference in expression for every pathway:
# find the pathways with the largest difference in expression between clusters
max_difference <- do.call(rbind, apply(pathway_expression, 1, function(row) {
    # the first column holds the pathway name, the remaining ones the per-cluster values
    values <- as.numeric(row[2:length(row)])
    return(data.frame(name = row[1], min = min(values), max = max(values)))
}))
max_difference$diff <- max_difference$max - max_difference$min
# sort by the difference, largest first
max_difference <- max_difference[order(max_difference$diff, decreasing = TRUE), ]
head(max_difference)
#> name min max
#> R-HSA-389542 NADPH regeneration -0.4229458 0.4292007
#> R-HSA-8964540 Alanine metabolism -0.5051647 0.2773550
#> R-HSA-140180 COX reactions -0.4743840 0.2573558
#> R-HSA-5263617 Metabolism of ingested MeSeO2H into MeSeH -0.1684564 0.4948353
#> R-HSA-9636003 NEIL3-mediated resolution of ICLs -0.4970992 0.1125057
#> R-HSA-3248023 Regulation by TREX1 -0.0969508 0.4516689
#> diff
#> R-HSA-389542 0.8521465
#> R-HSA-8964540 0.7825197
#> R-HSA-140180 0.7317398
#> R-HSA-5263617 0.6632917
#> R-HSA-9636003 0.6096049
#> R-HSA-3248023 0.5486197
The ReactomeGSA package contains two functions to visualize these pathway results. The first simply plots the expression for a selected pathway:
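A hedged sketch of such a call, picking the pathway with the largest difference found above. It assumes that gsva_result and max_difference exist from the previous steps, and that the plotting function is plot_gsva_pathway with a pathway_id argument. The guard is only there so that the snippet does nothing when those objects are missing.

```r
# Assumes gsva_result and max_difference exist from the steps above, and that
# plot_gsva_pathway() takes the result object plus a pathway_id argument.
can_plot <- requireNamespace("ReactomeGSA", quietly = TRUE) &&
  exists("gsva_result") && exists("max_difference")

if (can_plot) {
  # row names of max_difference are the Reactome pathway identifiers
  top_pathway <- rownames(max_difference)[1]
  print(ReactomeGSA::plot_gsva_pathway(gsva_result, pathway_id = top_pathway))
}
```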
For a better overview, the expression of multiple pathways can be shown as a heatmap using the heatmap.2 function from gplots:
# Additional parameters are passed directly to gplots' heatmap.2 function
plot_gsva_heatmap(gsva_result, max_pathways = 15, margins = c(6,20))
The plot_gsva_heatmap function can also be used to display only specific pathways:
# limit to selected B cell related pathways
relevant_pathways <- c("R-HSA-983170", "R-HSA-388841", "R-HSA-2132295", "R-HSA-983705", "R-HSA-5690714")
plot_gsva_heatmap(gsva_result,
                  pathway_ids = relevant_pathways, # limit to these pathways
                  margins = c(6,30), # adapt the figure margins in heatmap.2
                  dendrogram = "col", # only plot column dendrogram
                  scale = "row", # scale for each pathway
                  key = FALSE, # don't display the color key
                  lwid = c(0.1,4)) # remove the white space on the left
This analysis shows that cluster 8 has a marked up-regulation of B cell receptor signalling, which is linked to co-stimulation by the CD28 family. Additionally, there is a gradient among the clusters with respect to genes related to antigen presentation.
Therefore, we are able to further classify the observed B cell subtypes based on their pathway activity.
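The scale = "row" argument used in the heatmap call standardizes every pathway across the clusters, so that pathways with very different absolute levels become comparable. The effect can be sketched in base R on a made-up matrix with hypothetical pathway names. It is an illustration of row scaling in general, not of the package's code.

```r
# Made-up pathway-by-cluster values, for illustration only
m <- matrix(c(0.10, 0.20, 0.30,
              1.00, 3.00, 5.00),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("pathwayA", "pathwayB"),
                            paste0("Cluster.", 1:3)))

# Row-wise z-scores: subtract each row's mean and divide by its standard deviation
row_scaled <- t(scale(t(m)))
row_scaled
```

Both rows end up with the same scaled profile, although their raw values differ by an order of magnitude. This is why the heatmap highlights relative differences between clusters rather than absolute pathway levels.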
The pathway-level expression analysis can also be used to run a Principal Component Analysis on the samples. This is simplified through the function plot_gsva_pca:
In this analysis, cluster 11 is a clear outlier from the other B cell subtypes and therefore might be prioritised for further evaluation.
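The idea behind a PCA on pathway-level values can be sketched with base R's prcomp. The example uses only the three pathways printed earlier in this vignette, so its result will differ from the full analysis. It is a generic illustration under that assumption and does not reproduce how plot_gsva_pca is implemented.

```r
# Pathway-level values of the three pathways printed earlier in this vignette
# (rows: pathways, columns: clusters)
clusters <- paste0("Cluster.", c(1, 10, 11, 12, 13, 2, 3, 4, 5, 6, 7, 8, 9))
pathway_values <- rbind(
  "R-HSA-1059683" = c(0.09545353, 0.07958761, 0.1330523, 0.09148179,
                      0.09802880, 0.1029586, 0.09398238, 0.1051934,
                      0.09402404, 0.08332742, 0.1021174, 0.1265275,
                      0.09876770),
  "R-HSA-109606"  = c(0.10809265, 0.10399420, 0.1169451, 0.11811004,
                      0.13732430, 0.1051371, 0.10962570, 0.1131350,
                      0.10520629, 0.10518480, 0.1160139, 0.1196084,
                      0.11431412),
  "R-HSA-109703"  = c(0.17906778, 0.11160170, 0.1198267, 0.14712571,
                      0.09568951, 0.1217289, 0.12551990, 0.1112811,
                      0.10844562, 0.17851610, 0.1699756, 0.1689482,
                      0.05316363)
)
colnames(pathway_values) <- clusters

# PCA with clusters as observations and pathways as variables
pca <- prcomp(t(pathway_values), center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
variance_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(variance_explained, 3)

# Coordinates of the first clusters on the first two components
head(pca$x[, 1:2])

# With the package loaded, the corresponding call is presumably:
# plot_gsva_pca(gsva_result)
```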
sessionInfo()
#> R version 4.0.0 alpha (2020-03-31 r78116)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows Server 2012 R2 x64 (build 9600)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ReactomeGSA.data_1.1.1 Seurat_3.1.4 edgeR_3.29.1
#> [4] limma_3.43.6 ReactomeGSA_1.1.3
#>
#> loaded via a namespace (and not attached):
#> [1] TH.data_1.0-10 Rtsne_0.15 colorspace_1.4-1
#> [4] ellipsis_0.3.0 ggridges_0.5.2 farver_2.0.3
#> [7] leiden_0.3.3 listenv_0.8.0 npsurv_0.4-0
#> [10] ggrepel_0.8.2 fansi_0.4.1 mvtnorm_1.1-0
#> [13] codetools_0.2-16 splines_4.0.0 mnormt_1.5-6
#> [16] lsei_1.2-0 knitr_1.28 TFisher_0.2.0
#> [19] jsonlite_1.6.1 ica_1.0-2 cluster_2.1.0
#> [22] png_0.1-7 uwot_0.1.8 sctransform_0.2.1
#> [25] BiocManager_1.30.10 compiler_4.0.0 httr_1.4.1
#> [28] assertthat_0.2.1 Matrix_1.2-18 lazyeval_0.2.2
#> [31] cli_2.0.2 prettyunits_1.1.1 htmltools_0.4.0
#> [34] tools_4.0.0 rsvd_1.0.3 igraph_1.2.5
#> [37] gtable_0.3.0 glue_1.4.0 reshape2_1.4.4
#> [40] RANN_2.6.1 dplyr_0.8.5 rappdirs_0.3.1
#> [43] Rcpp_1.0.4.6 Biobase_2.47.3 vctrs_0.2.4
#> [46] multtest_2.43.1 gdata_2.18.0 ape_5.3
#> [49] nlme_3.1-147 gbRd_0.4-11 lmtest_0.9-37
#> [52] xfun_0.13 stringr_1.4.0 globals_0.12.5
#> [55] lifecycle_0.2.0 irlba_2.3.3 gtools_3.8.2
#> [58] future_1.16.0 MASS_7.3-51.5 zoo_1.8-7
#> [61] scales_1.1.0 hms_0.5.3 parallel_4.0.0
#> [64] sandwich_2.5-1 RColorBrewer_1.1-2 curl_4.3
#> [67] yaml_2.2.1 gridExtra_2.3 reticulate_1.15
#> [70] pbapply_1.4-2 ggplot2_3.3.0 stringi_1.4.6
#> [73] mutoss_0.1-12 plotrix_3.7-7 caTools_1.18.0
#> [76] BiocGenerics_0.33.3 bibtex_0.4.2.2 Rdpack_0.11-1
#> [79] rlang_0.4.5 pkgconfig_2.0.3 bitops_1.0-6
#> [82] evaluate_0.14 lattice_0.20-41 ROCR_1.0-7
#> [85] purrr_0.3.3 labeling_0.3 patchwork_1.0.0
#> [88] htmlwidgets_1.5.1 cowplot_1.0.0 tidyselect_1.0.0
#> [91] RcppAnnoy_0.0.16 plyr_1.8.6 magrittr_1.5
#> [94] R6_2.4.1 gplots_3.0.3 multcomp_1.4-13
#> [97] pillar_1.4.3 sn_1.6-1 fitdistrplus_1.0-14
#> [100] survival_3.1-12 tsne_0.1-3 tibble_3.0.0
#> [103] future.apply_1.4.0 crayon_1.3.4 KernSmooth_2.23-16
#> [106] plotly_4.9.2.1 rmarkdown_2.1 progress_1.2.2
#> [109] locfit_1.5-9.4 grid_4.0.0 data.table_1.12.8
#> [112] metap_1.3 digest_0.6.25 tidyr_1.0.2
#> [115] numDeriv_2016.8-1.1 stats4_4.0.0 munsell_0.5.0
#> [118] viridisLite_0.3.0