Title: | Non-Negative Matrix Factorization for Single-Cell Omics |
---|---|
Description: | A collection of methods to extract gene programs from single-cell gene expression data using non-negative matrix factorization (NMF). 'GeneNMF' contains functions to directly interact with the 'Seurat' toolkit and derive interpretable gene program signatures. |
Authors: | Massimo Andreatta [aut, cre] , Santiago Carmona [aut] |
Maintainer: | Massimo Andreatta <[email protected]> |
License: | GPL-3 |
Version: | 0.6.2 |
Built: | 2024-11-11 17:16:50 UTC |
Source: | https://github.com/carmonalab/genenmf |
Remove meta-programs from the GeneNMF results. This can be desirable if one or more MPs are redundant or have poor metrics (e.g. low sample coverage or low silhouette score).
dropMetaPrograms(mp.res, dropMP = NULL)
dropMetaPrograms(mp.res, dropMP = NULL)
mp.res |
The meta-programs object generated by |
dropMP |
A vector with the names of the MPs to be dropped |
The input meta-program object, minus the dropped MPs
library(Seurat) data(sampleObj) geneNMF_programs <- multiNMF(list(sampleObj), k=5) geneNMF_metaprograms <- getMetaPrograms(geneNMF_programs, nMP=3) geneNMF_metaprograms_filtered <- dropMetaPrograms(geneNMF_metaprograms, dropMP = c("MP2"))
library(Seurat) data(sampleObj) geneNMF_programs <- multiNMF(list(sampleObj), k=5) geneNMF_metaprograms <- getMetaPrograms(geneNMF_programs, nMP=3) geneNMF_metaprograms_filtered <- dropMetaPrograms(geneNMF_metaprograms, dropMP = c("MP2"))
Select highly variable genes (HVG) from an expression matrix. Genes from a blocklist (e.g. cell cycling genes, mitochondrial genes) can be excluded from the list of variable genes, as well as genes with very low or very high average expression
findVariableFeatures_wfilters( obj, nfeatures = 2000, genesBlockList = NULL, min.exp = 0.01, max.exp = 3 )
findVariableFeatures_wfilters( obj, nfeatures = 2000, genesBlockList = NULL, min.exp = 0.01, max.exp = 3 )
obj |
A Seurat object containing an expression matrix |
nfeatures |
Number of top HVG to be returned |
genesBlockList |
Optionally takes a vector or list of vectors of gene names. These genes will be ignored for HVG detection. This is useful to mitigate effect of genes associated with technical artifacts or batch effects (e.g. mitochondrial, heat-shock response). If set to 'NULL' no genes will be excluded |
min.exp |
Minimum average normalized expression for HVG. If lower, the gene will be excluded |
max.exp |
Maximum average normalized expression for HVG. If higher, the gene will be excluded |
Returns the input Seurat object obj
with the calculated highly
variable features accessible through VariableFeatures(obj)
data(sampleObj) sampleObj <- findVariableFeatures_wfilters(sampleObj, nfeatures=100)
data(sampleObj) sampleObj <- findVariableFeatures_wfilters(sampleObj, nfeatures=100)
Get the gene expression matrix from a Seurat object, optionally centered and/or subset on highly variable genes
getDataMatrix( obj, assay = "RNA", slot = "data", hvg = NULL, center = FALSE, scale = FALSE, non_negative = TRUE )
getDataMatrix( obj, assay = "RNA", slot = "data", hvg = NULL, center = FALSE, scale = FALSE, non_negative = TRUE )
obj |
Seurat object |
assay |
Get data matrix from this assay |
slot |
Get data matrix from this slot (=layer) |
hvg |
List of variable genes to subset the matrix. If NULL, uses all genes |
center |
Whether to center the data matrix |
scale |
Whether to scale the data matrix |
non_negative |
Enforce non-negative values for NMF |
Returns a sparse data matrix (cells per genes), subset according to the given parameters
data(sampleObj) matrix <- getDataMatrix(sampleObj)
data(sampleObj) matrix <- getDataMatrix(sampleObj)
Run it over a list of NMF models obtained using multiNMF
; it will
determine gene programs that are consistently observed across samples
and values of k.
getMetaPrograms( nmf.res, nMP = 10, specificity.weight = 5, weight.explained = 0.5, max.genes = 200, metric = c("cosine", "jaccard"), hclust.method = "ward.D2", min.confidence = 0.5, remove.empty = TRUE )
getMetaPrograms( nmf.res, nMP = 10, specificity.weight = 5, weight.explained = 0.5, max.genes = 200, metric = c("cosine", "jaccard"), hclust.method = "ward.D2", min.confidence = 0.5, remove.empty = TRUE )
nmf.res |
A list of NMF models obtained from |
nMP |
Total number of meta-programs |
specificity.weight |
A parameter controlling how specific gene should be for each program. 'specificity.weight=0' no constraint on specificity, and positive values impose increasing specificity. |
weight.explained |
Fraction of NMF weights explained by selected genes. For example if weight.explained=0.5, all genes that together account for 50% of NMF weights are used to return program signatures. |
max.genes |
Max number of genes for each programs |
metric |
Metric to calculate pairwise similarity between programs |
hclust.method |
Method to build similarity tree between individual programs |
min.confidence |
Percentage of programs in which a gene is seen (out of programs in the corresponding program tree branch/cluster), to be retained in the consensus metaprograms |
remove.empty |
Whether to remove meta-programs with no genes above confidence threshold |
Returns a list with i) 'metaprograms.genes' top genes for each meta-program; ii) 'metaprograms.metrics' dataframe with meta-programs statistics: a) freq. of samples where the MP is present, b) average silhouette width, c) mean similarity (cosine or Jaccard), d) number of genes in MP, e) number of gene programs in MP; iii) 'programs.similarity': matrix of similarities (Jaccard or cosine) between meta-programs; iv) 'programs.tree': hierarchical clustering of meta-programs (hclust tree); v) 'programs.clusters': meta-program identity for each program
library(Seurat) data(sampleObj) geneNMF_programs <- multiNMF(list(sampleObj), k=5) geneNMF_metaprograms <- getMetaPrograms(geneNMF_programs, nMP=3)
library(Seurat) data(sampleObj) geneNMF_programs <- multiNMF(list(sampleObj), k=5) geneNMF_metaprograms <- getMetaPrograms(geneNMF_programs, nMP=3)
Run it over a list of NMF models obtained using multiNMF()
getNMFgenes( nmf.res, specificity.weight = 5, weight.explained = 0.5, max.genes = 200 )
getNMFgenes( nmf.res, specificity.weight = 5, weight.explained = 0.5, max.genes = 200 )
nmf.res |
A list of NMF models obtained using |
specificity.weight |
A parameter controlling how specific gene should be for each program. 'specificity.weight=0' no constraint on specificity, and positive values impose increasing specificity. |
weight.explained |
Fraction of NMF weights explained by selected genes. For example if weight.explained=0.5, all genes that together account for 50% of NMF weights are used to return program signatures. |
max.genes |
Max number of genes for each program |
Returns a list of top genes for each gene program found
by multiNMF()
library(Seurat) data(sampleObj) geneNMF_programs <- multiNMF(list(sampleObj), k=5) geneNMF_genes <- getNMFgenes(geneNMF_programs)
library(Seurat) data(sampleObj) geneNMF_programs <- multiNMF(list(sampleObj), k=5) geneNMF_genes <- getNMFgenes(geneNMF_programs)
Given a list of Seurat objects, run non-negative matrix factorization on each sample individually, over a range of target NMF components (k).
multiNMF( obj.list, assay = "RNA", slot = "data", k = 5:6, hvg = NULL, nfeatures = 2000, L1 = c(0, 0), min.exp = 0.01, max.exp = 3, center = FALSE, scale = FALSE, min.cells.per.sample = 10, hvg.blocklist = NULL, seed = 123 )
multiNMF( obj.list, assay = "RNA", slot = "data", k = 5:6, hvg = NULL, nfeatures = 2000, L1 = c(0, 0), min.exp = 0.01, max.exp = 3, center = FALSE, scale = FALSE, min.cells.per.sample = 10, hvg.blocklist = NULL, seed = 123 )
obj.list |
A list of Seurat objects |
assay |
Get data matrix from this assay |
slot |
Get data matrix from this slot (=layer) |
k |
Number of target components for NMF (can be a vector) |
hvg |
List of pre-calculated variable genes to subset the matrix. If hvg=NULL it calculates them automatically |
nfeatures |
Number of HVG, if calculate_hvg=TRUE |
L1 |
L1 regularization term for NMF |
min.exp |
Minimum average log-expression value for retaining genes |
max.exp |
Maximum average log-expression value for retaining genes |
center |
Whether to center the data matrix |
scale |
Whether to scale the data matrix |
min.cells.per.sample |
Minimum numer of cells per sample (smaller samples will be ignored) |
hvg.blocklist |
Optionally takes a vector or list of vectors of gene names. These genes will be ignored for HVG detection. This is useful to mitigateeffect of genes associated with technical artifacts and batch effects (e.g. mitochondrial), and to exclude TCR and BCR adaptive immune(clone-specific) receptors. If set to 'NULL' no genes will be excluded |
seed |
Random seed |
Returns a list of NMF programs, one for each sample and for each
value of 'k'. The format of each program in the list follosw the
structure of nmf
factorization models.
library(Seurat) data(sampleObj) geneNMF_programs <- multiNMF(list(sampleObj), k=5)
library(Seurat) data(sampleObj) geneNMF_programs <- multiNMF(list(sampleObj), k=5)
Given a list of Seurat objects, run non-negative PCA factorization on each sample individually.
multiPCA( obj.list, assay = "RNA", slot = "data", k = 4:5, hvg = NULL, nfeatures = 500, min.exp = 0.01, max.exp = 3, min.cells.per.sample = 10, center = FALSE, scale = FALSE, hvg.blocklist = NULL, seed = 123 )
multiPCA( obj.list, assay = "RNA", slot = "data", k = 4:5, hvg = NULL, nfeatures = 500, min.exp = 0.01, max.exp = 3, min.cells.per.sample = 10, center = FALSE, scale = FALSE, hvg.blocklist = NULL, seed = 123 )
obj.list |
A list of Seurat objects |
assay |
Get data matrix from this assay |
slot |
Get data matrix from this slot (=layer) |
k |
Number of target components for PCA |
hvg |
List of pre-calculated variable genes to subset the matrix. If hvg=NULL it calculates them automatically |
nfeatures |
Number of HVG, if calculate_hvg=TRUE |
min.exp |
Minimum average log-expression value for retaining genes |
max.exp |
Maximum average log-expression value for retaining genes |
min.cells.per.sample |
Minimum numer of cells per sample (smaller samples will be ignored) |
center |
Whether to center the data matrix |
scale |
Whether to scale the data matrix |
hvg.blocklist |
Optionally takes a vector or list of vectors of gene names. These genes will be ignored for HVG detection. This is useful to mitigateeffect of genes associated with technical artifacts and batch effects (e.g. mitochondrial), and to exclude TCR and BCR adaptive immune(clone-specific) receptors. If set to 'NULL' no genes will be excluded |
seed |
Random seed |
Returns a list of non-negative PCA programs, one for each sample.
The format of each program in the list follows the
structure of nmf
factorization models.
library(Seurat) data(sampleObj) geneNMF_programs <- multiPCA(list(sampleObj), k=5)
library(Seurat) data(sampleObj) geneNMF_programs <- multiPCA(list(sampleObj), k=5)
Generates a clustered heatmap for meta-program similarities (by Jaccard
index or Cosine similarity). This function is intended to be run on the object
generated by getMetaPrograms
, which contains a pre-calculated
tree of pairwise similarities between clusters (as a 'hclust' object).
plotMetaPrograms( mp.res, similarity.cutoff = c(0, 1), scale = "none", downsample = 1000, showtree = TRUE, palette = viridis(100, option = "A", direction = -1), annotation_colors = NULL, main = "Clustered Heatmap", show_rownames = FALSE, show_colnames = FALSE, ... )
plotMetaPrograms( mp.res, similarity.cutoff = c(0, 1), scale = "none", downsample = 1000, showtree = TRUE, palette = viridis(100, option = "A", direction = -1), annotation_colors = NULL, main = "Clustered Heatmap", show_rownames = FALSE, show_colnames = FALSE, ... )
mp.res |
The meta-programs object generated by |
similarity.cutoff |
Min and max values for similarity metric |
scale |
Heatmap rescaling (passed to pheatmap as 'scale') |
downsample |
Limit max number of samples in heatmap, to avoid overloading the graphics |
showtree |
Whether to plot the hierarchical clustering tree |
palette |
Heatmap color palette (passed to pheatmap as 'color') |
annotation_colors |
Color palette for MP annotations |
main |
Heatmap title |
show_rownames |
Whether to display individual program names as rows |
show_colnames |
Whether to display individual program names as cols |
... |
Additional parameters for pheatmap |
Returns a clustered heatmap of MP similarities, in ggplot2 format
library(Seurat) data(sampleObj) geneNMF_programs <- multiNMF(list(sampleObj), k=5) geneNMF_metaprograms <- getMetaPrograms(geneNMF_programs, nMP=3) plotMetaPrograms(geneNMF_metaprograms)
library(Seurat) data(sampleObj) geneNMF_programs <- multiNMF(list(sampleObj), k=5) geneNMF_metaprograms <- getMetaPrograms(geneNMF_programs, nMP=3) plotMetaPrograms(geneNMF_metaprograms)
Utility function to run Gene Set Enrichment Analysis (GSEA) against gene
sets from MSigDB. Note: this is an optional function, which is conditional
to the installation of suggested packages fgsea
and msigdbr
.
runGSEA( genes, universe = NULL, category = "H", subcategory = NULL, species = "Homo sapiens", pval.thr = 0.05 )
runGSEA( genes, universe = NULL, category = "H", subcategory = NULL, species = "Homo sapiens", pval.thr = 0.05 )
genes |
A vector of genes |
universe |
Background universe of gene symbols (passed on to |
category |
GSEA main category (e.g. "H" or "C5") |
subcategory |
GSEA subcategory |
species |
Species for GSEA analysis. For a list of the available species,
type |
pval.thr |
Min p-value to include results |
Returns a table of enriched gene programs from GSEA
data(sampleObj) geneset <- c("BANK1","CD22","CD79A","CD19","IGHD","IGHG3","IGHM") #test is conditional on availability of suggested packages if (requireNamespace("fgsea", quietly=TRUE) & requireNamespace("msigdbr", quietly=TRUE)) { gsea_res <- runGSEA(geneset, universe=rownames(sampleObj), category = "C8") }
data(sampleObj) geneset <- c("BANK1","CD22","CD79A","CD19","IGHD","IGHG3","IGHM") #test is conditional on availability of suggested packages if (requireNamespace("fgsea", quietly=TRUE) & requireNamespace("msigdbr", quietly=TRUE)) { gsea_res <- runGSEA(geneset, universe=rownames(sampleObj), category = "C8") }
Compute NMF embeddings for single-cell dataset, and store them in the Seurat data structure. They can be used as an alternative to PCA for downstream analyses.
runNMF( obj, assay = "RNA", slot = "data", k = 10, new.reduction = "NMF", seed = 123, L1 = c(0, 0), hvg = NULL, center = FALSE, scale = FALSE )
runNMF( obj, assay = "RNA", slot = "data", k = 10, new.reduction = "NMF", seed = 123, L1 = c(0, 0), hvg = NULL, center = FALSE, scale = FALSE )
obj |
A seurat object |
assay |
Get data matrix from this assay |
slot |
Get data matrix from this slot (=layer) |
k |
Number of components for low-dim representation |
new.reduction |
Name of new dimensionality reduction |
seed |
Random seed |
L1 |
L1 regularization term for NMF |
hvg |
Which genes to use for the reduction |
center |
Whether to center the data matrix |
scale |
Whether to scale the data matrix |
Returns a Seurat object with a new dimensionality reduction (NMF)
data(sampleObj) sampleObj <- runNMF(sampleObj, k=8)
data(sampleObj) sampleObj <- runNMF(sampleObj, k=8)
A Seurat object containing single-cell transcriptomes
(scRNA-seq) for 50 cells and 20729 genes.
Single-cell UMI counts were normalized using a standard log-normalization:
counts for each cell were divided by the total counts for that cell and
multiplied by 10,000, then natural-log transformed using 'log1p'.
This a subsample of 25 predicted B cells and 25 predicted NK cells from
the large scRNA-seq PBMC dataset published
by Hao et al. (doi:10.1016/j.cell.2021.04.048) and
available as UMI counts at
https://atlas.fredhutch.org/data/nygc/multimodal/pbmc_multimodal.h5seurat
sampleObj
sampleObj
A sparse matrix of 50 cells and 20729 genes.
doi:10.1016/j.cell.2021.04.048