Package 'GeneNMF' reference manual

Package 'GeneNMF'

Title:	Non-Negative Matrix Factorization for Single-Cell Omics
Description:	A collection of methods to extract gene programs from single-cell gene expression data using non-negative matrix factorization (NMF). 'GeneNMF' contains functions to directly interact with the 'Seurat' toolkit and derive interpretable gene program signatures.
Authors:	Massimo Andreatta [aut, cre] , Santiago Carmona [aut]
Maintainer:	Massimo Andreatta <[email protected]>
License:	GPL-3
Version:	0.7.0
Built:	2025-03-18 12:48:18 UTC
Source:	https://github.com/carmonalab/genenmf

Title:

Non-Negative Matrix Factorization for Single-Cell Omics

Description:

A collection of methods to extract gene programs from single-cell gene expression data using non-negative matrix factorization (NMF). 'GeneNMF' contains functions to directly interact with the 'Seurat' toolkit and derive interpretable gene program signatures.

Authors:

Massimo Andreatta [aut, cre]

, Santiago Carmona [aut]

Maintainer:

Massimo Andreatta <[email protected]>

License:

GPL-3

Version:

0.7.0

Built:

2025-03-18 12:48:18 UTC

Source:

https://github.com/carmonalab/genenmf

Help Index

Drop meta-programs

Description

Remove meta-programs from the GeneNMF results. This can be desirable if one or more MPs are redundant or have poor metrics (e.g. low sample coverage or low silhouette score).

Usage

dropMetaPrograms(mp.res, dropMP = NULL)
dropMetaPrograms(mp.res, dropMP = NULL)

Arguments

`mp.res`	The meta-programs object generated by `getMetaPrograms`
`dropMP`	A vector with the names of the MPs to be dropped

Value

The input meta-program object, minus the dropped MPs

Examples

library(Seurat)
data(sampleObj)
geneNMF_programs <- multiNMF(list(sampleObj), k=5)
geneNMF_metaprograms <- getMetaPrograms(geneNMF_programs, nMP=3)
geneNMF_metaprograms_filtered <- dropMetaPrograms(geneNMF_metaprograms, dropMP = c("MP2")) 

library(Seurat)
data(sampleObj)
geneNMF_programs <- multiNMF(list(sampleObj), k=5)
geneNMF_metaprograms <- getMetaPrograms(geneNMF_programs, nMP=3)
geneNMF_metaprograms_filtered <- dropMetaPrograms(geneNMF_metaprograms, dropMP = c("MP2"))

Find variable features

Description

Select highly variable genes (HVG) from an expression matrix. Genes from a blocklist (e.g. cell cycling genes, mitochondrial genes) can be excluded from the list of variable genes, as well as genes with very low or very high average expression

Usage

findVariableFeatures_wfilters(
  obj,
  nfeatures = 2000,
  genesBlockList = NULL,
  min.exp = 0.01,
  max.exp = 3
)
findVariableFeatures_wfilters(
  obj,
  nfeatures = 2000,
  genesBlockList = NULL,
  min.exp = 0.01,
  max.exp = 3
)

Arguments

`obj`	A Seurat object containing an expression matrix
`nfeatures`	Number of top HVG to be returned
`genesBlockList`	Optionally takes a vector or list of vectors of gene names. These genes will be ignored for HVG detection. This is useful to mitigate effect of genes associated with technical artifacts or batch effects (e.g. mitochondrial, heat-shock response). If set to 'NULL' no genes will be excluded
`min.exp`	Minimum average normalized expression for HVG. If lower, the gene will be excluded
`max.exp`	Maximum average normalized expression for HVG. If higher, the gene will be excluded

Value

Returns the input Seurat object obj with the calculated highly variable features accessible through VariableFeatures(obj)

Examples

data(sampleObj)
sampleObj <- findVariableFeatures_wfilters(sampleObj, nfeatures=100)

data(sampleObj)
sampleObj <- findVariableFeatures_wfilters(sampleObj, nfeatures=100)

Extract data matrix from Seurat object

Description

Get the gene expression matrix from a Seurat object, optionally centered and/or subset on highly variable genes

Usage

getDataMatrix(
  obj,
  assay = "RNA",
  slot = "data",
  hvg = NULL,
  center = FALSE,
  scale = FALSE,
  non_negative = TRUE
)
getDataMatrix(
  obj,
  assay = "RNA",
  slot = "data",
  hvg = NULL,
  center = FALSE,
  scale = FALSE,
  non_negative = TRUE
)

Arguments

`obj`	Seurat object
`assay`	Get data matrix from this assay
`slot`	Get data matrix from this slot (=layer)
`hvg`	List of variable genes to subset the matrix. If NULL, uses all genes
`center`	Whether to center the data matrix
`scale`	Whether to scale the data matrix
`non_negative`	Enforce non-negative values for NMF

Value

Returns a sparse data matrix (cells per genes), subset according to the given parameters

Examples

data(sampleObj)
matrix <- getDataMatrix(sampleObj)

data(sampleObj)
matrix <- getDataMatrix(sampleObj)

Extract consensus gene programs (meta-programs)

Description

Run it over a list of NMF models obtained using multiNMF; it will determine gene programs that are consistently observed across samples and values of k.

Usage

getMetaPrograms(
  nmf.res,
  nMP = 10,
  specificity.weight = 5,
  weight.explained = 0.5,
  max.genes = 200,
  metric = c("cosine", "jaccard"),
  hclust.method = "ward.D2",
  min.confidence = 0.5,
  remove.empty = TRUE
)
getMetaPrograms(
  nmf.res,
  nMP = 10,
  specificity.weight = 5,
  weight.explained = 0.5,
  max.genes = 200,
  metric = c("cosine", "jaccard"),
  hclust.method = "ward.D2",
  min.confidence = 0.5,
  remove.empty = TRUE
)

Arguments

`nmf.res`	A list of NMF models obtained from `multiNMF`
`nMP`	Total number of meta-programs
`specificity.weight`	A parameter controlling how specific gene should be for each program. 'specificity.weight=0' no constraint on specificity, and positive values impose increasing specificity.
`weight.explained`	Fraction of NMF weights explained by selected genes. For example if weight.explained=0.5, all genes that together account for 50% of NMF weights are used to return program signatures.
`max.genes`	Max number of genes for each programs
`metric`	Metric to calculate pairwise similarity between programs
`hclust.method`	Method to build similarity tree between individual programs
`min.confidence`	Percentage of programs in which a gene is seen (out of programs in the corresponding program tree branch/cluster), to be retained in the consensus metaprograms
`remove.empty`	Whether to remove meta-programs with no genes above confidence threshold

Value

Returns a list with i) 'metaprograms.genes' top genes for each meta-program; ii) 'metaprograms.metrics' dataframe with meta-programs statistics: a) freq. of samples where the MP is present, b) average silhouette width, c) mean similarity (cosine or Jaccard), d) number of genes in MP, e) number of gene programs in MP; iii) 'metaprogram.composition' dataframe containing the number of individual for each sample that contributed to the consensus MPs; iv) 'programs.similarity': matrix of similarities (Jaccard or cosine) between meta-programs; v) 'programs.tree': hierarchical clustering of meta-programs (hclust tree); vi) 'programs.clusters': meta-program identity for each program

Examples

library(Seurat)
data(sampleObj)
geneNMF_programs <- multiNMF(list(sampleObj), k=5)
geneNMF_metaprograms <- getMetaPrograms(geneNMF_programs, nMP=3)

library(Seurat)
data(sampleObj)
geneNMF_programs <- multiNMF(list(sampleObj), k=5)
geneNMF_metaprograms <- getMetaPrograms(geneNMF_programs, nMP=3)

Get list of genes for each NMF program

Description

Run it over a list of NMF models obtained using multiNMF()

Usage

getNMFgenes(
  nmf.res,
  specificity.weight = 5,
  weight.explained = 0.5,
  max.genes = 200
)
getNMFgenes(
  nmf.res,
  specificity.weight = 5,
  weight.explained = 0.5,
  max.genes = 200
)

Arguments

`nmf.res`	A list of NMF models obtained using `multiNMF()`
`specificity.weight`	A parameter controlling how specific gene should be for each program. 'specificity.weight=0' no constraint on specificity, and positive values impose increasing specificity.
`weight.explained`	Fraction of NMF weights explained by selected genes. For example if weight.explained=0.5, all genes that together account for 50% of NMF weights are used to return program signatures.
`max.genes`	Max number of genes for each program

Value

Returns a list of top genes for each gene program found by multiNMF()

Examples

library(Seurat)
data(sampleObj)
geneNMF_programs <- multiNMF(list(sampleObj), k=5)
geneNMF_genes <- getNMFgenes(geneNMF_programs)

library(Seurat)
data(sampleObj)
geneNMF_programs <- multiNMF(list(sampleObj), k=5)
geneNMF_genes <- getNMFgenes(geneNMF_programs)

Run NMF on a list of Seurat objects

Description

Given a list of Seurat objects, run non-negative matrix factorization on each sample individually, over a range of target NMF components (k).

Usage

multiNMF(
  obj.list,
  assay = "RNA",
  slot = "data",
  k = 5:6,
  hvg = NULL,
  nfeatures = 2000,
  L1 = c(0, 0),
  min.exp = 0.01,
  max.exp = 3,
  center = FALSE,
  scale = FALSE,
  min.cells.per.sample = 10,
  hvg.blocklist = NULL,
  seed = 123
)
multiNMF(
  obj.list,
  assay = "RNA",
  slot = "data",
  k = 5:6,
  hvg = NULL,
  nfeatures = 2000,
  L1 = c(0, 0),
  min.exp = 0.01,
  max.exp = 3,
  center = FALSE,
  scale = FALSE,
  min.cells.per.sample = 10,
  hvg.blocklist = NULL,
  seed = 123
)

Arguments

`obj.list`	A list of Seurat objects
`assay`	Get data matrix from this assay
`slot`	Get data matrix from this slot (=layer)
`k`	Number of target components for NMF (can be a vector)
`hvg`	List of pre-calculated variable genes to subset the matrix. If hvg=NULL it calculates them automatically
`nfeatures`	Number of HVG, if calculate_hvg=TRUE
`L1`	L1 regularization term for NMF
`min.exp`	Minimum average log-expression value for retaining genes
`max.exp`	Maximum average log-expression value for retaining genes
`center`	Whether to center the data matrix
`scale`	Whether to scale the data matrix
`min.cells.per.sample`	Minimum numer of cells per sample (smaller samples will be ignored)
`hvg.blocklist`	Optionally takes a vector or list of vectors of gene names. These genes will be ignored for HVG detection. This is useful to mitigateeffect of genes associated with technical artifacts and batch effects (e.g. mitochondrial), and to exclude TCR and BCR adaptive immune(clone-specific) receptors. If set to 'NULL' no genes will be excluded
`seed`	Random seed

Value

Returns a list of NMF programs, one for each sample and for each value of 'k'. The format of each program in the list follosw the structure of nmf factorization models.

Examples

library(Seurat)
data(sampleObj)
geneNMF_programs <- multiNMF(list(sampleObj), k=5)

library(Seurat)
data(sampleObj)
geneNMF_programs <- multiNMF(list(sampleObj), k=5)

Run PCA on a list of Seurat objects

Description

Given a list of Seurat objects, run non-negative PCA factorization on each sample individually.

Usage

multiPCA(
  obj.list,
  assay = "RNA",
  slot = "data",
  k = 4:5,
  hvg = NULL,
  nfeatures = 500,
  min.exp = 0.01,
  max.exp = 3,
  min.cells.per.sample = 10,
  center = FALSE,
  scale = FALSE,
  hvg.blocklist = NULL,
  seed = 123
)
multiPCA(
  obj.list,
  assay = "RNA",
  slot = "data",
  k = 4:5,
  hvg = NULL,
  nfeatures = 500,
  min.exp = 0.01,
  max.exp = 3,
  min.cells.per.sample = 10,
  center = FALSE,
  scale = FALSE,
  hvg.blocklist = NULL,
  seed = 123
)

Arguments

`obj.list`	A list of Seurat objects
`assay`	Get data matrix from this assay
`slot`	Get data matrix from this slot (=layer)
`k`	Number of target components for PCA
`hvg`	List of pre-calculated variable genes to subset the matrix. If hvg=NULL it calculates them automatically
`nfeatures`	Number of HVG, if calculate_hvg=TRUE
`min.exp`	Minimum average log-expression value for retaining genes
`max.exp`	Maximum average log-expression value for retaining genes
`min.cells.per.sample`	Minimum numer of cells per sample (smaller samples will be ignored)
`center`	Whether to center the data matrix
`scale`	Whether to scale the data matrix
`hvg.blocklist`	Optionally takes a vector or list of vectors of gene names. These genes will be ignored for HVG detection. This is useful to mitigateeffect of genes associated with technical artifacts and batch effects (e.g. mitochondrial), and to exclude TCR and BCR adaptive immune(clone-specific) receptors. If set to 'NULL' no genes will be excluded
`seed`	Random seed

Value

Returns a list of non-negative PCA programs, one for each sample. The format of each program in the list follows the structure of nmf factorization models.

Examples

library(Seurat)
data(sampleObj)
geneNMF_programs <- multiPCA(list(sampleObj), k=5)

library(Seurat)
data(sampleObj)
geneNMF_programs <- multiPCA(list(sampleObj), k=5)

Visualizations for meta-programs

Description

Generates a clustered heatmap for meta-program similarities (by Jaccard index or Cosine similarity). This function is intended to be run on the object generated by getMetaPrograms, which contains a pre-calculated tree of pairwise similarities between clusters (as a 'hclust' object).

Usage

plotMetaPrograms(
  mp.res,
  similarity.cutoff = c(0, 1),
  scale = "none",
  downsample = 1000,
  showtree = TRUE,
  palette = viridis(100, option = "A", direction = -1),
  annotation_colors = NULL,
  main = "Clustered Heatmap",
  show_rownames = FALSE,
  show_colnames = FALSE,
  ...
)
plotMetaPrograms(
  mp.res,
  similarity.cutoff = c(0, 1),
  scale = "none",
  downsample = 1000,
  showtree = TRUE,
  palette = viridis(100, option = "A", direction = -1),
  annotation_colors = NULL,
  main = "Clustered Heatmap",
  show_rownames = FALSE,
  show_colnames = FALSE,
  ...
)

Arguments

`mp.res`	The meta-programs object generated by `getMetaPrograms`
`similarity.cutoff`	Min and max values for similarity metric
`scale`	Heatmap rescaling (passed to pheatmap as 'scale')
`downsample`	Limit max number of samples in heatmap, to avoid overloading the graphics
`showtree`	Whether to plot the hierarchical clustering tree
`palette`	Heatmap color palette (passed to pheatmap as 'color')
`annotation_colors`	Color palette for MP annotations
`main`	Heatmap title
`show_rownames`	Whether to display individual program names as rows
`show_colnames`	Whether to display individual program names as cols
`...`	Additional parameters for pheatmap

Value

Returns a clustered heatmap of MP similarities, in ggplot2 format

Examples

library(Seurat)
data(sampleObj)
geneNMF_programs <- multiNMF(list(sampleObj), k=5)
geneNMF_metaprograms <- getMetaPrograms(geneNMF_programs, nMP=3)
plotMetaPrograms(geneNMF_metaprograms)

library(Seurat)
data(sampleObj)
geneNMF_programs <- multiNMF(list(sampleObj), k=5)
geneNMF_metaprograms <- getMetaPrograms(geneNMF_programs, nMP=3)
plotMetaPrograms(geneNMF_metaprograms)

Run Gene set enrichment analysis

Description

Utility function to run Gene Set Enrichment Analysis (GSEA) against gene sets from MSigDB. Note: this is an optional function, which is conditional to the installation of suggested packages fgsea and msigdbr.

Usage

runGSEA(
  genes,
  universe = NULL,
  category = "H",
  subcategory = NULL,
  species = "Homo sapiens",
  custom.db = NULL,
  pval.thr = 0.05
)
runGSEA(
  genes,
  universe = NULL,
  category = "H",
  subcategory = NULL,
  species = "Homo sapiens",
  custom.db = NULL,
  pval.thr = 0.05
)

Arguments

`genes`	A vector of genes
`universe`	Background universe of gene symbols (passed on to `fgsea::fora`)
`category`	GSEA main category (e.g. "H" or "C5")
`subcategory`	GSEA subcategory
`species`	Species for GSEA analysis. For a list of the available species, type `msigdbr::msigdbr_species()`
`custom.db`	A custom database of signatures, to be used instead of MSigDB. Provide custom.db as a named list, where the list names are the signature names, and the list elements are the signatures (as vectors).
`pval.thr`	Min p-value to include results

Value

Returns a table of enriched gene programs from GSEA

Examples

data(sampleObj)
geneset <- c("BANK1","CD22","CD79A","CD19","IGHD","IGHG3","IGHM")
#test is conditional on availability of suggested packages
if (requireNamespace("fgsea", quietly=TRUE) &
   requireNamespace("msigdbr", quietly=TRUE)) {
   gsea_res <- runGSEA(geneset,
       universe=rownames(sampleObj),
       category = "C8")
}

data(sampleObj)
geneset <- c("BANK1","CD22","CD79A","CD19","IGHD","IGHG3","IGHM")
#test is conditional on availability of suggested packages
if (requireNamespace("fgsea", quietly=TRUE) &
   requireNamespace("msigdbr", quietly=TRUE)) {
   gsea_res <- runGSEA(geneset,
       universe=rownames(sampleObj),
       category = "C8")
}

Compute NMF as a low-dim embedding for Seurat

Description

Compute NMF embeddings for single-cell dataset, and store them in the Seurat data structure. They can be used as an alternative to PCA for downstream analyses.

Usage

runNMF(
  obj,
  assay = "RNA",
  slot = "data",
  k = 10,
  new.reduction = "NMF",
  seed = 123,
  L1 = c(0, 0),
  hvg = NULL,
  center = FALSE,
  scale = FALSE
)
runNMF(
  obj,
  assay = "RNA",
  slot = "data",
  k = 10,
  new.reduction = "NMF",
  seed = 123,
  L1 = c(0, 0),
  hvg = NULL,
  center = FALSE,
  scale = FALSE
)

Arguments

`obj`	A seurat object
`assay`	Get data matrix from this assay
`slot`	Get data matrix from this slot (=layer)
`k`	Number of components for low-dim representation
`new.reduction`	Name of new dimensionality reduction
`seed`	Random seed
`L1`	L1 regularization term for NMF
`hvg`	Which genes to use for the reduction
`center`	Whether to center the data matrix
`scale`	Whether to scale the data matrix

Value

Returns a Seurat object with a new dimensionality reduction (NMF)

Examples

data(sampleObj)
sampleObj <- runNMF(sampleObj, k=8)
data(sampleObj)
sampleObj <- runNMF(sampleObj, k=8)

Sample dataset to test GeneNMF installation

Description

A Seurat object containing single-cell transcriptomes (scRNA-seq) for 50 cells and 20729 genes. Single-cell UMI counts were normalized using a standard log-normalization: counts for each cell were divided by the total counts for that cell and multiplied by 10,000, then natural-log transformed using 'log1p'.

This a subsample of 25 predicted B cells and 25 predicted NK cells from the large scRNA-seq PBMC dataset published by Hao et al. (doi:10.1016/j.cell.2021.04.048) and available as UMI counts at https://atlas.fredhutch.org/data/nygc/multimodal/pbmc_multimodal.h5seurat

Usage

sampleObj
sampleObj

Format

A sparse matrix of 50 cells and 20729 genes.

Source

doi:10.1016/j.cell.2021.04.048