| Title: | Reference-based analysis of scRNA-seq data |
|---|---|
| Description: | This package implements methods to project single-cell RNA-seq data onto a reference atlas, enabling interpretation of unknown cell transcriptomic states in the context of known, reference states. |
| Authors: | Massimo Andreatta [aut, cre] (ORCID: <https://orcid.org/0000-0002-8036-2647>), Paul Gueguen [aut] (ORCID: <https://orcid.org/0000-0003-2930-6073>), Josep Garnica [aut] (ORCID: <https://orcid.org/0000-0001-9493-1321>), Santiago Carmona [aut] (ORCID: <https://orcid.org/0000-0002-2495-0671>) |
| Maintainer: | Massimo Andreatta <[email protected]> |
| License: | GPL-3 + file LICENSE |
| Version: | 3.7.0 |
| Built: | 2026-05-29 09:17:36 UTC |
| Source: | https://github.com/carmonalab/ProjecTILs |
A list of cell cycling signatures (G1.S and G2.M phases), for mouse and human.
cell.cycle.obj
A list of cycling signatures.
This function uses a nearest-neighbor algorithm to predict a feature (e.g. the cell state) of the query cells. Distances between cells in the reference map and cells in the query are calculated in a reduced space (PCA or UMAP) and the feature is assigned to query cells based on a consensus of its nearest neighbors in the reference object.
cellstate.predict( ref, query, reduction = "pca", ndim = NULL, k = 5, min.confidence = 0.2, nn.decay = 0.1, labels.col = "functional.cluster" )
ref |
Reference Atlas |
query |
Seurat object with query data |
reduction |
The dimensionality reduction used to calculate pairwise distances. One of "pca" or "umap" |
ndim |
How many dimensions in the reduced space to be used for distance calculations |
k |
Number of neighbors to assign the cell type |
min.confidence |
Minimum confidence score to return cell type labels (otherwise NA) |
nn.decay |
Weight decay for internal nearest neighbors (between 0 and 1) |
labels.col |
The metadata field of the reference to annotate the clusters (default: functional.cluster) |
The query object submitted as parameter, with two additional metadata slots for predicted state and its confidence score
data(query_example_seurat)
ref <- load.reference.map()
q <- make.projection(query_example_seurat, ref=ref)
q <- cellstate.predict(ref, query=q)
table(q$functional.cluster)
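The nearest-neighbor consensus described for cellstate.predict can be illustrated with a small base-R sketch. This is not the package's implementation: the helper `knn_consensus`, the toy coordinates, and the rank-based weighting `(1 - nn.decay)^(rank - 1)` are assumptions made purely for illustration.

```r
# Toy reference: 6 cells in a 2-D reduced space, with known labels
ref_emb <- rbind(c(0, 0), c(0.1, 0.2), c(-0.2, 0.1),
                 c(5, 5), c(5.2, 4.9), c(4.8, 5.1))
ref_labels <- c("A", "A", "A", "B", "B", "B")

# Hypothetical helper: weighted vote among the k nearest reference cells
knn_consensus <- function(q, ref_emb, ref_labels, k = 5,
                          nn.decay = 0.1, min.confidence = 0.2) {
  d <- sqrt(rowSums(sweep(ref_emb, 2, q)^2))  # Euclidean distances to q
  nn <- order(d)[seq_len(k)]                  # indices of the k nearest cells
  w <- (1 - nn.decay)^(seq_len(k) - 1)        # assumed rank-based weight decay
  votes <- tapply(w, ref_labels[nn], sum)     # summed weight per label
  conf <- max(votes) / sum(votes)             # share of weight for the winner
  label <- if (conf >= min.confidence) names(which.max(votes)) else NA
  list(label = label, confidence = conf)
}

pred <- knn_consensus(c(0.3, 0.1), ref_emb, ref_labels, k = 3)
```

In this toy case the three nearest reference cells all carry label "A", so the query cell is assigned "A" with full confidence; a cell whose neighbors disagree would get a lower score and, below `min.confidence`, an NA label.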
This function calculates and plots pseudo-bulk gene expression by cell type and custom grouping variables. Data can be split, in principle, by any metadata present in the starting Seurat object (e.g. patient, tissue, study, etc.). This can be useful to evaluate the consistency of expression profiles for different cell types across samples, studies or other grouping variables.
celltype.heatmap( data, assay = "RNA", slot = "data", genes, ref = NULL, scale = "row", method = c("ward.D2", "ward.D", "average"), brewer.palette = "RdBu", palette_reverse = F, palette = NULL, cluster.col = "functional.cluster", group.by = NULL, flip = FALSE, cluster_genes = FALSE, cluster_samples = FALSE, min.cells = 10, show_samplenames = FALSE, remove.NA.meta = TRUE, breaks = seq(-2, 2, by = 0.1), return.matrix = FALSE, ... )
data |
A Seurat object to be used for the heatmap |
assay |
A string indicating the assay type, default is "RNA" |
slot |
Data slot (layer) in Seurat object |
genes |
A vector of genes to be used in the heatmap |
ref |
A ProjecTILs reference Seurat object to define the order of functional.cluster |
scale |
A string indicating the scale of the heatmap, default is "row" |
method |
A string or vector of strings indicating the clustering method to be used, default is "ward.D2" |
brewer.palette |
A string indicating the color palette to be used, default is "RdBu" |
palette_reverse |
A boolean indicating if color palette should be reversed, default is FALSE |
palette |
A named list containing colors vectors compatible with pheatmap. The list is named by the metadata names, default is taking these palettes to plot metadata: "Paired","Set2","Accent","Dark2","Set1","Set3". |
cluster.col |
The metadata column name containing the cell type labels |
group.by |
The metadata column names used as grouping variables |
flip |
A boolean indicating if the heatmap should be flipped, default is FALSE |
cluster_genes |
A boolean indicating if genes should be clustered, default is FALSE |
cluster_samples |
A boolean indicating if samples should be clustered, default is FALSE |
min.cells |
A value defining the minimum number of cells a sample should have to be kept, default is 10 |
show_samplenames |
A boolean indicating whether the heatmap should display the sample names or not, default is FALSE |
remove.NA.meta |
A boolean indicating whether samples with missing metadata should be removed before plotting, default is TRUE |
breaks |
Range of values for plotting (see 'breaks' parameter in pheatmap) |
return.matrix |
If true, return the pseudo-bulk data matrix instead of graphical output |
... |
Additional parameters for 'pheatmap' |
A pheatmap plot, displaying averaged expression values for each of the selected genes across samples.
library(Seurat)
ref <- load.reference.map(ref = "https://figshare.com/ndownloader/files/38921366")
celltype.heatmap(ref, assay = "RNA", genes = c("LEF1","SELL","GZMK","FGFBP2"), ref = ref, cluster.col = "functional.cluster", group.by = c("orig.ident", "Tissue"))
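As a rough illustration of what pseudo-bulk averaging by cell type and grouping variable means, the base-R sketch below averages a toy expression matrix within each (cell type, sample) combination. The matrix, the metadata and the variable names are hypothetical; the package's own aggregation, `min.cells` filtering and scaling are not reproduced here.

```r
# Toy expression matrix: 2 genes x 8 cells (hypothetical values)
expr <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8,
                 8, 7, 6, 5, 4, 3, 2, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB"), paste0("cell", 1:8)))

# Hypothetical per-cell metadata: cell type and sample of origin
meta <- data.frame(celltype = rep(c("T1", "T2"), each = 4),
                   sample = rep(c("S1", "S2"), times = 4))

# One pseudo-bulk profile per (cell type, sample) combination
group <- paste(meta$celltype, meta$sample, sep = "_")
pseudobulk <- sapply(split(seq_len(ncol(expr)), group),
                     function(idx) rowMeans(expr[, idx, drop = FALSE]))
```

The result has one row per gene and one column per group, which is the shape of matrix a heatmap function such as pheatmap expects.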
Given a projected object and its reference, calculate silhouette coefficient for query cells with respect to reference cells with the same cell labels.
compute_silhouette( ref, query = NULL, reduction = "pca", ndim = NULL, label_col = "functional.cluster", normalize.scores = FALSE, min.cells = 20 )
ref |
Reference object |
query |
Query object. If not specified, the silhouette coefficient of only the reference will be calculated |
reduction |
Which dimensionality reduction to use for Euclidean distance calculation |
ndim |
Number of dimensions in the dimred to use for distance calculation. If NULL, use all dimensions. |
label_col |
Metadata column with cell type annotations. Must be present both in reference and query |
normalize.scores |
Whether to normalize silhouette scores by the average cell type silhouettes of the reference |
min.cells |
Only report silhouette scores for cell type with at least this number of cells |
A dataframe with average silhouette coefficient for each cell type
data(query_example_seurat)
ref <- load.reference.map()
q <- Run.ProjecTILs(query_example_seurat, ref=ref, fast.umap.predict=TRUE)
combined <- compute_silhouette(ref, query=q)
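For readers unfamiliar with the silhouette coefficient, the sketch below computes the standard definition, s = (b - a) / max(a, b), on a toy labeled embedding in base R and averages it per label. It scores a single labeled set rather than query cells against a reference, and all data and variable names are made up for illustration, so it should be read as a sketch of the metric and not of compute_silhouette itself.

```r
# Toy embedding: two well-separated groups of 3 cells each
emb <- rbind(c(0, 0), c(0, 1), c(1, 0),
             c(10, 10), c(10, 11), c(11, 10))
labels <- c("A", "A", "A", "B", "B", "B")
D <- as.matrix(dist(emb))  # pairwise Euclidean distances

sil <- sapply(seq_len(nrow(emb)), function(i) {
  own <- labels == labels[i]
  own[i] <- FALSE                       # exclude the cell itself
  a <- mean(D[i, own])                  # mean distance within own label
  others <- setdiff(unique(labels), labels[i])
  # smallest mean distance to the cells of any other label
  b <- min(sapply(others, function(l) mean(D[i, labels == l])))
  (b - a) / max(a, b)
})

# Average silhouette per cell type
avg_sil <- tapply(sil, labels, mean)
```

Values close to 1 indicate cells that sit well inside their own label's neighborhood; values near 0 or below suggest overlap with another label.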
Searches for the PCA or ICA dimensions in which the query set deviates the most from a control set or from the reference map. It can be useful to suggest novel cell states that escape the main axes of diversity of the UMAP.
find.discriminant.dimensions( ref, query, query.control = NULL, query.assay = "RNA", state = "largest", labels.col = "functional.cluster", reduction = "ICA", test = c("ks", "t.test"), ndim = 50, print.n = 3, verbose = T )
ref |
Seurat object with reference atlas |
query |
Seurat object with query data |
query.control |
Optionally, you can compare your query with a control sample, instead of the reference |
query.assay |
The data slot to be used for enrichment analysis |
state |
Perform discriminant analysis on this cell state. Can be either:
|
labels.col |
The metadata field used to annotate the clusters (default: functional.cluster) |
reduction |
Which dimensionality reduction to use (either ICA or PCA) |
test |
Which test to perform between the dataset distributions in each ICA/PCA dimension. One of 'ks' (Kolmogorov-Smirnov) or 't.test' (T-test) |
ndim |
How many dimensions to consider in the reduced ICA/PCA space |
print.n |
The number of top dimensions to return to STDOUT |
verbose |
Print results to STDOUT |
A dataframe, where rows are ICA/PCA dimensions. ICA/PCAs are ranked by statistical significance when comparing their distribution between query and control (or query vs. reference map)
find.discriminant.dimensions(ref, query=query.set)
find.discriminant.dimensions(ref, query=query.set, query.control=control.set)
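The idea of ranking dimensions by how strongly the query distribution deviates from the reference can be sketched with `ks.test` from base R's stats package. The simulated embeddings, the dimension names and the ranking by KS statistic are assumptions for illustration; the package's actual ranking and output columns may differ.

```r
set.seed(42)
n <- 200
dims <- paste0("ICA_", 1:3)

# Toy embeddings: the query is shifted only along ICA_2
ref_emb <- matrix(rnorm(n * 3), ncol = 3, dimnames = list(NULL, dims))
query_emb <- matrix(rnorm(n * 3), ncol = 3, dimnames = list(NULL, dims))
query_emb[, "ICA_2"] <- query_emb[, "ICA_2"] + 3

# Two-sample Kolmogorov-Smirnov test in each dimension
tests <- lapply(dims, function(d) ks.test(query_emb[, d], ref_emb[, d]))
res <- data.frame(
  stat = sapply(tests, function(t) unname(t$statistic)),
  p_val = sapply(tests, function(t) t$p.value),
  row.names = dims
)

# Most discriminant dimensions first
res <- res[order(res$stat, decreasing = TRUE), ]
```

Swapping `ks.test` for `t.test` gives the other option named in the `test` argument; the KS test is sensitive to any difference in distribution shape, while the t-test only compares means.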
Based on 'FindMarkers'. It performs differential expression analysis between a projected query and a control (either the reference map or a control sample), for a given cell type. Useful to detect whether specific cell states over/under-express genes between conditions or with respect to the reference.
find.discriminant.genes( ref, query, query.control = NULL, ref.assay = "RNA", query.assay = "RNA", state = "largest", labels.col = "functional.cluster", test = "wilcox", min.cells = 10, genes.use = c("variable", "all"), ... )
ref |
Seurat object with reference atlas |
query |
Seurat object with query data |
query.control |
Optionally, you can compare your query with a control sample, instead of the reference |
ref.assay |
The reference assay to be used for DE analysis |
query.assay |
The query assay to be used for DE analysis, if comparing to the reference |
state |
Perform discriminant analysis on this cell state. Can be either:
|
labels.col |
The metadata field used to annotate the clusters (default: functional.cluster) |
test |
Type of test for DE analysis. See help for 'FindMarkers' for implemented tests. |
min.cells |
Minimum number of cells in the cell type to proceed with analysis. |
genes.use |
What subset of genes to consider for DE analysis:
|
... |
Additional parameters for 'FindMarkers' |
A dataframe with a ranked list of genes as rows, and statistics as columns (e.g. log fold-change, p-values). See help for 'FindMarkers' for more details.
# Discriminant genes between query and reference in cell type "Tex"
markers <- find.discriminant.genes(ref, query=query.set, state="Tex")
# Discriminant genes between query and control sample in most represented cell type
markers <- find.discriminant.genes(ref, query=query.set, query.control=control.set)
# Pass results to EnhancedVolcano for visual results
library(EnhancedVolcano)
EnhancedVolcano(markers, lab = rownames(markers), x = 'avg_logFC', y = 'p_val')
This function expands FindAllMarkers to find markers that are differentially expressed across multiple datasets or samples. Given a Seurat object with identity classes (for example annotated clusters) and a grouping variable (for example a Sample ID), it calculates differentially expressed genes (DEGs) individually for each sample. Then it determines the fraction of samples for which the gene was found to be differentially expressed.
FindAllMarkers.bygroup( object, split.by = NULL, only.pos = TRUE, features = NULL, min.cells.group = 10, min.freq = 0.5, ... )
object |
A Seurat object |
split.by |
A metadata column name - the data will be split by this column to calculate FindAllMarkers separately for each data split |
only.pos |
Only return positive markers (TRUE by default) |
features |
Genes to test. Default is to use all genes |
min.cells.group |
Minimum number of cells in the group - if lower the group is skipped |
min.freq |
Only return markers which are differentially expressed in at least this fraction of datasets. |
... |
Additional parameters to FindAllMarkers |
This function can be useful to find marker genes that are specific for individual cell types, and that are found to be so consistently across multiple samples.
A list of marker genes for each identity class (typically clusters), with two associated numerical values: i) the fraction of datasets for which the marker was found to be differentially expressed; ii) the average log-fold change for the genes across datasets
library(Seurat)
ref <- load.reference.map(ref = "https://figshare.com/ndownloader/files/38921366")
Idents(ref) <- "functional.cluster"
FindAllMarkers.bygroup(ref, split.by = "Sample", min.cells.group=30, min.freq=0.8)
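The "fraction of samples" logic behind FindAllMarkers.bygroup can be illustrated without Seurat. In the sketch below, the per-sample marker lists are hypothetical stand-ins for the output of a per-sample differential expression run on one cluster; only the counting and the `min.freq` filter are shown.

```r
# Hypothetical per-sample marker lists for a single cluster
markers_by_sample <- list(
  S1 = c("Cd8a", "Gzmk", "Pdcd1"),
  S2 = c("Cd8a", "Gzmk"),
  S3 = c("Cd8a", "Tox"),
  S4 = c("Cd8a", "Gzmk", "Tox")
)

# Fraction of samples in which each gene was called as a marker
freq <- table(unlist(markers_by_sample)) / length(markers_by_sample)

# Keep genes found in at least min.freq of the samples
min.freq <- 0.5
consistent <- sort(freq[freq >= min.freq], decreasing = TRUE)
```

Here "Cd8a" is a marker in every sample, "Gzmk" in three of four and "Tox" in two of four, while "Pdcd1" appears in only one sample and is dropped by the `min.freq = 0.5` threshold.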
Download and load reference atlases.
get.reference.maps( collection = NULL, reference = NULL, update = FALSE, directory = "./ProjecTILs_references", as.list = TRUE, verbose = TRUE )
collection |
Collection to download and load. See available collection using list.reference.maps. If NULL, all are downloaded and loaded (default) |
reference |
References to download and load. See available collection using list.reference.maps. If NULL, all are downloaded and loaded (default) |
update |
Boolean whether to delete current reference maps and download them again |
directory |
Directory where reference maps are downloaded to and loaded from. By default, a directory named "ProjecTILs_references" is created in the working directory. |
as.list |
Boolean whether to simplify list ( |
verbose |
Inform of the status of processes |
# explore available reference maps
list.reference.maps()
# consider increasing downloading timeout
options(timeout = 1000)
# get all available reference maps
ref.maps <- get.reference.maps()
# get certain collections or reference maps
# all human reference maps
ref.maps.human <- get.reference.maps(collection = "human")
# only some references
ref.maps <- get.reference.maps(reference = "DC")
ref.maps.CD4 <- get.reference.maps(reference = c("CD4", "Virus_CD4T"))
# update previously downloaded maps
ref.maps <- get.reference.maps(update = TRUE)
A conversion table of stable orthologs between Hs and Mm.
Hs2Mm.convert.table
A dataframe containing gene ortholog mapping.
https://www.ensembl.org/Mus_musculus/Info/Index
Obtain the list of available reference atlases for ProjecTILs, which can then be downloaded and loaded using get.reference.maps.
list.reference.maps()
# explore available reference maps
list.reference.maps()
Load or download the reference map for dataset projection. By default, it downloads a reference atlas of tumour-infiltrating lymphocytes (TILs) from mouse.
load.reference.map(ref = "referenceTIL")
ref |
Reference atlas as a Seurat object (by default downloads a mouse reference TIL atlas). To use a custom reference atlas, provide a .rds object or a URL to a .rds object, storing a Seurat object prepared using make.reference |
# consider increasing the download timeout if downloading the default reference atlas or a large reference
options(timeout = 1000)
# Download and load default reference map
ref <- load.reference.map()
# download reference map from url
ref.web <- load.reference.map(ref = url)
# Load any reference map
ref <- load.reference.map(ref = "path/to/ref")
This function allows projecting ("query") single-cell RNA-seq datasets onto a reference map (i.e. a curated and annotated scRNA-seq dataset). To project multiple datasets, submit a list of Seurat objects with the query parameter. The projection consists of 3 steps:
pre-processing: optional steps which might include pre-filtering of cells by markers using 'scGate', data normalization, and ortholog conversion.
batch-effect correction: uses the built-in STACAS algorithm to detect and correct for batch effects (this step assumes that at least a fraction of the cells in the query are in the same state as cells in the reference)
embedding of corrected query data in the reduced-dimensionality spaces (PCA and UMAP) of the reference map.
make.projection( query, ref = NULL, filter.cells = TRUE, query.assay = NULL, direct.projection = FALSE, STACAS.anchor.coverage = 0.7, STACAS.correction.scale = 100, STACAS.k.anchor = 5, STACAS.k.weight = "max", skip.normalize = FALSE, fast.umap.predict = FALSE, ortholog_table = NULL, scGate_model = NULL, ncores = 1, progressbar = TRUE )
query |
Query data, either as a single Seurat object or as a list of Seurat objects |
ref |
Reference Atlas - if NULL, downloads the default TIL reference atlas |
filter.cells |
Pre-filter cells using 'scGate'. Only set to FALSE if the dataset has been previously subset to cell types represented in the reference. |
query.assay |
Which assay slot to use for the query (defaults to DefaultAssay(query)) |
direct.projection |
If true, apply PCA transformation directly without alignment |
STACAS.anchor.coverage |
Focus on a few robust anchors (low STACAS.anchor.coverage) or on a large number of anchors (high STACAS.anchor.coverage). Must be a number between 0 and 1. |
STACAS.correction.scale |
Slope of sigmoid function used to determine strength of batch effect correction. |
STACAS.k.anchor |
Integer. For alignment, how many neighbors (k) to use when picking anchors. |
STACAS.k.weight |
Number of neighbors to consider when weighting anchors. Default is "max", which disables local anchor weighting. |
skip.normalize |
By default, log-normalize the count data. If you have already normalized your data, you can skip normalization. |
fast.umap.predict |
Fast approximation for UMAP projection. Uses coordinates of nearest neighbors in PCA space to assign UMAP coordinates (credits to Changsheng Li for the implementation) |
ortholog_table |
Dataframe for conversion between ortholog genes
(by default package object |
scGate_model |
scGate model used to filter target cell type from query data
(if NULL use the model stored in |
ncores |
Number of cores for parallel execution (requires BiocParallel) |
progressbar |
Whether to show a progress bar for projection process or not (requires BiocParallel) |
See load.reference.map to load or download a reference atlas. See also ProjecTILs.classifier to use ProjecTILs as a cell type classifier.
An augmented Seurat object with projected UMAP coordinates on the reference map
data(query_example_seurat)
ref <- load.reference.map()
make.projection(query_example_seurat, ref=ref)
Converts a Seurat object to a ProjecTILs reference atlas. You can preserve your low-dimensionality embeddings (e.g. UMAP) in the reference atlas by setting 'recalculate.umap=FALSE', or recalculate the UMAP using one of two methods, umap::umap or uwot::umap. Recalculation allows exploiting the 'predict' functionalities of these methods for embedding of new points; skipping recalculation will make the projection use an approximation for UMAP embedding of the query.
make.reference( ref, assay = NULL, assay.raw = "RNA", atlas.name = "custom_reference", annotation.column = "functional.cluster", recalculate.umap = FALSE, umap.method = c("umap", "uwot"), metric = "cosine", min_dist = 0.3, n_neighbors = 30, ndim = 20, dimred = "umap", nfeatures = 1000, color.palette = NULL, scGate.model.human = NULL, scGate.model.mouse = NULL, store.markers = FALSE, n.markers = 10, seed = 123, layer1_link = NULL )
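The `fast.umap.predict` option is described as assigning UMAP coordinates from the nearest neighbors in PCA space. A minimal base-R sketch of that idea follows; the helper `approx_umap`, the choice of `k = 3` and the use of a plain average of neighbor coordinates are assumptions for illustration, not a description of the package's code.

```r
# Toy reference: PCA coordinates and matching UMAP coordinates for 4 cells
ref_pca <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10))
ref_umap <- rbind(c(-5, -5), c(-5, -4), c(-4, -5), c(6, 6))

# Hypothetical helper: place a query cell at the mean UMAP position
# of its k nearest reference cells in PCA space
approx_umap <- function(q_pca, ref_pca, ref_umap, k = 3) {
  d <- sqrt(rowSums(sweep(ref_pca, 2, q_pca)^2))  # distances in PCA space
  nn <- order(d)[seq_len(k)]                      # k nearest reference cells
  colMeans(ref_umap[nn, , drop = FALSE])          # average their UMAP coords
}

q_umap <- approx_umap(c(0.2, 0.2), ref_pca, ref_umap)
```

The appeal of this kind of approximation is that it needs no UMAP model with a 'predict' method, which is why it also applies to references whose UMAP was not recalculated.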
ref |
Seurat object with reference atlas |
assay |
The assay storing the reference expression data (e.g. "integrated") |
assay.raw |
The assay storing raw expression data (e.g. "RNA") |
atlas.name |
An optional name for your reference |
annotation.column |
The metadata column with the cluster annotations for this atlas |
recalculate.umap |
If TRUE, run the 'umap' or 'uwot' algorithm to generate embeddings. Otherwise use the embeddings stored in the 'dimred' slot. |
umap.method |
Which method to use for calculating the umap reduction |
metric |
Distance metric to use to find nearest neighbors for UMAP |
min_dist |
Effective minimum distance between UMAP embedded points |
n_neighbors |
Size of local neighborhood for UMAP |
ndim |
Number of PCA dimensions |
dimred |
Use the pre-calculated embeddings stored at 'Embeddings(ref, dimred)' |
nfeatures |
Number of variable features (only calculated if not already present) |
color.palette |
A (named) vector of colors for the reference plotting functions. One color for each cell type in 'functional.cluster' |
scGate.model.human |
A human scGate model to purify the cell types represented in the map. For example, if the map contains CD4 T cell subtype, specify an scGate model for CD4 T cells. |
scGate.model.mouse |
A mouse scGate model to purify the cell types represented in the map. |
store.markers |
Whether to store the top differentially expressed genes in 'ref@misc$gene.panel' |
n.markers |
Store the top 'n.markers' for each subtype given by differential expression analysis |
seed |
Random seed |
layer1_link |
Broad cell type contained in this reference atlas (i.e. CD4T, CL:0000624...) to link with broad cell type annotation (layer1). |
A reference atlas compatible with ProjecTILs
custom_reference <- ProjecTILs::make.reference(my_dataset, recalculate.umap=T)
Given two Seurat objects, merge counts and data as well as dim reductions (PCA, UMAP, ICA, etc.)
## S3 method for class 'Seurat.embeddings'
merge(x = NULL, y = NULL, merge.dr = TRUE, ...)
x |
First object to merge |
y |
Second object to merge |
merge.dr |
How to handle merging dimensional reductions (see merge.Seurat) |
... |
More parameters to merge function |
A merged Seurat object
o1 <- query_example_seurat
o2 <- query_example_seurat
seurat.merged <- merge.Seurat.embeddings(o1, o2)
# To merge multiple objects stored in a list
seurat.merged <- Reduce(f=merge.Seurat.embeddings, x=obj.list)
Add an extra dimension to the reference map (it can be suggested by 'find.discriminant.dimensions'), to explore additional axes of variability in a query dataset compared to the reference map.
## S3 method for class 'discriminant.3d'
plot( ref, query, query.control = NULL, query.assay = "RNA", labels.col = "functional.cluster", extra.dim = "ICA_1", query.state = NULL )
ref |
Seurat object with reference object |
query |
Seurat object with query data |
query.control |
Optionally, you can compare your query with a control sample, instead of the reference |
query.assay |
The data slot to be used for enrichment analysis |
labels.col |
The metadata field used to annotate the clusters |
extra.dim |
The additional dimension to be added on the z-axis of the plot. Can be either:
|
query.state |
Only plot the query cells from this specific state |
A three dimensional plot with UMAP_1 and UMAP_2 on the x and y axis respectively, and the specified 'extra.dim' on the z-axis.
plot.discriminant.3d(ref, query=query, extra.dim="ICA_19")
plot.discriminant.3d(ref, query=treated.set, query.control=control.set, extra.dim="ICA_2")
Plots the UMAP representation of the reference map, together with the projected coordinates of a query dataset.
## S3 method for class 'projection'
plot( ref, query = NULL, labels.col = "functional.cluster", cols = NULL, linesize = 1, pointsize = 1, density_adjust = 1, ref.alpha = 0.3, ref.size = NULL, ... )
ref |
Reference object |
query |
Seurat object with query data |
labels.col |
The metadata field to annotate the clusters (default: functional.cluster) |
cols |
Custom color palette for clusters |
linesize |
Contour line thickness for projected query |
pointsize |
Point size for cells in projected query |
density_adjust |
Adjust factor for contour line density |
ref.alpha |
Transparency parameter for reference cells |
ref.size |
Adjust point size for reference cells |
... |
Additional parameters for |
UMAP plot of reference map with projected query set in the same space
data(query_example_seurat)
ref <- load.reference.map()
q <- Run.ProjecTILs(query_example_seurat, ref=ref, fast.umap.predict=TRUE)
plot.projection(ref=ref, query=q)
Makes a barplot of the frequency of cell states in a query object.
## S3 method for class 'statepred.composition'
plot( ref, query, labels.col = "functional.cluster", cols = NULL, metric = c("Count", "Percent") )
ref |
Reference object |
query |
Seurat object with query data |
labels.col |
The metadata field used to annotate the clusters (default: functional.cluster) |
cols |
Custom color palette for clusters |
metric |
One of 'Count' or 'Percent'. 'Count' plots the absolute number of cells, 'Percent' the fraction of the total number of cells. |
Barplot of predicted state composition
data(query_example_seurat)
ref <- load.reference.map()
q <- make.projection(query_example_seurat, ref=ref)
q <- cellstate.predict(ref, query=q)
plot.statepred.composition(ref, query=q)
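The difference between the 'Count' and 'Percent' metrics can be shown on a toy vector of predicted labels using base R. The labels below are invented, and dropping NA labels before computing percentages is an assumption of this sketch, not a documented behavior of the plotting function.

```r
# Hypothetical predicted states for 6 query cells (one unclassified)
predicted <- c("Tex", "Tex", "Tpex", "Treg", "Tex", NA)

# 'Count': absolute number of cells per state (table() drops NA by default)
counts <- table(predicted)

# 'Percent': share of each state among the classified cells
percent <- 100 * prop.table(counts)
```

Counts are useful when comparing sample sizes directly, while percentages make compositions comparable between query objects with different numbers of cells.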
Makes a radar plot of the expression level of a set of genes. It can be useful to compare the gene expression profile of different cell states in the reference atlas vs. a projected set.
## S3 method for class 'states.radar'
plot( ref, query = NULL, labels.col = "functional.cluster", ref.assay = "RNA", query.assay = "RNA", genes4radar = c("Foxp3", "Cd4", "Cd8a", "Tcf7", "Ccr7", "Gzmb", "Gzmk", "Pdcd1", "Havcr2", "Tox", "Mki67"), meta4radar = NULL, norm.factor = 1, min.cells = 20, cols = NULL, return = FALSE, return.as.list = FALSE )
ref |
Reference object |
query |
Query data, either as a Seurat object or as a list of Seurat objects |
labels.col |
The metadata field used to annotate the clusters |
ref.assay |
The assay to pull the reference expression data |
query.assay |
The assay to pull the query expression data |
genes4radar |
Which genes to use for plotting |
meta4radar |
Which metadata columns (numeric) to use for plotting. If not NULL, |
norm.factor |
Normalization factor for rescaling expression or metadata values |
min.cells |
Only display cell states with a minimum number of cells |
cols |
Custom color palette for samples in radar plot |
return |
Return the combined plots instead of printing them to the default device (deprecated) |
return.as.list |
Return plots in a list, instead of combining them in a single plot |
Radar plot of gene expression of key genes by cell subtype
ref <- load.reference.map()
plot.states.radar(ref)
Apply label transfer to annotate a query dataset with the cell types of a reference object. Compared to Run.ProjecTILs, only cell labels are returned. The low-dim embeddings of the query object (PCA, UMAP) are not modified.
ProjecTILs.classifier( query, ref = NULL, filter.cells = TRUE, split.by = NULL, reduction = "pca", ndim = NULL, k = 5, nn.decay = 0.1, min.confidence = 0.2, labels.col = "functional.cluster", overwrite = TRUE, ncores = 1, ... )ProjecTILs.classifier( query, ref = NULL, filter.cells = TRUE, split.by = NULL, reduction = "pca", ndim = NULL, k = 5, nn.decay = 0.1, min.confidence = 0.2, labels.col = "functional.cluster", overwrite = TRUE, ncores = 1, ... )
query |
Query data, either as a single Seurat object or as a list of Seurat objects |
ref |
Reference Atlas - if NULL, downloads the default TIL reference atlas |
filter.cells |
Pre-filter cells using 'scGate'. Only set to FALSE if the dataset has been previously subset to cell types represented in the reference. |
split.by |
Grouping variable to split the query object (e.g. if the object contains multiple samples) |
reduction |
The dimensionality reduction used to assign cell type labels |
ndim |
The number of dimensions used for cell type classification |
k |
Number of neighbors for cell type classification |
nn.decay |
Weight decay for internal nearest neighbors (between 0 and 1) |
min.confidence |
Minimum confidence score to return cell type labels (otherwise NA) |
labels.col |
The metadata field with label annotations of the reference, which will be transferred to the query dataset |
overwrite |
Replace any existing labels in |
ncores |
Number of cores for parallel processing |
... |
Additional parameters to make.projection |
See load.reference.map to load or download a reference atlas. See Run.ProjecTILs to embed the query in the same space as the reference.
The query object with additional metadata columns containing the predicted cell labels and the confidence scores for those labels. If cells were filtered prior to projection, they will be labeled as 'NA'.
## Not run:
data(query_example_seurat)
ref <- load.reference.map()
q <- ProjecTILs.classifier(query_example_seurat, ref=ref)
table(q$functional.cluster, useNA="ifany")
## End(Not run)
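The parameters `k`, `nn.decay` and `min.confidence` can be illustrated with a small nearest-neighbor vote in base R. This is a sketch under stated assumptions, not the package's implementation: `knn.vote` is a hypothetical helper, the embeddings are random toy data, and the weighting scheme (the i-th nearest neighbor receives weight `(1 - nn.decay)^(i - 1)`) is only one plausible reading of "weight decay".

```r
# Toy 2-D embeddings standing in for PCA coordinates (made-up data).
set.seed(7)
ref.space <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 4, sd = 0.5), ncol = 2)
)
ref.labels <- rep(c("Naive", "Exhausted"), each = 20)
query.space <- rbind(c(0.1, -0.2), c(4.2, 3.9), c(2, 2))

# Hypothetical helper: weighted k-NN vote with a confidence cutoff.
# ASSUMPTION: the i-th nearest neighbor gets weight (1 - nn.decay)^(i - 1).
knn.vote <- function(ref.space, ref.labels, query.space,
                     k = 5, nn.decay = 0.1, min.confidence = 0.2) {
  w <- (1 - nn.decay)^(seq_len(k) - 1)
  res <- lapply(seq_len(nrow(query.space)), function(i) {
    # Euclidean distance from query cell i to every reference cell
    d <- sqrt(rowSums(sweep(ref.space, 2, query.space[i, ])^2))
    nn <- order(d)[seq_len(k)]
    # Sum neighbor weights per label; dividing by sum(w) gives a score in (0, 1]
    score <- tapply(w, ref.labels[nn], sum) / sum(w)
    best <- names(score)[which.max(score)]
    conf <- unname(score[best])
    data.frame(
      label = if (conf >= min.confidence) best else NA_character_,
      confidence = conf,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, res)
}

pred <- knn.vote(ref.space, ref.labels, query.space)
print(pred)
```

Query cells deep inside one reference cluster receive that cluster's label with a confidence of 1, while a cell lying between clusters gets a lower score. Raising `min.confidence` turns such ambiguous calls into `NA`.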
A small dataset of CD8 T cells, to test the ProjecTILs installation.
query_example_seurat
A Seurat object
https://pmc.ncbi.nlm.nih.gov/articles/PMC6673650/
Load a query expression matrix to be projected onto the reference atlas. Several formats (10x, hdf5, raw and log counts) are supported; see the type parameter for details.
read.sc.query( filename, type = c("10x", "hdf5", "raw", "raw.log2"), project.name = "Query", min.cells = 3, min.features = 50, gene.column.10x = 2, raw.rownames = 1, raw.sep = c("auto", " ", "\t", ","), raw.header = TRUE, use.readmtx = TRUE )
filename |
Path to expression matrix file or folder |
type |
Expression matrix format (10x, hdf5, raw, raw.log2) |
project.name |
Title for the project |
min.cells |
Only keep genes represented in at least min.cells number of cells |
min.features |
Only keep cells expressing at least min.features genes |
gene.column.10x |
For 10x format - which column of genes.tsv or features.tsv to use for gene names |
raw.rownames |
For raw matrix format - A vector of row names, or a single number giving the column of the table which contains the row names |
raw.sep |
For raw matrix format - Separator for raw expression matrix |
raw.header |
For raw matrix format - Use headers in expression matrix |
use.readmtx |
Use ReadMtx function to read in 10x files with custom names |
A Seurat object populated with raw counts and normalized counts for single-cell expression
fname <- "./sample_data"
querydata <- read.sc.query(fname, type="10x")
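The effect of the `min.cells` and `min.features` thresholds can be shown on a toy count matrix in base R. This is only a sketch: the matrix is made up, and the order of the two filters (cells first, then genes) is an assumption rather than something stated in this manual.

```r
# Toy count matrix (genes x cells); values are made up.
counts <- matrix(
  c(5, 0, 0, 2,
    1, 0, 0, 0,
    3, 4, 0, 1,
    0, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4))
)
min.cells <- 2
min.features <- 2

# Cells: keep those expressing at least min.features genes
# (ASSUMPTION: cells are filtered before genes)
n.features <- colSums(counts > 0)
counts <- counts[, n.features >= min.features, drop = FALSE]

# Genes: keep those detected in at least min.cells of the remaining cells
n.cells <- rowSums(counts > 0)
counts <- counts[n.cells >= min.cells, , drop = FALSE]

print(counts)
```

With these thresholds, `cell2` and `cell3` are removed for expressing fewer than two genes, after which only `gene1` and `gene3` are still detected in at least two cells.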
Given a reference object and a (list of) projected objects, recalculate low-dim embeddings accounting for the projected cells
recalculate.embeddings( ref, projected, ref.assay = "integrated", proj.assay = "integrated", ndim = NULL, n.neighbors = 20, min.dist = 0.3, recalc.pca = FALSE, resol = 0.4, k.param = 15, metric = "cosine", umap.method = c("umap", "uwot"), seed = 123 )
ref |
Reference map |
projected |
A projected object (or list of projected objects) generated using make.projection |
ref.assay |
Assay for reference object |
proj.assay |
Assay for projected object(s) |
ndim |
Number of dimensions for recalculating dimensionality reductions |
n.neighbors |
Number of neighbors for UMAP algorithm |
min.dist |
Tightness parameter for UMAP embedding |
recalc.pca |
Whether to recalculate the PCA embeddings with the combined reference and projected data |
resol |
Resolution for unsupervised clustering |
k.param |
Number of nearest neighbors for clustering |
metric |
Distance metric to use to find nearest neighbors for UMAP |
umap.method |
Which method should be used to calculate UMAP embeddings |
seed |
Random seed for reproducibility |
A combined reference object of reference and projected object(s), with new low dimensional embeddings
combined <- recalculate.embeddings(ref, projected, ndim=10)
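The difference between keeping the reference PCA and recalculating it on the combined data (the `recalc.pca` switch) can be sketched with base R's `prcomp`. The matrices below are random toy data with cells in rows, and the two code paths are analogues for illustration only; the package's actual procedure may differ.

```r
# Toy data: cells in rows, genes in columns (made-up values).
set.seed(42)
genes <- paste0("gene", 1:8)
ref.mat <- matrix(rnorm(100 * 8), nrow = 100,
                  dimnames = list(paste0("ref", 1:100), genes))
proj.mat <- matrix(rnorm(30 * 8, mean = 1), nrow = 30,
                   dimnames = list(paste0("q", 1:30), genes))
combined.mat <- rbind(ref.mat, proj.mat)
ndim <- 3

# PCA fitted on the reference only
ref.pca <- prcomp(ref.mat)

# Analogue of recalc.pca = FALSE: reuse the reference axes for all cells,
# so reference cells keep their original coordinates
emb.fixed <- predict(ref.pca, combined.mat)[, 1:ndim]

# Analogue of recalc.pca = TRUE: refit the PCA on reference + projected cells,
# so the axes can shift to capture variation introduced by the projected cells
emb.recalc <- prcomp(combined.mat)$x[, 1:ndim]

# Reference coordinates are unchanged in the fixed embedding
print(all.equal(emb.fixed[1:100, ], ref.pca$x[, 1:ndim],
                check.attributes = FALSE))
```

Recalculating the axes is mainly of interest when the projected cells contain states that the reference captures poorly, at the cost of moving the reference cells away from their original coordinates.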
This function projects single-cell RNA-seq datasets (the "query") onto a reference map (i.e. a curated and annotated scRNA-seq dataset). To project multiple datasets, pass a list of Seurat objects as the query parameter. The projection consists of 3 steps:
1. pre-processing: optional steps which might include pre-filtering of cells by markers using 'scGate', data normalization, and ortholog conversion.
2. batch-effect correction: uses the built-in STACAS algorithm to detect and correct for batch effects (this step assumes that at least a fraction of the cells in the query are in the same state as cells in the reference).
3. embedding of the corrected query data in the reduced-dimensionality spaces (PCA and UMAP) of the reference map.
This function acts as a wrapper for make.projection and cellstate.predict
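As an illustration of the data normalization mentioned under pre-processing, the sketch below applies a common log-normalization to a toy count matrix: counts are divided by each cell's library size, multiplied by a scale factor, and transformed with `log1p`. Both the choice of this scheme and the scale factor of 10000 are assumptions made for the example; this manual does not state which normalization the package applies.

```r
# Toy raw counts (genes x cells); values are made up.
counts <- matrix(c(10, 0, 90,
                   1, 1, 2),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("cellA", "cellB")))

# ASSUMPTION: log-normalization with a scale factor of 10000
scale.factor <- 10000

# Divide each column (cell) by its total counts, rescale, then log(1 + x)
lib.size <- colSums(counts)
norm <- log1p(sweep(counts, 2, lib.size, "/") * scale.factor)

print(round(norm, 3))
```

After this step, cells with very different sequencing depths (100 versus 4 total counts in the toy data) are expressed on a comparable scale, and zero counts remain zero.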
Run.ProjecTILs( query, ref = NULL, filter.cells = TRUE, split.by = NULL, reduction = "pca", ndim = NULL, k = 5, nn.decay = 0.1, min.confidence = 0.2, labels.col = "functional.cluster", ... )
query |
Query data, either as a single Seurat object or as a list of Seurat objects |
ref |
Reference Atlas - if NULL, downloads the default TIL reference atlas |
filter.cells |
Pre-filter cells using 'scGate'. Only set to FALSE if the dataset has been previously subset to cell types represented in the reference. |
split.by |
Grouping variable to split the query object (e.g. if the object contains multiple samples) |
reduction |
The dimensionality reduction used to assign cell type labels, based on majority voting of nearest neighbors between reference and query. |
ndim |
The number of dimensions used for cell type classification |
k |
Number of neighbors for cell type classification |
nn.decay |
Weight decay for internal nearest neighbors (between 0 and 1) |
min.confidence |
Minimum confidence score to return cell type labels (otherwise NA) |
labels.col |
The metadata field of the reference to annotate the clusters |
... |
Additional parameters to make.projection |
See load.reference.map to load or download a reference atlas. See also ProjecTILs.classifier to use ProjecTILs as a cell type classifier.
An augmented Seurat object with projected UMAP coordinates on the reference map and cell classifications
data(query_example_seurat)
ref <- load.reference.map()
q <- Run.ProjecTILs(query_example_seurat, ref=ref, fast.umap.predict=TRUE)
plot.projection(ref=ref, query=q)