| Title: | Evaluation of cell type classifications in single-cell transcriptomics |
|---|---|
| Description: | scTypeEval provides tools to evaluate and validate cell type classifications in single-cell transcriptomics when ground truth labels are limited or unavailable. Results are organized in an S4 object that integrates processed data, dimensional reductions, dissimilarity assays, and consistency metrics computed across samples. The workflow includes preprocessing and feature selection, principal component analysis, computation of dissimilarity matrices, internal validation metrics (for example, silhouette-based summaries), and visualization utilities to inspect heatmaps and PCA plots. Functions support common single-cell containers and enable comparison of clustering and labeling strategies across datasets. |
| Authors: | Josep Garnica [aut, cre] (ORCID: <https://orcid.org/0000-0001-9493-1321>), Massimo Andreatta [aut] (ORCID: <https://orcid.org/0000-0002-8036-2647>), Santiago Carmona [aut] (ORCID: <https://orcid.org/0000-0002-2495-0671>) |
| Maintainer: | Josep Garnica <[email protected]> |
| License: | GPL-3 + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-05-28 14:37:25 UTC |
| Source: | https://github.com/carmonalab/scTypeEval |
This function allows the user to insert a pre-computed dimensionality reduction (e.g., PCA, UMAP,
t-SNE, or any embedding) into an scTypeEval object. The embeddings are stored in the
reductions slot as a dim_red object, enabling integration with the scTypeEval
workflow for downstream analysis.
add_dim_reduction(
  scTypeEval,
  embeddings,
  aggregation,
  ident = NULL,
  ident_name = "custom",
  sample = NULL,
  key = NULL,
  gene_list = NULL,
  black_list = NULL,
  feature_loadings = NULL,
  filter = FALSE,
  min_samples = 5,
  min_cells = 10,
  verbose = TRUE
)
scTypeEval |
A |
embeddings |
A numeric matrix of dimension-reduced embeddings (cells/samples x components). |
aggregation |
Character. Aggregation level of the embeddings. Options:
|
ident |
Required. A vector of cell identities (e.g., cell type annotation).
If |
ident_name |
Character. Name assigned to the provided |
sample |
Required. A vector indicating sample identity of each observation.
If |
key |
Optional. Character. Key or label assigned to this dimensionality reduction (e.g., |
gene_list |
Optional. Character vector or named list of genes associated with the embeddings
(e.g., input features used for the dimensionality reduction). Default: |
black_list |
Optional. Character vector of genes excluded from the dimensionality reduction.
Default: |
feature_loadings |
Optional. Matrix of feature loadings corresponding to the embeddings
(e.g., PCA rotation matrix). Default: |
filter |
Logical. If |
min_samples |
Integer. Minimum number of samples required for retaining a cell type in
single-cell filtering. Default: |
min_cells |
Integer. Minimum number of cells required per group for filtering. Default: |
verbose |
Logical. Whether to print messages during execution. Default: |
The modified scTypeEval object with the new dimensionality reduction stored
in the reductions slot.
# Create test data with enough samples
library(Matrix)
counts <- Matrix(rpois(6000, 5), nrow = 100, ncol = 60, sparse = TRUE)
rownames(counts) <- paste0("Gene", seq_len(100))
colnames(counts) <- paste0("Cell", seq_len(60))
metadata <- data.frame(
  celltype = rep(c("TypeA", "TypeB"), each = 30),
  sample = rep(paste0("Sample", seq_len(6)), times = 10),
  row.names = colnames(counts)
)

# Create scTypeEval object from matrix
sceval <- create_scTypeEval(
  matrix = counts,
  metadata = metadata,
  active_ident = "celltype"
)

# Process data with filtering
sceval <- run_processing_data(
  sceval,
  ident = "celltype",
  sample = "sample",
  min_samples = 3,
  min_cells = 3,
  verbose = FALSE
)

# Create mock embeddings
n_cells <- ncol(sceval@data[["single-cell"]]@matrix)
embeddings <- matrix(rnorm(n_cells * 10), nrow = 10, ncol = n_cells)
ident <- sceval@data[["single-cell"]]@ident[[1]]
sample <- sceval@data[["single-cell"]]@sample

sceval <- add_dim_reduction(
  sceval,
  embeddings = embeddings,
  aggregation = "single-cell",
  ident = ident,
  sample = sample,
  key = "custom_embedding",
  verbose = FALSE
)
This function appends a new gene list to an existing scTypeEval object.
It ensures that the input list is valid and assigns names if they are missing.
add_gene_list(scTypeEval, gene_list = NULL)
scTypeEval |
An |
gene_list |
A named list of gene sets to append. If the list is unnamed, names will be assigned automatically. |
The function verifies that gene_list is provided and is a valid list.
If any elements in gene_list lack names, they are automatically renamed.
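The validation and naming behaviour described above can be sketched in base R. This is an illustration only: the helper `name_gene_list()` and the `"gene_list"` prefix are hypothetical, and the names the package actually assigns may differ.

```r
# Hypothetical helper, not part of scTypeEval: validate a gene list and
# fill in names for unnamed elements. The naming scheme is an assumption.
name_gene_list <- function(gene_list, prefix = "gene_list") {
  if (is.null(gene_list) || !is.list(gene_list)) {
    stop("gene_list must be a non-NULL list")
  }
  nms <- names(gene_list)
  if (is.null(nms)) {
    nms <- rep("", length(gene_list))
  }
  unnamed <- is.na(nms) | nms == ""
  # Name each unnamed element after its position in the list
  nms[unnamed] <- paste0(prefix, which(unnamed))
  names(gene_list) <- nms
  gene_list
}

gl <- name_gene_list(list(cytokines = c("IL10", "IL6", "IL4"), c("CD3E", "CD4")))
names(gl)
```

Named elements keep their names; only the unnamed second element receives a generated one.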
An updated scTypeEval object with the new gene list added.
# Create synthetic test data
library(Matrix)
counts <- Matrix(rpois(1000, 5), nrow = 50, ncol = 20, sparse = TRUE)
rownames(counts) <- paste0("Gene", seq_len(50))
colnames(counts) <- paste0("Cell", seq_len(20))
metadata <- data.frame(
  celltype = rep(c("TypeA", "TypeB"), each = 10),
  sample = rep(paste0("Sample", seq_len(4)), each = 5),
  row.names = colnames(counts)
)

# Create scTypeEval object from matrix
sceval <- create_scTypeEval(
  matrix = counts,
  metadata = metadata,
  active_ident = "celltype"
)

# Append a named gene list
sceval <- add_gene_list(
  sceval,
  gene_list = list("cytokines" = c("IL10", "IL6", "IL4"))
)
Adds a user-supplied processed dataset (e.g. single-cell or pseudobulk)
into an scTypeEval object as a data_assay. Supports filtering and consistency checks
on cell type and sample annotations.
add_processed_data(
  scTypeEval,
  data,
  aggregation,
  ident = NULL,
  ident_name = "custom",
  sample = NULL,
  filter = FALSE,
  min_samples = 5,
  min_cells = 10,
  verbose = TRUE
)
scTypeEval |
An |
data |
A count matrix (dense or sparse) containing processed expression values (either single-cell or pseudobulk aggregated). |
aggregation |
A string specifying the aggregation type. Must be one of the
supported: |
ident |
A vector of cell identities corresponding to the columns of |
ident_name |
A string specifying the name under which the provided |
sample |
A vector of sample identifiers corresponding to the columns of |
filter |
Logical indicating whether to filter the data based on |
min_samples |
Minimum number of samples required to retain a cell type group (default: 5). |
min_cells |
Minimum number of cells required to retain a cell type group (default: 10). |
verbose |
Logical indicating whether to print progress messages (default: TRUE). |
The function validates that identity and sample annotations match the dimensions of the input data.
For aggregation = "single-cell", optional filtering removes groups with too few
samples or cells.
For aggregation = "pseudobulk", the function checks that each (sample, identity)
pair occurs exactly once (i.e., fully aggregated).
Processed data is wrapped in a data_assay object and added to the scTypeEval.
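The pseudobulk check described above (each sample and identity pair occurring exactly once) can be illustrated with a short base R sketch. The helper name `is_fully_aggregated()` is hypothetical and the package may implement the check differently.

```r
# Hypothetical helper: TRUE when no (sample, identity) pair occurs more than once,
# which is what fully aggregated pseudobulk columns should look like.
is_fully_aggregated <- function(ident, sample) {
  stopifnot(length(ident) == length(sample))
  pair_counts <- table(sample, ident)
  all(pair_counts <= 1)
}

# Pseudobulk-like annotations: one column per sample and cell type
ident_pb  <- rep(c("TypeA", "TypeB"), each = 3)
sample_pb <- rep(paste0("Sample", 1:3), times = 2)
pb_ok <- is_fully_aggregated(ident_pb, sample_pb)

# Single-cell-like annotations: the same pair appears several times
ident_sc  <- rep(c("TypeA", "TypeB"), each = 6)
sample_sc <- rep(paste0("Sample", 1:3), times = 4)
sc_ok <- is_fully_aggregated(ident_sc, sample_sc)

c(pseudobulk = pb_ok, single_cell = sc_ok)
```

Data that fails this check still contains several observations per pair and would need aggregating before being added with aggregation = "pseudobulk".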
An updated scTypeEval object containing:
data: A new data_assay object stored under the specified aggregation type.
# Create test data with enough samples
library(Matrix)
counts <- Matrix(rpois(6000, 5), nrow = 100, ncol = 60, sparse = TRUE)
rownames(counts) <- paste0("Gene", seq_len(100))
colnames(counts) <- paste0("Cell", seq_len(60))
metadata <- data.frame(
  celltype = rep(c("TypeA", "TypeB"), each = 30),
  sample = rep(paste0("Sample", seq_len(6)), times = 10),
  row.names = colnames(counts)
)

sceval <- create_scTypeEval(matrix = counts, metadata = metadata)
sceval <- add_processed_data(
  sceval,
  data = counts,
  aggregation = "single-cell",
  ident = metadata$celltype,
  sample = metadata$sample,
  filter = FALSE
)
A curated list of genes typically excluded from single-cell RNA-seq analysis to reduce technical artifacts and improve cell type annotation quality.
A character vector with 7,775 gene symbols including:
MT- prefix genes (MT-RNR1, MT-RNR2, MT-TA, etc.)
RPL and RPS prefix genes
Genes with -AS1, -AS2, -IT1, -DT suffixes
MIR prefix genes (MIR1-1, MIR1-2, etc.)
SNORA, SNORD, RNU prefix genes
G1/S and G2/M phase markers
X and Y chromosome specific genes
Heat shock proteins, immediate early genes
This blacklist is automatically loaded and used by default in scTypeEval
when black_list = NULL is specified in functions like run_hvg
and run_gene_markers. Users can override this default by providing
a custom character vector of gene symbols to exclude.
The blacklist helps improve downstream analysis by removing:
Genes with high technical variance unrelated to cell type identity
Genes that may confound clustering (e.g., cell cycle genes)
Genes with batch-specific expression patterns
Non-coding RNAs that may not be informative for cell type annotation
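As a rough illustration of how such an exclusion list acts on a feature set, the sketch below filters a toy vector of gene symbols. The symbols and the regular expression are illustrative assumptions and do not reproduce the 7,775-gene list shipped with the package.

```r
# Toy gene symbols; toy_black_list is illustrative, not the package's black_list
genes <- c("CD3E", "MT-RNR1", "RPL13", "RPS6", "MIR1-1", "SNORA1", "FOXP3", "GAS5-AS1")
toy_black_list <- c("MT-RNR1", "RPL13", "RPS6", "MIR1-1", "SNORA1", "GAS5-AS1")

# Exact-match exclusion, as done with a character vector of symbols
kept <- setdiff(genes, toy_black_list)

# Pattern-based view of some of the categories listed above
pattern <- "^(MT-|RPL|RPS|MIR|SNORA|SNORD|RNU)|-(AS[0-9]+|IT[0-9]+|DT)$"
flagged <- grepl(pattern, genes)
kept_by_pattern <- genes[!flagged]

kept
```

A curated vector is generally safer than prefix patterns, which can catch unrelated genes sharing a prefix.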
A character vector containing gene symbols to be excluded from analysis, including cell cycle genes (G1/S and G2/M), mitochondrial genes, ribosomal genes, TCR and immunoglobulin genes, pseudogenes, heat shock proteins, non-coding RNAs, and sex chromosome genes (X and Y).
Generated using the SignatuR package (Andreatta et al., https://github.com/carmonalab/SignatuR) with the following categories:
Cell cycle genes: G1/S and G2/M phase markers from SignatuR::Programs
Technical artifacts: Mitochondrial, ribosomal, TCR, immunoglobulins, pseudogenes, HSP, and non-coding RNAs from SignatuR::Blocklists
Sex chromosome genes: X-inactivation escapees and Y-chromosome specific genes from GenderGenes database (http://bioinf.wehi.edu.au/software/GenderGenes/)
# Load the default gene blacklist
data(black_list)

# Inspect the first few entries
head(black_list)
This function initializes an scTypeEval object from various input formats, including Seurat,
SingleCellExperiment, or raw count matrices. It ensures compatibility by validating input types
and structures before constructing the object.
create_scTypeEval(
  matrix = NULL,
  metadata = NULL,
  gene_lists = list(),
  black_list = NULL,
  active_ident = NULL
)
matrix |
A Seurat object, SingleCellExperiment object, dense/sparse count
matrix, or |
metadata |
A metadata dataframe. Required if |
gene_lists |
A named list of gene sets to use in the evaluation (default: empty list). |
black_list |
A character vector of genes to exclude from analysis (default: empty). |
active_ident |
The active identity class or cluster label (optional). If provided, it is validated against the metadata. |
If matrix = NULL, the function initializes an empty dgCMatrix with 0 rows and 0 columns,
and requires a metadata dataframe.
For Seurat and SingleCellExperiment objects, counts and metadata are extracted automatically.
For raw matrices, metadata must be provided explicitly.
The function validates that the number of metadata rows matches the number of cells (matrix columns).
If active_ident is provided, it is checked for consistency with the metadata.
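The two consistency checks described above can be sketched in base R as follows. `check_inputs()` is a hypothetical helper, and treating active_ident as a metadata column name is an assumption based on the examples in this manual.

```r
# Hypothetical validator; not the package's internal code
check_inputs <- function(counts, metadata, active_ident = NULL) {
  # One metadata row is expected per cell, i.e. per matrix column
  if (ncol(counts) != nrow(metadata)) {
    stop("metadata must have one row per cell (matrix column)")
  }
  # Assumption: active_ident names a column of the metadata
  if (!is.null(active_ident) && !(active_ident %in% colnames(metadata))) {
    stop("active_ident not found in metadata columns")
  }
  invisible(TRUE)
}

counts <- matrix(rpois(1000, 5), nrow = 50, ncol = 20)
metadata <- data.frame(
  celltype = rep(c("TypeA", "TypeB"), each = 10),
  sample = rep(paste0("Sample", seq_len(4)), each = 5)
)

ok <- check_inputs(counts, metadata, active_ident = "celltype")

# Truncated metadata no longer matches the number of cells
bad <- tryCatch(
  check_inputs(counts, metadata[1:5, ], active_ident = "celltype"),
  error = function(e) conditionMessage(e)
)
bad
```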
An scTypeEval object containing:
counts: A sparse count matrix (dgCMatrix).
metadata: A dataframe with metadata for each cell.
gene_lists: A list of gene sets used in classification.
black_list: A vector of excluded genes.
active_ident: Active cluster identity (if provided).
# Create synthetic test data
library(Matrix)
counts <- Matrix(rpois(1000, 5), nrow = 50, ncol = 20, sparse = TRUE)
rownames(counts) <- paste0("Gene", seq_len(50))
colnames(counts) <- paste0("Cell", seq_len(20))
metadata <- data.frame(
  celltype = rep(c("TypeA", "TypeB"), each = 10),
  sample = rep(paste0("Sample", seq_len(4)), each = 5),
  row.names = colnames(counts)
)

# Create scTypeEval object from matrix
sceval <- create_scTypeEval(matrix = counts, metadata = metadata)

# With custom gene lists and active identity
gene_list <- list(markers = rownames(counts)[seq_len(10)])
sceval <- create_scTypeEval(
  matrix = counts,
  metadata = metadata,
  gene_lists = gene_list,
  active_ident = "celltype"
)
Computes internal validation metrics (consistency measures) for cell type
annotations based on dissimilarity assays stored in a scTypeEval object.
For each indicated dissimilarity representation, one or more internal validation metrics
are calculated per cell type and returned in a tidy data.frame.
get_consistency(
  scTypeEval,
  dissimilarity_slot = "all",
  consistency_metric = c("silhouette", "2label_silhouette", "NeighborhoodPurity",
    "ward_PropMatch", "Orbital_medoid", "Average_similarity"),
  knn_graph_k = 5,
  hclust_method = "ward.D2",
  normalize = FALSE,
  return_scTypeEval = FALSE,
  verbose = TRUE
)
scTypeEval |
A |
dissimilarity_slot |
Character. Which dissimilarity assay(s) to use.
Can be |
consistency_metric |
Character vector. Internal validation metrics to compute. Supported options include:
Default: all supported metrics. |
knn_graph_k |
Integer. Number of nearest neighbors to use for
graph-based metrics ( |
hclust_method |
Character. Agglomeration method passed to |
normalize |
Logical. Whether to normalize metric values by the proportions expected by chance.
Default is |
return_scTypeEval |
Logical. Whether to return data frame with inter-sample consistencies or store within scTypeEval@consistency slot. Default is |
verbose |
Logical. Whether to print progress messages. Default is |
This function builds upon the dissimilarity assays generated by
run_dissimilarity. For each selected dissimilarity representation,
the chosen internal validation metrics are computed and stored in a long-format
data frame, allowing downstream comparison across cell types, metrics,
and dissimilarity methods.
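To make the idea of a per-cell-type internal validation metric concrete, the following base R sketch computes the textbook silhouette width from a dissimilarity matrix and averages it per label. It is an illustration under stated assumptions; the helper `silhouette_by_label()` is hypothetical and the package's "silhouette" metric may be computed differently.

```r
# Mean silhouette width per label from a dissimilarity matrix (textbook definition)
silhouette_by_label <- function(d, labels) {
  d <- as.matrix(d)
  labels <- as.character(labels)
  groups <- unique(labels)
  stopifnot(nrow(d) == length(labels), length(groups) >= 2)
  widths <- vapply(seq_along(labels), function(i) {
    own <- setdiff(which(labels == labels[i]), i)
    if (length(own) == 0) return(0)  # common convention for singleton groups
    a <- mean(d[i, own])             # mean dissimilarity to own group
    b <- min(vapply(setdiff(groups, labels[i]),
                    function(g) mean(d[i, labels == g]), numeric(1)))
    (b - a) / max(a, b)              # b is the nearest other group
  }, numeric(1))
  tapply(widths, labels, mean)
}

# Two well-separated toy groups
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 10), ncol = 2))
labels <- rep(c("TypeA", "TypeB"), each = 10)
sil <- silhouette_by_label(dist(x), labels)
sil
```

Values close to 1 indicate that observations sit nearer to their own label than to any other; values near 0 or below suggest overlapping labels.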
A data.frame with the following columns:
celltype – the annotation/group label
measure – numeric consistency score
consistency_metric – the metric name
dissimilarity_method – the dissimilarity method used
ident – the identity class (from scTypeEval@ident)
# Create and process test data
library(Matrix)
counts <- Matrix(rpois(6000, 5), nrow = 100, ncol = 60, sparse = TRUE)
rownames(counts) <- paste0("Gene", seq_len(100))
colnames(counts) <- paste0("Cell", seq_len(60))
metadata <- data.frame(
  celltype = rep(c("TypeA", "TypeB"), each = 30),
  sample = rep(paste0("Sample", seq_len(6)), times = 10),
  row.names = colnames(counts)
)
sceval <- create_scTypeEval(matrix = counts, metadata = metadata)
sceval <- run_processing_data(sceval, ident = "celltype", sample = "sample",
                              min_samples = 3, min_cells = 3, verbose = FALSE)
sceval <- run_hvg(sceval, var_method = "basic", ngenes = 100, verbose = FALSE)
sceval <- run_dissimilarity(sceval, method = "Pseudobulk:Euclidean",
                            reduction = FALSE, verbose = FALSE)

# Obtain consistency
cons <- get_consistency(sceval, verbose = FALSE)
This function performs hierarchical clustering using precomputed dissimilarity
matrices stored in the dissimilarity slot of an scTypeEval object.
Each dissimilarity assay is clustered independently using the specified hierarchical
clustering method.
get_hierarchy(
  scTypeEval,
  dissimilarity_slot = "all",
  hierarchy_method = "ward.D2",
  verbose = TRUE
)
scTypeEval |
An |
dissimilarity_slot |
Character string. Specifies which dissimilarity assays to cluster.
Use |
hierarchy_method |
Character string specifying the hierarchical clustering method
(default: |
verbose |
Logical. If |
For each dissimilarity assay, the function applies hclust to the stored dissimilarity matrix. The number of clusters is set equal to the number of unique identities provided in the assay metadata.
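The procedure described above can be reproduced on toy data with base R alone. This sketch uses stats::hclust and stats::cutree directly on a Euclidean distance matrix; the toy data and variable names are illustrative and do not come from the package.

```r
# Two well-separated toy groups with known labels
set.seed(42)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 10), ncol = 2))
labels <- rep(c("TypeA", "TypeB"), each = 10)

# Hierarchical clustering on the dissimilarity matrix
tree <- hclust(dist(x), method = "ward.D2")

# Cut the tree into as many clusters as there are unique identities
clusters <- cutree(tree, k = length(unique(labels)))

# Contingency table of cluster assignments versus input labels
contingency <- table(cluster = clusters, label = labels)
contingency
```

When labels and structure agree, each column of the table concentrates in a single cluster.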
A list of clustering results, one per dissimilarity assay. Each element contains a contingency table of cluster assignments versus input labels.
# Create and process test data
library(Matrix)
counts <- Matrix(rpois(6000, 5), nrow = 100, ncol = 60, sparse = TRUE)
rownames(counts) <- paste0("Gene", seq_len(100))
colnames(counts) <- paste0("Cell", seq_len(60))
metadata <- data.frame(
  celltype = rep(c("TypeA", "TypeB"), each = 30),
  sample = rep(paste0("Sample", seq_len(6)), times = 10),
  row.names = colnames(counts)
)
sceval <- create_scTypeEval(matrix = counts, metadata = metadata)
sceval <- run_processing_data(sceval, ident = "celltype", sample = "sample",
                              min_samples = 3, min_cells = 3, verbose = FALSE)
sceval <- run_hvg(sceval, var_method = "basic", ngenes = 100, verbose = FALSE)

# Compute Pseudobulk Euclidean dissimilarity
sceval <- run_dissimilarity(sceval, method = "Pseudobulk:Euclidean",
                            reduction = FALSE, verbose = FALSE)

# Perform hierarchical clustering on all available dissimilarity matrices
hier_results <- get_hierarchy(sceval, verbose = FALSE)
This function computes KNN-based neighborhood composition scores from dissimilarity
matrices stored in a scTypeEval object. For each dissimilarity assay, it constructs
a KNN graph, computes the cell type composition among neighbors
and aggregates results at the group level.
get_nn(
  scTypeEval,
  dissimilarity_slot = "all",
  knn_graph_k = 5,
  normalize = FALSE,
  verbose = TRUE
)
scTypeEval |
An object containing dissimilarity matrices and metadata. |
dissimilarity_slot |
A character string specifying which dissimilarity assay(s)
to use. If |
knn_graph_k |
Integer; the number of neighbors to consider in the KNN graph (default: |
normalize |
Logical; if |
verbose |
Logical; if |
The function extracts dissimilarity matrices from the scTypeEval object and applies
a KNN graph construction. For each assay, it computes neighbor
cell-type proportions per cell and then aggregates them by cell type. If
normalization is enabled, the observed neighbor proportions are scaled relative to
the expected global frequency of each cell type.
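The neighbor composition described above can be sketched in base R. The helper `neighbor_composition()` is hypothetical, and dividing by the global label frequency is only one plausible reading of "scaled relative to the expected global frequency"; the package may normalize differently.

```r
# Hypothetical helper: mean neighbor label proportions per reference label
neighbor_composition <- function(d, labels, k = 5, normalize = FALSE) {
  d <- as.matrix(d)
  labels <- factor(labels)
  stopifnot(nrow(d) == length(labels), k < nrow(d), nlevels(labels) >= 2)
  diag(d) <- Inf  # exclude each observation from its own neighborhood
  per_obs <- t(vapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]        # indices of the k nearest neighbors
    as.numeric(table(labels[nn])) / k      # label proportions among them
  }, numeric(nlevels(labels))))
  colnames(per_obs) <- levels(labels)
  if (normalize) {
    # Assumption: divide by the global frequency of each label
    expected <- as.numeric(table(labels)) / length(labels)
    per_obs <- sweep(per_obs, 2, expected, "/")
  }
  # Rows: reference label; columns: label of the neighbors
  apply(per_obs, 2, function(col) tapply(col, labels, mean))
}

set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 10), ncol = 2))
labels <- rep(c("TypeA", "TypeB"), each = 10)
comp_raw  <- neighbor_composition(dist(x), labels, k = 5)
comp_norm <- neighbor_composition(dist(x), labels, k = 5, normalize = TRUE)
comp_raw
```

With normalization by division, a value above 1 means a label is over-represented among neighbors relative to its overall frequency.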
A list (or a single data frame if only one assay is processed) where each element corresponds to a dissimilarity assay. Each element contains a data frame with rows corresponding to reference cell types and columns representing the mean proportion of neighbors belonging to each cell type.
# Create and process test data
library(Matrix)
counts <- Matrix(rpois(6000, 5), nrow = 100, ncol = 60, sparse = TRUE)
rownames(counts) <- paste0("Gene", seq_len(100))
colnames(counts) <- paste0("Cell", seq_len(60))
metadata <- data.frame(
  celltype = rep(c("TypeA", "TypeB"), each = 30),
  sample = rep(paste0("Sample", seq_len(6)), times = 10),
  row.names = colnames(counts)
)
sceval <- create_scTypeEval(matrix = counts, metadata = metadata)
sceval <- run_processing_data(sceval, ident = "celltype", sample = "sample",
                              min_samples = 3, min_cells = 3, verbose = FALSE)
sceval <- run_hvg(sceval, var_method = "basic", ngenes = 100, verbose = FALSE)

# Compute Pseudobulk Euclidean dissimilarity
sceval <- run_dissimilarity(sceval, method = "Pseudobulk:Euclidean",
                            reduction = FALSE, verbose = FALSE)

# Get nearest neighbors
result <- get_nn(scTypeEval = sceval, knn_graph_k = 5, normalize = TRUE,
                 verbose = FALSE)
Loads single-cell datasets from .rds or .h5ad files into R.
Supports Seurat, SingleCellExperiment, and anndata objects.
Depending on the input type and the split parameter, the function
either returns the original object or extracts and returns the
raw counts matrix and metadata.
load_single_cell_object(path, split = TRUE)
path |
Character string. Path to the file to load.
Supported formats: |
split |
Logical (default:
If |
.rds input:
If the object is a Seurat object, the raw counts are extracted
from the "RNA" assay, and cell metadata is taken from object@meta.data.
If the object is a SingleCellExperiment, counts are extracted from
the "counts" assay, and cell metadata is taken from colData(object).
Other .rds object types are not supported.
.h5ad input:
Requires the anndata package. The "counts" layer must be present,
otherwise the function will stop with an error. Counts are transposed
to match the gene-by-cell convention used in R (genes in rows, cells in columns).
If split = TRUE, a list containing:
counts – sparse counts matrix
metadata – cell metadata
If split = FALSE, returns the loaded single-cell object directly.
# Set a path for a temporary file
filepath <- file.path(tempdir(), "sce_test.rds")

# Create small SCE object with sparse matrix
counts <- Matrix::Matrix(rpois(1000, 5), nrow = 50, ncol = 20, sparse = TRUE)
rownames(counts) <- paste0("Gene", seq_len(50))
colnames(counts) <- paste0("Cell", seq_len(20))
sce_obj <- SingleCellExperiment::SingleCellExperiment(
  assays = list(counts = counts),
  colData = data.frame(
    cell_type = rep(c("TypeA", "TypeB"), each = 10),
    row.names = colnames(counts)
  )
)
saveRDS(sce_obj, filepath)

# Load the object and split it into counts and metadata
obj <- load_single_cell_object(filepath, split = TRUE)
Visualize dissimilarity matrices stored in an scTypeEval object as annotated
heatmaps. Each selected dissimilarity assay is shown as a heatmap in which every row and
column represents one cell type and sample combination. Rows and columns are grouped by cell type
and optionally sorted by similarity or consistency metrics. Group boundaries are marked to highlight cell-type consistency.
plot_heatmap(
  scTypeEval,
  dissimilarity_slot = "all",
  sort_similarity = NULL,
  sort_consistency = NULL,
  low_color = "black",
  high_color = "white",
  hclust_method = "ward.D2",
  verbose = TRUE,
  ...
)
scTypeEval |
An |
dissimilarity_slot |
Character string specifying which dissimilarity assay(s)
to plot. If |
sort_similarity |
Optional. Character string naming a dissimilarity assay to use for ordering cells by similarity (hierarchical clustering within each cell type). |
sort_consistency |
Optional. Character string specifying a consistency metric
(passed to |
low_color |
Color for the low dissimilarity end of the heatmap gradient (default: |
high_color |
Color for the high dissimilarity end of the heatmap gradient (default: |
hclust_method |
Clustering method to use when |
verbose |
Logical. Whether to print progress and diagnostic messages (default: |
... |
Additional arguments passed to |
Ordering logic:
If both sort_similarity and sort_consistency are NULL, cell types are
ordered alphabetically, and cells within each type are ordered alphabetically.
If only sort_consistency is provided, cell types are ordered by the
selected consistency metric, and cells within each type are ordered alphabetically.
If only sort_similarity is provided, cell types are ordered by hierarchical
clustering of average similarities, and cells within each type are clustered.
If both are provided, cell types are ordered by consistency, while cells within each type are ordered by similarity clustering.
Each heatmap is rendered with ggplot2, with group boundaries and axis labels indicating cell-type structure.
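One of the ordering cases above (groups ordered alphabetically, members within each group ordered by hierarchical clustering) can be sketched in base R. The helper `order_within_types()` is hypothetical and only approximates the described logic; it does not cover ordering by consistency.

```r
# Hypothetical helper: alphabetical group order, hclust order within each group
order_within_types <- function(d, labels, method = "ward.D2") {
  d <- as.matrix(d)
  labels <- as.character(labels)
  unlist(lapply(sort(unique(labels)), function(g) {
    idx <- which(labels == g)
    if (length(idx) < 3) return(idx)  # too few members to cluster meaningfully
    idx[hclust(as.dist(d[idx, idx]), method = method)$order]
  }), use.names = FALSE)
}

# Toy data with interleaved labels
set.seed(7)
x <- matrix(rnorm(24), ncol = 2)
labels <- rep(c("TypeB", "TypeA"), times = 6)

ord <- order_within_types(dist(x), labels)

# Reordered dissimilarity matrix, ready to be drawn as a heatmap
d_ordered <- as.matrix(dist(x))[ord, ord]
labels[ord]
```

After reordering, observations with the same label form contiguous blocks along the diagonal, which is where group boundaries would be drawn.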
A named list of ggplot2 objects, one per dissimilarity assay.
If only a single assay is selected, the corresponding heatmap plot is returned directly.
run_dissimilarity, hclust, get_consistency
# Create and process test data
library(Matrix)
counts <- Matrix(rpois(6000, 5), nrow = 100, ncol = 60, sparse = TRUE)
rownames(counts) <- paste0("Gene", seq_len(100))
colnames(counts) <- paste0("Cell", seq_len(60))
metadata <- data.frame(
  celltype = rep(c("TypeA", "TypeB"), each = 30),
  sample = rep(paste0("Sample", seq_len(6)), times = 10),
  row.names = colnames(counts)
)
sceval <- create_scTypeEval(matrix = counts, metadata = metadata)
sceval <- run_processing_data(sceval, ident = "celltype", sample = "sample",
                              min_samples = 3, min_cells = 3, verbose = FALSE)
sceval <- run_hvg(sceval, var_method = "basic", ngenes = 100, verbose = FALSE)

# Compute Pseudobulk Euclidean dissimilarity
sceval <- run_dissimilarity(sceval, method = "Pseudobulk:Euclidean",
                            reduction = FALSE, verbose = FALSE)

# Plot heatmaps for all dissimilarity assays
heatmap <- plot_heatmap(sceval)
This function visualizes Multidimensional Scaling (MDS) results computed from
dissimilarity assays stored in an scTypeEval object. For each dissimilarity assay,
MDS is performed using stats::cmdscale() and results are displayed as scatterplots.
plot_mds( scTypeEval, dissimilarity_slot = "all", label = TRUE, dims = c(1, 2), show_legend = FALSE )
scTypeEval |
An |
dissimilarity_slot |
Character string specifying which dissimilarity assay(s)
to use. If |
label |
Logical; whether to add medoid labels to the MDS plot (default: TRUE). |
dims |
Integer vector of length 2; the MDS dimensions to plot (default: c(1, 2)). |
show_legend |
Logical; whether to display a legend (default: FALSE). |
For each selected dissimilarity assay, the function:
1. Extracts the dissimilarity matrix and cell-type identities.
2. Performs classical multidimensional scaling (MDS) with cmdscale().
3. Generates a 2D scatterplot with group labels and optional legends using an internal plotting helper.

The plot axes correspond to the requested MDS dimensions (dims).
A named list of MDS plots (ggplot2 objects), one per dissimilarity assay.
If only a single assay is processed, a single ggplot2 object is returned.
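The MDS step can be illustrated outside the package with base R alone. The following is a minimal sketch, not the package's internal code: the toy matrix `profiles`, the `ident` vector, and the base-graphics plot are assumptions made for illustration, whereas `stats::cmdscale()` is the function named in the description above.

```r
# Toy data (hypothetical): 6 pseudobulk-like profiles (rows) over 5 features
set.seed(1)
profiles <- rbind(
  matrix(rnorm(15, mean = 0), nrow = 3),
  matrix(rnorm(15, mean = 4), nrow = 3)
)
rownames(profiles) <- paste0(rep(c("TypeA", "TypeB"), each = 3), "_S", 1:3)
ident <- rep(c("TypeA", "TypeB"), each = 3)

# Dissimilarity between profiles (Euclidean here; any dissimilarity works)
d <- dist(profiles)

# Classical MDS; k must cover the largest requested dimension
dims <- c(1, 2)
mds <- stats::cmdscale(d, k = max(dims))
coords <- data.frame(
  x = mds[, dims[1]],
  y = mds[, dims[2]],
  ident = ident
)

# Scatterplot of the requested dimensions, coloured by identity
plot(coords$x, coords$y, col = as.integer(factor(coords$ident)), pch = 19,
     xlab = paste0("MDS", dims[1]), ylab = paste0("MDS", dims[2]))
```

Because `cmdscale()` returns only the first `k` coordinates, requesting `dims = c(2, 3)` in a sketch like this would require `k = 3`.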
    # Create and process test data
    library(Matrix)
    counts <- Matrix(rpois(6000, 5), nrow = 100, ncol = 60, sparse = TRUE)
    rownames(counts) <- paste0("Gene", seq_len(100))
    colnames(counts) <- paste0("Cell", seq_len(60))
    metadata <- data.frame(
      celltype = rep(c("TypeA", "TypeB"), each = 30),
      sample = rep(paste0("Sample", seq_len(6)), times = 10),
      row.names = colnames(counts)
    )
    sceval <- create_scTypeEval(matrix = counts, metadata = metadata)
    sceval <- run_processing_data(sceval, ident = "celltype", sample = "sample",
                                  min_samples = 3, min_cells = 3, verbose = FALSE)
    sceval <- run_hvg(sceval, var_method = "basic", ngenes = 100, verbose = FALSE)

    # Compute Pseudobulk Euclidean dissimilarity
    sceval <- run_dissimilarity(sceval, method = "Pseudobulk:Euclidean",
                                reduction = FALSE, verbose = FALSE)

    # Plot MDS for all dissimilarity assays
    mds_plots <- plot_mds(sceval)
This function visualizes Principal Component Analysis (PCA) results stored in the
reductions slot of an scTypeEval object.
plot_pca( scTypeEval, reduction_slot = "all", label = TRUE, dims = c(1, 2), show_legend = FALSE )
scTypeEval |
An |
reduction_slot |
Character. Name(s) of the reduction(s) to plot. If |
label |
Logical. Whether to add medoid labels to the PCA plot (default: TRUE). |
dims |
Integer vector of length 2. The principal component (PC) dimensions to plot (default: c(1, 2)). |
show_legend |
Logical. Whether to display a legend (default: FALSE). |
A named list of PCA plots (ggplot objects) corresponding to
the PCA analyses stored in the reductions slot of an scTypeEval object.
If only one reduction is selected, a single ggplot object is returned.
    # Create and process test data
    library(Matrix)
    counts <- Matrix(rpois(6000, 5), nrow = 100, ncol = 60, sparse = TRUE)
    rownames(counts) <- paste0("Gene", seq_len(100))
    colnames(counts) <- paste0("Cell", seq_len(60))
    metadata <- data.frame(
      celltype = rep(c("TypeA", "TypeB"), each = 30),
      sample = rep(paste0("Sample", seq_len(6)), times = 10),
      row.names = colnames(counts)
    )
    sceval <- create_scTypeEval(matrix = counts, metadata = metadata)
    sceval <- run_processing_data(sceval, ident = "celltype", sample = "sample",
                                  min_samples = 3, min_cells = 3, verbose = FALSE)
    sceval <- run_hvg(sceval, var_method = "basic", ngenes = 100, verbose = FALSE)

    # Run PCA on HVG genes
    sceval <- run_pca(sceval, gene_list = "HVG", ndim = 4, verbose = FALSE)

    # Plot PCA
    plot_pca(sceval)
Computes dissimilarity between cell populations or pseudobulk profiles
stored in an scTypeEval object, using one of the supported methods.
Dissimilarity can be computed either on dimensional reduction embeddings
(if available) or on processed gene expression data.
run_dissimilarity( scTypeEval, method = "Pseudobulk:Euclidean", reduction = TRUE, gene_list = NULL, black_list = NULL, reciprocal_classifier = "SingleR", ncores = 1, bparam = NULL, progressbar = FALSE, verbose = TRUE )
scTypeEval |
A |
method |
Character. Dissimilarity method to use. Must be one of
|
reduction |
Logical. Whether to compute dissimilarity on dimensional
reduction embeddings (if available).
Not supported for "recip_classif" dissimilarity methods.
Default is TRUE. |
gene_list |
Optional. Character vector of genes to include.
If |
black_list |
Optional. Character vector of genes to exclude. If
|
reciprocal_classifier |
Character. Classifier to use for recip_classif dissimilarity methods. Default is "SingleR". |
ncores |
Integer. Number of cores for parallelization. Default is 1. |
bparam |
Optional. BiocParallel parameter object to control parallelization. Default is NULL. |
progressbar |
Logical. Whether to display a progress bar during computation. Default is FALSE. |
verbose |
Logical. Whether to print progress messages. Default is TRUE. |
The function supports multiple dissimilarity strategies:
- "Pseudobulk:<distance>": computes pairwise distances between pseudobulk profiles using the specified distance metric. Supported distances are Euclidean, Cosine, and Pearson.
- "WasserStein": computes Wasserstein distances between groups of embeddings or cells.
- "recip_classif:<method>": assigns cells pairwise between samples using the specified classifier, then computes dissimilarity between assignments. Supported methods are Match (binary) and Score.
If reduction = TRUE, the function expects that dimensional reduction
embeddings have been added previously via run_pca() or
add_dim_reduction(). If unavailable, set reduction = FALSE
to compute dissimilarity on processed expression data instead.
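To make the three pseudobulk distance options concrete, here is a minimal base R sketch on a toy matrix. It is not the package's implementation: the matrix `pb` and its column names are hypothetical, and the cosine and Pearson dissimilarities are assumed to be one minus the corresponding similarity, which is a common convention but is not confirmed by this manual.

```r
set.seed(42)
# Toy pseudobulk matrix (hypothetical): genes in rows,
# one column per sample-by-cell-type profile
pb <- matrix(
  rpois(40, lambda = 10), nrow = 10,
  dimnames = list(paste0("Gene", 1:10),
                  c("S1_TypeA", "S2_TypeA", "S1_TypeB", "S2_TypeB"))
)
pb <- log1p(pb)

# Euclidean distance between profiles (dist() compares rows, hence t())
d_euclidean <- as.matrix(dist(t(pb), method = "euclidean"))

# Cosine dissimilarity, assumed here to be 1 - cosine similarity
norms <- sqrt(colSums(pb^2))
d_cosine <- 1 - crossprod(pb) / outer(norms, norms)

# Pearson dissimilarity, assumed here to be 1 - Pearson correlation
d_pearson <- 1 - cor(pb, method = "pearson")
```

Each result is a symmetric profile-by-profile matrix with a zero diagonal, which is the shape that a downstream step such as MDS or hierarchical clustering expects.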
An updated scTypeEval object with a new dissimilarity_assay
stored in scTypeEval@dissimilarity[[method]].
add_dim_reduction, run_pca,
run_processing_data
    # Create and process test data
    library(Matrix)
    counts <- Matrix(rpois(6000, 5), nrow = 100, ncol = 60, sparse = TRUE)
    rownames(counts) <- paste0("Gene", seq_len(100))
    colnames(counts) <- paste0("Cell", seq_len(60))
    metadata <- data.frame(
      celltype = rep(c("TypeA", "TypeB"), each = 30),
      sample = rep(paste0("Sample", seq_len(6)), times = 10),
      row.names = colnames(counts)
    )
    sceval <- create_scTypeEval(matrix = counts, metadata = metadata)
    sceval <- run_processing_data(sceval, ident = "celltype", sample = "sample",
                                  min_samples = 3, min_cells = 3, verbose = FALSE)
    sceval <- run_hvg(sceval, var_method = "basic", ngenes = 100, verbose = FALSE)

    # Compute Pseudobulk Euclidean dissimilarity
    sceval <- run_dissimilarity(sceval, method = "Pseudobulk:Euclidean",
                                reduction = FALSE, verbose = FALSE)
Identifies cell type marker genes from normalized single-cell data
stored in an scTypeEval object. The identified markers are stored in the
gene_lists slot under the chosen method.
run_gene_markers( scTypeEval, method = c("scran.findMarkers"), ngenes_celltype = 50, aggregation = "single-cell", black_list = NULL, ncores = 1, bparam = NULL, progressbar = FALSE, verbose = TRUE, ... )
scTypeEval |
An |
method |
A character string specifying the marker gene identification method. Currently supported:
Default: |
ngenes_celltype |
Integer specifying the maximum number of marker genes to retain per cell type (default: 50). |
aggregation |
Method to group cells stored in |
black_list |
A character vector of genes to exclude from marker selection.
If |
ncores |
Integer specifying the number of cores to use for parallel processing (default: 1). |
bparam |
Optional. A |
progressbar |
Logical, whether to display a progress bar during computation (default: FALSE). |
verbose |
Logical, whether to print messages during execution (default: TRUE). |
... |
Additional arguments passed to the underlying marker detection function. |
Requires that normalized single-cell data has been generated with
run_processing_data.
Both cell identities and sample annotations are automatically extracted from the normalized data.
Genes present in the blacklist (black_list) are removed before marker selection.
For "scran.findMarkers", the scran method findMarkers is applied
to identify differentially expressed genes per cell type while adjusting for sample effects.
The modified scTypeEval object with marker genes added to
scTypeEval@gene_lists[[method]].
run_processing_data, add_processed_data
    # Create and process test data
    library(Matrix)
    counts <- Matrix(rpois(6000, 5), nrow = 100, ncol = 60, sparse = TRUE)
    rownames(counts) <- paste0("Gene", seq_len(100))
    colnames(counts) <- paste0("Cell", seq_len(60))
    metadata <- data.frame(
      celltype = rep(c("TypeA", "TypeB"), each = 30),
      sample = rep(paste0("Sample", seq_len(6)), times = 10),
      row.names = colnames(counts)
    )
    sceval <- create_scTypeEval(matrix = counts, metadata = metadata)
    sceval <- run_processing_data(sceval, ident = "celltype",
                                  aggregation = "single-cell", sample = "sample",
                                  min_samples = 3, min_cells = 3, verbose = FALSE)

    # Identify marker genes per cell type
    sceval <- run_gene_markers(sceval, method = "scran.findMarkers",
                               ngenes_celltype = 10, verbose = FALSE)
Detects highly variable genes (HVGs) from the normalized
single-cell data stored in an scTypeEval object. The identified HVGs are
stored in the gene_lists slot under "HVG".
run_hvg( scTypeEval, var_method = "scran", ngenes = 2000, sample = TRUE, aggregation = "single-cell", black_list = NULL, ncores = 1, bparam = NULL, progressbar = FALSE, verbose = TRUE, ... )
scTypeEval |
An |
var_method |
Character string specifying the method for identifying highly
variable genes. Options: |
ngenes |
Integer specifying the number of highly variable genes to retain (default: 2000). |
sample |
Logical indicating whether to leverage sample information when
computing HVGs. If |
aggregation |
Method to group cells stored in |
black_list |
A character vector of genes to exclude from HVG selection.
If |
ncores |
Integer specifying the number of CPU cores to use for parallel processing (default: 1). |
bparam |
A |
progressbar |
Logical, whether to display a progress bar during computation (default: FALSE). |
verbose |
Logical, whether to print progress messages (default: TRUE). |
... |
Additional arguments passed to internal HVG computation functions. |
Requires that normalized single-cell data has been generated with
run_processing_data.
Genes present in the blacklist (black_list) are removed before HVG selection.
Available HVG methods:
- "scran": Uses the scran modelGeneVar function to model gene-specific variance and identify biologically variable genes.
- "basic": A simple variance-to-mean approach ranking genes by coefficient of variation, selecting the top ngenes.
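The "basic" ranking can be sketched in a few lines of base R. This is an illustration under stated assumptions rather than the package's code: the coefficient of variation is taken as standard deviation divided by mean, and the toy matrix `expr` as well as the variables `black_list` and `ngenes` defined here are hypothetical stand-ins for the arguments of the same name.

```r
set.seed(7)
# Toy normalized expression matrix (hypothetical): genes x cells
expr <- matrix(
  rexp(200, rate = 0.5), nrow = 20,
  dimnames = list(paste0("Gene", 1:20), paste0("Cell", 1:10))
)
black_list <- c("Gene1", "Gene2")
ngenes <- 5

# Drop blacklisted genes before ranking
expr <- expr[!rownames(expr) %in% black_list, , drop = FALSE]

# Coefficient of variation per gene (assumed definition: sd / mean)
gene_mean <- rowMeans(expr)
gene_sd <- apply(expr, 1, sd)
cv <- gene_sd / gene_mean

# Keep the ngenes genes with the highest coefficient of variation
hvg <- names(sort(cv, decreasing = TRUE))[seq_len(min(ngenes, length(cv)))]
```

Genes with a mean of zero would give an undefined ratio, so a real implementation would need to filter or guard against them; the toy data here are strictly positive.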
The modified scTypeEval object with HVGs added to
scTypeEval@gene_lists[["HVG"]].
run_processing_data, add_processed_data
    # Create and process test data
    library(Matrix)
    counts <- Matrix(rpois(6000, 5), nrow = 100, ncol = 60, sparse = TRUE)
    rownames(counts) <- paste0("Gene", seq_len(100))
    colnames(counts) <- paste0("Cell", seq_len(60))
    metadata <- data.frame(
      celltype = rep(c("TypeA", "TypeB"), each = 30),
      sample = rep(paste0("Sample", seq_len(6)), times = 10),
      row.names = colnames(counts)
    )
    sceval <- create_scTypeEval(matrix = counts, metadata = metadata)
    sceval <- run_processing_data(sceval, ident = "celltype", sample = "sample",
                                  min_samples = 3, min_cells = 3, verbose = FALSE)

    # Compute HVGs using basic method
    sceval <- run_hvg(sceval, var_method = "basic", ngenes = 100, verbose = FALSE)
This function computes Principal Component Analysis (PCA) for each processed
data aggregation stored in the scTypeEval object (e.g., single-cell and/or pseudobulk).
The resulting PCA embeddings and loadings are stored in the reductions slot of the object.
run_pca( scTypeEval, gene_list = NULL, black_list = NULL, ndim = 30, verbose = TRUE )
scTypeEval |
A |
gene_list |
Named list of features defining the gene set for PCA analysis.
If |
black_list |
Character vector of genes to exclude from PCA. If |
ndim |
Integer. Number of principal components to compute (default: 30). |
verbose |
Logical. Whether to print progress messages during computation (default: TRUE). |
This function runs PCA on all processed data slots within the scTypeEval object.
Each PCA result is stored as a dim_red assay containing:
- embeddings: PCA coordinates of samples/cells.
- feature_loadings: Loadings of features (genes) on each PC.
- gene_list: The gene sets used for PCA.
- black_list: The genes excluded from PCA.
- aggregation: The aggregation type (e.g., "single-cell", "pseudobulk").
- ident, sample, and group: Metadata carried over from processed data.
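The embeddings and feature loadings described above correspond to the two main outputs of a standard PCA. The sketch below uses `stats::prcomp()` on a toy matrix to show that relationship; whether the package itself relies on `prcomp()`, and whether it centers or scales genes, is not stated in this manual, so those choices and the matrix `norm_mat` are assumptions.

```r
set.seed(3)
# Toy normalized matrix (hypothetical): genes in rows, cells in columns
norm_mat <- matrix(
  rnorm(100 * 30), nrow = 100,
  dimnames = list(paste0("Gene", 1:100), paste0("Cell", 1:30))
)
gene_list <- paste0("Gene", 1:40)   # genes to use for PCA
black_list <- c("Gene5", "Gene6")   # genes to exclude
ndim <- 4                           # number of components to keep

# Restrict to requested genes that exist in the data, minus the blacklist
genes <- setdiff(intersect(gene_list, rownames(norm_mat)), black_list)

# prcomp() expects observations in rows, so transpose to cells x genes
pca <- stats::prcomp(t(norm_mat[genes, , drop = FALSE]),
                     center = TRUE, scale. = FALSE, rank. = ndim)

embeddings <- pca$x               # cells x ndim coordinates
feature_loadings <- pca$rotation  # genes x ndim loadings
```

Note that `ndim` cannot exceed the smaller of the number of cells and the number of retained genes, which is why the examples in this manual use `ndim = 4` on small test data.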
The modified scTypeEval object with PCA results stored in the
reductions slot for each processed data aggregation.
    # Create and process test data
    library(Matrix)
    counts <- Matrix(rpois(6000, 5), nrow = 100, ncol = 60, sparse = TRUE)
    rownames(counts) <- paste0("Gene", seq_len(100))
    colnames(counts) <- paste0("Cell", seq_len(60))
    metadata <- data.frame(
      celltype = rep(c("TypeA", "TypeB"), each = 30),
      sample = rep(paste0("Sample", seq_len(6)), times = 10),
      row.names = colnames(counts)
    )
    sceval <- create_scTypeEval(matrix = counts, metadata = metadata)
    sceval <- run_processing_data(sceval, ident = "celltype", sample = "sample",
                                  min_samples = 3, min_cells = 3, verbose = FALSE)
    sceval <- run_hvg(sceval, var_method = "basic", ngenes = 100, verbose = FALSE)

    # Run PCA on HVG genes
    sceval <- run_pca(sceval, gene_list = "HVG", ndim = 4, verbose = FALSE)
Runs processing on the count matrix stored in an scTypeEval
object by aggregating, filtering, and normalizing data.
Results are stored as data_assay objects within the scTypeEval.
run_processing_data( scTypeEval, ident = NULL, sample = NULL, aggregation = c("single-cell", "pseudobulk"), normalization_method = "Log1p", min_samples = 5, min_cells = 10, verbose = TRUE )
scTypeEval |
An |
ident |
A column name from metadata to use as the identity class (e.g. cell type, cluster).
If |
sample |
A column name from metadata indicating sample identifiers, required. |
aggregation |
Method to group cells, either |
normalization_method |
A string specifying the normalization method to apply (default: "Log1p"). |
min_samples |
Minimum number of samples required to retain a cell type (default: 5). |
min_cells |
Minimum number of cells required to retain a cell type in a sample (default: 10). |
verbose |
Logical indicating whether to print progress messages (default: TRUE). |
The function performs the following steps:
1. Validates and sets the identity (ident) and sample grouping (sample).
2. Iterates over aggregation_types: single-cell and/or pseudobulk.
3. Extracts and filters the count matrix using min_samples and min_cells thresholds.
4. Normalizes the resulting matrix.
5. Wraps the processed data into data_assay objects.
6. Stores the list of processed assays inside the scTypeEval object.
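The filtering, aggregation, and normalization steps can be approximated in base R as follows. This is a sketch under stated assumptions, not the package's implementation: pseudobulk profiles are assumed to be per-sample, per-cell-type count sums, and "Log1p" is assumed to mean log1p of counts scaled to 10,000 per column. The helper `log1p_norm()` is hypothetical.

```r
set.seed(11)
counts <- matrix(
  rpois(100 * 60, lambda = 5), nrow = 100,
  dimnames = list(paste0("Gene", 1:100), paste0("Cell", 1:60))
)
metadata <- data.frame(
  celltype = rep(c("TypeA", "TypeB"), each = 30),
  sample = rep(paste0("Sample", 1:6), times = 10),
  row.names = colnames(counts)
)
min_samples <- 3
min_cells <- 3

# Number of cells per (cell type, sample) pair
n_cells <- unclass(table(metadata$celltype, metadata$sample))

# Keep pairs with at least min_cells cells, then cell types that
# pass that threshold in at least min_samples samples
pair_ok <- n_cells >= min_cells
type_ok <- rowSums(pair_ok) >= min_samples
keep <- unname(pair_ok[cbind(metadata$celltype, metadata$sample)] &
                 type_ok[metadata$celltype])
filtered <- counts[, keep, drop = FALSE]

# Pseudobulk aggregation (assumed): sum counts per sample and cell type
group <- paste(metadata$sample, metadata$celltype, sep = "_")[keep]
pseudobulk <- t(rowsum(t(filtered), group = group))

# Hypothetical normalization helper: scale each column to 10,000, then log1p
log1p_norm <- function(m, scale_factor = 1e4) {
  log1p(sweep(m, 2, colSums(m), "/") * scale_factor)
}

# One processed matrix per aggregation type
processed <- list(
  "single-cell" = log1p_norm(filtered),
  "pseudobulk" = log1p_norm(pseudobulk)
)
```

With the toy data every cell type has five cells in each of six samples, so nothing is filtered out; raising `min_cells` above 5 would remove all cells and illustrates why the examples in this manual lower both thresholds to 3.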
An updated scTypeEval object containing:
data: A list of data_assay objects, one for each aggregation method.
    # Create test data with enough samples
    library(Matrix)
    counts <- Matrix(rpois(6000, 5), nrow = 100, ncol = 60, sparse = TRUE)
    rownames(counts) <- paste0("Gene", seq_len(100))
    colnames(counts) <- paste0("Cell", seq_len(60))
    metadata <- data.frame(
      celltype = rep(c("TypeA", "TypeB"), each = 30),
      sample = rep(paste0("Sample", seq_len(6)), times = 10),
      row.names = colnames(counts)
    )
    sceval <- create_scTypeEval(matrix = counts, metadata = metadata)

    # Process data with filtering
    sceval <- run_processing_data(
      sceval,
      ident = "celltype",
      sample = "sample",
      min_samples = 3,
      min_cells = 3,
      verbose = FALSE
    )

    # Check processed data
    length(sceval@data)
This function assigns a specific cell type annotation from the metadata as the active identity
in an scTypeEval object, allowing for downstream analysis based on the selected classification.
set_active_ident(scTypeEval, ident = NULL)
scTypeEval |
An |
ident |
A character string specifying the column in |
The function ensures that the provided identity exists within the metadata before setting it. If no valid identity is provided, an error is raised.
The modified scTypeEval object with the active identity set.
    # Create synthetic test data
    library(Matrix)
    counts <- Matrix(rpois(1000, 5), nrow = 50, ncol = 20, sparse = TRUE)
    rownames(counts) <- paste0("Gene", seq_len(50))
    colnames(counts) <- paste0("Cell", seq_len(20))
    metadata <- data.frame(
      celltype = rep(c("TypeA", "TypeB"), each = 10),
      sample = rep(paste0("Sample", seq_len(4)), each = 5),
      row.names = colnames(counts)
    )

    # Create scTypeEval object from matrix
    sceval <- create_scTypeEval(matrix = counts, metadata = metadata)

    # Set the active identity
    sceval <- set_active_ident(sceval, ident = "celltype")
A high-level convenience function that initializes an scTypeEval object
from a count matrix and metadata, performs preprocessing (normalization,
filtering, optional dimensionality reduction), defines gene lists,
and computes one or more dissimilarity metrics between cell populations.
This function integrates multiple internal scTypeEval pipeline steps, including data preparation, HVG selection, PCA reduction, and dissimilarity computation, providing a streamlined workflow for single-cell data annotation evaluation.
wrapper_scTypeEval( scTypeEval = NULL, count_matrix, metadata, ident, sample, aggregation = c("single-cell", "pseudobulk"), gene_list = NULL, reduction = TRUE, ndim = 30, black_list = NULL, normalization_method = "Log1p", dissimilarity_method = c("WasserStein", "Pseudobulk:Euclidean", "Pseudobulk:Cosine", "Pseudobulk:Pearson", "recip_classif:Match", "recip_classif:Score"), min_samples = 5, min_cells = 10, ncores = 1, bparam = NULL, progressbar = FALSE, verbose = TRUE )
scTypeEval |
An |
count_matrix |
A numeric or sparse |
metadata |
A |
ident |
Character string indicating the metadata column specifying cell identities (e.g., cell types or clusters). |
sample |
Character string specifying the metadata column containing sample identifiers (used for pseudobulk aggregation). |
aggregation |
Method to group cells, either |
gene_list |
Optional named list of gene sets to include in the analysis.
If |
reduction |
Logical; if |
ndim |
Integer; number of principal components to retain
when |
black_list |
Optional character vector of genes to exclude from analysis.
If |
normalization_method |
Character string specifying the normalization
method to apply. Options include |
dissimilarity_method |
Character vector of dissimilarity metrics to compute. Available options include:
By default all supported methods are run, and multiple methods can be provided. |
min_samples |
Integer; minimum number of samples required for pseudobulk analysis (default: 5). |
min_cells |
Integer; minimum number of cells required per group for inclusion (default: 10). |
ncores |
Integer; number of CPU cores for parallel execution (default: 1). |
bparam |
Optional |
progressbar |
Logical; if |
verbose |
Logical; if |
This wrapper combines multiple pipeline steps from scTypeEval:
1. create_scTypeEval(): initializes the object.
2. run_processing_data(): performs normalization and filtering.
3. run_hvg() or add_gene_list(): defines gene sets.
4. run_pca(): performs PCA if reduction = TRUE.
5. run_dissimilarity(): computes dissimilarities across methods.
This provides a simple entry point for end-to-end setup and dissimilarity computation from raw single-cell data with minimal manual steps.
An updated scTypeEval object containing:
Normalized and filtered data
HVG gene sets or user-provided gene lists
PCA reductions (if enabled)
Computed dissimilarity matrices for each selected method
create_scTypeEval,
run_processing_data,
run_dissimilarity,
run_pca,
run_hvg
    # Create test data with enough samples
    library(Matrix)
    counts <- Matrix(rpois(6000, 5), nrow = 100, ncol = 60, sparse = TRUE)
    rownames(counts) <- paste0("Gene", seq_len(100))
    colnames(counts) <- paste0("Cell", seq_len(60))
    metadata <- data.frame(
      celltype = rep(c("TypeA", "TypeB"), each = 30),
      sample = rep(paste0("Sample", seq_len(6)), times = 10),
      row.names = colnames(counts)
    )

    # Run the full pipeline in a single call
    sc_res <- wrapper_scTypeEval(
      count_matrix = counts,
      metadata = metadata,
      ident = "celltype",
      sample = "sample",
      min_samples = 3,
      min_cells = 3,
      normalization_method = "Log1p",
      dissimilarity_method = c("Pseudobulk:Euclidean"),
      reduction = TRUE,
      ndim = 4,
      verbose = FALSE
    )