Introduction

Accurate cell type annotation is essential but challenging in single-cell RNA sequencing (scRNA-seq). Manual approaches are time-consuming, subjective, and inconsistent, while automated classifiers often fail to generalize across tissues, conditions, or closely related cell types. A key limitation for both is the lack of true ground truth for benchmarking.

scTypeEval provides a ground-truth-agnostic framework to systematically assess annotation quality using internal validation metrics. It enables:

Quantification of inter-sample label consistency
Identification of ambiguous or misclassified populations
Reproducible benchmarking of manual annotations, automated classifiers, or clustering results

Key Features

Ground-truth agnostic – Evaluate annotations without external references
Cross-dataset benchmarking – Assess consistency across samples and studies
Customizable – Works with Seurat, SingleCellExperiment, or matrices; supports custom gene lists
Robust – Sensitive to misclassification; reliable across batch effects, label granularity, and sample sizes

Quick Start

Load the package:

library(scTypeEval)
library(Matrix)

Generate Example Data

For this vignette, we’ll create a synthetic dataset representing single-cell RNA-seq data from multiple samples with distinct cell types. This simulates a typical scenario where we want to evaluate the consistency of cell type annotations across biological replicates.

set.seed(22)

# Number of genes and cells
n_genes <- 5000
n_cells_per_sample <- 500
n_samples <- 6
cell_types <- c("CD4_T", "CD8_T", "B_cells", "NK_cells")

# Generate gene names
gene_names <- paste0("Gene_", seq_len(n_genes))

# Total cells
total_cells <- n_cells_per_sample * n_samples

# Generate metadata: samples and cell types
samples <- rep(paste0("Sample", seq_len(n_samples)), each = n_cells_per_sample)

# Assign cell types evenly within each sample
n_cell_types <- length(cell_types)
cells_per_type <- n_cells_per_sample %/% n_cell_types
celltypes <- rep(rep(cell_types, each = cells_per_type), times = n_samples)

# Initialize count matrix
count_matrix <- matrix(0, nrow = n_genes, ncol = total_cells)
rownames(count_matrix) <- gene_names
colnames(count_matrix) <- paste0("Cell_", seq_len(total_cells))

# Add cell-type-specific expression patterns
for (i in seq_along(cell_types)) {
  ct <- cell_types[i]
  ct_indices <- which(celltypes == ct)
  
  # Marker genes for this cell type (50 genes per type)
  marker_start <- ((i - 1) * 50 + 1)
  marker_end <- min(marker_start + 49, n_genes)
  marker_genes <- marker_start:marker_end
  
  # High expression in marker genes (lambda = 50)
  count_matrix[marker_genes, ct_indices] <- matrix(
    rpois(length(marker_genes) * length(ct_indices), lambda = 50),
    nrow = length(marker_genes)
  )
  
  # Background expression in other genes (lambda = 5)
  other_genes <- setdiff(seq_len(n_genes), marker_genes)
  count_matrix[other_genes, ct_indices] <- matrix(
    rpois(length(other_genes) * length(ct_indices), lambda = 5),
    nrow = length(other_genes)
  )
}

# Add sample-specific batch effects
for (s in seq_len(n_samples)) {
  sample_cells <- which(samples == paste0("Sample", s))
  batch_effect <- rnorm(1, mean = 1, sd = 0.1)
  count_matrix[, sample_cells] <- round(
    count_matrix[, sample_cells] * batch_effect
  )
}

# Convert to sparse matrix
count_matrix <- Matrix(count_matrix, sparse = TRUE)

# Create metadata dataframe
metadata <- data.frame(
  celltype = celltypes,
  sample = samples,
  batch = rep(
    c("Batch1", "Batch2"),
    each = n_cells_per_sample * (n_samples/2)
  ),
  row.names = colnames(count_matrix),
  stringsAsFactors = FALSE
)

# Preview the data
head(metadata)
#>        celltype  sample  batch
#> Cell_1    CD4_T Sample1 Batch1
#> Cell_2    CD4_T Sample1 Batch1
#> Cell_3    CD4_T Sample1 Batch1
#> Cell_4    CD4_T Sample1 Batch1
#> Cell_5    CD4_T Sample1 Batch1
#> Cell_6    CD4_T Sample1 Batch1
dim(count_matrix)
#> [1] 5000 3000

Core Workflow

Step 1: Create scTypeEval Object

scTypeEval objects can be created from:

A count matrix and metadata dataframe
A Seurat object
A SingleCellExperiment object

# Create scTypeEval object from matrix and metadata
sceval <- create_scTypeEval(
  matrix = count_matrix,
  metadata = metadata
)

The scTypeEval object stores the raw count data and metadata, which will be processed in subsequent steps.

Step 2: Process Data

Process and normalize the data, creating both single-cell and pseudobulk representations. This step:

Aggregates counts by cell type and sample (pseudobulk)
Filters out cell types with insufficient samples or cells
Normalizes the data for downstream analysis

# Process data with filtering criteria
sceval <- run_processing_data(
  scTypeEval = sceval,
  ident = "celltype",          # Column defining cell type identities
  sample = "sample",           # Column defining sample IDs
  min_samples = 3,             # Minimum samples required per cell type
  min_cells = 10               # Minimum cells per sample-celltype combination
)

The processed data is stored as data_assay objects within the scTypeEval object, accessible for downstream analysis.

Step 3: Extract Relevant Features

Identify biologically relevant features for evaluating consistency. scTypeEval supports:

Highly Variable Genes (HVGs): Genes with high variability across the dataset
Cell Type Marker Genes: Genes differentially expressed in each cell type
Custom Gene Lists: User-defined gene sets

Highly Variable Genes

# Identify HVGs using the basic method
sceval <- run_hvg(
  scTypeEval = sceval,
  var_method = "basic", # Method for variance modeling: "basic" or "scran"
  ngenes = 1000,        # Number of HVGs to retain
  sample = TRUE         # Perform sample-level blocking
)

# View stored gene lists
gene_lists <- sceval@gene_lists
names(gene_lists)
#> [1] "HVG"
if ("HVG" %in% names(gene_lists)) {
  length(gene_lists$HVG)
}
#> [1] 1000

Cell Type Marker Genes

# Identify marker genes per cell type
sceval <- run_gene_markers(
  scTypeEval = sceval,
  method = "scran.findMarkers",  # Method for finding markers
  ngenes_celltype = 50           # Maximum markers per cell type
)

# View marker genes
gene_lists <- sceval@gene_lists
if (!is.null(gene_lists$scran.findMarkers)) {
  head(gene_lists$scran.findMarkers, 20)
}
#>  [1] "Gene_119" "Gene_131" "Gene_125" "Gene_104" "Gene_149" "Gene_129"
#>  [7] "Gene_147" "Gene_138" "Gene_133" "Gene_103" "Gene_112" "Gene_128"
#> [13] "Gene_121" "Gene_115" "Gene_148" "Gene_107" "Gene_116" "Gene_124"
#> [19] "Gene_110" "Gene_143"

Custom Gene Lists (Optional)

You can also add custom gene lists using add_gene_list():

# Example: Add a custom list of immune response genes
custom_genes <- c("Gene_1", "Gene_50", "Gene_100", "Gene_150")

sceval <- add_gene_list(
  scTypeEval = sceval,
  gene_list = list("custom_genes" = custom_genes)
)

These different selected features are stored within the scTypeEval object and can be used in the subsequent steps.

gene_lists <- sceval@gene_lists
names(gene_lists)
#> [1] "HVG"               "scran.findMarkers" "custom_genes"

Step 4: Dimensional Reduction (Optional but Recommended)

Consistency metrics can be computed directly on the selected features. However, computing them in a low-dimensional space (e.g., PCA) significantly speeds up the analysis while yielding similar results.

# Run PCA on processed data
sceval <- run_pca(
  scTypeEval = sceval,
  ndim = 20,         # Number of principal components
  gene_list = "HVG" # specify gene list to subset if multiple were added
)

# View available reductions
reductions <- sceval@reductions
if (length(reductions) > 0) {
  names(reductions)
}
#> [1] "single-cell" "pseudobulk"

Alternatively, pre-computed embeddings (PCA, UMAP, t-SNE) can be inserted using add_dim_reduction().

Step 5: Compute Dissimilarity Matrices

The core of scTypeEval is computing pairwise dissimilarities between cell types across samples. Multiple dissimilarity methods are supported:

Pseudobulk-based Distances

Compute distances between pseudobulk gene expression profiles:

# Euclidean distance on pseudobulk profiles
sceval <- run_dissimilarity(
  sceval,
  method = "Pseudobulk:Euclidean",
  reduction = FALSE # Use processed expression data of selected features
)

# Cosine distance on PCA embeddings
sceval <- run_dissimilarity(
  sceval,
  method = "Pseudobulk:Cosine",
  reduction = TRUE               # Use PCA space
)

Wasserstein Distance

Compute Wasserstein distances between single-cell distributions:

# Wasserstein distance on PCA embeddings (faster)
sceval <- run_dissimilarity(
  sceval,
  method = "WasserStein",
  reduction = TRUE
)

Reciprocal Classification

Match cells across samples using a classifier:

# Reciprocal classification with SingleR
sceval <- run_dissimilarity(
  sceval,
  method = "recip_classif:Match",
  reciprocal_classifier = "SingleR",
  # usage of dimensional reduction not supported
  # for this dissimilarity approach
  reduction = FALSE 
)

View Available Dissimilarity Matrices

# List all computed dissimilarity matrices
dissim <- sceval@dissimilarity
if (length(dissim) > 0) {
  names(dissim)
}
#> [1] "Pseudobulk:Euclidean" "Pseudobulk:Cosine"    "WasserStein"         
#> [4] "recip_classif:Match"

Step 6: Compute Consistency Metrics

Evaluate inter-sample consistency for each cell type using internal validation metrics. scTypeEval supports multiple consistency metrics:

silhouette: Standard cohesion/separation score
2label_silhouette: Silhouette comparing “own type” vs. all others
NeighborhoodPurity: Fraction of K-nearest neighbors with same label
ward_PropMatch: Proportion in dominant Ward cluster
Orbital_medoid: Fraction closer to own medoid than others
Average_similarity: Within-type similarity vs. between-type

Compute Silhouette Scores

# Compute silhouette consistency on Euclidean dissimilarity
consistency_sil <- get_consistency(
  sceval,
  dissimilarity_slot = "Pseudobulk:Euclidean",
  consistency_metric = "silhouette"
)

# View results
print(consistency_sil)
#>          celltype   measure consistency_metric dissimilarity_method    ident
#> CD4.T       CD4.T 0.9392103         silhouette Pseudobulk:Euclidean celltype
#> CD8.T       CD8.T 0.9372690         silhouette Pseudobulk:Euclidean celltype
#> B.cells   B.cells 0.9382785         silhouette Pseudobulk:Euclidean celltype
#> NK.cells NK.cells 0.9370830         silhouette Pseudobulk:Euclidean celltype

Higher values indicate better consistency (samples with the same cell type annotation are more similar to each other than to samples with different annotations).

Compute Neighborhood Purity

# Compute neighborhood purity on Wasserstein dissimilarity
consistency_nbhd <- get_consistency(
  sceval,
  dissimilarity_slot = "WasserStein",
  consistency_metric = "NeighborhoodPurity"
)

# View results
print(consistency_nbhd)
#>          celltype measure consistency_metric dissimilarity_method    ident
#> CD4.T       CD4.T       1 NeighborhoodPurity          WasserStein celltype
#> CD8.T       CD8.T       1 NeighborhoodPurity          WasserStein celltype
#> B.cells   B.cells       1 NeighborhoodPurity          WasserStein celltype
#> NK.cells NK.cells       1 NeighborhoodPurity          WasserStein celltype

Compare Multiple Metrics

consistency_all <- 
  get_consistency(
    sceval,
    dissimilarity_slot = c("Pseudobulk:Euclidean",
                           "WasserStein"), # allow multiple arguments
    consistency_metric = c("silhouette",
                           "NeighborhoodPurity",
                           "Average_similarity") # allow multiple arguments
  )

print(consistency_all)
#>            celltype   measure consistency_metric dissimilarity_method    ident
#> CD4.T         CD4.T 0.9392103         silhouette Pseudobulk:Euclidean celltype
#> CD8.T         CD8.T 0.9372690         silhouette Pseudobulk:Euclidean celltype
#> B.cells     B.cells 0.9382785         silhouette Pseudobulk:Euclidean celltype
#> NK.cells   NK.cells 0.9370830         silhouette Pseudobulk:Euclidean celltype
#> CD4.T1        CD4.T 1.0000000 NeighborhoodPurity Pseudobulk:Euclidean celltype
#> CD8.T1        CD8.T 1.0000000 NeighborhoodPurity Pseudobulk:Euclidean celltype
#> B.cells1    B.cells 1.0000000 NeighborhoodPurity Pseudobulk:Euclidean celltype
#> NK.cells1  NK.cells 1.0000000 NeighborhoodPurity Pseudobulk:Euclidean celltype
#> CD4.T2        CD4.T 0.9427627 Average_similarity Pseudobulk:Euclidean celltype
#> CD8.T2        CD8.T 0.9410225 Average_similarity Pseudobulk:Euclidean celltype
#> B.cells2    B.cells 0.9419125 Average_similarity Pseudobulk:Euclidean celltype
#> NK.cells2  NK.cells 0.9408865 Average_similarity Pseudobulk:Euclidean celltype
#> CD4.T3        CD4.T 0.9008518         silhouette          WasserStein celltype
#> CD8.T3        CD8.T 0.9009151         silhouette          WasserStein celltype
#> B.cells3    B.cells 0.8998458         silhouette          WasserStein celltype
#> NK.cells3  NK.cells 0.8996690         silhouette          WasserStein celltype
#> CD4.T11       CD4.T 1.0000000 NeighborhoodPurity          WasserStein celltype
#> CD8.T11       CD8.T 1.0000000 NeighborhoodPurity          WasserStein celltype
#> B.cells11   B.cells 1.0000000 NeighborhoodPurity          WasserStein celltype
#> NK.cells11 NK.cells 1.0000000 NeighborhoodPurity          WasserStein celltype
#> CD4.T21       CD4.T 0.9099102 Average_similarity          WasserStein celltype
#> CD8.T21       CD8.T 0.9099267 Average_similarity          WasserStein celltype
#> B.cells21   B.cells 0.9090413 Average_similarity          WasserStein celltype
#> NK.cells21 NK.cells 0.9089435 Average_similarity          WasserStein celltype

Visualization

Dissimilarity Heatmap

Visualize dissimilarity matrices as annotated heatmaps. Cell types can be ordered by similarity or consistency scores:

# Plot dissimilarity heatmap with consistency-based ordering
plot_heatmap(
  sceval,
  dissimilarity_slot = "Pseudobulk:Euclidean",
  sort_consistency = "silhouette",      # Order by silhouette scores
  sort_similarity = NULL                 # Or order by similarity
)

The heatmap shows pairwise dissimilarities between samples, grouped by cell type. Cell types with higher consistency (e.g., more coherent annotations) show tighter clustering of their samples.

Pseudobulk PCA

Pseudobulk PCA per sample & cell type

plot_pca(
  sceval,
  reduction_slot = "pseudobulk"
)

Interpretation Guidelines

Identifying Problematic Annotations

Cell types with low consistency scores may indicate:

Ambiguous cell type boundaries: Related cell types (e.g., activated vs. resting T cells) may overlap
Heterogeneous populations: A single label covering multiple biological states
Annotation errors: Inconsistent labeling across samples

Recommendations

Investigate cell types with low consistency using the heatmap visualization
Consider refining annotations for cell types with negative consistency scores
Use biological knowledge to interpret whether low consistency reflects true biology or annotation issues

Session Information

sessionInfo()
#> R version 4.6.1 (2026-06-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 26.04 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] SingleCellExperiment_1.35.2 SummarizedExperiment_1.43.0
#>  [3] Biobase_2.73.1              GenomicRanges_1.65.1       
#>  [5] Seqinfo_1.3.0               IRanges_2.47.2             
#>  [7] S4Vectors_0.51.5            BiocGenerics_0.59.10       
#>  [9] generics_0.1.4              MatrixGenerics_1.25.0      
#> [11] matrixStats_1.5.0           Seurat_5.5.1               
#> [13] SeuratObject_5.4.0          sp_2.2-1                   
#> [15] Matrix_1.7-5                scTypeEval_1.1.2           
#> [17] BiocStyle_2.41.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3     sys_3.4.3              jsonlite_2.0.0        
#>   [4] magrittr_2.0.5         spatstat.utils_3.2-3   farver_2.1.2          
#>   [7] rmarkdown_2.31         vctrs_0.7.3            ROCR_1.0-12           
#>  [10] spatstat.explore_3.8-1 S4Arrays_1.13.0        htmltools_0.5.9       
#>  [13] BiocNeighbors_2.7.2    SparseArray_1.13.2     sass_0.4.10           
#>  [16] sctransform_0.4.3      parallelly_1.48.0      KernSmooth_2.23-26    
#>  [19] bslib_0.11.0           htmlwidgets_1.6.4      ica_1.0-3             
#>  [22] plyr_1.8.9             plotly_4.12.0          zoo_1.8-15            
#>  [25] cachem_1.1.0           buildtools_1.0.0       igraph_2.3.3          
#>  [28] mime_0.13              lifecycle_1.0.5        pkgconfig_2.0.3       
#>  [31] rsvd_1.0.5             R6_2.6.1               fastmap_1.2.0         
#>  [34] fitdistrplus_1.2-6     future_1.70.0          shiny_1.14.0          
#>  [37] digest_0.6.39          patchwork_1.3.2        tensor_1.5.1          
#>  [40] dqrng_0.4.1            RSpectra_0.16-2        irlba_2.3.7           
#>  [43] beachmat_2.29.0        labeling_0.4.3         progressr_1.0.0       
#>  [46] spatstat.sparse_3.2-0  httr_1.4.8             polyclip_1.10-7       
#>  [49] abind_1.4-8            compiler_4.6.1         withr_3.0.3           
#>  [52] S7_0.2.2               BiocParallel_1.47.0    fastDummies_1.7.6     
#>  [55] MASS_7.3-66            DelayedArray_0.39.3    bluster_1.23.0        
#>  [58] tools_4.6.1            lmtest_0.9-40          otel_0.2.0            
#>  [61] httpuv_1.6.17          future.apply_1.20.2    goftest_1.2-3         
#>  [64] glue_1.8.1             nlme_3.1-170           promises_1.5.0        
#>  [67] grid_4.6.1             Rtsne_0.17             cluster_2.1.8.2       
#>  [70] reshape2_1.4.5         gtable_0.3.6           spatstat.data_3.1-9   
#>  [73] tidyr_1.3.2            data.table_1.18.4      metapod_1.21.0        
#>  [76] ScaledMatrix_1.21.0    BiocSingular_1.29.0    XVector_0.53.0        
#>  [79] spatstat.geom_3.8-1    RcppAnnoy_0.0.23       ggrepel_0.9.8         
#>  [82] RANN_2.6.2             pillar_1.11.1          stringr_1.6.0         
#>  [85] limma_3.69.2           spam_2.11-4            RcppHNSW_0.7.0        
#>  [88] later_1.4.8            splines_4.6.1          dplyr_1.2.1           
#>  [91] lattice_0.22-9         survival_3.8-9         deldir_2.0-4          
#>  [94] tidyselect_1.2.1       locfit_1.5-9.12        scuttle_1.23.1        
#>  [97] maketools_1.3.2        miniUI_0.1.2           pbapply_1.7-4         
#> [100] transport_0.15-4       knitr_1.51             gridExtra_2.3.1       
#> [103] edgeR_4.11.4           scattermore_1.2        xfun_0.60             
#> [106] statmod_1.5.2          stringi_1.8.7          lazyeval_0.2.3        
#> [109] yaml_2.3.12            evaluate_1.0.5         codetools_0.2-20      
#> [112] tibble_3.3.1           BiocManager_1.30.27    cli_3.6.6             
#> [115] uwot_0.2.4             xtable_1.8-8           reticulate_1.46.0     
#> [118] jquerylib_0.1.4        Rcpp_1.1.2             globals_0.19.1        
#> [121] spatstat.random_3.5-0  png_0.1-9              spatstat.univar_3.2-0 
#> [124] parallel_4.6.1         ggplot2_4.0.3          SingleR_2.15.2        
#> [127] dotCall64_1.2          scran_1.41.1           listenv_1.0.0         
#> [130] viridisLite_0.4.3      scales_1.4.0           ggridges_0.5.7        
#> [133] purrr_1.2.2            rlang_1.3.0            cowplot_1.2.0

References

The manuscript describing these methods is currently in preparation. For questions or issues, please visit:

GitHub: https://github.com/carmonalab/scTypeEval
Bug reports: https://github.com/carmonalab/scTypeEval/issues

scTypeEval: Evaluating Cell Type Labels Consistency in scRNA-seq