# [Summary] Benchmarking of cell type deconvolution pipelines for transcriptomics data

#### Motivation

> However, an **evaluation of the impact** of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is **still lacking**

#### Objective

> Here we provide a comprehensive and quantitative evaluation of the combined **impact of data transformation, scaling/normalization, marker selection, cell type composition and choice of methodology on the deconvolution results**

#### Method

> Use **five single-cell RNA-sequencing (scRNA-seq) datasets** and generate pseudo-bulk mixtures to evaluate the combined impact of these factors **→ the cell-type-specific expression data are used to generate the bulk mixture data**
> ( 3 from human pancreas, 1 from human kidney and 1 from human PBMC )

> We evaluate the performance of **20 deconvolution methods** aimed at computing cell type proportions, including **five recently developed methods that use scRNA-seq data as reference**.
> ( 15 microarray/bulk methods + 5 scRNA-seq methods )

> The **performance** is assessed by means of **Pearson correlation** and **root-mean-square error (RMSE)** values between the cell type proportions computed by the different deconvolution methods and the known compositions.

> To evaluate the **robustness** of our conclusions, different numbers of cells (cell pool sizes) are used to build the pseudo-bulk mixtures.
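The full benchmark uses real annotated scRNA-seq data and 20 methods; as a minimal illustration of the loop described above, the sketch below simulates single-cell-like profiles, builds pseudo-bulk mixtures with known compositions, deconvolves them with non-negative least squares (nnls, one of the evaluated least-squares methods), and scores the estimates with RMSE and Pearson correlation. The function names (`make_pseudobulk`, `deconvolve_nnls`) and all parameters are illustrative, not the paper's actual pipeline.

```python
# Minimal sketch of the pseudo-bulk benchmarking loop (illustrative only).
import numpy as np
from scipy.optimize import nnls
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_genes, cell_types = 500, ["alpha", "beta", "delta", "gamma"]
k = len(cell_types)

# Reference matrix S (genes x cell types): average expression per cell type.
# In the benchmark this would come from annotated scRNA-seq profiles.
S = rng.gamma(shape=2.0, scale=50.0, size=(n_genes, k))

def make_pseudobulk(S, n_cells=100, rng=rng):
    """Draw a random composition, then sum n_cells single-cell-like
    profiles (reference profile + Poisson noise) into one pseudo-bulk."""
    p_true = rng.dirichlet(np.ones(S.shape[1]))   # known composition
    counts = rng.multinomial(n_cells, p_true)     # cells drawn per type
    cells = [rng.poisson(S[:, t])
             for t, c in enumerate(counts) for _ in range(c)]
    return np.sum(cells, axis=0).astype(float), p_true

def deconvolve_nnls(S, bulk):
    """Non-negative least squares: bulk ~ S @ p with p >= 0, renormalized."""
    p_hat, _ = nnls(S, bulk)
    return p_hat / p_hat.sum()

# Score over several mixtures, as in the paper: RMSE and Pearson r
# between estimated and known cell type proportions.
truths, estimates = [], []
for _ in range(50):
    bulk, p_true = make_pseudobulk(S, n_cells=100)
    truths.append(p_true)
    estimates.append(deconvolve_nnls(S, bulk))

truths, estimates = np.array(truths), np.array(estimates)
rmse = np.sqrt(np.mean((estimates - truths) ** 2))
r, _ = pearsonr(estimates.ravel(), truths.ravel())
print(f"RMSE = {rmse:.3f}, Pearson r = {r:.3f}")
```

Because nnls already constrains the proportions to be non-negative, the only post-processing needed is renormalizing them to sum to one; varying `n_cells` mimics the different cell pool sizes used in the robustness analysis.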
#### Results

**Most relevant factors**

> **(i)** the **data transformation**, with **linear transformation outperforming the others**; **(ii)** the **reference matrix**, which should include all cell types that are part of the mixtures; **(iii)** a **sensible marker selection strategy** for bulk deconvolution methods

**Memory and time requirements**

> *Not interested.*

**Impact of data transformation on deconvolution results**

> Keeping the data in **linear scale** consistently showed the **best results (lowest RMSE values)**, whereas the **logarithmic** and **VST** scales led to poorer performance, with **two- to four-fold higher median RMSE values**.

> With the **exception of EPIC, DeconRNASeq, and DSA**, the choice of **normalization strategy** does **not** have a substantial **impact** on the deconvolution results.

> In terms of performance, the five best bulk deconvolution methods **(OLS, nnls, RLR, FARDEEP, and CIBERSORT)** and the three best methods that use scRNA-seq data as reference **(DWLS, MuSiC, SCDC)** achieved median RMSE values lower than 0.05.

> Penalized regression approaches **(Lasso, Ridge, Elastic net regression, and DCQ)** performed slightly worse than the methods described above (median RMSE ~ 0.1).

**Different combinations of normalization and deconvolution methods**

> The following focuses on data in linear scale.

> Among the bulk deconvolution methods, **least-squares** (OLS, nnls), **support-vector** (CIBERSORT) and **robust regression** (RLR, FARDEEP) approaches gave the best results across different datasets and pseudo-bulk cell pool sizes.

> Regarding the choice of normalization/scaling strategy, **column min-max** and **column z-score** consistently led to the **worst** performance. In all other situations, the choice of normalization/scaling strategy had a **minor impact** on the deconvolution results for these methods.

> When considering the estimation error relative to the magnitude of the expected cell type proportions, **smaller proportions consistently showed higher relative errors**.

> **Quantile normalization** always resulted in **sub-optimal** results with any of the tested bulk deconvolution methods.

> For deconvolution methods using scRNA-seq data as reference, **DWLS, MuSiC and SCDC** consistently showed the **highest performance**, comparable to the top performers among the bulk methods.

**Impact of the markers used in bulk deconvolution methods**

> Methods that use scRNA-seq data as reference have an advantage here, because they do not require marker genes to be known prior to performing the deconvolution.

> The use of **all possible markers** (the "all" strategy) showed the best performance overall.
> (For all markers across each dataset, we took a closer look at the fold-change distribution for both the cell type where they were initially found as markers (highest fold change) and the fold-change differences among all other cell types. Using the threshold values used to select a gene as a marker, we computed the percentage of those that could also be considered markers for a secondary cell type.)
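The paper's conclusions (below) recommend a stringent marker selection strategy that focuses on the difference between the first and second cell types with the highest expression values. The sketch below is one minimal interpretation of that idea, assuming a linear-scale reference matrix `S` (genes x cell types); the 2.0 fold-change cutoff and the name `select_markers` are illustrative assumptions, not the paper's exact criteria.

```python
# Sketch of a stringent first-vs-second marker selection (illustrative).
import numpy as np

def select_markers(S, cell_types, min_fold_change=2.0, eps=1e-9):
    """S: reference matrix (genes x cell types), linear scale.
    A gene is kept as a marker for its highest-expressing cell type only
    if that value exceeds the second-highest cell type by the fold change.
    The 2.0 cutoff is an assumed threshold, not the paper's value."""
    order = np.argsort(S, axis=1)                # per-gene ascending order
    top, second = order[:, -1], order[:, -2]     # best and runner-up type
    rows = np.arange(S.shape[0])
    fold_change = S[rows, top] / (S[rows, second] + eps)
    markers = {ct: [] for ct in cell_types}
    for g in rows[fold_change >= min_fold_change]:
        markers[cell_types[top[g]]].append(int(g))
    return markers

# Toy usage: 6 genes x 3 cell types, linear scale.
rng = np.random.default_rng(1)
S = rng.gamma(2.0, 50.0, size=(6, 3))
print(select_markers(S, ["T cell", "B cell", "NK cell"]))
```

Comparing the top expression value against the runner-up (rather than the mean of all other cell types) penalizes genes that are shared between two similar cell types, which is exactly the ambiguity the fold-change analysis above quantifies.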
**Effect of removing cell types from the reference matrix**

> We then focused on those cases where the median absolute RMSE values between the results using the complete reference matrix and all other scenarios where **a cell type was removed** **increased at least two-fold**.

> In the PBMC dataset, **removing CD19+, CD34+, CD14+ or NK cells** had an impact on the computed **T-cell** proportions (between a three- and six-fold increase in the median absolute RMSE values, both for bulk deconvolution methods and for those using scRNA-seq data as reference).

> **CD14+ monocytes were mostly correlated with dendritic cells**: when removing CD14+ monocytes, the highest RMSE value was found in dendritic cells.

> Pancreas/kidney tissue: *Not interested.*

**Deconvolution of real bulk heterogeneous samples**

> Regarding **bulk deconvolution** methods, **robust regression methods** (RLR, FARDEEP) and **support vector regression** (CIBERSORT) consistently showed the smallest RMSE and highest Pearson correlation values (Fig. 7a).

> Similarly, **DWLS** performed best among the deconvolution methods that use **scRNA-seq data** as input.

#### Conclusions

> In conclusion, when performing a deconvolution task, we advise users to: **(a)** keep their input data in **linear scale**; **(b)** select **any** of the scaling/normalization approaches described here, with the exception of row scaling, column min-max, column z-score or quantile normalization; **(c)** choose a **regression-based** bulk deconvolution method (e.g., RLR, CIBERSORT or FARDEEP), and also perform the same task in parallel with DWLS, MuSiC or SCDC if scRNA-seq data is available; **(d)** use a stringent marker selection strategy that **focuses on differences between the first and second cell types** with the highest expression values; **(e)** use a comprehensive reference matrix that **includes all relevant cell types** present in the mixtures.

#### Remarks

> Zhong and Liu[6] showed that applying the **logarithmic transformation** to microarray data led to a **consistent under-estimation** of cell-type-specific expression profiles.

> Hoffmann et al.[7] showed that **four different normalization strategies** had an impact on the estimation of cell type proportions from microarray data.
> ( the 4 strategies: **trimmed mean only** (t), **trimmed mean plus cyclic local regression** (clr), **quantile normalization** (q), or **centralization** (c) )

> Newman et al.[8] highlighted the importance of accounting for differences in normalization procedures when comparing the results from **CIBERSORT**[9] and **TIMER**[10].
> ( In particular, deconvolution methods cannot be meaningfully compared without taking normalization differences into account. By focusing on relative measures of TIL content in previous work, we avoided the confounding impact of tumor purity. )

**Impact of compositional data?**

> Vallania et al.[11] observed highly concordant results across different deconvolution methods in both blood and tissue samples, suggesting that **the reference matrix is more important than the methodology being used**.

> The logarithmic transformation is routinely included as part of the pre-processing of omics data in the context of differential gene expression analysis, but **Zhong and Liu showed that it led to worse results than performing computational deconvolution in the linear** (un-transformed) scale. The use of expression data in its linear form is an important difference with respect to classical differential gene expression analyses, where **statistical tests assume underlying normal distributions, typically achieved by the logarithmic transformation**. Silverman et al. showed that using **log counts per million with sparse data strongly distorts the difference between zero and non-zero values**, and Townes et al. showed the same when log-normalizing UMIs. Tsoucas et al. showed that **when the data were kept in the linear scale, all combinations of three deconvolution methods (DWLS, QP, or SVR) and three normalization approaches (LogNormalize from Seurat, Scran or SCnorm) led to good performance, which was not the case when the data were log-transformed.** Here, we assessed the impact of the log transformation on both full-length and tag-based scRNA-seq quantification methods and confirmed that **computational deconvolution should be performed in linear scale to achieve the best performance.**

> Both for bulk deconvolution methods and those that use scRNA-seq data as reference, our analyses show that the **normalization strategy had little impact** (except for the EPIC, DeconRNASeq, and DSA bulk methods).

> Of note, **quantile normalization (QN)**, an approach used by default in several deconvolution methods (e.g., FARDEEP, CIBERSORT), **consistently showed sub-optimal performance** regardless of the chosen method.

> Finally, as more scRNA-seq datasets become available in the near future, their aggregation (while carefully removing batch effects) will increase the robustness of the reference matrices used in deconvolution and will fuel the development of methodologies similar to SCDC, which allows direct usage of more than one scRNA-seq dataset at a time.

#### Source

**Paper**: [Benchmarking of cell type deconvolution pipelines for transcriptomics data](https://www.nature.com/articles/s41467-020-19015-1)