ESFS: A noise-resilient framework for feature selection and marker gene discovery in single-cell transcriptomics

This article is a preprint. Preprints have not been peer-reviewed.

Abstract

Single-cell RNA sequencing (scRNA-seq) has transformed our ability to resolve cellular heterogeneity, but extracting meaningful signals remains challenging due to technical noise, batch effects, and the limitations of current feature selection methods. We present Entropy Sorting Feature Selection (ESFS), a modular, user-friendly framework that captures multivariate gene expression relationships without imputation or denoising via latent spaces. Across diverse datasets, ESFS improves interpretability and reveals biology missed by standard workflows: identifying coherent developmental programs in eight independent human embryo datasets without batch integration; resolving spatial gene expression in mouse colon obscured by conventional analyses; distinguishing shared and tumour-specific microenvironments in glioblastoma; and disambiguating spatial, temporal, and neurogenic programs in the developing mouse neural tube. By operating in gene expression space, ESFS produces interpretable, biologically meaningful outputs while reducing artefacts introduced by feature extraction. These results position ESFS as a powerful means to uncover relevant molecular signatures in noisy, high-dimensional transcriptomics data.

Details

Archive: bioRxiv