Sally332

Computational Biologist | Statistical Genomics, Multiomics Integration & Biomedical Data Science

Computational biologist developing methods and analytical frameworks for statistical genomics, multiomics integration, single-cell biology, spatial transcriptomics, and biologically informed machine learning. My work focuses on integrating molecular and clinical data, modeling complex biological systems, and building reproducible approaches that improve biological interpretation, methodological rigor, and translational insight.

🔬 Research Focus

My research lies at the intersection of computational biology, statistical genomics, systems biology, and machine learning. I develop and evaluate computational approaches for integrating high-dimensional molecular data, characterizing cellular heterogeneity, modeling biological interactions, and extracting biologically meaningful insight from complex genomic and spatial datasets.

Current areas of interest include multiomics integration, biological representation learning, cell–cell communication, interpretable machine learning, statistical genomics, molecular epidemiology, aging biology, and computational frameworks that improve the reproducibility and biological validity of modern analytical methods.

🧬 Research Software & Method Development

The repositories below contain research software, computational methods, and reproducible workflows supporting ongoing work in statistical genomics, machine learning, multiomics integration, single-cell biology, spatial transcriptomics, and systems biology.

Single-Cell Concept Interpretability

Associated manuscript. Yepes S. (2026). Layer- and regime-dependent interpretability in concept bottleneck models for single-cell transcriptomics.
DOI: https://doi.org/10.5281/zenodo.19476507

A comprehensive evaluation framework for concept bottleneck models in single-cell transcriptomics. The study investigates whether learned concepts remain biologically meaningful across datasets, architectures, and prediction regimes through concept selectivity analyses, latent-space geometry, and biological validation.

Concept bottleneck model evaluation
Concept-space geometry assessment
Biological concept validation
Cross-regime interpretability benchmarking

Geometry-Aware Ligand–Receptor Analysis for Spatial Transcriptomics

Associated manuscript. Yepes S. (2026). Geometry-aware ligand–receptor analysis reveals tumor communication patterns.
DOI: https://doi.org/10.5281/zenodo.19476574

A geometry-aware framework for prioritizing biologically meaningful ligand–receptor interactions in spatial transcriptomics. The approach distinguishes spatial proximity from true interface enrichment, enabling more robust characterization of cellular communication programs across diverse tissue microenvironments.

Spatially informed ligand–receptor prioritization
Geometry-preserving null models
Distance-weighted boundary scoring
Cross-tissue communication analysis
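
To make the distinction between proximity and interface enrichment concrete, here is a minimal sketch of one way a distance-weighted boundary score and a geometry-preserving null could be combined. It is an illustration under stated assumptions, not the repository's implementation: the function names, the exponential distance kernel, the `length_scale` parameter, and the choice to permute expression within each cell type (keeping coordinates and labels fixed) are all hypothetical.

```python
import math
import random

def boundary_score(coords, labels, ligand, receptor, sender, receiver, length_scale=1.0):
    """Sum ligand * receptor over sender-receiver pairs, down-weighted by distance.

    The exponential kernel is an assumed choice for this sketch.
    """
    score = 0.0
    for i, (xi, yi) in enumerate(coords):
        if labels[i] != sender:
            continue
        for j, (xj, yj) in enumerate(coords):
            if labels[j] != receiver:
                continue
            d = math.hypot(xi - xj, yi - yj)
            score += math.exp(-d / length_scale) * ligand[i] * receptor[j]
    return score

def permute_within_type(values, labels, rng):
    """Shuffle values among cells of the same type; positions and labels stay fixed."""
    out = list(values)
    for cell_type in sorted(set(labels)):
        idx = [k for k, lab in enumerate(labels) if lab == cell_type]
        shuffled = [values[k] for k in idx]
        rng.shuffle(shuffled)
        for k, v in zip(idx, shuffled):
            out[k] = v
    return out

def empirical_p(coords, labels, ligand, receptor, sender, receiver,
                n_perm=200, seed=0, length_scale=1.0):
    """Compare the observed score with scores from within-type permutations."""
    rng = random.Random(seed)
    observed = boundary_score(coords, labels, ligand, receptor,
                              sender, receiver, length_scale)
    hits = 0
    for _ in range(n_perm):
        lig = permute_within_type(ligand, labels, rng)
        rec = permute_within_type(receptor, labels, rng)
        null = boundary_score(coords, labels, lig, rec, sender, receiver, length_scale)
        if null >= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy layout: four "A" cells next to four "B" cells on a line.
# Ligand and receptor are expressed only in the two cells touching the interface.
coords = [(x, 0) for x in range(8)]
labels = ["A"] * 4 + ["B"] * 4
ligand = [0, 0, 0, 1, 0, 0, 0, 0]
receptor = [0, 0, 0, 0, 1, 0, 0, 0]
observed, p_value = empirical_p(coords, labels, ligand, receptor, "A", "B")
print(observed, p_value)
```

Because the null keeps every cell where it is, two cell types that merely sit next to each other do not score as significant on their own; only expression concentrated at the interface pushes the observed score above the permuted ones.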

Falsification-Based Evaluation of Spatial Pathway Models

Associated manuscript. Yepes S. (2026). Falsification-based evaluation of interpretability in spatial pathway models.
DOI: https://doi.org/10.5281/zenodo.19476625

A methodological framework for testing interpretability claims in pathway-informed spatial models. By systematically challenging biological assumptions through controlled perturbations, the framework evaluates whether model explanations remain robust, identifiable, and biologically meaningful.

Falsification-based interpretability evaluation
Structural perturbation testing
Pathway identifiability assessment
Methodological validation framework
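
The idea of structural perturbation testing can be sketched as follows, assuming the claim under test is "the curated pathway structure explains more than an arbitrary structure of the same size". Everything here is hypothetical and for illustration only: `shuffle_membership`, `falsification_test`, the size-preserving random reassignment, and the toy `importance` values standing in for model attributions are not taken from the manuscript or repository.

```python
import random

def shuffle_membership(pathways, genes, rng):
    """Reassign genes to pathways at random, preserving each pathway's size.

    Pathways are sampled independently, so perturbed pathways may overlap.
    """
    return {name: set(rng.sample(genes, len(members)))
            for name, members in sorted(pathways.items())}

def falsification_test(score_fn, pathways, genes, n_perturb=100, seed=0):
    """Return the real score and the fraction of perturbed structures that match or beat it.

    A large fraction suggests the explanation does not depend on the
    curated pathway structure, which would count against the claim.
    """
    rng = random.Random(seed)
    real = score_fn(pathways)
    null = [score_fn(shuffle_membership(pathways, genes, rng))
            for _ in range(n_perturb)]
    fraction = sum(s >= real for s in null) / n_perturb
    return real, fraction

# Toy setup: 20 placeholder genes, one pathway of five genes.
genes = [f"gene_{i}" for i in range(20)]
pathways = {"pathway_1": set(genes[:5])}

# Stand-in for per-gene attributions from a trained model (assumed values).
importance = {g: (1.0 if g in pathways["pathway_1"] else 0.0) for g in genes}

def mean_importance(pw):
    """Average attribution over all genes assigned to any pathway."""
    members = [g for m in pw.values() for g in m]
    return sum(importance[g] for g in members) / len(members)

real_score, null_fraction = falsification_test(mean_importance, pathways, genes)
print(real_score, null_fraction)
```

In practice `score_fn` would retrain or re-evaluate the model under each perturbed structure; the toy score here only shows the shape of the comparison.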

MM-KPNN

An interpretable multimodal neural network integrating scRNA-seq and scATAC-seq data through biologically informed pathway priors. The framework combines predictive modeling with mechanistic interpretation to characterize regulatory programs underlying cellular identity and state transitions.

Knowledge-primed neural network architecture
Multimodal single-cell integration
Pathway-constrained latent representations
Regulatory program interpretation
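
A pathway-constrained layer is commonly built as a dense layer whose weights are masked by gene-to-pathway membership, so each hidden node only sees the genes in its pathway. The sketch below shows that general technique for a single modality in plain Python; it is not the MM-KPNN code, and the gene names, pathway names, weights, and function names are placeholders chosen for illustration.

```python
def pathway_mask(genes, pathways):
    """Build mask[g][p] = 1.0 if gene g belongs to pathway p, else 0.0."""
    names = sorted(pathways)
    mask = [[1.0 if g in pathways[p] else 0.0 for p in names] for g in genes]
    return names, mask

def masked_forward(x, weights, mask):
    """Pathway activations: ReLU of x @ (weights * mask).

    Weights outside the mask never contribute, so each output node
    can be read as a summary of its own pathway's genes.
    """
    n_genes, n_paths = len(mask), len(mask[0])
    out = []
    for p in range(n_paths):
        z = sum(x[g] * weights[g][p] * mask[g][p] for g in range(n_genes))
        out.append(max(0.0, z))
    return out

# Placeholder genes and pathways (not real annotations).
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
pathways = {
    "pathway_1": {"gene_a", "gene_b"},
    "pathway_2": {"gene_c", "gene_d"},
}
names, mask = pathway_mask(genes, pathways)

weights = [[1.0, 1.0] for _ in genes]   # dense weights before masking
x = [1.0, 2.0, 3.0, 4.0]                # one cell's expression vector

activations = masked_forward(x, weights, mask)
print(dict(zip(names, activations)))    # {'pathway_1': 3.0, 'pathway_2': 7.0}
```

Changing the expression of `gene_c` leaves the `pathway_1` node untouched, which is what makes the hidden layer interpretable in pathway terms.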

SpatialMMKPNN

A graph-based extension of MM-KPNN designed for spatial transcriptomics. The framework integrates neighborhood structure, cell–cell communication signals, and biological pathway knowledge to model tissue organization while preserving interpretability.

Graph neural network framework
Spatial transcriptomics integration
Cell–cell communication modeling
Tissue ecosystem characterization
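
As a rough picture of how neighborhood structure can enter such a model, the sketch below builds a k-nearest-neighbor graph from spot coordinates and applies one mean-aggregation step, the simplest form of graph message passing. It is an assumption-laden illustration rather than the SpatialMMKPNN architecture: `knn_graph`, `aggregate`, and the `self_weight` mixing parameter are hypothetical, and the code expects at least two spots and `k >= 1`.

```python
import math

def knn_graph(coords, k):
    """For each spot, return the indices of its k nearest other spots (Euclidean)."""
    neighbors = []
    for i, ci in enumerate(coords):
        dists = sorted((math.dist(ci, cj), j)
                       for j, cj in enumerate(coords) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def aggregate(features, neighbors, self_weight=0.5):
    """One aggregation step: mix each spot's features with its neighbors' mean."""
    out = []
    for i, nbrs in enumerate(neighbors):
        n_feat = len(features[i])
        mean = [sum(features[j][f] for j in nbrs) / len(nbrs)
                for f in range(n_feat)]
        out.append([self_weight * features[i][f] + (1 - self_weight) * mean[f]
                    for f in range(n_feat)])
    return out

# Toy tissue: two well-separated pairs of spots with one feature each.
coords = [(0, 0), (1, 0), (10, 0), (11, 0)]
features = [[0.0], [2.0], [10.0], [12.0]]

neighbors = knn_graph(coords, k=1)
smoothed = aggregate(features, neighbors)
print(neighbors)   # [[1], [0], [3], [2]]
print(smoothed)    # [[1.0], [1.0], [11.0], [11.0]]
```

Each spot's representation now reflects its local neighborhood, while the two distant groups stay distinct.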

📂 Additional Repositories

Organoid Analysis Framework

A computational framework that integrates transcriptomic, proteomic, and phenotypic measurements to evaluate organoid fidelity, characterize biological variability, and support translational disease modeling.