Computational biologist developing methods and analytical frameworks for statistical genomics, multiomics integration, single-cell biology, spatial transcriptomics, and biologically informed machine learning. My work focuses on integrating molecular and clinical data, modeling complex biological systems, and building reproducible approaches that improve biological interpretation, methodological rigor, and translational insight.
My research lies at the intersection of computational biology, statistical genomics, systems biology, and machine learning. I develop and evaluate computational approaches for integrating high-dimensional molecular data, characterizing cellular heterogeneity, modeling biological interactions, and extracting biologically meaningful insight from complex genomic and spatial datasets.
Current areas of interest include multiomics integration, biological representation learning, cell–cell communication, interpretable machine learning, statistical genomics, molecular epidemiology, aging biology, and computational frameworks that improve the reproducibility and biological validity of modern analytical methods.
The repositories below contain research software, analytical frameworks, computational methods, and reproducible workflows supporting ongoing work in statistical genomics, machine learning, multiomics integration, single-cell biology, spatial transcriptomics, and systems biology.
Associated manuscript. Yepes S. (2026). Layer- and regime-dependent interpretability in concept bottleneck models for single-cell transcriptomics.
DOI: https://doi.org/10.5281/zenodo.19476507
A comprehensive evaluation framework for concept bottleneck models in single-cell transcriptomics. The study investigates whether learned concepts remain biologically meaningful across datasets, architectures, and prediction regimes through concept selectivity analyses, latent-space geometry, and biological validation.
- Concept bottleneck model evaluation
- Concept-space geometry assessment
- Biological concept validation
- Cross-regime interpretability benchmarking
Associated manuscript. Yepes S. (2026). Geometry-aware ligand–receptor analysis reveals tumor communication patterns.
DOI: https://doi.org/10.5281/zenodo.19476574
A geometry-aware framework for prioritizing biologically meaningful ligand–receptor interactions in spatial transcriptomics. The approach distinguishes spatial proximity from true interface enrichment, enabling more robust characterization of cellular communication programs across diverse tissue microenvironments.
- Spatially informed ligand–receptor prioritization
- Geometry-preserving null models
- Distance-weighted boundary scoring
- Cross-tissue communication analysis
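As a rough illustration of the ideas listed above, the sketch below scores one ligand–receptor pair with a distance-weighted sum and compares it against a null that keeps cell positions and cell-type labels fixed. It is a minimal sketch under stated assumptions, not the repository's implementation: the exponential distance kernel, the within-type expression shuffle, and all names (`boundary_score`, `shuffle_within`, `geometry_preserving_p_value`, `length_scale`) are hypothetical choices made for this example.

```python
import math
import random


def boundary_score(coords, ligand, receptor, senders, receivers, length_scale=1.0):
    """Distance-weighted ligand-receptor score over all sender-receiver pairs.

    Assumption: an exponential kernel exp(-d / length_scale) down-weights
    distant pairs; the actual weighting used by the framework may differ.
    """
    total = 0.0
    for i in senders:
        for j in receivers:
            distance = math.dist(coords[i], coords[j])
            total += ligand[i] * receptor[j] * math.exp(-distance / length_scale)
    return total


def shuffle_within(values, indices, rng):
    """Return a copy of `values` with the entries at `indices` permuted among themselves."""
    shuffled = list(values)
    picked = [values[i] for i in indices]
    rng.shuffle(picked)
    for i, v in zip(indices, picked):
        shuffled[i] = v
    return shuffled


def geometry_preserving_p_value(coords, ligand, receptor, senders, receivers,
                                n_perm=200, length_scale=1.0, seed=0):
    """Permutation p-value that leaves coordinates and cell-type labels untouched.

    Assumption: the null shuffles expression only among cells of the same
    type, so any excess score reflects where expression sits, not where
    the two cell types happen to be located.
    """
    rng = random.Random(seed)
    observed = boundary_score(coords, ligand, receptor, senders, receivers, length_scale)
    at_least = 0
    for _ in range(n_perm):
        null_ligand = shuffle_within(ligand, senders, rng)
        null_receptor = shuffle_within(receptor, receivers, rng)
        null = boundary_score(coords, null_ligand, null_receptor,
                              senders, receivers, length_scale)
        if null >= observed:
            at_least += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (at_least + 1) / (n_perm + 1)


# Toy tissue: three sender cells on the left, three receiver cells on the right,
# with the highest expression placed next to the interface.
toy_coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
toy_senders, toy_receivers = [0, 1, 2], [3, 4, 5]
toy_ligand = [0.1, 0.5, 3.0, 0.0, 0.0, 0.0]
toy_receptor = [0.0, 0.0, 0.0, 2.5, 0.4, 0.1]

toy_score, toy_p = geometry_preserving_p_value(
    toy_coords, toy_ligand, toy_receptor, toy_senders, toy_receivers)
print(f"score={toy_score:.3f}, p={toy_p:.3f}")
```

Because the null never moves a cell, two cell types that are merely adjacent do not score as enriched unless expression is concentrated near their shared boundary.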
Associated manuscript. Yepes S. (2026). Falsification-based evaluation of interpretability in spatial pathway models.
DOI: https://doi.org/10.5281/zenodo.19476625
A methodological framework for testing interpretability claims in pathway-informed spatial models. By systematically challenging biological assumptions through controlled perturbations, the framework evaluates whether model explanations remain robust, identifiable, and biologically meaningful.
- Falsification-based interpretability evaluation
- Structural perturbation testing
- Pathway identifiability assessment
- Methodological validation framework
An interpretable multimodal neural network integrating scRNA-seq and scATAC-seq data through biologically informed pathway priors. The framework combines predictive modeling with mechanistic interpretation to characterize regulatory programs underlying cellular identity and state transitions.
- Knowledge-primed neural network architecture
- Multimodal single-cell integration
- Pathway-constrained latent representations
- Regulatory program interpretation
A graph-based extension of MM-KPNN designed for spatial transcriptomics. The framework integrates neighborhood structure, cell–cell communication signals, and biological pathway knowledge to model tissue organization while preserving interpretability.
- Graph neural network framework
- Spatial transcriptomics integration
- Cell–cell communication modeling
- Tissue ecosystem characterization
A computational framework integrating transcriptomic, proteomic, and phenotypic measurements to evaluate organoid fidelity, characterize biological variability, and support translational disease modeling.
- Multiomic integration
- Reproducible analytical workflows
- Biological heterogeneity assessment
A framework for characterizing tissue architecture and cellular organization using spatial transcriptomic profiling.
- Spatial domain identification
- Tissue architecture analysis
- Regional heterogeneity characterization
An end-to-end workflow for structural variant discovery, annotation, and interpretation using long-read sequencing datasets.
- Structural variant detection
- Clinical annotation workflows
- Long-read genomics analysis
A framework for rare variant association studies and gene-level burden testing in genomic cohorts.
- SKAT and SKAT-O implementation
- Population structure correction
- Statistical genetics workflows
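To make gene-level burden testing concrete, the sketch below collapses the rare variants of one gene into a per-individual burden count and tests it against case/control status by label permutation. This is a deliberately simple collapsing test, not SKAT or SKAT-O, and it omits covariates and population structure correction; the function names, the frequency threshold, and the toy genotypes are assumptions introduced for this example.

```python
import random


def rare_variant_burden(genotypes, maf_threshold=0.01):
    """Collapse rare variants into one burden count per individual.

    `genotypes` is a list of rows (one per individual) holding 0/1/2 counts
    of the counted allele per variant. Assumption: the counted allele is
    the minor allele, so its frequency can be compared to `maf_threshold`.
    """
    n_individuals = len(genotypes)
    n_variants = len(genotypes[0])
    rare = []
    for v in range(n_variants):
        frequency = sum(row[v] for row in genotypes) / (2 * n_individuals)
        if 0 < frequency <= maf_threshold:
            rare.append(v)
    burden = [sum(row[v] for v in rare) for row in genotypes]
    return burden, rare


def burden_permutation_test(burden, is_case, n_perm=1000, seed=0):
    """Two-sided permutation test on the case-control difference in mean burden."""
    def mean_difference(labels):
        cases = [b for b, c in zip(burden, labels) if c]
        controls = [b for b, c in zip(burden, labels) if not c]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    rng = random.Random(seed)
    observed = mean_difference(is_case)
    labels = list(is_case)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Small tolerance guards against floating-point ties.
        if abs(mean_difference(labels)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Toy cohort: 10 individuals, 3 variants; the middle variant is common.
toy_genotypes = [
    [1, 1, 0], [0, 1, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0],
    [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],
]
toy_is_case = [True] * 5 + [False] * 5

# The threshold is raised to 0.1 only because the toy cohort is tiny.
toy_burden, toy_rare = rare_variant_burden(toy_genotypes, maf_threshold=0.1)
toy_diff, toy_p = burden_permutation_test(toy_burden, toy_is_case)
print(f"rare variants={toy_rare}, mean difference={toy_diff:.2f}, p={toy_p:.3f}")
```

Variance-component tests such as SKAT differ from this sketch in that they do not assume all rare variants shift the phenotype in the same direction.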
A systems biology workflow for reconstructing gene regulatory networks and identifying molecular programs associated with biological phenotypes.
- Network inference
- Regulatory program discovery
- Systems biology analyses
A gene co-expression analysis pipeline for identifying expression modules, hub genes, and biologically relevant network structure.
- Module identification
- Hub gene discovery
- Functional network characterization
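One common way to define hub genes is by network connectivity, which the sketch below illustrates. It assumes a soft-thresholded adjacency of |Pearson r| raised to a power `beta` and ranks genes by the sum of their adjacencies; the pipeline itself may use a different network construction, and the function names, the `beta` value, and the toy expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def connectivity(expression, beta=6):
    """Sum of soft-thresholded adjacencies |r| ** beta for every gene.

    `expression` maps gene name -> list of expression values across samples.
    Assumption: beta=6 is an illustrative default, not a tuned value.
    """
    genes = list(expression)
    k = {gene: 0.0 for gene in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adjacency = abs(pearson(expression[g], expression[h])) ** beta
            k[g] += adjacency
            k[h] += adjacency
    return k


def top_hubs(expression, n_hubs=1, beta=6):
    """Return the `n_hubs` genes with the highest connectivity."""
    k = connectivity(expression, beta)
    return sorted(k, key=k.get, reverse=True)[:n_hubs]


# Toy data: genes A, B, and C co-vary across five samples; D does not.
toy_expression = {
    "A": [1, 2, 3, 4, 5],
    "B": [1, 2, 3, 4, 6],
    "C": [1, 2, 3, 5, 5],
    "D": [5, 1, 4, 2, 3],
}
print(top_hubs(toy_expression, n_hubs=2))
```

Raising the correlation to a power suppresses weak correlations without a hard cutoff, so connectivity is dominated by strongly co-expressed partners.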
🔗 GitHub: https://github.com/Sally332