
Computational Biologist | Statistical Genomics, Multiomics Integration & Biomedical Data Science

Computational biologist developing methods and analytical frameworks for statistical genomics, multiomics integration, single-cell biology, spatial transcriptomics, and biologically informed machine learning. My work integrates molecular and clinical data, models complex biological systems, and builds reproducible computational approaches that improve biological interpretation, methodological rigor, and translational insight.

🔬 Research Focus

My research lies at the intersection of computational biology, statistical genomics, systems biology, and machine learning. I develop and evaluate computational approaches for integrating high-dimensional molecular data, characterizing cellular heterogeneity, modeling biological interactions, and extracting biologically meaningful insight from complex genomic and spatial datasets.

Current areas of interest include multiomics integration, biological representation learning, cell–cell communication, interpretable machine learning, statistical genomics, molecular epidemiology, aging biology, and computational frameworks that improve the reproducibility and biological validity of modern analytical methods.

🧬 Research Software & Method Development

The repositories below contain research software, computational methods, and reproducible analytical workflows supporting ongoing work in statistical genomics, machine learning, multiomics integration, single-cell biology, spatial transcriptomics, and systems biology.

Associated manuscript. Yepes S. (2026). Layer- and regime-dependent interpretability in concept bottleneck models for single-cell transcriptomics.
DOI: https://doi.org/10.5281/zenodo.19476507

An evaluation framework for concept bottleneck models in single-cell transcriptomics. The study tests whether learned concepts remain biologically meaningful across datasets, architectures, and prediction regimes, using concept selectivity analyses, latent-space geometry, and biological validation.

  • Concept bottleneck model evaluation
  • Concept-space geometry assessment
  • Biological concept validation
  • Cross-regime interpretability benchmarking
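For readers new to the architecture: a concept bottleneck model routes every prediction through a layer of human-interpretable concept scores, and the label head sees only those scores. The sketch below is a minimal, dependency-free illustration of that structure, not code from the associated study; the `cbm_forward` helper, the toy genes, concepts, and weights are all hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def linear(x, weights, bias):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def cbm_forward(x, concept_w, concept_b, label_w, label_b):
    """Concept bottleneck forward pass (hypothetical helper).

    x -> concept probabilities c = sigmoid(Wc x + bc) -> label logits = Wy c + by.
    The label head receives only c, never x, which is what makes the
    concept layer a bottleneck.
    """
    concepts = [sigmoid(z) for z in linear(x, concept_w, concept_b)]
    logits = linear(concepts, label_w, label_b)
    return concepts, logits

# Toy example: 3 genes -> 2 concepts -> 2 cell-type logits (illustrative values only).
x = [2.0, 0.0, 1.0]                   # expression of 3 hypothetical genes
concept_w = [[1.0, 0.0, 0.0],         # hypothetical concept A reads gene 1
             [0.0, 0.0, 1.0]]         # hypothetical concept B reads gene 3
concept_b = [0.0, 0.0]
label_w = [[2.0, -1.0], [-1.0, 2.0]]
label_b = [0.0, 0.0]

concepts, logits = cbm_forward(x, concept_w, concept_b, label_w, label_b)
print([round(c, 3) for c in concepts], [round(v, 3) for v in logits])
```

Because the label head reads only `concepts`, overwriting a concept with an expert-supplied value and recomputing the logits is the usual way concept interventions are tested.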

Associated manuscript. Yepes S. (2026). Geometry-aware ligand–receptor analysis reveals tumor communication patterns.
DOI: https://doi.org/10.5281/zenodo.19476574

A geometry-aware framework for prioritizing biologically meaningful ligand–receptor interactions in spatial transcriptomics. The approach distinguishes spatial proximity from true interface enrichment, enabling more robust characterization of cellular communication programs across diverse tissue microenvironments.

  • Spatially informed ligand–receptor prioritization
  • Geometry-preserving null models
  • Distance-weighted boundary scoring
  • Cross-tissue communication analysis
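As a point of reference, a simple baseline for spatially informed ligand–receptor scoring sums ligand expression in sender cells times receptor expression in receiver cells, down-weighted by distance, and compares it with a null that keeps every cell's coordinates and expression fixed while shuffling cell-type labels. This is a generic sketch under those assumptions, not the framework's scoring function or its geometry-preserving null; the Gaussian kernel, bandwidth `sigma`, helper names, and toy tissue are hypothetical.

```python
import math
import random

def lr_score(coords, labels, ligand, receptor, sender, receiver, sigma=50.0):
    """Distance-weighted ligand-receptor score (hypothetical helper).

    Sums ligand[i] * receptor[j] * exp(-d_ij^2 / (2 sigma^2)) over sender
    cells i and receiver cells j, so distant pairs contribute little.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(coords):
        if labels[i] != sender:
            continue
        for j, (xj, yj) in enumerate(coords):
            if i == j or labels[j] != receiver:
                continue
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += ligand[i] * receptor[j] * math.exp(-d2 / (2 * sigma ** 2))
    return total

def permutation_pvalue(coords, labels, ligand, receptor, sender, receiver,
                       n_perm=999, sigma=50.0, seed=0):
    """Empirical p-value against a label-permutation null.

    Coordinates and expression stay attached to their cells; only the
    cell-type labels are shuffled, so tissue geometry is unchanged.
    """
    rng = random.Random(seed)
    observed = lr_score(coords, labels, ligand, receptor, sender, receiver, sigma)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lr_score(coords, shuffled, ligand, receptor, sender, receiver, sigma) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)

# Toy 1-D tissue: ligand-high "tumor" cells abut receptor-high "immune" cells.
coords = [(10.0 * i, 0.0) for i in range(20)]
labels = ["tumor"] * 5 + ["immune"] * 5 + ["stroma"] * 10
ligand = [1.0] * 5 + [0.0] * 15
receptor = [0.0] * 5 + [1.0] * 5 + [0.0] * 10
observed, p = permutation_pvalue(coords, labels, ligand, receptor,
                                 "tumor", "immune", n_perm=199)
print(round(observed, 3), p)
```

Note that label shuffling alone mixes proximity and expression effects; separating spatial proximity from interface enrichment, as the framework above aims to do, requires a more carefully constructed null.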

Associated manuscript. Yepes S. (2026). Falsification-based evaluation of interpretability in spatial pathway models.
DOI: https://doi.org/10.5281/zenodo.19476625

A methodological framework for testing interpretability claims in pathway-informed spatial models. By systematically challenging biological assumptions through controlled perturbations, the framework evaluates whether model explanations remain robust, identifiable, and biologically meaningful.

  • Falsification-based interpretability evaluation
  • Structural perturbation testing
  • Pathway identifiability assessment
  • Methodological validation framework
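One elementary falsification check in this spirit asks whether a pathway's attribution exceeds that of random gene sets of the same size drawn from the same gene universe; if many random sets score as high, the pathway-level explanation is not identifiable from the data. The sketch assumes a per-gene attribution vector is already available and scores a set by its mean attribution. Both choices, the helper names, and the toy data are hypothetical and not taken from the framework.

```python
import random
from statistics import mean

def set_score(attributions, genes):
    """Score a gene set as the mean attribution of its members."""
    return mean(attributions[g] for g in genes)

def size_matched_null(attributions, pathway, n_draws=1000, seed=0):
    """Compare a pathway against random gene sets of identical size.

    Returns the observed score and an empirical p-value: the add-one
    corrected fraction of size-matched random sets scoring at least as high.
    """
    rng = random.Random(seed)
    universe = sorted(attributions)
    observed = set_score(attributions, pathway)
    hits = 0
    for _ in range(n_draws):
        decoy = rng.sample(universe, len(pathway))
        if set_score(attributions, decoy) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_draws + 1)

# Toy attributions: only G0..G4 carry signal (illustrative values).
attributions = {f"G{i}": (1.0 if i < 5 else 0.0) for i in range(50)}
focused = [f"G{i}" for i in range(5)]       # pathway aligned with the signal
diffuse = [f"G{i}" for i in range(5, 10)]   # pathway with no signal

obs_f, p_f = size_matched_null(attributions, focused, n_draws=500)
obs_d, p_d = size_matched_null(attributions, diffuse, n_draws=500)
print(obs_f, p_f, obs_d, p_d)
```

The focused pathway survives the test while the diffuse one is falsified, since every random set of the same size scores at least as high.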

An interpretable multimodal neural network integrating scRNA-seq and scATAC-seq data through biologically informed pathway priors. The framework combines predictive modeling with mechanistic interpretation to characterize regulatory programs underlying cellular identity and state transitions.

  • Knowledge-primed neural network architecture
  • Multimodal single-cell integration
  • Pathway-constrained latent representations
  • Regulatory program interpretation
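Pathway-constrained latent representations are commonly built with a masked linear layer: a binary pathway-by-gene membership matrix zeroes every weight between a gene and a pathway it does not belong to, so each latent unit can be read as a pathway activity. The sketch below illustrates that general technique in plain Python; whether MM-KPNN uses exactly this construction is an assumption, and the gene and pathway names are placeholders.

```python
def build_mask(genes, pathways):
    """Binary mask[p][g] = 1.0 if gene g belongs to pathway p, else 0.0."""
    return [[1.0 if g in members else 0.0 for g in genes]
            for members in pathways.values()]

def masked_linear(x, weights, mask, bias):
    """Pathway layer: weights are multiplied elementwise by the mask,
    so each pathway unit only ever sees its member genes."""
    return [sum(w * m * xi for w, m, xi in zip(w_row, m_row, x)) + b
            for w_row, m_row, b in zip(weights, mask, bias)]

genes = ["GENE1", "GENE2", "GENE3", "GENE4"]       # placeholder gene names
pathways = {                                       # placeholder memberships
    "PATHWAY_A": {"GENE1", "GENE2"},
    "PATHWAY_B": {"GENE3", "GENE4"},
}
mask = build_mask(genes, pathways)
weights = [[0.5] * len(genes) for _ in pathways]   # dense init; the mask prunes it
bias = [0.0] * len(pathways)

x = [1.0, 1.0, 0.0, 10.0]
activity = dict(zip(pathways, masked_linear(x, weights, mask, bias)))
print(activity)  # {'PATHWAY_A': 1.0, 'PATHWAY_B': 5.0}
```

The large GENE4 value moves only PATHWAY_B, which is what lets each latent unit be interpreted as a pathway. Applying the mask inside the forward pass, rather than once at initialization, keeps pruned connections at zero throughout training.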

A graph-based extension of MM-KPNN designed for spatial transcriptomics. The framework integrates neighborhood structure, cell–cell communication signals, and biological pathway knowledge to model tissue organization while preserving interpretability.

  • Graph neural network framework
  • Spatial transcriptomics integration
  • Cell–cell communication modeling
  • Tissue ecosystem characterization
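A graph neural network on spatial data rests on a neighborhood graph over cells or spots and a message-passing step that mixes each node's features with its neighbors'. Below is a minimal, dependency-free sketch of one such step: a k-nearest-neighbor graph built from 2-D coordinates, followed by GraphSAGE-style mean aggregation concatenated with the node's own features. The choice of k, the aggregator, and the absence of learned weights are simplifying assumptions, not details of the framework.

```python
import math

def knn_graph(coords, k):
    """Return, for each node, the indices of its k nearest other nodes."""
    neighbors = []
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def mean_aggregate(features, neighbors):
    """One message-passing step: concat(own features, mean of neighbor features)."""
    out = []
    for i, nbrs in enumerate(neighbors):
        dim = len(features[i])
        agg = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        out.append(features[i] + agg)
    return out

# Two spatially separated groups of cells with a single toy feature each.
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
features = [[1.0], [1.0], [1.0], [5.0], [7.0]]
nbrs = knn_graph(coords, k=2)
h = mean_aggregate(features, nbrs)
print(nbrs)
print(h)
```

In a trained model, a learned linear map and nonlinearity would follow each aggregation, and stacking several steps widens each cell's receptive field across the tissue.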

📂 Additional Repositories

Computational framework integrating transcriptomic, proteomic, and phenotypic measurements to evaluate organoid fidelity, characterize biological variability, and support translational disease modeling.

  • Multiomic integration
  • Reproducible analytical workflows
  • Biological heterogeneity assessment

Framework for characterizing tissue architecture and cellular organization using spatial transcriptomic profiling.

  • Spatial domain identification
  • Tissue architecture analysis
  • Regional heterogeneity characterization

End-to-end workflow for structural variant discovery, annotation, and interpretation using long-read sequencing datasets.

  • Structural variant detection
  • Clinical annotation workflows
  • Long-read genomics analysis
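Structural variant callers typically report each event as a VCF record whose INFO field carries `SVTYPE`, `SVLEN`, and `END`, keys defined in the VCF specification. A small filter like the one below is a common first step before annotation. The 50 bp minimum length, the example records, and the helper names are assumptions for illustration; a production workflow would normally use a dedicated VCF library.

```python
def parse_info(info):
    """Turn 'KEY=VAL;FLAG' into a dict; flags map to True."""
    out = {}
    for field in info.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            out[key] = value
        elif field:
            out[field] = True
    return out

def filter_svs(vcf_lines, min_len=50):
    """Yield (chrom, pos, svtype, length) for passing SV records with |SVLEN| >= min_len."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, _ref, _alt, _qual, flt, info = line.rstrip("\n").split("\t")[:8]
        if flt not in ("PASS", "."):
            continue
        fields = parse_info(info)
        if "SVTYPE" not in fields or "SVLEN" not in fields:
            continue
        # SVLEN may list one value per ALT allele; take the first. abs() covers
        # callers that report deletions with negative lengths.
        length = abs(int(fields["SVLEN"].split(",")[0]))
        if length >= min_len:
            yield chrom, int(pos), fields["SVTYPE"], length

records = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10000\tsv1\tN\t<DEL>\t60\tPASS\tSVTYPE=DEL;SVLEN=-300;END=10300",
    "chr1\t20000\tsv2\tN\t<INS>\t60\tPASS\tSVTYPE=INS;SVLEN=30",
    "chr2\t5000\tsv3\tN\t<DUP>\t12\tLowQual\tSVTYPE=DUP;SVLEN=1200;END=6200",
    "chr3\t7000\tsv4\tN\t<INV>\t55\t.\tSVTYPE=INV;SVLEN=800;END=7800;PRECISE",
]
for sv in filter_svs(records):
    print(sv)
```

Here the 30 bp insertion falls below the size cutoff and the duplication is dropped by its `LowQual` filter.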

Framework for rare variant association studies and gene-level burden testing in genomic cohorts.

  • SKAT and SKAT-O implementation
  • Population structure correction
  • Statistical genetics workflows
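For reference, the SKAT, burden, and SKAT-O statistics can all be written from per-variant score statistics. With an intercept-only null model, the score for variant j is S_j = sum_i g_ij (y_i - ȳ). With variant weights a_j (the SKAT default is the Beta(1, 25) density at the minor allele frequency, which equals 25(1 - MAF)^24), Q_SKAT = sum_j (a_j S_j)^2, Q_burden = (sum_j a_j S_j)^2, and SKAT-O considers Q_rho = (1 - rho) Q_SKAT + rho Q_burden over rho in [0, 1]. The sketch below computes only these statistics. It assumes genotypes are coded as minor-allele counts, omits covariates, and does not compute p-values, which require the mixture-of-chi-squares null distribution (for example Davies' method) and, for SKAT-O, an adjustment for searching over rho. It is illustrative, not the repository's implementation.

```python
def beta_1_25_weight(maf):
    """Beta(1, 25) density at the MAF: 25 * (1 - maf)**24, upweighting rare variants."""
    return 25.0 * (1.0 - maf) ** 24

def skat_statistics(genotypes, phenotype, rhos=(0.0, 0.5, 1.0)):
    """Compute Q_SKAT, Q_burden and the SKAT-O family Q_rho for one gene.

    genotypes: n x p list of minor-allele counts (0/1/2), one row per individual.
    phenotype: list of n trait values. Null model: intercept only.
    """
    n = len(phenotype)
    p = len(genotypes[0])
    y_bar = sum(phenotype) / n
    resid = [y - y_bar for y in phenotype]

    # Weights from each variant's minor allele frequency.
    weights = [beta_1_25_weight(sum(row[j] for row in genotypes) / (2 * n))
               for j in range(p)]

    # Per-variant score statistics S_j = sum_i g_ij * (y_i - y_bar).
    scores = [sum(genotypes[i][j] * resid[i] for i in range(n)) for j in range(p)]
    weighted = [a * s for a, s in zip(weights, scores)]

    q_skat = sum(ws ** 2 for ws in weighted)    # variance-component statistic
    q_burden = sum(weighted) ** 2               # collapsed burden statistic
    q_rho = {rho: (1 - rho) * q_skat + rho * q_burden for rho in rhos}
    return q_skat, q_burden, q_rho

genotypes = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]
phenotype = [2.0, 1.5, 0.2, 0.9, 2.5, 0.1]
q_skat, q_burden, q_rho = skat_statistics(genotypes, phenotype)
print(q_skat, q_burden, q_rho)
```

The burden statistic is most powerful when all variants act in the same direction, and SKAT when effects are mixed; SKAT-O's rho interpolates between the two.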

Systems biology workflow for reconstructing gene regulatory networks and identifying molecular programs associated with biological phenotypes.

  • Network inference
  • Regulatory program discovery
  • Systems biology analyses

Gene co-expression analysis pipeline for identifying expression modules, hub genes, and biologically relevant network structure.

  • Module identification
  • Hub gene discovery
  • Functional network characterization
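Hub genes in a co-expression network are usually the genes with the highest connectivity. A minimal sketch in the style of an unsigned WGCNA network: compute pairwise Pearson correlations across samples, raise their absolute values to a soft-thresholding power beta to obtain adjacencies a_ij = |r_ij|^beta, and sum each gene's adjacencies to get its connectivity. The value beta = 6, the toy expression matrix, and the helper names are assumptions; the pipeline's actual parameters and tooling are not stated here.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def connectivity(expr, beta=6):
    """Soft-thresholded connectivity k_i = sum over j != i of |r_ij|**beta.

    expr maps gene name -> expression values across the same samples.
    """
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            k[gi] += a
            k[gj] += a
    return k

# Toy data: G1-G3 co-vary across six samples, G4 does not track them closely.
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "G2": [1.1, 2.0, 2.9, 4.2, 5.1, 5.8],
    "G3": [0.9, 2.2, 3.1, 3.8, 5.0, 6.1],
    "G4": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
k = connectivity(expr)
ranked = sorted(k, key=k.get, reverse=True)   # candidate hubs first
print(ranked)
```

Raising correlations to a power suppresses weak, likely noisy links while keeping strong ones, which is why the loosely correlated G4 ends up with the lowest connectivity.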

📫 Contact

📧 sallyepes233@gmail.com

🔗 GitHub: https://github.com/Sally332

📌 Pinned Repositories

  1. Data-Transfer-Pipeline (Shell)

  2. Perturbation-MMKPNN (Python)

    Interpretable Perturb-seq modeling with a pathway/TF concept bottleneck. Predicts perturbation effects and identifies stable regulatory drivers that generalize across datasets.

  3. Spatial_Mapping (Jupyter Notebook)

    Interpretable spatial transcriptomics analysis of breast tumors and lymph node metastases, mapping tumor architecture and regional signaling with 10x Visium data.