Add eFP/ePlant gene ID validation, microarray probeset support, and m…#324

Open
rmobmina wants to merge 1 commit into BioAnalyticResource:dev from rmobmina:cleaned-endpoint

@rmobmina

Contributor

No description provided.

…aster DB list

Adds per-eFP-project regex validation (EFP_PROJECT_REGEXES, is_efp_gene_valid)
covering both canonical gene IDs and microarray probeset IDs, with a
database-to-project mapping (DATABASE_EFP_PROJECT) so gene_expression.py can
validate against the right format per database.
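
A minimal sketch of how per-project validation of this kind could look. The names `EFP_PROJECT_REGEXES`, `DATABASE_EFP_PROJECT`, and `is_efp_gene_valid` come from the description above, but their contents and signatures here are assumptions, as are the two Arabidopsis-style patterns, the `example_db` entry, and the `is_valid_for_database` helper; the actual code in this PR may differ.

```python
import re

# Assumed contents: one list of compiled patterns per eFP project.
# The patterns below are illustrative, not copied from the PR.
EFP_PROJECT_REGEXES = {
    "arabidopsis": [
        re.compile(r"AT[1-5CM]G\d{5}", re.IGNORECASE),     # canonical gene ID, e.g. AT1G01010
        re.compile(r"\d{6}(?:_[sx])?_at", re.IGNORECASE),  # microarray probeset, e.g. 261585_at
    ],
}

# Assumed contents: database name -> eFP project ("example_db" is hypothetical).
DATABASE_EFP_PROJECT = {
    "example_db": "arabidopsis",
}


def is_efp_gene_valid(project: str, gene_id: str) -> bool:
    """Return True if gene_id fully matches any pattern registered for project."""
    patterns = EFP_PROJECT_REGEXES.get(project, [])
    return any(p.fullmatch(gene_id) for p in patterns)


def is_valid_for_database(database: str, gene_id: str) -> bool:
    """Hypothetical helper: look up the project for a database, then validate."""
    project = DATABASE_EFP_PROJECT.get(database)
    if project is None:
        return False  # unknown database: reject rather than guess a format
    return is_efp_gene_valid(project, gene_id)
```

Using `fullmatch` rather than `match` means an ID with trailing characters is rejected without needing explicit `^`/`$` anchors in every pattern.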

Adds the eFP+ePlant discovery/validation pipeline:
- scrape_view_databases.py / scrape_species_view_info.py: live-discover every
  eFP and ePlant view across 55 sites, tagging eFP-only vs ePlant-only vs both
- build_proj_id_view_mapping.py: resolves multi-paper databases (e.g.
  atgenexp_*) to per-view proj_id breakdowns
- validate_db_regex_coverage.py: tests every database's real sample IDs
  against production validation, including legacy databases no longer linked
  from a live dropdown
- build_master_db_list.py: rolls everything up by species, tagging source
  (efp/eplant/both/legacy) and platform (microarray/rna_seq)
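
The roll-up step in the last bullet can be sketched roughly as follows. This is an illustration only: the record fields (`database`, `species`, `in_efp`, `in_eplant`, `platform`) and both function names are hypothetical, and it assumes "legacy" means a database that appears in neither live view list.

```python
from collections import defaultdict


def tag_source(in_efp: bool, in_eplant: bool) -> str:
    """Classify a database by where it was discovered (assumed rule)."""
    if in_efp and in_eplant:
        return "both"
    if in_efp:
        return "efp"
    if in_eplant:
        return "eplant"
    return "legacy"  # not linked from any live view


def build_master_list(records):
    """Group database records by species, tagging source and platform."""
    by_species = defaultdict(list)
    for rec in records:
        by_species[rec["species"]].append(
            {
                "database": rec["database"],
                "source": tag_source(rec["in_efp"], rec["in_eplant"]),
                "platform": rec["platform"],  # "microarray" or "rna_seq"
            }
        )
    # Sort species and databases so the output is stable between runs.
    return {
        species: sorted(entries, key=lambda e: e["database"])
        for species, entries in sorted(by_species.items())
    }
```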

193 databases validated, 0 unexplained failures.
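
One way a tally like this could be produced is sketched below. It is not the logic of `validate_db_regex_coverage.py`, which is not shown here: the function name, the report layout, and the idea of a `known_exceptions` map for explained failures are all assumptions.

```python
def check_coverage(samples_by_db, is_valid, known_exceptions=None):
    """Validate every database's sample IDs and separate explained failures.

    samples_by_db: dict of database name -> list of sample gene/probeset IDs
    is_valid: callable (database, gene_id) -> bool
    known_exceptions: dict of database name -> reason a failure is expected
    """
    known_exceptions = known_exceptions or {}
    report = {"validated": 0, "explained": {}, "unexplained": {}}
    for database, ids in samples_by_db.items():
        failing = [g for g in ids if not is_valid(database, g)]
        if not failing:
            report["validated"] += 1
        elif database in known_exceptions:
            report["explained"][database] = known_exceptions[database]
        else:
            report["unexplained"][database] = failing  # needs investigation
    return report
```

A run would count as clean under this sketch when `report["unexplained"]` is empty.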
