Skip name-sort (perf) + rename to filtersam-tools + PyPI publishing#8

Merged
Robaina merged 3 commits into main from perf/skip-name-sort
Jun 19, 2026

Robaina commented Jun 19, 2026

This PR does two related things needed to ship the parallel-mode speedup under the new PyPI account.

1. Performance: skip parallelbam's name-sort (closes #3)

Both filters evaluate each segment independently, so parallel chunks don't need reads grouped by query name. We pass sort_by_name=False to parallelizeBAMoperation, skipping the full serial samtools sort -n that made -p slower than single-processor mode. (Relies on the sort_by_name option in parallelBAM, Robaina/parallelBAM#3.)
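To illustrate why the name-sort is unnecessary, here is a minimal sketch. It is not the filterSAM or parallelbam implementation: `Segment`, `keep_segment`, `filter_serial`, and `filter_in_chunks` are hypothetical names, and the precomputed `percent_identity` field is an assumption made for brevity. Because the predicate looks at one segment at a time, filtering contiguous chunks in file order yields the same result as a single pass, even when segments sharing a query name land in different chunks.

```python
from typing import Callable, List, NamedTuple


class Segment(NamedTuple):
    """Hypothetical stand-in for one aligned segment (not a pysam object)."""
    query_name: str
    percent_identity: float  # assumed to be precomputed for this sketch


def keep_segment(segment: Segment, min_identity: float = 95.0) -> bool:
    """Per-segment predicate: the decision depends on this segment only."""
    return segment.percent_identity >= min_identity


def filter_serial(
    segments: List[Segment], predicate: Callable[[Segment], bool]
) -> List[Segment]:
    """Single pass over all segments in file order."""
    return [s for s in segments if predicate(s)]


def filter_in_chunks(
    segments: List[Segment], n_chunks: int, predicate: Callable[[Segment], bool]
) -> List[Segment]:
    """Split into contiguous chunks in file order, filter each, concatenate."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunk_size = max(1, -(-len(segments) // n_chunks))  # ceiling division
    kept: List[Segment] = []
    for start in range(0, len(segments), chunk_size):
        chunk = segments[start:start + chunk_size]
        kept.extend(s for s in chunk if predicate(s))
    return kept


# Segments with the same query name are deliberately interleaved, so any
# chunking separates them; no name-sort is applied anywhere.
segments = [
    Segment("read1", 99.0),
    Segment("read2", 80.0),
    Segment("read1", 96.5),
    Segment("read3", 97.2),
    Segment("read2", 98.1),
]
```

A filter that needed to see all segments of a read together (for example, keeping a read only if every one of its segments passes) would not have this property and would still need name-grouped input.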

2. Rename distribution to filtersam-tools + PyPI publishing

Re-publishing under a new PyPI account needs a new distribution name. filtersam/filter-sam are rejected (PyPI flags filter-sam as "too similar" to filtersam), so the new distribution is filtersam-tools. Import name unchanged (import filtersam); only pip install becomes filtersam-tools. Dependency renamed to parallelbam-tools>=0.0.20 to match Robaina/parallelBAM#4.

  • setup.py: NAME → filtersam-tools; dependency → parallelbam-tools>=0.0.20.
  • requirements.txt: parallelbam-tools>=0.0.20.
  • pyproject.toml (new): setuptools PEP 517 build backend.
  • .github/workflows/publish.yml (new): build + publish to PyPI via Trusted Publishing (OIDC) on a published GitHub Release.
  • .github/workflows/tests.yml: install parallelbam-tools from the parallelBAM repo without asserting the dist name (robust to the rename) until it is on PyPI.

Required PyPI setup (one-time)

On https://pypi.org/manage/account/publishing/, add a pending publisher:

  • Project: filtersam-tools · Owner: Robaina · Repo: filterSAM · Workflow: publish.yml · Environment: pypi

Merge order

Merge Robaina/parallelBAM#4 (the parallelbam-tools rename) first, then this PR.

Verification

Builds as filtersam_tools-0.0.11 (Requires-Dist: parallelbam-tools>=0.0.20); twine check passes; the full test suite passes (32 tests, including the end-to-end parallel run with sort_by_name=False).
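The underscore in `filtersam_tools-0.0.11` comes from how built artifacts escape the distribution name: runs of `-`, `_`, and `.` become a single `_`. The sketch below is a simplification of that rule that is sufficient for the lowercase names in this PR; `escaped_dist_name` and `artifact_stem` are hypothetical helper names, not setuptools functions.

```python
import re


def escaped_dist_name(distribution_name: str) -> str:
    """Replace runs of '-', '_' and '.' with one underscore.

    Simplified: the names used here are already lowercase, so case
    handling is intentionally left out of this sketch.
    """
    return re.sub(r"[-_.]+", "_", distribution_name)


def artifact_stem(distribution_name: str, version: str) -> str:
    """Leading '<name>-<version>' part of an sdist or wheel filename."""
    return f"{escaped_dist_name(distribution_name)}-{version}"


print(artifact_stem("filtersam-tools", "0.0.11"))
```

The same rule explains the `filter_sam-0.0.11` artifact reported for the intermediate `filter-sam` name in the earlier commit message.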

Robaina and others added 2 commits June 19, 2026 19:26
The identity and matched filters evaluate each segment independently, so
the parallel chunks do not need reads grouped by query name. Pass
sort_by_name=False to parallelizeBAMoperation so it skips the full serial
samtools name-sort that dominated runtime on large files and made parallel
mode slower than single-processor mode.

Requires the sort_by_name option added in parallelbam 0.0.20, so:
- bump the dependency to parallelbam>=0.0.20 (setup.py, requirements.txt);
- CI installs parallelbam from the repo until 0.0.20 is on PyPI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Re-publishing under a new PyPI account requires a new distribution name
(the old `filtersam` name belongs to the previous account). The import
name is unchanged (`import filtersam`); only `pip install` changes to
`filter-sam`. The parallelbam dependency is likewise renamed to
`parallel-bam>=0.0.20`.

- setup.py: NAME -> 'filter-sam'; dependency -> 'parallel-bam>=0.0.20'.
- requirements.txt: parallel-bam>=0.0.20.
- pyproject.toml: declare the setuptools build backend (PEP 517).
- publish.yml: build + publish to PyPI via Trusted Publishing (OIDC) on
  GitHub Release.
- tests.yml: install parallel-bam from the parallelBAM repo without
  asserting the distribution name, so CI is robust to the rename until
  parallel-bam 0.0.20 is on PyPI.

Verified locally: builds as filter_sam-0.0.11 (Requires-Dist
parallel-bam>=0.0.20), twine check passes, 32 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Robaina changed the title from "Skip parallelbam name-sort in parallel mode (closes #3)" to "Skip name-sort (perf) + rename to filter-sam + PyPI publishing" on Jun 19, 2026
…m/parallel-bam too similar to existing PyPI projects)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Robaina changed the title from "Skip name-sort (perf) + rename to filter-sam + PyPI publishing" to "Skip name-sort (perf) + rename to filtersam-tools + PyPI publishing" on Jun 19, 2026
Robaina self-assigned this on Jun 19, 2026
Robaina merged commit c5fdc01 into main on Jun 19, 2026
2 checks passed
Robaina deleted the perf/skip-name-sort branch on June 19, 2026 22:17

Development

Successfully merging this pull request may close these issues.

performance in multiprocessing mode much worse than single processor