Skip name-sort (perf) + rename to filtersam-tools + PyPI publishing #8
Merged
The identity and matched filters evaluate each segment independently, so the parallel chunks do not need reads grouped by query name. Pass sort_by_name=False to parallelizeBAMoperation so it skips the full serial samtools name-sort that dominated runtime on large files and made parallel mode slower than single-processor mode.

Requires the sort_by_name option added in parallelbam 0.0.20, so:

- bump the dependency to parallelbam>=0.0.20 (setup.py, requirements.txt);
- CI installs parallelbam from the repo until 0.0.20 is on PyPI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
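To illustrate why a per-segment filter does not need name-grouped input, here is a minimal, self-contained sketch. It does not use parallelbam or pysam: the `Segment` tuple, `passes_identity`, `split_into_chunks`, and `filter_in_chunks` are hypothetical stand-ins, and the chunking is only assumed to resemble how a file might be split for parallel workers.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for an aligned segment: (query_name, percent_identity).
Segment = Tuple[str, float]


def passes_identity(segment: Segment, cutoff: float = 95.0) -> bool:
    """Per-segment predicate: it looks only at the segment it is given."""
    return segment[1] >= cutoff


def split_into_chunks(segments: List[Segment], n_chunks: int) -> List[List[Segment]]:
    """Cut the list into contiguous chunks, ignoring query names entirely."""
    size = max(1, -(-len(segments) // n_chunks))  # ceiling division
    return [segments[i:i + size] for i in range(0, len(segments), size)]


def filter_in_chunks(
    segments: List[Segment],
    n_chunks: int,
    predicate: Callable[[Segment], bool],
) -> List[Segment]:
    """Filter each chunk on its own, then concatenate the survivors."""
    kept: List[Segment] = []
    for chunk in split_into_chunks(segments, n_chunks):
        kept.extend(s for s in chunk if predicate(s))
    return kept


segments = [
    ("read3", 99.1),
    ("read1", 80.0),
    ("read2", 97.5),
    ("read1", 96.0),
    ("read3", 60.2),
]

# Same filter, once on the input as-is and once after sorting by query name.
unsorted_result = filter_in_chunks(segments, 2, passes_identity)
name_sorted_result = filter_in_chunks(
    sorted(segments, key=lambda s: s[0]), 2, passes_identity
)

# The surviving segments are identical; only their output order can differ.
assert sorted(unsorted_result) == sorted(name_sorted_result)
```

A filter that needed to see all segments of a read together (for example, keeping a read only if every one of its segments passes) would not have this property, which is why the sort can only be skipped for per-segment filters.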
Re-publishing under a new PyPI account requires a new distribution name (the old `filtersam` name belongs to the previous account). The import name is unchanged (`import filtersam`); only `pip install` changes to `filter-sam`. The parallelbam dependency is likewise renamed to `parallel-bam>=0.0.20`.

- setup.py: NAME -> 'filter-sam'; dependency -> 'parallel-bam>=0.0.20'.
- requirements.txt: parallel-bam>=0.0.20.
- pyproject.toml: declare the setuptools build backend (PEP 517).
- publish.yml: build + publish to PyPI via Trusted Publishing (OIDC) on GitHub Release.
- tests.yml: install parallel-bam from the parallelBAM repo without asserting the distribution name, so CI is robust to the rename until parallel-bam 0.0.20 is on PyPI.

Verified locally: builds as filter_sam-0.0.11 (Requires-Dist parallel-bam>=0.0.20), twine check passes, 32 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…m/parallel-bam too similar to existing PyPI projects) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This PR does two related things needed to ship the parallel-mode speedup under the new PyPI account.

1. Performance: skip parallelbam's name-sort (closes #3)

Both filters evaluate each segment independently, so parallel chunks don't need reads grouped by query name. We pass `sort_by_name=False` to `parallelizeBAMoperation`, skipping the full serial `samtools sort -n` that made `-p` slower than single-processor mode. (Relies on the `sort_by_name` option in parallelBAM, Robaina/parallelBAM#3.)

2. Rename distribution to `filtersam-tools` + PyPI publishing

Re-publishing under a new PyPI account needs a new distribution name. `filtersam`/`filter-sam` are rejected (PyPI flags `filter-sam` as "too similar" to `filtersam`), so the new distribution is `filtersam-tools`. Import name unchanged (`import filtersam`); only `pip install` becomes `filtersam-tools`. Dependency renamed to `parallelbam-tools>=0.0.20` to match Robaina/parallelBAM#4.

- `setup.py`: `NAME` → `filtersam-tools`; dependency → `parallelbam-tools>=0.0.20`.
- `requirements.txt`: `parallelbam-tools>=0.0.20`.
- `pyproject.toml` (new): setuptools PEP 517 build backend.
- `.github/workflows/publish.yml` (new): build + publish to PyPI via Trusted Publishing (OIDC) on a published GitHub Release.
- `.github/workflows/tests.yml`: install parallelbam-tools from the parallelBAM repo without asserting the dist name (robust to the rename) until it is on PyPI.

Required PyPI setup (one-time)

On https://pypi.org/manage/account/publishing/, add a pending publisher:
`filtersam-tools` · Owner: `Robaina` · Repo: `filterSAM` · Workflow: `publish.yml` · Environment: `pypi`

Merge order

Merge Robaina/parallelBAM#4 (the `parallelbam-tools` rename) first, then this PR.

Verification

Builds as `filtersam_tools-0.0.11` (`Requires-Dist: parallelbam-tools>=0.0.20`), `twine check` passes, full suite 32 passed (incl. the end-to-end parallel run using `sort_by_name=False`).
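For readers wondering why the project is installed as `filtersam-tools` but the built artifact is named `filtersam_tools-0.0.11`, the sketch below applies the standard name normalization from PEP 503 and the matching underscore form used in wheel filenames. The two helper names are made up for illustration; only the regular expression comes from the packaging specifications. It does not reproduce PyPI's "too similar" check, which is stricter than this normalization.

```python
import re


def normalize_project_name(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' or '.' become a single '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def wheel_name_component(name: str) -> str:
    """Name part of a wheel filename: same normalization, with '_' as the separator."""
    return normalize_project_name(name).replace("-", "_")


# The distribution name used with `pip install` and the name seen in the
# built artifact refer to the same project.
print(normalize_project_name("filtersam_tools"))  # filtersam-tools
print(wheel_name_component("filtersam-tools"))    # filtersam_tools
```

Neither form affects the import name: `import filtersam` is determined by the package directory shipped inside the distribution, not by the distribution name.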