Pynteny is a Python tool that searches for synteny blocks in (prokaryotic) sequence data using profile HMMs of the ORFs of interest and HMMER. By leveraging genomic context, Pynteny reduces the uncertainty that paralogs introduce into the functional annotation of unlabelled sequence data. It can be used (i) from the command line or (ii) as a Python module.
Get more info in the documentation pages!
Check out the Pynteny paper in the Journal of Open Source Software!
Pynteny is a pure-Python package (it no longer requires conda or any external binaries). All dependencies, including the HMMER and Prodigal engines, are provided by the pip packages pyhmmer and pyrodigal.
Install with pip:
pip install pynteny
Or install the latest development version directly from GitHub:
pip install git+https://github.com/Robaina/Pynteny.git
Check that the installation worked fine:
pynteny --help
Pynteny is developed and tested on Linux, but since it is now a pure-Python
package, pip install pynteny also works on Windows and macOS (including Apple
Silicon / ARM64), provided wheels are available for pyhmmer and pyrodigal on
your platform.
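If it is unclear whether the wheels were installed for your platform, a quick check from Python can help. The following is a minimal sketch, not part of Pynteny: the helper `check_modules` is hypothetical, and the import names `pynteny`, `pyhmmer` and `pyrodigal` are assumed to match the pip package names.

```python
import importlib.util
import platform

# Import names assumed to match the pip package names (an assumption).
REQUIRED_MODULES = ("pynteny", "pyhmmer", "pyrodigal")


def check_modules(names=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Report the platform and which of the required modules were found.
    print(f"Platform: {platform.system()} {platform.machine()}")
    for name, found in check_modules().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A module reported as missing usually means pip could not find or build a wheel for it on the current platform.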
Consider the following toy example of a syntenic block:
Here, we are interested in four genes that colocate according to the pattern above: genes A, B and C occupy consecutive positions on the positive strand, followed by three (untargeted) genes and then by gene D, which is located on the negative strand.
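This pattern corresponds to the structure string used with pynteny search further down: >gene_A 0 >gene_B 0 >gene_C 3 <gene_D. To make the notation concrete, here is a small illustrative parser. It is a sketch and not Pynteny's own parser: `SyntenyElement` and `parse_synteny_structure` are hypothetical names, every gene is assumed to carry a strand prefix (`>` read as positive strand, `<` as negative), and the integers are read as the number of untargeted genes between neighbours, as the toy example suggests. See the documentation for the exact semantics.

```python
from typing import List, NamedTuple, Optional


class SyntenyElement(NamedTuple):
    gene: str
    strand: str                 # "+" for a ">" prefix, "-" for a "<" prefix
    gap_to_next: Optional[int]  # genes between this one and the next; None for the last


def parse_synteny_structure(structure: str) -> List[SyntenyElement]:
    """Split a structure string into (gene, strand, gap) elements."""
    tokens = structure.split()
    # Tokens must alternate gene, gap, gene, ..., and end with a gene.
    if len(tokens) % 2 == 0:
        raise ValueError("expected alternating gene and gap tokens, ending with a gene")
    elements = []
    for i in range(0, len(tokens), 2):
        token = tokens[i]
        if len(token) < 2 or token[0] not in "<>":
            raise ValueError(f"gene token must start with '>' or '<': {token!r}")
        strand = "+" if token[0] == ">" else "-"
        gap = int(tokens[i + 1]) if i + 1 < len(tokens) else None
        elements.append(SyntenyElement(token[1:], strand, gap))
    return elements


if __name__ == "__main__":
    for element in parse_synteny_structure(">gene_A 0 >gene_B 0 >gene_C 3 <gene_D"):
        print(element)
```

Under this reading, gene C is separated from gene D by three genes, and gene D is the only element on the negative strand.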
Pynteny can be run either as a command line tool or as a Python module. To run pynteny in the command line, execute:
pynteny <subcommand> <options>
There are a number of available subcommands, which can be explored in the documentation pages.
For instance, to first download the PGAP database, which contains a collection of profile HMMs as well as metadata:
pynteny download --outdir data/hmms --unpack
Next, to build a labelled peptide database from DNA assembly data:
pynteny build \
--data assembly.fa \
--outfile labelled_peptides.faa
Finally, to search the peptide database for the syntenic structure displayed above: >gene_A 0 >gene_B 0 >gene_C 3 <gene_D, and using the downloaded PGAP database:
pynteny search \
--synteny_struc ">gene_A 0 >gene_B 0 >gene_C 3 <gene_D" \
--data labelled_peptides.faa \
--outdir results/ \
--gene_ids
Here are some Jupyter Notebooks with examples to show how Pynteny works:
You can find more notebooks in the examples directory. Find more info in the documentation.
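The three command-line steps shown earlier (download, build, search) can also be chained from a script. The sketch below only wraps the CLI calls exactly as written in this README using `subprocess`; it does not use Pynteny's Python API. The helper names `pipeline_commands` and `run_pipeline` are hypothetical, and the file paths are the placeholder paths from the commands above.

```python
import subprocess
from typing import List

# Structure string taken from the search example in this README.
STRUCTURE = ">gene_A 0 >gene_B 0 >gene_C 3 <gene_D"


def pipeline_commands(
    assembly: str = "assembly.fa",
    peptides: str = "labelled_peptides.faa",
    hmm_dir: str = "data/hmms",
    outdir: str = "results/",
    structure: str = STRUCTURE,
) -> List[List[str]]:
    """Build the download, build and search commands as argument lists."""
    return [
        ["pynteny", "download", "--outdir", hmm_dir, "--unpack"],
        ["pynteny", "build", "--data", assembly, "--outfile", peptides],
        ["pynteny", "search", "--synteny_struc", structure,
         "--data", peptides, "--outdir", outdir, "--gene_ids"],
    ]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_pipeline(pipeline_commands())` requires the pynteny executable to be on the PATH. Passing argument lists rather than a single shell string avoids having to quote the `>` and `<` characters in the structure string.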
For longer, end-to-end worked analyses on real genomes, see the case studies:
- Finding the nitrogen-fixation (nif) operon, telling diazotrophs from non-fixers across three phyla, handling paralogues, and recovering an operon split by an 11-kb excision element.
- The SusC-SusD polysaccharide-utilization pair, why a syntenic pair beats a single-gene hit, disambiguating a promiscuous gene family by genomic context.
Pynteny would not work without these awesome projects:
Thanks!
Contributions are always welcome! If you don't know where to start, you may find an interesting issue to work on here. Please read our contribution guidelines first.
If you use this software, please cite it as below:
Semidán Robaina Estévez. (2023). Pynteny: synteny-aware hmm searches made easy (Version 1.0.0). Zenodo. https://zenodo.org/record/7696204


