Robaina/Pynteny

Synteny-aware hmm searches made easy

1. 💡 What is Pynteny?

Pynteny is a Python tool to search for synteny blocks in (prokaryotic) sequence data, using HMMER and profile HMMs of the ORFs of interest. By leveraging genomic context information, Pynteny can reduce the uncertainty in the functional annotation of unlabelled sequence data that is caused by paralogs. Pynteny can be used (i) from the command line or (ii) as a Python module.

Get more info in the documentation pages!

Check out the Pynteny paper in the Journal of Open Source Software!

2. 🔧 Setup

Pynteny is a pure-Python package (it no longer requires conda or any external binaries). pip installs all dependencies. The HMMER and Prodigal engines come from the pip packages pyhmmer and pyrodigal.

Install with pip:

pip install pynteny

Or install the latest development version directly from GitHub:

pip install git+https://github.com/Robaina/Pynteny.git

Check that the installation worked fine:

pynteny --help
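You may prefer to check the installation from Python, for example inside a notebook. The sketch below uses the standard library's importlib.metadata to report which of the three packages are installed and at which version. The helper name installed_versions is hypothetical. It only confirms that the packages are installed, not that they work.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("pynteny", "pyhmmer", "pyrodigal")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # Package metadata not found in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```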

2.1. Installing on Windows

Pynteny is developed and tested on Linux, but since it is now a pure-Python package, pip install pynteny also works on Windows and macOS (including Apple Silicon / ARM64), provided wheels are available for pyhmmer and pyrodigal on your platform.

3. 🚀 Usage

Consider the following toy example of a syntenic block:

[Figure: synteny example]

Here, we are interested in four genes that colocate according to the pattern above. Genes A-C occupy consecutive positions on the positive strand. Three (untargeted) genes follow them, and then gene D, which is located on the negative strand.
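The sketch below makes the pattern concrete. It models a single contig as an ordered list of (gene, strand) pairs and searches it for the block with backtracking. Two simplifications belong to this sketch, not to Pynteny's implementation. First, genes are matched by name rather than by HMM hits. Second, each number between targets is treated as the maximum number of intervening genes, so the arrangement in the figure (three genes between C and D) matches. The function find_block is hypothetical.

```python
# Simplified model: one contig read left to right, genes identified by name.
def find_block(contig, pattern, max_gaps):
    """Return contig indices of the first match of `pattern`, or None.

    contig   -- ordered list of (name, strand) pairs along one contig
    pattern  -- list of (name, strand) pairs for the targeted genes
    max_gaps -- max_gaps[k] = most genes allowed between pattern[k] and pattern[k+1]
                (an assumption of this sketch)
    """
    def match(pos, k):
        if contig[pos] != pattern[k]:
            return None
        if k == len(pattern) - 1:
            return [pos]
        # Try every allowed number of intervening (untargeted) genes.
        for gap in range(max_gaps[k] + 1):
            nxt = pos + gap + 1
            if nxt >= len(contig):
                break
            rest = match(nxt, k + 1)
            if rest is not None:
                return [pos] + rest
        return None

    for start in range(len(contig)):
        hit = match(start, 0)
        if hit is not None:
            return hit
    return None


# Toy block from the figure: A, B, C on "+", three other genes, then D on "-".
pattern = [("gene_A", "+"), ("gene_B", "+"), ("gene_C", "+"), ("gene_D", "-")]
max_gaps = [0, 0, 3]
contig = [
    ("gene_X", "+"),
    ("gene_A", "+"), ("gene_B", "+"), ("gene_C", "+"),
    ("orf_1", "-"), ("orf_2", "+"), ("orf_3", "+"),
    ("gene_D", "-"),
]
print(find_block(contig, pattern, max_gaps))  # [1, 2, 3, 7]
```

If gene D were on the positive strand, or separated from gene C by four genes, find_block would return None.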

Pynteny can be run either as a command line tool or as a Python module. To run Pynteny from the command line, execute:

pynteny <subcommand> <options>
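One way to script the CLI from Python, without relying on the module API, is to call it through subprocess. The sketch below is a helper of our own, not part of Pynteny, and pynteny_command and run_pynteny are hypothetical names. Keyword arguments are passed through verbatim as --flag value pairs, so only use flags the CLI actually accepts (see pynteny --help).

```python
import shutil
import subprocess


def pynteny_command(subcommand, **options):
    """Build an argv list such as ['pynteny', 'search', '--data', 'x.faa'].

    Keyword names map to flags verbatim (gene_ids -> --gene_ids).
    True adds a bare flag; False/None omit it; other values are str()-ed.
    """
    argv = ["pynteny", subcommand]
    for name, value in options.items():
        if value is None or value is False:
            continue
        argv.append(f"--{name}")
        if value is not True:
            argv.append(str(value))
    return argv


def run_pynteny(subcommand, **options):
    """Run the CLI; raises CalledProcessError on a non-zero exit code."""
    if shutil.which("pynteny") is None:
        raise FileNotFoundError("pynteny executable not found on PATH")
    return subprocess.run(pynteny_command(subcommand, **options), check=True)
```

Passing an argument list instead of a shell string matters here. Synteny structures contain > and <, which an unquoted shell command would treat as redirections. For example, run_pynteny("download", outdir="data/hmms", unpack=True) corresponds to the download command shown in this section.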

There are a number of available subcommands, which can be explored in the documentation pages.

For instance, to first download the PGAP database, which contains a collection of profile HMMs together with their metadata:

pynteny download --outdir data/hmms --unpack

Next, to build a labelled peptide database from DNA assembly data:

pynteny build \
    --data assembly.fa \
    --outfile labelled_peptides.faa
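A synteny search needs to know, for each peptide, its contig, its order along that contig and its strand. Presumably this is what the labels in the peptide database record. The sketch below shows one way to encode that information in FASTA headers, assuming genes were already predicted (for example with pyrodigal). The PredictedGene class, the labelled_fasta function and the header layout are all hypothetical. Pynteny's actual label format is defined by the tool and may differ.

```python
from dataclasses import dataclass


@dataclass
class PredictedGene:
    contig: str   # contig identifier from the assembly
    start: int    # 1-based start coordinate on the contig
    end: int      # 1-based end coordinate (inclusive)
    strand: str   # "+" or "-"
    peptide: str  # translated amino-acid sequence


def labelled_fasta(genes, width=60):
    """Return FASTA text whose headers encode contig, rank, coordinates, strand.

    Header layout (hypothetical, for illustration only):
        >{contig}__{rank}__{start}_{end}__{pos|neg}
    where rank is the gene's 0-based order along its contig.
    """
    by_contig = {}
    for gene in genes:
        by_contig.setdefault(gene.contig, []).append(gene)
    lines = []
    for contig, contig_genes in by_contig.items():
        # Order genes along the contig so the rank reflects genomic position.
        for rank, gene in enumerate(sorted(contig_genes, key=lambda g: g.start)):
            strand = "pos" if gene.strand == "+" else "neg"
            lines.append(f">{contig}__{rank}__{gene.start}_{gene.end}__{strand}")
            # Wrap the sequence at `width` residues per line.
            for i in range(0, len(gene.peptide), width):
                lines.append(gene.peptide[i:i + width])
    return "\n".join(lines) + "\n" if lines else ""
```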

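Finally, to search the peptide database for the syntenic structure displayed above, written as >gene_A 0 >gene_B 0 >gene_C 3 <gene_D, using the downloaded PGAP database (the --gene_ids flag indicates that the structure refers to gene symbols rather than HMM names):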

pynteny search \
    --synteny_struc ">gene_A 0 >gene_B 0 >gene_C 3 <gene_D" \
    --data labelled_peptides.faa \
    --outdir results/ \
    --gene_ids
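The structure string alternates strand-prefixed gene identifiers with integers. A > prefix marks the positive strand and a < prefix marks the negative strand. Each integer describes how many genes separate consecutive targets. The sketch below is a minimal parser that assumes every gene carries a strand prefix. parse_synteny_structure is a hypothetical helper, and Pynteny's own parser may accept additional syntax.

```python
import re
from typing import List, NamedTuple


class SyntenyStructure(NamedTuple):
    genes: List[str]      # gene identifiers, in order
    strands: List[str]    # "+" for a ">" prefix, "-" for a "<" prefix
    distances: List[int]  # integers found between consecutive genes


GENE_TOKEN = re.compile(r"([<>])(\S+)")
NUMBER_TOKEN = re.compile(r"[0-9]+")


def parse_synteny_structure(text: str) -> SyntenyStructure:
    """Parse strings such as '>gene_A 0 >gene_B 0 >gene_C 3 <gene_D'."""
    tokens = text.split()
    # A valid structure alternates gene, number, gene, ..., gene (odd length).
    if len(tokens) % 2 == 0:
        raise ValueError("expected alternating gene and number tokens")
    genes, strands, distances = [], [], []
    for i, token in enumerate(tokens):
        if i % 2 == 0:
            m = GENE_TOKEN.fullmatch(token)
            if m is None:
                raise ValueError(f"bad gene token: {token!r}")
            strands.append("+" if m.group(1) == ">" else "-")
            genes.append(m.group(2))
        else:
            if NUMBER_TOKEN.fullmatch(token) is None:
                raise ValueError(f"bad distance token: {token!r}")
            distances.append(int(token))
    return SyntenyStructure(genes, strands, distances)


print(parse_synteny_structure(">gene_A 0 >gene_B 0 >gene_C 3 <gene_D"))
```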

4. 📔 Examples

Here are some Jupyter Notebooks with examples to show how Pynteny works:

You can find more notebooks in the examples directory. Find more info in the documentation.

For longer, end-to-end worked analyses on real genomes, see the case studies:

5. 🔄 Dependencies

Pynteny would not work without these awesome projects:

Thanks!

6. Contributing

Contributions are always welcome! If you don't know where to start, you may find an interesting issue to work on here. Please read our contribution guidelines first.

7. ✒️ Citation

If you use this software, please cite it as below:

Semidán Robaina Estévez. (2023). Pynteny: synteny-aware hmm searches made easy (Version 1.0.0). Zenodo. https://zenodo.org/record/7696204