Pipolins constitute a new group of mobile genetic elements (MGEs) that encode a primer-independent B-family DNA polymerase (piPolB). Previously, pipolins had been identified as integrative MGEs in diverse bacteria and as circular plasmids in mitochondria and a few Gram-positive species. Detailed analysis of E. coli pipolins revealed that they are present in diverse phylogroups, encode a diverse repertoire of DNA metabolism genes, and show evidence of recent horizontal transfer between different E. coli strains.
In this repository, we have made available the code developed during our latest work on pipolins. We screened the NCBI Assembly database for pipolins. Analysis of pipolin structure revealed that they are commonly integrative elements, usually flanked by direct repeats, that share known integration hotspots of mobile elements (e.g. tRNA genes). Remarkably, integrase dynamics correlates with alternative integration spots and enables diverse lifestyles, ranging from integrative to mobilizable plasmid pipolins. Pipolins harbor a minimal core and a large cargo module enriched for defense factors, which are actively exchanged with other mobile elements. These findings indicate that pipolins act as orthogonal reservoirs of defense genes and play a key role in the mechanisms by which defense genes are exchanged in bacterial populations.
A preprint of this work is available on bioRxiv:
Each folder in this repository contains the code used in each project sub-task:
- Screening: Download genomes from Assembly using Datasets (NCBI) and detect pipolins with ExplorePipolin.
- Pipolin metadata: Parse genome metadata and ExplorePipolin results to calculate and plot statistics.
- piPolB phylog: Infer and plot piPolB phylogeny.
- Pipolin CDS clustering: Cluster proteins encoded in pipolins and reannotate them.
- Candidate recombinases: Create cluster presence-absence matrix and find candidate pipolin recombinases.
- MGE extTools annot: Parse information from the different annotation tools used on pipolins and other MGEs (plasmids, phages, ciMGEs). Plot annotation results.
- wGRR and RG calculation: Calculate wGRR (weighted gene repertoire relatedness) among pipolins and other annotated MGEs to detect recent events of gene exchange (aka recombining genes or RGs).
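
As an illustration of the wGRR step, here is a minimal Python sketch. It assumes the commonly used definition of wGRR (the sum of sequence identities over bidirectional best hits between two elements, divided by the gene count of the smaller element); the scripts in this repository may differ in details such as alignment tool, identity thresholds or coverage filters. The function names, the `(query, subject, identity)` hit format and the toy data are hypothetical and only serve the example.

```python
def best_hits(hits):
    """Map each query gene to its highest-identity subject gene.

    `hits` is an iterable of (query, subject, identity) tuples,
    with identity expressed as a fraction between 0 and 1.
    """
    best = {}
    for query, subject, identity in hits:
        if query not in best or identity > best[query][1]:
            best[query] = (subject, identity)
    return best


def wgrr(hits_ab, hits_ba, n_genes_a, n_genes_b):
    """Weighted gene repertoire relatedness between elements A and B.

    Assumed definition: sum of identities over bidirectional best hits,
    divided by the number of genes in the smaller element.
    """
    smaller = min(n_genes_a, n_genes_b)
    if smaller == 0:
        return 0.0
    best_ab = best_hits(hits_ab)
    best_ba = best_hits(hits_ba)
    total = 0.0
    for gene_a, (gene_b, identity) in best_ab.items():
        back = best_ba.get(gene_b)
        # Keep the pair only if each gene is the other's best hit.
        if back is not None and back[0] == gene_a:
            total += identity
    return total / smaller


# Toy example (made-up data): element A has 2 genes, element B has 3.
hits_ab = [("a1", "b1", 0.9), ("a2", "b2", 0.5), ("a2", "b3", 0.4)]
hits_ba = [("b1", "a1", 0.9), ("b2", "a2", 0.5), ("b3", "a2", 0.4)]
score = wgrr(hits_ab, hits_ba, n_genes_a=2, n_genes_b=3)
print(round(score, 3))
```

Under this definition, a wGRR of 1 means every gene of the smaller element has an identical bidirectional best hit in the other element, while 0 means no homologs were found. Pairs of elements with a low overall wGRR that still share a few nearly identical genes are the kind of signal that can point to recent gene exchange.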
Disclaimer: Some scripts may require additional software/packages, files or databases that are not included in this repository. Because some input and output files exceed GitHub size limits, several of the scripts uploaded here may be difficult to run as provided. Do not hesitate to ask the authors for any missing file that might be of interest to you, or for help if you encounter difficulties when using the code in this repository.