Home
Phylogenetic trees are tree structures that depict relationships between organisms. Popular analysis techniques often produce large collections of hypothetical trees, which can be expensive to store and difficult to exchange. TreeZip compresses phylogenetic trees based on their shared evolutionary relationships. In our experiments, TreeZip typically compressed a tree file to less than 2% of its original size. When coupled with standard compression methods such as 7zip, TreeZip can compress a file to less than 1% of its original size.
With TreeZip 3.0, we've added support for heterogeneous collections of trees. This enables scientists to rapidly identify common relationships between large, disparate analyses. Our experimental results suggest that, when the level of heterogeneity is moderate, TreeZip averages 89.03% space savings on unweighted datasets and 72.69% on weighted datasets. Since the TreeZip compressed (TRZ) file is text, it can be further compressed with general-purpose compression methods such as 7zip if additional space savings are necessary.
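To make the figures above concrete, here is a minimal Python sketch of how a space-savings percentage can be computed and how a text file can be compressed further with a general-purpose method. It is an illustration under assumptions, not part of TreeZip: the function names `space_savings` and `lzma_compress_text` are hypothetical, the sample text is a placeholder rather than real TRZ content, and Python's `lzma` module (which writes the `.xz` container, using the same LZMA family of algorithms that 7zip uses by default) stands in for 7zip itself.

```python
import lzma


def space_savings(original_size: int, compressed_size: int) -> float:
    """Return the percentage of space saved: 100 * (1 - compressed / original)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 100.0 * (1.0 - compressed_size / original_size)


def lzma_compress_text(text: str) -> bytes:
    """Compress text with LZMA, standing in for a general-purpose tool such as 7zip."""
    return lzma.compress(text.encode("utf-8"))


# Placeholder text only; this is NOT the real TRZ format.
sample_text = "placeholder line of repetitive text\n" * 1000
packed = lzma_compress_text(sample_text)

savings = space_savings(len(sample_text.encode("utf-8")), len(packed))
print(f"{savings:.2f}% space savings on the placeholder text")
```

Under this definition, 89.03% space savings means the compressed file is 10.97% of the original size.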
TreeZip is funded by NSF grants DEB-0629849, IIS-0713168, and IIS-1018785. Funding was also provided by the Texas A&M University Dissertation Fellowship Program.
- Matthews SJ. Heterogeneous Compression of Large Collections of Evolutionary Trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics: Special Issues on Software and Databases, to appear. November 2014.
- Matthews SJ, Williams TL. An efficient and extensible approach for compressing phylogenetic trees. BMC Bioinformatics 12(Suppl 10):S16, 2011.
- Matthews SJ, Sul S, Williams TL. A Novel Approach for Compressing Phylogenetic Trees. In Bioinformatics Research and Applications, pp. 113–124, 2010.