The Data

The Zoonomia Project is comparing genomes of diverse mammals to understand the basis of remarkable phenotypes and the origins of disease. In addition to the 131 new assemblies produced, we have generated three types of comparative data.

*Please note that these files are quite large and will occupy a large amount of computer memory.

Assemblies

To access genome assemblies used in our Zoonomia alignment, please visit the “mammalian tree” and click on your species of interest.

Alignment

Our 240-species Cactus genome alignment (file format: .hal) was made without reference to any single genome. As a result, it includes both regions shared across eutherian mammals and regions unique to specific lineages. To reference the Cactus alignment, please cite Zoonomia’s original white paper.

TOGA annotation of orthologous genes

Annotation of genes across the 240 Zoonomia species, plus many more, inferred using TOGA (Tool to infer Orthologs from Genome Alignments). Files are available in gtf and bed12 formats, with the option to use human, mouse, chicken, or several other species as the reference. Multiple codon alignments are also available. To reference the TOGA annotations, please cite Kirilenko et al.
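
For readers new to the bed12 format, the following is a minimal Python sketch of how one annotation line can be turned into exon coordinates. It relies only on the standard 12-column BED layout (block sizes in column 11, block starts relative to the feature start in column 12). The record type, the function name, and the example line (including the transcript name GENE1.tx1) are hypothetical and are not taken from the TOGA files.

```python
from typing import List, NamedTuple, Tuple


class Bed12Record(NamedTuple):
    """One annotated transcript from a 12-column BED line (hypothetical helper type)."""
    chrom: str
    start: int                      # 0-based, inclusive
    end: int                        # 0-based, exclusive
    name: str
    strand: str
    exons: List[Tuple[int, int]]    # absolute (start, end) of each block


def parse_bed12_line(line: str) -> Bed12Record:
    """Parse a single tab-separated bed12 line into a Bed12Record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected 12 columns, found {len(fields)}")

    start, end = int(fields[1]), int(fields[2])
    block_count = int(fields[9])
    # Columns 11 and 12 are comma-separated lists, often with a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != block_count or len(offsets) != block_count:
        raise ValueError("blockCount does not match blockSizes/blockStarts")

    # Block starts are relative to the feature start, so shift them to absolute positions.
    exons = [(start + off, start + off + size) for size, off in zip(sizes, offsets)]
    return Bed12Record(fields[0], start, end, fields[3], fields[5], exons)


# Made-up example line, for illustration only.
example_line = "chr1\t1000\t5000\tGENE1.tx1\t0\t+\t1200\t4800\t0\t3\t100,200,300,\t0,1500,3700,\n"
example_record = parse_bed12_line(example_line)
```

Coordinates in bed12 are 0-based and half-open, so each exon length is simply end minus start.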

Conservation Scores

Conservation scores calculated from the Zoonomia alignment identify sites and regions under purifying selection. With 240 species, we find 3.1% of sites in the human genome to be under purifying selection using PhyloP, with a false discovery rate threshold of 5%. To reference conservation scores, please cite Zoonomia’s flagship paper on mammalian evolution.
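
To illustrate how a 5% false discovery rate threshold can be applied to per-site scores, here is a minimal Python sketch using the Benjamini-Hochberg procedure. This page does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption chosen for illustration. The sketch also assumes that a positive PhyloP score equals -log10 of the p-value for conservation, and that negative scores (acceleration) are treated as p = 1 in a one-sided test. The function names and example scores are hypothetical.

```python
from typing import List, Sequence


def phylop_to_p_value(score: float) -> float:
    """Convert a PhyloP score to a one-sided conservation p-value.

    Assumption: score = -log10(p) for conserved sites; scores at or below
    zero are clipped to p = 1.
    """
    return min(1.0, 10.0 ** (-score))


def benjamini_hochberg(p_values: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether it is significant at the given FDR."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank

    # Every p-value ranked at or below max_rank is called significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Made-up scores for five sites, for illustration only.
example_scores = [5.0, 3.0, 0.5, -1.0, 2.0]
example_p_values = [phylop_to_p_value(s) for s in example_scores]
example_calls = benjamini_hochberg(example_p_values, fdr=0.05)
```

On real data the score threshold that corresponds to a 5% FDR depends on the full genome-wide distribution of scores, so a cutoff derived from a small subset of sites will not match the genome-wide one.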

Phylogeny files

A Newick-format tree with PHAST-estimated branch lengths from the 242-way tree.
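
As a rough guide to reading a Newick file, the sketch below parses a Newick string with the Python standard library and collects the terminal branch length of each leaf. It is a simplified parser that assumes unquoted labels and no bracketed comments; whether the Zoonomia tree file meets those assumptions has not been checked here. The class and function names are hypothetical, and the example tree and its branch lengths are made up.

```python
from typing import Dict, List, Optional


class Node:
    """A tree node with an optional name, branch length, and child nodes."""

    def __init__(self, name: str, length: Optional[float], children: List["Node"]):
        self.name = name
        self.length = length
        self.children = children


def parse_newick(text: str) -> Node:
    """Parse a simple Newick string (no quoted labels, no comments)."""
    text = text.strip()
    if text.endswith(";"):
        text = text[:-1]
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        children: List[Node] = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                elif text[pos] == ")":
                    pos += 1  # end of this group of children
                    break
                else:
                    raise ValueError(f"unexpected character at position {pos}")
        # Optional label, then optional ":length".
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos].strip()
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return Node(name, length, children)

    root = parse_node()
    if pos != len(text):
        raise ValueError("trailing characters after tree")
    return root


def leaf_branch_lengths(node: Node) -> Dict[str, Optional[float]]:
    """Map each leaf name to the length of the branch leading to it."""
    if not node.children:
        return {node.name: node.length}
    result: Dict[str, Optional[float]] = {}
    for child in node.children:
        result.update(leaf_branch_lengths(child))
    return result


# Made-up three-species tree, for illustration only.
example_tree = parse_newick("((Homo_sapiens:0.1,Pan_troglodytes:0.2):0.05,Mus_musculus:0.3);")
example_leaves = leaf_branch_lengths(example_tree)
```

For production use, an established phylogenetics library is likely a better choice than a hand-written parser, since it will handle quoted labels, comments, and very deep trees.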