Use-case: taxonomic tree

In this example, we will use the output of the lineage function to build a tree with the Phylo.jl package, then plot it. This example also showcases the support for the AbstractTrees.jl interface.

using Plots
using Phylo
using NCBITaxonomy
using AbstractTrees

We will focus on the infraorder Lemuriformes:

tree_root = ncbi"Lemuriformes"
Lemuriformes (ncbi:376915)

We will first create a tree by adding the species as tips. Some of the taxa are sub-species, but that's OK (we will visualize the tree later anyway). Because we support the AbstractTrees interface, we can use the Leaves iterator here:

tree_leaves = collect(Leaves(tree_root))
151-element Vector{NCBITaxon}:
 Lemur catta (ncbi:9447)
 Lemur sp. (ncbi:69491)
 Varecia variegata variegata (ncbi:87289)
 Varecia rubra (ncbi:554167)
 Eulemur coronatus (ncbi:13514)
 Eulemur fulvus mayottensis (ncbi:27680)
 Eulemur fulvus fulvus (ncbi:40322)
 Eulemur fulvus collaris (ncbi:47178)
 Eulemur fulvus albocollaris (ncbi:122224)
 Eulemur macaco macaco (ncbi:30603)
 ⋮
 Phaner furcifer (ncbi:261734)
 Phaner pallescens (ncbi:568393)
 Phaner electromontis (ncbi:1313324)
 Phaner parienti (ncbi:1313325)
 Palaeopropithecus ingens (ncbi:1513477)
 Palaeopropithecus maximus (ncbi:1597978)
 Palaeopropithecus sp. UA4466 (ncbi:322025)
 Palaeopropithecus sp. UA6184 (ncbi:322026)
 Palaeopropithecus sp. KPK-2005 (ncbi:322027)

We can double-check that these taxa all have the correct common ancestor:

commonancestor(tree_leaves)
Lemuriformes (ncbi:376915)
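
As a rough illustration of what a common ancestor means here, the sketch below finds the deepest name shared by a few root-to-tip lineages. It is not how commonancestor is implemented: the shared_ancestor function is hypothetical, and the lineages are simplified, hand-written examples rather than database output.

```julia
# Hypothetical, simplified root-to-tip lineages (not actual database output).
lineages = [
    ["Primates", "Lemuriformes", "Lemuridae", "Lemur"],
    ["Primates", "Lemuriformes", "Lemuridae", "Varecia"],
    ["Primates", "Lemuriformes", "Indriidae", "Propithecus"],
]

# Return the deepest name shared by all lineages when read from the root,
# or `nothing` if even the first entries differ.
function shared_ancestor(lineages::Vector{Vector{String}})
    shortest = minimum(length, lineages)
    ancestor = nothing
    for depth in 1:shortest
        candidate = first(lineages)[depth]
        # Stop at the first depth where the lineages disagree
        all(l -> l[depth] == candidate, lineages) || break
        ancestor = candidate
    end
    return ancestor
end

ancestor = shared_ancestor(lineages)
```

With these three toy lineages, the names agree down to Lemuriformes and diverge at the family level.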

At this point, we can start creating our tree object. Before we do this, we will add a few methods to the Phylo.jl functions, so that they accept NCBITaxon objects and use the name of the taxon as the name of the node:

Phylo.RootedTree(taxa::Vector{NCBITaxon}) = RootedTree([t.name for t in taxa])
Phylo._hasnode(tr::RootedTree, tax::NCBITaxon) = Phylo._hasnode(tr, tax.name)
Phylo._getnode(tr::RootedTree, tax::NCBITaxon) = Phylo._getnode(tr, tax.name)
Phylo._createnode!(tr::RootedTree, tax::NCBITaxon) = Phylo._createnode!(tr, tax.name)
tree = RootedTree(tree_leaves)
Phylo.RootedTree with 151 tips, 151 nodes and 0 branches.
Leaf names are Lemur catta, Lemur sp., Varecia variegata variegata, Varecia rubra, Eulemur coronatus, ... [145 omitted] ... and Palaeopropithecus sp. KPK-2005

The next step is to go through the taxonomy below the root, and add the required nodes and the branches between them. We set the length of every branch (the distance between a taxon and its parent) to 1.0, which might not be the best choice, but this is for illustration only. Note that we use the PostOrderDFS iterator, which guarantees that children are visited before their parents: by the time we reach a node, all of its children are already in the tree, so we can use the children function to connect them.

for node in AbstractTrees.PostOrderDFS(tree_root)
    # Tips are already in the tree, so only internal nodes are created here
    hasnode(tree, node) || createnode!(tree, node)
    # Children were visited before this node, so they already exist in the tree
    for sub_node in AbstractTrees.children(node)
        createbranch!(tree, node, sub_node, 1.0)
    end
end
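
To see why the visiting order matters, here is a minimal sketch of a post-order traversal on a toy hierarchy, written in plain Julia without any package. The toy_children dictionary and the postorder! function are hypothetical stand-ins for the taxonomy and for PostOrderDFS. The sketch only mimics the logic of the loop above.

```julia
# Hypothetical toy hierarchy: each name maps to the names of its children.
toy_children = Dict(
    "Lemuriformes" => ["Lemuridae", "Indriidae"],
    "Lemuridae" => ["Lemur", "Varecia"],
    "Indriidae" => ["Propithecus"],
)

# Append `node` to `order` only after all of its descendants (post-order).
function postorder!(order::Vector{String}, node::String,
                    children::Dict{String,Vector{String}})
    for child in get(children, node, String[])
        postorder!(order, child, children)
    end
    push!(order, node)
    return order
end

visit_order = postorder!(String[], "Lemuriformes", toy_children)

# Mimic the loop above: a branch is recorded when the parent is reached,
# at which point both of its ends have already been visited.
branches = Tuple{String,String}[]
seen = Set{String}()
for node in visit_order
    push!(seen, node)
    for child in get(toy_children, node, String[])
        @assert child in seen   # holds because children are visited first
        push!(branches, (node, child))
    end
end
```

The root is visited last, so every branch can be created without ever referring to a node that has not been added yet.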

We can finally plot the tree:

sort!(tree, rev=true)
Plots.plot(tree, treetype=:fan)