Phylogenetic Tree (Dendrogram) with R Programming

R is widely used for constructing and analyzing phylogenetic trees. Most R packages for phylogenetics focus on specific statistical analyses rather than on viewing and annotating trees in a more general form. Phylogenetics helps researchers understand how present-day species evolved and explains the similarities and differences among them. It also helps to analyze relationships among disease-causing organisms and viruses, which can support the development of vaccines against them.

What is Reverse Complement?

DNA is a double-stranded, right-handed helix whose two strands run in opposite directions. One strand runs 5’ to 3’ from top to bottom, whereas the other runs 3’ to 5’ from top to bottom. The two strands are held together by hydrogen bonds between complementary bases.


A DNA strand is always read in the 5’ to 3’ direction, and its complementary strand runs 3’ to 5’. For example:

5’ ATTCTGCAT 3’ (original strand)

3’ TAAGACGTA 5’ (complementary strand)

This complementary strand is biologically correct, but sequences are conventionally written in the 5’ to 3’ direction, and sequences handled in R should follow that convention. The complement is therefore reversed to give the reverse complement, as the following example shows:

Original = 5’ ATTCTGCAT 3’

Complement = 3’ TAAGACGTA 5’

Reverse complement = 5’ ATGCAGAAT 3’
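
The two steps above (complement each base, then reverse the order) can be sketched in base R without any extra packages. This is only an illustration: the function name reverse_complement is made up for this example, and it assumes the input contains only the letters A, C, G, and T.

```r
# Reverse complement of a DNA string using base R only.
# reverse_complement() is a hypothetical helper, not part of any package.
reverse_complement <- function(dna) {
  # step 1: complement each base (A<->T, C<->G)
  complement <- chartr("ACGT", "TGCA", toupper(dna))
  # step 2: reverse the order of the characters
  paste(rev(strsplit(complement, "")[[1]]), collapse = "")
}

original <- "ATTCTGCAT"
complement <- chartr("ACGT", "TGCA", original)

print(complement)                    # TAAGACGTA, read 3' to 5'
print(reverse_complement(original))  # ATGCAGAAT, read 5' to 3'
```

Applying the function twice returns the original sequence, which is a quick way to check the result.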

Code for the reverse complement

# load libraries (Biostrings for sequences, DECIPHER for BrowseSeqs)
library(Biostrings)
library(DECIPHER)

# create a sequence directly from a string
dna <- DNAString("ATCGTGCAATTGCCCGATACGT")

# or read sequences from a FASTA file (this replaces the object above)
dna <- readDNAStringSet("seq.fa")
print(dna)

# compute the length of each sequence
len <- width(dna)
print(len)

# create the reverse complement
rev_comp <- reverseComplement(dna)
print(rev_comp)

# view the sequences in a browser
BrowseSeqs(dna, highlight = 0)

# write the output to a file
writeXStringSet(rev_comp, filepath = "./output-Rev-comp.fa")

# compute the frequency of each nucleotide in the sequences
freq_c <- letterFrequency(dna, "C", as.prob = TRUE)
print(freq_c)
freq_g <- letterFrequency(dna, "G", as.prob = TRUE)
print(freq_g)
freq_a <- letterFrequency(dna, "A", as.prob = TRUE)
print(freq_a)
freq_t <- letterFrequency(dna, "T", as.prob = TRUE)
print(freq_t)
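
As a cross-check that does not need Bioconductor, the same nucleotide proportions can be computed in base R. This is a minimal sketch: the helper name nucleotide_freq is hypothetical, and it works on a plain character string rather than on a Biostrings object.

```r
# Proportion of each nucleotide in a DNA string, base R only.
# nucleotide_freq() is a hypothetical helper written for this example.
nucleotide_freq <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  # count A, C, G and T; any other character is ignored in the counts
  counts <- table(factor(bases, levels = c("A", "C", "G", "T")))
  # divide by the total number of characters in the string
  setNames(as.numeric(counts) / length(bases), names(counts))
}

freq <- nucleotide_freq("ATCGTGCAATTGCCCGATACGT")
print(freq)

# GC content is the combined proportion of G and C
gc_content <- freq[["G"]] + freq[["C"]]
print(gc_content)
```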

Phylogenetic analysis with Dendrogram in R Programming

Phylogenetic analysis provides information about the relationships and the level of genetic diversity within and among species. In recent times, a huge amount of molecular genetic data has been produced with the help of different computational programs and in silico approaches.

  • R is an open-source environment, so its libraries can be openly modified.
  • R is one of the popular platforms for bioinformatics analysis because of the powerful analytical tools available in open-source packages such as ape (Paradis, Claude, and Strimmer), phangorn, and phytools.
  • In R, a comprehensive array of tools has been developed for the manipulation, analysis, and visualization of trees in matrix format.

What is a Dendrogram?

A dendrogram is a tree diagram that shows the hierarchical relationships among species or other observations. In phylogenetics, a dendrogram that shows the evolutionary relationships among a group of organisms derived from a common ancestor is called a phylogenetic tree. Dendrograms are also used to display the result of hierarchical clustering of different observations or species.

In R, a dendrogram object is generated using the function as.dendrogram.
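
A minimal sketch of this using only base R (the stats package) is shown here. The species names and trait values are invented purely for illustration; the calls dist, hclust, and as.dendrogram are standard base R functions.

```r
# Four hypothetical species scored on two made-up traits.
traits <- matrix(c(1.0, 1.2, 5.0, 5.3,
                   2.0, 2.1, 6.0, 6.4),
                 ncol = 2,
                 dimnames = list(c("sp1", "sp2", "sp3", "sp4"),
                                 c("trait1", "trait2")))

# pairwise Euclidean distances between the species
d <- dist(traits)

# hierarchical clustering, then conversion to a dendrogram object
hc <- hclust(d, method = "average")
dend <- as.dendrogram(hc)

print(dend)
plot(dend)
```

Here sp1 and sp2 have similar trait values, as do sp3 and sp4, so each pair is expected to cluster together before the two pairs are joined.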

The phylogram package:

In R, the phylogram package is used for constructing and analyzing these relationships as dendrogram objects. The package contains functions for importing and exporting dendrogram objects, as well as several other functions for manipulating trees.

Importing and exporting trees

The Newick format is a widely used, computer-readable standard for representing phylogenetic trees, and it works with most tree-editing software. The phylogram function read.dendrogram uses the Newick parser from the ape package and converts the resulting “phylo” object to a dendrogram. This function supports weighted edges and both rooted and unrooted trees.

Example of importing and exporting a tree from a string:

Let us consider a simple tree with three members named “A”, “B”, and “C”, in which B and C are more closely related to each other than either is to A. The unweighted Newick string for this tree is (A,(B,C)); and the function used to read it into a dendrogram is read.dendrogram.

Code:

library(phylogram)
x <- read.dendrogram(text = "(A,(B,C));")
plot(x, yaxt = "n")

# write the object back to the console in Newick format without edge weights:
write.dendrogram(x, edges = FALSE)
#> [1] "(A,(B,C));"
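
Since read.dendrogram also supports weighted edges, the same approach can be sketched with branch lengths included. The Newick string here is made up for illustration, with the number after each colon giving an edge weight, and the example assumes the phylogram package is installed.

```r
library(phylogram)

# Newick string with edge weights (the number after each colon)
newick <- "(A:2,(B:1,C:1):1);"
x <- read.dendrogram(text = newick)

# inspect the leaf labels and the height of the root node
labels(x)
attr(x, "height")

# plot with the height axis shown, since the edges are weighted
plot(x)

# write the tree back out, this time keeping the edge weights
write.dendrogram(x, edges = TRUE)
```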

Converting tree objects

Conversion between dendrogram and “phylo” objects is handled by the as.phylo.dendrogram and as.dendrogram.phylo methods. Unlike as.dendrogram(as.hclust(phy)), these methods retain all weighted edges and do not require trees to be ultrametric. Note that other packages may define methods with the same names, so which one is called can depend on the order in which the packages are loaded. For this reason, it is safer to use the fully qualified function names, for example phylogram::as.phylo.dendrogram(x) and phylogram::as.dendrogram.phylo(x).
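
A round trip between the two classes might look like the following sketch. It assumes that both ape and phylogram are installed, and the small non-ultrametric tree is invented for the example.

```r
library(ape)
library(phylogram)

# a small non-ultrametric tree: the tips are not all at the same
# distance from the root (A = 3, B = 2, C = 3)
phy <- read.tree(text = "(A:3,(B:1,C:2):1);")

# phylo -> dendrogram, using the fully qualified method name
dnd <- phylogram::as.dendrogram.phylo(phy)

# dendrogram -> phylo
phy2 <- phylogram::as.phylo.dendrogram(dnd)

class(dnd)
class(phy2)
sort(phy2$tip.label)
```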

Example of converting a “phylo” object to a dendrogram

A common application of converting between “phylo” and dendrogram objects is plotting tanglegrams, which visualize incongruence between two phylogenetic trees. The function tanglegram from the dendextend package plots two trees side by side and indicates the discordant nodes, but it does not support non-ultrametric “phylo” objects.

Code:

library(ape)
library(phylogram)  # provides as.dendrogram for "phylo" objects and ladder()
data(woodmouse)

## generate distance matrices for each section of the alignment
dist1 <- dist.dna(woodmouse[, 1:482])
dist2 <- dist.dna(woodmouse[, 483:965])

## build neighbor-joining trees
phy1 <- nj(dist1)
phy2 <- nj(dist2)

## root with No0912S as outgroup
phy1 <- root(phy1, "No0912S")
phy2 <- root(phy2, "No0912S")

## convert phylo objects to dendrograms
dnd1 <- as.dendrogram(phy1)
dnd2 <- as.dendrogram(phy2)

## rearrange in ladderized fashion
dnd1 <- ladder(dnd1)
dnd2 <- ladder(dnd2)

## plot the tanglegram
dndlist <- dendextend::dendlist(dnd1, dnd2)
dendextend::tanglegram(dndlist, fast = TRUE, margin_inner = 5)

Tree editing or manipulation:

The phylogram package provides several other functions for manipulating trees. The function prune removes leaf nodes and internal branching nodes. Similarly, the function ladder rearranges the tree for better visualization, sorting nodes by the number of members. Another function that helps with visualization is as.cladogram, which resets the height of all terminal leaf nodes to zero and resets the heights of the inner nodes in single incremental units. The function reposition adjusts the heights of all nodes in a tree by a constant, and the function remidpoint corrects the midpoint, members, and leaf attributes of all nodes, for example when converting a nested list to a dendrogram object.

Code for building and manipulating dendrograms
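
As a short illustration of prune and ladder, here is a sketch on a made-up four-leaf tree. It assumes the phylogram package is installed and that the pattern argument of prune is matched against the leaf labels as a regular expression.

```r
library(phylogram)

# a small unweighted tree, invented for this example
x <- read.dendrogram(text = "(A,(B,(C,D)));")

# remove the leaf labelled "D"
pruned <- prune(x, pattern = "^D$")
labels(pruned)

# rearrange the tree, sorting nodes by number of members
laddered <- ladder(x)
plot(laddered, yaxt = "n")
```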

x <- list(1, list(2, 3))

## attach "leaf" and "label" attributes to leaf nodes
attr(x[[1]], "leaf") <- TRUE
attr(x[[2]][[1]], "leaf") <- attr(x[[2]][[2]], "leaf") <- TRUE
attr(x[[1]], "label") <- "A"
attr(x[[2]][[1]], "label") <- "B"
attr(x[[2]][[2]], "label") <- "C"

## set "height" attributes for all nodes
attr(x, "height") <- 2
attr(x[[1]], "height") <- 0
attr(x[[2]], "height") <- 1
attr(x[[2]][[1]], "height") <- attr(x[[2]][[2]], "height") <- 0

## set "midpoints" attributes for all nodes
attr(x, "midpoint") <- 0.75
attr(x[[1]], "midpoint") <- 0
attr(x[[2]], "midpoint") <- 0.5
attr(x[[2]][[1]], "midpoint") <- attr(x[[2]][[2]], "midpoint") <- 0

## set "members" attributes for all nodes
attr(x, "members") <- 3
attr(x[[1]], "members") <- 1
attr(x[[2]], "members") <- 2
attr(x[[2]][[1]], "members") <- attr(x[[2]][[2]], "members") <- 1

## set class as "dendrogram" 
## Note that setting the class for the root node
## automatically sets the class of all nested subnodes
class(x) <- "dendrogram"
x
#> 'dendrogram' with 2 branches and 3 members total, at height 2

This simple tree can be recreated more briefly using functions from the phylogram package.

Example:

x <- list(1, list(2, 3))

## recursively set class, midpoint, members and leaf attributes
x <- remidpoint(x)

## set incremental height attributes
x <- as.cladogram(x)

## set label attributes using dendrapply
set_label <- function(node){
  if(is.leaf(node)) attr(node, "label") <- LETTERS[node]
  return(node)
}
x <- dendrapply(x, set_label)
x

#> 'dendrogram' with 2 branches and 3 members total, at height 2

The following code rearranges the tree so that A and B are sister species and C is the ancestral lineage.

## isolate root node (species C)
ancestor <- prune(x, pattern = "C", keep = TRUE) 

## alternative option using subset operator
ancestor <- x[[2]][[2]]

## create subtree without species C
subtree <- prune(x, pattern = "C")

## graft subtree onto root
x <- list(ancestor, subtree)

## set attributes as above
x <- as.cladogram(remidpoint(x))

## plot dendrogram
plot(x, yaxt = "n")

Conclusion

Phylogenetics is a valuable tool for understanding the spread of diseases, including the transmission of the human immunodeficiency virus (HIV) and the origin of the severe acute respiratory syndrome (SARS)-associated coronavirus. Phylogenetic analyses are now typically based on information extracted from DNA, RNA, or protein sequences. Researchers use different methods of sequence analysis to generate phylogenetic trees, and the limitations of alignment-based methods have led to the development of alignment-free sequence analysis.

References

  1. Yu, G., Smith, D.K., Zhu, H., Guan, Y. and Lam, T.T.-Y. (2017), ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol Evol, 8: 28-36.
  2. Revell, L.J. (2012), phytools: an R package for phylogenetic comparative biology (and other things). Methods in Ecology and Evolution, 3: 217-223.
  3. Toparslan E, Karabag K, Bilge U. A workflow with R: Phylogenetic analyses and visualizations using mitochondrial cytochrome b gene sequences. PLoS One. 2020 Dec 15;15(12):e0243927.
  4. Munjal G, Hanmandlu M, Srivastava S. Phylogenetics Algorithms and Applications. Ambient Communications and Computer Systems. 2018 Dec 10;904:187–94.
  5. Tal Galili, dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering, Bioinformatics, Volume 31, Issue 22, 15 November 2015, Pages 3718–3720.
  6. https://learn.omicslogic.com/R-Code/course-3-genomics/lesson/04-dna-replication-and-reverse-complements-in-r
  7. https://www.r-bloggers.com/2008/11/r-function-to-reverse-and-complement-a-dna-sequence/
  8. https://medium.com/biosyntax/reverse-and-find-complement-sequence-in-r-baf33847aab1
  9. https://rdrr.io/bioc/DECIPHER/man/BrowseSeqs.html
  10. https://www.britannica.com/science/phylogenetic-tree
  11. https://www.sciencedirect.com/topics/agricultural-and-biological-sciences/dendrogram
  12. https://www.displayr.com/what-is-dendrogram/
  13. https://evolution.genetics.washington.edu/phylip/newicktree.html
  14. https://cran.r-project.org/web/packages/phylogram/vignettes/phylogram-vignette.html
  15. https://www.nature.com/scitable/topicpage/reading-a-phylogenetic-tree-the-meaning-of-41956/
