Structural alignments are usually applied to proteins and RNA molecules. They use information about the secondary and tertiary structure of the protein or RNA molecule, which helps align the sequences more accurately.
a) DALI: short for distance matrix alignment. It is a fragment-based method that builds structural alignments from similarity patterns between successive hexapeptides in the query structures. It can generate pairwise or multiple alignments and identify a query structure's structural neighbors in the Protein Data Bank (PDB).
b) SSAP: stands for Sequential Structure Alignment Program. It is a dynamic-programming-based method of structural alignment that uses atom-to-atom vectors in structure space as comparison points.
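DALI's central idea is to compare two structures through their intramolecular distance matrices instead of superposing coordinates. The sketch below assumes the residue correspondence is already known (real DALI searches over fragment pairings to find it) and scores it with an elastic similarity of the form described for DALI, (theta - |dA - dB| / d_mean) * exp(-(d_mean / alpha)^2). The constants theta = 0.2 and alpha = 20 Å, the function names and the toy C-alpha coordinates are assumptions for illustration.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates given as (x, y, z) tuples."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def elastic_score(coords_a, coords_b, theta=0.2, alpha=20.0):
    """Elastic similarity of two equal-length residue lists paired one-to-one (assumed given).

    Off-diagonal pairs (i, j) contribute (theta - |dA - dB| / d_mean) * exp(-(d_mean / alpha) ** 2);
    diagonal pairs contribute theta. Assumes distinct residue positions (d_mean > 0).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("residue lists must be paired one-to-one")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    score = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                score += theta
                continue
            d_mean = (da[i][j] + db[i][j]) / 2
            # Relative distance deviation is penalised; far-apart pairs are down-weighted.
            score += (theta - abs(da[i][j] - db[i][j]) / d_mean) * math.exp(-(d_mean / alpha) ** 2)
    return score


# Hypothetical 4-residue fragments (3.8 Å spacing); in b the last residue is moved.
a = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (7.6, 3.8, 0)]
b = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)]
print(elastic_score(a, a) > elastic_score(a, b))  # True
```

Identical structures score highest; moving the last residue makes its relative distance deviations exceed theta, so those terms turn negative.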
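SSAP uses two levels of dynamic programming: a lower level compares each residue's "view" of atom-to-atom vectors to the other residues, and an upper level aligns the two residue lists using those comparison scores. The sketch below shows only the upper-level step, a global alignment over a precomputed residue-pair score matrix with a linear gap penalty. The score matrix, the gap value and the function name are hypothetical.

```python
def align_by_scores(score, gap=-1.0):
    """Global DP alignment of residues 0..m-1 (structure A) and 0..n-1 (structure B).

    score[i][j]: precomputed similarity of residue i in A and residue j in B (assumed input).
    Returns (total score, list of matched (i, j) residue pairs).
    """
    m, n = len(score), len(score[0])
    # F[i][j] = best score aligning the first i residues of A with the first j of B.
    F = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score[i - 1][j - 1],  # match residue i-1 with j-1
                F[i - 1][j] + gap,                      # residue i-1 of A left unmatched
                F[i][j - 1] + gap,                      # residue j-1 of B left unmatched
            )
    # Trace back from the bottom-right cell to recover the matched pairs.
    pairs, i, j = [], m, n
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return F[m][n], pairs[::-1]


# Hypothetical scores: A residue 0 resembles B residue 0, A residue 1 resembles B residue 2.
print(align_by_scores([[5, 0, 0], [0, 0, 5]]))  # (9.0, [(0, 0), (1, 2)])
```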
Phylogenetic Analysis
Phylogenetic analysis is the study of evolutionary relationships. Its final goal is to construct an evolutionary tree describing how the various taxa are related to one another. Such trees illustrate the evolutionary relationships among groups of organisms, or among a family of related nucleic acid or protein sequences.
Approaches to building trees:
a) Cladistic: trees are drawn from shared, conserved characters. The method constructs a tree called a cladogram, in which branch order is shown but branch lengths are meaningless.
b) Phenetic: trees are drawn from overall similarity, for example of morphological characters. The method constructs a tree, often called a phylogram, in which both branch order and branch lengths are meaningful.
Building a Tree
Homologs: sequences sharing a common ancestral origin.
Orthologs: homologs that diverged through a speciation event.
Paralogs: homologs derived from a common ancestral gene that underwent duplication and divergence.
Xenologs: homologs resulting from horizontal transfer of genes.
Steps of building a tree:
1) Alignment:
Align the sequences using a multiple sequence alignment (MSA) method.
2) Determining the substitution model:
There are two kinds of substitution models: DNA substitution models and amino acid substitution models.
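DNA substitution models: the Jukes-Cantor model assumes that every base changes to each of the other three bases at the same rate, independently at every site. The Kimura two-parameter model allows different rates for transitions (purine to purine or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or vice versa).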
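Both DNA models turn the observed proportion of differing sites into an estimated number of substitutions per site. The sketch below implements the standard closed forms, d = -(3/4) ln(1 - 4p/3) for Jukes-Cantor and d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q) for Kimura, where p, P and Q are the proportions of all differences, transitions and transversions. It assumes two aligned, equal-length, gap-free uppercase DNA strings; the function names are illustrative.

```python
import math

# Transitions: purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def p_distance(s1, s2):
    """Proportion of differing sites between two aligned, gap-free DNA sequences."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def jukes_cantor(s1, s2):
    """JC69 distance: d = -3/4 * ln(1 - 4/3 * p)."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(s1, s2):
    """K2P distance from the transition (P) and transversion (Q) proportions."""
    n = len(s1)
    P = sum(a != b and frozenset((a, b)) in TRANSITIONS for a, b in zip(s1, s2)) / n
    Q = sum(a != b and frozenset((a, b)) not in TRANSITIONS for a, b in zip(s1, s2)) / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


s1, s2 = "ACGTACGTAC", "GCGTACGTAA"  # one transition (A/G), one transversion (C/A)
print(round(jukes_cantor(s1, s2), 4), round(kimura_2p(s1, s2), 4))  # 0.2326 0.2341
```

Both corrected distances exceed the raw p-distance of 0.2 because they account for multiple substitutions at the same site.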
Amino acid substitution models: the two most widely used families are PAM and BLOSUM. PAM stands for Point Accepted Mutation (also read as Percent Accepted Mutation), and BLOSUM stands for BLOcks SUbstitution Matrix.
PAM: these matrices are a common family of scoring matrices. "Accepted" means that the mutation has been retained by natural selection and fixed in the sequence in question. For example, the PAM250 scoring matrix assumes that about 250 accepted mutations per 100 amino acids may have occurred, whereas PAM10 assumes only 10 mutations per 100 amino acids, so only very similar sequences reach useful alignment scores with it.
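PAM matrices for larger evolutionary distances are extrapolated from the PAM1 mutation-probability matrix by matrix multiplication, treating evolution as a Markov chain: PAM-n = (PAM1)^n. The sketch uses a hypothetical three-letter alphabet with made-up PAM1-style probabilities (1 accepted change per 100 residues), purely to show the mechanics; it does not reproduce Dayhoff's data or the final log-odds scoring step.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def pam_n(pam1, n):
    """Return PAM-n = PAM1 ** n by repeated multiplication."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]  # identity
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result


# Hypothetical PAM1-like matrix: row i gives P(residue i -> residue j) over one PAM unit.
# Each row keeps 99% and changes 1%, i.e. 1 accepted change per 100 residues; rows sum to 1.
PAM1_TOY = [
    [0.99, 0.006, 0.004],
    [0.005, 0.99, 0.005],
    [0.004, 0.006, 0.99],
]

pam250 = pam_n(PAM1_TOY, 250)
print([round(pam250[i][i], 3) for i in range(3)])
```

After 250 steps the diagonal entries fall far below 0.99, which is why PAM250 suits distantly related sequences. A scoring matrix is then obtained by converting these probabilities to log-odds scores against background residue frequencies.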
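BLOSUM: these matrices are used for protein sequence alignment. The number in the name is the percent-identity threshold at which sequences in the source blocks were clustered, so, for example, BLOSUM80 is used for less divergent alignments and BLOSUM45 for more divergent ones. BLOSUM matrices have generally proved better than PAM matrices at scoring distantly related sequences.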
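BLOSUM scores are log-odds ratios: for each residue pair, the frequency observed in aligned, ungapped blocks is compared with the frequency expected by chance from the residues' background frequencies, and expressed in half-bit units (as used for BLOSUM62), s = round(2 log2(q / e)). The sketch below computes these scores for a single hypothetical four-sequence block. It deliberately omits clustering sequences above the identity threshold and pooling over many blocks, both of which the real matrices require; the function name is illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_scores(block):
    """Half-bit log-odds scores from one ungapped block of aligned, equal-length sequences.

    Simplification: no clustering above an identity threshold; every sequence counts once.
    """
    pair_counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: count / total for pair, count in pair_counts.items()}  # observed pair frequencies
    # Background frequency of each residue: p_i = q_ii + sum over j != i of q_ij / 2.
    p = Counter()
    for (a, b), freq in q.items():
        p[a] += freq / 2
        p[b] += freq / 2
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]  # chance pair frequency
        scores[(a, b)] = round(2 * math.log2(freq / expected))  # half-bit units
    return scores


print(blosum_scores(["AGAC", "AGAC", "AGGC", "AGGC"]))
# {('A', 'A'): 2, ('G', 'G'): 2, ('A', 'G'): -2, ('C', 'C'): 4}
```

Conserved pairs score positive; the A/G substitution, seen less often than chance predicts, scores negative.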
3) Tree Construction:
There are three main classes of methods for constructing a tree.
a) Distance methods: useful when the sequences share clearly recognizable similarity. Examples include transformed distance methods, UPGMA, least-squares methods, neighbor joining, ClustalW, Distnj in the Protml package, and Darwin.
b) Character-based methods: these include protein maximum likelihood programs such as Protml, and Puzzle (a heuristic method much faster than Protml).
c) Protein maximum parsimony: used when the sequences show strong similarity. Programs include Protpars and PAUP.
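UPGMA, one of the distance methods listed, repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the old distances. Because each new node is placed at half the merging distance, it implicitly assumes a constant rate of evolution (a molecular clock). The sketch returns a Newick string; the input format (a dict keyed by frozensets of taxon names) and the internal `_node` labels are assumptions of this example.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a rooted UPGMA tree and return it as a Newick string.

    names: taxon labels. dist: dict mapping frozenset({a, b}) -> pairwise distance.
    """
    # cluster id -> (Newick fragment, number of leaves, node height)
    clusters = {name: (name, 1, 0.0) for name in names}
    d = dict(dist)  # working copy; keys are frozensets of cluster ids
    next_id = 0
    while len(clusters) > 1:
        # 1. Find the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2
        frag_a, size_a, h_a = clusters.pop(a)
        frag_b, size_b, h_b = clusters.pop(b)
        # 2. Branch length = new node height minus the child's height.
        frag = f"({frag_a}:{height - h_a:g},{frag_b}:{height - h_b:g})"
        new = f"_node{next_id}"
        next_id += 1
        # 3. Size-weighted average distance from the merged cluster to every other cluster.
        for c in clusters:
            d[frozenset((new, c))] = (
                d[frozenset((a, c))] * size_a + d[frozenset((b, c))] * size_b
            ) / (size_a + size_b)
        clusters[new] = (frag, size_a + size_b, height)
    return next(iter(clusters.values()))[0] + ";"


pairs = {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 6, ("B", "C"): 4, ("B", "D"): 6, ("C", "D"): 6}
dist = {frozenset(k): v for k, v in pairs.items()}
print(upgma(["A", "B", "C", "D"], dist))  # (D:3,(C:2,(A:1,B:1):1):1);
```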
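Maximum-likelihood methods such as Protml choose the tree and branch lengths that make the observed alignment most probable under a substitution model. The smallest possible case, two DNA sequences joined by a single branch, shows the idea: the sketch below evaluates the Jukes-Cantor log-likelihood of the branch length d and maximises it numerically by golden-section search. For this case the maximum coincides with the closed-form Jukes-Cantor distance -(3/4) ln(1 - 4p/3), which makes a convenient check. It is an illustration under those assumptions (DNA, Jukes-Cantor, two taxa, hypothetical function names), not how Protml or Puzzle search tree space.

```python
import math


def jc_log_likelihood(d, n_same, n_diff):
    """Log-likelihood of a two-sequence alignment under Jukes-Cantor at branch length d.

    Each site: 1/4 (stationary base frequency) times the transition probability,
    P(same) = 1/4 + 3/4 e^(-4d/3) or P(a specific other base) = 1/4 - 1/4 e^(-4d/3).
    """
    e = math.exp(-4 * d / 3)
    return (n_same * math.log(0.25 * (0.25 + 0.75 * e))
            + n_diff * math.log(0.25 * (0.25 - 0.25 * e)))


def ml_distance(s1, s2, lo=1e-6, hi=10.0, iters=200):
    """Maximise the Jukes-Cantor log-likelihood over d by golden-section search."""
    n_diff = sum(a != b for a, b in zip(s1, s2))
    n_same = len(s1) - n_diff
    if n_diff == 0 or n_diff / len(s1) >= 0.75:
        raise ValueError("ML estimate lies on the boundary (d = 0 or d = infinity)")
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        m1 = hi - ratio * (hi - lo)
        m2 = lo + ratio * (hi - lo)
        # Keep the sub-interval that must contain the maximum of a unimodal function.
        if jc_log_likelihood(m1, n_same, n_diff) < jc_log_likelihood(m2, n_same, n_diff):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


s1, s2 = "ACGTACGTAC", "GCGTACGTAA"  # 2 differences in 10 sites, p = 0.2
print(round(ml_distance(s1, s2), 4))  # 0.2326
```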
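Maximum parsimony scores a tree by the minimum number of character changes it requires and prefers the tree with the lowest score. For a fixed rooted binary tree, the classic Fitch algorithm computes that minimum for one alignment column by intersecting the children's state sets where possible and counting one change whenever the intersection is empty. The sketch assumes trees written as nested tuples of taxon names and a hypothetical toy alignment; the tree search that programs such as Protpars and PAUP perform is not shown.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one column on a binary tree."""
    if isinstance(tree, str):  # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    s1, c1 = fitch(left, states)
    s2, c2 = fitch(right, states)
    common = s1 & s2
    if common:
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # children disagree: one extra change


def parsimony_score(tree, alignment):
    """Sum Fitch costs over all columns; alignment maps taxon -> aligned sequence."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {t: seq[i] for t, seq in alignment.items()})[1] for i in range(length))


aln = {"A": "AA", "B": "AA", "C": "GG", "D": "GG"}
print(parsimony_score((("A", "B"), ("C", "D")), aln))  # 2
print(parsimony_score((("A", "C"), ("B", "D")), aln))  # 4
```

The tree grouping A with B needs fewer changes, so parsimony prefers it.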
4) Tree Evaluation:
Two strategies are commonly used to evaluate the reliability of the generated tree.
a) Bootstrap: a statistical technique that randomly resamples alignment sites with replacement to estimate the sampling error of tree topologies.
b) Jackknife: replicates are created by dropping one or more sites from the alignment in each replicate.
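Both strategies build many pseudo-replicate alignments from the original by resampling its columns (sites), rebuild a tree from each, and report how often each group of taxa recurs. The sketch generates single replicates: the bootstrap draws as many columns as the original, with replacement; the jackknife keeps a random subset without replacement. The half-deletion default, the toy alignment and the function names are assumptions of this example.

```python
import random


def columns(alignment):
    """Split a dict {taxon: aligned sequence} into (taxa, list of site columns)."""
    taxa = list(alignment)
    return taxa, list(zip(*(alignment[t] for t in taxa)))


def rebuild(taxa, cols):
    """Reassemble site columns into a dict {taxon: sequence}."""
    return {t: "".join(col[k] for col in cols) for k, t in enumerate(taxa)}


def bootstrap_replicate(alignment, rng):
    """Sample sites with replacement, keeping the original alignment length."""
    taxa, cols = columns(alignment)
    return rebuild(taxa, [rng.choice(cols) for _ in cols])


def jackknife_replicate(alignment, rng, drop_fraction=0.5):
    """Drop a random subset of sites (sampling without replacement); order is preserved."""
    taxa, cols = columns(alignment)
    keep_n = len(cols) - int(len(cols) * drop_fraction)
    keep = sorted(rng.sample(range(len(cols)), keep_n))
    return rebuild(taxa, [cols[i] for i in keep])


rng = random.Random(1)
aln = {"A": "ACGTAC", "B": "ACGTTC", "C": "GCGTTA"}
print(bootstrap_replicate(aln, rng))
print(jackknife_replicate(aln, rng))
```

Running the same tree-building method on many replicates (for example 100 or 1,000) and counting how often each clade appears gives its bootstrap or jackknife support value.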
Software for Phylogenetics:
1) CLUSTALX
2) PHYLIP (Phylogenetic Inference Package)
3) BioEdit
4) MEGA
5) TreeView and PAUP