- Genome refers to the complete set of genes or genetic material present in a cell or organism while genomics is the study of genomes.
- Genomic studies are characterized by simultaneous analysis of a large number of genes using automated data gathering tools.
- Genomics is a discipline in genetics that applies recombinant DNA, DNA sequencing methods, and bioinformatics to sequence, assemble and analyze the function and structure of genomes.
- The advent of genomics and the ensuing explosion of sequence information are the main driving force behind the rapid development of bioinformatics today.
Genomic studies can be broadly divided into structural genomics and functional genomics.
- Structural genomics refers to the initial phase of genome analysis, which includes the construction of genetic and physical maps of a genome, identification of genes, annotation of gene features, and comparison of genome structures.
- Functional genomics is the study of how genes and intergenic regions of the genome contribute to different biological processes. The goal of functional genomics is to determine how the individual components of a biological system work together to produce a particular phenotype. Functional genomics focuses on the dynamic expression of gene products in a specific context, for example, at a specific developmental stage or during disease.
Comparison of whole genomes from different organisms is comparative genomics, which includes the comparison of gene number, gene location, and gene content from these genomes. The comparison helps to reveal the extent of conservation among genomes, which will provide insights into the mechanism of genome evolution and gene transfer among genomes.
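As a rough illustration of comparing gene content, the sketch below treats each genome as a plain set of gene identifiers and reports what is shared and what is unique. The gene names, the `compare_gene_content` helper, and the use of the Jaccard index as a conservation measure are assumptions made for this example, not part of the source text.

```python
def compare_gene_content(genome_a, genome_b):
    """Compare two genomes given as sets of gene identifiers."""
    shared = genome_a & genome_b
    union = genome_a | genome_b
    return {
        "shared": shared,
        "only_a": genome_a - genome_b,
        "only_b": genome_b - genome_a,
        # Jaccard index: fraction of all genes that both genomes carry.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical gene identifiers, for illustration only.
genome_a = {"geneA", "geneB", "geneC", "geneD"}
genome_b = {"geneB", "geneC", "geneE"}

result = compare_gene_content(genome_a, genome_b)
print(sorted(result["shared"]), result["jaccard"])
```

Real comparative analyses first have to decide which genes in different organisms count as "the same" (for example by sequence similarity); this sketch assumes that step has already been done.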
Methods in Genomics
- Genome mapping is a process of identifying relative locations of genes, mutations or traits on a chromosome.
- It involves assigning a specific gene to a particular region of a chromosome and determining the locations of, and relative distances between, genes on that chromosome.
- Linkage maps show the arrangement of genes and genetic markers along the chromosomes as calculated by the frequency with which they are inherited together.
- Physical maps represent chromosomes and provide physical distances between chromosomal landmarks ideally measured in nucleotide bases.
- Genome sequencing is figuring out the order of DNA nucleotides, or bases, in a genome—the order of As, Cs, Gs, and Ts that make up an organism’s DNA.
- Sequencing an entire genome (all of an organism’s DNA) is a complex task. It requires breaking the DNA of the genome into many smaller pieces, sequencing the pieces, and assembling the sequences into a single long “consensus.”
- The speed attained with modern DNA sequencing technology has been instrumental in sequencing the complete genomes of numerous organisms, including the human genome and the genomes of many animal, plant, and microbial species.
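The linkage-map idea in the list above (distances calculated from how often markers are inherited together) can be sketched numerically. The example below computes a recombination frequency from offspring counts and converts it to a map distance in centimorgans. The counts are hypothetical, and Haldane's mapping function is one common choice that is assumed here; the source does not name a specific formula.

```python
import math


def recombination_frequency(recombinant_offspring, total_offspring):
    """Fraction of offspring in which the two markers were separated."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return recombinant_offspring / total_offspring


def haldane_map_distance_cm(r):
    """Haldane's mapping function: d = -50 * ln(1 - 2r), in centimorgans.

    Assumes crossovers occur independently (no interference).
    """
    if not 0 <= r < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical cross: 20 recombinant offspring out of 200 scored.
r = recombination_frequency(20, 200)
distance = haldane_map_distance_cm(r)
print(f"r = {r:.2f}, distance = {distance:.1f} cM")
```

For small recombination frequencies the map distance is close to `100 * r`; the correction matters more as markers get farther apart and double crossovers become likely.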
Genome Sequence Assembly
- Initial DNA sequencing reactions generate short sequence reads from DNA clones. The average length of the reads is about 500 bases. To assemble a whole genome sequence, these short fragments are joined to form larger fragments after removing overlaps. These longer, merged sequences are termed contigs, which are usually 5,000 to 10,000 bases long.
- A number of overlapping contigs can be further merged to form scaffolds (30,000–50,000 bases, also called supercontigs), which are uni-directionally oriented along a physical map of a chromosome. Overlapping scaffolds are then connected to create the final highest resolution map of the genome.
- Correct identification of overlaps and assembly of the sequence reads into contigs requires computational tools.
Phred, Phrap, VecScreen, TIGR Assembler, and ARACHNE are a few commonly used programs for base calling, vector screening, and assembly.
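To make the read-to-contig step concrete, here is a minimal greedy overlap-merge sketch: it repeatedly finds the pair of sequences with the longest suffix-prefix overlap and joins them. This is a toy under stated assumptions (error-free reads, a single strand, no read contained inside another, a hypothetical `min_overlap` threshold) and is not how the programs listed above are implemented.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Merge reads into contigs by always joining the best-overlapping pair."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap_length(a, b, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        # Join the pair, writing the overlapping bases only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical short reads, for illustration only.
reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]
contigs = greedy_assemble(reads)
print(contigs)
```

Greedy merging can be misled by repeated sequence, which is one reason real assemblers combine overlap detection with quality scores and mapping information.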
- Before the assembled sequence is deposited into a database, it has to be analyzed for useful biological features. The genome annotation process provides comments for the features.
- This involves two steps, gene prediction and functional assignment, both of which can be accomplished with bioinformatics tools.
- Protein functional descriptions need to be standardized, because the existing literature describes gene function in natural language, which is often ambiguous and imprecise.
- This need spurred the development of the Gene Ontology (GO) project, which uses a limited, controlled vocabulary to describe molecular functions, biological processes, and cellular components.
- Searching a database with a GO term for a particular protein can easily bring up other proteins with related functions, in much the same way as using a thesaurus. Using GO, a genome annotator can assign functional properties to a gene product at different hierarchical levels, depending on how much is known about the gene product.
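The hierarchical use of GO described above can be illustrated with a toy term tree. The fragment below is a simplified, hand-written chain of molecular-function terms; the real GO graph contains intermediate terms and allows multiple parents, and the protein names and helper functions here are hypothetical.

```python
# Simplified child -> parent links (the real ontology has more levels).
PARENT = {
    "protein kinase activity": "kinase activity",
    "kinase activity": "catalytic activity",
    "catalytic activity": "molecular_function",
    "molecular_function": None,
}


def lineage(term):
    """Return the term followed by all of its ancestors, most specific first."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def products_under(annotations, query_term):
    """Gene products annotated to `query_term` or to any more specific term."""
    return sorted(
        product
        for product, term in annotations.items()
        if query_term in lineage(term)
    )


# Hypothetical annotations: each product gets the most specific term known.
annotations = {
    "protA": "protein kinase activity",  # well characterised
    "protB": "kinase activity",          # substrate class unknown
    "protC": "catalytic activity",       # only known to be an enzyme
}

print(lineage("protein kinase activity"))
print(products_under(annotations, "kinase activity"))
```

A query at a general level such as "catalytic activity" returns all three products, while a query at a specific level returns only those annotated in that much detail.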
Whole Genome Alignment
- With an ever-increasing number of genome sequences available, it becomes imperative to understand sequence conservation between genomes, which often helps to reveal the presence of conserved functional elements.
- This can be accomplished through direct genome comparison or genome alignment. The alignment at the genome level is fundamentally no different from the basic sequence alignment.
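Since genome alignment rests on the same principle as basic sequence alignment, a small global alignment scorer shows the core idea. This is a sketch of the standard dynamic-programming approach (Needleman-Wunsch style) with assumed scores of +1 for a match and -1 for a mismatch or gap; the source does not prescribe an algorithm or scoring scheme.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of sequences `a` and `b`.

    Keeps only one previous row of the scoring table at a time.
    """
    # Row 0: aligning an empty prefix of `a` against each prefix of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against an empty prefix
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in `b`
            left = curr[j - 1] + gap  # gap in `a`
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

The table grows with the product of the two sequence lengths, so this direct form is only practical for short sequences; it illustrates the principle rather than a method that scales to whole genomes.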