- It was Wilhelm Johannsen, a Danish botanist, who first coined the term “gene” in 1909.
- A gene is a section of chromosomal DNA that is transcribed into a functional RNA molecule; in protein-coding genes, this RNA is then translated into a functional protein.
- The protein-coding part of a gene can also be described as an open reading frame (ORF), i.e., the region between a start codon and an in-frame stop codon.
- It serves as the fundamental physical and functional unit of heredity, carrying genetic information from one generation to another.
- A locus (plural: loci) is the specific position on a chromosome occupied by a particular gene or one of its alleles.
- A locus can be a coding sequence, a regulatory region, or other regions.
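- A given locus is found at the same position on both chromosomes of a homologous pair.
- The complete DNA sequence of an organism, including all of its genes and the DNA between them, is called its genome.
- Any two humans are about 99.9% identical in their genome sequence; the small remaining genetic differences contribute to variation between individuals, including visible morphological differences.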
- Mutations, inheritance patterns, development, quantitative traits, evolution, and biochemical processes are all linked to genes.
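The open reading frame idea above can be illustrated with a small sequence scan. This is a minimal sketch and not a gene-prediction tool. It assumes a DNA string written 5’→3’, treats ATG as the only start codon and TAA, TAG, and TGA as stop codons (the standard genetic code), and scans only the strand it is given. The function name `find_orfs` and its `min_codons` parameter are illustrative choices.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int, str]]:
    """Return (start, end, orf) for every ATG followed by an in-frame stop codon.

    `end` is exclusive and includes the stop codon. Nested ORFs (an internal ATG
    sharing the same stop) are reported separately. Only the given strand is scanned.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != START_CODON:
            continue
        # Walk codon by codon in the same reading frame as this start codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, seq[start:pos + 3]))
                break  # the ORF ends at the first in-frame stop codon
    return orfs

if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14, 'ATGAAATTTTAG')]
```

A complete search would also scan the reverse complement, because genes can lie on either DNA strand.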
Composition of Gene
An entire gene is made up of nucleotides, the fundamental units of DNA, and each nucleotide carries one of four distinct bases.
The four bases that can be found in DNA’s nucleotides are:
- Two purines: adenine (A) and guanine (G)
- Two pyrimidines: cytosine (C) and thymine (T)
- Within each strand, nucleotides are linked to one another through their phosphate groups, while bases on opposite strands interact through hydrogen bonding.
- Base pairing is specific: A pairs with T, and C pairs with G.
- DNA is normally double-stranded, with two linear strands aligned so that their bases face one another.
- Because base pairing is specific, each strand is an exact complement of the other, so the sequence of one strand determines the sequence of its partner.
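Because pairing is specific, one strand's sequence can be computed from the other. The Python sketch below assumes uppercase A, C, G, and T only, with no ambiguity codes. It also relies on the fact that the two strands run in opposite (antiparallel) directions, so reading the partner strand 5’→3’ gives the reverse complement. The function names are illustrative.

```python
# Specific base pairing: A <-> T and C <-> G.
PAIRS = str.maketrans("ATCG", "TAGC")

def complement(strand: str) -> str:
    """Partner base for each position, written in the same direction as `strand`."""
    return strand.upper().translate(PAIRS)

def reverse_complement(strand: str) -> str:
    """Partner strand read 5'->3', reflecting the antiparallel orientation."""
    return complement(strand)[::-1]

if __name__ == "__main__":
    strand = "ATGCGTTA"
    print(complement(strand))          # TACGCAAT
    print(reverse_complement(strand))  # TAACGCAT
```

Applying `reverse_complement` twice returns the original strand. This mirrors the point that each strand fully determines the other.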
Types of Genes
There are five major types of genes. They are:
1. Complementary Genes
- Complementary genes are two dominant genes that must act together to produce a particular phenotype.
- The phenotype appears only when dominant alleles of both genes are present.
- Neither dominant gene can produce the phenotype on its own.
- The two dominant genes therefore complement each other.
- A classic example is flower color in sweet pea (Lathyrus odoratus). Crossing two white-flowered varieties gives purple-flowered F1 plants, and the F2 shows a ratio of 9 purple : 7 white.
2. Supplementary Genes
- This interaction involves two genes.
- One dominant gene can express itself independently, while the second gene can express itself only when it is paired with the first gene.
- Together, the two genes can produce a completely different trait or phenotype.
Example: mating a black mouse with an albino mouse.
The albino mouse cannot produce a colored coat on its own. When it is bred with the black mouse, however, the offspring have a coat color that is neither black nor white but a new brownish shade (agouti).
3. Duplicate Genes
- Duplicate genes are two genes that produce the same phenotype, so a dominant allele of either gene is enough to express it.
- Neither gene needs the other to produce the phenotype, because each is expressed independently of the other.
- A classic example is capsule shape in shepherd's purse (Capsella bursa-pastoris). A dominant allele of either gene gives triangular capsules, and only the double recessive gives ovoid (top-shaped) capsules, which produces a 15:1 ratio in the F2.
4. Polymeric Genes
- Polymeric genes are also known as additive genes.
- As with duplicate genes, each of the two dominant genes can produce a phenotype on its own, but when both are present their effects add up.
- However, unlike duplicate genes, the combined effect of both genes differs from the effect of either gene alone.
Example: crossing two different true-breeding spherical squash varieties produces an entirely new disc-shaped (discoid) fruit in the F1. The F2 shows a ratio of 9 disc : 6 spherical : 1 elongated (cylindrical).
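The four interaction types described above are usually told apart by the modified F2 ratios they give in a dihybrid cross (AaBb × AaBb). The standard ratios are 9:7 for complementary genes, 9:3:4 for supplementary genes, 15:1 for duplicate genes, and 9:6:1 for polymeric genes.

The Python sketch below enumerates the 16 equally likely gamete combinations and applies one phenotype rule per interaction. Some choices are illustrative placeholders: the gene letters A and B, the phenotype labels, and the assumption that A is the supplementary gene able to act on its own.

```python
from collections import Counter
from itertools import product

def f2_offspring():
    """Yield the 16 equally likely F2 genotypes from an AaBb x AaBb cross."""
    gametes = list(product("Aa", "Bb"))  # AB, Ab, aB, ab
    for egg, sperm in product(gametes, repeat=2):
        yield egg[0] + sperm[0], egg[1] + sperm[1]  # e.g. ("Aa", "bb")

# Each rule maps (has a dominant A?, has a dominant B?) to a placeholder phenotype label.
RULES = {
    # Phenotype appears only when both dominant genes are present.
    "complementary": lambda a, b: "colored" if a and b else "white",
    # A acts alone; B shows its effect only together with A.
    "supplementary": lambda a, b: ("new phenotype" if b else "A phenotype") if a else "neither",
    # Either dominant gene alone is enough.
    "duplicate": lambda a, b: "dominant" if a or b else "recessive",
    # Each dominant gene has an effect; both together give a new phenotype.
    "polymeric": lambda a, b: "both" if a and b else ("one" if a or b else "none"),
}

def f2_counts(interaction: str) -> Counter:
    """Count F2 phenotypes (out of 16) for the named interaction type."""
    rule = RULES[interaction]
    return Counter(rule("A" in gene_a, "B" in gene_b) for gene_a, gene_b in f2_offspring())

if __name__ == "__main__":
    for name in RULES:
        print(name, dict(f2_counts(name)))
```

Running it gives 9:7, 9:3:4, 15:1, and 9:6:1 for the four rules. Only the phenotype rule changes between interaction types. The underlying 16 genotypes are the same in every case.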
5. Sex-linked Genes
- These genes are located on the X or Y chromosome. These chromosomes determine sex, so the inheritance of the traits they carry depends on sex.
- Males have XY sex chromosomes, while females have XX sex chromosomes.
- Only males can inherit Y-linked traits, because only males have a Y chromosome.
Although the allele for color blindness is recessive, a colorblind mother (XX) and a father (XY) with normal vision will have colorblind sons.
This is because each son receives his single X chromosome from his mother, and that X carries the recessive allele. With no second X chromosome to mask it, the recessive allele is expressed. Daughters also receive a normal X from their father and are therefore carriers.
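The color-blindness cross above can be checked with a small Punnett-square enumeration. In this Python sketch, the allele labels "XC" (X with normal vision), "Xc" (X with the color-blindness allele), and "Y" are illustrative notation. The trait is treated as a simple X-linked recessive.

```python
from collections import Counter
from itertools import product

def child(maternal_x: str, paternal_gamete: str) -> tuple[str, str]:
    """Return (sex, vision status) for one maternal X and one paternal X or Y."""
    if paternal_gamete == "Y":
        # Sons have a single X, so a recessive allele on it is always expressed.
        return "son", "colorblind" if maternal_x == "Xc" else "normal"
    recessive_copies = [maternal_x, paternal_gamete].count("Xc")
    return "daughter", ("normal", "carrier", "colorblind")[recessive_copies]

def cross(mother: tuple[str, str], father: tuple[str, str]) -> Counter:
    """Tally the four equally likely children of a mother x father cross."""
    return Counter(child(m, f) for m, f in product(mother, father))

if __name__ == "__main__":
    # Colorblind mother (Xc Xc) x father with normal vision (XC Y):
    # 2 carrier daughters and 2 colorblind sons out of 4.
    print(cross(("Xc", "Xc"), ("XC", "Y")))
```

Changing the mother to a carrier, `("XC", "Xc")`, gives one colorblind son in four children, so the trait skips daughters but can still appear in sons.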
Each functional region or component of a gene has a specific role in one aspect of the gene expression process.
The main functional regions of a gene are:
- Promoter region
- Coding region
- Termination Sequence
1. Promoter region
- A promoter is a short region of DNA (100–1,000 bp) where RNA polymerase binds to begin transcribing a gene.
- It is normally located upstream (on the 5’ side) of the transcription start site.
- Promoter regions contain DNA sequences known as response elements, which provide stable binding sites for RNA polymerase and transcription factors.
- Promoter binding differs significantly between bacteria and eukaryotes.
- In bacteria, the core RNA polymerase needs a sigma factor to recognize and bind the promoter.
- In eukaryotes, RNA polymerase II, an RNA polymerase found only in eukaryotes, must bind the promoter, and it requires at least seven transcription factors to do so.
Three main portions make up a promoter:
- Core promoter: includes the RNA polymerase binding site, the TATA box, and the transcription start site (TSS).
- Proximal promoter: the site where transcription factors bind; it contains many primary regulatory elements.
- Distal promoter: includes transcription factor binding sites but mainly contains regulatory elements.
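Core promoter elements such as the TATA box are often located computationally by motif search. The Python sketch below looks for a TATA-like sequence upstream of a known transcription start site. It assumes the commonly cited consensus TATA(A/T)A(A/T). The 40 bp search window and the function name `find_tata_box` are illustrative assumptions. Real promoter prediction uses far more information than a single motif.

```python
import re
from typing import Optional

# Commonly cited TATA box consensus: TATA(A/T)A(A/T).
TATA_MOTIF = re.compile(r"TATA[AT]A[AT]")

def find_tata_box(seq: str, tss: int, window: int = 40) -> Optional[int]:
    """Return the 0-based start of the first TATA-like motif lying entirely
    within `window` bp upstream of the TSS at index `tss`, or None."""
    start = max(0, tss - window)
    # Pattern.search(string, pos, endpos) restricts matching to seq[start:tss].
    match = TATA_MOTIF.search(seq.upper(), start, tss)
    return match.start() if match else None

if __name__ == "__main__":
    promoter = "GGGGGTATAAAA" + "G" * 25 + "ACGT"  # TSS assumed at index 37
    print(find_tata_box(promoter, tss=37))  # 5
```

Limiting the search to a window just upstream of the TSS follows the definition of the core promoter above. A motif far from the start site is not reported.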
2. Coding region
- Genes are organized into alternating coding and non-coding sections called exons (coding) and introns (non-coding), respectively.
- The coding region of a gene is the part that is transcribed and then translated into protein, i.e., the total of its exons.
- It contains a linear sequence of nucleotides that specifies the amino acid sequence of the protein.
- The genetic code is read in triplets (codons), meaning that each set of three nucleotides codes for a single amino acid.
- Four nucleotides can form 4 × 4 × 4 = 64 triplets. Of these, 61 encode the 20 standard amino acids used to make proteins, and the remaining 3 are stop codons.
- Because the genetic code is degenerate, several different triplets can encode the same amino acid.
- Most eukaryotic genes do not have continuous coding regions, and the coding sequence usually makes up only a small percentage of the entire gene.
- Only about 1–2% of the human genome consists of protein-coding sequence.
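The triplet code and its degeneracy can be made concrete with the standard codon table. The Python sketch below builds the 64-codon standard genetic code from a compact string, using one-letter amino acid codes and "*" for stop codons. It counts how many codons encode each amino acid. It also translates a short coding sequence up to the first stop codon. It assumes the standard code and DNA-alphabet codons, and the example sequence is made up.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Degeneracy: number of codons that specify each amino acid ("*" = stop).
CODONS_PER_AMINO_ACID = Counter(CODON_TABLE.values())

def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

if __name__ == "__main__":
    print(len(CODON_TABLE))                                        # 64
    print(len(set(CODON_TABLE.values()) - {"*"}))                  # 20
    print(CODONS_PER_AMINO_ACID["L"], CODONS_PER_AMINO_ACID["*"])  # 6 3
    print(translate("ATGGCTTTAGGTTAA"))                            # MALG
```

Leucine, serine, and arginine each have six codons, while methionine and tryptophan have only one. This is the degeneracy described above.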
3. Termination Sequence
- The termination sequence is the last region of a gene, lying downstream of the promoter and the coding region.
- This region is not typically modified significantly to alter gene expression.
- The termination sequence signals RNA polymerase to stop transcription when it reaches the end of the gene.
- If this region is missing or inefficient, RNA polymerase may continue transcribing into downstream genes. This can lead to the expression of those genes and the production of proteins that may not be needed.
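The effect of a missing terminator can be illustrated with a toy model. In the Python sketch below, "transcription" copies the coding strand into RNA, replacing T with U. It starts at a given position and stops once a terminator motif has been copied. The motif "GCGCTTTT" is hypothetical and chosen for illustration only. Real termination depends on RNA structure and protein factors, not on matching a fixed string.

```python
# Hypothetical terminator motif, for illustration only.
TERMINATOR = "GCGCTTTT"

def transcribe(coding_strand: str, start: int, terminator: str = TERMINATOR) -> str:
    """Toy transcription: copy the coding strand as RNA from `start` up to and
    including the first terminator; without a terminator, read through to the end."""
    hit = coding_strand.find(terminator, start)
    stop = len(coding_strand) if hit == -1 else hit + len(terminator)
    return coding_strand[start:stop].replace("T", "U")

if __name__ == "__main__":
    gene1, gene2 = "ATGAAA", "ATGCCC"
    print(transcribe(gene1 + TERMINATOR + gene2, 0))  # AUGAAAGCGCUUUU (stops after gene1)
    print(transcribe(gene1 + gene2, 0))               # AUGAAAAUGCCC (reads into gene2)
```

The second call shows the read-through described above. Without a termination signal, the downstream gene ends up in the same transcript.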
- Each human somatic cell contains 23 pairs of chromosomes, and each chromosome carries hundreds to thousands of distinct genes.
- Every person has two copies of each gene, one inherited from each parent (the exception is most genes on the X and Y chromosomes in males).
- These two versions of a gene are known as alleles.
- Alleles are forms of the same gene with small differences in their DNA base sequence, and these differences can affect a heritable trait.
- For a gene to be expressed, its chromosomal DNA must first be transcribed into RNA. For protein-coding genes, the RNA is then processed and translated into protein.
- Organizing the chromosomal DNA into the proper higher-order chromatin structure is essential for controlling the expression of genes in the cell.
- Feed-forward and feedback controls arising from sets of interdependent pathways coordinate the cell cycle and cellular metabolism.
- The timing, location, and frequency of a particular gene’s transcription are all regulated by molecular signals.
- These signals are frequently triggered by environmental factors or by signals from other cells, and a single regulatory pathway can control the expression of many genes.
- Cells also control gene expression in other ways: post-transcriptional modifications, limiting the number of mRNAs that proceed to translation, and allowing certain mRNAs to be translated only when and where they are needed.
- Similarly, cells control the expression of certain genes through epigenetic mechanisms such as DNA folding, histone acetylation, and DNA methylation (chemical modification of the nucleotide bases).
- Genetic variations in the target genes, and the resulting changes in the translated regulatory proteins, are known to influence these regulatory mechanisms.
- In 1979, the Human Gene Nomenclature Committee was founded, and the first set of standards for human gene nomenclature was published.
- The Nomenclature Committee was placed under the supervision of the Human Genome Organization (HUGO) in 1989 and was therefore renamed the HUGO Gene Nomenclature Committee (HGNC).
- The HGNC has approved names for over 40,000 human loci, of which approximately half are protein-coding genes.
- The online HGNC database contains all approved human gene symbols.
HUGO Gene Nomenclature Committee (HGNC) guidelines for naming protein-coding genes, RNA genes, and pseudogenes are as follows:
- Each gene is assigned a unique symbol, HGNC ID, and descriptive name.
- Symbols contain only uppercase Latin letters and Arabic numerals.
- Symbols should not be the same as commonly used abbreviations.
- Nomenclature should not contain a reference to any species or the letter “G” for “gene”.
- Nomenclature should not be offensive or pejorative.
The cystic fibrosis transmembrane conductance regulator is a gene on chromosome 7 linked to cystic fibrosis; its symbol is CFTR.
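The character rules in the guidelines above lend themselves to a simple automated screen. The Python sketch below is a rough format check only, not an HGNC validator. It tests a proposed symbol against the uppercase-letters-and-digits rule and against a small, hypothetical list of common abbreviations. It does not query the HGNC database. Some approved symbols also fall outside this simplified rule (for example, hyphenated symbols such as HLA-A), so a flagged symbol is not necessarily invalid.

```python
import re

# Character rule: uppercase Latin letters and Arabic numerals only.
SYMBOL_CHARS = re.compile(r"[A-Z0-9]+")

# Hypothetical, illustrative list; HGNC does not publish this exact list.
COMMON_ABBREVIATIONS = {"DNA", "RNA", "PCR", "HIV", "ATP"}

def check_symbol(symbol: str) -> list[str]:
    """Return a list of rule problems for a proposed symbol (empty list = no issues found)."""
    problems = []
    if not SYMBOL_CHARS.fullmatch(symbol):
        problems.append("contains characters other than uppercase Latin letters and Arabic numerals")
    if symbol in COMMON_ABBREVIATIONS:
        problems.append("matches a common abbreviation")
    return problems

if __name__ == "__main__":
    for candidate in ("CFTR", "cftr", "DNA"):
        print(candidate, check_symbol(candidate))
```

For example, CFTR passes both checks. The lowercase form "cftr" and the abbreviation "DNA" are each flagged.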
The Human Genome Project (HGP)
- The Human Genome Project, carried out from 1990 to 2003, was a large, well-organized, and highly collaborative international effort that generated the first sequence of the human genome.
- The project's primary objectives were to map the locations of the estimated 100,000 human genes and to determine the sequence of human DNA.
- The DNA was sequenced using the Sanger (chain-termination) method.
- The Human Genome Project’s cost is estimated at around $3 billion.
- The sequencing of the human genome involved researchers from 20 separate universities and research centers across the United States, United Kingdom, France, Germany, Japan, and China.
- The groups in these countries became known as the International Human Genome Sequencing Consortium.
- The two initial human genome papers, published in 2001, reported 30,000–40,000 and 26,588 protein-coding genes, respectively.
- A 2004 paper describing the finished euchromatic sequence of the genome estimated that a comprehensive catalog would contain 20,000–25,000 protein-coding genes.
- In 2022, the Telomere-to-Telomere (T2T) consortium announced the first truly complete 3.055 billion base pair sequence of a human genome.
Impact of Genes on Health
All cellular, biochemical, physiological, and morphological characteristics of a human being are influenced by genetic variation. Changes in the DNA sequence of a gene are called mutations. Mutations can produce improperly formed proteins that are unable to carry out their intended functions, which can result in genetic disorders.
Genetic disorders can be:
1. Chromosomal disorder
This type affects the chromosomes, the structures in the cell that hold its DNA and genes.
People who have these disorders are missing chromosomal material or have extra copies of it.
2. Complex (Multifactorial) disorder
This type results from a combination of gene mutations and other factors.
These other factors include chemical exposure, radiation exposure, diet, certain medications, and tobacco or alcohol use.
3. Single-gene (Monogenic) disorder
This type results from a mutation in a single gene.
Examples of each type include:
- Chromosomal disorders
  - Fragile-X syndrome
  - Klinefelter syndrome
  - Triple-X syndrome
  - Turner syndrome
  - Down syndrome (Trisomy 21)
  - Trisomy 18
  - Trisomy 13
- Complex (multifactorial) disorders
  - Late-onset Alzheimer’s disease
  - Autism spectrum disorder
  - Coronary artery disease
  - Migraine headaches
  - Spina bifida
  - Isolated congenital heart defects
- Single-gene (Monogenic) disorders
  - Cystic fibrosis
  - Congenital deafness (present at birth)
  - Duchenne muscular dystrophy
  - Familial hypercholesterolemia
  - Hemochromatosis (Iron overload)
  - Neurofibromatosis type 1 (NF1)
  - Sickle cell disease
  - Tay-Sachs disease
Some rare genetic disorders include:
- AA amyloidosis
- Adrenoleukodystrophy (ALD)
- Ehlers-Danlos syndrome
- Mitochondrial diseases
- Usher syndrome