Next-generation sequencing (NGS) is an umbrella term for several laboratory technologies that perform massively parallel sequencing, offering ultra-high throughput, scalability, and speed. It is used to determine the order of nucleotides in a whole genome or in targeted regions of DNA or RNA.
- Labs can now perform a wide variety of applications and study biological systems at a level that was not previously possible, thanks to the revolution brought about by next-generation sequencing technology.
- Continuous advances in DNA and RNA sequencing technologies have rapidly reduced costs, giving researchers access to a wide range of genome sequencing and data analysis tools.
- Next-generation sequencing has filled gaps left by earlier methods and helped solve complex genomics problems.
- Compared with Sanger sequencing, this technology sequences DNA and RNA more quickly and efficiently.
- Short-read and long-read sequencing are the two main approaches in NGS technology.
- In the clinic, NGS is used to diagnose various disorders by identifying germline and somatic mutations.
- NGS is also used in metagenomics and in the study of infectious diseases and their treatment.
Principle of Next-Generation Sequencing (NGS)
The principle of next-generation sequencing is similar to that of Sanger sequencing, which relies on capillary electrophoresis. The genomic strand is first fragmented, and the bases in each fragment are identified from the signals emitted as the fragment is ligated against, or synthesized along, a template strand. NGS combines the techniques developed for Sanger sequencing with array-based sequencing, which lets it process millions of reactions in parallel.
Steps in Next-Generation Sequencing (NGS)
Next-generation sequencing is mainly performed in three steps.
- Library preparation: the library is prepared by random fragmentation of the DNA followed by ligation with custom linkers (adaptors).
- Amplification: the library is amplified using clonal amplification methods and PCR.
- Sequencing: the amplified library is sequenced using one of several different methods.
During library preparation, the DNA is fragmented into short strands using enzymes or physical methods such as sonication (excitation using ultrasound). These short strands are then ligated to adaptors (short pieces of double-stranded DNA) with the help of DNA ligase, an enzyme that joins DNA strands. The adaptors allow each fragment to bind to a complementary counterpart later in the workflow. Adaptors are synthesized with one sticky end and one blunt end, so that the blunt end can be ligated to the blunt end of a DNA fragment. Left as they are, two adaptors could also ligate to each other. To prevent this, the 5’-P (phosphate) group on the adaptor’s sticky end is replaced with a 5’-OH group. Ligation takes place between a 3’-OH end and a 5’-P end, so without the 5’-P group DNA ligase cannot form a bridge between the two adaptor termini.
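The library preparation step described above can be sketched in a few lines of Python. This is only a toy model: the fragment size range, the random "genome", and the two adaptor sequences are made-up values chosen for illustration, not those of any real kit.

```python
import random

def fragment(dna, min_len, max_len, rng):
    """Cut dna into consecutive pieces of random length (toy model of shearing).

    The final piece may be shorter than min_len if the sequence runs out.
    """
    pieces, pos = [], 0
    while pos < len(dna):
        size = rng.randint(min_len, max_len)
        pieces.append(dna[pos:pos + size])
        pos += size
    return pieces

def ligate_adaptors(pieces, left_adaptor, right_adaptor):
    """Attach one adaptor to each end of every fragment."""
    return [left_adaptor + piece + right_adaptor for piece in pieces]

rng = random.Random(0)  # fixed seed so the run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(200))  # hypothetical input DNA

LEFT = "AATGATACGG"   # hypothetical adaptor sequence
RIGHT = "CCGTATCATT"  # hypothetical adaptor sequence

pieces = fragment(genome, 20, 40, rng)
library = ligate_adaptors(pieces, LEFT, RIGHT)

print(len(pieces), "fragments in the library")
print(library[0])
```

Real fragmentation is not a clean left-to-right cut, and real adaptors also carry primer-binding sites and sample indexes; the sketch only shows that every library molecule ends up as a fragment flanked by known sequences.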
For sequencing to succeed, the library fragments must be clustered into PCR colonies (also called polonies), each consisting of many copies of one particular library fragment. Because these colonies are attached in a planar fashion, they can be manipulated enzymatically in parallel.
The signal received by the sequencer must be strong enough to be detected accurately, which is why library amplification is required. Enzymatic amplification can, however, introduce bias and duplication. Several amplification processes use PCR to create a large number of DNA clusters, including:
- Emulsion PCR
- Bridge PCR
In emulsion PCR, an emulsion is formed from a mixture of emulsion oil, beads, PCR mix, and the library DNA. For sequencing to succeed, each microwell of the emulsion should contain one bead with one strand of DNA. During PCR, the fragment is denatured into two separate strands, one of which anneals to the bead. The polymerase then amplifies the annealed DNA, starting from the bead and extending towards the primer site. The original reverse strand is denatured and released from the bead, and then anneals to the bead again to give two separate strands. Repeating this process for 30-60 cycles forms clusters of DNA.
In bridge PCR, the surface of the flow cell is densely coated with primers that are complementary to the primers attached to the DNA library fragments. The DNA attached to the flow cell surface is exposed to reagents for polymerase-based extension. The free ends of the single-stranded DNA attach to nearby primers on the surface, creating a bridge structure. The polymerase then extends along the bridge to form a double-stranded bridge.
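The requirement of one template strand per compartment can be explored with a small calculation. The sketch below assumes, purely for illustration, that templates are distributed among compartments according to a Poisson distribution with mean `lam`; that model and the example values of `lam` are assumptions, not figures from any protocol.

```python
import math

def poisson(k, lam):
    """Probability of exactly k templates in a compartment (Poisson model)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def clonal_fraction(lam):
    """Among compartments holding at least one template, the share holding exactly one."""
    return poisson(1, lam) / (1 - poisson(0, lam))

for lam in (0.1, 0.5, 1.0):
    empty = poisson(0, lam)
    single = poisson(1, lam)
    multiple = 1 - empty - single
    print(f"mean={lam}: empty={empty:.3f} single={single:.3f} "
          f"multiple={multiple:.3f} clonal={clonal_fraction(lam):.3f}")
```

Under this model, diluting the library leaves more compartments empty but makes the occupied ones more likely to be clonal, which is the trade-off behind the "one bead, one strand" requirement.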
After the amplification process, sequencing is performed. Several different sequencing techniques are used during the Next-Generation sequencing process.
Types of Next-Generation Sequencing (NGS)
Several next-generation sequencing methods have been developed by different companies. They include:
Pyrosequencing
Pyrosequencing is a next-generation sequencing method based on the ‘sequencing by synthesis’ principle, in which a complementary strand is synthesized in the presence of a polymerase enzyme. The method detects the release of pyrophosphate when nucleotides are added to the growing DNA chain. Pyrosequencing uses emulsion PCR to construct the polonies required for sequencing and then removes the complementary strand. An ssDNA sequencing primer is hybridized to the end of the strand, and the four different dNTPs are then flowed sequentially in and out of the wells over the polonies.
Pyrophosphate is released when the polymerase incorporates the correct dNTP into the strand. ATP sulfurylase then converts the pyrophosphate into ATP in the presence of adenosine 5’-phosphosulfate (APS). The ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin, which produces light that is detected by a camera. The more bases added in a single flow, the higher the relative intensity of the light.
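The flow-by-flow signal described above can be illustrated with an idealized simulation. The flow order `TACG`, the template, and the assumption that light intensity is exactly proportional to the number of bases incorporated are simplifications chosen for this sketch; the template is written in the order in which the polymerase reads it.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def flowgram(template, flow_order="TACG", max_flows=100):
    """Return (nucleotide, signal) pairs, one per dNTP flow.

    The signal is the number of bases incorporated in that flow, standing in
    for the relative light intensity.
    """
    to_add = [COMPLEMENT[base] for base in template]  # strand being synthesized
    flows, pos = [], 0
    for i in range(max_flows):
        if pos >= len(to_add):
            break
        nucleotide = flow_order[i % len(flow_order)]
        incorporated = 0
        # A run of identical bases is added within a single flow
        while pos < len(to_add) and to_add[pos] == nucleotide:
            incorporated += 1
            pos += 1
        flows.append((nucleotide, incorporated))
    return flows

flows = flowgram("AACGT")
print(flows)

# Rebuild the synthesized strand from the flowgram
read = "".join(nucleotide * signal for nucleotide, signal in flows)
print(read)  # TTGCA
```

The first flow gives a signal of 2 because two consecutive T bases are added at once, which is why the intensity, and not just the presence of light, has to be measured.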
Pyrosequencing, developed by 454 Life Sciences, was one of the earliest successful next-generation sequencing methods. However, other technologies came into existence, and the 454 pyrosequencing platform was discontinued.
Ion Torrent semiconductor sequencing
Ion Torrent semiconductor sequencing uses the ‘sequencing by synthesis’ approach, in which a new strand complementary to the target strand is synthesized one base at a time. A semiconductor chip detects the hydrogen ions produced during DNA polymerization. As in pyrosequencing, the polonies are formed by emulsion PCR, and the DNA fragments are then flooded with one type of dNTP at a time. A dNTP is incorporated only if it is complementary to the next base on the target strand. Each successful nucleotide addition releases a hydrogen ion, and these ions are detected by the sequencer’s pH sensors. As in pyrosequencing, the more hydrogen ions released in a single flow, the larger the change in pH and the stronger the signal.
Ion Torrent sequencing is faster and cheaper than many other methods because it was the first technique that does not use fluorescence or camera scanning. However, it is difficult to count the number of identical bases added consecutively (homopolymer runs).
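The difficulty with consecutive identical bases can be made concrete with a short sketch. It assumes an idealized signal of 1.0 per incorporated base plus some noise, and calls bases by rounding each flow's signal; the flow order and the signal values are invented for illustration and do not come from a real instrument.

```python
def call_bases(flow_order, signals):
    """Turn per-flow signal strengths into a base sequence by rounding."""
    called = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        count = max(0, round(signal))  # nearest whole number of bases
        called.append(nucleotide * count)
    return "".join(called)

def relative_step(n):
    """Fractional signal increase when a run grows from n to n + 1 bases."""
    return 1 / n

signals = [1.05, 0.02, 1.9, 0.1, 0.0, 3.1]  # hypothetical noisy measurements
print(call_bases("TACG", signals))  # TCCAAA

for n in (1, 2, 4, 8):
    print(n, "->", n + 1, f"bases: signal rises by {relative_step(n):.0%}")
```

Going from one base to two doubles the signal, but going from eight to nine raises it by only about 12%, so the same amount of noise is far more likely to cause a miscount in a long run.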
Sequencing by ligation (SOLiD)
Sequencing by ligation is an enzymatic sequencing method that uses DNA ligase, the enzyme that joins double-stranded DNA. ssDNA primer-binding regions (adapters) conjugated to the target sequence are immobilized on beads and amplified by emulsion PCR. The beads are then deposited on a glass surface; achieving a high density of beads increases the throughput of the technique. Once the beads are deposited, a primer of length N is hybridized to the adapter. The beads are then exposed to a library of 8-mer probes, which carry a fluorescent dye at the 5’ end and a hydroxyl group at the 3’ end. Only probes complementary to the target sequence adjacent to the primer hybridize. DNA ligase then joins the 8-mer probe to the primer. The fluorescent dye is cleaved from the fragment using silver ions, which cleave a phosphorothioate linkage between bases 5 and 6 of the probe. The cleavage also generates a 5’ phosphate group that can undergo further ligation. Once the first round of sequencing is complete, the extension product is melted off, and a second round of sequencing is performed with a shorter primer (length N-1).
The SOLiD technique is highly accurate because its two-base encoding method interrogates each base twice, and it is also relatively inexpensive. A single run generally takes about seven days and can produce about 30 Gb of data. Its read length is short compared with other methods, making it unsuitable for many applications.
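The two-base method can be sketched as follows. The code uses the commonly described four-colour scheme, in which each colour reports on a pair of adjacent bases; representing the bases as the numbers 0 to 3 and combining them with XOR is a convenient way to reproduce that table, and the example sequences are invented.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def encode(seq):
    """Convert a base sequence into colours, one per pair of adjacent bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def decode(first_base, colours):
    """Recover the bases from the colours, given the known first base."""
    bases = [first_base]
    for colour in colours:
        bases.append(BASES[CODE[bases[-1]] ^ colour])
    return "".join(bases)

reference = "ATGGC"
variant = "ATCGC"  # same sequence with a single-base change at the third position

ref_colours = encode(reference)
var_colours = encode(variant)
print(ref_colours)               # [3, 1, 0, 3]
print(var_colours)               # [3, 2, 3, 3]
print(decode("A", ref_colours))  # ATGGC

changed = sum(1 for r, v in zip(ref_colours, var_colours) if r != v)
print(changed, "colours differ")
```

A true single-base change alters two adjacent colours, whereas an isolated measurement error alters only one, which is one way to see how reading every base twice helps separate real variants from errors.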
Illumina sequencing
Illumina next-generation sequencing identifies the single bases of a DNA strand and is based on sequencing by synthesis with reversible dye terminators. It can be used for whole-genome sequencing, target region sequencing, transcriptome analysis, metagenomics, small RNA discovery, methylation profiling, and genome-wide protein-nucleic acid interaction analysis.
The workflow of Illumina’s next-generation sequencing includes:
- Library preparation
The genomic DNA is fragmented into pieces 200-500 base pairs long, for example by ultrasonic fragmentation, and 5’ and 3’ adapters are ligated to the ends of these small segments. Alternatively, ‘tagmentation’ combines the fragmentation and adapter ligation reactions into a single step, which greatly increases the efficiency of library preparation. The adapter-ligated fragments are then amplified by PCR and gel purified. Finally, the sequencing library is constructed.
- Cluster generation
The flow cell is the core reaction vessel in which all of the sequencing takes place, and it is a channel for capturing mobile DNA fragments. The surface of each lane of the flow cell already carries many attached adapters. When the DNA fragments in the sequencing library pass through the flow cell, they attach to the lane surface, because these surface adapters match the adapters added to the ends of the DNA fragments during library construction. Each flow cell has eight lanes, and bridge PCR amplification of the DNA takes place on their surface. In theory, there is no interaction between the lanes.
Bridge PCR is then performed using the adapters on the flow cell surface as primers. After repeated cycles of amplification and denaturation, each DNA fragment is clustered into a bundle at its own location, with each cluster containing many copies of a single DNA template. This amplifies the signal intensity of each base to the level required for sequencing. Once the clusters have been generated, the flow cell moves on to the sequencing step.
- Sequencing
Sequencing is based on the ‘sequencing by synthesis’ method. DNA polymerase, connector primers, and four dNTPs carrying base-specific fluorescent labels are added to the reaction system. Because the 3’-OH of these dNTPs is chemically protected, only one base can be added per cycle. After the synthesis reaction is complete, all unused free dNTPs and DNA polymerase are washed away.
The buffer solution required for fluorescence excitation is then added, the fluorescence signal is excited by a laser, and an optical instrument records it. Computer analysis then converts the optical signal into a base call. After the signal is recorded, a chemical reagent is added to quench the fluorescence and remove the 3’-OH protecting group so that the next round of sequencing can be performed.
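The cycle of incorporation, imaging, and cleavage can be sketched as a loop that adds exactly one base per cycle. The dye colours assigned to each base below are placeholders chosen for the example, not the actual fluorophores, and the template is written in the order in which the polymerase reads it.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Hypothetical base-to-dye assignment, for illustration only
DYE_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}

def sequence_by_synthesis(template, cycles):
    """Read up to `cycles` bases, one reversible-terminator base per cycle."""
    read = []
    for cycle in range(min(cycles, len(template))):
        # 1. Incorporation: the blocked 3' end allows only a single base
        incorporated = COMPLEMENT[template[cycle]]
        # 2. Imaging: the laser excites the dye and the camera records its colour
        observed_dye = DYE_FOR_BASE[incorporated]
        # 3. Base calling: the colour is converted back into a base
        read.append(BASE_FOR_DYE[observed_dye])
        # 4. Cleavage: dye and 3' block are removed, so the next loop
        #    iteration can add the following base
    return "".join(read)

print(sequence_by_synthesis("TTTGCA", 4))  # AAAC
```

Because each cycle adds a single base, the run of three identical bases at the start of the template is read over three separate cycles and does not have to be inferred from signal strength.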
- Alignment and data analysis
The sequence reads obtained from the sequencing process are aligned to a reference genome. After alignment, many kinds of bioinformatic analysis are possible, including SNP/CNV/SV calling, annotation and statistics, population genetics analysis, pathway enrichment analysis, and many more.
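The idea of aligning reads and then calling variants can be shown with a deliberately naive sketch. It places each read at the ungapped position with the fewest mismatches and reports positions where the most common read base differs from the reference; the reference, the reads, and the `min_depth` threshold are invented for illustration, and production aligners and variant callers are far more sophisticated.

```python
from collections import Counter, defaultdict

def best_position(read, reference):
    """Return (start, mismatches) for the ungapped placement with fewest mismatches."""
    best = None
    for start in range(len(reference) - len(read) + 1):
        mismatches = sum(1 for a, b in zip(read, reference[start:]) if a != b)
        if best is None or mismatches < best[1]:
            best = (start, mismatches)
    return best

def call_snps(reads, reference, min_depth=2):
    """Pile up aligned reads and report positions whose consensus differs."""
    pileup = defaultdict(Counter)
    for read in reads:
        start, _ = best_position(read, reference)
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    snps = {}
    for pos in sorted(pileup):
        depth = sum(pileup[pos].values())
        base, _ = pileup[pos].most_common(1)[0]
        if depth >= min_depth and base != reference[pos]:
            snps[pos] = (reference[pos], base)
    return snps

reference = "ACGTTGCAAGCTTAGC"
# Hypothetical reads from a sample carrying a C-to-T change at position 6 (0-based)
reads = ["ACGTTGTAAG", "TTGTAAGCTT", "TAAGCTTAGC"]

snps = call_snps(reads, reference)
print(snps)  # {6: ('C', 'T')}
```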
Applications of Next-Generation Sequencing (NGS)
Innovative sample preparation and data analysis options in next-generation sequencing enable a broad range of applications. These include:
- NGS allows labs to sequence whole genomes rapidly.
- It allows researchers to sequence target regions deeply.
- It helps in studying the human microbiome.
- It enables the identification of novel pathogens.
- Sequencing cancer samples makes it possible to study rare somatic variants, tumor subclones, and more.
- Epigenetic factors such as genome-wide DNA methylation and DNA-protein interactions can be analyzed.
- Novel RNA variants and splice sites can be discovered by RNA sequencing.
- It has helped researchers collect vast quantities of genomic sequence data.
- It aids the understanding and diagnosis of complex diseases.
- It helps in understanding the expression of altered genetic variants and their effects on organisms.
- Most known disease-causing mutations in the human genome (commonly estimated at around 85%) are thought to lie in the exome. Exome sequencing therefore helps in finding and understanding these mutations.
- DNA next-generation sequencing supports ‘gene therapy’ by identifying defective genes so that a correct copy of the gene can be provided.
- It also helps in studying the genes that cause cancer and in designing genes that kill cancer cells, an approach called ‘suicide gene therapy’.
- It is revolutionizing the diagnostic stage of personalized medicine.
Advantages of Next-Generation Sequencing (NGS)
- Higher sensitivity for detecting low-frequency variants.
- Faster turnaround time for high sample volumes.
- Comprehensive genomic coverage.
- Lower limit of detection.
- Hundreds to thousands of genes or gene regions can be sequenced simultaneously.
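The link between sequencing depth and sensitivity for low-frequency variants can be illustrated with a simple binomial calculation. The sketch assumes that each read independently carries the variant with probability equal to its allele fraction and that there are no sequencing errors; the depths, the 1% allele fraction, and the five-read threshold are example values only.

```python
from math import comb

def detection_probability(depth, allele_fraction, min_reads=1):
    """Probability of seeing the variant in at least `min_reads` of `depth` reads."""
    missed = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction)**(depth - k)
        for k in range(min_reads)
    )
    return 1 - missed

for depth in (100, 500, 1000, 5000):
    p = detection_probability(depth, allele_fraction=0.01, min_reads=5)
    print(f"depth {depth}: P(at least 5 variant reads) = {p:.3f}")
```

Under these assumptions, a variant present in 1% of molecules is rarely supported by five reads at a depth of 100 but almost always is at a depth of several thousand, which is why deep sequencing of targeted regions helps with rare variants.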
Limitations of Next-Generation Sequencing (NGS)
- The main disadvantage of next-generation sequencing is the infrastructure it requires, such as computing capacity and storage, along with staff to analyze and interpret the resulting data.
- Many scientists find data processing to be a bottleneck, as they are limited by their capacity to process the raw sequence data.
- The technology is still being developed and modified to make sequencing more effective and quicker.
Third generation sequencing
Third-generation sequencing was introduced to remove the need for clonal amplification by sequencing single molecules, for example with single-molecule real-time (SMRT) sequencing. It gives longer read lengths on high-throughput platforms, reduces the errors introduced by PCR, and simplifies library preparation. Examples of third-generation sequencing include the Pacific Biosciences platform, which uses single-molecule real-time sequencing to give read lengths of around one thousand bases or more, and the Helicos BioSciences platform. Neither requires amplification prior to sequencing.
Bisulfite sequencing exploits the difference in reactivity of cytosine and 5-methylcytosine towards bisulfite. Bisulfite deaminates cytosine to form uracil, whereas 5-methylcytosine is unreactive. After double-stranded DNA has been treated with bisulfite, the two strands are no longer complementary and can be treated as single-stranded DNA.
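The bisulfite reaction described above can be mimicked in silico. In this sketch, unmethylated cytosines are written as T after conversion, because uracil is read as thymine once the converted DNA has been amplified and sequenced; the sequence and the methylated position are made up for the example.

```python
def bisulfite_convert(seq, methylated_positions):
    """Convert unmethylated C to T; methylated C (by 0-based position) stays C."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )

def call_methylation(original, converted):
    """Positions where a C in the original sequence is still a C after conversion."""
    return [
        i for i, (before, after) in enumerate(zip(original, converted))
        if before == "C" and after == "C"
    ]

original = "ACGTCCGA"
converted = bisulfite_convert(original, methylated_positions=[1])

print(converted)                              # ACGTTTGA
print(call_methylation(original, converted))  # [1]
```

Comparing the converted read with the untreated reference therefore reveals which cytosines were protected by methylation.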