The term bioinformatics combines two words, "bio" and "informatics", and the field is often described as a hybrid science. It draws on topics from life science, physics, mathematics, and computer science. A working knowledge of genetics, genomics, biology, microbiology, biochemistry, biotechnology, nanotechnology, and physics is helpful for getting into bioinformatics. Within half a decade, it is expected to be in demand in many sectors, from the pharmaceutical and medical industries to the food industry.
- Bioinformatics involves gathering data from different open sources, then collecting and storing that biological information.
- Bioinformatics is the use of computational tools to arrange, analyze, understand, compute, visualize, and store information associated with proteomics, genomics, metagenomics, transcriptomics, epigenetics, and biological macromolecules.
- Bioinformatics is also used for drug discovery (identifying the target binding site for novel or repurposed drugs) and personalized medicine (identifying the mutated base pairs and regions in DNA sequences).
- Bioinformatics is the use of computers to extract and arrange biological data such as amino-acid and DNA sequences, or annotations about those sequences. It compares the sequences of genes, RNA, proteins, or DNA within the same organism or between different organisms to study evolutionary patterns and the function of the proteins and DNA involved.
The most important part is knowing where the data are located and how to extract them. Beyond that, open-source software is useful for computing and analyzing the data.
What are Databases?
Data are extracted from open-access platforms such as NCBI (National Center for Biotechnology Information), DDBJ (DNA Data Bank of Japan), and EMBL-EBI (European Bioinformatics Institute). These are all sources of DNA sequence data. There are also other databases from which data on RNA, proteins, genomes, and more can be extracted.
A database is a platform where structured data or information is stored electronically on a computer. The stored data, files, or information are also updated over time.
A database is an important system for properly storing, searching, and retrieving any type of data.
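The store, search, retrieve, and update operations described above can be sketched with Python's built-in `sqlite3` module. This is only a minimal illustration: the table layout, the function names, and the `DEMO…` accessions are all hypothetical and do not correspond to any real biological database.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database holding a few made-up sequence records."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE sequences ("
        "accession TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
    )
    # Hypothetical records; the accessions below are not real identifiers.
    records = [
        ("DEMO0001", "Escherichia coli", "ATGGCTAGC"),
        ("DEMO0002", "Homo sapiens", "ATGCCCTGA"),
        ("DEMO0003", "Homo sapiens", "ATGAAATAG"),
    ]
    connection.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)
    connection.commit()
    return connection


def search_by_organism(connection, organism):
    """Search: return the accessions stored for one organism."""
    rows = connection.execute(
        "SELECT accession FROM sequences WHERE organism = ? ORDER BY accession",
        (organism,),
    ).fetchall()
    return [row[0] for row in rows]


def retrieve_sequence(connection, accession):
    """Retrieve: return the sequence for an accession, or None if absent."""
    row = connection.execute(
        "SELECT sequence FROM sequences WHERE accession = ?", (accession,)
    ).fetchone()
    return row[0] if row else None


def update_sequence(connection, accession, new_sequence):
    """Update: replace the stored sequence for an existing record."""
    connection.execute(
        "UPDATE sequences SET sequence = ? WHERE accession = ?",
        (new_sequence, accession),
    )
    connection.commit()
```

Real biological databases are far larger and use richer schemas, but the same basic operations apply.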
Types of Biological Databases
There are three different types of biological databases on the basis of data storage.
1. Primary Database:
- It can also be called an archival or original database, since the data come directly from the primary source and are the results of experiments performed by scientists. These experimental data can be genome structure, macromolecular structure, protein structure, or data on different metabolites and biomarkers.
- These data are made accessible to all other users without any changes or modifications.
- Each entry is given an accession ID when it enters the database. Users enter the accession ID to get the detailed information for that record.
Examples of a primary database:
- Nucleic Acid Database: GenBank, DDBJ.
- Protein Database: PDB, PIR, SwissProt, TrEMBL, MetaCyc, etc.
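As a rough sketch of an accession-based lookup, the snippet below builds a request to the NCBI E-utilities `efetch` endpoint and downloads the record in FASTA format. The function names are illustrative, and `U49845` is used only as an example accession. The fetch itself needs network access, and NCBI's usage guidelines (request rate limits, optional API key) should be checked before running it at scale.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nucleotide"):
    """Build the efetch URL that requests one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession, db="nucleotide"):
    """Download the FASTA text for an accession (requires network access)."""
    with urlopen(build_efetch_url(accession, db), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Example accession only; replace it with the record of interest.
    print(build_efetch_url("U49845"))
```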
2. Secondary Database:
- It is also called a derivative database, because secondary databases are built from primary databases.
- Data from primary databases are analyzed with computer algorithms, and the results are stored in the secondary database.
- Secondary databases are often more valuable than primary databases, because their data have already been analyzed in a series of steps.
Examples of a secondary database:
- Prosite, Prints, Blocks
- InterPro (domains, motifs, and protein families)
- UniProt Knowledgebase (sequence and functional information on proteins)
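To make the idea of deriving secondary data from primary sequences concrete, the sketch below converts a PROSITE-style pattern into a regular expression and scans a protein sequence for it. It handles only the basic pattern syntax (`x`, `[...]`, `{...}`, repeat counts, and the `<`/`>` anchors) and is an illustration, not the software PROSITE itself uses. The pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site motif; the test sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":  # x means any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        parts.append(prefix + core + ("{" + repeat + "}" if repeat else "") + suffix)
    return "".join(parts)


def find_motifs(pattern, sequence):
    """Return (1-based position, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Made-up sequence scanned for the N-glycosylation site pattern.
    print(find_motifs("N-{P}-[ST]-{P}", "MKNASAPNPTQ"))
```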
3. Composite Database:
- A composite database combines several primary databases, which are merged on the basis of certain conditions.
- Sequences can be searched quickly in this kind of database, since there is no need to query each source separately.
Examples of the composite database:
- NRDB, OWL, and Swiss-Prot + TrEMBL
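One simple merging condition is non-redundancy: identical sequences from different sources are stored once, with all of their source identifiers attached. The sketch below shows that idea under stated assumptions; the source names, identifiers, and sequences are hypothetical, and real composite databases apply more elaborate rules.

```python
def merge_nonredundant(*sources):
    """Merge (source_name, records) pairs into one non-redundant collection.

    Each record is an (identifier, sequence) tuple. The result maps every
    distinct sequence to the list of source-tagged identifiers that carry it.
    """
    merged = {}
    for source_name, records in sources:
        for identifier, sequence in records:
            key = sequence.upper()  # treat upper and lower case as the same sequence
            merged.setdefault(key, []).append(f"{source_name}:{identifier}")
    return merged


if __name__ == "__main__":
    # Hypothetical sources and records, for illustration only.
    source_a = ("SourceA", [("A1", "MKTAYIAK"), ("A2", "MSDNE")])
    source_b = ("SourceB", [("B1", "mktayiak"), ("B2", "MLLPQ")])
    for sequence, identifiers in merge_nonredundant(source_a, source_b).items():
        print(sequence, identifiers)
```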
List of Biological Databases
NCBI (National Center for Biotechnology Information) hosts many different databases for bioinformatics. Some of them are listed below:
- 1000 Genomes Browser:
The 1000 Genomes Browser is a graphical web viewer that allows users to explore genotype calls, variant calls, and aligned sequence reads produced by the 1000 Genomes Project.
- BLAST:
The Basic Local Alignment Search Tool (BLAST) is one of the most widely used tools for sequence analysis. It finds regions of similarity between biological sequences. The program compares protein or nucleotide sequences with the sequences available in a database and calculates the statistical significance of the matches. BLAST can also be used to infer evolutionary relationships between sequences.
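To show what a region of local similarity means, here is a minimal sketch of Smith-Waterman local alignment scoring. This is not how BLAST is implemented: BLAST uses a faster seed-and-extend heuristic, while Smith-Waterman is the exact dynamic-programming method for the same local alignment problem. The scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    previous = [0] * (len(b) + 1)  # scores for the previous row of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]  # first column is always zero
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


if __name__ == "__main__":
    # Two made-up sequences that share the short region ACGT.
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

A higher score means a longer or better-conserved shared region. BLAST additionally reports how likely such a score is to arise by chance, which this sketch does not attempt.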
- dbSNP:
dbSNP stands for the Single Nucleotide Polymorphism Database. It includes single nucleotide variations, microsatellites, and small insertions and deletions. dbSNP contains genotype data, molecular context, and mapping information for both neutral variations and clinical mutations.
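A single nucleotide variation of the kind described above can be illustrated by comparing a sample sequence against a reference, position by position. The sketch assumes the two sequences are already aligned and of equal length, so it reports substitutions only and ignores insertions and deletions. The function name and the sequences are made up for the example.

```python
def find_single_nucleotide_variants(reference, sample):
    """List (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


if __name__ == "__main__":
    # Made-up reference and sample sequences differing at two positions.
    print(find_single_nucleotide_variants("ACGTACGT", "ACGTTCGA"))
```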
- ClinVar:
ClinVar is a freely accessible database of reports on the relationships among human variations and phenotypes, with supporting evidence. It relates human variations to observed health status.
- dbVar:
dbVar is a database of genomic structural variation, developed to hold information on large variants, including deletions, duplications, insertions, inversions, multinucleotide substitutions, translocations, mobile element insertions, and complex chromosomal rearrangements. It also stores variants associated with phenotypic information.
- dbGaP:
The Database of Genotypes and Phenotypes (dbGaP) provides information about the interaction of genotype and phenotype in humans. It covers studies such as medical resequencing, molecular diagnostic assays, genome-wide association, and associations between genotype and non-clinical traits.
- Nucleotide:
A nucleotide is the building block of nucleic acids (RNA and DNA). The Nucleotide database collects sequences from several sources, including RefSeq, GenBank, PDB (Protein Data Bank), and the Third Party Annotation (TPA) database. Nucleotide sequence data can be extracted from this database.
- Protein:
The Protein database is a collection of sequences from different sources, including translations from annotated coding regions in RefSeq, GenBank, and TPA, as well as sequences from SwissProt, PIR, PRF, and PDB. Protein sequences help to determine the biological structure and function of a specific protein.
- MedGen:
MedGen serves as a portal that organizes information related to human medical genetics. It is designed to be particularly useful to health care professionals dealing with the genetic aspects of patient care.
- Genome:
The Genome database holds data for completely sequenced organisms as well as for those whose sequencing is still in progress. Its information includes maps, sequences, chromosomes, assemblies, and annotations. It covers organisms such as bacteria, archaea, and eukaryotes, as well as plasmids, viroids, phages, and organelles.
- Gene:
The Gene database contains information such as nomenclature, chromosomal localization, gene products, protein interactions, phenotypes, links to citations, expression, maps, homologs, protein domain content, and external databases. It also includes RefSeqs (reference sequences), pathways, and locus-specific resources.
Additional Literature Databases for Bioinformatics
The following is an additional list of literature databases:
- PubMed:
PubMed is a search engine for accessing medical and health literature. It comprises more than 32 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may link to full-text content from PubMed Central (PMC), a free full-text archive of biomedical and life science journals, and from publisher websites.
- Embase:
Embase is an Elsevier database. Although it is completely separate from PubMed and MEDLINE, it includes all the articles indexed in MEDLINE. It contains more than 32 million citations from 8,500 journals from at least 90 countries, dating from 1947 to the present, as well as 2.4 million conference abstracts. It covers drug and pharmaceutical journals too.
- BioMed Central:
BioMed Central (BMC) is a publisher and search platform for high-quality, peer-reviewed journals, including titles such as BMC Biology and BMC Medicine, and journals related to the microbiome and malaria. BMC has an evolving portfolio of open-access journals that share innovation and research in technology, science, medicine, and engineering.
- PLoS Biology:
PLoS stands for Public Library of Science. PLoS Biology is an open-access, peer-reviewed journal that publishes work from all areas of biological science, including its intersections with other disciplines such as mathematics, chemistry, and medicine.
- ScienceDirect:
ScienceDirect is an online search website where papers related to medicine, microbiology, computational biology, and other fields can be searched. It is operated by the scholarly publisher Elsevier and provides access to content from Elsevier and its affiliates.
- BioOne:
BioOne is a literature database that provides access to research in the biological, ecological, and environmental sciences, published by non-profit organizations in the scientific community.
- Web of Science:
Web of Science, previously known as Web of Knowledge, is a database of citations and abstracts from scholarly literature in the sciences, social sciences, arts, and humanities. It holds records of publications from 1900 to the present. When searching for a particular topic, one can view an article's citations and see how many times the article or research paper has been cited. The benefit of this platform is that it retrieves articles from many disciplines in a single search and gives access to multiple papers.
These are a few of the literature search databases that give access to papers and articles of interest. They cover publications related to medicine, biology, microbiology, the arts, the humanities, and more.