Lecture 9
Bioinformatics
Describe content of Genbank & how sequences are annotated within Genbank
Perform a BLAST search w/ a DNA sequence & interpret the results
PCR, part 1
Understand the theory of PCR & how it compares to DNA replication in vivo
Know the reagents & steps of PCR & their purposes
Bioinformatics
Bioinformatics is a term used to describe the databases & computational tools used in biology, w/ an emphasis on molecular biology.
A biological database is a large, organized body of data, usually associated w/ computerized software designed to update, query, & retrieve the data stored within the system.
Bioinformatics databases
There are many different types of databases:
DNA databases (nucleotide sequence)
Protein databases (sequence, structure)
Organism specific databases
There are three main nucleotide databases that form the International Nucleotide Sequence Database Collaboration: GenBank, EMBL, & DDBJ. These databases exchange information routinely.
Protein databases include Swiss-Prot, PDB, & sequences that were created from translations of coding regions in DNA sequences stored in GenBank.
GenBank
GenBank serves as a public access repository for all available DNA sequences. Submitting scientists retain complete editorial control over their sequences & contact NCBI (see below) if they wish to make any modifications to their sequence records. GenBank sequence records are owned by the original submitter & cannot be altered by a third party.
Each sequence submitted to GenBank is given a unique “accession number” that identifies that record.
GenBank can include redundant entries, even hundreds of records for the same gene or variants of the same gene, & some entries may contain errors in their sequence data.
GenBank is managed by NCBI (National Center for Biotechnology Information) which is a part of the US National Library of Medicine.
Looking for Similar Sequences
An important goal of genomics is to determine if a particular sequence is similar to other known sequences.
This is accomplished by comparing the new sequence w/ sequences that have already been reported & stored in a database.
This process uses alignment procedures to uncover the “like” sequences in the database.
BLAST
BLAST (Basic Local Alignment Search Tool) was developed by Altschul et al. (1990. J Mol Biol 215:403).
BLAST is a set of algorithms that attempt to find short fragments of a query sequence (the sequence you input) that align (match) w/ a subject sequence found in a database.
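To make the idea of finding a short matching fragment concrete, here is a minimal sketch of a seed-and-extend search on two DNA strings. It is a toy illustration, not the BLAST implementation: the function names and the word size of 4 are assumptions made for this example, and the extension step only continues through identical bases, whereas real BLAST scores mismatches & reports statistics such as E-values.

```python
def find_seeds(query, subject, word_size=4):
    """Yield (query_pos, subject_pos) for every exact word shared by both sequences."""
    # Index every word of length word_size in the subject sequence.
    index = {}
    for i in range(len(subject) - word_size + 1):
        index.setdefault(subject[i:i + word_size], []).append(i)
    # Look up each query word in that index.
    for q in range(len(query) - word_size + 1):
        for s in index.get(query[q:q + word_size], []):
            yield q, s


def extend_seed(query, subject, q, s, word_size):
    """Extend a seed left and right (no gaps) while the bases stay identical."""
    left = 0
    while q - left - 1 >= 0 and s - left - 1 >= 0 \
            and query[q - left - 1] == subject[s - left - 1]:
        left += 1
    right = word_size
    while q + right < len(query) and s + right < len(subject) \
            and query[q + right] == subject[s + right]:
        right += 1
    return q - left, s - left, left + right  # (query start, subject start, length)


def best_local_match(query, subject, word_size=4):
    """Return the longest ungapped exact match found from any seed, or None."""
    best = None
    for q, s in find_seeds(query, subject, word_size):
        hit = extend_seed(query, subject, q, s, word_size)
        if best is None or hit[2] > best[2]:
            best = hit
    return best


query = "TTACGGATCCA"
subject = "GGGACGGATCCTTT"
q_start, s_start, length = best_local_match(query, subject)
print(query[q_start:q_start + length])  # ACGGATCC
```

The query & subject share the fragment ACGGATCC, which is found even though the ends of the two sequences differ. That is the sense in which the alignment is "local".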
BLASTn alignment result
PCR
The Polymerase Chain Reaction (PCR) provides an extremely sensitive means of amplifying a short region of DNA into a relatively large quantity of copies.
PCR was first described in 1985, & Kary Mullis received the Nobel Prize for it in 1993. The technique was made possible by the discovery of Taq polymerase, the DNA polymerase I of the bacterium Thermus aquaticus, which lives in hot springs.
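A rough sense of how much PCR amplifies can be had from the idealized doubling model, in which every target copy is duplicated once per cycle. The sketch below assumes that model; the function name & the efficiency parameter are illustrative choices, & real reactions fall short of perfect doubling & eventually plateau as reagents run out.

```python
def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Idealized PCR yield: each cycle multiplies the copy number by (1 + efficiency).

    efficiency = 1.0 means perfect doubling every cycle (an assumption).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1 + efficiency) ** cycles


# One starting molecule after 30 ideal cycles:
print(f"{copies_after_cycles(1, 30):,.0f}")  # 1,073,741,824
```

Under this model a single template molecule gives about a billion copies after 30 cycles, which is why PCR is so sensitive to tiny amounts of starting DNA.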
The primary materials, or reagents, used in PCR are:
1. DNA nucleotides (dNTPs), the building blocks for the new DNA
2. Template DNA, which includes the DNA sequence that you want to amplify
3. Short DNA primers, which hybridize to sequences flanking the region to be amplified
4. DNA polymerase, a heat stable enzyme that catalyzes the synthesis of new DNA
5. Mg2+ (cofactor for DNA polymerase) and buffer for correct pH
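How the two primers define the amplified region can be sketched in code. The example below is a simplified illustration under stated assumptions: the template is given as one strand written 5' to 3', primers must match exactly, & the sequences & function names are made up for this example. The forward primer matches the given strand directly, while the reverse primer hybridizes to it, so its binding site appears in the given strand as the primer's reverse complement.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def predict_amplicon(template, forward_primer, reverse_primer):
    """Return the region bounded by the two primers, or None if either site is missing.

    All sequences are written 5' to 3'; only exact matches are considered.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer binds the given strand, so search for its reverse complement.
    reverse_site = reverse_complement(reverse_primer)
    site = template.find(reverse_site, start + len(forward_primer))
    if site == -1:
        return None
    return template[start:site + len(reverse_site)]


template = "GGTTACCATGGCAAGCTTAGCTGACTGATCCGTAA"  # hypothetical template
amplicon = predict_amplicon(template, "TACCATG", "GGATCAG")
print(amplicon)       # TACCATGGCAAGCTTAGCTGACTGATCC
print(len(amplicon))  # 28
```

The product begins w/ the forward primer sequence & ends w/ the reverse complement of the reverse primer, so the primers themselves set the boundaries of what gets amplified.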
When DNA is replicated in a bacterium in vivo, which of the following occurs?
A. DNA polymerase creates a new DNA strand in the 3’ to 5’ direction
B. a primase creates a short RNA strand that is complementary to a region on the template DNA strand
C. DNA polymerase begins synthesis of a new DNA strand at the promoter
Primers: small pieces of RNA, about 5-10 nucleotides long, made by a form of RNA polymerase called primase.
Primase, unlike DNA polymerase, doesn't require a pre-existing nucleotide to begin synthesis of the RNA primer. The primase synthesizes a short piece of RNA that is complementary to the template DNA strand and forms H-bonds with it. This initiates DNA polymerase starting