Project 2: DNA Analysis
Due Dates:
Checkpoint 1: 1/7/14
Final Due Date: 1/12/14
10%
Students will write a program that uses arrays and files to analyze DNA sequences and determine if they represent proteins. Special thanks to Stuart Reges and Marty Stepp of UW for use of this assignment.
I. Background
Deoxyribonucleic acid (DNA) is a complex biochemical macromolecule that carries genetic information for cellular life forms and some viruses. DNA is also the mechanism through which genetic information from parents is passed on during reproduction. DNA consists of long chains of chemical compounds called nucleotides. Four nucleotides are present in DNA: Adenine (A), Cytosine (C), Guanine (G), and
Thymine (T). Certain regions of the DNA are called genes. Most genes encode instructions for building proteins (they're called "protein-coding" genes). These proteins are responsible for carrying out most of the life processes of the organism. Nucleotides in a gene are organized into codons. Codons are groups of three nucleotides and are written as the first letters of their nucleotides (e.g., TAC or GGA). Each codon uniquely encodes a single amino acid, a building block of proteins.
The sequences of DNA that encode proteins occur between a start codon (which we will assume to be
ATG) and a stop codon (which is any of TAA, TAG, or TGA). Not all regions of DNA are genes; large portions that do not lie between a valid start and stop codon are called intergenic DNA and have other
(possibly unknown) function. Computational biologists examine large DNA data files to find patterns and important information, such as which regions are genes. Sometimes they are interested in the percentages of mass accounted for by each of the four nucleotide types. Often high percentages of
Cytosine (C) and Guanine (G) are indicators of important genetic data.
In this assignment, you will write a program that reads named nucleotide sequences from an input file and analyzes them. You will perform several calculations with the end goal of determining whether or not each nucleotide sequence represents a protein. The results will be output to a file, not to the console.
II. Details
Behavior
i. Program Operation
Your program should begin by welcoming the user and providing a brief description of the computations and analysis the program will perform. You will then prompt the user for an input file and an output file (see below for required file formats). For each nucleotide sequence in the input file, your program will compute and output the following:
● the number of each nucleotide (A, C, G, T) in the sequence
● the percentage of the sequence’s total mass accounted for by each nucleotide
AP Computer Science, Mr. Brett Wortzman
12/16/2014
Issaquah High School
● the list of codons present in the sequence
● whether or not this sequence represents a protein (according to our rules)
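The counting and codon steps in the list above can be sketched in Java. This is a minimal illustration, not the required solution: the class name `SequenceStats`, the method names, and the A, C, G, T ordering of the counts array are all assumptions made for this example. It also assumes the sequence is uppercase, ignores any character that is not one of the four nucleotides, and drops leftover characters when the length is not a multiple of three.

```java
import java.util.Arrays;

// Hypothetical helper class; the names are illustrative, not part of the assignment.
class SequenceStats {
    // Assumed ordering of the counts array: index 0 = A, 1 = C, 2 = G, 3 = T.
    static final String NUCLEOTIDES = "ACGT";

    // Counts how many times each nucleotide appears in the sequence.
    // Characters other than A, C, G, and T are skipped.
    static int[] countNucleotides(String sequence) {
        int[] counts = new int[NUCLEOTIDES.length()];
        for (int i = 0; i < sequence.length(); i++) {
            int index = NUCLEOTIDES.indexOf(sequence.charAt(i));
            if (index >= 0) {
                counts[index]++;
            }
        }
        return counts;
    }

    // Splits the sequence into consecutive groups of three characters.
    // Any one or two leftover characters at the end are dropped.
    static String[] findCodons(String sequence) {
        String[] codons = new String[sequence.length() / 3];
        for (int i = 0; i < codons.length; i++) {
            codons[i] = sequence.substring(3 * i, 3 * i + 3);
        }
        return codons;
    }

    public static void main(String[] args) {
        String sequence = "ATGCCACTATGGTAG"; // made-up sample sequence
        System.out.println(Arrays.toString(countNucleotides(sequence)));
        System.out.println(Arrays.toString(findCodons(sequence)));
    }
}
```

For the made-up sample sequence, `main` prints `[4, 3, 4, 4]` followed by `[ATG, CCA, CTA, TGG, TAG]`. Keeping the counts in a fixed-order array means a parallel array of masses can be indexed the same way.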
For our purposes, a nucleotide sequence is a protein gene if:
● it begins with a valid start codon (ATG),
● it ends with a valid stop codon (TAA, TAG, or TGA),
● it contains at least 5 codons total (including the start and stop codons), and
● Cytosine (C) and Guanine (G), combined, account for at least 30% of the sequence’s mass
Note that these are not the actual constraints used by computational biologists to identify proteins; they are approximations for our assignment.
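The four rules above can be sketched in Java as one possible approach. The class name `ProteinCheck`, the method names, and the A, C, G, T array ordering are assumptions made for this example. The nucleotide masses are passed in as a parameter, and the values in `main` are placeholders only; substitute the masses given in the assignment.

```java
// Hypothetical helper class; the names are illustrative, not part of the assignment.
class ProteinCheck {
    // Returns the percentage of total mass contributed by each nucleotide.
    // counts and masses must use the same ordering (assumed here: A, C, G, T).
    static double[] massPercentages(int[] counts, double[] masses) {
        double totalMass = 0.0;
        for (int i = 0; i < counts.length; i++) {
            totalMass += counts[i] * masses[i];
        }
        double[] percentages = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            // Guard against an empty sequence to avoid dividing by zero.
            percentages[i] = totalMass == 0.0 ? 0.0 : 100.0 * counts[i] * masses[i] / totalMass;
        }
        return percentages;
    }

    // Applies the four rules: start codon, stop codon, at least 5 codons,
    // and C plus G accounting for at least 30% of the mass.
    static boolean isProtein(String[] codons, double[] percentages) {
        if (codons.length < 5) {
            return false;
        }
        String first = codons[0];
        String last = codons[codons.length - 1];
        boolean validStart = first.equals("ATG");
        boolean validStop = last.equals("TAA") || last.equals("TAG") || last.equals("TGA");
        boolean enoughCG = percentages[1] + percentages[2] >= 30.0; // indices 1 = C, 2 = G
        return validStart && validStop && enoughCG;
    }

    public static void main(String[] args) {
        String[] codons = {"ATG", "CCA", "CTA", "TGG", "TAG"}; // made-up sample
        int[] counts = {4, 3, 4, 4};                           // A, C, G, T counts for the sample
        double[] masses = {1.0, 1.0, 1.0, 1.0};                // PLACEHOLDER masses, not real values
        double[] percentages = massPercentages(counts, masses);
        System.out.println(isProtein(codons, percentages));
    }
}
```

With the placeholder masses, C and G together account for 7 of the 15 mass units (about 46.7%), so the sample satisfies all four rules and `main` prints `true`. Checking the codon count first also keeps the first and last lookups safe when the codon array is empty.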
The masses for each nucleotide, used for calculating the mass percentages, are as follows:
ii. Input File Format
Input files for your DNA program will consist of a series of pairs of lines. The first line in each pair will be a name, and the second will be a nucleotide sequence. You can assume that all input files will contain an even