AHN Lab 4
Bioinformatics and Library Approaches to Interpreting Gene Function
Lila Hoachlander-Hobby and Ellie Davis
Date: April 10, 2017
Bioinformatics merges genomic research with information technology, creating computer databases and tools that help scientists better understand genes and their functions. It encompasses data analysis, modeling of biological systems, algorithms, and statistics (Thampi 2003). It has been a revolutionary scientific advancement, enabling breakthrough research like the Human Genome Project, which could not have been completed without bioinformatics' ability to rapidly assemble and analyze sequence data. One of the most widely used databases for Arabidopsis thaliana research is The Arabidopsis Information Resource, or TAIR, which provides a locus page for each gene. This database holds the entire genome of Arabidopsis thaliana, including sequence, structure, and patterns of expression, along with a plethora of other useful information. A more general, widely used resource is NCBI's Basic Local Alignment Search Tool, or BLAST. This program allows scientists to find similarities between experimental and reference sequences, and it can be used to identify proteins, targeted genomic sections, and RNA and DNA nucleotide sequences (Porterfield 2014). In this lab we used TAIR and BLAST, along with online library research, to learn more about our gene's function and how it contributes to protein expression in Arabidopsis thaliana.
Materials and Methods:
Basic Local Alignment Search Tool (BLAST): 3/9/17
The purpose of this lab was to compare our unknown gene to the sequences of genes with known functions in order to identify conserved sequences that would allow us to draw conclusions about our gene's function. This lab is an example of bioinformatics because we used publicly available software and library resources to identify sequences homologous to our gene and protein sequences. We used TAIR and NCBI BLAST. This is important because it gives us information on how and where our gene may function. The methods were followed as outlined in the lab manual. We were instructed to make a TAIR account. Our username is AHNGene and our password is EllieLila.
Library Research: 3/28/17
The purpose of this lab was to use library resources to understand our protein family. We used NCBI BLAST to locate conserved domains, and then investigated our protein using resources from the European Bioinformatics Institute. This lab was important because it allowed us to draw several conclusions about our gene and protein, and we were able to identify our protein's superfamily, which will allow us to do further research. The lab was followed as stated in the lab manual.
Representing Sequence Similarity Data: 3/30/17
The purpose of this lab was to finalize our predictions about the putative function of our gene and protein. The information gleaned from the last lab, namely that our protein is part of the cupredoxin superfamily and is highly homologous to ascorbate oxidase, allowed us to perform this lab. We used the NCBI PubMed database to search for cupredoxin in plants. This led us to a paper: "Crystal structure of plantacyanin, a basic blue cupredoxin from spinach" by Einsle et al. This paper helped us understand the putative structure and function of our protein because of our protein's homology to this known protein. After reading this paper, we decided to search for a textbook excerpt to better understand our protein. We used our homology with ascorbate oxidase to guide our search, which we performed through the UMass Library. We found Albrecht Messerschmidt's textbook, Bioinorganic Chemistry of Copper, and the chapter called "Ascorbate Oxidase Structure and Function." We used this information to construct our figures, in addition to another paper found through NCBI, "Multi-copper oxidases and human iron metabolism" by Vashchenko et al.
Results from TAIR Locus page
Using TAIR, we found that our gene is involved in cellular oxidation-reduction processes and is part of the cupredoxin superfamily (Figure 1). This means that it codes for a protein that aids in the copper ion binding of multicopper oxidases, which reduce oxygen to water while oxidizing a substrate, using copper as a cofactor. We also found that it is expressed during the globular stage of embryo development and that the protein functions extracellularly, in the cell wall or in the plasmodesma. Our gene has 8 exons, or coding regions, and its locus is AT1G21850.
By using the NCBI BLAST program, we were able to see a graphical representation of how homologous our gene is to genes of other organisms (Figure 2). Based on the graph, segments of our gene appear to be highly conserved, because there are many red segments, which indicate alignment scores greater than 200. The second, third, fourth, and fifth columns match up with exons 3, 5, 6, and 8, respectively. The first column has lower alignment scores, so we will not focus on those hits. Overall, we found that the second column had sequences that matched sequences on chromosomes 1, 4, and 5 of Arabidopsis's five chromosomes, and columns 3 and 5 had sequences that matched chromosomes 1 and 5. This is useful to know when picking primers, because primers picked in these areas would likely hybridize to multiple positions in the Arabidopsis genome and so would not be useful for future PCR.
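The primer concern above can be illustrated with a minimal sketch that counts how many places a candidate primer matches exactly, on either strand, in a set of sequences. The function names, the primer, and the toy "chromosome" strings are all hypothetical and are not our actual sequences. Real primer checks also need to consider near matches and melting temperature, which this sketch ignores.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def count_binding_sites(primer, sequences):
    """Count exact primer matches on both strands of each named sequence."""
    primer = primer.upper()
    rc = reverse_complement(primer)
    hits = {}
    for name, seq in sequences.items():
        seq = seq.upper()
        n = count_overlapping(seq, primer)
        if rc != primer:  # avoid double counting palindromic primers
            n += count_overlapping(seq, rc)
        hits[name] = n
    return hits


# Hypothetical toy data, for illustration only.
toy_sequences = {
    "chr1": "CCGATTACAGGTTGATTACACC",  # two forward-strand sites
    "chr4": "AATGTAATCGG",             # one reverse-strand site
    "chr5": "ACGTACGTACGT",            # no sites
}
hits = count_binding_sites("GATTACA", toy_sequences)
total_sites = sum(hits.values())
print(hits, "specific" if total_sites == 1 else "not specific")
```

A primer with more than one site in total, like the toy primer here, would be a poor choice for PCR because it could anneal in several places.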
Examples of homologous organisms (according to gene sequence):
Arabidopsis lyrata, locus: XM_002893140, 96% identity
Capsella rubella, locus: XM_006304305, 92% identity
Raphanus sativus, locus: XM_018578444, 87% identity
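Percent identity values like those listed above are generally the number of identical positions divided by the length of the alignment. The sketch below shows that arithmetic on two already-aligned sequences. The function name and the short example sequences are hypothetical, and the sketch assumes the alignment has already been made, with gaps written as "-". It does not perform the alignment itself as BLAST does.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences have the same residue.

    Both inputs must be the same length; gap characters ("-") never count
    as matches but do count toward the alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical aligned fragments, for illustration only.
print(percent_identity("ATGGCTTCAG", "ATGGCATCAG"))  # one mismatch in ten columns
print(percent_identity("ATG-CTA", "ATGGCTA"))        # one gap in seven columns
```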
For this portion of the lab, we also used NCBI BLAST, but with protein sequences instead of gene sequences. The results of the search show that our protein is likely part of the cupredoxin superfamily and is highly conserved among other organisms (Figure 3).
Examples of homologous organisms (according to protein sequence):
Arabidopsis lyrata locus: XP_002893186
Eutrema salsugineum locus: XP_006416260
Camelina sativa locus: XP_010480557
Library Research / Our Data Representation
Based on NCBI results, our gene likely has a function homologous to that of ascorbate oxidase, an enzyme that catalyzes the oxidation of ascorbate in times of oxidative stress (Batth 2017). Ascorbate oxidase is often found in the cell wall or cytoplasm, so the protein that our gene codes for would likely be found in the same regions because the two have homologous functions (Messerschmidt 1992) (Figure 5). We know that our gene is part of the cupredoxin superfamily, which will help us learn more about its function. Cupredoxins are small, blue copper proteins (Kosman 2010) (Figure 4). A subset of the superfamily are enzymes called multicopper oxidases (MCOs), which have four copper centers and play essential roles in the physiology of all aerobes (Kosman 2010). When copper binds to the four sites of an MCO it acts as a cofactor, allowing the enzyme to catalyze a redox reaction in which oxygen is reduced to two molecules of water using 4 electrons (Kosman 2010) (Figure 6). This family of enzymes plays a large role in the management of oxidative stress and the maintenance of homeostasis in aerobes.
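The reaction described above, in which four electrons yield two molecules of water, can be written as the standard half-reaction for the reduction of oxygen. This is the general textbook stoichiometry, shown here as an illustration rather than as a mechanism confirmed for our protein:

```latex
\[
\mathrm{O_2} + 4\,\mathrm{H^+} + 4\,e^- \longrightarrow 2\,\mathrm{H_2O}
\]
```

The four electrons come from the substrate being oxidized, which for ascorbate oxidase is ascorbate.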
The high level of homology and conservation we found between our gene and protein and those of other plants indicates that our gene is important. Additionally, the homology is mostly limited to plant species. Our gene is part of the cupredoxin superfamily, whose members reduce Cu2+ ions to Cu+ to avoid copper toxicity. Cupredoxins are also implicated in maintaining oxygen and metal homeostasis and managing oxidative stress. After further research, we discovered that our protein is highly homologous to ascorbate oxidase (AAO). This particular cupredoxin is implicated in healthy growth, defense, cell wall formation, and stress responses. Some studies indicate that AAO is particularly important in pollen tube and early embryo development. With this information in mind, we hypothesize that our mutant plant may not be able to develop healthy offspring, may not be able to handle oxidative stress, may succumb to metal toxicity, and may have difficulty forming a healthy cell wall. Additionally, we hypothesize that our protein may be localized to the cell wall, which was further indicated by the Einsle et al. paper. We also hypothesize that our gene will be highly expressed during embryogenesis, pollen tube development, and stress responses.
"Blank Plant Cell Diagram 2." Piper Pages. Altoona Area Junior High, n.d. Web. 30 Mar. 2017.
Messerschmidt A. Ascorbate oxidase structure and chemistry. Journal of Inorganic Biochemistry. 1992;47(3):23. http://www.sciencedirect.com/science/article/pii/0162013492840944. doi: 10.1016/0162-0134(92)84094-4.
Porterfield A. How does BLAST work? Bitesize Bio 2014 July 23. http://bitesizebio.com/21223/how-does-blast-work/
Vashchenko G, MacGillivray R. Multi-copper oxidases and human iron metabolism. Nutrients. 2013;5(7):2289-2313. http://www.mdpi.com/2072-6643/5/7/2289/htm.
Thampi S. Bioinformatics. 2003. https://arxiv.org/pdf/0911.4230.pdf
Kosman DJ. Multicopper oxidases: A workshop on copper coordination chemistry, electron transfer, and metallophysiology. Journal of Biological Inorganic Chemistry. 2010 Jan;15(1):15-28.
Batth R, Singh K, Kumari S, Mustafiz A. Transcript profiling reveals the presence of abiotic stress and developmental stage specific ascorbate oxidase genes in plants. Frontiers in Plant Science. 2017 Feb 17.