aspark's blog

Draft: Lab 2 Results III

Submitted by aspark on Tue, 03/26/2019 - 16:02

In A. thaliana, the SRF genes code for LRR receptor-like kinases (LRR-RLK), although the functions of SRF1-9 vary (Eyüboglu). SRF7 specifically showed increased expression in plants treated with brassinosteroids, hormones that stimulate plant growth. SRF7 also displayed a strong association with proteins involved in cell wall biogenesis, and it is hypothesized to play a role in cell wall production (Eyüboglu). Additionally, the expression of SRF7 was found to have a positive correlation with leaf size. A loss of function in the SRF7 gene resulted in smaller leaves, and overexpression of SRF7 resulted in overly large leaves (Baute). SRF7 was also identified as a kinase that was downregulated in plants upon treatment with methylglyoxal, a toxic substance known for inhibiting the growth and development of plants (Kaur).

Bradi1g72430 was identified as a protein kinase phosphorylated during the growth and development of seedlings (Lv). Bradi1g72430 is also co-expressed with phenylalanine ammonia-lyase (PAL), an enzyme essential to lignin production (Harrington). Lignin is a polymer that is essential for cell wall rigidity, and its hydrophobicity allows plants to effectively transport nutrients and water with minimal absorption. Lignin is deposited in the secondary cell wall of all vascular plants, and PAL catalyzes the reaction that produces the lignin monomers (Zhao).

 

Draft: Lab 2 Results II

Submitted by aspark on Tue, 03/26/2019 - 11:28

The Protein BLAST of Bradi1g72430 resulted in matches that covered the majority of the query sequence. The majority of the matches were to SRF7 and SRF6 in various species, and there was a match with 100% query cover and 100% identity to the SRF7 protein in B. distachyon. When the search was limited to the Brachypodium genome, the matches with >90% query cover were mostly to SRF7, SRF8, and SRF3 proteins. The remaining sequences matched only the protein kinase domain, which begins around 400 residues into the query. When the search was limited to plant genomes, the matches with >90% query cover were to SRF7 and SRF6 proteins in different species. When the search was limited to non-plant species, the only matches were to protein kinases.

There were two generally conserved domains shown in BLAST: leucine-rich repeats (LRRs) roughly from residues 40-250 and the kinase domain with an ATP binding site roughly from residues 420-700. When the gene was searched in the European Bioinformatics Institute website, two homologous superfamilies were listed: leucine-rich repeat domain superfamily from residues 38-265 and the protein kinase-like domain superfamily from residues 402-690. Protein kinases modify proteins by adding phosphate groups to them, and LRRs are 20-29 residue-long structures primarily involved in protein-protein interactions.

Draft: Lab 2 Results I

Submitted by aspark on Tue, 03/26/2019 - 00:48

The Phytozome locus page for the unknown gene described the product as a leucine-rich repeat protein kinase in the LRR-V subfamily. On the UniProt page, several preliminary predictions were shown: Transmembrane receptor protein serine/threonine kinase activity and ATP binding were listed as functions, and protein phosphorylation was listed as the biological process. The plasma membrane was listed as the cellular location.

The Nucleotide BLAST searching for DNA-level similarities resulted in one match with more than five exons within the coding region; the mRNA sequence for Brachypodium distachyon protein SRF7 (XM_003558607.4) had 12 exons and was the 12th sequence from the top in the list (Figure 3). There was also a short sequence roughly from bases 4060-4340, within exon 11 of Bradi1g72430, that was present in every match. All observed matches were from monocot plant species, including Oryza sativa Japonica, Brachypodium distachyon, Zea mays, and Triticum aestivum.

Draft: Proposal Methods

Submitted by aspark on Sun, 03/24/2019 - 18:16

Each group will be assigned a unique area of the UMass campus. Each group will then use the Frank A. Waugh Arboretum at UMass Amherst website to identify the species of trees that produce fruit available in their area. We will note the number of trees within each species and go on site to count the number of fruit on all or, if there are too many trees of that species, 10 trees. This information about the kinds, number, and fruit of the species in each area will be shared with the whole class. The average number and standard deviation of fruit per tree will be calculated for each species.
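
The per-species summary described above could be computed along these lines. This is only a minimal sketch: the species names and fruit counts are made up, the helper name `summarize` is hypothetical, and it assumes the sample standard deviation is wanted, since only a subset of trees may be counted.

```python
from statistics import mean, stdev

# Hypothetical fruit counts per surveyed tree, keyed by species (made-up numbers).
fruit_counts = {
    "crabapple": [120, 95, 140, 110],
    "serviceberry": [60, 75, 52],
}

def summarize(counts_by_species):
    """Return {species: (mean, sample standard deviation)} of fruit per tree."""
    summary = {}
    for species, counts in counts_by_species.items():
        # stdev needs at least two trees; report 0.0 when only one was counted.
        sd = stdev(counts) if len(counts) > 1 else 0.0
        summary[species] = (mean(counts), sd)
    return summary

summary = summarize(fruit_counts)
for species, (avg, sd) in summary.items():
    print(f"{species}: mean = {avg:.1f}, SD = {sd:.1f} fruit per tree")
```

Keeping the raw per-tree counts, rather than only the totals, is what makes the standard deviation recoverable once the data are shared with the class.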

 

Each group will also be assigned a species of songbird that is known to migrate through Massachusetts, and each group will research what kinds of fruits their songbird consumes. Based on the density of tree species and their fruiting ability within each area, we will determine which area(s) the bird will be attracted to. We will also determine what kinds of trees need to be planted where in order for the songbird to be present throughout campus.

 
 

PP: Lab 2 Introduction

Submitted by aspark on Thu, 03/21/2019 - 19:33

There are multiple methods to identify the protein-coding portions of a gene. Ab initio, meaning “from the beginning,” methods use general rules about coding versus non-coding regions to predict the structure of new genome sequences with no given information. On the other hand, homology-based methods give a more reliable interpretation of an unknown gene, matching the gene to known sequences to predict its structure. The unknown gene is matched to expressed sequence tags (ESTs), short sequences derived from cDNA clones; ESTs that perfectly or almost perfectly match the unknown can then be integrated to create a consensus sequence called a “contig.”

 

The function of an unknown gene can also be predicted through research. Because there is such an extensive library of sequenced genomes, there is almost always a close sequence match when comparing an unknown gene; however, the functions of these genes are still a mystery. Through bioinformatics, genomics data is accessed and similar DNA and protein sequences are matched to the unknown. By exploring the types of organisms the unknown sequence matches with, the conserved domains among the matches, and the functions of the related proteins, the unknown protein’s function can be hypothesized.

 

Draft: Lab 2 Introduction

Submitted by aspark on Wed, 03/20/2019 - 14:43

There are multiple methods to identify the protein-coding portions of a gene. Ab initio, meaning “from the beginning,” methods use general rules about coding versus non-coding regions to predict the structure of new genome sequences with no given information. On the other hand, homology-based methods give a more reliable interpretation of an unknown gene, matching the gene to known sequences to predict its structure. The unknown gene is matched to expressed sequence tags (ESTs), sequences derived from cDNA clones; however, the cDNA is already shorter than the mRNA it is a copy of, and the EST contains errors when sequenced from its cDNA. ESTs that perfectly or almost perfectly match the unknown can then be combined based on overlapping regions to create a consensus sequence called a “contig.” Contigs can then be compared to the full-length cDNA of the gene to determine which consensus sequence matches closely.
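
The idea of combining ESTs by their overlapping regions can be illustrated with a toy sketch. The function name `merge_by_overlap` and the two reads are hypothetical, and the sketch requires an exact suffix-to-prefix match; real assemblers tolerate sequencing errors and handle many reads at once, so this is an illustration of the principle rather than how any particular assembler works.

```python
def merge_by_overlap(left, right, min_overlap=10):
    """Merge two reads if a suffix of `left` exactly matches a prefix of `right`.

    Returns the merged sequence, or None if no exact overlap of at least
    `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first so the merge is as compact as possible.
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None

# Hypothetical toy reads, not real ESTs.
est_a = "ATGGCGTTAGCCTAGGATCC"
est_b = "CCTAGGATCCGTTAACGGTA"
contig = merge_by_overlap(est_a, est_b, min_overlap=8)
print(contig)
```

Here the last ten bases of the first read equal the first ten bases of the second, so the two 20-base reads merge into one 30-base sequence.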

The function of an unknown gene can also be predicted through thorough research. Because there is such an extensive library of sequenced genomes, there is almost always a close sequence match when comparing an unknown gene; however, the functions of these genes are still a mystery. Predicting the function of an unknown gene usually starts with bioinformatics, where computer software is used to access genomics data and match similar DNA and protein sequences to the unknown. Information on these related proteins can then be further researched through online and physical libraries to predict the function of the unknown.

 

Draft: Part 5 of Lab 2 Methods

Submitted by aspark on Wed, 03/20/2019 - 12:41

Our unknown gene’s protein sequence was compared for similarities to other proteins by performing a Protein BLAST with the same algorithm parameters as the previous search. The “nonredundant protein” database was searched. This search strategy was saved. In the output, matches that seemed false were noted. Back at the blastp page, the searches were sequentially limited to Brachypodium, plants, animals, insects, mammals, fungi, bacteria, and archaebacteria, saving the search strategy each time. The extent to which our protein appears among living organisms was noted.

 

Library research of the unknown gene was performed, first by expanding the descriptions of the conserved domains on the NCBI Protein BLAST output. The Phytozome predicted amino acid sequence was then entered into the European Bioinformatics Institute website, and information and publications were found there. Further research was performed on the Strubbelig-Receptor Family 7 (SRF7) and 6 (SRF6) that were highly similar to our unknown gene.

 

Draft: Part 4 of Lab 2 Methods

Submitted by aspark on Tue, 03/19/2019 - 22:13

On the Phytozome locus page for the unknown gene, the “UniProt” icon was selected. There, we learned more about the gene’s predicted biological process, cellular component, and molecular function. Under Function, “View the complete GO annotation on QuickGO ...” was selected and studied as well.

 

Our unknown gene’s DNA-level similarity to other genes was assessed by performing a Nucleotide BLAST with the “Nucleotide collection (nr/nt)” database and the BLASTN algorithm. The algorithm parameters were set as follows: Max target sequences to 1000 and E threshold to 1. This search strategy was saved. In the output, repetitive sequences in noncoding regions, sequences that match all or most of our gene’s exons, and strongly conserved regions were noted. The species the matching sequences came from were also noted.

 

Draft: Part 3 of Lab 2 Methods

Submitted by aspark on Tue, 03/19/2019 - 19:27

Next, a prediction of the protein sequence was made using the BLAST tool on the Phytozome website with Brachypodium distachyon set as the target. The best match shown in blue under the Query View was selected, and the protein sequence predicted from that Bradi1g72430 gene was viewed and saved in a new text file named “AHP-Phytozome-ProteinSequence.” Back in the Phytozome genome browser, “Log-Scale RNA-Seq Coverage” was selected on the left to view the normalized number of times a sequence was counted. This diagram was saved as “AHP-Phytozome-RNA-seq.”

 

The Working Map File was altered to factor in all the learned information. Positions of the exons and introns, the beginning and end of the coding sequence, and the poly-A tail were marked using the FGENESH output. Additionally, restriction enzyme recognition sites of 6 base pairs or more were marked. The predicted protein sequence was also highlighted, deleting all frames that were incorrect and all translations in noncoding regions.
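
Marking recognition sites of 6 base pairs can also be done programmatically. The sketch below is only an illustration under stated assumptions: the three enzymes are a small, well-known sample rather than the set used in the lab, the toy sequence is made up, and `find_sites` is a hypothetical helper name. Because these three sites are palindromic, scanning one strand is enough to find them on both.

```python
import re

# A few well-known six-base recognition sequences; the list is illustrative only.
SIX_CUTTERS = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}

def find_sites(sequence, enzymes=SIX_CUTTERS):
    """Return {enzyme: [1-based start positions]} for each recognition site found."""
    sequence = sequence.upper()
    hits = {}
    for name, site in enzymes.items():
        # A lookahead pattern reports overlapping occurrences as well.
        positions = [m.start() + 1 for m in re.finditer(f"(?={site})", sequence)]
        if positions:
            hits[name] = positions
    return hits

# Hypothetical toy sequence, not the Bradi1g72430 gene.
toy = "ccGAATTCttaaGGATCCgaattcAAGCTT"
print(find_sites(toy))
```

Positions are reported 1-based so they can be compared directly with base coordinates on a sequence map.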

 

Draft: Part 2 of Lab 2 Methods

Submitted by aspark on Tue, 03/19/2019 - 15:29

The Basic Local Alignment Search Tool (BLAST) on the National Center for Biotechnology Information (NCBI) website was used to find expressed sequence tags (ESTs) that match the sequence of our unknown gene. A Nucleotide BLAST of the unknown sequence was performed, setting the database to “expressed sequence tag (est)” and the organism to “Brachypodium distachyon.” The sequences that had an Identity number of at least 95%, meaning they perfectly or near perfectly matched cDNAs, were selected and saved in a new file as “Unknown-AHP-ESTs-fasta.txt.” Consensus sequences (contigs) of the ESTs were formed using the CAP3 web server at the Pasteur Institute. The contigs were saved in a new text file named “AHP-CAP3-Contigs,” and the sequences of ESTs that were not assembled into contigs were saved in another text file named “AHP-CAP3-SingleSequences.” A full-length cDNA of the unknown sequence was then found by performing a Nucleotide BLAST with the “Nucleotide collection (nr/nt)” database instead. Once a cDNA sequence for the gene was found, it was compared to the contigs from CAP3 through a Nucleotide BLAST. The cDNA sequence was pasted as the query sequence, the contigs were pasted as the subject sequence, and the “Align two or more sequences” box was selected. The resulting comparison was saved as a new file named “AHP-Contig+cDNA-fasta.txt.”
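
The 95% identity cutoff was applied by hand in the web interface, but the same selection could be scripted. The sketch below assumes the hits have been exported in BLAST's 12-column tabular layout, where the third column is percent identity; the hit rows, the EST names, and the helper `select_hits` are all made up for illustration.

```python
import csv
import io

# Hypothetical hits in BLAST's 12-column tabular layout:
# query, subject, % identity, length, mismatches, gap opens,
# q.start, q.end, s.start, s.end, e-value, bit score. Values are made up.
EXAMPLE_HITS = """\
unknown\tEST_001\t99.52\t420\t2\t0\t3100\t3519\t1\t420\t0.0\t760
unknown\tEST_002\t93.10\t290\t18\t2\t4060\t4349\t5\t292\t1e-110\t400
unknown\tEST_003\t95.00\t500\t25\t0\t1200\t1699\t1\t500\t0.0\t780
"""

def select_hits(tabular_text, min_identity=95.0):
    """Return subject IDs whose percent identity is at least `min_identity`."""
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    return [row[1] for row in reader if row and float(row[2]) >= min_identity]

selected = select_hits(EXAMPLE_HITS)
print(selected)
```

The comparison is inclusive, so a hit at exactly 95% identity is kept, matching the "at least 95%" criterion in the text.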
