Draft: Lab 2 Introduction

Submitted by aspark on Wed, 03/20/2019 - 14:43

There are several methods for identifying the protein-coding portions of a gene. Ab initio methods, meaning “from the beginning,” use general rules about coding versus non-coding regions to predict the structure of a new genome sequence without any prior information about it. Homology-based methods, on the other hand, match the unknown gene to known sequences to predict its structure, which gives a more reliable interpretation. The unknown gene is matched to expressed sequence tags (ESTs), which are sequences derived from cDNA clones. ESTs have limitations, however: the cDNA is typically shorter than the mRNA it was copied from, and an EST sequenced from that cDNA contains errors. ESTs that match the unknown gene perfectly or almost perfectly can then be combined at their overlapping regions to create a consensus sequence called a “contig.” Contigs can then be compared to the full-length cDNA of the gene to determine which consensus sequence matches it most closely.
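
The step of combining overlapping ESTs into a contig can be illustrated with a minimal sketch. It assumes exact suffix-to-prefix overlaps and error-free reads, which real ESTs do not offer, so it is a toy illustration rather than a real assembler. The function names, the `min_overlap` parameter, and the example sequences are all hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble_contigs(ests: list[str], min_overlap: int = 3) -> list[str]:
    """Greedily merge the pair of sequences with the longest exact overlap.

    Assumption: overlaps are exact matches; real EST assembly must tolerate
    sequencing errors and reverse-complement reads.
    """
    contigs = list(ests)
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(a, b, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining pair overlaps enough to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical ESTs that overlap end to end.
example_ests = ["ATGGCCTTA", "CCTTAGGAC", "GGACTTGA"]
print(assemble_contigs(example_ests))
```

In this sketch, sequences that share no sufficiently long overlap are left as separate contigs, which mirrors the idea that several candidate contigs may need to be compared against the full-length cDNA.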

The function of an unknown gene can also be predicted through research. Because so many genomes have been sequenced, there is almost always a close sequence match for an unknown gene; however, the function of many of these matching genes is still a mystery. Predicting the function of an unknown gene usually starts with bioinformatics, in which computer software is used to access genomic data and find DNA and protein sequences similar to the unknown. Information on these related proteins can then be researched further through online and physical libraries to predict the function of the unknown gene.
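
The idea of matching an unknown sequence against known ones can be sketched as follows. This is a toy comparison based on shared short subsequences (k-mers), not the alignment-based search that real bioinformatics software performs. The function names, the choice of `k`, and the example sequences are assumptions made for illustration.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    kmers_a, kmers_b = kmers(a, k), kmers(b, k)
    if not kmers_a or not kmers_b:
        return 0.0
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


def rank_matches(unknown: str, known: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Rank known sequences from most to least similar to the unknown."""
    scores = [(name, kmer_similarity(unknown, seq, k)) for name, seq in known.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical protein sequences; the names are placeholders.
known_sequences = {
    "protein_a": "MKTAYIAKQR",
    "protein_b": "GGGGGGGGGG",
}
for name, score in rank_matches("MKTAYIAKQL", known_sequences):
    print(name, round(score, 2))
```

The top-ranked entries would be the related sequences worth researching further to form a hypothesis about the unknown gene's function.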

 
