There are multiple methods to identify the protein-coding portions of a gene. Ab initio (“from the beginning”) methods apply general rules that distinguish coding from non-coding regions, so they can predict the gene structure of a new genome sequence without any prior information about it. Homology-based methods, on the other hand, match the unknown gene to known sequences to predict its structure, which gives a more reliable interpretation. The unknown gene is matched to expressed sequence tags (ESTs), short sequences derived from cDNA clones; ESTs that perfectly or almost perfectly match the unknown can then be integrated to create a consensus sequence called a “contig.”
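The EST-matching step can be illustrated with a small Python sketch. It is a toy under stated assumptions, not how real annotation pipelines work: it only accepts exact matches, it ignores introns (real ESTs come from spliced transcripts and need spliced alignment), and the function names `align_ests` and `merge_into_contigs` as well as the sequences are made up for illustration.

```python
def align_ests(genomic, ests):
    """Return (start, end) intervals where an EST matches the genomic sequence exactly.

    Assumption: exact string matching only; ESTs with no match are skipped.
    """
    hits = []
    for est in ests:
        pos = genomic.find(est)  # first exact occurrence, or -1 if absent
        if pos != -1:
            hits.append((pos, pos + len(est)))
    return hits


def merge_into_contigs(genomic, hits):
    """Merge overlapping EST intervals and return the merged sequences (toy contigs)."""
    merged = []
    for start, end in sorted(hits):
        if merged and start < merged[-1][1]:
            # This EST overlaps the previous interval: extend that interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # With exact matches, the consensus equals the covered genomic substring.
    return [genomic[s:e] for s, e in merged]


# Hypothetical sequences, chosen only to show the mechanics.
genomic = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
ests = ["ATGGCCATTG", "CATTGTAATGG", "GGGTGCCCGA", "TTTTTT"]

hits = align_ests(genomic, ests)
contigs = merge_into_contigs(genomic, hits)
```

Here the first two ESTs overlap and merge into one contig, the third stands alone, and the fourth is dropped because it does not match the unknown sequence.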
The function of an unknown gene can also be predicted through research. Because the library of sequenced genomes is so extensive, comparing an unknown gene against it almost always yields a close sequence match; however, the function of these genes is still a mystery. Through bioinformatics, genomics data is accessed and similar DNA and protein sequences are matched to the unknown. By exploring the types of organisms the unknown sequence matches with, the conserved domains among the matches, and the functions of the related proteins, the unknown protein’s function can be hypothesized.
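The idea of hypothesizing function from similar sequences can be sketched as follows. This is a simplified illustration, not a real bioinformatics workflow: it uses Python's general-purpose `difflib.SequenceMatcher` as a stand-in for a proper sequence-alignment tool, and the sequences, the function labels, the 0.8 threshold, and the name `hypothesize_function` are all placeholder assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a, b):
    """Rough similarity score between 0 and 1 (stand-in for a real alignment score)."""
    return SequenceMatcher(None, a, b).ratio()


def hypothesize_function(query, annotated, min_similarity=0.8):
    """Return the most common annotation among sufficiently similar sequences.

    `annotated` is a list of (sequence, function_label) pairs.
    Returns None when nothing is similar enough to support a hypothesis.
    """
    labels = [
        label
        for seq, label in annotated
        if similarity(query, seq) >= min_similarity
    ]
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]


# Placeholder protein sequences and labels, invented for this example.
annotated = [
    ("MKTAYIAKQRQISFVKSHFSRQ", "placeholder function A"),
    ("MKTAYIAKQRQLSFVKSHFSRQ", "placeholder function A"),
    ("GGGGGGGGGGWWWWWWWWWWPP", "placeholder function B"),
]

query = "MKTAYIAKQRQISFVKSHFSRQ"
guess = hypothesize_function(query, annotated)
```

The result is only a hypothesis: as the paragraph notes, the organisms involved and the conserved domains among the matches would also need to be examined.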
Comments
Comment
I feel the first two sentences need work. The first sentence is too vague and the second is very confusing.
Suggestion
Maybe break up this sentence, as it seems to run on: "The unknown gene is matched to expressed sequence tags (ESTs), short sequences derived from cDNA clones; ESTs that perfectly or almost perfectly match the unknown can then be integrated to create a consensus sequence called a “contig.”"