Journal of Oil Palm Research (Special Issue - April 2008), p. 35-43
WILLIS, Laura B*; LESSARD, Philip A*; PARKER, Jefferson A*; O'BRIEN, Xian M*; SINSKEY, Anthony J*
Recent advances in DNA sequencing technologies have led to a tremendous increase in the amount of sequence information available in public databases. To address the need for automated methods of assigning a putative function to each sequence, we have developed bioinformatics tools that can be run on a desktop computer and save significant time and effort. Elaeis guineensis and Elaeis oleifera sequences were downloaded from PalmGenes and GenBank, and duplicate entries were eliminated by pairwise BLAST searches, resulting in a collection of unique oil palm sequences which we call the UniPalm dataset. We applied the CAPASA (Consensus Annotation by Phrase Anchored Sequence Alignment) software and automatically assigned functions to 5600 oil palm sequences in less than 8 hr. CAPASA mimics the human decision-making process by factoring in the degree of homology, taxonomic relationship and informational value when choosing a name. In addition, we applied COGsensus to place the UniPalm sequences into COG (Clusters of Orthologous Groups of genes) categories, and compared these results to a COGsensus analysis of the rice genome. COG classification is a homology-based method for distinguishing gene sets, particularly with regard to closely related genes found in different organisms. Our results indicate that the diversity of COG groups is well represented in the UniPalm set.
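The three factors named in the abstract (degree of homology, taxonomic relationship and informational value) can be illustrated with a small scoring sketch. This is not the CAPASA implementation, whose internals are not described here; the `Hit` record, the weights, the taxonomic distance scale and the list of uninformative phrases are all assumptions made for illustration only.

```python
from dataclasses import dataclass

# Illustrative (assumed) phrases that carry little functional information.
UNINFORMATIVE = ("hypothetical", "unknown", "unnamed", "predicted protein")


@dataclass
class Hit:
    """One database hit for a query sequence (hypothetical record layout)."""
    description: str         # annotation text of the matched sequence
    percent_identity: float  # degree of homology, 0-100
    taxonomic_distance: int  # 0 = closest relative; larger = more distant (assumed scale)


def informational_value(description: str) -> float:
    """Return 0.0 for names such as 'hypothetical protein', 1.0 otherwise."""
    text = description.lower()
    return 0.0 if any(phrase in text for phrase in UNINFORMATIVE) else 1.0


def score(hit: Hit, w_homology: float = 0.5, w_taxonomy: float = 0.2,
          w_info: float = 0.3) -> float:
    """Combine the three factors into one score; the weights are arbitrary assumptions."""
    homology = hit.percent_identity / 100.0
    taxonomy = 1.0 / (1.0 + hit.taxonomic_distance)
    return (w_homology * homology
            + w_taxonomy * taxonomy
            + w_info * informational_value(hit.description))


def choose_name(hits: list[Hit]) -> str:
    """Pick the description of the best-scoring hit, or a placeholder if there are none."""
    if not hits:
        return "no significant similarity"
    return max(hits, key=score).description


example_hits = [
    Hit("hypothetical protein", percent_identity=95.0, taxonomic_distance=0),
    Hit("stearoyl-ACP desaturase", percent_identity=88.0, taxonomic_distance=1),
]
chosen = choose_name(example_hits)
```

In this example the slightly weaker but informative hit is preferred over the stronger "hypothetical protein" hit, which is the kind of trade-off a human annotator would make when choosing a name.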
*Department of Biology, Massachusetts Institute of Technology,
77 Massachusetts Avenue, Cambridge, MA 02139, USA.
Journal of Oil Palm Research Special Issue on Malaysia-MIT Biotechnology Partnership Programme: Volume 1 - Oil Palm Tissue Culture