Gramene Standard Analyses - Core Genome
Author: --Whs 09:45, 22 September 2009 (UTC)
This page provides details of the standard analyses that Gramene runs against each core genome database in its collection. Our annotation workflow is adapted from the Ensembl pipeline.
Repeat Finding
Gramene uses the MIPS reDAT plant repeat library with RepeatMasker to identify complex repeats. DUST and TRF are run as separate analyses to identify low-complexity and tandem repeats respectively.
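This page does not say how the results of the three repeat analyses are combined downstream. As an illustration only, the following Python sketch merges repeat features from separately run analyses into non-overlapping intervals and soft-masks a sequence with them. The function names, the coordinate convention (0-based, half-open) and the example features are all assumptions made for this sketch, not part of the Gramene or Ensembl code.

```python
from typing import List, Tuple

# Assumed convention for this sketch: 0-based, half-open [start, end).
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or adjacent intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def soft_mask(sequence: str, intervals: List[Interval]) -> str:
    """Return the sequence in upper case with repeat intervals in lower case."""
    chars = list(sequence.upper())
    for start, end in merge_intervals(intervals):
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


# Hypothetical features, one list per analysis.
repeatmasker_hits = [(2, 6)]   # complex repeats
dust_hits = [(5, 8)]           # low-complexity regions
trf_hits = [(12, 15)]          # tandem repeats

all_hits = repeatmasker_hits + dust_hits + trf_hits
masked = soft_mask("ACGTACGTACGTACGTACGT", all_hits)
print(masked)
```

Keeping each analysis in its own list mirrors the fact that the three programs are run separately; merging happens only at the point where a combined mask is needed.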
Genomic Alignment of Public Sequences
The Gramene Markers database houses a large collection of DNA/mRNA sequences from GenBank, UniGene, Gene Indices and other sources. Sequences from the same species as the genome, or from related species, are aligned using BLAT, and the results are stored in the Ensembl database. For some sequence sets, such as UniGene and Gene Index EST clusters, the entire collection is aligned regardless of species.
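The selection rule described above can be sketched in Python as a simple filter. The record structure, the source labels, the accessions and the set of "related" species are hypothetical and chosen only to illustrate the rule; the actual Markers database schema and the BLAT invocation are not shown here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SequenceRecord:
    accession: str   # hypothetical identifier
    species: str
    source: str      # e.g. "GenBank", "UniGene", "Gene Index"


# Assumption for this sketch: these sources are aligned regardless of species.
ALL_SPECIES_SOURCES: Set[str] = {"UniGene", "Gene Index"}


def select_for_alignment(
    records: Iterable[SequenceRecord], related_species: Set[str]
) -> List[SequenceRecord]:
    """Keep records from all-species sources or from same/related species."""
    return [
        r
        for r in records
        if r.source in ALL_SPECIES_SOURCES or r.species in related_species
    ]


records = [
    SequenceRecord("SEQ0001", "Oryza sativa", "GenBank"),
    SequenceRecord("SEQ0002", "Arabidopsis thaliana", "GenBank"),
    SequenceRecord("SEQ0003", "Arabidopsis thaliana", "UniGene"),
]

# Hypothetical choice of related species for a rice genome.
selected = select_for_alignment(records, {"Oryza sativa", "Oryza glaberrima"})
print([r.accession for r in selected])
```

The selected records would then be passed to the aligner; that step depends on the local pipeline configuration and is left out of this sketch.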
Ab Initio Gene Prediction
Gramene uses the FGENESH program, run from the Ensembl Pipeline infrastructure, to annotate genomes with ab initio gene predictions.
Protein Domain Prediction on Genes
Gramene uses InterProScan software, run from the Ensembl Pipeline framework, to annotate gene translations with functional domains from Pfam, PIRSF, PRINTS, PROSITE, SMART, SUPERFAMILY, and TIGRFAM. InterProScan also assigns InterPro identifiers to the translations. Additionally, NCOILS, SignalP and SEG are run to detect coiled coils, signal peptides and low-complexity regions respectively.
Database Cross-References between Genes and External IDs
The Ensembl XRef pipeline is used to assign cross-references between our genes/transcripts/translations and identifiers in third-party databases. XRefs can be inferred from sequence similarity or from shared identifiers. Multi-species databases include Entrez Gene, PlantGDB PUTs, RefSeq, Gene Index, UniGene, UniProt and WikiGenes. Single-species databases include TAIR and NASC for Arabidopsis thaliana, and RAP-DB, MSU rice and BGI-RIS for Oryza sativa.
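The two inference routes mentioned above, shared identifiers and sequence similarity, can be sketched as follows. This is a simplified illustration and not the Ensembl XRef pipeline itself: the function name, the data structures, the identifiers and the 90% identity and coverage thresholds are all assumptions made for the example.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class SimilarityHit(NamedTuple):
    external_id: str
    percent_identity: float
    query_coverage: float


def assign_xref(
    gene_id: str,
    shared_ids: Dict[str, str],
    hits: Dict[str, List[SimilarityHit]],
    min_identity: float = 90.0,   # hypothetical threshold
    min_coverage: float = 90.0,   # hypothetical threshold
) -> Optional[Tuple[str, str]]:
    """Return (external_id, method), preferring a shared identifier."""
    if gene_id in shared_ids:
        return shared_ids[gene_id], "shared_identifier"
    # Fall back to the best similarity hit that passes both thresholds.
    passing = [
        h
        for h in hits.get(gene_id, [])
        if h.percent_identity >= min_identity and h.query_coverage >= min_coverage
    ]
    if not passing:
        return None
    best = max(passing, key=lambda h: (h.percent_identity, h.query_coverage))
    return best.external_id, "sequence_similarity"


# Hypothetical inputs for three genes.
shared_ids = {"GENE_A": "EXT_1"}
hits = {
    "GENE_B": [
        SimilarityHit("EXT_2", 99.0, 95.0),
        SimilarityHit("EXT_3", 92.0, 91.0),
    ],
    "GENE_C": [SimilarityHit("EXT_4", 70.0, 50.0)],
}

for gene in ("GENE_A", "GENE_B", "GENE_C"):
    print(gene, assign_xref(gene, shared_ids, hits))
```

Recording the method alongside the identifier keeps it clear which cross-references rest on a shared identifier and which on a similarity threshold.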

