Blat pipeline: sequence mapping analysis

Author: Sharon Wei

Summary

This page documents how sequences from the Gramene markers database are mapped to Gramene genomes using the Gramene BLAT pipeline.

Notes

For each sequenced genome, a subset of the unmapped DNA sequences in the markers database is grouped into four mapping categories: "same species coding", "cross species coding", "same species genomic", and "cross species genomic". Here "same species" means the sequence comes from the same genus as the target genome, and "cross species" means it comes from a different genus. The "coding" categories contain expressed sequences and the "genomic" categories contain genomic sequences. Each category is mapped with its own parameters.
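
The grouping rule above can be sketched in a few lines of Python. This is only an illustration: the `Marker` record, its field names, and the `mapping_category` function are hypothetical and are not taken from the Gramene code base. The only logic drawn from this page is the genus comparison and the expressed/genomic split.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    # Hypothetical record; the real markers database schema is not shown on this page.
    name: str
    genus: str
    is_expressed: bool  # True for expressed (coding) sequences, False for genomic ones


def mapping_category(marker: Marker, target_genus: str) -> str:
    """Return one of the four mapping categories for a marker sequence."""
    # "same species" is decided at the genus level, as described above.
    same_genus = marker.genus.lower() == target_genus.lower()
    scope = "same_species" if same_genus else "cross_species"
    kind = "coding" if marker.is_expressed else "genomic"
    return f"{scope}_{kind}"
```

For example, an expressed Oryza sequence mapped to an Oryza genome would fall into `same_species_coding`, while a genomic Zea sequence mapped to the same genome would fall into `cross_species_genomic`.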

The mapping pipeline runs four analyses in series: "blat", "pslSort", "pslReps", and "GrameneFilter". The "blat" analysis generates local alignments in PSL format, which are then passed to "pslSort" for sorting. Next, "pslReps" analyzes repeats and selects the genome-wide best alignments from this sorted set of local alignments. Finally, "GrameneFilter" goes through the alignments, keeps the top 10 hits for repetitive features, and converts the mappings to a format that can be loaded into the markers database. The parameters used for each category of input sequences are listed below.
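
The "top 10 hits" step of the final filter can be sketched as follows. This is not the GrameneFilter implementation. It assumes that each alignment is a `(query, target, score)` tuple and that hits are ranked by score, neither of which is stated on this page. The name `keep_top_hits` is hypothetical.

```python
from collections import defaultdict


def keep_top_hits(alignments, max_hits=10):
    """Keep at most max_hits alignments per query sequence.

    alignments is an iterable of (query_name, target_name, score) tuples.
    Ranking by score is an assumption; the real filter may use another criterion.
    """
    by_query = defaultdict(list)
    for aln in alignments:
        by_query[aln[0]].append(aln)

    kept = []
    for hits in by_query.values():
        # Highest-scoring alignments first, then truncate to the limit.
        hits.sort(key=lambda a: a[2], reverse=True)
        kept.extend(hits[:max_hits])
    return kept
```

A query that aligns to only a few places passes through unchanged, while a repetitive query with many alignments is cut down to its ten best.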


same_species_coding

    blat     -minScore=120 -maxIntron=20000
    psl_reps -minAli=0.96 -nearTop=0.005


cross_species_coding

    blat     -minIdentity=50 -maxIntron=20000
    psl_reps -minAli=0.1
    

same_species_genomic

    blat     -minScore=160
    psl_reps -nearTop=0.02 -minCover=0.60 -minAli=0.85 -noIntrons

cross_species_genomic

    blat     -minIdentity=50
    psl_reps -minAli=0.4 -noIntrons
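
The parameter sets above can be kept in a single lookup table and turned into command lines. The sketch below only builds argument lists and does not run anything. The file layout (`raw/`, `sorted.psl`, `best.psl`, `best.psr`), the function name `build_commands`, and the assumption that "psl_reps" corresponds to the `pslReps` executable are illustrative guesses, not a description of how the Gramene pipeline invokes these tools.

```python
# Parameters copied from the four categories listed above.
PARAMS = {
    "same_species_coding": {
        "blat": ["-minScore=120", "-maxIntron=20000"],
        "pslReps": ["-minAli=0.96", "-nearTop=0.005"],
    },
    "cross_species_coding": {
        "blat": ["-minIdentity=50", "-maxIntron=20000"],
        "pslReps": ["-minAli=0.1"],
    },
    "same_species_genomic": {
        "blat": ["-minScore=160"],
        "pslReps": ["-nearTop=0.02", "-minCover=0.60", "-minAli=0.85", "-noIntrons"],
    },
    "cross_species_genomic": {
        "blat": ["-minIdentity=50"],
        "pslReps": ["-minAli=0.4", "-noIntrons"],
    },
}


def build_commands(category, genome_fa, query_fa, work_dir):
    """Return argument lists for blat, pslSort and pslReps, in run order."""
    params = PARAMS[category]
    raw_dir = f"{work_dir}/raw"        # hypothetical directory for raw blat output
    sorted_psl = f"{work_dir}/sorted.psl"
    blat = ["blat", genome_fa, query_fa, *params["blat"], f"{raw_dir}/out.psl"]
    sort = ["pslSort", "dirs", sorted_psl, f"{work_dir}/tmp", raw_dir]
    reps = ["pslReps", *params["pslReps"], sorted_psl,
            f"{work_dir}/best.psl", f"{work_dir}/best.psr"]
    return [blat, sort, reps]
```

Each returned list could be passed to `subprocess.run` once the directories exist. The GrameneFilter step is left out because its command-line interface is not described here.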





