Blat pipeline: sequence mapping analysis
Author: Sharon Wei
Summary
This page documents how DNA sequences from the Gramene markers db are mapped to Gramene genomes using the Gramene blat pipeline.
Notes
For each sequenced genome, a subset of the unmapped DNA sequences in the markers db is grouped into four mapping categories: "same species coding", "cross species coding", "same species genomic", and "cross species genomic". Although the names say "species", the grouping is done at the genus level. "same species coding" contains expressed sequences from the same genus as the target genome. "cross species coding" contains expressed sequences from a different genus than the target genome. "same species genomic" contains genomic sequences from the same genus as the target genome. "cross species genomic" contains genomic sequences from a different genus than the target genome. Each category has its own mapping parameters.
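The grouping rule above can be sketched in a few lines of Python. This is only an illustration, not the pipeline's own code: the `MarkerSequence` class, its fields, and the `mapping_category` function are hypothetical names, and the sketch assumes that each sequence record already carries its genus and a flag saying whether it is an expressed sequence.

```python
from dataclasses import dataclass


@dataclass
class MarkerSequence:
    """Hypothetical record for one markers db sequence."""
    name: str
    genus: str
    is_expressed: bool  # True for expressed (coding) sequences, False for genomic


def mapping_category(seq: MarkerSequence, target_genus: str) -> str:
    """Return one of the four category names used on this page."""
    # "same species" here means "same genus as the target genome".
    same_genus = seq.genus.lower() == target_genus.lower()
    scope = "same_species" if same_genus else "cross_species"
    kind = "coding" if seq.is_expressed else "genomic"
    return f"{scope}_{kind}"
```

For example, an expressed Oryza sequence mapped to an Oryza genome falls into `same_species_coding`, while a Zea genomic sequence mapped to the same genome falls into `cross_species_genomic`.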
The mapping pipeline runs four analyses in series: "blat", "pslSort", "pslReps", and "GrameneFilter". The "blat" analysis generates local alignments in psl format, which are then fed to "pslSort" for sorting. Next, "pslReps" analyzes repeats and selects the genome-wide best alignments from this sorted set of local alignments. Finally, "GrameneFilter" goes through the alignments, keeps the top 10 hits for repetitive features, and converts the mappings to a format that can be loaded into the markers db. The parameters used for each category of input sequences are listed below.
same_species_coding
blat -minScore=120 -maxIntron=20000
psl_reps -minAli=0.96 -nearTop=0.005
cross_species_coding
blat -minIdentity=50 -maxIntron=20000
psl_reps -minAli=0.1
same_species_genomic
blat -minScore=160
psl_reps -nearTop=0.02 -minCover=0.60 -minAli=0.85 -noIntrons
cross_species_genomic
blat -minIdentity=50
psl_reps -minAli=0.4 -noIntrons
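As a rough illustration of how the parameters above could be turned into command lines, the following Python sketch stores them in a dictionary and builds the argument lists for the first three steps. The option values are copied from the list above; everything else is an assumption: the file and directory names are hypothetical, the argument order follows the standard UCSC usage of `blat`, `pslSort`, and `pslReps`, and the "GrameneFilter" step is left out because its invocation is not documented on this page. The sketch only builds the commands and does not run them.

```python
# Per-category options, copied from the parameter list on this page.
PARAMS = {
    "same_species_coding": {
        "blat": ["-minScore=120", "-maxIntron=20000"],
        "psl_reps": ["-minAli=0.96", "-nearTop=0.005"],
    },
    "cross_species_coding": {
        "blat": ["-minIdentity=50", "-maxIntron=20000"],
        "psl_reps": ["-minAli=0.1"],
    },
    "same_species_genomic": {
        "blat": ["-minScore=160"],
        "psl_reps": ["-nearTop=0.02", "-minCover=0.60", "-minAli=0.85", "-noIntrons"],
    },
    "cross_species_genomic": {
        "blat": ["-minIdentity=50"],
        "psl_reps": ["-minAli=0.4", "-noIntrons"],
    },
}


def build_commands(category, genome_fa, query_fa, work_dir):
    """Build argument lists for blat, pslSort and pslReps (paths are hypothetical)."""
    opts = PARAMS[category]
    raw_dir = f"{work_dir}/raw"            # directory holding blat output
    raw_psl = f"{raw_dir}/{category}.psl"  # local alignments from blat
    sorted_psl = f"{work_dir}/sorted.psl"  # output of pslSort
    best_psl = f"{work_dir}/best.psl"      # best alignments from pslReps
    best_psr = f"{work_dir}/best.psr"      # repeat information from pslReps
    return [
        # blat database query output.psl
        ["blat", *opts["blat"], genome_fa, query_fa, raw_psl],
        # pslSort dirs outFile tempDir inDir
        ["pslSort", "dirs", sorted_psl, f"{work_dir}/tmp", raw_dir],
        # pslReps in.psl out.psl out.psr
        ["pslReps", *opts["psl_reps"], sorted_psl, best_psl, best_psr],
    ]


if __name__ == "__main__":
    for cmd in build_commands("same_species_coding", "genome.fa", "markers.fa", "work"):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run` once the working directories exist. Keeping the options in one dictionary makes it easy to compare the categories side by side.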

