Filetype: PPTX | File size: 1.59 MB | Source: www.bx.psu.edu


File: Pairwise Sequence Alignment Slideshare 68136 | Lesson4 3 Alignments Blast Beyond

Posted on 28 Aug 2022
Partial capture of text on file.
      Find seeds and extend: BLAST
      HEURISTICS FOR EFFICIENT COMPUTATION OF NEAR-OPTIMAL ALIGNMENTS
    08/28/2022                                          2
            Alignment method needs to fit the problem, part 1

        Problem                   Features                        Method                        Example of program

        Pairwise alignment of     Moderate size (hundreds of      Dynamic programming; find     Needleman-Wunsch (needle in
        proteins or genes         letters), similar throughout    optimal global alignment      EMBOSS/Galaxy)

        Pairwise alignment of     Moderate size (hundreds of      Dynamic programming; find     Smith-Waterman (water in
        proteins or genes         letters), subsequences similar  optimal local alignment       EMBOSS/Galaxy)

        Find a match between a    Query sequence could be         Heuristic approach; find      BLAST family of programs
        query sequence and a      hundreds of letters; database   seeds (hits) and extend;      (NCBI); FASTA
        database                  has >100M entries               local alignments

        Find a match for a query  Query is 25 or more             Heuristic approach; find and  BLAT (UCSC Genome Browser)
        sequence that is part of  nucleotides; genome can be      extend seeds, but engineered
        a large genome            3 billion nucleotides           to be very fast

        Align short reads to a    Tens to hundreds of millions    Employ the Burrows-Wheeler    Bowtie or BWA, both
        genome                    of reads; find best match in    transform for efficient       implemented in Galaxy
                                  an assembled genome             alignments
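
The difference between the two dynamic programming rows of the table (optimal global versus optimal local alignment) can be sketched in a few lines. This is a minimal illustration, not the EMBOSS needle or water implementation: the function name `align_score` and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for readability, and only the score is computed, not the alignment itself.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of strings a and b by dynamic programming.

    local=False: global score in the style of Needleman-Wunsch.
    local=True:  local score in the style of Smith-Waterman.
    Scoring values are illustrative assumptions, not program defaults.
    """
    # First DP row: leading gaps cost something globally, nothing locally.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)   # and may end anywhere
            curr.append(cell)
        prev = curr
    # Global: score of aligning both sequences end to end.
    return best if local else prev[-1]


a = "TTTGATTACATTT"
b = "CCCGATTACACCC"
print(align_score(a, b))              # global score: 1
print(align_score(a, b, local=True))  # local score: 7
```

The two sequences share only the core GATTACA, so the local score rewards the shared subsequence while the global score is dragged down by the dissimilar flanks. That is the "similar throughout" versus "subsequences similar" distinction in the Features column.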
                       Heuristic algorithms
    •  Rapid heuristic algorithms, such as FASTA and BLAST, do not 
       examine all possible paths for local alignments
    •  They are much faster than the rigorous Smith-Waterman algorithm 
       because they examine only a portion of the potential 
       alignments
    •  They can produce results of similar quality in many cases
    •  They are ideal for searching large databases of sequences and for 
       aligning long sequences
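
A rough count shows why rigorous dynamic programming does not scale to database searches. The sketch below multiplies out the number of matrix cells Smith-Waterman would fill for one query. The query length and database size echo the figures in the table above ("hundreds of letters", ">100M entries"); the mean entry length of 300 is a hypothetical value, as is the helper name `dp_cells`.

```python
def dp_cells(query_len, db_entries, mean_entry_len):
    """Cells a full DP search fills: one query_len x entry_len matrix per entry."""
    return query_len * db_entries * mean_entry_len


# mean_entry_len=300 is an assumed figure for illustration only.
cells = dp_cells(query_len=300, db_entries=100_000_000, mean_entry_len=300)
print(f"{cells:.1e} cells")  # 9.0e+12 cells
```

Heuristic methods avoid most of this work by evaluating only the regions around seeds.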
                    Basics of the BLAST algorithm
    •  BLAST first scans the database for words (short strings of amino 
       acids or nucleotides) that score at least T when aligned with 
       some word in the query sequence. Each such aligned word pair is 
       a hit.
        – Typically a word is 3 amino acids or 10 nucleotides
        – T is the neighborhood word score threshold (Karlin and Altschul, 1990)
    •  BLAST then checks whether each hit lies within an alignment 
       with a score sufficient to be reported.
        – Extend the hit in both directions until the running alignment score has 
           dropped more than X below the maximum score yet attained.
        – These extended alignments are high-scoring segment pairs, or HSPs.
        – The extension step typically accounts for >90% of the execution time.
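
The two steps above (find word hits scoring at least T, then extend each hit until the score drops more than X below its maximum) can be sketched as follows. This is a simplified illustration, not the NCBI implementation: it compares every query word with every subject word by brute force where a real search uses a lookup table, and the names `word_score`, `find_hits`, `extend_hit` and the scoring values (match +5, mismatch -4, word length 4) are assumptions.

```python
def word_score(u, v, match=5, mismatch=-4):
    """Ungapped score of two equal-length words (illustrative scoring)."""
    return sum(match if x == y else mismatch for x, y in zip(u, v))


def find_hits(query, subject, w=4, T=20):
    """Return (query_pos, subject_pos) pairs whose length-w words score >= T."""
    hits = []
    for i in range(len(query) - w + 1):
        for j in range(len(subject) - w + 1):
            if word_score(query[i:i + w], subject[j:j + w]) >= T:
                hits.append((i, j))
    return hits


def extend_one_way(query, subject, i, j, step, x_drop, match, mismatch):
    """Walk from (i, j) in direction step; return (best score gain, its length)."""
    score = best = 0
    length = best_len = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # running score fell more than X below the maximum
        i += step
        j += step
    return best, best_len


def extend_hit(query, subject, i, j, w=4, x_drop=10, match=5, mismatch=-4):
    """Ungapped extension of a hit at (i, j) into an HSP.

    Returns (score, query_start, subject_start, length).
    """
    seed = word_score(query[i:i + w], subject[j:j + w], match, mismatch)
    right, rlen = extend_one_way(query, subject, i + w, j + w, 1,
                                 x_drop, match, mismatch)
    left, llen = extend_one_way(query, subject, i - 1, j - 1, -1,
                                x_drop, match, mismatch)
    return seed + left + right, i - llen, j - llen, llen + w + rlen


query = "TTACGTACGTAA"
subject = "CCACGTACGTGG"
hits = find_hits(query, subject)
print(extend_hit(query, subject, 2, 2))  # (40, 2, 2, 8)
```

With these scoring values T=20 admits only exact 4-letter matches; lowering T would also admit near-matching words, at the price of more hits to extend. Several hits on the same diagonal extend to the same HSP, which is one reason the extension step dominates the run time.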
                    Improvements in BLAST2
     •  Fewer extensions: require 2 hits (aligned word pairs) on a 
        diagonal before extending 
         – Extension of each aligned word pair generates an HSP
     •  Introduce gaps
         – For HSPs (high-scoring segment pairs) above a certain threshold, 
           trigger a gapped extension
             • Set the threshold so that about 1 gapped extension is executed per 
               50 database sequences
         – Gapped extensions are costly in time, but improve sensitivity
         – Run the extension until the alignment score drops by more than X below 
           the maximum score yet obtained
         – Report the gapped extension if the E-value is low enough to be of interest
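
The two-hit requirement can be sketched as a filter over word hits. A hit at query position i and subject position j lies on diagonal j - i, and an extension is triggered only when an earlier, non-overlapping hit sits on the same diagonal within some distance. This is a minimal sketch under assumptions: the function name `two_hit_triggers`, the parameter name `window` and its value of 40 are illustrative, and hits that overlap the most recent hit on a diagonal are simply skipped.

```python
def two_hit_triggers(hits, w=3, window=40):
    """Return hits that have an earlier non-overlapping hit on the same diagonal.

    hits   : iterable of (query_pos, subject_pos) word hits
    w      : word length, used to detect overlapping hits
    window : maximum distance between the two hits (assumed value)
    """
    last_on_diag = {}  # diagonal -> subject position of the most recent hit
    triggers = []
    for i, j in sorted(hits, key=lambda h: h[1]):  # scan in subject order
        diag = j - i
        prev = last_on_diag.get(diag)
        if prev is not None:
            dist = j - prev
            if dist < w:
                continue  # overlaps the most recent hit: ignore it
            if dist <= window:
                triggers.append((i, j))  # second hit: worth extending
        last_on_diag[diag] = j
    return triggers


hits = [(0, 10), (5, 15), (2, 30), (20, 21)]
print(two_hit_triggers(hits))  # [(5, 15)]
```

Only (5, 15) has a partner on its diagonal, so three of the four hits are never extended. Since extension is the expensive step, discarding isolated hits is where the saving comes from.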